A Mixed Effects Randomized Item Response Model
ERIC Educational Resources Information Center
Fox, J.-P.; Wyrick, Cheryl
2008-01-01
The randomized response technique ensures that individual item responses, denoted as true item responses, are randomized before observing them and so-called randomized item responses are observed. A relationship is specified between randomized item response data and true item response data. True item response data are modeled with a (non)linear…
ERIC Educational Resources Information Center
Crino, Michael D.; And Others
1985-01-01
The random response technique was compared to a direct questionnaire, administered to college students, to investigate whether or not the responses predicted the social desirability of the item. Results suggest support for the hypothesis. A 33-item version of the Marlowe-Crowne Social Desirability Scale which was used is included. (GDC)
Randomized Item Response Theory Models
ERIC Educational Resources Information Center
Fox, Jean-Paul
2005-01-01
The randomized response (RR) technique is often used to obtain answers on sensitive questions. A new method is developed to measure latent variables using the RR technique because direct questioning leads to biased results. Within the RR technique, the probability of the true response is modeled by an item response theory (IRT) model. The RR…
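The relationship between randomized and true item responses can be illustrated with a minimal sketch. It assumes a forced-response design in which a respondent answers truthfully with probability `p_truth` and otherwise gives a forced "yes" with probability `p_forced_yes`, and it assumes a Rasch-type logistic function for the true response. The design, the parameter values, and all function names are illustrative assumptions, not details taken from the abstract.

```python
import math

def irt_true_prob(theta, b):
    """Rasch-type probability of a true 'yes' (an assumed IRT model)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def observed_prob(theta, b, p_truth=0.8, p_forced_yes=0.5):
    """Probability of an observed (randomized) 'yes' under the assumed design."""
    return p_truth * irt_true_prob(theta, b) + (1.0 - p_truth) * p_forced_yes

def recover_true_prob(p_obs, p_truth=0.8, p_forced_yes=0.5):
    """Invert the randomization to get back the true-response probability."""
    return (p_obs - (1.0 - p_truth) * p_forced_yes) / p_truth
```

Because the randomization probabilities are known by design, the mapping is linear and invertible, which is what allows a model for the true responses to be fitted to randomized data.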
Massof, Robert W
2014-10-01
A simple theoretical framework explains patient responses to items in rating scale questionnaires. Fixed latent variables position each patient and each item on the same linear scale. Item responses are governed by a set of fixed category thresholds, one for each ordinal response category. A patient's item responses are magnitude estimates of the difference between the patient variable and the patient's estimate of the item variable, relative to his/her personally defined response category thresholds. Differences between patients in their personal estimates of the item variable and in their personal choices of category thresholds are represented by random variables added to the corresponding fixed variables. Effects of intervention correspond to changes in the patient variable, the patient's response bias, and/or latent item variables for a subset of items. Intervention effects on patients' item responses were simulated by assuming the random variables are normally distributed with a constant scalar covariance matrix. Rasch analysis was used to estimate latent variables from the simulated responses. The simulations demonstrate that changes in the patient variable and changes in response bias produce indistinguishable effects on item responses and manifest as changes only in the estimated patient variable. Changes in a subset of item variables manifest as intervention-specific differential item functioning and as changes in the estimated person variable that equal the average of changes in the item variables. Simulations demonstrate that intervention-specific differential item functioning produces inefficiencies and inaccuracies in computer adaptive testing. © The Author(s) 2013.
Sample Invariance of the Structural Equation Model and the Item Response Model: A Case Study.
ERIC Educational Resources Information Center
Breithaupt, Krista; Zumbo, Bruno D.
2002-01-01
Evaluated the sample invariance of item discrimination statistics in a case study using real data, responses of 10 random samples of 500 people to a depression scale. Results lend some support to the hypothesized superiority of a two-parameter item response model over the common form of structural equation modeling, at least when responses are…
Concise evaluation of decision aids.
Stalmeier, Peep F M; Roosmalen, Marielle S
2009-01-01
Decision aids purport to help patients make treatment related choices. Several instruments exist to evaluate decision aids. Our aim is to compare the responsiveness of several instruments. Two different decision aids were randomized in patients at high risk for breast and ovarian cancer. Treatment choices were between prophylactic surgery and screening. Effect sizes were calculated to compare the responsiveness of the measures. One decision aid was randomized in 390 women, the other in 91 ensuing mutation carriers. Three factors were identified related to Information, Well-being and Decision Making. Within each factor, single item measures were as responsive as multi-item measures. Four single items, 'the amount of information received for decision making,' 'strength of preference,' 'I weighed the pros and cons,' and 'General Health,' were adequately responsive to the decision aids. These items might be considered for inclusion in questionnaires to evaluate decision aids.
Kronberg, J.W.
1993-04-20
An apparatus for selecting at random one item of N items on the average comprising counter and reset elements for counting repeatedly between zero and N, a number selected by the user, a circuit for activating and deactivating the counter, a comparator to determine if the counter stopped at a count of zero, an output to indicate an item has been selected when the count is zero or not selected if the count is not zero. Randomness is provided by having the counter cycle very often while varying the relatively longer duration between activation and deactivation of the count. The passive circuit components of the activating/deactivating circuit and those of the counter are selected for the sensitivity of their response to variations in temperature and other physical characteristics of the environment so that the response time of the circuitry varies. Additionally, the items themselves, which may be people, may vary in shape or the time they press a pushbutton, so that, for example, an ultrasonic beam broken by the item or person passing through it will add to the duration of the count and thus to the randomness of the selection.
Optimal Item Selection with Credentialing Examinations.
ERIC Educational Resources Information Center
Hambleton, Ronald K.; And Others
The study compared two promising item response theory (IRT) item-selection methods, optimal and content-optimal, with two non-IRT item selection methods, random and classical, for use in fixed-length certification exams. The four methods were used to construct 20-item exams from a pool of approximately 250 items taken from a 1985 certification…
Empirical Histograms in Item Response Theory with Ordinal Data
ERIC Educational Resources Information Center
Woods, Carol M.
2007-01-01
The purpose of this research is to describe, test, and illustrate a new implementation of the empirical histogram (EH) method for ordinal items. The EH method involves the estimation of item response model parameters simultaneously with the approximation of the distribution of the random latent variable (theta) as a histogram. Software for the EH…
ERIC Educational Resources Information Center
Çokluk, Ömay; Gül, Emrah; Dogan-Gül, Çilem
2016-01-01
The study aims to examine whether differential item functioning is displayed in three different test forms that have item orders of random and sequential versions (easy-to-hard and hard-to-easy), based on Classical Test Theory (CTT) and Item Response Theory (IRT) methods and bearing item difficulty levels in mind. In the correlational research, the…
What You Don't Know Can Hurt You: Missing Data and Partial Credit Model Estimates
Thomas, Sarah L.; Schmidt, Karen M.; Erbacher, Monica K.; Bergeman, Cindy S.
2017-01-01
The authors investigated the effect of Missing Completely at Random (MCAR) item responses on partial credit model (PCM) parameter estimates in a longitudinal study of Positive Affect. Participants were 307 adults from the older cohort of the Notre Dame Study of Health and Well-Being (Bergeman and Deboeck, 2014) who completed questionnaires including Positive Affect items for 56 days. Additional missing responses were introduced to the data, randomly replacing 20%, 50%, and 70% of the responses on each item and each day with missing values, in addition to the existing missing data. Results indicated that item locations and person trait level measures diverged from the original estimates as the level of degradation from induced missing data increased. In addition, standard errors of these estimates increased with the level of degradation. Thus, MCAR data does damage the quality and precision of PCM estimates. PMID:26784376
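The degradation step described above can be sketched as follows. This is a minimal illustration, assuming a persons-by-items table stored as a list of lists with `None` marking a missing response. The function name, the data layout, and the choice to blank an exact fraction of the observed responses per item are assumptions, not the authors' procedure.

```python
import random

def inject_mcar(responses, rate, seed=0):
    """Return a copy of the table with extra responses set to None.

    For each item (column), a fraction `rate` of the currently observed
    responses is chosen uniformly at random and replaced with None, so
    the induced missingness does not depend on any response value.
    """
    rng = random.Random(seed)
    degraded = [row[:] for row in responses]  # leave the input untouched
    n_items = len(responses[0]) if responses else 0
    for j in range(n_items):
        observed = [i for i, row in enumerate(responses) if row[j] is not None]
        n_remove = round(rate * len(observed))
        for i in rng.sample(observed, n_remove):
            degraded[i][j] = None
    return degraded
```

Refitting the measurement model to tables degraded at several rates and comparing the estimates with those from the original table mirrors the comparison reported in the abstract.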
Magis, David
2014-11-01
In item response theory, the classical estimators of ability are highly sensitive to response disturbances and can return strongly biased estimates of the true underlying ability level. Robust methods were introduced to lessen the impact of such aberrant responses on the estimation process. The computation of asymptotic (i.e., large-sample) standard errors (ASE) for these robust estimators, however, has not yet been fully considered. This paper focuses on a broad class of robust ability estimators, defined by an appropriate selection of the weight function and the residual measure, for which the ASE is derived from the theory of estimating equations. The maximum likelihood (ML) and the robust estimators, together with their estimated ASEs, are then compared in a simulation study by generating random guessing disturbances. It is concluded that both the estimators and their ASE perform similarly in the absence of random guessing, while the robust estimator and its estimated ASE are less biased and outperform their ML counterparts in the presence of random guessing with large impact on the item response process. © 2013 The British Psychological Society.
Dental responsibility loadings and the relative value of dental services.
Teusner, D N; Ju, X; Brennan, D S
2017-09-01
To estimate responsibility loadings for a comprehensive list of dental services, providing a standardized unit of clinical work effort. Dentists (n = 2500) randomly sampled from the Australian Dental Association membership (2011) were randomly assigned to one of 25 panels. Panels were surveyed by questionnaires eliciting responsibility loadings for eight common dental services (core items) and approximately 12 other items unique to that questionnaire. In total, loadings were elicited for 299 items listed in the Australian Dental Schedule 9th Edition. Data were weighted to reflect the age and sex distribution of the workforce. To assess reliability, regression models assessed differences in core item loadings by panel assignment. Estimated loadings were described by reporting the median and mean. Response rate was 37%. Panel composition did not vary by practitioner characteristics. Core item loadings did not vary by panel assignment. Oral surgery and endodontic service areas had the highest proportion (91%) of services with median loadings ≥1.5, followed by prosthodontics (78%), periodontics (76%), orthodontics (63%), restorative (62%) and diagnostic services (31%). Preventive services had median loadings ≤1.25. Dental responsibility loadings estimated by this study can be applied in the development of relative value scales. © 2017 Australian Dental Association.
A model for incomplete longitudinal multivariate ordinal data.
Liu, Li C
2008-12-30
In studies where multiple outcome items are repeatedly measured over time, missing data often occur. A longitudinal item response theory model is proposed for analysis of multivariate ordinal outcomes that are repeatedly measured. Under the MAR assumption, this model accommodates missing data at any level (missing item at any time point and/or missing time point). It allows for multiple random subject effects and the estimation of item discrimination parameters for the multiple outcome items. The covariates in the model can be at any level. Assuming either a probit or logistic response function, maximum marginal likelihood estimation is described utilizing multidimensional Gauss-Hermite quadrature for integration of the random effects. An iterative Fisher-scoring solution, which provides standard errors for all model parameters, is used. A data set from a longitudinal prevention study is used to motivate the application of the proposed model. In this study, multiple ordinal items of health behavior are repeatedly measured over time. Because of a planned missing design, subjects answered only two-thirds of all items at a given point. Copyright 2008 John Wiley & Sons, Ltd.
ERIC Educational Resources Information Center
Huynh, Huynh
By noting that a Rasch or two parameter logistic (2PL) item belongs to the exponential family of random variables and that the probability density function (pdf) of the correct response (X=1) and the incorrect response (X=0) are symmetric with respect to the vertical line at the item location, it is shown that the conjugate prior for ability is…
Extended Mixed-Effects Item Response Models with the MH-RM Algorithm
ERIC Educational Resources Information Center
Chalmers, R. Philip
2015-01-01
A mixed-effects item response theory (IRT) model is presented as a logical extension of the generalized linear mixed-effects modeling approach to formulating explanatory IRT models. Fixed and random coefficients in the extended model are estimated using a Metropolis-Hastings Robbins-Monro (MH-RM) stochastic imputation algorithm to accommodate for…
ERIC Educational Resources Information Center
Hol, A. Michiel; Vorst, Harrie C. M.; Mellenbergh, Gideon J.
2007-01-01
In a randomized experiment (n = 515), a computerized test and a computerized adaptive test (CAT) are compared. The item pool consists of 24 polytomous motivation items. Although items are carefully selected, calibration data show that Samejima's graded response model did not fit the data optimally. A simulation study is done to assess possible…
Dual representation of item positions in verbal short-term memory: Evidence for two access modes.
Lange, Elke B; Verhaeghen, Paul; Cerella, John
Memory sets of N = 1 to 5 digits were exposed sequentially from left-to-right across the screen, followed by N recognition probes. Probes had to be compared to memory list items on identity only (Sternberg task) or conditional on list position. Positions were probed randomly or in left-to-right order. Search functions related probe response times to set size. Random probing led to ramped, "Sternbergian" functions whose intercepts were elevated by the location requirement. Sequential probing led to flat search functions, that is, fast responses unaffected by set size. These results suggested that items in STM could be accessed either by a slow search-on-identity followed by recovery of an associated location tag, or in a single step by following item-to-item links in study order. It is argued that this dual coding of location information occurs spontaneously at study, and that either code can be utilised at retrieval depending on test demands.
A mixed-effects regression model for longitudinal multivariate ordinal data.
Liu, Li C; Hedeker, Donald
2006-03-01
A mixed-effects item response theory model that allows for three-level multivariate ordinal outcomes and accommodates multiple random subject effects is proposed for analysis of multivariate ordinal outcomes in longitudinal studies. This model allows for the estimation of different item factor loadings (item discrimination parameters) for the multiple outcomes. The covariates in the model do not have to follow the proportional odds assumption and can be at any level. Assuming either a probit or logistic response function, maximum marginal likelihood estimation is proposed utilizing multidimensional Gauss-Hermite quadrature for integration of the random effects. An iterative Fisher scoring solution, which provides standard errors for all model parameters, is used. An analysis of a longitudinal substance use data set, where four items of substance use behavior (cigarette use, alcohol use, marijuana use, and getting drunk or high) are repeatedly measured over time, is used to illustrate application of the proposed model.
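The idea of integrating out a random subject effect by Gauss-Hermite quadrature can be sketched in a heavily simplified setting. The example assumes binary rather than ordinal items, a single normally distributed random effect rather than several, a logistic response function, and a fixed three-point rule. The function names and these simplifications are illustrative assumptions and do not reproduce the proposed model or its Fisher scoring estimation.

```python
import math

# Three-point Gauss-Hermite rule for integrals of exp(-x^2) * f(x).
GH_NODES = (-math.sqrt(1.5), 0.0, math.sqrt(1.5))
GH_WEIGHTS = (
    math.sqrt(math.pi) / 6.0,
    2.0 * math.sqrt(math.pi) / 3.0,
    math.sqrt(math.pi) / 6.0,
)

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def marginal_likelihood(responses, intercepts, slopes, sigma=1.0):
    """Marginal probability of one subject's binary response pattern.

    The random subject effect theta ~ N(0, sigma^2) is integrated out
    numerically; theta = sqrt(2) * sigma * x maps each quadrature node x
    onto the scale of the normal density.
    """
    total = 0.0
    for x, w in zip(GH_NODES, GH_WEIGHTS):
        theta = math.sqrt(2.0) * sigma * x
        lik = 1.0
        for y, a, b in zip(responses, slopes, intercepts):
            p = logistic(a * theta + b)  # slope a plays the role of a discrimination
            lik *= p if y == 1 else 1.0 - p
        total += w * lik
    return total / math.sqrt(math.pi)
```

In practice more quadrature points would be used, and with several random effects the one-dimensional sum becomes a sum over a grid of node combinations, which is what makes the quadrature multidimensional.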
Ackerman, Robert A; Donnellan, M Brent; Roberts, Brent W; Fraley, R Chris
2016-04-01
The Narcissistic Personality Inventory (NPI) is currently the most widely used measure of narcissism in social/personality psychology. It is also relatively unique because it uses a forced-choice response format. We investigate the consequences of changing the NPI's response format for item meaning and factor structure. Participants were randomly assigned to one of three conditions: 40 forced-choice items (n = 2,754), 80 single-stimulus dichotomous items (i.e., separate true/false responses for each item; n = 2,275), or 80 single-stimulus rating scale items (i.e., 5-point Likert-type response scales for each item; n = 2,156). Analyses suggested that the "narcissistic" and "nonnarcissistic" response options from the Entitlement and Superiority subscales refer to independent personality dimensions rather than high and low levels of the same attribute. In addition, factor analyses revealed that although the Leadership dimension was evident across formats, dimensions with entitlement and superiority were not as robust. Implications for continued use of the NPI are discussed. © The Author(s) 2015.
ERIC Educational Resources Information Center
Michaelides, Michalis P.; Haertel, Edward H.
2014-01-01
The standard error of equating quantifies the variability in the estimation of an equating function. Because common items for deriving equated scores are treated as fixed, the only source of variability typically considered arises from the estimation of common-item parameters from responses of samples of examinees. Use of alternative, equally…
Free-Response and Multiple-Choice Items: Measures of the Same Ability?
ERIC Educational Resources Information Center
Bennett, Randy Elliot; And Others
This study examined the relationship of multiple-choice and free-response items contained on the College Board's Advanced Placement Computer Science (APCS) examination. Subjects were two samples of 1,000 randomly drawn from the population of 7,372 high school students taking the 1988 examination of the APCS "AB" form. Most were high…
ROYAL, KENNETH D.; STOCKDALE, MYRAH R.
2017-01-01
Introduction: Research has asserted that MCQ items using three response options (one correct answer with two distractors) are comparable to, and possibly preferable over, traditional MCQ item formats consisting of four response options (e.g., one correct answer with three distractors), or five response options (e.g., one correct answer with four distractors). Some medical educators have also adopted the practice of using 3-option responses on MCQ exams as a response to the difficulty experienced in generating additional plausible distractors. To date, however, little work has explored how 3-option responses might impact validity threats stemming from random guessing strategies, and what impact 3-option responses might have on cut-score determinations, particularly in the context of medical education classroom assessments. The purpose of this work is to further explore these critically important considerations that have largely gone ignored in the medical education literature to this point. Methods: A cumulative binomial distribution formula was used to calculate the probability that an examinee will answer at random a given number of items correctly on any exam (of any length). By way of a demonstration, a variety of scenarios were presented to illustrate how examination length and the number of response options impact examinees’ chances of passing a given examination, and how subsequent cut-score decisions may be impacted by these factors. Results: As a general rule, classroom assessments containing fewer items should utilize traditional 4-option or 5-option responses, whereas assessments of greater length are afforded greater flexibility in potentially utilizing 3-option responses. Conclusions: More research on items with 3-option responses is needed to better understand what value, if any, 3-option responses truly add to classroom assessments, and in what contexts potential benefits might be discernible. PMID:28367465
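The cumulative binomial calculation described in the methods can be sketched as follows. The sketch assumes that every item is answered by a uniform random guess among its options and that passing means reaching at least a given number of correct answers. The function name and the example scenarios are illustrative assumptions, not the authors' scenarios.

```python
from math import comb

def prob_pass_by_guessing(n_items, n_options, cut_score):
    """P(at least cut_score correct) when every answer is a random guess.

    Each item is assumed to be answered correctly with probability
    1 / n_options, independently of the other items.
    """
    p = 1.0 / n_options
    return sum(
        comb(n_items, k) * p**k * (1.0 - p) ** (n_items - k)
        for k in range(cut_score, n_items + 1)
    )
```

Comparing, for example, `prob_pass_by_guessing(20, 3, 12)` with `prob_pass_by_guessing(20, 4, 12)`, and then repeating the comparison for a longer exam with the same proportional cut score, shows how both test length and the number of options change the chance of passing by guessing alone.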
A Procedure to Detect Item Bias Present Simultaneously in Several Items
1991-04-25
exhibit a coherent and major biasing influence at the test level. In particular, this can be true even if each individual item displays only a minor ... response functions (IRFs) without the use of item parameter estimation algorithms when the sample size is too small for their use. Thissen, Steinberg ... convention). A random sample of examinees is drawn from each group, and a test of N items is administered to them. Typically it is suspected that a
ERIC Educational Resources Information Center
Bockenholt, Ulf; Van Der Heijden, Peter G. M.
2007-01-01
Randomized response (RR) is a well-known method for measuring sensitive behavior. Yet this method is not often applied because: (i) of its lower efficiency and the resulting need for larger sample sizes which make applications of RR costly; (ii) despite its privacy-protection mechanism the RR design may not be followed by every respondent; and…
ERIC Educational Resources Information Center
Khaksefidi, Saman
2017-01-01
This study investigates the psychological effect of a wrong question with wrong items on answering the next question in a test of structure. Forty students selected through stratified random sampling are given 15 questions of a standardized test, namely a TOEFL structure test, in which questions number 7 and number 11 are wrong and their answers…
The Psychometric Properties of Classroom Response System Data: A Case Study
NASA Astrophysics Data System (ADS)
Kortemeyer, Gerd
2016-08-01
Classroom response systems (often referred to as "clickers") have slowly gained adoption over the recent decade; however, critics frequently doubt their pedagogical value starting with the validity of the gathered responses: There is concern that students simply "click" random answers. This case study looks at different measures of response reliability, starting from a global look at correlations between formative clicker responses and summative examination performance to how clicker questions are used in context. It was found that clicker performance is a moderate indicator of course performance as a whole, and that while the psychometric properties of clicker items are more erratic than those of examination data, they still have acceptable internal consistency and include items with high discrimination. It was also found that clicker responses and item properties do provide highly meaningful feedback within a lecture context, i.e., when their position and function within lecture sessions are taken into consideration. Within this framework, conceptual questions provide measurably more meaningful feedback than items that require calculations.
Antrobus, Emma; Elffers, Henk; White, Gentry; Mazerolle, Lorraine
2013-01-01
The goal of this article is to examine whether or not the results of the Queensland Community Engagement Trial (QCET), a randomized controlled trial that tested the impact of procedural justice policing on citizen attitudes toward police, were affected by different types of nonresponse bias. We use two methods (Cochrane and Elffers methods) to explore nonresponse bias: First, we assess the impact of the low response rate by examining the effects of nonresponse group differences between the experimental and control conditions and pooled variance under different scenarios. Second, we assess the degree to which item response rates are influenced by the control and experimental conditions. Our analysis of the QCET data suggests that our substantive findings are not influenced by the low response rate in the trial. The results are robust even under extreme conditions, and statistical significance of the results would only be compromised in cases where the pooled variance was much larger for the nonresponse group and the difference between experimental and control conditions was greatly diminished. We also find that there were no biases in the item response rates across the experimental and control conditions. RCTs that involve field survey responses, like QCET, are potentially compromised by low response rates and by how item response rates might be influenced by the control or experimental conditions. Our results show that the QCET results were not sensitive to the overall low response rate across the experimental and control conditions and the item response rates were not significantly different across the experimental and control groups. Overall, our analysis suggests that the results of QCET are robust and any biases in the survey responses do not significantly influence the main experimental findings.
Bayes Factor Covariance Testing in Item Response Models.
Fox, Jean-Paul; Mulder, Joris; Sinharay, Sandip
2017-12-01
Two marginal one-parameter item response theory models are introduced, by integrating out the latent variable or random item parameter. It is shown that both marginal response models are multivariate (probit) models with a compound symmetry covariance structure. Several common hypotheses concerning the underlying covariance structure are evaluated using (fractional) Bayes factor tests. The support for a unidimensional factor (i.e., assumption of local independence) and differential item functioning are evaluated by testing the covariance components. The posterior distribution of common covariance components is obtained in closed form by transforming latent responses with an orthogonal (Helmert) matrix. This posterior distribution is defined as a shifted-inverse-gamma, thereby introducing a default prior and a balanced prior distribution. Based on that, an MCMC algorithm is described to estimate all model parameters and to compute (fractional) Bayes factor tests. Simulation studies are used to show that the (fractional) Bayes factor tests have good properties for testing the underlying covariance structure of binary response data. The method is illustrated with two real data studies.
Peyre, Hugo; Leplège, Alain; Coste, Joël
2011-03-01
Missing items are common in quality of life (QoL) questionnaires and present a challenge for research in this field. It remains unclear which of the various methods proposed to deal with missing data performs best in this context. We compared personal mean score, full information maximum likelihood, multiple imputation, and hot deck techniques using various realistic simulation scenarios of item missingness in QoL questionnaires constructed within the framework of classical test theory. Samples of 300 and 1,000 subjects were randomly drawn from the 2003 INSEE Decennial Health Survey (of 23,018 subjects representative of the French population and having completed the SF-36) and various patterns of missing data were generated according to three different item non-response rates (3, 6, and 9%) and three types of missing data (Little and Rubin's "missing completely at random," "missing at random," and "missing not at random"). The missing data methods were evaluated in terms of accuracy and precision for the analysis of one descriptive and one association parameter for three different scales of the SF-36. For all item non-response rates and types of missing data, multiple imputation and full information maximum likelihood appeared superior to the personal mean score and especially to hot deck in terms of accuracy and precision; however, the use of personal mean score was associated with insignificant bias (relative bias <2%) in all studied situations. Whereas multiple imputation and full information maximum likelihood are confirmed as reference methods, the personal mean score appears nonetheless appropriate for dealing with items missing from completed SF-36 questionnaires in most situations of routine use. These results can reasonably be extended to other questionnaires constructed according to classical test theory.
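The personal mean score technique compared above can be sketched for a single respondent and a single scale. The sketch assumes that `None` marks a missing item and that imputation is only attempted when at least half of the scale's items were answered. That threshold, the function name, and the data layout are assumptions for illustration, not details given in the abstract.

```python
def personal_mean_score(item_responses, min_answered_fraction=0.5):
    """Replace missing items with the respondent's own mean on the scale.

    Returns None when too few items were answered to impute, using the
    assumed `min_answered_fraction` threshold.
    """
    answered = [r for r in item_responses if r is not None]
    if not item_responses or len(answered) < min_answered_fraction * len(item_responses):
        return None
    mean = sum(answered) / len(answered)
    return [mean if r is None else r for r in item_responses]
```

The method uses only the respondent's own answers on the same scale, which is why it is much simpler to apply than multiple imputation or full information maximum likelihood.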
Feveile, Helene; Olsen, Ole; Hogh, Annie
2007-01-01
Background Data for health surveys are often collected using either mailed questionnaires, telephone interviews or a combination. Mode of data collection can affect the propensity to refuse to respond and result in different patterns of responses. The objective of this paper is to examine and quantify effects of mode of data collection in health surveys. Methods A stratified sample of 4,000 adults residing in Denmark was randomised to mailed questionnaires or computer-assisted telephone interviews. 45 health-related items were analyzed; four concerning behaviour and 41 concerning self assessment. Odds ratios for more positive answers and more frequent use of extreme response categories (both positive and negative) among telephone respondents compared to questionnaire respondents were estimated. Tests were Bonferroni corrected. Results For the four health behaviour items there were no significant differences in the response patterns. For 32 of the 41 health self assessment items the response pattern was statistically significantly different and extreme response categories were used more frequently among telephone respondents (Median estimated odds ratio: 1.67). For a majority of these mode sensitive items (26/32), a more positive reporting was observed among telephone respondents (Median estimated odds ratio: 1.73). The overall response rate was similar among persons randomly assigned to questionnaires (58.1%) and to telephone interviews (56.2%). A differential nonresponse bias for age and gender was observed. The rate of missing responses was higher for questionnaires (0.73 – 6.00%) than for telephone interviews (0 – 0.51%). The "don't know" option was used more often by mail respondents (10 – 24%) than by telephone respondents (2 – 4%). Conclusion The mode of data collection affects the reporting of self assessed health items substantially. In epidemiological studies, the method effect may be as large as the effects under investigation. 
Caution is needed when comparing prevalences across surveys or when studying time trends. PMID:17592653
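As a rough illustration of the Bonferroni correction mentioned in the abstract above, the sketch below divides a family-wise significance level by the number of tests. The 0.05 level and the function name are assumptions made here for illustration; the abstract states only that 45 items were analyzed.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# 45 items were analyzed; alpha = 0.05 is an assumed family-wise level.
threshold = bonferroni_threshold(0.05, 45)
print(f"per-test threshold: {threshold:.5f}")
```

An item would count as mode sensitive in this sketch only if its p-value fell below the per-test threshold.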
ERIC Educational Resources Information Center
Goodwin, Amanda P.; Gilbert, Jennifer K.; Cho, Sun-Joo; Kearns, Devin M.
2014-01-01
The current study models reader, item, and word contributions to the lexical representations of 39 morphologically complex words for 172 middle school students using a crossed random-effects item response model with multiple outcomes. We report 3 findings. First, results suggest that lexical representations can be characterized by separate but…
de Jong, Martijn G; Pieters, Rik; Stremersch, Stefan
2012-09-01
Answers to sensitive questions are prone to social desirability bias. If not properly addressed, the validity of the research can be suspect. This article presents multigroup item randomized response theory (MIRRT) to measure self-reported sensitive topics across cultures. The method was specifically developed to reduce social desirability bias by making an a priori change in the design of the survey. The change involves the use of a randomization device (e.g., a die) that preserves participants' privacy at the item level. In cases where multiple items measure a higher level theoretical construct, the researcher could still make inferences at the individual level. The method can correct for under- and overreporting, even if both occur in a sample of individuals or across nations. We present and illustrate MIRRT in a nontechnical manner, provide WinBugs software code so that researchers can directly implement it, and present 2 cross-national studies in which it was applied. The first study compared nonstudent samples from 2 countries (total n = 927) on permissive sexual attitudes and risky sexual behavior and related these to individual-level characteristics such as the Big Five personality traits. The second study compared nonstudent samples from 17 countries (total n = 6,195) on risky sexual behavior and related these to individual-level characteristics, such as gender and age, and to country-level characteristics, such as sex ratio.
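To make the idea of a randomization device concrete, here is a minimal sketch of a prevalence estimate under a forced-response design. The design is an assumption, not taken from the abstract: a die roll of 1 forces a "yes", a roll of 6 forces a "no", and rolls of 2 to 5 call for a truthful answer. The function name and default probabilities are hypothetical, and this is a simple moment estimate, not the MIRRT model itself.

```python
def estimate_prevalence(observed_yes_rate: float,
                        p_truth: float = 4 / 6,
                        p_forced_yes: float = 1 / 6) -> float:
    """Moment estimate of the true 'yes' prevalence under forced response.

    Assumed model: observed_yes_rate = p_truth * prevalence + p_forced_yes.
    """
    if not 0 < p_truth <= 1:
        raise ValueError("p_truth must be in (0, 1]")
    estimate = (observed_yes_rate - p_forced_yes) / p_truth
    # Sampling noise can push the raw estimate outside [0, 1]; clip it.
    return min(1.0, max(0.0, estimate))


# Illustrative input only: 30% observed "yes" answers.
print(round(estimate_prevalence(0.30), 3))
```

Because the interviewer never learns which die outcome produced a given answer, privacy is preserved at the item level while the aggregate prevalence remains estimable.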
Exploring the Validity of the Affect Balance Scale With a Sample of Family Caregivers
Perkinson, Margaret A.; Albert, Steven M.; Luborsky, Mark; Moss, Miriam; Glicksman, Allen
2014-01-01
Open-ended responses of caregiving daughters and daughters-in-law were generated by a modified random probe technique to investigate the construct validity of the two subscales of the Affect Balance Scale (ABS), i.e., the 5-item Positive Affect Scale (PAS) and the 5-item Negative Affect Scale (NAS). A set of criteria were developed to distinguish between responses that did and did not correspond to Bradburn’s assumptions concerning affect. While most responses met at least one of the criteria, very few met all. In exploring the nature of affect, we found that positive affect was based to a large extent on personal accomplishments and the recognition of others. The assessment of negative affect was a more interior, or self-focused process. For a significant subset of the sample, a negative response to a closed-ended PAS or NAS item implied disagreement or discontent with the wording or the implications of the item itself, rather than an absence of affect. Not all of the ABS items were equally valid measures of affect. PMID:8056955
Multilevel and Latent Variable Modeling with Composite Links and Exploded Likelihoods
ERIC Educational Resources Information Center
Rabe-Hesketh, Sophia; Skrondal, Anders
2007-01-01
Composite links and exploded likelihoods are powerful yet simple tools for specifying a wide range of latent variable models. Applications considered include survival or duration models, models for rankings, small area estimation with census information, models for ordinal responses, item response models with guessing, randomized response models,…
Controlling for Response Order Effects in Ranking Items Using Latent Choice Factor Modeling
ERIC Educational Resources Information Center
Vriens, Ingrid; Moors, Guy; Gelissen, John; Vermunt, Jeroen K.
2017-01-01
Measuring values in sociological research sometimes involves the use of ranking data. A disadvantage of a ranking assignment is that the order in which the items are presented might influence the choice preferences of respondents regardless of the content being measured. The standard procedure to rule out such effects is to randomize the order of…
NASA Astrophysics Data System (ADS)
Gönülateş, Emre; Kortemeyer, Gerd
2017-04-01
Homework is an important component of most physics courses. One of the functions it serves is to provide meaningful formative assessment in preparation for examinations. However, correlations between homework and examination scores tend to be low, likely due to unproductive student behavior such as copying and random guessing of answers. In this study, we attempt to model these two counterproductive learner behaviors within the framework of Item Response Theory in order to provide an ability measurement that strongly correlates with examination scores. We find that introducing additional item parameters leads to worse predictions of examination grades, while introducing additional learner traits is a more promising approach.
Kawasaki, Yohei; Ide, Kazuki; Akutagawa, Maiko; Yamada, Hiroshi; Yutaka, Ono; Furukawa, Toshiaki A.
2017-01-01
Background Several recent studies have shown that total scores on depressive symptom measures in a general population approximate an exponential pattern except for the lower end of the distribution. Furthermore, we confirmed that the exponential pattern is present for the individual item responses on the Center for Epidemiologic Studies Depression Scale (CES-D). To confirm the reproducibility of such findings, we investigated the total score distribution and item responses of the Kessler Screening Scale for Psychological Distress (K6) in a nationally representative study. Methods Data were drawn from the National Survey of Midlife Development in the United States (MIDUS), which comprises four subsamples: (1) a national random digit dialing (RDD) sample, (2) oversamples from five metropolitan areas, (3) siblings of individuals from the RDD sample, and (4) a national RDD sample of twin pairs. K6 items are scored using a 5-point scale: “none of the time,” “a little of the time,” “some of the time,” “most of the time,” and “all of the time.” The pattern of total score distribution and item responses was analyzed using graphical analysis and an exponential regression model. Results The total score distributions of the four subsamples exhibited an exponential pattern with similar rate parameters. The item responses of the K6 approximated a linear pattern from “a little of the time” to “all of the time” on log-normal scales, while the “none of the time” response was not related to this exponential pattern. Discussion The total score distribution and item responses of the K6 showed exponential patterns, consistent with other depressive symptom scales. PMID:28289560
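One way to check for an exponential pattern is to fit a straight line to the logarithm of the score frequencies, since an exponential decay is linear on a log scale. The sketch below does this with ordinary least squares. The function name, the score range, and the counts are all synthetic assumptions for illustration; none of them are MIDUS data, and the authors' exact regression procedure is not given in the abstract.

```python
import math


def fit_exponential_rate(scores, counts):
    """Least-squares fit of log(count) = intercept - rate * score.

    Returns (rate, intercept). Scores with zero counts are skipped
    because their logarithm is undefined.
    """
    pairs = [(s, math.log(c)) for s, c in zip(scores, counts) if c > 0]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two positive counts")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    return -slope, mean_y - slope * mean_x


# Synthetic counts that decay exponentially with an assumed rate of 0.3.
# Score 0 is left out, mirroring the note that the lower end departs
# from the exponential pattern.
scores = list(range(1, 25))
counts = [1000 * math.exp(-0.3 * s) for s in scores]
rate, intercept = fit_exponential_rate(scores, counts)
print(round(rate, 3))
```

Comparing the fitted rate across subsamples would be one simple way to express "similar rate parameters" numerically.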
Rasch analysis of the Chedoke-McMaster Attitudes towards Children with Handicaps scale.
Armstrong, Megan; Morris, Christopher; Tarrant, Mark; Abraham, Charles; Horton, Mike C
2017-02-01
Aim To assess whether the Chedoke-McMaster Attitudes towards Children with Handicaps (CATCH) 36-item total scale and subscales fit the unidimensional Rasch model. Method The CATCH was administered to 1881 children, aged 7-16 years in a cross-sectional survey. Data were used from a random sample of 416 for the initial Rasch analysis. The analysis was performed on the 36-item scale and then separately for each subscale. The analysis explored fit to the Rasch model in terms of overall scale fit, individual item fit, item response categories, and unidimensionality. Item bias for gender and school level was also assessed. Revised scales were then tested on an independent second random sample of 415 children. Results Analyses indicated that the 36-item overall scale was not unidimensional and did not fit the Rasch model. Two scales of affective attitudes and behavioural intention were retained after four items were removed from each due to misfit to the Rasch model. Additionally, the scaling was improved when the two most negative response categories were aggregated. There was no item bias by gender or school level on the revised scales. Items assessing cognitive attitudes did not fit the Rasch model and had low internal consistency as a scale. Conclusion Affective attitudes and behavioural intention CATCH sub-scales should be treated separately. Caution should be exercised when using the cognitive subscale. Implications for Rehabilitation The 36-item Chedoke-McMaster Attitudes towards Children with Handicaps (CATCH) scale as a whole did not fit the Rasch model; thus indicating a multi-dimensional scale. Researchers should use two revised eight-item subscales of affective attitudes and behavioural intentions when exploring interventions aiming to improve children's attitudes towards disabled people or factors associated with those attitudes. Researchers should use the cognitive subscale with caution, as it did not create a unidimensional and internally consistent scale. 
Therefore, conclusions drawn from this scale may not accurately reflect children's attitudes.
Chan, Derwin K; Ivarsson, Andreas; Stenling, Andreas; Yang, Sophie X; Chatzisarantis, Nikos L; Hagger, Martin S
2015-12-01
Consistency tendency is the propensity of participants to respond to subsequent items in a survey consistently with their responses to previous items. This method effect might contaminate the results of sport psychology surveys using cross-sectional design. We present a randomized controlled crossover study examining the effect of consistency tendency on the motivational pathway (i.e., autonomy support → autonomous motivation → intention) of self-determination theory in the context of sport injury prevention. Athletes from Sweden (N = 341) responded to the survey printed in either low interitem distance (IID; consistency tendency likely) or high IID (consistency tendency suppressed) on two separate occasions, with a one-week interim period. Participants were randomly allocated into two groups, and they received the survey of different IID at each occasion. Bayesian structural equation modeling showed that low IID condition had stronger parameter estimates than high IID condition, but the differences were not statistically significant.
Cultural Consensus Theory: Aggregating Continuous Responses in a Finite Interval
NASA Astrophysics Data System (ADS)
Batchelder, William H.; Strashny, Alex; Romney, A. Kimball
Cultural consensus theory (CCT) consists of cognitive models for aggregating responses of "informants" to test items about some domain of their shared cultural knowledge. This paper develops a CCT model for items requiring bounded numerical responses, e.g. probability estimates, confidence judgments, or similarity judgments. The model assumes that each item generates a latent random representation in each informant, with mean equal to the consensus answer and variance depending jointly on the informant and the location of the consensus answer. The manifest responses may reflect biases of the informants. Markov Chain Monte Carlo (MCMC) methods were used to estimate the model, and simulation studies validated the approach. The model was applied to an existing cross-cultural dataset involving native Japanese and English speakers judging the similarity of emotion terms. The results sharpened earlier studies that showed that both cultures appear to have very similar cognitive representations of emotion terms.
ERIC Educational Resources Information Center
De Boeck, Paul
2008-01-01
It is common practice in IRT to consider items as fixed and persons as random. Both continuous and categorical person parameters are most often random variables, whereas for items only continuous parameters are used and they are commonly of the fixed type, although exceptions occur. It is shown in the present article that random item parameters…
Tokuda, Yasuharu; Okubo, Tomoya; Ohde, Sachiko; Jacobs, Joshua; Takahashi, Osamu; Omata, Fumio; Yanai, Haruo; Hinohara, Shigeaki; Fukui, Tsuguya
2009-06-01
The Short Form-8 (SF-8) questionnaire is a commonly used 8-item instrument of health-related quality of life (QOL) and provides a health profile of eight subdimensions. Our aim was to examine the psychometric properties of the Japanese version of the SF-8 instrument using methodology based on nominal categories model. Using data from an adjusted random sample from a nationally representative panel, the nominal categories modeling was applied to SF-8 items to characterize coverage of the latent trait (theta). Probabilities for response choices were described as functions on the latent trait. Information functions were generated based on the estimated item parameters. A total of 3344 participants (53%, women; median age, 35 years) provided responses. One factor was retained (eigenvalue, 4.65; variance proportion of 0.58) and used as theta. All item response category characteristic curves satisfied the monotonicity assumption in accurate order with corresponding ordinal responses. Four items (general health, bodily pain, vitality, and mental health) cover most of the spectrum of theta, while the other four items (physical function, role physical [role limitations because of physical health], social functioning, and role emotional [role limitations because of emotional problems] ) cover most of the negative range of theta. Information function for all items combined peaked at -0.7 of theta (information = 18.5) and decreased with increasing theta. The SF-8 instrument performs well among those with poor QOL across the continuum of the latent trait and thus can recognize more effectively persons with relatively poorer QOL than those with relatively better QOL.
ERIC Educational Resources Information Center
Huang, Hung-Yu; Wang, Wen-Chung
2014-01-01
The DINA (deterministic inputs, noisy "and" gate) model has been widely used in cognitive diagnosis tests and in the process of test development. The outcomes known as slip and guess are included in the DINA model function representing the responses to the items. This study aimed to extend the DINA model by using the random-effect approach to allow…
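For readers unfamiliar with the slip and guess parameters, the following is a minimal sketch of the standard (fixed-effect) DINA item response function, not the random-effect extension the study proposes. An examinee who has mastered every attribute the item requires answers correctly with probability 1 - slip; otherwise the probability is guess. The function name and the example values are hypothetical.

```python
def dina_correct_probability(alpha, q, slip, guess):
    """P(correct response) for one examinee and one item under DINA.

    alpha: 0/1 mastery indicators, one per attribute.
    q:     0/1 indicators of which attributes the item requires.
    """
    if len(alpha) != len(q):
        raise ValueError("alpha and q must have the same length")
    # The "and" gate: all required attributes must be mastered.
    has_all_required = all(a == 1 for a, required in zip(alpha, q)
                           if required == 1)
    return 1.0 - slip if has_all_required else guess


# Illustrative values only: the item requires the first two attributes.
print(dina_correct_probability([1, 1, 0], [1, 1, 0], slip=0.1, guess=0.2))
print(dina_correct_probability([1, 0, 0], [1, 1, 0], slip=0.1, guess=0.2))
```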
A Composite Likelihood Inference in Latent Variable Models for Ordinal Longitudinal Responses
ERIC Educational Resources Information Center
Vasdekis, Vassilis G. S.; Cagnone, Silvia; Moustaki, Irini
2012-01-01
The paper proposes a composite likelihood estimation approach that uses bivariate instead of multivariate marginal probabilities for ordinal longitudinal responses using a latent variable model. The model considers time-dependent latent variables and item-specific random effects to be accountable for the interdependencies of the multivariate…
Failure of self-consistency in the discrete resource model of visual working memory.
Bays, Paul M
2018-06-03
The discrete resource model of working memory proposes that each individual has a fixed upper limit on the number of items they can store at one time, due to division of memory into a few independent "slots". According to this model, responses on short-term memory tasks consist of a mixture of noisy recall (when the tested item is in memory) and random guessing (when the item is not in memory). This provides two opportunities to estimate capacity for each observer: first, based on their frequency of random guesses, and second, based on the set size at which the variability of stored items reaches a plateau. The discrete resource model makes the simple prediction that these two estimates will coincide. Data from eight published visual working memory experiments provide strong evidence against such a correspondence. These results present a challenge for discrete models of working memory that impose a fixed capacity limit. Copyright © 2018 The Author. Published by Elsevier Inc. All rights reserved.
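The first of the two capacity estimates can be sketched in a few lines. The sketch assumes, as is common for slot models but not spelled out in the abstract, that an observer with capacity K stores min(K, N) of the N items and that the tested item is chosen at random, so the tested item is in memory with probability min(K, N) / N. Function names are hypothetical.

```python
def predicted_guess_rate(capacity: float, set_size: int) -> float:
    """Proportion of random guesses predicted by a fixed-capacity model."""
    if set_size < 1:
        raise ValueError("set_size must be at least 1")
    return max(0.0, 1.0 - capacity / set_size)


def capacity_from_guess_rate(guess_rate: float, set_size: int) -> float:
    """Invert the relation above: items in memory = N * (1 - guess rate)."""
    if not 0.0 <= guess_rate <= 1.0:
        raise ValueError("guess_rate must be in [0, 1]")
    return set_size * (1.0 - guess_rate)


# Illustrative values: capacity 3, set size 6.
print(predicted_guess_rate(3, 6))
print(capacity_from_guess_rate(0.5, 6))
```

The self-consistency test described in the abstract asks whether this guess-based estimate matches the set size at which recall variability stops increasing.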
ERIC Educational Resources Information Center
Montague, Margariete A.
This study investigated the feasibility of concurrently and randomly sampling examinees and items in order to estimate group achievement. Seven 32-item tests reflecting a 640-item universe of simple open sentences were used such that item selection (random, systematic) and assignment (random, systematic) of items (four, eight, sixteen) to forms…
Kundig, François; Staines, Anthony; Kinge, Thompson; Perneger, Thomas V
2011-11-01
In self-completed surveys, anonymous questionnaires are sometimes numbered so as to avoid sending reminders to initial nonrespondents. This number may be perceived as a threat to confidentiality by some respondents, which may reduce the response rate, or cause social desirability bias. In this study, we evaluated whether using nonnumbered vs. numbered questionnaires influenced the response rate and the response content. During a patient safety culture survey, we randomized participants into two groups: one received an anonymous nonnumbered questionnaire and the other a numbered questionnaire. We compared the survey response rates and distributions of the responses for the 42 questionnaire items across the two groups. Response rates were similar in the two groups (nonnumbered, 75.2%; numbered, 72.8%; difference, 2.4%; P=0.28). Five of the 42 questions had statistically significant differences in distributions, but these differences were small. Unexpectedly, in all five instances, the patient safety culture ratings were more favorable in the nonnumbered group. Numbering of mailed questionnaires had no impact on the response rate. Numbering significantly influenced the response content of several items, but these differences were small and ran against the hypothesis of social desirability bias. Copyright © 2011 Elsevier Inc. All rights reserved.
Improving response rate and quality of survey data with a scratch lottery ticket incentive
2012-01-01
Background The quality of data collected in survey research is usually indicated by the response rate, the representativeness of the sample, and the rate of completed questions (item-response). In attempting to improve a generally declining response rate in surveys, considerable efforts are being made through follow-up mailings and various types of incentives. This study examines effects of including a scratch lottery ticket in the invitation letter to a survey. Method Questionnaires concerning oral health were mailed to a random sample of 2,400 adults. A systematically selected half of the sample (1,200 adults) received a questionnaire including a scratch lottery ticket. One reminder without the incentive was sent. Results The incentive increased the response rate and improved representativeness by reaching more respondents with lower education. Furthermore, it reduced item nonresponse. The initial incentive had no effect on the propensity to respond after the reminder. Conclusion When attempting to improve survey data, three issues become important: response rate, representativeness, and item-response. This study shows that including a scratch lottery ticket in the invitation letter performs well on all three. PMID:22515335
Identifying and addressing the limitations of safety climate surveys.
O'Connor, Paul; Buttrey, Samuel E; O'Dea, Angela; Kennedy, Quinn
2011-08-01
There are a variety of qualitative and quantitative tools for measuring safety climate. However, questionnaires are by far the most commonly used methodology. This paper reports the descriptive analysis of a large sample of safety climate survey data (n=110,014) collected over 10 years from U.S. Naval aircrew using the Command Safety Assessment Survey (CSAS). The analysis demonstrated that there was substantial non-random response bias associated with the data (the reverse worded items had a unique pattern of responses, there was an increasing tendency over time to only provide a modal response, the responses to the same item towards the beginning and end of the questionnaire did not correlate as highly as might be expected, and the faster the questionnaire was completed the higher the frequency of modal responses). It is suggested that the non-random response bias was due to the negative effect on participant motivation of a number of factors (questionnaire design, lack of a belief in the importance of the response, participant fatigue, and questionnaire administration). Researchers must consider the factors that increase the likelihood of non-random measurement error in safety climate survey data and cease to rely on data that are solely collected using a long and complex questionnaire. In the absence of valid and reliable data it will not be possible for organizations to take the measures required to improve safety climate. Copyright © 2011 Elsevier B.V. All rights reserved.
Guyatt, G H; Cook, D J; King, D; Norman, G R; Kane, S L; van Ineveld, C
1999-02-01
To determine whether framing questions positively or negatively influences residents' apparent satisfaction with their training. In 1993-94, 276 residents at five Canadian internal medicine residency programs responded to 53 Likert-scale items designed to determine sources of the residents' satisfaction and stress. Two versions of the questionnaire were randomly distributed: one in which half the items were stated positively and the other half negatively, the other version in which the items were stated in the opposite way. The residents scored 43 of the 53 items higher when stated positively and scored ten higher when stated negatively (p < .0001). When analyzed using an analysis-of-variance model, the effect of positive versus negative framing was highly significant (F = 129.81, p < .0001). While the interaction between item and framing was also significant, the effect was much less strong (F = 5.56, p < .0001). On a scale where 1 represented the lowest possible level of satisfaction and 7 the highest, the mean score of the positively stated items was 4.1 and that of the negatively stated items, 3.8, an effect of 0.3. These results suggest a significant "response acquiescence bias." To minimize this bias, questionnaires assessing attitudes toward educational programs should include a mix of positively and negatively stated items.
Speech-Language Pathologists' Opinions on Response to Intervention
ERIC Educational Resources Information Center
Sanger, Dixie; Mohling, Sara; Stremlau, Aliza
2012-01-01
The purpose of this study was to survey the opinions of speech-language pathologists (SLPs) on response to intervention (RTI). Questionnaires were mailed to 2,000 randomly selected elementary and secondary SLPs throughout the United States. Mean results of 583 respondents (29.15%) indicated that SLPs agreed on 37 Likert-type items and responded…
A signal detection-item response theory model for evaluating neuropsychological measures.
Thomas, Michael L; Brown, Gregory G; Gur, Ruben C; Moore, Tyler M; Patt, Virginie M; Risbrough, Victoria B; Baker, Dewleen G
2018-02-05
Models from signal detection theory are commonly used to score neuropsychological test data, especially tests of recognition memory. Here we show that certain item response theory models can be formulated as signal detection theory models, thus linking two complementary but distinct methodologies. We then use the approach to evaluate the validity (construct representation) of commonly used research measures, demonstrate the impact of conditional error on neuropsychological outcomes, and evaluate measurement bias. Signal detection-item response theory (SD-IRT) models were fitted to recognition memory data for words, faces, and objects. The sample consisted of U.S. Infantry Marines and Navy Corpsmen participating in the Marine Resiliency Study. Data comprised item responses to the Penn Face Memory Test (PFMT; N = 1,338), Penn Word Memory Test (PWMT; N = 1,331), and Visual Object Learning Test (VOLT; N = 1,249), and self-report of past head injury with loss of consciousness. SD-IRT models adequately fitted recognition memory item data across all modalities. Error varied systematically with ability estimates, and distributions of residuals from the regression of memory discrimination onto self-report of past head injury were positively skewed towards regions of larger measurement error. Analyses of differential item functioning revealed little evidence of systematic bias by level of education. SD-IRT models benefit from the measurement rigor of item response theory-which permits the modeling of item difficulty and examinee ability-and from signal detection theory-which provides an interpretive framework encompassing the experimentally validated constructs of memory discrimination and response bias. We used this approach to validate the construct representation of commonly used research measures and to demonstrate how nonoptimized item parameters can lead to erroneous conclusions when interpreting neuropsychological test data. 
Future work might include the development of computerized adaptive tests and integration with mixture and random-effects models.
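The two signal detection constructs named in the abstract above, memory discrimination and response bias, are classically computed from hit and false-alarm rates. The sketch below uses the textbook equal-variance Gaussian formulas; it is not the SD-IRT model itself, and the function name and example rates are hypothetical.

```python
from statistics import NormalDist


def sdt_indices(hit_rate: float, false_alarm_rate: float):
    """Equal-variance Gaussian discrimination (d') and criterion (c).

    Rates must lie strictly between 0 and 1; NormalDist.inv_cdf raises
    StatisticsError otherwise.
    """
    z = NormalDist().inv_cdf
    z_hit, z_fa = z(hit_rate), z(false_alarm_rate)
    d_prime = z_hit - z_fa
    criterion = -0.5 * (z_hit + z_fa)
    return d_prime, criterion


# Illustrative rates only.
d_prime, criterion = sdt_indices(0.8, 0.2)
print(round(d_prime, 3), round(criterion, 3))
```

Unlike these summary indices, an item-level model can also represent item difficulty and how measurement error varies with ability, which is the link the abstract draws to item response theory.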
A Test of Web and Mail Mode Effects in a Financially Sensitive Survey of Older Americans
Hsu, Joanne W.
2018-01-01
This study leverages a randomized experimental design of a mixed-mode mail- and web-based survey to examine mode effects separately from sample selectivity issues. Using data from the Cognitive Economics Study, which contains some sensitive financial questions, we analyze two sets of questions: fixed-choice questions posed nearly identically across mode, and dollar-value questions that exploit features available only on web mode. Focusing on differences in item nonresponse and response distributions, our results indicate that, in contrast to mail mode, web mode surveys display lower item nonresponse for all questions. While respondents appear to prefer providing financial information in ranges, use of reminder screens on the web version yields greater use of exact values without large sacrifices in item response. Still, response distributions for all questions are similar across mode, suggesting that data on sensitive financial questions collected from the two modes can be pooled.
Clarifying and Measuring Filial Concepts across Five Cultural Groups
Jones, Patricia S.; Lee, Jerry W.; Zhang, Xinwei E.
2011-01-01
Literature on responsibility of adult children for aging parents reflects lack of conceptual clarity. We examined filial concepts across five cultural groups: African-, Asian-, Euro-, Latino-, and Native Americans. Data were randomly divided for scale development (n = 285) and cross-validation (n = 284). Exploratory factor analysis on 59 items identified three filial concepts: Responsibility, Respect, and Care. Confirmatory factor analysis on a 12-item final scale showed data fit the three-factor model better than the single factor solution despite substantial correlations between the factors (.82, .82 for Care with Responsibility and Respect, and .74 for Responsibility with Respect). The scale can be used in cross-cultural research to test hypotheses that predict associations among filial values, filial caregiving, and caregiver health outcomes. PMID:21618557
Strategic trade-offs between quantity and quality in working memory.
Fougnie, Daryl; Cormiea, Sarah M; Kanabar, Anish; Alvarez, George A
2016-08-01
Is working memory capacity determined by an immutable limit-for example, 4 memory storage slots? The fact that performance is typically unaffected by task instructions has been taken as support for such structural models of memory. Here, we modified a standard working memory task to incentivize participants to remember more items. Participants were asked to remember a set of colors over a short retention interval. In 1 condition, participants reported a random item's color using a color wheel. In the modified task, participants responded to all items and their response was only considered correct if all responses were on the correct half of the color wheel. We looked for a trade-off between quantity and quality-participants storing more items, but less precisely, when required to report them all. This trade-off was observed when tasks were blocked and when task-type was cued after encoding, but not when task-type was cued during the response, suggesting that task differences changed how items were actively encoded and maintained. This strategic control over the contents of working memory challenges models that assume inflexible limits on memory storage. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
Tsubono, Y; Fukao, A; Hisamichi, S
1994-06-01
A self-administered questionnaire using the mark-sheet method (MSM), in which responses of subjects are computer processed directly through an optical scanning device, has recently been utilized in epidemiologic surveys. Compared to the data coding process for a conventional questionnaire, in which a keypuncher enters the responses manually into a computer (manual method; MM), optical scanning requires less time and cost. Accuracy of the MSM for use in the general population in Japan, however, remains uncertain. Therefore the response rates, frequencies of missing values, validity and reproducibility of the answers in self-administered questionnaires were compared between the MSM and MM. Subjects were 463 residents aged 40-69 years living in 6 local districts of a rural town in northeastern Japan. They were randomly allocated, by district basis, to the MSM group (n = 242) or the MM group (n = 221). The questionnaire was delivered and collected at the subject's home by volunteers. Two weeks after collecting the original questionnaire, the same type of questionnaire was again distributed to half of the responders randomly chosen to investigate reproducibility. The overall response rate did not differ in MSM and MM (96.7% vs 98.2%, p = 0.312). Among questions with a multiple-choice type of answer, proportions of missing values were not different for most of the items, but they were lower in MSM for all of the 33 food frequency items. Reproducibilities of food frequency items measured by Spearman's rank correlation did not differ substantially between the two groups. (ABSTRACT TRUNCATED AT 250 WORDS)
Improved uncertainty quantification in nondestructive assay for nonproliferation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Burr, Tom; Croft, Stephen; Jarman, Ken
2016-12-01
This paper illustrates methods to improve uncertainty quantification (UQ) for non-destructive assay (NDA) measurements used in nuclear nonproliferation. First, it is shown that current bottom-up UQ applied to calibration data is not always adequate, for three main reasons: (1) Because there are errors in both the predictors and the response, calibration involves a ratio of random quantities, and calibration data sets in NDA usually consist of only a modest number of samples (3–10); therefore, asymptotic approximations involving quantities needed for UQ such as means and variances are often not sufficiently accurate; (2) Common practice overlooks that calibration implies a partitioning of total error into random and systematic error, and (3) In many NDA applications, test items exhibit non-negligible departures in physical properties from calibration items, so model-based adjustments are used, but item-specific bias remains in some data. Therefore, improved bottom-up UQ using calibration data should predict the typical magnitude of item-specific bias, and the suggestion is to do so by including sources of item-specific bias in synthetic calibration data that is generated using a combination of modeling and real calibration data. Second, for measurements of the same nuclear material item by both the facility operator and international inspectors, current empirical (top-down) UQ is described for estimating operator and inspector systematic and random error variance components. A Bayesian alternative is introduced that easily accommodates constraints on variance components, and is more robust than current top-down methods to the underlying measurement error distributions.
Strategic trade-offs between quality and quantity in working memory
Fougnie, Daryl; Cormiea, Sarah M.; Kanabar, Anish; Alvarez, George A.
2016-01-01
Is working memory capacity determined by an immutable limit—e.g. four memory storage slots? The fact that performance is typically unaffected by task instructions has been taken as support for such structural models of memory. Here, we modified a standard working memory task to incentivize participants to remember more items. Participants were asked to remember a set of colors over a short retention interval. In one condition, participants reported a random item’s color using a color wheel. In the modified task, participants responded to all items and their response was only considered correct if all responses were on the correct half of the color wheel. We looked for a trade-off between quantity and quality—participants storing more items, but less precisely, when required to report them all. This trade-off was observed when tasks were blocked, when task-type was cued after encoding, but not when task-type was cued during the response, suggesting that task differences changed how items were actively encoded and maintained. This strategic control over the contents of working memory challenges models that assume inflexible limits on memory storage. PMID:26950383
Oropharyngeal dysphagia: surveying practice patterns of the speech-language pathologist.
Martino, Rosemary; Pron, Gaylene; Diamant, Nicholas E
2004-01-01
The present study was designed to obtain a comprehensive view of the dysphagia assessment practice patterns of speech-language pathologists and their opinion on the importance of these practices using survey methods and taking into consideration clinician, patient, and practice-setting variables. A self-administered mail questionnaire was developed following established methodology to maximize response rates. Eight dysphagia experts independently rated the new survey for content validity. Test-retest reliability was assessed with a random sample of 23 participants. The survey was sent to 50 speech-language pathologists randomly selected from the Canadian professional association database of members who practice in dysphagia. Surveys were mailed according to the Dillman Total Design Method and included an incentive offer. High survey (64%) and item response (95%) rates were achieved and clinicians were reliable reporters of their practice behaviors (ICC>0.60). Of all the clinical assessment items, 36% were reported with high (>80%) utilization and 24% with low (<20%) utilization, the former pertaining to tongue motion and vocal quality after food/fluid intake and the latter to testing of oral sensation without food. One-third (33%) of instrumental assessment items were highly utilized and included assessment of bolus movement and laryngeal response to bolus misdirection. Overall, clinician experience and teaching institutions influenced greater utilization. Opinions of importance were similar to utilization behaviors (r = 0.947, p = 0.01). Of all patients referred for dysphagia assessment, full clinical assessments were administered to 71% of patients but instrumental assessments to only 36%. A hierarchical model of practice behavior is proposed to explain this pattern of progressively decreasing item utilization.
Generating constrained randomized sequences: item frequency matters.
French, Robert M; Perruchet, Pierre
2009-11-01
All experimental psychologists understand the importance of randomizing lists of items. However, randomization is generally constrained, and these constraints-in particular, not allowing immediately repeated items-which are designed to eliminate particular biases, frequently engender others. We describe a simple Monte Carlo randomization technique that solves a number of these problems. However, in many experimental settings, we are concerned not only with the number and distribution of items but also with the number and distribution of transitions between items. The algorithm mentioned above provides no control over this. We therefore introduce a simple technique that uses transition tables for generating correctly randomized sequences. We present an analytic method of producing item-pair frequency tables and item-pair transitional probability tables when immediate repetitions are not allowed. We illustrate these difficulties and how to overcome them, with reference to a classic article on word segmentation in infants. Finally, we provide free access to an Excel file that allows users to generate transition tables with up to 10 different item types, as well as to generate appropriately distributed randomized sequences of any length without immediately repeated elements. This file is freely available from http://leadserv.u-bourgogne.fr/IMG/xls/TransitionMatrix.xls.
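As a rough illustration of the constraint discussed in this abstract (no immediately repeated items), the following Python sketch uses plain rejection sampling: it reshuffles a list until no two neighbours match, then tabulates the item-pair transitions. It is not the authors' Excel implementation or their transition-table method; the function names `shuffle_no_repeats` and `transition_counts`, the item labels, and the seed are illustrative assumptions.

```python
import random
from collections import Counter


def shuffle_no_repeats(items, rng, max_tries=10_000):
    """Rejection sampling: reshuffle until no two adjacent items are equal."""
    seq = list(items)
    for _ in range(max_tries):
        rng.shuffle(seq)
        if all(a != b for a, b in zip(seq, seq[1:])):
            return seq
    raise RuntimeError("no valid sequence found; the constraint may be too tight")


def transition_counts(seq):
    """Count how often each ordered item pair occurs at adjacent positions."""
    return Counter(zip(seq, seq[1:]))


# Hypothetical item list: three item types, four tokens each.
rng = random.Random(0)
items = ["A"] * 4 + ["B"] * 4 + ["C"] * 4
seq = shuffle_no_repeats(items, rng)
counts = transition_counts(seq)
```

Inspecting `counts` for a given list shows whether some transitions are over- or under-represented, which is the kind of imbalance the abstract warns about when only item frequencies are controlled.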
Scott, Anthony; Jeon, Sung-Hee; Joyce, Catherine M; Humphreys, John S; Kalb, Guyonne; Witt, Julia; Leahy, Anne
2011-09-05
Surveys of doctors are an important data collection method in health services research. Ways to improve response rates, minimise survey response bias and item non-response, within a given budget, have not previously been addressed in the same study. The aim of this paper is to compare the effects and costs of three different modes of survey administration in a national survey of doctors. A stratified random sample of 4.9% (2,702/54,160) of doctors undertaking clinical practice was drawn from a national directory of all doctors in Australia. Stratification was by four doctor types: general practitioners, specialists, specialists-in-training, and hospital non-specialists, and by six rural/remote categories. A three-arm parallel trial design with equal randomisation across arms was used. Doctors were randomly allocated to: online questionnaire (902); simultaneous mixed mode (a paper questionnaire and login details sent together) (900); or, sequential mixed mode (online followed by a paper questionnaire with the reminder) (900). Analysis was by intention to treat, as within each primary mode, doctors could choose either paper or online. Primary outcome measures were response rate, survey response bias, item non-response, and cost. The online mode had a response rate of 12.95%, followed by the simultaneous mixed mode with 19.7%, and the sequential mixed mode with 20.7%. After adjusting for observed differences between the groups, the online mode had a 7 percentage point lower response rate compared to the simultaneous mixed mode, and a 7.7 percentage point lower response rate compared to sequential mixed mode. The difference in response rate between the sequential and simultaneous modes was not statistically significant. Both mixed modes showed evidence of response bias, whilst the characteristics of online respondents were similar to the population. However, the online mode had a higher rate of item non-response compared to both mixed modes.
The total cost of the online survey was 38% lower than simultaneous mixed mode and 22% lower than sequential mixed mode. The cost of the sequential mixed mode was 14% lower than simultaneous mixed mode. Compared to the online mode, the sequential mixed mode was the most cost-effective, although exhibiting some evidence of response bias. Decisions on which survey mode to use depend on response rates, response bias, item non-response and costs. The sequential mixed mode appears to be the most cost-effective mode of survey administration for surveys of the population of doctors, if one is prepared to accept a degree of response bias. Online surveys are not yet suitable to be used exclusively for surveys of the doctor population.
2011-01-01
Background Surveys of doctors are an important data collection method in health services research. Ways to improve response rates, minimise survey response bias and item non-response, within a given budget, have not previously been addressed in the same study. The aim of this paper is to compare the effects and costs of three different modes of survey administration in a national survey of doctors. Methods A stratified random sample of 4.9% (2,702/54,160) of doctors undertaking clinical practice was drawn from a national directory of all doctors in Australia. Stratification was by four doctor types: general practitioners, specialists, specialists-in-training, and hospital non-specialists, and by six rural/remote categories. A three-arm parallel trial design with equal randomisation across arms was used. Doctors were randomly allocated to: online questionnaire (902); simultaneous mixed mode (a paper questionnaire and login details sent together) (900); or, sequential mixed mode (online followed by a paper questionnaire with the reminder) (900). Analysis was by intention to treat, as within each primary mode, doctors could choose either paper or online. Primary outcome measures were response rate, survey response bias, item non-response, and cost. Results The online mode had a response rate of 12.95%, followed by the simultaneous mixed mode with 19.7%, and the sequential mixed mode with 20.7%. After adjusting for observed differences between the groups, the online mode had a 7 percentage point lower response rate compared to the simultaneous mixed mode, and a 7.7 percentage point lower response rate compared to sequential mixed mode. The difference in response rate between the sequential and simultaneous modes was not statistically significant. Both mixed modes showed evidence of response bias, whilst the characteristics of online respondents were similar to the population. However, the online mode had a higher rate of item non-response compared to both mixed modes.
The total cost of the online survey was 38% lower than simultaneous mixed mode and 22% lower than sequential mixed mode. The cost of the sequential mixed mode was 14% lower than simultaneous mixed mode. Compared to the online mode, the sequential mixed mode was the most cost-effective, although exhibiting some evidence of response bias. Conclusions Decisions on which survey mode to use depend on response rates, response bias, item non-response and costs. The sequential mixed mode appears to be the most cost-effective mode of survey administration for surveys of the population of doctors, if one is prepared to accept a degree of response bias. Online surveys are not yet suitable to be used exclusively for surveys of the doctor population. PMID:21888678
Petrillo, Jennifer; Cano, Stefan J; McLeod, Lori D; Coon, Cheryl D
2015-01-01
To provide comparisons and a worked example of item- and scale-level evaluations based on three psychometric methods used in patient-reported outcome development-classical test theory (CTT), item response theory (IRT), and Rasch measurement theory (RMT)-in an analysis of the National Eye Institute Visual Functioning Questionnaire (VFQ-25). Baseline VFQ-25 data from 240 participants with diabetic macular edema from a randomized, double-masked, multicenter clinical trial were used to evaluate the VFQ at the total score level. CTT, RMT, and IRT evaluations were conducted, and results were assessed in a head-to-head comparison. Results were similar across the three methods, with IRT and RMT providing more detailed diagnostic information on how to improve the scale. CTT led to the identification of two problematic items that threaten the validity of the overall scale score, sets of redundant items, and skewed response categories. IRT and RMT additionally identified poor fit for one item, many locally dependent items, poor targeting, and disordering of over half the response categories. Selection of a psychometric approach depends on many factors. Researchers should justify their evaluation method and consider the intended audience. If the instrument is being developed for descriptive purposes and on a restricted budget, a cursory examination of the CTT-based psychometric properties may be all that is possible. In a high-stakes situation, such as the development of a patient-reported outcome instrument for consideration in pharmaceutical labeling, however, a thorough psychometric evaluation including IRT or RMT should be considered, with final item-level decisions made on the basis of both quantitative and qualitative results. Copyright © 2015. Published by Elsevier Inc.
Hopewell, Sally; Clarke, Mike; Moher, David; Wager, Elizabeth; Middleton, Philippa; Altman, Douglas G; Schulz, Kenneth F
2008-01-01
Background Clear, transparent, and sufficiently detailed abstracts of conferences and journal articles related to randomized controlled trials (RCTs) are important, because readers often base their assessment of a trial solely on information in the abstract. Here, we extend the CONSORT (Consolidated Standards of Reporting Trials) Statement to develop a minimum list of essential items, which authors should consider when reporting the results of a RCT in any journal or conference abstract. Methods and Findings We generated a list of items from existing quality assessment tools and empirical evidence. A three-round, modified-Delphi process was used to select items. In all, 109 participants were invited to participate in an electronic survey; the response rate was 61%. Survey results were presented at a meeting of the CONSORT Group in Montebello, Canada, January 2007, involving 26 participants, including clinical trialists, statisticians, epidemiologists, and biomedical editors. Checklist items were discussed for eligibility into the final checklist. The checklist was then revised to ensure that it reflected discussions held during and subsequent to the meeting. CONSORT for Abstracts recommends that abstracts relating to RCTs have a structured format. Items should include details of trial objectives; trial design (e.g., method of allocation, blinding/masking); trial participants (i.e., description, numbers randomized, and number analyzed); interventions intended for each randomized group and their impact on primary efficacy outcomes and harms; trial conclusions; trial registration name and number; and source of funding. We recommend the checklist be used in conjunction with this explanatory document, which includes examples of good reporting, rationale, and evidence, when available, for the inclusion of each item. Conclusions CONSORT for Abstracts aims to improve reporting of abstracts of RCTs published in journal articles and conference proceedings. 
It will help authors of abstracts of these trials provide the detail and clarity needed by readers wishing to assess a trial's validity and the applicability of its results. PMID:18215107
Hopewell, Sally; Clarke, Mike; Moher, David; Wager, Elizabeth; Middleton, Philippa; Altman, Douglas G; Schulz, Kenneth F; the CONSORT Group
2008-03-01
Clear, transparent, and sufficiently detailed abstracts of conferences and journal articles related to randomized controlled trials (RCTs) are important, because readers often base their assessment of a trial solely on information in the abstract. Here, we extend the CONSORT (Consolidated Standards of Reporting Trials) Statement to develop a minimum list of essential items, which authors should consider when reporting the results of a RCT in any journal or conference abstract. We generated a list of items from existing quality assessment tools and empirical evidence. A three-round, modified-Delphi process was used to select items. In all, 109 participants were invited to participate in an electronic survey; the response rate was 61%. Survey results were presented at a meeting of the CONSORT Group in Montebello, Canada, January 2007, involving 26 participants, including clinical trialists, statisticians, epidemiologists, and biomedical editors. Checklist items were discussed for eligibility into the final checklist. The checklist was then revised to ensure that it reflected discussions held during and subsequent to the meeting. CONSORT for Abstracts recommends that abstracts relating to RCTs have a structured format. Items should include details of trial objectives; trial design (e.g., method of allocation, blinding/masking); trial participants (i.e., description, numbers randomized, and number analyzed); interventions intended for each randomized group and their impact on primary efficacy outcomes and harms; trial conclusions; trial registration name and number; and source of funding. We recommend the checklist be used in conjunction with this explanatory document, which includes examples of good reporting, rationale, and evidence, when available, for the inclusion of each item. CONSORT for Abstracts aims to improve reporting of abstracts of RCTs published in journal articles and conference proceedings. 
It will help authors of abstracts of these trials provide the detail and clarity needed by readers wishing to assess a trial's validity and the applicability of its results.
People's Intuitions about Randomness and Probability: An Empirical Study
ERIC Educational Resources Information Center
Lecoutre, Marie-Paule; Rovira, Katia; Lecoutre, Bruno; Poitevineau, Jacques
2006-01-01
What people mean by randomness should be taken into account when teaching statistical inference. This experiment explored subjective beliefs about randomness and probability through two successive tasks. Subjects were asked to categorize 16 familiar items: 8 real items from everyday life experiences, and 8 stochastic items involving a repeatable…
Yau, David T W; Wong, May C M; Lam, K F; McGrath, Colman
2015-08-19
A four-factor structure of the two 8-item short forms of the Child Perceptions Questionnaire CPQ11-14 (RSF:8 and ISF:8) has been confirmed. However, the sum scores are typically reported in practice as a proxy of Oral health-related Quality of Life (OHRQoL), which implies a unidimensional structure. This study first assessed the unidimensionality of the 8-item short forms of CPQ11-14. Item response theory (IRT) was employed to offer an alternative and complementary approach of validation and to overcome the limitations of classical test theory assumptions. A random sample of 649 12-year-old school children in Hong Kong was analyzed. Unidimensionality of the scale was tested by confirmatory factor analysis (CFA), principal component analysis (PCA) and the local dependency (LD) statistic. A graded response model was fitted to the data. Contribution of each item to the scale was assessed by the item information function (IIF). Reliability of the scale was assessed by the test information function (TIF). Differential item functioning (DIF) across gender was identified by the Wald test and expected score functions. Both CPQ11-14 RSF:8 and ISF:8 did not deviate much from the unidimensionality assumption. Results from CFA indicated acceptable fit of the one-factor model. PCA indicated that the first principal component explained >30% of the total variation with high factor loadings for both RSF:8 and ISF:8. Almost all LD statistics were <10, indicating the absence of local dependency. Flat and low IIFs were observed in the oral symptoms items, suggesting little contribution of information to the scale, and item removal caused little practical impact. Comparing the TIFs, RSF:8 showed slightly better information than ISF:8. In addition to oral symptoms items, the item "Concerned with what other people think" demonstrated a uniform DIF (p < 0.001). The expected score functions were not much different between boys and girls.
Items related to oral symptoms were not informative to OHRQoL and deletion of these items is suggested. The impact of DIF across gender on the overall score was minimal. CPQ11-14 RSF:8 performed slightly better than ISF:8 in measurement precision. The 6-item short forms suggested by IRT validation should be further investigated to ensure their robustness, responsiveness and discriminative performance.
Improving Inpatient Surveys: Web-Based Computer Adaptive Testing Accessed via Mobile Phone QR Codes
2016-01-01
Background The National Health Service (NHS) 70-item inpatient questionnaire surveys inpatients on their perceptions of their hospitalization experience. However, it imposes more burden on the patient than other similar surveys. The literature shows that computerized adaptive testing (CAT) based on item response theory can help shorten the item length of a questionnaire without compromising its precision. Objective Our aim was to investigate whether CAT can be (1) efficient with item reduction and (2) used with quick response (QR) codes scanned by mobile phones. Methods After downloading the 2008 inpatient survey data from the Picker Institute Europe website and analyzing the difficulties of this 70-item questionnaire, we used an author-made Excel program using the Rasch partial credit model to simulate 1000 patients’ true scores following a standard normal distribution. The CAT was compared to two other scenarios of answering all items (AAI) and the randomized selection method (RSM), as we investigated item length (efficiency) and measurement accuracy. The author-made Web-based CAT program for gathering patient feedback was effectively accessed from mobile phones by scanning the QR code. Results We found that the CAT can be more efficient for patients answering questions (ie, fewer items to respond to) than either AAI or RSM without compromising its measurement accuracy. A Web-based CAT inpatient survey accessed by scanning a QR code on a mobile phone was viable for gathering inpatient satisfaction responses. Conclusions With advances in technology, patients can now be offered alternatives for providing feedback about hospitalization satisfaction. This Web-based CAT is a possible option in health care settings for reducing the number of survey items, as well as offering an innovative QR code access. PMID:26935793
Improving Inpatient Surveys: Web-Based Computer Adaptive Testing Accessed via Mobile Phone QR Codes.
Chien, Tsair-Wei; Lin, Weir-Sen
2016-03-02
The National Health Service (NHS) 70-item inpatient questionnaire surveys inpatients on their perceptions of their hospitalization experience. However, it imposes more burden on the patient than other similar surveys. The literature shows that computerized adaptive testing (CAT) based on item response theory can help shorten the item length of a questionnaire without compromising its precision. Our aim was to investigate whether CAT can be (1) efficient with item reduction and (2) used with quick response (QR) codes scanned by mobile phones. After downloading the 2008 inpatient survey data from the Picker Institute Europe website and analyzing the difficulties of this 70-item questionnaire, we used an author-made Excel program using the Rasch partial credit model to simulate 1000 patients' true scores following a standard normal distribution. The CAT was compared to two other scenarios of answering all items (AAI) and the randomized selection method (RSM), as we investigated item length (efficiency) and measurement accuracy. The author-made Web-based CAT program for gathering patient feedback was effectively accessed from mobile phones by scanning the QR code. We found that the CAT can be more efficient for patients answering questions (ie, fewer items to respond to) than either AAI or RSM without compromising its measurement accuracy. A Web-based CAT inpatient survey accessed by scanning a QR code on a mobile phone was viable for gathering inpatient satisfaction responses. With advances in technology, patients can now be offered alternatives for providing feedback about hospitalization satisfaction. This Web-based CAT is a possible option in health care settings for reducing the number of survey items, as well as offering an innovative QR code access.
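The idea of shortening a questionnaire with CAT can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the authors' Excel or Web program: it uses the dichotomous Rasch model rather than the partial credit model named in the abstract, a grid-based expected a posteriori (EAP) estimate with a standard normal prior, and a hypothetical 70-item bank with evenly spaced difficulties. All names, the stopping threshold, and the seed are illustrative.

```python
import math
import random


def rasch_p(theta, b):
    """Probability of a positive response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


GRID = [-4.0 + 0.1 * i for i in range(81)]  # ability grid from -4 to 4


def eap(responses):
    """Posterior mean and SD of theta on GRID, with a standard normal prior."""
    weights = []
    for t in GRID:
        w = math.exp(-0.5 * t * t)
        for b, x in responses:
            p = rasch_p(t, b)
            w *= p if x else 1.0 - p
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, weights)) / total
    return mean, math.sqrt(var)


def run_cat(true_theta, bank, rng, se_stop=0.5, max_items=30):
    """Administer items adaptively until the posterior SD is small enough."""
    remaining = list(bank)
    responses = []
    est, se = 0.0, 1.0  # start from the prior mean and SD
    while remaining and len(responses) < max_items and se > se_stop:
        # Under the Rasch model, the most informative item is the one whose
        # difficulty is closest to the current ability estimate.
        b = min(remaining, key=lambda d: abs(d - est))
        remaining.remove(b)
        x = rng.random() < rasch_p(true_theta, b)  # simulated response
        responses.append((b, x))
        est, se = eap(responses)
    return est, se, len(responses)


rng = random.Random(1)
bank = [-3.0 + 6.0 * i / 69 for i in range(70)]  # hypothetical 70-item bank
est, se, n_used = run_cat(true_theta=0.8, bank=bank, rng=rng)
```

Comparing `n_used` with the bank size gives a feel for the item reduction the abstract describes; replacing the adaptive selection with a random draw from `remaining` would mimic the randomized selection method used as a comparison.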
Item response analysis of the Positive and Negative Syndrome Scale
Santor, Darcy A; Ascher-Svanum, Haya; Lindenmayer, Jean-Pierre; Obenchain, Robert L
2007-01-01
Background Statistical models based on item response theory were used to examine (a) the performance of individual Positive and Negative Syndrome Scale (PANSS) items and their options, (b) the effectiveness of various subscales to discriminate among individual differences in symptom severity, and (c) the appropriateness of cutoff scores recently recommended by Andreasen and her colleagues (2005) to establish symptom remission. Methods Option characteristic curves were estimated using a nonparametric item response model to examine the probability of endorsing each of 7 options within each of 30 PANSS items as a function of standardized, overall symptom severity. Our data were baseline PANSS scores from 9205 patients with schizophrenia or schizoaffective disorder who were enrolled between 1995 and 2003 in either a large, naturalistic, observational study or else in 1 of 12 randomized, double-blind, clinical trials comparing olanzapine to other antipsychotic drugs. Results Our analyses show that the majority of items forming the Positive and Negative subscales of the PANSS perform very well. We also identified key areas for improvement or revision in items and options within the General Psychopathology subscale. The Positive and Negative subscale scores are not only more discriminating of individual differences in symptom severity than the General Psychopathology subscale score, but are also more efficient on average than the 30-item total score. Of the 8 items recently recommended to establish symptom remission, 1 performed markedly different from the 7 others and should either be deleted or rescored requiring that patients achieve a lower score of 2 (rather than 3) to signal remission. Conclusion This first item response analysis of the PANSS supports its sound psychometric properties; most PANSS items were either very good or good at assessing overall severity of illness. 
These analyses did identify some items which might be further improved for measuring individual severity differences or for defining remission thresholds. Findings also suggest that the Positive and Negative subscales are more sensitive to change than the PANSS total score and, thus, may constitute a "mini PANSS" that may be more reliable, require shorter administration and training time, and possibly reduce sample sizes needed for future research. PMID:18005449
Human behavioral complexity peaks at age 25
Brugger, Peter
2017-01-01
Random Item Generation tasks (RIG) are commonly used to assess high cognitive abilities such as inhibition or sustained attention. They also draw upon our approximate sense of complexity. A detrimental effect of aging on pseudo-random productions has been demonstrated for some tasks, but little is as yet known about the developmental curve of cognitive complexity over the lifespan. We investigate the complexity trajectory across the lifespan of human responses to five common RIG tasks, using a large sample (n = 3429). Our main finding is that the developmental curve of the estimated algorithmic complexity of responses is similar to what may be expected of a measure of higher cognitive abilities, with a performance peak around 25 and a decline starting around 60, suggesting that RIG tasks yield good estimates of such cognitive abilities. Our study illustrates that very short strings, i.e., of 10 items, are sufficient to have their complexity reliably estimated and to allow the documentation of an age-dependent decline in the approximate sense of complexity. PMID:28406953
Do animals and furniture items elicit different brain responses in human infants?
Jeschonek, Susanna; Marinovic, Vesna; Hoehl, Stefanie; Elsner, Birgit; Pauen, Sabina
2010-11-01
One of the earliest categorical distinctions to be made by preverbal infants is the animate-inanimate distinction. To explore the neural basis for this distinction in 7-8-month-olds, an equal number of animal and furniture pictures was presented in an ERP-paradigm. The total of 118 pictures, all looking different from each other, were presented in a semi-randomized order for 1000ms each. Infants' brain responses to exemplars from both categories differed systematically regarding the negative central component (Nc: 400-600ms) at anterior channels. More specifically, the Nc was enhanced for animals in one subgroup of infants, and for furniture items in another subgroup of infants. Explorative analyses related to categorical priming further revealed category-specific differences in brain responses in the late time window (650-1550ms) at right frontal channels: Unprimed stimuli (preceded by a different-category item) elicited a more positive response as compared to primed stimuli (preceded by a same-category item). In sum, these findings suggest that the infant's brain discriminates exemplars from both global domains. Given the design of our task, we conclude that processes of category identification are more likely to account for our findings than processes of on-line category formation during the experimental session. Copyright © 2009 Elsevier B.V. All rights reserved.
Effects of hypnosis and level of processing on repeated recall of line drawings.
McKelvie, S J; Pullara, M
1988-07-01
Moderately susceptible subjects (N = 30) initially judged 30 line drawings of objects for pleasantness (deep processing) and 30 line drawings for visual complexity (shallow processing), after which they were given two immediate recall tests. Following a 48-hr delay, subjects were allocated randomly to hypnosis, simulation, or neutral control conditions and were tested four more times. Subjects produced more correct and incorrect responses over the six trials and gave a higher number of correct responses for deep items than for shallow items. Over the last four trials, hypnosis had no general facilitative effect relative to the other two treatments, but the effect of depth was strongest for hypnotized subjects, who recalled more deep items than did the controls. Finally, both hypnotized and simulating subjects rated their recall as more involuntary and their experimental treatment as more helpful than did the controls. Caution is urged in the forensic use of hypnosis as a retrieval device.
Shikata, Satoru; Nakayama, Takeo; Yamagishi, Hisakazu
2008-01-01
In this study, we conducted a limited survey of reports of surgical randomized controlled trials, using the consolidated standards of reporting trials (CONSORT) statement and additional check items to clarify problems in the evaluation of surgical reports. A total of 13 randomized trials were selected from two latest review articles on biliary surgery. Each randomized trial was evaluated according to 28 quality measures that comprised items from the CONSORT statement plus additional items. Analysis focused on relationships between the quality of each study and the estimated effect gap ("pooled estimate in meta-analysis" -- "estimated effect of each study"). No definite relationships were found between individual study quality and the estimated effect gap. The following items could have been described but were not provided in almost all the surgical RCT reports: "clearly defined outcomes"; "details of randomization"; "participant flow charts"; "intention-to-treat analysis"; "ancillary analyses"; and "financial conflicts of interest". The item, "participation of a trial methodologist in the study" was not found in any of the reports. Although the quality of reporting trials is not always related to a biased estimation of treatment effect, the items used for quality measures must be described to enable readers to evaluate the quality and applicability of the reporting. Further development of an assessment tool is needed for items specific to surgical randomized controlled trials.
Key Items to Get Right When Conducting a Randomized Controlled Trial in Education
ERIC Educational Resources Information Center
Coalition for Evidence-Based Policy, 2005
2005-01-01
This is a checklist of key items to get right when conducting a randomized controlled trial to evaluate an educational program or practice ("intervention"). It is intended as a practical resource for researchers and sponsors of research, describing items that are often critical to the success of a randomized controlled trial. A significant…
Non-ignorable missingness item response theory models for choice effects in examinee-selected items.
Liu, Chen-Wei; Wang, Wen-Chung
2017-11-01
Examinee-selected item (ESI) design, in which examinees are required to respond to a fixed number of items in a given set, always yields incomplete data (i.e., when only the selected items are answered, data are missing for the others) that are likely non-ignorable in likelihood inference. Standard item response theory (IRT) models become infeasible when ESI data are missing not at random (MNAR). To solve this problem, the authors propose a two-dimensional IRT model that posits one unidimensional IRT model for observed data and another for nominal selection patterns. The two latent variables are assumed to follow a bivariate normal distribution. In this study, the mirt freeware package was adopted to estimate parameters. The authors conduct an experiment to demonstrate that ESI data are often non-ignorable and to determine how to apply the new model to the data collected. Two follow-up simulation studies are conducted to assess the parameter recovery of the new model and the consequences for parameter estimation of ignoring MNAR data. The results of the two simulation studies indicate good parameter recovery of the new model and poor parameter recovery when non-ignorable missing data were mistakenly treated as ignorable. © 2017 The British Psychological Society.
Kelly, Laura; Jenkinson, Crispin; Ziebland, Sue
2013-01-01
Objective The internet is a valuable resource for accessing health information and support. We are developing an instrument to assess the effects of websites with experiential and factual health information. This study aimed to inform an item pool for the proposed questionnaire. Methods Items were informed through a review of relevant literature and secondary qualitative analysis of 99 narrative interviews relating to patient and carer experiences of health. Statements relating to identified themes were re-cast as questionnaire items and shown for review to an expert panel. Cognitive debrief interviews (n = 21) were used to assess items for face and content validity. Results Eighty-two generic items were identified following secondary qualitative analysis and expert review. Cognitive interviewing confirmed the questionnaire instructions, 62 items and the response options were acceptable to patients and carers. Conclusion Using a clear conceptual basis to inform item generation, 62 items have been identified as suitable to undergo further psychometric testing. Practice implications The final questionnaire will initially be used in a randomized controlled trial examining the effects of online patients' experiences. This will inform recommendations on the best way to present patients’ experiences within health information websites. PMID:23598293
Ortiz, Glorimar; Schacht, Lucille
2012-01-01
Measurement of consumers' satisfaction in psychiatric settings is important because it has been correlated with improved clinical outcomes and administrative measures of high-quality care. These consumer satisfaction measurements are actively used as performance measures required by the accreditation process and for quality improvement activities. Our objectives were (i) to re-evaluate, through exploratory factor analysis (EFA) and confirmatory factor analysis (CFA), the structure of an instrument intended to measure consumers' satisfaction with care in psychiatric settings and (ii) to examine and publish the psychometric characteristics, validity and reliability, of the Inpatient Consumer Survey (ICS). To psychometrically test the structure of the ICS, 34 878 survey results, submitted by 90 psychiatric hospitals in 2008, were extracted from the Behavioral Healthcare Performance Measurement System (BHPMS). Basic descriptive item-response and correlation analyses were performed for total surveys. Two datasets were randomly created for analysis. A random sample of 8229 survey results was used for EFA. Another random sample of 8261 consumer survey results was used for CFA. This same sample was used to perform validity and reliability analyses. The item-response analysis showed that the mean range for a disagree/agree five-point scale was 3.10-3.94. Correlation analysis showed a strong relationship between items. Six domains (dignity, rights, environment, empowerment, participation, and outcome) with internal reliabilities between good to moderate (0.87-0.73) were shown to be related to overall care satisfaction. Overall reliability for the instrument was excellent (0.94). Results from CFA provided support for the domains structure of the ICS proposed through EFA. The overall findings from this study provide evidence that the ICS is a reliable measure of consumer satisfaction in psychiatric inpatient settings. 
The analysis has shown the ICS to provide valid and reliable results and to focus on the specific concerns of consumers of psychiatric inpatient care. Scores by item indicate that opportunity for improvement exists across healthcare organizations.
ERIC Educational Resources Information Center
Seitz, Sue; Morris, Dan
In a study on short term memory, 32 educable mentally retarded subjects (mean IQ 62.68, mean mental age 103.78 months) were randomly assigned to each of the four experimental conditions. An automated machine presented the stimuli (32 three-letter words) and the interference items (a list of random numbers read aloud between stimuli presentations).…
Optimization of Contrast Detection Power with Probabilistic Behavioral Information
Cordes, Dietmar; Herzmann, Grit; Nandy, Rajesh; Curran, Tim
2012-01-01
Recent progress in the experimental design for event-related fMRI experiments made it possible to find the optimal stimulus sequence for maximum contrast detection power using a genetic algorithm. In this study, a novel algorithm is proposed for optimization of contrast detection power by including probabilistic behavioral information, based on pilot data, in the genetic algorithm. As a particular application, a recognition memory task is studied and the design matrix optimized for contrasts involving the familiarity of individual items (pictures of objects) and the recollection of qualitative information associated with the items (left/right orientation). Optimization of contrast efficiency is a complicated issue whenever subjects’ responses are not deterministic but probabilistic. Contrast efficiencies are not predictable unless behavioral responses are included in the design optimization. However, available software for design optimization does not include options for probabilistic behavioral constraints. If the anticipated behavioral responses are included in the optimization algorithm, the design is optimal for the assumed behavioral responses, and the resulting contrast efficiency is greater than what either a block design or a random design can achieve. Furthermore, improvements of contrast detection power depend strongly on the behavioral probabilities, the perceived randomness, and the contrast of interest. The present genetic algorithm can be applied to any case in which fMRI contrasts are dependent on probabilistic responses that can be estimated from pilot data. PMID:22326984
Cappelleri, Joseph C; Althof, Stanley E; O'Leary, Michael P; Tseng, Li-Jung
2008-04-01
To evaluate the effect of sildenafil citrate on each item of the 14-item Self-Esteem And Relationship (SEAR) questionnaire, which is used to measure self-esteem, confidence, satisfaction with sexual relationship, and overall relationship satisfaction in men with erectile dysfunction (ED). Data were combined from two 12-week, double-blind, placebo-controlled, flexible-dose sildenafil trials having identical protocols, one conducted in the USA and the other in Mexico, Brazil, Australia and Japan. All men had ED and were aged ≥18 years. Response categories of each SEAR item used a 4-week reference period and were based on a five-point scale (1, almost never/never; 2, a few times; 3, sometimes; 4, most times; 5, almost always/always). The difference (sildenafil vs placebo) in the change from baseline to week 12 was evaluated with a Wilcoxon rank sum test using ridit analysis, and an analysis of covariance model that included treatment group, centre, study and baseline item score. Compared with the 274 patients receiving placebo, the 279 receiving sildenafil reported significantly greater mean and median improvements (P < 0.001) in each of the 14 SEAR items. The probability of increased psychosocial benefit from baseline to week 12 was higher with sildenafil for each SEAR item, and ranged from 0.60 ('My partner was unhappy with the quality of our sexual relations' [item reverse-scored]) to 0.72 ('I was satisfied with my sexual performance'). Across all items, the mean (sd) probability was 0.67 (0.04) that a randomly selected patient in the sildenafil group would have a more favourable change relative to a randomly selected patient in the placebo group. Sildenafil produced substantial and meaningful improvements at the item-specific level. This analysis complements previously published work on self-esteem, confidence and relationship satisfaction.
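The "probability that a randomly selected patient in one group has a more favourable change than a randomly selected patient in the other" can be illustrated with a minimal Python sketch. It uses a common pairwise estimate (wins plus half the ties, divided by the number of pairs), which is an assumption here and not necessarily the exact ridit procedure the authors applied. The function name and the change scores are hypothetical.

```python
def prob_superiority(treated, control):
    """Estimate P(treated > control) + 0.5 * P(treated == control)
    by comparing every treated value with every control value."""
    wins = 0
    ties = 0
    for t in treated:
        for c in control:
            if t > c:
                wins += 1
            elif t == c:
                ties += 1
    return (wins + 0.5 * ties) / (len(treated) * len(control))


# Hypothetical change-from-baseline scores on one item (not data from the study).
treated_change = [1, 2, 3]
control_change = [0, 1, 2]
print(prob_superiority(treated_change, control_change))
```

A value of 0.5 would mean neither group tends to do better; values above 0.5 favour the first group.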
Developing a scale to measure "attachment to the local community" in late middle aged individuals.
Sakai, Taichi; Omori, Junko; Takahashi, Kazuko; Mitsumori, Yasuko; Kobayashi, Maasa; Ono, Wakanako; Miyazaki, Toshie; Anzai, Hitomi; Saito, Mika
2016-01-01
Objectives This study was conducted to develop a scale for measuring "attachment to the local community" for its use in health services. The scale is also intended to nurture new social relationships in late middle-aged individuals. Methods Thirty items were initially planned to be included in the scale to measure "attachment to the local community", according to a previous study that identified the concept. The study subjects were late middle-aged residents of City B in Prefecture A, located in Tokyo suburbs. From the basic resident register data, 1,000 individuals (local residents in the 50-69 year age group) were selected by a multi-stage random sampling technique, on the basis of their residential area, age, and sex (while maintaining the male to female ratio). An unsigned self-administered questionnaire was distributed to the subjects, and the responses were collected by postal mail. The collected data were analyzed using psychometric methods for scale evaluation. Results Valid responses were obtained from 583 subjects, and the response rate was 58.3%. In an item analysis, none of the items were rejected. In a subsequent factor analysis, 7 items were eliminated. These items included 2 items with a factor loading of <0.40, 3 items loading on multiple factors and showing a factor loading of ≥0.40, and 2 items with a low factor correlation (0.04-0.16). These items included factors that related to only these 2 items. Consequently, 23 items in the following 4-factor structure were selected as the scale items: "Source of vitality to live life," "Intention to cherish ties with people," "Place where one can be oneself," and "Pride of being a resident." Cronbach's coefficient α for the entire scale of "attachment to the local community" was 0.95, demonstrating internal consistency. We then examined the correlation with an existing scale to measure social support; the results revealed a statistically significant correlation and confirmed criterion-related validity (P<0.001). 
In addition, the fit indices in a covariance structure analysis showed adequate values. Conclusions The developed scale was considered reliable and appropriate for measuring "attachment to the local community."
ERIC Educational Resources Information Center
Lee, Eunjung
2013-01-01
The purpose of this research was to compare the equating performance of various equating procedures for the multidimensional tests. To examine the various equating procedures, simulated data sets were used that were generated based on a multidimensional item response theory (MIRT) framework. Various equating procedures were examined, including…
Short-Term Memory Scanning Viewed as Exemplar-Based Categorization
ERIC Educational Resources Information Center
Nosofsky, Robert M.; Little, Daniel R.; Donkin, Christopher; Fific, Mario
2011-01-01
Exemplar-similarity models such as the exemplar-based random walk (EBRW) model (Nosofsky & Palmeri, 1997b) were designed to provide a formal account of multidimensional classification choice probabilities and response times (RTs). At the same time, a recurring theme has been to use exemplar models to account for old-new item recognition and to…
Impact of Missing Data on Person-Model Fit and Person Trait Estimation
ERIC Educational Resources Information Center
Zhang, Bo; Walker, Cindy M.
2008-01-01
The purpose of this research was to examine the effects of missing data on person-model fit and person trait estimation in tests with dichotomous items. Under the missing-completely-at-random framework, four missing data treatment techniques were investigated including pairwise deletion, coding missing responses as incorrect, hotdeck imputation,…
Performance Assessment in Serious Games: Compensating for the Effects of Randomness
ERIC Educational Resources Information Center
Westera, Wim
2016-01-01
This paper is about performance assessment in serious games. We conceive serious gaming as a process of player-led decision taking. Starting from combinatorics and item-response theory we provide an analytical model that makes explicit to what extent observed player performances (decisions) are blurred by chance processes (guessing behaviors). We…
Modeling Growth in Electronic Learning Environments Using a Longitudinal Random Item Response Model
ERIC Educational Resources Information Center
Kadengye, Damazo T.; Ceulemans, Eva; Van Den Noortgate, Wim
2015-01-01
In educational environments, monitoring persons' progress over time may help teachers to evaluate the effectiveness of their teaching procedures. Electronic learning environments are increasingly being used as part of formal education and resulting datasets can be used to understand and to improve the environment. This study presents…
The Role of Perpetrator Motivation in Two Crime Scenarios
ERIC Educational Resources Information Center
Sizemore, O. J.
2013-01-01
Undergraduate volunteers (n = 134) were randomly assigned in a 2 x 2 design that manipulated type of crime (rape vs. robbery) and perpetrator motivation (anger vs. desire). After reading one of the crime scenarios, participants responded to a series of attitude items regarding responsibility for the crime, assigned blame to agents mentioned in the…
The Pieper-Zulkowski pressure ulcer knowledge test.
Pieper, Barbara; Zulkowski, Karen
2014-09-01
To describe the development and initial testing of the Pieper-Zulkowski Pressure Ulcer Knowledge Test (PZ-PUKT). Cross-sectional, instrument testing. Hospital association pressure ulcer educational program conference. Pressure ulcer research and guidelines from the last 5 years were examined for test item content. The initial PZ-PUKT had 115 items; response options were "true," "false," and "don't know." Registered nurses (N = 108) were randomly divided into 2 groups to take either the 60 prevention/risk and staging items or the 55 wound description items. Analyses of these responses resulted in 72 items, which were administered in total to a second cohort of 98 nurses for reliability. Cronbach's α was .80 for the 72-item PZ-PUKT. Cronbach's α values for the subscales were as follows: staging, .67; wound description, .64; and prevention/risk, .56. The mean correct scores were as follows: total, 80%; prevention, 77%; staging, 86%; and wound description, 77%. Nurses with wound care certification scored significantly higher on the PZ-PUKT than did nurses with other clinical certifications or nurses who lacked certification. The PZ-PUKT has updated content about pressure ulcer prevention/risk, staging, and wound description. Reliability values are highest for the total test. Further use of the instrument in diverse settings will add to reliability testing and may provide direction for determination of a passing cutoff score.
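Cronbach's α, the reliability coefficient reported above, follows the standard formula α = k/(k − 1) · (1 − Σ item variances / variance of total scores). The sketch below is a minimal Python illustration under the assumption of complete data; the function name and the toy scores are hypothetical and are not taken from the PZ-PUKT study.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for complete data.

    scores: list of respondents, each a list of k item scores.
    Uses sample variances (n - 1 denominator) throughout.
    """
    k = len(scores[0])
    # Variance of each item across respondents.
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Variance of the respondents' total scores.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical 3 respondents x 2 items.
toy = [[1, 2], [2, 1], [3, 3]]
print(round(cronbach_alpha(toy), 3))
```

Because α grows with the number of items, shorter subscales tend to show lower values than the full test, which is consistent with the pattern in the abstract.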
Gender and physical therapy career success factors.
Rozier, C K; Raymond, M J; Goldstein, M S; Hamilton, B L
1998-07-01
Gender and profession are thought to affect how career success is perceived as well as how it is achieved. This study investigated items considered important in defining career success for male and female physical therapists. The study also explored the relationship among gender, beliefs about career success, and career experiences. Data were obtained through an investigator-developed survey. The self-report questionnaire consisted of 78 items in 4 areas: descriptive information, items important in characterizing career success, items perceived to enhance or inhibit career success, and items assessing self-esteem. Questionnaires were mailed to a random sample of active physical therapist members of the American Physical Therapy Association (N = 5,000). The response rate was 38.1% (n = 1,906). Both men and women selected indicators such as practicing ethically, improving patient health, and feeling satisfied over high income or status when describing career success. All respondents agreed that clinical competency and motivation are key factors related to achieving career success. Family issues, full-time employment, and flexibility of practice conditions emerged as primary gender differences. A unique set of indicators describe physical therapy career success. Gender differences in its description and factors that influence its achievement are related primarily to family issues. Career success for women depends to a greater degree on the ability to manage family responsibilities in conjunction with employment opportunities.
Narimoto, Tadamasa; Matsuura, Naomi; Takezawa, Tomohiro; Mitsuhashi, Yoshinori; Hiratani, Michio
2013-01-01
The authors investigated whether impaired spatial short-term memory exhibited by children with nonverbal learning disabilities is due to a problem in the encoding process. Children with or without nonverbal learning disabilities performed a simple spatial test that required them to remember 3, 5, or 7 spatial items presented simultaneously in random positions (i.e., spatial configuration) and to decide if a target item was changed or all items including the target were in the same position. The results showed that, even when the spatial positions in the encoding and probe phases were similar, the mean proportion correct of children with nonverbal learning disabilities was 0.58 while that of children without nonverbal learning disabilities was 0.84. The authors argue from these results that children with nonverbal learning disabilities have difficulty encoding relational information between spatial items, and that this difficulty is responsible for their impaired spatial short-term memory.
Tadić, Valerija; Cooper, Andrew; Cumberland, Phillippa; Lewando-Hundt, Gillian; Rahi, Jugnoo S
2016-01-01
To report piloting and initial validation of the VQoL_CYP, a novel age-appropriate vision-related quality of life (VQoL) instrument for self-reporting by children with visual impairment (VI). Participants were a random patient sample of children with VI aged 10-15 years. 69 patients, drawn from patient databases at Great Ormond Street Hospital and Moorfields Eye Hospital, United Kingdom, participated in piloting of the draft 47-item VQoL instrument, which enabled preliminary item reduction. Subsequent administration of the instrument, alongside functional vision (FV) and generic health-related quality of life (HRQoL) self-report measures, to 101 children with VI comprising a nationally representative sample enabled further item reduction and evaluation of psychometric properties using Rasch analysis. Construct validity was assessed through Pearson correlation coefficients. Item reduction through piloting (8 items removed for skewness and individual item response pattern) and validation (1 item removed for skewness and 3 for misfit in Rasch) produced a 35-item scale, with fit values within acceptable limits, no notable differential item functioning, good measurement precision, ordered response categories and acceptable targeting in Rasch. The VQoL_CYP showed good construct validity, correlating strongly with HRQoL scores, moderately with FV scores but not with acuity. Robust child-appropriate self-report VQoL measures for children with VI are necessary for understanding the broader impacts of living with a visual disability, distinguishing these from limited functioning per se. Future planned use in larger patient samples will allow further psychometric development of the VQoL_CYP as an adjunct to objective outcomes assessment.
Walton, David M; Beattie, Tyler; Putos, Joseph; MacDermid, Joy C
2016-06-01
The Brief Pain Inventory is composed of two quantifiable scales: pain severity and pain interference. The reported factor structure of the interference subscale is not consistent in the extant literature, with no clear choice between a single- or two-factor structure. Here, we report on the results of Rasch-based analysis of the interference subscale using a large population-based ambulatory patient database (the Quebec Pain Registry). Observational cohort. A total of 1,000 responses were randomly drawn from a total database of 5,654 for this analysis. Both the original 7-item and an expanded 10-item version (Tyler 2002) of the interference subscale were evaluated. Rasch analysis revealed significant misfit of both versions of the scale, with the original 7-item version outperforming the expanded 10-item version. Analysis of dimensionality revealed that both versions showed improved model fit when considered as two subscales (affective and physical interference) with the item on sleep interference removed or considered separately. Additionally, significant uniform differential item functioning was identified for 6 of the 7 original items when the sample was stratified by age above or below 55 years. The interference subscale achieved adequate model fit when considered as two separate subscales with age as a mediator of response, while interpreting the sleep interference item separately. A transformation matrix revealed that in all cases, ordinal-level change at the extreme ends of the scale appears to be more meaningful than does a similar change at the midpoints. The Interference subscale of the BPI should be interpreted as two separate subscales (Affective Interference, Physical Interference) with the sleep item removed or interpreted separately for optimal fit to the Rasch model. Implications for research and clinical use are discussed. Copyright © 2016 Elsevier Inc. All rights reserved.
Wolfe, Edward W; McGill, Michael T
2011-01-01
This article summarizes a simulation study of the performance of five item quality indicators (the weighted and unweighted versions of the mean square and standardized mean square fit indices and the point-measure correlation) under conditions of relatively high and low amounts of missing data under both random and conditional patterns of missing data for testing contexts such as those encountered in operational administrations of a computerized adaptive certification or licensure examination. The results suggest that weighted fit indices, particularly the standardized mean square index, and the point-measure correlation provide the most consistent information between random and conditional missing data patterns and that these indices perform more comparably for items near the passing score than for items with extreme difficulty values.
Multilevel Multidimensional Item Response Model with a Multilevel Latent Covariate
ERIC Educational Resources Information Center
Cho, Sun-Joo; Bottge, Brian A.
2015-01-01
In a pretest-posttest cluster-randomized trial, one of the methods commonly used to detect an intervention effect involves controlling pre-test scores and other related covariates while estimating an intervention effect at post-test. In many applications in education, the total post-test and pre-test scores that ignore measurement error in the…
2017-01-01
Background The Center for Epidemiologic Studies Depression Scale (CES-D) is a measure of depressive symptomatology which is widely used internationally. Though previous attempts were made to shorten the CES-D scale, few have attempted to develop a Computerized Adaptive Test (CAT) version for the CES-D. Objective The aim of this study was to provide evidence on the efficiency and accuracy of the CES-D when administered using CAT using an American sample group. Methods We obtained a sample of 2060 responses to the CES-D from US participants using the myPersonality application. The average age of participants was 26 years (range 19-77). We randomly split the sample into two groups to evaluate and validate the psychometric models. We used evaluation group data (n=1018) to assess dimensionality with both confirmatory factor and Mokken analysis. We conducted further psychometric assessments using item response theory (IRT), including assessments of item and scale fit to Samejima’s graded response model (GRM), local dependency and differential item functioning. We subsequently conducted two CAT simulations to evaluate the CES-D CAT using the validation group (n=1042). Results Initial CFA results indicated a poor fit to the model and Mokken analysis revealed 3 items which did not conform to the same dimension as the rest of the items. We removed the 3 items and fit the remaining 17 items to GRM. We found no evidence of differential item functioning (DIF) between age and gender groups. Estimates of the level of CES-D trait score provided by the simulated CAT algorithm and the original CES-D trait score derived from original scale were correlated highly. The second CAT simulation conducted using real participant data demonstrated higher precision at the higher levels of depression spectrum. Conclusions Depression assessments using the CES-D CAT can be more accurate and efficient than those made using the fixed-length assessment. PMID:28931496
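For readers unfamiliar with Samejima's graded response model mentioned above, the following Python sketch shows how category probabilities are obtained in its usual logistic form: the probability of responding in category k or higher is a logistic function of a(θ − b_k), and category probabilities are differences of adjacent cumulative probabilities. The function name and the parameter values are hypothetical and are not estimates from the study.

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Category probabilities under a logistic graded response model.

    theta: latent trait level.
    a: item discrimination.
    thresholds: increasing list of category thresholds b_1 < ... < b_m,
        giving m + 1 ordered response categories.
    """
    # Cumulative probabilities P(X >= k), padded with 1 (lowest) and 0 (beyond highest).
    cumulative = [1.0]
    cumulative += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cumulative.append(0.0)
    # Each category probability is the drop between adjacent cumulative curves.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(cumulative) - 1)]


# Hypothetical four-category item (three thresholds) for a person at theta = 0.5.
probs = grm_category_probs(theta=0.5, a=1.5, thresholds=[-1.0, 0.0, 1.2])
print([round(p, 3) for p in probs])
```

In a CAT, probabilities of this kind feed the item information used to pick the next item, so fewer items are needed for a given precision.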
Random Item Generation Is Affected by Age
ERIC Educational Resources Information Center
Multani, Namita; Rudzicz, Frank; Wong, Wing Yiu Stephanie; Namasivayam, Aravind Kumar; van Lieshout, Pascal
2016-01-01
Purpose: Random item generation (RIG) involves central executive functioning. Measuring aspects of random sequences can therefore provide a simple method to complement other tools for cognitive assessment. We examine the extent to which RIG relates to specific measures of cognitive function, and whether those measures can be estimated using RIG…
Finbråten, Hanne Søberg; Pettersen, Kjell Sverre; Wilde-Larsson, Bodil; Nordström, Gun; Trollvik, Anne; Guttersrud, Øystein
2017-11-01
To validate the European Health Literacy Survey Questionnaire (HLS-EU-Q47) in people with type 2 diabetes mellitus. The HLS-EU-Q47 latent variable is outlined in a framework with four cognitive domains integrated in three health domains, implying 12 theoretically defined subscales. Valid and reliable health literacy measures are crucial to effectively adapt health communication and education to individuals and groups of patients. Cross-sectional study applying confirmatory latent trait analyses. Using a paper-and-pencil self-administered approach, 388 adults responded in March 2015. The data were analysed using the Rasch methodology and confirmatory factor analysis. Response violation (response dependency) and trait violation (multidimensionality) of local independence were identified. Fitting the "multidimensional random coefficients multinomial logit" model, 1-, 3- and 12-dimensional Rasch models were applied and compared. Poor model fit and differential item functioning were present in some items, and several subscales suffered from poor targeting and low reliability. Despite multidimensional data, we did not observe any unordered response categories. Interpreting the domains as distinct but related latent dimensions, the data fit a 12-dimensional Rasch model and a 12-factor confirmatory factor model best. Therefore, the analyses did not support the estimation of one overall "health literacy score." To support the plausibility of claims based on the HLS-EU score(s), we suggest: removing the health care aspect to reduce the magnitude of multidimensionality; rejecting redundant items to avoid response dependency; adding "harder" items and applying a six-point rating scale to improve subscale targeting and reliability; and revising items to improve model fit and avoid bias owing to person factors. © 2017 John Wiley & Sons Ltd.
Hamel, J F; Sebille, V; Le Neel, T; Kubis, G; Boyer, F C; Hardouin, J B
2017-12-01
Subjective health measurements using Patient Reported Outcomes (PRO) are increasingly used in randomized trials, particularly for patient group comparisons. Two main types of analytical strategies can be used for such data: Classical Test Theory (CTT) and Item Response Theory models (IRT). These two strategies display very similar characteristics when data are complete, but in the common case when data are missing, whether IRT or CTT would be the most appropriate remains unknown and was investigated using simulations. We simulated PRO data such as quality of life data. Missing responses to items were simulated as being completely random, depending on an observable covariate or on an unobserved latent trait. The considered CTT-based methods allowed comparing scores using complete-case analysis, personal mean imputations or multiple-imputations based on a two-way procedure. The IRT-based method was the Wald test on a Rasch model including a group covariate. The IRT-based method and the multiple-imputations-based method for CTT displayed the highest observed power and were the only unbiased methods whatever the kind of missing data. Online software and Stata® modules compatible with the innate mi impute suite are provided for performing such analyses. Traditional procedures (listwise deletion and personal mean imputations) should be avoided, due to inevitable problems of biases and lack of power.
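To make concrete what "personal mean imputation" refers to, here is a minimal Python sketch: each missing item response is replaced by the mean of that person's observed responses. This only illustrates the traditional procedure that the abstract advises against; the function name and the data are hypothetical, and `None` is assumed to mark a missing response.

```python
def personal_mean_impute(responses):
    """Replace missing item responses (None) with the mean of the
    person's own observed responses. If nothing was observed, the
    record is returned unchanged."""
    observed = [r for r in responses if r is not None]
    if not observed:
        return list(responses)
    person_mean = sum(observed) / len(observed)
    return [person_mean if r is None else r for r in responses]


# Hypothetical respondent who skipped the second of three items.
print(personal_mean_impute([1, None, 3]))
```

The imputed value carries no item-specific information, which gives some intuition for why the procedure can bias group comparisons when missingness depends on the latent trait.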
Parallel coding of conjunctions in visual search.
Found, A
1998-10-01
Two experiments investigated whether the conjunctive nature of nontarget items influenced search for a conjunction target. Each experiment consisted of two conditions. In both conditions, the target item was a red bar tilted to the right, among white tilted bars and vertical red bars. As well as color and orientation, display items also differed in terms of size. Size was irrelevant to search in that the size of the target varied randomly from trial to trial. In one condition, the size of items correlated with the other attributes of display items (e.g., all red items were big and all white items were small). In the other condition, the size of items varied randomly (i.e., some red items were small and some were big, and some white items were big and some were small). Search was more efficient in the size-correlated condition, consistent with the parallel coding of conjunctions in visual search.
Feed mechanism and method for feeding minute items
Stringer, Timothy Kent; Yerganian, Simon Scott
2012-11-06
A feeding mechanism and method for feeding minute items, such as capacitors, resistors, or solder preforms. The mechanism is adapted to receive a plurality of the randomly-positioned and randomly-oriented extremely small or minute items, and to isolate, orient, and position the items in a specific repeatable pickup location wherefrom they may be removed for use by, for example, a computer-controlled automated assembly machine. The mechanism comprises a sliding shelf adapted to receive and support the items; a wiper arm adapted to achieve a single even layer of the items; and a pushing arm adapted to push the items into the pickup location. The mechanism can be adapted for providing the items with a more exact orientation, and can also be adapted for use in a liquid environment.
Feed mechanism and method for feeding minute items
Stringer, Timothy Kent [Bucyrus, KS]; Yerganian, Simon Scott [Lee's Summit, MO]
2009-10-20
A feeding mechanism and method for feeding minute items, such as capacitors, resistors, or solder preforms. The mechanism is adapted to receive a plurality of the randomly-positioned and randomly-oriented extremely small or minute items, and to isolate, orient, and position one or more of the items in a specific repeatable pickup location wherefrom they may be removed for use by, for example, a computer-controlled automated assembly machine. The mechanism comprises a sliding shelf adapted to receive and support the items; a wiper arm adapted to achieve a single even layer of the items; and a pushing arm adapted to push the items into the pickup location. The mechanism can be adapted for providing the items with a more exact orientation, and can also be adapted for use in a liquid environment.
Development of an Inconsistent Responding Scale for the Triarchic Psychopathy Measure.
Mowle, Elyse N; Kelley, Shannon E; Edens, John F; Donnellan, M Brent; Smith, Shannon Toney; Wygant, Dustin B; Sellbom, Martin
2017-08-01
Inconsistent or careless responding to self-report measures is estimated to occur in approximately 10% of university research participants and may be even more common among offender populations. Inconsistent responding may be a result of a number of factors including inattentiveness, reading or comprehension difficulties, and cognitive impairment. Many stand-alone personality scales used in applied and research settings, however, do not include validity indicators to help identify inattentive response patterns. Using multiple archival samples, the current study describes the development of an inconsistent responding scale for the Triarchic Psychopathy Measure (TriPM; Patrick, 2010), a widely used self-report measure of psychopathy. We first identified pairs of correlated TriPM items in a derivation sample (N = 2,138) and then created a total score based on the sum of the absolute value of the differences for each item pair. The resulting scale, the Triarchic Assessment Procedure for Inconsistent Responding (TAPIR), strongly differentiated between genuine TriPM protocols and randomly generated TriPM data (N = 1,000), as well as between genuine protocols and those in which 50% of the original data were replaced with random item responses. TAPIR scores demonstrated fairly consistent patterns of association with some theoretically relevant correlates (e.g., inconsistency scales embedded in other personality inventories), although not others (e.g., measures of conscientiousness) across our cross-validation samples. Tentative TAPIR cut scores that may discriminate between attentively and carelessly completed protocols are presented. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
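The scoring rule described in this abstract (sum the absolute differences within each pair of correlated items) can be sketched as follows. The item pairs, the response scale, and the example protocols are hypothetical; the actual TAPIR item pairs and cut scores are not given here, and any reverse-keyed items would need recoding first.

```python
def inconsistency_score(responses, item_pairs):
    """Sum of absolute within-pair differences; higher values suggest less consistent responding."""
    return sum(abs(responses[a] - responses[b]) for a, b in item_pairs)

# Hypothetical pairs of item indices that were found to correlate in a derivation sample.
pairs = [(0, 3), (1, 4), (2, 5)]

consistent_protocol = [3, 1, 2, 3, 1, 2]   # paired items answered alike
careless_protocol = [0, 3, 1, 3, 0, 3]     # paired items answered very differently

consistent_total = inconsistency_score(consistent_protocol, pairs)
careless_total = inconsistency_score(careless_protocol, pairs)
```

A cut score would then be chosen by comparing the score distributions of genuine and randomly generated protocols, as the abstract describes.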
Cheng, Su-Fen; Lee-Hsieh, Jane; Turton, Michael A; Lin, Kuan-Chia
2014-06-01
Little research has investigated the establishment of norms for nursing students' self-directed learning (SDL) ability, recognized as an important capability for professional nurses. An item response theory (IRT) approach was used to establish norms for SDL abilities valid for the different nursing programs in Taiwan. The purposes of this study were (a) to use IRT with a graded response model to reexamine the SDL instrument, or the SDLI, originally developed by this research team using confirmatory factor analysis and (b) to establish SDL ability norms for the four different nursing education programs in Taiwan. Stratified random sampling with probability proportional to size was used. A minimum of 15% of students from the four different nursing education degree programs across Taiwan was selected. A total of 7,879 nursing students from 13 schools were recruited. The research instrument was the 20-item SDLI developed by Cheng, Kuo, Lin, and Lee-Hsieh (2010). IRT with the graded response model was used with a two-parameter logistic model (discrimination and difficulty) for the data analysis, calculated using MULTILOG. Norms were established using percentile rank. Analysis of item information and test information functions revealed that 18 items exhibited very high discrimination and two items had high discrimination. The test information function was higher in this range of scores, indicating greater precision in the estimate of nursing student SDL. Reliability fell between .80 and .94 for each domain and the SDLI as a whole. The total information function shows that the SDLI is appropriate for all nursing students, except for the top 2.5%. SDL ability norms were established for each nursing education program and for the nation as a whole. IRT is shown to be a potent and useful methodology for scale evaluation. The norms for SDL established in this research will provide practical standards for nursing educators and students in Taiwan.
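As a rough illustration of the graded response model mentioned in this abstract, the sketch below computes category probabilities for one ordinal item from a discrimination parameter and ordered thresholds. The parameter values and the five-category format are assumptions for illustration only; they are not the SDLI estimates, and MULTILOG is not used.

```python
import math

def grm_category_probs(theta, a, thresholds):
    """Category probabilities under a graded response model.

    theta: latent trait value; a: item discrimination;
    thresholds: increasing difficulty parameters b_1 < ... < b_{K-1}.
    """
    # Cumulative probabilities P(X >= k), bracketed by 1 (lowest) and 0 (beyond highest).
    cum = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds] + [0.0]
    # Each category probability is the difference of adjacent cumulative probabilities.
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]

# Hypothetical five-category item evaluated at an average trait level.
probs = grm_category_probs(theta=0.0, a=1.5, thresholds=[-1.0, 0.0, 1.0, 2.0])
```

Higher discrimination makes the cumulative curves steeper, which is what concentrates item information around the thresholds.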
ERIC Educational Resources Information Center
Gönülates, Emre; Kortemeyer, Gerd
2017-01-01
Homework is an important component of most physics courses. One of the functions it serves is to provide meaningful formative assessment in preparation for examinations. However, correlations between homework and examination scores tend to be low, likely due to unproductive student behavior such as copying and random guessing of answers. In this…
Developing and testing the CHORDS: Characteristics of Responsible Drinking Survey.
Barry, Adam E; Goodson, Patricia
2011-01-01
Report on the development and psychometric testing of a theoretically and evidence-grounded instrument, the Characteristics of Responsible Drinking Survey (CHORDS). Instrument subjected to four phases of pretesting (cognitive validity, cognitive and motivational qualities, pilot test, and item evaluation) and a final posttest implementation. Large public university in Texas. Randomly selected convenience sample (n = 729) of currently enrolled students. This 78-item questionnaire measures individuals' responsible drinking beliefs, motivations, intentions, and behaviors. Cronbach α, split-half reliability, principal components analysis and Spearman ρ were conducted to investigate reliability, stability, and validity. Measures in the CHORDS exhibited high internal consistency reliability and strong correlations of split-half reliability. Factor analyses indicated five distinct scales were present, as proposed in the theoretical model. Subscale composite scores also exhibited a correlation to alcohol consumption behaviors, indicating concurrent validity. The CHORDS represents the first instrument specifically designed to assess responsible drinking beliefs and behaviors. It was found to elicit valid and reliable data among a college student sample. This instrument holds much promise for practitioners who desire to empirically investigate dimensions of responsible drinking.
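For readers unfamiliar with the internal consistency statistic named in this abstract, here is a minimal sketch of Cronbach's α computed from a respondents-by-items score matrix. The data are invented and the function name is an assumption; this is the standard textbook formula, not the authors' analysis code.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    # Sample variance of the total (sum) score.
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Toy data: three items that rank respondents identically.
scores = [
    [1, 1, 1],
    [2, 2, 2],
    [3, 3, 3],
]
alpha = cronbach_alpha(scores)
```

With perfectly parallel items, as in this toy matrix, α reaches its upper bound of 1; real scales fall below that.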
Hassett, Afton L; Li, Tracy; Buyske, Steven; Savage, Shantal V; Gignac, Monique A M
2008-05-01
To consider the feasibility of assessing multiple facets of independence in rheumatoid arthritis (RA) using a measure developed from existing items and examining its face validity, construct validity and responsiveness to change. The ATTAIN (Abatacept Trial in Treatment of Anti-tumor necrosis factor [TNF] Inadequate responders) database was used. Patients with RA were randomized 2:1, abatacept (n = 258) and placebo (n = 133). A multi-faceted scale to measure physical and psychosocial independence was constructed using items from the Health Assessment Questionnaire (HAQ) and Short Form 36 Health Survey (SF-36). Questions assessing activity limitations and need for outside caregiver help were also examined. Interviews with 20 RA patients assessed face validity. Item Response Theory analysis yielded two traits - 'Psychosocial Independence', derived from the number of days with activity limitations plus the Role Emotional, Social Functioning and Role Physical subscale items from the SF-36; and 'Physical Independence', derived from 15 HAQ items assessing need for help from another. The two traits showed no significant differential item functioning for age or gender and demonstrated good face validity. Changes over 169 days on Psychosocial Independence were greater (mean 0.46 units, 95% confidence interval [CI]: 0.17-0.75) for the abatacept group than for placebo (p = 0.002). Changes in Physical Independence were greater (mean 0.59 units, 95% CI: 0.35-0.82) for the abatacept group than for placebo (p < 0.001). The multi-faceted assessment of independence in RA based on items from commonly used instruments is feasible suggesting promise for evaluating independence in future clinical trials. This approach demonstrated good face and construct validity and responsiveness in RA patients who had previously failed anti-TNF therapy. 
However, we caution against an interpretation that these data suggest that abatacept improves independence because the component parts of this assessment came from instruments used in the ATTAIN trial where data had been previously analyzed.
Koh, Bongyeun; Hong, Sunggi; Kim, Soon-Sim; Hyun, Jin-Sook; Baek, Milye; Moon, Jundong; Kwon, Hayran; Kim, Gyoungyong; Min, Seonggi; Kang, Gu-Hyun
2016-01-01
The goal of this study was to characterize the difficulty index of the items in the skills test components of the class I and II Korean emergency medical technician licensing examination (KEMTLE), which requires examinees to select items randomly. The results of 1,309 class I KEMTLE examinations and 1,801 class II KEMTLE examinations in 2013 were subjected to analysis. Items from the basic and advanced skills test sections of the KEMTLE were compared to determine whether some were significantly more difficult than others. In the class I KEMTLE, all 4 of the items on the basic skills test showed significant variation in difficulty index (P<0.01), as well as 4 of the 5 items on the advanced skills test (P<0.05). In the class II KEMTLE, 4 of the 5 items on the basic skills test showed significantly different difficulty index (P<0.01), as well as all 3 of the advanced skills test items (P<0.01). In the skills test components of the class I and II KEMTLE, the procedure in which examinees randomly select questions should be revised to require examinees to respond to a set of fixed items in order to improve the reliability of the national licensing examination.
Hoben, Matthias; Bär, Marion; Mahler, Cornelia; Berger, Sarah; Squires, Janet E; Estabrooks, Carole A; Kruse, Andreas; Behrens, Johann
2014-01-31
To study the association between organizational context and research utilization in German residential long term care (LTC), we translated three Canadian assessment instruments: the Alberta Context Tool (ACT), Estabrooks' Kinds of Research Utilization (RU) items and the Conceptual Research Utilization Scale. Target groups for the tools were health care aides (HCAs), registered nurses (RNs), allied health professionals (AHPs), clinical specialists and care managers. Through a cognitive debriefing process, we assessed response process validity, an initial stage of validity that is necessary before more advanced validity assessment. We included 39 participants (16 HCAs, 5 RNs, 7 AHPs, 5 specialists and 6 managers) from five residential LTC facilities. We created lists of questionnaire items containing problematic items plus items randomly selected from the pool of remaining items. After participants completed the questionnaires, we conducted individual semi-structured cognitive interviews using verbal probing. We asked participants to reflect on their answers for list items in detail. Participants' answers were compared to concept maps defining the instrument concepts in detail. If at least two participants gave answers not matching concept map definitions, items were revised and re-tested with new target group participants. Cognitive debriefings started with HCAs. Based on the first round, we modified 4 of 58 ACT items, 1 ACT item stem and all 8 items of the RU tools. All items were understood by participants after another two rounds. We included revised HCA ACT items in the questionnaires for the other provider groups. In the RU tools for the other provider groups, we used different wording than the HCA version, as was done in the original English instruments. Only one cognitive debriefing round was needed with each of the other provider groups. 
Cognitive debriefing is essential to detect and respond to problematic instrument items, particularly when translating instruments for heterogeneous, less well educated provider groups such as HCAs. Cognitive debriefing is an important step in research tool development and a vital component of establishing response process validity evidence. Publishing cognitive debriefing results helps researchers to determine potentially critical elements of the translated tools and assists with interpreting scores.
2014-01-01
Background: To study the association between organizational context and research utilization in German residential long term care (LTC), we translated three Canadian assessment instruments: the Alberta Context Tool (ACT), Estabrooks’ Kinds of Research Utilization (RU) items and the Conceptual Research Utilization Scale. Target groups for the tools were health care aides (HCAs), registered nurses (RNs), allied health professionals (AHPs), clinical specialists and care managers. Through a cognitive debriefing process, we assessed response process validity, an initial stage of validity that is necessary before more advanced validity assessment. Methods: We included 39 participants (16 HCAs, 5 RNs, 7 AHPs, 5 specialists and 6 managers) from five residential LTC facilities. We created lists of questionnaire items containing problematic items plus items randomly selected from the pool of remaining items. After participants completed the questionnaires, we conducted individual semi-structured cognitive interviews using verbal probing. We asked participants to reflect on their answers for list items in detail. Participants’ answers were compared to concept maps defining the instrument concepts in detail. If at least two participants gave answers not matching concept map definitions, items were revised and re-tested with new target group participants. Results: Cognitive debriefings started with HCAs. Based on the first round, we modified 4 of 58 ACT items, 1 ACT item stem and all 8 items of the RU tools. All items were understood by participants after another two rounds. We included revised HCA ACT items in the questionnaires for the other provider groups. In the RU tools for the other provider groups, we used different wording than the HCA version, as was done in the original English instruments. Only one cognitive debriefing round was needed with each of the other provider groups. 
Conclusion: Cognitive debriefing is essential to detect and respond to problematic instrument items, particularly when translating instruments for heterogeneous, less well educated provider groups such as HCAs. Cognitive debriefing is an important step in research tool development and a vital component of establishing response process validity evidence. Publishing cognitive debriefing results helps researchers to determine potentially critical elements of the translated tools and assists with interpreting scores. PMID:24479645
Lee, Wei-Lun; Tsai, Shieunt-Han; Tsai, Chao-Wen; Lee, Chia-Ying
2011-01-01
To determine work stress and stress-coping strategies, and to analyze their relationships, in order to improve the health-promoting lifestyle of nurses in Taiwan. Three hundred eighty-five nurses who had more than 6 months of work experience were selected from four district hospitals in Kaohsiung and Ping Tung. We used a stratified cluster random sampling method for the selection. The nurses answered a self-report questionnaire, which was divided into four sections: personal background data, work stress, stress-coping strategies, and health-promoting lifestyle. The findings indicate that work stress and the health-promoting lifestyle of nurses are at a higher level, with stress-coping strategies being at a medium level. Work stress and stress-coping strategies were significantly and positively correlated. Professional relationships, managerial role, personal responsibility, and recognition of work stress and the responsibilities of a health-promoting lifestyle were negatively correlated. Managerial role, personal responsibility, and organizational atmosphere of work stress as well as realization, an item of health-promoting lifestyle, were negatively correlated. Recognition of work stress and stress management, items of health-promoting lifestyle, were negatively correlated. Health responsibility and self-actualization, items of health-promoting lifestyle, as well as stress-coping strategies were negatively correlated. Nutrition, an item of health-promoting lifestyle, and the support stress-coping strategy were negatively correlated. Nurses have greater work pressure and better work stress-coping strategies, but worse health responsibility and realization of a health-promoting lifestyle. We suggest hospitals build good relationships and appropriately increase employment of nurses through a good work atmosphere to achieve nurses' realization of a health-promoting lifestyle.
Windmann, Sabine; Hill, Holger
2014-10-01
Performance on tasks requiring discrimination of at least two stimuli can be viewed either from an objective perspective (referring to actual stimulus differences), or from a subjective perspective (corresponding to participants' responses). Using event-related potentials recorded during an old/new recognition memory test involving emotionally laden and neutral words studied either blockwise or randomly intermixed, we show here how the objective perspective (old versus new items) yields late effects of blockwise emotional item presentation at parietal sites that the subjective perspective fails to find, whereas the subjective perspective ("old" versus "new" responses) is more sensitive to early effects of emotion at anterior sites than the objective perspective. Our results demonstrate the potential advantage of dissociating the subjective and the objective perspectives on task performance (in addition to analyzing trials with correct responses), especially for investigations of illusions and information processing biases, in behavioral and cognitive neuroscience studies. Copyright © 2014 Elsevier Inc. All rights reserved.
GITLIN, LAURA N.; ROTH, DAVID L.; BURGIO, LOUIS D.; LOEWENSTEIN, DAVID A.; WINTER, LARAINE; NICHOLS, LINDA; ARGÜELLES, SOLEDAD; CORCORAN, MARY; BURNS, ROBERT; MARTINDALE, JENNIFER
2008-01-01
Objective: To evaluate psychometric properties and response patterns of the Caregiver Assessment of Function and Upset (CAFU), a 15-item multidimensional measure of dependence in dementia patients and caregiver reaction. Method: 640 families were administered the CAFU (53% White, 43% African American, and 4% mixed race and ethnicity). We created a random split of the sample and conducted exploratory factor analyses on Sample 1 and confirmatory factor analyses on Sample 2. Convergent and discriminant validity were evaluated using Spearman rank correlation coefficients. Results: A two-factor structure for functional items was derived, and excellent factorial validity was obtained. Convergent and discriminant validity were obtained for function and upset measures. Differential response patterns for dependence and caregiver upset were found for caregiver race, relationship, and care recipient gender but not for caregiver gender. Discussion: The CAFU is easily administered, reliable, and valid for evaluating appraisals of dependencies and upsetting care areas. PMID:15750049
Wang, Jen; Thombs, Brett D.; Schmid, Margareta R.
2012-01-01
Background: Growing recognition of the role of citizens and patients in health and health care has placed a spotlight on health literacy and patient education. Objective: To identify specific competencies for health in definitions of health literacy and patient-centred concepts and empirically test their dimensionality in the general population. Methods: A thorough review of the literature on health literacy, self-management, patient empowerment, patient education and shared decision making revealed considerable conceptual overlap as competencies for health and identified a corpus of 30 generic competencies for health. A questionnaire containing 127 items covering the 30 competencies was fielded as a telephone interview in German, French and Italian among 1255 respondents randomly selected from the resident population in Switzerland. Findings: Analyses with the software MPlus to model items with mixed response categories showed that the items do not load onto a single factor. Multifactorial models with good fit could be constructed for each of five dimensions defined a priori and their corresponding competencies: information and knowledge (four competencies, 17 items), general cognitive skills (four competencies, 17 items), social roles (two competencies, seven items), medical management (four competencies, 27 items) and healthy lifestyle (two competencies, six items). Multiple indicators and multiple causes models identified problematic differential item functioning for only six items belonging to two competencies. Conclusions: The psychometric analyses of this instrument support broader conceptualization of health literacy not as a single competence but rather as a package of competencies for health. PMID:22390287
Otsuka, Sachio; Saiki, Jun
2016-02-01
Prior studies have shown that visual statistical learning (VSL) enhances familiarity (a type of memory) of sequences. How do statistical regularities influence the processing of each triplet element and inserted distractors that disrupt the regularity? Given that VSL increases attention to triplets and induces inhibition of unattended triplets, we predicted that VSL would promote memory for each triplet constituent, and degrade memory for inserted stimuli. Across the first two experiments, we found that objects from structured sequences were more likely to be remembered than objects from random sequences, and that letters (Experiment 1) or objects (Experiment 2) inserted into structured sequences were less likely to be remembered than those inserted into random sequences. In the subsequent two experiments, we examined an alternative account for our results, whereby the difference in memory for inserted items between structured and random conditions is due to individuation of items within random sequences. Our findings replicated even when control letters (Experiment 3A) or objects (Experiment 3B) were presented before or after, rather than inserted into, random sequences. Our findings suggest that statistical learning enhances memory for each item in a regular set and impairs memory for items that disrupt the regularity. Copyright © 2015 Elsevier B.V. All rights reserved.
ERIC Educational Resources Information Center
Liao, Chi-Wen; Livingston, Samuel A.
2008-01-01
Randomly equivalent forms (REF) of tests in listening and reading for nonnative speakers of English were created by stratified random assignment of items to forms, stratifying on item content and predicted difficulty. The study included 50 replications of the procedure for each test. Each replication generated 2 REFs. The equivalence of those 2…
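The procedure this record describes, stratified random assignment of items to forms, can be sketched roughly as below. The item fields (`content`, `difficulty_band`), the two-form default, and the seed are assumptions made for illustration; the report's actual stratification rules are not given in the truncated abstract.

```python
import random
from collections import defaultdict

def assign_items_to_forms(items, n_forms=2, seed=0):
    """Randomly split items across forms within each (content, difficulty) stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item in items:
        strata[(item["content"], item["difficulty_band"])].append(item["id"])

    forms = [[] for _ in range(n_forms)]
    for key in sorted(strata):
        ids = strata[key]
        rng.shuffle(ids)                 # random order within the stratum
        offset = rng.randrange(n_forms)  # random start so no form always gets the extra item
        for i, item_id in enumerate(ids):
            forms[(i + offset) % n_forms].append(item_id)
    return forms

# Hypothetical pool: two content areas x two predicted-difficulty bands, two items each.
pool = [
    {"id": n, "content": c, "difficulty_band": d}
    for n, (c, d) in enumerate(
        [("listening", "easy")] * 2 + [("listening", "hard")] * 2
        + [("reading", "easy")] * 2 + [("reading", "hard")] * 2
    )
]
form_a, form_b = assign_items_to_forms(pool)
```

Repeating the call with different seeds would give independent replications of the assignment, in the spirit of the 50 replications mentioned in the record.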
ERIC Educational Resources Information Center
Lynch, Robert C.; Sedlacek, William E.
To ascertain the nature and extent of the differences between fraternity and non-fraternity men at the University of Maryland, a study was conducted in June 1969 with a small random sample (approximately 50 in each group). Their spring 1969 semester grades, ACT (or converted SAT) composite scores, and responses to selected items on the 1969…
Applications of computerized adaptive testing (CAT) to the assessment of headache impact.
Ware, John E; Kosinski, Mark; Bjorner, Jakob B; Bayliss, Martha S; Batenhorst, Alice; Dahlöf, Carl G H; Tepper, Stewart; Dowson, Andrew
2003-12-01
To evaluate the feasibility of computerized adaptive testing (CAT) and the reliability and validity of CAT-based estimates of headache impact scores in comparison with 'static' surveys. Responses to the 54-item Headache Impact Test (HIT) were re-analyzed for recent headache sufferers (n = 1016) who completed telephone interviews during the National Survey of Headache Impact (NSHI). Item response theory (IRT) calibrations and the computerized dynamic health assessment (DYNHA) software were used to simulate CAT assessments by selecting the most informative items for each person and estimating impact scores according to pre-set precision standards (CAT-HIT). Results were compared with IRT estimates based on all items (total-HIT), computerized 6-item dynamic estimates (CAT-HIT-6), and a developmental version of a 'static' 6-item form (HIT-6-D). Analyses focused on: respondent burden (survey length and administration time), score distributions ('ceiling' and 'floor' effects), reliability and standard errors, and clinical validity (diagnosis, level of severity). A random sample (n = 245) was re-assessed to test responsiveness. A second study (n = 1103) compared actual CAT surveys and an improved 'static' HIT-6 among current headache sufferers sampled on the Internet. Respondents completed measures from the first study and the generic SF-8 Health Survey; some (n = 540) were re-tested on the Internet after 2 weeks. In the first study, simulated CAT-HIT and total-HIT scores were highly correlated (r = 0.92) without 'ceiling' or 'floor' effects and with a substantial reduction (90.8%) in respondent burden. Six of the 54 items accounted for the great majority of item administrations (3603/5028, 71.7%). 
CAT-HIT reliability estimates were very high (0.975-0.992) in the range where 95% of respondents scored, and relative validity (RV) coefficients were high for diagnosis (RV = 0.87) and severity (RV = 0.89); patient-level classifications were accurate 91.3% for a diagnosis of migraine. For all three criteria of change, CAT-HIT scores were more responsive than all other measures. In the second study, estimates of respondent burden, item usage, reliability and clinical validity were replicated. The test-retest reliability of CAT-HIT was 0.79 and alternate forms coefficients ranged from 0.85 to 0.91. All correlations with the generic SF-8 were negative. CAT-based administrations of headache impact items achieved very large reductions in respondent burden without compromising validity for purposes of patient screening or monitoring changes in headache impact over time. IRT models and CAT-based dynamic health assessments warrant testing among patients with other conditions.
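The core CAT step described in this abstract, choosing the most informative remaining item for the current score estimate, can be sketched as follows. This is a simplified illustration using a dichotomous two-parameter logistic model with invented item parameters; it does not reproduce the DYNHA software or the calibrations used for the HIT items.

```python
import math

def item_information(theta, a, b):
    """Fisher information of a 2PL item at trait level theta."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def next_item(theta, item_bank, administered):
    """Pick the not-yet-administered item with maximum information at theta."""
    candidates = [name for name in item_bank if name not in administered]
    return max(candidates, key=lambda name: item_information(theta, *item_bank[name]))

# Hypothetical bank: item name -> (discrimination a, difficulty b).
bank = {"q1": (1.0, -2.0), "q2": (1.0, 0.1), "q3": (1.0, 1.5)}

first = next_item(0.0, bank, administered=set())
second = next_item(0.0, bank, administered={first})
```

In a full CAT loop, the trait estimate would be updated after each response and administration would stop once a pre-set precision standard is met, which is how a small subset of items can end up accounting for most administrations.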
2016-01-01
Purpose: The goal of this study was to characterize the difficulty index of the items in the skills test components of the class I and II Korean emergency medical technician licensing examination (KEMTLE), which requires examinees to select items randomly. Methods: The results of 1,309 class I KEMTLE examinations and 1,801 class II KEMTLE examinations in 2013 were subjected to analysis. Items from the basic and advanced skills test sections of the KEMTLE were compared to determine whether some were significantly more difficult than others. Results: In the class I KEMTLE, all 4 of the items on the basic skills test showed significant variation in difficulty index (P<0.01), as well as 4 of the 5 items on the advanced skills test (P<0.05). In the class II KEMTLE, 4 of the 5 items on the basic skills test showed significantly different difficulty index (P<0.01), as well as all 3 of the advanced skills test items (P<0.01). Conclusion: In the skills test components of the class I and II KEMTLE, the procedure in which examinees randomly select questions should be revised to require examinees to respond to a set of fixed items in order to improve the reliability of the national licensing examination. PMID:26883810
Brodey, Benjamin B; Gonzalez, Nicole L; Elkin, Kathryn Ann; Sasiela, W Jordan; Brodey, Inger S
2017-09-06
The computerized administration of self-report psychiatric diagnostic and outcomes assessments has risen in popularity. If results are similar enough across different administration modalities, then new administration technologies can be used interchangeably and the choice of technology can be based on other factors, such as convenience in the study design. An assessment based on item response theory (IRT), such as the Patient-Reported Outcomes Measurement Information System (PROMIS) depression item bank, offers new possibilities for assessing the effect of technology choice upon results. To create equivalent halves of the PROMIS depression item bank and to use these halves to compare survey responses and user satisfaction among administration modalities (paper, mobile phone, or tablet) with a community mental health care population. The 28 PROMIS depression items were divided into 2 halves based on content and simulations with an established PROMIS response data set. A total of 129 participants were recruited from an outpatient public sector mental health clinic based in Memphis. All participants took both nonoverlapping halves of the PROMIS IRT-based depression items (Part A and Part B): once using paper and pencil, and once using either a mobile phone or tablet. An 8-cell randomization was done on technology used, order of technologies used, and order of PROMIS Parts A and B. Both Parts A and B were administered as fixed-length assessments and both were scored using published PROMIS IRT parameters and algorithms. All 129 participants received either Part A or B via paper assessment. Participants were also administered the opposite assessment, 63 using a mobile phone and 66 using a tablet. There was no significant difference in item response scores for Part A versus B. All 3 of the technologies yielded essentially identical assessment results and equivalent satisfaction levels. 
Our findings show that the PROMIS depression assessment can be divided into 2 equivalent halves, with the potential to simplify future experimental methodologies. Among community mental health care recipients, the PROMIS items function similarly whether administered via paper, tablet, or mobile phone. User satisfaction across modalities was also similar. Because paper, tablet, and mobile phone administrations yielded similar results, the choice of technology should be based on factors such as convenience and can even be changed during a study without adversely affecting the comparability of results. ©Benjamin B Brodey, Nicole L Gonzalez, Kathryn Ann Elkin, W Jordan Sasiela, Inger S Brodey. Originally published in JMIR Mental Health (http://mental.jmir.org), 06.09.2017.
Zhao, Xiyan; Zhen, Zhong; Guo, Jing; Zhao, Tianyu; Ye, Ru; Guo, Yu; Chen, Hongdong; Lian, Fengmei; Tong, Xiaolin
2016-01-01
Placebo-controlled randomized trials are often used to evaluate the absolute effect of new treatments and are considered the gold standard for clinical trials. No studies, however, have yet been conducted evaluating the reporting quality of placebo-controlled randomized trials. The current study aims to assess the reporting quality of placebo-controlled randomized trials on treatment of diabetes with Traditional Chinese Medicine (TCM) in Mainland China and to provide recommendations for improvements. The China National Knowledge Infrastructure database, Wanfang database, China Biology Medicine database, and VIP database were searched for placebo-controlled randomized trials on treatment of diabetes with TCM. Reviews, animal experiments, and randomized controlled trials without placebo control were excluded. Each item on the Consolidated Standards of Reporting Trials (CONSORT) 2010 checklist was given a yes or no depending on whether it was reported or not. A total of 68 articles were included. The reporting percentage in each article ranged from 24.3% to 73%, and 30.9% of articles reported more than 50% of the items. Seven of the 37 items were reported in more than 90% of the articles, whereas 7 items were not mentioned at all. The average reporting for "title and abstract," "introduction," "methods," "results," "discussion," and "other information" was 43.4%, 78.7%, 40.1%, 49.9%, 71.1%, and 17.2%, respectively. The percentage of each section had increased after 2010. In addition, the reporting of multiple study centers, funding, placebo species, informed consent forms, and ethical approvals were 14.7%, 50%, 36.85%, 33.8%, and 4.4%, respectively. Although a scoring system was created according to the CONSORT 2010 checklist, it was not designed as an assessment tool. According to CONSORT 2010, the reporting quality of placebo-controlled randomized trials on the treatment of diabetes with TCM improved after 2010. 
Future improvements, however, are still needed, particularly in methods sections.
Killaspy, Helen; White, Sarah; Dowling, Sarah; Krotofil, Joanna; McPherson, Peter; Sandhu, Sima; Arbuthnott, Maurice; Curtis, Sarah; Leavey, Gerard; Priebe, Stefan; Shepherd, Geoff; King, Michael
2016-04-14
No standardised tools for assessing the quality of specialist mental health supported accommodation services exist. To address this, we adapted the Quality Indicator for Rehabilitative Care (QuIRC), which was originally developed to assess the quality of longer term inpatient and community based mental health facilities. The QuIRC, which is completed by the service manager and gives ratings of seven domains of care, has good psychometric properties. Focus groups with staff of the three main types of supported accommodation in the UK (residential care, supported housing and floating outreach services) were carried out to identify potential amendments to the QuIRC. Additional advice was gained from consultation with three expert panels, two of which comprised service users with lived experience of mental health and supported accommodation services. The amended QuIRC (QuIRC-SA) was piloted with a manager of each of the three service types. Item response variance, inter-rater reliability and internal consistency were assessed in a random sample of 52 services. Factorial structure and discriminant validity were assessed in a larger random sample of 87 services. The QuIRC-SA comprised 143 items of which only 18 items showed a narrow range of response and five items had poor inter-rater reliability. The tool showed good discriminant validity, with supported housing services generally scoring higher than the other two types of supported accommodation on most domains. Exploratory factor analysis showed that the QuIRC-SA items loaded onto the domains to which they had been allocated. The QuIRC-SA is the first standardised tool for quality assessment of specialist mental health supported accommodation services. Its psychometric properties mean that it has potential for use in research as well as audit and quality improvement programmes. A web-based application is being developed to make the tool more accessible; it will produce a printable report for the service manager about the performance of their service, comparison data for similar services, and suggestions on how to improve service quality.
Gender-Based Differential Item Performance in Mathematics Achievement Items.
ERIC Educational Resources Information Center
Doolittle, Allen E.; Cleary, T. Anne
1987-01-01
Eight randomly equivalent samples of high school seniors were each given a unique form of the ACT Assessment Mathematics Usage Test (ACTM). Signed measures of differential item performance (DIP) were obtained for each item in the eight ACTM forms. DIP estimates were analyzed and a significant item category effect was found. (Author/LMO)
2000-12-01
A SKIP FLAG INDICATING THE RESULT OF CHECKING THE RESPONSE ON THE PARENT (SCREENING) ITEM AGAINST THE RESPONSE(S) ON THE ITEMS WITHIN THE SKIP...RESPONSE ON THE PARENT (SCREENING) ITEM AGAINST THE RESPONSE(S) ON THE ITEMS WITHIN THE SKIP PATTERN. SEE TABLE D-5, NOTE 2, IN APPENDIX D. G-52...RESULT OF CHECKING THE RESPONSE ON THE PARENT (SCREENING) ITEM AGAINST THE RESPONSE(S) ON THE ITEMS WITHIN THE SKIP PATTERN. SEE TABLE D-5
Female Sexual Function Index Short Version: A MsFLASH Item Response Analysis.
Carpenter, Janet S; Jones, Salene M W; Studts, Christina R; Heiman, Julia R; Reed, Susan D; Newton, Katherine M; Guthrie, Katherine A; Larson, Joseph C; Cohen, Lee S; Freeman, Ellen W; Jane Lau, R; Learman, Lee A; Shifren, Jan L
2016-11-01
The Female Sexual Function Index (FSFI) is a psychometrically sound and popular 19-item self-report measure, but its length may preclude its use in studies with multiple outcome measures, especially when sexual function is not a primary endpoint. Only one attempt has been made to create a shorter scale, resulting in the Italian FSFI-6, later translated into Spanish and Korean without further psychometric analysis. Our study evaluated whether a subset of items on the 19-item English-language FSFI would perform as well as the full-length FSFI in peri- and postmenopausal women. We used baseline data from 898 peri- and postmenopausal women recruited from multiple communities, ages 42-62 years, and enrolled in randomized controlled trials for vasomotor symptom management. Goals were to (1) create a psychometrically sound, shorter version of the FSFI for use in peri- and postmenopausal women as a continuous measure and (2) compare it to the Italian FSFI-6. Results indicated that a 9-item scale provided more information than the FSFI-6 across a spectrum of sexual functioning, was able to capture sample variability, and showed sufficient range without floor or ceiling effects. All but one of the items from the Italian 6-item version were included in the 9-item version. Most omitted FSFI items focused on frequency of events or experiences. When assessment of sexual function is a secondary endpoint and subject burden related to questionnaire length is a priority, the 9-item FSFI may provide important information about sexual function in English-speaking peri- and postmenopausal women.
[Mokken scaling of the Cognitive Screening Test].
Diesfeldt, H F A
2009-10-01
The Cognitive Screening Test (CST) is a twenty-item orientation questionnaire in Dutch, that is commonly used to evaluate cognitive impairment. This study applied Mokken Scale Analysis, a non-parametric set of techniques derived from item response theory (IRT), to CST-data of 466 consecutive participants in psychogeriatric day care. The full item set and the standard short version of fourteen items both met the assumptions of the monotone homogeneity model, with scalability coefficient H = 0.39, which is considered weak. In order to select items that would fulfil the assumption of invariant item ordering or the double monotonicity model, the subjects were randomly partitioned into a training set (50% of the sample) and a test set (the remaining half). By means of an automated item selection eleven items were found to measure one latent trait, with H = 0.67 and item H coefficients larger than 0.51. Cross-validation of the item analysis in the remaining half of the subjects gave comparable values (H = 0.66; item H coefficients larger than 0.56). The selected items involve year, place of residence, birth date, the monarch's and prime minister's names, and their predecessors. Applying optimal discriminant analysis (ODA) it was found that the full set of twenty CST items performed best in distinguishing two predefined groups of patients of lower or higher cognitive ability, as established by an independent criterion derived from the Amsterdam Dementia Screening Test. The chance corrected predictive value or prognostic utility was 47.5% for the full item set, 45.2% for the fourteen items of the standard short version of the CST, and 46.1% for the homogeneous, unidimensional set of selected eleven items. The results of the item analysis support the application of the CST in cognitive assessment, and revealed a more reliable 'short' version of the CST than the standard short version (CST14).
On the Complexity of Item Response Theory Models.
Bonifay, Wes; Cai, Li
2017-01-01
Complexity in item response theory (IRT) has traditionally been quantified by simply counting the number of freely estimated parameters in the model. However, complexity is also contingent upon the functional form of the model. We examined four popular IRT models-exploratory factor analytic, bifactor, DINA, and DINO-with different functional forms but the same number of free parameters. In comparison, a simpler (unidimensional 3PL) model was specified such that it had 1 more parameter than the previous models. All models were then evaluated according to the minimum description length principle. Specifically, each model was fit to 1,000 data sets that were randomly and uniformly sampled from the complete data space and then assessed using global and item-level fit and diagnostic measures. The findings revealed that the factor analytic and bifactor models possess a strong tendency to fit any possible data. The unidimensional 3PL model displayed minimal fitting propensity, despite the fact that it included an additional free parameter. The DINA and DINO models did not demonstrate a proclivity to fit any possible data, but they did fit well to distinct data patterns. Applied researchers and psychometricians should therefore consider functional form-and not goodness-of-fit alone-when selecting an IRT model.
Pederson, Linda L.; Thorne, Stacy L.; Caraballo, Ralph S.; Evans, Brian; Athey, Leslie; McMichael, Joseph
2010-01-01
Objectives. We sought to modify an instrument and to use it to collect information on smoking knowledge, attitudes, beliefs, and behaviors among Hispanics/Latinos, and to adapt survey methods to obtain high participation levels. Methods. Promotoras (outreach workers) conducted face-to-face interviews with 1485 Hispanic adults (July 2007–April 2008). The project team used GeoFrame field enumeration methods to develop a sampling frame from households in randomly selected colonias (residential areas along the Texas–Mexico border that may lack some basic necessities, e.g., potable water) in El Paso, Texas. Results. The revised questionnaire included 36 unchanged items from the State Adult Tobacco Survey, 7 modified items, and 17 new items focusing on possible culturally specific quitting methods, secondhand smoke issues, and attitudes and knowledge about tobacco use that might be unique for Hispanic/Latino groups. The eligibility rate was 90.2%, and the conservative combined completed screener and interview response rate was 80.0%. Conclusions. Strategic, targeted, carefully designed methods and surveys can achieve high reach and response rates in hard-to-reach populations. Similar procedures could be used to obtain cooperation of groups who may not be accessible with traditional methods. PMID:20147687
Mixture Rasch model for guessing group identification
NASA Astrophysics Data System (ADS)
Siow, Hoo Leong; Mahdi, Rasidah; Siew, Eng Ling
2013-04-01
Several alternative dichotomous Item Response Theory (IRT) models have been introduced to account for the guessing effect in multiple-choice assessment. The guessing effect in these models has been considered to be item-related. In the most classic case, pseudo-guessing in the three-parameter logistic IRT model is modeled to be the same for all the subjects but may vary across items. This is not realistic because subjects can guess worse or better than the pseudo-guessing. Derivation from the three-parameter logistic IRT model improves the situation by incorporating ability in guessing. However, it does not model a non-monotone function. This paper proposes to study guessing from a subject-related aspect, namely guessing as a test-taking behavior. A mixture Rasch model is employed to detect latent groups. A hybrid of the mixture Rasch and 3-parameter logistic IRT models is proposed to model behavior-based guessing from the subjects' ways of responding to the items. The subjects are assumed to simply choose a response at random. An information criterion is proposed to identify the behavior-based guessing group. Results show that the proposed model selection criterion provides a promising method to identify the guessing group modeled by the hybrid model.
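For readers unfamiliar with the three-parameter logistic (3PL) model mentioned in this abstract, the following is a minimal sketch of its standard item response function, in which the pseudo-guessing parameter is a property of the item and is the same for every subject. The function name and the parameter names (theta, a, b, c) are illustrative choices, not taken from the paper, and the sketch does not implement the hybrid mixture model the authors propose.

```python
import math


def three_pl_probability(theta, a, b, c):
    """Probability of a correct response under the standard 3PL model.

    theta: subject ability
    a: item discrimination
    b: item difficulty
    c: item pseudo-guessing parameter (lower asymptote), shared by all subjects
    """
    logistic = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return c + (1.0 - c) * logistic


# When ability equals difficulty, the probability lies halfway between c and 1.
p_mid = three_pl_probability(theta=0.0, a=1.2, b=0.0, c=0.2)

# For a very low ability the probability approaches the guessing floor c.
p_low = three_pl_probability(theta=-50.0, a=1.0, b=0.0, c=0.2)
```

Because c does not depend on the subject, every low-ability subject is assigned the same floor, which is the limitation the abstract points to.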
Variability in Parameter Estimates and Model Fit across Repeated Allocations of Items to Parcels
ERIC Educational Resources Information Center
Sterba, Sonya K.; MacCallum, Robert C.
2010-01-01
Different random or purposive allocations of items to parcels within a single sample are thought not to alter structural parameter estimates as long as items are unidimensional and congeneric. If, additionally, numbers of items per parcel and parcels per factor are held fixed across allocations, different allocations of items to parcels within a…
Analyzing degradation data with a random effects spline regression model
Fugate, Michael Lynn; Hamada, Michael Scott; Weaver, Brian Phillip
2017-03-17
This study proposes using a random effects spline regression model to analyze degradation data. Spline regression avoids having to specify a parametric function for the true degradation of an item. A distribution for the spline regression coefficients captures the variation of the true degradation curves from item to item. We illustrate the proposed methodology with a real example using a Bayesian approach. The Bayesian approach allows prediction of degradation of a population over time and estimation of reliability is easy to perform.
Analyzing degradation data with a random effects spline regression model
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fugate, Michael Lynn; Hamada, Michael Scott; Weaver, Brian Phillip
This study proposes using a random effects spline regression model to analyze degradation data. Spline regression avoids having to specify a parametric function for the true degradation of an item. A distribution for the spline regression coefficients captures the variation of the true degradation curves from item to item. We illustrate the proposed methodology with a real example using a Bayesian approach. The Bayesian approach allows prediction of degradation of a population over time and estimation of reliability is easy to perform.
Molenaar, Dylan; de Boeck, Paul
2018-06-01
In item response theory modeling of responses and response times, it is commonly assumed that the item responses have the same characteristics across the response times. However, heterogeneity might arise in the data if subjects resort to different response processes when solving the test items. These differences may be within-subject effects, that is, a subject might use a certain process on some of the items and a different process with different item characteristics on the other items. If the probability of using one process over the other process depends on the subject's response time, within-subject heterogeneity of the item characteristics across the response times arises. In this paper, the method of response mixture modeling is presented to account for such heterogeneity. Contrary to traditional mixture modeling where the full response vectors are classified, response mixture modeling involves classification of the individual elements in the response vector. In a simulation study, the response mixture model is shown to be viable in terms of parameter recovery. In addition, the response mixture model is applied to a real dataset to illustrate its use in investigating within-subject heterogeneity in the item characteristics across response times.
2014-01-01
Background: Tracing mail survey responses is useful for the management of reminders but may cause concerns about anonymity among prospective participants. We examined the impact of numbering return envelopes on the participation and the results of a survey on a sensitive topic among hospital staff. Methods: In a survey about regrets associated with providing healthcare conducted among hospital-based doctors and nurses, two randomly drawn subsamples were provided numbered (N = 1100) and non-numbered (N = 500) envelopes for the return of completed questionnaires. Participation, explicit refusals, and item responses were compared. We also conducted a meta-analysis of the effect of questionnaire/envelope numbering on participation in health surveys. Results: The participation rate was lower in the “numbered” group than in the “non-numbered” group (30.3% vs. 35.0%, p = 0.073), the proportion of explicit refusals was higher in the “numbered” group (23.1% vs. 17.5%, p = 0.016), and the proportion of those who never returned the questionnaire was similar (46.6% vs. 47.5%, p = 0.78). The means of responses differed significantly for 12 of 105 items (11.4%), which did not differ significantly from the expected frequency of type 1 errors, i.e., 5% (permutation test, p = 0.078). The meta-analysis of 7 experimental surveys (including this one) indicated that numbering is associated with a 2.4% decrease in the survey response rate (95% confidence interval 0.3% to 4.4%). Conclusions: Numbered return envelopes may reduce the response rate and increase explicit refusals to participate in a sensitive survey. Reduced participation was confirmed by a meta-analysis of randomized health surveys. There was no strong evidence of bias. PMID:24428941
The portrayal of mental health and illness in Australian non-fiction media.
Francis, Catherine; Pirkis, Jane; Blood, R Warwick; Dunt, David; Burgess, Philip; Morley, Belinda; Stewart, Andrew; Putnis, Peter
2004-07-01
To provide a detailed picture of the extent, nature and quality of portrayal of mental health/illness in Australian non-fiction media. Media items were retrieved from Australian newspaper, television and radio sources over a 1-year period, and identifying/descriptive data extracted from all items. Quality ratings were made on a randomly selected 10% of items, using an instrument based on criteria in Achieving the Balance (a resource designed to promote responsible reporting of mental health/illness). Reporting of mental health/illness was common, with 4351 newspaper, 1237 television and 7801 radio items collected during the study period. Media items most frequently focused on policy/program initiatives in mental health (29.0%), or on causes/symptoms/treatment of mental illnesses (23.9%). Stories about mental health issues in the context of crime were relatively uncommon, accounting for only 5.6% of items. Most media items were of good quality on eight of the nine dimensions; the exception was that details of appropriate help services were only included in 6.4% of items. In contrast to previous research, the current study found that media reporting of mental health/illness was extensive, generally of good quality and focused less on themes of crime and violence than may have been expected. This is encouraging, since there is evidence that negative media portrayal of mental health/illness can detrimentally affect community attitudes. However, there are still opportunities for improving media reporting of mental health/illness, which should be taken up in future media strategies.
Biases and power for groups comparison on subjective health measurements.
Hamel, Jean-François; Hardouin, Jean-Benoit; Le Neel, Tanguy; Kubis, Gildas; Roquelaure, Yves; Sébille, Véronique
2012-01-01
Subjective health measurements are increasingly used in clinical research, particularly for patient groups comparisons. Two main types of analytical strategies can be used for such data: so-called classical test theory (CTT), relying on observed scores and models coming from Item Response Theory (IRT) relying on a response model relating the items responses to a latent parameter, often called latent trait. Whether IRT or CTT would be the most appropriate method to compare two independent groups of patients on a patient reported outcomes measurement remains unknown and was investigated using simulations. For CTT-based analyses, groups comparison was performed using t-test on the scores. For IRT-based analyses, several methods were compared, according to whether the Rasch model was considered with random effects or with fixed effects, and the group effect was included as a covariate or not. Individual latent traits values were estimated using either a deterministic method or by stochastic approaches. Latent traits were then compared with a t-test. Finally, a two-steps method was performed to compare the latent trait distributions, and a Wald test was performed to test the group effect in the Rasch model including group covariates. The only unbiased IRT-based method was the group covariate Wald's test, performed on the random effects Rasch model. This model displayed the highest observed power, which was similar to the power using the score t-test. These results need to be extended to the case frequently encountered in practice where data are missing and possibly informative.
Bischoff, Thomas; Diserens, Esther-Amélie; Herzig, Lilli; Meystre-Agustoni, Giovanna; Panese, Francesco; Favrat, Bernard; Sass, Catherine; Bodenmann, Patrick
2012-01-01
Objectives: Advances in biopsychosocial science have underlined the importance of taking social history and life course perspective into consideration in primary care. For both clinical and research purposes, this study aims to develop and validate a standardised instrument measuring both material and social deprivation at an individual level. Methods: We identified relevant potential questions regarding deprivation using a systematic review, structured interviews, focus group interviews and a think-aloud approach. Item response theory analysis was then used to reduce the length of the 38-item questionnaire and derive the deprivation in primary care questionnaire (DiPCare-Q) index using data obtained from a random sample of 200 patients during their planned visits to an ambulatory general internal medicine clinic. Patients completed the questionnaire a second time over the phone 3 days later to enable us to assess reliability. Content validity of the DiPCare-Q was then assessed by 17 general practitioners. Psychometric properties and validity of the final instrument were investigated in a second set of patients. The DiPCare-Q was administered to a random sample of 1898 patients attending one of 47 different private primary care practices in western Switzerland along with questions on subjective social status, education, source of income, welfare status and subjective poverty. Results: Deprivation was defined in three distinct dimensions: material (eight items), social (five items) and health deprivation (three items). Item consistency was high in both the derivation (Kuder-Richardson Formula 20 (KR20) = 0.827) and the validation set (KR20 = 0.778). The DiPCare-Q index was reliable (interclass correlation coefficients = 0.847) and was correlated to subjective social status (rs = −0.539). Conclusion: The DiPCare-Q is a rapid, reliable and validated instrument that may prove useful for measuring both material and social deprivation in primary care. PMID:22307103
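The item consistency figures in this abstract use the Kuder-Richardson Formula 20. As a reminder of how that statistic is computed for yes/no items, here is a minimal sketch. The function name and the toy data are hypothetical, and the sketch assumes the population (divide by n) variance of total scores; some texts and software use the sample variance instead, which changes the value slightly. Nothing here reproduces the study's data.

```python
def kr20(responses):
    """Kuder-Richardson Formula 20 for dichotomous (0/1) item scores.

    responses: list of respondents, each a list of 0/1 scores, one per item.
    Assumes every respondent answered every item and total scores vary.
    """
    n = len(responses)
    k = len(responses[0])

    # Variance of respondents' total scores (population form, an assumption).
    totals = [sum(person) for person in responses]
    mean_total = sum(totals) / n
    var_total = sum((t - mean_total) ** 2 for t in totals) / n

    # Sum over items of p * (1 - p), where p is the proportion scoring 1.
    sum_pq = 0.0
    for j in range(k):
        p = sum(person[j] for person in responses) / n
        sum_pq += p * (1.0 - p)

    return (k / (k - 1)) * (1.0 - sum_pq / var_total)


# Hypothetical toy data: four respondents, three items.
toy = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
toy_kr20 = kr20(toy)
```

For the toy data the total-score variance is 1.25 and the item variances sum to 0.625, so the coefficient is 1.5 × (1 − 0.5) = 0.75.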
Yu, Dan-Dan; Xie, Yan-Ming; Liao, Xing; Zhi, Ying-Jie; Jiang, Jun-Jie; Chen, Wei
2018-02-01
To evaluate the methodological quality and reporting quality of randomized controlled trials (RCTs) published in China Journal of Chinese Materia Medica, we searched CNKI and the China Journal of Chinese Materia Medica webpage to collect RCTs since the establishment of the magazine. The Cochrane risk of bias assessment tool was used to evaluate the methodological quality of RCTs. The CONSORT 2010 list was adopted as the reporting quality evaluation tool. Finally, 184 RCTs were included and evaluated methodologically, of which 97 RCTs were evaluated for reporting quality. For the methodological evaluation, 62 trials (33.70%) reported the random sequence generation; 9 (4.89%) trials reported the allocation concealment; 25 (13.59%) trials adopted the method of blinding; 30 (16.30%) trials reported the number of patients withdrawing, dropping out and those lost to follow-up; 2 trials (1.09%) reported trial registration and none of the trials reported the trial protocol; only 8 (4.35%) trials reported the sample size estimation in detail. For the reporting quality appraisal, 3 of the 25 reporting items were rated high quality, including abstract, participant eligibility criteria, and statistical methods; 4 reporting items were of medium quality, including purpose, intervention, random sequence method, and data collection sites and locations; 9 reporting items were of low quality, including title, background, random sequence types, allocation concealment, blinding, recruitment of subjects, baseline data, harms, and funding; the rest of the items were of extremely low quality (compliance rate of the reporting item < 10%). On the whole, the methodological and reporting quality of RCTs published in the magazine is generally low. Further improvement in both methodological and reporting quality for RCTs of traditional Chinese medicine is warranted. It is recommended that the international standards and procedures for RCT design be strictly followed to conduct high-quality trials. At the same time, in order to improve the reporting quality of randomized controlled trials, CONSORT standards should be adopted in the preparation of research reports and submissions. Copyright© by the Chinese Pharmaceutical Association.
Poisson and negative binomial item count techniques for surveys with sensitive question.
Tian, Guo-Liang; Tang, Man-Lai; Wu, Qin; Liu, Yin
2017-04-01
Although the item count technique is useful in surveys with sensitive questions, privacy of those respondents who possess the sensitive characteristic of interest may not be well protected due to a defect in its original design. In this article, we propose two new survey designs (namely the Poisson item count technique and negative binomial item count technique) which replace several independent Bernoulli random variables required by the original item count technique with a single Poisson or negative binomial random variable, respectively. The proposed models not only provide closed form variance estimate and confidence interval within [0, 1] for the sensitive proportion, but also simplify the survey design of the original item count technique. Most importantly, the new designs do not leak respondents' privacy. Empirical results show that the proposed techniques perform satisfactorily in the sense that it yields accurate parameter estimate and confidence interval.
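To make the estimation idea behind item count designs concrete, here is a minimal sketch of a difference-in-means estimator. It rests on an assumption the abstract does not spell out: that control respondents report only a non-sensitive count, while treatment respondents report that count plus 1 if they possess the sensitive characteristic. The function name and the toy counts are hypothetical, and the closed-form variance and confidence interval described by the authors are not reproduced.

```python
import statistics


def estimate_sensitive_proportion(control_counts, treatment_counts):
    """Difference-in-means estimate of the sensitive proportion.

    Assumed layout (not confirmed by the abstract):
      control_counts:   non-sensitive counts only
      treatment_counts: non-sensitive count + 1 if the respondent has the
                        sensitive characteristic, else + 0
    """
    return statistics.mean(treatment_counts) - statistics.mean(control_counts)


# Hypothetical toy counts, for illustration only.
control = [2, 3, 1, 2]
treatment = [3, 2, 2, 3]
pi_hat = estimate_sensitive_proportion(control, treatment)
```

Under that assumed layout, replacing several Bernoulli items with one Poisson or negative binomial count changes only the distribution of the non-sensitive part; the mean difference still isolates the sensitive proportion.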
Acculturation and the Center For Epidemiological Studies-Depression Scale for Hispanic women.
McCabe, Brian E; Vermeesch, Amber L; Hall, Rosemary F; Peragallo, Nilda P; Mitrani, Victoria B
2011-01-01
Culturally valid measures of depression for Spanish-speaking Hispanic women are important for developing and implementing effective interventions to reduce health disparities. The Center for Epidemiological Studies-Depression Scale (CES-D) is a widely used measure of depression. Differential item functioning has been studied using language preference as a proxy for acculturation, but it is unknown if the results were due to acculturation or the language of administration. The aim of this study was to evaluate the relationship of acculturation, defined with a dimensional measure, to Spanish CES-D item responses. Spanish-speaking Hispanic women (n = 504) were recruited for a randomized controlled trial of Salud, Educación, Prevención y Autocuidado (Health, Education, Prevention, and Self-Care). Acculturation, an important dimension of variation within the diverse U.S. Hispanic community, was defined by high or low scores on the Americanism subscale of the Bidimensional Acculturation Scale. Differential item functioning for each of the 20 CES-D items between more acculturated and less acculturated women was tested using ordinal logistic regression. No items on the Depressed Affect, Somatic Activity, or Positive Affect subscales showed meaningful differential item functioning, but 1 item ("People were unfriendly") on the Interpersonal subscale had small results (R = 1.1%). The majority of CES-D items performed similarly for Spanish-speaking Hispanic women with high and low acculturation. Less acculturated women responded more positively to "People were unfriendly," despite having an equivalent level of depression, than did more acculturated women. Possibilities for improving this item are proposed.
Weidmer, Beverly A; Brach, Cindy; Hays, Ron D
2012-09-01
The complexity of health information often exceeds patients' skills to understand and use it. To develop survey items assessing how well healthcare providers communicate health information. Domains and items for the Consumer Assessment of Healthcare Providers and Systems (CAHPS) Item Set for Addressing Health Literacy were identified through an environmental scan and input from stakeholders. The draft item set was translated into Spanish and pretested in both English and Spanish. The revised item set was field tested with a randomly selected sample of adult patients from 2 sites using mail and telephonic data collection. Item-scale correlations, confirmatory factor analysis, and internal consistency reliability estimates were estimated to assess how well the survey items performed and identify composite measures. Finally, we regressed the CAHPS global rating of the provider item on the CAHPS core communication composite and the new health literacy composites. A total of 601 completed surveys were obtained (52% response rate). Two composite measures were identified: (1) Communication to Improve Health Literacy (16 items); and (2) How Well Providers Communicate About Medicines (6 items). These 2 composites were significantly uniquely associated with the global rating of the provider (communication to improve health literacy: P<0.001, b=0.28; and communication about medicines composite: P=0.02, b=0.04). The 2 composites and the CAHPS core communication composite accounted for 51% of the variance in the global rating of the provider. A 5-item subset of the Communication to Improve Health Literacy composite accounted for 90% of the variance of the original 16-item composite. This study provides support for reliability and validity of the CAHPS Item Set for Addressing Health Literacy. These items can serve to assess whether healthcare providers have communicated effectively with their patients and as a tool for quality improvement.
Hobart, J; Thompson, A
2001-01-01
OBJECTIVES—Routine data collection is now considered mandatory. Therefore, staff rated clinical scales that consist of multiple items should have the minimum number of items necessary for rigorous measurement. This study explores the possibility of developing a short form Barthel index, suitable for use in clinical trials, epidemiological studies, and audit, that satisfies criteria for rigorous measurement and is psychometrically equivalent to the 10 item instrument. METHODS—Data were analysed from 844 consecutive admissions to a neurological rehabilitation unit in London. Random half samples were generated. Short forms were developed in one sample (n=419), by selecting items with the best measurement properties, and tested in the other (n=418). For each of the 10 items of the BI, item total correlations and effect sizes were computed and rank ordered. The best items were defined as those with the lowest cross product of these rank orderings. The acceptability, reliability, validity, and responsiveness of three short form BIs (five, four, and three item) were determined and compared with the 10 item BI. Agreement between scores generated by short forms and 10 item BI was determined using intraclass correlation coefficients and the method of Bland and Altman. RESULTS—The five best items in this sample were transfers, bathing, toilet use, stairs, and mobility. Of the three short forms examined, the five item BI had the best measurement properties and was psychometrically equivalent to the 10 item BI. Agreement between scores generated by the two measures for individual patients was excellent (ICC=0.90) but not identical (limits of agreement=1.84±3.84). CONCLUSIONS—The five item short form BI may be a suitable outcome measure for group comparison studies in comparable samples. Further evaluations are needed. 
Results demonstrate a fundamental difference between assessment and measurement and the importance of incorporating psychometric methods in the development and evaluation of health measures. PMID:11459898
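The agreement analysis in this abstract cites the method of Bland and Altman. The sketch below shows the usual form of that calculation: the mean difference between paired scores (the bias) plus or minus 1.96 standard deviations of the differences. The function name and the toy scores are hypothetical, and the sketch assumes both instruments are expressed on the same scale, which the abstract does not describe.

```python
import statistics


def limits_of_agreement(scores_a, scores_b):
    """Bland-Altman bias and 95% limits of agreement for paired scores.

    Returns (bias, lower_limit, upper_limit), where the limits are
    bias -/+ 1.96 * sample standard deviation of the paired differences.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    bias = statistics.mean(diffs)
    half_width = 1.96 * statistics.stdev(diffs)
    return bias, bias - half_width, bias + half_width


# Hypothetical paired scores from two versions of an instrument.
full_form = [10, 12, 14, 16]
short_form = [9, 12, 13, 17]
bias, lower, upper = limits_of_agreement(full_form, short_form)
```

A result reported as "bias ± half-width", as in the limits of agreement quoted above, corresponds to the first return value and half the distance between the two limits.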
Prediction of true test scores from observed item scores and ancillary data.
Haberman, Shelby J; Yao, Lili; Sinharay, Sandip
2015-05-01
In many educational tests which involve constructed responses, a traditional test score is obtained by adding together item scores obtained through holistic scoring by trained human raters. For example, this practice was used until 2008 in the case of GRE(®) General Analytical Writing and until 2009 in the case of TOEFL(®) iBT Writing. With use of natural language processing, it is possible to obtain additional information concerning item responses from computer programs such as e-rater(®). In addition, available information relevant to examinee performance may include scores on related tests. We suggest application of standard results from classical test theory to the available data to obtain best linear predictors of true traditional test scores. In performing such analysis, we require estimation of variances and covariances of measurement errors, a task which can be quite difficult in the case of tests with limited numbers of items and with multiple measurements per item. As a consequence, a new estimation method is suggested based on samples of examinees who have taken an assessment more than once. Such samples are typically not random samples of the general population of examinees, so that we apply statistical adjustment methods to obtain the needed estimated variances and covariances of measurement errors. To examine practical implications of the suggested methods of analysis, applications are made to GRE General Analytical Writing and TOEFL iBT Writing. Results obtained indicate that substantial improvements are possible both in terms of reliability of scoring and in terms of assessment reliability. © 2015 The British Psychological Society.
Evaluating a Modular Design Approach to Collecting Survey Data Using Text Messages
West, Brady T.; Ghimire, Dirgha; Axinn, William G.
2015-01-01
This article presents analyses of data from a pilot study in Nepal that was designed to provide an initial examination of the errors and costs associated with an innovative methodology for survey data collection. We embedded a randomized experiment within a long-standing panel survey, collecting data on a small number of items with varying sensitivity from a probability sample of 450 young Nepalese adults. Survey items ranged from simple demographics to indicators of substance abuse and mental health problems. Sampled adults were randomly assigned to one of three different modes of data collection: 1) a standard one-time telephone interview, 2) a “single sitting” back-and-forth interview with an interviewer using text messaging, and 3) an interview using text messages within a modular design framework (which generally involves breaking the survey response task into distinct parts over a short period of time). Respondents in the modular group were asked to respond (via text message exchanges with an interviewer) to only one question on a given day, rather than complete the entire survey. Both bivariate and multivariate analyses demonstrate that the two text messaging modes increased the probability of disclosing sensitive information relative to the telephone mode, and that respondents in the modular design group, while responding less frequently, found the survey to be significantly easier. Further, those who responded in the modular group were not unique in terms of available covariates, suggesting that the reduced item response rates only introduced limited nonresponse bias. Future research should consider enhancing this methodology, applying it with other modes of data collection (e.g., web surveys), and continuously evaluating its effectiveness from a total survey error perspective. PMID:26322137
Audrin, Catherine; Ceravolo, Leonardo; Chanal, Julien; Brosch, Tobias; Sander, David
2017-11-23
The present study investigated the extent to which luxury vs. non-luxury brand labels (i.e., extrinsic cues) randomly assigned to items and preferences for these items impact choice, and how this impact may be moderated by materialistic tendencies (i.e., individual characteristics). The main objective was to investigate the neural correlates of abovementioned effects using functional magnetic resonance imaging. Behavioural results showed that the more materialistic people are, the more they choose and like items labelled with luxury brands. Neuroimaging results revealed the implication of a neural network including the dorsolateral and ventromedial prefrontal cortex and the orbitofrontal cortex that was modulated by the brand label and also by the participants' preference. Most importantly, items with randomly assigned luxurious brand labels were preferentially chosen by participants and triggered enhanced signal in the caudate nucleus. This effect increased linearly with materialistic tendencies. Our results highlight the impact of brand-item association, although random in our study, and materialism on preference, relying on subparts of the brain valuation system for the integration of extrinsic cues, preferences and individual characteristics.
Haberman, Shelby J; Sinharay, Sandip; Chon, Kyong Hee
2013-07-01
Residual analysis (e.g. Hambleton & Swaminathan, Item response theory: principles and applications, Kluwer Academic, Boston, 1985; Hambleton, Swaminathan, & Rogers, Fundamentals of item response theory, Sage, Newbury Park, 1991) is a popular method to assess fit of item response theory (IRT) models. We suggest a form of residual analysis that may be applied to assess item fit for unidimensional IRT models. The residual analysis consists of a comparison of the maximum-likelihood estimate of the item characteristic curve with an alternative ratio estimate of the item characteristic curve. The large sample distribution of the residual is proved to be standardized normal when the IRT model fits the data. We compare the performance of our suggested residual to the standardized residual of Hambleton et al. (Fundamentals of item response theory, Sage, Newbury Park, 1991) in a detailed simulation study. We then calculate our suggested residuals using data from an operational test. The residuals appear to be useful in assessing the item fit for unidimensional IRT models.
ERIC Educational Resources Information Center
van Ginkel, Joost R.; van der Ark, L. Andries; Sijtsma, Klaas
2007-01-01
The performance of five simple multiple imputation methods for dealing with missing data was compared. In addition, random imputation and multivariate normal imputation were used as lower and upper benchmark, respectively. Test data were simulated and item scores were deleted such that they were either missing completely at random, missing at…
Rodgers, Wendy M; Hall, Craig R; Wilson, Philip M; Berry, Tanya R
2009-02-01
The purpose of this research was to examine whether exercisers and nonexercisers are rated similarly on a variety of characteristics by a sample of randomly selected regular exercisers, nonexercisers who intend to exercise, and nonexercisers with no intention to exercise. Previous research by Martin Ginis et al. (2003) has demonstrated an exerciser stereotype that advantages exercisers. It is unknown, however, the extent to which an exerciser stereotype is shared by nonexercisers, particularly nonintenders. Following an item-generation procedure, a sample of 470 (n=218 men; n=252 women) people selected using random digit dialing responded to a questionnaire assessing the extent to which they agreed that exercisers and nonexercisers possessed 24 characteristics, such as "happy," "fit," "fat," and "lazy." The results strongly support a positive exerciser bias, with exercisers rated more favorably on 22 of the 24 items. The degree of bias was equivalent in all groups of respondents. Examination of the demographic characteristics revealed no differences among the three groups on age, work status, or child-care responsibilities, suggesting that there is a pervasive positive exerciser bias.
Influence of item distribution pattern and abundance on efficiency of benthic core sampling
Behney, Adam C.; O'Shaughnessy, Ryan; Eichholz, Michael W.; Stafford, Joshua D.
2014-01-01
Core sampling is a commonly used method to estimate benthic item density, but little information exists about factors influencing the accuracy and time-efficiency of this method. We simulated core sampling in a Geographic Information System framework by generating points (benthic items) and polygons (core samplers) to assess how sample size (number of core samples), core sampler size (cm²), distribution of benthic items, and item density affected the bias and precision of estimates of density, the detection probability of items, and the time-costs. When items were distributed randomly versus clumped, bias decreased and precision increased with increasing sample size and increased slightly with increasing core sampler size. Bias and precision were only affected by benthic item density at very low values (500–1,000 items/m²). Detection probability (the probability of capturing ≥ 1 item in a core sample if it is available for sampling) was substantially greater when items were distributed randomly as opposed to clumped. Taking more small diameter core samples was always more time-efficient than taking fewer large diameter samples. We are unable to present a single, optimal sample size, but provide information for researchers and managers to derive optimal sample sizes dependent on their research goals and environmental conditions.
Fiedler, Daniela; Tröbst, Steffen; Harms, Ute
2017-01-01
Students of all ages face severe conceptual difficulties regarding key aspects of evolution—the central, unifying, and overarching theme in biology. Aspects strongly related to abstract “threshold” concepts like randomness and probability appear to pose particular difficulties. A further problem is the lack of an appropriate instrument for assessing students’ conceptual knowledge of randomness and probability in the context of evolution. To address this problem, we have developed two instruments, Randomness and Probability Test in the Context of Evolution (RaProEvo) and Randomness and Probability Test in the Context of Mathematics (RaProMath), that include both multiple-choice and free-response items. The instruments were administered to 140 university students in Germany, then the Rasch partial-credit model was applied to assess them. The results indicate that the instruments generate reliable and valid inferences about students’ conceptual knowledge of randomness and probability in the two contexts (which are separable competencies). Furthermore, RaProEvo detected significant differences in knowledge of randomness and probability, as well as evolutionary theory, between biology majors and preservice biology teachers. PMID:28572180
Nguyen, Allison M; Arbuckle, Rob; Korver, Tjeerd; Chen, Fang; Taylor, Beverley; Turnbull, Alice; Norquist, Josephine M
2017-08-01
The objective of this study was to evaluate the psychometric properties of the Dysmenorrhea Daily Diary (DysDD), an electronic patient-reported outcome, in a sample of 355 women with primary dysmenorrhea enrolled in a phase IIb, multicenter, randomized, partially blinded, placebo-controlled trial for treatment of dysmenorrhea. Subjects completed the DysDD over three menstrual cycles, one pre-treatment baseline cycle and two treatment cycles. The DysDD was administered alongside the Menstrual Distress Questionnaire (MDQ), the Short-Form 36 Version 2.0 (SF-36v2), and a Global Assessment of Change (GAC). Item response distributions, test-retest reliability, concurrent and known groups validity, responsiveness, and minimally important difference (MID) were evaluated for the DysDD. As expected, item response distributions varied throughout the menstrual period for all items, with the response scales fully utilized. Within-cycle test-retest reliability was adequate (weighted kappa: 0.5-0.7), although between-cycle test-retest was poor (weighted kappa: 0.1-0.5), most likely due to the highly variable nature of dysmenorrhea between cycles rather than limitations of the measure. Correlations with the MDQ and SF-36v2 were low-moderate, but in the predicted direction, supporting concurrent validity. There were significant differences in DysDD scores across severity groups based on pain medication use. The DysDD was responsive to changes in patients' dysmenorrhea with significantly different changes in scores between change groups (p < 0.0001). MID analyses suggest changes on the DysDD 0-10 pelvic pain score of three points can be considered clinically meaningful. Overall, findings indicate that the DysDD has acceptable reliability and is a valid and responsive instrument for assessing dysmenorrhea.
Garcia-Martinez, Irma; Weiss, Theresa R; Yousaf, Muhammad N; Ali, Ather; Mehal, Wajahat Z
2018-01-01
Leukocyte activation (LA) testing identifies food items that induce a patient specific cellular response in the immune system, and has recently been shown in a randomized double blinded prospective study to reduce symptoms in patients with irritable bowel syndrome (IBS). We hypothesized that test reactivity to particular food items, and the systemic immune response initiated by these food items, is due to the release of cellular DNA from blood immune cells. We tested this by quantifying total DNA concentration in the cellular supernatant of immune cells exposed to positive and negative foods from 20 healthy volunteers. To establish if the DNA release by positive samples is a specific phenomenon, we quantified myeloperoxidase (MPO) in cellular supernatants. We further assessed if a particular immune cell population (neutrophils, eosinophils, and basophils) was activated by the positive food items by flow cytometry analysis. To identify the signaling pathways that are required for DNA release we tested if specific inhibitors of key signaling pathways could block DNA release. Foods with a positive LA test result gave a higher supernatant DNA content when compared to foods with a negative result. This was specific as MPO levels were not increased by foods with a positive LA test. Protein kinase C (PKC) inhibitors resulted in inhibition of positive food stimulated DNA release. Positive foods resulted in CD63 levels greater than negative foods in eosinophils in 76.5% of tests. LA test identifies food items that result in release of DNA and activation of peripheral blood innate immune cells in a PKC dependent manner, suggesting that this LA test identifies food items that result in release of inflammatory markers and activation of innate immune cells. This may be the basis for the improvement in symptoms in IBS patients who followed an LA test guided diet.
The development and exploratory analysis of the Back Pain Attitudes Questionnaire (Back-PAQ)
Darlow, Ben; Perry, Meredith; Mathieson, Fiona; Stanley, James; Melloh, Markus; Marsh, Reginald; Baxter, G David; Dowell, Anthony
2014-01-01
Objectives To develop an instrument to assess attitudes and underlying beliefs about back pain, and subsequently investigate its internal consistency and underlying structures. Design The instrument was developed by a multidisciplinary team of clinicians and researchers based on analysis of qualitative interviews with people experiencing acute and chronic back pain. Exploratory analysis was conducted using data from a population-based cross-sectional survey. Setting Qualitative interviews with community-based participants and subsequent postal survey. Participants Instrument development informed by interviews with 12 participants with acute back pain and 11 participants with chronic back pain. Data for exploratory analysis collected from New Zealand residents and citizens aged 18 years and above. 1000 participants were randomly selected from the New Zealand Electoral Roll. 602 valid responses were received. Measures The 34-item Back Pain Attitudes Questionnaire (Back-PAQ) was developed. Internal consistency was evaluated by the Cronbach α coefficient. Exploratory analysis investigated the structure of the data using Principal Component Analysis. Results The 34-item long form of the scale had acceptable internal consistency (α=0.70; 95% CI 0.66 to 0.73). Exploratory analysis identified five two-item principal components which accounted for 74% of the variance in the reduced data set: ‘vulnerability of the back’; ‘relationship between back pain and injury’; ‘activity participation while experiencing back pain’; ‘prognosis of back pain’ and ‘psychological influences on recovery’. Internal consistency was acceptable for the reduced 10-item scale (α=0.61; 95% CI 0.56 to 0.66) and the identified components (α between 0.50 and 0.78). Conclusions The 34-item long form of the scale may be appropriate for use in future cross-sectional studies. The 10-item short form may be appropriate for use as a screening tool, or an outcome assessment instrument. 
Further testing of the 10-item Back-PAQ's construct validity, reliability, responsiveness to change and predictive ability needs to be conducted. PMID:24860003
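The internal consistency statistic reported in this abstract, Cronbach's α, can be computed from a respondents-by-items score table. The sketch below is a minimal illustration using the standard formula; the function name and the toy scores are assumptions made for the example and are unrelated to the Back-PAQ data.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondent rows of item scores.

    Every row must hold the same number of items (k >= 2), and the
    total scores must not all be equal. Uses sample variances.
    """
    k = len(scores[0])
    if k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need at least 2 items and equal-length rows")
    # Variance of each item across respondents.
    item_variances = [variance([row[j] for row in scores]) for j in range(k)]
    # Variance of the summed scale score across respondents.
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Invented toy data: three respondents, two items.
toy_scores = [[1, 2], [2, 3], [3, 5]]
alpha = cronbach_alpha(toy_scores)
```

Alpha rises as items covary more strongly relative to their individual variances, which is why a short two-item component can show a lower value than a longer scale built from similar items.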
An NCME Instructional Module on Polytomous Item Response Theory Models
ERIC Educational Resources Information Center
Penfield, Randall David
2014-01-01
A polytomous item is one for which the responses are scored according to three or more categories. Given the increasing use of polytomous items in assessment practices, item response theory (IRT) models specialized for polytomous items are becoming increasingly common. The purpose of this ITEMS module is to provide an accessible overview of…
The patient safety climate in healthcare organizations (PSCHO) survey: Short-form development.
Benzer, Justin K; Meterko, Mark; Singer, Sara J
2017-08-01
Measures of safety climate are increasingly used to guide safety improvement initiatives. However, cost and respondent burden may limit the use of safety climate surveys. The purpose of this study was to develop a 15- to 20-item safety climate survey based on the Patient Safety Climate in Healthcare Organizations survey, a well-validated 38-item measure of safety climate. The Patient Safety Climate in Healthcare Organizations was administered to all senior managers, all physicians, and a 10% random sample of all other hospital personnel in 69 private sector hospitals and 30 Veterans Health Administration hospitals. Both samples were randomly divided into a derivation sample to identify a short-form subset and a confirmation sample to assess the psychometric properties of the proposed short form. The short form consists of 15 items representing 3 overarching domains in the long-form scale: organization, work unit, and interpersonal. The proposed short form efficiently captures 3 important sources of variance in safety climate: organizational, work-unit, and interpersonal. The short-form development process was a practical method that can be applied to other safety climate surveys. This safety climate short form may increase response rates in studies that involve busy clinicians or repeated measures. Published 2017. This article is a U.S. Government work and is in the public domain in the USA.
Boelsen-Robinson, Tara; Chung, Alexandra; Khalil, Marianne; Wong, Evelyn; Kurzeme, Ariana; Peeters, Anna
2017-04-01
Examine the nutritional quality of food and beverages consumed across a sample of community aquatic and recreation centres in metropolitan Melbourne, Australia. Interviewer-administered surveys of randomly selected patrons attending four aquatic and recreation centres were conducted to ascertain food and beverage items consumed over two data collection periods (May-June 2014, January-February 2015). We selected centres in and around metropolitan Melbourne with a sit-down cafeteria and children's swimming classes. We classified items by government nutrient profiling guidelines; 'green' (best choice), 'amber' (choose carefully) or 'red' (limit). A total of 2,326 surveys were conducted (response rate 63%). Thirty-five per cent of surveyed patrons consumed food or beverages while at the centre; 54% of patrons purchased from the café and 61% brought items to the centre. More than half the food consumed from the café was 'red', increasing to 92% for children. One in five children visiting the centre consumed a 'red' item bought from the centre café. The nutritional quality of food and beverages consumed at recreation centres was generally poor, with the on-site cafés providing the majority of discretionary items consumed. Implications for public health: Community aquatic and recreation centres provide an opportunity to promote healthy eating by increasing the provision of healthy options and limiting discretionary food and drink items. © 2017 The Authors.
Competitive foods available in Pennsylvania public high schools.
Probart, Claudia; McDonnell, Elaine; Weirich, J Elaine; Hartman, Terryl; Bailey-Davis, Lisa; Prabhakher, Vaheedha
2005-08-01
This study examined the types and extent of competitive foods available in public high schools in Pennsylvania. We developed, pilot tested, and distributed surveys to school foodservice directors in a random sample of 271 high schools in Pennsylvania. Two hundred twenty-eight surveys were returned, for a response rate of 84%. Statistical analyses were performed: Descriptive statistics were used to examine the extent of competitive food sales in Pennsylvania public high schools. The survey data were analyzed using SPSS software version 11.5.1 (2002, SPSS base 11.0 for Windows, SPSS Inc, Chicago, IL). A la carte sales provide almost $700/day to school foodservice programs, almost 85% of which receive no financial support from their school districts. The top-selling a la carte items are "hamburgers, pizza, and sandwiches." Ninety-four percent of respondents indicated that vending machines are accessible to students. The item most commonly offered in vending machines is bottled water (71.5%). While food items are less often available through school stores and club fund-raisers, candy is the item most commonly offered through these sources. Competitive foods are widely available in high schools. Although many of the items available are low in nutritional value, we found several of the top-selling a la carte options to be nutritious and bottled water the item most often identified as available through vending machines.
Marfeo, Elizabeth E; Ni, Pengsheng; McDonough, Christine; Peterik, Kara; Marino, Molly; Meterko, Mark; Rasch, Elizabeth K; Chan, Leighton; Brandt, Diane; Jette, Alan M
2018-03-01
Purpose To improve the mental health component of the Work Disability Functional Assessment Battery (WD-FAB), developed for the US Social Security Administration's (SSA) disability determination process. Specifically our goal was to expand the WD-FAB scales of mood & emotions, resilience, social interactions, and behavioral control to improve the depth and breadth of the current scales and expand the content coverage to include aspects of cognition & communication function. Methods Data were collected from a random, stratified sample of 1695 claimants applying for the SSA work disability benefits, and a general population sample of 2025 working age adults. 169 new items were developed to replenish the WD-FAB scales and analyzed using factor analysis and item response theory (IRT) analysis to construct unidimensional scales. We conducted computer adaptive test (CAT) simulations to examine the psychometric properties of the WD-FAB. Results Analyses supported the inclusion of four mental health subdomains: Cognition & Communication (68 items), Self-Regulation (34 items), Resilience & Sociability (29 items) and Mood & Emotions (34 items). All scales yielded acceptable psychometric properties. Conclusions IRT methods were effective in expanding the WD-FAB to assess mental health function. The WD-FAB has the potential to enhance work disability assessment both within the context of the SSA disability programs as well as other clinical and vocational rehabilitation settings.
Ramsay-Curve Item Response Theory for the Three-Parameter Logistic Item Response Model
ERIC Educational Resources Information Center
Woods, Carol M.
2008-01-01
In Ramsay-curve item response theory (RC-IRT), the latent variable distribution is estimated simultaneously with the item parameters of a unidimensional item response model using marginal maximum likelihood estimation. This study evaluates RC-IRT for the three-parameter logistic (3PL) model with comparisons to the normal model and to the empirical…
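For readers unfamiliar with the three-parameter logistic model named in this abstract, the sketch below evaluates its item characteristic curve in the standard textbook form. The parameter names a (discrimination), b (difficulty), and c (lower asymptote, or guessing) are the conventional ones and are an assumption here, since the truncated abstract does not spell them out; the numeric values are invented.

```python
import math


def three_pl_probability(theta, a, b, c):
    """Probability of a correct response under the 3PL model.

    theta: latent trait value; a: discrimination; b: difficulty;
    c: lower asymptote (guessing), expected to lie in [0, 1).
    """
    if not 0.0 <= c < 1.0:
        raise ValueError("c must lie in [0, 1)")
    logistic = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    # The guessing parameter lifts the floor of the curve from 0 to c.
    return c + (1.0 - c) * logistic


# Invented item: a = 1.2, b = 0.5, c = 0.2, evaluated at theta = b.
p_at_difficulty = three_pl_probability(0.5, 1.2, 0.5, 0.2)
```

At theta equal to b the logistic term is one half, so the probability sits midway between c and 1.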
ERIC Educational Resources Information Center
Preston, Kathleen; Reise, Steven; Cai, Li; Hays, Ron D.
2011-01-01
The authors used a nominal response item response theory model to estimate category boundary discrimination (CBD) parameters for items drawn from the Emotional Distress item pools (Depression, Anxiety, and Anger) developed in the Patient-Reported Outcomes Measurement Information Systems (PROMIS) project. For polytomous items with ordered response…
A Comparison of Web and Telephone Responses From a National HIV and AIDS Survey
Calzavara, Liviana; Allman, Dan; Worthington, Catherine A; Tyndall, Mark; Iveniuk, James
2016-01-01
Background Response differences to survey questions are known to exist for different modes of questionnaire completion. Previous research has shown that response differences by mode are larger for sensitive and complicated questions. However, it is unknown what effect completion mode may have on HIV and AIDS survey research, which addresses particularly sensitive and stigmatized health issues. Objectives We seek to compare responses between self-selected Web and telephone respondents in terms of social desirability and item nonresponse in a national HIV and AIDS survey. Methods A survey of 2085 people in Canada aged 18 years and older was conducted to explore public knowledge, attitudes, and behaviors around HIV and AIDS in May 2011. Participants were recruited using random-digit dialing and could select to be interviewed on the telephone or self-complete through the Internet. For this paper, 15 questions considered to be either sensitive, stigma-related, or less-sensitive in nature were assessed to estimate associations between responses and mode of completion. Multivariate regression analyses were conducted for questions with significant (P≤.05) bivariate differences in responses to adjust for sociodemographic factors. As survey mode was not randomly assigned, we created a propensity score variable and included it in our multivariate models to control for mode selection bias. Results A total of 81% of participants completed the questionnaire through the Internet, and 19% completed by telephone. Telephone respondents were older, reported less education, had lower incomes, and were more likely from the province of Quebec. Overall, 2 of 13 questions assessed for social desirability and 3 of 15 questions assessed for item nonresponse were significantly associated with choice of mode in the multivariate analysis. 
For social desirability, Web respondents were more likely than telephone respondents to report more than 1 sexual partner in the past year (fully adjusted odds ratio (OR)=3.65, 95% CI 1.80-7.42) and more likely to have donated to charity in the past year (OR=1.63, 95% CI 1.15-2.29). For item nonresponse, Web respondents were more likely than telephone respondents to have a missing or “don’t know” response when asked about: the disease they were most concerned about (OR=3.02, 95% CI 1.67-5.47); if they had ever been tested for HIV (OR=8.04, 95% CI 2.46-26.31); and when rating their level of comfort with shopping at a grocery store if the owner was known to have HIV or AIDS (OR=3.11, 95% CI 1.47-6.63). Conclusion Sociodemographic differences existed between Web and telephone respondents, but for 23 of 28 questions considered in our analysis, there were no significant differences in responses by mode. For surveys with very sensitive health content, such as HIV and AIDS, Web administration may be subject to less social desirability bias but may also have greater item nonresponse for certain questions. PMID:27473597
ERIC Educational Resources Information Center
Fukuhara, Hirotaka; Kamata, Akihito
2011-01-01
A differential item functioning (DIF) detection method for testlet-based data was proposed and evaluated in this study. The proposed DIF model is an extension of a bifactor multidimensional item response theory (MIRT) model for testlets. Unlike traditional item response theory (IRT) DIF models, the proposed model takes testlet effects into…
Item Response Models for Examinee-Selected Items
ERIC Educational Resources Information Center
Wang, Wen-Chung; Jin, Kuan-Yu; Qiu, Xue-Lan; Wang, Lei
2012-01-01
In some tests, examinees are required to choose a fixed number of items from a set of given items to answer. This practice creates a challenge to standard item response models, because more capable examinees may have an advantage by making wiser choices. In this study, we developed a new class of item response models to account for the choice…
ERIC Educational Resources Information Center
Lee, Woo-yeol; Cho, Sun-Joo
2017-01-01
Cross-level invariance in a multilevel item response model can be investigated by testing whether the within-level item discriminations are equal to the between-level item discriminations. Testing the cross-level invariance assumption is important to understand constructs in multilevel data. However, in most multilevel item response model…
An NCME Instructional Module on Latent DIF Analysis Using Mixture Item Response Models
ERIC Educational Resources Information Center
Cho, Sun-Joo; Suh, Youngsuk; Lee, Woo-yeol
2016-01-01
The purpose of this ITEMS module is to provide an introduction to differential item functioning (DIF) analysis using mixture item response models. The mixture item response models for DIF analysis involve comparing item profiles across latent groups, instead of manifest groups. First, an overview of DIF analysis based on latent groups, called…
Guenole, Nigel; Brown, Anna A; Cooper, Andrew J
2018-06-01
This article describes an investigation of whether Thurstonian item response modeling is a viable method for assessment of maladaptive traits. Forced-choice responses from 420 working adults to a broad-range personality inventory assessing six maladaptive traits were considered. The Thurstonian item response model's fit to the forced-choice data was adequate, while the fit of a counterpart item response model to responses to the same items but arranged in a single-stimulus design was poor. Monotrait heteromethod correlations indicated corresponding traits in the two formats overlapped substantially, although they did not measure equivalent constructs. A better goodness of fit and higher factor loadings for the Thurstonian item response model, coupled with a clearer conceptual alignment to the theoretical trait definitions, suggested that the single-stimulus item responses were influenced by biases that the independent clusters measurement model did not account for. Researchers may wish to consider forced-choice designs and appropriate item response modeling techniques such as Thurstonian item response modeling for personality questionnaire applications in industrial psychology, especially when assessing maladaptive traits. We recommend further investigation of this approach in actual selection situations and with different assessment instruments.
Moseson, Heidi; Massaquoi, Moses; Dehlendorf, Christine; Bawo, Luke; Dahn, Bernice; Zolia, Yah; Vittinghoff, Eric; Hiatt, Robert A; Gerdts, Caitlin
2015-12-01
Direct measurement of sensitive health events is often limited by high levels of under-reporting due to stigma and concerns about privacy. Abortion in particular is notoriously difficult to measure. This study implements a novel method to estimate the cumulative lifetime incidence of induced abortion in Liberia. In a randomly selected sample of 3219 women ages 15–49 years in June 2013 in Liberia, we implemented the ‘Double List Experiment’. To measure abortion incidence, each woman was read two lists: (A) a list of non-sensitive items and (B) a list of correlated non-sensitive items with abortion added. The sensitive item, abortion, was randomly added to either List A or List B for each respondent. The respondent reported a simple count of the options on each list that she had experienced, without indicating which options. Difference in means calculations between the average counts for each list were then averaged to provide an estimate of the population proportion that has had an abortion. The list experiment estimates that 32% [95% confidence interval (CI): 0.29-0.34] of respondents surveyed had ever had an abortion (26% of women in urban areas, and 36% of women in rural areas, P-value for difference < 0.001), with a 95% response rate. The list experiment generated an estimate five times greater than the only previous representative estimate of abortion in Liberia, indicating the potential utility of this method to reduce under-reporting in the measurement of abortion. The method could be widely applied to measure other stigmatized health topics, including sexual behaviours, sexual assault or domestic violence.
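The estimator described in this abstract (a difference in mean counts for each list, then the average of the two differences) can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the data layout, the function name, and the toy counts are hypothetical, and the sketch omits survey weights and confidence intervals.

```python
from statistics import mean


def double_list_estimate(responses):
    """Estimate the prevalence of a sensitive item from a double list experiment.

    responses: list of (sensitive_on, count_a, count_b) tuples, where
    sensitive_on is "A" or "B" (the list that carried the sensitive item
    for that respondent) and the counts are the reported totals per list.
    Both assignment groups must be non-empty.
    """
    # List A: respondents who saw the sensitive item on A vs. those who did not.
    a_with = [a for s, a, _ in responses if s == "A"]
    a_without = [a for s, a, _ in responses if s == "B"]
    # List B: the groups swap roles.
    b_with = [b for s, _, b in responses if s == "B"]
    b_without = [b for s, _, b in responses if s == "A"]
    estimate_a = mean(a_with) - mean(a_without)
    estimate_b = mean(b_with) - mean(b_without)
    # Average the two difference-in-means estimates.
    return (estimate_a + estimate_b) / 2


# Invented toy data: four respondents, two per assignment group.
toy_responses = [("A", 3, 2), ("A", 2, 1), ("B", 2, 2), ("B", 2, 3)]
estimate = double_list_estimate(toy_responses)
```

Because every respondent contributes to both differences, the double list design uses the full sample twice, which is the usual motivation for preferring it over a single list experiment.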
The reliability and validity of the SF-8 with a conflict-affected population in northern Uganda.
Roberts, Bayard; Browne, John; Ocaka, Kaducu Felix; Oyok, Thomas; Sondorp, Egbert
2008-12-02
The SF-8 is a health-related quality of life instrument that could provide a useful means of assessing general physical and mental health amongst populations affected by conflict. The purpose of this study was to test the validity and reliability of the SF-8 with a conflict-affected population in northern Uganda. A cross-sectional multi-staged, random cluster survey was conducted with 1206 adults in camps for internally displaced persons in Gulu and Amuru districts of northern Uganda. Data quality was assessed by analysing the number of incomplete responses to SF-8 items. Response distribution was analysed using aggregate endorsement frequency. Test-retest reliability was assessed in a separate smaller survey using the intraclass correlation test. Construct validity was measured using principal component analysis, and the Pearson Correlation test for item-summary score correlation and inter-instrument correlations. Known groups validity was assessed using a two sample t-test to evaluate the ability of the SF-8 to discriminate between groups known to have, and not have, physical and mental health problems. The SF-8 showed excellent data quality. It showed acceptable item response distribution based upon analysis of aggregate endorsement frequencies. Test-retest showed a good intraclass correlation of 0.61 for PCS and 0.68 for MCS. The principal component analysis indicated strong construct validity and concurred with the results of the validity tests by the SF-8 developers. The SF-8 also showed strong construct validity between the 8 items and PCS and MCS summary score, moderate inter-instrument validity, and strong known groups validity. This study provides evidence on the reliability and validity of the SF-8 amongst IDPs in northern Uganda.
Bost, James E; Williams, Brian A; Bottegal, Matthew T; Dang, Qianyu; Rubio, Doris M
2007-12-01
We evaluated the validity and responsiveness of three instruments: the numeric rating scale (NRS) pain score, the 8-item Short-Form Health Survey (SF-8), and the 40-item Quality of Recovery from Anesthesia (QoR) Survey in 154 outpatients undergoing anterior cruciate ligament reconstruction (ACLR). The objective was to provide a robust psychometric basis for outcome survey selection for surgical outpatients undergoing regional anesthesia without general anesthesia. Patients undergoing ACLR with a standardized spinal anesthesia plan were randomized to receive a perineural catheter with either placebo injection-infusion, or injection-infusion with levobupivacaine. Patients completed the NRS, SF-8, and QoR instruments for four postoperative days to evaluate pain, physical function, and mental function. Regarding pain, neither the NRS nor the QoR offered advantages over the SF-8. Regarding physical function, the QoR physical independence composite offered no advantage over the SF-8 physical component summary. The QoR physical comfort composite assessed short-term changes in treatment-related side effects, and thus provided information not covered by the SF-8. Regarding mental function, the SF-8 mental component summary and QoR emotional state composite showed little change over the four days, although the latter measure showed higher responsiveness to change. For ACLR outpatients receiving regional anesthesia, the SF-8 is sufficient to assess postoperative pain and physical function. Adding the QoR physical comfort composite will help assess short-term side effects.
The reliability and validity of the SF-8 with a conflict-affected population in northern Uganda
Roberts, Bayard; Browne, John; Ocaka, Kaducu Felix; Oyok, Thomas; Sondorp, Egbert
2008-01-01
Background The SF-8 is a health-related quality of life instrument that could provide a useful means of assessing general physical and mental health amongst populations affected by conflict. The purpose of this study was to test the validity and reliability of the SF-8 with a conflict-affected population in northern Uganda. Methods A cross-sectional multi-staged, random cluster survey was conducted with 1206 adults in camps for internally displaced persons in Gulu and Amuru districts of northern Uganda. Data quality was assessed by analysing the number of incomplete responses to SF-8 items. Response distribution was analysed using aggregate endorsement frequency. Test-retest reliability was assessed in a separate smaller survey using the intraclass correlation test. Construct validity was measured using principal component analysis, and the Pearson Correlation test for item-summary score correlation and inter-instrument correlations. Known groups validity was assessed using a two sample t-test to evaluate the ability of the SF-8 to discriminate between groups known to have, and not have, physical and mental health problems. Results The SF-8 showed excellent data quality. It showed acceptable item response distribution based upon analysis of aggregate endorsement frequencies. Test-retest showed a good intraclass correlation of 0.61 for PCS and 0.68 for MCS. The principal component analysis indicated strong construct validity and concurred with the results of the validity tests by the SF-8 developers. The SF-8 also showed strong construct validity between the 8 items and PCS and MCS summary score, moderate inter-instrument validity, and strong known groups validity. Conclusion This study provides evidence on the reliability and validity of the SF-8 amongst IDPs in northern Uganda. PMID:19055716
Edjolo, Arlette; Proust-Lima, Cécile; Delva, Fleur; Dartigues, Jean-François; Pérès, Karine
2016-02-15
We aimed to describe the hierarchical structure of Instrumental Activities of Daily Living (IADL) and basic Activities of Daily Living (ADL) and trajectories of dependency before death in an elderly population using item response theory methodology. Data were obtained from a population-based French cohort study, the Personnes Agées QUID (PAQUID) Study, of persons aged ≥65 years at baseline in 1988 who were recruited from 75 randomly selected areas in Gironde and Dordogne. We evaluated IADL and ADL data collected at home every 2-3 years over a 24-year period (1988-2012) for 3,238 deceased participants (43.9% men). We used a longitudinal item response theory model to investigate the item sequence of 11 IADL and ADL combined into a single scale and functional trajectories adjusted for education, sex, and age at death. The findings confirmed the earliest losses in IADL (shopping, transporting, finances) at the partial limitation level, and then an overlapping of concomitant IADL and ADL, with bathing and dressing being the earliest ADL losses, and finally total losses for toileting, continence, eating, and transferring. Functional trajectories were sex-specific, with a benefit of high education that persisted until death in men but was only transient in women. An in-depth understanding of this sequence provides an early warning of functional decline for better adaptation of medical and social care in the elderly. © The Author 2016. Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Multiple sensitive estimation and optimal sample size allocation in the item sum technique.
Perri, Pier Francesco; Rueda García, María Del Mar; Cobo Rodríguez, Beatriz
2018-01-01
For surveys of sensitive issues in life sciences, statistical procedures can be used to reduce nonresponse and social desirability response bias. Both of these phenomena provoke nonsampling errors that are difficult to deal with and can seriously flaw the validity of the analyses. The item sum technique (IST) is a very recent indirect questioning method derived from the item count technique that seeks to procure more reliable responses on quantitative items than direct questioning while preserving respondents' anonymity. This article addresses two important questions concerning the IST: (i) its implementation when two or more sensitive variables are investigated and efficient estimates of their unknown population means are required; (ii) the determination of the optimal sample size to achieve minimum variance estimates. These aspects are of great relevance for survey practitioners engaged in sensitive research and, to the best of our knowledge, were not studied so far. In this article, theoretical results for multiple estimation and optimal allocation are obtained under a generic sampling design and then particularized to simple random sampling and stratified sampling designs. Theoretical considerations are integrated with a number of simulation studies based on data from two real surveys and conducted to ascertain the efficiency gain derived from optimal allocation in different situations. One of the surveys concerns cannabis consumption among university students. Our findings highlight some methodological advances that can be obtained in life sciences IST surveys when optimal allocation is achieved. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
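To make the idea of optimal allocation concrete, here is a minimal sketch for the simplest case: one sensitive variable and simple random sampling, with the finite population correction ignored. It is not the article's derivation. It uses the textbook result that the variance of a difference of two independent sample means, S_L^2/n_L + S_S^2/n_S, is minimized for a fixed total n when the sample sizes are proportional to the standard deviations. The function names and numbers are assumptions.

```python
from statistics import mean


def ist_estimate(long_list_sums, short_list_sums):
    """Item sum technique estimate of the mean of the sensitive variable.

    long_list_sums:  reported totals (sensitive + innocuous items).
    short_list_sums: reported totals (innocuous items only).
    """
    return mean(long_list_sums) - mean(short_list_sums)


def optimal_split(n_total, sd_long, sd_short):
    """Split n_total between the two samples so as to minimize
    sd_long**2 / n_long + sd_short**2 / n_short (assumed variance form)."""
    n_long = n_total * sd_long / (sd_long + sd_short)
    return n_long, n_total - n_long


# Illustrative values only (not taken from the surveys in the article).
toy_mean = ist_estimate([12, 15, 9], [7, 8, 6])
toy_split = optimal_split(300, sd_long=2.0, sd_short=1.0)
```

In practice the standard deviations are unknown and would have to come from a pilot study or earlier survey, and the resulting sample sizes would be rounded to integers.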
[Design and validation of a questionnaire for psychosocial nursing diagnosis in Primary Care].
Brito-Brito, Pedro Ruymán; Rodríguez-Álvarez, Cristobalina; Sierra-López, Antonio; Rodríguez-Gómez, José Ángel; Aguirre-Jaime, Armando
2012-01-01
To develop a valid, reliable and easy-to-use questionnaire for a psychosocial nursing diagnosis. The study was performed in two phases: first phase, questionnaire design and construction; second phase, validity and reliability tests. A bank of items was constructed using the NANDA classification as a theoretical framework. Each item was assigned a Likert scale or dichotomous response. The combination of responses to the items constituted the diagnostic rules to assign up to 28 labels. A group of experts carried out the validity test for content. Other validated scales were used as reference standards for the criterion validity tests. Forty-five nurses provided the questionnaire to the patients on three separate occasions over a period of three weeks, and the other validated scales only once to 188 randomly selected patients in Primary Care centres in Tenerife (Spain). Validity tests for construct confirmed the six dimensions of the questionnaire with 91% of total variance explained. Validity tests for criterion showed a specificity of 66%-100%, and showed high correlations with the reference scales when the questionnaire was assigning nursing diagnoses. Reliability tests showed agreement of 56%-91% (P<.001), and a 93% internal consistency. The Questionnaire for Psychosocial Nursing Diagnosis was called CdePS, and included 61 items. The CdePS is a valid, reliable and easy-to-use tool in Primary Care centres to improve the assigning of a psychosocial nursing diagnosis. Copyright © 2011 Elsevier España, S.L. All rights reserved.
A Quasi-Parametric Method for Fitting Flexible Item Response Functions
ERIC Educational Resources Information Center
Liang, Longjuan; Browne, Michael W.
2015-01-01
If standard two-parameter item response functions are employed in the analysis of a test with some newly constructed items, it can be expected that, for some items, the item response function (IRF) will not fit the data well. This lack of fit can also occur when standard IRFs are fitted to personality or psychopathology items. When investigating…
Qualitative Development of the PROMIS® Pediatric Stress Response Item Banks
Gardner, William; Pajer, Kathleen; Riley, Anne W.; Forrest, Christopher B.
2013-01-01
Objective To describe the qualitative development of the Patient-Reported Outcome Measurement Information System (PROMIS®) Pediatric Stress Response item banks. Methods Stress response concepts were specified through a literature review and interviews with content experts, children, and parents. A library comprising 2,677 items derived from 71 instruments was developed. Items were classified into conceptual categories; new items were written and redundant items were removed. Items were then revised based on cognitive interviews (n = 39 children), readability analyses, and translatability reviews. Results 2 pediatric Stress Response sub-domains were identified: somatic experiences (43 items) and psychological experiences (64 items). Final item pools cover the full range of children’s stress experiences. Items are comprehensible among children aged ≥8 years and ready for translation. Conclusions Child- and parent-report versions of the item banks assess children’s somatic and psychological states when demands tax their adaptive capabilities. PMID:23124904
Identifying Items to Assess Methodological Quality in Physical Therapy Trials: A Factor Analysis
Cummings, Greta G.; Fuentes, Jorge; Saltaji, Humam; Ha, Christine; Chisholm, Annabritt; Pasichnyk, Dion; Rogers, Todd
2014-01-01
Background Numerous tools and individual items have been proposed to assess the methodological quality of randomized controlled trials (RCTs). The frequency of use of these items varies according to health area, which suggests a lack of agreement regarding their relevance to trial quality or risk of bias. Objective The objectives of this study were: (1) to identify the underlying component structure of items and (2) to determine relevant items to evaluate the quality and risk of bias of trials in physical therapy by using an exploratory factor analysis (EFA). Design A methodological research design was used, and an EFA was performed. Methods Randomized controlled trials used for this study were randomly selected from searches of the Cochrane Database of Systematic Reviews. Two reviewers used 45 items gathered from 7 different quality tools to assess the methodological quality of the RCTs. An exploratory factor analysis was conducted using the principal axis factoring (PAF) method followed by varimax rotation. Results Principal axis factoring identified 34 items loaded on 9 common factors: (1) selection bias; (2) performance and detection bias; (3) eligibility, intervention details, and description of outcome measures; (4) psychometric properties of the main outcome; (5) contamination and adherence to treatment; (6) attrition bias; (7) data analysis; (8) sample size; and (9) control and placebo adequacy. Limitation Because of the exploratory nature of the results, a confirmatory factor analysis is needed to validate this model. Conclusions To the authors' knowledge, this is the first factor analysis to explore the underlying component items used to evaluate the methodological quality or risk of bias of RCTs in physical therapy. The items and factors represent a starting point for evaluating the methodological quality and risk of bias in physical therapy trials. 
Empirical evidence of the association among these items with treatment effects and a confirmatory factor analysis of these results are needed to validate these items. PMID:24786942
Identifying items to assess methodological quality in physical therapy trials: a factor analysis.
Armijo-Olivo, Susan; Cummings, Greta G; Fuentes, Jorge; Saltaji, Humam; Ha, Christine; Chisholm, Annabritt; Pasichnyk, Dion; Rogers, Todd
2014-09-01
Numerous tools and individual items have been proposed to assess the methodological quality of randomized controlled trials (RCTs). The frequency of use of these items varies according to health area, which suggests a lack of agreement regarding their relevance to trial quality or risk of bias. The objectives of this study were: (1) to identify the underlying component structure of items and (2) to determine relevant items to evaluate the quality and risk of bias of trials in physical therapy by using an exploratory factor analysis (EFA). A methodological research design was used, and an EFA was performed. Randomized controlled trials used for this study were randomly selected from searches of the Cochrane Database of Systematic Reviews. Two reviewers used 45 items gathered from 7 different quality tools to assess the methodological quality of the RCTs. An exploratory factor analysis was conducted using the principal axis factoring (PAF) method followed by varimax rotation. Principal axis factoring identified 34 items loaded on 9 common factors: (1) selection bias; (2) performance and detection bias; (3) eligibility, intervention details, and description of outcome measures; (4) psychometric properties of the main outcome; (5) contamination and adherence to treatment; (6) attrition bias; (7) data analysis; (8) sample size; and (9) control and placebo adequacy. Because of the exploratory nature of the results, a confirmatory factor analysis is needed to validate this model. To the authors' knowledge, this is the first factor analysis to explore the underlying component items used to evaluate the methodological quality or risk of bias of RCTs in physical therapy. The items and factors represent a starting point for evaluating the methodological quality and risk of bias in physical therapy trials. Empirical evidence of the association among these items with treatment effects and a confirmatory factor analysis of these results are needed to validate these items. 
© 2014 American Physical Therapy Association.
Holman, Rebecca; Glas, Cees AW; Lindeboom, Robert; Zwinderman, Aeilko H; de Haan, Rob J
2004-01-01
Background Whenever questionnaires are used to collect data on constructs, such as functional status or health related quality of life, it is unlikely that all respondents will respond to all items. This paper examines ways of dealing with responses in a 'not applicable' category to items included in the AMC Linear Disability Score (ALDS) project item bank. Methods The data examined in this paper come from the responses of 392 respondents to 32 items and form part of the calibration sample for the ALDS item bank. The data are analysed using the one-parameter logistic item response theory model. The four practical strategies for dealing with this type of response are: cold deck imputation; hot deck imputation; treating the missing responses as if these items had never been offered to those individual patients; and using a model which takes account of the 'tendency to respond to items'. Results The item and respondent population parameter estimates were very similar for the strategies involving hot deck imputation; treating the missing responses as if these items had never been offered to those individual patients; and using a model which takes account of the 'tendency to respond to items'. The estimates obtained using the cold deck imputation method were substantially different. Conclusions The cold deck imputation method was not considered suitable for use in the ALDS item bank. The other three methods described can be usefully implemented in the ALDS item bank, depending on the purpose of the data analysis to be carried out. These three methods may be useful for other data sets examining similar constructs, when item response theory based methods are used. PMID:15200681
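As an illustration of the third strategy, treating 'not applicable' responses as if the item had never been offered, the sketch below evaluates a respondent's log-likelihood under the one-parameter logistic model, P(X = 1 | theta, beta) = 1 / (1 + exp(-(theta - beta))), and skips missing responses. The function names and the use of `None` to mark 'not applicable' are assumptions made for this example, not the ALDS project's implementation.

```python
from math import exp, log


def p_positive(theta, beta):
    """One-parameter logistic probability of a positive response for a
    respondent with ability theta on an item with difficulty beta."""
    return 1.0 / (1.0 + exp(-(theta - beta)))


def log_likelihood(theta, betas, responses):
    """Log-likelihood of one respondent's responses (1, 0, or None).

    None marks a 'not applicable' response; it is skipped, which treats
    the item as if it had never been offered to this respondent.
    """
    total = 0.0
    for beta, x in zip(betas, responses):
        if x is None:
            continue
        p = p_positive(theta, beta)
        total += log(p) if x == 1 else log(1.0 - p)
    return total


# Toy example: two items, the second marked 'not applicable'.
toy_ll = log_likelihood(0.0, [0.0, 1.0], [1, None])
```

Skipping a response in this way leaves the estimates unbiased only if the missingness is ignorable given the observed data, which is the kind of assumption the fourth strategy (modelling the tendency to respond) is meant to relax.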
Household item ownership and self-rated health: material and psychosocial explanations
Pikhart, Hynek; Bobak, Martin; Rose, Richard; Marmot, Michael
2003-01-01
Background There has been an ongoing debate whether the effects of socioeconomic factors on health are due to absolute poverty and material factors or to relative deprivation and psychosocial factors. In the present analyses, we examined the importance for health of material factors, which may have a direct effect on health, and of those that may affect health indirectly, through psychosocial mechanisms. Methods Random national samples of men and women in Hungary (n = 973) and Poland (n = 1141) were interviewed (response rates 58% and 59%, respectively). The subjects reported their self-rated health, socioeconomic circumstances, including ownership of different household items, and perceived control over life. Household items were categorised as "basic needs", "socially oriented", and "luxury". We examined the association between the ownership of different groups of items and self-rated health. Since the lists of household items were different in Hungary and Poland, we conducted parallel identical analyses of the Hungarian and Polish data. Results The overall prevalence of poor or very poor health was 13% in Poland and 25% in Hungary. Education, material deprivation and the number of household items were all associated with poor health in bivariate analyses. All three groups of household items were positively related to self-rated health in age-adjusted analyses. The relation of basic needs items to poor health disappeared after controlling for other socioeconomic variables (mainly material deprivation). The relation of socially oriented and luxury items to poor health, however, persisted in multivariate models. The results were similar in both datasets. Conclusions These data suggest that health is influenced by both material and psychosocial aspects of socioeconomic factors. PMID:14641929
Kawasaki, Yohei; Ide, Kazuki; Akutagawa, Maiko; Yamada, Hiroshi; Furukawa, Toshiaki A.; Ono, Yutaka
2016-01-01
Background Several studies have shown that total depressive symptom scores in the general population approximate an exponential pattern, except for the lower end of the distribution. The Center for Epidemiologic Studies Depression Scale (CES-D) consists of 20 items, each of which may take on four scores: “rarely,” “some,” “occasionally,” and “most of the time.” Recently, we reported that the item responses for 16 negative affect items commonly exhibit exponential patterns, except for the level of “rarely,” leading us to hypothesize that the item responses at the level of “rarely” may be related to the non-exponential pattern typical of the lower end of the distribution. To verify this hypothesis, we investigated how the item responses contribute to the distribution of the sum of the item scores. Methods Data collected from 21,040 subjects who had completed the CES-D questionnaire as part of a Japanese national survey were analyzed. To assess the item responses of negative affect items, we used a parameter r, which denotes the ratio of “rarely” to “some” in each item response. The distributions of the sum of negative affect items in various combinations were analyzed using log-normal scales and curve fitting. Results The sum of the item scores approximated an exponential pattern regardless of the combination of items, whereas, at the lower end of the distributions, there was a clear divergence between the actual data and the predicted exponential pattern. At the lower end of the distributions, the sum of the item scores with high values of r exhibited higher scores compared to those predicted from the exponential pattern, whereas the sum of the item scores with low values of r exhibited lower scores compared to those predicted. Conclusions The distributional pattern of the sum of the item scores could be predicted from the item responses of such items. PMID:27806132
Biases and Power for Groups Comparison on Subjective Health Measurements
Hamel, Jean-François; Hardouin, Jean-Benoit; Le Neel, Tanguy; Kubis, Gildas; Roquelaure, Yves; Sébille, Véronique
2012-01-01
Subjective health measurements are increasingly used in clinical research, particularly for patient groups comparisons. Two main types of analytical strategies can be used for such data: so-called classical test theory (CTT), relying on observed scores and models coming from Item Response Theory (IRT) relying on a response model relating the items responses to a latent parameter, often called latent trait. Whether IRT or CTT would be the most appropriate method to compare two independent groups of patients on a patient reported outcomes measurement remains unknown and was investigated using simulations. For CTT-based analyses, groups comparison was performed using t-test on the scores. For IRT-based analyses, several methods were compared, according to whether the Rasch model was considered with random effects or with fixed effects, and the group effect was included as a covariate or not. Individual latent traits values were estimated using either a deterministic method or by stochastic approaches. Latent traits were then compared with a t-test. Finally, a two-steps method was performed to compare the latent trait distributions, and a Wald test was performed to test the group effect in the Rasch model including group covariates. The only unbiased IRT-based method was the group covariate Wald’s test, performed on the random effects Rasch model. This model displayed the highest observed power, which was similar to the power using the score t-test. These results need to be extended to the case frequently encountered in practice where data are missing and possibly informative. PMID:23115620
Yordanova, Ralitsa; Ivanov, Ivan
2018-04-25
Developmental testing is essential for early recognition of the various developmental impairments. The tools used should be composed of items that are age specific, adapted, and standardized for the population they are applied to. The achievements of neurosciences, medicine, psychology, pedagogy, etc. are applied in the elaboration of a comprehensive examination tool that should screen all major areas of development. The key age of 5 years permits identification of almost all major developmental disabilities leaving time for therapeutic intervention before school entrance. The aim of the research is to evaluate the developmental performance of 5-year-old Bulgarian children using the approach of translational neuroscience. A comprehensive test program was developed composed of 89 items grouped in the following domains: fine and gross motor development, coordination and balance, central motor neuron disturbances, language development and articulation, perception, attention and behavior, visual acuity, and strabismus. The overall sample comprises 434 children of mean age 63.5 months (SD 3.7). Male to female ratio is 1:1.02. From this group, 390 children are between 60 and 71 months of age. The children are examined in 51 kindergartens in 21 villages and 18 cities randomly chosen in southern Bulgaria. Eight children were excluded from the final analysis because they fulfilled less than 50% of the test items (7 children did not cooperate and 1 child was with autistic spectrum disorder). The items with abnormal response in less than 5% of the children are 43. The items with abnormal response in 6% to 35% of the children are 37. The items with high abnormal response (more than 35%) rate are only 9. The test is an example of a translational approach in neuroscience. On one hand, it is based on the results of several sciences studying growth and development from different perspective.
On the other hand, the results from the present research may be implemented in other fields of child development: education, psychology, speech and language therapy, and intervention programs. © 2018 John Wiley & Sons, Ltd.
Totton, Sarah C; Cullen, Jonah N; Sargeant, Jan M; O'Connor, Annette M
2018-02-01
The goal of the REFLECT Statement (Reporting guidElines For randomized controLled trials in livEstoCk and food safeTy) (published in 2010) was to provide the veterinary research community with reporting guidelines tailored for randomized controlled trials for livestock and food safety. Our objective was to determine the prevalence of REFLECT Statement reporting of items 1-19 in controlled trials published in journals between 1970 and 2017 examining the comparative efficacy of FDA-registered antimicrobials against naturally acquired BRD (bovine respiratory disease) in weaned beef calves in Canada or the USA, and to compare the prevalence of reporting before and after 2010, when REFLECT was published. We divided REFLECT Statement items 3, 5, 10, and 11 into subitems, because each dealt with multiple elements requiring separate assessment. As a result, 28 different items or subitems were evaluated independently. We searched MEDLINE® and CABI (CAB Abstracts® and Global Health®) (Web of Science™) in April 2017 and screened 2327 references. Two reviewers independently assessed the reporting of each item and subitem. Ninety-five references were eligible for the study. The reporting of the REFLECT items showed a point estimate for the prevalence ratio >1 (i.e. a higher proportion of studies published post-2010 reported this item compared to studies published pre-2010), apart from items 10.3, i.e., item 10, subitem 3 (who assigned study units to the interventions), 13 (the flow of study units through the study), 16 (number of study units in analysis), 18 (multiplicity), and 19 (adverse effects). Fifty-three (79%) of 67 studies published before 2010 and all 28 (100%) papers published after 2010 reported using a random allocation method in either the title, abstract, or methods (Prevalence ratio = 1.25; 95% CI (1.09,1.43)).
However, 8 studies published prior to 2010 and 7 studies published post-2010 reported the term "systematic randomization" or variations of this term (which is not true randomization) to describe the allocation procedure. Fifty-five percent (37/67) of studies published pre-2010 reported blinding status (blinded/not blinded) of outcome assessors, compared to 24/28 (86%) of studies published post-2010 (Prevalence ratio = 1.5, 95% CI (1.19, 2.02)). The reporting of recommended items in journal articles in this body of work is generally improving; however, there is also evidence of confusion about what constitutes a random allocation procedure, and this suggests an educational need. As this study is observational, this precludes concluding that the publication of the REFLECT Statement was the cause of this trend. Copyright © 2017 Elsevier B.V. All rights reserved.
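For readers who want to see how a prevalence ratio and its interval can be computed from counts like those above, here is a minimal sketch. The abstract does not say which interval method the authors used; the Wald-type interval on the log scale below is an assumption, and the function name `prevalence_ratio` is hypothetical.

```python
from math import exp, log, sqrt


def prevalence_ratio(a, n1, c, n2, z=1.96):
    """Prevalence ratio (a/n1) / (c/n2) with a Wald-type CI on the log scale.

    a of n1 studies in the later period reported the item;
    c of n2 studies in the earlier period reported it.
    """
    pr = (a / n1) / (c / n2)
    # Standard error of log(PR) for two independent binomial proportions.
    se = sqrt(1 / a - 1 / n1 + 1 / c - 1 / n2)
    lower = exp(log(pr) - z * se)
    upper = exp(log(pr) + z * se)
    return pr, lower, upper


# Blinding of outcome assessors: 24/28 post-2010 versus 37/67 pre-2010.
pr, lower, upper = prevalence_ratio(24, 28, 37, 67)
```

With the blinding counts this gives roughly 1.55 (1.19, 2.02), in line with the interval reported in the abstract. The log-scale standard error degenerates when a proportion is 100%, as with the 28/28 random-allocation count, so other interval methods may be preferable in that case.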
Stochastic Approximation Methods for Latent Regression Item Response Models
ERIC Educational Resources Information Center
von Davier, Matthias; Sinharay, Sandip
2010-01-01
This article presents an application of a stochastic approximation expectation maximization (EM) algorithm using a Metropolis-Hastings (MH) sampler to estimate the parameters of an item response latent regression model. Latent regression item response models are extensions of item response theory (IRT) to a latent variable model with covariates…
Khan, Arif; Durgam, Suresh; Tang, Xiongwen; Ruth, Adam; Mathews, Maju; Gommoll, Carl P.
2016-01-01
Objective To investigate vilazodone, currently approved for major depressive disorder in adults, for generalized anxiety disorder (GAD). Method Three randomized, double-blind, placebo-controlled studies showing positive results for vilazodone (20–40 mg/d) in adult patients with GAD (DSM-IV-TR) were pooled for analyses; data were collected from June 2012 to March 2014. Post hoc outcomes in the pooled intent-to-treat population (n = 1,462) included mean change from baseline to week 8 in Hamilton Anxiety Rating Scale (HARS) total score, psychic and somatic anxiety subscale scores, and individual item scores; HARS response (≥ 50% total score improvement) and remission (total score ≤ 7) at week 8; and category shifts, defined as HARS item score ≥ 2 at baseline (moderate to very severe symptoms) and score of 0 at week 8 (no symptoms). Results The least squares mean difference was statistically significant for vilazodone versus placebo in change from baseline to week 8 in HARS total score (−1.83, P < .0001) and in psychic anxiety (−1.21, P < .0001) and somatic anxiety (−0.63, P < .01) subscale scores; differences from placebo were significant on 11 of 14 HARS items (P < .05). Response rates were higher with vilazodone than placebo (48% vs 39%, P < .001), as were remission rates (27% vs 21%, P < .01). The percentage of patients who shifted to no symptoms was significant for vilazodone on several items: anxious mood, tension, intellectual, depressed mood, somatic-muscular, somatic-sensory, cardiovascular, respiratory, and autonomic symptoms (P < .05). Conclusions Treatment with vilazodone versus placebo was effective in adult GAD patients, with significant differences between treatment groups found on both psychic and somatic HARS items. Trial Registration ClinicalTrials.gov identifiers: NCT01629966, NCT01766401, NCT01844115. PMID:27486544
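The three outcome definitions in this abstract (response, remission, and category shift) translate directly into simple rules. The sketch below encodes them as stated; the function names are hypothetical and the example scores are invented for illustration.

```python
def hars_outcomes(baseline_total, week8_total):
    """Classify response and remission from HARS total scores.

    Response:  at least 50% improvement in total score from baseline.
    Remission: total score of 7 or less at week 8.
    Assumes baseline_total > 0.
    """
    improvement = (baseline_total - week8_total) / baseline_total
    return {"response": improvement >= 0.5, "remission": week8_total <= 7}


def category_shift(baseline_item, week8_item):
    """Category shift on a single HARS item: score of 2 or more at baseline
    (moderate to very severe) and 0 at week 8 (no symptoms)."""
    return baseline_item >= 2 and week8_item == 0


# Invented example scores.
responder_only = hars_outcomes(24, 12)
responder_and_remitter = hars_outcomes(24, 6)
```

A patient can meet the response criterion without being in remission, as in the first example, which is why the abstract reports the two rates separately.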
Khan, Arif; Durgam, Suresh; Tang, Xiongwen; Ruth, Adam; Mathews, Maju; Gommoll, Carl P
2016-01-01
To investigate vilazodone, currently approved for major depressive disorder in adults, for generalized anxiety disorder (GAD). Three randomized, double-blind, placebo-controlled studies showing positive results for vilazodone (20–40 mg/d) in adult patients with GAD (DSM-IV-TR) were pooled for analyses; data were collected from June 2012 to March 2014. Post hoc outcomes in the pooled intent-to-treat population (n = 1,462) included mean change from baseline to week 8 in Hamilton Anxiety Rating Scale (HARS) total score, psychic and somatic anxiety subscale scores, and individual item scores; HARS response (≥ 50% total score improvement) and remission (total score ≤ 7) at week 8; and category shifts, defined as HARS item score ≥ 2 at baseline (moderate to very severe symptoms) and score of 0 at week 8 (no symptoms). The least squares mean difference was statistically significant for vilazodone versus placebo in change from baseline to week 8 in HARS total score (-1.83, P < .0001) and in psychic anxiety (-1.21, P < .0001) and somatic anxiety (-0.63, P < .01) subscale scores; differences from placebo were significant on 11 of 14 HARS items (P < .05). Response rates were higher with vilazodone than placebo (48% vs 39%, P < .001), as were remission rates (27% vs 21%, P < .01). The percentage of patients who shifted to no symptoms was significant for vilazodone on several items: anxious mood, tension, intellectual, depressed mood, somatic-muscular, somatic-sensory, cardiovascular, respiratory, and autonomic symptoms (P < .05). Treatment with vilazodone versus placebo was effective in adult GAD patients, with significant differences between treatment groups found on both psychic and somatic HARS items. ClinicalTrials.gov identifiers: NCT01629966, NCT01766401, NCT01844115.
Motte, Anne-France; Diallo, Stéphanie; van den Brink, Hélène; Châteauvieux, Constance; Serrano, Carole; Naud, Carole; Steelandt, Julie; Alsac, Jean-Marc; Aubry, Pierre; Cour, Florence; Pellerin, Olivier; Pineau, Judith; Prognon, Patrice; Borget, Isabelle; Bonan, Brigitte; Martelli, Nicolas
2017-11-01
The aim of this study was to determine relevant items for reporting clinical trials on implantable medical devices (IMDs) and to identify reporting guidelines which include these items. A panel of experts identified the most relevant items for evaluating IMDs from an initial list based on reference papers. We then conducted a systematic review of articles indexed in MEDLINE. We retrieved reporting guidelines from the EQUATOR network's library for health research reporting. Finally, we screened these reporting guidelines to find those using our set of reporting items. Seven relevant reporting items were selected that related to four topics: randomization, learning curve, surgical setting, and device information. A total of 348 reporting guidelines were identified, among which 26 met our inclusion criteria. However, none of the 26 reporting guidelines presented all seven items together. The most frequently reported item was timing of randomization (65%). In contrast, device information and learning curve effects were poorly specified. To our knowledge, this study is the first to identify specific items related to IMDs in reporting guidelines for clinical trials. We have shown that no existing reporting guideline is totally suitable for these devices. Copyright © 2017 Elsevier Inc. All rights reserved.
Computerized Adaptive Test (CAT) Applications and Item Response Theory Models for Polytomous Items
ERIC Educational Resources Information Center
Aybek, Eren Can; Demirtasli, R. Nukhet
2017-01-01
This article aims to provide a theoretical framework for computerized adaptive tests (CAT) and item response theory models for polytomous items. Besides that, it aims to introduce the simulation and live CAT software to the related researchers. Computerized adaptive test algorithm, assumptions of item response theory models, nominal response…
ERIC Educational Resources Information Center
Ito, Kyoko; Sykes, Robert C.
This study investigated the practice of weighting a type of test item, such as constructed response, more than other types of items, such as selected response, to compute student scores for a mixed-item type of test. The study used data from statewide writing field tests in grades 3, 5, and 8 and considered two contexts, that in which a single…
Brookes, Sara T; Macefield, Rhiannon C; Williamson, Paula R; McNair, Angus G; Potter, Shelley; Blencowe, Natalie S; Strong, Sean; Blazeby, Jane M
2016-08-17
Methods for developing a core outcome or information set require involvement of key stakeholders to prioritise many items and achieve agreement as to the core set. The Delphi technique requires participants to rate the importance of items in sequential questionnaires (or rounds) with feedback provided in each subsequent round such that participants are able to consider the views of others. This study examines the impact of receiving feedback from different stakeholder groups, on the subsequent rating of items and the level of agreement between stakeholders. Randomized controlled trials were nested within the development of three core sets each including a Delphi process with two rounds of questionnaires, completed by patients and health professionals. Participants rated items from 1 (not essential) to 9 (absolutely essential). For round 2, participants were randomized to receive feedback from their peer stakeholder group only (peer) or both stakeholder groups separately (multiple). Decisions as to which items to retain following each round were determined by pre-specified criteria. Whilst type of feedback did not impact on the percentage of items for which a participant subsequently changed their rating, or the magnitude of change, it did impact on items retained at the end of round 2. Each core set contained discordant items retained by one feedback group but not the other (3-22 % discordant items). Consensus between patients and professionals in items to retain was greater amongst those receiving multiple group feedback in each core set (65-82 % agreement for peer-only feedback versus 74-94 % for multiple feedback). In addition, differences in round 2 scores were smaller between stakeholder groups receiving multiple feedback than between those receiving peer group feedback only. Variability in item scores across stakeholders was reduced following any feedback but this reduction was consistently greater amongst the multiple feedback group. 
In the development of a core outcome or information set, providing feedback within Delphi questionnaires from all stakeholder groups separately may influence the final core set and improve consensus between the groups. Further work is needed to better understand how participants rate and re-rate items within a Delphi process. The three randomized controlled trials reported here were each nested within the development of a core information or outcome set to investigate processes in core outcome and information set development. Outcomes were not health-related and therefore trial registration was not applicable.
A Comparison of Linking and Concurrent Calibration under the Graded Response Model.
ERIC Educational Resources Information Center
Kim, Seock-Ho; Cohen, Allan S.
Applications of item response theory to practical testing problems including equating, differential item functioning, and computerized adaptive testing, require that item parameter estimates be placed onto a common metric. In this study, two methods for developing a common metric for the graded response model under item response theory were…
Writing, Evaluating and Assessing Data Response Items in Economics.
ERIC Educational Resources Information Center
Trotman-Dickenson, D. I.
1989-01-01
Describes some of the problems in writing data response items in economics for use by A Level and General Certificate of Secondary Education (GCSE) students. Examines the experience of two series of workshops on writing items, evaluating them and assessing responses from schools. Offers suggestions for producing packages of data response items as…
Item Response Modeling with Sum Scores
ERIC Educational Resources Information Center
Johnson, Timothy R.
2013-01-01
One of the distinctions between classical test theory and item response theory is that the former focuses on sum scores and their relationship to true scores, whereas the latter concerns item responses and their relationship to latent scores. Although item response theory is often viewed as the richer of the two theories, sum scores are still…
2007-01-01
response options were randomly arranged for both the pretest and posttest. Additionally, one of the "pretest" items was given prior to watching the film... After selecting a topic, the AXL system presents the goals of the module to the group. The group then watches the associated filmed case on the... questionnaires and focus group interviews were used to create measures to assess learning from Tripwire modules. ARI then pilot tested one of the new
A Model-Free Diagnostic for Single-Peakedness of Item Responses Using Ordered Conditional Means
ERIC Educational Resources Information Center
Polak, Marike; De Rooij, Mark; Heiser, Willem J.
2012-01-01
In this article we propose a model-free diagnostic for single-peakedness (unimodality) of item responses. Presuming a unidimensional unfolding scale and a given item ordering, we approximate item response functions of all items based on ordered conditional means (OCM). The proposed OCM methodology is based on Thurstone & Chave's (1929) "criterion…
Severity of Organized Item Theft in Computerized Adaptive Testing: A Simulation Study
ERIC Educational Resources Information Center
Yi, Qing; Zhang, Jinming; Chang, Hua-Hua
2008-01-01
Criteria had been proposed for assessing the severity of possible test security violations for computerized tests with high-stakes outcomes. However, these criteria resulted from theoretical derivations that assumed uniformly randomized item selection. This study investigated potential damage caused by organized item theft in computerized adaptive…
Hardigan, Patrick C; Popovici, Ioana; Carvajal, Manuel J
2016-01-01
There is a gap between increasing demands from pharmacy journals, publishers, and reviewers for high survey response rates and the actual responses often obtained in the field by survey researchers. Presumably demands have been set high because response rates, times, and costs affect the validity and reliability of survey results. Explore the extent to which survey response rates, average response times, and economic costs are affected by conditions under which pharmacist workforce surveys are administered. A random sample of 7200 U.S. practicing pharmacists was selected. The sample was stratified by delivery method, questionnaire length, item placement, and gender of respondent for a total of 300 observations within each subgroup. A job satisfaction survey was administered during March-April 2012. Delivery method was the only classification showing significant differences in response rates and average response times. The postal mail procedure accounted for the highest response rates of completed surveys, but the email method exhibited the quickest turnaround. A hybrid approach, consisting of a combination of postal and electronic means, showed the least favorable results. Postal mail was 2.9 times more cost effective than the email approach and 4.6 times more cost effective than the hybrid approach. Researchers seeking to increase practicing pharmacists' survey participation and reduce response time and related costs can benefit from the analytical procedures tested here. Copyright © 2016 Elsevier Inc. All rights reserved.
Billington, D. Rex; Hsu, Patricia Hsien-Chuan; Feng, Xuan Joanna; Medvedev, Oleg N.; Kersten, Paula; Landon, Jason; Siegert, Richard J.
2016-01-01
The World Health Organisation Quality of Life (WHOQOL) questionnaires are widely used around the world and can claim strong cross-cultural validity due to their development in collaboration with international field centres. To enhance conceptual equivalence of quality of life across cultures, optional national items are often developed for use alongside the core instrument. The present study outlines the development of national items for the New Zealand WHOQOL-BREF. Focus groups with members of the community as well as health experts discussed what constitutes quality of life in their opinion. Based on themes extracted of aspects not contained in the existing WHOQOL instrument, 46 candidate items were generated and subsequently rated for their importance by a random sample of 585 individuals from the general population. Applying importance criteria reduced these items to 24, which were then sent to another large random sample (n = 808) to be rated alongside the existing WHOQOL-BREF. A final set of five items met the criteria for national items. Confirmatory factor analysis identified four national items as belonging to the psychological domain of quality of life, and one item to the social domain. Rasch analysis validated these results and generated ordinal-to-interval conversion algorithms to allow use of parametric statistics for domain scores with and without national items. PMID:27812203
Daly, Justine B; Campbell, Elizabeth M; Wiggers, John H; Considine, Robyn J
2002-06-01
This study aimed to determine the prevalence of responsible hospitality policies in a group of licensed premises associated with alcohol-related harm. During March 1999, 108 licensed premises with one or more police-identified alcohol-related incidents in the previous 3 months received a visit from a police officer. A 30-item audit checklist was used to determine the responsible hospitality policies being undertaken by each premises within eight policy domains: display required signage (three items); responsible host practices to prevent intoxication and under-age drinking (five items); written policies and guidelines for responsible service (three items); discouraging inappropriate promotions (three items); safe transport (two items); responsible management issues (seven items); physical environment (three items) and entry conditions (four items). No premises were undertaking all 30 items. Eighty per cent of the premises were undertaking 20 of the 30 items. All premises were undertaking at least 17 of the items. The proportion of premises undertaking individual items ranged from 16% to 100%. Premises were less likely to report having and providing written responsible hospitality documentation to staff, using door charges and having entry/re-entry rules. Significant differences between rural and urban premises were evident for four policies. Clubs were significantly more likely than hotels to have a written responsible service of alcohol policy and to clearly display codes of dress and conditions of entry. This study provides an indication of the extent and nature of responsible hospitality policies in a sample of licensed premises that are associated with a broad range of alcohol related harms. The finding that a large majority of such premises appear to adopt responsible hospitality policies suggests a need to assess the validity and reliability of tools used in the routine assessment of such policies, and of the potential for harm from licensed premises.
Covell, Christine L; Sidani, Souraya; Ritchie, Judith A
2012-06-01
The sequence used for collecting quantitative and qualitative data in concurrent mixed-methods research may influence participants' responses. Empirical evidence is needed to determine if the order of data collection in concurrent mixed methods research biases participants' responses to closed and open-ended questions. To examine the influence of the quantitative-qualitative sequence on responses to closed and open-ended questions when assessing the same variables or aspects of a phenomenon simultaneously within the same study phase. A descriptive cross-sectional, concurrent mixed-methods design was used to collect quantitative (survey) and qualitative (interview) data. The setting was a large multi-site health care centre in Canada. A convenience sample of 50 registered nurses was selected and participated in the study. Participants were randomly assigned to one of two sequences for data collection, quantitative-qualitative or qualitative-quantitative. Independent t-tests were performed to compare the two groups' responses to the survey items. Directed content analysis was used to compare the participants' responses to the interview questions. The sequence of data collection did not greatly affect the participants' responses to the closed-ended questions (survey items) or the open-ended questions (interview questions). The sequencing of data collection, when using both survey and semi-structured interviews, may not bias participants' responses to closed or open-ended questions. Additional research is required to confirm these findings. Copyright © 2011 Elsevier Ltd. All rights reserved.
Dykema, Jennifer; Stevenson, John; Kniss, Chad; Kvale, Katherine; González, Kim; Cautley, Eleanor
2012-05-01
From 2009 to 2010, an experiment was conducted to increase response rates among African American mothers in the Wisconsin Pregnancy Risk Assessment Monitoring System (PRAMS). Sample members were randomly assigned to groups that received a prepaid, cash incentive of $5 (n = 219); a coupon for diapers valued at $6 (n = 210); or no incentive (n = 209). Incentives were included with the questionnaire, which was mailed to respondents. We examined the effects of the incentives on several outcomes, including response rates, cost effectiveness, survey response distributions, and item nonresponse. Response rates were significantly higher for the cash group than for the coupon (42.5 vs. 32.4%, P < .05) or no incentive group (42.5 vs. 30.1%, P < .01); the coupon and no incentive groups performed similarly. While absolute costs were the highest for the cash group, the cost per completed survey was the lowest. The incentives had limited effects on response distributions for specific survey questions. Although respondents completing the survey by mail in the cash and coupon groups exhibited a trend toward being less likely to have missing data, the effect was not significant. Compared to a coupon or no incentive, a small cash incentive significantly improved response rates and was cost effective among African American respondents in Wisconsin PRAMS. Incentives had only limited effects, however, on survey response distributions, and no significant effects on item nonresponse.
Koriat, Asher; Sorka, Hila
2015-01-01
The classification of objects to natural categories exhibits cross-person consensus and within-person consistency, but also some degree of between-person variability and within-person instability. What is more, the variability in categorization is also not entirely random but discloses systematic patterns. In this study, we applied the Self-Consistency Model (SCM, Koriat, 2012) to category membership decisions, examining the possibility that confidence judgments and decision latency track the stable and variable components of categorization responses. The model assumes that category membership decisions are constructed on the fly depending on a small set of clues that are sampled from a commonly shared population of pertinent clues. The decision and confidence are based on the balance of evidence in favor of a positive or a negative response. The results confirmed several predictions derived from SCM. For each participant, consensual responses to items were more confident than non-consensual responses, and for each item, participants who made the consensual response tended to be more confident than those who made the nonconsensual response. The difference in confidence between consensual and nonconsensual responses increased with the proportion of participants who made the majority response for the item. A similar pattern was observed for response speed. The pattern of results obtained for cross-person consensus was replicated by the results for response consistency when the responses were classified in terms of within-person agreement across repeated presentations. These results accord with the sampling assumption of SCM, that confidence and response speed should be higher when the decision is consistent with what follows from the entire population of clues than when it deviates from it. 
Results also suggested that the context for classification can bias the sample of clues underlying the decision, and that confidence judgments mirror the effects of context on categorization decisions. The model and results offer a principled account of the stable and variable contributions to categorization behavior within a decision-making framework. Copyright © 2014 Elsevier B.V. All rights reserved.
Item Response Data Analysis Using Stata Item Response Theory Package
ERIC Educational Resources Information Center
Yang, Ji Seung; Zheng, Xiaying
2018-01-01
The purpose of this article is to introduce and review the capability and performance of the Stata item response theory (IRT) package that is available from Stata v.14, 2015. Using a simulated data set and a publicly available item response data set extracted from Programme of International Student Assessment, we review the IRT package from…
Item Response Models for Local Dependence among Multiple Ratings
ERIC Educational Resources Information Center
Wang, Wen-Chung; Su, Chi-Ming; Qiu, Xue-Lan
2014-01-01
Ratings given to the same item response may have a stronger correlation than those given to different item responses, especially when raters interact with one another before giving ratings. The rater bundle model was developed to account for such local dependence by forming multiple ratings given to an item response as a bundle and assigning…
Predictive validity of the Work Ability Index and its individual items in the general population.
Lundin, Andreas; Leijon, Ola; Vaez, Marjan; Hallgren, Mats; Torgén, Margareta
2017-06-01
This study assesses the predictive ability of the full Work Ability Index (WAI) as well as its individual items in the general population. The Work, Health and Retirement Study (WHRS) is a stratified random national sample of 25-75-year-olds living in Sweden in 2000 that received a postal questionnaire (n = 6637, response rate = 53%). Current and subsequent sickness absence was obtained from registers. The ability of the WAI to predict long-term sickness absence (LTSA; ⩾ 90 consecutive days) during a period of four years was analysed by logistic regression, from which the Area Under the Receiver Operating Characteristic curve (AUC) was computed. There were 313 incident LTSA cases among 1786 employed individuals. The full WAI had acceptable ability to predict LTSA during the 4-year follow-up (AUC = 0.79; 95% CI 0.76 to 0.82). Individual items were less stable in their predictive ability. However, three of the individual items (current work ability compared with lifetime best, estimated work impairment due to diseases, and number of diagnosed current diseases) exceeded an AUC of 0.70. Excluding the WAI item on number of days on sickness absence did not result in an inferior predictive ability of the WAI. The full WAI has acceptable predictive validity, and is superior to its individual items. For public health surveys, three items may be suitable proxies of the full WAI: current work ability compared with lifetime best, estimated work impairment due to diseases, and number of current diseases diagnosed by a physician.
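The AUC used in the abstract above can be read as the probability that a randomly chosen case (here, a person with later LTSA) receives a higher predicted risk than a randomly chosen non-case. The following is a minimal Python sketch of that pairwise definition; the function name `auc` and the toy scores are illustrative assumptions and are not taken from the study, which derived its predictions from logistic regression.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: predicted risks (any real numbers, higher = more likely a case).
    labels: 1 for a case, 0 for a non-case.
    Ties between a case and a non-case count as half a win.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    non_cases = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for c in cases:
        for n in non_cases:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cases) * len(non_cases))


if __name__ == "__main__":
    # Hypothetical predicted risks and outcomes, for illustration only.
    toy_scores = [0.9, 0.4, 0.35, 0.8, 0.1]
    toy_labels = [1, 0, 1, 1, 0]
    print(auc(toy_scores, toy_labels))
```

The pairwise form is quadratic in the sample size, which is acceptable for a sketch; rank-based formulas give the same value more efficiently on large samples.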
Item response theory - A first approach
NASA Astrophysics Data System (ADS)
Nunes, Sandra; Oliveira, Teresa; Oliveira, Amílcar
2017-07-01
Item Response Theory (IRT) has become one of the most popular scoring frameworks for measurement data, frequently used in computerized adaptive testing, cognitively diagnostic assessment and test equating. According to Andrade et al. (2000), IRT can be defined as a set of mathematical models (Item Response Models - IRM) constructed to represent the probability of an individual giving the right answer to an item of a particular test. The number of Item Response Models available for measurement analysis has increased considerably in the last fifteen years due to increasing computer power and due to a demand for accuracy and more meaningful inferences grounded in complex data. The developments in modeling with Item Response Theory were related to developments in estimation theory, most remarkably Bayesian estimation with Markov chain Monte Carlo algorithms (Patz & Junker, 1999). The popularity of Item Response Theory has also led to numerous overviews in books and journals, and many connections between IRT and other statistical estimation procedures, such as factor analysis and structural equation modeling, have been made repeatedly (Van der Linden & Hambleton, 1997). As stated before, Item Response Theory covers a variety of measurement models, ranging from basic one-dimensional models for dichotomously and polytomously scored items and their multidimensional analogues to models that incorporate information about cognitive sub-processes which influence the overall item response process. The aim of this work is to introduce the main concepts associated with one-dimensional models of Item Response Theory, to specify the logistic models with one, two and three parameters, to discuss some properties of these models and to present the main estimation procedures.
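The one-, two- and three-parameter logistic models mentioned in the abstract above are commonly written as a single function of the person's ability and the item's parameters. The sketch below uses the conventional textbook notation (ability `theta`, difficulty `b`, discrimination `a`, guessing parameter `c`); these symbols and the function name `irt_probability` are assumptions for illustration and are not quoted from the paper.

```python
import math


def irt_probability(theta, b, a=1.0, c=0.0):
    """Probability of a correct response under the logistic IRT models.

    theta: latent ability of the person.
    b:     item difficulty.
    a:     item discrimination (a = 1, c = 0 gives the one-parameter model).
    c:     lower asymptote or "guessing" parameter (c = 0 gives the
           two-parameter model; c > 0 gives the three-parameter model).
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


if __name__ == "__main__":
    # Illustrative parameter values only.
    for theta in (-2.0, 0.0, 2.0):
        p1 = irt_probability(theta, b=0.0)
        p2 = irt_probability(theta, b=0.0, a=1.5)
        p3 = irt_probability(theta, b=0.0, a=1.5, c=0.2)
        print(theta, round(p1, 3), round(p2, 3), round(p3, 3))
```

With `c = 0`, a person whose ability equals the item difficulty has a probability of exactly 0.5; a positive `c` raises that floor, which is how the three-parameter model accommodates guessing.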
Student Learning about Biomolecular Self-Assembly Using Two Different External Representations
Höst, Gunnar E.; Larsson, Caroline; Olson, Arthur; Tibell, Lena A. E.
2013-01-01
Self-assembly is the fundamental but counterintuitive principle that explains how ordered biomolecular complexes form spontaneously in the cell. This study investigated the impact of using two external representations of virus self-assembly, an interactive tangible three-dimensional model and a static two-dimensional image, on student learning about the process of self-assembly in a group exercise. A conceptual analysis of self-assembly into a set of facets was performed to support study design and analysis. Written responses were collected in a pretest/posttest experimental design with 32 Swedish university students. A quantitative analysis of close-ended items indicated that the students improved their scores between pretest and posttest, with no significant difference between the conditions (tangible model/image). A qualitative analysis of an open-ended item indicated students were unfamiliar with self-assembly prior to the study. Students in the tangible model condition used the facets of self-assembly in their open-ended posttest responses more frequently than students in the image condition. In particular, it appears that the dynamic properties of the tangible model may support student understanding of self-assembly in terms of the random and reversible nature of molecular interactions. A tentative difference was observed in response complexity, with more multifaceted responses in the tangible model condition. PMID:24006395
Psychosocial consequences of cancer cachexia: the development of an item bank.
Häne, Hanspeter; Oberholzer, Rolf; Walker, Jochen; Hopkinson, Jane B; de Wolf-Linder, Susanne; Strasser, Florian
2013-12-01
Cancer cachexia syndrome (CCS) is often accompanied by psychosocial consequences (PSC). To alleviate PSC, a systematic assessment method is required. Currently, few assessment tools are available (e.g., Functional Assessment of Anorexia/Cachexia Therapy). There is no systematic assessment tool that captures the PSC of CCS. To develop a pilot item bank to assess the PSC of CCS. A total of 132 questions, generated from patient answers in a previous study, were reduced to 121 items by content analysis and evaluation by multidisciplinary experts (doctor, nutritionists, and nurses). In our two-step, cross-sectional study, patients, judged by staff to have PSC of CCS, were included, and the questions were randomly allocated to the patients. Questions were evaluated for understandability and triggering emotions, and patients were asked to provide a response using a four-point Likert scale. Subsequently, problematic questions were revised, reformulated, and retested. A total of 20 patients with a variety of tumor types participated. Of the 121 questions, 31 had to be reformulated after Step 1 and were retested in Step 2, after which seven were again evaluated as not being perfectly comprehensible. In Step 1, 22 questions were found to trigger emotions, but no item required remodeling. Item performance using the Likert scale revealed no consistent floor or ceiling effects. Our final pilot question bank comprised 117 questions. The final item bank contains questions that are understood and accepted by the patients. This item bank now needs to be developed into a measurement tool that groups items into domains and can be used in future research studies. Copyright © 2013 U.S. Cancer Pain Relief Committee. Published by Elsevier Inc. All rights reserved.
A Multidimensional Ideal Point Item Response Theory Model for Binary Data
ERIC Educational Resources Information Center
Maydeu-Olivares, Albert; Hernandez, Adolfo; McDonald, Roderick P.
2006-01-01
We introduce a multidimensional item response theory (IRT) model for binary data based on a proximity response mechanism. Under the model, a respondent at the mode of the item response function (IRF) endorses the item with probability one. The mode of the IRF is the ideal point, or in the multidimensional case, an ideal hyperplane. The model…
Hourihan, Kathleen L; Tullis, Jonathan G
2015-08-01
Although it is well known that organized lists of words (e.g., categories) are recalled better than unrelated lists, little research has examined whether participants can predict how categorical relatedness influences recall. In two experiments, participants studied lists of words that included items from big categories (12 items), small categories (4 items), and unrelated items, and provided immediate JOLs. In Experiment 1, free recall was highest for items from large categories and lowest for unrelated items. Importantly, participants were sensitive to the effects of category size on recall, with JOLs to items from big categories actually increasing over the study list. In Experiment 2, one group of participants was cued to recall all exemplars from the categories in a blocked manner, whereas the other group was cued in a random order. As expected, the random group did not show the recall benefit for big categories over small categories observed in free recall, while the blocked group did. Critically, the pattern of metacognitive judgments closely matched actual cued recall performance. Participants' JOLs were sensitive to the interaction between category size and output order, demonstrating a relatively sophisticated strategy that incorporates the interaction of multiple extrinsic cues in predicting recall.
Toward a Principled Sampling Theory for Quasi-Orders
Ünlü, Ali; Schrepp, Martin
2016-01-01
Quasi-orders, that is, reflexive and transitive binary relations, have numerous applications. In educational theories, the dependencies of mastery among the problems of a test can be modeled by quasi-orders. Methods such as item tree or Boolean analysis that mine for quasi-orders in empirical data are sensitive to the underlying quasi-order structure. These data mining techniques have to be compared based on extensive simulation studies, with unbiased samples of randomly generated quasi-orders at their basis. In this paper, we develop techniques that can provide the required quasi-order samples. We introduce a discrete doubly inductive procedure for incrementally constructing the set of all quasi-orders on a finite item set. A randomization of this deterministic procedure allows us to generate representative samples of random quasi-orders. With an outer level inductive algorithm, we consider the uniform random extensions of the trace quasi-orders to higher dimension. This is combined with an inner level inductive algorithm to correct the extensions that violate the transitivity property. The inner level correction step entails sampling biases. We propose three algorithms for bias correction and investigate them in simulation. It is evident that, on even up to 50 items, the new algorithms create close to representative quasi-order samples within acceptable computing time. Hence, the principled approach is a significant improvement to existing methods that are used to draw quasi-orders uniformly at random but cannot cope with reasonably large item sets. PMID:27965601
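To make the definition in the abstract above concrete, the following Python sketch checks whether a relation on a finite item set is a quasi-order (reflexive and transitive) and builds one from arbitrary pairs by taking the reflexive-transitive closure. The function names are hypothetical, and the closure-based sampler at the end is a deliberately naive illustration: it is not the authors' doubly inductive procedure and makes no claim to draw quasi-orders uniformly.

```python
import random
from itertools import product


def is_quasi_order(relation, items):
    """Return True if `relation` (a set of pairs) is reflexive and transitive on `items`."""
    reflexive = all((x, x) in relation for x in items)
    transitive = all(
        (x, z) in relation
        for (x, y) in relation
        for (w, z) in relation
        if y == w
    )
    return reflexive and transitive


def reflexive_transitive_closure(pairs, items):
    """Smallest quasi-order on `items` containing `pairs` (Warshall-style closure).

    `items` must be a re-iterable sequence such as a list or range.
    """
    closure = set(pairs) | {(x, x) for x in items}
    for k in items:
        for i in items:
            if (i, k) in closure:
                for j in items:
                    if (k, j) in closure:
                        closure.add((i, j))
    return closure


def naive_random_quasi_order(items, p=0.3, rng=None):
    """Pick each off-diagonal pair with probability p, then close the relation.

    Assumption for illustration only: closing a random relation does NOT
    sample quasi-orders uniformly, because many random relations collapse
    to the same closure.
    """
    rng = rng or random.Random()
    pairs = {
        (x, y)
        for x, y in product(items, repeat=2)
        if x != y and rng.random() < p
    }
    return reflexive_transitive_closure(pairs, items)


if __name__ == "__main__":
    items = list(range(5))
    quasi_order = naive_random_quasi_order(items, p=0.2, rng=random.Random(0))
    print(is_quasi_order(quasi_order, items), len(quasi_order))
```

In the test-theory reading, a pair `(x, y)` can stand for "mastering item x implies mastering item y"; the closure step adds every dependency that follows by chaining such implications.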
ERIC Educational Resources Information Center
Bulut, Okan; Lei, Ming; Guo, Qi
2018-01-01
Item positions in educational assessments are often randomized across students to prevent cheating. However, if altering item positions results in any significant impact on students' performance, it may threaten the validity of test scores. Two widely used approaches for detecting position effects -- logistic regression and hierarchical…
Using Kernel Equating to Assess Item Order Effects on Test Scores
ERIC Educational Resources Information Center
Moses, Tim; Yang, Wen-Ling; Wilson, Christine
2007-01-01
This study explored the use of kernel equating for integrating and extending two procedures proposed for assessing item order effects in test forms that have been administered to randomly equivalent groups. When these procedures are used together, they can provide complementary information about the extent to which item order effects impact test…
A Two-Decision Model for Responses to Likert-Type Items
ERIC Educational Resources Information Center
Thissen-Roe, Anne; Thissen, David
2013-01-01
Extreme response set, the tendency to prefer the lowest or highest response option when confronted with a Likert-type response scale, can lead to misfit of item response models such as the generalized partial credit model. Recently, a series of intrinsically multidimensional item response models have been hypothesized, wherein tendency toward…
Boeschen Hospers, J Mirjam; Smits, Niels; Smits, Cas; Stam, Mariska; Terwee, Caroline B; Kramer, Sophia E
2016-04-01
We reevaluated the psychometric properties of the Amsterdam Inventory for Auditory Disability and Handicap (AIADH; Kramer, Kapteyn, Festen, & Tobi, 1995) using item response theory. Item response theory describes item functioning along an ability continuum. Cross-sectional data from 2,352 adults with and without hearing impairment, ages 18-70 years, were analyzed. They completed the AIADH in the web-based prospective cohort study "Netherlands Longitudinal Study on Hearing." A graded response model was fitted to the AIADH data. Category response curves, item information curves, and the standard error as a function of self-reported hearing ability were plotted. The graded response model showed a good fit. Item information curves were most reliable for adults who reported having hearing disability and less reliable for adults with normal hearing. The standard error plot showed that self-reported hearing ability is most reliably measured for adults reporting mild up to moderate hearing disability. This is one of the few item response theory studies on audiological self-reports. All AIADH items could be hierarchically placed on the self-reported hearing ability continuum, meaning they measure the same construct. This provides a promising basis for developing a clinically useful computerized adaptive test, where item selection adapts to the hearing ability of individuals, resulting in efficient assessment of hearing disability.
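The graded response model named in the abstract above can be sketched as follows. This is a minimal illustration of the usual Samejima formulation, assuming a logistic link, one discrimination parameter `a` per item, and increasing category thresholds; the function name and the parameter values are hypothetical and are not estimates from the AIADH study.

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Graded response model: return P(X = k | theta) for k = 0..len(thresholds).

    `thresholds` must be sorted in increasing order so that the cumulative
    probabilities are non-increasing in k.
    """
    # Cumulative probabilities P(X >= k): 1 for k = 0, logistic curves for the
    # inner boundaries, and 0 beyond the highest category.
    cum = [1.0]
    cum += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cum += [0.0]
    # Category probabilities are differences of adjacent cumulative probabilities.
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]


# Hypothetical four-category item evaluated at an average ability level.
probs = grm_category_probs(theta=0.0, a=1.5, thresholds=[-1.0, 0.0, 1.0])
```

Plotting these probabilities over a grid of `theta` values gives category response curves of the kind the abstract mentions.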
Raykov, Tenko; Marcoulides, George A
2016-04-01
The frequently neglected and often misunderstood relationship between classical test theory and item response theory is discussed for the unidimensional case with binary measures and no guessing. It is pointed out that popular item response models can be directly obtained from classical test theory-based models by accounting for the discrete nature of the observed items. Two distinct observational equivalence approaches are outlined that render the item response models from corresponding classical test theory-based models, and can each be used to obtain the former from the latter models. Similarly, classical test theory models can be furnished using the reverse application of either of those approaches from corresponding item response models.
Advertising influences on young children's food choices and parental influence.
Ferguson, Christopher J; Muñoz, Monica E; Medrano, Maria R
2012-03-01
To evaluate whether advertising for food influences choices made by children, the strength of these influences, and whether they might be easily undone by parental influences. Children between 3 and 8 years of age (n=75) were randomized to watch a series of programs with embedded commercials. Some children watched a commercial for a relatively healthy food item, the other children watched a commercial for a less healthy item, both from the same fast-food company. Children were also randomized either to receive parental encouragement to choose the healthy item or to choose whichever item they preferred. Results indicated that children were more likely to choose the advertised item, despite parental input. Parental input only slightly moderated this influence. Although advertising impact on children's food choices is moderate in size, it appears resilient to parental efforts to intervene. Food advertisements directed at children may have a small but meaningful effect on their healthy food choices. Copyright © 2012 Mosby, Inc. All rights reserved.
The development and exploratory analysis of the Back Pain Attitudes Questionnaire (Back-PAQ).
Darlow, Ben; Perry, Meredith; Mathieson, Fiona; Stanley, James; Melloh, Markus; Marsh, Reginald; Baxter, G David; Dowell, Anthony
2014-05-23
To develop an instrument to assess attitudes and underlying beliefs about back pain, and subsequently investigate its internal consistency and underlying structures. The instrument was developed by a multidisciplinary team of clinicians and researchers based on analysis of qualitative interviews with people experiencing acute and chronic back pain. Exploratory analysis was conducted using data from a population-based cross-sectional survey. Qualitative interviews with community-based participants and subsequent postal survey. Instrument development informed by interviews with 12 participants with acute back pain and 11 participants with chronic back pain. Data for exploratory analysis collected from New Zealand residents and citizens aged 18 years and above. 1000 participants were randomly selected from the New Zealand Electoral Roll. 602 valid responses were received. The 34-item Back Pain Attitudes Questionnaire (Back-PAQ) was developed. Internal consistency was evaluated by the Cronbach α coefficient. Exploratory analysis investigated the structure of the data using Principal Component Analysis. The 34-item long form of the scale had acceptable internal consistency (α=0.70; 95% CI 0.66 to 0.73). Exploratory analysis identified five two-item principal components which accounted for 74% of the variance in the reduced data set: 'vulnerability of the back'; 'relationship between back pain and injury'; 'activity participation while experiencing back pain'; 'prognosis of back pain' and 'psychological influences on recovery'. Internal consistency was acceptable for the reduced 10-item scale (α=0.61; 95% CI 0.56 to 0.66) and the identified components (α between 0.50 and 0.78). The 34-item long form of the scale may be appropriate for use in future cross-sectional studies. The 10-item short form may be appropriate for use as a screening tool, or an outcome assessment instrument. 
Further testing of the 10-item Back-PAQ's construct validity, reliability, responsiveness to change and predictive ability needs to be conducted.
[Instrument to measure adherence in hypertensive patients: contribution of Item Response Theory].
Rodrigues, Malvina Thaís Pacheco; Moreira, Thereza Maria Magalhaes; Vasconcelos, Alexandre Meira de; Andrade, Dalton Francisco de; Silva, Daniele Braz da; Barbetta, Pedro Alberto
2013-06-01
To analyze, by means of "Item Response Theory", an instrument to measure adherence to treatment for hypertension. Analytical study with 406 hypertensive patients with associated complications seen in primary care in Fortaleza, CE, Northeastern Brazil, 2011 using "Item Response Theory". The stages were: dimensionality test, calibrating the items, processing data and creating a scale, analyzed using the gradual response model. A study of the dimensionality of the instrument was conducted by analyzing the polychoric correlation matrix and factor analysis of complete information. Multilog software was used to calibrate items and estimate the scores. Items relating to drug therapy are the most directly related to adherence while those relating to drug-free therapy need to be reworked because they have less psychometric information and low discrimination. The independence of items, the small number of levels in the scale and low explained variance in the adjustment of the models show the main weaknesses of the instrument analyzed. The "Item Response Theory" proved to be a relevant analysis technique because it evaluated respondents for adherence to treatment for hypertension, the level of difficulty of the items and their ability to discriminate between individuals with different levels of adherence, which generates a greater amount of information. The instrument analyzed is limited in measuring adherence to hypertension treatment, by analyzing the "Item Response Theory" of the item, and needs adjustment. The proper formulation of the items is important in order to accurately measure the desired latent trait.
The Consequences of Ignoring Item Parameter Drift in Longitudinal Item Response Models
ERIC Educational Resources Information Center
Lee, Wooyeol; Cho, Sun-Joo
2017-01-01
Utilizing a longitudinal item response model, this study investigated the effect of item parameter drift (IPD) on item parameters and person scores via a Monte Carlo study. Item parameter recovery was investigated for various IPD patterns in terms of bias and root mean-square error (RMSE), and percentage of time the 95% confidence interval covered…
ERIC Educational Resources Information Center
Tay, Louis; Vermunt, Jeroen K.; Wang, Chun
2013-01-01
We evaluate the item response theory with covariates (IRT-C) procedure for assessing differential item functioning (DIF) without preknowledge of anchor items (Tay, Newman, & Vermunt, 2011). This procedure begins with a fully constrained baseline model, and candidate items are tested for uniform and/or nonuniform DIF using the Wald statistic.…
On Multidimensional Item Response Theory: A Coordinate-Free Approach. Research Report. ETS RR-07-30
ERIC Educational Resources Information Center
Antal, Tamás
2007-01-01
A coordinate-free definition of complex-structure multidimensional item response theory (MIRT) for dichotomously scored items is presented. The point of view taken emphasizes the possibilities and subtleties of understanding MIRT as a multidimensional extension of the classical unidimensional item response theory models. The main theorem of the…
ERIC Educational Resources Information Center
Missouri State Dept. of Elementary and Secondary Education, Jefferson City.
This document presents 10 released items from the Health/Physical Education Missouri Assessment Program (MAP) test given in the spring of 2000 to fifth graders. Items from the test sessions include: selected-response (multiple choice), constructed-response, and a performance event. The selected-response items consist of individual questions…
ERIC Educational Resources Information Center
Hospers, J. Mirjam Boeschen; Smits, Niels; Smits, Cas; Stam, Mariska; Terwee, Caroline B.; Kramer, Sophia E.
2016-01-01
Purpose: We reevaluated the psychometric properties of the Amsterdam Inventory for Auditory Disability and Handicap (AIADH; Kramer, Kapteyn, Festen, & Tobi, 1995) using item response theory. Item response theory describes item functioning along an ability continuum. Method: Cross-sectional data from 2,352 adults with and without hearing…
ERIC Educational Resources Information Center
Bennett, Randy Elliot; And Others
1990-01-01
The relationship of an expert-system-scored constrained free-response item type to multiple-choice and free-response items was studied using data for 614 students on the College Board's Advanced Placement Computer Science (APCS) Examination. Implications for testing and the APCS test are discussed. (SLD)
Perceived freedom-responsibility covariation among Cypriot adolescents.
Frangou, Georgia; Wilkerson, Keith; McGahan, Joseph R
2008-04-01
Participants were 67 Cypriot adolescents who responded to propositions regarding positive, negative, and noncontingent relations between freedom and responsibility. The authors framed items so that half dealt with freedom given responsibility, and the other half dealt with responsibility given freedom. Results indicated participants were more likely to endorse positive-contingency items than they were negative and noncontingency items when items were framed around freedom given responsibility. However, when items were framed around responsibility given freedom, no such differences emerged. The authors discuss results relative to cultural and sociopolitical differences and similarities between children in Cyprus and participants in the United States and implications concerning the present study and previous studies regarding these constructs.
Devanand, D P; Nobler, Mitchell S; Cheng, Jocelyn; Turret, Nancy; Pelton, Gregory H; Roose, Steven P; Sackeim, Harold A
2005-01-01
The authors compared the efficacy and side effects of fluoxetine and placebo in elderly outpatients with dysthymic disorder. Patients were randomly assigned to fluoxetine (20 mg-60 mg/day) or placebo for 12 weeks in a double-blind trial. Of 90 randomized patients, 71 completed the trial. In the intent-to-treat sample, random regression analyses of the Hamilton Rating Scale for Depression (Ham-D; 24-item) and Cornell Dysthymia Rating Scale (CDRS) scores at each visit produced significant time x treatment group interactions favoring the fluoxetine group. Analysis of percentage change in Ham-D scores yielded no effect for treatment group, but a similar analysis of percentage change in CDRS scores yielded a main effect for treatment group, favoring fluoxetine over placebo. In the intent-to-treat sample, response rates were 27.3% for fluoxetine and 19.6% for placebo. In the completer sample, response rates were 37.5% for fluoxetine and 23.1% for placebo. Fluoxetine had limited efficacy in elderly dysthymic patients. The clinical features of elderly dysthymic patients are typically distinct from those of dysthymic disorder in young adults, and the findings suggest that treatments effective for young adult dysthymic patients may not be as useful in elderly dysthymic patients. Further research is needed to identify efficacious treatments for elderly patients with dysthymic disorder, and investigative tools such as electronic/computerized brain scans and neuropsychological testing may help identify the factors that moderate antidepressant treatment response and resistance.
Barbieri, Antoine; Anota, Amélie; Conroy, Thierry; Gourgou-Bourgade, Sophie; Juzyna, Beata; Bonnetain, Franck; Lavergne, Christian; Bascoul-Mollevi, Caroline
2016-07-01
A new longitudinal statistical approach was compared to the classical methods currently used to analyze health-related quality-of-life (HRQoL) data. The comparison was made using data in patients with metastatic pancreatic cancer. Three hundred forty-two patients from the PRODIGE4/ACCORD 11 study were randomly assigned to FOLFIRINOX versus gemcitabine regimens. HRQoL was evaluated using the European Organization for Research and Treatment of Cancer (EORTC) QLQ-C30. The classical analysis uses a linear mixed model (LMM), considering an HRQoL score as a good representation of the true value of the HRQoL, following EORTC recommendations. In contrast, built on the item response theory (IRT), our approach considered HRQoL as a latent variable directly estimated from the raw data. For polytomous items, we extended the partial credit model to a longitudinal analysis (longitudinal partial credit model [LPCM]), thereby modeling the latent trait as a function of time and other covariates. Both models gave the same conclusions on 11 of 15 HRQoL dimensions. HRQoL evolution was similar between the 2 treatment arms, except for the symptoms of pain. Indeed, regarding the LPCM, pain perception was significantly less important in the FOLFIRINOX arm than in the gemcitabine arm. For most of the scales, HRQoL changes over time, and no difference was found between treatments in terms of HRQoL. The use of LMM to study the HRQoL score does not seem appropriate. It is an easy-to-use model, but its basic statistical assumptions are not met. Our IRT model may be more complex but shows the same qualities and gives similar results. It has the additional advantage of being more precise and suitable because of its direct use of raw data. © The Author(s) 2015.
Williams, L M; Debattista, C; Duchemin, A-M; Schatzberg, A F; Nemeroff, C B
2016-05-03
Few reliable predictors indicate which depressed individuals respond to antidepressants. Several studies suggest that a history of early-life trauma predicts poorer response to antidepressant therapy but results are variable and limited in adults. The major goal of the present study was to evaluate the role of early-life trauma in predicting acute response outcomes to antidepressants in a large sample of well-characterized patients with major depressive disorder (MDD). The international Study to Predict Optimized Treatment for Depression (iSPOT-D) is a randomized clinical trial with enrollment from December 2008 to January 2012 at eight academic and nine private clinical settings in five countries. Patients (n=1008) meeting DSM-IV criteria for MDD and 336 matched healthy controls comprised the study sample. Six participants withdrew due to serious adverse events. Randomization was to 8 weeks of treatment with escitalopram, sertraline or venlafaxine with dosage adjusted by the participant's treating clinician per routine clinical practice. Exposure to 18 types of traumatic events before the age of 18 was assessed using the Early-Life Stress Questionnaire. Impact of early-life stressors-overall trauma 'load' and specific type of abuse-on treatment outcomes measures: response: (⩾50% improvement on the 17-item Hamilton Rating Scale for Depression, HRSD17 or on the 16-item Quick Inventory of Depressive Symptomatology-Self-Rated, QIDS_SR16) and remission (score ⩽7 on the HRSD17 and ⩽5 on the QIDS_SR16). Trauma prevalence in MDD was compared with controls. Depressed participants were significantly more likely to report early-life stress than controls; 62.5% of MDD participants reported more than two traumatic events compared with 28.4% of controls. The higher rate of early-life trauma was most apparent for experiences of interpersonal violation (emotional, sexual and physical abuses). 
Abuse and notably abuse occurring at ⩽7 years of age predicted poorer outcomes after 8 weeks of antidepressants, across the three treatment arms. In addition, the abuses occurring between ages 4 and 7 years differentially predicted the poorest outcome following the treatment with sertraline. Specific types of early-life trauma, particularly physical, emotional and sexual abuse, especially when occurring at ⩽7 years of age are important moderators of subsequent response to antidepressant therapy for MDD. PMID:27138798
Estimating Skin Cancer Risk: Evaluating Mobile Computer-Adaptive Testing.
Djaja, Ngadiman; Janda, Monika; Olsen, Catherine M; Whiteman, David C; Chien, Tsair-Wei
2016-01-22
Response burden is a major detriment to questionnaire completion rates. Computer adaptive testing may offer advantages over non-adaptive testing, including reduction of numbers of items required for precise measurement. Our aim was to compare the efficiency of non-adaptive (NAT) and computer adaptive testing (CAT) facilitated by Partial Credit Model (PCM)-derived calibration to estimate skin cancer risk. We used a random sample from a population-based Australian cohort study of skin cancer risk (N=43,794). All 30 items of the skin cancer risk scale were calibrated with the Rasch PCM. A total of 1000 cases generated following a normal distribution (mean [SD] 0 [1]) were simulated using three Rasch models with three fixed-item (dichotomous, rating scale, and partial credit) scenarios, respectively. We calculated the comparative efficiency and precision of CAT and NAT (shortening of questionnaire length and the count difference number ratio less than 5% using independent t tests). We found that use of CAT led to smaller person standard error of the estimated measure than NAT, with substantially higher efficiency but no loss of precision, reducing response burden by 48%, 66%, and 66% for dichotomous, Rating Scale Model, and PCM models, respectively. CAT-based administrations of the skin cancer risk scale could substantially reduce participant burden without compromising measurement precision. A mobile computer adaptive test was developed to help people efficiently assess their skin cancer risk.
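The Rasch partial credit model used for calibration in the abstract above assigns category probabilities from a person measure and a set of item step difficulties. The sketch below shows only that probability calculation, assuming the standard PCM form; it does not reproduce the study's calibration or its adaptive item selection, and the function name and parameter values are hypothetical.

```python
import math


def pcm_category_probs(theta, step_difficulties):
    """Partial credit model: return P(X = k | theta) for k = 0..m, where
    m = len(step_difficulties)."""
    # Exponent for category k is the running sum of (theta - delta_j), j = 1..k;
    # category 0 has exponent 0 by convention.
    exponents = [0.0]
    for delta in step_difficulties:
        exponents.append(exponents[-1] + (theta - delta))
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(exponents)
    weights = [math.exp(e - top) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical three-category item (two steps) for a person measure of 0.5.
probs = pcm_category_probs(theta=0.5, step_difficulties=[-0.5, 1.0])
```

With a single step difficulty the function reduces to the dichotomous Rasch model, which is one way to sanity-check it.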
The Development and Validation of the Indian Family Violence and Control Scale
Kalokhe, Ameeta S.; Stephenson, Rob; Kelley, Mary E.; Dunkle, Kristin L.; Paranjape, Anuradha; Solas, Vikram; Karve, Latika; del Rio, Carlos; Sahay, Seema
2016-01-01
The high prevalence of domestic violence (DV) among married women in India and associated negative health repercussions highlight the need for effective prevention strategies and tools to measure the efficacy of such interventions. Literature supporting differing manifestations of DV by culture underscores the need for a culturally-tailored scale to more effectively measure DV in the Indian context. We therefore aimed to develop and validate such a tool, the Indian Family Violence and Control Scale (IFVCS), through a mixed-methods study. The psychometric development of IFVCS is herein discussed. After field pre-testing and expert review, a 63-item questionnaire was administered to a random sample of 630 married women from May-July 2013 in Pune, India. The item response theory approach for binary data to explore the IFVCS structure suggested that IFVCS is reliable, with the majority of items having high (>0.5) and significant factor loadings. Concurrent validity, assessed by comparing responses to IFVCS with the validated, abridged Conflict Tactics Scale-2, was high (r = 0.899, p<0.001) as was the construct validity, demonstrated by its significant association with several established DV correlates. Therefore, initial assessment of the IFVCS psychometric properties suggests that it is an effective tool for measuring DV among married women in India and speaks to its capacity for enhancing understanding of DV epidemiology and for evaluating the effectiveness of future DV interventions. PMID:26824611
Middle school students' reading comprehension of mathematical texts and algebraic equations
NASA Astrophysics Data System (ADS)
Duru, Adem; Koklu, Onder
2011-06-01
In this study, middle school students' abilities to translate mathematical texts into algebraic representations and vice versa were investigated. In addition, students' difficulties in making such translations and the potential sources for these difficulties were also explored. Both qualitative and quantitative methods were used to collect data for this study: questionnaire and clinical interviews. The questionnaire consisted of two general types of items: (1) selected-response (multiple-choice) items for which the respondent selects from multiple options and (2) open-ended items for which the respondent constructs a response. In order to further investigate the students' strategies while they were translating the given mathematical texts to algebraic equations and vice versa, five randomly chosen (n = 5) students were interviewed. Data were collected in the 2007-2008 school year from 185 middle-school students in five teachers' classrooms in three different schools in the city of Adıyaman, Turkey. After the analysis of data, it was found that students who participated in this study had difficulties in translating the mathematical texts into algebraic equations by using symbols. It was also observed that these students had difficulties in translating the symbolic representations into mathematical texts because of their weak reading comprehension. In addition, the findings of this research revealed that students' difficulties in translating the given mathematical texts into symbolic representations or vice versa come from different sources.
Scaling sexual behavior or "sexual risk propensity" among men at risk for HIV in Kisumu, Kenya.
Mattson, C L; Campbell, Richard T; Karabatsos, George; Agot, Kawango; Ndinya-Achola, J O; Moses, Stephen; Bailey, Robert C
2010-02-01
We present a scale to measure sexual risk behavior or "sexual risk propensity" to evaluate risk compensation among men engaged in a randomized clinical trial of male circumcision. This statistical approach can be used to represent each respondent's level of sexual risk behavior as the sum of his responses on multiple dichotomous and rating scale (i.e. ordinal) items. This summary "score" can be used to summarize information on many sexual behaviors or to evaluate changes in sexual behavior with respect to an intervention. Our 18-item scale demonstrated very good reliability (Cronbach's alpha of 0.87) and produced a logical, unidimensional continuum to represent sexual risk behavior. We found no evidence of differential item functioning at different time points (except for reporting concurrent partners when comparing 6 and 12 month follow-up visits) or with respect to the language with which the instrument was administered. Further, we established criterion validity by demonstrating a statistically significant association between the risk scale and the acquisition of incident sexually transmitted infections (STIs) at the 6 month follow-up and HIV at the 12 month follow-up visits. This method has broad applicability to evaluate sexual risk behavior in the context of other HIV and STI prevention interventions (e.g. microbicide or vaccine trials), or in response to treatment provision (e.g., anti-retroviral therapy).
ERIC Educational Resources Information Center
Pohl, Steffi; Gräfe, Linda; Rose, Norman
2014-01-01
Data from competence tests usually show a number of missing responses on test items due to both omitted and not-reached items. Different approaches for dealing with missing responses exist, and there are no clear guidelines on which of those to use. While classical approaches rely on an ignorable missing data mechanism, the most recently developed…
Multiple-Choice and Short-Answer Exam Performance in a College Classroom
ERIC Educational Resources Information Center
Funk, Steven C.; Dickson, K. Laurie
2011-01-01
The authors experimentally investigated the effects of multiple-choice and short-answer format exam items on exam performance in a college classroom. They randomly assigned 50 students to take a 10-item short-answer pretest or posttest on two 50-item multiple-choice exams in an introduction to personality course. Students performed significantly…
Detecting a Gender-Related Differential Item Functioning Using Transformed Item Difficulty
ERIC Educational Resources Information Center
Abedalaziz, Nabeel; Leng, Chin Hai; Alahmadi, Ahlam
2014-01-01
The purpose of the study was to examine gender differences in performance on a multiple-choice mathematical ability test, administered within the context of a high school graduation test that was designed to match the eleventh grade curriculum. The transformed item difficulty (TID) method was used to detect gender-related DIF. A random sample of 1400 eleventh…
Portrayal of Depression and Other Mental Illnesses in Australian Nonfiction Media
ERIC Educational Resources Information Center
Francis, Catherine; Pirkis, Jane; Blood, R. Warwick; Dunt, David; Burgess, Philip; Morley, Belinda; Stewart, Andrew
2005-01-01
This study describes Australian media portrayal of mental illnesses, focusing on depression. A random sample of 1,123 items was selected for analysis from a pool of 13,389 nonfictional media items about mental illness collected between March 2000 and February 2001. Depression was portrayed more frequently than other mental illnesses. Items about…
The Prediction of Item Parameters Based on Classical Test Theory and Latent Trait Theory
ERIC Educational Resources Information Center
Anil, Duygu
2008-01-01
In this study, the predictive power of experts' predictions of item characteristics, in conditions where try-out practices cannot be applied, was examined for item characteristics computed according to classical test theory and the two-parameter logistic model of latent trait theory. The study was carried out on 9914 randomly selected students…
Short Form of the Developmental Behaviour Checklist
ERIC Educational Resources Information Center
Taffe, John R.; Gray, Kylie M.; Einfeld, Stewart L.; Dekker, Marielle C.; Koot, Hans M.; Emerson, Eric; Koskentausta, Terhi; Tonge, Bruce J.
2007-01-01
A 24-item short form of the 96-item Developmental Behaviour Checklist was developed to provide a brief measure of Total Behaviour Problem Score for research purposes. The short form Developmental Behaviour Checklist (DBC-P24) was chosen for low bias and high precision from among 100 randomly selected item sets. The DBC-P24 was developed from…
Daher, Aqil Mohammad; Ahmad, Syed Hassan; Winn, Than; Selamat, Mohd Ikhsan
2015-01-01
Few studies have employed item response theory in examining reliability. We conducted this study to examine the effect of Rating Scale Categories (RSCs) on the reliability and fit statistics of the Malay Spiritual Well-Being Scale, employing the Rasch model. The Malay Spiritual Well-Being Scale (SWBS), with the original six RSCs and with newly structured three and four RSCs, was distributed randomly among three different samples of 50 participants each. The mean age of respondents in the three samples ranged between 36 and 39 years old. The majority was female in all samples, and Islam was the most prevalent religion among the respondents. The predominating race was Malay, followed by Chinese and Indian. The original six RSCs indicated better targeting of 0.99 and smallest model error of 0.24. The Infit Mnsq (mean square) and Zstd (Z standard) of the six RSCs were "1.1" and "-0.1", respectively. The six RSCs achieved the highest person and item reliabilities of 0.86 and 0.85, respectively. These reliabilities yielded the highest person (2.46) and item (2.38) separation indices compared to the other RSCs. The person and item reliability and, to a lesser extent, the fit statistics, were better with the six RSCs compared to the four and three RSCs.
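The link between the reliabilities and separation indices reported above can be checked with a short sketch. It assumes the standard Rasch-measurement relation G = sqrt(R / (1 - R)); the abstract does not state which formula or software was used, and the function name `separation_index` is hypothetical.

```python
import math


def separation_index(reliability):
    """Separation index G implied by a reliability R, assuming
    the standard relation G = sqrt(R / (1 - R))."""
    return math.sqrt(reliability / (1.0 - reliability))


# Reliabilities as reported (rounded to two decimals) in the abstract.
item_separation = separation_index(0.85)
person_separation = separation_index(0.86)
```

Under this assumption the item reliability of 0.85 reproduces the reported item separation of 2.38. The person reliability of 0.86 gives roughly 2.48 rather than the reported 2.46, a gap that is plausibly due to the reliability having been rounded before publication.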
Rasch analysis of the Rosenberg Self-Esteem Scale with African Americans.
Chao, Ruth Chu-Lien; Vidacovich, Courtney; Green, Kathy E
2017-03-01
Effectively diagnosing African Americans' self-esteem has posed an unresolved challenge. To address this assessment issue, we conducted exploratory factor analysis and Rasch analysis to assess the psychometric characteristics of the Rosenberg Self-Esteem Scale (RSES, Rosenberg, 1965) for African American college students. The dimensional structure of the RSES was first identified with the first subsample (i.e., calibration subsample) and then held up under cross-validation with a second subsample (i.e., validation subsample). Exploratory factor analysis and Rasch analysis both supported unidimensionality of the measure, with that finding replicated for a random split of the sample. Response scale use was generally appropriate, items were endorsed at a high level reflecting high levels of self-esteem, and person separation and reliability of person separation were adequate, and reflected results similar to those found in prior research. However, as some categories were infrequently used, we also collapsed scale points and found a slight improvement in scale and item indices. No differential item functioning was found by sex or having received professional assistance versus not; there were no mean score differences by age group, marital status, or year in college. Two items were seen as problematic. Implications for theory and research on multicultural mental health are discussed. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Dimensions of the South Oaks Gambling Screen in Finland: A cross-sectional population study.
Salonen, Anne H; Rosenström, Tom; Edgren, Robert; Volberg, Rachel; Alho, Hannu; Castrén, Sari
2017-06-01
The underlying structure of problematic gambling behaviors, such as those assessed by the South Oaks Gambling Screen (SOGS), remains unknown: Can problem gambling be assessed unidimensionally or should multiple qualitatively different dimensions be taken into account, and if so, what do these qualitative dimensions indicate? How significant are the deviations from unidimensionality in practice? A cross-sectional random sample of Finns aged 15-74 (n = 4,484) was drawn from the Population Information Registry and surveyed in 2011-2012. Analyses were conducted using descriptive statistics, confirmatory factor analysis (CFA) and multidimensional item response theory (MIRT) models. Altogether, 14.9% of the population endorsed at least one of the 20 SOGS items, but nine items had low endorsement rates (≤ 0.2%). CFA and MIRT techniques suggested that individuals differed from each other in two positively correlated (r = 0.70) underlying dimensions: "impact on self primarily" and "impact on others also". This two-factor correlated-factors model can be reinterpreted as a bifactor model with one general gambling-problem factor and two specific factors with similar interpretation as in the correlated-factors model but with non-overlapping items. The two specific factors may provide clinically useful information without extra costs of assessment. © 2017 Scandinavian Psychological Associations and John Wiley & Sons Ltd.
Risk of lymphoma subtypes and dietary habits in a Mediterranean area.
Campagna, Marcello; Cocco, Pierluigi; Zucca, Mariagrazia; Angelucci, Emanuele; Gabbas, Attilio; Latte, Gian Carlo; Uras, Antonella; Rais, Marco; Sanna, Sonia; Ennas, Maria Grazia
2015-12-01
Previous studies have suggested that diet might affect risk of lymphoma subtypes. We investigated risk of lymphoma and its major subtypes associated with diet in the Mediterranean island of Sardinia, Italy. In 1998-2004, 322 incident lymphoma cases and 446 randomly selected population controls participated in a case-control study on lymphoma etiology in central-southern Sardinia. Questionnaire interviews included frequency of intake of 112 food items. Risk associated with individual dietary items and groups thereof was explored by unconditional and polytomous logistic regression analysis, adjusting for age, gender and education. We observed an upward trend in risk of lymphoma (all subtypes combined) and B-cell lymphoma with frequency of intake of well done grilled/roasted chicken (p for trend=0.01) and pizza (p for trend=0.047). Neither adherence to Mediterranean diet nor a frequent intake of its individual components conveyed protection. We detected heterogeneity in risk associated with several food items and groups thereof by lymphoma subtypes, although we could not rule out chance as responsible for the observed direct or inverse associations. Adherence to a Mediterranean diet does not seem to convey protection against the development of lymphoma. The association with specific food items might vary by lymphoma subtype. Copyright © 2015 Elsevier Ltd. All rights reserved.
Drake, Keith M; Hargraves, J Lee; Lloyd, Stephanie; Gallagher, Patricia M; Cleary, Paul D
2014-01-01
Objective To examine how different response scales, methods of survey administration, and survey format affect responses to the CAHPS (Consumer Assessment of Healthcare Providers and Systems) Clinician and Group (CG-CAHPS) survey. Study Design A total of 6,500 patients from a university health center were randomly assigned to receive the following: standard 12-page mail surveys using 4-category or 6-category response scales (on CG-CAHPS composite items), telephone surveys using 4-category or 6-category response scales, or four-page mail surveys. Principal Findings A total of 3,538 patients completed surveys. Composite score means and provider-level reliabilities did not differ between respondents receiving 4-category or 6-category response scale surveys or between 12-page and four-page mail surveys. Telephone respondents gave more positive responses than mail respondents. Conclusions We recommend using 4-category response scales and the four-page mail CG-CAHPS survey. PMID:24471975
Drake, Keith M; Hargraves, J Lee; Lloyd, Stephanie; Gallagher, Patricia M; Cleary, Paul D
2014-08-01
To examine how different response scales, methods of survey administration, and survey format affect responses to the CAHPS (Consumer Assessment of Healthcare Providers and Systems) Clinician and Group (CG-CAHPS) survey. A total of 6,500 patients from a university health center were randomly assigned to receive the following: standard 12-page mail surveys using 4-category or 6-category response scales (on CG-CAHPS composite items), telephone surveys using 4-category or 6-category response scales, or four-page mail surveys. A total of 3,538 patients completed surveys. Composite score means and provider-level reliabilities did not differ between respondents receiving 4-category or 6-category response scale surveys or between 12-page and four-page mail surveys. Telephone respondents gave more positive responses than mail respondents. We recommend using 4-category response scales and the four-page mail CG-CAHPS survey. © Health Research and Educational Trust.
ERIC Educational Resources Information Center
DeMars, Christine E.
2012-01-01
In structural equation modeling software, either limited-information (bivariate proportions) or full-information item parameter estimation routines could be used for the 2-parameter item response theory (IRT) model. Limited-information methods assume the continuous variable underlying an item response is normally distributed. For skewed and…
Estimation of Item Response Theory Parameters in the Presence of Missing Data
ERIC Educational Resources Information Center
Finch, Holmes
2008-01-01
Missing data are a common problem in a variety of measurement settings, including responses to items on both cognitive and affective assessments. Researchers have shown that such missing data may create problems in the estimation of item difficulty parameters in the Item Response Theory (IRT) context, particularly if they are ignored. At the same…
Examination of Different Item Response Theory Models on Tests Composed of Testlets
ERIC Educational Resources Information Center
Kogar, Esin Yilmaz; Kelecioglu, Hülya
2017-01-01
The purpose of this research is to first estimate the item and ability parameters and the standard error values related to those parameters obtained from Unidimensional Item Response Theory (UIRT), bifactor (BIF) and Testlet Response Theory models (TRT) in the tests including testlets, when the number of testlets, number of independent items, and…
A Semiparametric Model for Jointly Analyzing Response Times and Accuracy in Computerized Testing
ERIC Educational Resources Information Center
Wang, Chun; Fan, Zhewen; Chang, Hua-Hua; Douglas, Jeffrey A.
2013-01-01
The item response times (RTs) collected from computerized testing represent an underutilized type of information about items and examinees. In addition to knowing the examinees' responses to each item, we can investigate the amount of time examinees spend on each item. Current models for RTs mainly focus on parametric models, which have the…
ERIC Educational Resources Information Center
Missouri State Dept. of Elementary and Secondary Education, Jefferson City.
This document presents 10 released items from the Health/Physical Education Missouri Assessment Program (MAP) test given in the spring of 2000 to ninth graders. Items from the test sessions include: selected-response (multiple choice), constructed-response, and a performance event. The selected-response items consist of individual questions…
Bi-dimensional acculturation and cultural response set in CES-D among Korean immigrants
Kim, Eunjung; Seo, Kumin; Cain, Kevin C.
2017-01-01
This study examined a cultural response set to positive affect items and depressive symptom items in the CES-D among 172 Korean immigrants. A bi-dimensional acculturation approach, which considers maintenance of Korean Orientation and adoption of American Orientation, was utilized. As Korean immigrants increased American Orientation, they tended to score higher on positive affect items, while no changes occurred in depressive symptom items. Korean Orientation was not related to either positive affect items or depressive symptom items. Korean immigrants have a response bias toward positive affect items in the CES-D, which decreases as they adopt more American Orientation. The CES-D lacks cultural equivalence for Korean immigrants. PMID:20701420
Vegetable parenting practices scale. Item response modeling analyses
Chen, Tzu-An; O’Connor, Teresia; Hughes, Sheryl; Beltran, Alicia; Baranowski, Janice; Diep, Cassandra; Baranowski, Tom
2015-01-01
Objective To evaluate the psychometric properties of a vegetable parenting practices scale using multidimensional polytomous item response modeling, which enables assessing item fit to latent variables and the distributional characteristics of the items in comparison to the respondents. We also tested for differences in the ways items function (called differential item functioning) across child’s gender, ethnicity, age, and household income groups. Method Parents of 3–5 year old children completed a self-reported vegetable parenting practices scale online. Vegetable parenting practices consisted of 14 effective vegetable parenting practices and 12 ineffective vegetable parenting practices items, each with three subscales (responsiveness, structure, and control). Multidimensional polytomous item response modeling was conducted separately on effective vegetable parenting practices and ineffective vegetable parenting practices. Results One effective vegetable parenting practice item did not fit the model well in the full sample or across demographic groups, and another was a misfit in differential item functioning analyses across child’s gender. Significant differential item functioning was detected across children’s age and ethnicity groups, and more among effective vegetable parenting practices than ineffective vegetable parenting practices items. Wright maps showed items only covered parts of the latent trait distribution. The harder- and easier-to-respond ends of the construct were not covered by items for effective vegetable parenting practices and ineffective vegetable parenting practices, respectively. Conclusions Several effective vegetable parenting practices and ineffective vegetable parenting practices scale items functioned differently on the basis of child’s demographic characteristics; therefore, researchers should use these vegetable parenting practices scales with caution. Item response modeling should be incorporated in analyses of parenting practice questionnaires to better assess differences across demographic characteristics. PMID:25895694
A Model-Free Diagnostic for Single-Peakedness of Item Responses Using Ordered Conditional Means.
Polak, Marike; de Rooij, Mark; Heiser, Willem J
2012-09-01
In this article we propose a model-free diagnostic for single-peakedness (unimodality) of item responses. Presuming a unidimensional unfolding scale and a given item ordering, we approximate item response functions of all items based on ordered conditional means (OCM). The proposed OCM methodology is based on Thurstone & Chave's (1929) criterion of irrelevance, which is a graphical, exploratory method for evaluating the "relevance" of dichotomous attitude items. We generalized this criterion to graded response items and quantified the relevance by fitting a unimodal smoother. The resulting goodness-of-fit was used to determine item fit and aggregated scale fit. Based on a simulation procedure, cutoff values were proposed for the measures of item fit. These cutoff values showed high power rates and acceptable Type I error rates. We present 2 applications of the OCM method. First, we apply the OCM method to personality data from the Developmental Profile; second, we analyze attitude data collected by Roberts and Laughlin (1996) concerning opinions of capital punishment.
Item response theory analysis of the Pain Self-Efficacy Questionnaire.
Costa, Daniel S J; Asghari, Ali; Nicholas, Michael K
2017-01-01
The Pain Self-Efficacy Questionnaire (PSEQ) is a 10-item instrument designed to assess the extent to which a person in pain believes s/he is able to accomplish various activities despite their pain. There is strong evidence for the validity and reliability of both the full-length PSEQ and a 2-item version. The purpose of this study is to further examine the properties of the PSEQ using an item response theory (IRT) approach. We used the two-parameter graded response model to examine the category probability curves, and location and discrimination parameters of the 10 PSEQ items. In item response theory, responses to a set of items are assumed to be probabilistically determined by a latent (unobserved) variable. In the graded-response model specifically, item response threshold (the value of the latent variable for which adjacent response categories are equally likely) and discrimination parameters are estimated for each item. Participants were 1511 mixed, chronic pain patients attending for initial assessment at a tertiary pain management centre. All items except item 7 ('I can cope with my pain without medication') performed well in IRT analysis, and the category probability curves suggested that participants used the 7-point response scale consistently. Items 6 ('I can still do many of the things I enjoy doing, such as hobbies or leisure activity, despite pain'), 8 ('I can still accomplish most of my goals in life, despite the pain') and 9 ('I can live a normal lifestyle, despite the pain') captured higher levels of the latent variable with greater precision. The results from this IRT analysis add to the body of evidence based on classical test theory illustrating the strong psychometric properties of the PSEQ. Despite the relatively poor performance of Item 7, its clinical utility warrants its retention in the questionnaire. The strong psychometric properties of the PSEQ support its use as an effective tool for assessing self-efficacy in people with pain. 
Copyright © 2016 Scandinavian Association for the Study of Pain. Published by Elsevier B.V. All rights reserved.
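The graded response model described in the preceding abstract can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' analysis: it uses Samejima's cumulative logistic formulation, in which each threshold is the latent-variable value where the probability of responding in or above a category equals 0.5, and the discrimination `a` and the six threshold values for a 7-point item are invented.

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Category probabilities for one item under a graded response model.

    theta      : latent variable value for one respondent
    a          : item discrimination (assumed value in the example)
    thresholds : increasing thresholds; m thresholds give m + 1 categories
    """
    # Cumulative probabilities P(X >= k), padded with 1 and 0 at the ends.
    cum = [1.0]
    cum += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cum += [0.0]
    # Each category probability is the difference of adjacent cumulatives.
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]


# Hypothetical 7-point item: six thresholds, discrimination 1.5.
example_thresholds = [-2.0, -1.2, -0.4, 0.4, 1.2, 2.0]
probs = grm_category_probs(theta=-2.0, a=1.5, thresholds=example_thresholds)
```

Plotting `grm_category_probs` over a grid of `theta` values gives the category probability curves the abstract refers to; a larger `a` makes those curves steeper, which corresponds to greater measurement precision near the item's thresholds.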
Cappelleri, Joseph C.; Lundy, J. Jason; Hays, Ron D.
2014-01-01
Introduction The U.S. Food and Drug Administration’s patient-reported outcome (PRO) guidance document defines content validity as “the extent to which the instrument measures the concept of interest” (FDA, 2009, p. 12). “Construct validity is now generally viewed as a unifying form of validity for psychological measurements, subsuming both content and criterion validity” (Strauss & Smith, 2009, p. 7). Hence both qualitative and quantitative information are essential in evaluating the validity of measures. Methods We review classical test theory and item response theory approaches to evaluating PRO measures including frequency of responses to each category of the items in a multi-item scale, the distribution of scale scores, floor and ceiling effects, the relationship between item response options and the total score, and the extent to which hypothesized “difficulty” (severity) order of items is represented by observed responses. Conclusion Classical test theory and item response theory can be useful in providing a quantitative assessment of items and scales during the content validity phase of patient-reported outcome measures. Depending on the particular type of measure and the specific circumstances, either one or both approaches should be considered to help maximize the content validity of PRO measures. PMID:24811753
Item Response Theory Using Hierarchical Generalized Linear Models
ERIC Educational Resources Information Center
Ravand, Hamdollah
2015-01-01
Multilevel models (MLMs) are flexible in that they can be employed to obtain item and person parameters, test for differential item functioning (DIF) and capture both local item and person dependence. Papers on the MLM analysis of item response data have focused mostly on theoretical issues where applications have been add-ons to simulation…
Item Response Theory Equating Using Bayesian Informative Priors.
ERIC Educational Resources Information Center
de la Torre, Jimmy; Patz, Richard J.
This paper seeks to extend the application of Markov chain Monte Carlo (MCMC) methods in item response theory (IRT) to include the estimation of equating relationships along with the estimation of test item parameters. A method is proposed that incorporates estimation of the equating relationship in the item calibration phase. Item parameters from…
Instrument Formatting with Computer Data Entry in Mind.
ERIC Educational Resources Information Center
Boser, Judith A.; And Others
Different formats for four types of research items were studied for ease of computer data entry. The types were: (1) numeric response items; (2) individual multiple choice items; (3) multiple choice items with the same response items; and (4) card column indicator placement. Each of the 13 experienced staff members of a major university's Data…
A comparative study: classification vs. user-based collaborative filtering for clinical prediction.
Hao, Fang; Blair, Rachael Hageman
2016-12-08
Recommender systems have shown tremendous value for the prediction of personalized item recommendations for individuals in a variety of settings (e.g., marketing, e-commerce, etc.). User-based collaborative filtering is a popular recommender system, which leverages an individual's prior satisfaction with items, as well as the satisfaction of individuals that are "similar". Recently, there have been applications of collaborative filtering based recommender systems for clinical risk prediction. In these applications, individuals represent patients, and items represent clinical data, which includes an outcome. Application of recommender systems to a problem of this type requires recasting a supervised learning problem as unsupervised. The rationale is that patients with similar clinical features carry a similar disease risk. As the "Big Data" era progresses, it is likely that approaches of this type will be reached for as biomedical data continues to grow in both size and complexity (e.g., electronic health records). In the present study, we set out to understand and assess the performance of recommender systems in a controlled yet realistic setting. User-based collaborative filtering recommender systems are compared to logistic regression and random forests with different types of imputation and varying amounts of missingness on four different publicly available medical data sets: National Health and Nutrition Examination Survey (NHANES, 2011-2012 on Obesity), Study to Understand Prognoses Preferences Outcomes and Risks of Treatment (SUPPORT), chronic kidney disease, and dermatology data. We also examined performance using simulated data with observations that are Missing At Random (MAR) or Missing Completely At Random (MCAR) under various degrees of missingness and levels of class imbalance in the response variable. Our results demonstrate that user-based collaborative filtering is consistently inferior to logistic regression and random forests with different imputations on real and simulated data. The results warrant caution in using collaborative filtering (CF) for clinical risk prediction when traditional classification is feasible and practical. CF may not be desirable in datasets where classification is an acceptable alternative. We describe some natural applications related to "Big Data" where CF would be preferred and conclude with some insights as to why caution may be warranted in this context.
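The idea of user-based collaborative filtering applied to patients, as described in the preceding abstract, can be sketched generically as follows. This is not the authors' implementation: the cosine similarity over co-observed features, the neighbourhood size `k`, the function names, and the toy patients are all assumptions made for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity over features observed (not None) in both vectors."""
    pairs = [(x, y) for x, y in zip(u, v) if x is not None and y is not None]
    dot = sum(x * y for x, y in pairs)
    norm_u = math.sqrt(sum(x * x for x, _ in pairs))
    norm_v = math.sqrt(sum(y * y for _, y in pairs))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def predict_outcome(target, patients, k=2):
    """Similarity-weighted mean outcome of the k most similar patients.

    patients: list of (feature_vector, outcome) pairs; missing features
    are None. Returns None when no patient has positive similarity.
    """
    scored = [(cosine_similarity(target, f), y) for f, y in patients]
    # Keep only positively similar neighbours, most similar first.
    scored = sorted((s for s in scored if s[0] > 0),
                    key=lambda s: s[0], reverse=True)[:k]
    if not scored:
        return None
    weight = sum(s for s, _ in scored)
    return sum(s * y for s, y in scored) / weight


# Invented toy data: three patients, binary features, binary outcome.
toy_patients = [([1, 0, 1], 1), ([1, 0, None], 1), ([0, 1, 0], 0)]
risk = predict_outcome([1, 0, 1], toy_patients, k=2)
```

The sketch shows why missing data matter for this approach: similarity is computed only on features both patients have observed, so heavy missingness leaves few features to compare and can make neighbours look more alike than they are.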
Ayala, Guadalupe X.; Castro, Iana A.; Pickrel, Julie L.; Lin, Shih-Fan; Williams, Christine B.; Madanat, Hala; Jun, Hee-Jin; Zive, Michelle
2017-01-01
Evidence indicates that restaurant-based interventions have the potential to promote healthier purchasing and improve the nutrients consumed. This study adds to this body of research by reporting the results of a trial focused on promoting the sale of healthy child menu items in independently owned restaurants. Eight pair-matched restaurants that met the eligibility criteria were randomized to a menu-only versus a menu-plus intervention condition. Both of the conditions implemented new healthy child menu items and received support for implementation for eight weeks. The menu-plus condition also conducted a marketing campaign involving employee trainings and promotional materials. Process evaluation data captured intervention implementation. Sales of new and existing child menu items were tracked for 16 weeks. Results indicated that the interventions were implemented with moderate to high fidelity depending on the component. Sales of new healthy child menu items occurred immediately, but decreased during the post-intervention period in both conditions. Sales of existing child menu items demonstrated a time by condition effect with restaurants in the menu-plus condition observing significant decreases and menu-only restaurants observing significant increases in sales of existing child menu items. Additional efforts are needed to inform sustainable methods for improving access to healthy foods and beverages in restaurants. PMID:29194392
Ayala, Guadalupe X; Castro, Iana A; Pickrel, Julie L; Lin, Shih-Fan; Williams, Christine B; Madanat, Hala; Jun, Hee-Jin; Zive, Michelle
2017-12-01
Evidence indicates that restaurant-based interventions have the potential to promote healthier purchasing and improve the nutrients consumed. This study adds to this body of research by reporting the results of a trial focused on promoting the sale of healthy child menu items in independently owned restaurants. Eight pair-matched restaurants that met the eligibility criteria were randomized to a menu-only versus a menu-plus intervention condition. Both of the conditions implemented new healthy child menu items and received support for implementation for eight weeks. The menu-plus condition also conducted a marketing campaign involving employee trainings and promotional materials. Process evaluation data captured intervention implementation. Sales of new and existing child menu items were tracked for 16 weeks. Results indicated that the interventions were implemented with moderate to high fidelity depending on the component. Sales of new healthy child menu items occurred immediately, but decreased during the post-intervention period in both conditions. Sales of existing child menu items demonstrated a time by condition effect with restaurants in the menu-plus condition observing significant decreases and menu-only restaurants observing significant increases in sales of existing child menu items. Additional efforts are needed to inform sustainable methods for improving access to healthy foods and beverages in restaurants.
Consequences of Ignoring Guessing when Estimating the Latent Density in Item Response Theory
ERIC Educational Resources Information Center
Woods, Carol M.
2008-01-01
In Ramsay-curve item response theory (RC-IRT), the latent variable distribution is estimated simultaneously with the item parameters. In extant Monte Carlo evaluations of RC-IRT, the item response function (IRF) used to fit the data is the same one used to generate the data. The present simulation study examines RC-IRT when the IRF is imperfectly…
ERIC Educational Resources Information Center
Jones, Douglas H.
The progress of modern mental test theory depends very much on the techniques of maximum likelihood estimation, and many popular applications make use of likelihoods induced by logistic item response models. While, in reality, item responses are nonreplicate within a single examinee and the logistic models are only ideal, practitioners make…
Limits on Log Cross-Product Ratios for Item Response Models. Research Report. ETS RR-06-10
ERIC Educational Resources Information Center
Haberman, Shelby J.; Holland, Paul W.; Sinharay, Sandip
2006-01-01
Bounds are established for log cross-product ratios (log odds ratios) involving pairs of items for item response models. First, expressions for bounds on log cross-product ratios are provided for unidimensional item response models in general. Then, explicit bounds are obtained for the Rasch model and the two-parameter logistic (2PL) model.…
Detecting When “Quality of Life” Has Been “Enhanced”: Estimating Change in Quality of Life Ratings
Tractenberg, Rochelle E.; Yumoto, Futoshi; Aisen, Paul S.
2015-01-01
Objective To demonstrate challenges in the estimation of change in quality of life (QOL). Methods Data were taken from a completed clinical trial with negative results. Responses to 13 QOL items were obtained 12 months apart from 258 persons with Alzheimer’s disease (AD) participating in a randomized, placebo-controlled clinical trial with two treatment arms. Two analyses to estimate whether “change” in QOL occurred over 12 months are described. A simple difference (later - earlier) was calculated from total scores (standard approach). A Qualified Change algorithm (novel approach) was applied to each item: differences in ratings were classified as either: improved, worsened, stayed poor, or stayed “positive” (fair, good, excellent). The strengths of evidence supporting a claim that “QOL changed”, derived from the two analyses, were compared by considering plausible alternative explanations for, and interpretations of, results obtained under each approach. Results Total score approach: QOL total scores decreased, on average, in the two treatment (both −1.0, p < 0.05), but not the placebo (=−0.59, p > 0.3) groups. Qualified change approach: Roughly 60% of all change in QOL items was worsening in every arm; 17% - 42% of all subjects experienced change in each item. Conclusions Totalling the subjective QOL item ratings collapses over items, and suggests a potentially misleading “overall” level of change (or no change, as in the placebo arm). Leaving the items as individual components of “quality” of life they were intended to capture, and qualifying the direction and amount of change in each, suggests that at least 17% of any group experienced change on every item, with 60% of all observed change being worsening. Discussion Summarizing QOL item ratings as a total “score” collapses over the face-valid, multi-dimensional components of the construct “quality of life”. Qualified Change provides robust evidence of changes to QOL or “enhancements of” life quality. 
PMID:26213645
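The per-item classification described in the abstract above can be sketched in a few lines. This is only an illustration, assuming a four-level rating scale (poor, fair, good, excellent) as the abstract implies; the names `RATING_ORDER`, `qualified_change`, and `summarize_item`, and the sample ratings, are hypothetical, and the published algorithm may apply rules that the abstract does not state.

```python
from collections import Counter

# Assumed ordinal scale, inferred from the abstract: poor < fair < good < excellent.
RATING_ORDER = {"poor": 0, "fair": 1, "good": 2, "excellent": 3}


def qualified_change(earlier, later):
    """Classify one person's change on one item between two time points."""
    before = RATING_ORDER[earlier]
    after = RATING_ORDER[later]
    if after > before:
        return "improved"
    if after < before:
        return "worsened"
    # No change: separate staying at the lowest rating from staying "positive".
    return "stayed poor" if before == 0 else "stayed positive"


def summarize_item(rating_pairs):
    """Count the four change categories for one item across respondents."""
    return Counter(qualified_change(e, l) for e, l in rating_pairs)


if __name__ == "__main__":
    # Hypothetical (earlier, later) ratings for a single QOL item.
    pairs = [("good", "fair"), ("poor", "poor"), ("fair", "good"), ("good", "good")]
    print(summarize_item(pairs))
```

Keeping the counts per item, rather than summing ratings into a total, is the design point the abstract argues for: a total score can hide offsetting improvements and declines.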
Adler, Lenard; Tanaka, Yoko; Williams, David; Trzepacz, Paula T; Goto, Taro; Allen, Albert J; Escobar, Rodrigo; Upadhyaya, Himanshu P
2014-08-01
We assessed the executive function in adults with attention-deficit/hyperactivity disorder (ADHD) during atomoxetine treatment in a randomized withdrawal trial. Responders (Conners' ADHD Rating Scale-Investigator Rated: Screening Version [adult prompts] ≥30% reduction from baseline and Clinical Global Impression Scale-ADHD Severity score ≤3) to open-label atomoxetine (40-100 mg/d, 12 weeks) entered a 37-week double-blind maintenance period. Patients who maintained response (double-blind atomoxetine for 12 weeks) were randomized 1:1 to atomoxetine (80-100 mg/d, n = 266) or placebo (n = 258) for 25 weeks (total duration, 1 year). Patients and investigators were blinded to response criteria and randomization timing. Change in executive function was assessed with the Behavior Rating Inventory of Executive Function-Adult Version (BRIEF-A) Self-Report and Informant T scores from the randomization to the last-observation-carried-forward postrandomization week 25 (after week 17). Of the enrolled patients (n = 2017; mean age, 33.2 years; male, 58.7%), 524 responders were randomized. During open-label atomoxetine, subscales and individual items on both BRIEF-A questionnaires showed significant improvement (P < 0.001). After randomization, the following T scores improved significantly (P ≤ 0.05) with patients in the atomoxetine group versus those in the placebo group: global executive composite, behavioral regulation, and metacognition indices; plan/organize, working memory, inhibit, task monitor and shift (both BRIEF-A questionnaires), emotional control and organization of materials (BRIEF-A Informant), and initiate (BRIEF-A Self-Report). Atomoxetine significantly improved the executive function compared with placebo, which was maintained for 25 weeks or more; the executive function of patients in the placebo group worsened but did not return to baseline levels after randomization.
Farage, Miranda A.; Rodenberg, Cindy; Chen, Jasmine
2013-01-01
The Farage Quality of Life™ questionnaire (FQoL™) was developed specifically to assess the impact of consumer products. The objective of this investigation was to achieve a Chinese language instrument. The FQoL™ underwent a forward and backward translation, with cognitive testing by 13 subjects. Slight modifications were made to the instrument, and an implementation study was conducted with 800 participants having a mean (±SD) age of 34.22 (±9.28) years. The subjects were randomly assigned to use 1 of 4 ultra absorbency pad products for the length of one menstrual cycle. Three pads (coded N, S and C) were products currently available on the retail market, a fourth (coded M) was an experimental product improvement on Product N. Subjects were asked to complete the FQoL™ once before (T1) and once after (T2) the start of their period, and the Least Square (LS) Means were determined. Within group comparisons for each item and FQoL™ subscale were conducted by comparing the LS Means for T1 vs. T2. Participants using Product N showed the highest number of significant (p<0.05) changes (11 items), demonstrating these subjects felt worse about items mainly in the subdomains for Emotions, Personal Pleasure, and Physical State. Participants using Product C showed significant changes in 7 items mainly in the subdomains for Emotion and Physical State. Participants using Product S and the experimental Product M showed significant changes in only 4 and 3 individual items, respectively. These were not associated with any particular domain or subdomain. Between group comparisons were conducted by comparing the LS Means for the T2 responses for each group. The group using Product N had LS Mean responses that were significantly worse than the group using Product M for the Emotion, Personal Pleasure and Physical State subdomains, the Energy/Vitality domain, and 2 individual items. The Product S group was worse than the Product M group for 2 individual items. 
The Product C group was worse than the Product M group for the Personal Pleasure and Physical State subdomains and 5 individual items. We found that the Chinese language FQoL™ detected changes in HRQoL during menstruation compared with before menstruation. Further, the measure was able to detect differences among groups of subjects using different menstrual protection products. PMID:23283031
Petscher, Yaacov; Mitchell, Alison M; Foorman, Barbara R
2015-01-01
A growing body of literature suggests that response latency, the amount of time it takes an individual to respond to an item, may be an important factor to consider when using assessment data to estimate the ability of an individual. Considering that tests of passage and list fluency are being adapted to a computer administration format, it is possible that accounting for individual differences in response times may be an increasingly feasible option to strengthen the precision of individual scores. The present research evaluated the differential reliability of scores when using classical test theory and item response theory as compared to a conditional item response model which includes response time as an item parameter. Results indicated that the precision of student ability scores increased by an average of 5% when using the conditional item response model, with greater improvements for those who were average or high ability. Implications for measurement models of speeded assessments are discussed.
Petscher, Yaacov; Mitchell, Alison M.; Foorman, Barbara R.
2016-01-01
A growing body of literature suggests that response latency, the amount of time it takes an individual to respond to an item, may be an important factor to consider when using assessment data to estimate the ability of an individual. Considering that tests of passage and list fluency are being adapted to a computer administration format, it is possible that accounting for individual differences in response times may be an increasingly feasible option to strengthen the precision of individual scores. The present research evaluated the differential reliability of scores when using classical test theory and item response theory as compared to a conditional item response model which includes response time as an item parameter. Results indicated that the precision of student ability scores increased by an average of 5% when using the conditional item response model, with greater improvements for those who were average or high ability. Implications for measurement models of speeded assessments are discussed. PMID:27721568
Berman, Anne H; Liu, Bojing; Ullman, Sara; Jadbäck, Isabel; Engström, Karin
2016-01-01
The KIDSCREEN-27 is a measure of child and adolescent quality of life (QoL), with excellent psychometric properties, available in child-report and parent-rating versions in 38 languages. This study provides child-reported and parent-rated norms for the KIDSCREEN-27 among Swedish 11-16 year-olds, as well as child-parent agreement. Sociodemographic correlates of self-reported wellbeing and parent-rated wellbeing were also measured. A random population sample consisting of 600 children aged 11-16, 100 per age group and one of their parents (N = 1200), were approached for response to self-reported and parent-rated versions of the KIDSCREEN-27. Parents were also asked about their education, employment status and their own QoL based on the 26-item WHOQOL-Bref. Based on the final sampling pool of 1158 persons, a 34.8% response rate of 403 individuals was obtained, including 175 child-parent pairs, 27 child singleton responders and 26 parent singletons. Gender and age differences for parent ratings and child-reported data were analyzed using t-tests and the Mann-Whitney U-test. Post-hoc Dunn tests were conducted for pairwise comparisons when the p-value for specific subscales was 0.05 or lower. Child-parent agreement was tested item-by-item, using the Prevalence- and Bias-Adjusted Kappa (PABAK) coefficient for ordinal data (PABAK-OS); dimensional and total score agreement was evaluated based on dichotomous cut-offs for lower well-being, using the PABAK and total, continuous scores were evaluated using Bland-Altman plots. Compared to European norms, Swedish children in this sample scored lower on Physical wellbeing (48.8 SE/49.94 EU) but higher on the other KIDSCREEN-27 dimensions: Psychological wellbeing (53.4/49.77), Parent relations and autonomy (55.1/49.99), Social Support and peers (54.1/49.94) and School (55.8/50.01). Older children self-reported lower wellbeing than younger children. 
No significant self-reported gender differences occurred and parent ratings showed no gender or age differences. Item-by-item child-parent agreement was slight for 14 items (51.9%), fair for 12 items (44.4%), and less than chance for one item (3.7%), but agreement on all dimensions as well as the total score was substantial according to the PABAK-OS. Visual interpretation of the Bland-Altman plot suggested that when children's average wellbeing score was lower parents seemed to rate their children as having relatively higher total wellbeing, but as children's average wellbeing score increased, parents tended to rate their children as having relatively lower total wellbeing. Children living with both parents had higher wellbeing than those who lived with only one parent. Results agreed with European findings that adolescent wellbeing decreases with age but contrasted with some prior Swedish research identifying better wellbeing for boys on all dimensions but Social support and peers. The study suggests the importance of considering children's own reports and not only parental or other informant ratings. Future research should be conducted at regular intervals and encompass larger samples.
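For the dichotomous cut-off comparisons mentioned in the abstract above, the two-category PABAK reduces to 2·Po − 1, where Po is the observed proportion of agreement. The sketch below illustrates only that two-category case; the function name `pabak_binary` and the sample flags are hypothetical, and the ordinal PABAK-OS variant used for the item-by-item analysis is not implemented here.

```python
def pabak_binary(child_flags, parent_flags):
    """Prevalence- and bias-adjusted kappa for two raters and two categories.

    Each argument is a sequence of 0/1 flags (for example, 1 = below a
    lower-wellbeing cut-off), aligned so that position i is the same pair.
    """
    if len(child_flags) != len(parent_flags) or len(child_flags) == 0:
        raise ValueError("both raters need the same non-zero number of ratings")
    agreements = sum(c == p for c, p in zip(child_flags, parent_flags))
    observed_agreement = agreements / len(child_flags)
    # With two categories, chance agreement is fixed at 0.5, giving 2*Po - 1.
    return 2.0 * observed_agreement - 1.0


if __name__ == "__main__":
    # Hypothetical flags for four child-parent pairs; they agree on three.
    print(pabak_binary([1, 0, 1, 1], [1, 0, 0, 1]))
```

The result ranges from −1 (complete disagreement) through 0 (agreement on half the pairs) to 1 (complete agreement).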
Children's Judgments of Inequitable Distributions That Conform to Gender Norms
ERIC Educational Resources Information Center
Conry-Murray, Clare
2015-01-01
To evaluate whether distributions by sex are judged to be unfair, children at ages 6, 8, and 10, and adults (N = 96), judged an authority distributing items to children by using different methods (i.e., randomly or by sex), types of items (i.e., related or unrelated to gender norms), and differences in the equivalency of the items (i.e.,…
The Accuracy of Estimated Total Test Statistics. Final Report.
ERIC Educational Resources Information Center
Kleinke, David J.
In a post-mortem study of item sampling, 1,050 examinees were divided into ten groups 50 times. Each time, their papers were scored on four different sets of item samples from a 150-item test of academic aptitude. These samples were selected using (a) unstratified random sampling and stratification on (b) content, (c) difficulty, and (d) both.…
Just, Katja S; Hubrich, Svenja; Schmidtke, Daniel; Scheifes, Andrea; Gerbershagen, Mark U; Wappler, Frank; Grensemann, Joern
2015-04-01
We aimed to test the effectiveness of checklists for emergency procedures on medical staff performance in intensive care crises. This is a prospective single-center randomized trial in a high-fidelity simulation center modeling an intensive care unit (ICU) in a tertiary care hospital in Germany. Teams consisted of 1 ICU resident and 2 ICU nurses (in total, n = 48). All completed 4 crisis scenarios, in which they were randomized to use checklists or to perform without any aid. In 2 of the scenarios, checklists could be used immediately (type 1 scenarios); and for the remaining, some further steps, for example, confirming diagnosis, were required first (type 2 scenarios). Outcome measurements were number of predefined items and time to completion of more than 50% and more than 75% of steps, respectively. When using checklists, participants initiated items faster and more completely according to appropriate treatment guidelines (9 vs 7 items with and without checklists, P < .05). Benefit of checklists was better in type 2 scenarios than in type 1 scenarios (2 vs 1 additional item, P < .05). In type 2 scenarios, time to complete 50% and 75% of items was faster with the use of checklists (P < .005). Use of checklists in ICU crises has a benefit on the completion of critical treatment steps. Within the type 2 scenarios, items were fulfilled faster with checklists. The implementation of checklists for intensive care crises is a promising approach that may improve patients' care. Copyright © 2014 Elsevier Inc. All rights reserved.
Campbell, John; Smith, Patten; Nissen, Sonja; Bower, Peter; Elliott, Marc; Roland, Martin
2009-08-22
The UK National GP Patient Survey is one of the largest ever survey programmes of patients registered to receive primary health care, inviting five million respondents to report their experience of NHS primary healthcare. The third such annual survey (2008/9) involved the development of a new survey instrument. We describe the process of that development, and the findings of an extensive pilot survey in UK primary healthcare. The survey was developed following recognised guidelines and involved expert and stakeholder advice, cognitive testing of early versions of the survey instrument, and piloting of the questionnaire in a cross-sectional pilot survey of 1,500 randomly selected individuals from the UK electoral register with two reminders to non-respondents. The questionnaire comprises 66 items addressing a range of aspects of UK primary healthcare. A response rate of 590/1500 (39.3%) was obtained. Non-response to individual items ranged from 0.8% to 15.3% (median 5.2%). Participants did not always follow internal branching instructions in the questionnaire although electronic controls allow for correction of this problem in analysis. There was marked skew in the distribution of responses to a number of items indicating an overall favourable impression of care. Principal components analysis of 23 items offering evaluation of various aspects of primary care identified three components (relating to doctor or nurse care, or addressing access to care) accounting for 68.3% of the variance in the sample. The GP Patient Survey has been carefully developed and pilot-tested. Survey findings, aggregated at practice level, will be used to inform the distribution of £65 million ($107 million) of UK NHS resource in 2008/9 and this offers the opportunity for NHS service planners and providers to take account of users' experiences of health care in planning and delivering primary healthcare in the UK.
Asymptotic Standard Errors for Item Response Theory True Score Equating of Polytomous Items
ERIC Educational Resources Information Center
Cher Wong, Cheow
2015-01-01
Building on previous works by Lord and Ogasawara for dichotomous items, this article proposes an approach to derive the asymptotic standard errors of item response theory true score equating involving polytomous items, for equivalent and nonequivalent groups of examinees. This analytical approach could be used in place of empirical methods like…
Evaluation of Northwest University, Kano Post-UTME Test Items Using Item Response Theory
ERIC Educational Resources Information Center
Bichi, Ado Abdu; Hafiz, Hadiza; Bello, Samira Abdullahi
2016-01-01
High-stakes testing is used for the purposes of providing results that have important consequences. Validity is the cornerstone upon which all measurement systems are built. This study applied the Item Response Theory principles to analyse Northwest University Kano Post-UTME Economics test items. The developed fifty (50) economics test items was…
ERIC Educational Resources Information Center
Sengul Avsar, Asiye; Tavsancil, Ezel
2017-01-01
This study analysed polytomous items' psychometric properties according to nonparametric item response theory (NIRT) models. Thus, simulated datasets--three different test lengths (10, 20 and 30 items), three sample distributions (normal, right and left skewed) and three sample sizes (100, 250 and 500)--were generated by conducting 20…
Rasch Measurement and Item Banking: Theory and Practice.
ERIC Educational Resources Information Center
Nakamura, Yuji
The Rasch Model is a one-parameter item response theory model which states that the probability of a correct response on a test is a function of the difficulty of the item and the ability of the candidate. Item banking is useful for language testing. The Rasch Model provides estimates of item difficulties that are meaningful,…
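The relationship described above, in which the probability of a correct response depends only on candidate ability and item difficulty, is conventionally written as a logistic function of their difference. The following is a minimal sketch of that standard form; the function name `rasch_probability` and the example values are illustrative and are not taken from the paper.

```python
import math


def rasch_probability(ability, difficulty):
    """Probability of a correct response under the Rasch model.

    Both arguments are on the same logit scale; only their difference matters.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


if __name__ == "__main__":
    # A candidate whose ability equals the item difficulty has an even chance.
    print(rasch_probability(0.0, 0.0))
    # The same candidate facing an item one logit harder is less likely to succeed.
    print(rasch_probability(0.0, 1.0))
```

Because candidates and items share one scale, difficulties estimated for banked items stay comparable across test forms, which is what makes the model attractive for item banking.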
Item Response Theory Models for Wording Effects in Mixed-Format Scales
ERIC Educational Resources Information Center
Wang, Wen-Chung; Chen, Hui-Fang; Jin, Kuan-Yu
2015-01-01
Many scales contain both positively and negatively worded items. Reverse recoding of negatively worded items might not be enough for them to function as positively worded items do. In this study, we commented on the drawbacks of existing approaches to wording effect in mixed-format scales and used bi-factor item response theory (IRT) models to…
Vegetable parenting practices scale: Item response modeling analyses
USDA-ARS's Scientific Manuscript database
Our objective was to evaluate the psychometric properties of a vegetable parenting practices scale using multidimensional polytomous item response modeling which enables assessing item fit to latent variables and the distributional characteristics of the items in comparison to the respondents. We al...
A HO-IRT Based Diagnostic Assessment System with Constructed Response Items
ERIC Educational Resources Information Center
Yang, Chih-Wei; Kuo, Bor-Chen; Liao, Chen-Huei
2011-01-01
The aim of the present study was to develop an on-line assessment system with constructed response items in the context of elementary mathematics curriculum. The system recorded the problem solving process of constructed response items and transferred the process to response codes for further analyses. An inference mechanism based on artificial…
ERIC Educational Resources Information Center
Sen, Rohini
2012-01-01
In the last five decades, research on the uses of response time has extended into the field of psychometrics (Schnipke & Scrams, 1999; van der Linden, 2006; van der Linden, 2007), where interest has centered around the usefulness of response time information in item calibration and person measurement within an item response theory framework.…
A Primer on the 2- and 3-Parameter Item Response Theory Models.
ERIC Educational Resources Information Center
Thornton, Artist
Item response theory (IRT) is a useful and effective tool for item response measurement if used in the proper context. This paper discusses the sets of assumptions under which responses can be modeled while exploring the framework of the IRT models relative to response testing. The one parameter model, or one parameter logistic model, is perhaps…
ERIC Educational Resources Information Center
Rakkapao, Suttida; Prasitpong, Singha; Arayathanitkul, Kwan
2016-01-01
This study investigated the multiple-choice test of understanding of vectors (TUV), by applying item response theory (IRT). The difficulty, discrimination, and guessing parameters of the TUV items were fit with the three-parameter logistic model of IRT, using the PARSCALE program. The TUV ability is an ability parameter, here estimated assuming…
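The three item parameters named above (difficulty, discrimination, guessing) combine in the usual three-parameter logistic form. The sketch below shows that textbook form without the optional 1.7 scaling constant; the function name `three_pl_probability` and the parameter values are hypothetical and are not estimates from the TUV study.

```python
import math


def three_pl_probability(ability, discrimination, difficulty, guessing):
    """Probability of a correct response under the three-parameter logistic model.

    guessing is the lower asymptote: the chance that a very low-ability
    examinee still answers a multiple-choice item correctly.
    """
    if not 0.0 <= guessing < 1.0:
        raise ValueError("guessing must be in [0, 1)")
    logistic = 1.0 / (1.0 + math.exp(-discrimination * (ability - difficulty)))
    return guessing + (1.0 - guessing) * logistic


if __name__ == "__main__":
    # Hypothetical item with guessing set to 0.25, evaluated where ability
    # equals difficulty.
    print(three_pl_probability(0.0, 1.2, 0.0, 0.25))
```

Setting the guessing parameter to 0 gives the two-parameter model, and additionally fixing the discrimination at 1 gives the Rasch form.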
Formulation and Application of the Hierarchical Generalized Random-Situation Random-Weight MIRID
ERIC Educational Resources Information Center
Hung, Lai-Fa
2011-01-01
The process-component approach has become quite popular for examining many psychological concepts. A typical example is the model with internal restrictions on item difficulty (MIRID) described by Butter (1994) and Butter, De Boeck, and Verhelst (1998). This study proposes a hierarchical generalized random-situation random-weight MIRID. The…
Cao, Rui; Nosofsky, Robert M; Shiffrin, Richard M
2017-05-01
In short-term-memory (STM)-search tasks, observers judge whether a test probe was present in a short list of study items. Here we investigated the long-term learning mechanisms that lead to the highly efficient STM-search performance observed under conditions of consistent-mapping (CM) training, in which targets and foils never switch roles across trials. In item-response learning, subjects learn long-term mappings between individual items and target versus foil responses. In category learning, subjects learn high-level codes corresponding to separate sets of items and learn to attach old versus new responses to these category codes. To distinguish between these 2 forms of learning, we tested subjects in categorized varied mapping (CV) conditions: There were 2 distinct categories of items, but the assignment of categories to target versus foil responses varied across trials. In cases involving arbitrary categories, CV performance closely resembled standard varied-mapping performance without categories and departed dramatically from CM performance, supporting the item-response-learning hypothesis. In cases involving prelearned categories, CV performance resembled CM performance, as long as there was sufficient practice or steps taken to reduce trial-to-trial category-switching costs. This pattern of results supports the category-coding hypothesis for sufficiently well-learned categories. Thus, item-response learning occurs rapidly and is used early in CM training; category learning is much slower but is eventually adopted and is used to increase the efficiency of search beyond that available from item-response learning. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Sequenced treatment alternatives to relieve depression (STAR*D): rationale and design.
Rush, A John; Fava, Maurizio; Wisniewski, Stephen R; Lavori, Philip W; Trivedi, Madhukar H; Sackeim, Harold A; Thase, Michael E; Nierenberg, Andrew A; Quitkin, Frederic M; Kashner, T Michael; Kupfer, David J; Rosenbaum, Jerrold F; Alpert, Jonathan; Stewart, Jonathan W; McGrath, Patrick J; Biggs, Melanie M; Shores-Wilson, Kathy; Lebowitz, Barry D; Ritz, Louise; Niederehe, George
2004-02-01
STAR*D is a multisite, prospective, randomized, multistep clinical trial of outpatients with nonpsychotic major depressive disorder. The study compares various treatment options for those who do not attain a satisfactory response with citalopram, a selective serotonin reuptake inhibitor antidepressant. The study enrolls 4000 adults (ages 18-75) from both primary and specialty care practices who have not had either a prior inadequate response or clear-cut intolerance to a robust trial of protocol treatments during the current major depressive episode. After receiving citalopram (level 1), participants without sufficient symptomatic benefit are eligible for randomization to level 2 treatments, which entail four switch options (sertraline, bupropion, venlafaxine, cognitive therapy) and three citalopram augment options (bupropion, buspirone, cognitive therapy). Those who receive cognitive therapy (switch or augment options) at level 2 without sufficient improvement are eligible for randomization to one of two level 2A switch options (venlafaxine or bupropion). Level 2 and 2A participants are eligible for random assignment to two switch options (mirtazapine or nortriptyline) and to two augment options (lithium or thyroid hormone) added to the primary antidepressant (citalopram, bupropion, sertraline, or venlafaxine) (level 3). Those without sufficient improvement at level 3 are eligible for level 4 random assignment to one of two switch options (tranylcypromine or the combination of mirtazapine and venlafaxine). The primary outcome is the clinician-rated, 17-item Hamilton Rating Scale for Depression, administered at entry and exit from each treatment level through telephone interviews by assessors masked to treatment assignments. Secondary outcomes include self-reported depressive symptoms, physical and mental function, side-effect burden, client satisfaction, and health care utilization and cost. 
Participants with an adequate symptomatic response may enter the 12-month naturalistic follow-up phase with brief monthly and more complete quarterly assessments.
Sekely, Angela; Taylor, Graeme J; Bagby, R Michael
2018-03-17
The Toronto Structured Interview for Alexithymia (TSIA) was developed to provide a structured interview method for assessing alexithymia. One drawback of this instrument is the amount of time it takes to administer and score. The current study used item response theory (IRT) methods to analyze data from a large heterogeneous multi-language sample (N = 842) to investigate whether a subset of items could be selected to create a short version of the instrument. Samejima's (1969) graded response model was used to fit the item responses. Items providing maximum information were retained in the short model, resulting in the elimination of 12 items from the original 24 items. Despite the 50% reduction in the number of items, 65.22% of the information was retained. Further studies are needed to validate the short version. A short version of the TSIA is potentially of practical value to clinicians and researchers with time constraints. Copyright © 2018. Published by Elsevier B.V.
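In the graded response model mentioned above, each ordered category probability is the difference between two adjacent cumulative logistic curves. The sketch below illustrates that construction for a single item; the function names and the threshold values are hypothetical, and the item-selection step based on information is not shown.

```python
import math


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def grm_category_probabilities(theta, discrimination, thresholds):
    """Category probabilities for one item under the graded response model.

    thresholds are the ordered boundaries between adjacent categories, so an
    item with m thresholds has m + 1 response categories.
    """
    if list(thresholds) != sorted(thresholds):
        raise ValueError("thresholds must be in increasing order")
    # Cumulative probabilities of responding in category k or higher,
    # padded with 1 (lowest category or higher) and 0 (beyond the highest).
    cumulative = [1.0]
    cumulative += [logistic(discrimination * (theta - b)) for b in thresholds]
    cumulative.append(0.0)
    return [cumulative[k] - cumulative[k + 1] for k in range(len(cumulative) - 1)]


if __name__ == "__main__":
    # Hypothetical three-category item with thresholds at -1 and +1 logits.
    print(grm_category_probabilities(0.0, 1.0, [-1.0, 1.0]))
```

The returned probabilities always sum to 1, and a respondent located midway between two symmetric thresholds is most likely to choose the middle category.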
Contextual behavior and neural circuits
Lee, Inah; Lee, Choong-Hee
2013-01-01
Animals including humans engage in goal-directed behavior flexibly in response to items and their background, which is called contextual behavior in this review. Although the concept of context has long been studied, there are differences among researchers in defining and experimenting with the concept. The current review aims to provide a categorical framework within which not only the neural mechanisms of contextual information processing but also the contextual behavior can be studied in more concrete ways. For this purpose, we categorize contextual behavior into three subcategories as follows by considering the types of interactions among context, item, and response: contextual response selection, contextual item selection, and contextual item–response selection. Contextual response selection refers to the animal emitting different types of responses to the same item depending on the context in the background. Contextual item selection occurs when there are multiple items that need to be chosen in a contextual manner. Finally, when multiple items and multiple contexts are involved, contextual item–response selection takes place whereby the animal either chooses an item or inhibits such a response depending on item–context paired association. The literature suggests that the rhinal cortical regions and the hippocampal formation play key roles in mnemonically categorizing and recognizing contextual representations and the associated items. In addition, it appears that the fronto-striatal cortical loops in connection with the contextual information-processing areas critically control the flexible deployment of adaptive action sets and motor responses for maximizing goals. We suggest that contextual information processing should be investigated in experimental settings where contextual stimuli and resulting behaviors are clearly defined and measurable, considering the dynamic top-down and bottom-up interactions among the neural systems for contextual behavior. 
PMID:23675321
Item Response Theory Analysis of the Psychopathic Personality Inventory-Revised.
Eichenbaum, Alexander E; Marcus, David K; French, Brian F
2017-06-01
This study examined item and scale functioning in the Psychopathic Personality Inventory-Revised (PPI-R) using an item response theory analysis. PPI-R protocols from 1,052 college student participants (348 male, 704 female) were analyzed. Analyses were conducted on the 131 self-report items comprising the PPI-R's eight content scales, using a graded response model. Scales collected a majority of their information about respondents possessing higher than average levels of the traits being measured. Each scale contained at least some items that evidenced limited ability to differentiate between respondents with differing levels of the trait being measured. Moreover, 80 items (61.1%) yielded significantly different responses between men and women presumably possessing similar levels of the trait being measured. Item performance was also influenced by the scoring format (directly scored vs. reverse-scored) of the items. Overall, the results suggest that the PPI-R, despite identifying psychopathic personality traits in individuals possessing high levels of those traits, may not identify these traits equally well for men and women, and scores are likely influenced by the scoring format of the individual item and scale.
Shen, Minxue; Hu, Ming; Sun, Zhenqiu
2015-01-01
Few studies on nutrition and food safety education intervention for students in remote areas of China have been reported. The study aimed to assess the questionnaire used to measure the knowledge, attitude and behavior with respect to nutrition and food safety, and to evaluate the effectiveness of a quasi-experimental nutrition and food safety education intervention among primary school students in poverty-stricken counties of west China. Twelve primary schools in west China were randomly selected from Zhen'an of Shaanxi province and Huize of Yunnan province. Six geographically dispersed schools were assigned to the intervention group in a nonrandom way. A knowledge, attitude and behavior questionnaire was developed, assessed, and used for outcome measurement. Students were investigated at baseline and at the end of the study, without follow-up. Students in the intervention group received targeted nutrition and food safety lectures for 0.5 hour per week for two semesters. Item response theory was applied to assess the questionnaire, and a two-level difference-in-differences model was applied to assess the effectiveness of the intervention. The Cronbach's alpha of the original questionnaire was 0.84. According to the item response model, 22 knowledge items, 6 attitude items and 8 behavior items showed adequate discrimination parameters and were retained. 378 and 478 valid questionnaires were collected at baseline and the end point. Differences in demographic characteristics between the two groups were not statistically significant. Two-level difference-in-differences models showed that health education improved knowledge and behavior scores by 2.92 (95% CI: 2.06-3.78) and 2.92 (95% CI: 1.37-4.47) respectively, but had no effect on attitude. The questionnaire met the psychometric standards and showed good internal consistency and discrimination power.
The nutrition and food safety education was effective in improving the knowledge and behavior of primary school students in the two poverty-stricken counties of China.
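The core idea behind a difference-in-differences estimate, as used in the study above, is to subtract the control group's change from the intervention group's change. The sketch below shows only this plain four-means version; it does not reproduce the study's two-level model, which accounts for students nested in schools, and the function name and scores are hypothetical.

```python
from statistics import mean


def difference_in_differences(treat_pre, treat_post, control_pre, control_post):
    """Change in the intervention group minus change in the control group."""
    treated_change = mean(treat_post) - mean(treat_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


if __name__ == "__main__":
    # Hypothetical knowledge scores, not data from the study.
    estimate = difference_in_differences(
        treat_pre=[10, 12], treat_post=[14, 16],
        control_pre=[10, 12], control_post=[11, 13],
    )
    print(estimate)
```

Subtracting the control group's change removes trends shared by both groups, which is why the design suits a comparison where schools were not randomly assigned.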
Kirsch, Thomas; Siddiqui, Muhammad Ahmed; Perrin, Paul Clayton; Robinson, W Courtland; Sauer, Lauren M; Doocy, Shannon
2013-07-01
Ascertain recipients' level of satisfaction with humanitarian response efforts. A multi-stage, 80×20 cluster sample randomized survey (1800 households) with probability proportional to size of households affected by the 2010 Indus river floods in Pakistan. The floods affected over 18 million households and led to more than 8 billion USD in response dollars. Less than 20% of respondents reported being satisfied with response, though a small increase in satisfaction levels was observed over the three time periods of interest. Within the first month, receipt of hygiene items, food and household items was most strongly predictive of overall satisfaction. At 6 months, positive receipt of medicines was also highly predictive of satisfaction. The proportion of households reporting unmet needs remained elevated throughout the 6-month period following the floods and varied from 50% to 80%. Needs were best met between 1 and 3 months postflood, when response was at its peak. Unmet needs were the greatest at 6 months, when response was being phased down. Access-limiting issues were rarely captured during routine monitoring and evaluation efforts and seem to be a significant predictor in dissatisfaction with relief efforts, at least in the case of Pakistan, another argument in favor of independent, population-based surveys of this kind. There is also need to better identify and serve those not residing in camps. Direct surveys of the affected population can be used operationally to assess ongoing needs, more appropriately redirect humanitarian resources, and ultimately, judge the overall quality of a humanitarian response.
The Effects of Test Length and Sample Size on Item Parameters in Item Response Theory
ERIC Educational Resources Information Center
Sahin, Alper; Anil, Duygu
2017-01-01
This study investigates the effects of sample size and test length on item-parameter estimation in test development utilizing three unidimensional dichotomous models of item response theory (IRT). For this purpose, a real language test comprised of 50 items was administered to 6,288 students. Data from this test was used to obtain data sets of…
ERIC Educational Resources Information Center
Arce-Ferrer, Alvaro J.; Bulut, Okan
2017-01-01
This study examines separate and concurrent approaches to combine the detection of item parameter drift (IPD) and the estimation of scale transformation coefficients in the context of the common item nonequivalent groups design with the three-parameter item response theory equating. The study uses real and synthetic data sets to compare the two…
ERIC Educational Resources Information Center
Tay, Louis; Huang, Qiming; Vermunt, Jeroen K.
2016-01-01
In large-scale testing, the use of multigroup approaches is limited for assessing differential item functioning (DIF) across multiple variables as DIF is examined for each variable separately. In contrast, the item response theory with covariate (IRT-C) procedure can be used to examine DIF across multiple variables (covariates) simultaneously. To…
ERIC Educational Resources Information Center
Tian, Wei; Cai, Li; Thissen, David; Xin, Tao
2013-01-01
In item response theory (IRT) modeling, the item parameter error covariance matrix plays a critical role in statistical inference procedures. When item parameters are estimated using the EM algorithm, the parameter error covariance matrix is not an automatic by-product of item calibration. Cai proposed the use of Supplemented EM algorithm for…
Cohn, Amy M.; Hagman, Brett T.; Graff, Fiona S.; Noel, Nora E.
2011-01-01
Objective: The present study examined the latent continuum of alcohol-related negative consequences among first-year college women using methods from item response theory and classical test theory. Method: Participants (N = 315) were college women in their freshman year who reported consuming any alcohol in the past 90 days and who completed assessments of alcohol consumption and alcohol-related negative consequences using the Rutgers Alcohol Problem Index. Results: Item response theory analyses showed poor model fit for five items identified in the Rutgers Alcohol Problem Index. Two-parameter item response theory logistic models were applied to the remaining 18 items to examine estimates of item difficulty (i.e., severity) and discrimination parameters. The item difficulty parameters ranged from 0.591 to 2.031, and the discrimination parameters ranged from 0.321 to 2.371. Classical test theory analyses indicated that the omission of the five misfit items did not significantly alter the psychometric properties of the construct. Conclusions: Findings suggest that those consequences that had greater severity and discrimination parameters may be used as screening items to identify female problem drinkers at risk for an alcohol use disorder. PMID:22051212
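For orientation, the two-parameter logistic model referred to above is usually written as follows. The notation is an assumption made here for illustration, since the abstract does not state its parameterization: \theta_j is the latent severity of person j, a_i the discrimination of item i, and b_i its difficulty (severity).

```latex
P(X_{ij} = 1 \mid \theta_j) = \frac{1}{1 + \exp\{-a_i(\theta_j - b_i)\}}
```

Under this form, an item with a larger b_i is endorsed mainly by respondents higher on the continuum, and a larger a_i means the endorsement probability rises more steeply around b_i.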
Generalizability in Item Response Modeling
ERIC Educational Resources Information Center
Briggs, Derek C.; Wilson, Mark
2007-01-01
An approach called generalizability in item response modeling (GIRM) is introduced in this article. The GIRM approach essentially incorporates the sampling model of generalizability theory (GT) into the scaling model of item response theory (IRT) by making distributional assumptions about the relevant measurement facets. By specifying a random…
Quantifying Local, Response Dependence between Two Polytomous Items Using the Rasch Model
ERIC Educational Resources Information Center
Andrich, David; Humphry, Stephen M.; Marais, Ida
2012-01-01
Models of modern test theory imply statistical independence among responses, generally referred to as "local independence." One violation of local independence occurs when the response to one item governs the response to a subsequent item. Expanding on a formulation of this kind of violation as a process in the dichotomous Rasch model,…
Using Response Times for Item Selection in Adaptive Testing
ERIC Educational Resources Information Center
van der Linden, Wim J.
2008-01-01
Response times on items can be used to improve item selection in adaptive testing provided that a probabilistic model for their distribution is available. In this research, the author used a hierarchical modeling framework with separate first-level models for the responses and response times and a second-level model for the distribution of the…
The Influence of Item Response Indecision on the Self-Directed Search
ERIC Educational Resources Information Center
Sampson, James P., Jr.; Shy, Jonathan D.; Hartley, Sarah Lucas; Reardon, Robert C.; Peterson, Gary W.
2009-01-01
Students (N = 247) responded to Self-Directed Search (SDS) per the standard response format and were also instructed to record a question mark (?) for items about which they were uncertain (item response indecision [IRI]). The initial responses of the 114 participants with a (?) were then reversed and a second SDS summary code was obtained and…
Probabilistic motor sequence learning in a virtual reality serial reaction time task.
Sense, Florian; van Rijn, Hedderik
2018-01-01
The serial reaction time task is widely used to study learning and memory. The task is traditionally administered by showing target positions on a computer screen and collecting responses using a button box or keyboard. By comparing response times to random or sequenced items or by using different transition probabilities, various forms of learning can be studied. However, this traditional laboratory setting limits the number of possible experimental manipulations. Here, we present a virtual reality version of the serial reaction time task and show that learning effects emerge as expected despite the novel way in which responses are collected. We also show that response times are distributed as expected. The current experiment was conducted in a blank virtual reality room to verify these basic principles. For future applications, the technology can be used to modify the virtual reality environment in any conceivable way, permitting a wide range of previously impossible experimental manipulations.
Wheaton, Michael G; Galfalvy, Hanga; Steinman, Shari A; Wall, Melanie M; Foa, Edna B; Simpson, H Blair
2016-10-01
Exposure and response prevention (EX/RP) is an evidence-based treatment for obsessive-compulsive disorder (OCD), yet not all patients achieve wellness with EX/RP. The degree to which patients adhere to EX/RP procedures outside of sessions has been found to predict therapy outcomes, including who achieves post-treatment wellness. We sought to investigate which components of treatment adherence most relate to outcome and to develop adherence benchmarks to identify who does and does not become well to provide clinicians with prognostic tools. Adherence data came from 37 adult patients with DSM-IV OCD who received 17 sessions of EX/RP as part of a randomized controlled trial of augmentation strategies for incomplete response to serotonin reuptake inhibitors (SRIs). Therapists rated between-session patient adherence at each exposure session by quantifying: 1) the quantity of homework exposures attempted; 2) the quality of attempted exposures; and 3) the degree of success with response prevention. Each adherence item significantly correlated with post-treatment OCD severity. Success with response prevention proved particularly strongly linked to therapy outcome. Time course analysis of this item accurately identified, relatively early in treatment, who would achieve post-treatment wellness. These data provide an efficient method for differentiating between those patients who will and will not achieve wellness after EX/RP augmentation of SRIs. Limitations and clinical implications of the current findings are discussed. Copyright © 2016 Elsevier Ltd. All rights reserved.
Kounali, Daphne Z; Button, Katherine S; Lewis, Glyn; Ades, Anthony E
2016-09-01
We present a meta-analytic method that combines information on treatment effects from different instruments from a network of randomized trials to estimate instrument relative responsiveness. Five depression-test instruments [Beck Depression Inventory (BDI I/II), Patient Health Questionnaire (PHQ9), Hamilton Rating for Depression 17 and 24 items, Montgomery-Asberg Depression Rating] and three generic quality of life measures [EuroQoL (EQ-5D), SF36 mental component summary (SF36 MCS), and physical component summary (SF36 PCS)] were compared. Randomized trials of treatments for depression reporting outcomes on any two or more of these instruments were identified. Information on the within-trial ratios of standardized treatment effects was pooled across the studies to estimate relative responsiveness. The between-instrument ratios of standardized treatment effects vary across trials, with a coefficient of variation of 13% (95% credible interval: 6%, 25%). There were important differences between the depression measures, with PHQ9 being the most responsive instrument and BDI the least. Responsiveness of the EQ-5D and SF36 PCS was poor. SF36 MCS performed similarly to depression instruments. Information on relative responsiveness of several test instruments can be pooled across networks of trials reporting at least two outcomes, allowing comparison and ranking of test instruments that may never have been compared directly. Copyright © 2016 Elsevier Inc. All rights reserved.
Improving measurement of injection drug risk behavior using item response theory.
Janulis, Patrick
2014-03-01
Recent research highlights the multiple steps to preparing and injecting drugs and the resultant viral threats faced by drug users. This research suggests that more sensitive measurement of injection drug HIV risk behavior is required. In addition, growing evidence suggests there are gender differences in injection risk behavior. However, the potential for differential item functioning between genders has not been explored. To explore item response theory as an improved measurement modeling technique that provides empirically justified scaling of injection risk behavior and to examine for potential gender-based differential item functioning. Data is used from three studies in the National Institute on Drug Abuse's Criminal Justice Drug Abuse Treatment Studies. A two-parameter item response theory model was used to scale injection risk behavior and logistic regression was used to examine for differential item functioning. Item fit statistics suggest that item response theory can be used to scale injection risk behavior and these models can provide more sensitive estimates of risk behavior. Additionally, gender-based differential item functioning is present in the current data. Improved measurement of injection risk behavior using item response theory should be encouraged as these models provide increased congruence between construct measurement and the complexity of injection-related HIV risk. Suggestions are made to further improve injection risk behavior measurement. Furthermore, results suggest direct comparisons of composite scores between males and females may be misleading and future work should account for differential item functioning before comparing levels of injection risk behavior.
Measuring sexual orientation in adolescent health surveys: evaluation of eight school-based surveys.
Saewyc, Elizabeth M; Bauer, Greta R; Skay, Carol L; Bearinger, Linda H; Resnick, Michael D; Reis, Elizabeth; Murphy, Aileen
2004-10-01
To examine the performance of various items measuring sexual orientation within 8 school-based adolescent health surveys in the United States and Canada from 1986 through 1999. Analyses examined nonresponse and unsure responses to sexual orientation items compared with other survey items, demographic differences in responses, tests for response set bias, and congruence of responses to multiple orientation items; analytical methods included frequencies, contingency tables with Chi-square, and ANOVA with least significant differences (LSD) post hoc tests; all analyses were conducted separately by gender. In all surveys, nonresponse rates for orientation questions were similar to other sexual questions, but not higher; younger students, immigrants, and students with learning disabilities were more likely to skip items or select "unsure." Sexual behavior items had the lowest nonresponse, but fewer than half of all students reported sexual behavior, limiting its usefulness for indicating orientation. Item placement in the survey, wording, and response set bias all appeared to influence nonresponse and unsure rates. Specific recommendations include standardizing wording across future surveys, and pilot testing items with diverse ages and ethnic groups of teens before use. All three dimensions of orientation should be assessed where possible; when limited to single items, sexual attraction may be the best choice. Specific wording suggestions are offered for future surveys.
Screening for Moral Injury: The Moral Injury Symptom Scale - Military Version Short Form.
Koenig, Harold G; Ames, Donna; Youssef, Nagy A; Oliver, John P; Volk, Fred; Teng, Ellen J; Haynes, Kerry; Erickson, Zachary D; Arnold, Irina; O'Garo, Keisha; Pearce, Michelle
2018-03-26
To develop a short form (SF) of the 45-item multidimensional Moral Injury Symptom Scale - Military Version (MISS-M) to use when screening for moral injury and monitoring treatment response in veterans and active duty military with PTSD. A total of 427 veterans and active duty military with PTSD symptoms were recruited from VA Medical Centers in Augusta, GA; Los Angeles, CA; Durham, NC; Houston, TX; and San Antonio, TX; and from Liberty University, Lynchburg, Virginia. The sample was randomly split in two. In the first half (n = 214), exploratory factor analysis identified the highest loading item on each of the 10 MISS scales (guilt, shame, moral concerns, loss of meaning, difficulty forgiving, loss of trust, self-condemnation, religious struggle, and loss of religious faith) to form the 10-item MISS-M-SF; confirmatory factor analysis was then performed to replicate results in the second half of the sample (n = 213). Internal reliability, test-retest reliability, and convergent, discriminant, and concurrent validity were examined in the overall sample. The study was approved by the institutional review boards and the Research & Development (R&D) Committees at Veterans Administration medical centers in Durham, Los Angeles, Augusta, Houston, and San Antonio, and the Liberty University and Duke University Medical Center institutional review boards. The 10-item MISS-M-SF had a median of 50 and a range of 12-91 (possible range 10-100). Over 70% scored a 9 or 10 (highest possible) on at least one item. Cronbach's alpha was 0.73 (95% CI 0.69-0.76), and test-retest reliability was 0.87 (95% CI 0.79-0.92). Convergent validity with the 45-item MISS-M was r = 0.92. Discriminant validity was demonstrated by relatively weak correlations with social, religious, and physical health constructs (r = 0.21-0.35), and concurrent validity was indicated by strong correlations with PTSD, depression, and anxiety symptoms (r = 0.54-0.58). 
The MISS-M-SF is a reliable and valid measure of MI symptoms that can be used to screen for MI and monitor response to treatment in veterans and active duty military with PTSD.
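As a rough illustration of the internal-reliability figure reported above, Cronbach's alpha can be computed from a respondents-by-items score matrix. The sketch below uses only the Python standard library. The function name and the tiny data set are hypothetical and unrelated to the MISS-M-SF data.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(scores[0])  # number of items
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    # alpha = k/(k-1) * (1 - sum of item variances / variance of total score)
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical responses of three people to two items.
alpha = cronbach_alpha([[1, 2], [2, 3], [3, 5]])
print(round(alpha, 3))
```

Population variances are used for both the items and the total; using sample variances throughout gives the same alpha because the correction factor cancels in the ratio.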
ERIC Educational Resources Information Center
Li, Yanmei; Li, Shuhong; Wang, Lin
2010-01-01
Many standardized educational tests include groups of items based on a common stimulus, known as "testlets". Standard unidimensional item response theory (IRT) models are commonly used to model examinees' responses to testlet items. However, it is known that local dependence among testlet items can lead to biased item parameter estimates…
Assessing the Utility of Item Response Theory Models: Differential Item Functioning.
ERIC Educational Resources Information Center
Scheuneman, Janice Dowd
The current status of item response theory (IRT) is discussed. Several IRT methods exist for assessing whether an item is biased. Focus is on methods proposed by L. M. Rudner (1975), F. M. Lord (1977), D. Thissen et al. (1988) and R. L. Linn and D. Harnisch (1981). Rudner suggested a measure of the area lying between the two item characteristic…
ERIC Educational Resources Information Center
Eignor, Daniel R.; Douglass, James B.
This paper attempts to provide some initial information about the use of a variety of item response theory (IRT) models in the item selection process; its purpose is to compare the information curves derived from the selection of items characterized by several different IRT models and their associated parameter estimation programs. These…
Learners' Perspectives on Authenticity.
ERIC Educational Resources Information Center
Chavez, Monika M. Th.
A survey investigated the attitudes of second language learners about authentic texts, written and oral, used for language instruction. Respondents were 186 randomly-selected university students of German. The students were administered a 212-item questionnaire (the items are appended) that requested information concerning student demographic…
ERIC Educational Resources Information Center
Magnus, Brooke E.; Thissen, David
2017-01-01
Questionnaires that include items eliciting count responses are becoming increasingly common in psychology. This study proposes methodological techniques to overcome some of the challenges associated with analyzing multivariate item response data that exhibit zero inflation, maximum inflation, and heaping at preferred digits. The modeling…
Nested Logit Models for Multiple-Choice Item Response Data
ERIC Educational Resources Information Center
Suh, Youngsuk; Bolt, Daniel M.
2010-01-01
Nested logit item response models for multiple-choice data are presented. Relative to previous models, the new models are suggested to provide a better approximation to multiple-choice items where the application of a solution strategy precedes consideration of response options. In practice, the models also accommodate collapsibility across all…
The Dutch Identity: A New Tool for the Study of Item Response Models.
ERIC Educational Resources Information Center
Holland, Paul W.
1990-01-01
The Dutch Identity is presented as a useful tool for expressing the basic equations of item response models that relate the manifest probabilities to the item response functions and the latent trait distribution. Ways in which the identity may be exploited are suggested and illustrated. (SLD)
Ghalichi, Leila; Mohammad, Kazem; Majdzadeh, Reza; Hoseini, Mostafa; Pournik, Omid; Nedjat, Saharnaz
2012-01-01
Background: Residence characteristics can affect the health of residents. This paper reports the development of an instrument assessing these aspects of neighborhoods. Materials and Methods: A literature search and focus group discussions with residents were carried out and relevant items were extracted. Five experts reviewed and commented on the items. An observation instrument with 54 items was composed and completed by two independent observers in 20 randomly selected locations. Because some items lacked acceptable reliability, the checklist was revised. The new 22-item checklist in four categories (general characteristics, public green area characteristics, access to services and undesirable features) was completed by two independent trained observers in 28 randomly selected locations. Results: The items in the final checklist had kappa statistics ranging from 0.63 to 1, with the exception of the item assessing “presence of beggars, homeless or working/street children”, which had a kappa as low as 0.27 because their presence varied at different times. The average kappa statistic was 0.78 for general characteristics, 0.79 for public green area characteristics, 0.84 for access to services, and 0.54 for undesirable features. Conclusion: The neighborhood and health observation instrument seems to have good reliability in the city of Tehran. It can probably be used in other large cities of Iran and similar cities elsewhere. PMID:23626633
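The inter-observer kappa statistics reported above correct raw agreement for the agreement expected by chance. A minimal Python sketch of Cohen's kappa for two observers follows. The function name and the ratings are hypothetical, and the abstract does not say which software or kappa variant the authors used.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters scoring the same locations."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    # Chance agreement: product of the raters' marginal proportions per category.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    # Undefined (division by zero) if both raters always use the same single category.
    return (observed - expected) / (1 - expected)


# Hypothetical presence (1) / absence (0) ratings for eight locations.
observer_1 = [1, 1, 1, 0, 0, 0, 1, 0]
observer_2 = [1, 1, 0, 0, 0, 0, 1, 1]
kappa = cohen_kappa(observer_1, observer_2)
print(kappa)
```

Here the observers agree on 6 of 8 locations (0.75), chance agreement is 0.5, and kappa is therefore 0.5.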
Item response theory analysis of the mechanics baseline test
NASA Astrophysics Data System (ADS)
Cardamone, Caroline N.; Abbott, Jonathan E.; Rayyan, Saif; Seaton, Daniel T.; Pawl, Andrew; Pritchard, David E.
2012-02-01
Item response theory is useful in both the development and evaluation of assessments and in computing standardized measures of student performance. In item response theory, individual parameters (difficulty, discrimination) for each item or question are fit by item response models. These parameters provide a means for evaluating a test and offer a better measure of student skill than a raw test score, because each skill calculation considers not only the number of questions answered correctly, but the individual properties of all questions answered. Here, we present the results from an analysis of the Mechanics Baseline Test given at MIT during 2005-2010. Using the item parameters, we identify questions on the Mechanics Baseline Test that are not effective in discriminating between MIT students of different abilities. We show that a limited subset of the highest quality questions on the Mechanics Baseline Test returns accurate measures of student skill. We compare student skills as determined by item response theory to the more traditional measurement of the raw score and show that a comparable measure of learning gain can be computed.
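The claim that a skill estimate depends on which questions were answered, not only on how many, can be illustrated with a small Python sketch. It assumes a two-parameter logistic model and a simple grid-search maximum-likelihood estimate. The item parameters, function names, and grid settings are hypothetical and are not taken from the Mechanics Baseline Test analysis.

```python
import math


def p_correct(theta, a, b):
    """2PL probability of a correct answer given skill theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def log_likelihood(theta, items, responses):
    """Log-likelihood of a 0/1 response pattern at skill theta."""
    total = 0.0
    for (a, b), x in zip(items, responses):
        p = p_correct(theta, a, b)
        total += math.log(p) if x == 1 else math.log(1.0 - p)
    return total


def estimate_skill(items, responses, lo=-4.0, hi=4.0, steps=801):
    """Grid-search maximum-likelihood skill estimate on [lo, hi]."""
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda t: log_likelihood(t, items, responses))


# Hypothetical (discrimination, difficulty) pairs for three items.
items = [(1.0, -1.0), (1.0, 0.0), (2.0, 1.0)]
easy_two = estimate_skill(items, [1, 1, 0])  # two easier items correct
hard_two = estimate_skill(items, [0, 1, 1])  # includes the discriminating item
print(easy_two, hard_two)
```

Both response patterns have a raw score of 2, but the second yields a higher estimate because the third item carries more weight through its larger discrimination. If all discriminations were equal, as in the Rasch model, the two estimates would coincide.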
A Method for Imputing Response Options for Missing Data on Multiple-Choice Assessments
ERIC Educational Resources Information Center
Wolkowitz, Amanda A.; Skorupski, William P.
2013-01-01
When missing values are present in item response data, there are a number of ways one might impute a correct or incorrect response to a multiple-choice item. There are significantly fewer methods for imputing the actual response option an examinee may have provided if he or she had not omitted the item either purposely or accidentally. This…
NASA Astrophysics Data System (ADS)
Linn, Marcia C.; de Benedictis, Tina; Delucchi, Kevin; Harris, Abigail; Stage, Elizabeth
The National Assessment of Educational Progress Science Assessment has consistently revealed small gender differences on science content items but not on science inquiry items. This assessment differs from others in that respondents can choose "I don't know" rather than guessing. This paper examines explanations for the gender differences including (a) differential prior instruction, (b) differential response to uncertainty and use of the "I don't know" response, (c) differential response to figurally presented items, and (d) different attitudes towards science. Of these possible explanations, the first two received support. Females are more likely to use the "I don't know" response, especially for items with physical science content or masculine themes such as football. To ameliorate this situation we need more effective science instruction and more gender-neutral assessment items.
Currell, Siobhan; Christodoulides, Thomas; Siitarinen, Jonna; Dudley, Robert
2016-07-01
Randomized controlled trials have established that cognitive behavioural therapy (CBT) is effective in helping people with psychosis, though there is enormous variability in outcome. It is not clear what patient factors contribute to good outcomes. In fact, most studies considering client factors have excluded people with psychosis. It is clinicians who are deciding who is likely to benefit from CBT for psychosis (CBTp), though little is understood in terms of their views on who benefits from CBTp. This study investigated clinicians' views on client characteristics that influence outcome in CBTp. A Q-set of 61 client characteristics was developed from a literature search and interviews with clinicians experienced in working with CBT and/or psychosis. Twenty-one participants (familiar with psychosis and CBT through education, profession, practice or knowledge) rated the items based on their importance in effecting a positive outcome, on a forced normal distribution. 21 completed Q-sorts yielded four factors, named as: acceptance and application of the cognitive model; attending to the present; secure base; meaningful active collaboration. Items regarding therapeutic alliance were highly endorsed throughout all factors. Some empirically-based items were not endorsed, although overall, clinician responses were consistent with prior research.
Validation of a short qualitative food frequency list used in several German large scale surveys.
Winkler, G; Döring, A
1998-09-01
Our study aimed to test the validity of a short, qualitative food frequency list (FFL) used in several German large scale surveys. In the surveys of the MONICA project Augsburg, the FFL was used in randomly selected adults. In 1984/85, a dietary survey with 7-day records (DR) was conducted within the subsample of men aged 45 to 64 (response 70%). The 899 DR were used to validate the FFL. Mean weekly food intake frequency and mean daily food intake were compared, and Spearman rank order correlation coefficients and kappa statistics for classification into tertiles were calculated. Spearman correlations range between 0.15 for the item "Other sweets (candies, compote)" and 0.60 for the items "Curds, yoghurt, sour milk", "Milk including butter milk" and "Mineral water"; values of the kappa statistic vary between 0.04 ("White bread, brown bread, crispbread") and 0.41 ("Flaked oats, muesli, cornflakes" and "Milk including butter milk"). With the exception of two items, FFL data can be used for analysis at group level. Analysis at individual level should be done with caution. It seems that some food groups are generally easier to ask about in an FFL than others.
Park, Jong Cook; Kim, Kwang Sig
2012-03-01
The reliability of a test is determined by the characteristics of each item. Item analysis is carried out using classical test theory and item response theory. The purpose of the study was to compare the discrimination indices with item response theory using the Rasch model. Thirty-one 4th-year medical school students participated in the clinical course written examination, which included 22 A-type items and 3 R-type items. Point biserial correlation coefficient (C(pbs)) was compared to method of extreme group (D), biserial correlation coefficient (C(bs)), item-total correlation coefficient (C(it)), and corrected item-total correlation coefficient (C(cit)). The Rasch model was applied to estimate item difficulty and examinee's ability and to calculate item fit statistics using joint maximum likelihood. Explanatory power (r²) of C(pbs) decreased in the following order: C(cit) (1.00), C(it) (0.99), C(bs) (0.94), and D (0.45). The ranges of difficulty logit and standard error and ability logit and standard error were -0.82 to 0.80 and 0.37 to 0.76, -3.69 to 3.19 and 0.45 to 1.03, respectively. Items 9 and 23 have outfit ≥1.3. Students 1, 5, 7, 18, 26, 30, and 32 have fit ≥1.3. C(pbs), C(cit), and C(it) are good discrimination parameters. The Rasch model can estimate the item difficulty parameter and the examinee's ability parameter with standard error. The fit statistics can identify bad items and unpredictable examinee responses.
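The point biserial correlation used above as a discrimination index is the Pearson correlation between a dichotomous item score and the total test score. A minimal Python sketch using the standard textbook formula follows. The function name and the data are hypothetical and are not the examination data from the study.

```python
import math
from statistics import mean, pstdev


def point_biserial(item, total):
    """Point biserial correlation between 0/1 item scores and total scores."""
    ones = [t for x, t in zip(item, total) if x == 1]
    zeros = [t for x, t in zip(item, total) if x == 0]
    p = len(ones) / len(item)  # proportion answering the item correctly
    # r_pb = (M1 - M0) / s * sqrt(p * (1 - p)), with s the population SD of totals.
    return (mean(ones) - mean(zeros)) / pstdev(total) * math.sqrt(p * (1 - p))


# Hypothetical data: four examinees, one item, and their total scores.
r_pb = point_biserial([1, 1, 0, 0], [4, 3, 2, 1])
print(round(r_pb, 4))
```

The corrected item-total variant mentioned in the abstract would remove the item's own score from each total before correlating, which this sketch does not do.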
Dalal, Anand A; Nelson, Lauren; Gilligan, Theresa; McLeod, Lori; Lewis, Sandy; DeMuro-Mercon, Carla
2011-01-01
The goal of this study was to provide recommended steps to assess measurement comparability using a crossover study design and to demonstrate these steps using a short patient-reported outcome (PRO) instrument as an example. The example PRO instrument was administered via paper, Web, interactive voice response system, and interview; a randomized crossover design was used to gather data across the multiple administration types. Participants completed the PRO instrument, demographic and health questions, and a short preference questionnaire. Evaluation included comparisons of the item-level responses and agreement, comparison of mean scale scores, score classifications, and questions designed to collect usability and administration preference. Here the authors provide a four-step evaluation guide to evaluate measurement comparability and illustrate these steps using a case-finding tool. In the example, item-level kappa statistics between the paper and the alternate versions ranged from good to excellent, intraclass correlation coefficient for mean scores were above 0.70, and the rate of disagreement ranged from 2% to 14%. In addition, although participants had an administration preference, they reported few difficulties with the versions they were assigned. The steps described in this article provide a guide for evaluating whether to combine scores across administration versions to simplify analyses and interpretation under a crossover design. The guide recommends the investigation of item-level responses, summary scores, and participant usability/preference when comparing versions, with each step providing unique information to support comprehensive evaluation and informed decisions regarding whether to combine data. Copyright © 2011 International Society for Pharmacoeconomics and Outcomes Research (ISPOR). Published by Elsevier Inc. All rights reserved.
Maurer, Marcus; Mathias, Susan D; Crosby, Ross D; Rajput, Yamina; Zazzali, James L
2018-03-19
Chronic spontaneous urticaria (CSU), also known as chronic idiopathic urticaria (CIU), may produce hives, itch, and angioedema. The Urticaria Activity and Impact Measure (U-AIM) is a newly developed 9-item patient-reported measure designed for use in routine clinical practice to assess CSU activity and impact over the previous 7 days. To evaluate validity, responsiveness, and clinically meaningful change of the U-AIM. Data from a 24-week open-label single-arm period of a randomized, placebo-controlled study of omalizumab were used to assess the psychometric properties of U-AIM items for itch, hives, and angioedema. 206 patients (75% female, mean age 44.6 years) were enrolled. At baseline, U-AIM results included prevalent severe itch (55%) and >12 hives (67%), angioedema (15%), and bother by itch (84%), hives (84%), and angioedema (49%). Urticaria Patient Daily Diary (UPDD) mean weekly scores were 15.4 (itch severity), 16.8 (number of hives), and 32.2 (Urticaria Activity Score [UAS7]). At baseline, Weeks 12 and 24, U-AIM itch and hives items and UAS7 proxy scores (the sum of itch severity and number of hives over 7 days) demonstrated strong correlation coefficients with their corresponding measures from the UPDD (itch severity: 0.634-0.806; hives number: 0.735-0.843; UAS7 proxy: 0.724-0.852). Changes in U-AIM scores differentiated patients by their perspective of symptom improvement. Meaningful change thresholds were established for itch severity and number of hives scores (0.8-1.0 for both) and the UAS7 proxy score (10.5-12.5). The U-AIM is valid and responsive to change, and may help clinicians monitor CSU activity and track treatment effectiveness. Copyright © 2018. Published by Elsevier Inc.
Theta oscillations promote temporal sequence learning.
Crivelli-Decker, Jordan; Hsieh, Liang-Tien; Clarke, Alex; Ranganath, Charan
2018-05-17
Many theoretical models suggest that neural oscillations play a role in learning or retrieval of temporal sequences, but the extent to which oscillations support sequence representation remains unclear. To address this question, we used scalp electroencephalography (EEG) to examine oscillatory activity over learning of different object sequences. Participants made semantic decisions on each object as they were presented in a continuous stream. For three "Consistent" sequences, the order of the objects was always fixed. Activity during Consistent sequences was compared to "Random" sequences that consisted of the same objects presented in a different order on each repetition. Over the course of learning, participants made faster semantic decisions to objects in Consistent, as compared to objects in Random sequences. Thus, participants were able to use sequence knowledge to predict upcoming items in Consistent sequences. EEG analyses revealed decreased oscillatory power in the theta (4-7 Hz) band at frontal sites following decisions about objects in Consistent sequences, as compared with objects in Random sequences. The theta power difference between Consistent and Random only emerged in the second half of the task, as participants were more effectively able to predict items in Consistent sequences. Moreover, we found increases in parieto-occipital alpha (10-13 Hz) and beta (14-28 Hz) power during the pre-response period for objects in Consistent sequences, relative to objects in Random sequences. Linear mixed effects modeling revealed that single trial theta oscillations were related to reaction time for future objects in a sequence, whereas beta and alpha oscillations were only predictive of reaction time on the current trial. These results indicate that theta and alpha/beta activity preferentially relate to future and current events, respectively. 
More generally, our findings highlight the importance of band-specific neural oscillations in the learning of temporal order information. Copyright © 2018 The Authors. Published by Elsevier Inc. All rights reserved.
Using Item Response Theory to Describe the Nonverbal Literacy Assessment (NVLA)
ERIC Educational Resources Information Center
Fleming, Danielle; Wilson, Mark; Ahlgrim-Delzell, Lynn
2018-01-01
The Nonverbal Literacy Assessment (NVLA) is a literacy assessment designed for students with significant intellectual disabilities. The 218-item test was initially examined using confirmatory factor analysis. This method showed that the test worked as expected, but the items loaded onto a single factor. This article uses item response theory to…
Measuring Student Learning with Item Response Theory
ERIC Educational Resources Information Center
Lee, Young-Jin; Palazzo, David J.; Warnakulasooriya, Rasil; Pritchard, David E.
2008-01-01
We investigate short-term learning from hints and feedback in a Web-based physics tutoring system. Both the skill of students and the difficulty and discrimination of items were determined by applying item response theory (IRT) to the first answers of students who are working on for-credit homework items in an introductory Newtonian physics…
Higher-Order Item Response Models for Hierarchical Latent Traits
ERIC Educational Resources Information Center
Huang, Hung-Yu; Wang, Wen-Chung; Chen, Po-Hsi; Su, Chi-Ming
2013-01-01
Many latent traits in the human sciences have a hierarchical structure. This study aimed to develop a new class of higher order item response theory models for hierarchical latent traits that are flexible in accommodating both dichotomous and polytomous items, to estimate both item and person parameters jointly, to allow users to specify…
Evaluating Item Fit for Multidimensional Item Response Models
ERIC Educational Resources Information Center
Zhang, Bo; Stone, Clement A.
2008-01-01
This research examines the utility of the s-x[superscript 2] statistic proposed by Orlando and Thissen (2000) in evaluating item fit for multidimensional item response models. Monte Carlo simulation was conducted to investigate both the Type I error and statistical power of this fit statistic in analyzing two kinds of multidimensional test…
An Item Response Theory Model for Test Bias.
ERIC Educational Resources Information Center
Shealy, Robin; Stout, William
This paper presents a conceptualization of test bias for standardized ability tests which is based on multidimensional, non-parametric, item response theory. An explanation of how individually-biased items can combine through a test score to produce test bias is provided. It is contended that bias, although expressed at the item level, should be…
NASA Astrophysics Data System (ADS)
Rakkapao, Suttida; Prasitpong, Singha; Arayathanitkul, Kwan
2016-12-01
This study investigated the multiple-choice test of understanding of vectors (TUV), by applying item response theory (IRT). The difficulty, discriminatory, and guessing parameters of the TUV items were fit with the three-parameter logistic model of IRT, using the parscale program. The TUV ability is an ability parameter, here estimated assuming unidimensionality and local independence. Moreover, all distractors of the TUV were analyzed from item response curves (IRC) that represent simplified IRT. Data were gathered on 2392 science and engineering freshmen, from three universities in Thailand. The results revealed IRT analysis to be useful in assessing the test since its item parameters are independent of the ability parameters. The IRT framework reveals item-level information, and indicates appropriate ability ranges for the test. Moreover, the IRC analysis can be used to assess the effectiveness of the test's distractors. Both IRT and IRC approaches reveal test characteristics beyond those revealed by the classical analysis methods of tests. Test developers can apply these methods to diagnose and evaluate the features of items at various ability levels of test takers.
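The three-parameter logistic model mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the PARSCALE implementation: the function name `three_pl` and the item parameter values are hypothetical, and the scaling constant that some programs apply to the logit is omitted.

```python
import math


def three_pl(theta: float, a: float, b: float, c: float) -> float:
    """Probability of a correct response under the 3PL model.

    theta: ability, a: discrimination, b: difficulty, c: guessing (lower asymptote).
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical item parameters, chosen only for illustration.
a, b, c = 1.2, 0.5, 0.2
for theta in (-2.0, 0.5, 3.0):
    print(f"theta={theta:+.1f}  P(correct)={three_pl(theta, a, b, c):.3f}")
```

When ability equals the item difficulty, the probability sits halfway between the guessing parameter and 1, which is why the curve never drops below `c` for low-ability test takers.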
Calorimetry of low mass Pu239 items
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cremers, Teresa L; Sampson, Thomas E
2010-01-01
Calorimetric assay has the reputation of providing the highest precision and accuracy of all nondestructive assay measurements. Unfortunately, non-destructive assay practitioners and measurement consumers often extend, inappropriately, the high precision and accuracy of calorimetric assay to very low mass items. One purpose of this document is to present more realistic expectations for the random uncertainties associated with calorimetric assay for weapons grade plutonium items with masses of 200 grams or less.
ERIC Educational Resources Information Center
Klein, Thomas W.
Steps involved in the item analysis and scaling of the 1990 edition of Forms A and B of the Nevada High School Proficiency Examinations (NHSPEs) are described. Pilot tests of Forms A and B of the 47-item reading and 45-item mathematics tests were each administered to random samples of more than 600 eleventh-grade students. A computer program was…
Macizo, Pedro; Bajo, Teresa; Soriano, Maria Felipa
2006-02-01
Working Memory (WM) span predicts subjects' performance in executive control tasks and, in addition, has been related to the capacity to inhibit irrelevant information. In this paper we investigate the role of WM span in two executive tasks, focusing on the inhibitory components of both tasks. High and low span participants recalled target words while rejecting irrelevant items (Experiment 1) and generated random numbers (Experiment 2). Results showed a clear relation between WM span and performance in both tasks. In addition, analyses of intrusion errors (Experiment 1) and stereotyped responses (Experiment 2) indicated that high span individuals were able to efficiently use the inhibitory component implied in both tasks. The pattern of data supports a relation between WM span and executive control tasks through an inhibitory mechanism.
2010-01-01
Background: Fatigue is a common and debilitating symptom in multiple sclerosis (MS). Best-practice guidelines suggest that health services should repeatedly assess fatigue in persons with MS. Several fatigue scales are available but concern has been expressed about their validity. The objective of this study was to examine the reliability and validity of a new scale for MS fatigue, the Neurological Fatigue Index (NFI-MS). Methods: Qualitative analysis of 40 MS patient interviews had previously contributed to a coherent definition of fatigue, and a potential 52 item set representing the salient themes. A draft questionnaire was mailed out to 1223 people with MS, and the resulting data subjected to both factor and Rasch analysis. Results: Data from 635 (51.9% response) respondents were split randomly into an 'evaluation' and 'validation' sample. Exploratory factor analysis identified four potential subscales: 'physical', 'cognitive', 'relief by diurnal sleep or rest' and 'abnormal nocturnal sleep and sleepiness'. Rasch analysis led to further item reduction and the generation of a Summary scale comprising items from the Physical and Cognitive subscales. The scales were shown to fit Rasch model expectations, across both the evaluation and validation samples. Conclusion: A simple 10-item Summary scale, together with scales measuring the physical and cognitive components of fatigue, were validated for MS fatigue. PMID:20152031
Felder-Puig, Rosemarie; Griebler, Robert; Samdal, Oddrun; King, Matthew A; Freeman, John; Duer, Wolfgang
2012-09-01
Given the pressure that educators and policy makers are under to achieve academic standards for students, understanding the relationship of academic success to various aspects of health is important. The international Health Behavior in School-Aged Children (HBSC) questionnaire, being used in 41 countries with different school and grading systems, has contained an item assessing perceived school performance (PSP) since 1986. Whereas the test-retest reliability of this item has been reported previously, we determined its convergent and discriminant validity. This cross-sectional study used anonymous self-report data from Austrian (N = 266), Norwegian (N = 240), and Canadian (N = 9,717) samples. Students were between 10 and 17 years old. PSP responses were compared to the self-reported average school grades in 6 subjects (Austria) or 8 subjects (Norway), respectively, or to a general, 5-category-based appraisal of most recent school grades (Canada). Correlations between PSP and self-reported average school grade scores were between 0.51 and 0.65, representing large effect sizes. Differences between the median school grades in the 4 categories of the PSP item were statistically significant in all 3 samples. The PSP item showed predominantly small associations with some randomly selected HBSC items or scales designed to measure different concepts. The PSP item seems to be a valid and useful question that can distinguish groups of respondents that get good grades at school from those that do not. The meaning of PSP may be context-specific and may have different connotations across student populations from different countries with different school systems. © 2012, American School Health Association.
Shkalim, Eleanor; Ben-Porath, Yossef S; Handel, Richard W; Almagor, Moshe; Tellegen, Auke
2016-01-01
In this study we examined the utility of the Minnesota Multiphasic Personality Inventory-2-Restructured Form (MMPI-2-RF; Ben-Porath & Tellegen, 2008/2011; Tellegen & Ben-Porath, 2008/2011) Variable Response Inconsistency-Revised (VRIN-r) and True Response Inconsistency-Revised (TRIN-r) scales, including alternative versions of the scales, in the Hebrew translation of the test. First, we examined the applicability of the U.S. VRIN-r and TRIN-r scales in an Israeli Hebrew-speaking mixed clinical sample, and replaced original item pairs that did not meet the development criteria with substitution item pairs that did. Then, using the Israeli normative sample and a pure clinical sample, we compared the psychometric functioning of the adapted Hebrew-language VRIN-r and TRIN-r scales with that of the original versions of these scales under various conditions of simulated non-content-based (random and fixed) responding. Overall, results showed that the adapted versions of the scales did not improve on the original ones. We therefore recommend using the U.S. VRIN-r and TRIN-r versions, which could also facilitate cross-cultural comparisons.
Desai, N; Taylor-Davies, A; Barnett, D B
1983-01-01
1 The effect of oral doses of diazepam (5 mg) and oxprenolol (80 mg) on short term memory of normal individuals stratified for 'state' anxiety levels has been investigated. 2 Normal student volunteers were stratified into high and low anxiety groups on the basis of responses to the Spielberger 'A-state' scale. Subjects were then randomly administered active drug or placebo and given a form of running memory test performed under a variety of conditions in which variable rate of item presentation and articulatory suppression were used. 3 Diazepam significantly reduced the errors of recall in the running memory test in the high anxiety group and produced a distinct separation of response from the low anxiety group under the test conditions of slow item presentation with articulatory suppression. Oxprenolol had no effect on the short term memory test in either high or low anxiety groups in any experimental test situation. 4 These results are compared to previous work in which generally a deleterious effect of diazepam on short term memory in normal volunteers has been reported. The implications of these findings are further discussed in relationship to possible models of memory function. PMID:6849754
Wilkerson, Keith; McGahan, Joseph R; Stevens, Rick; Williamson, David; Low, Jean
2009-12-01
The goal of this study was to determine whether differential response formats to covariation problems influence corresponding response latencies. The authors provided participants with 3 trials of 16 statements addressing positive and negative relations between freedom and responsibility. The authors framed half of the items around responsibility given freedom and the other half around freedom given responsibility. Response formats comprised true-false, agree-disagree, and yes-no answers as a between-participants factor. Results indicated that the manipulation of response format did not affect latencies. However, latencies differed according to the framing of the items. For items framed around freedom given responsibility, latencies were shorter. In addition, participants were more likely to report a positive relation between freedom and responsibility when items were framed around freedom given responsibility. The authors discuss implications relative to previous research in this area and give recommendations for future research.
Ye, Zeng Jie; Liang, Mu Zi; Zhang, Hao Wei; Li, Peng Fei; Ouyang, Xue Ren; Yu, Yuan Liang; Liu, Mei Ling; Qiu, Hong Zhong
2018-06-01
Classical test theory has been used to develop and validate the 25-item Resilience Scale Specific to Cancer (RS-SC) in Chinese patients with cancer. This study was designed to provide additional information about the discriminative value of the individual items using an item response theory analysis. A two-parameter graded response model was fitted to examine whether any of the items of the RS-SC exhibited problems with the ordering and steps of thresholds, as well as the ability of items to discriminate patients with different resilience levels using item characteristic curves. A sample of 214 Chinese patients with a cancer diagnosis was analyzed. The established three-dimension structure of the RS-SC was confirmed. Several items showed problematic thresholds or discrimination ability and require further revision. Some problematic items should be refined, and a short form of the RS-SC may be feasible in clinical settings in order to reduce burden on patients. However, the generalizability of these findings warrants further investigation.
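To make the role of threshold ordering concrete, here is a minimal sketch of category probabilities under a graded response model with one discrimination parameter per item. The function name `grm_category_probs` and all parameter values are hypothetical and are not taken from the RS-SC analysis.

```python
import math


def grm_category_probs(theta: float, a: float, thresholds: list[float]) -> list[float]:
    """Category probabilities for one item under a graded response model.

    thresholds must be in increasing order; K-1 thresholds give K categories.
    """
    # Cumulative probabilities P(X >= k), padded with 1 (lowest) and 0 (beyond highest).
    cumulative = [1.0]
    cumulative += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cumulative.append(0.0)
    # Each category probability is the difference of adjacent cumulative curves.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(cumulative) - 1)]


# Hypothetical 5-category item with ordered thresholds.
probs = grm_category_probs(theta=0.3, a=1.5, thresholds=[-1.5, -0.5, 0.4, 1.6])
print([round(p, 3) for p in probs])
```

If the thresholds were not in increasing order, some of the differences would become negative, which is one way a disordered-threshold problem shows up in this parameterization.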
Automatic Scoring of Paper-and-Pencil Figural Responses. Research Report.
ERIC Educational Resources Information Center
Martinez, Michael E.; And Others
Large-scale testing is dominated by the multiple-choice question format. Widespread use of the format is due, in part, to the ease with which multiple-choice items can be scored automatically. This paper examines automatic scoring procedures for an alternative item type: figural response. Figural response items call for the completion or…
Introduction to Multilevel Item Response Theory Analysis: Descriptive and Explanatory Models
ERIC Educational Resources Information Center
Sulis, Isabella; Toland, Michael D.
2017-01-01
Item response theory (IRT) models are the main psychometric approach for the development, evaluation, and refinement of multi-item instruments and scaling of latent traits, whereas multilevel models are the primary statistical method when considering the dependence between person responses when primary units (e.g., students) are nested within…
An Extension of IRT-Based Equating to the Dichotomous Testlet Response Theory Model
ERIC Educational Resources Information Center
Tao, Wei; Cao, Yi
2016-01-01
Current procedures for equating number-correct scores using traditional item response theory (IRT) methods assume local independence. However, when tests are constructed using testlets, one concern is the violation of the local item independence assumption. The testlet response theory (TRT) model is one way to accommodate local item dependence.…
ERIC Educational Resources Information Center
Gadermann, Anne M.; Guhn, Martin; Zumbo, Bruno D.
2012-01-01
This paper provides a conceptual, empirical, and practical guide for estimating ordinal reliability coefficients for ordinal item response data (also referred to as Likert, Likert-type, ordered categorical, or rating scale item responses). Conventionally, reliability coefficients, such as Cronbach's alpha, are calculated using a Pearson…
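For reference, the conventional coefficient alpha can be computed from raw item scores as in the sketch below. This shows only the usual variance-based formula, which treats ordinal categories as numeric; ordinal variants typically substitute a different correlation estimate and are not shown. The function name `cronbach_alpha` and the data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(scores: list[list[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items table of scores."""
    k = len(scores[0])  # number of items
    # Sample variance of each item across respondents.
    item_variances = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of the total (sum) score.
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1.0 - sum(item_variances) / total_variance)


# Hypothetical 4-respondent, 3-item Likert-type data.
data = [[1, 2, 2], [2, 3, 3], [4, 4, 5], [5, 5, 4]]
print(round(cronbach_alpha(data), 3))
```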
IRTPRO 2.1 for Windows (Item Response Theory for Patient-Reported Outcomes)
ERIC Educational Resources Information Center
Paek, Insu; Han, Kyung T.
2013-01-01
This article reviews a new item response theory (IRT) model estimation program, IRTPRO 2.1, for Windows that is capable of unidimensional and multidimensional IRT model estimation for existing and user-specified constrained IRT models for dichotomously and polytomously scored item response data. (Contains 1 figure and 2 notes.)
The Robustness of LOGIST and BILOG IRT Estimation Programs to Violations of Local Independence.
ERIC Educational Resources Information Center
Ackerman, Terry A.
One of the important underlying assumptions of all item response theory (IRT) models is that of local independence. This assumption requires that the response to an item on a test not be influenced by the response to any other items. This assumption is often taken for granted, with little or no scrutiny of the response process required to answer…
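Local independence is what allows the likelihood of a whole response pattern to be written as a product of item-level probabilities at a fixed ability. The sketch below illustrates this with a two-parameter logistic model; it is not the LOGIST or BILOG estimation code, and the function names and item parameters are hypothetical.

```python
import math
from itertools import product


def p_correct(theta: float, a: float, b: float) -> float:
    """Two-parameter logistic probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def pattern_likelihood(theta: float, items: list[tuple[float, float]], responses: tuple[int, ...]) -> float:
    """Likelihood of a 0/1 response pattern, assuming local independence."""
    likelihood = 1.0
    for (a, b), u in zip(items, responses):
        p = p_correct(theta, a, b)
        likelihood *= p if u == 1 else 1.0 - p  # the product form is the assumption itself
    return likelihood


# Hypothetical (a, b) parameters for three items.
items = [(1.0, -0.5), (1.3, 0.0), (0.8, 1.0)]
total = sum(pattern_likelihood(0.2, items, pattern) for pattern in product((0, 1), repeat=3))
print(round(total, 6))  # probabilities over all patterns sum to one
```

When responses to one item influence responses to another, this product no longer describes the joint distribution, which is the violation the abstract refers to.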
Choi, Jiae; Jun, Ji Hee; Kang, Byoung Kab; Kim, Kun Hyung; Lee, Myeong Soo
2014-11-05
The aim of this study was to assess the endorsement of reporting guidelines in Korean traditional medicine (TM) journals by reviewing their instructions to authors. We examined the instructions to authors in all of the TM journals published in Korea to assess the appropriate use of reporting guidelines for research studies. The randomized controlled trials (RCTs) published after 2010 in journals that endorsed reporting guidelines were obtained. The reporting quality was assessed using the following guidelines: the 38-item Consolidated Standards of Reporting Trials (CONSORT) statement for non-pharmacological trials (NPT); the 17-item Standards for Reporting Interventions in Clinical Trials of Acupuncture (STRICTA) statement, instead of the 5-item CONSORT for acupuncture trials; and the 22-item CONSORT extensions for herbal medicine trials. The overall item score was calculated and expressed as a proportion. One journal that endorsed reporting guidelines was identified. Twenty-nine RCTs published in this journal after 2010 met the selection criteria. General editorial policies such as those of the International Committee of Medical Journal Editors (ICMJE) were endorsed by 15 journals. In each of the CONSORT-NPT articles, 21.6 to 56.8% of the items were reported, with an average of 11.3 items (29.7%) being reported. In the 24 RCTs (24/29, 82.8%) appraised using the STRICTA items, an average of 10.6 items (62.5%) were addressed, with a range of 41.2 to 100%. For the herbal intervention reporting, 17 items (77.27%) were reported. In the RCT studies before and after the endorsement of CONSORT and STRICTA guidelines by each journal, all of the STRICTA items had significant improvement, whereas the CONSORT-NPT items improved without statistical significance. The endorsement of reporting guidelines is limited in the TM journals in Korea. 
Authors should adhere to the reporting guidelines, and editorial departments should refer authors to the various reporting guidelines to improve the quality of their articles.
Social cognitive mediators of parent-child sexual communication.
Evans, W Douglas; Blitstein, Jonathan L; Davis, Kevin C
2011-07-01
To test a social cognitive behavior change model and identify mediators of the effects of the Parents Speak Up National Campaign (PSUNC) on parent-child sexual communication. Investigators used 5 waves of data from an online randomized controlled trial. Latent variables were developed based on item response theory and confirmatory factor analysis. Structural equation modeling was used to test mediation. Outcome expectations mediated effects of social norms and self-efficacy on sexual communication. Other hypothesized mediators were not confirmed. Interventions to promote parent-child sexual communication should target outcome expectations. Future research should investigate parents' health information seeking.
Miller, Jordan; MacDermid, Joy C; Walton, David M; Richardson, Julie
2015-10-14
Previous research suggests that self-management programs for people with chronic pain improve knowledge and self-efficacy but result in negligible effects on function. This study will investigate the effectiveness of self-management support with pain science education and exercise on improving function for people with chronic pain in comparison to a wait-list control. A secondary objective is to determine which variables help to predict response to the intervention. This study will be an unblinded, randomized controlled trial with 110 participants comparing a 6-week program that includes self-management support, pain science education and exercise to a wait-list control. The primary outcome will be function measured by the Short Musculoskeletal Function Assessment - Dysfunction Index. Secondary outcomes will include pain intensity measured by a numeric pain rating scale, pain interference measured by the eight-item PROMIS pain interference item-bank, how much patients are bothered by functional problems measured by the Short Musculoskeletal Function Assessment - Bother Index, catastrophic thinking measured by the Pain Catastrophizing Scale, fear of movement/re-injury measured by the 11-item Tampa Scale of Kinesiophobia, sense of perceived injustice measured by the Injustice Experience Questionnaire, self-efficacy measured by the Pain Self-Efficacy Questionnaire, pain sensitivity measured by pressure pain threshold and cold sensitivity testing, fatigue measured by a numeric fatigue rating scale, pain neurophysiology knowledge measured by the Neurophysiology of Pain Questionnaire, healthcare utilization measured by number of visits to a healthcare provider, and work status. Assessments will be completed at baseline, 7 and 18 weeks. After the 18-week assessment, the groups will crossover; however, we anticipate carry-over effects with the treatment. 
Therefore, data collected after the crossover will be used only to estimate within-group changes and to determine predictors of response, not for direct between-group comparisons. Mixed effects modelling will be used to determine between-group differences for all primary and secondary outcomes. A series of multiple regression models will be used to determine predictors of treatment response. This study has the potential to inform future self-management programming through evaluation of a self-management program that aims to improve function as the primary outcome. ClinicalTrials.gov NCT02422459, registered on 13 April 2015.
Differentiating Visual from Response Sequencing during Long-term Skill Learning.
Lynch, Brighid; Beukema, Patrick; Verstynen, Timothy
2017-01-01
The dual-system model of sequence learning posits that during early learning there is an advantage for encoding sequences in sensory frames; however, it remains unclear whether this advantage extends to long-term consolidation. Using the serial RT task, we set out to distinguish the dynamics of learning sequential orders of visual cues from learning sequential responses. On each day, most participants learned a new mapping between a set of symbolic cues and responses made with one of four fingers, after which they were exposed to trial blocks of either randomly ordered cues or deterministic ordered cues (12-item sequence). Participants were randomly assigned to one of four groups (n = 15 per group): Visual sequences (same sequence of visual cues across training days), Response sequences (same order of key presses across training days), Combined (same serial order of cues and responses on all training days), and a Control group (a novel sequence each training day). Across 5 days of training, sequence-specific measures of response speed and accuracy improved faster in the Visual group than any of the other three groups, despite no group differences in explicit awareness of the sequence. The two groups that were exposed to the same visual sequence across days showed a marginal improvement in response binding that was not found in the other groups. These results indicate that there is an advantage, in terms of rate of consolidation across multiple days of training, for learning sequences of actions in a sensory representational space, rather than as motoric representations.
Item response theory scoring and the detection of curvilinear relationships.
Carter, Nathan T; Dalal, Dev K; Guan, Li; LoPilato, Alexander C; Withrow, Scott A
2017-03-01
Psychologists are increasingly positing theories of behavior that suggest psychological constructs are curvilinearly related to outcomes. However, results from empirical tests for such curvilinear relations have been mixed. We propose that correctly identifying the response process underlying responses to measures is important for the accuracy of these tests. Indeed, past research has indicated that item responses to many self-report measures follow an ideal point response process (wherein respondents agree only to items that reflect their own standing on the measured variable) as opposed to a dominance process, wherein stronger agreement, regardless of item content, is always indicative of higher standing on the construct. We test whether item response theory (IRT) scoring appropriate for the underlying response process to self-report measures results in more accurate tests for curvilinearity. In 2 simulation studies, we show that, regardless of the underlying response process used to generate the data, using the traditional sum-score generally results in high Type 1 error rates or low power for detecting curvilinearity, depending on the distribution of item locations. With few exceptions, appropriate power and Type 1 error rates are achieved when dominance-based and ideal point-based IRT scoring are correctly used to score dominance and ideal point response data, respectively. We conclude that (a) researchers should be theory-guided when hypothesizing and testing for curvilinear relations; (b) correctly identifying whether responses follow an ideal point versus dominance process, particularly when items are not extreme, is critical; and (c) IRT model-based scoring is crucial for accurate tests of curvilinearity. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
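The difference between the two response processes can be illustrated with two toy response curves. This is only a sketch under loud assumptions: the logistic curve stands in for a dominance process, and a Gaussian-shaped curve is used purely as a stand-in for an ideal point (single-peaked) process. It is not the model used by the authors, and both function names are hypothetical.

```python
import math


def dominance_prob(theta: float, b: float, a: float = 1.0) -> float:
    """Monotone curve: agreement keeps rising as theta exceeds the item location b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def ideal_point_prob(theta: float, b: float, width: float = 1.0) -> float:
    """Single-peaked curve (illustrative only): agreement is highest when theta is near b."""
    return math.exp(-((theta - b) ** 2) / (2.0 * width ** 2))


# Compare both curves for an item located at b = 0.
for theta in (-2.0, 0.0, 2.0):
    print(theta, round(dominance_prob(theta, 0.0), 3), round(ideal_point_prob(theta, 0.0), 3))
```

Under the single-peaked curve, respondents far above and far below the item location are equally unlikely to agree, so a sum score can fold distinct trait levels together.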
Assessing Construct Validity Using Multidimensional Item Response Theory.
ERIC Educational Resources Information Center
Ackerman, Terry A.
The concept of a user-specified validity sector is discussed. The idea of the validity sector combines the work of M. D. Reckase (1986) and R. Shealy and W. Stout (1991). Reckase developed a methodology to represent an item in a multidimensional latent space as a vector. Item vectors are computed using multidimensional item response theory item…
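The item-vector representation can be sketched for a compensatory multidimensional two-parameter model with logit a·θ + d, using the standard multidimensional discrimination and difficulty summaries. The function name `item_vector` and the parameter values are hypothetical, and the sketch does not attempt to compute a validity sector.

```python
import math


def item_vector(a: list[float], d: float) -> tuple[float, float, list[float]]:
    """Summarize an item as a vector in the latent space.

    Returns (MDISC, MDIFF, direction cosines) for discriminations a and intercept d.
    """
    mdisc = math.sqrt(sum(ak * ak for ak in a))  # vector length: overall discrimination
    mdiff = -d / mdisc                           # signed distance from the origin
    cosines = [ak / mdisc for ak in a]           # direction of best measurement
    return mdisc, mdiff, cosines


# Hypothetical two-dimensional item.
mdisc, mdiff, cosines = item_vector([0.6, 0.8], d=-0.5)
angles = [round(math.degrees(math.acos(c)), 1) for c in cosines]
print(round(mdisc, 3), round(mdiff, 3), angles)
```

The angles give the direction in which the item measures best relative to each axis, which is the information a sector-based comparison of items would draw on.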
ERIC Educational Resources Information Center
Dimitrov, Dimiter M.
2007-01-01
The validation of cognitive attributes required for correct answers on binary test items or tasks has been addressed in previous research through the integration of cognitive psychology and psychometric models using parametric or nonparametric item response theory, latent class modeling, and Bayesian modeling. All previous models, each with their…
ERIC Educational Resources Information Center
Bilir, Mustafa Kuzey
2009-01-01
This study uses a new psychometric model (mixture item response theory-MIMIC model) that simultaneously estimates differential item functioning (DIF) across manifest groups and latent classes. Current DIF detection methods investigate DIF from only one side, either across manifest groups (e.g., gender, ethnicity, etc.), or across latent classes…
Item Response Theory and Health Outcomes Measurement in the 21st Century
Hays, Ron D.; Morales, Leo S.; Reise, Steve P.
2006-01-01
Item response theory (IRT) has a number of potential advantages over classical test theory in assessing self-reported health outcomes. IRT models yield invariant item and latent trait estimates (within a linear transformation), standard errors conditional on trait level, and trait estimates anchored to item content. IRT also facilitates evaluation of differential item functioning, inclusion of items with different response formats in the same scale, and assessment of person fit and is ideally suited for implementing computer adaptive testing. Finally, IRT methods can be helpful in developing better health outcome measures and in assessing change over time. These issues are reviewed, along with a discussion of some of the methodological and practical challenges in applying IRT methods. PMID:10982088
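The idea of a standard error that depends on trait level can be sketched with two-parameter logistic items, where each item contributes information a²P(1−P) and the standard error is the inverse square root of the summed information. The function names and item parameters below are hypothetical.

```python
import math


def item_information(theta: float, a: float, b: float) -> float:
    """Fisher information of a 2PL item at ability theta."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)


def conditional_se(theta: float, items: list[tuple[float, float]]) -> float:
    """Standard error of the trait estimate at theta, from the test information."""
    test_information = sum(item_information(theta, a, b) for a, b in items)
    return 1.0 / math.sqrt(test_information)


# Hypothetical scale whose items cluster around the middle of the trait range.
items = [(1.2, -0.5), (1.0, 0.0), (1.4, 0.3), (0.9, 0.6)]
for theta in (-3.0, 0.0, 3.0):
    print(theta, round(conditional_se(theta, items), 3))
```

Precision is best where item locations are concentrated and degrades toward the extremes, unlike a single reliability coefficient that applies to the whole scale.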
The effect of response modality on immediate serial recall in dementia of the Alzheimer type.
Macé, Anne-Laure; Ergis, Anne-Marie; Caza, Nicole
2012-09-01
Contrary to traditional models of verbal short-term memory (STM), psycholinguistic accounts assume that temporary retention of verbal materials is an intrinsic property of word processing. Therefore, memory performance will depend on the nature of the STM tasks, which vary according to the linguistic representations they engage. The aim of this study was to explore the effect of response modality on verbal STM performance in individuals with dementia of the Alzheimer Type (DAT), and its relationship with the patients' word-processing deficits. Twenty individuals with mild DAT and 20 controls were tested on an immediate serial recall (ISR) task using the same items across two response modalities (oral and picture pointing) and completed a detailed language assessment. When scoring of ISR performance was based on item memory regardless of item order, a response modality effect was found for all participants, indicating that they recalled more items with picture pointing than with oral response. However, this effect was less marked in patients than in controls, resulting in an interaction. Interestingly, when recall of both item and order was considered, results indicated similar performance between response modalities in controls, whereas performance was worse for pointing than for oral response in patients. Picture-naming performance was also reduced in patients relative to controls. However, in the word-to-picture matching task, a similar pattern of responses was found between groups for incorrectly named pictures of the same items. The finding of a response modality effect in item memory for all participants is compatible with the assumption that semantic influences are greater in picture pointing than in oral response, as predicted by psycholinguistic models. Furthermore, patients' performance was modulated by their word-processing deficits, showing a reduced advantage relative to controls. 
Overall, the response modality effect observed in this study for item memory suggests that verbal STM performance is intrinsically linked with word processing capacities in both healthy controls and individuals with mild DAT, supporting psycholinguistic models of STM.
ERIC Educational Resources Information Center
Wallace, Colin S.; Prather, Edward E.; Duncan, Douglas K.
2012-01-01
This is the third of five papers detailing our national study of general education astronomy students' conceptual and reasoning difficulties with cosmology. In this paper, we use item response theory to analyze students' responses to three out of the four conceptual cosmology surveys we developed. The specific item response theory model we use is…
ERIC Educational Resources Information Center
Flowers, Claudia P.; Raju, Nambury S.; Oshima, T. C.
Current interest in the assessment of measurement equivalence emphasizes two methods of analysis: linear and nonlinear procedures. This study simulated data using the graded response model to examine the performance of linear (confirmatory factor analysis or CFA) and nonlinear (item-response-theory-based differential item function or IRT-Based…
A Polytomous Item Response Theory Analysis of Social Physique Anxiety Scale
ERIC Educational Resources Information Center
Fletcher, Richard B.; Crocker, Peter
2014-01-01
The present study investigated the social physique anxiety scale's factor structure and item properties using confirmatory factor analysis and item response theory. An additional aim was to identify differences in response patterns between groups (gender). A large sample of high school students aged 11-15 years (N = 1,529) consisting of n =…
Item Response Theory at Subject- and Group-Level. Research Report 90-1.
ERIC Educational Resources Information Center
Tobi, Hilde
This paper reviews the literature about item response models for the subject level and aggregated level (group level). Group-level item response models (IRMs) are used in the United States in large-scale assessment programs such as the National Assessment of Educational Progress and the California Assessment Program. In the Netherlands, these…
ERIC Educational Resources Information Center
Schilling, Stephen G.
2007-01-01
In this paper the author examines the role of item response theory (IRT), particularly multidimensional item response theory (MIRT) in test validation from a validity argument perspective. The author provides justification for several structural assumptions and interpretations, taking care to describe the role he believes they should play in any…
ERIC Educational Resources Information Center
von Davier, Matthias; Sinharay, Sandip
2009-01-01
This paper presents an application of a stochastic approximation EM-algorithm using a Metropolis-Hastings sampler to estimate the parameters of an item response latent regression model. Latent regression models are extensions of item response theory (IRT) to a 2-level latent variable model in which covariates serve as predictors of the…
ERIC Educational Resources Information Center
Anderson, Daniel; Kahn, Joshua D.; Tindal, Gerald
2017-01-01
Unidimensionality and local independence are two common assumptions of item response theory. The former implies that all items measure a common latent trait, while the latter implies that responses are independent, conditional on respondents' location on the latent trait. Yet, few tests are truly unidimensional. Unmodeled dimensions may result in…
Hobgood, Cherri; Sherwood, Gwen; Frush, Karen; Hollar, David; Maynard, Laura; Foster, Beverly; Sawning, Susan; Woodyard, Donald; Durham, Carol; Wright, Melanie; Taekman, Jeffrey
2010-12-01
The authors conducted a randomised controlled trial of four pedagogical methods commonly used to deliver teamwork training and measured the effects of each method on the acquisition of student teamwork knowledge, skills, and attitudes. The authors recruited 203 senior nursing students and 235 fourth-year medical students (total N = 438) from two major universities for a 1-day interdisciplinary teamwork training course. All participants received a didactic lecture and then were randomly assigned to one of four educational methods: didactic (control), audience response didactic, role play, and human patient simulation. Student performance was assessed for teamwork attitudes, knowledge and skills using: (a) a 36-item teamwork attitudes instrument (CHIRP), (b) a 12-item teamwork knowledge test, (c) a 10-item standardised patient (SP) evaluation of student teamwork skills performance and (d) a 20-item modification of items from the Mayo High Performance Teamwork Scale (MHPTS). All four cohorts demonstrated an improvement in attitudes (F(1,370) = 48.7, p = 0.001) and knowledge (F(1,353) = 87.3, p = 0.001) pre- to post-test. No educational modality appeared superior for attitude (F(3,370) = 0.325, p = 0.808) or knowledge (F(3,353) = 0.382, p = 0.766) acquisition. No modality demonstrated a significant change in teamwork skills (F(3,18) = 2.12, p = 0.134). Each of the four modalities demonstrated significantly improved teamwork knowledge and attitudes, but no modality was demonstrated to be superior. Institutions should feel free to utilise the educational modalities that are best supported by their resources to deliver interdisciplinary teamwork training.
Cella, David; Escudier, Bernard; Tannir, Nizar M; Powles, Thomas; Donskov, Frede; Peltola, Katriina; Schmidinger, Manuela; Heng, Daniel Y C; Mainwaring, Paul N; Hammers, Hans J; Lee, Jae Lyun; Roth, Bruce J; Marteau, Florence; Williams, Paul; Baer, John; Mangeshkar, Milan; Scheffold, Christian; Hutson, Thomas E; Pal, Sumanta; Motzer, Robert J; Choueiri, Toni K
2018-03-10
Purpose: In the phase III METEOR trial (ClinicalTrials.gov identifier: NCT01865747), 658 previously treated patients with advanced renal cell carcinoma were randomly assigned 1:1 to receive cabozantinib or everolimus. The cabozantinib arm had improved progression-free survival, overall survival, and objective response rate compared with everolimus. Changes in quality of life (QoL), an exploratory end point, are reported here. Patients and Methods: Patients completed the 19-item Functional Assessment of Cancer Therapy-Kidney Symptom Index (FKSI-19) and the five-level EuroQol (EQ-5D-5L) questionnaires at baseline and throughout the study. The nine-item FKSI-Disease-Related Symptoms (FKSI-DRS), a subset of FKSI-19, was also investigated. Data were summarized descriptively and by repeated-measures analysis (for which a clinically relevant difference was an effect size ≥ 0.3). Time to deterioration (TTD) was defined as the earlier of date of death, radiographic progressive disease, or ≥ 4-point decrease from baseline in FKSI-DRS. Results: The QoL questionnaire completion rates remained ≥ 75% through week 48 in each arm. There was no difference over time for FKSI-19 Total, FKSI-DRS, or EQ-5D data between the cabozantinib and everolimus arms. Among the individual FKSI-19 items, cabozantinib was associated with worse diarrhea and nausea; everolimus was associated with worse shortness of breath. These differences are consistent with the adverse event profile of each drug. Cabozantinib improved TTD overall, with a marked improvement in patients with bone metastases at baseline. Conclusion: In patients with advanced renal cell carcinoma, relative to everolimus, cabozantinib generally maintained QoL to a similar extent. Compared with everolimus, cabozantinib extended TTD overall and markedly improved TTD in patients with bone metastases.
Outcomes of an early feeding practices intervention to prevent childhood obesity.
Daniels, Lynne Allison; Mallan, Kimberley Margaret; Nicholson, Jan Maree; Battistutta, Diana; Magarey, Anthea
2013-07-01
The goal of this study was to evaluate outcomes of a universal intervention to promote protective feeding practices that commenced in infancy and aimed to prevent childhood obesity. The NOURISH randomized controlled trial enrolled 698 first-time mothers (mean ± SD age: 30.1 ± 5.3 years) with healthy term infants (51% female) aged 4.3 ± 1.0 months at baseline. Mothers were randomly allocated to self-directed access to usual care or to attend two 6-session interactive group education modules that provided anticipatory guidance on early feeding practices. Outcomes were assessed 6 months after completion of the second information module, 20 months from baseline and when the children were 2 years old. Maternal feeding practices were self-reported by using validated questionnaires and study-developed items. Study-measured child height and weight were used to calculate BMI z scores. Retention at follow-up was 78%. Mothers in the intervention group reported using responsive feeding more frequently on 6 of 9 subscales and 8 of 8 items (all, P ≤ .03) and overall less controlling feeding practices (P < .001). They also more frequently used feeding practices (3 of 4 items; all, P < .01) likely to enhance food acceptance. No statistically significant differences were noted in anthropometric outcomes (BMI z score: P = .10) nor in prevalence of overweight/obesity (control 17.9% vs intervention 13.8%; P = .23). Evaluation of NOURISH data at child age 2 years found that anticipatory guidance on complementary feeding, tailored to developmental stage, increased use by first-time mothers of "protective" feeding practices that potentially support the development of healthy eating and growth patterns in young children.
Sun, Yuxiao; Wang, Jianan; Heine, Lizette; Huang, Wangshan; Wang, Jing; Hu, Nantu; Hu, Xiaohua; Fang, Xiaohui; Huang, Supeng; Laureys, Steven; Di, Haibo
2018-04-12
Behavioral assessment has served as the gold standard for diagnosing patients with disorders of consciousness (DOC). The item "Functional Object Use" in the motor function sub-scale of the Coma Recovery Scale-Revised (CRS-R) is a key item in differentiating between minimally conscious state (MCS) and emergence from MCS (EMCS). However, previous studies suggested that certain stimuli, especially self-relevant ones, can affect DOC patients' scores on behavioral assessment scales. We therefore examined whether personalized objects can improve the diagnosis of EMCS in the assessment of Functional Object Use by comparing the use of patients' favorite objects and other common objects in MCS patients. Twenty-one post-comatose patients diagnosed as MCS were prospectively included. The item "Functional Object Use" was assessed using personalized objects (e.g., cigarette, paper) and non-personalized objects, which were presented in random order. The remaining assessments were performed following the standard protocol of the CRS-R. The differences between functional uses of the two types of objects were analyzed by the McNemar test. The incidence of Functional Object Use was significantly higher using personalized objects than non-personalized objects in the CRS-R. Five of the 21 MCS patients studied, who were assessed with non-personalized objects, were re-diagnosed as EMCS with personalized objects (χ² = 5, df = 1, p < 0.05). Personalized objects seem to be more effective than non-personalized objects at eliciting patients' responses during the assessment of Functional Object Use in DOC patients. ClinicalTrials.gov: NCT02988206; Date of registration: 2016/12/12.
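The reported test statistic can be reproduced with the uncorrected McNemar formula. The sketch below is illustrative only: the discordant-pair counts (5 patients showing Functional Object Use only with personalized objects, 0 only with non-personalized objects) are an assumption that is consistent with the reported value of 5 but is not stated in the abstract, and the function name is hypothetical.

```python
import math


def mcnemar_chi2(b: int, c: int) -> tuple[float, float]:
    """Uncorrected McNemar test for paired binary outcomes.

    b, c: counts of the two kinds of discordant pairs.
    Returns (chi-square statistic, p-value with 1 degree of freedom).
    """
    if b + c == 0:
        raise ValueError("the test is undefined without discordant pairs")
    chi2 = (b - c) ** 2 / (b + c)
    # Survival function of a chi-square variable with df = 1.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Assumed discordant counts (not given in the abstract): 5 versus 0.
chi2, p_value = mcnemar_chi2(5, 0)
print(f"chi2 = {chi2:.1f}, p = {p_value:.4f}")
```

With so few discordant pairs, an exact binomial version of the test would often be preferred; the uncorrected form is used here only because it matches the statistic quoted above.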
Malec, James F; Whiteneck, Gale G; Bogner, Jennifer A
2016-02-01
To integrate previous approaches to scoring the Participation Assessment with Recombined Tools-Objective (PART-O) in a unidimensional scale. Retrospective analysis of PART-O data from the Traumatic Brain Injury Model Systems. Community. Data from individuals (N=469) selected randomly from participants who completed 1-year follow-up in the Traumatic Brain Injury Model Systems were used in Rasch model development. The model was subsequently tested on data from additional random samples of similar size at 1-, 2-, 5-, 10-, and >15-year follow-ups. Not applicable. PART-O. After combining items for productivity and social interaction, the initial analysis at 1-year follow-up indicated relatively good fit to the Rasch model (person reliability=.80) but also suggested item misfit and that the 0-to-5 scale used for most items did not consistently show clear separation between rating levels. Reducing item rating scales to 3 levels (except combined and dichotomous items) resolved these issues and demonstrated good item level discrimination, fit, and person reliability (.81), with no evidence of multidimensionality. These results replicated in analyses at each additional follow-up period. Modifications to item scoring for the PART-O resulted in a unidimensional parametric equivalent measure that addresses previous concerns about competing item relations, and it fit the Rasch model consistently across follow-up periods. The person-item map shows a progression toward greater community participation from solitary and dyadic activities, such as leaving the house and having a friend through social and productivity activities, to group activities with others who share interests or beliefs. Copyright © 2016 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
Hill, Bridget; Pallant, Julie; Williams, Gavin; Olver, John; Ferris, Scott; Bialocerkowski, Andrea
2016-12-01
To evaluate the internal construct validity and dimensionality of a new patient-reported outcome measure for people with traumatic brachial plexus injury (BPI) based on the International Classification of Functioning, Disability and Health definition of activity. Cross-sectional study. Outpatient clinics. Adults (age range, 18-82y) with a traumatic BPI (N=106). There were 106 people with BPI who completed a 51-item 5-response questionnaire. Responses were analyzed in 4 phases (missing responses, item correlations, exploratory factor analysis, and Rasch analysis) to evaluate the properties of fit to the Rasch model, threshold response, local dependency, dimensionality, differential item functioning, and targeting. Not applicable, as this study addresses the development of an outcome measure. Six items were deleted for missing responses, and 10 were deleted for high interitem correlations >.81. The remaining 35 items, while demonstrating fit to the Rasch model, showed evidence of local dependency and multidimensionality. Items were divided into 3 subscales: dressing and grooming (8 items), arm and hand (17 items), and no hand (6 items). All 3 subscales demonstrated fit to the model with no local dependency, minimal disordered thresholds, no unidimensionality or differential item functioning for age, time postinjury, or self-selected dominance. Subscales were combined into 3 subtests and demonstrated fit to the model, no misfit, and unidimensionality, allowing calculation of a summary score. This preliminary analysis supports the internal construct validity of the Brachial Assessment Tool, a unidimensional targeted 4-response patient-reported outcome measure designed to solely assess activity after traumatic BPI regardless of level of injury, age at recruitment, premorbid limb dominance, and time postinjury. Further examination is required to determine test-retest reliability and responsiveness. Copyright © 2016 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
The Act of Answering Questions Elicited Differentiated Responses in a Concealed Information Test.
Otsuka, Takuro; Mizutani, Mitsuyoshi; Yagi, Akihiro; Katayama, Jun'ichi
2018-04-17
The concealed information test (CIT), a psychophysiological detection of deception test, compares physiological responses between crime-related and crime-unrelated items. In previous studies, whether the act of answering questions affected physiological responses was unclear. This study examined effects of both question-related and answer-related processes on physiological responses. Twenty participants received a modified CIT, in which the interval between presentation of questions and answering them was 27 s. Differentiated respiratory movements and cardiovascular responses between items were observed for both questions (items) and answers, while differentiated skin conductance response was observed only for questions. These results suggest that physiological responses to questions reflected orientation to a crime-related item, while physiological responses during answering reflected inhibition of psychological arousal caused by orienting. Regarding the CIT's accuracy, participants' perception of the questions themselves more strongly influenced physiological responses than answering them. © 2018 American Academy of Forensic Sciences.
Development and validation of an item response theory-based Social Responsiveness Scale short form.
Sturm, Alexandra; Kuhfeld, Megan; Kasari, Connie; McCracken, James T
2017-09-01
Research and practice in autism spectrum disorder (ASD) rely on quantitative measures, such as the Social Responsiveness Scale (SRS), for characterization and diagnosis. Like many ASD diagnostic measures, SRS scores are influenced by factors unrelated to ASD core features. This study further interrogates the psychometric properties of the SRS using item response theory (IRT), and demonstrates a strategy to create a psychometrically sound short form by applying IRT results. Social Responsiveness Scale analyses were conducted on a large sample (N = 21,426) of youth from four ASD databases. Items were subjected to item factor analyses and evaluation of item bias by gender, age, expressive language level, behavior problems, and nonverbal IQ. Item selection based on item psychometric properties, DIF analyses, and substantive validity produced a reduced item SRS short form that was unidimensional in structure, highly reliable (α = .96), and free of gender, age, expressive language, behavior problems, and nonverbal IQ influence. The short form also showed strong relationships with established measures of autism symptom severity (ADOS, ADI-R, Vineland). Degree of association between all measures varied as a function of expressive language. Results identified specific SRS items that are more vulnerable to non-ASD-related traits. The resultant 16-item SRS short form may possess superior psychometric properties compared to the original scale and emerge as a more precise measure of ASD core symptom severity, facilitating research and practice. Future research using IRT is needed to further refine existing measures of autism symptomatology. © 2017 Association for Child and Adolescent Mental Health.
Gifford, Katherine A; Liu, Dandan; Romano, Raymond; Jones, Richard N; Jefferson, Angela L
2015-12-01
Subjective cognitive decline (SCD) may indicate unhealthy cognitive changes, but no standardized SCD measurement exists. This pilot study aims to identify reliable SCD questions. 112 cognitively normal (NC, 76±8 years, 63% female), 43 mild cognitive impairment (MCI; 77±7 years, 51% female), and 33 diagnostically ambiguous participants (79±9 years, 58% female) were recruited from a research registry and completed 57 self-report SCD questions. Psychometric methods were used for item-reduction. Factor analytic models assessed unidimensionality of the latent trait (SCD); 19 items were removed with extreme response distribution or trait-fit. Item response theory (IRT) provided information about question utility; 17 items with low information were dropped. Post-hoc simulation using computerized adaptive test (CAT) modeling selected the most commonly used items (n=9 of 21 items) that represented the latent trait well (r=0.94) and differentiated NC from MCI participants (F(1,146)=8.9, p=0.003). Item response theory and computerized adaptive test modeling identified nine reliable SCD items. This pilot study is a first step toward refining SCD assessment in older adults. Replication of these findings and validation with Alzheimer's disease biomarkers will be an important next step for the creation of a SCD screener.
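Dropping low-information items and letting an adaptive test pick the next question both rest on the item information function. A minimal sketch follows, assuming a two-parameter logistic (2PL) model, which the abstract does not specify; the item names and parameter values are hypothetical and are not taken from the study.

```python
import math


def item_information_2pl(theta: float, a: float, b: float) -> float:
    """Fisher information of a 2PL item at trait level theta.

    a: discrimination, b: difficulty (location).
    """
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)


def most_informative(theta: float, items: dict[str, tuple[float, float]]) -> str:
    """Return the name of the item with maximum information at theta."""
    return max(items, key=lambda name: item_information_2pl(theta, *items[name]))


# Hypothetical item bank: name -> (discrimination, difficulty).
bank = {"q1": (0.5, 0.0), "q2": (2.0, 0.0), "q3": (2.0, 3.0)}
next_item = most_informative(0.0, bank)
```

An adaptive test repeats this selection after each response, using the updated trait estimate as `theta`; items that are rarely or never selected are natural candidates for removal.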
Cross-Cultural Validation of the Quality of Life in Hand Eczema Questionnaire (QOLHEQ).
Ofenloch, Robert F; Oosterhaven, Jart A F; Susitaival, Päivikki; Svensson, Åke; Weisshaar, Elke; Minamoto, Keiko; Onder, Meltem; Schuttelaar, Marie Louise A; Bulbul Baskan, Emel; Diepgen, Thomas L; Apfelbacher, Christian
2017-07-01
The Quality of Life in Hand Eczema Questionnaire (QOLHEQ) is the only instrument assessing disease-specific health-related quality of life in patients with hand eczema. It is available in eight language versions. In this study we assessed if the items of different language versions of the QOLHEQ yield comparable values across countries. An international multicenter study was conducted with participating centers in Finland, Germany, Japan, The Netherlands, Sweden, and Turkey. Methods of item response theory were applied to each subscale to assess differential item functioning for items among countries. Overall, 662 hand eczema patients were recruited into the study. Single items were removed or split according to the item response theory model by country to resolve differential item functioning. After this adjustment, none of the four subscales of the QOLHEQ showed significant misfit to the item response theory model (P < 0.01), and a Person Separation Index of greater than 0.7 showed good internal consistency for each subscale. By adapting the scoring of the QOLHEQ using the methods of item response theory, it was possible to obtain QOLHEQ values that are comparable across countries. Cross-cultural variations in the interpretation of single items were resolved. The QOLHEQ is now ready to be used in international studies assessing the health-related quality of life impact of hand eczema. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.
Analyzing force concept inventory with item response theory
NASA Astrophysics Data System (ADS)
Wang, Jing; Bao, Lei
2010-10-01
Item response theory is a popular assessment method used in education. It rests on the assumption of a probability framework that relates students' innate ability and their performance on test questions. Item response theory transforms students' raw test scores into a scaled proficiency score, which can be used to compare results obtained with different test questions. The scaled score also addresses the issues of ceiling effects and guessing, which commonly exist in quantitative assessment. We used item response theory to analyze the force concept inventory (FCI). Our results show that item response theory can be useful for analyzing physics concept surveys such as the FCI and produces results about the individual questions and student performance that are beyond the capability of classical statistics. The theory yields detailed measurement parameters regarding the difficulty, discrimination features, and probability of correct guess for each of the FCI questions.
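The three per-question parameters named above (difficulty, discrimination, and probability of a correct guess) correspond to the standard three-parameter logistic (3PL) form. The sketch below assumes that form; the function name and the parameter values are illustrative, not estimates from the FCI analysis.

```python
import math


def p_correct_3pl(theta: float, a: float, b: float, c: float) -> float:
    """Probability of a correct answer under the 3PL model.

    theta: student proficiency
    a: discrimination, b: difficulty, c: guessing (lower asymptote)
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical question: a = 1.0, b = 0.0, c = 0.2.
for theta in (-3.0, 0.0, 3.0):
    print(theta, round(p_correct_3pl(theta, 1.0, 0.0, 0.2), 3))
```

The guessing parameter keeps the curve from dropping below `c` for very low proficiency, which is how the model accounts for correct answers obtained by guessing on a multiple-choice question.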
Item Response Theory Models for Performance Decline during Testing
ERIC Educational Resources Information Center
Jin, Kuan-Yu; Wang, Wen-Chung
2014-01-01
Sometimes, test-takers may not be able to attempt all items to the best of their ability (with full effort) due to personal factors (e.g., low motivation) or testing conditions (e.g., time limit), resulting in poor performances on certain items, especially those located toward the end of a test. Standard item response theory (IRT) models fail to…
The Effect of Error in Item Parameter Estimates on the Test Response Function Method of Linking.
ERIC Educational Resources Information Center
Kaskowitz, Gary S.; De Ayala, R. J.
2001-01-01
Studied the effect of item parameter estimation for computation of linking coefficients for the test response function (TRF) linking/equating method. Simulation results showed that linking was more accurate when there was less error in the parameter estimates, and that 15 or 25 common items provided better results than 5 common items under both…
ERIC Educational Resources Information Center
Gu, Fei; Skorupski, William P.; Hoyle, Larry; Kingston, Neal M.
2011-01-01
Ramsay-curve item response theory (RC-IRT) is a nonparametric procedure that estimates the latent trait using splines, and no distributional assumption about the latent trait is required. For item parameters of the two-parameter logistic (2-PL), three-parameter logistic (3-PL), and polytomous IRT models, RC-IRT can provide more accurate estimates…
O'Connor, A M; Sargeant, J M; Gardner, I A; Dickson, J S; Torrence, M E; Dewey, C E; Dohoo, I R; Evans, R B; Gray, J T; Greiner, M; Keefe, G; Lefebvre, S L; Morley, P S; Ramirez, A; Sischo, W; Smith, D R; Snedeker, K; Sofos, J; Ward, M P; Wills, R
2010-01-01
The conduct of randomized controlled trials in livestock with production, health, and food-safety outcomes presents unique challenges that may not be adequately reported in trial reports. The objective of this project was to modify the CONSORT (Consolidated Standards of Reporting Trials) statement to reflect the unique aspects of reporting these livestock trials. A two-day consensus meeting was held on November 18-19, 2008 in Chicago, IL, United States of America, to achieve the objective. Prior to the meeting, a Web-based survey was conducted to identify issues for discussion. The 24 attendees were biostatisticians, epidemiologists, food-safety researchers, livestock-production specialists, journal editors, assistant editors, and associate editors. Prior to the meeting, the attendees completed a Web-based survey indicating which CONSORT statement items may need to be modified to address unique issues for livestock trials. The consensus meeting resulted in the production of the REFLECT (Reporting Guidelines For Randomized Control Trials) statement for livestock and food safety (LFS) and 22-item checklist. Fourteen items were modified from the CONSORT checklist, and an additional sub-item was proposed to address challenge trials. The REFLECT statement proposes new terminology, more consistent with common usage in livestock production, to describe study subjects. Evidence was not always available to support modification to or inclusion of an item. The use of the REFLECT statement, which addresses issues unique to livestock trials, should improve the quality of reporting and design for trials reporting production, health, and food-safety outcomes.
O'Connor, A M; Sargeant, J M; Gardner, I A; Dickson, J S; Torrence, M E; Dewey, C E; Dohoo, I R; Evans, R B; Gray, J T; Greiner, M; Keefe, G; Lefebvre, S L; Morley, P S; Ramirez, A; Sischo, W; Smith, D R; Snedeker, K; Sofos, J; Ward, M P; Wills, R
2010-03-01
The conduct of randomized controlled trials in livestock with production, health and food-safety outcomes presents unique challenges that may not be adequately reported in trial reports. The objective of this project was to modify the CONSORT (Consolidated Standards of Reporting Trials) statement to reflect the unique aspects of reporting these livestock trials. A 2-day consensus meeting was held on 18-19 November 2008 in Chicago, IL, USA, to achieve the objective. Prior to the meeting, a Web-based survey was conducted to identify issues for discussion. The 24 attendees were biostatisticians, epidemiologists, food-safety researchers, livestock-production specialists, journal editors, assistant editors and associate editors. Prior to the meeting, the attendees completed a Web-based survey indicating which CONSORT statement items may need to be modified to address unique issues for livestock trials. The consensus meeting resulted in the production of the REFLECT (Reporting Guidelines for Randomized Control Trials) statement for livestock and food safety and 22-item checklist. Fourteen items were modified from the CONSORT checklist and an additional sub-item was proposed to address challenge trials. The REFLECT statement proposes new terminology, more consistent with common usage in livestock production, to describe study subjects. Evidence was not always available to support modification to or inclusion of an item. The use of the REFLECT statement, which addresses issues unique to livestock trials, should improve the quality of reporting and design for trials reporting production, health and food-safety outcomes.
O'Connor, A M; Sargeant, J M; Gardner, I A; Dickson, J S; Torrence, M E; Dewey, C E; Dohoo, I R; Evans, R B; Gray, J T; Greiner, M; Keefe, G; Lefebvre, S L; Morley, P S; Ramirez, A; Sischo, W; Smith, D R; Snedeker, K; Sofos, J N; Ward, M P; Wills, R
2010-01-01
The conduct of randomized controlled trials in livestock with production, health, and food-safety outcomes presents unique challenges that may not be adequately reported in trial reports. The objective of this project was to modify the CONSORT (Consolidated Standards of Reporting Trials) statement to reflect the unique aspects of reporting these livestock trials. A two-day consensus meeting was held on November 18-19, 2008 in Chicago, Ill, United States of America, to achieve the objective. Prior to the meeting, a Web-based survey was conducted to identify issues for discussion. The 24 attendees were biostatisticians, epidemiologists, food-safety researchers, livestock production specialists, journal editors, assistant editors, and associate editors. Prior to the meeting, the attendees completed a Web-based survey indicating which CONSORT statement items may need to be modified to address unique issues for livestock trials. The consensus meeting resulted in the production of the REFLECT (Reporting Guidelines for Randomized Control Trials) statement for livestock and food safety (LFS) and 22-item checklist. Fourteen items were modified from the CONSORT checklist, and an additional sub-item was proposed to address challenge trials. The REFLECT statement proposes new terminology, more consistent with common usage in livestock production, to describe study subjects. Evidence was not always available to support modification to or inclusion of an item. The use of the REFLECT statement, which addresses issues unique to livestock trials, should improve the quality of reporting and design for trials reporting production, health, and food-safety outcomes.
PREDICTION OF RELIABILITY IN BIOGRAPHICAL QUESTIONNAIRES.
ERIC Educational Resources Information Center
STARRY, ALLAN R.
THE OBJECTIVES OF THIS STUDY WERE (1) TO DEVELOP A GENERAL CLASSIFICATION SYSTEM FOR LIFE HISTORY ITEMS, (2) TO DETERMINE TEST-RETEST RELIABILITY ESTIMATES, AND (3) TO ESTIMATE RESISTANCE TO EXAMINEE FAKING, FOR REPRESENTATIVE BIOGRAPHICAL QUESTIONNAIRES. TWO 100-ITEM QUESTIONNAIRES WERE CONSTRUCTED THROUGH RANDOM ASSIGNMENT BY CONTENT AREA OF 200…
Bitran, Stella; Farabaugh, Amy H; Ameral, Victoria E; LaRocca, Rachel A; Clain, Alisabet J; Fava, Maurizio; Mischoulon, David
2011-01-01
Objective: To assess whether early changes in HAM-D-17 anxiety/somatization items predict remission in two controlled studies of hypericum perforatum (St. John’s wort) versus an SSRI for major depressive disorder (MDD). Methods: The Hypericum Depression Trial Study Group (NIMH) study randomized 340 subjects to hypericum, sertraline, or placebo for 8 weeks. The MGH study randomized 135 subjects to hypericum, fluoxetine, or placebo for 12 weeks. We examined whether remission was associated with early changes in anxiety/somatization symptoms. Results: In the NIMH study, significant associations were observed between remission and early improvement in the anxiety-psychic item (sertraline arm), somatic-gastrointestinal item (hypericum arm), and somatic symptoms-general (placebo arm). None of the three treatment arms of the MGH study showed significant associations between anxiety/somatization symptoms and remission. When both study samples were pooled, we found associations for anxiety-psychic (SSRI arm), somatic-gastrointestinal and hypochondriasis (hypericum arm), and anxiety-psychic and somatic symptoms-general (placebo arm). In the entire sample, remission was associated with improvement in the anxiety-psychic, somatic-gastrointestinal, and somatic symptoms-general items. Conclusions: The number and type of anxiety/somatization items associated with remission varied depending on the intervention. Early scrutiny of the HAM-D-17 anxiety/somatization items may help predict remission of MDD. PMID:21278577
Jordan, Pascal; Shedden-Mora, Meike C; Löwe, Bernd
2017-01-01
The Generalized Anxiety Disorder scale (GAD-7) is one of the most frequently used diagnostic self-report scales for screening, diagnosis and severity assessment of anxiety disorder. Its psychometric properties from the view of the Item Response Theory paradigm have rarely been investigated. We aimed to close this gap by analyzing the GAD-7 within a large sample of primary care patients with respect to its psychometric properties and its implications for scoring using Item Response Theory. Robust, nonparametric statistics were used to check unidimensionality of the GAD-7. A graded response model was fitted using a Bayesian approach. The model fit was evaluated using posterior predictive p-values, item information functions were derived and optimal predictions of anxiety were calculated. The sample included N = 3404 primary care patients (60% female; mean age, 52.2; standard deviation, 19.2). The analysis indicated no deviations of the GAD-7 scale from unidimensionality and a decent fit of a graded response model. The commonly suggested ultra-brief measure consisting of the first two items, the GAD-2, was supported by item information analysis. The first four items discriminated better than the last three items with respect to latent anxiety. The information provided by the first four items should be weighted more heavily. Moreover, estimates corresponding to low to moderate levels of anxiety show greater variability. The psychometric validity of the GAD-2 was supported by our analysis.
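For readers unfamiliar with the graded response model, the following sketch shows how category probabilities arise for a single item with four ordered response options (scored 0 to 3, as GAD-7 items are). It uses the usual logistic formulation; the discrimination and threshold values are hypothetical, not the study's estimates, and the Bayesian fitting step is not shown.

```python
import math


def grm_category_probs(theta: float, a: float, thresholds: list[float]) -> list[float]:
    """Category probabilities under a logistic graded response model.

    theta: latent trait (here, anxiety)
    a: item discrimination
    thresholds: increasing boundaries b_1 < ... < b_K for K + 1 categories
    """
    # Cumulative probabilities P(X >= k), padded with 1 (k = 0) and 0 (k = K + 1).
    cumulative = [1.0]
    cumulative += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cumulative.append(0.0)
    # Each category probability is the difference of adjacent cumulative curves.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(cumulative) - 1)]


# Hypothetical item: discrimination 1.5, thresholds at -1.0, 0.0 and 1.5.
probs = grm_category_probs(0.3, 1.5, [-1.0, 0.0, 1.5])
```

A larger `a` makes the cumulative curves steeper, so the item separates neighbouring anxiety levels more sharply; this is the sense in which some items "discriminate better" than others.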
Shedden-Mora, Meike C.; Löwe, Bernd
2017-01-01
Objective: The Generalized Anxiety Disorder scale (GAD-7) is one of the most frequently used diagnostic self-report scales for screening, diagnosis and severity assessment of anxiety disorder. Its psychometric properties from the view of the Item Response Theory paradigm have rarely been investigated. We aimed to close this gap by analyzing the GAD-7 within a large sample of primary care patients with respect to its psychometric properties and its implications for scoring using Item Response Theory. Methods: Robust, nonparametric statistics were used to check unidimensionality of the GAD-7. A graded response model was fitted using a Bayesian approach. The model fit was evaluated using posterior predictive p-values; item information functions were derived, and optimal predictions of anxiety were calculated. Results: The sample included N = 3404 primary care patients (60% female; mean age, 52.2; standard deviation 19.2). The analysis indicated no deviations of the GAD-7 scale from unidimensionality and a decent fit of a graded response model. The commonly suggested ultra-brief measure consisting of the first two items, the GAD-2, was supported by item information analysis. The first four items discriminated better than the last three items with respect to latent anxiety. Conclusion: The information provided by the first four items should be weighted more heavily. Moreover, estimates corresponding to low to moderate levels of anxiety show greater variability. The psychometric validity of the GAD-2 was supported by our analysis. PMID:28771530
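As an illustration of the graded response model named in the abstract above, the following minimal Python sketch computes category probabilities for a single item under Samejima's logistic formulation. The function name, the discrimination value, and the thresholds are hypothetical and are not estimates from the GAD-7 study; four response categories are assumed because GAD-7 items are scored 0 to 3.

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Category probabilities for one item under a graded response model.

    theta      -- latent trait value (e.g., latent anxiety)
    a          -- item discrimination (hypothetical value in the example)
    thresholds -- increasing category thresholds b_1 < ... < b_{K-1}
    Returns a list of K probabilities, one per ordered response category.
    """
    # Cumulative probabilities P(X >= k) for k = 1..K-1 (logistic form).
    cumulative = [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    # Pad with P(X >= 0) = 1 and P(X >= K) = 0, then take adjacent differences.
    bounds = [1.0] + cumulative + [0.0]
    return [bounds[k] - bounds[k + 1] for k in range(len(bounds) - 1)]


# Illustrative values only: a 4-category item with symmetric thresholds.
probs = grm_category_probs(theta=0.0, a=1.5, thresholds=[-1.0, 0.0, 1.0])
print([round(p, 3) for p in probs])
```

A larger discrimination `a` makes the cumulative curves steeper, which is the sense in which some items "discriminate better" than others with respect to the latent trait.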
Do large-scale assessments measure students' ability to integrate scientific knowledge?
NASA Astrophysics Data System (ADS)
Lee, Hee-Sun
2010-03-01
Large-scale assessments are used as means to diagnose the current status of student achievement in science and compare students across schools, states, and countries. For efficiency, multiple-choice items and dichotomously-scored open-ended items are pervasively used in large-scale assessments such as the Trends in International Mathematics and Science Study (TIMSS). This study investigated how well these items measure secondary school students' ability to integrate scientific knowledge. This study collected responses of 8400 students to 116 multiple-choice and 84 open-ended items and applied an Item Response Theory analysis based on the Rasch Partial Credit Model. Results indicate that most multiple-choice items and dichotomously-scored open-ended items can be used to determine whether students have normative ideas about science topics, but cannot measure whether students integrate multiple pieces of relevant science ideas. Only when the scoring rubric is redesigned to capture subtle nuances of student open-ended responses do open-ended items become a valid and reliable tool to assess students' knowledge integration ability.
Accounting for Local Dependence with the Rasch Model: The Paradox of Information Increase.
Andrich, David
Test theories imply statistical, local independence. Where local independence is violated, models of modern test theory that account for it have been proposed. One violation of local independence occurs when the response to one item governs the response to a subsequent item. Expanding on a formulation of this kind of violation between two items in the dichotomous Rasch model, this paper derives three related implications. First, it formalises how the polytomous Rasch model for an item constituted by summing the scores of the dependent items absorbs the dependence in its threshold structure. Second, it shows that as a consequence the unit when the dependence is accounted for is not the same as if the items had no response dependence. Third, it explains the paradox, known, but not explained in the literature, that the greater the dependence of the constituent items the greater the apparent information in the constituted polytomous item when it should provide less information.
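For orientation, the dichotomous Rasch model that the abstract above builds on can be sketched in a few lines of Python. This is only the standard independent-items form, not the paper's dependence formulation; the function name and the parameter values are hypothetical.

```python
import math


def rasch_prob(theta, b):
    """Probability of a positive response under the dichotomous Rasch model.

    theta -- person location on the latent scale
    b     -- item difficulty on the same scale
    """
    return 1.0 / (1.0 + math.exp(-(theta - b)))


# Under local independence, the probability of a joint response pattern on two
# items is the product of the per-item probabilities (illustrative values).
theta = 0.5
p_item1 = rasch_prob(theta, b=0.0)
p_item2 = rasch_prob(theta, b=1.0)
p_both_positive = p_item1 * p_item2
print(round(p_item1, 3), round(p_item2, 3), round(p_both_positive, 3))
```

Response dependence of the kind the abstract describes would mean that the second factor changes with the response to the first item, so the simple product above would no longer hold.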
NASA Astrophysics Data System (ADS)
Reynolds, A. M.
2008-07-01
The results of numerical simulations indicate that deterministic walks with inverse-square power-law scaling are a robust emergent property of predators that use chemotaxis to locate randomly and sparsely distributed stationary prey items. It is suggested that chemotactic destructive foraging accounts for the apparent Lévy flight movement patterns of Oxyrrhis marina microzooplankton in still water containing prey items. This challenges the view that these organisms are executing an innate optimal Lévy flight searching strategy. Crucial for the emergence of inverse-square power-law scaling is the tendency of chemotaxis to occasionally cause predators to miss the nearest prey item, an occurrence which would not arise if prey were located through the employment of a reliable cognitive map or if prey location were visually cued and perfect.
Shen, Linjun; Li, Feiming; Wattleworth, Roberta; Filipetto, Frank
2010-10-01
The Comprehensive Osteopathic Medical Licensing Examination conducted a trial of multimedia items in the 2008-2009 Level 3 testing cycle to determine (1) if multimedia items were able to test additional elements of medical knowledge and skills and (2) how to develop effective multimedia items. Forty-four content-matched multimedia and text multiple-choice items were randomly delivered to Level 3 candidates. Logistic regression and paired-samples t tests were used for pairwise and group-level comparisons, respectively. Nine pairs showed significant differences in difficulty and/or discrimination. Content analysis found that, if text narrations were less direct, multimedia materials could make items easier. When textbook terminologies were replaced by multimedia presentations, multimedia items could become more difficult. Moreover, a multimedia item was found not to be uniformly difficult for candidates at different ability levels, possibly because multimedia and text items tested different elements of the same concept. Multimedia items may be capable of measuring some constructs different from what text items can measure. Effective multimedia items with reasonable psychometric properties can be intentionally developed.
Kent, Justine M; Daly, Ella; Kezic, Iva; Lane, Rosanne; Lim, Pilar; De Smedt, Heidi; De Boer, Peter; Van Nueten, Luc; Drevets, Wayne C; Ceusters, Marc
2016-06-03
This phase 2a, randomized, multicenter, double-blind, proof-of-concept study was designed to evaluate the efficacy, safety and tolerability of JNJ-40411813/ADX71149, a novel metabotropic glutamate 2 receptor positive allosteric modulator, as an adjunctive treatment for major depressive disorder (MDD) with significant anxiety symptoms. Eligible patients (18-64 years) had a DSM-IV diagnosis of MDD, Hamilton Depression Rating Scale-17 (HDRS17) score of ≥ 18, HDRS17 anxiety/somatization factor score of ≥ 7, and an insufficient response to current treatment with a selective serotonin reuptake inhibitor or serotonin-norepinephrine reuptake inhibitor. The doubly-randomized, 8-week double-blind treatment phase comprised two 4-week periods, from which a combined test statistic was generated, with pre-determined weights assigned to each of the 2 treatment periods. Period 1: patients (n=121) were randomly assigned (1:1) to JNJ-40411813 (n=62; 50 mg to 150 mg b.i.d., flexibly dosed) or placebo (n=59); Period 2: placebo-treated patients (n=22) who continued to meet entry severity criteria were re-randomized (1:1) to JNJ-40411813 or placebo, while other patients underwent sham re-randomization and continued on their same treatment. Of 121 randomized patients, 100 patients (82.6%) were completers. No efficacy signal was detected on the primary endpoint, the 6-item Hamilton Anxiety Subscale (HAM-A6, p=0.51). Efficacy signals (based on prespecified 1-sided p<0.20) were evident on several secondary outcome measures of both depression (HDRS17 total score, 6-item subscale of HDRS17 assessing core depressive symptoms [HAM-D6], and Inventory of Depressive Symptomatology [IDS-C30]) and anxiety (HDRS17 anxiety/somatization factor, IDS-C30 anxiety subscale). Although JNJ-40411813 was well tolerated, the results do not suggest efficacy as an adjunctive treatment for patients with MDD with significant anxious symptoms in the dose range studied. Copyright © 2016 Elsevier Inc. All rights reserved.
ERIC Educational Resources Information Center
Kleinke, David J.
Four forms of a 36-item adaptation of the Stanford Achievement Test were administered to 484 fourth graders. External factors potentially influencing test performance were examined, namely: (1) item order (easy-to-difficult vs. uniform); (2) response location (left column vs. right column); (3) handedness which may interact with response location;…
Person Response Functions and the Definition of Units in the Social Sciences
ERIC Educational Resources Information Center
Engelhard, George, Jr.; Perkins, Aminah F.
2011-01-01
Humphry (this issue) has written a thought-provoking piece on the interpretation of item discrimination parameters as scale units in item response theory. One of the key features of his work is the description of an item response theory (IRT) model that he calls the logistic measurement function that combines aspects of two traditions in IRT that…
ERIC Educational Resources Information Center
Raykov, Tenko; Marcoulides, George A.
2016-01-01
The frequently neglected and often misunderstood relationship between classical test theory and item response theory is discussed for the unidimensional case with binary measures and no guessing. It is pointed out that popular item response models can be directly obtained from classical test theory-based models by accounting for the discrete…
ERIC Educational Resources Information Center
Fu, Jianbin
2016-01-01
The multidimensional item response theory (MIRT) models with covariates proposed by Haberman and implemented in the "mirt" program provide a flexible way to analyze data based on item response theory. In this report, we discuss applications of the MIRT models with covariates to longitudinal test data to measure skill differences at the…
ERIC Educational Resources Information Center
Tsutakawa, Robert K.; Lin, Hsin Ying
Item response curves for a set of binary responses are studied from a Bayesian viewpoint of estimating the item parameters. For the two-parameter logistic model with normally distributed ability, restricted bivariate beta priors are used to illustrate the computation of the posterior mode via the EM algorithm. The procedure is illustrated by data…
Modeling Answer Change Behavior: An Application of a Generalized Item Response Tree Model
ERIC Educational Resources Information Center
Jeon, Minjeong; De Boeck, Paul; van der Linden, Wim
2017-01-01
We present a novel application of a generalized item response tree model to investigate test takers' answer change behavior. The model allows us to simultaneously model the observed patterns of the initial and final responses after an answer change as a function of a set of latent traits and item parameters. The proposed application is illustrated…
Mallinckrodt, Brent; Tekie, Yacob T
2016-11-01
The Working Alliance Inventory (WAI) has made great contributions to psychotherapy research. However, studies suggest the 7-point response format and 3-factor structure of the client version may have psychometric problems. This study used Rasch item response theory (IRT) to (a) improve WAI response format, (b) compare two brief 12-item versions (WAI-sr; WAI-s), and (c) develop a new 16-item Brief Alliance Inventory (BAI). Archival data from 1786 counseling center and community clients were analyzed. IRT findings suggested problems with crossed category thresholds. A rescoring scheme that combines neighboring responses to create 5- and 4-point scales sharply reduced these problems. Although subscale variance was reduced by 11-26%, rescoring yielded improved reliability and generally higher correlations with therapy process (session depth and smoothness) and outcome measures (residual gain symptom improvement). The 16-item BAI was designed to maximize "bandwidth" of item difficulty and preserve a broader range of WAI sensitivity than WAI-s or WAI-sr. Comparisons suggest the BAI performed better in several respects than the WAI-s or WAI-sr and equivalent to the full WAI on several performance indicators.
2016-01-01
We aimed to validate the Inventory of Complicated Grief (ICG)-Korean version among 1,138 Korean adolescents, representing a response rate of 57% of 1,997 students. Participants completed a set of questionnaires including demographic variables (age, sex, years of education, experience of grief), the ICG, the Children's Depression Inventory (CDI) and the Lifetime Incidence of Traumatic Events-Child (LITE-C). Exploratory factor analysis was performed to determine whether the ICG items indicated complicated grief in Korean adolescents. The internal consistency of the ICG-Korean version was Cronbach's α=0.87. The test-retest reliability for a randomly selected sample of 314 participants in 2 weeks was r=0.75 (P<0.001). Concurrent validity was assessed using a correlation between the ICG total scores and the CDI total scores (r=0.75, P<0.001). The criterion-related validity based on the comparison of ICG total scores between adolescents without complicated grief (1.2±3.7) and adolescents with complicated grief (3.2±6.6) was relatively high (t=5.71, P<0.001). The data acquired from the 1,138 students were acceptable for a factor analysis (Kaiser-Meyer-Olkin Measure of Sampling Adequacy=0.911; Bartlett's Test of Sphericity, χ²=13,144.7, P<0.001). After omission of 3 items, the value of Cronbach's α increased from 0.87 for the 19-item ICG-Korean version to 0.93 for the 16-item ICG-Korean version. These results suggest that the ICG is a useful tool in assessing for complicated grief in Korean adolescents. However, the 16-item version of the ICG appeared to be more valid compared to the 19-item version of the ICG. We suggest that the 16-item version of the ICG be used to screen for complicated grief in Korean adolescents. PMID:26770046
Han, Doug Hyun; Lee, Jung Jae; Moon, Duk-Soo; Cha, Myoung-Jin; Kim, Min A; Min, Seonyeong; Yang, Ji Hoon; Lee, Eun Jeong; Yoo, Seo Koo; Chung, Un-Sun
2016-01-01
We aimed to validate the Inventory of Complicated Grief (ICG)-Korean version among 1,138 Korean adolescents, representing a response rate of 57% of 1,997 students. Participants completed a set of questionnaires including demographic variables (age, sex, years of education, experience of grief), the ICG, the Children's Depression Inventory (CDI) and the Lifetime Incidence of Traumatic Events-Child (LITE-C). Exploratory factor analysis was performed to determine whether the ICG items indicated complicated grief in Korean adolescents. The internal consistency of the ICG-Korean version was Cronbach's α=0.87. The test-retest reliability for a randomly selected sample of 314 participants in 2 weeks was r=0.75 (P<0.001). Concurrent validity was assessed using a correlation between the ICG total scores and the CDI total scores (r=0.75, P<0.001). The criterion-related validity based on the comparison of ICG total scores between adolescents without complicated grief (1.2 ± 3.7) and adolescents with complicated grief (3.2 ± 6.6) was relatively high (t=5.71, P<0.001). The data acquired from the 1,138 students were acceptable for a factor analysis (Kaiser-Meyer-Olkin Measure of Sampling Adequacy=0.911; Bartlett's Test of Sphericity, χ²=13,144.7, P<0.001). After omission of 3 items, the value of Cronbach's α increased from 0.87 for the 19-item ICG-Korean version to 0.93 for the 16-item ICG-Korean version. These results suggest that the ICG is a useful tool in assessing for complicated grief in Korean adolescents. However, the 16-item version of the ICG appeared to be more valid compared to the 19-item version of the ICG. We suggest that the 16-item version of the ICG be used to screen for complicated grief in Korean adolescents.
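The internal-consistency statistic reported in the abstract above, Cronbach's α, can be computed from a respondents-by-items score matrix. The sketch below is a minimal pure-Python version under the usual definition (sample variances with n − 1 in the denominator); the function name and the toy scores are hypothetical and are not ICG data.

```python
def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(scores[0])

    def sample_variance(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

    # Variance of each item across respondents, and of the summed total score.
    item_variances = [sample_variance([row[j] for row in scores]) for j in range(k)]
    total_variance = sample_variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1.0 - sum(item_variances) / total_variance)


# Toy data (hypothetical): 5 respondents answering 4 items.
toy_scores = [
    [0, 1, 0, 1],
    [1, 1, 2, 1],
    [2, 2, 2, 3],
    [3, 2, 3, 3],
    [4, 4, 3, 4],
]
print(round(cronbach_alpha(toy_scores), 3))
```

Recomputing α after dropping items, as the abstract describes for the 19-item versus 16-item versions, amounts to calling the same function on the matrix with those columns removed.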
Schwingshackl, Lukas; Knüppel, Sven; Schwedhelm, Carolina; Hoffmann, Georg; Missbach, Benjamin; Stelmach-Mardas, Marta; Dietrich, Stefan; Eichelmann, Fabian; Kontopantelis, Evangelos; Iqbal, Khalid; Aleksandrova, Krasimira; Lorkowski, Stefan; Leitzmann, Michael F; Kroke, Anja; Boeing, Heiner
2016-11-01
The objective of this study was to develop a scoring system (NutriGrade) to evaluate the quality of evidence of randomized controlled trial (RCT) and cohort study meta-analyses in nutrition research, building upon previous tools and expert recommendations. NutriGrade aims to assess the meta-evidence of an association or effect between different nutrition factors and outcomes, taking into account nutrition research-specific requirements not considered by other tools. In a pretest study, 6 randomly selected meta-analyses investigating diet-disease relations were evaluated with NutriGrade by 5 independent raters. After revision, NutriGrade was applied by the same raters to 30 randomly selected meta-analyses in the same thematic area. The reliability of ratings of NutriGrade items was calculated with the use of a multirater κ, and reliability of the total (summed scores) was calculated with the use of intraclass correlation coefficients (ICCs). The following categories for meta-evidence evaluation were established: high (8-10), moderate (6-7.99), low (4-5.99), and very low (0-3.99). The NutriGrade scoring system (maximum of 10 points) comprises the following items: 1) risk of bias, study quality, and study limitations, 2) precision, 3) heterogeneity, 4) directness, 5) publication bias, 6) funding bias, 7) study design, 8) effect size, and 9) dose-response. The NutriGrade score varied between 2.9 (very low meta-evidence) and 8.8 (high meta-evidence) for meta-analyses of RCTs, and it ranged between 3.1 and 8.8 for meta-analyses of cohort studies. The κ value of the ratings for each scoring item varied from 0.32 (95% CI: 0.22, 0.42) for risk of bias for cohort studies to 0.95 (95% CI: 0.91, 0.99) for study design, with a mean κ of 0.66 (95% CI: 0.53, 0.79). The ICC of the total score was 0.81 (95% CI: 0.69, 0.90). The NutriGrade scoring system showed good agreement and reliability. The initial findings regarding the performance of this newly established scoring system need further evaluation in independent analyses. © 2016 American Society for Nutrition.
Knüppel, Sven; Schwedhelm, Carolina; Hoffmann, Georg; Missbach, Benjamin; Stelmach-Mardas, Marta; Dietrich, Stefan; Eichelmann, Fabian; Kontopantelis, Evangelos; Iqbal, Khalid; Aleksandrova, Krasimira; Lorkowski, Stefan; Leitzmann, Michael F; Kroke, Anja; Boeing, Heiner
2016-01-01
The objective of this study was to develop a scoring system (NutriGrade) to evaluate the quality of evidence of randomized controlled trial (RCT) and cohort study meta-analyses in nutrition research, building upon previous tools and expert recommendations. NutriGrade aims to assess the meta-evidence of an association or effect between different nutrition factors and outcomes, taking into account nutrition research–specific requirements not considered by other tools. In a pretest study, 6 randomly selected meta-analyses investigating diet–disease relations were evaluated with NutriGrade by 5 independent raters. After revision, NutriGrade was applied by the same raters to 30 randomly selected meta-analyses in the same thematic area. The reliability of ratings of NutriGrade items was calculated with the use of a multirater κ, and reliability of the total (summed scores) was calculated with the use of intraclass correlation coefficients (ICCs). The following categories for meta-evidence evaluation were established: high (8–10), moderate (6–7.99), low (4–5.99), and very low (0–3.99). The NutriGrade scoring system (maximum of 10 points) comprises the following items: 1) risk of bias, study quality, and study limitations, 2) precision, 3) heterogeneity, 4) directness, 5) publication bias, 6) funding bias, 7) study design, 8) effect size, and 9) dose-response. The NutriGrade score varied between 2.9 (very low meta-evidence) and 8.8 (high meta-evidence) for meta-analyses of RCTs, and it ranged between 3.1 and 8.8 for meta-analyses of cohort studies. The κ value of the ratings for each scoring item varied from 0.32 (95% CI: 0.22, 0.42) for risk of bias for cohort studies to 0.95 (95% CI: 0.91, 0.99) for study design, with a mean κ of 0.66 (95% CI: 0.53, 0.79). The ICC of the total score was 0.81 (95% CI: 0.69, 0.90). The NutriGrade scoring system showed good agreement and reliability. The initial findings regarding the performance of this newly established scoring system need further evaluation in independent analyses. PMID:28140319
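The category cut-offs stated in the abstract above (high 8–10, moderate 6–7.99, low 4–5.99, very low 0–3.99) can be expressed as a small lookup. This is a minimal sketch; the function name is hypothetical, and treating the bands as contiguous (for example, 7.995 counted as moderate) is an assumption, since the abstract lists them to two decimals only.

```python
def nutrigrade_category(score):
    """Map a NutriGrade total score (0 to 10) to its meta-evidence category."""
    if not 0 <= score <= 10:
        raise ValueError("NutriGrade scores range from 0 to 10")
    if score >= 8:
        return "high"
    if score >= 6:
        return "moderate"
    if score >= 4:
        return "low"
    return "very low"


# The extreme scores reported for meta-analyses of RCTs in the abstract.
for value in (2.9, 8.8):
    print(value, nutrigrade_category(value))
```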
Khorramdel, Lale; von Davier, Matthias
2014-01-01
This study shows how to address the problem of trait-unrelated response styles (RS) in rating scales using multidimensional item response theory. The aim is to test and correct data for RS in order to provide fair assessments of personality. Expanding on an approach presented by Böckenholt (2012), observed rating data are decomposed into multiple response processes based on a multinomial processing tree. The data come from a questionnaire consisting of 50 items of the International Personality Item Pool measuring the Big Five dimensions administered to 2,026 U.S. students with a 5-point rating scale. It is shown that this approach can be used to test if RS exist in the data and that RS can be differentiated from trait-related responses. Although the extreme RS appear to be unidimensional after exclusion of only 1 item, a unidimensional measure for the midpoint RS is obtained only after exclusion of 10 items. Both RS measurements show high cross-scale correlations and item response theory-based (marginal) reliabilities. Cultural differences could be found in giving extreme responses. Moreover, it is shown how to score rating data to correct for RS after being proved to exist in the data.
Ozaki, Norio; Otsubo, Tempei; Kato, Masaki; Higuchi, Teruhiko; Ono, Hiroaki; Kamijima, Kunitoshi
2015-01-01
Results from this randomized, placebo-controlled study of aripiprazole augmentation to antidepressant therapy (ADT) in Japanese patients with major depressive disorder (MDD) (the Aripiprazole Depression Multicenter Efficacy [ADMIRE] study) revealed that aripiprazole augmentation was superior to ADT alone and was well tolerated. In subgroup analyses, we investigated the influence of demographic- and disease-related factors on the observed responses. We also examined how individual symptom improvement was related to overall improvement in MDD. Data from the ADMIRE study were analyzed. Subgroup analyses were performed on the primary outcome measures: the mean change in the Montgomery-Åsberg Depression Rating Scale (MADRS) total score from the end of selective serotonin reuptake inhibitor (SSRI)/serotonin norepinephrine reuptake inhibitor (SNRI) treatment to the end of the randomized treatment. Changes in the MADRS total scores were consistently greater with aripiprazole than placebo in each of the subgroups. Efficacy was not related to sex, age, number of adequate ADT trials in the current episode, MDD diagnosis, number of depressive episodes, duration of the current episode, age at first depressive episode, time since the first depressive episode, type of SSRI/SNRI, or severity at the end of SSRI/SNRI treatment phase. Compared to placebo, aripiprazole resulted in significant and rapid improvement on seven of the 10 MADRS items, including sadness. These post-hoc analyses indicated that aripiprazole was effective for a variety of Japanese patients with MDD who had exhibited inadequate responses to ADT. Additionally, we suggest that aripiprazole significantly and rapidly improved the core depressive symptoms. © 2014 The Authors. Psychiatry and Clinical Neurosciences © 2014 Japanese Society of Psychiatry and Neurology.
Prisciandaro, James J; Tolliver, Bryan K
2016-11-15
The Young Mania Rating Scale (YMRS) and Montgomery-Asberg Depression Rating Scale (MADRS) are among the most widely used outcome measures for clinical trials of medications for Bipolar Disorder (BD). Nonetheless, very few studies have examined the measurement characteristics of the YMRS and MADRS in individuals with BD using modern psychometric methods. The present study evaluated the YMRS and MADRS in the Systematic Treatment Enhancement Program for BD (STEP-BD) study using Item Response Theory (IRT). Baseline data from 3716 STEP-BD participants were available for the present analysis. The Graded Response Model (GRM) was fit separately to YMRS and MADRS item responses. Differential item functioning (DIF) was examined by regressing a variety of clinically relevant covariates (e.g., sex, substance dependence) on all test items and on the latent symptom severity dimension, within each scale. Both scales: 1) contained several items that provided little or no psychometric information, 2) were inefficient, in that the majority of item response categories did not provide incremental psychometric information, 3) poorly measured participants outside of a narrow band of severity, and 4) evidenced DIF for nearly all items, suggesting that item responses were, in part, determined by factors other than symptom severity. Limitations: the sample was limited to outpatients, and the DIF analysis was sensitive only to certain forms of DIF. The present study provides evidence for significant measurement problems involving the YMRS and MADRS. More work is needed to refine these measures and/or develop suitable alternative measures of BD symptomatology for clinical trials research. Copyright © 2016 Elsevier B.V. All rights reserved.
1984-02-01
measurable impact if changed. The following items were included in the sample: * Mark Zero Items - Low demand insurance items which represent about three...R&D efforts reviewed. The resulting assessment highlighted the generic enabling technologies and cross-cutting R&D projects required to focus current...supplied by spot buys, and which may generate Navy Inventory Control Numbers (NICN). Random samples of data were extracted from the Master Data File (MDF
Better assessment of physical function: item improvement is neglected but essential
2009-01-01
Introduction: Physical function is a key component of patient-reported outcome (PRO) assessment in rheumatology. Modern psychometric methods, such as Item Response Theory (IRT) and Computerized Adaptive Testing, can materially improve measurement precision at the item level. We present the qualitative and quantitative item-evaluation process for developing the Patient Reported Outcomes Measurement Information System (PROMIS) Physical Function item bank. Methods: The process was stepwise: we searched extensively to identify extant Physical Function items and then classified and selectively reduced the item pool. We evaluated retained items for content, clarity, relevance and comprehension, reading level, and translation ease by experts and patient surveys, focus groups, and cognitive interviews. We then assessed items by using classic test theory and IRT, used confirmatory factor analyses to estimate item parameters, and graded response modeling for parameter estimation. We retained the 20 Legacy (original) Health Assessment Questionnaire Disability Index (HAQ-DI) and the 10 SF-36's PF-10 items for comparison. Subjects were from rheumatoid arthritis, osteoarthritis, and healthy aging cohorts (n = 1,100) and a national Internet sample of 21,133 subjects. Results: We identified 1,860 items. After qualitative and quantitative evaluation, 124 newly developed PROMIS items composed the PROMIS item bank, which included revised Legacy items with good fit that met IRT model assumptions. Results showed that the clearest and best-understood items were simple, in the present tense, and straightforward. Basic tasks (like dressing) were more relevant and important versus complex ones (like dancing). Revised HAQ-DI and PF-10 items with five response options had higher item-information content than did comparable original Legacy items with fewer response options. IRT analyses showed that the Physical Function domain satisfied general criteria for unidimensionality with one-, two-, three-, and four-factor models having comparable model fits. Correlations between factors in the test data sets were > 0.90. Conclusions: Item improvement must underlie attempts to improve outcome assessment. The clear, personally important and relevant, ability-framed items in the PROMIS Physical Function item bank perform well in PRO assessment. They will benefit from further study and application in a wider variety of rheumatic diseases in diverse clinical groups, including those at the extremes of physical functioning, and in different administration modes. PMID:20015354
Better assessment of physical function: item improvement is neglected but essential.
Bruce, Bonnie; Fries, James F; Ambrosini, Debbie; Lingala, Bharathi; Gandek, Barbara; Rose, Matthias; Ware, John E
2009-01-01
Physical function is a key component of patient-reported outcome (PRO) assessment in rheumatology. Modern psychometric methods, such as Item Response Theory (IRT) and Computerized Adaptive Testing, can materially improve measurement precision at the item level. We present the qualitative and quantitative item-evaluation process for developing the Patient Reported Outcomes Measurement Information System (PROMIS) Physical Function item bank. The process was stepwise: we searched extensively to identify extant Physical Function items and then classified and selectively reduced the item pool. We evaluated retained items for content, clarity, relevance and comprehension, reading level, and translation ease by experts and patient surveys, focus groups, and cognitive interviews. We then assessed items by using classic test theory and IRT, used confirmatory factor analyses to estimate item parameters, and graded response modeling for parameter estimation. We retained the 20 Legacy (original) Health Assessment Questionnaire Disability Index (HAQ-DI) and the 10 SF-36's PF-10 items for comparison. Subjects were from rheumatoid arthritis, osteoarthritis, and healthy aging cohorts (n = 1,100) and a national Internet sample of 21,133 subjects. We identified 1,860 items. After qualitative and quantitative evaluation, 124 newly developed PROMIS items composed the PROMIS item bank, which included revised Legacy items with good fit that met IRT model assumptions. Results showed that the clearest and best-understood items were simple, in the present tense, and straightforward. Basic tasks (like dressing) were more relevant and important versus complex ones (like dancing). Revised HAQ-DI and PF-10 items with five response options had higher item-information content than did comparable original Legacy items with fewer response options. 
IRT analyses showed that the Physical Function domain satisfied general criteria for unidimensionality with one-, two-, three-, and four-factor models having comparable model fits. Correlations between factors in the test data sets were > 0.90. Item improvement must underlie attempts to improve outcome assessment. The clear, personally important and relevant, ability-framed items in the PROMIS Physical Function item bank perform well in PRO assessment. They will benefit from further study and application in a wider variety of rheumatic diseases in diverse clinical groups, including those at the extremes of physical functioning, and in different administration modes.
Acoustic and Vibration Environment for Crew Launch Vehicle Mobile Launcher
NASA Technical Reports Server (NTRS)
Vu, Bruce T.
2007-01-01
A launch-induced acoustic environment represents a dynamic load on the exposed facilities and ground support equipment (GSE) in the form of random pressures fluctuating around the ambient atmospheric pressure. In response to these fluctuating pressures, structural vibrations are generated and transmitted throughout the structure and to the equipment items supported by the structure. Certain equipment items are also excited by the direct acoustic input as well as by the vibration transmitted through the supporting structure. This paper presents the predicted acoustic and vibration environments induced by the launch of the Crew Launch Vehicle (CLV) from Launch Complex (LC) 39. The predicted acoustic environment depicted in this paper was calculated by scaling the statistically processed measured data available from Saturn V launches to the anticipated environment of the CLV launch. The scaling was accomplished by using the 5-segment Solid Rocket Booster (SRB) engine parameters. Derivation of vibration environment for various Mobile Launcher (ML) structures throughout the base and tower was accomplished by scaling the Saturn V vibration environment.
Bakken, Suzanne; Cimino, James J.; Haskell, Robert; Kukafka, Rita; Matsumoto, Cindi; Chan, Garrett K.; Huff, Stanley M.
2000-01-01
Objective: The purpose of this study was to test the adequacy of the Clinical LOINC (Logical Observation Identifiers, Names, and Codes) semantic structure as a terminology model for standardized assessment measures. Methods: After extension of the definitions, 1,096 items from 35 standardized assessment instruments were dissected into the elements of the Clinical LOINC semantic structure. An additional coder dissected at least one randomly selected item from each instrument. When multiple scale types occurred in a single instrument, a second coder dissected one randomly selected item representative of each scale type. Results: The results support the adequacy of the Clinical LOINC semantic structure as a terminology model for standardized assessments. Using the revised definitions, the coders were able to dissect into the elements of Clinical LOINC all the standardized assessment items in the sample instruments. Percentage agreement for each element was as follows: component, 100 percent; property, 87.8 percent; timing, 82.9 percent; system/sample, 100 percent; scale, 92.6 percent; and method, 97.6 percent. Discussion: This evaluation was an initial step toward the representation of standardized assessment items in a manner that facilitates data sharing and re-use. Further clarification of the definitions, especially those related to time and property, is required to improve inter-rater reliability and to harmonize the representations with similar items already in LOINC. PMID:11062226
A Graphical Approach to Item Analysis. Research Report. ETS RR-04-10
ERIC Educational Resources Information Center
Livingston, Samuel A.; Dorans, Neil J.
2004-01-01
This paper describes an approach to item analysis that is based on the estimation of a set of response curves for each item. The response curves show, at a glance, the difficulty and the discriminating power of the item and the popularity of each distractor, at any level of the criterion variable (e.g., total score). The curves are estimated by…
ERIC Educational Resources Information Center
Tassé, Marc J.; Schalock, Robert L.; Thissen, David; Balboni, Giulia; Bersani, Henry, Jr.; Borthwick-Duffy, Sharon A.; Spreat, Scott; Widaman, Keith F.; Zhang, Dalun; Navas, Patricia
2016-01-01
The Diagnostic Adaptive Behavior Scale (DABS) was developed using item response theory (IRT) methods and was constructed to provide the most precise and valid adaptive behavior information at or near the cutoff point of making a decision regarding a diagnosis of intellectual disability. The DABS initial item pool consisted of 260 items. Using IRT…
ERIC Educational Resources Information Center
Stevenson, Claire E.; Heiser, Willem J.; Resing, Wilma C. M.
2016-01-01
Multiple-choice (MC) analogy items are often used in cognitive assessment. However, in dynamic testing, where the aim is to provide insight into potential for learning and the learning process, constructed-response (CR) items may be of benefit. This study investigated whether training with CR or MC items leads to differences in the strategy…
ERIC Educational Resources Information Center
Swygert, Kimberly A.
In this study, data from an operational computerized adaptive test (CAT) were examined in order to gather information concerning item response times in a CAT environment. The CAT under study included multiple-choice items measuring verbal, quantitative, and analytical reasoning. The analyses included the fitting of regression models describing the…
Item response theory in personality assessment: a demonstration using the MMPI-2 depression scale.
Childs, R A; Dahlstrom, W G; Kemp, S M; Panter, A T
2000-03-01
Item response theory (IRT) analyses have, over the past 3 decades, added much to our understanding of the relationships among and characteristics of test items, as revealed in examinees' response patterns. Assessment instruments used outside the educational context have only infrequently been analyzed using IRT, however. This study demonstrates the relevance of IRT to personality data through analyses of Scale 2 (the Depression Scale) on the revised Minnesota Multiphasic Personality Inventory (MMPI-2). A rich set of hypotheses regarding the items on this scale, including contrasts among the Harris-Lingoes and Wiener-Harmon subscales and differences in the items' measurement characteristics for men and women, are investigated through the IRT analyses.
Cohen, Matthew L; Kisala, Pamela A; Dyson-Hudson, Trevor A; Tulsky, David S
2018-05-01
To develop modern patient-reported outcome measures that assess pain interference and pain behavior after spinal cord injury (SCI). Grounded-theory based qualitative item development; large-scale item calibration field-testing; confirmatory factor analyses; graded response model item response theory analyses; statistical linking techniques to transform scores to the Patient Reported Outcome Measurement Information System (PROMIS) metric. Five SCI Model Systems centers and one Department of Veterans Affairs medical center in the United States. Adults with traumatic SCI. N/A. Spinal Cord Injury - Quality of Life (SCI-QOL) Pain Interference item bank, SCI-QOL Pain Interference short form, and SCI-QOL Pain Behavior scale. Seven hundred fifty-seven individuals with traumatic SCI completed 58 items addressing various aspects of pain. Items were then separated by whether they assessed pain interference or pain behavior, and poorly functioning items were removed. Confirmatory factor analyses confirmed that each set of items was unidimensional, and item response theory analyses were used to estimate slopes and thresholds for the items. Ultimately, 7 items (4 from PROMIS) comprised the Pain Behavior scale and 25 items (18 from PROMIS) comprised the Pain Interference item bank. Ten of these 25 items were selected to form the Pain Interference short form. The SCI-QOL Pain Interference item bank and the SCI-QOL Pain Behavior scale demonstrated robust psychometric properties. The Pain Interference item bank is available as a computer adaptive test or short form for research and clinical applications, and scores are transformed to the PROMIS metric.
Reliability and validity of a short form household food security scale in a Caribbean community.
Gulliford, Martin C; Mahabir, Deepak; Rocke, Brian
2004-06-16
We evaluated the reliability and validity of the short form household food security scale in a different setting from the one in which it was developed. The scale was interview administered to 531 subjects from 286 households in north central Trinidad in Trinidad and Tobago, West Indies. We evaluated the six items by fitting item response theory models to estimate item thresholds, estimating agreement among respondents in the same households and estimating the slope index of income-related inequality (SII) after adjusting for age, sex and ethnicity. Item-score correlations ranged from 0.52 to 0.79 and Cronbach's alpha was 0.87. Item responses gave within-household correlation coefficients ranging from 0.70 to 0.78. Estimated item thresholds (standard errors) from the Rasch model ranged from -2.027 (0.063) for the 'balanced meal' item to 2.251 (0.116) for the 'hungry' item. The 'balanced meal' item had the lowest threshold in each ethnic group even though there was evidence of differential functioning for this item by ethnicity. Relative thresholds of other items were generally consistent with US data. Estimation of the SII, comparing those at the bottom with those at the top of the income scale, gave relative odds for an affirmative response of 3.77 (95% confidence interval 1.40 to 10.2) for the lowest severity item, and 20.8 (2.67 to 162.5) for highest severity item. Food insecurity was associated with reduced consumption of green vegetables after additionally adjusting for income and education (0.52, 0.28 to 0.96). The household food security scale gives reliable and valid responses in this setting. Differing relative item thresholds compared with US data do not require alteration to the cut-points for classification of 'food insecurity without hunger' or 'food insecurity with hunger'. The data provide further evidence that re-evaluation of the 'balanced meal' item is required.
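As a rough illustration of what the reported Rasch thresholds imply, the sketch below evaluates the standard Rasch response function at the two extreme thresholds quoted in the abstract. It assumes the thresholds are on the usual logit scale with the latent severity centred at zero; the abstract does not state the exact parameterization, and the function and variable names are hypothetical.

```python
import math


def rasch_probability(theta: float, threshold: float) -> float:
    """Probability of an affirmative response under the Rasch model.

    theta is the household's latent food-insecurity level and threshold
    is the item severity, both assumed to be on the same logit scale.
    """
    return 1.0 / (1.0 + math.exp(-(theta - threshold)))


# Item thresholds quoted in the abstract (lowest and highest severity).
THRESHOLDS = {"balanced meal": -2.027, "hungry": 2.251}


def affirm_probabilities(theta: float) -> dict:
    """Affirmative-response probability for each quoted item at theta."""
    return {item: rasch_probability(theta, b) for item, b in THRESHOLDS.items()}


if __name__ == "__main__":
    for theta in (-2.0, 0.0, 2.0):
        probs = affirm_probabilities(theta)
        print(theta, {item: round(p, 3) for item, p in probs.items()})
```

Under these assumptions a respondent at the centre of the scale would very likely affirm the 'balanced meal' item and only rarely affirm the 'hungry' item, which is the ordering the thresholds are meant to express.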
Computerized Adaptive Testing with Item Clones. Research Report.
ERIC Educational Resources Information Center
Glas, Cees A. W.; van der Linden, Wim J.
To reduce the cost of item writing and to enhance the flexibility of item presentation, items can be generated by item-cloning techniques. An important consequence of cloning is that it may cause variability on the item parameters. Therefore, a multilevel item response model is presented in which it is assumed that the item parameters of a…
Pitchford, Melanie; Ball, Linden J.; Hunt, Thomas E.; Steel, Richard
2017-01-01
We report a study examining the role of ‘cognitive miserliness’ as a determinant of poor performance on the standard three-item Cognitive Reflection Test (CRT). The cognitive miserliness hypothesis proposes that people often respond incorrectly on CRT items because of an unwillingness to go beyond default, heuristic processing and invest time and effort in analytic, reflective processing. Our analysis (N = 391) focused on people’s response times to CRT items to determine whether predicted associations are evident between miserly thinking and the generation of incorrect, intuitive answers. Evidence indicated only a weak correlation between CRT response times and accuracy. Item-level analyses also failed to demonstrate predicted response-time differences between correct analytic and incorrect intuitive answers for two of the three CRT items. We question whether participants who give incorrect intuitive answers on the CRT can legitimately be termed cognitive misers and whether the three CRT items measure the same general construct. PMID:29099840
Development of the Contact Lens User Experience: CLUE Scales
Wirth, R. J.; Edwards, Michael C.; Henderson, Michael; Henderson, Terri; Olivares, Giovanna; Houts, Carrie R.
2016-01-01
ABSTRACT Purpose The field of optometry has become increasingly interested in patient-reported outcomes, reflecting a common trend occurring across the spectrum of healthcare. This article reviews the development of the Contact Lens User Experience: CLUE system designed to assess patient evaluations of contact lenses. CLUE was built using modern psychometric methods such as factor analysis and item response theory. Methods The qualitative process through which relevant domains were identified is outlined as well as the process of creating initial item banks. Psychometric analyses were conducted on the initial item banks and refinements were made to the domains and items. Following this data-driven refinement phase, a second round of data was collected to further refine the items and obtain final item response theory item parameter estimates. Results Extensive qualitative work identified three key areas patients consider important when describing their experience with contact lenses. Based on item content and psychometric dimensionality assessments, the developing CLUE instruments were ultimately focused around four domains: comfort, vision, handling, and packaging. Item response theory parameters were estimated for the CLUE item banks (377 items), and the resulting scales were found to provide precise and reliable assignment of scores detailing users’ subjective experiences with contact lenses. Conclusions The CLUE family of instruments, as it currently exists, exhibits excellent psychometric properties. PMID:27383257
Khorramdel, Lale; Kubinger, Klaus D; Uitz, Alexander
2014-04-01
An experiment was conducted to investigate the effects of item order and questionnaire content on faking good or intentional response distortion. It was hypothesized that intentional response distortion would either increase towards the end of a long questionnaire, as learning effects might make it easier to adjust responses to a faking good schema, or decrease because applicants' will to distort responses is reduced if the questionnaire lasts long enough. Furthermore, it was hypothesized that certain types of questionnaire content are especially vulnerable to response distortion. Eighty-four pre-selected pilot applicants filled out a questionnaire consisting of 516 items including items from the NEO five factor inventory (NEO FFI), NEO personality inventory revised (NEO PI-R) and business-focused inventory of personality (BIP). The positions of the items were varied within the applicant sample to test if responses are affected by item order, and applicants' response behaviour was additionally compared to that of volunteers. Applicants reported significantly higher mean scores than volunteers, and results provide some evidence of decreased faking tendencies towards the end of the questionnaire. Furthermore, it could be demonstrated that lower variances or standard deviations in combination with appropriate (often higher) mean scores can serve as an indicator for faking tendencies in group comparisons, even if effects are not significant. © 2013 International Union of Psychological Science.
The e-MSWS-12: improving the multiple sclerosis walking scale using item response theory.
Engelhard, Matthew M; Schmidt, Karen M; Engel, Casey E; Brenton, J Nicholas; Patek, Stephen D; Goldman, Myla D
2016-12-01
The Multiple Sclerosis Walking Scale (MSWS-12) is the predominant patient-reported measure of multiple sclerosis (MS)-related walking ability, yet it had not been analyzed using item response theory (IRT), the emerging standard for patient-reported outcome (PRO) validation. This study aims to reduce MSWS-12 measurement error and facilitate computerized adaptive testing by creating an IRT model of the MSWS-12 and distributing it online. MSWS-12 responses from 284 subjects with MS were collected by mail and used to fit and compare several IRT models. Following model selection and assessment, subpopulations based on age and sex were tested for differential item functioning (DIF). Model comparison favored a one-dimensional graded response model (GRM). This model met fit criteria and explained 87% of response variance. The performance of each MSWS-12 item was characterized using category response curves (CRCs) and item information. IRT-based MSWS-12 scores correlated with traditional MSWS-12 scores (r = 0.99) and timed 25-foot walk (T25FW) speed (r = -0.70). Item 2 showed DIF based on age (χ² = 19.02, df = 5, p < 0.01), and Item 11 showed DIF based on sex (χ² = 13.76, df = 5, p = 0.02). MSWS-12 measurement error depends on walking ability, but could be lowered by improving or replacing items with low information or DIF. The e-MSWS-12 includes IRT-based scoring, error checking, and an estimated T25FW derived from MSWS-12 responses. It is available at https://ms-irt.shinyapps.io/e-MSWS-12.
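The category response curves mentioned above can be sketched for a single item under Samejima's graded response model. This is a minimal illustration only: the discrimination and threshold values are hypothetical, not the fitted MSWS-12 parameters, and the function names are made up for the example.

```python
import math


def logistic(x: float) -> float:
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def grm_category_probabilities(theta, discrimination, thresholds):
    """Category probabilities for one item under the graded response model.

    thresholds must be strictly increasing; K thresholds yield K + 1
    ordered response categories. The cumulative probability of responding
    in category k or higher is logistic(a * (theta - b_k)), and each
    category probability is the difference of adjacent cumulatives.
    """
    if any(lo >= hi for lo, hi in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    cumulative = (
        [1.0]
        + [logistic(discrimination * (theta - b)) for b in thresholds]
        + [0.0]
    )
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]


if __name__ == "__main__":
    # Hypothetical five-category item (assumed values, for illustration only).
    a, b = 2.0, [-1.5, -0.5, 0.5, 1.5]
    for theta in (-3.0, 0.0, 3.0):
        probs = grm_category_probabilities(theta, a, b)
        print(theta, [round(p, 3) for p in probs])
```

Evaluating the function over a grid of theta values and plotting each category's probability gives one curve per response category, which is what a category response curve plot displays.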
2013-01-01
Background Despite the widespread use of multiple-choice assessments in medical education assessment, current practice and published advice concerning the number of response options remains equivocal. This article describes an empirical study contrasting the quality of three 60 item multiple-choice test forms within the Royal Australian and New Zealand College of Obstetricians and Gynaecologists (RANZCOG) Fetal Surveillance Education Program (FSEP). The three forms are described below. Methods The first form featured four response options per item. The second form featured three response options, having removed the least functioning option from each item in the four-option counterpart. The third test form was constructed by retaining the best performing version of each item from the first two test forms. It contained both three and four option items. Results Psychometric and educational factors were taken into account in formulating an approach to test construction for the FSEP. The four-option test performed better than the three-option test overall, but some items were improved by the removal of options. The mixed-option test demonstrated better measurement properties than the fixed-option tests, and has become the preferred test format in the FSEP program. The criteria used were reliability, errors of measurement and fit to the item response model. Conclusions The position taken is that decisions about the number of response options be made at the item level, with plausible options being added to complete each item on both psychometric and educational grounds rather than complying with a uniform policy. The point is to construct the better performing item in providing the best psychometric and educational information. PMID:23453056
Zoanetti, Nathan; Beaves, Mark; Griffin, Patrick; Wallace, Euan M
2013-03-04
Despite the widespread use of multiple-choice assessments in medical education assessment, current practice and published advice concerning the number of response options remains equivocal. This article describes an empirical study contrasting the quality of three 60 item multiple-choice test forms within the Royal Australian and New Zealand College of Obstetricians and Gynaecologists (RANZCOG) Fetal Surveillance Education Program (FSEP). The three forms are described below. The first form featured four response options per item. The second form featured three response options, having removed the least functioning option from each item in the four-option counterpart. The third test form was constructed by retaining the best performing version of each item from the first two test forms. It contained both three and four option items. Psychometric and educational factors were taken into account in formulating an approach to test construction for the FSEP. The four-option test performed better than the three-option test overall, but some items were improved by the removal of options. The mixed-option test demonstrated better measurement properties than the fixed-option tests, and has become the preferred test format in the FSEP program. The criteria used were reliability, errors of measurement and fit to the item response model. The position taken is that decisions about the number of response options be made at the item level, with plausible options being added to complete each item on both psychometric and educational grounds rather than complying with a uniform policy. The point is to construct the better performing item in providing the best psychometric and educational information.
Measuring the quality of life in hypertension according to Item Response Theory
Borges, José Wicto Pereira; Moreira, Thereza Maria Magalhães; Schmitt, Jeovani; de Andrade, Dalton Francisco; Barbetta, Pedro Alberto; de Souza, Ana Célia Caetano; Lima, Daniele Braz da Silva; Carvalho, Irialda Saboia
2017-01-01
ABSTRACT OBJECTIVE To analyze the Miniquestionário de Qualidade de Vida em Hipertensão Arterial (MINICHAL – Mini-questionnaire of Quality of Life in Hypertension) using the Item Response Theory. METHODS This is an analytical study conducted with 712 persons with hypertension treated in thirteen primary health care units of Fortaleza, State of Ceará, Brazil, in 2015. The steps of the analysis by the Item Response Theory were: evaluation of dimensionality, estimation of parameters of items, and construction of scale. The study of dimensionality was carried out on the polychoric correlation matrix and confirmatory factor analysis. To estimate the item parameters, we used the Graded Response Model of Samejima. The analyses were conducted using the free software R with the aid of the psych and mirt packages. RESULTS The analysis has allowed the visualization of item parameters and their individual contributions in the measurement of the latent trait, generating more information and allowing the construction of a scale with an interpretative model that demonstrates the evolution of the worsening of the quality of life in five levels. Regarding the item parameters, the items related to the somatic state have had a good performance, as they have presented better power to discriminate individuals with worse quality of life. The items related to mental state have been those which contributed with less psychometric data in the MINICHAL. CONCLUSIONS We conclude that the instrument is suitable for the identification of the worsening of the quality of life in hypertension. The analysis of the MINICHAL using the Item Response Theory has allowed us to identify new sides of this instrument that have not yet been addressed in previous studies. PMID:28492764
ERIC Educational Resources Information Center
Samejima, Fumiko
In latent trait theory the latent space, or space of the hypothetical construct, is usually represented by some unidimensional or multi-dimensional continuum of real numbers. Like the latent space, the item response can either be treated as a discrete variable or as a continuous variable. Latent trait theory relates the item response to the latent…
ERIC Educational Resources Information Center
Reise, Steven P.; Meijer, Rob R.; Ainsworth, Andrew T.; Morales, Leo S.; Hays, Ron D.
2006-01-01
Group-level parametric and non-parametric item response theory models were applied to the Consumer Assessment of Healthcare Providers and Systems (CAHPS[R]) 2.0 core items in a sample of 35,572 Medicaid recipients nested within 131 health plans. Results indicated that CAHPS responses are dominated by within health plan variation, and only weakly…
ERIC Educational Resources Information Center
Wang, Wen-Chung; Liu, Chen-Wei; Wu, Shiu-Lien
2013-01-01
The random-threshold generalized unfolding model (RTGUM) was developed by treating the thresholds in the generalized unfolding model as random effects rather than fixed effects to account for the subjective nature of the selection of categories in Likert items. The parameters of the new model can be estimated with the JAGS (Just Another Gibbs…
Reise, Steven P.; Ventura, Joseph; Keefe, Richard S. E.; Baade, Lyle E.; Gold, James M.; Green, Michael F.; Kern, Robert S.; Mesholam-Gately, Raquelle; Nuechterlein, Keith H.; Seidman, Larry J.; Bilder, Robert
2011-01-01
We conducted psychometric analyses of two interview-based measures of cognitive deficits: the 21-item Clinical Global Impression of Cognition in Schizophrenia (CGI-CogS; Ventura et al., 2008), and the 20-item Schizophrenia Cognition Rating Scale (SCoRS; Keefe et al., 2006), which were administered on two occasions to a sample of people with schizophrenia. Traditional psychometrics, bifactor analysis, and item response theory (IRT) methods were used to explore item functioning, dimensionality, and to compare instruments. Despite containing similar item content, responses to the CGI-CogS demonstrated superior psychometric properties (e.g., higher item-intercorrelations, better spread of ratings across response categories), relative to the SCoRS. We argue that these differences arise mainly from the differential use of prompts and how the items are phrased and scored. Bifactor analysis demonstrated that although both measures capture a broad range of cognitive functioning (e.g., working memory, social cognition), the common variance on each is overwhelmingly explained by a single general factor. IRT analyses of the combined pool of 41 items showed that measurement precision is peaked in the mild to moderate range of cognitive impairment. Finally, simulated adaptive testing revealed that only about 10 to 12 items are necessary to achieve latent trait level estimates with reasonably small standard errors for most individuals. This suggests that these interview-based measures of cognitive deficits could be shortened without loss of measurement precision. PMID:21381848
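The claim that a short adaptive subset can reach small standard errors rests on the relation between test information and the standard error of the trait estimate. The sketch below illustrates that relation with a dichotomous two-parameter logistic (2PL) model as a simplified stand-in; the study itself analysed rating-scale items, and the 41-item pool here is entirely hypothetical.

```python
import math


def item_information(theta: float, a: float, b: float) -> float:
    """Fisher information of a 2PL item at trait level theta."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)


def standard_error(theta: float, items) -> float:
    """Asymptotic standard error of theta: 1 / sqrt(test information)."""
    information = sum(item_information(theta, a, b) for a, b in items)
    return 1.0 / math.sqrt(information)


def select_most_informative(theta: float, pool, n_items: int):
    """Pick the n_items with the highest information at theta."""
    ranked = sorted(pool, key=lambda ab: item_information(theta, *ab), reverse=True)
    return ranked[:n_items]


if __name__ == "__main__":
    # Hypothetical pool: 41 items with equal discrimination and
    # difficulties spread evenly from -2.0 to 2.0 (assumed values).
    pool = [(1.5, -2.0 + 0.1 * i) for i in range(41)]
    theta = 0.5
    subset = select_most_informative(theta, pool, 12)
    print("SE, full pool:", round(standard_error(theta, pool), 3))
    print("SE, 12 items: ", round(standard_error(theta, subset), 3))
```

Because items located far from a respondent's trait level contribute little information, dropping them raises the standard error only modestly, which is the intuition behind shortening a measure through adaptive selection.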
Validation of a clinical critical thinking skills test in nursing.
Shin, Sujin; Jung, Dukyoo; Kim, Sungeun
2015-01-27
The purpose of this study was to develop a revised version of the clinical critical thinking skills test (CCTS) and to subsequently validate its performance. This study is a secondary analysis of the CCTS. Data were obtained from a convenience sample of 284 college students in June 2011. Thirty items were analyzed using item response theory and test reliability was assessed. Test-retest reliability was measured using the results of 20 nursing college and graduate school students in July 2013. The content validity of the revised items was analyzed by calculating the degree of agreement between instrument developer intention in item development and the judgments of six experts. To analyze response process validity, qualitative data related to the response processes of nine nursing college students obtained through cognitive interviews were analyzed. Out of the initial 30 items, 11 items were excluded after the analysis of the difficulty and discrimination parameters. When the 19 items of the revised version of the CCTS were analyzed, levels of item difficulty were found to be relatively low and levels of discrimination were found to be appropriate or high. The degree of agreement between item developer intention and expert judgments equaled or exceeded 50%. From the above results, evidence of the response process validity was demonstrated, indicating that subjects responded as intended by the test developer. The revised 19-item CCTS was found to have sufficient reliability and validity and therefore represents a more convenient measurement of critical thinking ability.
Validation of a clinical critical thinking skills test in nursing
2015-01-01
Purpose: The purpose of this study was to develop a revised version of the clinical critical thinking skills test (CCTS) and to subsequently validate its performance. Methods: This study is a secondary analysis of the CCTS. Data were obtained from a convenience sample of 284 college students in June 2011. Thirty items were analyzed using item response theory and test reliability was assessed. Test-retest reliability was measured using the results of 20 nursing college and graduate school students in July 2013. The content validity of the revised items was analyzed by calculating the degree of agreement between instrument developer intention in item development and the judgments of six experts. To analyze response process validity, qualitative data related to the response processes of nine nursing college students obtained through cognitive interviews were analyzed. Results: Out of the initial 30 items, 11 items were excluded after the analysis of the difficulty and discrimination parameters. When the 19 items of the revised version of the CCTS were analyzed, levels of item difficulty were found to be relatively low and levels of discrimination were found to be appropriate or high. The degree of agreement between item developer intention and expert judgments equaled or exceeded 50%. Conclusion: From the above results, evidence of the response process validity was demonstrated, indicating that subjects responded as intended by the test developer. The revised 19-item CCTS was found to have sufficient reliability and validity and therefore represents a more convenient measurement of critical thinking ability. PMID:25622716
Lambert, Michael Canute; Ferguson, Gail M; Rowan, George T
2016-03-01
Cross-national study of adolescents' psychological adjustment requires measures that permit reliable and valid assessment across informants and nations, but such measures are virtually nonexistent. Item-response-theory-based linking is a promising yet underutilized methodological procedure that permits more accurate assessment across informants and nations. To demonstrate this procedure, the Resilience Scale of the Behavioral Assessment for Children of African Heritage (Lambert et al., 2005) was administered to 250 African American and 294 Jamaican nonreferred adolescents and their caregivers. Multiple items without significant differential item functioning emerged, allowing scale linking across informants and nations. Calibrating item parameters via item response theory linking can permit cross-informant cross-national assessment of youth. (c) 2016 APA, all rights reserved.
Kooy, Marcel Jan; Van Geffen, Erica C G; Heerdink, Eibert R; Van Dijk, Liset; Bouvy, Marcel L
2015-06-01
Assess effects of pharmacists' counseling by telephone on patients' satisfaction with counseling, satisfaction with information and beliefs about medicines for newly prescribed medicines. A cluster randomized trial in Dutch community pharmacies. Patients ≥18 years were included when starting with antidepressants, bisphosphonates, RAS-inhibitors or statins. The intervention comprised counseling by telephone to address barriers to adherent behavior. It was supported by an interview protocol. Controls received usual care. Outcomes were effects on beliefs about medication, satisfaction with information and counseling. Data were collected with a questionnaire. Responses of 211 patients in nine pharmacies were analyzed. More intervention arm patients were satisfied with counseling (adj. OR 2.2 (95% CI 1.3, 3.6)). Patients with counseling were significantly more satisfied with information on 4 items, had fewer concerns and less frequently had a 'skeptical' attitude towards medication (adj. OR 0.5 (0.3-0.9)). Effects on most outcomes were more pronounced in men than in women. Telephone counseling by pharmacists improved satisfaction with counseling and satisfaction with information on some items. It had a small effect on beliefs about medicines. Pharmacists can use counseling by telephone, but more research is needed to find out which patients benefit most. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
Marfeo, Elizabeth E; Ni, Pengsheng; Chan, Leighton; Rasch, Elizabeth K; Jette, Alan M
2014-07-01
The goal of this article was to investigate optimal functioning of using frequency vs. agreement rating scales in two subdomains of the newly developed Work Disability Functional Assessment Battery: the Mood & Emotions and Behavioral Control scales. A psychometric study comparing rating scale performance embedded in a cross-sectional survey used for developing a new instrument to measure behavioral health functioning among adults applying for disability benefits in the United States was performed. Within the sample of 1,017 respondents, the range of response category endorsement was similar for both frequency and agreement item types for both scales. There were fewer missing values in the frequency items than the agreement items. Both frequency and agreement items showed acceptable reliability. The frequency items demonstrated optimal effectiveness around the mean ± 1-2 standard deviation score range; the agreement items performed better at the extreme score ranges. Findings suggest an optimal response format requires a mix of both agreement-based and frequency-based items. Frequency items perform better in the normal range of responses, capturing specific behaviors, reactions, or situations that may elicit a specific response. Agreement items do better for those whose scores are more extreme and capture subjective content related to general attitudes, behaviors, or feelings of work-related behavioral health functioning. Copyright © 2014 Elsevier Inc. All rights reserved.
Maindal, Helle Terkildsen; Sokolowski, Ineta; Vedsted, Peter
2009-06-29
The Patient Activation Measure (PAM) is a measure that assesses patient knowledge, skill, and confidence for self-management. This study validates the Danish translation of the 13-item Patient Activation Measure (PAM13) in a Danish population with dysglycaemia. 358 people with screen-detected dysglycaemia participating in a primary care health education study responded to PAM13. The PAM13 was translated into Danish by a standardised forward-backward translation. Data quality was assessed by mean, median, item response, missing values, floor and ceiling effects, internal consistency (Cronbach's alpha and average inter-item correlation) and item-rest correlations. Scale properties were assessed by Rasch Rating Scale models. The item response was high with a small number of missing values (0.8-4.2%). Floor effect was small (range 0.6-3.6%), but the ceiling effect was above 15% for all items (range 18.6-62.7%). The alpha-coefficient was 0.89 and the average inter-item correlation 0.38. The Danish version formed a unidimensional, probabilistic Guttman-like scale explaining 43.2% of the variance. We did, however, find a different item sequence compared to the original scale. A Danish version of PAM13 with acceptable validity and reliability is now available. Further development should focus on single items, response categories in relation to ceiling effects and further validation of reproducibility and responsiveness.
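The two internal-consistency figures reported above are linked by a standard identity: the standardized alpha of a k-item scale equals k times the average inter-item correlation divided by 1 plus (k - 1) times that correlation. The sketch below checks the reported values against this identity. It assumes the standardized form of alpha, which the abstract does not specify; raw alpha computed from item covariances can differ slightly. The function name is hypothetical.

```python
def standardized_alpha(n_items: int, mean_inter_item_r: float) -> float:
    """Standardized Cronbach's alpha from the average inter-item correlation."""
    if n_items < 2:
        raise ValueError("alpha requires at least two items")
    return (n_items * mean_inter_item_r) / (1.0 + (n_items - 1) * mean_inter_item_r)


if __name__ == "__main__":
    # Values quoted in the abstract: 13 items, average correlation 0.38.
    alpha = standardized_alpha(13, 0.38)
    print(round(alpha, 2))
```

With 13 items and an average correlation of 0.38 the identity gives a value that rounds to the alpha-coefficient of 0.89 reported in the abstract, so the two figures are mutually consistent under this assumption.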
Detection of Differential Item Functioning Using the Lasso Approach
ERIC Educational Resources Information Center
Magis, David; Tuerlinckx, Francis; De Boeck, Paul
2015-01-01
This article proposes a novel approach to detect differential item functioning (DIF) among dichotomously scored items. Unlike standard DIF methods that perform an item-by-item analysis, we propose the "LR lasso DIF method": a logistic regression (LR) model is formulated for all item responses. The model contains item-specific intercepts,…
Mielenz, Thelma J; Callahan, Leigh F; Edwards, Michael C
2016-03-12
Examine the feasibility of performing an item response theory (IRT) analysis on two of the Centers for Disease Control and Prevention health-related quality of life (CDC HRQOL) modules - the 4-item Healthy Days Core Module (HDCM) and the 5-item Healthy Days Symptoms Module (HDSM). Previous principal components analyses confirm that the two scales both assess a mix of mental (CDC-MH) and physical health (CDC-PH). The purpose is to conduct item response theory (IRT) analysis on the CDC-MH and CDC-PH scales separately. 2182 patients with self-reported or physician-diagnosed arthritis completed a cross-sectional survey including HDCM and HDSM items. Besides global health, the other 8 items ask for the number of days that some statement was true; we chose to recode the data into 8 categories based on observed clustering. The IRT assumptions were assessed using confirmatory factor analysis and the data could be modeled using a unidimensional IRT model. The graded response model was used for IRT analyses and CDC-MH and CDC-PH scales were analyzed separately in flexMIRT. The IRT parameter estimates for the five-item CDC-PH all appeared reasonable. The three-item CDC-MH did not have reasonable parameter estimates. The CDC-PH scale is amenable to IRT analysis but the existing CDC-MH scale is not. We suggest either using the 4-item Healthy Days Core Module (HDCM) and the 5-item Healthy Days Symptoms Module (HDSM) as they currently stand or the CDC-PH scale alone if the primary goal is to measure physical health related HRQOL.
Informed and Uninformed Naïve Assessment Constructors' Strategies for Item Selection
ERIC Educational Resources Information Center
Fives, Helenrose; Barnes, Nicole
2017-01-01
We present a descriptive analysis of 53 naïve assessment constructors' explanations for selecting test items to include on a summative assessment. We randomly assigned participants to an informed and uninformed condition (i.e., informed participants read an article describing a Table of Specifications). Through recursive thematic analyses of…
Challenges Facing Women Academic Leadership in Secondary Schools of Irbid Educational Area
ERIC Educational Resources Information Center
Al-Jaradat, Mahmoud Khaled Mohammad
2014-01-01
This study aimed at identifying the challenges facing women academic leadership in secondary schools of Irbid Educational Area. A random sample of 187 female leaders were chosen. They responded to a 49-item questionnaire prepared by the researcher. The items were distributed into four domains: organizational, personal, social and physical…
Sequential Computerized Mastery Tests--Three Simulation Studies
ERIC Educational Resources Information Center
Wiberg, Marie
2006-01-01
A simulation study of a sequential computerized mastery test is carried out with items modeled with the 3 parameter logistic item response theory model. The examinees' responses are either identically distributed, not identically distributed, or not identically distributed together with estimation errors in the item characteristics. The…
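As a minimal illustration of the 3-parameter logistic (3PL) model named in the abstract above, the sketch below computes the probability of a correct response and simulates one examinee's answers. It is not taken from the study: the function names, the parameter values, and the choice to omit the optional scaling constant D = 1.7 are all assumptions made for illustration.

```python
import math
import random


def p_correct_3pl(theta, a, b, c):
    """3PL probability of a correct response.

    theta: examinee ability; a: discrimination; b: difficulty;
    c: lower asymptote (pseudo-guessing). No D = 1.7 scaling is applied.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def simulate_responses(theta, items, rng):
    """Draw one 0/1 response per item for an examinee with ability theta.

    items: list of (a, b, c) tuples (illustrative values, not from the study).
    """
    return [1 if rng.random() < p_correct_3pl(theta, a, b, c) else 0
            for a, b, c in items]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the draw is reproducible
    items = [(1.0, -1.0, 0.2), (1.2, 0.0, 0.2), (0.8, 1.0, 0.25)]
    print(simulate_responses(0.5, items, rng))
```

When ability equals difficulty, the probability is halfway between the guessing parameter and 1, which is a quick sanity check for any implementation.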
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fishbone, L.G.; Moussalli, G.; Naegele, G.
1994-04-01
An approach of short-notice random inspections (SNRIs) for inventory-change verification can enhance the effectiveness and efficiency of international safeguards at natural or low-enriched uranium (LEU) fuel fabrication plants. According to this approach, the plant operator declares the contents of nuclear material items before knowing if an inspection will occur to verify them. Additionally, items about which declarations are newly made should remain available for verification for an agreed time. This report details a six-month field test of the feasibility of such SNRIs which took place at the Westinghouse Electric Corporation Commercial Nuclear Fuel Division. Westinghouse personnel made daily declarations about both feed and product items, uranium hexafluoride cylinders and finished fuel assemblies, using a custom-designed computer "mailbox". Safeguards inspectors from the IAEA conducted eight SNRIs to verify these declarations. Items from both strata were verified during the SNRIs by means of nondestructive assay equipment. The field test demonstrated the feasibility and practicality of key elements of the SNRI approach for a large LEU fuel fabrication plant.
Distinguishing Fast and Slow Processes in Accuracy - Response Time Data.
Coomans, Frederik; Hofman, Abe; Brinkhuis, Matthieu; van der Maas, Han L J; Maris, Gunter
2016-01-01
We investigate the relation between speed and accuracy within problem solving in its simplest non-trivial form. We consider tests with only two items and code the item responses in two binary variables: one indicating the response accuracy, and one indicating the response speed. Despite being a very basic setup, it enables us to study item pairs stemming from a broad range of domains such as basic arithmetic, first language learning, intelligence-related problems, and chess, with large numbers of observations for every pair of problems under consideration. We carry out a survey over a large number of such item pairs and compare three types of psychometric accuracy-response time models present in the literature: two 'one-process' models, the first of which models accuracy and response time as conditionally independent and the second of which models accuracy and response time as conditionally dependent, and a 'two-process' model which models accuracy contingent on response time. We find that the data clearly violates the restrictions imposed by both one-process models and requires additional complexity which is parsimoniously provided by the two-process model. We supplement our survey with an analysis of the erroneous responses for an example item pair and demonstrate that there are very significant differences between the types of errors in fast and slow responses.
What can we learn from PISA?: Investigating PISA's approach to scientific literacy
NASA Astrophysics Data System (ADS)
Schwab, Cheryl Jean
This dissertation is an investigation of the relationship between the multidimensional conception of scientific literacy and its assessment. The Programme for International Student Assessment (PISA), developed under the auspices of the Organization for Economic Cooperation and Development (OECD), offers a unique opportunity to evaluate the assessment of scientific literacy. PISA developed a continuum of performance for scientific literacy across three competencies (i.e., process, content, and situation). Foundational to the interpretation of PISA science assessment is PISA's definition of scientific literacy, which I argue incorporates three themes drawn from history: (a) scientific way of thinking, (b) everyday relevance of science, and (c) scientific literacy for all students. Three coordinated studies were conducted to investigate the validity of PISA science assessment and offer insight into the development of items to assess scientific literacy. Multidimensional models of the internal structure of the PISA 2003 science items were found not to reflect the complex character of PISA's definition of scientific literacy. Although the multidimensional models across the three competencies significantly decreased the G2 statistic from the unidimensional model, high correlations between the dimensions suggest that the dimensions are similar. A cognitive analysis of student verbal responses to PISA science items revealed that students were using competencies of scientific literacy, but the competencies were not elicited by the PISA science items at the depth required by PISA's definition of scientific literacy. Although student responses contained only knowledge of scientific facts and simple scientific concepts, students were using more complex skills to interpret and communicate their responses. Finally the investigation of different scoring approaches and item response models illustrated different ways to interpret student responses to assessment items. 
These analyses highlighted the complexities of students' responses to the PISA science items and the use of the ordered partition model to accommodate different but equal item responses. The results of the three investigations are used to discuss ways to improve the development and interpretation of PISA's science items.
Reliability and validity of a scale for health-promoting schools.
Lee, Eun Young; Shin, Young-Jeon; Choi, Bo Youl; Cho, Ho Soon Michelle
2014-12-01
Despite a growing body of research regarding the health-promoting schools (HPS) concept from the World Health Organization (WHO), research on measuring of the HPS is limited. This study aims to develop a scale for assessing the status of the HPS based on the WHO guidelines and to evaluate the reliability and validity of the scale. After completing the translation and back-translation process, the content validity of the 50-item scale for HPS (SHPS) was assessed by an expert committee review and pretested with 17 teachers. A stratified, random sampling design was used. A total of 728 teachers from 94 schools completed a self-administered questionnaire. The total sample was randomly divided into three groups for exploratory factor analysis (EFA), confirmatory factor analysis (CFA) and cross-validation. The EFA suggested seven factors, including 37 items, and the CFA confirmed these factors. In a second-order factor analysis, the second-order seven-factor model had acceptable fit indices (root mean square error of approximation 0.07, comparative fit index 0.98) with stability over validation sample and whole sample. Thus, the first-order seven factors (school nutrition services [three-item, α = 0.87], healthy school policies [six-item, α = 0.87], school's physical environment [10-item, α = 0.91], school's social environment [four-item, α = 0.88], community links [six-item, α = 0.91], individual health skills and action competencies [three-item, α = 0.89], and health services [five-item, α = 0.86]) loaded significantly onto the second-order factor (HPS [37-item, α = 0.97]). In conclusion, the SHPS is a reliable and valid measurement tool for assessing the states of the HPS in the Korean school context. It will be useful for comprehensively assessing schools' needs and monitoring the progress of school health interventions. © The Author (2013). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Austvoll-Dahlgren, Astrid; Guttersrud, Øystein; Nsangi, Allen; Semakula, Daniel; Oxman, Andrew D
2017-01-01
Background The Claim Evaluation Tools database contains multiple-choice items for measuring people’s ability to apply the key concepts they need to know to be able to assess treatment claims. We assessed items from the database using Rasch analysis to develop an outcome measure to be used in two randomised trials in Uganda. Rasch analysis is a form of psychometric testing relying on Item Response Theory. It is a dynamic way of developing outcome measures that are valid and reliable. Objectives To assess the validity, reliability and responsiveness of 88 items addressing 22 key concepts using Rasch analysis. Participants We administered four sets of multiple-choice items in English to 1114 people in Uganda and Norway, of whom 685 were children and 429 were adults (including 171 health professionals). We scored all items dichotomously. We explored summary and individual fit statistics using the RUMM2030 analysis package. We used SPSS to perform distractor analysis. Results Most items conformed well to the Rasch model, but some items needed revision. Overall, the four item sets had satisfactory reliability. We did not identify significant response dependence between any pairs of items and, overall, the magnitude of multidimensionality in the data was acceptable. The items had a high level of difficulty. Conclusion Most of the items conformed well to the Rasch model’s expectations. Following revision of some items, we concluded that most of the items were suitable for use in an outcome measure for evaluating the ability of children or adults to assess treatment claims. PMID:28550019
Cordier, Reinie; Speyer, Renée; Schindler, Antonio; Michou, Emilia; Heijnen, Bas Joris; Baijens, Laura; Karaduman, Ayşe; Swan, Katina; Clavé, Pere; Joosten, Annette Veronica
2018-02-01
The Swallowing Quality of Life questionnaire (SWAL-QOL) is widely used clinically and in research to evaluate quality of life related to swallowing difficulties. It has been described as a valid and reliable tool, but was developed and tested using classical test theory. This study describes the reliability and validity of the SWAL-QOL using item response theory (IRT; Rasch analysis). SWAL-QOL data were gathered from 507 participants at risk of oropharyngeal dysphagia (OD) across four European countries. OD was confirmed in 75.7% of participants via videofluoroscopy and/or fiberoptic endoscopic evaluation, or a clinical diagnosis based on meeting selected criteria. Patients with esophageal dysphagia were excluded. Data were analysed using Rasch analysis. Item and person reliability was good for all the items combined. However, person reliability was poor for 8 subscales and item reliability was poor for one subscale. Eight subscales exhibited poor person separation and two exhibited poor item separation. Overall item and person fit statistics were acceptable. However, at an individual item fit level results indicated unpredictable item responses for 28 items, and item redundancy for 10 items. The item-person dimensionality map confirmed these findings. Results from the overall Rasch model fit and Principal Component Analysis were suggestive of a second dimension. For all the items combined, none of the item categories were 'category', 'threshold' or 'step' disordered; however, all subscales demonstrated category disordered functioning. Findings suggest an urgent need to further investigate the underlying structure of the SWAL-QOL and its psychometric characteristics using IRT.
ERIC Educational Resources Information Center
Arffman, Inga
2016-01-01
Open-ended (OE) items are widely used to gather data on student performance in international achievement studies. However, several factors may threaten validity when using such items. This study examined Finnish coders' opinions about threats to validity when coding responses to OE items in the PISA 2012 problem-solving test. A total of 6…
ERIC Educational Resources Information Center
Cao, Yi; Lu, Ru; Tao, Wei
2014-01-01
The local item independence assumption underlying traditional item response theory (IRT) models is often not met for tests composed of testlets. There are 3 major approaches to addressing this issue: (a) ignore the violation and use a dichotomous IRT model (e.g., the 2-parameter logistic [2PL] model), (b) combine the interdependent items to form a…
ERIC Educational Resources Information Center
Ferrando, Pere J.
2004-01-01
This study used kernel-smoothing procedures to estimate the item characteristic functions (ICFs) of a set of continuous personality items. The nonparametric ICFs were compared with the ICFs estimated (a) by the linear model and (b) by Samejima's continuous-response model. The study was based on a conditioned approach and used an error-in-variables…
ERIC Educational Resources Information Center
Watson, Kathy; Baranowski, Tom; Thompson, Debbe; Jago, Russell; Baranowski, Janice; Klesges, Lisa M.
2006-01-01
This study examined multidimensional item response theory (MIRT) modeling to assess social desirability (SocD) influences on self-reported physical activity self-efficacy (PASE) and fruit and vegetable self-efficacy (FVSE). The observed sample included 473 Houston-area adolescent males (10-14 years). SocD (nine items), PASE (19 items) and FVSE (21…
The Structure of the Narcissistic Personality Inventory With Binary and Rating Scale Items.
Boldero, Jennifer M; Bell, Richard C; Davies, Richard C
2015-01-01
Narcissistic Personality Inventory (NPI) items typically have a forced-choice format, comprising a narcissistic and a nonnarcissistic statement. Recently, some have presented the narcissistic statements and asked individuals to either indicate whether they agree or disagree that the statements are self-descriptive (i.e., a binary response format) or to rate the extent to which they agree or disagree that these statements are self-descriptive on a Likert scale (i.e., a rating response format). The current research demonstrates that when NPI items have a binary or a rating response format, the scale has a bifactor structure (i.e., the items load on a general factor and on 6 specific group factors). Indexes of factor strength suggest that the data are unidimensional enough for the NPI's general factor to be considered a measure of a narcissism latent trait. However, the rating item general factor assessed more narcissism components than the binary item one. The positive correlations of the NPI's general factor, assessed when items have a rating response format, were moderate with self-esteem, strong with a measure of narcissistic grandiosity, and weak with 2 measures of narcissistic vulnerability. Together, the results suggest that using a rating format for items enhances the information provided by the NPI.
2013-01-01
Background Assessing the risk of bias of randomized controlled trials (RCTs) is crucial to understand how biases affect treatment effect estimates. A number of tools have been developed to evaluate risk of bias of RCTs; however, it is unknown how these tools compare to each other in the items included. The main objective of this study was to describe which individual items are included in RCT quality tools used in general health and physical therapy (PT) research, and how these items compare to those of the Cochrane Risk of Bias (RoB) tool. Methods We used comprehensive literature searches and a systematic approach to identify tools that evaluated the methodological quality or risk of bias of RCTs in general health and PT research. We extracted individual items from all quality tools. We calculated the frequency of quality items used across tools and compared them to those in the RoB tool. Comparisons were made between general health and PT quality tools using Chi-squared tests. Results In addition to the RoB tool, 26 quality tools were identified, with 19 being used in general health and seven in PT research. The total number of quality items included in general health research tools was 130, compared with 48 items across PT tools and seven items in the RoB tool. The most frequently included items in general health research tools (14/19, 74%) were inclusion and exclusion criteria, and appropriate statistical analysis. In contrast, the most frequent items included in PT tools (86%, 6/7) were: baseline comparability, blinding of investigator/assessor, and use of intention-to-treat analysis. Key items of the RoB tool (sequence generation and allocation concealment) were included in 71% (5/7) of PT tools, and 63% (12/19) and 37% (7/19) of general health research tools, respectively. Conclusions There is extensive item variation across tools that evaluate the risk of bias of RCTs in health research. 
Results call for an in-depth analysis of items that should be used to assess risk of bias of RCTs. Further empirical evidence on the use of individual items and the psychometric properties of risk of bias tools is needed. PMID:24044807
Armijo-Olivo, Susan; Fuentes, Jorge; Ospina, Maria; Saltaji, Humam; Hartling, Lisa
2013-09-17
Assessing the risk of bias of randomized controlled trials (RCTs) is crucial to understand how biases affect treatment effect estimates. A number of tools have been developed to evaluate risk of bias of RCTs; however, it is unknown how these tools compare to each other in the items included. The main objective of this study was to describe which individual items are included in RCT quality tools used in general health and physical therapy (PT) research, and how these items compare to those of the Cochrane Risk of Bias (RoB) tool. We used comprehensive literature searches and a systematic approach to identify tools that evaluated the methodological quality or risk of bias of RCTs in general health and PT research. We extracted individual items from all quality tools. We calculated the frequency of quality items used across tools and compared them to those in the RoB tool. Comparisons were made between general health and PT quality tools using Chi-squared tests. In addition to the RoB tool, 26 quality tools were identified, with 19 being used in general health and seven in PT research. The total number of quality items included in general health research tools was 130, compared with 48 items across PT tools and seven items in the RoB tool. The most frequently included items in general health research tools (14/19, 74%) were inclusion and exclusion criteria, and appropriate statistical analysis. In contrast, the most frequent items included in PT tools (86%, 6/7) were: baseline comparability, blinding of investigator/assessor, and use of intention-to-treat analysis. Key items of the RoB tool (sequence generation and allocation concealment) were included in 71% (5/7) of PT tools, and 63% (12/19) and 37% (7/19) of general health research tools, respectively. There is extensive item variation across tools that evaluate the risk of bias of RCTs in health research. Results call for an in-depth analysis of items that should be used to assess risk of bias of RCTs. 
Further empirical evidence on the use of individual items and the psychometric properties of risk of bias tools is needed.
Handling missing values in the MDS-UPDRS.
Goetz, Christopher G; Luo, Sheng; Wang, Lu; Tilley, Barbara C; LaPelle, Nancy R; Stebbins, Glenn T
2015-10-01
This study was undertaken to define the number of missing values permissible to render valid total scores for each Movement Disorder Society Unified Parkinson's Disease Rating Scale (MDS-UPDRS) part. To handle missing values, imputation strategies serve as guidelines to reject an incomplete rating or create a surrogate score. We tested a rigorous, scale-specific, data-based approach to handling missing values for the MDS-UPDRS. From two large MDS-UPDRS datasets, we sequentially deleted item scores, either consistently (same items) or randomly (different items) across all subjects. Lin's Concordance Correlation Coefficient (CCC) compared scores calculated without missing values with prorated scores based on sequentially increasing missing values. The maximal number of missing values retaining a CCC greater than 0.95 determined the threshold for rendering a valid prorated score. A second confirmatory sample was selected from the MDS-UPDRS international translation program. To provide valid part scores applicable across all Hoehn and Yahr (H&Y) stages when the same items are consistently missing, one missing item from Part I, one from Part II, three from Part III, but none from Part IV can be allowed. To provide valid part scores applicable across all H&Y stages when random item entries are missing, one missing item from Part I, two from Part II, seven from Part III, but none from Part IV can be allowed. All cutoff values were confirmed in the validation sample. These analyses are useful for constructing valid surrogate part scores for MDS-UPDRS when missing items fall within the identified threshold and give scientific justification for rejecting partially completed ratings that fall below the threshold. © 2015 International Parkinson and Movement Disorder Society.
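The comparison described in the abstract above, complete scores against prorated scores judged by Lin's Concordance Correlation Coefficient, can be sketched as follows. This is an illustrative sketch only: the abstract does not give the prorating formula, so the rule used here (mean of the observed items multiplied by the number of items in the part) is an assumption, and both function names are hypothetical. The CCC uses population (1/n) moments, as in Lin's original definition.

```python
def prorated_score(item_scores):
    """Prorate a part score when some items are missing (None).

    Assumed rule: mean of the observed items times the total item count.
    """
    observed = [s for s in item_scores if s is not None]
    if not observed:
        raise ValueError("no observed items to prorate from")
    return sum(observed) * len(item_scores) / len(observed)


def lins_ccc(x, y):
    """Lin's concordance correlation coefficient using 1/n moments."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    var_x = sum((v - mean_x) ** 2 for v in x) / n
    var_y = sum((v - mean_y) ** 2 for v in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    return 2 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)


if __name__ == "__main__":
    # Toy data (not from the study): complete scores vs. scores prorated
    # after deleting one item per subject.
    complete = [[2, 1, 3, 2], [0, 1, 1, 0], [4, 3, 4, 4]]
    with_gap = [[2, None, 3, 2], [0, 1, None, 0], [4, 3, 4, None]]
    full = [sum(row) for row in complete]
    prorated = [prorated_score(row) for row in with_gap]
    print(lins_ccc(full, prorated))
```

Unlike a Pearson correlation, the CCC penalizes a systematic shift between the two sets of scores, which is why it suits a check on whether prorated totals can stand in for complete ones.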
Oberauer, Klaus
2018-03-12
To function properly, working memory must be rapidly updated. Updating requires the removal of information no longer relevant. I present six experiments designed to explore the boundary conditions and the time course of removal. A condition in which three out of six memory items can be removed was compared to two baseline conditions in which either three or six items were encoded and maintained in working memory. The time for removal was varied. In experiment 1, in the removal condition, a distinct subset of three words was cued to be irrelevant after encoding all six words. With longer removal time, response times in the removal condition approximated those in the set-size 3 baseline, but accuracies stayed at the set-size 6 level. In experiment 2, in which a random subset of three words was cued as irrelevant, there was no evidence for removal. Experiments 3 and 4 showed that when each item is cued as relevant or irrelevant after its encoding, irrelevant items can be removed rapidly and completely. Experiments 5 and 6 showed that complete removal was no longer possible when words had to be processed before being cued as irrelevant. The pattern of findings can be explained by distinguishing two forms of removal: deactivation removes working-memory contents from the set of competitors for retrieval; unbinding contents from their contexts removes them from working memory entirely, so that they also cease to compete for limited capacity. © 2018 New York Academy of Sciences.
Taffarel, Marilda Onghero; Luna, Stelio Pacca Loureiro; de Oliveira, Flavia Augusta; Cardoso, Guilherme Schiess; Alonso, Juliana de Moura; Pantoja, Jose Carlos; Brondani, Juliana Tabarelli; Love, Emma; Taylor, Polly; White, Kate; Murrell, Joanna C
2015-04-01
Quantification of pain plays a vital role in the diagnosis and management of pain in animals. In order to refine and validate an acute pain scale for horses a prospective, randomized, blinded study was conducted. Twenty-four client-owned adult horses were recruited and allocated to one of the following four groups: anaesthesia only (GA); pre-emptive analgesia and anaesthesia (GAA); anaesthesia, castration and postoperative analgesia (GC); or pre-emptive analgesia, anaesthesia and castration (GCA). One investigator, unaware of the treatment group, assessed all horses at time-points before and after intervention and completed the pain scale. Videos were also obtained at these time-points and were evaluated by a further four blinded evaluators who also completed the scale. The data were used to investigate the relevance, specificity, criterion validity and inter- and intra-observer reliability of each item on the pain scale, and to evaluate construct validity and responsiveness of the scale. Construct validity was demonstrated by the observed differences in scores between the groups, four hours after anaesthetic recovery and before administration of systemic analgesia in the GC group. Inter- and intra-observer reliability for the items was only satisfactory. Subsequently the pain scale was refined, based on results for relevance, specificity and total item correlation. Scale refinement and exclusion of items that did not meet predefined requirements generated a selection of relevant pain behaviours in horses. After further validation for reliability, these may be used to evaluate pain under clinical and experimental conditions.
Pilkonis, Paul A.; Yu, Lan; Dodds, Nathan E.; Johnston, Kelly L.; Lawrence, Suzanne; Hilton, Thomas F.; Daley, Dennis C.; Patkar, Ashwin A.; McCarty, Dennis
2015-01-01
Background Two item banks for substance use were developed as part of the Patient-Reported Outcomes Measurement Information System (PROMIS®): severity of substance use and positive appeal of substance use. Methods Qualitative item analysis (including focus groups, cognitive interviewing, expert review, and item revision) reduced an initial pool of more than 5,300 items for substance use to 119 items included in field testing. Items were written in a first-person, past-tense format, with 5 response options reflecting frequency or severity. Both 30-day and 3-month time frames were tested. The calibration sample of 1,336 respondents included 875 individuals from the general population (ascertained through an internet panel) and 461 patients from addiction treatment centers participating in the National Drug Abuse Treatment Clinical Trials Network. Results Final banks of 37 and 18 items were calibrated for severity of substance use and positive appeal of substance use, respectively, using the two-parameter graded response model from item response theory (IRT). Initial calibrations were similar for the 30-day and 3-month time frames, and final calibrations used data combined across the time frames, making the items applicable with either interval. Seven-item static short forms were also developed from each item bank. Conclusions Test information curves showed that the PROMIS item banks provided substantial information in a broad range of severity, making them suitable for treatment, observational, and epidemiological research in both clinical and community settings. PMID:26423364
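For readers unfamiliar with the two-parameter graded response model mentioned in the abstract above, the following Python sketch shows how category probabilities arise for an item with 5 response options. It is an illustration under stated assumptions, not the PROMIS calibration code: the function name is hypothetical, the discrimination and threshold values are invented, and the thresholds are assumed to be strictly increasing.

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Category probabilities under a graded response model.

    theta: latent trait level; a: item discrimination;
    thresholds: increasing list of K-1 threshold parameters for K categories.
    Returns a list of K probabilities that sums to 1.
    """
    # Cumulative probabilities P(X >= k) for k = 1..K-1.
    cumulative = [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    # P(X >= 0) is 1 and P(X >= K) is 0 by definition.
    bounds = [1.0] + cumulative + [0.0]
    # Each category probability is the difference of adjacent cumulatives.
    return [bounds[k] - bounds[k + 1] for k in range(len(bounds) - 1)]


if __name__ == "__main__":
    # Hypothetical item: 5 response options, hence 4 thresholds.
    probs = grm_category_probs(theta=0.0, a=1.5, thresholds=[-2.0, -1.0, 1.0, 2.0])
    print([round(p, 3) for p in probs])
```

Each item has one discrimination parameter and one threshold per boundary between adjacent response options, which is the parameterization the "two-parameter" label refers to in this setting.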
Cho, Sun-Joo; Preacher, Kristopher J.; Bottge, Brian A.
2015-01-01
Multilevel modeling (MLM) is frequently used to detect group differences, such as an intervention effect in a pre-test–post-test cluster-randomized design. Group differences on the post-test scores are detected by controlling for pre-test scores as a proxy variable for unobserved factors that predict future attributes. The pre-test and post-test scores that are most often used in MLM are summed item responses (or total scores). In prior research, there have been concerns regarding measurement error in the use of total scores in using MLM. To correct for measurement error in the covariate and outcome, a theoretical justification for the use of multilevel structural equation modeling (MSEM) has been established. However, MSEM for binary responses has not been widely applied to detect intervention effects (group differences) in intervention studies. In this article, the use of MSEM for intervention studies is demonstrated and the performance of MSEM is evaluated via a simulation study. Furthermore, the consequences of using MLM instead of MSEM are shown in detecting group differences. Results of the simulation study showed that MSEM performed adequately as the number of clusters, cluster size, and intraclass correlation increased and outperformed MLM for the detection of group differences. PMID:29881032
Assessing adherence to the evidence base in the management of poststroke dysphagia.
Burton, Christopher; Pennington, Lindsay; Roddam, Hazel; Russell, Ian; Russell, Daphne; Krawczyk, Karen; Smith, Hilary A
2006-01-01
To evaluate the reliability and responsiveness to change of an audit tool to assess adherence to evidence of effectiveness in the speech and language therapy (SLT) management of poststroke dysphagia. The tool was used to review SLT practice as part of a randomized study of different education strategies. Medical records were audited before and after delivery of the trial intervention. Seventeen SLT departments in the north-west of England participated in the study. The assessment tool was used to assess the medical records of 753 patients before and 717 patients after delivery of the trial intervention across the 17 departments. A target of 10 records per department per month was sought, using systematic sampling with a random start. Inter- and intra-rater reliability were explored, together with the tool's internal consistency and responsiveness to change. The assessment tool had high face validity, although internal consistency was low (ra = 0.37). Composite scores on the tool were however responsive to differences between SLT departments. Both inter- and intra-rater reliability ranged from 'substantial' to 'near perfect' across all items. The audit tool has high face validity and measurement reliability. The use of a composite adherence score should, however, proceed with caution as internal consistency is low.
Cho, Sun-Joo; Preacher, Kristopher J; Bottge, Brian A
2015-11-01
Multilevel modeling (MLM) is frequently used to detect group differences, such as an intervention effect in a pre-test-post-test cluster-randomized design. Group differences on the post-test scores are detected by controlling for pre-test scores as a proxy variable for unobserved factors that predict future attributes. The pre-test and post-test scores that are most often used in MLM are summed item responses (or total scores). In prior research, there have been concerns regarding measurement error in the use of total scores in using MLM. To correct for measurement error in the covariate and outcome, a theoretical justification for the use of multilevel structural equation modeling (MSEM) has been established. However, MSEM for binary responses has not been widely applied to detect intervention effects (group differences) in intervention studies. In this article, the use of MSEM for intervention studies is demonstrated and the performance of MSEM is evaluated via a simulation study. Furthermore, the consequences of using MLM instead of MSEM are shown in detecting group differences. Results of the simulation study showed that MSEM performed adequately as the number of clusters, cluster size, and intraclass correlation increased and outperformed MLM for the detection of group differences.
Practical Guide to Conducting an Item Response Theory Analysis
ERIC Educational Resources Information Center
Toland, Michael D.
2014-01-01
Item response theory (IRT) is a psychometric technique used in the development, evaluation, improvement, and scoring of multi-item scales. This pedagogical article provides the necessary information needed to understand how to conduct, interpret, and report results from two commonly used ordered polytomous IRT models (Samejima's graded…
Analyzing Longitudinal Item Response Data via the Pairwise Fitting Method
ERIC Educational Resources Information Center
Fu, Zhi-Hui; Tao, Jian; Shi, Ning-Zhong; Zhang, Ming; Lin, Nan
2011-01-01
Multidimensional item response theory (MIRT) models can be applied to longitudinal educational surveys where a group of individuals are administered different tests over time with some common items. However, computational problems typically arise as the dimension of the latent variables increases. This is especially true when the latent variable…
Item Construction and Psychometric Models Appropriate for Constructed Responses
1991-08-01
which involve only one attribute per item. This is especially true when we are dealing with constructed-response items, we have to measure much more...
Different Approaches to Covariate Inclusion in the Mixture Rasch Model
ERIC Educational Resources Information Center
Li, Tongyun; Jiao, Hong; Macready, George B.
2016-01-01
The present study investigates different approaches to adding covariates and the impact in fitting mixture item response theory models. Mixture item response theory models serve as an important methodology for tackling several psychometric issues in test development, including the detection of latent differential item functioning. A Monte Carlo…
Classification Consistency and Accuracy for Complex Assessments Using Item Response Theory
ERIC Educational Resources Information Center
Lee, Won-Chan
2010-01-01
In this article, procedures are described for estimating single-administration classification consistency and accuracy indices for complex assessments using item response theory (IRT). This IRT approach was applied to real test data comprising dichotomous and polytomous items. Several different IRT model combinations were considered. Comparisons…
Nagai, Kaori; Saito, Akiko M; Saito, Toshiki I; Kaneko, Noriyo
2017-12-28
To allow for correct evaluation of clinical trial results, readers require comprehensive, clear, and highly transparent information on the methodology used and the results obtained. This study aimed to evaluate the quality of reporting in articles on randomized controlled trials (RCTs) of antiretroviral therapy (ART) in the field of HIV/AIDS. We searched for original articles on RCTs of ART developed in the field of HIV/AIDS in PubMed database by 5 April 2016. Searched articles were divided into three groups based on the revision year in which the Consolidated Standards of Reporting Trials (CONSORT) guidelines were published: Period 1 (1996-2001); Period 2 (2002-2010); and Period 3 (2011-2016). We evaluated the articles using the reporting rates of the 37 items in the CONSORT 2010 checklist, five items in the protocol deviation, and the three items in the ethics. Fifty-two articles were extracted and included in this study. Many of the reporting rates calculated using the CONSORT 2010 checklist showed a significantly increasing trend over the successive periods (65% in Period 1, 67% in Period 2, 79% in Period 3; p < 0.0001). The items with reporting rates < 50% were "the presence or absence of a protocol change and the reason for such a change," "randomization and blinding," and "where the full trial protocol can be accessed." Reporting rates of deviations were as low as < 30%, while the reporting rates for patient compliance were the highest (>80% in Period 3) among the five items. The reporting rates for obtaining informed consent and approval by the ethics committee or institutional review board were high (>88%), regardless of the time period assessed. In terms of representative RCT articles in the field of HIV/AIDS, the reporting rate of the items defined by CONSORT was approximately 70%, improving over the successive CONSORT statement revision periods.
Robust Estimation of Latent Ability in Item Response Models
ERIC Educational Resources Information Center
Schuster, Christof; Yuan, Ke-Hai
2011-01-01
Because of response disturbances such as guessing, cheating, or carelessness, item response models often can only approximate the "true" individual response probabilities. As a consequence, maximum-likelihood estimates of ability will be biased. Typically, the nature and extent to which response disturbances are present is unknown, and, therefore,…
Theoretical and Empirical Comparisons between Two Models for Continuous Item Responses.
ERIC Educational Resources Information Center
Ferrando, Pere J.
2002-01-01
Analyzed the relations between two continuous response models intended for typical response items: the linear congeneric model and Samejima's continuous response model (CRM). Illustrated the relations described using an empirical example and assessed the relations through a simulation study. (SLD)
Rhodes, Matthew G; Jacoby, Larry L
2007-03-01
The authors examined whether participants can shift their criterion for recognition decisions in response to the probability that an item was previously studied. Participants in 3 experiments were given recognition tests in which the probability that an item was studied was correlated with its location during the test. Results from all 3 experiments indicated that participants' response criteria were sensitive to the probability that an item was previously studied and that shifts in criterion were robust. In addition, awareness of the bases for criterion shifts and feedback on performance were key factors contributing to the observed shifts in decision criteria. These data suggest that decision processes can operate in a dynamic fashion, shifting from item to item.
Smith, Tracey J; Barrett, Ann; Anderson, Danielle; Wilson, Marques A; Young, Andrew J; Montain, Scott J
2015-05-01
Development of n-3 fortified, shelf-stable foods is facilitated by encapsulated docosahexaenoic acid (DHA) and eicosapentaenoic acid (EPA), since natural n-3 food sources cannot withstand high temperature and prolonged shelf life. Organoleptic stability of n-3 fortified, shelf-stable foods has been demonstrated, but chemical changes in the food matrix throughout storage could conceivably impact digestibility of the protein-based encapsulant, thereby compromising n-3 bioavailability. We assessed the effect of prolonged high-temperature storage and variations in food matrix (proteinaceous or carbohydrate) on the time course and magnitude of blood fatty acid changes associated with ingestion of n-3 fortified foods. Low-protein (i.e., cake) and high-protein (i.e., meat sticks) items were supplemented with 600 mg encapsulated DHA+EPA, and frozen either immediately after production (FRESH) or after 6 months storage at 100°F (STORED). Fourteen volunteers consumed one item per week (randomized) for 4 weeks. Blood samples obtained at baseline, 2, 4, and 6 h post-consumption were analyzed for circulating long-chain omega 3 fatty acids (LCn3). There was no difference in LCn3 area under the curve between items. LCn3 in response to cakes peaked at 2-h (FRESH: 54.0 ± 16.8 μg/mL, +18%; STORED: 53.0 ± 13.2 μg/mL, +20%), while meats peaked at 4-h (FRESH: 51.9 ± 12.5 μg/mL, +22%; STORED: 53.2 ± 16.9 μg/mL, +18%). There were no appreciable differences in time course or magnitude of n-3 appearance in response to storage conditions for either food type. Thus, bioavailability of encapsulated DHA/EPA, within low- and high-protein food items, was not affected by high-temperature shelf-storage. A shelf-stable, low- or high-protein food item with encapsulated DHA/EPA is suitable for use in shelf-stable foods.
ERIC Educational Resources Information Center
Rudner, Lawrence
This digest discusses the advantages and disadvantages of using item banks, and it provides useful information for those who are considering implementing an item banking project in their school districts. The primary advantage of item banking is in test development. Using an item response theory method, such as the Rasch model, items from multiple…
Visual search for arbitrary objects in real scenes
Alvarez, George A.; Rosenholtz, Ruth; Kuzmova, Yoana I.; Sherman, Ashley M.
2011-01-01
How efficient is visual search in real scenes? In searches for targets among arrays of randomly placed distractors, efficiency is often indexed by the slope of the reaction time (RT) × Set Size function. However, it may be impossible to define set size for real scenes. As an approximation, we hand-labeled 100 indoor scenes and used the number of labeled regions as a surrogate for set size. In Experiment 1, observers searched for named objects (a chair, bowl, etc.). With set size defined as the number of labeled regions, search was very efficient (~5 ms/item). When we controlled for a possible guessing strategy in Experiment 2, slopes increased somewhat (~15 ms/item), but they were much shallower than search for a random object among other distinctive objects outside of a scene setting (Exp. 3: ~40 ms/item). In Experiments 4–6, observers searched repeatedly through the same scene for different objects. Increased familiarity with scenes had modest effects on RTs, while repetition of target items had large effects (>500 ms). We propose that visual search in scenes is efficient because scene-specific forms of attentional guidance can eliminate most regions from the “functional set size” of items that could possibly be the target. PMID:21671156
Visual search for arbitrary objects in real scenes.
Wolfe, Jeremy M; Alvarez, George A; Rosenholtz, Ruth; Kuzmova, Yoana I; Sherman, Ashley M
2011-08-01
How efficient is visual search in real scenes? In searches for targets among arrays of randomly placed distractors, efficiency is often indexed by the slope of the reaction time (RT) × Set Size function. However, it may be impossible to define set size for real scenes. As an approximation, we hand-labeled 100 indoor scenes and used the number of labeled regions as a surrogate for set size. In Experiment 1, observers searched for named objects (a chair, bowl, etc.). With set size defined as the number of labeled regions, search was very efficient (~5 ms/item). When we controlled for a possible guessing strategy in Experiment 2, slopes increased somewhat (~15 ms/item), but they were much shallower than search for a random object among other distinctive objects outside of a scene setting (Exp. 3: ~40 ms/item). In Experiments 4-6, observers searched repeatedly through the same scene for different objects. Increased familiarity with scenes had modest effects on RTs, while repetition of target items had large effects (>500 ms). We propose that visual search in scenes is efficient because scene-specific forms of attentional guidance can eliminate most regions from the "functional set size" of items that could possibly be the target.
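As a rough illustration of the efficiency measure described in this abstract, the following minimal Python sketch fits a least-squares line to reaction time as a function of set size and reports the slope in ms/item. The function name `search_slope` and the data values are hypothetical and are not taken from the study.

```python
def search_slope(set_sizes, rts_ms):
    """Ordinary least-squares fit of RT (ms) on set size.

    Returns (slope in ms/item, intercept in ms).
    """
    n = len(set_sizes)
    mean_x = sum(set_sizes) / n
    mean_y = sum(rts_ms) / n
    # Sum of squared deviations of set size, and the cross-product with RT.
    sxx = sum((x - mean_x) ** 2 for x in set_sizes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(set_sizes, rts_ms))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: number of labeled regions per scene and mean RT in ms.
labeled_regions = [10, 20, 30, 40]
mean_rts = [650.0, 700.0, 750.0, 800.0]
slope, intercept = search_slope(labeled_regions, mean_rts)
print(f"slope = {slope:.1f} ms/item, intercept = {intercept:.0f} ms")
```

A shallower slope means each additional item adds less time to the search, which is the sense in which the abstract calls search in scenes "efficient".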
Huang, Yueng-Hsiang; Lee, Jin; Chen, Zhuo; Perry, MacKenna; Cheung, Janelle H; Wang, Mo
2017-06-01
Zohar and Luria's (2005) safety climate (SC) scale, measuring organization- and group-level SC each with 16 items, is widely used in research and practice. To improve the utility of the SC scale, we shortened the original full-length SC scales. Item response theory (IRT) analysis was conducted using a sample of 29,179 frontline workers from various industries. Based on graded response models, we shortened the original scales in two ways: (1) selecting items with above-average discriminating ability (i.e., offering more than 6.25% of the original total scale information), resulting in 8-item organization-level and 11-item group-level SC scales; and (2) selecting the most informative items that together retain at least 30% of original scale information, resulting in 4-item organization-level and 4-item group-level SC scales. All four shortened scales had acceptable reliability (≥0.89) and high correlations (≥0.95) with the original scale scores. The shortened scales will be valuable for academic research and practical survey implementation in improving occupational safety. Copyright © 2017 The Author(s). Published by Elsevier Ltd. All rights reserved.
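The 6.25% cutoff in the first selection rule is the average share of a 16-item scale (1/16 = 0.0625). The minimal Python sketch below shows that rule under a simplifying assumption: each item is summarized by a single information value, whereas information under a graded response model varies across the latent trait. The function name, item labels, and numbers are hypothetical.

```python
def above_average_items(item_info):
    """Return items whose share of total scale information exceeds 1/n.

    item_info maps an item label to one summary information value
    (an assumption made for illustration only).
    """
    total = sum(item_info.values())
    threshold = 1.0 / len(item_info)  # 0.0625 for a 16-item scale
    return [name for name, info in item_info.items() if info / total > threshold]


# Hypothetical 16-item scale: the first 8 items are twice as informative.
info = {f"item{i}": (2.0 if i <= 8 else 1.0) for i in range(1, 17)}
print(above_average_items(info))
```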
Unsworth, Nash; Brewer, Gene A; Spillers, Gregory J
2011-09-01
In three experiments search termination decisions were examined as a function of response type (correct vs. incorrect) and confidence. It was found that the time between the last retrieved item and the decision to terminate search (exit latency) was related to the type of response and confidence in the last item retrieved. Participants were willing to search longer when the last retrieved item was a correct item vs. an incorrect item and when the confidence was high in the last retrieved item. It was also found that the number of errors retrieved during the recall period was related to search termination decisions such that the more errors retrieved, the more likely participants were to terminate the search. Finally, it was found that knowledge of overall search set size influenced the time needed to search for items, but did not influence search termination decisions. Copyright © 2011 Elsevier B.V. All rights reserved.
Steca, Patrizia; Monzani, Dario; Greco, Andrea; Chiesi, Francesca; Primi, Caterina
2015-06-01
This study is aimed at testing the measurement properties of the Life Orientation Test-Revised (LOT-R) for the assessment of dispositional optimism by employing item response theory (IRT) analyses. The LOT-R was administered to a large sample of 2,862 Italian adults. First, confirmatory factor analyses demonstrated the theoretical conceptualization of the construct measured by the LOT-R as a single bipolar dimension. Subsequently, IRT analyses for polytomous, ordered response category data were applied to investigate the items' properties. The equivalence of the items across gender and age was assessed by analyzing differential item functioning. Discrimination and severity parameters indicated that all items were able to distinguish people with different levels of optimism and adequately covered the spectrum of the latent trait. Additionally, the LOT-R appears to be gender invariant and, with minor exceptions, age invariant. Results provided evidence that the LOT-R is a reliable and valid measure of dispositional optimism. © The Author(s) 2014.
ERIC Educational Resources Information Center
Ding, Kele; Olds, R. Scott; Thombs, Dennis L.
2009-01-01
This retrospective case study assessed the influence of item non-response error on subsequent response to questionnaire items assessing adolescent alcohol and marijuana use. Post-hoc analyses were conducted on survey results obtained from 4,371 7th to 12th grade students in Ohio in 2005. A skip pattern design in a conventional questionnaire…
ERIC Educational Resources Information Center
Hsieh, Chueh-An; von Eye, Alexander A.; Maier, Kimberly S.
2010-01-01
The application of multidimensional item response theory models to repeated observations has demonstrated great promise in developmental research. It allows researchers to take into consideration both the characteristics of item response and measurement error in longitudinal trajectory analysis, which improves the reliability and validity of the…
Applying mixed methods to pretest the Pressure Ulcer Quality of Life (PU-QOL) instrument.
Gorecki, C; Lamping, D L; Nixon, J; Brown, J M; Cano, S
2012-04-01
Pretesting is key in the development of patient-reported outcome (PRO) instruments. We describe a mixed-methods approach based on interviews and Rasch measurement methods in the pretesting of the Pressure Ulcer Quality of Life (PU-QOL) instrument. We used cognitive interviews to pretest the PU-QOL in 35 patients with pressure ulcers with the view to identifying problematic items, followed by Rasch analysis to examine response options, appropriateness of the item series and biases due to question ordering (item fit). We then compared findings in an interactive and iterative process to identify potential strengths and weaknesses of PU-QOL items, and guide decision-making about further revisions to items and design/layout. Although cognitive interviews largely supported items, they highlighted problems with layout, response options and comprehension. Findings from the Rasch analysis identified problems with response options through reversed thresholds. The use of a mixed-methods approach in pretesting the PU-QOL instrument proved beneficial for identifying problems with scale layout, response options and framing/wording of items. Rasch measurement methods are a useful addition to standard qualitative pretesting for evaluating strengths and weaknesses of early stage PRO instruments.
Triple dissociation of duration perception regulating mechanisms: Top-down attention is inherent.
Lin, Yong-Jun; Shimojo, Shinsuke
2017-01-01
The brain constantly adjusts perceived duration based on the recent event history. One such lab phenomenon is subjective time expansion induced in an oddball paradigm ("oddball chronostasis"), where the duration of a distinct item (oddball) appears subjectively longer when embedded in a series of other repeated items (standards). Three hypotheses have been separately proposed, but it remains unresolved which of them, or whether all of them, are true: 1) attention prolongs oddball duration, 2) repetition suppression reduces standards duration, and 3) accumulative temporal preparation (anticipation) expedites the perceived item onset so as to lengthen its duration. We thus conducted critical systematic experiments to dissociate the relative contribution of all hypotheses, by orthogonally manipulating sequence types (repeated, ordered, or random) and target serial positions. Participants' task was to judge whether a target lasts shorter or longer than its reference. The main finding was that a random item sequence still elicited significant chronostasis even though each item was odd. That is, simply being a target draws top-down attention and induces chronostasis. In Experiments 1 (digits) and 2 (orientations), top-down attention explained about half of the effect while saliency/adaptation explained the other half. Additionally, for non-repeated (ordered and random) sequence types, a target with later serial position still elicited stronger chronostasis, favoring a temporal preparation over a repetition suppression account. By contrast, in Experiment 3 (colors), top-down attention was likely the sole factor. Consequently, top-down attention is necessary and sometimes sufficient to explain oddball chronostasis; saliency/adaptation and temporal preparation are contingent factors. These critical boundary conditions revealed in our study serve as quantitative constraints for neural models of duration perception.
Vegada, Bhavisha; Shukla, Apexa; Khilnani, Ajeetkumar; Charan, Jaykaran; Desai, Chetna
2016-01-01
Most academic teachers use four or five options per item in multiple choice question (MCQ) tests for formative and summative assessment. The optimal number of options per MCQ item is a matter of considerable debate among academic teachers in various educational fields. There is a scarcity of published literature regarding the optimum number of options per MCQ item in the field of medical education. To compare three-option, four-option, and five-option MCQ tests on the quality parameters of reliability, validity, item analysis, distracter analysis, and time analysis. Participants were 3rd semester M.B.B.S. students. Students were divided randomly into three groups. Each group was randomly given one MCQ test set with three, four, or five options. Following the marking of the multiple choice tests, the participants' option selections were analyzed, and the three option formats were compared on mean marks, mean time, validity, reliability, facility value, discrimination index, point biserial value, and distracter analysis. Students scored more (P = 0.000) and took less time (P = 0.009) to complete the three-option test than the four-option and five-option groups. Facility value was higher (P = 0.004) in the three-option group than in the four- and five-option groups. There was no significant difference among the three groups in validity, reliability, or item discrimination. Nonfunctioning distracters were more frequent in the four- and five-option groups than in the three-option group. Assessment based on three-option MCQs can be preferred over four-option and five-option MCQs.
HIV/AIDS knowledge among men who have sex with men: applying the item response theory.
Gomes, Raquel Regina de Freitas Magalhães; Batista, José Rodrigues; Ceccato, Maria das Graças Braga; Kerr, Lígia Regina Franco Sansigolo; Guimarães, Mark Drew Crosland
2014-04-01
To evaluate the level of HIV/AIDS knowledge among men who have sex with men in Brazil using the latent trait model estimated by Item Response Theory. Multicenter, cross-sectional study, carried out in ten Brazilian cities between 2008 and 2009. Adult men who have sex with men were recruited (n = 3,746) through Respondent Driven Sampling. HIV/AIDS knowledge was ascertained through ten statements by face-to-face interview and latent scores were obtained through two-parameter logistic modeling (difficulty and discrimination) using Item Response Theory. Differential item functioning was used to examine each item characteristic curve by age and schooling. Overall, the HIV/AIDS knowledge scores using Item Response Theory did not exceed 6.0 (scale 0-10), with mean and median values of 5.0 (SD = 0.9) and 5.3, respectively, with 40.7% of the sample with knowledge levels below the average. Some beliefs still exist in this population regarding the transmission of the virus by insect bites, by using public restrooms, and by sharing utensils during meals. With regard to the difficulty and discrimination parameters, eight items were located below the mean of the scale and were considered very easy, and four items presented very low discrimination parameter (< 0.34). The absence of difficult items contributed to the inaccuracy of the measurement of knowledge among those with median level and above. Item Response Theory analysis, which focuses on the individual properties of each item, allows measures to be obtained that do not vary or depend on the questionnaire, which provides better ascertainment and accuracy of knowledge scores. Valid and reliable scales are essential for monitoring HIV/AIDS knowledge among the men who have sex with men population over time and in different geographic regions, and this psychometric model brings this advantage.
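For readers unfamiliar with the two-parameter logistic model named in this abstract, the following minimal Python sketch gives its standard textbook form: the probability of endorsing an item depends on the respondent's latent score (theta), the item's discrimination (a), and its difficulty (b). The function names and parameter values are illustrative assumptions and do not reproduce the study's estimates.

```python
import math


def icc_2pl(theta, a, b):
    """Item characteristic curve: P(correct | theta) under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information of a 2PL item at latent score theta."""
    p = icc_2pl(theta, a, b)
    return a * a * p * (1.0 - p)


# Illustrative values only: an easy item (b = -2) tells us little about
# respondents above average (theta = 1) compared with those near b.
print(item_information(1.0, 1.0, -2.0))   # information far above the item's difficulty
print(item_information(-2.0, 1.0, -2.0))  # information at the item's difficulty
```

Under this form an item is most informative near its own difficulty, which is consistent with the abstract's remark that the absence of difficult items reduced measurement accuracy for respondents at the median level and above.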
Calibrating Item Families and Summarizing the Results Using Family Expected Response Functions
ERIC Educational Resources Information Center
Sinharay, Sandip; Johnson, Matthew S.; Williamson, David M.
2003-01-01
Item families, which are groups of related items, are becoming increasingly popular in complex educational assessments. For example, in automatic item generation (AIG) systems, a test may consist of multiple items generated from each of a number of item models. Item calibration or scoring for such an assessment requires fitting models that can…
Brouwers, Melissa C.; Kho, Michelle E.; Browman, George P.; Burgers, Jako S.; Cluzeau, Françoise; Feder, Gene; Fervers, Béatrice; Graham, Ian D.; Hanna, Steven E.; Makarski, Julie
2010-01-01
Background: We established a program of research to improve the development, reporting and evaluation of practice guidelines. We assessed the construct validity of the items and user’s manual in the β version of the AGREE II. Methods: We designed guideline excerpts reflecting high- and low-quality guideline content for 21 of the 23 items in the tool. We designed two study packages so that one low-quality and one high-quality version of each item were randomly assigned to each package. We randomly assigned 30 participants to one of the two packages. Participants reviewed and rated the guideline content according to the instructions of the user’s manual and completed a survey assessing the manual. Results: In all cases, content designed to be of high quality was rated higher than low-quality content; in 18 of 21 cases, the differences were significant (p < 0.05). The manual was rated by participants as appropriate, easy to use, and helpful in differentiating guidelines of varying quality, with all scores above the mid-point of the seven-point scale. Considerable feedback was offered on how the items and manual of the β-AGREE II could be improved. Interpretation: The validity of the items was established and the user’s manual was rated as highly useful by users. We used these results and those of our study presented in part 1 to modify the items and user’s manual. We recommend AGREE II (available at www.agreetrust.org) as the revised standard for guideline development, reporting and evaluation. PMID:20513779
Dellinges, Mark A; Curtis, Donald A
2017-08-01
Faculty members are expected to write high-quality multiple-choice questions (MCQs) in order to accurately assess dental students' achievement. However, most dental school faculty members are not trained to write MCQs. Extensive faculty development programs have been used to help educators write better test items. The aim of this pilot study was to determine if a short workshop would result in improved MCQ item-writing by dental school faculty at one U.S. dental school. A total of 24 dental school faculty members who had previously written MCQs were randomized into a no-intervention group and an intervention group in 2015. Six previously written MCQs were randomly selected from each of the faculty members and given an item quality score. The intervention group participated in a training session of one-hour duration that focused on reviewing standard item-writing guidelines to improve in-house MCQs. The no-intervention group did not receive any training but did receive encouragement and an explanation of why good MCQ writing was important. The faculty members were then asked to revise their previously written questions, and these were given an item quality score. The item quality scores for each faculty member were averaged, and the difference from pre-training to post-training scores was evaluated. The results showed a significant difference between pre-training and post-training MCQ difference scores for the intervention group (p=0.04). This pilot study provides evidence that the training session of short duration was effective in improving the quality of in-house MCQs.
Halimic, Aida; Gage, Heather; Raats, Monique; Williams, Peter
2018-04-01
To explore the impact of price manipulation and healthy eating information on intended food choices. Health information was provided to a random half of subjects (vs. information on Saudi agriculture). Each subject chose from the same lunch menu, containing two healthy and two unhealthy entrees, desserts and beverages, on five occasions. Reference case prices were 5, 3 and 2 Saudi Arabian Riyals (SARs). Prices of healthy and unhealthy items were manipulated up (taxed) and down (subsidized) by 1 SAR in four menu variations (random order); subjects were given a budget enabling full choice within any menu. The number of healthy food choices was compared across price combinations and between information groups. Linear regression modelling explored the effect of relative prices of healthy/unhealthy options and information on the number of healthy choices, controlling for dietary behaviours and hunger levels. University campus, Saudi Arabia, 2013. 99 women students. In the reference case, 49.5% of choices were for healthy items. When the price of healthy items was reduced, 58.5% of selections were healthy; 57.2% when the price of unhealthy items rose. In regression modelling, reducing the price of healthy items and increasing the price of unhealthy items increased the number of healthy choices by 5% and 6% respectively. Students reporting a less healthy usual diet selected significantly fewer healthy items. Providing healthy eating information was not a significant influence. Price manipulation offers potential for altering behaviours to combat rising youth obesity in Saudi Arabia. Copyright © 2018 Elsevier Ltd. All rights reserved.
Bitran, Stella; Farabaugh, Amy H; Ameral, Victoria E; LaRocca, Rachel A; Clain, Alisabet J; Fava, Maurizio; Mischoulon, David
2011-07-01
To assess whether early changes in Hamilton Depression Rating Scale-17 anxiety/somatization items predict remission in two controlled studies of Hypericum perforatum (St John's wort) versus selective serotonin reuptake inhibitors for major depressive disorder. The Hypericum Depression Trial Study Group (National Institute of Mental Health) randomized 340 patients to Hypericum, sertraline, or placebo for 8 weeks, whereas the Massachusetts General Hospital study randomized 135 patients to Hypericum, fluoxetine, or placebo for 12 weeks. The investigators examined whether remission was associated with early changes in anxiety/somatization symptoms. In the National Institute of Mental Health study, significant associations were observed between remission and early improvement in the anxiety (psychic) item (sertraline arm), somatic (gastrointestinal item; Hypericum arm), and somatic (general) symptoms (placebo arm). None of the three treatment arms of the Massachusetts General Hospital study showed significant associations between anxiety/somatization symptoms and remission. When both study samples were pooled, we found associations for anxiety (psychic; selective serotonin reuptake inhibitors arm), somatic (gastrointestinal), and hypochondriasis (Hypericum arm), and anxiety (psychic) and somatic (general) symptoms (placebo arm). In the entire sample, remission was associated with the improvement in the anxiety (psychic), somatic (gastrointestinal), and somatic (general) items. The number and the type of anxiety/somatization items associated with remission varied depending on the intervention. Early scrutiny of the Hamilton Depression Rating Scale-17 anxiety/somatization items may help to predict remission of major depressive disorder.
Honey, Garry D; O'loughlin, Chris; Turner, Danielle C; Pomarol-Clotet, Edith; Corlett, Philip R; Fletcher, Paul C
2006-02-01
Ketamine is increasingly used to model the cognitive deficits and symptoms of schizophrenia. We investigated the extent to which ketamine administration in healthy volunteers reproduces the deficits in episodic recognition memory and agency source monitoring reported in schizophrenia. Intravenous infusions of placebo or 100 ng/ml ketamine were administered to 12 healthy volunteers in a double-blind, placebo-controlled, randomized, within-subjects study. In response to presented words, the subject or experimenter performed a deep or shallow encoding task, providing a 2(drug) x 2(depth of processing) x 2(agency) factorial design. At test, subjects discriminated old/new words, and recalled the sources (task and agent). Data were analyzed using multinomial modelling to identify item recognition, source memory for agency and task, and guessing biases. Under ketamine, item recognition and cued recall of deeply encoded items were impaired, replicating previous findings. In contrast to schizophrenia, there was a reduced tendency to externalize agency source guessing biases under ketamine. While the recognition memory deficit observed with ketamine is consistent with previous work and with schizophrenia, the changes in source memory differ from those reported in schizophrenic patients. This difference may account for the pattern of psychopathology induced by ketamine.
NASA Astrophysics Data System (ADS)
Armstrong-Hall, Judy Gail
The purpose of this study was to apply the Hunter-Gatherer Theory of sex spatial skills to responses to individual questions by eighth grade students on the Science component of the Michigan Educational Assessment Program (MEAP) to determine if sex bias was inherent in the test. The Hunter-Gatherer Theory on Spatial Sex Differences, an original theory, suggests a spatial dimorphism in which the female spatial skill is pattern recall of unconnected items and male spatial skills require mental movement. This is the first attempt to apply the Hunter-Gatherer Theory on Spatial Sex Differences to a standardized test. An overall hypothesis suggested that the Hunter-Gatherer Theory of Spatial Sex Differences could predict that males would perform better on problems involving mental movement and females would do better on problems involving the pattern recall of unconnected items. Responses to questions on the 1994-95 MEAP requiring the use of male spatial skills and female spatial skills were analyzed for 5,155 eighth grade students. A panel composed of five educators and a theory developer determined which test items involved the use of male and female spatial skills. A MANOVA, using a random sample of 20% of the 5,155 students to compare male and female correct scores, was statistically significant, with males having higher scores on male spatial skills items and females having higher scores on female spatial skills items. Pearson product moment correlation analyses produced a positive correlation for both male and female performance on both types of spatial skills. The Hunter-Gatherer Theory of Spatial Sex Differences appears to be able to predict that males could perform better on the problems involving mental movement and females could perform better on problems involving the pattern recall of unconnected items.
Recommendations for further research included: examination of male/female spatial skill differences at early elementary and high school levels to determine impact of gender on difficulties in solving spatial problems; investigation of the relationship between dominant female spatial skills for students diagnosed with ADHD; and study of the effects of teaching male spatial skills to female students starting in early elementary school to determine the effect on standardized testing.
Design and validation of a questionnaire to assess organizational culture in French hospital wards.
Saillour-Glénisson, F; Domecq, S; Kret, M; Sibe, M; Dumond, J P; Michel, P
2016-09-17
Although many organizational culture questionnaires have been developed, there is a lack of any validated multidimensional questionnaire assessing organizational culture at hospital ward level and adapted to the health care context. Facing the lack of an appropriate tool, a multidisciplinary team designed and validated a dimensional organizational culture questionnaire for healthcare settings to be administered at ward level. A database of organizational culture items and themes was created after extensive literature review. Items were regrouped into dimensions and subdimensions (classification validated by experts). Pre-test and face validation were conducted with 15 health care professionals. In a stratified cluster random sample of hospitals, the psychometric validation was conducted in three phases on a sample of 859 healthcare professionals from 36 multidisciplinary medicine services: 1) the exploratory phase included a description of responses' saturation levels, factor and correlation analyses and an internal consistency analysis (Cronbach's alpha coefficient); 2) the confirmatory phase used structural equation modeling (SEM); 3) reproducibility was studied by a test-retest. The overall response rate was 80 %; the completion average was 97 %. The metrological results were: a global Cronbach's alpha coefficient of 0.93, with alpha higher than 0.70 for 12 sub-dimensions; all Dillon-Goldstein's rho coefficients higher than 0.70; an excellent quality of external model with a Goodness of Fit (GoF) criterion of 0.99. Seventy percent of the items had a reproducibility ranging from moderate (intraclass correlation coefficient [ICC] between 50 and 70 % for 25 items) to good (ICC higher than 70 % for 33 items). The COMEt (Contexte Organisationnel et Managérial en Etablissement de Santé) questionnaire is a validated multidimensional organizational culture questionnaire made of 6 dimensions, 21 sub-dimensions and 83 items.
It is the first dimensional organizational culture questionnaire, specific to healthcare context, for a unit level assessment showing robust psychometric properties (validity and reliability). This tool is suited for research purposes, especially for assessing organizational context in research analysing the effectiveness of hospital quality improvement strategies. Our tool is also suited for an overall assessment of ward culture and could be a powerful trigger to improve management and clinical performance. Its psychometric properties in other health systems need to be tested.
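The internal consistency statistic reported above, Cronbach's alpha, can be illustrated with a minimal sketch. This is not the authors' code: the function name `cronbach_alpha` and the data layout (one list of item scores per respondent) are assumptions made for illustration, and the standard textbook formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores) is used.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items score table.

    `scores` is a list of respondents, each a list of k item scores
    (hypothetical layout chosen for this sketch).
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```

The sketch assumes complete data and a non-zero variance of total scores; real questionnaire data would usually need handling of missing responses before this step.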
Jafari, Peyman; Bagheri, Zahra; Ayatollahi, Seyyed Mohamad Taghi; Soltani, Zahra
2012-03-13
Item response theory (IRT) is extensively used to develop adaptive instruments of health-related quality of life (HRQoL). However, each IRT model has its own function to estimate item and category parameters, and hence different results may be found using the same response categories with different IRT models. The present study used the Rasch rating scale model (RSM) to examine and reassess the psychometric properties of the Persian version of the PedsQL™ 4.0 Generic Core Scales. The PedsQL™ 4.0 Generic Core Scales was completed by 938 Iranian school children and their parents. Convergent, discriminant and construct validity of the instrument were assessed by classical test theory (CTT). The RSM was applied to investigate person and item reliability, item statistics and ordering of response categories. The CTT method showed that the scaling success rates for convergent and discriminant validity were 100% in all domains with the exception of physical health in the child self-report. Moreover, confirmatory factor analysis supported a four-factor model similar to its original version. The RSM showed that 22 out of 23 items had acceptable infit and outfit statistics (<1.4, >0.6), person reliabilities were low, item reliabilities were high, and item difficulty ranged from -1.01 to 0.71 and -0.68 to 0.43 for child self-report and parent proxy-report, respectively. The RSM also showed that successive response categories for all items were not located in the expected order. This study revealed that, in all domains, the five response categories did not perform adequately. It is not known whether this problem is a function of the meaning of the response choices in the Persian language or an artifact of a mostly healthy population that did not use the full range of the response categories. The response categories should be evaluated in further validation studies, especially in large samples of chronically ill patients.
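To make the role of category thresholds in the rating scale model concrete, the following is a minimal sketch of the standard Andrich formulation, in which the probability of category x is proportional to exp of the sum over the first x thresholds of (theta - difficulty - tau). The function names and the threshold values used in any call are illustrative assumptions, not estimates from the study.

```python
import math


def rsm_category_probs(theta, difficulty, thresholds):
    """Category probabilities under the Andrich rating scale model.

    theta: person measure; difficulty: item location;
    thresholds: shared category thresholds tau_1..tau_m (m + 1 categories).
    """
    # Cumulative logits; category 0 has an empty sum, i.e. 0.
    logits = [0.0]
    for tau in thresholds:
        logits.append(logits[-1] + (theta - difficulty - tau))
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(logits)
    weights = [math.exp(v - peak) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


def thresholds_ordered(thresholds):
    """True when successive thresholds increase strictly."""
    return all(a < b for a, b in zip(thresholds, thresholds[1:]))
```

With five response categories there are four thresholds; a call such as `thresholds_ordered([-1.0, 0.4, 0.1, 1.2])` returns `False`, which corresponds to the kind of category disordering the abstract reports.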
NASA Astrophysics Data System (ADS)
Wang, Lei; Xiong, Chuang; Wang, Xiaojun; Li, Yunlong; Xu, Menghui
2018-04-01
Considering that multi-source uncertainties from inherent nature as well as the external environment are unavoidable and severely affect controller performance, dynamic safety assessment with high confidence is of great significance for scientists and engineers. In view of this, the uncertainty quantification analysis and time-variant reliability estimation corresponding to the closed-loop control problems are conducted in this study under a mixture of random, interval, and convex uncertainties. By combining the state-space transformation and the natural set expansion, the boundary laws of controlled response histories are first confirmed with specific implementation of random items. For nonlinear cases, the collocation set methodology and the fourth-order Runge-Kutta algorithm are introduced as well. Inspired by the first-passage model in random process theory as well as by the static probabilistic reliability ideas, a new definition of the hybrid time-variant reliability measurement is provided for the vibration control systems and the related solution details are further expounded. Two engineering examples are eventually presented to demonstrate the validity and applicability of the methodology developed.
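As a reminder of what the fourth-order Runge-Kutta step looks like for a state-space system, here is a minimal sketch of the classical scheme. It is not the authors' implementation: `rk4_step`, `integrate`, and the example system `pd_controlled_oscillator` (including its stiffness and feedback gains) are hypothetical names and values chosen only for illustration.

```python
def rk4_step(f, t, y, h):
    """One classical fourth-order Runge-Kutta step for y' = f(t, y)."""
    k1 = f(t, y)
    k2 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k1)])
    k3 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k2)])
    k4 = f(t + h, [yi + h * ki for yi, ki in zip(y, k3)])
    return [
        yi + h / 6 * (a + 2 * b + 2 * c + d)
        for yi, a, b, c, d in zip(y, k1, k2, k3, k4)
    ]


def integrate(f, y0, t0, t1, steps):
    """Advance y' = f(t, y) from t0 to t1 with a fixed number of RK4 steps."""
    h = (t1 - t0) / steps
    t, y = t0, list(y0)
    for _ in range(steps):
        y = rk4_step(f, t, y, h)
        t += h
    return y


def pd_controlled_oscillator(t, state):
    """Hypothetical closed-loop system: oscillator with PD feedback."""
    x, v = state
    omega, kp, kd = 2.0, 1.0, 0.5  # assumed stiffness and feedback gains
    u = -kp * x - kd * v           # assumed control law
    return [v, -omega ** 2 * x + u]
```

A call such as `integrate(pd_controlled_oscillator, [1.0, 0.0], 0.0, 5.0, 500)` returns the displacement and velocity at the end of the interval. Repeating the integration over sampled or bounded parameter values is one way response bounds could be explored, although the abstract does not spell out the procedure actually used.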
Converging evidence for control of color-word Stroop interference at the item level.
Bugg, Julie M; Hutchison, Keith A
2013-04-01
Prior studies have shown that cognitive control is implemented at the list and context levels in the color-word Stroop task. At first blush, the finding that Stroop interference is reduced for mostly incongruent items as compared with mostly congruent items (i.e., the item-specific proportion congruence [ISPC] effect) appears to provide evidence for yet a third level of control, which modulates word reading at the item level. However, evidence to date favors the view that ISPC effects reflect the rapid prediction of high-contingency responses and not item-specific control. In Experiment 1, we first show that an ISPC effect is obtained when the relevant dimension (i.e., color) signals proportion congruency, a problematic pattern for theories based on differential response contingencies. In Experiment 2, we replicate and extend this pattern by showing that item-specific control settings transfer to new stimuli, ruling out alternative frequency-based accounts. In Experiment 3, we revert to the traditional design in which the irrelevant dimension (i.e., word) signals proportion congruency. Evidence for item-specific control, including transfer of the ISPC effect to new stimuli, is apparent when 4-item sets are employed but not when 2-item sets are employed. We attribute this pattern to the absence of high-contingency responses on incongruent trials in the 4-item set. These novel findings provide converging evidence for reactive control of color-word Stroop interference at the item level, reveal theoretically important factors that modulate reliance on item-specific control versus contingency learning, and suggest an update to the item-specific control account (Bugg, Jacoby, & Chanani, 2011).
A semi-parametric within-subject mixture approach to the analyses of responses and response times.
Molenaar, Dylan; Bolsinova, Maria; Vermunt, Jeroen K
2018-05-01
In item response theory, modelling the item response times in addition to the item responses may improve the detection of possible between- and within-subject differences in the process that resulted in the responses. For instance, if respondents rely on rapid guessing on some items but not on all, the joint distribution of the responses and response times will be a multivariate within-subject mixture distribution. Suitable parametric methods to detect these within-subject differences have been proposed. In these approaches, a distribution needs to be assumed for the within-class response times. In this paper, it is demonstrated that these parametric within-subject approaches may produce false positives and biased parameter estimates if the assumption concerning the response time distribution is violated. A semi-parametric approach is proposed which resorts to categorized response times. This approach is shown to hardly produce false positives and parameter bias. In addition, the semi-parametric approach results in approximately the same power as the parametric approach. © 2017 The British Psychological Society.
Differential item functioning magnitude and impact measures from item response theory models.
Kleinman, Marjorie; Teresi, Jeanne A
2016-01-01
Measures of the magnitude and impact of differential item functioning (DIF), at the item and scale level respectively, are presented and reviewed in this paper. Most measures are based on item response theory models. Magnitude refers to item level effect sizes, whereas impact refers to differences between groups at the scale score level. Reviewed are magnitude measures based on group differences in the expected item scores and impact measures based on differences in the expected scale scores. The similarities among these indices are demonstrated. Various software packages that provide magnitude and impact measures are described, and new software is presented that computes all of the available statistics conveniently in one program, with explanations of their relationships to one another.
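The idea of a magnitude measure built on group differences in expected item scores can be sketched for a binary item under a two-parameter logistic model. This is an illustrative assumption, not one of the specific indices reviewed in the paper: the function names, the choice of an unsigned average, and the parameter values in any call are hypothetical.

```python
import math


def p_2pl(theta, a, b):
    """Expected score on a binary item under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def mean_expected_score_gap(thetas, ref_params, focal_params):
    """Average unsigned gap in expected item scores between two groups.

    thetas: trait values at which to evaluate the gap (for example,
    estimates for focal-group members); ref_params and focal_params are
    (a, b) pairs estimated separately in each group.
    """
    gaps = [
        abs(p_2pl(t, *ref_params) - p_2pl(t, *focal_params)) for t in thetas
    ]
    return sum(gaps) / len(gaps)
```

Summing the signed gaps over all items of a scale at each trait value would give a difference in expected scale scores, which is the scale-level quantity the abstract calls impact.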
Koehler, K M; Cunningham-Sabo, L; Lambert, L C; McCalman, R; Skipper, B J; Davis, S M
2000-02-01
Brief dietary assessment instruments are needed to evaluate behavior changes of participants in dietary intervention programs. The purpose of this project was to design and validate an instrument for children participating in Pathways to Health, a culturally appropriate, cancer prevention curriculum. The study validated a brief food selection instrument, Yesterday's Food Choices (YFC), which contained 33 questions about foods eaten the previous day with response choices of yes, no, or not sure. Reference data for validation were 24-hour dietary recalls administered individually to 120 students selected randomly. The YFC and 24-hour dietary recalls were administered to American Indian children in fifth- and seventh-grade classes in the Southwest United States. Dietary recalls were coded for food items in the YFC and results were compared for each item using percentage agreement and the kappa statistic. Percentage agreement for all items was greater than 60%; for most items it was greater than 70%, and for several items it was greater than 80%. The amount of agreement beyond that explained by chance (kappa statistic) was generally small. Three items showed substantial agreement beyond chance (kappa ≥ 0.6); 2 items showed moderate agreement (kappa = 0.40 to 0.59); and most items showed fair agreement (kappa = 0.20 to 0.39). The food items showing substantial agreement were hot or cold cereal, low-fat milk, and mutton or chile stew. Fried or scrambled eggs and deep-fried foods showed moderate agreement beyond chance. Previous development and validation of brief food selection instruments for children participating in health promotion programs has had limited success. In this study, instrument-related factors that apparently contributed to poor agreement between data from the YFC and 24-hour dietary recall were inclusion of categories of foods vs specific foods; food knowledge, preparation, and vocabulary; item length; and overreporting of attractive foods.
Collecting and scoring the 24-hour recall data may also have contributed to poor agreement. Further development of brief instruments for evaluating changes in children's behavior in dietary programs is necessary. Factors related to the YFC that need further development may be issues that are also important in the development of effective, brief dietary assessments for children as individual clients or patients.
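The two agreement statistics used in this validation, percentage agreement and Cohen's kappa, can be computed per item as in the minimal sketch below. The function name and the list-of-labels input format are assumptions for illustration; kappa is computed with the standard definition (observed minus chance agreement, divided by one minus chance agreement).

```python
def percent_agreement_and_kappa(ratings_a, ratings_b):
    """Observed agreement and Cohen's kappa for two paired label lists.

    ratings_a, ratings_b: equal-length lists of categorical labels, for
    example YFC answers and coded 24-hour recall results for one food item.
    """
    n = len(ratings_a)
    observed = sum(x == y for x, y in zip(ratings_a, ratings_b)) / n
    # Chance agreement from the marginal label frequencies of each source.
    labels = set(ratings_a) | set(ratings_b)
    expected = sum(
        (ratings_a.count(label) / n) * (ratings_b.count(label) / n)
        for label in labels
    )
    if expected == 1.0:
        # Both sources used one identical label throughout; kappa undefined.
        return observed, float("nan")
    return observed, (observed - expected) / (1 - expected)
```

When one answer dominates (for instance, a food that few children ate), chance agreement is already high, so an item can show percentage agreement of 80% while kappa stays near zero. That is consistent with the pattern reported above of high percentage agreement alongside generally small kappa values.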