Kawasaki, Yohei; Ide, Kazuki; Akutagawa, Maiko; Yamada, Hiroshi; Furukawa, Toshiaki A.; Ono, Yutaka
2016-01-01
Background Several studies have shown that total depressive symptom scores in the general population approximate an exponential pattern, except for the lower end of the distribution. The Center for Epidemiologic Studies Depression Scale (CES-D) consists of 20 items, each of which may take on four scores: “rarely,” “some,” “occasionally,” and “most of the time.” Recently, we reported that the item responses for 16 negative affect items commonly exhibit exponential patterns, except for the level of “rarely,” leading us to hypothesize that the item responses at the level of “rarely” may be related to the non-exponential pattern typical of the lower end of the distribution. To verify this hypothesis, we investigated how the item responses contribute to the distribution of the sum of the item scores. Methods Data collected from 21,040 subjects who had completed the CES-D questionnaire as part of a Japanese national survey were analyzed. To assess the item responses of negative affect items, we used a parameter r, which denotes the ratio of “rarely” to “some” in each item response. The distributions of the sum of negative affect items in various combinations were analyzed using log-normal scales and curve fitting. Results The sum of the item scores approximated an exponential pattern regardless of the combination of items, whereas, at the lower end of the distributions, there was a clear divergence between the actual data and the predicted exponential pattern. At the lower end of the distributions, the sum of the item scores with high values of r exhibited higher scores compared to those predicted from the exponential pattern, whereas the sum of the item scores with low values of r exhibited lower scores compared to those predicted. Conclusions The distributional pattern of the sum of the item scores could be predicted from the item responses of such items. PMID:27806132
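To make the parameter r concrete, here is a minimal Python sketch of the ratio of “rarely” to “some” answers for one item, plus the sum of scores over a chosen combination of items. It assumes each CES-D item is coded 0 to 3 in the order “rarely,” “some,” “occasionally,” “most of the time”; that coding and the function names are hypothetical, not taken from the study.

```python
from collections import Counter

def rarely_to_some_ratio(responses):
    """Return r = (number of 'rarely' answers) / (number of 'some' answers).

    `responses` holds one item's scores across respondents, coded
    0 = 'rarely', 1 = 'some', 2 = 'occasionally', 3 = 'most of the time'
    (an assumed coding).
    """
    counts = Counter(responses)
    if counts[1] == 0:
        raise ValueError("no 'some' responses, so r is undefined")
    return counts[0] / counts[1]

def item_sum(item_scores, items):
    """Sum of one respondent's scores over a chosen combination of items."""
    return sum(item_scores[i] for i in items)

# Hypothetical item: 60 'rarely', 30 'some', 8 'occasionally', 2 'most'
example = [0] * 60 + [1] * 30 + [2] * 8 + [3] * 2
print(rarely_to_some_ratio(example))  # 2.0
```

A high r means that most of the item's non-zero mass sits at “rarely,” which is the case the abstract links to totals above the exponential prediction at the lower end.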
Kawasaki, Yohei; Ide, Kazuki; Akutagawa, Maiko; Yamada, Hiroshi; Ono, Yutaka; Furukawa, Toshiaki A.
2017-01-01
Background Several recent studies have shown that total scores on depressive symptom measures in a general population approximate an exponential pattern except for the lower end of the distribution. Furthermore, we confirmed that the exponential pattern is present for the individual item responses on the Center for Epidemiologic Studies Depression Scale (CES-D). To confirm the reproducibility of such findings, we investigated the total score distribution and item responses of the Kessler Screening Scale for Psychological Distress (K6) in a nationally representative study. Methods Data were drawn from the National Survey of Midlife Development in the United States (MIDUS), which comprises four subsamples: (1) a national random digit dialing (RDD) sample, (2) oversamples from five metropolitan areas, (3) siblings of individuals from the RDD sample, and (4) a national RDD sample of twin pairs. K6 items are scored using a 5-point scale: “none of the time,” “a little of the time,” “some of the time,” “most of the time,” and “all of the time.” The patterns of the total score distribution and item responses were analyzed using graphical analysis and an exponential regression model. Results The total score distributions of the four subsamples exhibited an exponential pattern with similar rate parameters. The item responses of the K6 approximated a linear pattern from “a little of the time” to “all of the time” on log-normal scales, while the “none of the time” response was not related to this exponential pattern. Discussion The total score distribution and item responses of the K6 showed exponential patterns, consistent with other depressive symptom scales. PMID:28289560
Tomitaka, Shinichiro; Kawasaki, Yohei; Ide, Kazuki; Yamada, Hiroshi; Miyake, Hirotsugu; Furukawa, Toshiaki A
2016-01-01
In a previous study, we reported that the distribution of total depressive symptoms scores according to the Center for Epidemiologic Studies Depression Scale (CES-D) in a general population is stable throughout middle adulthood and follows an exponential pattern except at the lowest end of the symptom score range. Furthermore, the individual distributions of 16 negative symptom items of the CES-D exhibit a common mathematical pattern. To confirm the reproducibility of these findings, we investigated the distribution of total depressive symptoms scores and 16 negative symptom items in a sample of Japanese employees. We analyzed 7624 employees aged 20-59 years who had participated in the Northern Japan Occupational Health Promotion Centers Collaboration Study for Mental Health. Depressive symptoms were assessed using the CES-D. The CES-D contains 20 items, each of which is scored in four grades: "rarely," "some," "much," and "most of the time." The descriptive statistics and frequency curves of the distributions were then compared according to age group. The distribution of total depressive symptoms scores appeared to be stable from 30-59 years. The right tail of the distribution for ages 30-59 years exhibited a linear pattern with a log-normal scale. The distributions of the 16 individual negative symptom items of the CES-D exhibited a common mathematical pattern which displayed different distributions with a boundary at "some." The distributions of the 16 negative symptom items from "some" to "most" followed a linear pattern with a log-normal scale. The distributions of the total depressive symptoms scores and individual negative symptom items in a Japanese occupational setting show the same patterns as those observed in a general population. These results show that the specific mathematical patterns of the distributions of total depressive symptoms scores and individual negative symptom items can be reproduced in an occupational population.
Tomitaka, Shinichiro; Kawasaki, Yohei; Ide, Kazuki; Akutagawa, Maiko; Yamada, Hiroshi; Furukawa, Toshiaki A; Ono, Yutaka
2016-01-01
Previously, we proposed a model for ordinal scale scoring in which individual thresholds for each item constitute a distribution for each item. This led us to hypothesize that the boundary curves of each depressive symptom score in the distribution of total depressive symptom scores follow a common mathematical model, which is expressed as the product of the frequency of the total depressive symptom scores and the probability of the cumulative distribution function of each item threshold. To verify this hypothesis, we investigated the boundary curves of the distribution of total depressive symptom scores in a general population. Data collected from 21,040 subjects who had completed the Center for Epidemiologic Studies Depression Scale (CES-D) questionnaire as part of a national Japanese survey were analyzed. The CES-D consists of 20 items (16 negative items and four positive items). The boundary curves of adjacent item scores in the distribution of total depressive symptom scores for the 16 negative items were analyzed using log-normal scales and curve fitting. The boundary curves of adjacent item scores for a given symptom approximated a common linear pattern on a log-normal scale. Curve fitting showed that an exponential fit had a markedly higher coefficient of determination than either linear or quadratic fits. With negative affect items, the gap between the total score curve and boundary curve continuously increased with increasing total depressive symptom scores on a log-normal scale, whereas the boundary curves of positive affect items, which are not considered manifest variables of the latent trait, did not exhibit such increases in this gap. The results of the present study support the hypothesis that the boundary curves of each depressive symptom score in the distribution of total depressive symptom scores commonly follow the predicted mathematical model, which was verified to approximate an exponential mathematical pattern.
Kawasaki, Yohei; Akutagawa, Maiko; Yamada, Hiroshi; Furukawa, Toshiaki A.; Ono, Yutaka
2016-01-01
Background Previously, we proposed a model for ordinal scale scoring in which individual thresholds for each item constitute a distribution for each item. This led us to hypothesize that the boundary curves of each depressive symptom score in the distribution of total depressive symptom scores follow a common mathematical model, which is expressed as the product of the frequency of the total depressive symptom scores and the probability of the cumulative distribution function of each item threshold. To verify this hypothesis, we investigated the boundary curves of the distribution of total depressive symptom scores in a general population. Methods Data collected from 21,040 subjects who had completed the Center for Epidemiologic Studies Depression Scale (CES-D) questionnaire as part of a national Japanese survey were analyzed. The CES-D consists of 20 items (16 negative items and four positive items). The boundary curves of adjacent item scores in the distribution of total depressive symptom scores for the 16 negative items were analyzed using log-normal scales and curve fitting. Results The boundary curves of adjacent item scores for a given symptom approximated a common linear pattern on a log-normal scale. Curve fitting showed that an exponential fit had a markedly higher coefficient of determination than either linear or quadratic fits. With negative affect items, the gap between the total score curve and boundary curve continuously increased with increasing total depressive symptom scores on a log-normal scale, whereas the boundary curves of positive affect items, which are not considered manifest variables of the latent trait, did not exhibit such increases in this gap.
Discussion The results of the present study support the hypothesis that the boundary curves of each depressive symptom score in the distribution of total depressive symptom scores commonly follow the predicted mathematical model, which was verified to approximate an exponential mathematical pattern. PMID:27761346
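The comparison of linear, quadratic, and exponential fits by coefficient of determination can be sketched with NumPy as follows. The exponential curve is fitted as a straight line on log-transformed values, and all three R² values are computed on the original (untransformed) scale; both choices are assumptions, since the abstract does not state how the fits or R² were computed.

```python
import numpy as np

def r_squared(y, y_hat):
    """Coefficient of determination of predictions y_hat for observations y."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

def compare_fits(x, y):
    """R^2 of linear, quadratic and exponential fits of y on x (requires y > 0)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    linear = np.polyval(np.polyfit(x, y, 1), x)
    quadratic = np.polyval(np.polyfit(x, y, 2), x)
    # Exponential y = exp(c0 + c1 * x), fitted as a line on log(y)
    c1, c0 = np.polyfit(x, np.log(y), 1)
    exponential = np.exp(c0 + c1 * x)
    return {
        "linear": r_squared(y, linear),
        "quadratic": r_squared(y, quadratic),
        "exponential": r_squared(y, exponential),
    }

x = np.arange(1, 30)
print(compare_fits(x, 500 * np.exp(-0.3 * x)))
```

On synthetic data that decay exactly exponentially, the exponential fit reaches R² of essentially 1, while the quadratic fit, which always matches or beats the linear one, still falls short.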
Meijer, Rob R; Niessen, A Susan M; Tendeiro, Jorge N
2016-02-01
Although there are many studies devoted to person-fit statistics to detect inconsistent item score patterns, most studies are difficult to understand for nonspecialists. The aim of this tutorial is to explain the principles of these statistics for researchers and clinicians who are interested in applying these statistics. In particular, we first explain how invalid test scores can be detected using person-fit statistics; second, we provide the reader with practical examples of existing studies that used person-fit statistics to detect and to interpret inconsistent item score patterns; and third, we discuss a new R package that can be used to identify and interpret inconsistent score patterns. © The Author(s) 2015.
Meijer, Rob R; Egberink, Iris J L; Emons, Wilco H M; Sijtsma, Klaas
2008-05-01
We illustrate the usefulness of person-fit methodology for personality assessment. For this purpose, we use person-fit methods from item response theory. First, we give a nontechnical introduction to existing person-fit statistics. Second, we analyze data from Harter's Self-Perception Profile for Children (Harter, 1985) in a sample of children ranging from 8 to 12 years of age (N = 611) and argue that for some children, the scale scores should be interpreted with caution. Combined information from person-fit indexes and from observation, interviews, and self-concept theory showed that similar score profiles may have a different interpretation. For some children in the sample, item scores did not adequately reflect their trait level. Based on teacher interviews, this was found to be due most likely to a less developed self-concept and/or problems understanding the meaning of the questions. We recommend investigating the scalability of score patterns when using self-report inventories to help the researcher interpret respondents' behavior correctly.
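As a concrete example of a simple nonparametric person-fit statistic, the number of Guttman errors counts item pairs in which a respondent fails an easier (more popular) item but passes a harder one. The sketch below assumes dichotomous 0/1 items ordered by proportion correct in the sample; it illustrates the general idea and is not presented as the specific index used in the Harter study.

```python
def item_popularity(data):
    """Proportion of respondents scoring 1 on each dichotomous item."""
    n = len(data)
    return [sum(row[j] for row in data) / n for j in range(len(data[0]))]

def guttman_errors(pattern, popularity):
    """Count item pairs where an easier item is failed but a harder one is passed."""
    # Order items from most to least popular (ties keep their original order)
    order = sorted(range(len(pattern)), key=lambda j: -popularity[j])
    x = [pattern[j] for j in order]
    errors = 0
    for easier in range(len(x)):
        for harder in range(easier + 1, len(x)):
            if x[easier] == 0 and x[harder] == 1:
                errors += 1
    return errors

popularity = [0.9, 0.7, 0.5, 0.3]
print(guttman_errors([1, 1, 0, 0], popularity))  # 0 (perfect Guttman pattern)
print(guttman_errors([0, 0, 1, 1], popularity))  # 4 (fully reversed pattern)
```

A large count relative to other respondents with the same total score flags a pattern whose scale score may deserve the kind of closer interpretation the abstract recommends.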
ERIC Educational Resources Information Center
Ebuoh, Casmir N.; Ezeudu, S. A.
2015-01-01
The study investigated the effects of scoring by section, the use of independent scorers, and conventional patterns on scorer reliability in Biology essay tests. The literature review revealed that the conventional pattern of scoring all items at one time in essay tests had been criticized as unreliable. The study was a true experimental study…
The Consequences of Ignoring Item Parameter Drift in Longitudinal Item Response Models
ERIC Educational Resources Information Center
Lee, Wooyeol; Cho, Sun-Joo
2017-01-01
Utilizing a longitudinal item response model, this study investigated the effect of item parameter drift (IPD) on item parameters and person scores via a Monte Carlo study. Item parameter recovery was investigated for various IPD patterns in terms of bias and root mean-square error (RMSE), and percentage of time the 95% confidence interval covered…
ERIC Educational Resources Information Center
Meijer, Rob R.
2004-01-01
Two new methods have been proposed to determine unexpected sum scores on sub-tests (testlets) both for paper-and-pencil tests and computer adaptive tests. A method based on a conservative bound using the hypergeometric distribution, denoted p, was compared with a method where the probability for each score combination was calculated using a…
More Reasons to be Straightforward: Findings and Norms for Two Scales Relevant to Social Anxiety
Rodebaugh, Thomas L.; Heimberg, Richard G.; Brown, Patrick J.; Fernandez, Katya C.; Blanco, Carlos; Schneier, Franklin R.; Liebowitz, Michael R.
2011-01-01
The validity of both the Social Interaction Anxiety Scale and Brief Fear of Negative Evaluation scale has been well-supported, yet the scales have a small number of reverse-scored items that may detract from the validity of their total scores. The current study investigates two characteristics of participants that may be associated with compromised validity of these items: higher age and lower levels of education. In community and clinical samples, the validity of each scale's reverse-scored items was moderated by age, years of education, or both. The straightforward items did not show this pattern. To encourage the use of the straightforward items of these scales, we provide normative data from the same samples as well as two large student samples. We contend that although response bias can be a substantial problem, the reverse-scored questions of these scales do not solve that problem and instead decrease overall validity. PMID:21388781
Dietary quality among men and women in 187 countries in 1990 and 2010: a systematic assessment
Imamura, Fumiaki; Micha, Renata; Khatibzadeh, Shahab; Fahimi, Saman; Shi, Peilin; Powles, John; Mozaffarian, Dariush
2015-01-01
Summary Background Healthy dietary patterns are a global priority to reduce non-communicable diseases. Yet neither worldwide patterns of diets nor their trends with time are well established. We aimed to characterise global changes (or trends) in dietary patterns nationally and regionally and to assess heterogeneity by age, sex, national income, and type of dietary pattern. Methods In this systematic assessment, we evaluated global consumption of key dietary items (foods and nutrients) by region, nation, age, and sex in 1990 and 2010. Consumption data were evaluated from 325 surveys (71·7% nationally representative) covering 88·7% of the global adult population. Two types of dietary pattern were assessed: one reflecting greater consumption of ten healthy dietary items and the other based on lesser consumption of seven unhealthy dietary items. The mean intakes of each dietary factor were divided into quintiles, and each quintile was assigned an ordinal score, with higher scores being equivalent to healthier diets (range 0–100). The dietary patterns were assessed by hierarchical linear regression including country, age, sex, national income, and time as explanatory variables. Findings From 1990 to 2010, diets based on healthy items improved globally (by 2·2 points, 95% uncertainty interval (UI) 0·9 to 3·5), whereas diets based on unhealthy items worsened (−2·5, −3·3 to −1·7). In 2010, the global mean scores were 44·0 (SD 10·5) for the healthy pattern and 52·1 (18·6) for the unhealthy pattern, with weak intercorrelation (r=–0·08) between countries. On average, better diets were seen in older adults compared with younger adults, and in women compared with men (p<0·0001 each). Compared with low-income nations, high-income nations had better diets based on healthy items (+2·5 points, 95% UI 0·3 to 4·1), but substantially poorer diets based on unhealthy items (−33·0, −37·8 to −28·3). Diets and their trends were very heterogeneous across the world regions.
For example, both types of dietary patterns improved in high-income countries, but worsened in some low-income countries in Africa and Asia. Middle-income countries showed the largest improvement in dietary patterns based on healthy items, but the largest deterioration in dietary patterns based on unhealthy items. Interpretation Consumption of healthy items improved, while consumption of unhealthy items worsened across the world, with heterogeneity across regions and countries. These global data provide the best estimates to date of nutrition transitions across the world and inform policies and priorities for reducing the health and economic burdens of poor diet quality. Funding The Bill & Melinda Gates Foundation and Medical Research Council. PMID:25701991
Development of a PROMIS item bank to measure pain interference.
Amtmann, Dagmar; Cook, Karon F; Jensen, Mark P; Chen, Wen-Hung; Choi, Seung; Revicki, Dennis; Cella, David; Rothrock, Nan; Keefe, Francis; Callahan, Leigh; Lai, Jin-Shei
2010-07-01
This paper describes the psychometric properties of the PROMIS-pain interference (PROMIS-PI) bank. An initial candidate item pool (n=644) was developed and evaluated based on the review of existing instruments, interviews with patients, and consultation with pain experts. From this pool, a candidate item bank of 56 items was selected and responses to the items were collected from large community and clinical samples. A total of 14,848 participants responded to all or a subset of candidate items. The responses were calibrated using an item response theory (IRT) model. A final 41-item bank was evaluated with respect to IRT assumptions, model fit, differential item functioning (DIF), precision, and construct and concurrent validity. Items of the revised bank had good fit to the IRT model (CFI and NNFI/TLI ranged from 0.974 to 0.997), and the data were strongly unidimensional (e.g., ratio of first to second eigenvalue = 35). Nine items exhibited statistically significant DIF. However, adjusting for DIF had little practical impact on score estimates and the items were retained without modifying scoring. Scores provided substantial information across levels of pain; for scores in the T-score range 50-80, the reliability was equivalent to 0.96-0.99. Patterns of correlations with other health outcomes supported the construct validity of the item bank. The scores discriminated among persons with different numbers of chronic conditions, disabling conditions, levels of self-reported health, and pain intensity (p<0.0001). The results indicated that the PROMIS-PI items constitute a psychometrically sound bank. Computerized adaptive testing and short forms are available. Copyright 2010 International Association for the Study of Pain. All rights reserved.
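The unidimensionality figure reported above, the ratio of the first to the second eigenvalue, can be computed from an item correlation matrix. This minimal NumPy sketch accepts any symmetric correlation matrix; which type of correlation (for example Pearson or polychoric) fed the PROMIS analysis is not stated in the abstract, so that choice is left to the caller.

```python
import numpy as np

def eigenvalue_ratio(corr):
    """Ratio of the largest to the second-largest eigenvalue of a correlation matrix."""
    eigenvalues = np.linalg.eigvalsh(np.asarray(corr, dtype=float))  # ascending order
    return eigenvalues[-1] / eigenvalues[-2]

# Four items with all inter-item correlations equal to 0.5:
# eigenvalues are 1 + 3 * 0.5 = 2.5 and 1 - 0.5 = 0.5 (three times)
corr = np.full((4, 4), 0.5)
np.fill_diagonal(corr, 1.0)
print(round(eigenvalue_ratio(corr), 6))  # 5.0
```

A ratio of 35, as reported for the pain interference bank, means the first component dominates far more strongly than in this small example.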
Revised scoring and improved reliability for the Communication Patterns Questionnaire.
Crenshaw, Alexander O; Christensen, Andrew; Baucom, Donald H; Epstein, Norman B; Baucom, Brian R W
2017-07-01
The Communication Patterns Questionnaire (CPQ; Christensen, 1987) is a widely used self-report measure of couple communication behavior and is well validated for assessing the demand/withdraw interaction pattern, which is a robust predictor of poor relationship and individual outcomes (Schrodt, Witt, & Shimkowski, 2014). However, no studies have examined the CPQ's factor structure using analytic techniques sufficient by modern standards, nor have any studies replicated the factor structure using additional samples. Further, the current scoring system uses fewer than half of the total items for its 4 subscales, despite the existence of unused items that have content conceptually consistent with those subscales. These characteristics of the CPQ have likely contributed to findings that subscale scores are often troubled by suboptimal psychometric properties such as low internal reliability (e.g., Christensen, Eldridge, Catta-Preta, Lim, & Santagata, 2006). The present study uses exploratory and confirmatory factor analyses on 4 samples to reexamine the factor structure of the CPQ to improve scale score reliability and to determine if including more items in the subscales is warranted. Results indicate that a 3-factor solution (constructive communication and 2 demand/withdraw scales) provides the best fit for the data. That factor structure was confirmed in the replication samples. Compared with the original scales, the revised scales include additional items that expand the conceptual range of the constructs, substantially improve reliability of scale scores, and demonstrate stronger associations with relationship satisfaction and sensitivity to change in therapy. Implications for research and treatment are discussed. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
ERIC Educational Resources Information Center
Molenaar, Dylan; Dolan, Conor V.; de Boeck, Paul
2012-01-01
The Graded Response Model (GRM; Samejima, "Estimation of ability using a response pattern of graded scores," Psychometric Monograph No. 17, Richmond, VA: The Psychometric Society, 1969) can be derived by assuming a linear regression of a continuous variable, Z, on the trait, θ, to underlie the ordinal item scores (Takane & de Leeuw in…
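For reference, the cumulative form of the GRM can be sketched in a few lines. This version assumes the common logistic parameterization, P(X >= k | θ) = 1 / (1 + exp(-a(θ - b_k))), with strictly increasing thresholds b_k; category probabilities are differences of adjacent cumulative probabilities. The function name is hypothetical, and the normal-ogive variant that follows from the regression derivation above would replace the logistic curve with the normal CDF.

```python
import math

def grm_category_probs(theta, a, thresholds):
    """P(X = k | theta) for k = 0..m under a logistic graded response model."""
    if any(b2 <= b1 for b1, b2 in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    # Cumulative probabilities P(X >= k); P(X >= 0) = 1 and P(X >= m + 1) = 0
    cumulative = [1.0]
    cumulative += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cumulative.append(0.0)
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]

probs = grm_category_probs(theta=0.0, a=1.5, thresholds=[-1.0, 0.0, 1.0])
print([round(p, 3) for p in probs])  # [0.182, 0.318, 0.318, 0.182]
```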
More reasons to be straightforward: findings and norms for two scales relevant to social anxiety.
Rodebaugh, Thomas L; Heimberg, Richard G; Brown, Patrick J; Fernandez, Katya C; Blanco, Carlos; Schneier, Franklin R; Liebowitz, Michael R
2011-06-01
The validity of both the Social Interaction Anxiety Scale and Brief Fear of Negative Evaluation scale has been well-supported, yet the scales have a small number of reverse-scored items that may detract from the validity of their total scores. The current study investigates two characteristics of participants that may be associated with compromised validity of these items: higher age and lower levels of education. In community and clinical samples, the validity of each scale's reverse-scored items was moderated by age, years of education, or both. The straightforward items did not show this pattern. To encourage the use of the straightforward items of these scales, we provide normative data from the same samples as well as two large student samples. We contend that although response bias can be a substantial problem, the reverse-scored questions of these scales do not solve that problem and instead decrease overall validity. Copyright © 2011 Elsevier Ltd. All rights reserved.
Makary, A T; Testa, R; Tonge, B J; Einfeld, S L; Mohr, C; Gray, K M
2015-08-01
Studies on adaptive behaviour and ageing in adults with Down syndrome (DS) (without dementia) have typically analysed age-related change in terms of the total item scores on questionnaires. This research extends the literature by investigating whether the age-related changes in adaptive abilities could be differentially attributed to changes in the number or severity (intensity) of behavioural questionnaire items endorsed. The Adaptive Behaviour Assessment System-II Adult (ABAS-II Adult) was completed by parents and caregivers of 53 adults with DS aged between 16 and 56 years. Twenty adults with DS and their parents/caregivers were a part of a longitudinal study, which provided two time points of data. In addition 33 adults with DS and their parents/caregivers from a cross-sectional study were included. Random-effects regression analyses were used to examine the patterns in item scores associated with ageing. Increasing age was found to be significantly associated with lower adaptive behaviour abilities for all the adaptive behaviour composite scores, except for the practical composite. These associations were entirely related to fewer ABAS-II Adult items being selected as present for the older participants, as opposed to the scores being attributable to lower item severity. This study provides evidence for a differential pattern of age-related change for various adaptive behaviour skills in terms of range, but not severity. Possible reasons for this pattern will be discussed. Overall, these findings suggest that adults with DS may benefit from additional support in terms of their social and conceptual abilities as they age. © 2014 MENCAP and International Association for the Scientific Study of Intellectual and Developmental Disabilities and John Wiley & Sons Ltd.
Do Examinees Understand Score Reports for Alternate Methods of Scoring Computer Based Tests?
ERIC Educational Resources Information Center
Whittaker, Tiffany A.; Williams, Natasha J.; Dodd, Barbara G.
2011-01-01
This study assessed the interpretability of scaled scores based on either number correct (NC) scoring for a paper-and-pencil test or one of two methods of scoring computer-based tests: an item pattern (IP) scoring method and a method based on equated NC scoring. The equated NC scoring method for computer-based tests was proposed as an alternative…
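The difference between number-correct and item-pattern scoring can be illustrated with a maximum-likelihood estimate of θ under a two-parameter logistic (2PL) model. The model, the item parameters, and the grid search below are assumptions chosen for illustration; the IP scoring method in the study may differ. Under a 2PL model, two patterns with the same number correct can receive different θ estimates because correct answers on more discriminating items carry more weight.

```python
import math

def log_likelihood(theta, pattern, a, b):
    """Log-likelihood of a 0/1 response pattern under a 2PL model."""
    total = 0.0
    for x, ai, bi in zip(pattern, a, b):
        p = 1.0 / (1.0 + math.exp(-ai * (theta - bi)))
        total += math.log(p) if x == 1 else math.log(1.0 - p)
    return total

def pattern_score(pattern, a, b, lo=-4.0, hi=4.0, step=0.01):
    """Maximum-likelihood theta for one pattern, found by grid search."""
    n_steps = int(round((hi - lo) / step))
    grid = [lo + i * step for i in range(n_steps + 1)]
    return max(grid, key=lambda t: log_likelihood(t, pattern, a, b))

# Hypothetical items: equal difficulty, different discrimination
a, b = [0.5, 2.0, 1.0], [0.0, 0.0, 0.0]
for pattern in ([1, 0, 1], [0, 1, 1]):
    print(pattern, sum(pattern), round(pattern_score(pattern, a, b), 2))
```

Both patterns have a number-correct score of 2, yet the pattern that answers the most discriminating item correctly receives the higher θ, which is one reason score reports based on the two methods can be hard for examinees to reconcile.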
Markussen, Marianne S; Veierød, Marit B; Sakhi, Amrit K; Ellingjord-Dale, Merete; Blomhoff, Rune; Ursin, Giske; Andersen, Lene F
2015-02-28
A number of studies have examined dietary patterns in various populations. However, it is also of interest to study to what extent such patterns capture meaningful differences in the consumption of foods. In the present study, we identified important dietary patterns in Norwegian postmenopausal women (age 50-69 years, n 361), and evaluated these patterns by examining their associations with plasma carotenoids. Diet was assessed by a 253-item FFQ. These 253 food items were categorised into forty-six food groups, and dietary patterns were identified using principal component analysis. We used the partial correlation coefficient (r(adj)) and multiple linear regression analysis to examine the associations between the dietary patterns and the plasma carotenoids α-carotene, β-carotene, β-cryptoxanthin, lutein, lycopene and zeaxanthin. Overall, four dietary patterns were identified: the 'Western'; 'Vegetarian'; 'Continental'; 'High-protein'. The 'Western' dietary pattern scores were significantly inversely correlated with plasma lutein, zeaxanthin, lycopene and total carotenoids (-0·25 ≤ r(adj) ≤ -0·13). The 'Vegetarian' dietary pattern scores were significantly positively correlated with all the plasma carotenoids (0·15 ≤ r(adj) ≤ 0·24). The 'Continental' dietary pattern scores were significantly inversely correlated with plasma lutein and α-carotene (r(adj) = -0·13). No significant association between the 'High-protein' dietary pattern scores and the plasma carotenoids was found. In conclusion, the healthy dietary pattern, the 'Vegetarian' pattern, is associated with a more favourable profile of the plasma carotenoids than our unhealthy dietary patterns, the 'Western' and 'Continental' patterns.
The Probability of Exceedance as a Nonparametric Person-Fit Statistic for Tests of Moderate Length
ERIC Educational Resources Information Center
Tendeiro, Jorge N.; Meijer, Rob R.
2013-01-01
To classify an item score pattern as not fitting a nonparametric item response theory (NIRT) model, the probability of exceedance (PE) of an observed response vector x can be determined as the sum of the probabilities of all response vectors that are, at most, as likely as x, conditional on the test's total score. Vector x is to be considered…
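The definition of PE above can be written out directly by enumerating every 0/1 response vector with the same total score. The sketch assumes independent items with known success probabilities `p` (for example, evaluated at the respondent's trait level); how those probabilities are obtained under the nonparametric model is not described in the abstract, so `p` is a hypothetical input here. Enumeration grows combinatorially with test length, so this brute-force version is only practical for short tests.

```python
import math
from itertools import combinations

def conditional_probs(p, total):
    """P(y | sum(y) = total) for every 0/1 vector y with that total score.

    Assumes independent items with success probabilities p; the common
    factor prod(1 - p_i) cancels, so each vector is weighted by the
    product of the odds p_i / (1 - p_i) of the items it answers with 1.
    """
    odds = [pi / (1.0 - pi) for pi in p]
    weights = {}
    for ones in combinations(range(len(p)), total):
        y = tuple(1 if i in ones else 0 for i in range(len(p)))
        weights[y] = math.prod(odds[i] for i in ones)
    norm = sum(weights.values())
    return {y: w / norm for y, w in weights.items()}

def probability_of_exceedance(x, p):
    """Sum of the conditional probabilities of all vectors at most as likely as x."""
    probs = conditional_probs(p, sum(x))
    px = probs[tuple(x)]
    # A tiny relative tolerance keeps exact ties from being lost to rounding
    return sum(q for q in probs.values() if q <= px * (1 + 1e-12))

p = [0.9, 0.7, 0.5, 0.3, 0.1]  # hypothetical item success probabilities
print(probability_of_exceedance([1, 1, 0, 0, 0], p))  # most likely vector for its total
print(probability_of_exceedance([0, 0, 0, 1, 1], p))  # least likely vector for its total
```

A small PE means that few response vectors with the same total score are as unlikely as the observed one, which is the sense in which the pattern would be classified as misfitting.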
Osborne, Nikola K P; Taylor, Michael C; Healey, Matthew; Zajac, Rachel
2016-03-01
It is becoming increasingly apparent that contextual information can exert a considerable influence on decisions about forensic evidence. Here, we explored accuracy and contextual influence in bloodstain pattern classification, and how these variables might relate to analyst characteristics. Thirty-nine bloodstain pattern analysts with varying degrees of experience each completed measures of compliance, decision-making style, and need for closure. Analysts then examined a bloodstain pattern without any additional contextual information, and allocated votes to listed pattern types according to favoured and less favoured classifications. Next, if they believed it would assist with their classification, analysts could request items of contextual information - from commonly encountered sources of information in bloodstain pattern analysis - and update their vote allocation. We calculated a shift score for each item of contextual information based on vote reallocation. Almost all forms of contextual information influenced decision-making, with medical findings leading to the highest shift scores. Although there was a small positive association between shift scores and the degree to which analysts displayed an intuitive decision-making style, shift scores did not vary meaningfully as a function of experience or the other characteristics measured. Almost all of the erroneous classifications were made by novice analysts. Copyright © 2016 The Chartered Society of Forensic Sciences. Published by Elsevier Ireland Ltd. All rights reserved.
Rasch Based Analysis of Oral Proficiency Test Data.
ERIC Educational Resources Information Center
Nakamura, Yuji
2001-01-01
This paper examines the rating scale data of oral proficiency tests analyzed by a Rasch Analysis focusing on an item map and factor analysis. In discussing the item map, the difficulty order of six items and students' answering patterns are analyzed using descriptive statistics and measures of central tendency of test scores. The data ranks the…
Item Analyses of Memory Differences
Salthouse, Timothy A.
2017-01-01
Objective Although performance on memory and other cognitive tests is usually assessed with a score aggregated across multiple items, potentially valuable information is also available at the level of individual items. Method The current study illustrates how analyses of variance with item as one of the factors, and memorability analyses in which item accuracy in one group is plotted as a function of item accuracy in another group, can provide a more detailed characterization of the nature of group differences in memory. Data are reported for two memory tasks, word recall and story memory, across age, ability, repetition, delay, and longitudinal contrasts. Results The item-level analyses revealed evidence for largely uniform differences across items in the age, ability, and longitudinal contrasts, but differential patterns across items in the repetition contrast, and unsystematic item relations in the delay contrast. Conclusion Analyses at the level of individual items have the potential to indicate the manner by which group differences in the aggregate test score are achieved. PMID:27618285
ERIC Educational Resources Information Center
Sullins, Walter L.
Five hundred dichotomously scored response patterns were generated with sequentially independent (SI) items and 500 with sequentially dependent (SD) items for each of thirty-six combinations of sampling parameters (i.e., three test lengths, three sample sizes, and four item difficulty distributions). KR-20, KR-21, and Split-Half (S-H) reliabilities were…
Wolfe, Edward W; McGill, Michael T
2011-01-01
This article summarizes a simulation study of the performance of five item quality indicators (the weighted and unweighted versions of the mean square and standardized mean square fit indices and the point-measure correlation) under conditions of relatively high and low amounts of missing data under both random and conditional patterns of missing data for testing contexts such as those encountered in operational administrations of a computerized adaptive certification or licensure examination. The results suggest that weighted fit indices, particularly the standardized mean square index, and the point-measure correlation provide the most consistent information between random and conditional missing data patterns and that these indices perform more comparably for items near the passing score than for items with extreme difficulty values.
Psychometric Properties of IRT Proficiency Estimates
ERIC Educational Resources Information Center
Kolen, Michael J.; Tong, Ye
2010-01-01
Psychometric properties of item response theory proficiency estimates are considered in this paper. Proficiency estimators based on summed scores and pattern scores include non-Bayes maximum likelihood and test characteristic curve estimators and Bayesian estimators. The psychometric properties investigated include reliability, conditional…
Determinants of ante-partum depression: a multicenter study.
Balestrieri, Matteo; Isola, Miriam; Bisoffi, Giulia; Calò, Salvatore; Conforti, Anita; Driul, Lorenza; Marchesoni, Diego; Petrosemolo, Paola; Rossi, Michela; Zito, Adriana; Zorzenone, Stefania; Di Sciascio, Guido; Leone, Roberto; Bellantuono, Cesario
2012-12-01
Ante-partum depression (APD) is usually defined as a non-psychotic depressive episode of mild to moderate severity, beginning in or extending into pregnancy. APD has received less attention than postpartum depression. This is a cross-sectional study carried out in the Obstetrics and Gynaecology (OG) departments of four different general hospitals in Italy. Women consecutively attending the OG departments for their first ultrasound examination were asked to fill in the validated Italian version of the Edinburgh Postnatal Depression Scale (EPDS). We used the total EPDS score as a continuous variable for univariate and linear regression analyses; in accordance with the literature, the item analysis of the EPDS was carried out by classifying the sample as women with "no depression" (scores 0-9), "possible depression" (scores 10-12), "probable depression" (scores 13+) and "probable APD" (scores 15+). The number of women recruited was 1,608. The EPDS assessment classified 10.9 % of the women as possibly depressed, 8.3 % as probably depressed and 4.7 % as probably affected by APD. EPDS score distribution was associated with nationality (higher scores for foreigners), cohabitation (higher scores for women living with friends or in a community), occupation (higher scores for housewives), past episodes of depression and use of herbal drugs. Non-depressed women had significantly lower values on all ten items than depressed women; however, the pattern of item distribution on the EPDS scale remained similar across depression severity groups. In all four groups, item 4 (anxious depression) attained the highest scores, while item 10 (suicidality) attained the lowest scores.
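The cut-offs above define overlapping bands (every score of 15 or more is also 13 or more), so a woman can fall into both "probable depression" and "probable APD". A minimal sketch of that classification, assuming the usual EPDS total range of 0-30 (ten items scored 0-3); the function name is illustrative.

```python
from typing import List


def epds_groups(total: int) -> List[str]:
    """Map an EPDS total to the study's groups; 'probable APD' is a subset of 'probable depression'."""
    if not 0 <= total <= 30:
        raise ValueError("EPDS totals range from 0 to 30")
    if total <= 9:
        groups = ["no depression"]
    elif total <= 12:
        groups = ["possible depression"]
    else:
        groups = ["probable depression"]
    if total >= 15:
        groups.append("probable APD")
    return groups


print(epds_groups(16))  # ['probable depression', 'probable APD']
```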
Relative validity of a tool to measure food acculturation in children of Mexican descent.
Vera-Becerra, Luz Elvia; Lopez, Martha L; Kaiser, Lucia L
2016-02-01
The purpose of this study was to examine relative validity of a food frequency questionnaire (FFQ) to measure food acculturation in young Mexican-origin children. In 2006, Spanish-speaking staff interviewed mothers in a community-based sample of households from Ventura, California (US) (n = 95) and Guanajuato, Mexico (MX) (n = 200). Data included two 24-h dietary recalls (24-DR); a 30-item FFQ; and anthropometry of the children. To measure construct, convergent, and discriminant validity, data analyses included factor analysis, Spearman correlations, and t-tests, respectively. Factor analysis revealed two constructs: 1) a US food pattern including hamburgers, pizza, hot dogs, fried chicken, juice, cereal, pastries, lower fat milk, quesadillas, and American cheese and 2) a MX food pattern including tortillas, fried beans, rice/noodles, whole milk, and pan dulce (sweet bread). Out of 22 food items that could be compared across the FFQ and mean 24-DRs, 17 were significantly, though weakly, correlated (highest r = 0.62, for whole milk). The mean US food pattern score was significantly higher, and the MX food pattern score lower, in US children than in MX children (p < 0.0001). After adjusting for child's age and gender, mother's education, and household size, the US food pattern score was positively related to body mass index (BMI) z-scores (beta coefficient: +0.29, p = 0.004), whereas the MX food pattern score was negatively related to BMI z-scores (beta coefficient: -0.28, p = 0.002). This tool may be useful to evaluate nutrition education interventions to prevent childhood obesity on both sides of the border. Copyright © 2015 Elsevier Ltd. All rights reserved.
Semantic representation in the white matter pathway
Fang, Yuxing; Wang, Xiaosha; Zhong, Suyu; Song, Luping; Han, Zaizhu; Gong, Gaolang
2018-01-01
Object conceptual processing has been localized to distributed cortical regions that represent specific attributes. A challenging question is how object semantic space is formed. We tested a novel framework of representing semantic space in the pattern of white matter (WM) connections by extending the representational similarity analysis (RSA) to structural lesion pattern and behavioral data in 80 brain-damaged patients. For each WM connection, a neural representational dissimilarity matrix (RDM) was computed by first building machine-learning models with the voxel-wise WM lesion patterns as features to predict naming performance of a particular item and then computing the correlation between the predicted naming score and the actual naming score of another item in the testing patients. This correlation was used to build the neural RDM based on the assumption that if the connection pattern contains certain aspects of information shared by the naming processes of these two items, models trained with one item should also predict naming accuracy of the other. Correlating the neural RDM with various cognitive RDMs revealed that neural patterns in several WM connections linking left occipital/middle temporal regions with anterior temporal regions were associated with the object semantic space. Such associations were not attributable to modality-specific attributes (shape, manipulation, color, and motion), to peripheral picture-naming processes (picture visual similarity, phonological similarity), to broad semantic categories, or to the properties of the cortical regions that they connected, which tended to represent multiple modality-specific attributes. That is, the semantic space could be represented through WM connection patterns across cortical regions representing modality-specific attributes. PMID:29624578
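The final RSA step, correlating a neural RDM with a cognitive RDM, is commonly done with a Spearman correlation over the off-diagonal cells. A minimal sketch, assuming both RDMs are square and symmetric (so only the upper triangle is used) and giving tied values average ranks; it does not reproduce the study's lesion-based procedure for building the neural RDM, and the toy matrices are hypothetical.

```python
from typing import List, Sequence


def upper_triangle(m: Sequence[Sequence[float]]) -> List[float]:
    """Cells above the main diagonal, row by row."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def rdm_spearman(neural: Sequence[Sequence[float]], cognitive: Sequence[Sequence[float]]) -> float:
    """Spearman correlation between the upper triangles of two RDMs."""
    return pearson(average_ranks(upper_triangle(neural)), average_ranks(upper_triangle(cognitive)))


neural = [[0, 1, 2], [1, 0, 3], [2, 3, 0]]
semantic = [[0, 10, 20], [10, 0, 30], [20, 30, 0]]
print(rdm_spearman(neural, semantic))  # 1.0
```

A rank-based correlation is a common choice here because it only assumes that larger dissimilarities in one matrix go with larger dissimilarities in the other, not that the two scales are linearly related.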
Francis, Wendy S; Baca, Yuzeth
2014-01-01
Spanish-English bilinguals (N = 144) performed free recall, serial recall and order reconstruction tasks in both English and Spanish. Long-term memory for both item and order information was worse in the less fluent language (L2) than in the more fluent language (L1). Item scores exhibited a stronger disadvantage for the L2 in serial recall than in free recall. Relative order scores were lower in the L2 for all three tasks, but adjusted scores for free and serial recall were equivalent across languages. Performance of English-speaking monolinguals (N = 72) was comparable to bilingual performance in the L1, except that monolinguals had higher adjusted order scores in free recall. Bilingual performance patterns in the L2 were consistent with the established effects of concurrent task performance on these memory tests, suggesting that the cognitive resources required for processing words in the L2 encroach on resources needed to commit item and order information to memory. These findings are also consistent with a model in which item memory is connected to the language system, order information is processed by separate mechanisms and attention can be allocated differentially to these two systems.
Comparison of scales for evaluating premenstrual symptoms in women using oral contraceptives.
Coffee, Andrea L; Kuehl, Thomas J; Sulak, Patricia J
2008-05-01
To compare two scales used in research to evaluate daily premenstrual mood symptoms during use of a monophasic oral contraceptive. Subanalysis of data from a prospective study. University-affiliated medical center. One hundred two reproductive-aged (18-48 yrs) women taking a monophasic oral contraceptive containing ethinyl estradiol and drospirenone in the standard 21-7 fashion (21 days of hormones followed by 7 days of placebo), and who had self-identified premenstrual symptoms of headache, mood changes, or pelvic pain. Subjects completed a single-item questionnaire, the Scott & White Daily Diary of Symptoms, and a multiple-item questionnaire, the Penn State Daily Symptom Report (DSR), to assess their premenstrual symptoms. The Scott & White diary used a visual analog scale of 0-10 to assess pelvic pain, headache, and mood (a composite of anxiety, depression, and irritability). The Penn State DSR contained 17 items (10 behavioral and seven physical), each rated on a scale of 0-4, with one item that specifically rated mood swings. Scores from the two scales were compared by using Spearman correlation coefficients, Kendall's W for concordance, and linear regression of ranked sums for study cycles. The Scott & White mood score significantly correlated with the total of the 17 items on the Penn State DSR, as well as with the 10 behavioral items, the seven physical items, and the single mood-swing item (p<0.0001); the corresponding coefficients of concordance were 0.44, 0.23, 0.10, and 0.28, and R2 values were 0.39, 0.39, 0.30, and 0.34, respectively. The daily Scott & White mood score was positively correlated with all 17 elements of the Penn State DSR (0.25-0.57). The greatest correlation was seen with the mood-swing element. Both instruments demonstrated the same patterns during the 21-7 oral contraceptive cycle, with symptoms increasing immediately before and peaking during the 7-day hormone-free interval.
A single-item daily mood score using a rating scale of 0-10 was concordant with a relatively complex 17-element symptom index and demonstrated the same pattern of change during cycles of oral contraception. The simple scoring system offers an advantage, especially in clinical studies of long duration.
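Kendall's W treats each instrument as a "rater" that ranks the same set of days, which is why it can compare a 0-10 visual analog score with a 17-item sum without rescaling either one. A minimal sketch without the tie correction (ties receive average ranks, which makes W slightly conservative when ties are frequent); the daily scores are hypothetical and the study's exact computation is not described in the abstract.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def kendalls_w(scores: Sequence[Sequence[float]]) -> float:
    """Kendall's coefficient of concordance for m raters (rows) ranking n objects (columns)."""
    m, n = len(scores), len(scores[0])
    ranked = [average_ranks(row) for row in scores]
    rank_sums = [sum(r[j] for r in ranked) for j in range(n)]
    mean_sum = m * (n + 1) / 2
    s = sum((rs - mean_sum) ** 2 for rs in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))


# Hypothetical scores for the same five days on two instruments.
vas_mood = [2, 5, 7, 9, 4]        # 0-10 visual analog scale
dsr_total = [10, 22, 30, 41, 25]  # 17-item sum
print(kendalls_w([vas_mood, dsr_total]))  # 0.95
```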
Scoring and Classifying Examinees Using Measurement Decision Theory
ERIC Educational Resources Information Center
Rudner, Lawrence M.
2009-01-01
This paper describes and evaluates the use of measurement decision theory (MDT) to classify examinees based on their item response patterns. The model has a simple framework that starts with the conditional probabilities of examinees in each category or mastery state responding correctly to each item. The presented evaluation investigates: (1) the…
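The framework described (conditional probabilities of a correct response for each mastery state, combined with prior category proportions) amounts to a Bayesian classifier. A minimal sketch, assuming two states and locally independent dichotomous items; all probabilities below are made up for illustration, and log-likelihoods are used to avoid underflow on long tests.

```python
import math
from typing import Dict, Optional, Sequence, Tuple


def classify(
    responses: Sequence[Optional[int]],
    priors: Dict[str, float],
    p_correct: Dict[str, Sequence[float]],
) -> Tuple[str, Dict[str, float]]:
    """Return the most probable category and the posterior for every category."""
    log_post = {}
    for cat, prior in priors.items():
        lp = math.log(prior)
        for x, p in zip(responses, p_correct[cat]):
            if x is None:  # unanswered items contribute nothing
                continue
            lp += math.log(p if x == 1 else 1.0 - p)
        log_post[cat] = lp
    top = max(log_post.values())
    unnorm = {c: math.exp(v - top) for c, v in log_post.items()}
    z = sum(unnorm.values())
    posterior = {c: v / z for c, v in unnorm.items()}
    return max(posterior, key=posterior.get), posterior


priors = {"master": 0.5, "nonmaster": 0.5}
p_correct = {"master": [0.9, 0.8, 0.85], "nonmaster": [0.4, 0.3, 0.5]}
label, post = classify([1, 1, 1], priors, p_correct)
print(label, round(post["master"], 3))  # master 0.911
```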
Outlier Detection in High-Stakes Certification Testing. Research Report.
ERIC Educational Resources Information Center
Meijer, Rob R.
Recent developments in person-fit analysis in computerized adaptive testing (CAT) are discussed. Methods from statistical process control are presented that have been proposed to classify an item score pattern as fitting or misfitting the underlying item response theory (IRT) model in a CAT. Most person-fit research in CAT is restricted to…
Outlier Detection in High-Stakes Certification Testing.
ERIC Educational Resources Information Center
Meijer, Rob R.
2002-01-01
Used empirical data from a certification test to study methods from statistical process control that have been proposed to classify an item score pattern as fitting or misfitting the underlying item response theory model in computerized adaptive testing. Results for 1,392 examinees show that different types of misfit can be distinguished.
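One statistical-process-control approach used in this line of work is a CUSUM over an examinee's item residuals in the order the items were administered. The sketch below follows one common formulation, T_k = (u_k - P_k(theta_hat)) / n, with upper and lower sums that reset at zero. The model probabilities are taken as given (e.g., from the IRT model at the final ability estimate), and the threshold h is a hypothetical value that in practice would be set by simulation.

```python
from typing import List, Sequence, Tuple


def cusum_person_fit(
    responses: Sequence[int], probs: Sequence[float], h: float
) -> Tuple[bool, List[Tuple[float, float]]]:
    """Upper/lower CUSUM of item residuals in administration order.

    responses: observed 0/1 scores; probs: model probabilities P_k(theta_hat);
    h: decision threshold (hypothetical here; normally set by simulation).
    """
    n = len(responses)
    c_plus = c_minus = 0.0
    path = []
    flagged = False
    for u, p in zip(responses, probs):
        t = (u - p) / n
        c_plus = max(0.0, c_plus + t)    # accumulates unexpected successes
        c_minus = min(0.0, c_minus + t)  # accumulates unexpected failures
        path.append((c_plus, c_minus))
        if c_plus > h or c_minus < -h:
            flagged = True
    return flagged, path


# Hypothetical CAT record: items targeted at p = 0.5, but the first four are all missed.
flag, path = cusum_person_fit([0, 0, 0, 0, 1, 1, 1, 1], [0.5] * 8, h=0.2)
print(flag, path[3])  # True (0.0, -0.25)
```

Keeping two one-sided sums lets a run of unexpected failures (for example, early in the test) be detected even when later unexpected successes would cancel it out in an overall fit statistic.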
Clusters of cultures: diversity in meaning of family value and gender role items across Europe.
van Vlimmeren, Eva; Moors, Guy B D; Gelissen, John P T M
2017-01-01
Survey data are often used to map cultural diversity by aggregating scores of attitude and value items across countries. However, this procedure only makes sense if the same concept is measured in all countries. In this study we argue that when (co)variances among sets of items are similar across countries, these countries share a common way of assigning meaning to the items. Clusters of cultures can then be observed by doing a cluster analysis on the (co)variance matrices of sets of related items. This study focuses on family values and gender role attitudes. We find four clusters of cultures that assign a distinct meaning to these items, especially in the case of gender roles. Some of these differences reflect response style behavior in the form of acquiescence. Adjusting for this style effect alters country comparisons, demonstrating the usefulness of investigating the patterns of meaning given to sets of items before aggregating scores into cultural characteristics.
The stroke impairment assessment set: its internal consistency and predictive validity.
Tsuji, T; Liu, M; Sonoda, S; Domen, K; Chino, N
2000-07-01
To study the scale quality and predictive validity of the Stroke Impairment Assessment Set (SIAS) developed for stroke outcome research. Rasch analysis of the SIAS; stepwise multiple regression analysis to predict discharge functional independence measure (FIM) raw scores from demographic data, the SIAS scores, and the admission FIM scores; cross-validation of the prediction rule. Tertiary rehabilitation center in Japan. One hundred ninety stroke inpatients for the study of the scale quality and the predictive validity; a second sample of 116 stroke inpatients for the cross-validation study. Mean square fit statistics to study the degree of fit to the unidimensional model; logits to express item difficulties; discharge FIM scores for the study of predictive validity. The degree of misfit was acceptable except for the shoulder range of motion (ROM), pain, visuospatial function, and speech items; and the SIAS items could be arranged on a common unidimensional scale. The difficulty patterns were identical at admission and at discharge except for the deep tendon reflexes, ROM, and pain items. They were also similar for the right- and left-sided brain lesion groups except for the speech and visuospatial items. For the prediction of the discharge FIM scores, the independent variables selected were age, the SIAS total scores, and the admission FIM scores; and the adjusted R2 was .64 (p < .0001). Stability of the predictive equation was confirmed in the cross-validation sample (R2 = .68, p < .001). The unidimensionality of the SIAS was confirmed, and the SIAS total scores proved useful for stroke outcome prediction.
A Comment on Early Student Blunders on Computer-Based Adaptive Tests
ERIC Educational Resources Information Center
Green, Bert F.
2011-01-01
This article refutes a recent claim that computer-based tests produce biased scores for very proficient test takers who make mistakes on one or two initial items and that the "bias" can be reduced by using a four-parameter IRT model. Because the same effect occurs with pattern scores on nonadaptive tests, the effect results from IRT scoring, not…
Psychometric assessment of the IBS-D Daily Symptom Diary and Symptom Event Log.
Rosa, Kathleen; Delgado-Herrera, Leticia; Zeiher, Bernie; Banderas, Benjamin; Arbuckle, Rob; Spears, Glen; Hudgens, Stacie
2016-12-01
Diarrhea-predominant irritable bowel syndrome (IBS-D) can considerably impact patients' lives. Patient-reported symptoms are crucial in understanding the diagnosis and progression of IBS-D. This study psychometrically evaluates the newly developed IBS-D Daily Symptom Diary and Symptom Event Log (hereafter, "Event Log") according to US regulatory recommendations. A US-based observational field study was conducted to understand cross-sectional psychometric properties of the IBS-D Daily Symptom Diary and Event Log. Analyses included item descriptive statistics, item-to-item correlations, reliability, and construct validity. The IBS-D Daily Symptom Diary and Event Log had no items with excessive missing data. With the exception of two items ("frequency of gas" and "accidents"), moderate to high inter-item correlations were observed among all items of the IBS-D Daily Symptom Diary and Event Log (day 1 range 0.67-0.90). Item scores demonstrated reliability, with the exception of the "frequency of gas" and "accidents" items of the Diary and "incomplete evacuation" item of the Event Log. The pattern of correlations of the IBS-D Daily Symptom Diary and Event Log item scores with generic and disease-specific measures was as expected, moderate for similar constructs and low for dissimilar constructs, supporting construct validity. Known-groups methods showed statistically significant differences and monotonic trends in each of the IBS-D Daily Symptom Diary item scores among groups defined by patients' IBS-D severity ratings ("none"/"mild," "moderate," or "severe"/"very severe"), supporting construct validity. Initial psychometric results support the reliability and validity of the items of the IBS-D Daily Symptom Diary and Event Log.
ERIC Educational Resources Information Center
Clemens, Nathan H.; Davis, John L.; Simmons, Leslie E.; Oslund, Eric L.; Simmons, Deborah C.
2015-01-01
Standardized measures are often used as an index of students' reading comprehension and scores have important implications, particularly for students who perform below expectations. This study examined secondary-level students' patterns of responding and the prevalence and impact of non-attempted items on a timed, group-administered,…
The Disaggregation of Value-Added Test Scores to Assess Learning Outcomes in Economics Courses
ERIC Educational Resources Information Center
Walstad, William B.; Wagner, Jamie
2016-01-01
This study disaggregates posttest, pretest, and value-added or difference scores in economics into four types of economic learning: positive, retained, negative, and zero. The types are derived from patterns of student responses to individual items on a multiple-choice test. The micro and macro data from the "Test of Understanding in College…
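The four learning types can be counted per student from matched pretest and posttest item scores. A minimal sketch, assuming the types are defined by the pre/post combination on each item (incorrect then correct = positive, correct both times = retained, correct then incorrect = negative, incorrect both times = zero); the data are hypothetical.

```python
from collections import Counter
from typing import Sequence


def learning_types(pre: Sequence[int], post: Sequence[int]) -> Counter:
    """Count the four item-level learning types for one student (1 = correct, 0 = incorrect)."""
    counts: Counter = Counter()
    for a, b in zip(pre, post):
        if a == 0 and b == 1:
            counts["positive"] += 1  # learned during the course
        elif a == 1 and b == 1:
            counts["retained"] += 1  # known before and after
        elif a == 1 and b == 0:
            counts["negative"] += 1  # known before, missed after
        else:
            counts["zero"] += 1      # missed both times
    return counts


pre = [0, 1, 1, 0, 0]
post = [1, 1, 0, 0, 1]
c = learning_types(pre, post)
print(dict(c))  # {'positive': 2, 'retained': 1, 'negative': 1, 'zero': 1}
print(c["positive"] - c["negative"], sum(post) - sum(pre))  # 1 1
```

The decomposition explains why a value-added score can hide offsetting movement: posttest = positive + retained and pretest = retained + negative, so value-added = positive - negative, and a student with much positive and much negative learning can show little net gain.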
Langer, Michelle M.; Hill, Cheryl D.; Thissen, David; Burwinkle, Tasha M.; Varni, James W.; DeWalt, Darren A.
2008-01-01
Objective To demonstrate the value of item response theory (IRT) and differential item functioning (DIF) methods in examining a health-related quality of life (HRQOL) measure in children and adolescents. Study Design and Setting This illustration uses data from 5,429 children using the four subscales of the PedsQL™ 4.0 Generic Core Scales. The IRT model-based likelihood ratio test was used to detect and evaluate DIF between healthy children and children with a chronic condition. Results DIF was detected for a majority of items but cancelled out at the total test score level due to opposing directions of DIF. Post-hoc analysis indicated that this pattern of results may be due to multidimensionality. We discuss issues in detecting and handling DIF. Conclusion This paper describes how to perform DIF analyses in validating a questionnaire to ensure that scores have equivalent meaning across subgroups. It offers insight into ways information gained through the analysis can be used to evaluate an existing scale. PMID:18226750
Changes in the nutritional quality of fast-food items marketed at restaurants, 2010 v. 2013.
Soo, Jackie; Harris, Jennifer L; Davison, Kirsten K; Williams, David R; Roberto, Christina A
2018-03-27
To examine the nutritional quality of menu items promoted in four US fast-food restaurant chains (McDonald's, Burger King, Wendy's, Taco Bell) in 2010 and 2013. Menu items pictured on signs and menu boards were recorded at 400 fast-food restaurants across the USA. The Nutrient Profile Index (NPI) was used to calculate overall nutrition scores for items (higher scores indicate greater nutritional quality) and was dichotomized to denote healthier v. less healthy items. Changes over time in NPI scores and energy of promoted foods and beverages were analysed using linear regression. Four hundred fast-food restaurants (McDonald's, Burger King, Wendy's, Taco Bell; 100 locations per chain). NPI of fast-food items marketed at fast-food restaurants. Promoted foods and beverages on general menu boards and signs remained below the 'healthier' cut-off at both time points. On general menu boards, pictured items became modestly healthier from 2010 to 2013, increasing (mean (se)) by 3·08 (0·16) NPI score points (P<0·001) and decreasing (mean (se)) by 130 (15) kJ (31·1 (3·65) kcal; P<0·001). This pattern was evident in all chains except Taco Bell, where pictured items increased in energy. Foods and beverages pictured in the kids' section showed the greatest nutritional improvements. Although promoted foods on general menu boards and signs improved in nutritional quality, beverages remained the same or became worse. Foods, and to a lesser extent, beverages, promoted on menu boards and signs in fast-food restaurants showed limited improvements in nutritional quality in 2013 v. 2010.
[A study of behavior patterns between smokers and nonsmokers].
Kim, H S
1990-04-01
Clinical and epidemiologic studies of coronary heart disease (CHD) over the last three decades have repeatedly found associations between the prevalence of CHD and both behavioral attributes and cigarette smoking. The main purpose of this study was to help reduce major risk factors for coronary heart disease through smoking cessation and control of behavior pattern. The subjects were 120 smokers and 90 nonsmokers, all married men older than 30 years working in offices, who were surveyed by questionnaire from September 26 to October 6, 1989. The instrument was a self-administered 59-item questionnaire adapted from the Jenkins Activity Survey (JAS). Data were analysed with the SAS (Statistical Analysis System) program on a personal computer, using frequencies, chi-square tests, t-tests, ANOVA, and Pearson correlation coefficients. The 15 items were chosen as those with factor loadings above 0.3 in the factor analysis. In the first factor analysis, 19 factors were extracted, accounting for 86% of the total variance. When the number of factors was limited to three to derive the Jenkins classification, the three factors were Job-Involvement, Speed & Impatience, and Hard-Driving, comprising 21, 21, and 9 items, respectively. The results of this study were as follows: 1. The smoker and nonsmoker groups differed significantly in Job-Involvement (t = 5.7147, p < 0.0001), Speed & Impatience (t = 4.6756, p < 0.0001), Hard-Driving (t = 8.0822, p < 0.0001), and total type A behavior pattern score (t = 8.1224, p < 0.0001). 2. Type A behavior pattern scores did not differ significantly by the number of cigarettes smoked daily. 3. Type A behavior pattern scores did not differ significantly by duration of smoking.
It was concluded that smokers and nonsmokers differed significantly in type A behavior pattern, whereas neither the number of cigarettes smoked daily nor the duration of smoking was associated with significant differences. Therefore, adequate nursing interventions targeting the type A behavior pattern are needed to improve the effectiveness of education for smoking cessation.
Performance on large-scale science tests: Item attributes that may impact achievement scores
NASA Astrophysics Data System (ADS)
Gordon, Janet Victoria
Significant differences in achievement among ethnic groups persist on the eighth-grade science Washington Assessment of Student Learning (WASL). The WASL measures academic performance in science using both scenario and stand-alone question types. Previous research suggests that presenting target items connected to an authentic context, like scenario question types, can increase science achievement scores especially in underrepresented groups and thus help to close the achievement gap. The purpose of this study was to identify significant differences in performance between gender and ethnic subgroups by question type on the 2005 eighth-grade science WASL. MANOVA and ANOVA were used to examine relationships between gender and ethnic subgroups as independent variables with achievement scores on scenario and stand-alone question types as dependent variables. MANOVA revealed no significant effects for gender, suggesting that the 2005 eighth-grade science WASL was gender neutral. However, there were significant effects for ethnicity. ANOVA revealed significant effects for ethnicity and ethnicity by gender interaction in both question types. Effect sizes were negligible for the ethnicity by gender interaction. Large effect sizes between ethnicities on scenario question types became moderate to small effect sizes on stand-alone question types. This indicates the score advantage the higher performing subgroups had over the lower performing subgroups was not as large on stand-alone question types compared to scenario question types. A further comparison examined performance on multiple-choice items only within both question types. Similar achievement patterns between ethnicities emerged; however, achievement patterns between genders changed in boys' favor. Scenario question types appeared to register differences between ethnic groups to a greater degree than stand-alone question types. 
These differences may be attributable to individual differences in cognition, characteristics of test items themselves and/or opportunities to learn. Suggestions for future research are made.
Bifactor Structure for the Categorical Chinese Rosenberg Self-Esteem Scale.
Xu, Menglin; Leung, Shing-On
2016-10-11
Recently, the bifactor model was suggested for the latent structure of the Rosenberg Self-Esteem Scale (RSES). The present paper investigates (i) the differences among bifactor, bifactor negative and other models; (ii) the effects of treating data as categorical vs. continuous; (iii) whether a problematic item in the Chinese RSES should be removed; and (iv) whether the final scoring would be affected. With a sample of 1,734 grade 4-6 school pupils in Hong Kong, we used BIC differences in addition to the usual model fit indices, and found strong evidence for using the bifactor model (RMSEA = .052, 90% CI [.043, .062], CFI = .992, TLI = .984 for the 9-item RSES treated as categorical). Little difference was found in fit indices between treating data as categorical or continuous, but the factor loading patterns were better in the categorical case. Keeping the problematic item had little effect on fit indices, but led to an unexpected negative loading. The ranking of loadings within positive and negative items was the same across conditions, which has important implications for scoring. Loadings on the method effects in the bifactor models were all positive (p < .001), which differs from previous research. All models showed similar results on scoring and support the usual simple sum score in most practical settings.
Development and validation of parenting measures for body image and eating patterns in childhood.
Damiano, Stephanie R; Hart, Laura M; Paxton, Susan J
2015-01-01
Evidence-based parenting interventions are important in assisting parents to help their children develop healthy body image and eating patterns. To adequately assess the impact of parenting interventions, valid parent measures are required. The aim of this study was to develop and assess the validity and reliability of two new parent measures, the Parenting Intentions for Body image and Eating patterns in Childhood (Parenting Intentions BEC) and the Knowledge Test for Body image and Eating patterns in Childhood (Knowledge Test BEC). Participants were 27 professionals working in research or clinical treatment of body dissatisfaction or eating disorders, and 75 parents of children aged 2-6 years, who completed the measures via an online questionnaire. Seven scenarios were developed for the Parenting Intentions BEC to describe common experiences about the body and food that parents might need to respond to in front of their child. Parents ranked four behavioural intentions, derived from the current literature on parenting risk factors for body dissatisfaction and unhealthy eating patterns in children. Two subscales were created, one representing positive behavioural intentions, the other negative behavioural intentions. After piloting a larger pool of items, 13 statements were used to construct the Knowledge Test BEC. These were designed to be factual statements about the influence of parent language, media, family meals, healthy eating, and self-esteem on child eating and body image. The validity of both measures was tested by comparing parent and professional scores, and reliability was assessed by comparing parent scores over two testing occasions. Compared with parents, professionals reported significantly higher scores on the Positive Intentions subscale and significantly lower scores on the Negative Intentions subscale of the Parenting Intentions BEC, confirming the discriminant validity of six out of the seven scenarios.
Test-retest reliability was also confirmed as parent scores on the two Parenting Intentions subscales did not differ over time. Eleven out of the 13 Knowledge Test items demonstrated sufficient discriminant validity and test-retest reliability. Overall, results indicated that the six-scenario Parenting Intentions BEC and the 11-item Knowledge Test BEC are valid and reliable measures for parents of young children.
Sharp, J L; Gough, K; Pascoe, M C; Drosdowsky, A; Chang, V T; Schofield, P
2018-07-01
The Memorial Symptom Assessment Scale Short Form (MSAS-SF) is a widely used symptom assessment instrument. Patients who self-complete the MSAS-SF have difficulty following the two-part response format, resulting in incorrectly completed responses. We describe modifications to the response format to improve usability, and rational scoring rules for incorrectly completed items. The modified MSAS-SF was completed by 311 women in our Peer and Nurse support Trial to Assist women in Gynaecological Oncology (the PeNTAGOn study). Descriptive statistics were used to summarise completion of the modified MSAS-SF and to provide symptom statistics before and after applying the rational scoring rules. Spearman's correlations with the Functional Assessment for Cancer Therapy-General (FACT-G) and Hospital Anxiety and Depression Scale (HADS) were assessed. Correct completion of the modified MSAS-SF items ranged from 91.5 to 98.7%. The rational scoring rules increased the percentage of usable responses by 4% on average across all symptoms. MSAS-SF item statistics were similar with and without the scoring rules. The pattern of correlations with FACT-G and HADS was compatible with prior research. The modified MSAS-SF was usable for self-completion and responses demonstrated validity. The rational scoring rules can minimise loss of data from incorrectly completed responses. Further investigation is recommended.
Godin, Judith; Keefe, Janice; Andrew, Melissa K
2017-04-01
Missing values are commonly encountered on the Mini Mental State Examination (MMSE), particularly when administered to frail older people. This presents challenges for MMSE scoring in research settings. We sought to describe missingness in MMSEs administered in long-term-care facilities (LTCF) and to compare and contrast approaches to dealing with missing items. As part of the Care and Construction project in Nova Scotia, Canada, LTCF residents completed an MMSE. Different methods of dealing with missing values (e.g., use of raw scores, raw scores/number of items attempted, scale-level multiple imputation [MI], and blended approaches) were compared with item-level MI. The MMSE was administered to 320 residents living in 23 LTCF. The sample was predominantly female (73%), and 38% of participants were aged >85 years. At least one item was missing from 122 (38.2%) of the MMSEs. Data were not Missing Completely at Random (MCAR), χ²(1110) = 1,351, p < 0.001. Using raw scores for those missing <6 items in combination with scale-level MI resulted in the regression coefficients and standard errors closest to item-level MI. Patterns of missing items often suggest systematic problems, such as trouble with manual dexterity, literacy, or visual impairment. While these observations may be relatively easy to take into account in clinical settings, non-random missingness presents challenges for research and must be considered in statistical analyses. We present suggestions for dealing with missing MMSE data based on the extent of missingness and the goal of analyses. Copyright © 2016 The Authors. Production and hosting by Elsevier B.V. All rights reserved.
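The non-imputation parts of the blended approach can be sketched for a single record: the raw score, the raw score per item attempted (rescaled to the full test), and a flag that routes records missing 6 or more items to imputation, following the "<6 items" rule reported above. This is a minimal sketch assuming 30 one-point items; the multiple-imputation step itself is not implemented, and the function and parameter names are hypothetical.

```python
from typing import Dict, Optional, Sequence


def mmse_scores(items: Sequence[Optional[int]], impute_at: int = 6) -> Dict[str, object]:
    """Summarise one MMSE record in which missing items are None.

    Assumes 30 one-point items. Records missing `impute_at` or more items
    are flagged for (scale-level) imputation instead of raw scoring.
    """
    answered = [x for x in items if x is not None]
    n_missing = len(items) - len(answered)
    raw = sum(answered)
    # Raw score per item attempted, rescaled to the full number of items.
    prorated = raw / len(answered) * len(items) if answered else None
    return {
        "raw": raw,
        "n_missing": n_missing,
        "prorated": prorated,
        "needs_imputation": n_missing >= impute_at,
    }


record = [1] * 24 + [0] * 3 + [None] * 3
print(mmse_scores(record))
```

Proration assumes the missing items would have been answered at the same rate as the attempted ones, which is exactly what the non-random missingness described above calls into question.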
Johnson, Timothy R; Kuhn, Kristine M
2015-12-01
This paper introduces the ltbayes package for R. This package includes a suite of functions for investigating the posterior distribution of latent traits of item response models. These include functions for simulating realizations from the posterior distribution, profiling the posterior density or likelihood function, calculation of posterior modes or means, Fisher information functions and observed information, and profile likelihood confidence intervals. Inferences can be based on individual response patterns or sets of response patterns such as sum scores. Functions are included for several common binary and polytomous item response models, but the package can also be used with user-specified models. This paper introduces some background and motivation for the package, and includes several detailed examples of its use.
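ltbayes itself is an R package. The Python sketch below only illustrates the kind of computation it automates: summarizing the posterior distribution of a latent trait θ for one response pattern. It assumes a two-parameter logistic (2PL) model, a standard normal prior, and a simple grid approximation. None of the function names correspond to ltbayes functions, and the item parameters are invented.

```python
import math
from typing import Sequence, Tuple


def p_2pl(theta: float, a: float, b: float) -> float:
    """2PL probability of a correct (1) response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def posterior_summary(responses: Sequence[int], a: Sequence[float],
                      b: Sequence[float], lo: float = -6.0, hi: float = 6.0,
                      n: int = 1201) -> Tuple[float, float, float]:
    """Return (posterior mode, posterior mean, posterior SD) of theta
    under a N(0, 1) prior, approximated on an evenly spaced grid."""
    step = (hi - lo) / (n - 1)
    grid = [lo + i * step for i in range(n)]
    log_post = []
    for t in grid:
        lp = -0.5 * t * t  # log N(0, 1) density up to a constant
        for x, ai, bi in zip(responses, a, b):
            p = p_2pl(t, ai, bi)
            lp += math.log(p) if x == 1 else math.log(1.0 - p)
        log_post.append(lp)
    peak = max(log_post)
    w = [math.exp(lp - peak) for lp in log_post]  # subtract max for stability
    total = sum(w)
    mean = sum(t * wi for t, wi in zip(grid, w)) / total
    sd = math.sqrt(sum((t - mean) ** 2 * wi for t, wi in zip(grid, w)) / total)
    mode = grid[log_post.index(peak)]
    return mode, mean, sd


# Hypothetical 4-item test: discriminations a and difficulties b.
a = [1.2, 0.8, 1.5, 1.0]
b = [-1.0, 0.0, 0.5, 1.5]
mode, mean, sd = posterior_summary([1, 1, 0, 0], a, b)
print(round(mode, 2), round(mean, 2), round(sd, 2))
```

A grid is adequate for a single latent trait. For sum-score inference, as the abstract mentions, one would instead sum the likelihood over all patterns that share the same total.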
Building an Evaluation Scale using Item Response Theory.
Lalor, John P; Wu, Hao; Yu, Hong
2016-11-01
Evaluation of NLP methods requires testing against a previously vetted gold-standard test set and reporting standard metrics (accuracy/precision/recall/F1). The current assumption is that all items in a given test set are equal with regard to difficulty and discriminating power. We propose Item Response Theory (IRT) from psychometrics as an alternative means for gold-standard test-set generation and NLP system evaluation. IRT is able to describe characteristics of individual items (their difficulty and discriminating power) and can account for these characteristics in its estimation of human intelligence or ability for an NLP task. In this paper, we demonstrate IRT by generating a gold-standard test set for Recognizing Textual Entailment. By collecting a large number of human responses and fitting our IRT model, we show that our IRT model compares NLP systems against the performance of a human population and is able to provide more insight into system performance than standard evaluation metrics. We show that a high accuracy score does not always imply a high IRT score, which depends on the item characteristics and the response pattern.
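The closing claim, that equal accuracy need not mean equal IRT scores, can be illustrated with a small two-parameter logistic (2PL) example. The item parameters below are invented, and the paper's actual model and estimation procedure may differ. Under the 2PL model the maximum-likelihood θ solves Σ aᵢPᵢ(θ) = Σ aᵢxᵢ. Correct answers on highly discriminating items therefore carry more weight. Because the left side increases with θ, bisection finds the root.

```python
import math
from typing import Sequence


def p_2pl(theta: float, a: float, b: float) -> float:
    """2PL probability of a correct (1) response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def mle_theta(responses: Sequence[int], a: Sequence[float], b: Sequence[float],
              lo: float = -10.0, hi: float = 10.0, tol: float = 1e-10) -> float:
    """Maximum-likelihood theta under the 2PL model, found by bisection
    on the score equation sum(a_i * P_i(theta)) = sum(a_i * x_i)."""
    target = sum(ai * x for ai, x in zip(a, responses))
    if target <= 0 or target >= sum(a):
        raise ValueError("MLE is infinite for all-correct or all-incorrect patterns")

    def score(theta: float) -> float:
        return sum(ai * p_2pl(theta, ai, bi) for ai, bi in zip(a, b)) - target

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


DISC = [2.0, 2.0, 0.5, 0.5]    # invented discriminations
DIFF = [1.0, 1.0, -1.0, -1.0]  # invented difficulties
theta_hard = mle_theta([1, 1, 0, 0], DISC, DIFF)  # right on hard, discriminating items
theta_easy = mle_theta([0, 0, 1, 1], DISC, DIFF)  # right on easy, weakly discriminating items
print(round(theta_hard, 2), round(theta_easy, 2))
```

Both patterns have 50% accuracy. The first yields a θ estimate of about 1.7, while the second falls slightly below zero.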
Neural Overlap in Item Representations Across Episodes Impairs Context Memory.
Kim, Ghootae; Norman, Kenneth A; Turk-Browne, Nicholas B
2018-06-12
We frequently encounter the same item in different contexts, and when that happens, memories of earlier encounters can get reactivated. We examined how existing memories are changed as a result of such reactivation. We hypothesized that when an item's initial and subsequent neural representations overlap, this allows the initial item to become associated with novel contextual information, interfering with later retrieval of the initial context. Specifically, we predicted a negative relationship between representational similarity across repeated experiences of an item and subsequent source memory for the initial context. We tested this hypothesis in an fMRI study, in which objects were presented multiple times during different tasks. We measured the similarity of the neural patterns in lateral occipital cortex that were elicited by the first and second presentations of objects, and related this neural overlap score to subsequent source memory. Consistent with our hypothesis, greater item-specific pattern similarity was linked to worse source memory for the initial task. In contrast, greater reactivation of the initial context was associated with better source memory. Our findings suggest that the influence of novel experiences on an existing context memory depends on how reliably a shared component (i.e., item) is represented across these episodes.
NASA Astrophysics Data System (ADS)
Armstrong-Hall, Judy Gail
The purpose of this study was to apply the Hunter-Gatherer Theory of sex differences in spatial skills to eighth-grade students' responses to individual questions on the Science component of the Michigan Educational Assessment Program (MEAP), to determine whether sex bias was inherent in the test. The Hunter-Gatherer Theory on Spatial Sex Differences, an original theory, proposes a spatial dimorphism: the female spatial skill is pattern recall of unconnected items, and male spatial skills require mental movement. This is the first attempt to apply the Hunter-Gatherer Theory on Spatial Sex Differences to a standardized test. The overall hypothesis was that the Hunter-Gatherer Theory of Spatial Sex Differences would predict that males would perform better on problems involving mental movement and females would do better on problems involving the pattern recall of unconnected items. Responses of 5,155 eighth-grade students to questions on the 1994-95 MEAP requiring the use of male and female spatial skills were analyzed. A panel composed of five educators and a theory developer determined which test items involved the use of male and female spatial skills. A MANOVA comparing male and female correct scores in a random sample of 20% of the 5,155 students was statistically significant, with males having higher scores on male spatial skills items and females having higher scores on female spatial skills items. Pearson product-moment correlation analyses produced a positive correlation for both male and female performance on both types of spatial skills. The Hunter-Gatherer Theory of Spatial Sex Differences appears to be able to predict that males could perform better on problems involving mental movement and females could perform better on problems involving the pattern recall of unconnected items.
Recommendations for further research included: examination of male/female spatial skill differences at the early elementary and high school levels to determine the impact of gender on difficulties in solving spatial problems; investigation of dominant female spatial skills in students diagnosed with ADHD; and a study of the effects of teaching male spatial skills to female students, starting in early elementary school, on standardized testing.
A knowledge-based theory of rising scores on "culture-free" tests.
Fox, Mark C; Mitchum, Ainsley L
2013-08-01
Secular gains in intelligence test scores have perplexed researchers since they were documented by Flynn (1984, 1987). Gains are most pronounced on abstract, so-called culture-free tests, prompting Flynn (2007) to attribute them to problem-solving skills availed by scientifically advanced cultures. We propose that recent-born individuals have adopted an approach to analogy that enables them to infer higher level relations requiring roles that are not intrinsic to the objects that constitute initial representations of items. This proposal is translated into item-specific predictions about differences between cohorts in pass rates and item-response patterns on the Raven's Matrices (Flynn, 1987), a seemingly culture-free test that registers the largest Flynn effect. Consistent with predictions, archival data reveal that individuals born around 1940 are less able to map objects at higher levels of relational abstraction than individuals born around 1990. Polytomous Rasch models verify predicted violations of measurement invariance, as raw scores are found to underestimate the number of analogical rules inferred by members of the earlier cohort relative to members of the later cohort who achieve the same overall score. The work provides a plausible cognitive account of the Flynn effect, furthers understanding of the cognition of matrix reasoning, and underscores the need to consider how test-takers select item responses. PsycINFO Database Record (c) 2013 APA, all rights reserved.
What Can We Learn about Auditory Processing from Adult Hearing Questionnaires?
Bamiou, Doris-Eva; Iliadou, Vasiliki Vivian; Zanchetta, Sthella; Spyridakou, Chrysa
2015-01-01
Questionnaires addressing auditory disability may identify and quantify specific symptoms in adult patients with listening difficulties. (1) To assess the validity of the Speech, Spatial, and Qualities of Hearing Scale (SSQ), the (Modified) Amsterdam Inventory for Auditory Disability (mAIAD), and the Hyperacusis Questionnaire (HYP) in adult patients experiencing listening difficulties in the presence of a normal audiogram. (2) To examine which individual questionnaire items yield the worst scores in clinical participants with an auditory processing disorder (APD). A prospective correlational analysis study. Clinical participants (N = 58) referred for assessment because of listening difficulties in the presence of normal audiometric thresholds to audiology/ear, nose, and throat or audiovestibular medicine clinics. Normal control participants (N = 30). The mAIAD, HYP, and the SSQ were administered to a clinical population of nonneurological adults who were referred for auditory processing (AP) assessment because of hearing complaints, in the presence of normal audiogram and cochlear function, and to a sample of age-matched normal-hearing controls, before the AP testing. Clinical participants with abnormal results in at least one ear and in at least two tests of AP (at least one of which was nonspeech) were classified as clinical APD (N = 39), and the remainder (16 of whom had a single test abnormality) as clinical non-APD (N = 19). The SSQ correlated strongly with the mAIAD and the HYP, and correlation was similar within the clinical group and the normal controls. All questionnaire total scores and subscores (except sound distinction of mAIAD) were significantly worse in the clinical APD versus the normal group, while questionnaire total scores and most subscores indicated greater listening difficulties for the clinical non-APD versus the normal subgroups.
Overall, the clinical non-APD group tended to give better scores than the APD in all questionnaires administered. Correlation was strong for the worse-ear gaps-in-noise threshold with the SSQ, mAIAD, and HYP; strong to moderate for the speech in babble and left-ear dichotic digit test scores (at p < 0.01); and weak to moderate for the remaining AP tests except the frequency pattern test that did not correlate. The worse-scored items in all three questionnaires concerned speech-in-noise questions. This is similar to worse-scored items by hearing-impaired participants as reported in the literature. Worse-scored items of the clinical group also included quality aspects of listening questions from the SSQ, which most likely pertain to cognitive aspects of listening, such as ability to ignore other sounds and listening effort. Hearing questionnaires may help assess symptoms of adults with APD. The listening difficulties and needs of adults with APD to some extent overlap with those of hearing-impaired listeners, but there are significant differences. The correlation of the gaps-in-noise and duration pattern (but not frequency pattern) tests with the questionnaire scores indicates that temporal processing deficits may play an important role in clinical presentation. American Academy of Audiology.
Peyre, Hugo; Leplège, Alain; Coste, Joël
2011-03-01
Missing items are common in quality of life (QoL) questionnaires and present a challenge for research in this field. It remains unclear which of the various methods proposed to deal with missing data performs best in this context. We compared personal mean score, full information maximum likelihood, multiple imputation, and hot deck techniques using various realistic simulation scenarios of item missingness in QoL questionnaires constructed within the framework of classical test theory. Samples of 300 and 1,000 subjects were randomly drawn from the 2003 INSEE Decennial Health Survey (of 23,018 subjects representative of the French population and having completed the SF-36) and various patterns of missing data were generated according to three different item non-response rates (3, 6, and 9%) and three types of missing data (Little and Rubin's "missing completely at random," "missing at random," and "missing not at random"). The missing data methods were evaluated in terms of accuracy and precision for the analysis of one descriptive and one association parameter for three different scales of the SF-36. For all item non-response rates and types of missing data, multiple imputation and full information maximum likelihood appeared superior to the personal mean score and especially to hot deck in terms of accuracy and precision; however, the use of personal mean score was associated with insignificant bias (relative bias <2%) in all studied situations. Whereas multiple imputation and full information maximum likelihood are confirmed as reference methods, the personal mean score appears nonetheless appropriate for dealing with items missing from completed SF-36 questionnaires in most situations of routine use. These results can reasonably be extended to other questionnaires constructed according to classical test theory.
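The personal mean score method replaces each missing item in a scale with the mean of the items the respondent did answer in that scale. The Python sketch below assumes a respondent must answer at least half of the scale's items to be scored, the common "half rule". Treat that threshold as an assumption: the abstract does not state the rule used in the simulations. The function names are hypothetical.

```python
from typing import List, Optional, Sequence


def personal_mean_impute(items: Sequence[Optional[float]],
                         min_answered_frac: float = 0.5) -> Optional[List[float]]:
    """Fill missing items (None) with the respondent's mean of answered items.
    Returns None if too few items were answered to score the scale."""
    answered = [x for x in items if x is not None]
    if not answered or len(answered) < min_answered_frac * len(items):
        return None
    person_mean = sum(answered) / len(answered)
    return [person_mean if x is None else x for x in items]


def scale_score(items: Sequence[Optional[float]],
                min_answered_frac: float = 0.5) -> Optional[float]:
    """Raw scale score after personal mean imputation, or None if unscorable."""
    filled = personal_mean_impute(items, min_answered_frac)
    return None if filled is None else sum(filled)


print(scale_score([3, None, 1, 2]))        # 8.0
print(scale_score([None, None, None, 1]))  # None
```

Summing the imputed items is equivalent to prorating the mean of the answered items to the full scale length. This explains why the method is so simple to apply in routine use.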
Development of an Inconsistent Responding Scale for the Triarchic Psychopathy Measure.
Mowle, Elyse N; Kelley, Shannon E; Edens, John F; Donnellan, M Brent; Smith, Shannon Toney; Wygant, Dustin B; Sellbom, Martin
2017-08-01
Inconsistent or careless responding to self-report measures is estimated to occur in approximately 10% of university research participants and may be even more common among offender populations. Inconsistent responding may be a result of a number of factors including inattentiveness, reading or comprehension difficulties, and cognitive impairment. Many stand-alone personality scales used in applied and research settings, however, do not include validity indicators to help identify inattentive response patterns. Using multiple archival samples, the current study describes the development of an inconsistent responding scale for the Triarchic Psychopathy Measure (TriPM; Patrick, 2010), a widely used self-report measure of psychopathy. We first identified pairs of correlated TriPM items in a derivation sample (N = 2,138) and then created a total score based on the sum of the absolute value of the differences for each item pair. The resulting scale, the Triarchic Assessment Procedure for Inconsistent Responding (TAPIR), strongly differentiated between genuine TriPM protocols and randomly generated TriPM data (N = 1,000), as well as between genuine protocols and those in which 50% of the original data were replaced with random item responses. TAPIR scores demonstrated fairly consistent patterns of association with some theoretically relevant correlates (e.g., inconsistency scales embedded in other personality inventories), although not others (e.g., measures of conscientiousness) across our cross-validation samples. Tentative TAPIR cut scores that may discriminate between attentively and carelessly completed protocols are presented. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
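The abstract defines the TAPIR total as the sum, over pairs of correlated items, of the absolute difference between the two responses. The Python sketch below implements that definition and compares it against uniformly random protocols. The item pairs are hypothetical, because the actual TAPIR pairs are not listed in the abstract. The 58-item, 4-option (0-3) coding of the random protocols is also an assumption.

```python
import random
from typing import Dict, List, Tuple


def inconsistency_score(responses: Dict[int, int],
                        pairs: List[Tuple[int, int]]) -> int:
    """Sum of |difference| between the two items of each correlated pair."""
    return sum(abs(responses[i] - responses[j]) for i, j in pairs)


def random_protocol(n_items: int, rng: random.Random,
                    n_options: int = 4) -> Dict[int, int]:
    """Uniformly random responses coded 0..n_options-1, keyed by item number."""
    return {item: rng.randrange(n_options) for item in range(1, n_items + 1)}


# HYPOTHETICAL item pairs; not the published TAPIR pairs.
PAIRS = [(1, 2), (3, 4), (5, 6)]

genuine = {1: 3, 2: 3, 3: 0, 4: 1, 5: 2, 6: 2}
print(inconsistency_score(genuine, PAIRS))  # 1

rng = random.Random(42)
random_scores = [inconsistency_score(random_protocol(58, rng), PAIRS)
                 for _ in range(1000)]
print(sum(random_scores) / len(random_scores))
```

On a 4-option scale, random responding has an expected absolute difference of 1.25 points per pair (3.75 for the three pairs here). Attentive respondents to genuinely correlated items should score well below that, and this gap is what makes the index discriminating.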
Sensory integration functions of children with cochlear implants.
Koester, AnjaLi Carrasco; Mailloux, Zoe; Coleman, Gina Geppert; Mori, Annie Baltazar; Paul, Steven M; Blanche, Erna; Muhs, Jill A; Lim, Deborah; Cermak, Sharon A
2014-01-01
OBJECTIVE. We investigated sensory integration (SI) function in children with cochlear implants (CIs). METHOD. We analyzed deidentified records from 49 children ages 7 mo to 83 mo with CIs. Records included Sensory Integration and Praxis Tests (SIPT), Sensory Processing Measure (SPM), Sensory Profile (SP), Developmental Profile 3 (DP-3), and Peabody Developmental Motor Scales (PDMS), with scores depending on participants' ages. We compared scores with normative population mean scores and with previously identified patterns of SI dysfunction. RESULTS. One-sample t tests revealed significant differences between children with CIs and the normative population on the majority of the SIPT items associated with the vestibular and proprioceptive bilateral integration and sequencing (VPBIS) pattern. Available scores for children with CIs on the SPM, SP, DP-3, and PDMS indicated generally typical ratings. CONCLUSION. SIPT scores in a sample of children with CIs reflected the VPBIS pattern of SI dysfunction, demonstrating the need for further examination of SI functions in children with CIs during occupational therapy assessment and intervention planning. Copyright © 2014 by the American Occupational Therapy Association, Inc.
Baribeau, Danielle A; Doyle-Thomas, Krissy A R; Dupuis, Annie; Iaboni, Alana; Crosbie, Jennifer; McGinn, Holly; Arnold, Paul D; Brian, Jessica; Kushki, Azadeh; Nicolson, Rob; Schachar, Russell J; Soreni, Noam; Szatmari, Peter; Anagnostou, Evdokia
2015-06-01
Several neurodevelopmental disorders are associated with social processing deficits. The objective of this study was to compare patterns of social perception abilities across obsessive-compulsive disorder (OCD), attention-deficit/hyperactivity disorder (ADHD), autism spectrum disorder (ASD), and control participants. A total of 265 children completed the Reading the Mind in the Eyes Test-Child Version (RMET). Parents or caregivers completed established trait/symptom scales. The predicted percentage of accuracy on the RMET was compared across disorders and by item difficulty and item valence (i.e., positive/negative/neutral mental states), then analyzed for associations with trait/symptom scores. The percentage of correct RMET scores varied significantly between diagnostic groups (p < .0001). On pairwise group comparisons controlling for age and sex, children with ADHD and ASD scored lower than the other groups (p < .0001). When IQ was also controlled for in the model, participants with OCD performed better than controls (p < .001), although differences between other groups were less pronounced. Participants with ASD scored lowest on easy items. Those with ASD and ADHD scored significantly lower than other groups on items with positive valence (p < .01). Greater social communication impairment and hyperactivity/impulsivity, but not OCD traits/symptoms, were associated with lower scores on the RMET, irrespective of diagnosis. Social perception abilities in neurodevelopmental disorders exist along a continuum. Children with ASD have the greatest deficits, whereas children with OCD may be hypersensitive to social information. Social communication deficits and hyperactive/impulsive traits are associated with impaired social perception abilities; these findings highlight overlapping cognitive and behavioral manifestations across disorders. Copyright © 2015 American Academy of Child and Adolescent Psychiatry. Published by Elsevier Inc. All rights reserved.
Van Lerbeirghe, J; Van Lerbeirghe, J; Van Schaeybroeck, P; Robijn, H; Rasschaert, R; Sys, J; Parlevliet, T; Hallaert, G; Van Wambeke, P; Depreitere, B
2018-01-01
The core outcome measures index (COMI) is a validated multidimensional instrument for assessing patient-reported outcome in patients with back problems. The aim of the present study is to translate the COMI into Dutch and validate it for use in native Dutch speakers with low back pain. The COMI was translated into Dutch following established guidelines and avoiding region-specific terminology. A total of 89 Dutch-speaking patients with low back pain were recruited from 8 centers, located in the Dutch-speaking part of Belgium. Patients completed a questionnaire booklet including the validated Dutch version of the Roland Morris disability questionnaire, EQ-5D, the WHOQoL-Bref, the Numeric Rating Scale (NRS) for pain, and the Dutch translation of the COMI. Two weeks later, patients completed the Dutch COMI translation again, with a transition scale assessing changes in their condition. The patterns of correlations between the individual COMI items and the validated reference questionnaires were comparable to those reported for other validated language versions of the COMI. The intraclass correlation for the COMI summary score was 0.90 (95% CI 0.84-0.94). It was 0.75 and 0.70 for the back and leg pain score, respectively. The minimum detectable change for the COMI summary score was 1.74. No significant differences were observed between repeated scores of individual COMI items or for the summary score. The reproducibility of the Dutch translation of the COMI is comparable to that of other validated spine outcome measures. The COMI items correlate well with the established item-specific scores. The Dutch translation of the COMI, validated by this work, is a reliable and valuable tool for spine centers treating Dutch-speaking patients and can be used in registries and outcome studies.
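The abstract reports an intraclass correlation and a minimum detectable change but not the formula linking them. The Python sketch below uses one widely used convention, which is an assumption here rather than the paper's stated method: SEM = SD·√(1 − ICC) and MDC95 = 1.96·√2·SEM. The SD in the example is hypothetical; the abstract does not report one.

```python
import math


def sem_from_icc(sd: float, icc: float) -> float:
    """Standard error of measurement from a test-retest ICC."""
    return sd * math.sqrt(1.0 - icc)


def mdc95(sd: float, icc: float, z: float = 1.96) -> float:
    """95% minimum detectable change for the difference of two scores."""
    return z * math.sqrt(2.0) * sem_from_icc(sd, icc)


# HYPOTHETICAL between-subject SD of 3.0 score points, with ICC = 0.90.
print(round(mdc95(sd=3.0, icc=0.90), 2))  # 2.63
```

The √2 factor reflects that a change score combines the measurement error of two administrations.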
Individual Differences in Base Rate Neglect: A Fuzzy Processing Preference Index
Wolfe, Christopher R.; Fisher, Christopher R.
2013-01-01
Little is known about individual differences in integrating numeric base rates and qualitative text in making probability judgments. Fuzzy-Trace Theory predicts a preference for fuzzy processing. We conducted six studies to develop the FPPI, a reliable and valid instrument assessing individual differences in this fuzzy processing preference. It consists of 19 probability estimation items plus 4 "M-Scale" items that distinguish simple pattern matching from "base rate respect." Cronbach's alpha was consistently above 0.90. Validity is suggested by significant correlations between FPPI scores and three other measures: "Rule Based" Process Dissociation Procedure scores; the number of conjunction fallacies in joint probability estimation; and logic index scores on syllogistic reasoning. Replicating norms collected in a university study with a web-based study produced negligible differences in FPPI scores, indicating robustness. The predicted relationships between individual differences in base rate respect and both conjunction fallacies and syllogistic reasoning were partially replicated in two web-based studies. PMID:23935255
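The reliability figure above is Cronbach's alpha: α = k/(k − 1) · (1 − Σσᵢ²/σ²ₜₒₜₐₗ), where k is the number of items. The abstract does not say how the FPPI items are scored, so the Python sketch below simply applies the standard formula to a small invented persons × items matrix.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha; rows[p][i] is the score of person p on item i."""
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1.0 - sum(item_vars) / total_var)


# Invented data: 3 respondents x 2 items.
print(round(cronbach_alpha([[1, 2], [2, 1], [3, 3]]), 3))  # 0.667
```

Any consistent variance denominator works, because only the ratio of item variances to total variance enters the formula.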
LeBouthillier, Daniel M; Thibodeau, Michel A; Alberts, Nicole M; Hadjistavropoulos, Heather D; Asmundson, Gordon J G
2015-04-01
Individuals with medical conditions are likely to have elevated health anxiety; however, research has not demonstrated how medical status impacts response patterns on health anxiety measures. Measurement bias can undermine the validity of a questionnaire by overestimating or underestimating scores in groups of individuals. We investigated whether the Short Health Anxiety Inventory (SHAI), a widely-used measure of health anxiety, exhibits medical condition-based bias on item and subscale levels, and whether the SHAI subscales adequately assess the health anxiety continuum. Data were from 963 individuals with diabetes, breast cancer, or multiple sclerosis, and 372 healthy individuals. Mantel-Haenszel tests and item characteristic curves were used to classify the severity of item-level differential item functioning in all three medical groups compared to the healthy group. Test characteristic curves were used to assess scale-level differential item functioning and whether the SHAI subscales adequately assess the health anxiety continuum. Nine out of 14 items exhibited differential item functioning. Two items exhibited differential item functioning in all medical groups compared to the healthy group. In both Thought Intrusion and Fear of Illness subscales, differential item functioning was associated with mildly deflated scores in medical groups with very high levels of the latent traits. Fear of Illness items poorly discriminated between individuals with low and very low levels of the latent trait. While individuals with medical conditions may respond differentially to some items, clinicians and researchers can confidently use the SHAI with a variety of medical populations without concern of significant bias. Copyright © 2015 Elsevier Inc. All rights reserved.
Hollis, Geoff
2018-04-01
Best-worst scaling is a judgment format in which participants are presented with a set of items and have to choose the superior and inferior items in the set. Best-worst scaling generates a large quantity of information per judgment because each judgment allows for inferences about the rank value of all unjudged items. This property of best-worst scaling makes it a promising judgment format for research in psychology and natural language processing concerned with estimating the semantic properties of tens of thousands of words. A variety of different scoring algorithms have been devised in the previous literature on best-worst scaling. However, due to problems of computational efficiency, these scoring algorithms cannot be applied efficiently to cases in which thousands of items need to be scored. New algorithms are presented here for converting responses from best-worst scaling into item scores for thousands of items (many-item scoring problems). These scoring algorithms are validated through simulation and empirical experiments, and considerations related to noise, the underlying distribution of true values, and trial design are identified that can affect the relative quality of the derived item scores. The newly introduced scoring algorithms consistently outperformed scoring algorithms used in the previous literature on scoring many-item best-worst data.
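For orientation, the Python sketch below implements the simplest counting score from the best-worst literature: times chosen best minus times chosen worst, divided by times shown. This is explicitly not one of the new algorithms the paper introduces. The trial format and function name are our own.

```python
from collections import defaultdict
from typing import Dict, Iterable, Sequence, Tuple

Trial = Tuple[Sequence[str], str, str]  # (items shown, best choice, worst choice)


def best_worst_counts(trials: Iterable[Trial]) -> Dict[str, float]:
    """Best-minus-worst count per item, normalized by its number of appearances."""
    best: Dict[str, int] = defaultdict(int)
    worst: Dict[str, int] = defaultdict(int)
    shown: Dict[str, int] = defaultdict(int)
    for items, b, w in trials:
        for item in items:
            shown[item] += 1
        best[b] += 1
        worst[w] += 1
    return {item: (best[item] - worst[item]) / n for item, n in shown.items()}


trials = [(("a", "b", "c", "d"), "a", "d"),
          (("a", "b", "c", "e"), "b", "e")]
print(best_worst_counts(trials))  # {'a': 0.5, 'b': 0.5, 'c': 0.0, 'd': -1.0, 'e': -1.0}
```

Counting is fast even for many items, but it ignores the implied ordering among the unchosen middle items. Exploiting that information is part of what more elaborate scoring methods aim to do.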
Adeniyi, A A; Adegbite, K O; Braimoh, M O; Ogunbanjo, B O
2013-03-01
Satisfaction is important in dental care because satisfaction with care alleviates dental anxiety, influences patients' compliance and is an important indicator of quality of care. This study was designed to determine the factors that contribute to satisfaction with dental care among patients attending the Lagos State University (LASUTH) Dental Clinic. A cross-sectional, descriptive, questionnaire-based survey was conducted among adult patients attending the LASUTH Dental Clinic. The questionnaire, a modification of the Dental Satisfaction Questionnaire (DSQ), contained 19 items on a Likert-type scale with scores ranging from 0 to 4. The scores obtained for satisfaction with the dental services ranged from 19 to 75 with a mean of 55.30 ± 11.55. The majority of respondents (305 or 87.4%) were satisfied with the services received. The items generating the highest and lowest mean satisfaction scores were cleanliness/comfort of the facility and cost of services, respectively. Long waiting time was the item respondents liked least about the services. There was a statistically significant relationship between the items assessing communication and respondents' gender (p = 0.001). The relationships between the overall satisfaction score and gender (p = 0.233), age category (p = 0.842) and educational status (p = 0.565) were not statistically significant. The results indicate a high level of satisfaction with services provided at the LASUTH Dental Clinic. However, there is a need for improvement in communication with patients and reduction in waiting time.
NASA Astrophysics Data System (ADS)
Greenberg, Ariela Caren
Differential item functioning (DIF) and differential distractor functioning (DDF) are methods used to screen for item bias (Camilli & Shepard, 1994; Penfield, 2008). Using an applied empirical example, this mixed-methods study examined the congruency and relationship of DIF and DDF methods in screening multiple-choice items. Data for Study I were drawn from item responses of 271 female and 236 male low-income children on a preschool science assessment. Item analyses employed a common statistical approach, the Mantel-Haenszel log-odds ratio (MH-LOR), to detect DIF in dichotomously scored items (Holland & Thayer, 1988), and extended the approach to identify DDF (Penfield, 2008). Findings demonstrated that using MH-LOR to detect DIF and DDF supported the theoretical relationship that the magnitude and form of DIF are dependent on the DDF effects, and demonstrated the advantages of studying DIF and DDF in multiple-choice items. A total of 4 items with DIF and DDF and 5 items with only DDF were detected. Study II incorporated an item content review, an important but often overlooked and under-published step of DIF and DDF studies (Camilli & Shepard). Interviews with 25 female and 22 male low-income preschool children and an expert review helped to interpret the DIF and DDF results and their comparison, and determined that a content review process of studied items can reveal reasons for potential item bias that are often congruent with the statistical results. Patterns emerged and are discussed in detail. The quantitative and qualitative analyses were conducted in an applied framework of examining the validity of the preschool science assessment scores for evaluating science programs serving low-income children; however, the techniques can be generalized for use with measures across various disciplines of research.
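The Mantel-Haenszel log-odds ratio pools one 2×2 table (group × correct/incorrect) per stratum of examinees matched on ability, usually total score: α_MH = Σₖ(AₖDₖ/Nₖ) / Σₖ(BₖCₖ/Nₖ). The Python sketch below shows the DIF version for a single dichotomous item. The DDF extension (Penfield, 2008) is not shown, and the tuple layout and function name are our own. The sketch assumes both pooled sums are nonzero.

```python
import math
from typing import Iterable, Tuple

# One 2x2 table per matched stratum:
# (ref_correct A, ref_incorrect B, focal_correct C, focal_incorrect D)
Stratum = Tuple[int, int, int, int]


def mh_log_odds_ratio(strata: Iterable[Stratum]) -> float:
    """ln of the Mantel-Haenszel common odds ratio across strata.
    Positive values favor the reference group under this layout."""
    num = 0.0
    den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # empty strata contribute nothing
        num += a * d / n
        den += b * c / n
    return math.log(num / den)


# Invented strata: equal odds within every stratum, so no DIF (LOR = 0).
print(mh_log_odds_ratio([(10, 10, 5, 5), (20, 5, 8, 2)]))  # 0.0
```

Stratifying on total score keeps overall ability differences between groups from being mistaken for item bias.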
MMPI-2 Item Endorsements in Dissociative Identity Disorder vs. Simulators.
Brand, Bethany L; Chasson, Gregory S; Palermo, Cori A; Donato, Frank M; Rhodes, Kyle P; Voorhees, Emily F
2016-03-01
Elevated scores on some MMPI-2 (Minnesota Multiphasic Personality Inventory-2) validity scales are common among patients with dissociative identity disorder (DID), which raises questions about the validity of their responses. Such patients show elevated scores on atypical answers (F), F-psychopathology (Fp), atypical answers in the second half of the test (FB), schizophrenia (Sc), and depression (D) scales, with Fp showing the greatest utility in distinguishing them from coached and uncoached DID simulators. In the current study, we investigated the items on the MMPI-2 F, Fp, FB, Sc, and D scales that were most and least commonly endorsed by participants with DID in our 2014 study and compared these responses with those of coached and uncoached DID simulators. The comparisons revealed that patients with DID most frequently endorsed items related to dissociation, trauma, depression, fearfulness, conflict within family, and self-destructiveness. The coached group more successfully imitated item endorsements of the DID group than did the uncoached group. However, both simulating groups, especially the uncoached group, frequently endorsed items that were uncommonly endorsed by the DID group. The uncoached group endorsed items consistent with popular media portrayals of people with DID being violent, delusional, and unlawful. These results suggest that item endorsement patterns can provide useful information to clinicians making determinations about whether an individual is presenting with DID or feigning. © 2016 American Academy of Psychiatry and the Law.
Can Item Keyword Feedback Help Remediate Knowledge Gaps?
Feinberg, Richard A; Clauser, Amanda L
2016-10-01
In graduate medical education, assessment results can effectively guide professional development when both assessment and feedback support a formative model. When individuals cannot directly access the test questions and responses, a way of using assessment results formatively is to provide item keyword feedback. The purpose of the following study was to investigate whether exposure to item keyword feedback aids in learner remediation. Participants included 319 trainees who completed a medical subspecialty in-training examination (ITE) in 2012 as first-year fellows, and then 1 year later in 2013 as second-year fellows. Performance on 2013 ITE items in which keywords were, or were not, exposed as part of the 2012 ITE score feedback was compared across groups based on the amount of time studying (preparation). For the same items common to both 2012 and 2013 ITEs, response patterns were analyzed to investigate changes in answer selection. Test takers who indicated greater amounts of preparation on the 2013 ITE did not perform better on the items in which keywords were exposed compared to those who were not exposed. The response pattern analysis substantiated overall growth in performance from the 2012 ITE. For items with incorrect responses on both attempts, examinees selected the same option 58% of the time. Results from the current study were unsuccessful in supporting the use of item keywords in aiding remediation. Unfortunately, the results did provide evidence of examinees retaining misinformation.
The Many Null Distributions of Person Fit Indices.
ERIC Educational Resources Information Center
Molenaar, Ivo W.; Hoijtink, Herbert
1990-01-01
Statistical properties of person fit indices are reviewed as indicators of the extent to which a person's score pattern is in agreement with a measurement model. Distribution of a fit index and ability-free fit evaluation are discussed. The null distribution was simulated for a test of 20 items.
Kurka, Jonathan M; Buman, Matthew P; Ainsworth, Barbara E
2014-01-01
Athletes may be at risk for developing adverse health outcomes due to poor eating behaviors during college. Because of the complex nature of the diet, it is difficult to include or exclude individual food items and specific food groups from the diet. Eating behaviors may better characterize the complex interactions between individual food items and specific food groups. The purpose was to examine the Rapid Eating Assessment for Patients survey (REAP) as a valid tool for analyzing eating behaviors of NCAA Division-I male and female athletes using pattern identification, and to investigate the relationships between derived eating behavior patterns and body mass index (BMI) and waist circumference (WC) while stratifying by sex and aesthetic nature of the sport. Two independent samples of male (n = 86; n = 139) and female (n = 64; n = 102) collegiate athletes completed the REAP in June-August 2011 (n = 150) and June-August 2012 (n = 241). Principal component analysis (PCA) determined possible factors using wave-1 athletes. Exploratory (EFA) and confirmatory factor analyses (CFA) determined factors accounting for error and confirmed model fit in wave-2 athletes. Wave-2 athletes' BMI and WC were recorded during a physical exam, and sport participation determined classification into aesthetic and non-aesthetic sport. Mean differences in eating behavior pattern score were explored. Regression models examined interactions between pattern scores, participation in aesthetic or non-aesthetic sport, and BMI and waist circumference, controlling for age and race. A 5-factor PCA solution accounting for 60.3% of sample variance determined fourteen questions for EFA and CFA. A confirmed solution revealed patterns of Desserts, Healthy food, Meats, High-fat food, and Dairy. Pattern score (mean ± SE) differences were found: non-aesthetic sport males had a higher (better) Dessert score than aesthetic sport males (2.16 ± 0.07 vs. 1.93 ± 0.11).
Female aesthetic athletes had a higher score compared to non-aesthetic female athletes for the Dessert (2.11 ± 0.11 vs. 1.88 ± 0.08), Meat (1.95 ± 0.10 vs. 1.72 ± 0.07), High-fat food (1.70 ± 0.08 vs. 1.46 ± 0.06), and Dairy (1.70 ± 0.11 vs. 1.43 ± 0.07) patterns. REAP is a construct valid tool to assess dietary patterns in college athletes. In light of varying dietary patterns, college athletes should be evaluated for healthful and unhealthful eating behaviors.
External validity of the pediatric cardiac quality of life inventory
Marino, Bradley S.; Drotar, Dennis; Cassedy, Amy; Davis, Richard; Tomlinson, Ryan S.; Mellion, Katelyn; Mussatto, Kathleen; Mahony, Lynn; Newburger, Jane W.; Tong, Elizabeth; Cohen, Mitchell I.; Helfaer, Mark A.; Kazak, Anne E.; Wray, Jo; Wernovsky, Gil; Shea, Judy A.; Ittenbach, Richard
2012-01-01
Purpose The Pediatric Cardiac Quality of Life Inventory (PCQLI) is a disease-specific, health-related quality of life (HRQOL) measure for pediatric heart disease (HD). The purpose of this study was to demonstrate the external validity of PCQLI scores. Methods The PCQLI development site (Development sample) and six geographically diverse centers in the United States (Composite sample) recruited pediatric patients with acquired or congenital HD. Item response option variability, scores [Total (TS); Disease Impact (DI) and Psychosocial Impact (PI) subscales], patterns of correlation, and internal consistency were compared between samples. Results A total of 3,128 patients and parent participants (1,113 Development; 2,015 Composite) were analyzed. Response option variability patterns of all items in both samples were acceptable. Inter-sample score comparisons revealed no differences. Median item–total (Development, 0.57; Composite, 0.59) and item–subscale (Development, DI 0.58, PI 0.59; Composite, DI 0.58, PI 0.56) correlations were moderate. Subscale–subscale (0.79 for both samples) and subscale–total (Development, DI 0.95, PI 0.95; Composite, DI 0.95, PI 0.94) correlations and internal consistency (Development, TS 0.93, DI 0.90, PI 0.84; Composite, TS 0.93, DI 0.89, PI 0.85) were high in both samples. Conclusion PCQLI scores are externally valid across the US pediatric HD population and may be used for multi-center HRQOL studies. PMID:21188538
Item Response Modeling with Sum Scores
ERIC Educational Resources Information Center
Johnson, Timothy R.
2013-01-01
One of the distinctions between classical test theory and item response theory is that the former focuses on sum scores and their relationship to true scores, whereas the latter concerns item responses and their relationship to latent scores. Although item response theory is often viewed as the richer of the two theories, sum scores are still…
Anxiety, Stress and Coping Patterns in Children in Dental Settings.
Pop-Jordanova, Nadica; Sarakinova, Olivera; Pop-Stefanova-Trposka, Maja; Zabokova-Bilbilova, Efka; Kostadinovska, Emilija
2018-04-15
Fear of the dentist and dental treatment is a common problem. It can cause treatment difficulties for the practitioner, as well as severe consequences for the patient. The level of stress can be evaluated through electrodermal activity, salivary cortisol measurement, or indirectly by psychometric tests. The present study examined the psychological influence of dental interventions on children as well as the coping patterns they use to reduce stress. We examined two matched groups of patients: a) children with orthodontic problems (anomalies in the shape, position and function of dentomaxillofacial structures) (N = 31, mean age 10.3 ± 2.02 years); and b) children with ordinary dental problems (N = 31, mean age 10.3 ± 2.4 years). As psychometric instruments, we used the 45-item Sarason anxiety scale, a simple 20-item stress test adapted for children, and the A-COPE test for evaluating coping patterns. The obtained scores confirmed the presence of moderate anxiety and a moderate stress level in both groups. For the group with dental problems, scores were 20.63 ± 8.37 (maximum 45) on the Sarason test and 7.63 ± 3.45 (maximum 20) on the stress test; for the orthodontic group, scores were 18.66 ± 6.85 on the Sarason test and 7.76 ± 3.78 on the stress test. One-way ANOVA confirmed significant differences in scores related to age and gender. Student's t-test showed no significant differences in test results between the two groups. Coping mechanisms evaluated by the A-COPE test showed that in both groups the most important patterns used for stress relief are developing self-reliance and optimism, avoiding problems, and engaging in demanding activity. This study confirmed that moderate stress and anxiety are present in both groups of patients (orthodontic and dental). Scores depended on gender and age.
The most frequently used coping patterns in both groups were developing self-reliance and optimism, avoiding problems, and engaging in demanding activity. Some strategies for managing this problem are discussed.
Pearson, Keith E; Wadley, Virginia G; McClure, Leslie A; Shikany, James M; Unverzagt, Fred W; Judd, Suzanne E
2016-01-01
Identifying factors that contribute to the preservation of cognitive function is imperative to maintaining quality of life in advanced years. Of modifiable risk factors, diet quality has emerged as a promising candidate to make an impact on cognition. The objective of this study was to evaluate associations between empirically derived dietary patterns and cognitive function. This study included 18 080 black and white participants aged 45 years and older from the REasons for Geographic And Racial Differences in Stroke (REGARDS) cohort. Principal component analysis on data from the Block98 FFQ yielded five dietary patterns: convenience, plant-based, sweets/fats, Southern, and alcohol/salads. Incident cognitive impairment was defined as shifting from intact cognitive status (score >4) at first assessment to impaired cognitive status (score ≤4) at latest assessment, measured by the Six-Item Screener. Learning, memory and executive function were evaluated with the Word List Learning, Word List Delayed Recall, and animal fluency assessments. In fully adjusted models, greater consumption of the alcohol/salads pattern was associated with lower odds of incident cognitive impairment (highest quintile (Q5) v. lowest quintile (Q1): OR 0·68; 95 % CI 0·56, 0·84; P for trend 0·0005). Greater consumption of the alcohol/salads pattern was associated with higher scores on all domain-specific assessments, and greater consumption of the plant-based pattern was associated with higher scores in learning and memory. Greater consumption of the Southern pattern was associated with lower scores on each domain-specific assessment (all P < 0·05). In conclusion, dietary patterns including plant-based foods and alcohol intake were associated with higher cognitive scores, and a pattern including fried food and processed meat typical of a Southern diet was associated with lower scores.
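The definition of incident cognitive impairment above reduces to a two-timepoint rule on Six-Item Screener scores. The Python sketch below expresses only that stated cut-off; the function names are hypothetical, the 0-6 range check assumes the screener's six one-point items, and this is an illustration, not the REGARDS analysis code.

```python
SIS_MIN, SIS_MAX = 0, 6   # assumed Six-Item Screener range: six items, one point each
IMPAIRED_CUTOFF = 4       # per the abstract: score <= 4 is impaired, > 4 is intact


def is_impaired(score: int) -> bool:
    """Classify a single Six-Item Screener score (<= 4 means impaired)."""
    if not SIS_MIN <= score <= SIS_MAX:
        raise ValueError(f"score {score} outside assumed range {SIS_MIN}-{SIS_MAX}")
    return score <= IMPAIRED_CUTOFF


def incident_impairment(first_score: int, latest_score: int) -> bool:
    """True only for a shift from intact at first assessment to impaired at latest."""
    return not is_impaired(first_score) and is_impaired(latest_score)


if __name__ == "__main__":
    for first, latest in [(6, 4), (5, 5), (3, 2)]:
        print(first, latest, incident_impairment(first, latest))
```

Under this rule a participant already impaired at the first assessment (for example, scores of 3 then 2) is not an incident case; how such participants were handled in the cohort analysis is not stated in the abstract.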
ERIC Educational Resources Information Center
He, Yong
2013-01-01
Common test items play an important role in equating multiple test forms under the common-item nonequivalent groups design. Inconsistent item parameter estimates among common items can lead to large bias in equated scores for IRT true score equating. Current methods extensively focus on detection and elimination of outlying common items, which…
An Approach to Scoring and Equating Tests with Binary Items: Piloting With Large-Scale Assessments
ERIC Educational Resources Information Center
Dimitrov, Dimiter M.
2016-01-01
This article describes an approach to test scoring, referred to as "delta scoring" (D-scoring), for tests with dichotomously scored items. The D-scoring uses information from item response theory (IRT) calibration to facilitate computations and interpretations in the context of large-scale assessments. The D-score is computed from the…
Rodebaugh, Thomas L; Woods, Carol M; Heimberg, Richard G
2007-06-01
Although well-used and empirically supported, the Social Interaction Anxiety Scale (SIAS) has a questionable factor structure and includes reverse-scored items with questionable utility. Here, using samples of undergraduates and a sample of clients with social anxiety disorder, we extend previous work that opened the question of whether the reverse-scored items belong on the scale. First, we successfully confirmed the factor structure obtained in previous samples. Second, we found the reverse-scored items to show consistently weaker relationships with a variety of comparison measures. Third, we demonstrated that removing the reverse-scored questions generally helps rather than hinders the psychometric performance of the SIAS total score. Fourth, we found that the reverse-scored items show a strong relationship with the normal personality characteristic of extraversion, suggesting that the reverse-scored items may primarily assess extraversion. Given the above results, we suggest investigators consider performing data analyses using only the straightforwardly worded items of the SIAS.
Can Item Keyword Feedback Help Remediate Knowledge Gaps?
Feinberg, Richard A.; Clauser, Amanda L.
2016-01-01
ABSTRACT Background In graduate medical education, assessment results can effectively guide professional development when both assessment and feedback support a formative model. When individuals cannot directly access the test questions and responses, a way of using assessment results formatively is to provide item keyword feedback. Objective The purpose of the following study was to investigate whether exposure to item keyword feedback aids in learner remediation. Methods Participants included 319 trainees who completed a medical subspecialty in-training examination (ITE) in 2012 as first-year fellows, and then 1 year later in 2013 as second-year fellows. Performance on 2013 ITE items in which keywords were, or were not, exposed as part of the 2012 ITE score feedback was compared across groups based on the amount of time studying (preparation). For the same items common to both 2012 and 2013 ITEs, response patterns were analyzed to investigate changes in answer selection. Results Test takers who indicated greater amounts of preparation on the 2013 ITE did not perform better on the items in which keywords were exposed compared to those who were not exposed. The response pattern analysis substantiated overall growth in performance from the 2012 ITE. For items with incorrect responses on both attempts, examinees selected the same option 58% of the time. Conclusions Results from the current study were unsuccessful in supporting the use of item keywords in aiding remediation. Unfortunately, the results did provide evidence of examinees retaining misinformation. PMID:27777664
The Meal Pattern Questionnaire: A psychometric evaluation using the Eating Disorder Examination.
Alfonsson, S; Sewall, A; Lidholm, H; Hursti, T
2016-04-01
Meal pattern is an important variable in both obesity treatment and treatment for eating disorders. Momentary assessment and eating diaries are highly valid measurement methods but are often cumbersome and not always feasible to use in clinical practice. The aim of this study was to design and evaluate a self-report instrument for measuring meal patterns. The Pattern of eating item from the Eating Disorder Examination (EDE) interview was adapted to self-report format to follow the same overall structure as the Eating Disorder Examination Questionnaire. The new instrument was named the Meal Patterns Questionnaire (MPQ) and was compared with the EDE in a student sample (n=105) and an obese sample (n=111). The individual items of the MPQ and the EDE showed moderate to high correlations (rho = .63-.89) in the two samples. Significant differences between the MPQ and EDE were found for only two items in the obese sample. The total scores correlated to a high degree (rho = .87/.74) in both samples, and no significant differences were found in this variable. The MPQ can provide an overall picture of a person's eating patterns and is a valid way to collect data regarding meal patterns. The MPQ may be a usable tool in clinical practice and research studies when more extensive instruments cannot be used. Future studies should evaluate the MPQ in diverse cultural populations and with more ecological assessment methods. Copyright © 2015 Elsevier Ltd. All rights reserved.
ERIC Educational Resources Information Center
Kaplan, Randy M.; Bennett, Randy Elliot
This study explores the potential for using a computer-based scoring procedure for the formulating-hypotheses (F-H) item. This item type presents a situation and asks the examinee to generate explanations for it. Each explanation is judged right or wrong, and the number of creditable explanations is summed to produce an item score. Scores were…
Wu, Bechien U.; Batech, Michael; Quezada, Michael; Lew, Daniel; Fujikawa, Kelly; Kung, Jonathan; Jamil, Laith H.; Chen, Wansu; Afghani, Elham; Reicher, Sonya; Buxbaum, James; Pandol, Stephen J.
2017-01-01
OBJECTIVES Acute pancreatitis has a highly variable course. Currently there is no widely accepted method to measure disease activity in patients hospitalized for acute pancreatitis. We aimed to develop a clinical activity index that incorporates routine clinical parameters to assist in the measurement, study, and management of acute pancreatitis. METHODS We used the UCLA/RAND appropriateness method to identify items for inclusion in the disease activity instrument. We conducted a systematic literature review followed by two sets of iterative modified Delphi meetings including a panel of international experts between November 2014 and November 2015. The final instrument was then applied to patient data obtained from five separate study cohorts across Southern California to assess profiles of disease activity. RESULTS From a list of 35 items comprising 6 domains, we identified 5 parameters for inclusion in the final weighted clinical activity scoring system: organ failure, systemic inflammatory response syndrome, abdominal pain, requirement for opiates and ability to tolerate oral intake. We applied the weighted scoring system across the 5 study cohorts comprising 3,123 patients. We identified several distinct patterns of disease activity: (i) overall there was an elevated score at baseline relative to discharge across all study cohorts, (ii) there were distinct patterns of disease activity related to duration of illness as well as (iii) early and persistent elevation of disease activity among patients with severe acute pancreatitis defined as persistent organ failure. CONCLUSIONS We present the development and initial validation of a clinical activity score for real-time assessment of disease activity in patients with acute pancreatitis. PMID:28462914
Wu, Bechien U; Batech, Michael; Quezada, Michael; Lew, Daniel; Fujikawa, Kelly; Kung, Jonathan; Jamil, Laith H; Chen, Wansu; Afghani, Elham; Reicher, Sonya; Buxbaum, James; Pandol, Stephen J
2017-07-01
Acute pancreatitis has a highly variable course. Currently there is no widely accepted method to measure disease activity in patients hospitalized for acute pancreatitis. We aimed to develop a clinical activity index that incorporates routine clinical parameters to assist in the measurement, study, and management of acute pancreatitis. We used the UCLA/RAND appropriateness method to identify items for inclusion in the disease activity instrument. We conducted a systematic literature review followed by two sets of iterative modified Delphi meetings including a panel of international experts between November 2014 and November 2015. The final instrument was then applied to patient data obtained from five separate study cohorts across Southern California to assess profiles of disease activity. From a list of 35 items comprising 6 domains, we identified 5 parameters for inclusion in the final weighted clinical activity scoring system: organ failure, systemic inflammatory response syndrome, abdominal pain, requirement for opiates and ability to tolerate oral intake. We applied the weighted scoring system across the 5 study cohorts comprising 3,123 patients. We identified several distinct patterns of disease activity: (i) overall there was an elevated score at baseline relative to discharge across all study cohorts, (ii) there were distinct patterns of disease activity related to duration of illness as well as (iii) early and persistent elevation of disease activity among patients with severe acute pancreatitis defined as persistent organ failure. We present the development and initial validation of a clinical activity score for real-time assessment of disease activity in patients with acute pancreatitis.
Zhou, Yan; Ortiz, Freddy; Nuñez, Christopher; Elashoff, David; Woo, Ellen; Apostolova, Liana G.; Wolf, Sheldon; Casado, Maria; Caceres, Nenette; Panchal, Hemali; Ringman, John M.
2015-01-01
Background/Aims Performance on the Montreal Cognitive Assessment (MoCA) has been demonstrated to be dependent on the educational level. The purpose of this study was to identify how to best adjust MoCA scores and to identify MoCA items most sensitive to cognitive decline in incipient Alzheimer's disease (AD) in a Spanish-speaking population with varied levels of education. Methods We analyzed data from 50 Spanish-speaking participants. We examined the pattern of diagnosis-adjusted MoCA residuals in relation to education and compared four alternative score adjustments using bootstrap sampling. Sensitivity and specificity analyses were performed for the raw and each adjusted score. The internal reliability of the MoCA as well as item discrimination and item validity were examined. Results We found that with progressive compensation added for those with lower education, unexplained residuals decreased and the education-residual association moved to zero, suggesting that more compensation was necessary to better adjust MoCA scores in those with a lower educational level. Cube copying, sentence repetition, delayed recall, and orientation were most sensitive to cognitive impairment due to AD. Conclusion A compensation of 3-4 points was needed for <6 years of education. Overall, the Spanish version of the MoCA maintained adequate psychometric properties in this population. PMID:25873930
Schimmenti, Adriano
2016-01-01
The purpose of this study was to examine the psychometric properties of the Italian translation of the Adolescent Dissociative Experiences Scale (A-DES). A sample of 1,806 high-school students between the ages of 13 and 18 years, recruited in 6 Italian cities, completed the A-DES. The A-DES showed high internal consistency, excellent item-to-scale homogeneity, good split-half reliability, and a single-factor structure. The scores of the Italian adolescents were comparable to those found in previous research with the measure. No gender differences were found in mean A-DES scores, but boys and girls showed different patterns of responses on A-DES items. Age differences were also found, with 13- and 18-year-old students scoring higher on the measure than the other participants. A cluster analysis showed that participants could be consistently grouped into 2 clusters of low- and high-dissociative adolescents. This study supports the A-DES as a reliable and valid screening measure for dissociative symptoms in adolescents.
Zhao, Yue
2017-03-01
In patient-reported outcome research that utilizes item response theory (IRT), using statistical significance tests to detect misfit is usually the focus of IRT model-data fit evaluations. However, such evaluations rarely address the impact/consequence of using misfitting items on the intended clinical applications. This study was designed to evaluate the impact of IRT item misfit on score estimates and severity classifications and to demonstrate a recommended process of model-fit evaluation. Using secondary data sources collected from the Patient-Reported Outcome Measurement Information System (PROMIS) wave 1 testing phase, analyses were conducted based on PROMIS depression (28 items; 782 cases) and pain interference (41 items; 845 cases) item banks. The identification of misfitting items was assessed using Orlando and Thissen's summed-score item-fit statistics and graphical displays. The impact of misfit was evaluated according to the agreement of both IRT-derived T-scores and severity classifications between inclusion and exclusion of misfitting items. The examination of the presence and impact of misfit suggested that item misfit had a negligible impact on the T-score estimates and severity classifications with the general population sample in the PROMIS depression and pain interference item banks, implying that the impact of item misfit was insignificant. Findings support the T-score estimates in the two item banks as robust against item misfit at both the group and individual levels and add confidence to the use of T-scores for severity diagnosis in the studied sample. Recommendations on approaches for identifying item misfit (statistical significance) and assessing the misfit impact (practical significance) are given.
Hays, Ron D; Spritzer, Karen L; Amtmann, Dagmar; Lai, Jin-Shei; Dewitt, Esi Morgan; Rothrock, Nan; Dewalt, Darren A; Riley, William T; Fries, James F; Krishnan, Eswar
2013-11-01
To create upper-extremity and mobility subdomain scores from the Patient-Reported Outcomes Measurement Information System (PROMIS) physical functioning adult item bank. Expert reviews were used to identify upper-extremity and mobility items from the PROMIS item bank. Psychometric analyses were conducted to assess empirical support for scoring upper-extremity and mobility subdomains. Data were collected from the U.S. general population and multiple disease groups via self-administered surveys. The sample (N=21,773) included 21,133 English-speaking adults who participated in the PROMIS wave 1 data collection and 640 Spanish-speaking Latino adults recruited separately. We used English- and Spanish-language data and existing PROMIS item parameters for the physical functioning item bank to estimate upper-extremity and mobility scores. In addition, we fit graded response models to calibrate the upper-extremity items and mobility items separately, compare separate to combined calibrations, and produce subdomain scores. After eliminating items because of local dependency, 16 items remained to assess upper extremity and 17 items to assess mobility. The estimated correlation between upper extremity and mobility was .59 using existing PROMIS physical functioning item parameters (r=.60 using parameters calibrated separately for upper-extremity and mobility items). Upper-extremity and mobility subdomains shared about 35% of the variance in common, and produced comparable scores whether calibrated separately or together. The identification of the subset of items tapping these 2 aspects of physical functioning and scored using the existing PROMIS parameters provides the option of scoring these subdomains in addition to the overall physical functioning score. Copyright © 2013 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
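The "about 35%" figure appears to follow from squaring the estimated subdomain correlation, since the proportion of variance two measures share is the square of their correlation (this reading is an assumption; the abstract does not show the calculation):

```latex
r^2 = 0.59^2 = 0.3481 \approx 35\%
```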
Kedem, Leia E; Evans, Ellen M; Chapman-Novakofski, Karen
2014-11-01
Lifestyle interventions commonly measure psychosocial beliefs as precursors to positive behavior change, but often overlook questionnaire validation. This can affect measurement accuracy if the survey has been developed for a different population, as differing behavioral influences may affect instrument validity. The present study aimed to explore psychometric properties of self-efficacy and outcome expectation scales, originally developed for younger children, in a population of female college freshmen (N = 268). Exploratory principal component analysis was used to investigate underlying data patterns and assess validity of previously published subscales. Composite scores for reliable subscales (Cronbach's α ≥ .70) were calculated to help characterize self-efficacy and outcome expectation beliefs in this population. The outcome expectation factor structure clearly consisted of positive (α = .81-.90) and negative outcomes (α = .63-.67). The self-efficacy factor structure included themes of motivation and effort (α = .75-.94), but items pertaining to hunger and availability often cross-loaded. Based on cross-loading patterns and low Cronbach's alpha values, respectively, self-efficacy items regarding barriers to healthy eating and negative outcome expectation items should be refined to improve reliability. Composite scores suggested that eating healthfully was associated with positive outcomes, but self-efficacy to do so was lower. Thus, dietary interventions for college students may be more successful if they include skill-building activities to enhance self-efficacy and increase the likelihood of behavior change. © The Author(s) 2014.
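The reliability screen described above (computing composites only for subscales with Cronbach's α ≥ .70) can be sketched in Python with the standard alpha formula, α = k/(k-1) · (1 - Σ item variances / total-score variance). The function names are hypothetical, and scoring a composite as the per-respondent item mean is an assumption; the abstract does not say whether sums or means were used.

```python
from statistics import variance


def cronbach_alpha(responses: list[list[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items matrix (one row per respondent)."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item (column) across respondents
    item_vars = [variance([row[j] for row in responses]) for j in range(k)]
    # Sample variance of each respondent's total score
    total_var = variance([sum(row) for row in responses])
    if total_var == 0:
        raise ValueError("total scores have zero variance; alpha is undefined")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def composite_if_reliable(responses: list[list[float]], threshold: float = 0.70):
    """Return per-respondent mean scores if alpha >= threshold, else None."""
    if cronbach_alpha(responses) < threshold:
        return None
    return [sum(row) / len(row) for row in responses]


if __name__ == "__main__":
    consistent = [[1, 1], [2, 2], [3, 3]]
    mixed = [[1, 1], [2, 3], [3, 2]]
    print(cronbach_alpha(consistent), composite_if_reliable(consistent))
    print(round(cronbach_alpha(mixed), 3), composite_if_reliable(mixed))
```

In this toy data the two perfectly agreeing items give α = 1.0 and a composite is returned, while the second pair gives α ≈ 0.667, below .70, so no composite is formed.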
Well-being as a moving target: measurement equivalence of the Bradburn Affect Balance Scale.
Maitland, S B; Dixon, R A; Hultsch, D F; Hertzog, C
2001-03-01
Although the Bradburn Affect Balance Scale (ABS) is a frequently used two-factor indicator of well-being in later life, its measurement and invariance properties are not well documented. We examined these issues using confirmatory factor analyses of cross-sectional (adults ages 54-87 years) and longitudinal data from the Victoria Longitudinal Study. Stability of the positive and negative affect factors was moderate across a 3-year period. Overall, factor loadings for positive affect items were invariant over time with the exception of the "pleased" item. Negative affect items were time invariant. However, age-group comparisons between young-old and old-old groups revealed age differences in loadings for the "upset" item at Time 1. Finally, gender groups differed in loadings for the "top of the world" and "going your way" items. Thus a pattern of partial measurement equivalence characterized item responses to the ABS. Our results suggest that group comparisons and longitudinal change in ABS scale scores of positive and negative affect should be interpreted with caution.
Vereecken, Carine; Haerens, Leen; De Bourdeaudhuij, Ilse; Maes, Lea
2010-10-01
To identify the correlates of the home food environment (parents' intake, availability and food-related parenting practices) at the age of 10 years with dietary patterns during childhood and in adolescence. Primary-school children of fifty-nine Flemish elementary schools completed a questionnaire at school in 2002. Four years later they completed a questionnaire by e-mail or mail at home. Their parents completed a questionnaire on food-related parenting practices at baseline. Longitudinal study. The analyses included 609 matched questionnaires. Multi-level regression analyses were used to identify baseline parenting practices (pressure, reward, negotiation, catering on demand, permissiveness, verbal praise, avoiding negative modelling, availability of healthy/unhealthy food items and mothers' fruit and vegetable (F&V) and excess scores) associated with children's dietary patterns (F&V and excess scores). Mother's F&V score was a significant positive independent predictor of children's F&V score at baseline and follow-up, whereas availability of unhealthy foods was significantly negatively associated with both scores. Negotiation was positively associated with children's follow-up score of F&V, while permissiveness was positively associated with children's follow-up excess score. Availability of unhealthy foods and mother's excess score were positively related to children's excess score at baseline and follow-up. Parental intake and restricted availability of unhealthy foods appeared to have a consistent impact on children's and adolescents' diets; in addition, negotiation and less permissive food-related parenting practices may improve adolescents' diets.
The SAT Gender Gap: Identifying the Causes.
ERIC Educational Resources Information Center
Rosser, Phyllis
Questions on the Scholastic Aptitude Test (SAT) with the largest score differences between women and men of all racial and ethnic groups were identified. Patterns of difficulty that would explain the SAT's continuing underprediction of female first-year college performance were studied. An item analysis of one form of the June 1986 SAT for 1,112…
How to Compare Parametric and Nonparametric Person-Fit Statistics Using Real Data
ERIC Educational Resources Information Center
Sinharay, Sandip
2017-01-01
Person-fit assessment (PFA) is concerned with uncovering atypical test performance as reflected in the pattern of scores on individual items on a test. Existing person-fit statistics (PFSs) include both parametric and nonparametric statistics. Comparison of PFSs has been a popular research topic in PFA, but almost all comparisons have employed…
Tadić, Valerija; Cooper, Andrew; Cumberland, Phillippa; Lewando-Hundt, Gillian; Rahi, Jugnoo S
2016-01-01
To report piloting and initial validation of the VQoL_CYP, a novel age-appropriate vision-related quality of life (VQoL) instrument for self-reporting by children with visual impairment (VI). Participants were a random patient sample of children with VI aged 10-15 years. Sixty-nine patients, drawn from patient databases at Great Ormond Street Hospital and Moorfields Eye Hospital, United Kingdom, participated in piloting of the draft 47-item VQoL instrument, which enabled preliminary item reduction. Subsequent administration of the instrument, alongside functional vision (FV) and generic health-related quality of life (HRQoL) self-report measures, to 101 children with VI comprising a nationally representative sample enabled further item reduction and evaluation of psychometric properties using Rasch analysis. Construct validity was assessed through Pearson correlation coefficients. Item reduction through piloting (8 items removed for skewness and individual item response pattern) and validation (1 item removed for skewness and 3 for misfit in Rasch) produced a 35-item scale, with fit values within acceptable limits, no notable differential item functioning, good measurement precision, ordered response categories and acceptable targeting in Rasch. The VQoL_CYP showed good construct validity, correlating strongly with HRQoL scores and moderately with FV scores, but not with acuity. Robust child-appropriate self-report VQoL measures for children with VI are necessary for understanding the broader impacts of living with a visual disability, distinguishing these from limited functioning per se. Future planned use in larger patient samples will allow further psychometric development of the VQoL_CYP as an adjunct to objective outcomes assessment.
At-home and away-from-home dietary patterns and BMI z-scores in Brazilian adolescents.
Cunha, Diana Barbosa; Bezerra, Ilana Nogueira; Pereira, Rosangela Alves; Sichieri, Rosely
2018-01-01
Away-from-home food intake has been associated with high rates of overweight among children and adolescents. However, there are no studies comparing at-home and away-from-home eating patterns among adolescents. The objective of this paper was to identify at-home and away-from-home dietary patterns among adolescents in Brazil, and to evaluate the relationship between these patterns and body mass index (BMI) z-scores. Data from the Brazilian National Dietary Survey 2008-2009 were analyzed in this cross-sectional study. Dietary intake was assessed by completion of written food records on two non-consecutive days. Five thousand two hundred sixty-six adolescents 10-19 years of age living in urban areas of Brazil were included in the analysis. Thirty-two food groups were examined by factor analysis, stratified by at-home and away-from-home eating. The associations between the food patterns and BMI z-scores were ascertained using linear regression analysis. In general, mean at-home food intake was greater than away-from-home food intake, but the ratio of away-from-home/at-home was greater than 30% for baked and deep-fried snacks, soft drinks, sandwiches, pizza, and desserts, and was lower than 10% for rice and beans. Three similar main dietary patterns were identified both at home and away from home: the "Traditional pattern", the "Bread and Butter pattern" and the "Western pattern"; however, away-from-home patterns encompassed more food items overall. Only the at-home "Western pattern" was positively associated with BMI z-scores (β = 0.0006; p < 0.001). Our results indicate that an unhealthy dietary pattern consumed at home is associated with BMI z-scores, whereas away-from-home food consumption is not. Copyright © 2017 Elsevier Ltd. All rights reserved.
Zhao, Gai; Bian, Yang; Li, Ming
2013-12-18
To analyze the impact of passing items above the ceiling in the gross motor subtest of the Peabody Developmental Motor Scales (PDMS-2) on its assessment results. The PDMS-2 subtests were administered to 124 children aged 1.2 to 71 months. In addition to the original scoring method, a new scoring method that also credits items passed above the ceiling was developed. The standard scores and quotients of the two scoring methods were compared using the independent-samples t test. Only one child passed items above the ceiling in the stationary subtest, compared with 19 children in the locomotion subtest and 17 children in the visual-motor integration subtest. When the scores for these passed items were included in the raw scores, total raw scores increased by 1-12 points, standard scores by 0-1 points and motor quotients by 0-3 points. The diagnostic classification changed for only two children. There was no significant difference between the two methods in motor quotients or in the standard scores of any specific subtest (P>0.05). Passing items above the PDMS-2 ceiling is not rare; it occurs mainly in the locomotion and visual-motor integration subtests. Including these passed items in the scoring system does not significantly change the standard scores of the subtests or the developmental motor quotients (DMQ), which supports the original ceiling rule of failing 3 items in a row. However, adding the items passed above the ceiling to the raw score will improve tracking of children's developmental trajectories and intervention effects.
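The two scoring rules compared above can be sketched as follows. This is a simplified illustration, assuming items are scored 0, 1 or 2 in administration order and that the ceiling is reached after three consecutive items scored 0; basal credit for items below the starting point is deliberately omitted. The function `raw_scores` is hypothetical, not part of any PDMS-2 software.

```python
def raw_scores(item_scores, ceiling_run=3):
    """Return (original, extended) raw scores for one subtest.

    item_scores: scores (0, 1 or 2) for items in administration order.
    Original rule: stop counting once `ceiling_run` consecutive items score 0.
    Extended rule: also add credit for any items passed above that ceiling.
    Basal credit is ignored in this simplified sketch.
    """
    original = 0
    zeros_in_row = 0
    ceiling_index = len(item_scores)  # no ceiling reached by default
    for i, score in enumerate(item_scores):
        original += score
        zeros_in_row = zeros_in_row + 1 if score == 0 else 0
        if zeros_in_row == ceiling_run:
            ceiling_index = i + 1
            break
    extended = original + sum(item_scores[ceiling_index:])
    return original, extended


# A child who passes two items after failing three in a row
print(raw_scores([2, 2, 1, 0, 0, 0, 1, 2]))
```

In this example the extended rule adds 3 raw-score points, the kind of gain (1-12 points) reported above.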
Gasser, Constantine E; Kerr, Jessica A; Mensah, Fiona K; Wake, Melissa
2017-04-01
This study aimed to derive and compare longitudinal trajectories of dietary scores and patterns from 2-3 to 10-11 years and from 4-5 to 14-15 years of age. In waves two to six of the Baby (B) Cohort and one to six of the Kindergarten (K) Cohort of the population-based Longitudinal Study of Australian Children, parents or children reported biennially on the study child's consumption of twelve to sixteen healthy and less healthy food or drink items for the previous 24 h. For each wave, we derived a dietary score from 0 to 14, based on the 2013 Australian Dietary Guidelines (higher scores indicating healthier diet). We then used factor analyses to empirically derive dietary patterns for separate waves. Using group-based trajectory modelling, we generated trajectories of dietary scores and empirical patterns in 4504 B and 4640 K Cohort children. Four similar trajectories of dietary scores emerged for the B and K Cohorts, containing comparable proportions of children in each cohort: 'never healthy' (8·8 and 11·9 %, respectively), 'moderately healthy' (24·0 and 20·7 %), 'becoming less healthy' (16·6 and 27·3 %) and 'always healthy' (50·7 and 40·2 %). Deriving trajectories based on dietary patterns, rather than dietary scores, produced similar findings. For 'becoming less healthy' trajectories, dietary quality appeared to worsen from 7 years of age in both cohorts. In conclusion, a brief dietary measure administered repeatedly across childhood generated robust, nuanced dietary trajectories that were replicable across two cohorts and two methodologies. These trajectories appear ideal for future research into dietary determinants and health outcomes.
Khan, Anzalee; Lewis, Charles; Lindenmayer, Jean-Pierre
2011-11-16
Nonparametric item response theory (IRT) was used to examine (a) the performance of the 30 Positive and Negative Syndrome Scale (PANSS) items and their options (levels of severity), (b) the effectiveness of various subscales in discriminating among differences in symptom severity, and (c) the development of an abbreviated PANSS (Mini-PANSS) based on IRT and a method to link scores to the original PANSS. Baseline PANSS scores were obtained from 7,187 patients with schizophrenia or schizoaffective disorder who were enrolled in psychopharmacology trials between 1995 and 2005. Option characteristic curves (OCCs) and item characteristic curves (ICCs) were constructed to examine the probability of rating each of seven options within each of the 30 PANSS items as a function of subscale severity, and summed-score linking was applied to the items selected for the Mini-PANSS. The majority of items forming the Positive and Negative subscales (i.e., 19 items) performed very well and discriminated better along symptom severity than the General Psychopathology subscale. Six of the seven Positive Symptom items, six of the seven Negative Symptom items, and seven of the 16 General Psychopathology items were retained for inclusion in the Mini-PANSS. Summed-score linking and linear interpolation produced a translation table for comparing total subscale scores on the Mini-PANSS with total subscale scores on the original PANSS. Results show that scores on the Mini-PANSS subscales can be linked to scores on the original PANSS subscales with very little bias. The study demonstrated the utility of nonparametric IRT for examining the item properties of the PANSS and for selecting items for an abbreviated PANSS scale. Comparisons between the 30-item PANSS and the Mini-PANSS revealed that the shorter version is comparable to the 30-item PANSS and, when IRT is applied, is also a good indicator of illness severity.
PMID:22087503
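Kernel smoothing is one common way nonparametric IRT software estimates option characteristic curves: the probability of choosing each option is averaged locally over respondents with similar subscale scores. The sketch below uses a Gaussian-kernel (Nadaraya-Watson) estimator. The abstract does not report the authors' estimator or bandwidth, so `option_characteristic_curves` and its `bandwidth` argument are hypothetical.

```python
import numpy as np


def option_characteristic_curves(item_responses, rest_scores, grid, n_options, bandwidth=1.0):
    """Gaussian-kernel estimate of P(option k | subscale score).

    item_responses: integer option (0..n_options-1) chosen by each respondent
    rest_scores: each respondent's subscale score, e.g. excluding this item
    grid: score values at which the curves are evaluated
    Returns an array of shape (len(grid), n_options); each row sums to 1.
    """
    item_responses = np.asarray(item_responses)
    rest_scores = np.asarray(rest_scores, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Kernel weight of every respondent at every grid point: shape (G, N)
    weights = np.exp(-0.5 * ((grid[:, None] - rest_scores[None, :]) / bandwidth) ** 2)
    # One-hot indicators of the chosen option: shape (N, K)
    indicators = np.eye(n_options)[item_responses]
    # Weighted proportion choosing each option at each grid point
    return weights @ indicators / weights.sum(axis=1, keepdims=True)


rest = np.arange(20)
resp = np.minimum(rest // 5, 3)  # higher scores go with higher severity options
print(option_characteristic_curves(resp, rest, grid=[0, 19], n_options=4, bandwidth=2.0))
```

A well-functioning item shows each severity option becoming most likely over its own region of the score range; the bandwidth trades smoothness against bias.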
ERIC Educational Resources Information Center
Feldt, Leonard S.
2004-01-01
In some settings, the validity of a battery composite or a test score is enhanced by weighting some parts or items more heavily than others in the total score. This article describes methods of estimating the total score reliability coefficient when differential weights are used with items or parts.
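One classical way to estimate the reliability of a differentially weighted composite (often attributed to Mosier) treats the composite error variance as the weighted sum of the parts' error variances, assuming uncorrelated errors across parts. The sketch below implements that formula; it is not necessarily one of the methods in Feldt's article, and the example variances and reliabilities are invented for illustration.

```python
import numpy as np


def weighted_composite_reliability(weights, cov, reliabilities):
    """Reliability of C = sum(w_i * X_i), assuming uncorrelated part errors.

    weights: weight for each part
    cov: observed covariance matrix of the parts
    reliabilities: reliability coefficient of each part
    """
    w = np.asarray(weights, dtype=float)
    cov = np.asarray(cov, dtype=float)
    rel = np.asarray(reliabilities, dtype=float)
    composite_var = w @ cov @ w                                # w' S w
    error_var = np.sum(w ** 2 * np.diag(cov) * (1.0 - rel))   # sum w_i^2 s_i^2 (1 - r_i)
    return 1.0 - error_var / composite_var


cov = [[4.0, 3.0], [3.0, 9.0]]
print(weighted_composite_reliability([1, 1], cov, [0.8, 0.7]))  # unit weights
print(weighted_composite_reliability([2, 1], cov, [0.8, 0.7]))  # part 1 double-weighted
```

Here doubling the weight of the more reliable part raises the composite reliability, which is the situation the article addresses.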
Estimating Total-test Scores from Partial Scores in a Matrix Sampling Design.
ERIC Educational Resources Information Center
Sachar, Jane; Suppes, Patrick
It is sometimes desirable to obtain an estimated total-test score for an individual who was administered only a subset of the items in a total test. The present study compared six methods, two of which utilize the content structure of items, to estimate total-test scores using 450 students in grades 3-5 and 60 items of the 110-item Stanford Mental…
Estimating Total-Test Scores from Partial Scores in a Matrix Sampling Design.
ERIC Educational Resources Information Center
Sachar, Jane; Suppes, Patrick
1980-01-01
The present study compared six methods, two of which utilize the content structure of items, to estimate total-test scores using 450 students and 60 items of the 110-item Stanford Mental Arithmetic Test. Three methods yielded fairly good estimates of the total-test score. (Author/RL)
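The abstract does not describe the six methods, so the sketch below shows only two simple, hypothetical estimators for illustration: proportional scaling of the partial score, and a ratio estimate that adjusts for the difficulty of the administered subset using item p-values obtained from other examinees. Neither should be read as one of the study's methods.

```python
def proportional_estimate(partial_score, n_administered, n_total):
    """Scale the partial score by the fraction of items administered."""
    return partial_score * n_total / n_administered


def difficulty_adjusted_estimate(partial_score, administered, p_values):
    """Scale by the expected share of the total score carried by the subset.

    administered: indices of the items this student took
    p_values: proportion correct for every item, e.g. from other examinees
    """
    expected_subset = sum(p_values[i] for i in administered)
    expected_total = sum(p_values)
    return partial_score * expected_total / expected_subset


# 45 correct on 60 of the 110 items
print(proportional_estimate(45, 60, 110))
```

The difficulty-adjusted version gives a higher estimate when a student happened to receive a hard subset, which proportional scaling ignores.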
Mahr, Alfred D.; Neogi, Tuhina; LaValley, Michael P.; Davis, John C.; Hoffman, Gary S.; McCune, W. Joseph; Specks, Ulrich; Spiera, Robert F.; St. Clair, E. William; Stone, John H.; Merkel, Peter A.
2013-01-01
Objective To assess the Birmingham Vasculitis Activity Score for Wegener's Granulomatosis (BVAS/WG) with respect to its selection and weighting of items. Methods This study used the BVAS/WG data from the Wegener's Granulomatosis Etanercept Trial. The scoring frequencies of the 34 predefined items and any “other” items added by clinicians were calculated. Using linear regression with generalized estimating equations in which the physician global assessment (PGA) of disease activity was the dependent variable, we computed weights for all predefined items. We also created variables for clinical manifestations frequently added as other items, and computed weights for these as well. We searched for the model that included the items and their generated weights yielding an activity score with the highest R2 to predict the PGA. Results We analyzed 2,044 BVAS/WG assessments from 180 patients; 734 assessments were scored during active disease. The highest R2 with the PGA was obtained by scoring WG activity based on the following items: the 25 predefined items rated on ≥5 visits, the 2 newly created fatigue and weight loss variables, the remaining minor other and major other items, and a variable that signified whether new or worse items were present at a specific visit. The weights assigned to the items ranged from 1 to 21. Compared with the original BVAS/WG, this modified score correlated significantly more strongly with the PGA. Conclusion This study suggests possibilities to enhance the item selection and weighting of the BVAS/WG. These changes may increase this instrument's ability to capture the continuum of disease activity in WG. PMID:18512722
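The idea of deriving item weights by regressing the physician global assessment on item indicators, and judging the resulting score by its R², can be sketched with ordinary least squares. This is a simplification: the study used generalized estimating equations to handle repeated visits per patient, which plain OLS ignores, and `fit_item_weights` is a hypothetical helper.

```python
import numpy as np


def fit_item_weights(items, pga):
    """Least-squares weights for 0/1 item indicators predicting a global score.

    items: (n_assessments, n_items) matrix of scored (1) / not scored (0) items
    pga: physician global assessment for each assessment
    Returns (intercept, weights, r_squared). Clustering by patient is ignored.
    """
    pga = np.asarray(pga, dtype=float)
    X = np.column_stack([np.ones(len(pga)), items])
    coef, *_ = np.linalg.lstsq(X, pga, rcond=None)
    fitted = X @ coef
    ss_res = np.sum((pga - fitted) ** 2)
    ss_tot = np.sum((pga - pga.mean()) ** 2)
    return coef[0], coef[1:], 1.0 - ss_res / ss_tot


items = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [1, 0], [0, 1]])
pga = 1 + 2 * items[:, 0] + 5 * items[:, 1]
print(fit_item_weights(items, pga))
```

Candidate item sets can then be compared by the R² of the resulting weighted score, mirroring the model search described above.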
Daniels, Vijay J; Bordage, Georges; Gierl, Mark J; Yudkowsky, Rachel
2014-10-01
Objective structured clinical examinations (OSCEs) are used worldwide for summative examinations but often lack acceptable reliability. Research has shown that reliability of scores increases if OSCE checklists for medical students include only clinically relevant items. Also, checklists are often missing evidence-based items that high-achieving learners are more likely to use. The purpose of this study was to determine if limiting checklist items to clinically discriminating items and/or adding missing evidence-based items improved score reliability in an Internal Medicine residency OSCE. Six internists reviewed the traditional checklists of four OSCE stations, classifying items as clinically discriminating or non-discriminating. Two independent reviewers augmented checklists with missing evidence-based items. We used generalizability theory to calculate the overall reliability of faculty observer checklist scores from 45 first- and second-year residents and to predict how many 10-item stations would be required to reach a Phi coefficient of 0.8. Removing clinically non-discriminating items from the traditional checklist did not change the number of stations (15) required to reach a Phi of 0.8 with 10 items. Focusing the checklist on only evidence-based clinically discriminating items increased test score reliability, so that 11 stations instead of 15 were needed to reach 0.8; adding missing evidence-based clinically discriminating items to the traditional checklist modestly improved reliability (14 instead of 15 stations needed). Checklists composed of evidence-based clinically discriminating items improved the reliability of checklist scores and reduced the number of stations needed for acceptable reliability. Educators should give preference to evidence-based items over non-evidence-based items when developing OSCE checklists.
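The "how many stations" question is a decision (D) study. The sketch below assumes a simplified fully crossed persons × stations design, in which the Phi (absolute) coefficient is Φ = σ²_p / (σ²_p + (σ²_s + σ²_ps,e) / n_s). The study's actual design nested items within stations, and the variance components in the example are invented.

```python
def phi_coefficient(var_p, var_s, var_ps_e, n_stations):
    """Phi for a persons x stations design averaged over n_stations."""
    absolute_error = (var_s + var_ps_e) / n_stations
    return var_p / (var_p + absolute_error)


def stations_needed(var_p, var_s, var_ps_e, target=0.8, max_stations=100):
    """Smallest number of stations whose Phi reaches the target, or None."""
    for n in range(1, max_stations + 1):
        if phi_coefficient(var_p, var_s, var_ps_e, n) >= target:
            return n
    return None


# Hypothetical variance components for persons, stations and residual
print(stations_needed(1.0, 0.5, 3.4))
```

Reducing the station and residual variance (for example, through better checklists) lowers the number of stations required, which is the effect reported above.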
Udo, Tomoko; McKee, Sherry A; Grilo, Carlos M
2015-01-01
The Beck Depression Inventory (BDI) is often used to assess depression symptoms, but its factor structure and its clinical utility have not been evaluated in patients with binge eating disorder (BED) and obesity. A total of 882 treatment-seeking obese patients with BED were administered structured interviews (Structured Clinical Interview for Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition Axis I Disorders) and completed self-report questionnaires. Exploratory and confirmatory factor analyses supported a brief 16-item BDI version with a three-factor structure (affective, attitudinal and somatic). Both 21- and 16-item versions showed excellent internal consistency (both α=0.89) and had significant correlation patterns with different aspects of eating disorder psychopathology; the three factors showed significant but variable associations with eating disorder psychopathology. The areas under the curve (AUCs) for both BDI versions were significant in predicting major depressive disorder (MDD; AUC=0.773 [16-item], 73.5% sensitivity/70.2% specificity, AUC=0.769 [21-item], 79.5% sensitivity/64.1% specificity) and mood disorders (AUC=0.763 [16-item], 67.1% sensitivity/71.5% specificity, AUC=0.769 [21-item], 84.2% sensitivity/55.7% specificity). The 21-item BDI (cutoff score ≥16) showed higher negative predictive values (94.0% vs. 93.0% [MDD]; 92.4% vs. 88.3% [mood disorders]) than the brief 16-item BDI (cutoff score ≥13). Both BDI versions demonstrated moderate performance as a screening instrument for MDD/mood disorders in obese patients with BED. Advantages and disadvantages for both versions are discussed. A three-factor structure has potential to inform the conceptualization of depression features. Copyright © 2015 Elsevier Inc. All rights reserved.
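The sensitivity, specificity and predictive values reported above all come from a 2 × 2 table formed by applying a cutoff (for example, BDI ≥16) to scores and comparing against interview diagnoses. A minimal sketch, with invented scores and diagnoses:

```python
def screening_metrics(scores, has_disorder, cutoff):
    """Sensitivity, specificity, PPV and NPV for the rule score >= cutoff."""
    tp = fp = tn = fn = 0
    for score, positive in zip(scores, has_disorder):
        flagged = score >= cutoff
        if flagged and positive:
            tp += 1
        elif flagged:
            fp += 1
        elif positive:
            fn += 1
        else:
            tn += 1
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


scores = [20, 18, 10, 5, 17, 12, 3]
diagnosis = [True, True, True, False, False, False, False]
print(screening_metrics(scores, diagnosis, cutoff=16))
```

Unlike sensitivity and specificity, the predictive values also depend on how common the disorder is in the screened sample.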
Directed forgetting of complex pictures in an item method paradigm.
Hauswald, Anne; Kissler, Johanna
2008-11-01
An item-cued directed forgetting paradigm was used to investigate the ability to control episodic memory and selectively encode complex coloured pictures. A series of photographs was presented to 21 participants who were instructed to either remember or forget each picture after it was presented. Memory performance was later tested with a recognition task where all presented items had to be retrieved, regardless of the initial instructions. A directed forgetting effect--that is, better recognition of "to-be-remembered" than of "to-be-forgotten" pictures--was observed, although its size was smaller than previously reported for words or line drawings. The magnitude of the directed forgetting effect correlated negatively with participants' depression and dissociation scores. The results indicate that, at least in an item method, directed forgetting occurs for complex pictures as well as words and simple line drawings. Furthermore, people with higher levels of dissociative or depressive symptoms exhibit altered memory encoding patterns.
Item and Error Analysis on Raven's Coloured Progressive Matrices in Williams Syndrome
ERIC Educational Resources Information Center
Van Herwegen, Jo; Farran, Emily; Annaz, Dagmara
2011-01-01
Raven's Coloured Progressive Matrices (RCPM) is a standardised test that is commonly used to obtain a non-verbal reasoning score for children. As the RCPM involves the matching of a target to a pattern it is also considered to be a visuo-spatial perception task. RCPM is therefore frequently used in studies in Williams Syndrome (WS), in order to…
Fox, Mark C; Mitchum, Ainsley L
2014-01-01
The trend of rising scores on intelligence tests raises important questions about the comparability of variation within and between time periods. Descriptions of the processes that mediate selection of item responses provide meaningful psychological criteria upon which to base such comparisons. In a recent paper, Fox and Mitchum presented and tested a cognitive theory of rising scores on analogical and inductive reasoning tests that is specific enough to make novel predictions about cohort differences in patterns of item responses for tests such as the Raven's Matrices. In this paper we extend the same proposal in two important ways by (1) testing it against a dataset that enables the effects of cohort to be isolated from those of age, and (2) applying it to two other inductive reasoning tests that exhibit large Flynn effects: Letter Series and Word Series. Following specification and testing of a confirmatory item response model, predicted violations of measurement invariance are observed between two age-matched cohorts that are separated by only 20 years, as members of the later cohort are found to map objects at higher levels of abstraction than members of the earlier cohort who possess the same overall level of ability. Results have implications for the Flynn effect and cognitive aging while underscoring the value of establishing psychological criteria for equating members of distinct groups who achieve the same scores.
Calibration of the Test of Relational Reasoning.
Dumas, Denis; Alexander, Patricia A
2016-10-01
Relational reasoning, or the ability to discern meaningful patterns within a stream of information, is a critical cognitive ability associated with academic and professional success. Importantly, relational reasoning has been described as taking multiple forms, depending on the type of higher order relations being drawn between and among concepts. However, the reliable and valid measurement of such a multidimensional construct of relational reasoning has been elusive. The Test of Relational Reasoning (TORR) was designed to tap 4 forms of relational reasoning (i.e., analogy, anomaly, antinomy, and antithesis). In this investigation, the TORR was calibrated and scored using multidimensional item response theory in a large, representative undergraduate sample. The bifactor model was identified as the best-fitting model, and used to estimate item parameters and construct reliability. To improve the usefulness of the TORR to educators, scaled scores were also calculated and presented. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
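One common index of construct reliability for a factor, such as the general factor of a bifactor model, is Hancock and Mueller's coefficient H, computed from standardized loadings. The sketch below assumes standardized loadings are available; the abstract does not state which reliability index the TORR calibration reported, and the loadings in the example are invented.

```python
def coefficient_h(loadings):
    """Coefficient H (construct reliability) from standardized loadings."""
    ratio_sum = sum(l ** 2 / (1.0 - l ** 2) for l in loadings)
    return ratio_sum / (1.0 + ratio_sum)


# Four items each loading 0.7 on the general factor
print(coefficient_h([0.7, 0.7, 0.7, 0.7]))
```

Because each item contributes λ²/(1 − λ²), strongly loading items dominate H, so it reflects how well the best indicators define the construct.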
The development of the "Cantonese receptive vocabulary test" for children aged 2-6 in Hong Kong.
Cheung, P S; Lee, K Y; Lee, L W
1997-01-01
The study aims to develop a Cantonese receptive vocabulary test to assess 2-6-year-old children in Hong Kong. The test consists of 100 test items. Each target item is accompanied by a phonological distractor, a semantic distractor and an unrelated distractor. A sample of 609 normal children from four Maternal and Child Health Centres and nine kindergartens was selected. The results show a significant effect of age on the correct score. ANOVA was performed to examine the age effect on each distractor type individually; the scores for all three distractor types decreased with age, each in its own pattern. With strong content validity, strong construct validity and high split-half reliability coefficients, this test could be used as a reliable measure for the Cantonese-speaking population in Hong Kong.
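Split-half reliability correlates scores on two halves of a test and then applies the Spearman-Brown correction to estimate the reliability of the full-length test. The sketch below uses an odd-even split, which is an assumption; the abstract does not say how the 100 items were divided.

```python
import numpy as np


def spearman_brown(r, factor=2.0):
    """Reliability of a test lengthened by `factor`, given reliability r."""
    return factor * r / (1.0 + (factor - 1.0) * r)


def split_half_reliability(item_scores):
    """Odd-even split-half reliability with the Spearman-Brown correction.

    item_scores: (n_children, n_items) array of 0/1 item scores
    """
    scores = np.asarray(item_scores, dtype=float)
    odd_half = scores[:, 0::2].sum(axis=1)
    even_half = scores[:, 1::2].sum(axis=1)
    r_half = np.corrcoef(odd_half, even_half)[0, 1]
    return spearman_brown(r_half)


print(spearman_brown(0.6))  # a half-test correlation of .6 implies .75
```

The correction is needed because each half contains only half the items, so the raw half-test correlation underestimates full-test reliability.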
Item Purification Does Not Always Improve DIF Detection: A Counterexample with Angoff's Delta Plot
ERIC Educational Resources Information Center
Magis, David; Facon, Bruno
2013-01-01
Item purification is an iterative process that is often advocated as improving the identification of items affected by differential item functioning (DIF). With test-score-based DIF detection methods, item purification iteratively removes the items currently flagged as DIF from the test scores to get purified sets of items, unaffected by DIF. The…
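Angoff's delta plot converts each item's proportion correct in the reference and focal groups to the delta scale, Δ = 13 + 4Φ⁻¹(1 − p), fits the major axis through the points, and flags items whose perpendicular distance from that axis exceeds a threshold. The sketch below adds optional purification (refitting the axis without flagged items until the flagged set stabilizes). The fixed threshold of 1.5 is a common convention, not necessarily the value used by Magis and Facon, and the p-values in the example are invented.

```python
import math
from statistics import NormalDist, mean


def deltas(p_values):
    """Angoff delta transform: 13 + 4 * z, where z = inverse normal of (1 - p)."""
    inv = NormalDist().inv_cdf
    return [13.0 + 4.0 * inv(1.0 - p) for p in p_values]


def major_axis(x, y):
    """Slope and intercept of the major axis through points (x, y)."""
    mx, my = mean(x), mean(y)
    n1 = len(x) - 1
    sxx = sum((a - mx) ** 2 for a in x) / n1
    syy = sum((b - my) ** 2 for b in y) / n1
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n1
    slope = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    return slope, my - slope * mx


def delta_plot(p_ref, p_focal, threshold=1.5, purify=False, max_iter=10):
    """Return indices of items flagged for DIF."""
    x, y = deltas(p_ref), deltas(p_focal)
    flagged = set()
    for _ in range(max_iter):
        keep = [i for i in range(len(x)) if i not in flagged]
        slope, intercept = major_axis([x[i] for i in keep], [y[i] for i in keep])
        # Perpendicular distance of every item from the current axis
        dist = [(slope * xi - yi + intercept) / math.sqrt(slope ** 2 + 1)
                for xi, yi in zip(x, y)]
        new_flagged = {i for i, d in enumerate(dist) if abs(d) > threshold}
        if not purify or new_flagged == flagged:
            return sorted(new_flagged)
        flagged = new_flagged
    return sorted(flagged)


p_ref = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3]
p_focal = [0.9, 0.8, 0.7, 0.2, 0.5, 0.4, 0.3]  # item 3 much harder for the focal group
print(delta_plot(p_ref, p_focal), delta_plot(p_ref, p_focal, purify=True))
```

In this toy case both versions flag the same item; the counterexample in the article concerns data where purification changes the flagged set for the worse.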
Romaguera, Dora; Ängquist, Lars; Du, Huaidong; Jakobsen, Marianne Uhre; Forouhi, Nita G.; Halkjær, Jytte; Feskens, Edith J. M.; van der A, Daphne L.; Masala, Giovanna; Steffen, Annika; Palli, Domenico; Wareham, Nicholas J.; Overvad, Kim; Tjønneland, Anne; Boeing, Heiner; Riboli, Elio; Sørensen, Thorkild I.
2011-01-01
Background Dietary factors such as low energy density and low glycemic index were associated with a lower gain in abdominal adiposity. A better understanding of which food groups/items contribute to these associations is necessary. Objective To ascertain the association of food group/item consumption with prospective annual changes in “waist circumference for a given BMI” (WCBMI), a proxy for abdominal adiposity. Design We analyzed data from 48,631 men and women from 5 countries participating in the European Prospective Investigation into Cancer and Nutrition (EPIC) study. Anthropometric measurements were obtained at baseline and after a median follow-up time of 5.5 years. WCBMI was defined as the residuals of waist circumference regressed on BMI, and annual change in WCBMI (ΔWCBMI, cm/y) was defined as the difference between residuals at follow-up and baseline, divided by follow-up time. The association between food groups/items and ΔWCBMI was modelled using centre-specific adjusted linear regression, and random-effects meta-analyses to obtain pooled estimates. Results Higher consumption of fruit and dairy products was associated with a lower gain in WCBMI, whereas the consumption of white bread, processed meat, margarine, and soft drinks was positively associated with ΔWCBMI. When these six food groups/items were analyzed in combination using a summary score, those in the highest quartile of the score (indicating a more favourable dietary pattern) showed a ΔWCBMI of −0.11 (95% CI −0.09 to −0.14) cm/y compared to those in the lowest quartile. Conclusion A dietary pattern high in fruit and dairy and low in white bread, processed meat, margarine, and soft drinks may help to prevent abdominal fat accumulation. PMID:21858094
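The WCBMI outcome defined above can be sketched as follows. This assumes a single unadjusted linear regression of waist circumference on BMI at each time point; the study's models may have been stratified (for example, by sex or centre), and the data in the example are invented.

```python
import numpy as np


def residuals(y, x):
    """Residuals of y regressed linearly on x."""
    slope, intercept = np.polyfit(x, y, 1)
    return y - (intercept + slope * x)


def annual_wc_bmi_change(wc0, bmi0, wc1, bmi1, years):
    """Delta WCBMI (cm/y): follow-up minus baseline residual, per year of follow-up."""
    wc0, bmi0, wc1, bmi1, years = (
        np.asarray(a, dtype=float) for a in (wc0, bmi0, wc1, bmi1, years)
    )
    return (residuals(wc1, bmi1) - residuals(wc0, bmi0)) / years


bmi0 = np.array([22.0, 24.0, 27.0, 30.0])
bmi1 = np.array([20.0, 25.0, 30.0, 35.0])
wc0 = 2 * bmi0 + 30                               # everyone on the line at baseline
wc1 = 2 * bmi1 + 30 + np.array([1, -1, -1, 1])    # deviations at follow-up
print(annual_wc_bmi_change(wc0, bmi0, wc1, bmi1, [5, 5, 5, 5]))
```

A positive value means waist circumference grew faster than expected for the person's BMI, which is the gain in abdominal adiposity studied above.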
NASA Astrophysics Data System (ADS)
Gadbury-Amyot, Cynthia C.
This study examined the validity and reliability of portfolio assessment using Messick's (1996, 1995) unified framework of construct validity. Theoretical and empirical evidence was sought for six aspects of construct validity. The sample included twenty student portfolios. Each portfolio was evaluated by seven faculty raters using a primary trait analysis scoring rubric. There was a significant relationship (r = .81–.95; p < .01) among the seven subscales in the scoring rubric, demonstrating measurement of a common construct. Item analysis was conducted to examine convergent and discriminant empirical relationships of the 35 items in the scoring rubric. There was a significant relationship between all items (p < .01), and all but one item was more strongly correlated with its own subscale than with other subscales. However, correlations of items across subscales were predominantly moderate in strength, indicating that items did not strongly discriminate between subscales. A fully crossed, two-facet generalizability (G) study design was used to examine reliability. Analysis of variance demonstrated that the greatest source of variance was the scoring rubric itself, accounting for 78% of the total variance. The smallest source of variance was the interaction between portfolio and rubric (1.15%), indicating that while the seven subscales varied in difficulty level, the relative standing of individual portfolios was maintained across subscales. Faculty rater variance accounted for only 1.28% of total variance. A phi coefficient of .86, analogous to a reliability coefficient in classical test theory, was obtained in the Decision study by increasing the subscales to fourteen and decreasing faculty raters to three. There was a significant relationship between portfolios and grade point average (r = .70; p < .01), and between portfolios and the National Dental Hygiene Board Examination (r = .60; p < .01).
The relationship between portfolios and the Central Regional Dental Testing Service examination was both weak and nonsignificant (r = .19; p > .05). An open-ended survey was used to elicit student feedback on portfolio development. A majority of the students (76%) perceived value in the development of programmatic portfolios. In conclusion, the pattern of findings from this study suggests that portfolios can serve as a valid and reliable measure for assessing student competency.
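The Decision study above varies two facets at once (raters and rubric subscales). For a fully crossed persons × raters × subscales design, the standard Phi coefficient divides every error component by the number of conditions it is averaged over. The sketch below implements that formula; the abstract reports only some components as percentages, so the variance components in the example are hypothetical.

```python
def phi_two_facet(vc, n_raters, n_subscales):
    """Phi for a fully crossed p x r x s design.

    vc: variance components with keys "p", "r", "s", "pr", "ps", "rs", "prs,e"
    """
    absolute_error = (
        vc["r"] / n_raters
        + vc["s"] / n_subscales
        + vc["pr"] / n_raters
        + vc["ps"] / n_subscales
        + vc["rs"] / (n_raters * n_subscales)
        + vc["prs,e"] / (n_raters * n_subscales)
    )
    return vc["p"] / (vc["p"] + absolute_error)


# Hypothetical components (not the study's estimates)
vc = {"p": 1.0, "r": 1.0, "s": 1.0, "pr": 1.0, "ps": 1.0, "rs": 1.0, "prs,e": 1.0}
print(phi_two_facet(vc, n_raters=3, n_subscales=14))
```

Because the subscale component is large in this study, adding subscales shrinks absolute error more than adding raters does, which explains the D-study choice of fourteen subscales and three raters.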
Item Response Theory Analysis of the Psychopathic Personality Inventory-Revised.
Eichenbaum, Alexander E; Marcus, David K; French, Brian F
2017-06-01
This study examined item and scale functioning in the Psychopathic Personality Inventory-Revised (PPI-R) using an item response theory analysis. PPI-R protocols from 1,052 college student participants (348 male, 704 female) were analyzed. Analyses were conducted on the 131 self-report items comprising the PPI-R's eight content scales, using a graded response model. Scales collected a majority of their information about respondents possessing higher than average levels of the traits being measured. Each scale contained at least some items that evidenced limited ability to differentiate between respondents with differing levels of the trait being measured. Moreover, 80 items (61.1%) yielded significantly different responses between men and women presumably possessing similar levels of the trait being measured. Item performance was also influenced by the scoring format (directly scored vs. reverse-scored) of the items. Overall, the results suggest that the PPI-R, despite identifying psychopathic personality traits in individuals possessing high levels of those traits, may not identify these traits equally well for men and women, and scores are likely influenced by the scoring format of the individual item and scale.
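In Samejima's graded response model, each item has a discrimination a and ordered thresholds b₁ < … < b_{m−1}; the probability of responding in category k or higher is a logistic function of a(θ − b_k), and category probabilities are differences of adjacent cumulative probabilities. A minimal sketch for one item with four response options follows; the parameter values are invented, not PPI-R estimates.

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Graded response model category probabilities for one item.

    thresholds: ordered b_1 < ... < b_{m-1} for an item with m categories.
    """
    def p_star(b):
        # P(response in this category or higher)
        return 1.0 / (1.0 + math.exp(-a * (theta - b)))

    cumulative = [1.0] + [p_star(b) for b in thresholds] + [0.0]
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]


print(grm_category_probs(theta=0.0, a=1.5, thresholds=[-1.0, 0.0, 1.0]))
```

The item's information is concentrated near its thresholds, so items whose thresholds all lie above average trait levels mostly measure high scorers, as reported for the PPI-R scales.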
An international measure of awareness and beliefs about cancer: development and testing of the ABC
Simon, Alice E; Forbes, Lindsay J L; Boniface, David; Warburton, Fiona; Brain, Kate E; Dessaix, Anita; Donnelly, Michael; Haynes, Kerry; Hvidberg, Line; Lagerlund, Magdalena; Petermann, Lisa; Tishelman, Carol; Vedsted, Peter; Vigmostad, Maria Nyre; Wardle, Jane; Ramirez, Amanda J
2012-01-01
Objectives To develop an internationally validated measure of cancer awareness and beliefs; the awareness and beliefs about cancer (ABC) measure. Design and setting Items modified from existing measures were assessed by a working group in six countries (Australia, Canada, Denmark, Norway, Sweden and the UK). Validation studies were completed in the UK, and cross-sectional surveys of the general population were carried out in the six participating countries. Participants Testing in UK English included cognitive interviewing for face validity (N=10), calculation of content validity indexes (six assessors), and assessment of test–retest reliability (N=97). Conceptual and cultural equivalence of modified (Canadian and Australian) and translated (Danish, Norwegian, Swedish and Canadian French) ABC versions were tested quantitatively for equivalence of meaning (≥4 assessors per country) and in bilingual cognitive interviews (three interviews per translation). Response patterns were assessed in surveys of adults aged 50+ years (N≥2000) in each country. Main outcomes Psychometric properties were evaluated through tests of validity and reliability, conceptual and cultural equivalence and systematic item analysis. Test–retest reliability used weighted-κ and intraclass correlations. Construction and validation of aggregate scores was by factor analysis for (1) beliefs about cancer outcomes, (2) beliefs about barriers to symptomatic presentation, and item summation for (3) awareness of cancer symptoms and (4) awareness of cancer risk factors. Results The English ABC had acceptable test–retest reliability and content validity. International assessments of equivalence identified a small number of items where wording needed adjustment. Survey response patterns showed that items performed well in terms of difficulty and discrimination across countries except for awareness of cancer outcomes in Australia. Aggregate scores had consistent factor structures across countries. 
Conclusions The ABC is a reliable and valid international measure of cancer awareness and beliefs. The methods used to validate and harmonise the ABC may serve as a methodological guide in international survey research. PMID:23253874
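Test-retest agreement for ordinal items was assessed with weighted κ, which penalizes disagreements by their distance on the response scale. The sketch below supports linear or quadratic disagreement weights; the abstract does not state which weighting the ABC study used, and `weighted_kappa` is a hypothetical helper.

```python
def weighted_kappa(rater1, rater2, n_categories, weighting="linear"):
    """Weighted kappa for two sets of ratings coded 0..n_categories-1."""
    n = len(rater1)
    observed = [[0.0] * n_categories for _ in range(n_categories)]
    for a, b in zip(rater1, rater2):
        observed[a][b] += 1.0 / n
    row = [sum(r) for r in observed]
    col = [sum(observed[i][j] for i in range(n_categories)) for j in range(n_categories)]
    max_d = n_categories - 1

    def disagreement(i, j):
        d = abs(i - j) / max_d
        return d if weighting == "linear" else d ** 2

    cells = [(i, j) for i in range(n_categories) for j in range(n_categories)]
    observed_d = sum(disagreement(i, j) * observed[i][j] for i, j in cells)
    expected_d = sum(disagreement(i, j) * row[i] * col[j] for i, j in cells)
    return 1.0 - observed_d / expected_d


test1 = [0, 1, 2, 3, 2, 1]
retest = [0, 1, 2, 2, 2, 1]
print(weighted_kappa(test1, retest, n_categories=4))
```

With two categories, both weightings reduce to Cohen's unweighted κ.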
Reliability of self-rated tinnitus distress and association with psychological symptom patterns.
Hiller, W; Goebel, G; Rief, W
1994-05-01
Psychological complaints were investigated in two samples of 60 and 138 in-patients suffering from chronic tinnitus. We administered the Tinnitus Questionnaire (TQ), a 52-item self-rating scale which differentiates between dimensions of emotional and cognitive distress, intrusiveness, auditory perceptual difficulties, sleep disturbances and somatic complaints. The test-retest reliability was .94 for the TQ global score and between .86 and .93 for subscales. Three independent analyses were conducted to estimate the split-half reliability (internal consistency), which was only slightly lower than the test-retest values for scales with a relatively small number of items. Reliability was also sufficient at the level of single items. Low correlations between the TQ and the Hopkins Symptom Checklist (SCL-90-R) indicate that tinnitus-related and general psychological disturbances are distinct.
ERIC Educational Resources Information Center
Crocker, Linda M.; Mehrens, William A.
Four new methods of item analysis were used to select subsets of items which would yield measures of attitude change. The sample consisted of 263 students at Michigan State University who were tested on the Inventory of Beliefs as freshmen and retested on the same instrument as juniors. Item change scores and total change scores were computed for…
Determining an Imaging Literacy Curriculum for Radiation Oncologists: An International Delphi Study
DOE Office of Scientific and Technical Information (OSTI.GOV)
Giuliani, Meredith E., E-mail: Meredith.Giuliani@rmp.uhn.on.ca; Department of Radiation Oncology, University of Toronto, Toronto, Ontario; Gillan, Caitlin
2014-03-15
Purpose: Rapid evolution of imaging technologies and their integration into radiation therapy practice demand that radiation oncology (RO) training curricula be updated. The purpose of this study was to develop an entry-to-practice imaging literacy competency profile. Methods and Materials: A list of 263 potential imaging competency items was assembled from international objectives of training. An expert panel eliminated redundant or irrelevant items to create a list of 97 unique potential competency items. An international 2-round Delphi process was conducted with experts in RO. In round 1, all experts scored, on a 9-point Likert scale, the degree to which they agreed an item should be included in the competency profile. Items with a mean score ≥7 were included, those with a mean score of 4 to 6 were reviewed in round 2, and items with a mean score <4 were excluded. In round 2, items were discussed and subsequently ranked for inclusion or exclusion in the competency profile. Items with >75% of participants voting for inclusion were included in the final competency profile. Results: Forty-nine radiation oncologists were invited to participate in round 1, and 32 (65%) did so. Participants represented 24 centers in 6 countries. Of the 97 items ranked in round 1, 80 had a mean score ≥7, 1 item had a score <4, and 16 items with a mean score of 4 to 6 were reviewed and rescored in round 2. In round 2, 4 items had >75% of participants voting for inclusion and were included; the remaining 12 were excluded. These 84 items formed the final competency profile. The 84 enabling competency items were aggregated into the following 4 thematic groups of key competencies: (1) imaging fundamentals (42 items); (2) clinical application (27 items); (3) clinical management (5 items); and (4) professional practice (10 items). Conclusions: We present an imaging literacy competency profile which could constitute the minimum training standards in radiation oncology residency programs.
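The two-round Delphi decision rules described above can be sketched as two small functions. This is a minimal illustration under one stated assumption: the abstract gives the middle band as "4 to 6", so a mean falling between 6 and 7 is treated here as going to round 2. The function names are hypothetical.

```python
from statistics import mean


def round1_decision(likert_scores: list[int]) -> str:
    """Classify an item from its round-1 ratings on the 9-point scale."""
    m = mean(likert_scores)
    if m >= 7:
        return "include"
    if m < 4:
        return "exclude"
    return "round 2"  # assumption: every mean in [4, 7) is re-reviewed


def round2_decision(votes_for_inclusion: int, n_voters: int) -> str:
    """Include an item only if strictly more than 75% vote for inclusion."""
    return "include" if votes_for_inclusion / n_voters > 0.75 else "exclude"
```

With 32 voters, 25 votes for inclusion (78%) keeps an item, while 24 votes (exactly 75%) does not meet the ">75%" threshold.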
ERIC Educational Resources Information Center
Kim, Seonghoon
2013-01-01
With known item response theory (IRT) item parameters, Lord and Wingersky provided a recursive algorithm for computing the conditional frequency distribution of number-correct test scores, given proficiency. This article presents a generalized algorithm for computing the conditional distribution of summed test scores involving real-number item…
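For dichotomous items, the Lord-Wingersky recursion adds one item at a time: a respondent with score s after the new item either had s before it and answered it incorrectly, or had s-1 and answered it correctly. The sketch below covers only that classic case (not the real-number item scores generalized in Kim's article); the two-parameter logistic helper `p_2pl` is an illustrative assumption for turning known item parameters into probabilities.

```python
import math


def p_2pl(theta: float, a: float, b: float) -> float:
    """Two-parameter logistic probability of a correct response (no 1.7 scaling constant)."""
    return 1 / (1 + math.exp(-a * (theta - b)))


def lord_wingersky(p_correct: list[float]) -> list[float]:
    """Conditional distribution of the number-correct score at one fixed theta.

    p_correct[i] is the probability of answering item i correctly at that theta.
    Returns probs, where probs[s] = P(number-correct score = s | theta).
    """
    probs = [1.0]  # with zero items, a score of 0 is certain
    for p in p_correct:
        updated = [0.0] * (len(probs) + 1)
        for score, prob in enumerate(probs):
            updated[score] += prob * (1 - p)  # item answered incorrectly
            updated[score + 1] += prob * p    # item answered correctly
        probs = updated
    return probs
```

Two items each answered correctly with probability 0.5 give the score distribution [0.25, 0.5, 0.25].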
Observed Score and True Score Equating Procedures for Multidimensional Item Response Theory
ERIC Educational Resources Information Center
Brossman, Bradley Grant
2010-01-01
The purpose of this research was to develop observed score and true score equating procedures to be used in conjunction with the Multidimensional Item Response Theory (MIRT) framework. Currently, MIRT scale linking procedures exist to place item parameter estimates and ability estimates on the same scale after separate calibrations are conducted.…
Asymptotic Standard Errors of Observed-Score Equating with Polytomous IRT Models
ERIC Educational Resources Information Center
Andersson, Björn
2016-01-01
In observed-score equipercentile equating, the goal is to make scores on two scales or tests measuring the same construct comparable by matching the percentiles of the respective score distributions. If the tests consist of different items with multiple categories for each item, a suitable model for the responses is a polytomous item response…
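The core idea of equipercentile equating, matching percentile ranks across two score distributions, can be sketched on observed frequency tables. This is a minimal illustration, not the polytomous IRT approach or the standard-error derivation in Andersson's article: it assumes integer scores starting at 0, uses the common percentile-rank continuization (each score x spread uniformly over [x-0.5, x+0.5]), and requires every score on test Y to have nonzero frequency. Function names are hypothetical.

```python
from bisect import bisect_left


def _continuized_cdf(freqs: list[int]) -> tuple[list[float], list[float]]:
    """Nodes at score + 0.5 carry the cumulative proportion F(score); node -0.5 carries 0."""
    total = sum(freqs)
    nodes, cum = [-0.5], [0.0]
    running = 0
    for score, f in enumerate(freqs):
        running += f
        nodes.append(score + 0.5)
        cum.append(running / total)
    return nodes, cum


def _interp(x: float, xs: list[float], ys: list[float]) -> float:
    """Piecewise-linear interpolation; xs must be strictly increasing."""
    i = bisect_left(xs, x)
    if i == 0:
        return ys[0]
    if i == len(xs):
        return ys[-1]
    x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def equipercentile(x: float, freq_x: list[int], freq_y: list[int]) -> float:
    """Map score x on test X to the test-Y score with the same percentile rank."""
    nodes_x, cum_x = _continuized_cdf(freq_x)
    p = _interp(x, nodes_x, cum_x)  # percentile rank of x, as a proportion
    nodes_y, cum_y = _continuized_cdf(freq_y)
    # Inverting Y's cdf needs cum_y strictly increasing, i.e. every freq_y > 0.
    return _interp(p, cum_y, nodes_y)
```

For instance, with uniform distributions on a 0-3 scale (X) and a 0-7 scale (Y), `equipercentile(1, [1] * 4, [1] * 8)` maps a score of 1 to 2.5.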
Chiba, Mitsuro; Nakane, Kunio; Takayama, Yuko; Sugawara, Kae; Ohno, Hideo; Ishii, Hajime; Tsuda, Satoko; Tsuji, Tsuyotoshi; Komatsu, Masafumi; Sugawara, Takeshi
2016-01-01
Context Plant-based diets (PBDs) are a healthy alternative to westernized diets. A semivegetarian diet, a PBD, has been shown to prevent a relapse in Crohn disease. However, no method exists for measuring adherence to PBDs. Objective To develop a simple way of evaluating adherence to a PBD for Japanese patients with inflammatory bowel disease (IBD). Design PBD scores were assigned according to the frequency of consumption reported on a food-frequency questionnaire obtained on hospitalization for 159 patients with ulcerative colitis and 70 patients with Crohn disease. Eight items considered to be preventive factors for IBD were scored positively, and 8 items considered to be IBD risk factors were scored negatively. The PBD score was calculated as the sum of the plus and minus scores. Higher PBD scores indicated greater adherence to a PBD. The PBD scores were evaluated on hospitalization and 2 years after discharge for 22 patients with Crohn disease whose dietary pattern and prognosis were established. Main Outcome Measure Plant-Based Diet score. Results The PBD scores differed significantly, in descending order, by dietary type: pro-Japanese diet, mixed type, and pro-westernized diet (Wilcoxon/Kruskal-Wallis test). The PBD scores in the ulcerative colitis and Crohn disease groups were 10.9 ± 9.5 and 8.2 ± 8.2, respectively. For patients with Crohn disease, those with long-term remission and normal C-reactive protein concentration were significantly more likely to have a PBD score of 25 or greater than a score below 25 (χ2 test). Conclusion The PBD score is a valid assessment of PBD dietary adherence. PMID:27768566
NASA Astrophysics Data System (ADS)
Ishimoto, Michi; Davenport, Glen; Wittmann, Michael C.
2017-12-01
Student views of force and motion reflect the personal experiences and physics education of the student. With a different language, culture, and educational system, we expect that Japanese students' views on force and motion might be different from those of American students. The Force and Motion Conceptual Evaluation (FMCE) is an instrument used to probe student views on force and motion. It was designed using research on American students, and, as such, the items might function differently for Japanese students. Preliminary results from a translated version indicated that Japanese students had similar misconceptions to those of American students. In this study, we used item response curves (IRCs) to make more detailed item-by-item comparisons. IRCs show the functioning of individual items across all levels of performance by plotting the proportion of each response as a function of the total score. Most of the IRCs showed very similar patterns on both correct and incorrect responses; however, a few of the plots indicate differences between the populations. The similar patterns indicate that students tend to interact with FMCE items similarly, despite differences in culture, language, and education. We speculate about the possible causes for the differences in some of the IRCs. This report is intended to show how IRCs can be used as a part of the validation process when making comparisons across languages and nationalities. Differences in IRCs can help to pinpoint artifacts of translation, contextual effects because of differences in culture, and perhaps intrinsic differences in student understanding of Newtonian motion.
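An item response curve, as described above, is the proportion of students choosing each response option, tabulated at each total score. A minimal sketch for a single item, assuming each student's chosen option and total score are already available; `item_response_curves` is an illustrative name.

```python
from collections import Counter, defaultdict
from typing import Sequence


def item_response_curves(
    choices: Sequence[str], totals: Sequence[int]
) -> dict[int, dict[str, float]]:
    """For one item, the proportion of students choosing each option at each total score."""
    counts_by_total = defaultdict(Counter)
    for choice, total in zip(choices, totals):
        counts_by_total[total][choice] += 1
    curves = {}
    for total in sorted(counts_by_total):
        counts = counts_by_total[total]
        n = sum(counts.values())
        curves[total] = {choice: c / n for choice, c in counts.items()}
    return curves
```

Plotting each option's proportion against the total score, once per population, gives the curves compared in the study.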
Scoring the importance of tropical forest landscapes with local people: patterns and insights.
Sheil, Douglas; Liswanti, Nining
2006-07-01
Good natural resource management is scarce in many remote tropical regions. Improved management requires better local consultation, but accessing and understanding the preferences and concerns of stakeholders can be difficult. Scoring, where items are numerically rated in relation to each other, is simple and seems applicable even in situations where capacity and funds are limited, but managers rarely use such methods. Here we investigate scoring with seven indigenous communities threatened by forest loss in Kalimantan, Indonesia. We aimed to clarify the forest's multifaceted importance, using replication, cross-check exercises, and interviews. Results are sometimes surprising, but generally explained by additional investigation that sometimes provides new insights. The consistency of scoring results increases in line with community literacy and wealth. Various benefits and pitfalls are identified and examined. Aside from revealing and clarifying local preferences, scoring has unexplored potential as a quantitative technique. Scoring is an underappreciated management tool with wide potential.
Parent-teacher agreement on children's problems in 21 societies.
Rescorla, Leslie A; Bochicchio, Lauren; Achenbach, Thomas M; Ivanova, Masha Y; Almqvist, Fredrik; Begovac, Ivan; Bilenberg, Niels; Bird, Hector; Dobrean, Anca; Erol, Nese; Fombonne, Eric; Fonseca, Antonio; Frigerio, Alessandra; Fung, Daniel S S; Lambert, Michael C; Leung, Patrick W L; Liu, Xianchen; Marković, Ivica; Markovic, Jasminka; Minaei, Asghar; Ooi, Yoon Phaik; Roussos, Alexandra; Rudan, Vlasta; Simsek, Zeynep; van der Ende, Jan; Weintraub, Sheila; Wolanczyk, Tomasz; Woo, Bernardine; Weiss, Bahr; Weisz, John; Zukauskiene, Rita; Verhulst, Frank C
2014-01-01
Parent-teacher cross-informant agreement, although usually modest, may provide important clinical information. Using data for 27,962 children from 21 societies, we asked the following: (a) Do parents report more problems than teachers, and does this vary by society, age, gender, or type of problem? (b) Does parent-teacher agreement vary across different problem scales or across societies? (c) How well do parents and teachers in different societies agree on problem item ratings? (d) How much do parent-teacher dyads in different societies vary in within-dyad agreement on problem items? (e) How well do parents and teachers in 21 societies agree on whether the child's problem level exceeds a deviance threshold? We used five methods to test agreement for Child Behavior Checklist (CBCL) and Teacher's Report Form (TRF) ratings. CBCL scores were higher than TRF scores on most scales, but the informant differences varied in magnitude across the societies studied. Cross-informant correlations for problem scale scores varied moderately across societies studied and were significantly higher for Externalizing than Internalizing problems. Parents and teachers tended to rate the same items as low, medium, or high, but within-dyad item agreement varied widely in every society studied. In all societies studied, both parental noncorroboration of teacher-reported deviance and teacher noncorroboration of parent-reported deviance were common. Our findings underscore the importance of obtaining information from parents and teachers when evaluating and treating children, highlight the need to use multiple methods of quantifying cross-informant agreement, and provide comprehensive baselines for patterns of parent-teacher agreement across 21 societies.
Comparability of scores on the MMPI-2-RF scales generated with the MMPI-2 and MMPI-2-RF booklets.
Van der Heijden, P T; Egger, J I M; Derksen, J J L
2010-05-01
In most validity studies on the recently released 338-item MMPI-2 (Butcher, Dahlstrom, Graham, Tellegen, & Kaemmer, 1989) Restructured Form (MMPI-2-RF; Ben-Porath & Tellegen, 2008; Tellegen & Ben-Porath, 2008), scale scores were derived from the 567-item MMPI-2 booklet. In this study, we evaluated the comparability of the MMPI-2-RF scale scores derived from the original 567-item MMPI-2 booklet with MMPI-2-RF scale scores derived from the 338-item MMPI-2-RF booklet in a Dutch student sample (N = 107). We used a counterbalanced (ABBA) design. We compared results with those previously reported by Tellegen and Ben-Porath (2008). Our findings support the comparability of the scores of the 338-item version and the 567-item version of the 50 MMPI-2-RF scales. We discuss clinical implications and directions for further research.
Wang, Dongqing; Hawley, Nicola L; Thompson, Avery A; Lameko, Viali; Reupena, Muagatutia Sefuiva; McGarvey, Stephen T; Baylin, Ana
2017-04-01
Background: The Samoan population has been undergoing a nutrition transition toward more imported and processed foods and a more sedentary lifestyle. Objectives: We aimed to identify dietary patterns in Samoa and to evaluate their associations with metabolic outcomes. Methods: The sample of this cross-sectional study includes 2774 Samoan adults recruited in 2010 (1104 with metabolic syndrome compared with 1670 without). Principal component analysis on food items from a 104-item food-frequency questionnaire was used to identify dietary patterns. Adjusted least squares means of each component of metabolic syndrome were estimated by quintiles of factor scores for each dietary pattern. Metabolic syndrome status was regressed on quintiles of scores by using log-binomial models to obtain prevalence ratios. Results: We identified a modern pattern, a mixed-traditional pattern, and a mixed-modern pattern. The modern pattern included a high intake of imported and processed foods, including pizza, cheeseburgers, margarine, sugary drinks, desserts, snacks, egg products, noodles, nuts, breads, and cakes, and a low intake of traditional agricultural products and fish. The mixed-traditional pattern had a high intake of neotraditional foods, including fruits, vegetables, soup, poultry, and fish, and imported and processed foods, including dairy products, breads, and cakes. The mixed-modern pattern was loaded with imported and processed foods, including pizza, cheeseburgers, red meat, egg products, noodles, and grains, but also with neotraditional foods, such as seafood and coconut. It also included a low intake of fish, tea, coffee, soup, and traditional agricultural staples. Higher adherence to the mixed-modern pattern was associated with lower abdominal circumference (P-trend < 0.0001), lower serum triglycerides (P-trend = 0.03), and higher serum HDL cholesterol (P-trend = 0.0003). 
The mixed-modern pattern was inversely associated with prevalence of metabolic syndrome (highest quintile: prevalence ratio = 0.79; 95% CI: 0.69, 0.91; P-trend = 0.006). Conclusion: Mixed dietary patterns containing healthier foods, rather than a largely imported and processed modern diet, may help prevent metabolic syndrome in Samoa. © 2017 American Society for Nutrition.
Psychometric Properties of the Shipley Block Design Task: A Study with Jamaican Young Adults
ERIC Educational Resources Information Center
Beaujean, A. Alexander; Hull, Darrell M.; Sheng, Yanyan; Worrell, Frank C.; Bolen, Judy; Verdisco, Aimee E.
2017-01-01
We examined the structure of the new "Block Patterns" (BP) test from the Shipley Institute of Living Scale-Second Edition in a sample of Jamaican young adults. To date, very little has been published on the properties of this subtest's items and scores. The BP test is similar in design to the Block Design subtest found in many cognitive…
ERIC Educational Resources Information Center
Keating, Xiaofen D.; Castro-Pinero, Jose; Centeio, Erin; Harrison, Louis, Jr.; Ramirez, Tere; Chen, Li
2010-01-01
This study examined student health-related fitness (HRF) knowledge and its relationship to physical activity (PA). The participants were undergraduate students from a large U.S. state university. HRF knowledge was assessed using a test consisting of 150 multiple choice items. Differences in HRF knowledge scores by sex, ethnicity, and years in…
Allen Gomes, Ana; Ruivo Marques, Daniel; Meia-Via, Ana Maria; Meia-Via, Mariana; Tavares, José; Fernandes da Silva, Carlos; Pinto de Azevedo, Maria Helena
2015-04-01
Based on successive samples totaling more than 5000 higher education students, we scrutinized the reliability, structure, initial validity and normative scores of a brief self-report seven-item scale to screen for the continuum of nighttime insomnia complaints/perceived sleep quality, used by our team for more than a decade, henceforth labeled the Basic Scale on Insomnia complaints and Quality of Sleep (BaSIQS). In study/sample 1 (n = 1654), the items were developed based on part of a larger survey on higher education sleep-wake patterns. The test-retest study was conducted in an independent small group (n = 33) with a 2-8 week gap. In study/sample 2 (n = 360), focused mainly on validity, the BaSIQS was completed together with the Pittsburgh Sleep Quality Index (PSQI). In study 3, a large recent sample of students from universities all over the country (n = 2995) answered the BaSIQS items, based on which normative scores were determined, and an additional question on perceived sleep problems in order to further analyze the scale's validity. Regarding reliability, Cronbach alpha coefficients were systematically higher than 0.7, and the test-retest correlation coefficient was greater than 0.8. Structure analyses revealed consistently satisfactory two-factor and single-factor solutions. Concerning validity analyses, BaSIQS scores were significantly correlated with PSQI component scores and overall score (r = 0.652 corresponding to a large association); mean scores were significantly higher in those students classifying themselves as having sleep problems (p < 0.0001, d = 0.99 corresponding to a large effect size). In conclusion, the BaSIQS is very easy to administer, and appears to be a reliable and valid scale in higher education students. It might be a convenient short tool in research and applied settings to rapidly assess sleep quality or screen for insomnia complaints, and it may be easily used in other populations with minor adaptations.
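The BaSIQS reliability results are reported as Cronbach alpha coefficients. As a minimal sketch of the standard formula, α = k/(k-1) × (1 - Σ item variances / variance of the total score), assuming complete item-level data with one row per respondent; `cronbach_alpha` is an illustrative name.

```python
from statistics import variance


def cronbach_alpha(scores: list[list[float]]) -> float:
    """Cronbach's alpha for complete data: one row of k item scores per respondent (k >= 2)."""
    k = len(scores[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of the summed scale score.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```

Because the same (n-1) denominator is used for item and total variances, it cancels in the ratio; items that rise and fall together push α toward 1.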
Item response theory analyses of the Delis-Kaplan Executive Function System card sorting subtest.
Spencer, Mercedes; Cho, Sun-Joo; Cutting, Laurie E
2018-02-02
In the current study, we examined the dimensionality of the 16-item Card Sorting subtest of the Delis-Kaplan Executive Function System assessment in a sample of 264 native English-speaking children between the ages of 9 and 15 years. We also tested for measurement invariance for these items across age and gender groups using item response theory (IRT). Results of the exploratory factor analysis indicated that a two-factor model that distinguished between verbal and perceptual items provided the best fit to the data. Although the items demonstrated measurement invariance across age groups, measurement invariance was violated for gender groups, with two items demonstrating differential item functioning for males and females. Multigroup analysis using all 16 items indicated that the items were more effective for individuals whose IRT scale scores were relatively high. A single-group explanatory IRT model using 14 non-differential item functioning items showed that for perceptual ability, females scored higher than males and that scores increased with age for both males and females; for verbal ability, the observed increase in scores across age differed for males and females. The implications of these findings are discussed.
Doğanay Erdoğan, Beyza; Elhan, Atilla Halil; Kaskatı, Osman Tolga; Öztuna, Derya; Küçükdeveci, Ayşe Adile; Kutlay, Şehim; Tennant, Alan
2017-10-01
This study aimed to explore the potential of an inclusive and fully integrated measurement system for the Activities component of the International Classification of Functioning, Disability and Health (ICF), incorporating four classical scales, including the Health Assessment Questionnaire (HAQ), and Computerized Adaptive Testing (CAT). Three hundred patients with rheumatoid arthritis (RA) answered relevant questions from four questionnaires. Rasch analysis was performed to create an item bank from this item pool. A further 100 RA patients were recruited for a CAT application. Both real and simulated CATs were applied, and the agreement between these CAT-based scores and 'paper-pencil' scores was evaluated with the intraclass correlation coefficient (ICC). Anchoring strategies were used to obtain a direct translation from the item bank common metric to the HAQ score. The mean age of the 300 patients was 52.3 ± 11.7 years; disease duration was 11.3 ± 8.0 years; 74.7% were women. After testing the assumptions of Rasch analysis, a 28-item Activities item bank was created. Agreement between CAT-based scores and paper-pencil scores was high (ICC = 0.993). Using the HAQ items in the item bank as anchoring items, another Rasch analysis was performed with HAQ-8 scores as separate items together with the anchoring items. Finally, a conversion table from the item bank common metric to HAQ scores was created. A fully integrated and inclusive health assessment system, illustrating the Activities component of the ICF, was built to assess RA patients. Raw-score-to-metric conversions and vice versa were made available, giving access to the metric through a simple look-up table. © 2015 Asia Pacific League of Associations for Rheumatology and Wiley Publishing Asia Pty Ltd.
Sotos-Prieto, Mercedes; Moreno-Franco, Belén; Ordovás, Jose M; León, Montse; Casasnovas, Jose A; Peñalvo, Jose L
2015-04-01
The aim was to design and develop a questionnaire that can account for an individual's adherence to a Mediterranean lifestyle, including the assessment of diet and physical activity patterns as well as social interaction. The Mediterranean Lifestyle (MEDLIFE) index was created based on the current Spanish Mediterranean food guide pyramid. MEDLIFE is a twenty-eight-item derived index consisting of questions about food consumption (fifteen items), traditional Mediterranean dietary habits (seven items) and physical activity, rest and social interaction habits (six items). Linear regression models were fitted and Spearman rank correlations were calculated to assess content validity and internal consistency. The data for development of MEDLIFE were provided by a subset of 988 participants in the Aragon Workers' Health Study cohort (Zaragoza, Spain). The mean MEDLIFE score was 11·3 (sd 2·6; range: 0-28), and the quintile distribution of MEDLIFE score showed a significant association with each of the individual items as well as with specific nutrients and lifestyle indicators (intra-validity). We also quantified MEDLIFE correspondence with previously reported diet quality indices and found significant correlations (ρ range: 0·44-0·53; P<0·001) for the Alternate Healthy Eating Index, the Alternate Mediterranean Diet Index and the Mediterranean Diet Adherence Screener. MEDLIFE is the first index to include an overall assessment of lifestyle habits. It is expected to be a more holistic tool to measure adherence to the Mediterranean lifestyle in epidemiological studies.
Validation of a literature-based adherence score to Mediterranean diet: the MEDI-LITE score.
Sofi, Francesco; Dinu, Monica; Pagliai, Giuditta; Marcucci, Rossella; Casini, Alessandro
2017-09-01
Numerous studies have demonstrated a relationship between adherence to the Mediterranean diet and prevention of chronic degenerative diseases. The aim of this study was to validate a novel literature-based instrument for measuring adherence to the Mediterranean diet (the MEDI-LITE score). Two hundred and four clinically healthy subjects completed both the MEDI-LITE score and the validated MedDietScore (MDS). A significant positive correlation between the MEDI-LITE and MDS scores was found in the study population (R = .70; p < .0001). Furthermore, statistically significant positive correlations were found for all nine food groups. According to receiver operating characteristic (ROC) curve analysis, the MEDI-LITE showed a significant capacity to discriminate between adherents and non-adherents to the Mediterranean diet pattern (optimal cut-off point = 8.50; sensitivity = 96%; specificity = 38%). In conclusion, our findings show that the MEDI-LITE score correlates well with the MDS, both in the global score and in most of the items related to specific food categories.
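Sensitivity and specificity at a cut-off such as 8.50 follow from classifying each subject as adherent when the score reaches the cut-off and comparing that with a reference classification. The abstract does not say how adherents were defined or how the "optimal" cut-off was chosen; the sketch below assumes a given boolean reference label and, as one common choice, picks the cut-off maximizing Youden's J (sensitivity + specificity - 1) over midpoints between observed scores. Both function names are hypothetical, and both classes must be present in the data.

```python
def sens_spec(scores: list[float], is_adherent: list[bool], cutoff: float) -> tuple[float, float]:
    """Sensitivity and specificity when score >= cutoff is classified as adherent."""
    tp = fn = tn = fp = 0
    for score, truth in zip(scores, is_adherent):
        predicted = score >= cutoff
        if truth and predicted:
            tp += 1
        elif truth:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def youden_cutoff(scores: list[float], is_adherent: list[bool]) -> float:
    """Assumed criterion: the midpoint cut-off maximizing sensitivity + specificity - 1."""
    values = sorted(set(scores))
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    return max(candidates, key=lambda c: sum(sens_spec(scores, is_adherent, c)) - 1)
```

A high-sensitivity, low-specificity result such as 96%/38% means the cut-off flags nearly all adherents but also classifies many non-adherents as adherent.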
Salathé, Cornelia Rolli; Trippolini, Maurizio Alen; Terribilini, Livio Claudio; Oliveri, Michael; Elfering, Achim
2018-06-01
Purpose To develop a multidimensional scale to assess psychosocial beliefs, the Yellow Flag Questionnaire (YFQ), aimed at guiding interventions for workers with chronic musculoskeletal (MSK) pain. Methods Phase 1 consisted of item selection based on a literature search, item development and expert consensus rounds. In phase 2, items were reduced by calculating a quality score per item and by using structural equation modeling and confirmatory factor analysis on data from 666 workers. In phase 3, Cronbach's α and Pearson correlation coefficients were computed on data from 253 injured workers to compare the YFQ score with disability, anxiety, depression and self-efficacy. Regressions of disability, anxiety, depression and self-efficacy on the YFQ total score were calculated. Results After phase 1, the YFQ included 116 items and 15 domains. Applying the item quality criteria in phase 2 reduced the total to 48 items. Factor analysis with structural equation modeling then confirmed 32 items in seven domains: activity, work, emotions, harm & blame, diagnosis beliefs, co-morbidity and control. Cronbach's α was 0.91 for the total score and between 0.49 and 0.81 for the seven domain scores. Correlations of the YFQ total score with disability, anxiety, depression and self-efficacy were .58, .66, .73 and -.51, respectively. After controlling for age and gender, the YFQ total score explained between 27% and 53% of the variance (R²) in disability, anxiety, depression and self-efficacy. Conclusions The YFQ, a multidimensional screening scale, is recommended for assessing psychosocial beliefs of workers with chronic MSK pain. Further evaluation of measurement properties such as test-retest reliability, responsiveness and prognostic validity is warranted.
Harris, Joshua D; Erickson, Brandon J; Cvetanovich, Gregory L; Abrams, Geoffrey D; McCormick, Frank M; Gupta, Anil K; Verma, Nikhil N; Bach, Bernard R; Cole, Brian J
2014-02-01
Condition-specific questionnaires are important components in evaluation of outcomes of surgical interventions. No condition-specific study methodological quality questionnaire exists for evaluation of outcomes of articular cartilage surgery in the knee. The purpose was to develop a reliable and valid knee articular cartilage-specific study methodological quality questionnaire. The study design was cross-sectional. A stepwise, a priori-designed framework was created for development of a novel questionnaire. Items relevant to the topic were identified and extracted from a recent systematic review of 194 investigations of knee articular cartilage surgery. In addition, relevant items from existing generic study methodological quality questionnaires were identified. Items for a preliminary questionnaire were generated. Redundant and irrelevant items were eliminated, and acceptable items modified. The instrument was pretested and items weighted. The instrument, the MARK score (Methodological quality of ARticular cartilage studies of the Knee), was tested for validity (criterion validity) and reliability (inter- and intraobserver). A 19-item, 3-domain MARK score was developed. The 100-point scale score demonstrated face validity (focus group of 8 orthopaedic surgeons) and criterion validity (strong correlation with the Cochrane Quality Assessment score and the Modified Coleman Methodology Score). Interobserver reliability for the overall score was good (intraclass correlation coefficient [ICC], 0.842), and for all individual items of the MARK score, acceptable to perfect (ICC, 0.70-1.000). Intraobserver reliability ICC assessed over a 3-week interval was strong for 2 reviewers (≥0.90). The MARK score is a valid and reliable knee articular cartilage condition-specific study methodological quality instrument. This condition-specific questionnaire may be used to evaluate the quality of studies reporting outcomes of articular cartilage surgery in the knee.
Handling missing values in the MDS-UPDRS.
Goetz, Christopher G; Luo, Sheng; Wang, Lu; Tilley, Barbara C; LaPelle, Nancy R; Stebbins, Glenn T
2015-10-01
This study was undertaken to define the number of missing values permissible to render valid total scores for each Movement Disorder Society Unified Parkinson's Disease Rating Scale (MDS-UPDRS) part. To handle missing values, imputation strategies serve as guidelines to reject an incomplete rating or create a surrogate score. We tested a rigorous, scale-specific, data-based approach to handling missing values for the MDS-UPDRS. From two large MDS-UPDRS datasets, we sequentially deleted item scores, either consistently (same items) or randomly (different items) across all subjects. Lin's Concordance Correlation Coefficient (CCC) compared scores calculated without missing values with prorated scores based on sequentially increasing missing values. The maximal number of missing values retaining a CCC greater than 0.95 determined the threshold for rendering a valid prorated score. A second confirmatory sample was selected from the MDS-UPDRS international translation program. To provide valid part scores applicable across all Hoehn and Yahr (H&Y) stages when the same items are consistently missing, one missing item from Part I, one from Part II, three from Part III, but none from Part IV can be allowed. To provide valid part scores applicable across all H&Y stages when random item entries are missing, one missing item from Part I, two from Part II, seven from Part III, but none from Part IV can be allowed. All cutoff values were confirmed in the validation sample. These analyses are useful for constructing valid surrogate part scores for MDS-UPDRS when missing items fall within the identified threshold and give scientific justification for rejecting partially completed ratings that fall below the threshold. © 2015 International Parkinson and Movement Disorder Society.
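The MDS-UPDRS procedure compares complete part scores with prorated scores using Lin's concordance correlation coefficient, CCC = 2·s_xy / (s_x² + s_y² + (mean_x - mean_y)²). The abstract does not give the proration formula, so the sketch assumes the simplest one: scale the sum of answered items by (number of items / number answered). Missing items are represented as `None`; `prorate` and `lins_ccc` are illustrative names.

```python
from statistics import fmean


def prorate(item_scores: list, n_items: int) -> float:
    """Assumed proration: scale the sum of non-missing items (None = missing) to the full item count."""
    present = [s for s in item_scores if s is not None]
    return sum(present) * n_items / len(present)


def lins_ccc(x: list[float], y: list[float]) -> float:
    """Lin's concordance correlation coefficient, using population (1/n) moments."""
    n = len(x)
    mx, my = fmean(x), fmean(y)
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return 2 * sxy / (sxx + syy + (mx - my) ** 2)
```

Unlike Pearson's r, the CCC penalizes systematic bias: `lins_ccc([1, 2, 3], [2, 3, 4])` is 4/7 even though the two lists are perfectly correlated, which is why it suits checking whether prorated scores reproduce complete scores.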
Asymptotic Standard Errors for Item Response Theory True Score Equating of Polytomous Items
ERIC Educational Resources Information Center
Cher Wong, Cheow
2015-01-01
Building on previous works by Lord and Ogasawara for dichotomous items, this article proposes an approach to derive the asymptotic standard errors of item response theory true score equating involving polytomous items, for equivalent and nonequivalent groups of examinees. This analytical approach could be used in place of empirical methods like…
Automatic Scoring of Paper-and-Pencil Figural Responses. Research Report.
ERIC Educational Resources Information Center
Martinez, Michael E.; And Others
Large-scale testing is dominated by the multiple-choice question format. Widespread use of the format is due, in part, to the ease with which multiple-choice items can be scored automatically. This paper examines automatic scoring procedures for an alternative item type: figural response. Figural response items call for the completion or…
Automatically Scoring Short Essays for Content. CRESST Report 836
ERIC Educational Resources Information Center
Kerr, Deirdre; Mousavi, Hamid; Iseli, Markus R.
2013-01-01
The Common Core assessments emphasize short essay constructed response items over multiple choice items because they are more precise measures of understanding. However, such items are too costly and time consuming to be used in national assessments unless a way is found to score them automatically. Current automatic essay scoring techniques are…
Preequating with Empirical Item Characteristic Curves: An Observed-Score Preequating Method
ERIC Educational Resources Information Center
Zu, Jiyun; Puhan, Gautam
2014-01-01
Preequating is in demand because it reduces score reporting time. In this article, we evaluated an observed-score preequating method: the empirical item characteristic curve (EICC) method, which makes preequating without item response theory (IRT) possible. EICC preequating results were compared with a criterion equating and with IRT true-score…
A Comparison of Item-Level and Scale-Level Multiple Imputation for Questionnaire Batteries
ERIC Educational Resources Information Center
Gottschall, Amanda C.; West, Stephen G.; Enders, Craig K.
2012-01-01
Behavioral science researchers routinely use scale scores that sum or average a set of questionnaire items to address their substantive questions. A researcher applying multiple imputation to incomplete questionnaire data can either impute the incomplete items prior to computing scale scores or impute the scale scores directly from other scale…
Variation in the Readability of Items Within Surveys
Calderón, José L.; Morales, Leo S.; Liu, Honghu; Hays, Ron D.
2006-01-01
The objective of this study was to estimate the variation in the readability of survey items within 2 widely used health-related quality-of-life surveys: the National Eye Institute Visual Functioning Questionnaire–25 (VFQ-25) and the Short Form Health Survey, version 2 (SF-36v2). Flesch-Kincaid and Flesch Reading Ease formulas were used to estimate readability. Individual survey item scores and descriptive statistics for each survey were calculated. Variation of individual item scores from the mean survey score was graphically depicted for each survey. The mean reading grade level and reading ease estimates for the VFQ-25 and SF-36v2 were 7.8 (fairly easy) and 6.4 (easy), respectively. Both surveys had notable variation in item readability; individual item readability scores ranged from 3.7 to 12.0 (very easy to difficult) for the VFQ-25 and 2.2 to 12.0 (very easy to difficult) for the SF-36v2. Because survey respondents may not comprehend items with readability scores that exceed their reading ability, estimating the readability of each survey item is an important component of evaluating survey readability. Standards for measuring the readability of surveys are needed. PMID:16401705
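The two readability formulas named in this abstract can be sketched as follows. This is a minimal illustration that assumes word, sentence, and syllable counts are already available for each item. Syllable counting itself is heuristic and is not shown. The item names and counts are hypothetical. The last function shows one way to compute each item's deviation from the survey mean, which is the quantity the study depicted graphically.

```python
def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level computed from raw counts for one item."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease (higher means easier) computed from raw counts."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def item_grade_deviations(counts: dict[str, tuple[int, int, int]]) -> dict[str, float]:
    """Deviation of each item's grade level from the survey's mean grade level."""
    grades = {name: flesch_kincaid_grade(*c) for name, c in counts.items()}
    mean_grade = sum(grades.values()) / len(grades)
    return {name: g - mean_grade for name, g in grades.items()}


# Hypothetical items given as (words, sentences, syllables).
items = {"item_a": (12, 1, 18), "item_b": (20, 1, 34)}
print(item_grade_deviations(items))
```

Because both formulas work on ratios, a single long item can sit several grade levels above the survey mean even when the overall average looks easy.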
Kisala, Pamela A; Tulsky, David S; Kalpakjian, Claire Z; Heinemann, Allen W; Pohlig, Ryan T; Carle, Adam; Choi, Seung W
2015-05-01
To develop a calibrated item bank and computer adaptive test to assess anxiety symptoms in individuals with spinal cord injury (SCI), transform scores to the Patient Reported Outcomes Measurement Information System (PROMIS) metric, and create a statistical linkage with the Generalized Anxiety Disorder (GAD)-7, a widely used anxiety measure. Grounded-theory-based qualitative item development methods; large-scale item calibration field testing; confirmatory factor analysis; graded response model item response theory analyses; statistical linking techniques to transform scores to a PROMIS metric; and linkage with the GAD-7. Setting: Five SCI Model System centers and one Department of Veterans Affairs medical center in the United States. Participants: Adults with traumatic SCI. Spinal Cord Injury-Quality of Life (SCI-QOL) Anxiety Item Bank. Seven hundred sixteen individuals with traumatic SCI completed 38 items assessing anxiety, 17 of which were PROMIS items. After 13 items (including 2 PROMIS items) were removed, factor analyses confirmed unidimensionality. Item response theory analyses were used to estimate slopes and thresholds for the final 25 items (15 from PROMIS). The observed Pearson correlation between the SCI-QOL Anxiety and GAD-7 scores was 0.67. The SCI-QOL Anxiety item bank demonstrates excellent psychometric properties and is available as a computer adaptive test or short form for research and clinical applications. SCI-QOL Anxiety scores have been transformed to the PROMIS metric, and we provide a method to link SCI-QOL Anxiety scores with those of the GAD-7.
Development, scoring, and reliability of the Microscale Audit of Pedestrian Streetscapes (MAPS)
2013-01-01
Background Streetscape (microscale) features of the built environment can influence people’s perceptions of their neighborhoods’ suitability for physical activity. Many microscale audit tools have been developed, but few have published systematic scoring methods. We present the development, scoring, and reliability of the Microscale Audit of Pedestrian Streetscapes (MAPS) tool and its theoretically based subscales. Methods MAPS was based on prior instruments and was developed to assess details of streetscapes considered relevant for physical activity. MAPS sections (route, segments, crossings, and cul-de-sacs) were scored by two independent raters for reliability analyses. There were 290 route pairs, 516 segment pairs, 319 crossing pairs, and 53 cul-de-sac pairs in the reliability sample. Individual inter-rater item reliability analyses were computed using kappa, intraclass correlation coefficient (ICC), and percent agreement. A conceptual framework for subscale creation was developed using theory, expert consensus, and policy relevance. Items were grouped into subscales, and subscales were analyzed for inter-rater reliability at tiered levels of aggregation. Results There were 160 items included in the subscales (out of 201 items total). Of those included in the subscales, 80 items (50.0%) had good/excellent reliability, 41 items (25.6%) had moderate reliability, and 18 items (11.3%) had low reliability, with limited variability in the remaining 21 items (13.1%). Of the 20 route section subscales, valence (positive/negative) scores, and overall scores, 17 (85.0%) demonstrated good/excellent reliability and 3 demonstrated moderate reliability. Of the 16 segment subscales, valence scores, and overall scores, 12 (75.0%) demonstrated good/excellent reliability, three demonstrated moderate reliability, and one demonstrated poor reliability. 
Of the 8 crossing subscales, valence scores, and overall scores, 6 (75.0%) demonstrated good/excellent reliability, and 2 demonstrated moderate reliability. The cul-de-sac subscale demonstrated good/excellent reliability. Conclusions MAPS items and subscales predominantly demonstrated moderate to excellent reliability. The subscales and scoring system represent a theoretically based framework for using these complex microscale data and may be applicable to other similar instruments. PMID:23621947
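Two of the item-level agreement statistics named in this abstract, percent agreement and Cohen's kappa, can be sketched for a pair of raters as follows. This is a minimal illustration, not the MAPS scoring code. The ratings are hypothetical presence (1) / absence (0) codes. The guard for chance agreement equal to 1 illustrates why kappa cannot be interpreted for items with almost no variability; whether the MAPS authors handled such items this way is not stated.

```python
from collections import Counter


def percent_agreement(rater1, rater2):
    """Percentage of observations on which both raters gave the same code."""
    matches = sum(a == b for a, b in zip(rater1, rater2))
    return 100.0 * matches / len(rater1)


def cohens_kappa(rater1, rater2):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(rater1)
    p_observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    counts1, counts2 = Counter(rater1), Counter(rater2)
    categories = set(counts1) | set(counts2)
    # Chance agreement: product of the two raters' marginal proportions, summed over codes.
    p_expected = sum(counts1[c] * counts2[c] for c in categories) / n ** 2
    if p_expected == 1.0:
        # Both raters used one identical code throughout, so kappa is undefined.
        return float("nan")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical codes for one item across 10 segment pairs.
r1 = [1, 1, 0, 0, 1, 0, 1, 1, 0, 1]
r2 = [1, 1, 0, 1, 1, 0, 1, 0, 0, 1]
print(percent_agreement(r1, r2), round(cohens_kappa(r1, r2), 3))
```

Here the raters agree on 80% of segments, but kappa is noticeably lower because both raters coded "present" often, which inflates the agreement expected by chance.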
NASA Astrophysics Data System (ADS)
Peters, John S.
This study used a multiple response model (MRM) on selected items from the Views on Science-Technology-Society (VOSTS) survey to examine science-technology-society (STS) literacy among college non-science majors taught using Problem/Case Studies Based Learning (PBL/CSBL) and traditional expository methods of instruction. An initial pilot investigation of 15 VOSTS items produced a valid and reliable scoring model that can be used to quantitatively assess student literacy on a variety of STS topics deemed important for informed civic engagement in science-related social and environmental issues. The new scoring model allows for the use of parametric inferential statistics to test hypotheses about factors influencing STS literacy. The follow-up cross-institutional study comparing teaching methods employed Hierarchical Linear Modeling (HLM) to model the efficiency and equitability of instructional methods on STS literacy. A cluster analysis was also used to compare pre- and post-course patterns of student views on the set of positions expressed within VOSTS items. HLM analysis revealed significantly higher instructional efficiency in the PBL/CSBL study group for 4 of the 35 STS attitude indices (characterization of media vs. school science; tentativeness of scientific models; cultural influences on scientific research), and more equitable effects of traditional instruction on one attitude index (interdependence of science and technology). Cluster analysis revealed generally stable patterns of pre- to post-course views across study groups, but also revealed possible teaching method effects on the relationship between the views expressed within VOSTS items with respect to (1) interdependency of science and technology; (2) anti-technology; (3) socioscientific decision-making; (4) scientific/technological solutions to environmental problems; (5) usefulness of school vs. media characterizations of science; (6) social constructivist vs. objectivist views of theories; (7) impact of cultural religious/ethical views on science; (8) tentativeness of scientific models, evidence and predictions; and (9) civic control of technological developments. This analysis also revealed common relationships between student views that would not have been revealed under the original unique response model (URM) of VOSTS, as well as common viewpoint patterns that warrant further qualitative exploration.
Bleau Lavigne, Maude; Reeves, Isabelle; Sasseville, Marie-Josée; Loignon, Christine
The primary purpose of this study was to develop 2 survey tools to explore factors influencing adoption of best practices for diabetic foot ulcer offloading treatment in primary health care settings. One survey was intended for the patients receiving care for a diabetic foot ulcer in primary health care settings and the other was intended for the health professionals providing treatment. The second purpose of this study was to evaluate the psychometric properties of the 2 surveys. Development and validation of survey instruments. Two surveys were developed using a published guide. Following review of pertinent literature and identification of variables to be measured, a bank of items was developed and pretested to determine the clarity of the items and responses. Psychometric testing comprised measurement of content validity index (CVI) and intraclass correlation coefficient (ICC). Only items obtaining satisfactory CVI and ICC scores were included in the final version of the surveys. The final version of the patient survey contained 41 items and the final version of the survey for health care professionals contained 21 items. The patient-intended survey's items demonstrated high content validity scores and satisfactory test-retest reliability scores. The overall CVI score was 0.98. Forty of the 49 items eligible for testing obtained satisfactory ICC scores. One item's test-retest reliability could not be tested, but it was retained based on its high CVI. The health professional-intended survey obtained an overall CVI score of 0.91, but its items had lower ICC scores; 63% (31 of the 49 items) did not achieve a satisfactory ICC score for inclusion in the final instrument. This project led to development of 2 instruments designed to identify and explore factors influencing adoption of best practices for diabetic foot ulcer offloading treatment in the primary health care setting. 
Future research and testing are required to translate these French surveys into English and additional languages, in order to reach a broader population.
Lonsdale, Chris; Hodge, Ken; Rose, Elaine A
2008-06-01
The purpose of the four studies described in this article was to develop and test a new measure of competitive sport participants' intrinsic motivation, extrinsic motivation, and amotivation (self-determination theory; Deci & Ryan, 1985). The items for the new measure, named the Behavioral Regulation in Sport Questionnaire (BRSQ), were constructed using interviews, expert review, and pilot testing. Analyses supported the internal consistency, test-retest reliability, and factorial validity of the BRSQ scores. Nomological validity evidence was also supportive, as BRSQ subscale scores were correlated in the expected pattern with scores derived from measures of motivational consequences. When directly compared with scores derived from the Sport Motivation Scale (SMS; Pelletier, Fortier, Vallerand, Tuson, & Blais, 1995) and a revised version of that questionnaire (SMS-6; Mallett, Kawabata, Newcombe, Otero-Forero, & Jackson, 2007), BRSQ scores demonstrated equal or superior reliability and factorial validity as well as better nomological validity.
Audio-Enhanced Tablet Computers to Assess Children's Food Frequency From Migrant Farmworker Mothers.
Kilanowski, Jill F; Trapl, Erika S; Kofron, Ryan M
2013-06-01
This study sought to improve data collection in children's food frequency surveys for non-English-speaking immigrant/migrant farmworker mothers using audio-enhanced tablet computers (ATCs). We hypothesized that by using technological adaptations, we would be able to improve data capture and therefore reduce lost surveys. This Food Frequency Questionnaire (FFQ), a paper-based dietary assessment tool, was adapted for ATCs and assessed consumption of 66 food items, asking 3 questions for each food item: frequency, quantity of consumption, and serving size. The tablet-based survey was audio enhanced, with each question "read" to participants, accompanied by food item images, together with an embedded short instructional video. Results indicated that respondents were able to complete the 198 questions from the 66-food-item FFQ on ATCs in approximately 23 minutes. Compared with paper-based FFQs, ATC-based FFQs had less missing data. Despite overall reductions in missing data by use of ATCs, respondents still appeared to have difficulty with question 2 of the FFQ. The ability to score the FFQ depended on the sections in which missing data were located. Unlike the paper-based FFQs, no ATC-based FFQs were unscored due to the amount or location of missing data. An ATC-based FFQ was feasible and increased the ability to score this survey on children's food patterns from migrant farmworker mothers. This adapted technology may serve as an exemplar for other non-English-speaking immigrant populations.
Validation of the 'Test of the Adherence to Inhalers' (TAI) for Asthma and COPD Patients.
Plaza, Vicente; Fernández-Rodríguez, Concepción; Melero, Carlos; Cosío, Borja G; Entrenas, Luís Manuel; de Llano, Luis Pérez; Gutiérrez-Pereyra, Fernando; Tarragona, Eduard; Palomino, Rosa; López-Viña, Antolín
2016-04-01
To validate the 'Test of Adherence to Inhalers' (TAI), a 12-item questionnaire designed to assess adherence to inhalers in patients with COPD or asthma. A total of 1009 patients with asthma or COPD participated in a cross-sectional multicenter study. Patients with electronic adherence ≥80% were defined as adherents. Construct validity, internal validity, and criterion validity were evaluated. Self-reported adherence was compared with the Morisky-Green questionnaire. Factor analysis demonstrated two factors: factor 1 coincided with the TAI patient domain (items 1 to 10) and factor 2 with the TAI health-care professional domain (items 11 and 12). Cronbach's alpha was 0.860 and the test-retest reliability was 0.883. TAI scores correlated with electronic adherence (ρ=0.293, p=0.01). According to the best cut-off for 10 items (score 50, area under the ROC curve 0.7), 569 (62.5%) patients were classified as non-adherents. The non-adherence behavior pattern was: erratic 527 (57.9%), deliberate 375 (41.2%), and unwitting 242 (26.6%) patients. Compared with the Morisky-Green test, the TAI showed better psychometric properties. The TAI is a reliable and homogeneous questionnaire to easily identify non-adherence and to classify, from a clinical perspective, the barriers related to the use of inhalers in asthma and COPD.
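The 10-item cut-off reported in this abstract can be sketched as a simple scoring rule. This is a hypothetical illustration that rests on two assumptions the abstract does not state. First, each of items 1 to 10 is scored from 1 (worst) to 5 (best), so 50 is the maximum total. Second, "best cut-off score 50" is read as classifying any total below 50 as non-adherent. The function names are illustrative only.

```python
def tai10_total(responses):
    """Sum of TAI items 1-10, each assumed to be scored 1 (worst) to 5 (best)."""
    if len(responses) != 10:
        raise ValueError("expected exactly 10 responses (TAI items 1-10)")
    if any(r not in range(1, 6) for r in responses):
        raise ValueError("each response is assumed to be an integer from 1 to 5")
    return sum(responses)


def is_non_adherent(total, cutoff=50):
    """Assumed reading of the reported cut-off: any total below it is non-adherent."""
    return total < cutoff


print(is_non_adherent(tai10_total([5] * 10)))       # False: full marks
print(is_non_adherent(tai10_total([5] * 9 + [4])))  # True: one point short
```

Under this reading, a single less-than-perfect answer is enough to flag a patient, which is consistent with the large proportion (62.5%) of patients classified as non-adherent.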
Crins, Martine H P; Terwee, Caroline B; Klausch, Thomas; Smits, Niels; de Vet, Henrica C W; Westhovens, Rene; Cella, David; Cook, Karon F; Revicki, Dennis A; van Leeuwen, Jaap; Boers, Maarten; Dekker, Joost; Roorda, Leo D
2017-07-01
The objective of this study was to assess the psychometric properties of the Dutch-Flemish Patient-Reported Outcomes Measurement Information System (PROMIS) Physical Function item bank in Dutch patients with chronic pain. A bank of 121 items was administered to 1,247 Dutch patients with chronic pain. Unidimensionality was assessed by fitting a one-factor confirmatory factor analysis and evaluating the resulting fit statistics. Items were calibrated with the graded response model and its fit was evaluated. Cross-cultural validity was assessed by testing items for differential item functioning (DIF) based on language (Dutch vs. English). Construct validity was evaluated by calculating correlations between scores on the Dutch-Flemish PROMIS Physical Function measure and scores on generic and disease-specific measures. Results supported the Dutch-Flemish PROMIS Physical Function item bank's unidimensionality (Comparative Fit Index = 0.976, Tucker Lewis Index = 0.976) and model fit. Item thresholds targeted a wide range of the physical function construct (threshold parameters range: -4.2 to 5.6). Cross-cultural validity was good, as only four items showed DIF for language and their impact on item scores was minimal. Physical Function scores were strongly associated with scores on all other measures (all correlations ≤ -0.60, as expected). The Dutch-Flemish PROMIS Physical Function item bank exhibited good psychometric properties. Development of a computer adaptive test based on the large bank is warranted. Copyright © 2017 Elsevier Inc. All rights reserved.
Differential item functioning magnitude and impact measures from item response theory models.
Kleinman, Marjorie; Teresi, Jeanne A
2016-01-01
Measures of magnitude and impact of differential item functioning (DIF) at the item and scale levels, respectively, are presented and reviewed in this paper. Most measures are based on item response theory models. Magnitude refers to item-level effect sizes, whereas impact refers to differences between groups at the scale score level. Reviewed are magnitude measures based on group differences in the expected item scores and impact measures based on differences in the expected scale scores. The similarities among these indices are demonstrated. Various software packages that provide magnitude and impact measures are described, and new software is presented that computes all of the available statistics conveniently in one program, with explanations of their relationships to one another.
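The idea of a magnitude measure based on group differences in expected item scores can be sketched for a graded response model (GRM) item. This is a minimal illustration, not the software described in the paper. It assumes group-specific item parameters are already on a common scale. The item parameters and theta grid are hypothetical. The three summaries show how signed differences can cancel, while unsigned and squared differences cannot. The mean squared version follows the spirit of Raju's non-compensatory DIF index.

```python
import math


def grm_expected_score(theta, a, thresholds):
    """Expected score (0..m-1) on one graded-response-model item.

    Under the GRM, P(X >= k | theta) is a logistic curve with slope a and
    threshold b_k, and the expected item score is the sum of those curves.
    """
    return sum(1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds)


def dif_magnitude(thetas, ref_params, focal_params):
    """Average expected-item-score differences (focal minus reference).

    thetas: ability values for the focal group (estimates or a grid).
    ref_params / focal_params: (a, thresholds) tuples for each group.
    """
    diffs = [
        grm_expected_score(t, *focal_params) - grm_expected_score(t, *ref_params)
        for t in thetas
    ]
    n = len(diffs)
    return {
        "signed": sum(diffs) / n,                       # opposite-sign gaps can cancel
        "unsigned": sum(abs(d) for d in diffs) / n,     # no cancellation
        "mean_squared": sum(d * d for d in diffs) / n,  # squared-difference index
    }


# Hypothetical 4-category item whose thresholds sit 0.5 higher for the focal group.
reference = (1.5, [-1.0, 0.0, 1.0])
focal = (1.5, [-0.5, 0.5, 1.5])
grid = [x / 10 for x in range(-30, 31)]
print(dif_magnitude(grid, reference, focal))
```

With a uniform threshold shift like this one, every difference has the same sign, so the signed and unsigned summaries agree in size. Non-uniform DIF (different slopes) is where they diverge.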
ERIC Educational Resources Information Center
Yao, Lihua
2014-01-01
The intent of this research was to find an item selection procedure in the multidimensional computer adaptive testing (CAT) framework that yielded higher precision for both the domain and composite abilities, had a higher usage of the item pool, and controlled the exposure rate. Five multidimensional CAT item selection procedures (minimum angle;…
ERIC Educational Resources Information Center
Kim, Hyung Jin; Brennan, Robert L.; Lee, Won-Chan
2017-01-01
In equating, when common items are internal and scoring is conducted in terms of the number of correct items, some pairs of total scores ("X") and common-item scores ("V") can never be observed in a bivariate distribution of "X" and "V"; these pairs are called "structural zeros." This simulation…
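Why structural zeros arise can be made concrete with a short enumeration. This is a minimal sketch assuming number-correct scoring and an internal anchor, so that the total score is X = V + U, where U is the score on the non-common items. Any (x, v) pair with x < v, or with x greater than v plus the number of unique items, can never be observed. The form sizes in the example are hypothetical.

```python
def structural_zeros(n_total, n_common):
    """(x, v) pairs that cannot occur with an internal anchor.

    With internal common items, X = V + U, where U (the number-correct score
    on the n_total - n_common unique items) runs from 0 to n_unique, so a
    pair is impossible whenever x < v or x > v + n_unique.
    """
    n_unique = n_total - n_common
    return [
        (x, v)
        for v in range(n_common + 1)
        for x in range(n_total + 1)
        if x < v or x > v + n_unique
    ]


# Toy form: 3 items in total, 1 of them an internal common item.
print(structural_zeros(3, 1))  # [(3, 0), (0, 1)]
```

In general, the count is (n_common + 1) * n_common cells of the (X, V) table, so a longer anchor produces many more cells that a smoothing model must force to zero.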
Effects of Differential Item Functioning on Examinees' Test Performance and Reliability of Test
ERIC Educational Resources Information Center
Lee, Yi-Hsuan; Zhang, Jinming
2017-01-01
Simulations were conducted to examine the effect of differential item functioning (DIF) on measurement consequences such as total scores, item response theory (IRT) ability estimates, and test reliability in terms of the ratio of true-score variance to observed-score variance and the standard error of estimation for the IRT ability parameter. The…
ERIC Educational Resources Information Center
Green, Samuel B.; Yang, Yanyun
2009-01-01
A method is presented for estimating reliability using structural equation modeling (SEM) that allows for nonlinearity between factors and item scores. Assuming the focus is on consistency of summed item scores, this method for estimating reliability is preferred to those based on linear SEM models and to the most commonly reported estimate of…
Stability of Rasch Scales over Time
ERIC Educational Resources Information Center
Taylor, Catherine S.; Lee, Yoonsun
2010-01-01
Item response theory (IRT) methods are generally used to create score scales for large-scale tests. Research has shown that IRT scales are stable across groups and over time. Most studies have focused on items that are dichotomously scored. Now Rasch and other IRT models are used to create scales for tests that include polytomously scored items.…
Turner-Stokes, Lynne; Vanderstay, Roxana; Stevermuer, Tara; Simmonds, Frances; Khan, Fary; Eagar, Kathy
2015-01-01
Objective To describe and compare outcomes from in-patient rehabilitation (IPR) in working-aged adults across different groups of long-term neurological conditions, as defined by the UK National Service Framework. Design Analysis of a large Australian prospectively collected dataset for completed IPR episodes (n = 28,596) from 2003-2012. Methods De-identified data for adults (16–65 years) with specified neurological impairment codes were extracted, cleaned and divided into ‘Sudden-onset’ conditions (Stroke (n = 12527), brain injury (n = 7565), spinal cord injury (SCI) (n = 3753), Guillain-Barré syndrome (GBS) (n = 805)) and ‘Progressive/stable’ conditions (Progressive (n = 3750) and Cerebral palsy (n = 196)). Key outcomes included Functional Independence Measure (FIM) scores, length of stay (LOS), and discharge destination. Results Mean LOS ranged from 21 to 57 days, with significant group differences in gender, source of admission and discharge destination. All six groups showed significant change (p<0.001) between admission and discharge that was likely to be clinically important across a range of items. Significant between-group differences were observed for FIM Motor and Cognitive change scores (Kruskal-Wallis p<0.001), and item-by-item analysis confirmed distinct patterns for each of the six groups. SCI and GBS patients were generally at the ceiling of the cognitive subscale. The ‘Progressive/stable’ conditions made smaller improvements in FIM score than the ‘Sudden-onset’ conditions, but also had shorter LOS. Conclusion All groups made gains in independence during admission, although the pattern of change varied between conditions, and ceiling effects were observed in the FIM-cognitive subscale. Relative cost-efficiency between groups can only be indirectly inferred. Limitations of the current dataset are discussed, together with opportunities for expansion and further development. PMID:26167877
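The Kruskal-Wallis test used in this abstract for between-group comparisons of FIM change scores can be sketched in a few lines. This is a minimal, dependency-free illustration of the H statistic, including the usual correction for tied ranks. It is not the authors' analysis code, and the group scores are hypothetical. A p-value would additionally require comparing H with a chi-square distribution on (number of groups - 1) degrees of freedom, which is omitted here.

```python
from collections import Counter
from itertools import chain


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic with the standard tie correction."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)
    weighted = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        weighted += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 * weighted / (n * (n + 1)) - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n) over tie groups.
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    return h / (1 - ties / (n ** 3 - n))


# Hypothetical FIM change scores for three small groups.
print(kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9]))
```

Because the test works on ranks, it tolerates the skewed, ceiling-limited change scores described for the cognitive subscale better than a one-way ANOVA would.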
Roorda, Leo D; Green, John R; Houwink, Annemieke; Bagley, Pam J; Smith, Jane; Molenaar, Ivo W; Geurts, Alexander C
2012-06-01
To enable improved interpretation of the total score and faster scoring of the Rivermead Mobility Index (RMI) by studying item ordering or hierarchy and formulating start-and-stop rules in patients after stroke. Cohort study. Rehabilitation center in the Netherlands; stroke rehabilitation units and the community in the United Kingdom. Item hierarchy of the RMI was studied in an initial group of patients (n=620; mean age ± SD, 69.2±12.5y; 297 [48%] men; 304 [49%] left hemisphere lesion, and 269 [43%] right hemisphere lesion), and the adequacy of the item hierarchy-based start-and-stop rules was checked in a second group of patients (n=237; mean age ± SD, 60.0±11.3y; 139 [59%] men; 103 [44%] left hemisphere lesion, and 93 [39%] right hemisphere lesion) undergoing rehabilitation after stroke. Not applicable. Mokken scale analysis was used to investigate the fit of the double monotonicity model, indicating hierarchical item ordering. The percentages of patients with a difference between the RMI total score and the scores based on the start-and-stop rules were calculated to check the adequacy of these rules. The RMI had good fit of the double monotonicity model (coefficient H(T)=.87). The interpretation of the total score improved. Item hierarchy-based start-and-stop rules were formulated. The percentages of patients with a difference between the RMI total score and the score based on the recommended start-and-stop rules were 3% and 5%, respectively. Ten of the original 15 items had to be scored after applying the start-and-stop rules. Item hierarchy was established, enabling improved interpretation and faster scoring of the RMI. Copyright © 2012 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
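The scalability coefficient reported in this abstract (H(T) = .87) comes from Mokken scale analysis. For dichotomous items, the scale H is the sum of observed inter-item covariances divided by the sum of the maximum covariances the item popularities allow. The sketch below computes that ratio from raw 0/1 data. It assumes items are scored 0/1, as RMI items are, and uses hypothetical responses. It covers only the coefficient, not the full double monotonicity checks or the study's start-and-stop rules.

```python
from itertools import combinations


def loevinger_h(data):
    """Scale coefficient H for dichotomous (0/1) items.

    data: one row per respondent, one 0/1 column per item. A perfect Guttman
    ordering (nobody passes a harder item while failing an easier one) gives
    H = 1.
    """
    n = len(data)
    k = len(data[0])
    p = [sum(row[j] for row in data) / n for j in range(k)]
    observed = maximum = 0.0
    for i, j in combinations(range(k), 2):
        p_both = sum(row[i] * row[j] for row in data) / n
        observed += p_both - p[i] * p[j]              # observed covariance
        maximum += min(p[i], p[j]) - p[i] * p[j]      # largest possible covariance
    return observed / maximum


# Hypothetical responses to three yes/no mobility items of increasing difficulty.
guttman = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]]
with_error = [[0, 1, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]]
print(loevinger_h(guttman), loevinger_h(with_error))
```

A high H is what licenses start-and-stop rules: if items are strongly ordered, passing a hard item implies passing the easier ones, so they need not all be administered. The function divides by zero if an item has no variance, so such items would need to be dropped first.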
Teicher, Martin H.; Parigger, Angelika
2015-01-01
There is increasing interest in childhood maltreatment as a potent stimulus that may alter trajectories of brain development, induce epigenetic modifications and enhance risk for medical and psychiatric disorders. Although a number of useful scales exist for retrospective assessment of abuse and neglect, they have significant limitations. Moreover, they fail to provide detailed information on timing of exposure, which is critical for delineation of sensitive periods. The Maltreatment and Abuse Chronology of Exposure (MACE) scale was developed in a sample of 1051 participants using item response theory to gauge severity of exposure to ten types of maltreatment (emotional neglect, non-verbal emotional abuse, parental physical maltreatment, parental verbal abuse, peer emotional abuse, peer physical bullying, physical neglect, sexual abuse, witnessing interparental violence and witnessing violence to siblings) during each year of childhood. Items included in the subscales had acceptable psychometric properties based on infit and outfit mean square statistics, and each subscale passed Andersen’s likelihood ratio test. The MACE provides an overall severity score and multiplicity score (number of types of maltreatment experienced) with excellent test-retest reliability. Each type of maltreatment showed good reliability, as did severity of exposure across each year of childhood. MACE Severity correlated 0.738 with Childhood Trauma Questionnaire (CTQ) score and MACE Multiplicity correlated 0.698 with the Adverse Childhood Experiences scale (ACE). However, MACE accounted for 2.00- and 2.07-fold more of the variance, on average, in psychiatric symptom ratings than CTQ or ACE, respectively, based on variance decomposition. Different types of maltreatment had distinct and often unique developmental patterns. 
The 52-item MACE, a simpler Maltreatment Abuse and Exposure Scale (MAES) that only assesses overall exposure and the original test instrument (MACE-X) with several additional items plus spreadsheets and R code for scoring are provided to facilitate use and to spur further development. PMID:25714856
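The infit and outfit mean-square statistics mentioned in this record can be sketched for a dichotomous Rasch item. This is a minimal illustration, not the R code distributed with the MACE. It assumes person measures and the item difficulty have already been estimated, and the values used are hypothetical. Outfit is the plain average of squared standardized residuals, so it is sensitive to outliers. Infit weights each residual by its model variance, so persons near the item's difficulty count most. Values near 1.0 are conventionally taken as adequate fit.

```python
import math


def rasch_probability(theta, b):
    """Probability of endorsing a dichotomous item under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def item_fit(responses, thetas, b):
    """Infit and outfit mean-square statistics for one dichotomous item.

    responses: 0/1 answers from each person; thetas: those persons' measures;
    b: the item's difficulty.
    """
    squared_residuals, variances = [], []
    for x, theta in zip(responses, thetas):
        p = rasch_probability(theta, b)
        squared_residuals.append((x - p) ** 2)
        variances.append(p * (1 - p))
    # Outfit: unweighted mean of squared standardized residuals.
    outfit = sum(r / w for r, w in zip(squared_residuals, variances)) / len(variances)
    # Infit: squared residuals weighted by their model variance.
    infit = sum(squared_residuals) / sum(variances)
    return infit, outfit


# Hypothetical: two persons located at the item's difficulty,
# then two persons giving highly surprising answers.
print(item_fit([1, 0], [0.0, 0.0], 0.0))
print(item_fit([0, 1], [2.0, -2.0], 0.0))
```

Surprising responses (a high-severity person not endorsing an easy item, or the reverse) push both statistics well above 1, which is the pattern that would flag an item for removal from a subscale.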
Development and validation of the Myasthenia Gravis Impairment Index.
Barnett, Carolina; Bril, Vera; Kapral, Moira; Kulkarni, Abhaya; Davis, Aileen M
2016-08-30
We aimed to develop a measure of myasthenia gravis impairment using a previously developed framework and to evaluate reliability and validity, specifically face, content, and construct validity. The first draft of the Myasthenia Gravis Impairment Index (MGII) included examination items from available measures enriched with newly developed, patient-reported items, modified after patient input. International neuromuscular specialists evaluated face and content validity via an e-mail survey. Test-retest reliability was assessed in stable patients at a 3-week interval and interrater reliability was evaluated on the same day. Construct validity was assessed through correlations between the MGII and other measures and by comparing scores in different patient groups. The first draft was assessed by 18 patients, and 72 specialists answered the survey. The second draft had 7 examination and 22 patient-reported items. Field testing included 200 patients, with 54 patients completing the reliability studies. Test-retest reliability of the total score was good (intraclass correlation coefficient 0.92; 95% confidence interval 0.79-0.94), as was interrater reliability of the examination component (intraclass correlation coefficient 0.81; 95% confidence interval 0.79-0.94). The MGII correlated well with comparison measures, with higher correlations with the MG-activities of daily living (r = 0.91) and MG-specific quality of life 15-item scale (r = 0.78). When assessing different patient groups, the scores followed expected patterns. The MGII was developed using a patient-centered framework of myasthenia-related impairments and incorporating patient input throughout the development process. It is reliable in an outpatient setting and has demonstrated construct validity. Responsiveness studies are under way. © 2016 American Academy of Neurology.
Development and validation of the Myasthenia Gravis Impairment Index
Bril, Vera; Kapral, Moira; Kulkarni, Abhaya; Davis, Aileen M.
2016-01-01
Objective: We aimed to develop a measure of myasthenia gravis impairment using a previously developed framework and to evaluate reliability and validity, specifically face, content, and construct validity. Methods: The first draft of the Myasthenia Gravis Impairment Index (MGII) included examination items from available measures enriched with newly developed, patient-reported items, modified after patient input. International neuromuscular specialists evaluated face and content validity via an e-mail survey. Test–retest reliability was assessed in stable patients at a 3-week interval and interrater reliability was evaluated on the same day. Construct validity was assessed through correlations between the MGII and other measures and by comparing scores in different patient groups. Results: The first draft was assessed by 18 patients, and 72 specialists answered the survey. The second draft had 7 examination and 22 patient-reported items. Field testing included 200 patients, with 54 patients completing the reliability studies. Test–retest reliability of the total score was good (intraclass correlation coefficient 0.92; 95% confidence interval 0.79–0.94), as was interrater reliability of the examination component (intraclass correlation coefficient 0.81; 95% confidence interval 0.79–0.94). The MGII correlated well with comparison measures, with higher correlations with the MG–activities of daily living (r = 0.91) and MG-specific quality of life 15-item scale (r = 0.78). When assessing different patient groups, the scores followed expected patterns. Conclusions: The MGII was developed using a patient-centered framework of myasthenia-related impairments and incorporating patient input throughout the development process. It is reliable in an outpatient setting and has demonstrated construct validity. Responsiveness studies are under way. PMID:27402891
Goal orientation in surgical residents: a study of the motivation behind learning.
Hoffman, Rebecca L; Hudak-Rosander, Cristina; Datta, Jashodeep; Morris, Jon B; Kelz, Rachel R
2014-08-01
The subconscious way in which an individual approaches learning, goal orientation (GO), has been shown to influence job satisfaction, job performance, and burnout in nonmedical cohorts. The aim of this study was to adapt and validate an instrument to assess GO in surgical residents, so that in the future, we can better understand how differences in motivation affect professional development. Residents were recruited to complete a 17-item survey adapted from the Patterns of Adaptive Learning Scales (PALS). The survey included three scales assessing GO in residency-specific terms. Items were scored on a 5-point Likert scale, and the psychometric properties of the adapted and original PALS were compared. Ninety-five percent of residents (61/64) participated. Median age was 30 y and 33% were female. Mean (standard deviation) scale scores for the adapted PALS were: mastery 4.30 (0.48), performance approach (PAP) 3.17 (0.99), and performance avoid 2.75 (0.88). Mean (standard deviation) scale scores for the original PALS items were: mastery 3.35 (1.02), PAP 2.76 (1.15), and performance avoid 2.41 (0.91). Cronbach's alpha values were α = 0.89 and α = 0.84 for the adapted PAP and avoid scales, respectively, which were comparable with the original scales. For the adapted mastery scale, α = 0.54. Exploratory factor analysis revealed five factors, and individual mastery items did not load consistently onto a single factor. This study represents the first steps in the development of a novel tool to measure GO among surgical residents. Understanding motivational psychology in residents may facilitate improved education and professional development. Copyright © 2014 Elsevier Inc. All rights reserved.
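Cronbach's alpha, the internal-consistency statistic reported for the adapted PALS scales, compares the sum of item variances with the variance of the total score: alpha = k / (k - 1) * (1 - sum of item variances / total-score variance). The sketch below is a minimal illustration using hypothetical 5-point Likert responses. The same variance convention (population variance here) must be used in the numerator and denominator. Any consistent choice gives the same alpha.

```python
from statistics import pvariance


def cronbach_alpha(items):
    """Cronbach's alpha for k items.

    items: one list of scores per item, aligned so that position i is the
    same respondent in every list.
    """
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    sum_item_variances = sum(pvariance(scores) for scores in items)
    return k / (k - 1) * (1 - sum_item_variances / pvariance(totals))


# Hypothetical 5-point Likert responses from four residents to two items.
print(cronbach_alpha([[1, 2, 3, 4], [2, 1, 4, 3]]))  # 0.75
```

A low alpha such as the 0.54 reported for the adapted mastery scale means the items' shared variance is small relative to their individual variances, which fits the finding that those items did not load on a single factor.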
MMPI-2-RF characteristics of custody evaluation litigants.
Archer, Elizabeth M; Hagan, Leigh D; Mason, Janelle; Handel, Richard; Archer, Robert P
2012-03-01
The Minnesota Multiphasic Personality Inventory-2-Restructured Form (MMPI-2-RF) is a 338-item objective self-report measure drawn from the 567 items of the MMPI-2. Although there is a substantial MMPI-2 literature regarding child custody litigants, only one previously published study has used MMPI-2-RF data in this population, and it focused on Validity scales L-r and K-r. The current study evaluated the MMPI-2-RF results of 344 child custody litigants and showed substantial consistency between the T-score elevations typically found on MMPI-2 Validity scales L and K and comparable elevations on MMPI-2-RF Validity scales L-r and K-r. Mean T-scores for the clinical scales on both instruments were well within normal limits. The RC scale intercorrelation patterns and alpha coefficient values for MMPI-2-RF scales in this custody population were also very similar to those reported for other populations. Directions for future research are presented.
Deng, Nina; Anatchkova, Milena D; Waring, Molly E; Han, Kyung T; Ware, John E
2015-08-01
The Quality-of-life (QOL) Disease Impact Scale (QDIS(®)) standardizes the content and scoring of QOL impact attributed to different diseases using item response theory (IRT). This study examined the IRT invariance of the QDIS-standardized IRT parameters in an independent sample. The differential functioning of items and test (DFIT) of a static short form (QDIS-7) was examined across two independent sources: patients hospitalized for acute coronary syndrome (ACS) in the TRACE-CORE study (N = 1,544) and chronically ill US adults in the QDIS standardization sample. "ACS-specific" IRT item parameters were calibrated and linearly transformed for comparison with the "standardized" IRT item parameters. Differences in IRT model-expected item, scale, and theta scores were examined. The DFIT results were also compared with a standard logistic regression differential item functioning analysis. Item parameters estimated in the ACS sample showed lower discrimination parameters than the standardized discrimination parameters, but only small differences were found for threshold parameters. In DFIT, values of the non-compensatory differential item functioning index (range 0.005-0.074) were all below the threshold of 0.096. At the scale level, item differences canceled out further. IRT-based theta scores for ACS patients using standardized and ACS-specific item parameters were highly correlated (r = 0.995, root-mean-square difference = 0.09). Using standardized item parameters, ACS patients scored one-half standard deviation higher (indicating greater QOL impact) than chronically ill adults in the standardization sample. The study showed sufficient IRT invariance to warrant the use of standardized IRT scoring of QDIS-7 for studies comparing the QOL impact attributed to acute coronary disease and other chronic conditions.
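The non-compensatory DIF (NCDIF) index in the DFIT framework is the focal-group average of the squared difference between an item's expected score computed with focal-group parameters and with reference-group parameters. QDIS items are polytomous; purely for illustration, this sketch assumes a dichotomous two-parameter logistic (2PL) item and hypothetical parameter values that are already on a common metric (the abstract describes a linear transformation for that linking step). Function names are hypothetical.

```python
import math

def p_2pl(theta, a, b):
    """Probability of endorsing a dichotomous item under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def ncdif(focal_thetas, focal_params, reference_params):
    """Mean over focal-group thetas of the squared expected-score difference.

    focal_params / reference_params: (a, b) pairs assumed to be linked
    to the same latent metric before comparison.
    """
    a_f, b_f = focal_params
    a_r, b_r = reference_params
    sq_diffs = [
        (p_2pl(t, a_f, b_f) - p_2pl(t, a_r, b_r)) ** 2 for t in focal_thetas
    ]
    return sum(sq_diffs) / len(sq_diffs)
```

Identical parameters give an NCDIF of 0, while lowering only the focal-group discrimination (as the ACS calibration did) gives a small positive value; whether that value is flagged depends on the chosen cutoff, such as the 0.096 threshold cited above.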
Hays, Ron D; Revicki, Dennis A; Feeny, David; Fayers, Peter; Spritzer, Karen L; Cella, David
2016-10-01
Preference-based health-related quality of life (HR-QOL) scores are useful as outcome measures in clinical studies, for monitoring the health of populations, and for estimating quality-adjusted life-years. This was a secondary analysis of data collected in an internet survey as part of the Patient-Reported Outcomes Measurement Information System (PROMIS(®)) project. To estimate Health Utilities Index Mark 3 (HUI-3) preference scores, we used three sets of predictors: the ten PROMIS(®) global health items; the PROMIS-29 V2.0 single pain intensity item and seven multi-item scales (physical functioning, fatigue, pain interference, depressive symptoms, anxiety, ability to participate in social roles and activities, sleep disturbance); and the individual PROMIS-29 V2.0 items. Linear regression analyses were used to identify significant predictors, followed by simple linear equating to avoid regression to the mean. The regression models explained 48% (global health items), 61% (PROMIS-29 V2.0 scales), and 64% (PROMIS-29 V2.0 items) of the variance in the HUI-3 preference score. Linear equated scores were similar to observed scores, although differences tended to be larger for older study participants. HUI-3 preference scores can be estimated from the PROMIS(®) global health items or PROMIS-29 V2.0. The estimated HUI-3 scores from the PROMIS(®) health measures can be used for economic applications and as a measure of overall HR-QOL in research.
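Simple linear equating, mentioned above, rescales regression-predicted scores so that their mean and standard deviation match the observed scores. This counters the shrinkage of regression predictions toward the mean: the SD of ordinary least-squares fitted values equals R times the SD of the outcome, so with R² = 0.48 the predictions spread only about 69% as widely as observed HUI-3 scores. A sketch assuming the standard mean-and-SD form y* = ȳ + (s_y / s_x)(x − x̄); the function name and values are hypothetical.

```python
from statistics import mean, stdev

def linear_equate(predicted, observed):
    """Map predicted scores onto the observed-score scale by matching mean and SD."""
    mx, sx = mean(predicted), stdev(predicted)
    my, sy = mean(observed), stdev(observed)
    # Stretch deviations from the predicted mean by the ratio of SDs,
    # then re-center on the observed mean.
    return [my + (sy / sx) * (x - mx) for x in predicted]
```

For example, predictions of 0.4, 0.5, and 0.6 equated against observed scores of 0.2, 0.5, and 0.8 become approximately 0.2, 0.5, and 0.8, recovering the observed spread that the regression had compressed.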
Ni, Pengsheng; McDonough, Christine M.; Jette, Alan M.; Bogusz, Kara; Marfeo, Elizabeth E.; Rasch, Elizabeth K.; Brandt, Diane E.; Meterko, Mark; Chan, Leighton
2014-01-01
Objectives To develop and test an instrument to assess physical function (PF) for Social Security Administration (SSA) disability programs, the SSA-PF. Item Response Theory (IRT) analyses were used to 1) create a calibrated item bank for each of the factors identified in prior factor analyses, 2) assess the fit of the items within each scale, 3) develop separate Computer-Adaptive Test (CAT) instruments for each scale, and 4) conduct initial psychometric testing. Design Cross-sectional data collection; IRT analyses; CAT simulation. Setting Telephone and internet survey. Participants Two samples: 1,017 SSA claimants and 999 adults from the US general population. Interventions None. Main Outcome Measures Model fit statistics, correlation and reliability coefficients. Results IRT analyses resulted in five unidimensional SSA-PF scales: Changing & Maintaining Body Position, Whole Body Mobility, Upper Body Function, Upper Extremity Fine Motor, and Wheelchair Mobility for a total of 102 items. High CAT accuracy was demonstrated by strong correlations between simulated CAT scores and those from the full item banks. Comparing the simulated CATs to the full item banks, very little loss of reliability or precision was noted, except at the lower and upper ranges of each scale. No difference in response patterns by age or sex was noted. The distributions of claimant scores were shifted to the lower end of each scale compared to those of a sample of US adults. Conclusions The SSA-PF instrument contributes important new methodology for measuring the physical function of adults applying to the SSA disability programs. Initial evaluation revealed that the SSA-PF instrument achieved considerable breadth of coverage in each content domain and demonstrated noteworthy psychometric properties. PMID:23578594
Ni, Pengsheng; McDonough, Christine M; Jette, Alan M; Bogusz, Kara; Marfeo, Elizabeth E; Rasch, Elizabeth K; Brandt, Diane E; Meterko, Mark; Haley, Stephen M; Chan, Leighton
2013-09-01
Objective: To develop and test an instrument to assess physical function for Social Security Administration (SSA) disability programs, the SSA-Physical Function (SSA-PF) instrument. Item response theory (IRT) analyses were used to (1) create a calibrated item bank for each of the factors identified in prior factor analyses, (2) assess the fit of the items within each scale, (3) develop separate computer-adaptive testing (CAT) instruments for each scale, and (4) conduct initial psychometric testing. Design: Cross-sectional data collection; IRT analyses; CAT simulation. Setting: Telephone and Internet survey. Participants: Two samples: SSA claimants (n=1017) and adults from the U.S. general population (n=999). Interventions: None. Main Outcome Measures: Model fit statistics, correlation, and reliability coefficients. Results: IRT analyses resulted in 5 unidimensional SSA-PF scales: Changing & Maintaining Body Position, Whole Body Mobility, Upper Body Function, Upper Extremity Fine Motor, and Wheelchair Mobility for a total of 102 items. High CAT accuracy was demonstrated by strong correlations between simulated CAT scores and those from the full item banks. On comparing the simulated CATs with the full item banks, very little loss of reliability or precision was noted, except at the lower and upper ranges of each scale. No difference in response patterns by age or sex was noted. The distributions of claimant scores were shifted to the lower end of each scale compared with those of a sample of U.S. adults. Conclusions: The SSA-PF instrument contributes important new methodology for measuring the physical function of adults applying to the SSA disability programs. Initial evaluation revealed that the SSA-PF instrument achieved considerable breadth of coverage in each content domain and demonstrated noteworthy psychometric properties. Copyright © 2013 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
ERIC Educational Resources Information Center
Bennett, Randy Elliot; And Others
1990-01-01
The relationship of an expert-system-scored constrained free-response item type to multiple-choice and free-response items was studied using data for 614 students on the College Board's Advanced Placement Computer Science (APCS) Examination. Implications for testing and the APCS test are discussed. (SLD)
Walter, Emily M.; Henderson, Charles R.; Beach, Andrea L.; Williams, Cody T.
2016-01-01
Researchers, administrators, and policy makers need valid and reliable information about teaching practices. The Postsecondary Instructional Practices Survey (PIPS) is designed to measure the instructional practices of postsecondary instructors from any discipline. The PIPS has 24 instructional practice statements and nine demographic questions. Users calculate PIPS scores by an intuitive proportion-based scoring convention. Factor analyses from 72 departments at four institutions (N = 891) support a 2- or 5-factor solution for the PIPS; both models include all 24 instructional practice items and have good model fit statistics. Factors in the 2-factor model include (a) instructor-centered practices, nine items; and (b) student-centered practices, 13 items. Factors in the 5-factor model include (a) student–student interactions, six items; (b) content delivery, four items; (c) formative assessment, five items; (d) student-content engagement, five items; and (e) summative assessment, four items. In this article, we describe our development and validation processes, provide scoring conventions and outputs for results, and describe wider applications of the instrument. PMID:27810868
Medvedev, Oleg N; Turner-Stokes, Lynne; Ashford, Stephen; Siegert, Richard J
2018-02-28
To determine whether the UK Functional Assessment Measure (UK FIM+FAM) fits the Rasch model in stroke patients with complex disability and, if so, to derive a conversion table of Rasch-transformed interval-level scores. The sample included a UK multicentre cohort of 1,318 patients admitted for specialist rehabilitation following a stroke. Rasch analysis was conducted for the 30-item scale including 3 domains of items measuring physical, communication and psychosocial functions. The fit of items to the Rasch model was examined using 3 different analytical approaches referred to as "pathways". The best fit was achieved in the pathway where responses from motor, communication and psychosocial domains were summarized into 3 super-items and where some items were split because of differential item functioning (DIF) relative to left and right hemisphere location (χ²(10) = 14.48, p = 0.15). Re-scoring of items showing disordered thresholds did not significantly improve the overall model fit. The UK FIM+FAM with domain super-items satisfies expectations of the unidimensional Rasch model without the need for re-scoring. A conversion table was produced to convert the total scale scores into interval-level data based on person estimates of the Rasch model. The clinical benefits of interval-transformed scores require further evaluation.
Balachandran, Jay S; Yu, Xiaohong; Wroblewski, Kristen; Mokhlesi, Babak
2013-03-15
CPAP adherence patterns are often established very early in the course of therapy. Our objective was to quantify patients' perception of CPAP therapy using a 6-item questionnaire administered the morning after CPAP titration. We hypothesized that questionnaire responses would independently predict CPAP adherence during the first 30 days of therapy. We retrospectively reviewed the CPAP perception questionnaires of 403 CPAP-naïve adults who underwent in-laboratory titration and who had daily CPAP adherence data available for the first 30 days of therapy. Responses to the CPAP perception questionnaire were analyzed for their association with mean CPAP adherence and with changes in daily CPAP adherence over 30 days. Patients were aged 52 ± 14 years, 53% were women, 54% were African American, the mean body mass index (BMI) was 36.3 ± 9.1 kg/m², and most patients had moderate-to-severe obstructive sleep apnea (OSA). Four of the 6 items on the CPAP perception questionnaire (regarding difficulty tolerating CPAP, discomfort with CPAP pressure, likelihood of wearing CPAP, and perceived health benefit) were significantly correlated with mean 30-day CPAP adherence, and a composite score from these 4 questions was found to be internally consistent. Stepwise linear regression modeling demonstrated that 3 variables were significant and independent predictors of reduced mean CPAP adherence: a worse score on the 4-item questionnaire, African American race, and a non-sleep specialist ordering the polysomnogram and CPAP therapy. Furthermore, a worse score on the 4-item CPAP perception questionnaire was consistently associated with decreased mean daily CPAP adherence over the first 30 days of therapy. In this pilot study, responses to a 4-item CPAP perception questionnaire administered to patients immediately following CPAP titration independently predicted mean CPAP adherence during the first 30 days. Further prospective validation of this questionnaire in different patient populations is warranted.
ERIC Educational Resources Information Center
Carlson, Mike; Wilcox, Rand; Chou, Chih-Ping; Chang, Megan; Yang, Frances; Blanchard, Jeanine; Marterella, Abbey; Kuo, Ann; Clark, Florence
2011-01-01
Reverse-scored items on assessment scales increase cognitive processing demands and may therefore lead to measurement problems for older adult respondents. In this study, the objective was to examine possible psychometric inadequacies of reverse-scored items on the Center for Epidemiologic Studies Depression Scale (CES-D) when used to assess…
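The record above is truncated, but the reverse-scoring step it refers to can be illustrated. Assuming the standard CES-D scoring (20 items scored 0–3, with the positively worded items 4, 8, 12, and 16 reverse-scored), this sketch shows where reversal enters the total; the constant and function names are hypothetical.

```python
# Assumed standard CES-D scoring: positively worded items (1-based numbers)
# are reverse-scored before summing.
POSITIVE_ITEMS = {4, 8, 12, 16}

def cesd_total(responses):
    """responses: 20 item scores, each 0-3, in questionnaire order."""
    if len(responses) != 20:
        raise ValueError("CES-D has 20 items")
    total = 0
    for item_number, score in enumerate(responses, start=1):
        if not 0 <= score <= 3:
            raise ValueError(f"item {item_number}: score must be 0-3")
        # Reverse-keyed items contribute 3 - score; all others contribute score.
        total += 3 - score if item_number in POSITIVE_ITEMS else score
    return total
```

A respondent who selects the lowest option on every item still scores 12, because each of the four reversed items contributes 3; a respondent who misreads the positively worded items therefore shifts the total in the opposite direction from their intended answer.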
Goetz, Christopher G; Liu, Yuanyuan; Stebbins, Glenn T; Wang, Lu; Tilley, Barbara C; Teresi, Jeanne A; Merkitch, Douglas; Luo, Sheng
2016-12-01
Assess MDS-UPDRS items for gender-, age-, and race/ethnicity-based differential item functioning. Assessing differential item functioning is a core rating scale validation step. For the MDS-UPDRS, differential item functioning occurs if item-score probability among people with similar levels of parkinsonism differs according to selected covariates (gender, age, race/ethnicity). If the magnitude of differential item functioning is clinically relevant, item-score interpretation must consider influences by these covariates. Differential item functioning can be nonuniform (covariate variably influences an item score across different levels of parkinsonism) or uniform (covariate influences an item score consistently over all levels of parkinsonism). Using the MDS-UPDRS translation database of more than 5,000 patients with Parkinson's disease, assessed in 14 languages, we tested gender-, age-, and race/ethnicity-based differential item functioning. To designate an item as having clinically relevant differential item functioning, we required statistical confirmation by 2 independent methods, along with a McFadden pseudo-R² magnitude statistic greater than "negligible." Most items showed no gender-, age-, or race/ethnicity-based differential item functioning. When differential item functioning was identified, the magnitude statistic was always in the "negligible" range, and the scale-level impact was minimal. The absence of clinically relevant differential item functioning across all items and all parts of the MDS-UPDRS is strong evidence that the scale can be used confidently. As studies of Parkinson's disease increasingly involve multinational efforts and the MDS-UPDRS has several validated non-English translations, the findings support the scale's broad applicability in populations with varying gender, age, and race/ethnicity distributions. © 2016 International Parkinson and Movement Disorder Society.
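The McFadden pseudo-R² magnitude statistic mentioned above is 1 − lnL(model) / lnL(intercept-only). In a common ordinal logistic regression DIF setup, the magnitude of DIF is the change in pseudo-R² when group terms (and, for nonuniform DIF, a trait-by-group interaction) are added to a model containing only the trait. This sketch assumes the log-likelihoods come from models fitted elsewhere; the function names and numbers are hypothetical, and no particular "negligible" cutoff is implied.

```python
def mcfadden_r2(loglik_model, loglik_null):
    """McFadden pseudo-R^2 = 1 - lnL(model) / lnL(intercept-only model)."""
    return 1.0 - loglik_model / loglik_null

def dif_effect_size(loglik_base, loglik_augmented, loglik_null):
    """Change in McFadden R^2 when covariate terms are added to a nested model.

    loglik_base:      trait-only model
    loglik_augmented: trait + group (+ optional trait-by-group interaction)
    loglik_null:      intercept-only model shared by both
    """
    return mcfadden_r2(loglik_augmented, loglik_null) - mcfadden_r2(
        loglik_base, loglik_null
    )
```

With a null log-likelihood of −100, a trait-only model at −80 has R² = 0.20; adding group terms that raise the log-likelihood to −79 changes R² by only 0.01, the kind of small increment that magnitude criteria are designed to classify as negligible even when a significance test flags it.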
Tylka, Tracy L; Kroon Van Diest, Ashley M
2013-01-01
The 21-item Intuitive Eating Scale (IES; Tylka, 2006) measures individuals' tendency to follow their physical hunger and satiety cues when determining when, what, and how much to eat. While its scores have demonstrated reliability and validity with college women, the IES-2 was developed to improve upon the original version. Specifically, we added 17 positively scored items to the original IES items (which were predominantly negatively scored), integrated an additional component of intuitive eating (Body-Food Choice Congruence), and evaluated its psychometric properties with 1,405 women and 1,195 men across three studies. After we deleted 15 items (due to low item-factor loadings, high cross-loadings, and redundant content), the results supported the psychometric properties of the IES-2 with women and men. The final 23-item IES-2 contained 11 original items and 12 added items. Exploratory and second-order confirmatory factor analyses upheld its hypothesized 4-factor structure (its original 3 factors, plus Body-Food Choice Congruence) and a higher order factor. The IES-2 was largely invariant across sex, although negligible differences on 1 factor loading and 2 item intercepts were detected. Demonstrating validity, the IES-2 total scores and most IES-2 subscale scores were (a) positively related to body appreciation, self-esteem, and satisfaction with life; (b) inversely related to eating disorder symptomatology, poor interoceptive awareness, body surveillance, body shame, body mass index, and internalization of media appearance ideals; and (c) negligibly related to social desirability. IES-2 scores also garnered incremental validity by predicting psychological well-being above and beyond eating disorder symptomatology. The IES-2's applications for empirical research and clinical work are discussed. PsycINFO Database Record (c) 2013 APA, all rights reserved.
Hellemann, G S; Green, M F; Kern, R S; Sitarenios, G; Nuechterlein, K H
2017-10-01
Measures of social cognition are increasingly being applied to psychopathology, including studies of schizophrenia and other psychotic disorders. Tests of social cognition present unique challenges for international adaptations. The Mayer-Salovey-Caruso Emotional Intelligence Test, Managing Emotions Branch (MSCEIT-ME) is a commonly used social cognition test that involves the evaluation of social scenarios presented in vignettes. This paper presents evaluations of translations of this test in six different languages based on representative samples from the relevant countries. The goal was to identify items from the MSCEIT-ME that show different response patterns across countries using indices of discrepancy and content validity criteria. An international scoring version of the MSCEIT-ME was developed that excludes items showing undesirable properties across countries. We then confirmed that this new version performed better (i.e., showed less discrepancy across regions) in international samples than the version based on the original norms. Additionally, it provides scores that are comparable to ratings based on local norms. This paper shows that it is possible to adapt complex social cognitive tasks so they can provide valid data across different cultural contexts.
Yang, Lei; Chen, Shouming; Yang, Di; Li, Jiajin; Wu, Taixiang; Zuo, Yunxia
2018-05-15
To assess the overall quality of clinical anaesthesia study protocols from the Chinese Clinical Trials Registry (ChiCTR) and to discuss ways to improve study protocol quality. We rated the completeness of each SPIRIT sub-item as N/A (not applicable) or as a score of 0, 1, or 2. For each protocol, we calculated the proportion of adequately reported items (score = 2 or N/A) and of unreported items (score = 0). Protocol quality was determined according to the proportion of reported items, with values >50% indicating high quality. For each SPIRIT sub-item, we calculated the adequately reported rate (percentage of all protocols with a score of 2 or N/A on that sub-item) and the unreported rate (percentage of all protocols with a score of 0 on that sub-item). A total of 126 study protocols were available for assessment. Among these, 88.1% were assessed as being of low quality. By comparison, the percentage of low-quality protocols was 88.9% after the publication of the SPIRIT statement. Among the 51 SPIRIT sub-items, 18 had an unreported rate above 90%, while 16 had a higher adequately reported rate than unreported rate. The overall quality of clinical anaesthesia study protocols registered in the ChiCTR was poor. A mandatory protocol upload and self-check based on the SPIRIT statement during the trial registration process may improve protocol quality in the future.
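The per-protocol reporting rates defined above reduce to simple proportions. This sketch assumes each SPIRIT sub-item is coded 0, 1, 2, or "N/A", and assumes that the >50% high-quality rule applies to the adequately reported proportion (the abstract does not say how score-1 items count toward "reported"); the function and field names are hypothetical.

```python
def protocol_summary(sub_item_scores, threshold=0.5):
    """sub_item_scores: one entry per SPIRIT sub-item, each 0, 1, 2, or 'N/A'."""
    n = len(sub_item_scores)
    # Adequately reported: score 2, or not applicable to this protocol.
    adequate = sum(1 for s in sub_item_scores if s == 2 or s == "N/A")
    unreported = sum(1 for s in sub_item_scores if s == 0)
    adequate_rate = adequate / n
    return {
        "adequate_rate": adequate_rate,
        "unreported_rate": unreported / n,
        "high_quality": adequate_rate > threshold,  # assumed reading of the >50% rule
    }
```

For a protocol coded `["N/A", 2, 2, 0, 1]`, the adequately reported rate is 0.6, the unreported rate is 0.2, and the protocol would count as high quality under this assumed rule; the per-sub-item rates are the same counts taken across protocols instead of across sub-items.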
ERIC Educational Resources Information Center
Tan, Xuan; Ricker, Kathryn L.; Puhan, Gautam
2010-01-01
This study examines the differences in equating outcomes between two trend score equating designs resulting from two different scoring strategies for trend scoring when operational constructed-response (CR) items are double-scored--the single group (SG) design, where each trend CR item is double-scored, and the nonequivalent groups with anchor…
Schalet, Benjamin D; Kallen, Michael A; Heinemann, Allen W; Deutsch, Anne; Cook, Karon F; Foster, Linda; Cella, David
2018-05-24
To evaluate the Patient-Reported Outcomes Measurement Information System (PROMIS) pain interference items for use in a quality measure and to compare the resulting quality score, along with internal reliability and validity, with a similar item set in the Minimum Data Set Version 3.0 (MDS). Cross-sectional, observational study. One freestanding inpatient rehabilitation facility (IRF) and one large hospital-based IRF. Patients with neurologic disorders. Of 1055 consecutive admissions, 26% were excluded based on clinician-determined cognitive impairment or emotional distress. Of the remainder, 50% consented and completed the survey near the end of their IRF stay (N = 391). Of these, more than half (57%) reported pain over the last day (n = 224). Psychometric statistics and quality scores were computed from a 55-question survey, including the MDS and PROMIS pain interference items. Estimates of internal reliability were higher for the PROMIS 2-item scale than for the MDS: Cronbach α (0.86 vs 0.48) and interitem correlations (0.75 vs 0.31). The PROMIS-2 items were better able to detect differences between patients with mild and severe pain intensity (Cohen d = 1.57) than the corresponding MDS items (Cohen d = 0.81). Two quality scores based on the PROMIS-2 items, reflecting low and high levels of pain interference, classified 46% and 12% of patients, respectively, as meeting these thresholds. This compared with a 30% rate when patients were classified by the MDS as experiencing pain interference. PROMIS pain interference items appear to be more internally consistent than similar MDS items. The graded PROMIS items permit the creation of multiple quality scores, showing predictable overlap with corresponding MDS quality scores. Because PROMIS items provide finer distinctions, they allow greater latitude in reporting quality scores. We recommend further study of pain interference scores across IRFs to improve their reliability and validity.
Copyright © 2018 AMDA – The Society for Post-Acute and Long-Term Care Medicine. Published by Elsevier Inc. All rights reserved.
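The Cohen d values in the record above compare pain-interference scores between patients with mild and severe pain intensity. The abstract does not give the formula used; this sketch assumes the common pooled-standard-deviation form d = (mean_b − mean_a) / s_pooled, with a hypothetical function name and toy data.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Standardized mean difference (b - a) using the pooled sample SD."""
    na, nb = len(group_a), len(group_b)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = (
        (na - 1) * variance(group_a) + (nb - 1) * variance(group_b)
    ) / (na + nb - 2)
    return (mean(group_b) - mean(group_a)) / sqrt(pooled_var)
```

For example, `cohens_d([1, 2, 3], [3, 4, 5])` returns 2.0: the group means differ by 2 and each group has an SD of 1. On this scale, the reported 1.57 versus 0.81 means the PROMIS-2 items separated the pain groups by roughly twice as many pooled SDs as the MDS items.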
Item response analysis of the Positive and Negative Syndrome Scale
Santor, Darcy A; Ascher-Svanum, Haya; Lindenmayer, Jean-Pierre; Obenchain, Robert L
2007-01-01
Background Statistical models based on item response theory were used to examine (a) the performance of individual Positive and Negative Syndrome Scale (PANSS) items and their options, (b) the effectiveness of various subscales in discriminating among individual differences in symptom severity, and (c) the appropriateness of cutoff scores recently recommended by Andreasen and her colleagues (2005) to establish symptom remission. Methods Option characteristic curves were estimated using a nonparametric item response model to examine the probability of endorsing each of 7 options within each of 30 PANSS items as a function of standardized, overall symptom severity. Our data were baseline PANSS scores from 9205 patients with schizophrenia or schizoaffective disorder who were enrolled between 1995 and 2003 in either a large, naturalistic, observational study or in 1 of 12 randomized, double-blind, clinical trials comparing olanzapine to other antipsychotic drugs. Results Our analyses show that the majority of items forming the Positive and Negative subscales of the PANSS perform very well. We also identified key areas for improvement or revision in items and options within the General Psychopathology subscale. The Positive and Negative subscale scores are not only more discriminating of individual differences in symptom severity than the General Psychopathology subscale score, but are also more efficient on average than the 30-item total score. Of the 8 items recently recommended to establish symptom remission, 1 performed markedly differently from the 7 others and should either be deleted or rescored to require that patients achieve a lower score of 2 (rather than 3) to signal remission. Conclusion This first item response analysis of the PANSS supports its sound psychometric properties; most PANSS items were either very good or good at assessing overall severity of illness.
These analyses did identify some items which might be further improved for measuring individual severity differences or for defining remission thresholds. Findings also suggest that the Positive and Negative subscales are more sensitive to change than the PANSS total score and, thus, may constitute a "mini PANSS" that may be more reliable, require shorter administration and training time, and possibly reduce sample sizes needed for future research. PMID:18005449
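The option characteristic curves described in this record plot the probability of choosing each response option as a function of standardized overall severity. One widely used nonparametric approach is Gaussian kernel smoothing, as in Ramsay's TestGraf program; the abstract does not specify the estimator, so this sketch, its default bandwidth of 0.3, and its names are assumptions.

```python
import math

def option_curve(severity, chosen, option, grid, bandwidth=0.3):
    """Kernel-smoothed P(option chosen | severity) at each grid point.

    severity: standardized overall severity for each patient.
    chosen:   the option each patient selected on one item.
    Grid points should lie within the data range; far outside it the
    Gaussian weights can underflow to zero.
    """
    curve = []
    for g in grid:
        # Gaussian weight for each patient, based on distance from the grid point.
        weights = [math.exp(-0.5 * ((g - s) / bandwidth) ** 2) for s in severity]
        hits = sum(w for w, c in zip(weights, chosen) if c == option)
        curve.append(hits / sum(weights))
    return curve
```

Because every patient picks exactly one option, the curves for all options of an item sum to 1 at each grid point; a well-behaved option shows a single peak located between the peaks of its neighboring options.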
Parenting Styles and Home Obesogenic Environments
Johnson, Rachel; Welk, Greg; Saint-Maurice, Pedro F.; Ihmels, Michelle
2012-01-01
Parenting behaviors are known to have a major impact on childhood obesity, but it has proven difficult to isolate the specific mechanism of influence. The present study uses Baumrind’s parenting typologies (authoritative, authoritarian, and permissive) to examine associations between parenting styles and parenting practices associated with childhood obesity. Data were collected from a diverse sample of children (n = 182, ages 7–10) in an urban school district in the United States. Parenting behaviors were assessed with the Parenting Styles and Dimension Questionnaire (PSDQ), a 58-item survey that categorizes parenting practices into three styles: authoritative, authoritarian, and permissive. Parent perceptions of the home obesogenic environment were assessed with the Family Nutrition and Physical Activity (FNPA) instrument, a simple 10-item instrument that has been shown in previous research to predict risk for overweight. Cluster analyses were used to identify patterns in the PSDQ data, and these clusters were related to FNPA scores and measured BMI values in children (using ANCOVA analyses that controlled for parent income and education) to examine the impact of parenting styles on risk of overweight/obesity. The FNPA score was positively (and significantly) associated with scores on the authoritative parenting scale (r = 0.29) but negatively (and significantly) associated with scores on the authoritarian scale (r = −0.22) and permissive scale (r = −0.20). Permissive parenting was significantly associated with BMI z-score, but this was the only dimension that exhibited a relationship with BMI. A three-cluster solution explained 40.5% of the total variance, and clusters were distinguishable by low and high z-scores on different PSDQ sub-dimensions.
A cluster characterized as Permissive/Authoritarian (Cluster 2) had significantly lower FNPA scores (more obesogenic) than clusters characterized as Authoritative (Cluster 1) or Authoritarian/Authoritative (Cluster 3) after controlling for family income and parent education. No direct effects of cluster were evident on the BMI outcomes but the patterns were consistent with the FNPA outcomes. The results suggest that a permissive parenting style is associated with more obesogenic environments while an authoritative parenting style is associated with less obesogenic environments. PMID:22690202
Parenting styles and home obesogenic environments.
Johnson, Rachel; Welk, Greg; Saint-Maurice, Pedro F; Ihmels, Michelle
2012-04-01
Parenting behaviors are known to have a major impact on childhood obesity, but it has proven difficult to isolate the specific mechanism of influence. The present study uses Baumrind's parenting typologies (authoritative, authoritarian, and permissive) to examine associations between parenting styles and parenting practices associated with childhood obesity. Data were collected from a diverse sample of children (n = 182, ages 7-10) in an urban school district in the United States. Parenting behaviors were assessed with the Parenting Styles and Dimension Questionnaire (PSDQ), a 58-item survey that categorizes parenting practices into three styles: authoritative, authoritarian, and permissive. Parent perceptions of the home obesogenic environment were assessed with the Family Nutrition and Physical Activity (FNPA) instrument, a simple 10-item instrument that has been shown in previous research to predict risk for overweight. Cluster analyses were used to identify patterns in the PSDQ data, and these clusters were related to FNPA scores and measured BMI values in children (using ANCOVA analyses that controlled for parent income and education) to examine the impact of parenting styles on risk of overweight/obesity. The FNPA score was positively (and significantly) associated with scores on the authoritative parenting scale (r = 0.29) but negatively (and significantly) associated with scores on the authoritarian scale (r = -0.22) and permissive scale (r = -0.20). Permissive parenting was significantly associated with BMI z-score, but this was the only dimension that exhibited a relationship with BMI. A three-cluster solution explained 40.5% of the total variance, and clusters were distinguishable by low and high z-scores on different PSDQ sub-dimensions.
A cluster characterized as Permissive/Authoritarian (Cluster 2) had significantly lower FNPA scores (more obesogenic) than clusters characterized as Authoritative (Cluster 1) or Authoritarian/Authoritative (Cluster 3) after controlling for family income and parent education. No direct effects of cluster were evident on the BMI outcomes but the patterns were consistent with the FNPA outcomes. The results suggest that a permissive parenting style is associated with more obesogenic environments while an authoritative parenting style is associated with less obesogenic environments.
Anderson, Ariana E; Reise, Steven P; Marder, Stephen R; Mansolf, Maxwell; Han, Carol; Bilder, Robert M
2017-12-01
Objective: Total scale scores derived by summing ratings from the 30-item PANSS are commonly used in clinical trial research to measure overall symptom severity, and percentage reductions in the total scores are sometimes used to document the efficacy of treatment. Because some patients may have substantial changes in PANSS total scores but remain sufficiently symptomatic to warrant diagnosis, ratings on a subset of 8 items, referred to here as the "Remission set," are sometimes used to determine whether patients' symptoms no longer satisfy diagnostic criteria. An unanswered question remains: is the goal of treatment better conceptualized as reduction in overall symptom severity, or reduction in symptoms below the threshold for diagnosis? We evaluated the psychometric properties of PANSS total scores to assess whether having low symptom severity post-treatment is equivalent to attaining Remission. Design: We applied a bifactor item response theory (IRT) model to post-treatment PANSS ratings of 3,647 subjects diagnosed with schizophrenia assessed at the termination of 11 clinical trials. The bifactor model specified one general dimension to reflect overall symptom severity and five domain-specific dimensions. We assessed how PANSS item discrimination and information parameters varied across the range of overall symptom severity (θ), with a special focus on low levels of symptoms (i.e., θ<-1), which we refer to as "Relief" from symptoms. A score of θ=-1 corresponds to an expected PANSS item score of 1.83, a rating between "Absent" and "Minimal" for a PANSS symptom. Results: The application of the bifactor IRT model revealed: (1) 88% of total score variation was attributable to variation in general symptom severity, and only 8% reflected secondary domain factors.
This implies that a general factor may provide a good indicator of symptom severity, and that interpretation is not overly complicated by multidimensionality; (2) post-treatment, 534 individuals (about 15% of the whole sample) scored in the "Relief" range of general symptom severity, but more than twice that number (n = 1351; 37%) satisfied Remission criteria. Two in three Remitted patients had scores that were not in a low symptom range (corresponding to Absent or Minimal item scores); (3) PANSS items vary greatly in their ability to measure the general symptom severity dimension: while many items are highly discriminating and relatively "pure" indicators of general symptom severity (delusions, conceptual disorganization), others are better indicators of specific dimensions (blunted affect, depression). The utility of a given PANSS item for assessing a patient depended on the illness level of the patient. Conclusion: Satisfying conventional Remission criteria was not strongly associated with low levels of symptoms. The items providing the most information for patients in the symptom Relief range were Delusions, Preoccupation, Suspiciousness/Persecution, Unusual Thought Content, Conceptual Disorganization, Stereotyped Thinking, Active Social Avoidance, and Lack of Judgment and Insight. Lower scores on these items (item scores ≤2) were strongly associated with having a low latent trait θ or experiencing overall symptom relief. The level of agreement between the Remission and Relief classifications suggested that these criteria identify different subsets of patients. Alternative subsets of items may offer better indicators of general symptom severity and provide better discrimination (and lower standard errors) for scaling individuals and judging symptom relief, where the "best" subset of items ultimately depends on the illness range and treatment phase being evaluated.
Construct Validation of the FMS: Relationship between a Jump-Landing Task and FMS Items.
Kraus, Kornelius; Schütz, Elisabeth; Doyscher, Ralf
2017-08-29
Sports injuries and athletic performance are complex areas characterized by manifold interdependencies. The Landing Error Scoring System (LESS) is a valid screening tool for examining bilateral jump-landing mechanics. The Functional Movement Screen (FMS) items, in contrast, are thought to operationalize flexibility and motor behaviour during low-intensity bodyweight patterns. The aim of the study was to explore possible interdependencies between the diagnostic information provided by these screening tools. Fifty-three athletes (age 23.3±2.1 yrs.) were tested in a sports science laboratory: 31 professional soccer players (third division) and 22 collegiate athletes. Linear correlation, partial correlation, and cluster analyses were performed to examine possible trends. Overall, the athletes achieved a LESS score of 6.6±2 and a jumping height of 37±7.8 cm. Partial correlation analysis indicated that trunk control (r=0.4; p<0.01) is moderately related to landing mechanics, and that the LESS score was in turn negatively related to jumping height (r=-0.67, p<0.01). In addition, cluster analysis showed a trend for a higher active straight leg raise (ASLR) score to be associated with better landing mechanics (ASLR score 1: LESS 6.9±1.8, n=15 vs. ASLR score 3: LESS 5.6±2.1, n=10). At the task-specific level, jump-landing mechanics were directly related to jumping performance in this cohort with poor mechanics. At the unspecific analysis level, kinetic chain length (ASLR) and trunk control were identified as potential moderator variables for landing mechanics, indicating that these parameters can limit landing mechanics and ought to be optimized within the individual's context. A potential cognitive strategy shift from internal (FMS) to external focus (LESS), as well as different muscle recruitment patterns, are possible explanations for the non-significant linear relationship between the FMS and LESS data.
Miller, Leonie M; Roodenrys, Steven
2012-11-01
The frequency effect in short-term serial recall is influenced by the composition of lists. In pure lists, a robust advantage in the recall of high-frequency (HF) words is observed, yet in alternating mixed lists, HF and low-frequency (LF) words are recalled equally well. It has been argued that the preexisting associations among all list items determine a single, global level of supportive activation that assists item recall. Preexisting associations between items are assumed to be a function of language co-occurrence: HF-HF associations are high, LF-LF associations are low, and mixed associations are intermediate in activation strength. This account, however, is based on results obtained with alternating lists containing equal numbers of HF and LF words. It is possible that directional association between adjacent list items is responsible for the recall patterns reported. In the present experiment, the recall of three forms of mixed lists (those with equal numbers of HF and LF items) and of pure lists was examined to test the extent to which item-to-item associations are present in serial recall. Furthermore, conditional probabilities were used to examine the evidence for such a contribution more closely, since correct-in-position scoring may mask recall that depends on the recall of prior items. The results suggest that an item-to-item effect is clearly present for early but not late list items, and they implicate an additional factor, perhaps the availability of resources at output, in the recall of late list items.
The Relationship of Major American Dietary Patterns to Age-related Macular Degeneration
Chiu, Chung-Jung; Chang, Min-Lee; Zhang, Fang Fang; Li, Tricia; Gensler, Gary; Schleicher, Molly; Taylor, Allen
2014-01-01
PURPOSE We hypothesized that major American dietary patterns are associated with age-related macular degeneration (AMD) risk. DESIGN Cross-sectional study. METHODS 8,103 eyes from 4,088 eligible participants in the baseline Age-Related Eye Disease Study (AREDS) were classified into control (n=2,739), early AMD (n=4,599), and advanced AMD (n=765) by the AREDS AMD Classification System. Food consumption data were collected by a 90-item food frequency questionnaire. RESULTS Two major dietary patterns were identified by factor (principal component) analysis based on 37 food groups and named Oriental and Western patterns. The Oriental pattern was characterized by higher intake of vegetables, legumes, fruit, whole grains, tomatoes, and seafood. The Western pattern was characterized by higher intake of red meat, processed meat, high-fat dairy products, French fries, refined grains, and eggs. We ranked our participants according to how closely their diets aligned with the two patterns by calculating the two factor scores for each participant. For early AMD, the multivariate-adjusted odds ratio (OR) from generalized estimating equation logistic analysis comparing the highest to lowest quintile of the Oriental pattern score was ORE5O=0.74 (95% confidence interval (CI): 0.59–0.91; Ptrend=0.01), and the OR comparing the highest to lowest quintile of the Western pattern score was ORE5W=1.56 (1.18–2.06; Ptrend=0.01). For advanced AMD, the ORA5O was 0.38 (0.27–0.54; Ptrend<0.0001), and the ORA5W was 3.70 (2.31–5.92; Ptrend<0.0001). CONCLUSIONS Our data indicate that overall diet is significantly associated with the odds of AMD and that dietary management as an AMD prevention strategy warrants further study. PMID:24792100
The relationship of major American dietary patterns to age-related macular degeneration.
Chiu, Chung-Jung; Chang, Min-Lee; Zhang, Fang Fang; Li, Tricia; Gensler, Gary; Schleicher, Molly; Taylor, Allen
2014-07-01
We hypothesized that major American dietary patterns are associated with risk for age-related macular degeneration (AMD). Cross-sectional study. We classified 8103 eyes in 4088 eligible participants in the baseline Age-Related Eye Disease Study (AREDS). They were classified into control (n = 2739), early AMD (n = 4599), and advanced AMD (n = 765) by the AREDS AMD Classification System. Food consumption data were collected by using a 90-item food frequency questionnaire. Two major dietary patterns were identified by factor (principal component) analysis based on 37 food groups and named Oriental and Western patterns. The Oriental pattern was characterized by higher intake of vegetables, legumes, fruit, whole grains, tomatoes, and seafood. The Western pattern was characterized by higher intake of red meat, processed meat, high-fat dairy products, French fries, refined grains, and eggs. We ranked our participants according to how closely their diets line up with the 2 patterns by calculating the 2 factor scores for each participant. For early AMD, multivariate-adjusted odds ratio (OR) from generalized estimating equation logistic analysis comparing the highest to lowest quintile of the Oriental pattern score was ORE5O = 0.74 (95% confidence interval (CI): 0.59-0.91; Ptrend =0.01), and the OR comparing the highest to lowest quintile of the Western pattern score was ORE5W = 1.56 (1.18-2.06; Ptrend = 0.01). For advanced AMD, the ORA5O was 0.38 (0.27-0.54; Ptrend < 0.0001), and the ORA5W was 3.70 (2.31-5.92; Ptrend < 0.0001). Our data indicate that overall diet is significantly associated with the odds of AMD and that dietary management as an AMD prevention strategy warrants further study. Copyright © 2014 Elsevier Inc. All rights reserved.
ERIC Educational Resources Information Center
Liu, Jinghua; Zu, Jiyun; Curley, Edward; Carey, Jill
2014-01-01
The purpose of this study is to investigate the impact of discrete anchor items versus passage-based anchor items on observed score equating using empirical data. This study compares an "SAT"® critical reading anchor that contains proportionally more discrete items, compared to the total tests to be equated, to another anchor that…
Petrillo, Jennifer; Bressler, Neil M; Lamoureux, Ecosse; Ferreira, Alberto; Cano, Stefan
2017-08-14
The NEI VFQ-25 has undergone psychometric evaluation in patients with varying ocular conditions and the general population. However, important limitations which may affect the interpretation of clinical trial results have been previously identified, such as concerns with reliability and validity. The purpose of this study was to evaluate the National Eye Institute Visual Functioning Questionnaire (NEI VFQ-25) and make recommendations for a revised scoring structure, with a view to improving its psychometric performance and interpretability. Rasch Measurement Theory analyses were conducted in two stages using pooled baseline NEI VFQ-25 data for 2487 participants with retinal diseases enrolled in six clinical trials. In stage 1, we examined: scale-to-sample targeting; thresholds for item response options; item fit statistics; stability; local dependence; and reliability. In stage 2, a post-hoc revision of the scoring structure (VFQ-28-R) was created and psychometrically re-evaluated. In stage 1, we found that the NEI VFQ-25 was mis-targeted to the sample, and had disordered response thresholds (15/25 items) and mis-fitting items (8/25 items). However, items appeared to be stable (differential item functioning for three items), have minimal item dependency (one pair of items) and good reliability (person-separation index, 0.93). In stage 2, the modified Rasch-scored NEI VFQ-28-R was assessed. It comprised two broad domains: Activity Limitation (19 items) and Socio-Emotional Functioning (nine items). The NEI VFQ-28-R demonstrated improved performance with fewer disordered response thresholds (no items), less item misfit (three items) and improved population targeting (reduced ceiling effect) compared with the NEI VFQ-25. 
Compared with the original version, the proposed NEI VFQ-28-R, with Rasch-based scoring and a two-domain structure, appears to offer improved psychometric performance and interpretability of the vision-related quality of life scale for the population analysed.
Perception of early parenting in panic and agoraphobia.
Faravelli, C; Panichi, C; Pallanti, S; Paterniti, S; Grecu, L M; Rivelli, S
1991-07-01
Thirty-two patients with a DSM-III-R diagnosis of panic disorder (PD) were administered the Parental Bonding Instrument (PBI), a 25-item self-report questionnaire devised to evaluate parental rearing practices. Compared with 32 matched healthy controls, PD patients scored both their parents as being significantly less caring and more overprotective. Moreover, the consistency of parental attitudes between the 2 parents was significantly lower, indicating lesser uniformity in the rearing patterns.
Ho, Andrew D; Yu, Carol C
2015-06-01
Many statistical analyses benefit from the assumption that unconditional or conditional distributions are continuous and normal. More than 50 years ago in this journal, Lord and Cook chronicled departures from normality in educational tests, and Micceri similarly showed that the normality assumption is rarely met in educational and psychological practice. In this article, the authors extend these previous analyses to state-level educational test score distributions that are an increasingly common target of high-stakes analysis and interpretation. Among 504 scale-score and raw-score distributions from state testing programs from recent years, nonnormal distributions are common and are often associated with particular state programs. The authors explain how scaling procedures from item response theory lead to nonnormal distributions as well as unusual patterns of discreteness. The authors recommend that distributional descriptive statistics be calculated routinely to inform model selection for large-scale test score data, and they illustrate consequences of nonnormality using sensitivity studies that compare baseline results to those from normalized score scales.
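As a minimal illustration of the routine distributional descriptives recommended above, the following Python sketch summarizes a vector of test scores. The function name `describe_scores` and the use of population (biased) moment estimators are illustrative assumptions, not the authors' procedure; the count of distinct score values is included only as a rough indicator of the discreteness the authors discuss.

```python
import math


def describe_scores(scores):
    """Return basic distributional descriptives for a list of test scores.

    Hypothetical helper: uses population (biased) moment estimators, so
    values differ slightly from sample-adjusted skewness/kurtosis formulas.
    """
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two scores")
    mean = sum(scores) / n
    deviations = [x - mean for x in scores]
    m2 = sum(d ** 2 for d in deviations) / n  # second central moment
    if m2 == 0:
        raise ValueError("scores have zero variance")
    m3 = sum(d ** 3 for d in deviations) / n  # third central moment
    m4 = sum(d ** 4 for d in deviations) / n  # fourth central moment
    return {
        "n": n,
        "mean": mean,
        "sd": math.sqrt(m2),
        "skewness": m3 / m2 ** 1.5,
        # Excess kurtosis: 0 for a normal distribution.
        "excess_kurtosis": m4 / m2 ** 2 - 3.0,
        # Few distinct values relative to n signals a highly discrete scale.
        "distinct_values": len(set(scores)),
    }
```

For example, `describe_scores([0, 0, 0, 0, 10])` reports a positive skewness of 1.5, flagging a right-skewed distribution before any normality-based model is chosen.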
Improving Measurement Efficiency of the Inner EAR Scale with Item Response Theory.
Jessen, Annika; Ho, Andrew D; Corrales, C Eduardo; Yueh, Bevan; Shin, Jennifer J
2018-02-01
Objectives (1) To assess the 11-item Inner Effectiveness of Auditory Rehabilitation (Inner EAR) instrument with item response theory (IRT). (2) To determine whether the underlying latent ability could also be accurately represented by a subset of the items for use in high-volume clinical scenarios. (3) To determine whether the Inner EAR instrument correlates with pure tone thresholds and word recognition scores. Design IRT evaluation of prospective cohort data. Setting Tertiary care academic ambulatory otolaryngology clinic. Subjects and Methods Modern psychometric methods, including factor analysis and IRT, were used to assess unidimensionality and item properties. Regression methods were used to assess prediction of word recognition and pure tone audiometry scores. Results The Inner EAR scale is unidimensional, and items varied in their location and information. Information parameter estimates ranged from 1.63 to 4.52, with higher values indicating more useful items. The IRT model provided a basis for identifying 2 sets of items with relatively lower information parameters. Item information functions showed which items added little value beyond the other items; these items were removed in stages, creating 8- and 3-item versions of the Inner EAR scale for more efficient assessment. The 8-item version accurately reflected the underlying construct. All versions correlated moderately with word recognition scores and pure tone averages. Conclusion The 11-, 8-, and 3-item versions of the Inner EAR scale have strong psychometric properties, and there is correlational validity evidence for the observed scores. Modern psychometric methods can help streamline care delivery by maximizing relevant information per item administered.
ERIC Educational Resources Information Center
Lee, Guemin; Park, In-Yong
2012-01-01
Previous assessments of the reliability of test scores for testlet-composed tests have indicated that item-based estimation methods overestimate reliability. This study was designed to address issues related to the extent to which item-based estimation methods overestimate the reliability of test scores composed of testlets and to compare several…
ERIC Educational Resources Information Center
Kerr, Deirdre; Mousavi, Hamid; Iseli, Markus R.
2013-01-01
The Common Core assessments emphasize short essay constructed-response items over multiple-choice items because they are more precise measures of understanding. However, such items are too costly and time-consuming to be used in national assessments unless a way to score them automatically can be found. Current automatic essay-scoring techniques…
Pattern of food intolerance in patients with gastro-esophageal reflux symptoms.
Caselli, Michele; Lo Cascio, Natalina; Rabitti, Stefano; Eusebi, Leonardo H; Zeni, Elena; Soavi, Cecilia; Cassol, Francesca; Zuliani, Giovanni; Zagari, Rocco M
2017-12-01
Many food items have been implicated in the pathogenesis of gastro-esophageal reflux disease, and dietary modification has been proposed as first-line treatment. Test-based exclusion diets have been shown to significantly reduce reflux symptoms. We aimed to assess the patterns of food intolerance in a series of patients with typical gastro-esophageal reflux symptoms (GERS). We retrospectively evaluated all patients with typical reflux symptoms attending the Centre Study Association on Food Intolerance and Nutrition of Ferrara from January 2010 to October 2015 who tested positive for at least one food item on the Leucocytotoxic Test. The presence and severity of typical GERS (heartburn and/or acid regurgitation) were assessed using the Gastro-esophageal Reflux Disease Impact Scale (GIS) questionnaire. Only individuals with a GIS score of at least 5 points were included. Almost all patients (91.1%) were intolerant to at least 5 food items. The most frequent food intolerances (found in more than 33% of patients) were to milk (55.4%), lettuce (46.4%), coffee (43.7%), brewer's yeast (42.9%), pork (42.9%), tuna (37.5%), rice (35.7%), sole (34.8%), asparagus (34.8%) and eggs (33.9%). Nine different clusters of food intolerance were detected. Patients with typical gastro-esophageal reflux symptoms seem to have intolerance to multiple food items, some of which (lettuce, brewer's yeast, tuna, rice, sole and asparagus) have not previously been associated with gastro-esophageal reflux disease.
Defining Malaysian Knowledge Society: Results from the Delphi Technique
NASA Astrophysics Data System (ADS)
Hamid, Norsiah Abdul; Zaman, Halimah Badioze
This paper outlines the findings of research whose central aim is to define the term Knowledge Society (KS) in the Malaysian context. The research focuses on three important dimensions, namely knowledge, ICT and human capital. This study adopts a modified Delphi technique to identify the important dimensions that can contribute to the development of Malaysia's KS. The Delphi technique involved ten experts in a five-round iterative and controlled feedback procedure to obtain consensus on the important dimensions and to verify the proposed definition of KS. The findings show that all three initially proposed dimensions achieved high or moderate consensus. Round One (R1) proposed an initial definition of KS and invited comments and input from the panel. These inputs were then used to develop items for the R2 questionnaire. In R2, 56 out of 73 items achieved high consensus, and in R3, 63 out of 90 items did so. R4 was conducted to re-rate the new items, of which 8 out of 17 achieved high consensus. The remaining items achieved moderate consensus, and no item achieved low or no consensus in any round. The final round (R5) was used to verify the final definition of KS. The findings of this study are significant for defining KS and for developing a framework in the Malaysian context.
The juvenile arthritis foot disability index: development and evaluation of measurement properties.
André, Marie; Hagelberg, Stefan; Stenström, Christina H
2004-12-01
To develop a new juvenile arthritis foot disability index (JAFI) and to test it for validity and reliability. Samples of 14 children/adolescents and 30 children/adolescents with juvenile idiopathic arthritis (JIA) and 29 healthy children/adolescents participated. We used a questionnaire derived from the International Classification of Functioning, Disability and Health that included 27 statements divided into the dimensions Impairment, Activity Limitation, and Participation Restriction. Comments on the contents were invited from parents and adolescents. Convergent and divergent construct validity was examined by comparing the 3 JAFI dimensions to joint impairment scores, the Childhood Health Assessment Questionnaire (CHAQ), and self-rated, foot-related participation restriction. Known groups construct validity was assessed by comparing answers from children with JIA to those from healthy children. Test-retest stability was investigated over one week. One item was added after suggestions from 2 participants. A consistent pattern of increasing JAFI scores was found with increasing joint impairment scores, CHAQ scores, and self-rated foot-related participation restriction. Foot-related disability as assessed by JAFI was more pronounced in children with JIA than in healthy controls. One statement showing a floor effect was excluded. No internal redundancy (rs > 0.90) between items was found, and internal consistency within each subscale was satisfactory (rs > 0.50) for all items but one. No systematic differences were found between test and retest, and weighted kappa coefficients for the 3 JAFI dimensions were 0.90, 0.85, and 0.88. The JAFI appears to be valid and reliable for assessing foot-related disability among children/adolescents with JIA. Its sensitivity to change remains to be investigated.
Validation of Automated Scoring of Science Assessments
ERIC Educational Resources Information Center
Liu, Ou Lydia; Rios, Joseph A.; Heilman, Michael; Gerard, Libby; Linn, Marcia C.
2016-01-01
Constructed-response items can both measure the coherence of student ideas and serve as reflective experiences to strengthen instruction. We report on new automated scoring technologies that can reduce the cost and complexity of scoring constructed-response items. This study explored the accuracy of c-rater-ML, an automated scoring engine…
Thibodeau, Michel A; Leonard, Rachel C; Abramowitz, Jonathan S; Riemann, Bradley C
2015-12-01
The Dimensional Obsessive-Compulsive Scale (DOCS) is a promising measure of obsessive-compulsive disorder (OCD) symptoms but has received minimal psychometric attention. We evaluated the utility and reliability of DOCS scores. The study included 832 students and 300 patients with OCD. Confirmatory factor analysis supported the originally proposed four-factor structure. DOCS total and subscale scores exhibited good to excellent internal consistency in both samples (α = .82 to α = .96). Patient DOCS total scores decreased substantially during treatment (t = 16.01, d = 1.02). DOCS total scores discriminated between students and patients (sensitivity = 0.76, 1 - specificity = 0.23). The measure did not exhibit gender-based differential item functioning as tested by Mantel-Haenszel chi-square tests. Expected item responses were plotted as a function of the latent trait using item response theory, demonstrating that DOCS scores incrementally discriminate OCD symptoms ranging from low to extremely high severity. Incremental differences in DOCS scores appear to represent unbiased and reliable differences in true OCD symptom severity. © The Author(s) 2014.
ERIC Educational Resources Information Center
Sussman, Joshua; Beaujean, A. Alexander; Worrell, Frank C.; Watson, Stevie
2013-01-01
Item response models (IRMs) were used to analyze Cross Racial Identity Scale (CRIS) scores. Rasch analysis scores were compared with classical test theory (CTT) scores. The partial credit model demonstrated a high goodness of fit and correlations between Rasch and CTT scores ranged from 0.91 to 0.99. CRIS scores are supported by both methods.…
Reise, Steven P.; Marder, Stephen R.; Mansolf, Maxwell; Han, Carol; Bilder, Robert M.
2017-01-01
Objective: Total scale scores derived by summing ratings from the 30-item PANSS are commonly used in clinical trial research to measure overall symptom severity, and percentage reductions in the total scores are sometimes used to document the efficacy of treatment. Acknowledging that some patients may have substantial changes in PANSS total scores but still be sufficiently symptomatic to warrant diagnosis, ratings on a subset of 8 items, referred to here as the “Remission set,” are sometimes used to determine if patients’ symptoms no longer satisfy diagnostic criteria. An unanswered question remains: is the goal of treatment better conceptualized as reduction in overall symptom severity, or reduction in symptoms below the threshold for diagnosis? We evaluated the psychometric properties of PANSS total scores, to assess whether having low symptom severity post-treatment is equivalent to attaining Remission. Design: We applied a bifactor item response theory (IRT) model to post-treatment PANSS ratings of 3,647 subjects diagnosed with schizophrenia assessed at the termination of 11 clinical trials. The bifactor model specified one general dimension to reflect overall symptom severity, and five domain-specific dimensions. We assessed how PANSS item discrimination and information parameters varied across the range of overall symptom severity (θ), with a special focus on low levels of symptoms (i.e., θ<-1), which we refer to as “Relief” from symptoms. A score of θ=-1 corresponds to an expected PANSS item score of 1.83, a rating between “Absent” and “Minimal” for a PANSS symptom. Results: The application of the bifactor IRT model revealed: (1) 88% of total score variation was attributable to variation in general symptom severity, and only 8% reflected secondary domain factors. 
This implies that a general factor may provide a good indicator of symptom severity, and that interpretation is not overly complicated by multidimensionality; (2) post-treatment, 534 individuals (about 15% of the whole sample) scored in the “Relief” range of general symptom severity, but more than twice that number (n = 1351; 37%) satisfied Remission criteria. Two in three Remitted patients had scores that were not in a low symptom range (corresponding to Absent or Minimal item scores); (3) PANSS items vary greatly in their ability to measure the general symptom severity dimension: while many items are highly discriminating and relatively “pure” indicators of general symptom severity (delusions, conceptual disorganization), others are better indicators of specific dimensions (blunted affect, depression). The utility of a given PANSS item for assessing a patient depended on the illness level of the patient. Conclusion: Satisfying conventional Remission criteria was not strongly associated with low levels of symptoms. The items providing the most information for patients in the symptom Relief range were Delusions, Preoccupation, Suspiciousness/Persecution, Unusual Thought Content, Conceptual Disorganization, Stereotyped Thinking, Active Social Avoidance, and Lack of Judgment and Insight. Lower scores on these items (item scores ≤2) were strongly associated with having a low latent trait θ or experiencing overall symptom relief. The level of agreement between the Remission and Relief classifications suggested that these criteria identify different subsets of patients. Alternative subsets of items may offer better indicators of general symptom severity and provide better discrimination (and lower standard errors) for scaling individuals and judging symptom relief, where the “best” subset of items ultimately depends on the illness range and treatment phase being evaluated. PMID:29410936
Tian, Feng; Ni, Pengsheng; Mulcahey, M J; Hambleton, Ronald K; Tulsky, David; Haley, Stephen M; Jette, Alan M
2014-11-01
Objective: To use item response theory (IRT) methods to link scores from 2 recently developed contemporary functional outcome measures, the adult Spinal Cord Injury-Functional Index (SCI-FI) and the Pedi SCI (both the parent version and the child version). Design: Secondary data analysis of the physical functioning items of the adult SCI-FI and the Pedi SCI instruments. We used a nonequivalent group design with items common to both instruments and the Stocking-Lord method for the linking. Linking was conducted so that the adult SCI-FI and Pedi SCI scaled scores could be compared. Setting: Community. Participants: This study included a total sample of 1558 participants. Pedi SCI items were administered to a sample of children (n=381) with SCI aged 8 to 21 years, and of parents/caregivers (n=322) of children with SCI aged 4 to 21 years. Adult SCI-FI items were administered to a sample of adults (n=855) with SCI aged 18 to 92 years. Interventions: Not applicable. Main Outcome Measures: Five scales common to both instruments were included in the analysis: Wheelchair, Daily Routine/Self-care, Daily Routine/Fine Motor, Ambulation, and General Mobility functioning. Results: Confirmatory factor analysis and exploratory factor analysis results indicated that the 5 scales are unidimensional. A graded response model was used to calibrate the items. Misfitting items were identified and removed from the item banks. Items that function differently between the adult and child samples (i.e., exhibit differential item functioning) were identified and removed from the common items used for linking. Domain scores from the Pedi SCI instruments were transformed onto the adult SCI-FI metric. Conclusions: This IRT linking allowed estimation of adult SCI-FI scale scores based on Pedi SCI scale scores and vice versa; therefore, it provides clinicians with a means of tracking long-term functional data for children with an SCI across their entire lifespan. Copyright © 2014 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
Delgado-Herrera, Leticia; Lasch, Kathryn; Zeiher, Bernhardt; Lembo, Anthony J.; Drossman, Douglas A.; Banderas, Benjamin; Rosa, Kathleen; Lademacher, Christopher; Arbuckle, Rob
2017-01-01
Background: To evaluate the psychometric properties of the newly developed seven-item Irritable Bowel Syndrome – Diarrhea predominant (IBS-D) Daily Symptom Diary and four-item Event Log using phase II clinical trial safety and efficacy data in patients with IBS-D. This instrument measures diarrhea (stool frequency and stool consistency), abdominal pain related to IBS-D (stomach pain, abdominal pain, abdominal cramps), immediate need to have a bowel movement (immediate need and accident occurrence), bloating, pressure, gas, and incomplete evacuation. Methods: Psychometric properties and responsiveness of the instrument were evaluated in a clinical trial population [ClinicalTrials.gov identifier: NCT01494233]. Results: A total of 434 patients were included in the analyses. Significant differences were found among severity groups (p < 0.01) defined by IBS Patient Global Impression of Severity (PGI-S) and IBS Patient Global Impression of Change (PGI-C). Severity scores were calculated for each Diary and Event Log item and for five-item, four-item, and three-item summary scores. In groups stratified by changes in PGI-S, between-group differences in changes over time were significant (p < 0.05) for all summary scores, two of six Diary items, and three of four Event Log items; a one-grade change in PGI-S was considered a meaningful difference, with mean change scores on Diary items ranging from −0.13 to −0.86 [standard deviation (SD) 0.79–1.39]. Similarly, for patients who reported being ‘slightly improved’ (considered a clinically meaningful difference) on the PGI-C, mean change scores on Diary items ranged from −0.45 to −1.55 (SD 0.69–1.39). All estimates of clinically important change for each item and all summary scores were small and should be considered preliminary. These results are consistent with those of the previous standalone psychometric study of reliability and validity. 
Conclusions: These analyses provide evidence of the psychometric properties of the IBS-D Daily Symptom Diary and Event Log in a clinical trial population. PMID:28932269
Psychometric properties of a revised version of the Assisting Hand Assessment (Kids-AHA 5.0).
Holmefur, Marie M; Krumlinde-Sundholm, Lena
2016-06-01
The aim of this study was to scrutinize the Assisting Hand Assessment (AHA) version 4.4 for possible improvements and to evaluate the psychometric properties of a revised version of the AHA, with regard to internal scale validity and aspects of reliability. In collaboration with experts, scoring criteria were changed for four items, and one entirely new item was constructed. Twenty-two original, one new, and four revised items were scored for 164 assessments of children with unilateral cerebral palsy aged 18 months to 12 years. Rasch measurement analysis was used to evaluate internal scale validity by exploring rating-scale functioning, item and person goodness-of-fit, and principal component analysis. Targeting and scale reliability were also evaluated. After removal of misfitting items, a 20-item scale showed satisfactory goodness-of-fit. Unidimensionality was confirmed by principal component analysis. The rating scale functioned well for the 20 items, and the item difficulty was well suited to the ability level of the sample. The person reliability coefficient was 0.98, indicating high separation ability of the scale. A conversion table of AHA scores between the previous version (4.4) and the new version (5.0) was constructed. The new 20-item version of the Kids-AHA (version 5.0) demonstrated excellent internal scale validity, suggesting improved responsiveness to changes and shortened scoring time. For comparison of scores from version 4.4 to 5.0, a transformation table is presented. © 2015 Mac Keith Press.
Albuquerque, Maicon R.; Lopes, Mariana C.; de Paula, Jonas J.; Faria, Larissa O.; Pereira, Eveline T.; da Costa, Varley T.
2017-01-01
In order to understand the reasons that lead individuals to practice physical activity, researchers developed the Motives for Physical Activity Measure-Revised (MPAM-R) scale. In 2010, the MPAM-R was translated into Portuguese and validated; however, its psychometric measures were not acceptable. In addition, factor scores in some sports psychology scales are calculated as the mean of the scores of the items in the factor. Nevertheless, it seems appropriate that items with higher factor loadings, extracted by Factor Analysis, should carry greater weight in the factor score, and items with lower factor loadings less weight. The aims of the present study were to translate the MPAM-R into Portuguese, validate it, and investigate the agreement between two methods used to calculate factor scores. Data were collected from three hundred volunteers who had been involved in physical activity programs for at least 6 months. Confirmatory Factor Analysis of the 30 items indicated that this version did not fit the model. After four items were excluded, the final 26-item model showed acceptable fit measures in Exploratory Factor Analysis and conceptually supported the five factors of the original proposal. When the two methods of calculating factor scores were compared, only the “Enjoyment” and “Appearance” factors showed agreement between methods. Thus, the Portuguese version of the MPAM-R can be used in a Brazilian context, and a new proposal for calculating factor scores seems promising. PMID:28293203
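The two ways of computing a factor score contrasted above can be sketched as follows. This is a minimal illustration under assumptions: the loading-weighted version simply divides the loading-weighted sum by the sum of the loadings (one simple weighting scheme, not necessarily the exact procedure used in the study), and the responses and loadings are hypothetical values, not MPAM-R data.

```python
from statistics import mean


def mean_score(responses):
    """Conventional factor score: the unweighted mean of the item responses."""
    return mean(responses)


def loading_weighted_score(responses, loadings):
    """Factor score in which each item is weighted by its factor loading.

    Items with higher loadings pull the score more strongly than items with
    lower loadings; dividing by the sum of the loadings keeps the result on
    the original response scale.
    """
    if len(responses) != len(loadings):
        raise ValueError("expected one loading per item")
    weighted_sum = sum(r * w for r, w in zip(responses, loadings))
    return weighted_sum / sum(loadings)


# Hypothetical responses of one participant to a four-item factor and
# hypothetical loadings; neither is taken from the MPAM-R study.
responses = [7, 6, 3, 2]
loadings = [0.85, 0.80, 0.45, 0.40]

print(mean_score(responses))                                  # 4.5
print(round(loading_weighted_score(responses, loadings), 2))  # 5.16
```

Because the two high-loading items also carry the high responses here, the weighted score sits above the plain mean; agreement between the two methods depends on how uneven the loadings within a factor are.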
Can health care providers recognise a fibromyalgia personality?
Da Silva, José A P; Jacobs, Johannes W G; Branco, Jaime C; Canaipa, Rita; Gaspar, M Filomena; Griep, Ed N; van Helmond, Toon; Oliveira, Paula J; Zijlstra, Theo J; Geenen, Rinie
2017-01-01
To determine if experienced health care providers (HCPs) can recognise patients with fibromyalgia (FM) based on a limited set of personality items, exploring the existence of a FM personality. From the 240-item NEO-PI-R personality questionnaire, 8 HCPs from two different countries each selected 20 items they considered most discriminative of FM personality. Then, evaluating the scores on these items of 129 female patients with FM and 127 female controls, each HCP rated the probability of FM for each individual on a 0-10 scale. Personality characteristics (domains and facets) of selected items were determined. Scores of patients with FM and controls on the eight 20-item sets, and HCPs' estimates of each individual's probability of FM, were analysed for their discriminative value. The eight 20-item sets discriminated between patients with FM and controls, with areas under the receiver operating characteristic curve ranging from 0.71 to 0.81. The estimated probabilities for FM showed, in general, percentages of correct classifications above 50%, with rising correct percentages for higher estimated probabilities. The most often chosen and discriminatory items were predominantly of the domain neuroticism (all with higher scores in FM), followed by some items of the facet trust (lower scores in FM). HCPs can, based on a limited set of items from a personality questionnaire, distinguish patients with FM from controls with a statistically significant probability. The HCPs' expectation that personality in FM patients is associated with higher levels for aspects of neuroticism (proneness to psychological distress) and lower scores for aspects of trust proved to be correct.
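The area under the receiver operating characteristic curve reported above can be read as the probability that a randomly chosen patient scores higher on an item set than a randomly chosen control. The sketch below computes it directly from that Mann-Whitney interpretation; the scores are hypothetical 20-item-set totals, not data from the study.

```python
def auc(case_scores, control_scores):
    """Area under the ROC curve via its Mann-Whitney interpretation.

    Counts, over every (case, control) pair, how often the case scores
    higher than the control; ties count as one half.
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical item-set totals; not the study's patients or controls.
fm_patients = [62, 55, 71, 48, 66]
controls = [40, 52, 48, 35, 58]

print(auc(fm_patients, controls))  # 0.86
```

An AUC of 0.5 means the item set does no better than chance, and 1.0 means every patient outscores every control; the pairwise loop is quadratic, which is fine for samples of a few hundred.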
Yang, Lei; Chen, Shouming; Yang, Di; Li, Jiajin; Wu, Taixiang; Zuo, Yunxia
2018-01-01
Objective To learn about the overall quality of clinical anaesthesia study protocols registered in the Chinese Clinical Trials Registry (ChiCTR) and to discuss ways to improve protocol quality. Methods We rated the completeness of each SPIRIT sub-item as N/A (not applicable) or with a score of 0, 1, or 2. For each protocol, we calculated the proportion of adequately reported items (score = 2 or N/A) and of unreported items (score = 0). Protocol quality was determined according to the proportion of reported items, with values >50% indicating high quality. For each SPIRIT sub-item, we calculated the adequately reported rate (percentage of all protocols scoring 2 or N/A on that sub-item) as well as the unreported rate (percentage of all protocols scoring 0 on that sub-item). Results A total of 126 study protocols were available for assessment. Of these, 88.1% were assessed as being of low quality; after the publication of the SPIRIT statement, the percentage of low-quality protocols was 88.9%. Among the 51 SPIRIT sub-items, 18 had an unreported rate above 90%, while 16 had a higher adequately reported rate than unreported rate. Conclusions The overall quality of clinical anaesthesia study protocols registered in the ChiCTR was poor. A mandatory protocol upload and self-check based on the SPIRIT statement during the trial registration process may improve protocol quality in the future. PMID:29872509
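The per-protocol and per-sub-item proportions described above can be sketched as below. This is a minimal illustration under one stated assumption: the "proportion of reported items" used for the >50% quality cut-off is taken to be the adequately reported proportion (score 2 or N/A), which the abstract does not spell out. The protocols are hypothetical and much shorter than the 51-sub-item checklist.

```python
from typing import Dict, Sequence, Union

# Each SPIRIT sub-item is scored 0, 1 or 2, or marked "NA" (not applicable).
Score = Union[int, str]


def is_adequate(score: Score) -> bool:
    return score == 2 or score == "NA"


def protocol_summary(scores: Sequence[Score]) -> Dict[str, object]:
    """Summarise one protocol's sub-item scores.

    Assumption: the quality cut-off is applied to the adequately
    reported proportion (score 2 or NA).
    """
    n = len(scores)
    adequate_prop = sum(1 for s in scores if is_adequate(s)) / n
    return {
        "adequately_reported": adequate_prop,
        "unreported": sum(1 for s in scores if s == 0) / n,
        "high_quality": adequate_prop > 0.5,  # >50% -> high quality
    }


def sub_item_rates(protocols: Sequence[Sequence[Score]], index: int) -> Dict[str, float]:
    """Adequately reported and unreported rates for one sub-item across protocols."""
    column = [p[index] for p in protocols]
    n = len(column)
    return {
        "adequately_reported": sum(1 for s in column if is_adequate(s)) / n,
        "unreported": sum(1 for s in column if s == 0) / n,
    }


# Hypothetical five-sub-item protocols (the real checklist has 51 sub-items).
protocols = [
    [2, 2, "NA", 1, 0],
    [0, 0, 1, 0, 2],
    [2, 1, 2, 2, 0],
]
for p in protocols:
    print(protocol_summary(p))
print(sub_item_rates(protocols, 0))
```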
Item Selection and Pre-equating with Empirical Item Characteristic Curves.
ERIC Educational Resources Information Center
Livingston, Samuel A.
An empirical item characteristic curve shows the probability of a correct response as a function of the student's total test score. These curves can be estimated from large-scale pretest data. They enable test developers to select items that discriminate well in the score region where decisions are made. A similar set of curves can be used to…
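The empirical item characteristic curve defined above can be estimated by grouping examinees by total test score and taking the proportion in each group who answered the item correctly. The sketch below does exactly that on hypothetical pretest records; with real large-scale data one would typically also group sparse score values or smooth the curve, which this minimal version omits.

```python
from collections import defaultdict


def empirical_icc(total_scores, item_correct):
    """Proportion answering one item correctly at each observed total score.

    total_scores: each examinee's total test score
    item_correct: 1/True if that examinee answered the item correctly
    """
    counts = defaultdict(lambda: [0, 0])  # total score -> [n_correct, n_examinees]
    for total, correct in zip(total_scores, item_correct):
        counts[total][0] += int(correct)
        counts[total][1] += 1
    return {total: c / n for total, (c, n) in sorted(counts.items())}


# Hypothetical pretest records for one item.
totals = [10, 10, 10, 12, 12, 14, 14, 14, 14]
correct = [0, 0, 1, 0, 1, 1, 1, 0, 1]
print(empirical_icc(totals, correct))
```

An item whose curve rises steeply around the score region where pass/fail decisions are made is the kind of item the abstract describes as discriminating well there.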
Denova-Gutiérrez, Edgar; Tucker, Katherine L; Salmerón, Jorge; Flores, Mario; Barquera, Simón
2016-01-01
To examine the validity of a semi-quantitative food frequency questionnaire (SFFQ) for identifying dietary patterns in an adult Mexican population. A 140-item SFFQ and two 24-hour dietary recalls (24DRs) were administered. Foods were categorized into 29 food groups, which were used to derive dietary patterns via factor analysis. Pearson and intraclass correlation coefficients between the dietary pattern scores identified from the SFFQ and from the 24DRs were assessed. Pattern 1 was high in snacks, fast food, soft drinks, processed meats and refined grains; pattern 2 was high in fresh vegetables, fresh fruits, and dairy products; and pattern 3 was high in legumes, eggs, sweetened foods and sugars. Pearson correlation coefficients between the SFFQ and the 24DRs for these patterns were 0.66 (P<0.001), 0.41 (P<0.001) and 0.29 (P=0.193), respectively. Our data indicate reasonable validity of the SFFQ, using factor analysis, for deriving major dietary patterns in comparison with the two 24DRs.
Exploring sex differences in autistic traits: A factor analytic study of adults with autism.
Grove, Rachel; Hoekstra, Rosa A; Wierda, Marlies; Begeer, Sander
2017-08-01
Research has highlighted potential differences in the phenotypic and clinical presentation of autism spectrum conditions across sex. Furthermore, the measures utilised to evaluate autism spectrum conditions may be biased towards the male autism phenotype. It is important to determine whether these instruments measure the autism phenotype consistently in autistic men and women. This study evaluated the factor structure of the Autism Spectrum Quotient Short Form in a large sample of autistic adults. It also systematically explored specific sex differences at the item level, to determine whether the scale assesses the autism phenotype equivalently across males and females. Factor analyses were conducted among 265 males and 285 females. A two-factor structure consisting of a social behaviour and numbers and patterns factor was consistent across groups, indicating that the latent autism phenotype is similar among both autistic men and women. Subtle differences were observed on two social behaviour item thresholds of the Autism Spectrum Quotient Short Form, with women reporting scores more in line with the scores expected in autism on these items than men. However, these differences were not substantial. This study showed that the Autism Spectrum Quotient Short Form detects autistic traits equivalently in males and females and is not biased towards the male autism phenotype.
Petscher, Yaacov; Mitchell, Alison M; Foorman, Barbara R
2015-01-01
A growing body of literature suggests that response latency, the amount of time it takes an individual to respond to an item, may be an important factor to consider when using assessment data to estimate the ability of an individual. Considering that tests of passage and list fluency are being adapted to a computer administration format, it is possible that accounting for individual differences in response times may be an increasingly feasible option to strengthen the precision of individual scores. The present research evaluated the differential reliability of scores when using classical test theory and item response theory as compared to a conditional item response model which includes response time as an item parameter. Results indicated that the precision of student ability scores increased by an average of 5% when using the conditional item response model, with greater improvements for students of average or high ability. Implications for measurement models of speeded assessments are discussed.
Petscher, Yaacov; Mitchell, Alison M.; Foorman, Barbara R.
2016-01-01
A growing body of literature suggests that response latency, the amount of time it takes an individual to respond to an item, may be an important factor to consider when using assessment data to estimate the ability of an individual. Considering that tests of passage and list fluency are being adapted to a computer administration format, it is possible that accounting for individual differences in response times may be an increasingly feasible option to strengthen the precision of individual scores. The present research evaluated the differential reliability of scores when using classical test theory and item response theory as compared to a conditional item response model which includes response time as an item parameter. Results indicated that the precision of student ability scores increased by an average of 5% when using the conditional item response model, with greater improvements for students of average or high ability. Implications for measurement models of speeded assessments are discussed. PMID:27721568
Tailoring Multimedia Instruction to Soldier Needs
2014-12-01
Pretest score (mean % items correct): 39%, 34%, 48%, 51%, 51%, 45%. Posttest score (mean % items correct): 47%, 44%, 66%, 60%, 63%, 56... Stepwise regression was used to examine the relationship between Soldiers’ posttest scores (criterion) and their pretest scores, training time, type of... differences among IMI types had no effect.) Pretest scores predicted posttest scores for both Adjust Indirect Fire (standardized β = .66, t = 6.36
Christiansen, David Høyrup; Michener, Lori; Roy, Jean-Sébastien
2018-02-13
The Disabilities of the Arm, Shoulder and Hand (DASH) questionnaire and the Western Ontario Rotator Cuff (WORC) index are 2 widely used patient-reported questionnaires in individuals with rotator cuff (RC) tendinopathy. In contrast to the WORC index, for which the items are specific to the affected shoulder, the items of the DASH questionnaire assess the ability to perform activities regardless of the arm used. The objective of this study is to determine whether scores on the DASH questionnaire and WORC index are affected if the symptoms are on the dominant or nondominant side in individuals with RC tendinopathy. Given the number of items that can be influenced by dominance, the hypothesis is that DASH scores will be impacted by the side of the symptoms. Individuals with RC tendinopathy (N = 149) completed questions on symptomatology and hand dominance, the DASH questionnaire, and the WORC index. Differences in total scores (independent t test) and single items (Wilcoxon rank sum test) were compared between groups of participants with dominant-side symptoms and those without dominant-side symptoms. No significant differences were observed for WORC or DASH total scores when comparing participants with and without symptoms on their dominant side. Single-item comparison revealed more items being affected by symptom side on the DASH questionnaire (6 of 30 items) than on the WORC index (2 of 21 items). The side of the symptoms does not influence the DASH and WORC total scores, as there are no systematic differences between individuals with and without symptoms in their dominant shoulder. However, the presence of dominant symptoms does influence item scores more on the DASH questionnaire than on the WORC index. Copyright © 2018 Journal of Shoulder and Elbow Surgery Board of Trustees. Published by Elsevier Inc. All rights reserved.
Development of a Short Questionnaire to Assess Diet Quality among Older Community-Dwelling Adults.
Robinson, S M; Jameson, K A; Bloom, I; Ntani, G; Crozier, S R; Syddall, H; Dennison, E M; Cooper, C R; Sayer, A A
2017-01-01
To evaluate the use of a short questionnaire to assess diet quality in older adults. Cross-sectional study. Hertfordshire, UK. 3217 community-dwelling older adults (59-73 years). Diet was assessed using an administered food frequency questionnaire (FFQ); two measures of diet quality were defined by calculating participants' 'prudent diet' scores, firstly from a principal component analysis of the data from the full FFQ (129 items) and, secondly, from a short version of the FFQ (including 24 indicator foods). Scores calculated from the full and short FFQ were compared with nutrient intake and blood concentrations of vitamin C and lipids. Prudent diet scores calculated from the full FFQ and short FFQ were highly correlated (0.912 in men, 0.904 in women). The pattern of associations between nutrient intake (full FFQ) and diet scores calculated using the short and full FFQs was very similar for both men and women. Prudent diet scores calculated from the full and short FFQs also showed comparable patterns of association with blood measurements: in men and women, both scores were positively associated with plasma vitamin C concentration and serum HDL; in women, an inverse association with serum triglycerides was also observed. A short food-based questionnaire provides useful information about the diet quality of older adults. This simple tool does not require nutrient analysis, and has the potential to be of value to non-specialist researchers.
West, Colin P; Dyrbye, Liselotte N; Sloan, Jeff A; Shanafelt, Tait D
2009-12-01
Burnout has negative effects on work performance and patient care. The current standard for burnout assessment is the Maslach Burnout Inventory (MBI), a well-validated instrument consisting of 22 items answered on a 7-point Likert scale. However, the length of the MBI can limit its utility in physician surveys. To evaluate the performance of two questions relative to the full MBI for measuring burnout. Cross-sectional data from 2,248 medical students, 333 internal medicine residents, 465 internal medicine faculty, and 7,905 practicing surgeons. The single questions with the highest factor loading on the emotional exhaustion (EE) ("I feel burned out from my work") and depersonalization (DP) ("I have become more callous toward people since I took this job") domains of burnout were evaluated in four large samples of medical students, internal medicine residents, internal medicine faculty, and practicing surgeons. Spearman correlations between the single EE question and the full EE domain score minus that question ranged from 0.76-0.83. Spearman correlations between the single DP question and the full DP domain score minus that question ranged from 0.61-0.72. Responses to the single item measures of emotional exhaustion and depersonalization stratified risk of high burnout in the relevant domain on the full MBI, with consistent patterns across the four sampled groups. Single item measures of emotional exhaustion and depersonalization provide meaningful information on burnout in medical professionals.
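The validation step above correlates a single question with the rest of its domain, that is, the domain score minus that question, so the item is not correlated with itself. The sketch below shows that corrected item-total Spearman correlation using only the standard library; the respondents and the three-item "domain" are hypothetical 0-6 responses on a 7-point scale, not MBI data.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def corrected_item_total(rows, item_index):
    """Correlate one item with the domain total computed without that item."""
    item = [row[item_index] for row in rows]
    rest = [sum(row) - row[item_index] for row in rows]
    return spearman(item, rest)


# Hypothetical 0-6 responses of five respondents to a three-item domain.
rows = [
    [6, 5, 6],
    [1, 2, 0],
    [4, 4, 3],
    [0, 1, 1],
    [3, 2, 4],
]
print(round(corrected_item_total(rows, 0), 3))
```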
Audio-Enhanced Tablet Computers to Assess Children’s Food Frequency From Migrant Farmworker Mothers
Kilanowski, Jill F.; Trapl, Erika S.; Kofron, Ryan M.
2014-01-01
This study sought to improve data collection in children’s food frequency surveys for non-English speaking immigrant/migrant farmworker mothers using audio-enhanced tablet computers (ATCs). We hypothesized that by using technological adaptations, we would be able to improve data capture and therefore reduce lost surveys. The Food Frequency Questionnaire (FFQ), a paper-based dietary assessment tool, was adapted for ATCs and assessed consumption of 66 food items, asking 3 questions for each food item: frequency, quantity of consumption, and serving size. The tablet-based survey was audio enhanced, with each question “read” to participants, accompanied by food item images, together with an embedded short instructional video. Results indicated that respondents were able to complete the 198 questions from the 66-food-item FFQ on ATCs in approximately 23 minutes. Compared with paper-based FFQs, ATC-based FFQs had less missing data. Despite overall reductions in missing data through the use of ATCs, respondents still appeared to have difficulty with question 2 of the FFQ. The ability to score the FFQ depended on the sections in which missing data were located. Unlike the paper-based FFQs, no ATC-based FFQ was left unscored because of the amount or location of missing data. An ATC-based FFQ was feasible and increased the ability to score this survey on children’s food patterns from migrant farmworker mothers. This adapted technology may serve as an exemplar for other non-English speaking immigrant populations. PMID:25343004
Silva, Adriana Lucia Pastore E; Croci, Alberto Tesconi; Gobbi, Riccardo Gomes; Hinckel, Betina Bremer; Pecora, José Ricardo; Demange, Marco Kawamura
2017-01-01
Translation, cultural adaptation, and validation of the new version of the Knee Society Score - The 2011 KS Score - into Brazilian Portuguese and verification of its measurement properties, reproducibility, and validity. In 2012, the new version of the Knee Society Score was developed and validated. This scale comprises four separate subscales: (a) objective knee score (seven items: 100 points); (b) patient satisfaction score (five items: 40 points); (c) patient expectations score (three items: 15 points); and (d) functional activity score (19 items: 100 points). A total of 90 patients aged 55-85 years were evaluated in a clinical cross-sectional study. The pre-operative translated version was applied to patients with TKA referral, and the post-operative translated version was applied to patients who underwent TKA. Each patient answered the same questionnaire twice and was evaluated by two experts in orthopedic knee surgery. Evaluations were performed pre-operatively and three, six, or 12 months post-operatively. The reliability of the questionnaire was evaluated using the intraclass correlation coefficient (ICC) between the two applications. Internal consistency was evaluated using Cronbach's alpha. The ICC found no difference between the means of the pre-operative, three-month, and six-month post-operative evaluations between sub-scale items. The Brazilian Portuguese version of The 2011 KS Score is a valid and reliable instrument for objective and subjective evaluation of the functionality of Brazilian patients who undergo TKA and revision TKA.
Ang, Rebecca P; Chong, Wan Har; Huan, Vivien S; Yeo, Lay See
2007-01-01
This article reports the development and initial validation of scores obtained from the Adolescent Concerns Measure (ACM), a scale which assesses concerns of Asian adolescent students. In Study 1, findings from exploratory factor analysis using 619 adolescents suggested a 24-item scale with four correlated factors--Family Concerns (9 items), Peer Concerns (5 items), Personal Concerns (6 items), and School Concerns (4 items). Initial estimates of convergent validity for ACM scores were also reported. The four-factor structure of ACM scores derived from Study 1 was confirmed via confirmatory factor analysis in Study 2 using a two-fold cross-validation procedure with a separate sample of 811 adolescents. Support was found for both the multidimensional and hierarchical models of adolescent concerns using the ACM. Internal consistency and test-retest reliability estimates were adequate for research purposes. ACM scores show promise as a reliable and potentially valid measure of Asian adolescents' concerns.
ERIC Educational Resources Information Center
Store, Davie
2013-01-01
The impact of particular types of context effects on actual scores is less understood although there has been some research carried out regarding certain types of context effects under the nonequivalent anchor test (NEAT) design. In addition, the issue of the impact of item context effects on scores has not been investigated extensively when item…
ERIC Educational Resources Information Center
Monahan, Patrick O.; Ankenmann, Robert D.
2010-01-01
When the matching score is either less than perfectly reliable or not a sufficient statistic for determining latent proficiency in data conforming to item response theory (IRT) models, Type I error (TIE) inflation may occur for the Mantel-Haenszel (MH) procedure or any differential item functioning (DIF) procedure that matches on summed-item…
ERIC Educational Resources Information Center
Öztürk-Gübes, Nese; Kelecioglu, Hülya
2016-01-01
The purpose of this study was to examine the impact of dimensionality, common-item set format, and different scale linking methods on preserving equity property with mixed-format test equating. Item response theory (IRT) true-score equating (TSE) and IRT observed-score equating (OSE) methods were used under common-item nonequivalent groups design.…
ERIC Educational Resources Information Center
Matson, Johnny L.; Fodstad, Jill C.; Mahan, Sara
2009-01-01
Behavioral symptoms of comorbid psychopathology of 651 children 17-37 months of age who were at risk for developmental disabilities were studied using the BISCUIT-Part 2. In Study 1, norms and cutoff scores were established for this new scale on this sample. In Study 2, frequency of response on the 52 items measured was reported. Problems in…
Dominguez, Ligia J.; Bes-Rastrollo, Maira; Basterra-Gortari, Francisco Javier; Gea, Alfredo; Barbagallo, Mario; Martínez-González, Miguel A.
2015-01-01
Background Strong evidence indicates that dietary modifications may reduce the incidence of type 2 diabetes mellitus (T2DM). Numerous diabetes risk models/scores have been developed, but most do not rely specifically on dietary variables or do not fully capture the overall dietary pattern. We prospectively assessed the association of a dietary-based diabetes-risk score (DDS), which integrates optimal food patterns, with the risk of developing T2DM in the SUN (“Seguimiento Universidad de Navarra”) longitudinal study. Methods We assessed 17,292 participants initially free of diabetes, followed up for a mean of 9.2 years. A validated 136-item FFQ was administered at baseline. Taking into account previous literature, the DDS positively weighted vegetables, fruit, whole cereals, nuts, coffee, low-fat dairy, fiber, PUFA, and alcohol in moderate amounts, while it negatively weighted red meat, processed meats and sugar-sweetened beverages. Energy-adjusted quintiles of each item (with the exception of moderate alcohol consumption, which received either 0 or 5 points) were used to build the DDS (maximum: 60 points). Incident T2DM was confirmed through additional detailed questionnaires and review of participants' medical records. We used Cox proportional hazards models adjusted for socio-demographic and anthropometric parameters, health-related habits, and clinical variables to estimate hazard ratios (HR) of T2DM. Results We observed 143 confirmed T2DM cases during follow-up. Better baseline conformity with the DDS was associated with a lower incidence of T2DM (multivariable-adjusted HR for the intermediate (25–39 points) vs. low (11–24) category 0.43 [95% confidence interval (CI) 0.21, 0.89]; for the high (40–60) vs. low category 0.32 [95% CI: 0.14, 0.69]; p for linear trend: 0.019). Conclusions The DDS, a simple score based exclusively on dietary components, showed a strong inverse association with incident T2DM. 
This score may be applicable in clinical practice to improve dietary habits of subjects at high risk of T2DM and also as an educational tool for laypeople to help them in self-assessing their future risk for developing diabetes. PMID:26544985
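The DDS arithmetic described above can be sketched as follows, under an assumption the abstract does not state outright: each of the 11 quintile-scored components earns 1 to 5 points, reversed for the three negatively weighted foods, and moderate alcohol earns 0 or 5. This reading is consistent with the reported maximum of 60 (11 × 5 + 5) and the lowest category starting at 11 (11 × 1 + 0). The component names are illustrative labels, and quintile assignment with energy adjustment is assumed to have been done beforehand.

```python
POSITIVE = ("vegetables", "fruit", "whole_cereals", "nuts", "coffee",
            "low_fat_dairy", "fiber", "pufa")
NEGATIVE = ("red_meat", "processed_meats", "sugar_sweetened_beverages")


def dds(quintiles, moderate_alcohol):
    """Diet-based diabetes-risk score from energy-adjusted quintiles (1-5).

    Assumed point scheme: positive items score their quintile (Q1 -> 1 ...
    Q5 -> 5), negative items are reversed (Q1 -> 5 ... Q5 -> 1), and
    moderate alcohol intake adds 5 points (otherwise 0).
    """
    for food in POSITIVE + NEGATIVE:
        if quintiles[food] not in (1, 2, 3, 4, 5):
            raise ValueError(f"{food}: quintile must be 1-5")
    score = sum(quintiles[f] for f in POSITIVE)
    score += sum(6 - quintiles[f] for f in NEGATIVE)
    score += 5 if moderate_alcohol else 0
    return score


def dds_category(score):
    """Categories reported in the abstract: low 11-24, intermediate 25-39, high 40-60."""
    if score <= 24:
        return "low"
    if score <= 39:
        return "intermediate"
    return "high"


best = {**{f: 5 for f in POSITIVE}, **{f: 1 for f in NEGATIVE}}
worst = {**{f: 1 for f in POSITIVE}, **{f: 5 for f in NEGATIVE}}
middle = {f: 3 for f in POSITIVE + NEGATIVE}
for profile, alcohol in ((best, True), (worst, False), (middle, False)):
    s = dds(profile, alcohol)
    print(s, dds_category(s))
```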
Predictors of maternal responsiveness.
Drake, Emily E; Humenick, Sharron S; Amankwaa, Linda; Younger, Janet; Roux, Gayle
2007-01-01
To explore maternal responsiveness in the first 2 to 4 months after delivery and to evaluate potential predictors of maternal responsiveness, including infant feeding, maternal characteristics, and demographic factors such as age, socioeconomic status, and educational level. A cross-sectional survey design was used to assess the variables of maternal responsiveness, feeding patterns, and maternal characteristics in a convenience sample of 177 mothers in the first 2 to 4 months after delivery. The 60-item self-report instrument included scales to measure maternal responsiveness, self-esteem, and satisfaction with life as well as infant feeding questions and sociodemographic items. An online data-collection strategy was used, resulting in participants from 41 U.S. states. Multiple regression analysis showed that satisfaction with life, self-esteem, and number of children, but not breastfeeding, explained a significant portion of the variance in self-reported maternal responsiveness scores. In this analysis, sociodemographic variables such as age, education, income, and work status showed little or no relationship to maternal responsiveness scores. This study provides additional information about patterns of maternal behavior in the transition to motherhood and some of the variables that influence that transition. Satisfaction with life was a new predictor of maternal responsiveness. However, with only 15% of the variance explained by the predictors in this study, a large portion of the variance in maternal responsiveness remains unexplained. Further research in this area is needed.
ERIC Educational Resources Information Center
Wang, Ze; Rohrer, David; Chuang, Chi-ching; Fujiki, Mayo; Herman, Keith; Reinke, Wendy
2015-01-01
This study compared 5 scoring methods in terms of their statistical assumptions. They were then used to score the Teacher Observation of Classroom Adaptation Checklist, a measure consisting of 3 subscales and 21 Likert-type items. The 5 methods used were (a) sum/average scores of items, (b) latent factor scores with continuous indicators, (c)…
Do large-scale assessments measure students' ability to integrate scientific knowledge?
NASA Astrophysics Data System (ADS)
Lee, Hee-Sun
2010-03-01
Large-scale assessments are used as a means to diagnose the current status of student achievement in science and to compare students across schools, states, and countries. For efficiency, multiple-choice items and dichotomously-scored open-ended items are pervasively used in large-scale assessments such as Trends in International Math and Science Study (TIMSS). This study investigated how well these items measure secondary school students' ability to integrate scientific knowledge. This study collected responses of 8400 students to 116 multiple-choice and 84 open-ended items and applied an Item Response Theory analysis based on the Rasch Partial Credit Model. Results indicate that most multiple-choice items and dichotomously-scored open-ended items can be used to determine whether students have normative ideas about science topics, but cannot measure whether students integrate multiple pieces of relevant science ideas. Only when the scoring rubric is redesigned to capture subtle nuances of student open-ended responses do open-ended items become a valid and reliable tool to assess students' knowledge integration ability.
Perceptions of Culture of Safety in Hemodialysis Centers.
Davis, Kristina K; Harris, Kathleen G; Mahishi, Vrinda; Bartholomew, Edward G; Kenward, Kevin
2016-01-01
Staff members, physicians, nurse practitioners, and physician assistants from a sample of hemodialysis facilities in Network 6 (North Carolina, South Carolina, and Georgia) and Network 11 (Michigan, Minnesota, North Dakota, South Dakota, and Wisconsin) completed a 10-item assessment with modified questions from the Hospital Survey on Patient Safety Culture, with an emphasis on safety culture related to vascular access infections. A composite score was constructed, which was the average of the percent-positive scores of the items. Overall, scores were high, indicating a positive patient safety culture. Composite scores varied by role type, with nurses, patient care technicians, and other technicians reporting the lowest composite scores. Network 6 participants reported higher scores on two of the survey items. Fewer staff within a facility were associated with higher composite scores.
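The composite score described above is the average of the items' percent-positive scores. The sketch below assumes a 5-point agreement scale on which responses of 4 or 5 count as positive; that cut-off is an assumption for illustration, and negatively worded items would need to be reverse-coded first. The responses are hypothetical.

```python
def percent_positive(responses, positive=frozenset({4, 5})):
    """Percentage of respondents giving a positive answer to one item.

    Assumption: a 5-point scale on which 4 and 5 count as positive;
    negatively worded items should be reverse-coded before this step.
    """
    return 100 * sum(1 for r in responses if r in positive) / len(responses)


def composite_score(item_responses):
    """Average of the per-item percent-positive scores."""
    pcts = [percent_positive(responses) for responses in item_responses]
    return sum(pcts) / len(pcts)


# Hypothetical answers of four staff members to three items.
items = [
    [5, 4, 4, 5],  # 100% positive
    [2, 4, 3, 5],  # 50% positive
    [5, 4, 1, 4],  # 75% positive
]
print(composite_score(items))  # 75.0
```

Averaging per-item percentages gives each item equal weight in the composite regardless of how many people answered it.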
Boston, Raymond C.; Coyne, James C.; Farrar, John T.
2010-01-01
Objective To develop and psychometrically test an owner self-administered questionnaire designed to assess severity and impact of chronic pain in dogs with osteoarthritis. Sample Population 70 owners of dogs with osteoarthritis and 50 owners of clinically normal dogs. Procedures Standard methods for the stepwise development and testing of instruments designed to assess subjective states were used. Items were generated through focus groups and an expert panel. Items were tested for readability and ambiguity, and poorly performing items were removed. The reduced set of items was subjected to factor analysis, reliability testing, and validity testing. Results Severity of pain and interference with function were 2 factors identified and named on the basis of the items contained in them. Cronbach’s α was 0.93 and 0.89, respectively, suggesting that the items in each factor could be assessed as a group to compute factor scores (ie, severity score and interference score). The test-retest analysis revealed κ values of 0.75 for the severity score and 0.81 for the interference score. Scores correlated moderately well (r = 0.51 and 0.50, respectively) with the overall quality-of-life (QOL) question, such that as severity and interference scores increased, QOL decreased. Clinically normal dogs had significantly lower severity and interference scores than dogs with osteoarthritis. Conclusions and Clinical Relevance A psychometrically sound instrument was developed. Responsiveness testing must be conducted to determine whether the questionnaire will be useful in reliably obtaining quantifiable assessments from owners regarding the severity and impact of chronic pain and its treatment on dogs with osteoarthritis. PMID:17542696
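Cronbach's α, used above to judge whether each factor's items can be combined into a single score, compares the sum of the item variances with the variance of the summed score: α = k/(k − 1) × (1 − Σ var(item) / var(total)) for k items. The sketch below applies that standard formula with sample variances; the owner ratings are hypothetical, not data from the study.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for respondents-by-items data.

    rows: one list of item scores per respondent (same items, same order).
    Raises ZeroDivisionError if every respondent has the same total.
    """
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical 0-10 owner ratings of four pain-severity items for six dogs.
ratings = [
    [2, 3, 2, 1],
    [6, 5, 7, 6],
    [4, 4, 3, 5],
    [8, 7, 8, 9],
    [1, 2, 1, 2],
    [5, 6, 5, 4],
]
print(round(cronbach_alpha(ratings), 2))
```

When the items move together closely, as in these hypothetical ratings, α approaches 1; values such as the 0.93 and 0.89 reported above are usually read as supporting a summed factor score.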
Distinctions between Item Format and Objectivity in Scoring.
ERIC Educational Resources Information Center
Terwilliger, James S.
This paper clarifies important distinctions in item writing and item scoring and considers the implications of these distinctions for developing guidelines related to test construction for training teachers. The terminology used to describe and classify paper-and-pencil test questions frequently confuses two distinct features of questions:…
Improving Factor Score Estimation Through the Use of Observed Background Characteristics
Curran, Patrick J.; Cole, Veronica; Bauer, Daniel J.; Hussong, Andrea M.; Gottfredson, Nisha
2016-01-01
A challenge facing nearly all studies in the psychological sciences is how to best combine multiple items into a valid and reliable score to be used in subsequent modelling. The most ubiquitous method is to compute a mean of items, but more contemporary approaches use various forms of latent score estimation. Regardless of approach, outside of large-scale testing applications, scoring models rarely include background characteristics to improve score quality. The current paper used a Monte Carlo simulation design to study score quality for different psychometric models that did and did not include covariates across levels of sample size, number of items, and degree of measurement invariance. The inclusion of covariates improved score quality for nearly all design factors, and in no case did the covariates degrade score quality relative to not considering the influences at all. Results suggest that the inclusion of observed covariates can improve factor score estimation. PMID:28757790
Effects of levomilnacipran ER on fatigue symptoms associated with major depressive disorder
Fava, Maurizio; Gommoll, Carl; Chen, Changzheng; Greenberg, William M.; Ruth, Adam
2016-01-01
The aim of this study was to evaluate the effects of levomilnacipran extended-release (ER) on depression-related fatigue in adults with major depressive disorder. Post-hoc analyses of five phase III trials were carried out, with evaluation of fatigue symptoms based on score changes in four items: Montgomery–Åsberg Depression Rating Scale (MADRS) item 7 (lassitude), and 17-item Hamilton Depression Rating Scale (HAMD17) items 7 (work/activities), 8 (retardation), and 13 (somatic symptoms). Symptom remission was analyzed on the basis of score shifts from baseline to end of treatment: MADRS item 7 and HAMD17 item 7 (from ≥2 to ≤1); HAMD17 items 8 and 13 (from ≥1 to 0). The mean change in MADRS total score was analyzed in patients with low and high fatigue (MADRS item 7 baseline score <4 and ≥4, respectively). Patients receiving levomilnacipran ER had significantly greater mean improvements and symptom remission (no/minimal residual fatigue) on all fatigue-related items: lassitude (35 vs. 28%), work/activities (43 vs. 35%), retardation (46 vs. 39%), somatic symptoms (26 vs. 18%; all Ps<0.01 versus placebo). The mean change in MADRS total score was significantly greater with levomilnacipran ER versus placebo in both low (least squares mean difference=−2.8, P=0.0018) and high (least squares mean difference=−3.1, P<0.0001) fatigue subgroups. Levomilnacipran ER treatment was effective in reducing depression-related fatigue in adult patients with major depressive disorder and was associated with remission of fatigue symptoms. PMID:26584326
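The remission and subgroup definitions above are simple threshold rules on baseline and end-of-treatment item scores. The Python sketch below encodes them. The item keys are hypothetical labels. Treating patients below the baseline threshold as ineligible, and so excluding them from the denominator, is an assumption; the abstract does not state it.

```python
import math

# Shift rules from the abstract: (minimum baseline score, maximum end score).
# The keys are hypothetical labels for the four fatigue-related items.
SHIFT_RULES = {
    "MADRS_7": (2, 1),    # lassitude: from >=2 to <=1
    "HAMD17_7": (2, 1),   # work/activities: from >=2 to <=1
    "HAMD17_8": (1, 0),   # retardation: from >=1 to 0
    "HAMD17_13": (1, 0),  # somatic symptoms: from >=1 to 0
}


def remitted(item, baseline, end):
    """True/False for eligible patients; None if baseline is below the entry threshold.

    Excluding patients below the baseline threshold is an assumption.
    """
    min_baseline, max_end = SHIFT_RULES[item]
    if baseline < min_baseline:
        return None
    return end <= max_end


def remission_rate(item, pairs):
    """Percent remitted among eligible (baseline, end) score pairs."""
    outcomes = [remitted(item, b, e) for b, e in pairs]
    eligible = [o for o in outcomes if o is not None]
    if not eligible:
        return math.nan
    return 100.0 * sum(eligible) / len(eligible)


def fatigue_subgroup(madrs_item7_baseline):
    """Low vs high fatigue subgroup: MADRS item 7 baseline <4 vs >=4."""
    return "high" if madrs_item7_baseline >= 4 else "low"
```

For example, `remission_rate("HAMD17_7", [(2, 1), (3, 3), (1, 0), (4, 0)])` excludes the pair with baseline 1 and counts 2 remitters out of 3 eligible patients.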
ERIC Educational Resources Information Center
Yao, Lihua
2012-01-01
Multidimensional computer adaptive testing (MCAT) can provide higher precision and reliability or reduce test length when compared with unidimensional CAT or with the paper-and-pencil test. This study compared five item selection procedures in the MCAT framework for both domain scores and overall scores through simulation by varying the structure…
A novel task-oriented optimal design for P300-based brain-computer interfaces
NASA Astrophysics Data System (ADS)
Zhou, Zongtan; Yin, Erwei; Liu, Yang; Jiang, Jun; Hu, Dewen
2014-10-01
Objective. The number of items of a P300-based brain-computer interface (BCI) should be adjustable in accordance with the requirements of the specific tasks. To address this issue, we propose a novel task-oriented optimal approach aimed at increasing the performance of general P300 BCIs with different numbers of items. Approach. First, we proposed a stimulus presentation with variable dimensions (VD) paradigm as a generalization of the conventional single-character (SC) and row-column (RC) stimulus paradigms. Furthermore, an embedding design approach was employed for any given number of items. Finally, based on the score-P model of each subject, the VD flash pattern was selected by a linear interpolation approach for a certain task. Main results. The results indicate that the optimal BCI design consistently outperforms the conventional approaches, i.e., the SC and RC paradigms. Specifically, there is significant improvement in the practical information transfer rate for a large number of items. Significance. The results suggest that the proposed optimal approach would provide useful guidance in the practical design of general P300-based BCIs.
Beierlein, V; Köllner, V; Neu, R; Schulz, H
2016-12-01
Objectives: The assessment of work pressures is particularly important in psychosomatic rehabilitation. An established questionnaire is the Occupational Stress and Coping Inventory (German abbr. AVEM), but it is long and time-consuming to score in routine clinical care. We therefore tested whether a shortened version of the AVEM could be developed that assesses the three previously described second-order factors of the AVEM, namely Working Commitment, Resilience, and Emotions, with sufficient reliability and validity, and that can also be used to screen for patients with prominent work-related behavior and experience patterns. Methods: Data were collected at admission from consecutive samples at three psychosomatic rehabilitation hospitals (N = 10,635 patients). The sample was randomly divided into two subsamples (a design sample and a validation sample). Using exploratory principal component analyses in the design sample, the items with the highest factor loadings for the three new scales were selected and then evaluated psychometrically in the validation sample. Possible cut-off values were derived from the score distributions of the scales. Relationships with sociodemographic, occupational, and diagnosis-related characteristics, as well as with patterns of work-related experience and behavior, were examined. Results: In the design sample, the first factor of each of the three principal component analyses explained between 31% and 34% of the variance. In the validation sample, the 20 selected items were assigned to the 3-factor structure as expected. The three new scales are sufficiently reliable, with Cronbach's α values between 0.84 and 0.88. The three new scales are named after the corresponding second-order factors. Cut-off values for identifying distinctive patient-reported data are proposed.
Conclusion: The main advantage of the proposed shortened version, the AVEM-3D, is that the three main dimensions of relevant work-related behavior and experience patterns can be reliably measured with a considerably smaller number of items. The proposed measure is simple and economical to use and interpret. Based on the present sample, we provide means and standard deviations as reference values at admission to psychosomatic rehabilitation. As a limitation, further evaluation of reliability, validity, and sensitivity to change using only the items of the shortened version is necessary. The practicability and validity of the proposed cut-off values cannot yet be conclusively assessed. Finally, the validity of the AVEM-3D in indication groups other than psychosomatic patients and in healthy persons remains to be examined. © Georg Thieme Verlag KG Stuttgart · New York.
Rohan, Kelly J; Rough, Jennifer N; Evans, Maggie; Ho, Sheau-Yan; Meyerhoff, Jonah; Roberts, Lorinda M; Vacek, Pamela M
2016-08-01
We present a fully articulated protocol for the Hamilton Rating Scale for Depression (HAM-D), including item scoring rules, rater training procedures, and a data management algorithm to increase the accuracy of scores prior to outcome analyses. The algorithm identifies potentially inaccurate scores as interviews in which two independent raters disagree, either on scores (a difference of ≥5 points) or on whether the threshold for depression recurrence status, a long-term treatment outcome with public health significance, is met. Discrepancies are resolved by assigning two new raters, identifying items with disagreement per an algorithm, and reaching consensus on the most accurate scores for those items. These methods were applied in a clinical trial whose primary outcome was the Structured Interview Guide for the Hamilton Rating Scale for Depression-Seasonal Affective Disorder version (SIGH-SAD), which includes the 21-item HAM-D and 8 items assessing atypical symptoms. A total of 177 seasonally depressed adult patients were enrolled and interviewed at 10 time points across treatment and the 2-year follow-up interval, for a total of 1589 completed interviews, 1535 (96.6%) of which were archived. Inter-rater reliability ICCs ranged from .923 to .967. Only 86 (5.6%) interviews met criteria for a between-rater discrepancy. The HAM-D items "Depressed Mood", "Work and Activities", "Middle Insomnia", and "Hypochondriasis" and the atypical items "Fatigability" and "Hypersomnia" contributed most to discrepancies. Generalizability beyond well-trained, experienced raters in a clinical trial is unknown. Researchers might want to consider adopting this protocol in part or in full, and clinicians might want to tailor it to their needs. Copyright © 2016 Elsevier B.V. All rights reserved.
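The discrepancy rule described above can be expressed as a simple predicate over two raters' total scores. This Python sketch is only an approximation. It treats "meeting threshold for depression recurrence status" as a single total-score cutoff, which is a simplification. `recurrence_cutoff` is a hypothetical parameter whose value the abstract does not give.

```python
def needs_consensus(score_a, score_b, recurrence_cutoff, point_gap=5):
    """Flag an interview for consensus re-rating.

    score_a, score_b: the two independent raters' total scores.
    recurrence_cutoff: hypothetical total score at or above which an
        interview is treated as meeting the recurrence threshold.
    point_gap: the >=5-point difference rule from the abstract.
    """
    large_gap = abs(score_a - score_b) >= point_gap
    # Raters disagree on recurrence status if exactly one score reaches the cutoff.
    status_disagrees = (score_a >= recurrence_cutoff) != (score_b >= recurrence_cutoff)
    return large_gap or status_disagrees


def flag_interviews(pairs, recurrence_cutoff):
    """Return the indices of (score_a, score_b) pairs that need consensus."""
    return [i for i, (a, b) in enumerate(pairs) if needs_consensus(a, b, recurrence_cutoff)]
```

With an illustrative cutoff of 20, `flag_interviews([(10, 15), (18, 21), (10, 13), (22, 25)], 20)` flags the first pair for its 5-point gap and the second for a status disagreement.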
Berdeaux, Gilles; Meunier, Juliette; Arnould, Benoit; Viala-Danten, Muriel
2010-05-24
Background The purpose of this study was to reduce the number of items, create a scoring method and assess the psychometric properties of the Freedom from Glasses Value Scale (FGVS), which measures benefits of freedom from glasses perceived by cataract and presbyopic patients after multifocal intraocular lens (IOL) surgery. Methods The 21-item FGVS, developed simultaneously in French and Spanish, was administered by phone during an observational study to 152 French and 152 Spanish patients who had undergone cataract or presbyopia surgery at least 1 year before the study. Reduction of items and creation of the scoring method employed statistical methods (principal component analysis, multitrait analysis) and content analysis. Psychometric properties (validation of the structure, internal consistency reliability, and known-group validity) of the resulting version were assessed in the pooled population and per country. Results One item was deleted and 3 were kept but not aggregated in a dimension. The other 17 items were grouped into 2 dimensions ('global evaluation', 9 items; 'advantages', 8 items) and divided into 5 sub-dimensions, with higher scores indicating higher benefit of surgery. The structure was validated (good item convergent and discriminant validity). Internal consistency reliability was good for all dimensions and sub-dimensions (Cronbach's alphas above 0.70). The FGVS was able to discriminate between patients wearing glasses or not after surgery (higher scores for patients not wearing glasses). FGVS scores were significantly higher in Spain than France; however, the measure had similar psychometric performances in both countries. Conclusions The FGVS is a valid and reliable instrument measuring benefits of freedom from glasses perceived by cataract and presbyopic patients after multifocal IOL surgery. PMID:20497555
A Life Events Scale for Armed Forces personnel
Chaudhury, Suprakash; Srivastava, Kalpana; Raju, M.S.V. Kama; Salujha, S.K.
2006-01-01
Background: Armed Forces personnel are routinely exposed to a number of unique stressful life events. None of the available scales is relevant to service personnel. Aim: To construct a scale to measure life events in service personnel. Methods: In the first stage of the study, open-ended questions along with items generated by an expert group using a consensus method were administered to 50 soldiers. In the second stage, a scale comprising 59 items and open-ended questions was administered to 165 service personnel. The final 52-item scale was administered to 200 service personnel in a group setting. Weights were assigned on a 0-100 scale. For the normative study, the Armed Forces Medical College Life Events Scale (AFMC LES) was administered to 1200 Army, 100 Air Force, and 100 Navy personnel. Results: Service personnel experience an average of 4 life events in the past year and 13 events in a lifetime. On average, service personnel accumulate 115 life change units in the past year and 577 life change units over their lifetime on the AFMC LES. The scale shows concurrent validity when compared with the Presumptive Stressful Life Events Scale (PSLES). The scale is internally consistent, with routine items rated very low. The ratings show a pattern of uniformity with those of civilian counterparts, along with differences on the items specific to service personnel. Conclusions: The AFMC LES includes the unique stresses of service personnel that are not included in any life events scale available in India or in the West and should be used to assess stressful life events in service personnel. PMID:20844647
Martinková, Patrícia; Drabinová, Adéla; Liaw, Yuan-Ling; Sanders, Elizabeth A.; McFarland, Jenny L.; Price, Rebecca M.
2017-01-01
We provide a tutorial on differential item functioning (DIF) analysis, an analytic method useful for identifying potentially biased items in assessments. After explaining a number of methodological approaches, we test for gender bias in two scenarios that demonstrate why DIF analysis is crucial for developing assessments, particularly because simply comparing two groups’ total scores can lead to incorrect conclusions about test fairness. First, a significant difference between groups on total scores can exist even when items are not biased, as we illustrate with data collected during the validation of the Homeostasis Concept Inventory. Second, item bias can exist even when the two groups have exactly the same distribution of total scores, as we illustrate with a simulated data set. We also present a brief overview of how DIF analysis has been used in the biology education literature to illustrate the way DIF items need to be reevaluated by content experts to determine whether they should be revised or removed from the assessment. Finally, we conclude by arguing that DIF analysis should be used routinely to evaluate items in developing conceptual assessments. These steps will ensure more equitable—and therefore more valid—scores from conceptual assessments. PMID:28572182
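To make the idea of DIF concrete, examinees are matched on total score, and within each score stratum the odds of answering the item correctly are compared between groups. The Mantel-Haenszel procedure is one widely used way to do this. It appears here only as an illustration and is not necessarily the method the tutorial applied to the Homeostasis Concept Inventory data. The sketch pools the per-stratum 2x2 tables into the common odds ratio α_MH and reports the ETS delta value, −2.35 ln α_MH. The function name and group labels are assumptions.

```python
import math
from collections import defaultdict


def mantel_haenszel_dif(item, total, group, reference, focal):
    """Mantel-Haenszel common odds ratio and ETS delta for one dichotomous item.

    item:  0/1 responses to the studied item
    total: matching scores (e.g. total test score) defining the strata
    group: group label for each examinee
    Returns (alpha_mh, delta). alpha_mh > 1 (delta < 0) means the item
    favors the reference group after matching on total score.
    """
    # Per stratum: [A, B, C, D] = reference correct/incorrect, focal correct/incorrect.
    strata = defaultdict(lambda: [0, 0, 0, 0])
    for x, t, g in zip(item, total, group):
        cell = strata[t]
        if g == reference:
            cell[0 if x == 1 else 1] += 1
        elif g == focal:
            cell[2 if x == 1 else 3] += 1
    num = den = 0.0
    for a, b, c, d in strata.values():
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    if num == 0 or den == 0:
        raise ValueError("common odds ratio is undefined for these data")
    alpha_mh = num / den
    return alpha_mh, -2.35 * math.log(alpha_mh)
```

Matching on total score is what separates DIF from a simple difference in group means. Two groups can differ in total score while α_MH stays near 1 for every item, which is the first scenario the tutorial describes.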
Keck, Andrea D; Foocharoen, Chingching; Rosato, Edoardo; Smith, Vanessa; Allanore, Yannick; Distler, Oliver; Stamenkovic, Bojana; Pereira Da Silva, José Antonio; Hadj Khelifa, Sondess; Denisov, Lev N; Hachulla, Eric; García de la Peña Lefebvre, Paloma; Sibilia, Jean; Airò, Paolo; Caramaschi, Paola; Müller-Ladner, Ulf; Wiland, Piotr; Walker, Ulrich A
2014-04-01
The objective of this study was to analyse an association between nailfold capillary abnormalities and the presence and severity of erectile dysfunction (ED) in men with SSc. A cross-sectional analysis of the prospective European League Against Rheumatism (EULAR) Scleroderma Trial and Research database was performed. Men with SSc were included if they had undergone nailfold capillaroscopy and simultaneous ED assessment with the 5-item International Index for Erectile Function (IIEF-5). Eighty-six men met the inclusion criteria. Eight men (9.3%) had not had sexual intercourse and could not be assigned an IIEF-5 score. Sixty-nine of the 78 men (88.5%) with an IIEF-5 score had nailfold capillary abnormalities, of whom 54 (78.3%) suffered from ED. Nine men (11.5%) had no nailfold capillary abnormalities, of whom six (66.7%) had ED (P = 0.44). ED was more frequent in older men (P = 0.002) and in men with diffuse disease (P = 0.06). Men with abnormal capillaroscopy had a higher median EULAR disease activity than men without (P = 0.02), a lower diffusing capacity of the lung (P = 0.001) and a higher modified Rodnan skin score (P = 0.04), but mean IIEF-5 scores did not differ [15.7 (S.D. 6.2) vs 15.7 (S.D. 6.3)]. IIEF-5 scores did not differ between men with early (n = 12), active (n = 27) or late (n = 27) patterns (IIEF-5 scores of 17.9, 16.3 and 14.7, respectively). There were no differences in the prevalence of early, active and late capillaroscopy patterns between men with or without ED. Neither the presence or absence of abnormal capillaroscopy findings nor the subdivision into early, active and late patterns is associated with coexistent ED in SSc.
Patterns of source monitoring bias in incarcerated youths with and without conduct problems.
Morosan, Larisa; Badoud, Deborah; Salaminios, George; Eliez, Stephan; Van der Linden, Martial; Heller, Patrick; Debbané, Martin
2018-01-01
Antisocial individuals present behaviours that violate social norms and the rights of others. In the present study, we examine whether biases in monitoring self-generated cognitive material might be linked to antisocial manifestations during adolescence. We further examine the association with psychopathic traits and conduct problems (CPs). Sixty-five incarcerated adolescents (IAs; M age = 15.85, SD = 1.30) and 88 community adolescents (CAs; M age = 15.78, SD = 1.60) participated in our study. In the IA group, 28 adolescents presented CPs (M age = 16.06, SD = 1.41) and 19 did not meet the diagnostic criteria for CPs (M age = 15.97, SD = 1.20). Source monitoring was assessed through a speech-monitoring task using items requiring different levels of cognitive effort; recognition and source-monitoring bias scores (internalising and externalising biases) were calculated. Between-group comparisons indicate greater overall biases and different patterns of biases in source monitoring. IA participants manifest a greater externalising bias, whereas CA participants present a greater internalising bias. In addition, IAs with CPs present different patterns of item recognition. These results indicate that the two groups of adolescents present different types of source-monitoring bias for self-generated speech. In addition, the IAs with CPs present impairments in item recognition. Future studies may examine the developmental implications of self-monitoring biases in the persistence of antisocial behaviours from adolescence to adulthood.
Multi-institutional validation of a web-based core competency assessment system.
Tabuenca, Arnold; Welling, Richard; Sachdeva, Ajit K; Blair, Patrice G; Horvath, Karen; Tarpley, John; Savino, John A; Gray, Richard; Gulley, Julie; Arnold, Teresa; Wolfe, Kevin; Risucci, Donald A
2007-01-01
The Association of Program Directors in Surgery and the Division of Education of the American College of Surgeons developed and implemented a web-based system for end-of-rotation faculty assessment of ACGME core competencies of residents. This study assesses its reliability and validity across multiple programs. Each assessment included ratings (1-5 scale) on 23 items reflecting the 6 core competencies. A total of 4241 end-of-rotation assessments were completed for 332 general surgery residents (≥5 evaluations each) at 5 sites during the 2004-2005 and 2005-2006 academic years. The mean rating for each resident on each item was computed for each academic year. The mean rating of items representing each competency was computed for each resident. Additional data included USMLE and ABSITE scores, PGY, and status in program (categorical, designated preliminary, and undesignated preliminary). Coefficient alpha was greater than 0.90 for each competency score. Mean ratings for each competency increased significantly (p < 0.01) as a function of PGY. Mean ratings for professionalism and interpersonal/communication skills (IPC) were significantly higher than all other competencies at all PGY levels. Competency ratings of PGY 1 residents correlated significantly with USMLE Step I, ranging from r = 0.26 (p < 0.01) for Professionalism to r = 0.41 (p < 0.001) for Systems-Based Practice. Ratings of Knowledge (r = 0.31, p < 0.01), Practice-Based Learning & Improvement (PBLI; r = 0.22, p < 0.05), and Systems-Based Practice (r = 0.20, p < 0.05) correlated significantly with 2005 ABSITE Total Percentile. Ratings of all competencies correlated significantly with the 2006 ABSITE Total Percentile Score (range: r = 0.20, p < 0.05 for professionalism to r = 0.35, p < 0.001 for knowledge).
Categorical and designated preliminary residents received significantly higher ratings (p < 0.05) than undesignated preliminary residents for knowledge, patient care, PBLI, and systems-based practice only. Faculty ratings of core competencies are internally consistent. The pattern of statistically significant correlations between competency ratings and USMLE and ABSITE scores supports the postdictive and concurrent validity, respectively, of faculty perceptions of resident knowledge. The pattern of increased ratings as a function of PGY supports the construct validity of faculty ratings of resident core competencies.
Using a MaxEnt Classifier for the Automatic Content Scoring of Free-Text Responses
NASA Astrophysics Data System (ADS)
Sukkarieh, Jana Z.
2011-03-01
Criticisms of multiple-choice item assessments in the USA have prompted researchers and organizations to move towards constructed-response (free-text) items. Constructed-response (CR) items pose many challenges to the education community, one of which is that they are expensive for humans to score. At the same time, there has been a widespread movement towards computer-based assessment, and hence assessment organizations are competing to develop automatic content scoring engines for such item types, which we view as a textual entailment task. This paper describes how MaxEnt modeling is used to help solve the task. MaxEnt has been used in many natural language tasks, but this is the first application of the MaxEnt approach to textual entailment and automatic content scoring.
Item Response Theory Modeling of the Philadelphia Naming Test
ERIC Educational Resources Information Center
Fergadiotis, Gerasimos; Kellough, Stacey; Hula, William D.
2015-01-01
Purpose: In this study, we investigated the fit of the Philadelphia Naming Test (PNT; Roach, Schwartz, Martin, Grewal, & Brecher, 1996) to an item-response-theory measurement model, estimated the precision of the resulting scores and item parameters, and provided a theoretical rationale for the interpretation of PNT overall scores by relating…
An Evaluation of Three Approximate Item Response Theory Models for Equating Test Scores.
ERIC Educational Resources Information Center
Marco, Gary L.; And Others
Three item response models were evaluated for estimating item parameters and equating test scores. The models, which approximated the traditional three-parameter model, included: (1) the Rasch one-parameter model, operationalized in the BICAL computer program; (2) an approximate three-parameter logistic model based on coarse group data divided…
Use of Automated Scoring Features to Generate Hypotheses Regarding Language-Based DIF
ERIC Educational Resources Information Center
Shermis, Mark D.; Mao, Liyang; Mulholland, Matthew; Kieftenbeld, Vincent
2017-01-01
This study uses the feature sets employed by two automated scoring engines to determine if a "linguistic profile" could be formulated that would help identify items that are likely to exhibit differential item functioning (DIF) based on linguistic features. Sixteen items were administered to 1200 students where demographic information…
Nonparametric Item Response Curve Estimation with Correction for Measurement Error
ERIC Educational Resources Information Center
Guo, Hongwen; Sinharay, Sandip
2011-01-01
Nonparametric or kernel regression estimation of item response curves (IRCs) is often used in item analysis in testing programs. These estimates are biased when the observed scores are used as the regressor because the observed scores are contaminated by measurement error. Accuracy of this estimation is a concern theoretically and operationally.…
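For reference, the uncorrected kernel estimator mentioned above regresses item responses directly on observed scores. The sketch below is a Nadaraya-Watson estimator with a Gaussian kernel. It does not implement the measurement-error correction the authors propose, so it is exactly the biased estimator the abstract warns about. The function name, the kernel choice and the bandwidth are assumptions.

```python
import math


def kernel_irc(observed_scores, item_responses, grid, bandwidth):
    """Nadaraya-Watson estimate of P(item correct | observed score) at each grid point.

    Uses a Gaussian kernel on the observed total scores, i.e. WITHOUT any
    correction for measurement error in those scores.
    """
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    curve = []
    for s in grid:
        weights = [math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for x in observed_scores]
        total = sum(weights)
        if total == 0.0:  # all weights underflowed: no data near this grid point
            curve.append(math.nan)
            continue
        curve.append(sum(w * y for w, y in zip(weights, item_responses)) / total)
    return curve
```

Each grid point's estimate is a locally weighted proportion correct. A larger bandwidth smooths the curve more but also flattens it, and that flattening adds to the attenuation caused by error in the regressor.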
Item response theory scoring and the detection of curvilinear relationships.
Carter, Nathan T; Dalal, Dev K; Guan, Li; LoPilato, Alexander C; Withrow, Scott A
2017-03-01
Psychologists are increasingly positing theories of behavior that suggest psychological constructs are curvilinearly related to outcomes. However, results from empirical tests for such curvilinear relations have been mixed. We propose that correctly identifying the response process underlying responses to measures is important for the accuracy of these tests. Indeed, past research has indicated that item responses to many self-report measures follow an ideal point response process, wherein respondents agree only to items that reflect their own standing on the measured variable, as opposed to a dominance process, wherein stronger agreement, regardless of item content, is always indicative of higher standing on the construct. We test whether item response theory (IRT) scoring appropriate for the underlying response process to self-report measures results in more accurate tests for curvilinearity. In 2 simulation studies, we show that, regardless of the underlying response process used to generate the data, using the traditional sum score generally results in high Type 1 error rates or low power for detecting curvilinearity, depending on the distribution of item locations. With few exceptions, appropriate power and Type 1 error rates are achieved when dominance-based and ideal point-based IRT scoring are correctly used to score dominance and ideal point response data, respectively. We conclude that (a) researchers should be theory-guided when hypothesizing and testing for curvilinear relations; (b) correctly identifying whether responses follow an ideal point versus dominance process, particularly when items are not extreme, is critical; and (c) IRT model-based scoring is crucial for accurate tests of curvilinearity. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Developing a Clinician Friendly Tool to Identify Useful Clinical Practice Guidelines: G-TRUST.
Shaughnessy, Allen F; Vaswani, Akansha; Andrews, Bonnie K; Erlich, Deborah R; D'Amico, Frank; Lexchin, Joel; Cosgrove, Lisa
2017-09-01
Clinicians are faced with a plethora of guidelines. To rate guidelines, they can select from a number of evaluation tools, most of which are long and difficult to apply. The goal of this project was to develop a simple, easy-to-use checklist for clinicians to use to identify trustworthy, relevant, and useful practice guidelines, the Guideline Trustworthiness, Relevance, and Utility Scoring Tool (G-TRUST). A modified Delphi process was used to obtain consensus of experts and guideline developers regarding a checklist of items and their relative impact on guideline quality. We conducted 4 rounds of sampling to refine wording, add and subtract items, and develop a scoring system. Multiple attribute utility analysis was used to develop a weighted utility score for each item to determine scoring. Twenty-two experts in evidence-based medicine, 17 developers of high-quality guidelines, and 1 consumer representative participated. In rounds 1 and 2, items were rewritten or dropped, and 2 items were added. In round 3, weighted scores were calculated from rankings and relative weights assigned by the expert panel. In the last round, more than 75% of experts identified 3 of the 8 checklist items as major indicators of guideline usefulness and, using the AGREE tool as a reference standard, a scoring system was developed to classify guidelines as useful, may not be useful, or not useful. The 8-item G-TRUST is potentially helpful as a tool for clinicians to identify useful guidelines. Further research will focus on its reliability when used by clinicians. © 2017 Annals of Family Medicine, Inc.
NASA Astrophysics Data System (ADS)
Yates, Gregory C. R.; Chandler, Margaret
2000-12-01
Is belief in the paranormal alive and well among preservice teachers? In this survey, 232 undergraduates (including 202 preservice primary teachers) were asked to react to a series of eight statements reflecting paranormal New Age beliefs that a faculty panel had earlier rated as “totally unbelievable.” Overall, the students' modal response was “no particular opinion,” although for five of the eight items the modal response was “slightly believable.” Only four students in the sample consistently rejected all eight statements. ‘Believers’ outnumbered ‘skeptics’ on three items (beliefs in UFOs, psychic seances, and Nostradamus). New Age factor scores were not significantly related to undergraduate faculty or year level, to holding anti-scientific beliefs, or to a measure of TV viewing, and did not correlate significantly with the personality scale Need for Cognition. Females had higher New Age scores than males, but attitudes to science were unrelated to gender.
Implications of Changing Answers on Objective Test Items
ERIC Educational Resources Information Center
Mueller, Daniel J.; Wasser, Virginia
1977-01-01
Eighteen studies of the effects of changing initial answers to objective test items are reviewed. Although students across the entire total-score range tended to gain more points than they lost, higher-scoring students gained more than lower-scoring students did. Suggestions for further research are made. (Author/JKS)
Late life changes in mental health: a longitudinal study of 9683 women.
Leigh, Lucy; Byles, Julie E; Chojenta, Catherine; Pachana, Nancy A
2016-10-01
To identify latent subgroups of women in late life with similar mental health trajectories. Longitudinal data are for 9683 participants in the 1921-1926 cohort of the Australian Longitudinal Study on Women's Health who completed at least two surveys between 1999 (aged 73-78 years) and 2008 (aged 82-87 years). Mental health was measured using the five-item Mental Health Inventory (MHI-5). Latent profile analysis was used to uncover patterns of change in MHI-5 scores. Three patterns of change were identified for women who were still alive in 2008 (n = 7061), and three similar patterns for deceased women (n = 2622): (1) 'poor mental health', representing women with low MHI-5 scores; (2) 'good mental health'; and (3) 'excellent mental health', where scores remained very high. Deceased women had lower mental health scores within each class. Remote area of residence, higher education, single marital status, higher body mass index (BMI) and falls were the covariates associated with mental health in the survivor group. For the deceased group, education, BMI and falls were significant. Arthritis, stroke, heart disease, bronchitis/emphysema, diabetes and osteoporosis were associated with worse mental health in both groups, whereas asthma significantly increased the odds of worse mental health in the survivor group only. Hypertension and cancer were not significant predictors of poor mental health. The results show associations between chronic disease and level of mental health in older age, but no evidence of a large decline in mental health in the period before death.
Koydemir, Selda; Demir, Ayhan
2007-06-01
The purpose of the study was to report initial data on the psychometric properties of the Brief Fear of Negative Evaluation Scale. The scale was administered to a nonclinical sample of 250 Turkish undergraduate students (137 women, 113 men) selected randomly from Middle East Technical University. Their mean age was 20.4 years (SD = 1.9). The factor structure of the Turkish version, its criterion validity, and its internal reliability coefficients were assessed. Although maximum likelihood factor analysis initially indicated that the scale had only one factor, a forced two-factor solution accounted for more variance (61%) in scale scores than a single factor. The straightforward items loaded on the first factor, and the reverse-coded items loaded on the second factor. The total score was significantly positively correlated with scores on the Revised Cheek and Buss Shyness Scale and significantly negatively correlated with scores on the Rosenberg Self-Esteem Scale. Factor 1 (straightforward items) correlated more highly with both shyness and self-esteem than Factor 2 (reverse-coded items). Internal consistency estimates were .94 for the total score, .91 for Factor 1 (straightforward items), and .87 for Factor 2 (reverse-coded items). No sex differences were evident for fear of negative evaluation.
Constantine, Melissa L; Pauls, Rachel N; Rogers, Rebecca R; Rockwood, Todd H
2017-12-01
The Prolapse/Incontinence Sexual Questionnaire-International Urogynecology Association (IUGA) Revised (PISQ-IR) measures sexual function in women with pelvic floor disorders (PFDs) but is unwieldy, with six individual subscale scores for sexually active women and four for women who are not. We hypothesized that a valid and responsive summary score could be created for the PISQ-IR. Item response data from women who completed a revised version of the PISQ-IR at three clinical sites were used to generate item weights using magnitude estimation (ME) and Q-sort (Q) approaches. Item weights were applied to data from the original PISQ-IR validation to generate summary scores. Correlation and factor analysis methods were used to evaluate the validity and responsiveness of the summary scores. Weighted and nonweighted summary scores for the sexually active PISQ-IR demonstrated good criterion validity with condition-specific measures (values for nonweighted, ME, and Q summary scores, respectively): Incontinence Severity Index = 0.12, 0.11, 0.11; Pelvic Floor Distress Inventory-20 = 0.39, 0.39, 0.12; Epidemiology of Prolapse and Incontinence Questionnaire-Q35 = 0.26, 0.25, 0.40; Female Sexual Functioning Index subscale total score = 0.72, 0.75, 0.72. Responsiveness evaluation showed that weighted and nonweighted summary scores detected moderate effect sizes (Cohen's d > 0.5). Weighted items for women who were not sexually active (NSA) demonstrated significant floor effects and did not meet criterion validity. A PISQ-IR summary score for use with sexually active women, nonweighted or calculated with ME or Q item weights, is a valid and reliable measure for clinical use. The summary scores provide value for assessing clinical treatment of pelvic floor disorders.
Social competence: evaluation of assertiveness in Spanish adolescents.
Castedo, Antonio López; Juste, Margarita Pino; Alonso, José Domínguez
2015-02-01
Relations between assertiveness in adolescents' social behavior and demographic variables were assessed in 4,943 Spanish adolescents, ages 12 to 17 years, enrolled in 32 Compulsory Secondary Education schools. Province of residence, school size, age, grade, and academic focus were statistically significant sources of variance in assertiveness scores, although all effects were small. Response patterns indicate that the items should be reviewed to improve the measure for adolescents and its value as a tool for addressing teens' social competence in real-life situations.
Vaccarino, Anthony L.; Anderson, Karen; Borowsky, Beth; Duff, Kevin; Giuliano, Joseph; Guttman, Mark; Ho, Aileen K.; Orth, Michael; Paulsen, Jane S.; Sills, Terrence; van Kammen, Daniel P.; Evans, Kenneth R.
2011-01-01
Although the Unified Huntington's Disease Rating Scale (UHDRS) is widely used in the assessment of Huntington disease (HD), the ability of individual items to discriminate individual differences in motor or behavioral manifestations has not been extensively studied in HD gene expansion carriers without a motor-defined clinical diagnosis (i.e., prodromal-HD or prHD). To elucidate the relationship between scores on individual motor and behavioral UHDRS items and total score for each subscale, a non-parametric item response analysis was performed on retrospective data from two multicentre, longitudinal studies. Motor and behavioral assessments were supplied for 737 prHD individuals with data from 2114 visits (PREDICT-HD) and 686 HD individuals with data from 1482 visits (REGISTRY). Option characteristic curves were generated for UHDRS subscale items in relation to their subscale score. In prHD, overall severity of motor signs was low, and participants had scores of 2 or above on very few items. In HD, motor items that assessed ocular pursuit, saccade initiation, finger tapping, tandem walking, and to a lesser extent, saccade velocity, dysarthria, tongue protrusion, pronation/supination, Luria, bradykinesia, choreas, gait, and balance on the retropulsion test were found to discriminate individual differences across a broad range of motor severity. In prHD, depressed mood, anxiety, and irritable behavior demonstrated good discriminative properties. In HD, depressed mood demonstrated a good relationship with the overall behavioral score. These data suggest that at least some UHDRS items appear to have utility across a broad range of severity, although many items demonstrate problematic features. PMID:21370269
Vaccarino, Anthony L; Anderson, Karen; Borowsky, Beth; Duff, Kevin; Giuliano, Joseph; Guttman, Mark; Ho, Aileen K; Orth, Michael; Paulsen, Jane S; Sills, Terrence; van Kammen, Daniel P; Evans, Kenneth R
2011-04-01
Although the Unified Huntington's Disease Rating Scale (UHDRS) is widely used in the assessment of Huntington disease (HD), the ability of individual items to discriminate individual differences in motor or behavioral manifestations has not been extensively studied in HD gene expansion carriers without a motor-defined clinical diagnosis (ie, prodromal-HD or prHD). To elucidate the relationship between scores on individual motor and behavioral UHDRS items and total score for each subscale, a nonparametric item response analysis was performed on retrospective data from 2 multicenter longitudinal studies. Motor and behavioral assessments were supplied for 737 prHD individuals with data from 2114 visits (PREDICT-HD) and 686 HD individuals with data from 1482 visits (REGISTRY). Option characteristic curves were generated for UHDRS subscale items in relation to their subscale score. In prHD, overall severity of motor signs was low, and participants had scores of 2 or above on very few items. In HD, motor items that assessed ocular pursuit, saccade initiation, finger tapping, tandem walking, and to a lesser extent, saccade velocity, dysarthria, tongue protrusion, pronation/supination, Luria, bradykinesia, choreas, gait, and balance on the retropulsion test were found to discriminate individual differences across a broad range of motor severity. In prHD, depressed mood, anxiety, and irritable behavior demonstrated good discriminative properties. In HD, depressed mood demonstrated a good relationship with the overall behavioral score. These data suggest that at least some UHDRS items appear to have utility across a broad range of severity, although many items demonstrate problematic features. Copyright © 2011 Movement Disorder Society.
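The option characteristic curves in the UHDRS abstracts show, for each item, how often each response option is chosen as the subscale total rises. A minimal, unsmoothed Python sketch follows. The function name and inputs are hypothetical, the abstract does not describe the software or any smoothing the authors applied, and in practice the total score is often binned or replaced by a rest score that excludes the item itself.

```python
from collections import Counter, defaultdict


def option_characteristic_curves(item_responses, total_scores, options):
    """Tabulate, for each observed total score, the proportion choosing each option.

    Raw proportions only: nonparametric IRT software typically smooths these
    curves (for example, with a kernel), which is not done here.
    """
    counts = defaultdict(Counter)
    for option, total in zip(item_responses, total_scores):
        counts[total][option] += 1
    curves = {}
    for total in sorted(counts):
        n = sum(counts[total].values())
        # Counter returns 0 for options nobody chose at this total score.
        curves[total] = {opt: counts[total][opt] / n for opt in options}
    return curves


# Hypothetical 0-4 item ratings paired with subscale totals.
curves = option_characteristic_curves([0, 0, 1, 1, 2, 1], [1, 1, 3, 3, 5, 5], options=range(5))
for total, props in curves.items():
    print(total, props)
```

Plotting each option's proportion against the total score gives one curve per option. An item discriminates well when higher options become dominant as the total rises.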
Watanabe, Yusuke; Madani, Amin; Ito, Yoichi M; Bilgic, Elif; McKendy, Katherine M; Feldman, Liane S; Fried, Gerald M; Vassiliou, Melina C
2017-02-01
The extent to which each item assessed using the Global Operative Assessment of Laparoscopic Skills (GOALS) contributes to the total score remains unknown. The purpose of this study was to evaluate the level of difficulty and the discriminative ability of each of the 5 GOALS items using item response theory (IRT). A total of 396 GOALS assessments for a variety of laparoscopic procedures over a 12-year period were included. Item difficulty (threshold) and discrimination (slope) parameters were estimated for each item using IRT. The higher slope parameters for "bimanual dexterity" and "efficiency" indicate greater discriminative ability than "depth perception", "tissue handling", and "autonomy". IRT psychometric analysis indicates that the 5 GOALS items do not demonstrate uniform difficulty and discriminative power, suggesting that they should not be scored equally. "Bimanual dexterity" and "efficiency" seem to have stronger discrimination. Weighted scores based on these findings could improve the accuracy of assessing individual laparoscopic skills. Copyright © 2016 Elsevier Inc. All rights reserved.
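GOALS items are rated on ordered 1-5 categories, so the difficulty thresholds and slopes mentioned above can be illustrated with Samejima's graded response model. The abstract does not name the exact IRT model, so the model choice, function names, and parameter values below are assumptions. The sketch only shows why a larger slope makes the expected rating change faster across the ability range.

```python
import math


def grm_category_probs(theta, a, thresholds):
    """P(X = k | theta), k = 0..len(thresholds), under Samejima's graded response model.

    thresholds must be increasing; a is the item slope (discrimination).
    """
    # Boundary curves P(X >= k): 1 for k = 0, logistic for k = 1..m, 0 past the top category.
    cum = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds] + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]


def expected_score(theta, a, thresholds):
    """Expected 0-based category; add 1 to map it onto a 1-5 GOALS rating."""
    return sum(k * p for k, p in enumerate(grm_category_probs(theta, a, thresholds)))


# Hypothetical thresholds shared by two items that differ only in slope.
thresholds = [-1.5, -0.5, 0.5, 1.5]
for slope in (2.5, 0.8):
    change = expected_score(1.0, slope, thresholds) - expected_score(-1.0, slope, thresholds)
    print(f"slope {slope}: expected rating rises by {change:.2f} from theta -1 to +1")
```

With these made-up parameters, the steeper item's expected rating rises by about 1.95 points between theta = -1 and +1, compared with about 1.30 for the flatter item. This is the sense in which a higher slope separates examinees more sharply.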
Martínez-González, Miguel Angel; García-Arellano, Ana; Toledo, Estefanía; Salas-Salvadó, Jordi; Buil-Cosiales, Pilar; Corella, Dolores; Covas, Maria Isabel; Schröder, Helmut; Arós, Fernando; Gómez-Gracia, Enrique; Fiol, Miquel; Ruiz-Gutiérrez, Valentina; Lapetra, José; Lamuela-Raventos, Rosa Maria; Serra-Majem, Lluís; Pintó, Xavier; Muñoz, Miguel Angel; Wärnberg, Julia; Ros, Emilio; Estruch, Ramón
2012-01-01
Independently of total caloric intake, a better quality of the diet (for example, conformity to the Mediterranean diet) is associated with lower obesity risk. It is unclear whether a brief dietary assessment tool, instead of full-length comprehensive methods, can also capture this association. In addition to reduced costs, a brief tool has the interesting advantage of allowing immediate feedback to participants in interventional studies. Another relevant question is which individual items of such a brief tool are responsible for this association. We examined these associations using a 14-item tool of adherence to the Mediterranean diet as exposure and body mass index, waist circumference and waist-to-height ratio (WHtR) as outcomes. Cross-sectional assessment of all participants in the "PREvención con DIeta MEDiterránea" (PREDIMED) trial. 7,447 participants (55-80 years, 57% women) free of cardiovascular disease, but with either type 2 diabetes or ≥ 3 cardiovascular risk factors. Trained dietitians used both a validated 14-item questionnaire and a full-length validated 137-item food frequency questionnaire to assess dietary habits. Trained nurses measured weight, height and waist circumference. Strong inverse linear associations between the 14-item tool and all adiposity indexes were found. For a two-point increment in the 14-item score, the multivariable-adjusted differences in WHtR were -0.0066 (95% confidence interval, -0.0088 to -0.0049) for women and -0.0059 (-0.0079 to -0.0038) for men. The multivariable-adjusted odds ratio for a WHtR>0.6 in participants scoring ≥ 10 points versus ≤ 7 points was 0.68 (0.57 to 0.80) for women and 0.66 (0.54 to 0.80) for men. High consumption of nuts and low consumption of sweetened/carbonated beverages presented the strongest inverse associations with abdominal obesity. 
A brief 14-item tool was able to capture a strong monotonic inverse association between adherence to a good quality dietary pattern (Mediterranean diet) and obesity indexes in a population of adults at high cardiovascular risk.
Martínez-González, Miguel Angel; García-Arellano, Ana; Toledo, Estefanía; Salas-Salvadó, Jordi; Buil-Cosiales, Pilar; Corella, Dolores; Covas, Maria Isabel; Schröder, Helmut; Arós, Fernando; Gómez-Gracia, Enrique; Fiol, Miquel; Ruiz-Gutiérrez, Valentina; Lapetra, José; Lamuela-Raventos, Rosa Maria; Serra-Majem, Lluís; Pintó, Xavier; Muñoz, Miguel Angel; Wärnberg, Julia; Ros, Emilio; Estruch, Ramón
2012-01-01
Objective Independently of total caloric intake, a better quality of the diet (for example, conformity to the Mediterranean diet) is associated with lower obesity risk. It is unclear whether a brief dietary assessment tool, instead of full-length comprehensive methods, can also capture this association. In addition to reduced costs, a brief tool has the interesting advantage of allowing immediate feedback to participants in interventional studies. Another relevant question is which individual items of such a brief tool are responsible for this association. We examined these associations using a 14-item tool of adherence to the Mediterranean diet as exposure and body mass index, waist circumference and waist-to-height ratio (WHtR) as outcomes. Design Cross-sectional assessment of all participants in the “PREvención con DIeta MEDiterránea” (PREDIMED) trial. Subjects 7,447 participants (55–80 years, 57% women) free of cardiovascular disease, but with either type 2 diabetes or ≥3 cardiovascular risk factors. Trained dietitians used both a validated 14-item questionnaire and a full-length validated 137-item food frequency questionnaire to assess dietary habits. Trained nurses measured weight, height and waist circumference. Results Strong inverse linear associations between the 14-item tool and all adiposity indexes were found. For a two-point increment in the 14-item score, the multivariable-adjusted differences in WHtR were −0.0066 (95% confidence interval, −0.0088 to −0.0049) for women and −0.0059 (−0.0079 to −0.0038) for men. The multivariable-adjusted odds ratio for a WHtR>0.6 in participants scoring ≥10 points versus ≤7 points was 0.68 (0.57 to 0.80) for women and 0.66 (0.54 to 0.80) for men. High consumption of nuts and low consumption of sweetened/carbonated beverages presented the strongest inverse associations with abdominal obesity. 
Conclusions A brief 14-item tool was able to capture a strong monotonic inverse association between adherence to a good quality dietary pattern (Mediterranean diet) and obesity indexes in a population of adults at high cardiovascular risk. PMID:22905215
A PROMIS Measure of Neuropathic Pain Quality
Askew, Robert L.; Cook, Karon F.; Keefe, Francis J.; Nowinski, Cindy J; Cella, David; Revicki, Dennis A.; DeWitt, Esi M. Morgan; Michaud, Kaleb; Trence, Dace L.; Amtmann, Dagmar
2016-01-01
Objectives Neuropathic pain is a consequence of many chronic conditions. This study aimed to develop a unidimensional neuropathic pain scale whose scores represent levels of neuropathic pain and distinguish between individuals with neuropathic and non-neuropathic pain conditions. Methods A candidate item pool of 42 pain quality descriptors was administered to participants with osteoarthritis, rheumatoid arthritis, diabetic neuropathy, or cancer chemotherapy-induced peripheral neuropathy. The subset of pain quality descriptors (items) that best distinguished participants with neuropathic pain conditions from those without was identified. Dimensionality of the pain descriptors was evaluated in a development sample and cross-validated in a hold-out sample. Item responses were calibrated using an item response theory (IRT) model, and scores were generated on a T-score metric. Neuropathic pain scale scores were evaluated in terms of reliability, validity, and the ability to distinguish between participants with and without conditions typically associated with neuropathic pain. Results Of the 42 initial items, 5 were retained for the Patient Reported Outcome Measurement Information System (PROMIS) Neuropathic Pain Quality scale (PROMIS-PQ-Neuro). The IRT-generated T-scores exhibited good discriminatory ability based on receiver operating characteristic analysis. Score thresholds that optimize sensitivity and specificity were identified. Construct, criterion, and discriminant validity, and reliability of scale scores were supported. Conclusions The 5-item PROMIS-PQ-Neuro is a short and practical measure that can be used to identify patients more likely to have neuropathic pain and to distinguish levels of neuropathic pain. The data collected will support future research that targets other unidimensional pain quality domains (e.g., nociceptive pain). PMID:27565279
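Two steps in this abstract can be sketched in Python: putting IRT scores on a T-score metric and choosing a cutoff that balances sensitivity and specificity. The sketch assumes theta is already scaled to mean 0 and SD 1 in the reference sample, and it uses Youden's J as the optimization criterion. The abstract does not say which criterion the authors used, and the scores in the usage line are made up.

```python
def to_t_score(theta):
    """Express a theta estimate (mean 0, SD 1 in the reference sample) as a T-score."""
    return 50.0 + 10.0 * theta


def youden_threshold(case_scores, control_scores):
    """Return (cutoff, sensitivity, specificity) maximizing Youden's J = sens + spec - 1.

    A score >= cutoff counts as a positive screen (neuropathic pain likely).
    """
    best = None
    for cutoff in sorted(set(case_scores) | set(control_scores)):
        sens = sum(s >= cutoff for s in case_scores) / len(case_scores)
        spec = sum(s < cutoff for s in control_scores) / len(control_scores)
        j = sens + spec - 1.0
        if best is None or j > best[0]:
            best = (j, cutoff, sens, spec)
    return best[1:]


print(to_t_score(0.5))
print(youden_threshold([60, 65, 70], [40, 45, 50]))
```

With these toy scores the groups do not overlap, so the chosen cutoff of 60 gives perfect sensitivity and specificity. With real, overlapping score distributions, the maximizing cutoff trades one against the other.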
Flens, Gerard; Smits, Niels; Terwee, Caroline B; Dekker, Joost; Huijbrechts, Irma; de Beurs, Edwin
2017-03-01
We developed a Dutch-Flemish version of the Patient-Reported Outcomes Measurement Information System (PROMIS) adult V1.0 item bank for depression as input for computerized adaptive testing (CAT). As the item bank, we used the Dutch-Flemish translation of the original PROMIS item bank (28 items) and additionally translated 28 U.S. depression items that had not made the final U.S. item bank. Through psychometric analysis of a combined clinical and general population sample (N = 2,010), 8 of the added items were removed. With the final item bank, we performed several CAT simulations to assess the efficiency of the extended (48 items) and the original item bank (28 items), using various stopping rules. Both item banks resulted in highly efficient and precise measurement of depression and showed high similarity between the CAT simulation scores and the full item bank scores. We discuss the implications of using each item bank and stopping rule for further CAT development.
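The CAT simulations described above follow a common loop: estimate the trait, pick the most informative unused item, record a response, and stop when a rule is met. Below is a minimal Python sketch with several simplifications. Dichotomous 2PL items stand in for the polytomous PROMIS items, and responses are simulated from a known true theta. Scoring is EAP on a fixed grid with a standard normal prior. The stopping rule (SE at or below 0.3, or 12 items) and the item parameters are hypothetical.

```python
import math
import random


def p_endorse(theta, a, b):
    """Endorsement probability for a hypothetical dichotomous 2PL item (slope a, location b)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    p = p_endorse(theta, a, b)
    return a * a * p * (1.0 - p)


GRID = [-4.0 + 0.1 * i for i in range(81)]  # quadrature points for the latent trait


def eap_estimate(responses, bank):
    """Expected a posteriori theta and posterior SD under a standard normal prior."""
    weights = []
    for t in GRID:
        w = math.exp(-0.5 * t * t)  # unnormalized N(0, 1) prior
        for idx, x in responses:
            p = p_endorse(t, *bank[idx])
            w *= p if x == 1 else 1.0 - p
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, weights)) / total
    return mean, math.sqrt(var)


def simulate_cat(true_theta, bank, se_stop=0.3, max_items=12, seed=0):
    """Give maximum-information items until SE <= se_stop or max_items is reached."""
    rng = random.Random(seed)
    responses, available = [], set(range(len(bank)))
    theta, se = 0.0, 1.0  # start at the prior mean and prior SD
    while available and len(responses) < max_items and se > se_stop:
        nxt = max(available, key=lambda i: item_information(theta, *bank[i]))
        available.remove(nxt)
        x = 1 if rng.random() < p_endorse(true_theta, *bank[nxt]) else 0
        responses.append((nxt, x))
        theta, se = eap_estimate(responses, bank)
    return theta, se, len(responses)


bank = [(2.0, -2.0 + 0.2 * i) for i in range(21)]  # hypothetical (slope, location) pairs
print(simulate_cat(1.0, bank))
```

Comparing stopping rules, as in the study, would mean rerunning this for many simulated respondents with different `se_stop` and `max_items` values and comparing the CAT estimates with full-bank estimates.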
Dietary patterns and household food insecurity in rural populations of Kilosa district, Tanzania.
Ntwenya, Julius Edward; Kinabo, Joyce; Msuya, John; Mamiro, Peter; Majili, Zahara Saidi
2015-01-01
Few studies have investigated the relationship between dietary patterns and household food insecurity. The objective of the present analysis was to describe food consumption patterns and to relate them to the prevalence of food insecurity in a rural community. Three hundred and seven (307) randomly selected households in Kilosa district participated in the study. Data were collected during the rainy season (February-May) and the post-harvest season (September-October) of 2011. Food consumption patterns were determined using a 24-h dietary recall method. Food insecurity data were based on a 30-day recall of the household's experience of food insecurity. Factor analysis with principal components extraction was used to derive the dietary patterns, and correlation analysis was used to examine the relationship between household food insecurity and dietary pattern factor scores. Four food consumption patterns were identified during the harvest season: (I) meat and milk; (II) pulses, legumes, nuts and cooking oils; (III) fish (and other sea foods), roots and tubers; and (IV) cereals, vegetables and fruits. The dietary patterns identified during the rainy season were: (I) fruits, cooking oils, fats, roots and tubers; (II) eggs, meat, milk and milk products; (III) fish, other sea foods, vegetables, roots and tubers; and (IV) pulses, legumes, nuts, cereals and vegetables. The prevalence of household food insecurity was 80% in the rainy season and 69% in the harvest season (P = 0.01). The Household Food Insecurity Access Scale score was negatively correlated with the factor scores on household dietary diversity. Food consumption patterns and food insecurity varied by season, with the worst conditions occurring during the rainy season. The risk of inadequate dietary diversity was higher among food-insecure households than among food-secure households. 
Efforts aimed at alleviating household food insecurity could contribute to the consumption of a wide range of food items at the household level.
Validation of the Dutch language version of the Safety Attitudes Questionnaire (SAQ-NL).
Haerkens, Marck Htm; van Leeuwen, Wouter; Sexton, J Bryan; Pickkers, Peter; van der Hoeven, Johannes G
2016-08-15
As the first objective of caring for patients is to do no harm, patient safety is a priority in delivering clinical care. An essential component of safe care in a clinical department is its safety climate. Safety climate correlates with safety-specific behaviour, injury rates, and accidents. Safety climate in healthcare can be assessed with the Safety Attitudes Questionnaire (SAQ), which provides insight by scoring six dimensions: Teamwork Climate, Job Satisfaction, Safety Climate, Stress Recognition, Working Conditions and Perceptions of Management. The objective of this study was to assess the psychometric properties of the Dutch language version of the SAQ in a variety of clinical departments in Dutch hospitals. The Dutch version of the SAQ (SAQ-NL) was back-translated and analyzed for semantic characteristics and content. From October 2010 to November 2015, SAQ-NL surveys were carried out in 17 departments in two university and seven large non-university teaching hospitals in the Netherlands, prior to a Crew Resource Management human factors intervention. Statistical analyses were used to examine response patterns, mean scores, correlations, internal consistency reliability and model fit. Cronbach's α and inter-item correlations were calculated to examine internal consistency reliability. Of 2113 questionnaires administered to health care workers, 1314 were returned completed, a response rate of 62%. Confirmatory factor analysis showed that the 6-factor structure fit the data adequately. Response patterns were similar across professional positions, departments, physicians and nurses, and university and non-university teaching hospitals. The SAQ-NL showed strong internal consistency (α = .87). Exploratory analysis revealed differences in scores on the SAQ dimensions when comparing different professional positions, physicians with nurses, and university with non-university hospitals. 
The SAQ-NL demonstrated good psychometric properties and is therefore a useful instrument to measure patient safety climate in Dutch clinical work settings. As removal of one item resulted in an increased reliability of the Working Conditions dimension, revision or deletion of this item should be considered. The results from this study provide researchers and practitioners with insight into safety climate in a variety of departments and functional positions in Dutch hospitals.
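Cronbach's α can be computed as in the following Python sketch, along with the "α if item deleted" check behind the suggestion to revise one Working Conditions item. The function names are illustrative. Items are passed as one list of scores per item, and the example data are made up.

```python
from statistics import pvariance


def cronbach_alpha(items):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total scores).

    items: list of k item-score lists, each holding one score per respondent.
    """
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # each respondent's total score
    item_var = sum(pvariance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var / pvariance(totals))


def alpha_if_item_deleted(items):
    """Alpha recomputed with each item left out in turn (needs at least 3 items)."""
    return [cronbach_alpha(items[:i] + items[i + 1:]) for i in range(len(items))]


# Three consistent items plus one item that orders respondents differently.
items = [[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4], [4, 1, 3, 2]]
print(round(cronbach_alpha(items), 3))
print([round(a, 3) for a in alpha_if_item_deleted(items)])
```

An item whose removal raises α above the full-scale value, as the last item does here, is the kind of candidate for revision or deletion that the authors describe.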
Saudek, Kris; Treat, Robert
2015-01-01
Purpose At our institution, medical students and faculty have speculated about whether team-based learning (TBL) can improve scores on high-stakes examinations compared with traditional didactic lectures. Faculty with experience using TBL developed and piloted a required TBL blood disorders (BD) module for third-year medical students on their pediatric clerkship. The purpose of this study is to analyze BD scores from the NBME subject exams before and after the introduction of the module. Methods We analyzed institutional and national item difficulties for BD items from the NBME pediatrics content area item analysis reports from 2011 to 2014, before (pre) and after (post) the pilot (October 2012). Total scores of 590 students who took the NBME subject examination, taken from examinee performance profiles, were analyzed pre/post. t-Tests and Cohen's d effect sizes were used to compare institutional versus national item difficulties and to make pre/post comparisons of item difficulties and total scores. Results Before the pilot, BD scores for our institution were 0.65 (±0.19) compared with 0.62 (±0.15) nationally (P=0.346; Cohen's d=0.15). The average post-pilot BD score for our students was 0.70 (±0.21) compared with 0.64 (±0.15) for examinees nationally, a significant mean difference (P=0.031; Cohen's d=0.43). Our institution's BD scores trended higher from pre [0.65 (±0.19)] to post [0.70 (±0.21)], although the difference was not significant (P=0.391; Cohen's d=0.27). Conclusions Institutional BD scores were higher than national BD scores both pre and post, with an effect size that tripled from pre to post. Institutional BD scores increased after the use of the TBL module, while overall exam scores remained steadily above national norms.
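Cohen's d, used throughout this abstract, is the standardized mean difference. The sketch below uses the pooled-SD definition, which is one common variant. The abstract does not state which variant the authors used or give the group sizes, so the reported values cannot be recomputed from it. The example groups are made up.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group1, group2):
    """Standardized mean difference using the pooled sample standard deviation."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / sqrt(pooled_var)


print(cohens_d([2, 3, 4], [1, 2, 3]))
```

Here the means differ by 1 and both groups have a sample SD of 1, so d is 1.0. By the usual rough conventions, values near 0.2, 0.5, and 0.8 are read as small, moderate, and large.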
Measuring adolescent science motivation
NASA Astrophysics Data System (ADS)
Schumm, Maximiliane F.; Bogner, Franz X.
2016-02-01
To monitor science motivation, 232 tenth graders at the college preparatory level ('Gymnasium') completed the Science Motivation Questionnaire II (SMQ-II). Additionally, personality data were collected using a 10-item version of the Big Five Inventory. A subsequent exploratory factor analysis, based on the eigenvalue-greater-than-one criterion, extracted a loading pattern that in principle followed the SMQ-II framework. Two items were dropped because of inappropriate loadings. The remaining SMQ-II seems to provide a consistent scale, matching findings in the literature. Nevertheless, possible shortcomings of the scale are also discussed. Girls reported higher perceived self-determination, which seems to be offset by their lower self-efficacy beliefs, resulting in equal overall science motivation scores for females and males. Additionally, the Big Five personality traits and science motivation components show little relationship.
Characterizing Sources of Uncertainty in Item Response Theory Scale Scores
ERIC Educational Resources Information Center
Yang, Ji Seung; Hansen, Mark; Cai, Li
2012-01-01
Traditional estimators of item response theory scale scores ignore uncertainty carried over from the item calibration process, which can lead to incorrect estimates of the standard errors of measurement (SEMs). Here, the authors review a variety of approaches that have been applied to this problem and compare them on the basis of their statistical…
Global, Local, and Graphical Person-Fit Analysis Using Person-Response Functions
ERIC Educational Resources Information Center
Emons, Wilco H. M.; Sijtsma, Klaas; Meijer, Rob R.
2005-01-01
Person-fit statistics test whether the likelihood of a respondent's complete vector of item scores on a test is low given the hypothesized item response theory model. This binary information may be insufficient for diagnosing the cause of a misfitting item-score vector. The authors propose a comprehensive methodology for person-fit analysis in the…
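For contrast with the person-response-function approach proposed here, the following Python sketch computes a standard global person-fit statistic: the standardized log-likelihood l_z, for dichotomous items with known model probabilities. This is not the authors' method. It illustrates the kind of single summary value whose fit/misfit reading the abstract calls insufficient for diagnosis. The probabilities in the example are hypothetical.

```python
import math


def lz_statistic(responses, probs):
    """Standardized log-likelihood person-fit statistic l_z for one respondent.

    responses: 0/1 item scores; probs: model-implied probabilities of a correct answer.
    """
    l0 = sum(x * math.log(p) + (1 - x) * math.log(1 - p) for x, p in zip(responses, probs))
    expected = sum(p * math.log(p) + (1 - p) * math.log(1 - p) for p in probs)
    variance = sum(p * (1 - p) * math.log(p / (1 - p)) ** 2 for p in probs)
    return (l0 - expected) / math.sqrt(variance)


probs = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]  # items ordered from easy to hard for this person
print(lz_statistic([1, 1, 1, 0, 0, 0], probs))  # expected pattern
print(lz_statistic([0, 0, 0, 1, 1, 1], probs))  # reversed, aberrant pattern
```

Large negative values flag unexpected response patterns, such as missing easy items while answering hard ones correctly. The statistic does not say where in the test the misfit arises.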
Sex Differences in the Tendency to Omit Items on Multiple-Choice Tests: 1980-2000
ERIC Educational Resources Information Center
von Schrader, Sarah; Ansley, Timothy
2006-01-01
Much has been written concerning the potential group differences in responding to multiple-choice achievement test items. This discussion has included references to possible disparities in tendency to omit such test items. When test scores are used for high-stakes decision making, even small differences in scores and rankings that arise from male…
The Effect of Guessing on Item Reliability under Answer-Until-Correct Scoring
ERIC Educational Resources Information Center
Kane, Michael; Moloney, James
1978-01-01
The answer-until-correct (AUC) procedure requires that examinees respond to a multi-choice item until they answer it correctly. Using a modified version of Horst's model for examinee behavior, this paper compares the effect of guessing on item reliability for the AUC procedure and the zero-one scoring procedure. (Author/CTM)
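A small Python sketch can make the contrast between the two scoring procedures concrete. It assumes a hypothetical AUC rule that awards k minus the number of attempts on a k-option item. It also assumes an examinee who eliminates some distractors and then guesses at random without repeating a choice. The sketch does not implement the modified Horst model used in the paper; it only shows how expected credit from guessing differs under the two rules.

```python
from fractions import Fraction


def auc_item_score(k, attempts):
    """Hypothetical AUC rule: k - 1 points if right on the first try, 0 if right on the last."""
    return k - attempts


def expected_scores_under_guessing(k, eliminated=0):
    """Expected credit as a fraction of the maximum: (AUC rule, zero-one rule)."""
    m = k - eliminated  # options still considered plausible
    # Guessing without repeats: the key is equally likely to be found on attempt 1..m.
    auc = sum(Fraction(auc_item_score(k, a), m) for a in range(1, m + 1))
    zero_one = Fraction(1, m)  # credit only when the first choice is correct
    return auc / (k - 1), zero_one


print(expected_scores_under_guessing(4))
print(expected_scores_under_guessing(4, eliminated=2))
```

With four options and no knowledge, expected credit under this AUC rule is half the maximum, compared with a quarter under zero-one scoring. Eliminating two distractors raises the AUC credit to five-sixths of the maximum and the zero-one credit to one half.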
Measurement Error in Nonparametric Item Response Curve Estimation. Research Report. ETS RR-11-28
ERIC Educational Resources Information Center
Guo, Hongwen; Sinharay, Sandip
2011-01-01
Nonparametric, or kernel, estimation of the item response curve (IRC) is a concern both theoretically and operationally. This estimation, often used in item analysis in testing programs, is biased when the observed scores are used as the regressor because the observed scores are contaminated by measurement error. In this study, we investigate…
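The estimator whose bias the report examines can be sketched as a Nadaraya-Watson kernel regression of item scores on observed total scores. The Python sketch below assumes a Gaussian kernel and an arbitrary bandwidth. It deliberately uses the observed score as the regressor, so it includes the measurement-error bias described above rather than correcting for it. The data are made up.

```python
import math


def kernel_irc(item_scores, observed_totals, grid, bandwidth=2.0):
    """Nadaraya-Watson estimate of E[item score | observed total] at each grid point.

    Gaussian kernel; grid points should lie within the range of observed totals
    so that the kernel weights do not all underflow to zero.
    """
    curve = []
    for g in grid:
        weights = [math.exp(-0.5 * ((t - g) / bandwidth) ** 2) for t in observed_totals]
        curve.append(sum(w * x for w, x in zip(weights, item_scores)) / sum(weights))
    return curve


# Hypothetical 0/1 item scores for examinees with low and high observed totals.
print(kernel_irc([0, 0, 0, 1, 1, 1], [1, 2, 3, 7, 8, 9], grid=[2, 5, 8]))
```

The bandwidth controls the bias-variance trade-off of the curve. Correcting for error in the observed scores, the subject of the report, is a separate step not shown here.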
ERIC Educational Resources Information Center
Daniels, Vijay J.; Bordage, Georges; Gierl, Mark J.; Yudkowsky, Rachel
2014-01-01
Objective structured clinical examinations (OSCEs) are used worldwide for summative examinations but often lack acceptable reliability. Research has shown that reliability of scores increases if OSCE checklists for medical students include only clinically relevant items. Also, checklists are often missing evidence-based items that high-achieving…
Sensitivity of Equated Aggregate Scores to the Treatment of Misbehaving Common Items
ERIC Educational Resources Information Center
Michaelides, Michalis P.
2010-01-01
The delta-plot method (Angoff, 1972) is a graphical technique used in the context of test equating for identifying common items with aberrant changes in their item difficulties across administrations or alternate forms. This brief research report explores the effects on equated aggregate scores when delta-plot outliers are either retained in or…
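The delta-plot procedure can be sketched in Python in three steps. First, convert each common item's proportion correct on two forms to an ETS delta (13 + 4z, where z is the normal deviate of 1 - p). Second, fit the major axis through the paired deltas. Third, flag items far from that line. The cutoff of 1.5 delta units is an assumption (conventions vary). The sketch also assumes the paired deltas have nonzero covariance, so the slope is defined. The proportions in the example are made up.

```python
import math
from statistics import NormalDist, mean


def angoff_delta(p):
    """ETS delta for an item with proportion correct p; harder items get larger deltas."""
    return 13.0 + 4.0 * NormalDist().inv_cdf(1.0 - p)


def delta_plot_outliers(p_form1, p_form2, cutoff=1.5):
    """Indices of common items whose perpendicular distance from the major axis exceeds cutoff."""
    x = [angoff_delta(p) for p in p_form1]
    y = [angoff_delta(p) for p in p_form2]
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    # Major (principal) axis through the scatter of paired deltas.
    slope = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    intercept = my - slope * mx
    # Signed perpendicular distance of each item from that line.
    dist = [(slope * xi - yi + intercept) / math.sqrt(slope ** 2 + 1) for xi, yi in zip(x, y)]
    return [i for i, d in enumerate(dist) if abs(d) > cutoff]


form1 = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3]
form2 = [0.9, 0.8, 0.7, 0.2, 0.5, 0.4, 0.3]  # item 3 became much harder
print(delta_plot_outliers(form1, form2))
```

The flagged items are the ones whose treatment (kept in or dropped from the anchor set) the report examines for its effect on equated aggregate scores.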
Balachandran, Jay S.; Yu, Xiaohong; Wroblewski, Kristen; Mokhlesi, Babak
2013-01-01
Background: CPAP adherence patterns are often established very early in the course of therapy. Our objective was to quantify patients' perception of CPAP therapy using a 6-item questionnaire administered the morning after CPAP titration. We hypothesized that questionnaire responses would independently predict CPAP adherence during the first 30 days of therapy. Methods: We retrospectively reviewed the CPAP perception questionnaires of 403 CPAP-naïve adults who underwent in-laboratory titration and had daily CPAP adherence data available for the first 30 days of therapy. Responses to the CPAP perception questionnaire were analyzed for their association with mean CPAP adherence and with changes in daily CPAP adherence over 30 days. Results: Patients were aged 52 ± 14 years, 53% were women, 54% were African American, the mean body mass index (BMI) was 36.3 ± 9.1 kg/m², and most patients had moderate-to-severe obstructive sleep apnea (OSA). Four of the 6 items on the CPAP perception questionnaire (difficulty tolerating CPAP, discomfort with CPAP pressure, likelihood of wearing CPAP, and perceived health benefit) were significantly correlated with mean 30-day CPAP adherence, and a composite score from these 4 questions was internally consistent. Stepwise linear regression modeling showed that 3 variables were significant and independent predictors of reduced mean CPAP adherence: a worse score on the 4-item questionnaire, African American race, and a non-sleep specialist ordering the polysomnogram and CPAP therapy. Furthermore, a worse score on the 4-item CPAP perception questionnaire was consistently associated with decreased mean daily CPAP adherence over the first 30 days of therapy. Conclusions: In this pilot study, responses to a 4-item CPAP perception questionnaire administered to patients immediately after CPAP titration independently predicted mean CPAP adherence during the first 30 days. 
Further prospective validation of this questionnaire in different patient populations is warranted. Commentary: A commentary on this article appears in this issue on page 207. Citation: Balachandran JS; Yu X; Wroblewski K; Mokhlesi B. A brief survey of patients' first impression after CPAP titration predicts future CPAP adherence: a pilot study. J Clin Sleep Med 2013;9(3):199-205. PMID:23493772
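The abstract reports that the 4-item composite was internally consistent but does not name the statistic used. Cronbach's alpha is one common choice; the sketch below assumes it and runs on invented responses (the `responses` matrix is hypothetical, not study data) to show how a composite score and its alpha are computed.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for a (respondents x items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                          # number of items
    item_vars = items.var(axis=0, ddof=1)       # sample variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # sample variance of the composite
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical responses of 5 patients to 4 perception items (not study data).
responses = np.array([
    [1, 2, 1, 2],
    [3, 3, 4, 3],
    [2, 2, 2, 3],
    [4, 5, 4, 4],
    [1, 1, 2, 1],
])
composite = responses.sum(axis=1)  # composite score per patient
print(composite, round(cronbach_alpha(responses), 3))
```

Alpha approaches 1 as the items move together; it says nothing by itself about whether the composite predicts adherence, which the study assessed separately with regression.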
Schalet, Benjamin D; Rothrock, Nan E; Hays, Ron D; Kazis, Lewis E; Cook, Karon F; Rutsohn, Joshua P; Cella, David
2015-10-01
Global health measures represent an attractive option for researchers and clinicians seeking a brief snapshot of a patient's overall perspective on his or her health. Because scores on different global health measures are not comparable, comparative effectiveness research (CER) is challenging. The objective was to establish a common reporting metric so that the physical and mental health scores on the Veterans RAND 12-Item Health Survey (VR-12©) can be converted into the corresponding Patient Reported Outcomes Measurement Information System (PROMIS®) Global Health scores. Following a single-sample linking design, participants from an Internet panel completed items from the PROMIS Global Health and VR-12 Health Survey. A common metric was created using analyses based on item response theory (IRT), producing score cross-walk tables for the mental and physical health components of each measure. The linking relationships were evaluated by calculating the standard deviation of differences between the observed and linked PROMIS scores and estimating confidence intervals by sample size. Participants (N = 2025) were 49% male and 73% white; mean age was 46 years. The measures were the mental and physical health subscales of the PROMIS Global Health and the VR-12. The mean VR-12 physical component and mental component scores were 45.2 and 46.6, respectively; the mean PROMIS physical and mental health scores were 48.3 and 48.5, respectively. We found evidence that the combined set of VR-12 and PROMIS items was relatively unidimensional and that we could proceed with linking. Linking worked better for the physical health than for the mental health scores, and when using VR-12 item responses (vs. linking based on algorithmic scores). For each of the cross-walks, users can minimize the impact of linking error with modest increases in sample sizes. VR-12 scores can be expressed on the PROMIS Global Health metric to facilitate the evaluation of treatment, including CER. 
Extending these results to other common measures of global health is encouraged.
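The evaluation step above (SD of observed minus linked scores, then confidence intervals by sample size) can be sketched as follows. This assumes a simple normal-approximation interval for a mean of n differences; the abstract does not state the exact procedure, and the score lists are invented.

```python
import math
from statistics import stdev

def linking_error_sd(observed, linked):
    """SD of the differences between observed and linked (cross-walked) scores."""
    diffs = [o - l for o, l in zip(observed, linked)]
    return stdev(diffs)

def ci_half_width(sd_diff, n, z=1.96):
    """Approximate 95% CI half-width for the mean difference in a sample of size n
    (normal approximation; an assumption, not the authors' stated method)."""
    return z * sd_diff / math.sqrt(n)

# Hypothetical observed PROMIS scores and scores linked from VR-12 (not study data).
observed = [48.2, 51.0, 45.7, 55.3, 49.9]
linked = [47.5, 52.1, 44.9, 53.8, 50.6]
sd = linking_error_sd(observed, linked)
for n in (50, 200, 800):
    print(n, round(ci_half_width(sd, n), 3))
```

Because the half-width shrinks with the square root of n, quadrupling the sample halves the interval, which is consistent with the abstract's point that modest increases in sample size reduce the impact of linking error on group-level comparisons.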
Harris, Paul B; Houston, John M; Vazquez, Jose A; Smither, Janan A; Harms, Amanda; Dahlke, Jeffrey A; Sachau, Daniel A
2014-11-01
Surveys of 1217 undergraduate students supported the reliability (inter-item and test-retest) and validity of the Prosocial and Aggressive Driving Inventory (PADI). Principal component analyses on the PADI items yielded two scales: Prosocial Driving (17 items) and Aggressive Driving (12 items). Prosocial Driving was associated with fewer reported traffic accidents and violations, with participants who were older and female, and with lower Boredom Susceptibility and Hostility scores, and higher scores on Agreeableness, Conscientiousness, Openness, and Neuroticism. Aggressive Driving was associated with more frequent traffic violations, with female participants, and with higher scores on Competitiveness, Sensation Seeking, Hostility, and Extraversion, and lower scores on Conscientiousness, Agreeableness, and Openness. The theoretical and practical implications of the PADI's dual focus on safe and unsafe driving are discussed. Copyright © 2014 Elsevier Ltd. All rights reserved.
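As a rough illustration of how a principal component analysis can split an item pool into two scales, the sketch below eigen-decomposes an item correlation matrix. The data are synthetic (two independent latent traits, three items each), and the eigenvalue-greater-than-1 retention rule is an illustrative assumption; the abstract does not say which retention or rotation rules the authors used.

```python
import numpy as np

def principal_components(responses):
    """Eigen-decompose the item correlation matrix; return eigenvalues (descending) and loadings."""
    corr = np.corrcoef(responses, rowvar=False)   # items are columns
    eigvals, eigvecs = np.linalg.eigh(corr)       # symmetric matrix; eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    loadings = eigvecs * np.sqrt(eigvals)         # scale eigenvectors into component loadings
    return eigvals, loadings

# Synthetic data: two uncorrelated traits, each measured by three noisy items.
rng = np.random.default_rng(0)
n = 500
prosocial = rng.normal(size=n)
aggressive = rng.normal(size=n)
responses = np.column_stack([
    prosocial[:, None] + 0.5 * rng.normal(size=(n, 3)),
    aggressive[:, None] + 0.5 * rng.normal(size=(n, 3)),
])
eigvals, loadings = principal_components(responses)
n_retained = int((eigvals > 1).sum())  # Kaiser criterion (assumed for illustration)
print(np.round(eigvals, 2), n_retained)
```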
Development of a brachytherapy audit checklist tool.
Prisciandaro, Joann; Hadley, Scott; Jolly, Shruti; Lee, Choonik; Roberson, Peter; Roberts, Donald; Ritter, Timothy
2015-01-01
To develop a brachytherapy audit checklist that could be used to prepare for Nuclear Regulatory Commission or agreement state inspections, to aid in readiness for a practice accreditation visit, or to serve as an annual internal audit tool. Six board-certified medical physicists and one radiation oncologist conducted a thorough review of the brachytherapy-related literature, practice guidelines published by professional organizations, and federal regulations. The team members worked at two facilities that are part of a large, academic health care center. Checklist items were given a score based on their judged importance. Four clinical sites performed an audit of their program using the checklist. The sites were asked to score each item based on a defined severity scale for their noncompliance, and final audit scores were tallied by summing the products of importance score and severity score for each item. The final audit checklist, which is available online, contains 83 items. The audit scores from the beta sites ranged from 17 to 71 (out of 690), and the audits identified between 7 and 16 noncompliant items. The total time to conduct the audit ranged from 1.5 to 5 hours. A comprehensive audit checklist was developed which can be implemented by any facility that wishes to perform a program audit in support of their own brachytherapy program. The checklist is designed to allow users to identify areas of noncompliance and to prioritize how these items are addressed to minimize deviations from nationally recognized standards. Copyright © 2015 American Brachytherapy Society. All rights reserved.
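The tally described above (sum over items of importance score times severity score) is a weighted sum. A minimal sketch follows; the item names, the importance and severity values, and the convention that severity 0 means "compliant" are all hypothetical, since the abstract does not give the scales.

```python
from dataclasses import dataclass

@dataclass
class ChecklistItem:
    name: str
    importance: int  # importance score assigned by the checklist developers
    severity: int    # site-assigned severity of noncompliance (0 assumed to mean compliant)

def audit_score(items):
    """Total audit score: sum over items of importance x severity."""
    return sum(item.importance * item.severity for item in items)

def noncompliant(items):
    """Items with nonzero severity, largest weighted contribution first (for prioritizing fixes)."""
    flagged = [i for i in items if i.severity > 0]
    return sorted(flagged, key=lambda i: i.importance * i.severity, reverse=True)

# Hypothetical checklist entries and scores.
checklist = [
    ChecklistItem("Written directive completed before treatment", importance=3, severity=2),
    ChecklistItem("Source inventory reconciled", importance=2, severity=0),
    ChecklistItem("Survey meter calibration current", importance=1, severity=1),
]
print(audit_score(checklist))                    # 3*2 + 2*0 + 1*1
print([i.name for i in noncompliant(checklist)])
```

Sorting by the weighted product mirrors the checklist's stated purpose of prioritizing which noncompliant items to address first.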
Arda, Ersan; Cakiroglu, Basri; Tas, Tuncay; Ekici, Sinan; Uyanik, Bekir Sami
2016-11-01
To determine the number and distribution of positive subdomains of the UPOINT classification in chronic prostatitis and to compare the erectile dysfunction (ED) pattern. From 2008 to 2013, 839 patients with symptomatic chronic prostatitis or chronic pelvic pain syndrome were included in this study. The correlations between UPOINT domains and the National Institutes of Health chronic prostatitis symptom index (NIH-CPSI) total score and subscores, and the 5-item International Index of Erectile Function scores, were evaluated retrospectively. The mean patient age was 37.7 ± 7.4 years (range 21-65). The average total NIH-CPSI score was 9.07 (range 1-40), and the average number of positive UPOINT subdomains was 2.87 ± 0.32 (range 1-6). Subdomain patient numbers and rates were 529 urinary (63%), 462 psychosocial (55%), 382 organ specific (45%), 290 infection (34%), 288 neurological or systemic (34%), and 418 tenderness (skeletal muscle) (50%). ED, which defines the sexual dysfunction subdomain, was present in a total of 326 (39.9%) patients, with 220 patients having mild (26.2%), 76 mild to moderate (9.1%), 19 moderate (2.3%), and 5 severe (0.6%) ED. No statistically significant correlation was found between the 5-item International Index of Erectile Function score and either the UPOINT subdomain number or the NIH-CPSI score. Although there is a strong and significant correlation between UPOINT classification and NIH-CPSI score in Turkish patients with chronic prostatitis or chronic pelvic pain syndrome, including ED as an independent subdomain of the UPOINT classification is not statistically supported. Copyright © 2016 Elsevier Inc. All rights reserved.
Walter, Emily M; Henderson, Charles R; Beach, Andrea L; Williams, Cody T
Researchers, administrators, and policy makers need valid and reliable information about teaching practices. The Postsecondary Instructional Practices Survey (PIPS) is designed to measure the instructional practices of postsecondary instructors from any discipline. The PIPS has 24 instructional practice statements and nine demographic questions. Users calculate PIPS scores by an intuitive proportion-based scoring convention. Factor analyses from 72 departments at four institutions (N = 891) support a 2- or 5-factor solution for the PIPS; both models include all 24 instructional practice items and have good model fit statistics. Factors in the 2-factor model include (a) instructor-centered practices, nine items; and (b) student-centered practices, 13 items. Factors in the 5-factor model include (a) student-student interactions, six items; (b) content delivery, four items; (c) formative assessment, five items; (d) student-content engagement, five items; and (e) summative assessment, four items. In this article, we describe our development and validation processes, provide scoring conventions and outputs for results, and describe wider applications of the instrument. © 2016 E. M. Walter et al. CBE—Life Sciences Education © 2016 The American Society for Cell Biology. This article is distributed by The American Society for Cell Biology under license from the author(s). It is available to the public under an Attribution–Noncommercial–Share Alike 3.0 Unported Creative Commons License (http://creativecommons.org/licenses/by-nc-sa/3.0).
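The abstract mentions a proportion-based scoring convention without spelling it out. The sketch below assumes one plausible form: each factor score is the sum of that factor's item responses divided by the maximum possible sum, times 100, on an assumed 0-4 response scale. The item IDs and the factor mapping are hypothetical; consult the published PIPS scoring guide for the actual items and scale.

```python
MAX_RESPONSE = 4  # assumed 0-4 response scale per item

def proportion_score(responses, item_ids):
    """Percentage of the maximum possible sum for the given items (assumed convention)."""
    values = [responses[i] for i in item_ids]
    if not values:
        raise ValueError("a factor needs at least one item")
    return 100 * sum(values) / (MAX_RESPONSE * len(values))

# Hypothetical item responses and factor membership (not the real PIPS item numbering).
responses = {"P01": 4, "P02": 3, "P03": 2, "P04": 4, "P05": 1}
factors = {"factor_a": ["P01", "P02", "P03", "P04"], "factor_b": ["P05"]}
scores = {name: proportion_score(responses, ids) for name, ids in factors.items()}
print(scores)
```

Expressing each factor as a percentage of its maximum keeps factors with different numbers of items (for example, nine versus 13) on a comparable 0-100 range.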
Liegl, Gregor; Wahl, Inka; Berghöfer, Anne; Nolte, Sandra; Pieh, Christoph; Rose, Matthias; Fischer, Felix
2016-03-01
To investigate the validity of a common depression metric in independent samples. We applied a common metrics approach based on item-response theory for measuring depression to four German-speaking samples that completed the Patient Health Questionnaire (PHQ-9). We compared the PHQ item parameters reported for this common metric to reestimated item parameters that were derived from fitting a generalized partial credit model solely to the PHQ-9 items. We calibrated the new model on the same scale as the common metric using two approaches (estimation with shifted prior and Stocking-Lord linking). By fitting a mixed-effects model and using Bland-Altman plots, we investigated the agreement between latent depression scores resulting from the different estimation models. We found different item parameters across samples and estimation methods. Although differences in latent depression scores between different estimation methods were statistically significant, these were clinically irrelevant. Our findings provide evidence that it is possible to estimate latent depression scores by using the item parameters from a common metric instead of reestimating and linking a model. The use of common metric parameters is simple, for example, using a Web application (http://www.common-metrics.org) and offers a long-term perspective to improve the comparability of patient-reported outcome measures. Copyright © 2016 Elsevier Inc. All rights reserved.
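For readers unfamiliar with the generalized partial credit model (GPCM) fitted above, the sketch below computes category probabilities for one polytomous item: P(X = k | θ) is proportional to exp of the cumulative sum of a(θ - b_v) for v = 1..k, with category 0 having an empty sum. The item parameters are hypothetical, and some software adds a scaling constant (D = 1.7) that is omitted here. This illustrates the model only, not the common-metric calibration or Stocking-Lord linking.

```python
import math

def gpcm_probs(theta, a, thresholds):
    """Category probabilities P(X = 0..m | theta) under a generalized partial credit model.

    a: item discrimination; thresholds: step parameters b_1..b_m.
    """
    # Cumulative sums of a*(theta - b_v); category 0 has an empty sum of 0.
    logits = [0.0]
    for b in thresholds:
        logits.append(logits[-1] + a * (theta - b))
    top = max(logits)  # subtract the max before exponentiating for numerical stability
    weights = [math.exp(z - top) for z in logits]
    total = sum(weights)
    return [w / total for w in weights]

def expected_score(theta, a, thresholds):
    """Model-implied expected item score at a given latent depression level."""
    return sum(k * p for k, p in enumerate(gpcm_probs(theta, a, thresholds)))

# Hypothetical parameters for one PHQ-9-style item scored 0-3 (three step parameters).
a, steps = 1.8, [-0.5, 0.6, 1.4]
for theta in (-2.0, 0.0, 2.0):
    print(theta, [round(p, 3) for p in gpcm_probs(theta, a, steps)])
```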
Wiggins Content Scales and the MMPI-2.
Kohutek, K J
1992-03-01
The Wiggins Content Scales were omitted from the MMPI-2 because of the number of items deleted from, as well as added to, the test. The purpose of this study is to compare scores on the Wiggins Scales computed from the original MMPI items with scores computed from the items that remain on these scales in the MMPI-2. The scales of Religious Fundamentalism and Authority Conflict appear to be those most seriously affected by the item changes in the MMPI-2. The scales Depression and Family Conflict maintained all of their items, and the remaining nine were not found to be statistically different when the two scorings were compared.
Terwee, Caroline B; Mokkink, Lidwine B; Knol, Dirk L; Ostelo, Raymond W J G; Bouter, Lex M; de Vet, Henrica C W
2012-05-01
The COSMIN checklist is a standardized tool for assessing the methodological quality of studies on measurement properties. It contains 9 boxes, each dealing with one measurement property, with 5-18 items per box about design aspects and statistical methods. Our aim was to develop a scoring system for the COSMIN checklist to calculate quality scores per measurement property when using the checklist in systematic reviews of measurement properties. The scoring system was developed based on discussions among experts and testing of the scoring system on 46 articles from a systematic review. Four response options were defined for each COSMIN item (excellent, good, fair, and poor). A quality score per measurement property is obtained by taking the lowest rating of any item in a box ("worst score counts"). Specific criteria for excellent, good, fair, and poor quality for each COSMIN item are described. In defining the criteria, the "worst score counts" algorithm was taken into consideration. This means that only fatal flaws were defined as poor quality. The scores of the 46 articles show how the scoring system can be used to provide an overview of the methodological quality of studies included in a systematic review of measurement properties. Based on experience in testing this scoring system on 46 articles, the COSMIN checklist with the proposed scoring system seems to be a useful tool for assessing the methodological quality of studies included in systematic reviews of measurement properties.
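The "worst score counts" rule above amounts to taking the minimum of ordered ratings within a box. A minimal sketch follows; the numeric codes are an assumption used only to order the four categories, and this is not official COSMIN software.

```python
from enum import IntEnum

class Rating(IntEnum):
    # Numeric codes are illustrative; only their order matters.
    POOR = 1
    FAIR = 2
    GOOD = 3
    EXCELLENT = 4

def box_quality(item_ratings):
    """Quality score for one COSMIN box: the lowest rating of any item ("worst score counts")."""
    if not item_ratings:
        raise ValueError("a box needs at least one rated item")
    return min(item_ratings)

# Hypothetical ratings for the items of one box.
ratings = [Rating.EXCELLENT, Rating.GOOD, Rating.FAIR, Rating.EXCELLENT]
print(box_quality(ratings).name)
```

Because a single poor item pulls the whole box to poor, the authors reserved "poor" for fatal flaws; the minimum rule makes that design choice visible.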
Effect of Item Arrangement, Knowledge of Arrangement, and Test Anxiety on Two Scoring Methods.
ERIC Educational Resources Information Center
Plake, Barbara S.; And Others
1981-01-01
Number-right and elimination scores were analyzed on a college-level mathematics exam assembled from pretest data. Anxiety measures were administered along with the experimental forms to undergraduates. Results suggest that neither test scores nor attitudes are influenced by item order, knowledge thereof, or anxiety level. (Author/GK)
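The two scoring methods compared above differ in what the examinee marks. The abstract does not say which elimination rule was used; the sketch below assumes the classic Coombs-style rule (+1 for each distractor eliminated, minus (k - 1) if the keyed answer is eliminated, for a k-option item). The option labels and key are hypothetical.

```python
def number_right(chosen, key):
    """Number-right scoring: 1 if the selected option is the key, else 0."""
    return 1 if chosen == key else 0

def elimination_score(eliminated, key, n_options):
    """Coombs-style elimination score (assumed rule): +1 per distractor eliminated,
    -(n_options - 1) if the keyed answer is eliminated."""
    eliminated = set(eliminated)
    score = len(eliminated - {key})
    if key in eliminated:
        score -= n_options - 1
    return score

# Hypothetical 4-option item keyed "C".
print(elimination_score({"A", "B", "D"}, "C", 4))  # all distractors eliminated
print(elimination_score({"A", "C"}, "C", 4))       # key eliminated by mistake
print(number_right("C", "C"))
```

Elimination scoring rewards partial knowledge (ruling out some distractors) that number-right scoring treats as zero.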
Analysis of Open-Ended Statistics Questions with Many Facet Rasch Model
ERIC Educational Resources Information Center
Güler, Nese
2014-01-01
Problem Statement: The most significant disadvantage of open-ended items that allow the valid measurement of upper level cognitive behaviours, such as synthesis and evaluation, is scoring. The difficulty associated with objectively scoring the answers to the items contributes to the reduction of the reliability of the scores. Moreover, other…
ERIC Educational Resources Information Center
Finch, W. Holmes; Hernández Finch, Maria E.; French, Brian F.
2016-01-01
Differential item functioning (DIF) assessment is key in score validation. When DIF is present, scores may not accurately reflect the construct of interest for some groups of examinees, leading to incorrect conclusions from the scores. Given rising immigration, and the increased reliance of educational policymakers on cross-national assessments…
Development and validity of a method for the evaluation of printed education material
Castro, Mauro Silveira; Pilger, Diogo; Fuchs, Flávio Danni; Ferreira, Maria Beatriz Cardoso
Objectives To develop and study the validity of an instrument for evaluation of Printed Education Materials (PEM); to evaluate the use of acceptability indices; to identify possible influences of professional aspects. Methods An instrument for PEM evaluation was developed in three steps: domain identification, item generation, and instrument design. An easy-to-read PEM was developed to educate patients about systemic hypertension and its treatment with hydrochlorothiazide. Construct validity was measured based on previously established errors purposively introduced into the PEM, which served as extreme groups. An acceptability index was applied, taking into account the proportion of professionals required to approve each item. Participants were 10 physicians (9 men) and 5 nurses (all women). Results Many professionals identified the obvious intentional errors. Few participants identified errors that needed more careful evaluation, and no one detected the intentional error that required literature analysis. Physicians considered 95.8% of the PEM items acceptable, and nurses 29.2%. The differences between the two groups' scores were statistically significant for 27% of the items. In the overall evaluation, 66.6% of the items were considered acceptable. The analysis of each item revealed a behavioral pattern for each professional group. Conclusions The use of instruments for evaluation of printed education materials is required and may improve the quality of the PEM available for patients. Acceptability indices are not always fully correct, nor do they always indicate high-quality information. The professional experience, the practice pattern, and perhaps the gender of the reviewers may influence their evaluation. The PEM should also be analyzed by communication professionals, drug information specialists, and patients to improve the quality of the proposed material. PMID:25214924
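An acceptability index of the kind described above can be sketched as the proportion of reviewers who approve an item, with the item accepted when that proportion reaches a required threshold. The threshold of 0.8, the item names, and the votes are hypothetical; the abstract does not give the cut-off used.

```python
def acceptability_index(approvals):
    """Proportion of reviewers who approved an item (approvals: list of bools)."""
    if not approvals:
        raise ValueError("no reviewer judgments for this item")
    return sum(approvals) / len(approvals)

def acceptable_items(ratings, threshold):
    """Items whose acceptability index meets the (hypothetical) approval threshold."""
    return [item for item, votes in ratings.items() if acceptability_index(votes) >= threshold]

# Hypothetical judgments by 5 reviewers.
ratings = {
    "dose instructions": [True, True, True, True, False],
    "side effects": [True, False, True, False, False],
}
accepted = acceptable_items(ratings, threshold=0.8)
print(accepted, 100 * len(accepted) / len(ratings))  # accepted items and % acceptable
```

Running the same function separately on physicians' and nurses' votes is one way to see how professional groups can reach very different acceptability rates for the same material.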
A Rasch measure of teachers' views of teacher-student relationships in the primary school.
Leitao, Natalie; Waugh, Russell F
2012-01-01
This study investigated teacher-student relationships from the teachers' point of view at Perth metropolitan schools in Western Australia. The study identified three key social and emotional aspects that affect teacher-student relationships, namely, Connectedness, Availability and Communication. Data were collected by questionnaire (N = 139) with stem-items answered in three perspectives: (1) Idealistic: this is what I would like to happen; (2) Capability: this is what I am capable of; and (3) Behaviour: this is what actually happens, using four ordered response categories: not at all (score 1), some of the time (score 2), most of the time (score 3), and almost always (score 4). Data were analysed with a Rasch measurement model and a uni-dimensional, linear scale with 24 items, ordered from easy to hard, was created. The data were shown to be highly reliable, so that valid inferences could be made from the scale. The Person Separation Index (akin to a reliability index) was 0.93; there was good global teacher and item fit to the measurement model; the targeting of the item difficulties against the teacher measures was good, and the response categories were answered consistently and logically. Teachers said that the ideal items were all easier than their corresponding capability items, which were in turn easier than the behaviour items (where the items fitted the model), as conceptualized. The easiest ideal items were: I like this child and This child and I get along well together. The hardest ideal item (but still easy) was: I am available for this child. The easiest behaviour item (but still hard) was: This child and I get along well together. The hardest behaviour item (and very hard) was: I am interested to learn about this child's personal thoughts, feelings and experiences. The difficulties of the items supported the conceptual structure of the variable.
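The Person Separation Index reported above is commonly computed as the share of variance in estimated person measures that is not measurement error: (observed variance - mean squared standard error) / observed variance. The sketch below assumes that formula and uses invented logit measures and standard errors; Rasch software packages may differ in detail (for example, sample versus population variance).

```python
from statistics import fmean, variance

def person_separation_index(measures, standard_errors):
    """PSI = (variance of person measures - mean error variance) / variance of person measures."""
    observed_var = variance(measures)                       # sample variance of estimates
    error_var = fmean(se ** 2 for se in standard_errors)   # average squared standard error
    return (observed_var - error_var) / observed_var

# Hypothetical teacher measures (logits) and their standard errors.
measures = [-1.0, 0.0, 1.0, 2.0]
ses = [0.3, 0.3, 0.3, 0.3]
print(round(person_separation_index(measures, ses), 3))
```

Like Cronbach's alpha, the index approaches 1 when person measures are spread out relative to their measurement error, which is why the abstract describes it as akin to a reliability index.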
Is infant feeding pattern associated with father's quality of life?
Chen, Yi Chun; Chie, Wei-Chu; Chang, Pei-Jen; Chuang, Chao-Hua; Lin, Yu-Hsuan; Lin, Shio-Jean; Chen, Pau-Chung
2010-12-01
The aim of this study was to compare the health-related quality of life of fathers under different infant feeding scenarios. The Medical Outcomes Study 36-item Short-Form was used to measure the health-related quality of life of 1,699 fathers, and the scores were examined for associations with different infant feeding methods. Multivariable linear regression analysis was used to explore the contribution of other potentially related factors to fathers' quality of life. After controlling for confounding factors, fathers whose infants had ever been breast-fed reported lower scores than fathers whose infants were bottle-fed. Apart from infant feeding pattern, having a job, higher family income, and being the major caregiver were positively related to the father's quality of life. Fathers may not benefit during the breast-feeding process. Because fathers' involvement plays an important role in the success of breast-feeding, the development of interventions that enable fathers to support their breast-feeding partner is very important.
Quality and quantity of information in summary basis of decision documents issued by Health Canada.
Habibi, Roojin; Lexchin, Joel
2014-01-01
Health Canada's Summary Basis of Decision (SBD) documents outline the clinical trial information that was considered in approving a new drug. We examined the ability of SBDs to inform clinician decision-making. We asked if SBDs answered three questions that clinicians might have prior to prescribing a new drug: 1) Do the characteristics of patients enrolled in trials match those of patients in their practice? 2) What are the details concerning the drug's risks and benefits? 3) What are the basic characteristics of trials? Fourteen items of clinical trial information were identified from all SBDs published on or before April 2012. Each item received a score of 2 (present), 1 (unclear) or 0 (absent). The unit of analysis was the individual SBD, and an overall SBD score was derived based on the sum of points for each item. Scores were expressed as a percentage of the maximum possible points, and then classified into five descriptive categories based on that score. Additionally, three overall 'component' scores were tallied for each SBD: "patient characteristics", "benefit/risk information" and "basic trial characteristics". A total of 161 documents, spanning 456 trials, were analyzed. The majority (126/161) were rated as having information sometimes present (score of >33 to 66%). No SBD had either no information on any item or 100% of the information. Items in the patient characteristics component scored poorest (mean component score of 40.4%), while items corresponding to basic trial information were most frequently provided (mean component score of 71%). Because of significant omissions in clinical trial information, SBDs provide little to aid clinicians in their decision-making. Clinicians' preferred source of information is scientific knowledge, but in Canada, access to such information is limited. Consequently, we believe that clinicians are being denied crucial tools for decision-making.
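The scoring scheme above (2/1/0 points per item, summed and expressed as a percentage of the maximum, plus component sub-scores) can be sketched as follows. The item names and their assignment to components are hypothetical stand-ins for the 14 items, which the abstract does not list, and the five-category classification is omitted because only one of its bands (>33 to 66%) is given.

```python
POINTS = {"present": 2, "unclear": 1, "absent": 0}

def percent_score(ratings):
    """Sum of item points as a percentage of the maximum (2 points per item)."""
    if not ratings:
        raise ValueError("no items rated")
    points = sum(POINTS[r] for r in ratings.values())
    return 100 * points / (2 * len(ratings))

def component_scores(ratings, components):
    """Percentage score restricted to each component's items."""
    return {name: percent_score({i: ratings[i] for i in items})
            for name, items in components.items()}

# Hypothetical ratings for one SBD (six illustrative items, not the study's 14).
ratings = {"age": "present", "sex": "unclear", "primary_outcome": "present",
           "adverse_events": "absent", "trial_design": "present", "comparator": "present"}
components = {"patient characteristics": ["age", "sex"],
              "benefit/risk information": ["primary_outcome", "adverse_events"],
              "basic trial characteristics": ["trial_design", "comparator"]}
print(percent_score(ratings), component_scores(ratings, components))
```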
NASA Astrophysics Data System (ADS)
Federer, Meghan Rector
Assessment is a key element in the process of science education teaching and research. Understanding sources of performance bias in science assessment is a major challenge for science education reforms. Prior research has documented several limitations of instrument types on the measurement of students' scientific knowledge (Liu et al., 2011; Messick, 1995; Popham, 2010). Furthermore, a large body of work has been devoted to reducing assessment biases that distort inferences about students' science understanding, particularly in multiple-choice [MC] instruments. Despite these documented biases, much remains to be determined about constructed-response [CR] assessments in biology and their use for evaluating students' conceptual understanding of scientific practices (such as explanation). Understanding differences in science achievement provides important insights into whether science curricula and/or assessments are valid representations of student abilities. Using the integrative framework put forth by the National Research Council (2012), this dissertation aimed to explore whether assessment biases occur for assessment practices intended to measure students' conceptual understanding and proficiency in scientific practices. Using a large corpus of undergraduate biology students' explanations, three studies were conducted to examine whether known biases of MC instruments were also apparent in a CR instrument designed to assess students' explanatory practice and understanding of evolutionary change (ACORNS: Assessment of COntextual Reasoning about Natural Selection). The first study investigated the challenge of interpreting and scoring lexically ambiguous language in CR answers. The incorporation of 'multivalent' terms into scientific discourse practices often results in statements or explanations that are difficult to interpret and can produce faulty inferences about student knowledge. 
The results of this study indicate that many undergraduate biology majors frequently incorporate multivalent concepts into explanations of change, resulting in explanatory practices that were scientifically non-normative. However, use of follow-up question approaches was found to resolve this source of bias and thereby increase the validity of inferences about student understanding. The second study focused on issues of item and instrument structure, specifically item feature effects and item position effects, which have been shown to influence measures of student performance across assessment tasks. Results indicated that, along the instrument item sequence, items with similar surface features produced greater sequencing effects than sequences of items with dissimilar surface features. This bias could be addressed by use of a counterbalanced design (i.e., Latin Square) at the population level of analysis. Explanation scores were also highly correlated with student verbosity, despite verbosity being an intrinsically trivial aspect of explanation quality. Attempting to standardize student response length was one proposed solution to the verbosity bias. The third study explored gender differences in students' performance on constructed-response explanation tasks using impact (i.e., mean raw scores) and differential item functioning (i.e., item difficulties) patterns. While prior research in science education has suggested that females tend to perform better on constructed-response items, the results of this study revealed no overall differences in gender achievement. However, evaluation of specific item feature patterns suggested that female respondents have a slight advantage on unfamiliar explanation tasks. That is, male students tended to incorporate fewer scientifically normative concepts (i.e., key concepts) than females for unfamiliar taxa. 
Conversely, females tended to incorporate more scientifically non-normative ideas (i.e., naive ideas) than males for familiar taxa. Together these results indicate that gender achievement differences for this CR instrument may be a result of differences in how males and females interpret and respond to combinations of item features. Overall, the results presented in the subsequent chapters suggest that as science education shifts toward the evaluation of fused scientific knowledge and practice (e.g., explanation), it is essential that educators and researchers investigate potential sources of bias inherent to specific assessment practices. This dissertation revealed significant sources of CR assessment bias, and provided solutions to address these problems.
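The counterbalanced (Latin square) design mentioned in the second study can be sketched as a cyclic square: each row is the item order for one test form, so every item appears exactly once in each form and once in each serial position across forms. The item labels are hypothetical. A cyclic square balances position but not which item immediately precedes which; a Williams design would be needed for that.

```python
def cyclic_latin_square(items):
    """Each row is one form's item order; every item appears once per row and once per position."""
    n = len(items)
    return [[items[(r + c) % n] for c in range(n)] for r in range(n)]

# Hypothetical four explanation items (e.g., differing in taxon familiarity).
forms = cyclic_latin_square(["A", "B", "C", "D"])
for form in forms:
    print(" ".join(form))
```

Assigning respondents evenly across the rows lets position effects average out at the population level, which matches the level of analysis the abstract specifies.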
Item validity vs. item discrimination index: a redundancy?
NASA Astrophysics Data System (ADS)
Panjaitan, R. L.; Irawati, R.; Sujana, A.; Hanifah, N.; Djuanda, D.
2018-03-01
In the literature on evaluation and test analysis, it is common to find item validity and the item discrimination index (D) calculated with a different formula for each. Meanwhile, other sources state that the item discrimination index can be obtained by calculating the correlation between the testee’s score on a particular item and the testee’s score on the overall test, which is actually the same concept as item validity. Some research reports, especially undergraduate theses, tend to include both item validity and the item discrimination index in the instrument analysis. These concepts may overlap because both reflect the test's quality in measuring the examinees’ ability. In this paper, examples of results of data processing for item validity and the item discrimination index are compared. We discuss whether item validity and the item discrimination index can be represented by only one of them, or whether it is better to present both calculations for simple test analysis, especially in undergraduate theses that include test analyses.
ERIC Educational Resources Information Center
Truell, Allen D.; Zhao, Jensen J.; Alexander, Melody W.
2005-01-01
The purposes of this study were to determine if there is a significant difference in postsecondary business student scores and test completion time based on settable test item exposure control interface format, and to determine if there is a significant difference in student scores and test completion time based on settable test item exposure…
Real Time Cockpit Resource Management (CRM) Training
2010-10-01
to post-test. Table 4: Learning Scores for the Five Spiral 1 Classes (column headings: Spiral 1 Class, Pilots, Sensors, Pretest, Posttest, Difference, Pretest, Posttest ...). ...results from the five Spiral 1 classes. Table 6: Pretest/Posttest Gain Scores Associated with Each Learning Test Item (column headings: Test Item, Class, Item ...). Small Business Innovation Research (SBIR) Phase II Report. Distribution A: Approved for public release; distribution unlimited. (Approval given
ERIC Educational Resources Information Center
Deville, Craig W.; Chalhoub-Deville, Micheline
A study demonstrated the utility of item analyses to investigate which items function well or poorly in a second language reading recall protocol instrument. Data were drawn from a larger study of 56 learners of German as a second language at various proficiency levels. Pausal units of scored recall protocols were analyzed using both classical…
Musculoskeletal disorders among bank office workers in Kuwait.
Akrouf, Q A S; Crawford, J O; Al-Shatti, A S; Kamel, M I
2010-01-01
This cross-sectional observational study assessed the pattern of musculoskeletal disorders (MSDs) suffered by bank office workers in Kuwait. A self-administered validated questionnaire was used that included the Nordic musculoskeletal questionnaire and the 12-item general health questionnaire (GHQ12). Of 750 employees, 80% suffered at least 1 episode of MSD during the previous year and 42% suffered at least 1 disabling episode. The most affected body parts were the neck (53.5%), lower back (51.1%), shoulders (49.2%) and upper back (38.4%). Nationality, GHQ12 score, smoking and sex were significant predictors of MSDs during the previous year, while alcohol drinking, marital status, GHQ12 score, years in Kuwait and sex were significant predictors of disabling MSDs during the previous year.
Baik, Sharon H; Fox, Rina S; Mills, Sarah D; Roesch, Scott C; Sadler, Georgia Robins; Klonoff, Elizabeth A; Malcarne, Vanessa L
2017-01-01
This study examined the psychometric properties of the Perceived Stress Scale-10 among 436 community-dwelling Hispanic Americans with English or Spanish language preference. Multigroup confirmatory factor analysis examined the factorial invariance of the Perceived Stress Scale-10 across language groups. Results supported a two-factor model (negative, positive) with equivalent response patterns and item intercepts but different factor covariances across languages. Internal consistency reliability of the Perceived Stress Scale-10 total and subscale scores was good in both language groups. Convergent validity was supported by expected relationships of Perceived Stress Scale-10 scores to measures of anxiety and depression. These results support the use of the Perceived Stress Scale-10 among Hispanic Americans.
Girard, Todd A; Wilkins, Leanne K; Lyons, Kathleen M; Yang, Lixia; Christensen, Bruce K
2018-05-31
Introduction Working memory (WM) impairment is a core cognitive deficit among individuals with Schizophrenia Spectrum Disorders (SSD). However, the cognitive mechanisms underlying this deficit are less well understood. This study applies a modified version of the Corsi Block Test to investigate the role of proactive interference in visuospatial WM (VSWM) impairment in SSD. Methods Healthy and SSD participants completed a modified version of the Corsi Block Test involving both high (typical ascending set size from 4 to 7 items) and low (descending set size from 7 to 4 items) proactive interference conditions. Results The results confirmed that the SSD group performed worse overall relative to a healthy comparison group. More importantly, the SSD group demonstrated greater VSWM scores under low (Descending) versus high (Ascending) proactive interference; this pattern is opposite to that of healthy participants. Conclusions This differential pattern of performance supports the view that proactive interference associated with the traditional administration format contributes to VSWM impairment in SSD. Further research investigating associated neurocognitive mechanisms and the contribution of proactive interference across other domains of cognition in SSD is warranted.
Nielsen, Anne Molgaard; Vach, Werner; Kent, Peter; Hestbaek, Lise; Kongsted, Alice
2016-01-01
Latent class analysis (LCA) is increasingly being used in health research, but optimal approaches to handling complex clinical data are unclear. One issue is that commonly used questionnaires are multidimensional but are expressed as summary scores. Using the example of low back pain (LBP), the aim of this study was to explore and descriptively compare the application of LCA to subgrouping patients on multidimensional data when using questionnaire summary scores versus single items. Baseline data from 928 LBP patients in an observational study were classified into four health domains (psychology, pain, activity, and participation) using the World Health Organization's International Classification of Functioning, Disability, and Health framework. LCA was performed within each health domain using the summary-score and single-item strategies. The resulting subgroups were descriptively compared using statistical measures and clinical interpretability. For each health domain, the preferred model solution ranged from five to seven subgroups for the summary-score strategy and from seven to eight subgroups for the single-item strategy. There was considerable overlap between the results of the two strategies, indicating that they were reflecting the same underlying data structure. However, in three of the four health domains, the single-item strategy resulted in a more nuanced description, in terms of more subgroups and more distinct clinical characteristics. In these data, application of both the summary-score strategy and the single-item strategy in the LCA subgrouping resulted in clinically interpretable subgroups, but the single-item strategy generally revealed more distinguishing characteristics.
These results 1) warrant further analyses in other data sets to determine the consistency of this finding, and 2) warrant investigation in longitudinal data to test whether the finer detail provided by the single-item strategy results in improved prediction of outcomes and treatment response.
Test-retest stability of the Task and Ego Orientation Questionnaire.
Lane, Andrew M; Nevill, Alan M; Bowes, Neal; Fox, Kenneth R
2005-09-01
Establishing stability, defined as observing minimal measurement error in a test-retest assessment, is vital to validating psychometric tools. Correlational methods, such as Pearson product-moment, intraclass, and kappa, are tests of association or consistency, whereas stability or reproducibility (regarded here as synonymous) assesses the agreement between test-retest scores. Indexes of reproducibility for the Task and Ego Orientation in Sport Questionnaire (TEOSQ; Duda & Nicholls, 1992) were investigated using correlational methods (Pearson product-moment, intraclass, and kappa), repeated measures multivariate analysis of variance, and the proportion of agreement within a referent value of ±1, as suggested by Nevill, Lane, Kilgour, Bowes, and Whyte (2001). Two hundred thirteen soccer players completed the TEOSQ on two occasions, 1 week apart. Correlation analyses indicated a stronger test-retest correlation for the Ego subscale than for the Task subscale. Multivariate analysis of variance indicated stability for ego items but significant increases in four task items. The proportion-of-agreement statistics indicated relatively poor stability for all ego items, with only 82.7% to 86.9% of test-retest differences falling within ±1. By contrast, 92.5% to 99% of test-retest differences fell within ±1 for all task items, although further analysis indicated that four task subscale items increased significantly. Findings illustrated that correlational methods (Pearson product-moment, intraclass, and kappa) are influenced by the range of scores, and that calculating the proportion of test-retest differences within a referent value of ±1 could provide additional insight into the stability of the questionnaire. It is suggested that the item-by-item proportion of agreement method proposed by Nevill et al. (2001) should be used to supplement existing methods; it could be especially helpful in identifying rogue items in the initial stages of psychometric questionnaire validation.
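The proportion-of-agreement index described above is the percentage of respondents whose retest score on an item differs from their first score by no more than the referent value. The following is a minimal sketch of that calculation for a single item. The function name and the six players' responses are hypothetical.

```python
def proportion_within(test, retest, referent=1):
    """Percentage of test-retest pairs whose absolute difference is <= referent."""
    if not test or len(test) != len(retest):
        raise ValueError("need two non-empty score lists of equal length")
    within = sum(1 for a, b in zip(test, retest) if abs(b - a) <= referent)
    return 100.0 * within / len(test)


# Hypothetical responses to one 5-point item from six players, one week apart
week1 = [4, 5, 3, 2, 5, 4]
week2 = [5, 5, 1, 2, 4, 4]
print(f"{proportion_within(week1, week2):.1f}% within +/-1")  # 83.3% within +/-1
```

Running this per item, rather than on subscale totals, is what lets a single "rogue" item with many large test-retest shifts stand out.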
Mowrer, Robert R; Parker, Keesha N
2004-12-01
In a 2002 publication, Mowrer and McCarver reported weak but significant correlations (r = .24) between scores on the Multicultural Perspective Index and scores on Neugarten, Havighurst, and Tobin's 1961 Life Satisfaction Index-A and the Life Satisfaction Scale developed in 1985 by Diener, Emmons, Larsen, and Griffin. Using 382 undergraduate students, the present study reduced the Index from 42 to 29 items based on each item's correlation with the total score. An additional 104 undergraduate students then completed the modified 29-item version, Rosenberg's Self-esteem Scale, Cheek and Buss's Shyness Scale, Zung's Self-rating Depression Scale, and the Neugarten et al. Life Satisfaction Index-A. Scores on the modified Index were negatively correlated with those on the Depression and Shyness scales and positively correlated with scores on the Self-esteem and Life Satisfaction scales (p < .05).
ERIC Educational Resources Information Center
Topczewski, Anna Marie
2013-01-01
Developmental score scales represent the performance of students along a continuum, where, as students learn more, they move higher along that continuum. Unidimensional item response theory (UIRT) vertical scaling has become a commonly used method to create developmental score scales. Research has shown that UIRT vertical scaling methods can be…
Using Empirical Data to Set Cutoff Scores.
ERIC Educational Resources Information Center
Hills, John R.
Six experimental approaches to the problems of setting cutoff scores and choosing proper test length are briefly mentioned. Most of these methods share the premise that a test is a random sample of items, from a domain associated with a carefully specified objective. Each item is independent and is scored zero or one, with no provision for…
Prediction of true test scores from observed item scores and ancillary data.
Haberman, Shelby J; Yao, Lili; Sinharay, Sandip
2015-05-01
In many educational tests which involve constructed responses, a traditional test score is obtained by adding together item scores obtained through holistic scoring by trained human raters. For example, this practice was used until 2008 in the case of GRE(®) General Analytical Writing and until 2009 in the case of TOEFL(®) iBT Writing. With use of natural language processing, it is possible to obtain additional information concerning item responses from computer programs such as e-rater(®). In addition, available information relevant to examinee performance may include scores on related tests. We suggest application of standard results from classical test theory to the available data to obtain best linear predictors of true traditional test scores. In performing such analysis, we require estimation of variances and covariances of measurement errors, a task which can be quite difficult in the case of tests with limited numbers of items and with multiple measurements per item. As a consequence, a new estimation method is suggested based on samples of examinees who have taken an assessment more than once. Such samples are typically not random samples of the general population of examinees, so that we apply statistical adjustment methods to obtain the needed estimated variances and covariances of measurement errors. To examine practical implications of the suggested methods of analysis, applications are made to GRE General Analytical Writing and TOEFL iBT Writing. Results obtained indicate that substantial improvements are possible both in terms of reliability of scoring and in terms of assessment reliability. © 2015 The British Psychological Society.
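The abstract applies "standard results from classical test theory" to obtain best linear predictors of true scores. The following is a minimal sketch of that predictor. It assumes that the observed-score and measurement-error covariance matrices are already known. The paper's actual contribution, estimating the error covariances from repeat examinees, is not attempted here. The function name and all numbers are hypothetical.

```python
import numpy as np


def blp_true_score(y, mu, cov_obs, cov_err, target=0):
    """Best linear predictor of the true score of measure `target`.

    y, mu   : observed scores and their population means (length k)
    cov_obs : k x k covariance matrix of the observed scores
    cov_err : k x k covariance matrix of the measurement errors
    Under classical test theory (Y = T + E, errors uncorrelated with true
    scores), Cov(Y_j, T_target) = cov_obs[j, target] - cov_err[j, target].
    """
    y, mu = np.asarray(y, float), np.asarray(mu, float)
    cov_obs, cov_err = np.asarray(cov_obs, float), np.asarray(cov_err, float)
    c = cov_obs[:, target] - cov_err[:, target]   # covariances with the true score
    weights = np.linalg.solve(cov_obs, c)         # Sigma_Y^{-1} c
    return float(mu[target] + weights @ (y - mu))


# Hypothetical example: a human rating (measure 0) plus an automated score (measure 1)
pred = blp_true_score(
    y=[4.0, 3.5], mu=[3.0, 3.0],
    cov_obs=[[1.0, 0.6], [0.6, 1.0]],
    cov_err=[[0.3, 0.0], [0.0, 0.2]],
)
print(round(pred, 4))  # 3.6719
```

With a single measure, the predictor reduces to Kelley's regressed true score, mu + reliability * (y - mu). Off-diagonal entries in `cov_err` are where correlated errors between measurements of the same response would enter.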
Reliability of the Adult Myopathy Assessment Tool in Individuals with Myositis
Harris-Love, Michael O.; Joe, Galen; Davenport, Todd E.; Koziol, Deloris; Rose, Kristen Abbett; Shrader, Joseph A.; Vasconcelos, Olavo M.; McElroy, Beverly; Dalakas, Marinos C.
2015-01-01
Objective The Adult Myopathy Assessment Tool (AMAT) is a 13-item performance-based battery developed to assess functional status and muscle endurance. The purpose of this study was to determine the intrarater and interrater reliability of the AMAT in adults with myositis. Methods Nineteen raters (13 physical therapists and 6 physicians) scored videotaped recordings of patients with myositis performing the AMAT, for a total of 114 tests and 1,482 item observations per session. Raters rescored the AMAT tests and item observations during a follow-up session (19 ± 6 days between scoring sessions). All raters completed a single, self-directed, electronic training module prior to the initial scoring session. Results Intrarater and interrater reliability correlation coefficients were .94 or greater for the AMAT Functional Subscale, Endurance Subscale, and Total score (all p < 0.02 for H0: ρ ≤ 0.75). All AMAT items had satisfactory intrarater agreement (kappa statistics with Fleiss-Cohen weights, Kw = .57-1.00). Interrater agreement was acceptable for each AMAT item (K = .56-.89) except the sit-up (K = .16). The standard error of measurement and 95% confidence interval range for the AMAT Total scores did not exceed 2 points across all observations (AMAT Total score range = 0-45). Conclusions The AMAT is a reliable, domain-specific assessment of functional status and muscle endurance for adult subjects with myositis. Results of this study suggest that physicians and physical therapists may reliably score the AMAT following a single training session. The AMAT Functional Subscale, Endurance Subscale, and Total score exhibit interrater and intrarater reliability suitable for clinical and research use. PMID:25201624
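Kappa with Fleiss-Cohen weights is Cohen's weighted kappa with quadratic weights, 1 - (i - j)^2 / (k - 1)^2. This penalizes a two-category disagreement four times as heavily as a one-category disagreement. The following is a minimal sketch for one item scored twice. The function name and the 0-3 item scores are hypothetical; the abstract does not state each item's score range here.

```python
from collections import Counter


def quadratic_weighted_kappa(r1, r2, categories):
    """Cohen's weighted kappa with Fleiss-Cohen (quadratic) weights.

    r1, r2     : scores for the same observations from two raters or sessions
    categories : ordered list of all possible scores, e.g. [0, 1, 2, 3]
    """
    n, k = len(r1), len(categories)
    if n == 0 or n != len(r2) or k < 2:
        raise ValueError("need equal-length, non-empty ratings and >= 2 categories")
    pos = {c: i for i, c in enumerate(categories)}
    observed = Counter((pos[a], pos[b]) for a, b in zip(r1, r2))
    marg1 = Counter(pos[a] for a in r1)
    marg2 = Counter(pos[b] for b in r2)
    obs_dis = exp_dis = 0.0
    for i in range(k):
        for j in range(k):
            # Disagreement weight = 1 - Fleiss-Cohen agreement weight
            d = (i - j) ** 2 / (k - 1) ** 2
            obs_dis += d * observed[(i, j)] / n
            exp_dis += d * (marg1[i] / n) * (marg2[j] / n)
    if exp_dis == 0:
        raise ValueError("kappa is undefined when expected disagreement is zero")
    return 1.0 - obs_dis / exp_dis


# Hypothetical scores for one item, initial vs. follow-up scoring session
session1 = [3, 2, 2, 1, 0, 3, 2, 1]
session2 = [3, 2, 1, 1, 0, 3, 3, 1]
print(round(quadratic_weighted_kappa(session1, session2, [0, 1, 2, 3]), 2))  # 0.88
```

An established implementation with a `weights="quadratic"` option is available in scikit-learn (`cohen_kappa_score`) and would be a reasonable cross-check.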
Comparison of scoring approaches for the NEI VFQ-25 in low vision.
Dougherty, Bradley E; Bullimore, Mark A
2010-08-01
The aim of this study was to evaluate different approaches to scoring the National Eye Institute Visual Functioning Questionnaire-25 (NEI VFQ-25) in patients with low vision, including scoring by the standard method, by Rasch analysis, and by an algorithm created by Massof to approximate the Rasch person measure. Subscale validity and the use of a 7-item short-form instrument proposed by Ryan et al. were also investigated. NEI VFQ-25 data from 50 patients with low vision were analyzed using the standard method of summing Likert-type scores and calculating an overall average, Rasch analysis using Winsteps software, and the Massof algorithm in Excel. Correlations between scores were calculated. Rasch person separation reliability and other indicators were calculated to determine the validity of the subscales and of the 7-item instrument. Scores calculated using all three methods were highly correlated, but evidence of floor and ceiling effects was found with the standard scoring method. None of the subscales investigated proved valid. The 7-item instrument showed acceptable person separation reliability and good targeting and item performance. Although standard scores and Rasch scores are highly correlated, Rasch analysis has the advantages of eliminating floor and ceiling effects and producing interval-scaled data. The Massof algorithm for approximating the Rasch person measure performed well in this group of low-vision patients. The validity of the VFQ-25 subscales should be reconsidered.
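Rasch person separation reliability compares the spread of person measures with their measurement error. The formula is (observed variance - mean squared standard error) / observed variance. The related separation index is the error-adjusted SD divided by the root mean square error. The following is a minimal sketch under the assumption that person measures and their standard errors have already been estimated. It uses the population variance, although software conventions may differ. The function name and values are hypothetical.

```python
from statistics import pvariance


def person_separation(measures, standard_errors):
    """Rasch person separation reliability and separation index.

    measures        : person measures in logits
    standard_errors : model standard error of each person measure
    """
    observed_var = pvariance(measures)                     # observed variance
    error_var = sum(se ** 2 for se in standard_errors) / len(standard_errors)
    true_var = max(observed_var - error_var, 0.0)          # error-adjusted variance
    reliability = true_var / observed_var
    separation = (true_var / error_var) ** 0.5             # true SD / RMSE
    return reliability, separation


rel, sep = person_separation([-2.0, -1.0, 0.0, 1.0, 2.0], [0.5] * 5)
print(f"reliability={rel:.3f} separation={sep:.2f}")  # reliability=0.875 separation=2.65
```

The two indices carry the same information, since reliability = separation^2 / (1 + separation^2).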
[Development of competency to stand trial rating scale in offenders with mental disorders].
Chen, Xiao-Bing; Cai, Wei-Xiong
2013-04-01
To develop, in accordance with the Chinese legal system, a rating scale for competency to stand trial in offenders with mental disorders. Based on the relevant legal elements, 15 items were extracted and formulated into a preliminary instrument, the competency to stand trial rating scale for offenders with mental disorders. The item analysis covered six aspects: critical ratio, item-total correlation, corrected item-total correlation, alpha if item deleted, item communalities, and factor loadings. A logistic regression equation and a cut-off score from the ROC curve were used to explore diagnostic efficiency. Critical ratios for the extreme groups ranged from 18.390 to 46.763; item-total correlations, 0.639-0.952; corrected item-total correlations, 0.582-0.944; communalities, 0.377-0.916; and factor loadings, 0.614-0.957. Seven items were included in the regression equation, and the accuracy of the back-substitution test was 96.0%. A score of 33 was identified as the cut-off from the fitted ROC curve, and its agreement with expert assessment was 95.8%. Sensitivity and specificity were 0.938 and 0.966, respectively, and the positive and negative likelihood ratios were 27.67 and 0.06, respectively. With all items satisfying the requirements of the homogeneity test, the rating scale has a reasonable structure and excellent diagnostic efficiency.
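The likelihood ratios above follow from sensitivity and specificity: LR+ = sensitivity / (1 - specificity) and LR- = (1 - sensitivity) / specificity. The following is a minimal sketch using the reported values. The function name is hypothetical.

```python
def likelihood_ratios(sensitivity, specificity):
    """Positive and negative likelihood ratios of a dichotomized test."""
    if not (0 < sensitivity < 1 and 0 < specificity < 1):
        raise ValueError("sensitivity and specificity must lie strictly between 0 and 1")
    lr_pos = sensitivity / (1.0 - specificity)   # P(positive | case) / P(positive | non-case)
    lr_neg = (1.0 - sensitivity) / specificity   # P(negative | case) / P(negative | non-case)
    return lr_pos, lr_neg


lr_pos, lr_neg = likelihood_ratios(0.938, 0.966)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")  # LR+ = 27.59, LR- = 0.06
```

The published LR+ is 27.67. The small gap from 27.59 presumably reflects rounding of the published sensitivity and specificity to three decimals, because LR+ is very sensitive to 1 - specificity when specificity is close to 1.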
Tavakoli, Hamid Reza; Dini-Talatappeh, Hossein; Rahmati-Najarkolaei, Fatemeh; Gholami Fesharaki, Mohammad
2016-11-01
Using various models of behavior change, a number of studies in nutrition education have confirmed that nutrition habits and behaviors can be improved. This study sought to determine the effects of education on patterns of dietary consumption among medical students at the Military University of Tehran, with a view to correcting those patterns. In this quasi-experimental study, 242 medical students from the Military University of Tehran were chosen by convenience sampling and then divided into control (n = 107) and intervention (n = 135) groups by block randomization. The self-administered questionnaire, covering six categories of items (knowledge, perceived benefits, perceived barriers, perceived threats, self-efficacy, and behavior), was validated (Cronbach's alpha > 0.7 for each). Following the educational intervention, the mean scores for knowledge, health belief model (HBM) constructs, and behavior related to healthy patterns of food intake increased significantly (P < 0.05). Mean (SD) pre-intervention scores were: knowledge 6.76 (1.452), perceived threats 2.93 (1.147), perceived benefits 7.28 (1.07), perceived barriers 5.44 (1.831), self-efficacy 4.28 (1.479), and behavior 8.84 (2.527). Post-intervention scores were: knowledge 8.3 (1.503), perceived threats 3.29 (1.196), perceived benefits 7.71 (0.762), perceived barriers 5.9 (1.719), self-efficacy 4.6 (1.472), and behavior 9.45 (2.324). The differences in mean knowledge, HBM construct, and behavior scores before and after the educational intervention were significant (P ≤ 0.05). The significant improvement in the experimental group's mean knowledge, HBM construct, and behavior scores indicates a positive effect of the intervention.
Paz, Sylvia H; Spritzer, Karen L; Reise, Steven P; Hays, Ron D
2017-06-01
About 70% of Latinos aged 5 years or older in the United States speak Spanish at home. Measurement equivalence of the PROMIS® pain interference (PI) item bank by language of administration (English versus Spanish) has not been evaluated. A sample of 527 adult Spanish-speaking Latinos completed the Spanish version of the 41-item PROMIS® pain interference item bank. We evaluated the dimensionality, monotonicity, and local independence of the Spanish-language items. We then evaluated differential item functioning (DIF) using ordinal logistic regression, with item response theory scores estimated from DIF-free "anchor" items. One of the 41 items in the Spanish version of the PROMIS® PI item bank was identified as having significant uniform DIF: English- and Spanish-speaking subjects with the same level of pain interference responded differently to this item. This item was not retained due to proprietary issues. The original English-language item parameters can be used when estimating PROMIS® PI scores.
Harasym, Peter H; Woloschuk, Wayne; Cunning, Leslie
2008-12-01
Physician-patient communication is a clinical skill that can be learned and has a positive impact on patient satisfaction and health outcomes. A concerted effort at all medical schools is now directed at teaching and evaluating this core skill. Student communication skills are often assessed by an Objective Structured Clinical Examination (OSCE). However, it is unknown what sources of error variance are introduced into examinee communication scores by various OSCE components. This study primarily examined the effect different examiners had on the evaluation of students' communication skills assessed at the end of a family medicine clerkship rotation. The communication performance of clinical clerks from the Classes of 2005 and 2006 was assessed using six OSCE stations. Performance was rated at each station using the 28-item Calgary-Cambridge guide. Item Response Theory analysis using a multifaceted Rasch model was used to partition the various sources of error variance and generate a "true" communication score from which the effects of examiner, case, and items are removed. Variance and reliability of scores were as follows: communication scores (.20 and .87), examiner stringency/leniency (.86 and .91), case (.03 and .96), and item (.86 and .99), respectively. All facet scores were reliable (.87-.99). Examiner variance (.86) was more than four times the examinee variance (.20). About 11% of the clerks' outcome status shifted using "true" rather than observed/raw scores. There was large variability in examinee scores due to variation in examiner stringency/leniency behaviors that may impact pass-fail decisions. Exploring the benefits of examiner training and employing "true" scores generated using Item Response Theory analyses prior to making pass/fail decisions are recommended.
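In a many-facet Rasch model, each facet (examinee ability, item difficulty, case difficulty, and examiner severity) enters the log-odds of adjacent rating categories additively. This is why a stringent examiner lowers the expected rating for the same examinee. The following is a minimal sketch of the rating-scale form of that response function only. Estimating the facets, as software such as FACETS does, is not shown. The function names, the 4-category scale, and the thresholds are hypothetical, since the abstract does not give the Calgary-Cambridge rating categories.

```python
import math


def mfrm_category_probs(ability, item, case, examiner, thresholds):
    """Category probabilities under a many-facet rating-scale Rasch model.

    log(P_k / P_{k-1}) = ability - item - case - examiner - thresholds[k-1]
    A more stringent examiner has a larger `examiner` value.
    """
    base = ability - item - case - examiner
    logits = [0.0]                      # unnormalized log-probability of category 0
    for tau in thresholds:
        logits.append(logits[-1] + base - tau)
    m = max(logits)                     # subtract the max for numerical stability
    expo = [math.exp(v - m) for v in logits]
    total = sum(expo)
    return [e / total for e in expo]


def expected_rating(probs):
    """Model-expected rating (categories numbered 0, 1, 2, ...)."""
    return sum(k * p for k, p in enumerate(probs))


thresholds = [-1.0, 0.0, 1.0]   # hypothetical 4-category rating scale
lenient = mfrm_category_probs(0.5, 0.0, 0.0, -0.5, thresholds)
strict = mfrm_category_probs(0.5, 0.0, 0.0, 0.5, thresholds)
print(round(expected_rating(lenient), 2), round(expected_rating(strict), 2))
```

A "true" score in the abstract's sense corresponds to reading off ability with the examiner, case, and item terms held at common reference values instead of at the particular examiner the student happened to draw.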
Khan, Arif; Durgam, Suresh; Tang, Xiongwen; Ruth, Adam; Mathews, Maju; Gommoll, Carl P.
2016-01-01
Objective To investigate vilazodone, currently approved for major depressive disorder in adults, for generalized anxiety disorder (GAD). Method Three randomized, double-blind, placebo-controlled studies showing positive results for vilazodone (20-40 mg/d) in adult patients with GAD (DSM-IV-TR) were pooled for analyses; data were collected from June 2012 to March 2014. Post hoc outcomes in the pooled intent-to-treat population (n = 1,462) included mean change from baseline to week 8 in Hamilton Anxiety Rating Scale (HARS) total score, psychic and somatic anxiety subscale scores, and individual item scores; HARS response (≥ 50% total score improvement) and remission (total score ≤ 7) at week 8; and category shifts, defined as HARS item score ≥ 2 at baseline (moderate to very severe symptoms) and score of 0 at week 8 (no symptoms). Results The least squares mean difference was statistically significant for vilazodone versus placebo in change from baseline to week 8 in HARS total score (−1.83, P < .0001) and in psychic anxiety (−1.21, P < .0001) and somatic anxiety (−0.63, P < .01) subscale scores; differences from placebo were significant on 11 of 14 HARS items (P < .05). Response rates were higher with vilazodone than placebo (48% vs 39%, P < .001), as were remission rates (27% vs 21%, P < .01). The percentage of patients who shifted to no symptoms was significant for vilazodone on several items: anxious mood, tension, intellectual, depressed mood, somatic-muscular, somatic-sensory, cardiovascular, respiratory, and autonomic symptoms (P < .05). Conclusions Treatment with vilazodone versus placebo was effective in adult GAD patients, with significant differences between treatment groups found on both psychic and somatic HARS items. Trial Registration ClinicalTrials.gov identifiers: NCT01629966, NCT01766401, NCT01844115. PMID:27486544
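The responder, remitter, and category-shift definitions in this abstract are precise enough to state as code. The following is a minimal sketch that applies them to one patient's HARS scores. The function names and example scores are hypothetical.

```python
def classify_hars(baseline_total, week8_total):
    """Apply the abstract's response and remission definitions to HARS totals."""
    if baseline_total <= 0:
        raise ValueError("baseline total must be positive to compute % improvement")
    improvement = (baseline_total - week8_total) / baseline_total
    return {
        "response": improvement >= 0.50,   # >= 50% total score improvement
        "remission": week8_total <= 7,     # total score <= 7 at week 8
    }


def shifted_to_no_symptoms(baseline_item, week8_item):
    """Category shift: item score >= 2 at baseline and 0 at week 8."""
    return baseline_item >= 2 and week8_item == 0


print(classify_hars(24, 12))           # {'response': True, 'remission': False}
print(shifted_to_no_symptoms(3, 0))    # True
```

The two totals-based outcomes are not nested. A patient with a low baseline can remit without responding, and a patient with a high baseline can respond without remitting.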
Heerman, William J; Lounds-Taylor, Julie; Mitchell, Stephanie; Barkin, Shari L
2018-01-01
Understanding the contribution of parental feeding practices to childhood obesity among Latino children is a solution-oriented approach that can lead to interventions supporting healthy childhood growth and lowering rates of obesity. The purpose of this study was to confirm the reliability and validity of the Toddler Feeding Questionnaire (TFQ) to measure parental feeding practices among a sample of Spanish-speaking parent-preschool child pairs (n = 529), and to test the hypothesis that parent characteristics of body mass index (BMI), stress, and health literacy are associated with more indulgent and less authoritative feeding practices. Standardized parent-report questionnaires were completed during baseline interviews in a randomized controlled trial of an obesity prevention intervention. The TFQ includes subscales for indulgent practices (11 items), authoritative practices (7 items), and environmental influences (6 items) with response options scored on a 5-point Likert scale and averaged. Factor analysis confirmed a three-factor structure. Internal consistency was good for indulgent (α = 0.66) and authoritative (α = 0.65) practices but lower for environmental (α = 0.48). Spearman correlation showed indulgent practices and environmental influences were associated with unhealthy child diet patterns, whereas authoritative practices were associated with a healthier child diet. Multivariate linear regression showed higher parent stress was associated with higher indulgent and lower authoritative scores; higher parent health literacy was positively associated with indulgent scores. These results indicate the TFQ is a valid measure of authoritative and indulgent parent feeding practices among Spanish-speaking parents of preschool-age children and that stress and health literacy, potentially modifiable parent characteristics, could be targeted to support healthy feeding practices. Copyright © 2017 Elsevier Inc. All rights reserved.
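The internal-consistency values reported above (α) are Cronbach's alpha: α = k/(k - 1) × (1 - sum of item variances / variance of total scores). The following is a minimal sketch using sample variances. The function name and responses are hypothetical.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha; rows holds one list of item scores per respondent."""
    k = len(rows[0])
    if k < 2 or len(rows) < 2 or any(len(r) != k for r in rows):
        raise ValueError("need >= 2 respondents and >= 2 items, same count per row")
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


responses = [          # hypothetical 5-point Likert responses to 3 items
    [4, 5, 4],
    [2, 3, 2],
    [3, 3, 4],
    [5, 4, 5],
    [1, 2, 2],
]
print(round(cronbach_alpha(responses), 2))
```

Alpha is unchanged by averaging rather than summing items, because rescaling the total rescales numerator and denominator alike. The TFQ's averaged subscale scores therefore give the same α as summed ones.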
Checklist content on a standardized patient assessment: an ex post facto review.
Boulet, John R; van Zanten, Marta; de Champlain, André; Hawkins, Richard E; Peitzman, Steven J
2008-03-01
While checklists are often used to score standardized patient-based clinical assessments, little research has focused on issues related to their development or the level of agreement with respect to the importance of specific items. Five physicians independently reviewed checklists from 11 simulation scenarios that were part of the former Educational Commission for Foreign Medical Graduates' Clinical Skills Assessment and classified the clinical appropriateness of each of the checklist items. Approximately 78% of the original checklist items were judged to be needed, or indicated, given the presenting complaint and the purpose of the assessment. Rater agreement was relatively poor, with pairwise associations (kappa coefficients) ranging from 0.09 to 0.29. However, when only consensus indicated items were included, there was little change in examinee scores, including their reliability over encounters. Although most checklist items in this sample were judged to be appropriate, some could potentially be eliminated, thereby minimizing the scoring burden placed on the standardized patients. Periodic review of checklist items, concentrating on their clinical importance, is warranted.
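The agreement figures above are pairwise kappa coefficients. Five raters give 5 × 4 / 2 = 10 rater pairs. The following is a minimal sketch of unweighted Cohen's kappa over all pairs for binary "indicated / not indicated" judgments. The abstract does not say exactly how its kappas were computed. The function names and judgments are hypothetical.

```python
from itertools import combinations


def cohen_kappa(a, b):
    """Unweighted Cohen's kappa for two raters' labels on the same items."""
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("label lists must be non-empty and of equal length")
    po = sum(x == y for x, y in zip(a, b)) / n                       # observed agreement
    pe = sum((a.count(c) / n) * (b.count(c) / n) for c in set(a) | set(b))  # chance
    if pe == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (po - pe) / (1 - pe)


def pairwise_kappas(ratings):
    """ratings: dict mapping rater -> labels for the same checklist items."""
    return {(r1, r2): cohen_kappa(ratings[r1], ratings[r2])
            for r1, r2 in combinations(sorted(ratings), 2)}


judgments = {   # hypothetical "yes" = item indicated
    "A": ["yes", "yes", "no", "yes", "no", "yes"],
    "B": ["yes", "no", "no", "yes", "yes", "yes"],
    "C": ["yes", "yes", "yes", "yes", "no", "no"],
    "D": ["no", "yes", "no", "yes", "no", "yes"],
    "E": ["yes", "yes", "no", "no", "no", "yes"],
}
kappas = pairwise_kappas(judgments)
print(len(kappas), "pairs; range",
      round(min(kappas.values()), 2), "to", round(max(kappas.values()), 2))
```

When most items are judged "indicated" (about 78% here), chance agreement is high. Kappa can then stay low even when raw percent agreement looks respectable.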
Rasch Analysis of the General Self-Efficacy Scale in Workers with Traumatic Limb Injuries.
Wu, Tzu-Yi; Yu, Wan-Hui; Huang, Chien-Yu; Hou, Wen-Hsuan; Hsieh, Ching-Lin
2016-09-01
Purpose The purpose of this study was to apply Rasch analysis to examine the unidimensionality and reliability of the General Self-Efficacy Scale (GSE) in workers with traumatic limb injuries. Furthermore, if the items of the GSE fitted the Rasch model's assumptions, we transformed the raw sum ordinal scores of the GSE into Rasch interval scores. Methods A total of 1076 participants completed the GSE at 1 month post injury. Rasch analysis was used to examine the unidimensionality and person reliability of the GSE. The unidimensionality of the GSE was verified by determining whether the items fit the Rasch model's assumptions: (1) item fit indices: infit and outfit mean square (MNSQ) ranged from 0.6 to 1.4; and (2) the eigenvalue of the first factor extracted from principal component analysis (PCA) for residuals was <2. Person reliability was calculated. Results The unidimensionality of the 10-item GSE was supported in terms of good item fit statistics (infit and outfit MNSQ ranging from 0.92 to 1.32) and acceptable eigenvalues (1.6) of the first factor of the PCA, with person reliability = 0.89. Consequently, the raw sum scores of the GSE were transformed into Rasch scores. Conclusions The results indicated that the items of GSE are unidimensional and have acceptable person reliability in workers with traumatic limb injuries. Additionally, the raw sum scores of the GSE can be transformed into Rasch interval scores for prospective users to quantify workers' levels of self-efficacy and to conduct further statistical analyses.
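The item-fit criterion above (infit and outfit MNSQ between 0.6 and 1.4) can be illustrated with a short sketch. Outfit is the mean of squared standardized residuals. Infit weights the squared residuals by their model variances, so it is less affected by unexpected responses from persons far from the item. For brevity, this sketch assumes dichotomous items and already-estimated person and item measures, whereas the GSE has four response categories and the study estimated everything jointly. The function name and values are hypothetical.

```python
import math


def rasch_fit(responses, thetas, b):
    """Infit and outfit mean-squares for one dichotomous Rasch item.

    responses: 0/1 answers, one per person
    thetas   : person measures in logits
    b        : item difficulty in logits
    """
    sq_resid = var_sum = z2_sum = 0.0
    for x, theta in zip(responses, thetas):
        p = 1.0 / (1.0 + math.exp(-(theta - b)))   # model probability of a 1
        w = p * (1.0 - p)                          # model variance of the response
        sq_resid += (x - p) ** 2
        var_sum += w
        z2_sum += (x - p) ** 2 / w                 # squared standardized residual
    outfit = z2_sum / len(responses)
    infit = sq_resid / var_sum                     # information-weighted
    return infit, outfit


thetas = [-1.5, -0.5, 0.0, 0.5, 1.5]
infit, outfit = rasch_fit([0, 0, 1, 1, 1], thetas, b=0.2)
misfit = not (0.6 <= infit <= 1.4 and 0.6 <= outfit <= 1.4)
print(f"infit={infit:.2f} outfit={outfit:.2f} misfit={misfit}")
```

Both statistics have an expected value of about 1 when the data fit the model. Values well above 1 signal noise, and values well below 1 signal overly predictable responses.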
Yost, Kathleen J; Waller, Niels G; Lee, Minji K; Vincent, Ann
2017-06-01
Efficient management of fibromyalgia (FM) requires precise measurement of FM-specific symptoms. Our objective was to assess the measurement properties of the Patient-Reported Outcome Measurement Information System (PROMIS) fatigue item bank (FIB) in people with FM. We applied classical psychometric and item response theory methods to cross-sectional PROMIS-FIB data from two samples. Data on the clinical FM sample were obtained at a tertiary medical center. Data for the U.S. general population sample were obtained from the PROMIS network. The full 95-item bank was administered to both samples. We investigated the dimensionality of the item bank in both samples by separately fitting a bifactor model with two group factors: experience and impact. We assessed measurement invariance between samples, and we explored an alternative factor structure in the normative sample and subsequently confirmed that structure in the clinical sample. Finally, we assessed whether reporting FM subdomain scores added value over reporting a single total score. The item bank was dominated by a general fatigue factor. The fit of the initial bifactor model and the evidence of measurement invariance indicated that the same constructs were measured across the samples. An alternative bifactor model with three group factors demonstrated slightly improved fit. Subdomain scores added value over a total score. We demonstrated that the PROMIS-FIB is appropriate for measuring fatigue in clinical samples of FM patients. The construct can be represented by a single score; however, subdomain scores for the three group factors identified in the alternative model may also be reported.
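One common way to quantify how strongly a general factor "dominates" a bifactor solution is omega-hierarchical. This is the share of total-score variance attributable to the general factor. The abstract does not say whether the authors used it. The following is a minimal sketch assuming standardized items and orthogonal factors. The function name and loadings are hypothetical.

```python
def omega_hierarchical(general, specific, groups):
    """Omega-hierarchical from standardized bifactor loadings.

    general : general-factor loading of each item
    specific: group-factor loading of each item
    groups  : group-factor label of each item
    Assumes orthogonal factors and standardized items, so each item's
    unique variance is 1 - general**2 - specific**2.
    """
    g_sq = sum(general) ** 2                       # general-factor variance of the sum
    group_sums = {}
    for lam, grp in zip(specific, groups):
        group_sums[grp] = group_sums.get(grp, 0.0) + lam
    s_sq = sum(v ** 2 for v in group_sums.values())  # group-factor variance of the sum
    unique = sum(1 - g ** 2 - s ** 2 for g, s in zip(general, specific))
    return g_sq / (g_sq + s_sq + unique)


general = [0.7, 0.6, 0.7, 0.6, 0.65, 0.7]
specific = [0.3, 0.4, 0.2, 0.3, 0.35, 0.25]
groups = ["experience"] * 3 + ["impact"] * 3
print(round(omega_hierarchical(general, specific, groups), 3))
```

A high omega-hierarchical supports reporting a single total score. Whether subdomain scores "add value" beyond that score is a separate question, often addressed with indices such as omega-hierarchical-subscale or PRMSE-based criteria.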
Perceived Perfectionism from God Scale: Development and Initial Evidence.
Wang, Kenneth T; Allen, G E Kawika; Stokes, Hannah I; Suh, Han Na
2017-05-03
In this study, the Perceived Perfectionism from God Scale (PPGS) was developed with Latter-day Saints (Mormons) across two samples. Sample 1 (N = 421) was used for EFA to select items for the Perceived Standards from God (5 items) and Perceived Discrepancy from God (5 items) subscales. Sample 2 (N = 420) was used for CFA and cross-validated the 2-factor oblique model as well as a bifactor model. Perceived Standards from God scores had Cronbach alphas ranging from .73 to .78, and Perceived Discrepancy from God scores had Cronbach alphas ranging from .82 to .84. Standards from God scores were positively correlated with positive affect, whereas Discrepancy from God scores were positively correlated with negative affect, shame, and guilt. Moreover, the two PPGS subscale scores explained significant incremental variance in predicting associated variables over and above the corresponding personal perfectionism scores.
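"Incremental variance over and above" usually means the change in R² when a predictor block is added to a hierarchical regression. The following is a minimal sketch of that ΔR² using ordinary least squares with an intercept. Its significance is typically tested with a partial F test, which is not shown. The function names and data are hypothetical, and the abstract does not specify the authors' exact models.

```python
import numpy as np


def r_squared(X, y):
    """R^2 of an OLS fit of y on X, with an intercept column added."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    centered = y - y.mean()
    return 1.0 - (resid @ resid) / (centered @ centered)


def delta_r_squared(base, added, y):
    """Increase in R^2 when `added` predictors join the `base` predictors."""
    base, added, y = (np.asarray(a, float) for a in (base, added, y))
    r2_base = r_squared(base, y)
    r2_full = r_squared(np.column_stack([base, added]), y)
    return r2_full - r2_base


personal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]   # hypothetical personal perfectionism
god_disc = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 6.0]   # hypothetical PPGS subscale score
outcome = [1.2, 1.9, 3.5, 3.6, 5.4, 5.1, 7.9, 7.0]    # hypothetical criterion
print(round(delta_r_squared(personal, god_disc, outcome), 3))
```

Entering personal perfectionism first is what makes the PPGS contribution "incremental". Reversing the order would answer a different question.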
Dellinges, Mark A; Curtis, Donald A
2017-08-01
Faculty members are expected to write high-quality multiple-choice questions (MCQs) in order to accurately assess dental students' achievement. However, most dental school faculty members are not trained to write MCQs. Extensive faculty development programs have been used to help educators write better test items. The aim of this pilot study was to determine whether a short workshop would improve MCQ item-writing by dental school faculty at one U.S. dental school. A total of 24 dental school faculty members who had previously written MCQs were randomized into a no-intervention group and an intervention group in 2015. Six previously written MCQs were randomly selected from each faculty member and given an item quality score. The intervention group participated in a one-hour training session that focused on reviewing standard item-writing guidelines to improve in-house MCQs. The no-intervention group did not receive any training but did receive encouragement and an explanation of why good MCQ writing is important. The faculty members were then asked to revise their previously written questions, and these were given an item quality score. The item quality scores for each faculty member were averaged, and the difference between pre-training and post-training scores was evaluated. The results showed a significant pre- to post-training difference in MCQ quality scores for the intervention group (p = 0.04). This pilot study provides evidence that a short training session was effective in improving the quality of in-house MCQs.
ERIC Educational Resources Information Center
Wei, Youhua; Morgan, Rick
2016-01-01
As an alternative to common-item equating when common items do not function as expected, the single-group growth model (SGGM) scaling uses common examinees or repeaters to link test scores on different forms. The SGGM scaling assumes that, for repeaters taking adjacent administrations, the conditional distribution of scale scores in later…
ERIC Educational Resources Information Center
Kim, Sooyeon; Robin, Frederic
2017-01-01
In this study, we examined the potential impact of item misfit on the reported scores of an admission test from the subpopulation invariance perspective. The target population of the test consisted of three major subgroups from different geographic regions. We used the logistic regression function to estimate item parameters of the operational items…
On an Extension of the Rasch Model to the Case of Polychotomously Scored Items.
ERIC Educational Resources Information Center
Vogt, Dorothee K.
The Rasch model for the probability of a person's response to an item is extended to the case where this response depends on a set of scoring or category weights, in addition to person and item parameters. The maximum likelihood approach introduced by Wright for the dichotomous case is applicable here also, and it is shown to yield a unique…
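To make the idea of category weights concrete, here is one simple way to write a polytomous Rasch-type model, assuming each response category k carries a fixed scoring weight w_k and each item a single location beta, so that P(X = k) is proportional to exp(w_k (theta - beta)). This is an illustrative form under those assumptions, not necessarily Vogt's exact parametrization; `category_probs` is a hypothetical helper.

```python
import math

def category_probs(theta, beta, weights):
    """P(X = k) for each category k under P(X = k) ∝ exp(w_k * (theta - beta))."""
    logits = [w * (theta - beta) for w in weights]
    m = max(logits)  # subtract the largest logit for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

# With weights (0, 1) the model reduces to the dichotomous Rasch model.
print(category_probs(theta=0.5, beta=0.0, weights=[0, 1]))
# A four-category item with equally spaced weights.
print(category_probs(theta=1.0, beta=0.2, weights=[0, 1, 2, 3]))
```

Raising theta shifts probability toward categories with larger weights, which is how the weights encode the ordering of responses.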
ERIC Educational Resources Information Center
Moses, Tim; Miao, Jing; Dorans, Neil
2010-01-01
This study compared the accuracies of four differential item functioning (DIF) estimation methods, where each method makes use of only one of the following: raw data, logistic regression, loglinear models, or kernel smoothing. The major focus was on the estimation strategies' potential for estimating score-level, conditional DIF. A secondary focus…
Reduced-Item Food Audits Based on the Nutrition Environment Measures Surveys.
Partington, Susan N; Menzies, Tim J; Colburn, Trina A; Saelens, Brian E; Glanz, Karen
2015-10-01
The community food environment may contribute to obesity by influencing food choice. Store and restaurant audits are increasingly common methods for assessing food environments, but they are time consuming and costly. A valid, reliable brief measurement tool is needed. The purpose of this study was to develop and validate reduced-item food environment audit tools for stores and restaurants. Nutrition Environment Measures Surveys for stores (NEMS-S) and restaurants (NEMS-R) were completed in 820 stores and 1,795 restaurants in West Virginia, San Diego, and Seattle. Data mining techniques (correlation-based feature selection and linear regression) were used to identify survey items highly correlated with total survey scores and to produce reduced-item audit tools that were subsequently validated against the full NEMS surveys. Regression coefficients were used as weights applied to the reduced-item tool items to generate scores comparable to those of the full NEMS surveys. Data were collected and analyzed in 2008-2013. The reduced-item tools included eight items for grocery, ten for convenience, seven for variety, and five for other stores; and 16 items for sit-down, 14 for fast casual, 19 for fast food, and 13 for specialty restaurants, amounting to 10% of the full NEMS-S and 25% of the full NEMS-R. There were no significant differences in median scores for varying types of retail food outlets when compared to the full survey scores. Median in-store audit time was reduced by 25%-50%. Reduced-item audit tools can reduce the burden and complexity of large-scale or repeated assessments of the retail food environment without compromising measurement quality. Copyright © 2015 American Journal of Preventive Medicine. Published by Elsevier Inc. All rights reserved.
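The weighting step can be sketched as an ordinary least-squares fit of full-survey totals on the retained items, with the fitted coefficients then applied to new reduced-item audits. This is a minimal sketch assuming the item subset has already been chosen (the correlation-based feature selection step is not shown); `fit_item_weights`, `weighted_score`, and the simulated data are hypothetical.

```python
import numpy as np

def fit_item_weights(reduced_items, full_scores):
    """Least-squares intercept and weights mapping reduced-item responses
    to full-survey total scores."""
    reduced_items = np.asarray(reduced_items, dtype=float)
    X = np.column_stack([np.ones(len(reduced_items)), reduced_items])
    coef, *_ = np.linalg.lstsq(X, np.asarray(full_scores, dtype=float), rcond=None)
    return coef

def weighted_score(coef, items):
    """Apply fitted weights to one or more new reduced-item audits."""
    items = np.atleast_2d(np.asarray(items, dtype=float))
    return coef[0] + items @ coef[1:]

rng = np.random.default_rng(0)
reduced = rng.integers(0, 4, size=(50, 3))  # 50 hypothetical stores, 3 retained items
# Simulated full-survey totals that depend linearly on the retained items, plus noise.
full = 5.0 + reduced @ np.array([2.0, 1.5, 3.0]) + rng.normal(0.0, 0.5, 50)
coef = fit_item_weights(reduced, full)
print(np.round(coef, 2))
print(weighted_score(coef, [1, 1, 1]))
```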
Vivat, B; Young, T E; Winstanley, J; Arraras, J I; Black, K; Boyle, F; Bredart, A; Costantini, A; Guo, J; Irarrazaval, M E; Kobayashi, K; Kruizinga, R; Navarro, M; Omidvari, S; Rohde, G E; Serpentini, S; Spry, N; Van Laarhoven, H W M; Yang, G M
2017-11-01
The EORTC Quality of Life Group has just completed the final phase (field-testing and validation) of an international project to develop a stand-alone measure of spiritual well-being (SWB) for palliative cancer patients. Participants (n = 451; from 14 countries on four continents; 54% female; 188 Christian, 50 Muslim, 156 with no religion) completed a provisional 36-item measure of SWB plus the EORTC QLQ-C15-PAL (PAL), then took part in a structured debriefing interview. All items showed good score distribution across response categories. We assessed scale structure using principal component analysis and Rasch analysis, and explored construct validity and convergent/divergent validity with the PAL. Twenty-two items in four scoring scales (Relationship with Self, Relationships with Others, Relationship with Someone or Something Greater, and Existential) explained 53% of the variance. The measure also includes a global SWB item and nine other items. Scores on the PAL global quality-of-life item and Emotional Functioning scale correlated weakly to moderately with scores on the global SWB item and two of the four SWB scales. This new validated 32-item SWB measure addresses a distinct aspect of quality of life and is now available for use in research and clinical practice, with a role as both a measurement and an intervention tool. © 2017 John Wiley & Sons Ltd.
Silverstein, Michael J; Faraone, Stephen V; Alperin, Samuel; Leon, Terry L; Biederman, Joseph; Spencer, Thomas J; Adler, Lenard A
2018-02-01
The aim of this study is to validate the Adult ADHD Self-Report Scale (ASRS) and Adult ADHD Investigator Symptom Rating Scale (AISRS) expanded versions, including executive function deficits (EFDs) and emotional dyscontrol (EC) items, and to present ASRS and AISRS pilot normative data. Two patient samples (referred and primary care physician [PCP] controls) were pooled together for these analyses. Final analysis included 297 respondents, 171 with adult ADHD. Cronbach's alphas were high for all sections of the scales. Histograms of ASRS 31-item and AISRS 18-item total scores for ADHD controls yielded 95% cutoff scores of 70 and 23, respectively; histograms for the pilot normative sample suggest cutoffs of 82 and 26, respectively. (a) The ASRS and AISRS expanded versions have high validity in the assessment of the 18 core adult ADHD Diagnostic and Statistical Manual of Mental Disorders (DSM) symptoms and of EFD and EC symptoms. (b) ASRS (31-item) scores from 70 to 82 and AISRS (18-item) scores from 23 to 26 suggest a high likelihood of adult ADHD.
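Two quantities in this abstract are straightforward to compute directly: Cronbach's alpha for internal consistency, and a cutoff taken from a control distribution. A minimal sketch, assuming the "95% cutoff" is the 95th percentile of control total scores (the abstract does not define it); the responses below are hypothetical.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for a respondents x items score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    sum_item_var = items.var(axis=0, ddof=1).sum()   # sum of item variances
    total_var = items.sum(axis=1).var(ddof=1)        # variance of total scores
    return k / (k - 1) * (1.0 - sum_item_var / total_var)

def cutoff_95(control_totals):
    """Total score at the 95th percentile of a reference (control) distribution."""
    return float(np.percentile(control_totals, 95))

# Hypothetical 4-item responses (0-4 scale) from five respondents.
responses = [[3, 4, 3, 4], [1, 1, 2, 1], [2, 3, 2, 2], [4, 4, 4, 3], [0, 1, 0, 1]]
print(round(cronbach_alpha(responses), 3))
print(cutoff_95(np.arange(1, 101)))
```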
[Elaboration of an ethogram for the diagnosis of the A Pattern in coronary pathology].
Etienne, T; Isingrini, M; Benhamou, M; Tichet, F; Raynaud, P; Brochier, M
1990-09-01
In order to develop a technique for detecting Pattern A (PA), we present in this paper a series of steps for constructing an observation grid (ethogram) that allows behavior to be quantified during a structured interview. The behavioral units making up the final ethogram are derived from inter-item correlations taken from a population of 48 subjects who had suffered heart attacks. The observations on this population allow an inclusion score for PA to be assigned. These observations also confirm that PA is a risk factor independent of classical risk factors. A significant positive correlation with work stress was found, showing, in accordance with the view of Friedman and Rosenman, that PA corresponds to a particular behavioral pattern that depends on the work environment.
Analyzing force concept inventory with item response theory
NASA Astrophysics Data System (ADS)
Wang, Jing; Bao, Lei
2010-10-01
Item response theory is a popular assessment method used in education. It rests on the assumption of a probability framework that relates students' innate ability and their performance on test questions. Item response theory transforms students' raw test scores into a scaled proficiency score, which can be used to compare results obtained with different test questions. The scaled score also addresses the issues of ceiling effects and guessing, which commonly exist in quantitative assessment. We used item response theory to analyze the force concept inventory (FCI). Our results show that item response theory can be useful for analyzing physics concept surveys such as the FCI and produces results about the individual questions and student performance that are beyond the capability of classical statistics. The theory yields detailed measurement parameters regarding the difficulty, discrimination features, and probability of correct guess for each of the FCI questions.
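The three item characteristics listed above (difficulty, discrimination, and probability of a correct guess) correspond to the parameters of the three-parameter logistic (3PL) model, although the abstract does not name the model. A minimal sketch, assuming the 3PL form without the 1.7 scaling constant some texts include, and using a simple grid search (rather than the Newton-type estimators used in practice) to turn a response pattern into a scaled proficiency estimate. Item parameters are hypothetical.

```python
import math

def p_correct_3pl(theta, a, b, c):
    """3PL item response function: discrimination a, difficulty b, guessing floor c."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def estimate_theta(responses, items, lo=-4.0, hi=4.0, steps=801):
    """Grid-search maximum-likelihood proficiency for 0/1 responses.
    items: list of (a, b, c) tuples, one per question."""
    def loglik(theta):
        total = 0.0
        for u, (a, b, c) in zip(responses, items):
            p = p_correct_3pl(theta, a, b, c)
            total += math.log(p) if u else math.log(1.0 - p)
        return total
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return max(grid, key=loglik)

# Hypothetical parameters for three five-option items (guessing floor near 1/5).
items = [(1.2, -1.0, 0.2), (1.0, 0.0, 0.2), (1.5, 1.0, 0.2)]
print(p_correct_3pl(0.0, *items[1]))  # at theta == b: c + (1 - c) / 2
print(estimate_theta([1, 1, 0], items))
```

The guessing floor c is what keeps low-ability students from having near-zero success probabilities on multiple-choice items.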
Detection of Differential Item Functioning Using the Lasso Approach
ERIC Educational Resources Information Center
Magis, David; Tuerlinckx, Francis; De Boeck, Paul
2015-01-01
This article proposes a novel approach to detect differential item functioning (DIF) among dichotomously scored items. Unlike standard DIF methods that perform an item-by-item analysis, we propose the "LR lasso DIF method": a logistic regression (LR) model is formulated for all item responses. The model contains item-specific intercepts,…
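The core computational idea, a logistic regression in which some coefficients are shrunk toward zero by an L1 (lasso) penalty, can be sketched with proximal gradient descent. This is a generic, minimal sketch, not Magis et al.'s estimation procedure: it assumes the penalized columns are DIF terms (here a single group indicator) and that an item would be flagged when its penalized coefficient stays nonzero. `lasso_logistic` and the simulated data are hypothetical.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of the L1 norm."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_logistic(X, y, lam, penalized, n_iter=5000):
    """Minimize mean logistic loss + lam * sum(|beta_j|) over penalized j
    with proximal gradient descent (ISTA)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    penalized = np.asarray(penalized, dtype=bool)
    n, p = X.shape
    # Step 1/L, where L = ||X||_2^2 / (4n) bounds the gradient's Lipschitz constant.
    step = 4.0 * n / np.linalg.norm(X, 2) ** 2
    beta = np.zeros(p)
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (prob - y) / n
        z = beta - step * grad
        beta = np.where(penalized, soft_threshold(z, step * lam), z)
    return beta

rng = np.random.default_rng(1)
n = 400
group = rng.integers(0, 2, n).astype(float)   # focal vs reference group
ability = rng.normal(0.0, 1.0, n)             # stand-in for the matching variable
y = (rng.random(n) < 1 / (1 + np.exp(-ability))).astype(float)  # no DIF by construction
X = np.column_stack([np.ones(n), ability, group])
beta = lasso_logistic(X, y, lam=0.05, penalized=[False, False, True])
print(np.round(beta, 3))  # a zero group coefficient would mean "no DIF flagged"
```

Larger values of `lam` push more DIF terms to exactly zero; in practice the penalty would be chosen by a criterion such as cross-validation or an information criterion.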
Zhao, Yue; Chan, Wai; Lo, Barbara Chuen Yee
2017-04-04
Item response theory (IRT) has been increasingly applied to patient-reported outcome (PRO) measures. The purpose of this study is to apply IRT to examine item properties (discrimination and severity of depressive symptoms), measurement precision and score comparability across five depression measures, which is the first study of its kind in the Chinese context. A clinical sample of 207 Hong Kong Chinese outpatients was recruited. Data analyses were performed including classical item analysis, IRT concurrent calibration and IRT true score equating. The IRT assumptions of unidimensionality and local independence were tested respectively using confirmatory factor analysis and chi-square statistics. The IRT linking assumptions of construct similarity, equity and subgroup invariance were also tested. The graded response model was applied to concurrently calibrate all five depression measures in a single IRT run, resulting in the item parameter estimates of these measures being placed onto a single common metric. IRT true score equating was implemented to perform the outcome score linking and construct score concordances so as to link scores from one measure to corresponding scores on another measure for direct comparability. Findings suggested that (a) symptoms on depressed mood, suicidality and feeling of worthlessness served as the strongest discriminating indicators, and symptoms concerning suicidality, changes in appetite, depressed mood, feeling of worthlessness and psychomotor agitation or retardation reflected high levels of severity in the clinical sample. (b) The five depression measures contributed to various degrees of measurement precision at varied levels of depression. (c) After outcome score linking was performed across the five measures, the cut-off scores led to either consistent or discrepant diagnoses for depression. 
The study provides additional evidence regarding the psychometric properties and clinical utility of the five depression measures, offers methodological contributions to the appropriate use of IRT in PRO measures, and helps elucidate cultural variation in depressive symptomatology. The approach of concurrently calibrating and linking multiple PRO measures can be applied to the assessment of PROs other than the depression context.
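For readers unfamiliar with the graded response model, its category probabilities are differences of cumulative logistic curves. A minimal sketch assuming one discrimination `a` and ordered thresholds per item (all values hypothetical); summing the expected item score over items gives the test characteristic curve on which IRT true score equating is based.

```python
import math

def grm_probs(theta, a, thresholds):
    """Samejima graded response model category probabilities.
    thresholds: increasing b_1 < ... < b_{K-1} for an item with K ordered categories."""
    # P*(X >= k): 1 for k = 0, logistic for k = 1..K-1, and 0 beyond the top category.
    cum = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds] + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]

def expected_item_score(theta, a, thresholds):
    """Model-implied (true) item score for categories scored 0..K-1."""
    return sum(k * p for k, p in enumerate(grm_probs(theta, a, thresholds)))

# Hypothetical 4-category symptom item scored 0-3.
print([round(p, 3) for p in grm_probs(0.3, 1.7, [-1.0, 0.0, 1.2])])
print(round(expected_item_score(0.3, 1.7, [-1.0, 0.0, 1.2]), 3))
```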
Correlates of cognitive function scores in elderly outpatients.
Mangione, C M; Seddon, J M; Cook, E F; Krug, J H; Sahagian, C R; Campion, E W; Glynn, R J
1993-05-01
To determine medical, ophthalmologic, and demographic predictors of cognitive function scores as measured by the Telephone Interview for Cognitive Status (TICS), an adaptation of the Folstein Mini-Mental Status Exam. A secondary objective was to perform an item-by-item analysis of the TICS scores to determine which items correlated most highly with the overall scores. Cross-sectional cohort study. The Glaucoma Consultation Service of the Massachusetts Eye and Ear Infirmary. 472 of 565 consecutive patients aged 65 and older who were seen at the Glaucoma Consultation Service between November 1, 1987 and October 31, 1988. Each subject had a standard visual examination and review of medical history at entry, followed by a telephone interview that collected information on demographic characteristics, cognitive status, health status, accidents, falls, symptoms of depression, and alcohol intake. A multivariate linear regression model of correlates of TICS score found the strongest correlates to be education, age, occupation, and the presence of depressive symptoms. The only significant ocular condition that correlated with lower TICS score was the presence of surgical aphakia (model R² = .46). Forty-six percent (216/472) of patients fell below the established definition of normal on the mental status scale. In a logistic regression analysis, the strongest correlates of an abnormal cognitive function score were age, diabetes, educational status, and occupational status. An item analysis using step-wise linear regression showed that 85 percent of the variance in the TICS score was explained by the ability to perform serial sevens and to repeat 10 items immediately after hearing them. Educational status correlated most highly with both of these items (Kendall Tau R = .43 and Kendall Tau R = .30, respectively). Education, occupation, depression, and age were the strongest correlates of the score on this new screening test for assessing cognitive status. 
These factors were stronger correlates of the TICS score than chronic medical conditions, visual loss, or medications. The Telephone Interview for Cognitive Status is a useful instrument, but it may overestimate the prevalence of dementia in studies with a high prevalence of persons with less than a high school education.
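The abstract reports Kendall correlations between education and two TICS items. Ordinal variables such as education produce many tied pairs, so the sketch below assumes the tie-adjusted tau-b variant (the abstract does not say which variant was used); the example data are hypothetical.

```python
import math

def kendall_tau_b(x, y):
    """Kendall's tau-b, adjusted for tied pairs (simple O(n^2) pairwise version)."""
    n = len(x)
    concordant = discordant = ties_x = ties_y = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx = (x[i] > x[j]) - (x[i] < x[j])  # sign of the x difference
            dy = (y[i] > y[j]) - (y[i] < y[j])  # sign of the y difference
            if dx == 0:
                ties_x += 1
            if dy == 0:
                ties_y += 1
            if dx * dy > 0:
                concordant += 1
            elif dx * dy < 0:
                discordant += 1
    n0 = n * (n - 1) // 2
    return (concordant - discordant) / math.sqrt((n0 - ties_x) * (n0 - ties_y))

# Hypothetical education codes (1 = < high school ... 4 = college) and serial-sevens scores (0-5).
education = [1, 1, 2, 2, 3, 3, 4, 4]
serial_sevens = [1, 2, 2, 3, 3, 5, 4, 5]
print(round(kendall_tau_b(education, serial_sevens), 3))
```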
Item Response Theory Modeling of the Philadelphia Naming Test.
Fergadiotis, Gerasimos; Kellough, Stacey; Hula, William D
2015-06-01
In this study, we investigated the fit of the Philadelphia Naming Test (PNT; Roach, Schwartz, Martin, Grewal, & Brecher, 1996) to an item-response-theory measurement model, estimated the precision of the resulting scores and item parameters, and provided a theoretical rationale for the interpretation of PNT overall scores by relating explanatory variables to item difficulty. This article describes the statistical model underlying the computer adaptive PNT presented in a companion article (Hula, Kellough, & Fergadiotis, 2015). Using archival data, we evaluated the fit of the PNT to 1- and 2-parameter logistic models and examined the precision of the resulting parameter estimates. We regressed the item difficulty estimates on three predictor variables: word length, age of acquisition, and contextual diversity. The 2-parameter logistic model demonstrated marginally better fit, but the fit of the 1-parameter logistic model was adequate. Precision was excellent for both person ability and item difficulty estimates. Word length, age of acquisition, and contextual diversity all independently contributed to variance in item difficulty. Item-response-theory methods can be productively used to analyze and quantify anomia severity in aphasia. Regression of item difficulty on lexical variables supported the validity of the PNT and interpretation of anomia severity scores in the context of current word-finding models.
Guyatt, G H; Cook, D J; King, D; Norman, G R; Kane, S L; van Ineveld, C
1999-02-01
To determine whether framing questions positively or negatively influences residents' apparent satisfaction with their training. In 1993-94, 276 residents at five Canadian internal medicine residency programs responded to 53 Likert-scale items designed to determine sources of the residents' satisfaction and stress. Two versions of the questionnaire were randomly distributed: one in which half the items were stated positively and the other half negatively, the other version in which the items were stated in the opposite way. The residents scored 43 of the 53 items higher when stated positively and scored ten higher when stated negatively (p < .0001). When analyzed using an analysis-of-variance model, the effect of positive versus negative framing was highly significant (F = 129.81, p < .0001). While the interaction between item and framing was also significant, the effect was much less strong (F = 5.56, p < .0001). On a scale where 1 represented the lowest possible level of satisfaction and 7 the highest, the mean score of the positively stated items was 4.1 and that of the negatively stated items, 3.8, an effect of 0.3. These results suggest a significant "response acquiescence bias." To minimize this bias, questionnaires assessing attitudes toward educational programs should include a mix of positively and negatively stated items.
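Comparing positively and negatively stated items on one scale where 1 is the lowest and 7 the highest satisfaction requires reverse-scoring responses to negatively stated items. A minimal sketch, assuming the usual reflection x -> 8 - x on a 7-point scale (the abstract does not describe this scoring step); the item and responses are hypothetical.

```python
def reverse_score(x, low=1, high=7):
    """Flip a Likert response so that higher always means more satisfied."""
    if not low <= x <= high:
        raise ValueError("response outside the scale")
    return low + high - x

# Hypothetical responses to a negatively stated item ("My call schedule is stressful").
negative_item = [2, 5, 7, 1]
print([reverse_score(x) for x in negative_item])  # [6, 3, 1, 7]
```

After this step, the framing effect is simply the difference between mean scores for the two framings (here reported as 4.1 - 3.8 = 0.3).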
Cross-cultural validity of the scale for interpersonal behavior.
Nota, Laura; Arrindell, Willem A; Soresi, Salvatore; van der Ende, Jan; Sanavio, Ezio
2011-01-01
The Scale for Interpersonal Behavior (SIB) is a 50-item multidimensional measure of difficulty and distress in assertiveness. The SIB assesses negative assertion, expression of and dealing with personal limitations, initiating assertiveness and positive assertion. The SIB was originally developed in the Netherlands. The present study attempted to replicate the original factors with an Italian student sample (n = 995). The four distress and four performance factors were replicable across two methods of analysis (the multiple group method of confirmatory analysis and Tucker's coefficient of congruence, phi). The corresponding scales were internally consistent and showed predicted patterns of correlations with a measure of self-efficacy. Sex and age differences in assertiveness were generally negligible. Italian students had higher positive assertion-performance scores than the Dutch and comparable scores on other performance scales; by contrast, the Italian subjects had significantly higher scores on all SIB distress scales than their Dutch equivalents. This was ascribed to the stronger pressure on people in Italian society to behave assertively (Hofstede's National Masculinity score = 70) compared with Dutch society (National Masculinity score = 14).
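Tucker's coefficient of congruence compares two factor-loading vectors (for example, the same factor estimated in Dutch and Italian samples) as the cosine of the angle between them. A minimal sketch; the loadings are hypothetical.

```python
import math

def tucker_congruence(x, y):
    """Tucker's coefficient of congruence between two factor-loading vectors."""
    if len(x) != len(y):
        raise ValueError("loading vectors must have the same length")
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return num / den

# Hypothetical loadings of the same five items on one factor in two samples.
dutch = [0.71, 0.65, 0.58, 0.62, 0.49]
italian = [0.68, 0.60, 0.61, 0.55, 0.52]
print(round(tucker_congruence(dutch, italian), 3))
```

Unlike a Pearson correlation, phi does not center the loadings: it is unchanged when one vector is multiplied by a positive constant, but it is affected when a constant is added to every loading.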
Evaluation of the psychometric properties of the Nighttime Symptoms of COPD Instrument.
Mocarski, Michelle; Zaiser, Erica; Trundell, Dylan; Make, Barry J; Hareendran, Asha
2015-01-01
Nighttime symptoms can negatively impact the quality of life of patients with chronic obstructive pulmonary disease (COPD). The Nighttime Symptoms of COPD Instrument (NiSCI) was designed to measure the occurrence and severity of nighttime symptoms in patients with COPD, the impact of symptoms on nighttime awakenings, and rescue medication use. The objective of this study was to explore item reduction, inform scoring recommendations, and evaluate the psychometric properties of the NiSCI. COPD patients participating in a Phase III clinical trial completed the NiSCI daily. Item analyses were conducted using weekly mean and single day scores. Descriptive statistics (including percentage of respondents at floor/ceiling and inter-item correlations), factor analyses, and Rasch model analyses were conducted to examine item performance and scoring. Test-retest reliability was assessed for the final instrument using the intraclass correlation coefficient (ICC). Correlations with assessments conducted during study visits were used to evaluate convergent and known-groups validity. Data from 1,663 COPD patients aged 40-93 years were analyzed. Item analyses supported the generation of four scores. A one-factor structure was confirmed with factor analysis and Rasch analysis for the symptom severity score. Test-retest reliability was confirmed for the six-item symptom severity (ICC, 0.85), number of nighttime awakenings (ICC, 0.82), and rescue medication (ICC, 0.68) scores. Convergent validity was supported by significant correlations between the NiSCI, St George's Respiratory Questionnaire, and Exacerbations of Chronic Obstructive Pulmonary Disease Tool-Respiratory Symptoms scores. The results suggest that the NiSCI can be used to determine the severity of nighttime COPD symptoms, the number of nighttime awakenings due to COPD symptoms, and the nighttime use of rescue medication. 
The NiSCI is a reliable and valid instrument to evaluate these concepts in COPD patients in clinical trials and clinical practice. Scoring recommendations and steps for further research are discussed.
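The abstract does not say which form of ICC was used for test-retest reliability. A minimal sketch assuming the common Shrout and Fleiss ICC(2,1) (two-way random effects, absolute agreement, single measurement), computed from two-way ANOVA mean squares on a patients x occasions matrix; the scores are hypothetical.

```python
import numpy as np

def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.
    scores: n_subjects x k_occasions array (e.g., patients x test/retest weeks)."""
    x = np.asarray(scores, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()   # between-subject
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()   # between-occasion
    ss_total = ((x - grand) ** 2).sum()
    ss_err = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)

# Hypothetical weekly mean severity scores for five patients in two stable weeks.
weeks = [[2.1, 2.3], [3.4, 3.1], [1.0, 1.2], [2.8, 2.9], [3.9, 3.6]]
print(round(icc_2_1(weeks), 3))
```

Because it measures absolute agreement, ICC(2,1) is lowered by a systematic shift between occasions even when patients keep the same rank order.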
Moss, Alan C; Lillis, Yvonne; Edwards George, Jessica B; Choudhry, Niteesh K; Berg, Anders H; Cheifetz, Adam S; Horowitz, Gary; Leffler, Dan A
2014-12-01
Poor adherence to mesalamine is common and driven by a combination of lifestyle and behavioral factors, as well as health beliefs. We sought to develop a valid tool to identify barriers to patient adherence and predict those at risk for future nonadherence. A 10-item survey was developed from patient-reported barriers to adherence. The survey was administered to 106 patients with ulcerative colitis who were prescribed mesalamine, and correlated with prospectively collected 12-month pharmacy refills (medication possession ratio (MPR)), urine levels of salicylates, and self-reported adherence (Morisky Medication Adherence Scale (MMAS)-8). From the initial 10-item survey, 8 items correlated highly with the MMAS-8 score at enrollment. Computer-generated randomization produced a derivation cohort of 60 subjects and a validation cohort of 46 subjects to assess the survey items in their ability to predict future adherence. Two items from the patient survey correlated with objective measures of long-term adherence: their belief in the importance of maintenance mesalamine even when in remission and their concerns about side effects. The additive score based on these two items correlated with 12-month MPR in both the derivation and validation cohorts (P<0.05). Scores on these two items were associated with a higher risk of being nonadherent over the subsequent 12 months (relative risk (RR) =2.2, 95% confidence interval=1.5-3.5, P=0.04). The area under the curve for the performance of this 2-item tool was greater than that of the 10-item MMAS-8 score for predicting MPR scores over 12 months (area under the curve 0.7 vs. 0.5). Patients' beliefs about the need for maintenance mesalamine and their concerns about side effects influence their adherence to mesalamine over time. These concerns could easily be raised in practice to identify patients at risk of nonadherence (Clinical Trial number NCT01349504).
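A relative risk and its confidence interval can be computed from a 2 x 2 table. The abstract reports only the summary (RR = 2.2, 95% CI 1.5-3.5), so the counts below are hypothetical, and the log-scale (Katz) interval is an assumption about the method used.

```python
import math

def relative_risk(a, n1, c, n0, z=1.96):
    """Relative risk with a log-scale (Katz) confidence interval.
    a / n1: events / total in the flagged group; c / n0: events / total in the others."""
    rr = (a / n1) / (c / n0)
    se_log = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n0)
    lo = math.exp(math.log(rr) - z * se_log)
    hi = math.exp(math.log(rr) + z * se_log)
    return rr, lo, hi

# Hypothetical counts: nonadherence among patients flagged by the 2-item score vs not flagged.
print(relative_risk(a=20, n1=40, c=10, n0=40))
```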
Avoiding and Correcting Bias in Score-Based Latent Variable Regression with Discrete Manifest Items
ERIC Educational Resources Information Center
Lu, Irene R. R.; Thomas, D. Roland
2008-01-01
This article considers models involving a single structural equation with latent explanatory and/or latent dependent variables where discrete items are used to measure the latent variables. Our primary focus is the use of scores as proxies for the latent variables and carrying out ordinary least squares (OLS) regression on such scores to estimate…
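The kind of bias at issue can be illustrated with the textbook case of classical measurement error: when an error-prone score stands in for a latent explanatory variable, the OLS slope shrinks toward zero by the score's reliability. The simulation below assumes continuous scores with independent normal errors and a known reliability of 0.5; it is not the authors' correction method, which concerns discrete items and may differ.

```python
import numpy as np

def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x."""
    x_c = x - x.mean()
    return float(x_c @ (y - y.mean()) / (x_c @ x_c))

rng = np.random.default_rng(42)
n = 200_000
true_beta = 1.0
xi = rng.normal(0.0, 1.0, n)                  # latent explanatory variable (variance 1)
y = true_beta * xi + rng.normal(0.0, 0.5, n)  # outcome
score = xi + rng.normal(0.0, 1.0, n)          # proxy score; error variance 1, so reliability = 0.5

reliability = 1.0 / (1.0 + 1.0)               # known here by construction; estimated in practice
naive = ols_slope(score, y)                   # attenuated toward true_beta * reliability
corrected = naive / reliability               # classical disattenuation
print(round(naive, 3), round(corrected, 3))
```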
ERIC Educational Resources Information Center
Zechner, Klaus; Chen, Lei; Davis, Larry; Evanini, Keelan; Lee, Chong Min; Leong, Chee Wee; Wang, Xinhao; Yoon, Su-Youn
2015-01-01
This research report presents a summary of research and development efforts devoted to creating scoring models for automatically scoring spoken item responses of a pilot administration of the Test of English-for-Teaching ("TEFT"™) within the "ELTeach"™ framework. The test consists of items for all four language modalities:…
Martényi, F; Metcalfe, S; Schausberger, B; Dossenbach, M R
2001-01-01
Thirty-five patients suffering from schizophrenia, as diagnosed by the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition, were preselected from 7 clinical trials according to a priori criteria of catatonic signs and symptoms based on 3 Positive and Negative Syndrome Scale (PANSS) items: scores for PANSS item 19 (mannerism and posturing) and either item 4 (excitement) or item 21 (motor retardation) had to be at least 4 at baseline. This particular patient population represents a severely psychotic sample: mean +/- SD PANSS total scores at baseline were 129.26 +/- 19.76. After 1 week of olanzapine treatment, mean PANSS total score decreased significantly (-13.14; p < .001), as did mean PANSS total score after 6 weeks of olanzapine treatment (-45.16; p < .001); additionally, the positive subscale, negative subscale, and mood scores improved significantly. A significant improvement in the catatonic signs and symptoms composite score was also observed at week 6 (-4.96; p < .001). The mean +/- SD daily dose of olanzapine was 18.00 +/- 2.89 mg after 6 weeks of treatment. The present data analysis suggests the efficacy of olanzapine in the treatment of severely ill schizophrenic patients with nonspecified catatonic signs and symptoms.
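The a priori selection rule above is a simple boolean condition. A minimal sketch, assuming items are keyed by the sequential 1-30 numbering the abstract uses (positive items 1-7, negative 8-14, general 15-30); `meets_catatonia_criteria` and the patient records are hypothetical.

```python
def meets_catatonia_criteria(panss):
    """Baseline selection rule: item 19 >= 4 and (item 4 >= 4 or item 21 >= 4).
    panss: dict mapping PANSS item number (sequential 1-30) to its score (1-7)."""
    return panss[19] >= 4 and (panss[4] >= 4 or panss[21] >= 4)

patients = {
    "A": {4: 5, 19: 4, 21: 2},
    "B": {4: 3, 19: 6, 21: 3},
    "C": {4: 2, 19: 3, 21: 6},
}
print([pid for pid, items in patients.items() if meets_catatonia_criteria(items)])  # ['A']
```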
Mother-Infant Interaction Improves with a Developmental Intervention for Mother-Preterm Infant Dyads
White-Traut, Rosemary; Norr, Kathleen F.; Fabiyi, Camille; Rankin, Kristin M.; Li, Zhyouing; Liu, Li
2013-01-01
While premature infants have a high need for positive interactions, both infants and their mothers are challenged by the infant's biological immaturity. This randomized clinical trial of 198 premature infants born at 29–34 weeks gestation and their mothers examined the impact of the H-HOPE (Hospital to Home: Optimizing the Infant's Environment) intervention on mother-premature infant interaction patterns at 6-weeks corrected age (CA). Mothers had at least 2 social environmental risk factors such as minority status or less than high school education. Mother-infant dyads were randomly assigned to the H-HOPE intervention group or an attention control group. H-HOPE is an integrated intervention that included (1) twice-daily infant stimulation using the ATVV (auditory, tactile, visual, and vestibular-rocking stimulation) and (2) four maternal participatory guidance sessions plus two telephone calls by a nurse-community advocate team. Mother-infant interaction was assessed at 6-weeks CA using the Nursing Child Assessment Satellite Training–Feeding Scale (NCAST, 76 items) and the Dyadic Mutuality Code (DMC, 6-item contingency scale during a 5-minute play session). NCAST and DMC scores for the Control and H-HOPE groups were compared using t-tests, chi-square tests and multivariable analysis. Compared with the Control group (n = 76), the H-HOPE group (n = 66) had higher overall NCAST scores and higher maternal Social-Emotional Growth Fostering Subscale scores. The H-HOPE group also had significantly higher scores for the overall infant subscale and the Infant Clarity of Cues Subscale (p < 0.05). H-HOPE dyads were also more likely to have high responsiveness during play as measured by the DMC (67.6% versus 58.1% of controls). 
After adjustment for significant maternal and infant characteristics, H-HOPE dyads had marginally higher scores during feeding on overall mother-infant interaction (β = 2.03, p = .06) and significantly higher scores on the infant subscale (β = 0.75, p = .05) when compared to controls. In the adjusted analysis, H-HOPE dyads had increased odds of high versus low mutual responsiveness during play (OR = 2.37, 95% CI = 0.97, 5.80). Intervening with both mother and infant is a promising approach to help premature infants achieve the social interaction patterns essential for optimal development. PMID:23962543
White-Traut, Rosemary; Norr, Kathleen F; Fabiyi, Camille; Rankin, Kristin M; Li, Zhyouing; Liu, Li
2013-12-01
While premature infants have a high need for positive interactions, both infants and their mothers are challenged by the infant's biological immaturity. This randomized clinical trial of 198 premature infants born at 29-34 weeks gestation and their mothers examined the impact of the H-HOPE (Hospital to Home: Optimizing the Infant's Environment) intervention on mother-premature infant interaction patterns at 6-weeks corrected age (CA). Mothers had at least 2 social environmental risk factors such as minority status or less than high school education. Mother-infant dyads were randomly assigned to the H-HOPE intervention group or an attention control group. H-HOPE is an integrated intervention that included (1) twice-daily infant stimulation using the ATVV (auditory, tactile, visual, and vestibular-rocking stimulation) and (2) four maternal participatory guidance sessions plus two telephone calls by a nurse-community advocate team. Mother-infant interaction was assessed at 6-weeks CA using the Nursing Child Assessment Satellite Training-Feeding Scale (NCAST, 76 items) and the Dyadic Mutuality Code (DMC, 6-item contingency scale during a 5-min play session). NCAST and DMC scores for the Control and H-HOPE groups were compared using t-tests, chi-square tests and multivariable analysis. Compared with the Control group (n = 76), the H-HOPE group (n = 66) had higher overall NCAST scores and higher maternal Social-Emotional Growth Fostering Subscale scores. The H-HOPE group also had significantly higher scores for the overall infant subscale and the Infant Clarity of Cues Subscale (p < 0.05). H-HOPE dyads were also more likely to have high responsiveness during play as measured by the DMC (67.6% versus 58.1% of controls). 
After adjustment for significant maternal and infant characteristics, H-HOPE dyads had marginally higher scores during feeding on overall mother-infant interaction (β = 2.03, p = 0.06) and significantly higher scores on the infant subscale (β = 0.75, p = 0.05) when compared to controls. In the adjusted analysis, H-HOPE dyads had increased odds of high versus low mutual responsiveness during play (OR = 2.37, 95% CI = 0.97, 5.80). Intervening with both mother and infant is a promising approach to help premature infants achieve the social interaction patterns essential for optimal development. Copyright © 2013 Elsevier Inc. All rights reserved.
ERIC Educational Resources Information Center
Chen, Hanwei; Cui, Zhongmin; Zhu, Rongchun; Gao, Xiaohong
2010-01-01
The most critical feature of a common-item nonequivalent groups equating design is that the average score difference between the new and old groups can be accurately decomposed into a group ability difference and a form difficulty difference. Two widely used observed-score linear equating methods, the Tucker and the Levine observed-score methods,…
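Both the Tucker and the Levine observed-score methods end in the same linear equating function, which matches the synthetic-population mean and standard deviation of the new form X to those of the old form Y; they differ in how those synthetic moments are estimated from the common items. A minimal sketch that takes the moments as already estimated (the values below are hypothetical):

```python
def linear_equate(x, mu_x, sd_x, mu_y, sd_y):
    """Map a form-X score onto the form-Y scale by matching means and SDs."""
    return mu_y + (sd_y / sd_x) * (x - mu_x)

# Hypothetical synthetic-population moments for new form X and old form Y.
moments = dict(mu_x=30.0, sd_x=8.0, mu_y=32.0, sd_y=10.0)
print([linear_equate(x, **moments) for x in (22, 30, 38)])  # [22.0, 32.0, 42.0]
```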
Paschoal, Sérgio Márcio Pacheco; Filho, Wilson Jacob; Litvoc, Júlio
2008-01-01
OBJECTIVE To describe item reduction and its distribution into dimensions in the construction process of a quality of life evaluation instrument for the elderly. METHODS A convenience quota sampling method was used, with elderly subjects selected from four programs to achieve heterogeneity in the “health status”, “functional capacity”, “gender”, and “age” variables. The Clinical Impact Method was used, consisting of the spontaneous and elicited selection by the respondents of items relevant to the construct Quality of Life in Old Age from a previously elaborated item pool. The respondents rated each item’s importance using a 5-point Likert scale. The product of the proportion of elderly selecting the item as relevant (frequency) and the mean importance score they attributed to it (importance) represented the overall impact of that item on their quality of life (impact). The items were ordered according to their impact scores, and the 46 top-scoring items were grouped into dimensions by three experts. A review of the negative items was performed. RESULTS One hundred ninety-three people (122 women and 71 men) were interviewed. Experts distributed the 46 items into eight dimensions. Closely related items were grouped, and dimensions not reaching the minimum expected number of items received additional items, resulting in eight dimensions and 43 items. DISCUSSION The sample was heterogeneous and similar to what was expected. The dimensions and items demonstrated the multidimensionality of the construct. The Clinical Impact Method was appropriate for constructing the instrument, which was named Elderly Quality of Life Index - EQoLI. The instrument's accuracy will be examined in future work. PMID:18438571
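The impact calculation described above (frequency times mean importance, then ranking) can be sketched as follows. Item names and ratings are hypothetical, and items nobody selected are assumed to receive an impact of zero.

```python
from statistics import mean

def impact_scores(selections, n_respondents):
    """selections: item -> importance ratings (1-5) from the respondents who
    selected that item as relevant. Impact = frequency x mean importance."""
    scores = {}
    for item, ratings in selections.items():
        frequency = len(ratings) / n_respondents
        importance = mean(ratings) if ratings else 0.0
        scores[item] = frequency * importance
    return scores

def top_items(scores, n):
    """The n items with the highest impact scores."""
    return sorted(scores, key=scores.get, reverse=True)[:n]

# Hypothetical pool of three items rated by four respondents.
selections = {"sleep": [5, 5], "mobility": [3], "family contact": [4, 4, 4]}
scores = impact_scores(selections, n_respondents=4)
print(top_items(scores, 2))  # ['family contact', 'sleep']
```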
ERIC Educational Resources Information Center
Jin, Ying; Myers, Nicholas D.; Ahn, Soyeon
2014-01-01
Previous research has demonstrated that differential item functioning (DIF) methods that do not account for multilevel data structure could result in too frequent rejection of the null hypothesis (i.e., no DIF) when the intraclass correlation coefficient (ρ) of the studied item was the same as the ρ of the total score. The current study extended…
Packham, Tara L; Cappelleri, Joseph C; Sadosky, Alesia; MacDermid, Joy C; Brunner, Florian
2017-03-04
painDETECT (PD-Q) is a self-reported assessment of pain qualities developed as a screening tool for pain of neuropathic origin. Rasch analysis is a strategy for examining the measurement characteristics of a scale using a form of item response theory. We conducted a Rasch analysis to consider whether the scoring and measurement properties of PD-Q would support its use as an outcome measure. Rasch analysis was conducted on PD-Q scores drawn from a cross-sectional study of the burden and costs of neuropathic pain (NeP). The analysis followed an iterative process based on recommendations in the literature, including examination of sequential scoring categories, unidimensionality, reliability, and differential item functioning. Data from 624 persons with a diagnosis of painful diabetic polyneuropathy, small fibre neuropathy, and neuropathic pain associated with chronic low back pain, spinal cord injury, HIV-related pain, or chronic post-surgical pain were used for this analysis. PD-Q demonstrated fit to the Rasch model after adjustment of the scoring categories for four items and omission of the time-course and radiating-pain questions. The resulting seven-item scale of pain qualities demonstrated good reliability, with a person-separation index of 0.79. No scoring bias (differential item functioning) was found for this version. Rasch modelling suggests the seven pain-quality items from PD-Q may be used as an outcome measure. Further research is required to confirm validity and responsiveness in a clinical setting.
Calibrating Item Families and Summarizing the Results Using Family Expected Response Functions
ERIC Educational Resources Information Center
Sinharay, Sandip; Johnson, Matthew S.; Williamson, David M.
2003-01-01
Item families, which are groups of related items, are becoming increasingly popular in complex educational assessments. For example, in automatic item generation (AIG) systems, a test may consist of multiple items generated from each of a number of item models. Item calibration or scoring for such an assessment requires fitting models that can…
Development of the outcome expectancy scale for self-care among periodontal disease patients.
Kakudate, Naoki; Morita, Manabu; Fukuhara, Shunichi; Sugai, Makoto; Nagayama, Masato; Isogai, Emiko; Kawanami, Masamitsu; Chiba, Itsuo
2011-12-01
The theory of self-efficacy states that specific efficacy expectations affect behaviour. Two types of efficacy expectations are described within the theory. Self-efficacy expectations are the beliefs in the capacity to perform a specific behaviour. Outcome expectations are the beliefs that carrying out a specific behaviour will lead to a desired outcome. The aim of this study was to develop an outcome expectancy scale for self-care (OESS) among periodontal disease patients and to examine its reliability and validity. A 34-item scale was tested on 101 patients at a dental clinic. Accuracy was improved by item analysis, and internal consistency and test-retest stability were investigated. Concurrent validity was tested by examining associations of the OESS score with the self-efficacy scale for self-care (SESS) score and the plaque index score. Construct validity was examined by comparing OESS scores between periodontal patients at their initial visit (group 1) and those continuing maintenance care (group 2). Item analysis identified 13 items for the OESS. Factor analysis extracted three factors: social, oral, and self-evaluative outcome expectancy. Cronbach's alpha coefficient for the OESS was 0.90. A significant association was observed between test and retest scores, and between the OESS and the SESS and plaque index scores. Further, group 2 had a significantly higher mean OESS score than group 1. We developed a 13-item OESS with high reliability and validity, which may be used to assess outcome expectancy for self-care. A patient's psychological condition with regard to behaviour and affective status can be accurately evaluated using the OESS with the SESS. © 2011 Blackwell Publishing Ltd.
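Cronbach's alpha, the internal-consistency statistic reported for the OESS (0.90), is computed from item and total-score variances: alpha = k/(k - 1) * (1 - sum of item variances / variance of total scores), where k is the number of items. A minimal Python sketch using sample variances throughout; the response matrix is hypothetical toy data, not from the study.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents' item-score lists.

    alpha = k/(k-1) * (1 - sum(item variances) / variance(total scores)),
    using sample variances throughout.
    """
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two respondents")
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical 5 respondents x 3 items (e.g. 1-5 agreement ratings).
rows = [
    [4, 5, 4],
    [2, 2, 3],
    [3, 3, 3],
    [5, 4, 5],
    [1, 2, 1],
]
print(round(cronbach_alpha(rows), 3))  # 0.945
```

Alpha rises toward 1 when items covary strongly (the total-score variance then far exceeds the summed item variances) and falls toward 0 when items are uncorrelated.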
Diet, Lung Function, and Asthma Exacerbations in Puerto Rican Children.
Han, Yueh-Ying; Forno, Erick; Alvarez, Maria; Colón-Semidey, Angel; Acosta-Perez, Edna; Canino, Glorisa; Celedón, Juan C
2017-12-01
Changes in dietary patterns may partly explain the epidemic of asthma in industrialized countries. The objective of this study was to examine the relationship between dietary patterns and lung function and asthma exacerbations in Puerto Rican children. This is a case-control study of 678 Puerto Rican children (ages 6-14 years) in San Juan (Puerto Rico). All participants completed a respiratory health questionnaire and a 75-item food frequency questionnaire. Food items were aggregated into 7 groups: fruits, vegetables, grains, protein, dairy, fats, and sweets. Logistic regression was used to evaluate the association between the consumption frequency of each food group and asthma. Based on the results, a dietary score was created [range from -2 (unhealthy diet: high consumption of dairy and sweets, low consumption of vegetables and grains) to 2 (healthy diet: high consumption of vegetables and grains and low consumption of dairy and sweets)]. Multivariable linear or logistic regression was used to assess the relationship between dietary score and lung function or asthma exacerbations. After adjustment for covariates, a healthier diet (each 1-point increment in dietary score) was associated with significantly higher % predicted forced expiratory volume in the first second (FEV1) and % predicted forced vital capacity (FVC) in control subjects. Dietary pattern alone was not associated with asthma exacerbations, but children with an unhealthy diet and vitamin D insufficiency (plasma 25(OH)D <30 ng/mL) had higher odds of ≥1 severe asthma exacerbation [odds ratio (OR) = 3.4, 95% confidence interval (CI) = 1.5-7.5] or ≥1 hospitalization due to asthma (OR = 3.9, 95% CI = 1.6-9.8) than children who ate a healthy diet and were vitamin D sufficient. A healthy diet, with frequent consumption of vegetables and grains and low consumption of dairy products and sweets, was associated with higher lung function (as measured by FEV1 and FVC).
Vitamin D insufficiency, together with an unhealthy diet, may have detrimental effects on asthma exacerbations in children.
Lilienfeld, S O; Andrews, B P
1996-06-01
Research on psychopathy has been hindered by persisting difficulties and controversies regarding its assessment. The primary goals of this set of studies were to (a) develop, and initiate the construct validation of, a self-report measure that assesses the major personality traits of psychopathy in noncriminal populations and (b) clarify the nature of these traits via an exploratory approach to test construction. This measure, the Psychopathic Personality Inventory (PPI), was developed by writing items to assess a large number of personality domains relevant to psychopathy and performing successive item-level factor analyses and revisions on three undergraduate samples. The PPI total score and its eight subscales were found to possess satisfactory internal consistency and test-retest reliability. In four studies with undergraduates, the PPI and its subscales exhibited a promising pattern of convergent and discriminant validity with self-report, psychiatric interview, observer rating, and family history data. In addition, the PPI total score demonstrated incremental validity relative to several commonly used self-report psychopathy-related measures. Future construct validation studies, unresolved conceptual issues regarding the assessment of psychopathy, and potential research uses of the PPI are outlined.
Vaona, Alberto; Marcon, Alessandro; Rava, Marta; Buzzetti, Roberto; Sartori, Marco; Abbinante, Crescenza; Moser, Andrea; Seddaiu, Antonia; Prontera, Manuela; Quaglio, Alessandro; Pallazzoni, Piera; Sartori, Valentina; Rigon, Giulio
2011-12-01
Many medical journals provide patient information leaflets on the correct use of medicines and/or appropriate lifestyles. Only a few studies have assessed the quality of this patient-specific literature. The purpose of this study was to evaluate the quality of JAMA Patient Pages on diabetes using the Ensuring Quality Information for Patients (EQIP) tool. A multidisciplinary group of 10 medical doctors analyzed all diabetes-related Patient Pages published by JAMA from 1998 to 2010 using the EQIP tool. Inter-rater reliability was assessed using the percentage of observed total agreement (p(o)). A quality score between 0 and 1 (a higher score indicating higher quality) was calculated for each item on every page as a function of the raters' answers to the EQIP checklist. A mean score per item and a mean score per page were then calculated. We found 8 Patient Pages on diabetes on the JAMA web site. The overall quality score of the documents ranged between 0.55 (Managing Diabetes and Diabetes) and 0.67 (weight and diabetes). p(o) was at least moderate (>50%) for 15 of the 20 EQIP items. Despite generally favorable quality scores, some items received low scores. The worst scores were for the item assessing provision of an empty space to customize information for individual patients (score=0.01, p(o)=95%) and the item assessing patients' involvement in document drafting (score=0.11, p(o)=79%). The Patient Pages on diabetes published by JAMA were found to have weak points that limit their overall quality and may jeopardize their efficacy. We therefore recommend that authors and publishers of written patient information comply with published quality criteria. Further research is needed to evaluate the quality and efficacy of existing written health care information. Copyright © 2011 Primary Care Diabetes Europe. Published by Elsevier Ltd. All rights reserved.
Darzins, Susan; Imms, Christine; Di Stefano, Marilyn; Taylor, Nicholas F; Pallant, Julie F
2014-11-05
The Personal Care Participation Assessment and Resource Tool (PC-PART) is a 43-item, clinician-administered assessment designed to identify patients' unmet needs (participation restrictions) in activities of daily living (ADL) required for community life. This information is important for identifying problems that need addressing to enable, for example, discharge from inpatient settings to community living. The objective of this study was to evaluate the internal construct validity of the PC-PART using Rasch methods. Fit to the Rasch model was evaluated for 41 PC-PART items, assessing threshold ordering, overall model fit, individual item fit, person fit, internal consistency, Differential Item Functioning (DIF), targeting of items, and dimensionality. Data used in this research were taken from admission data from a randomised controlled trial conducted at two publicly funded inpatient rehabilitation units in Melbourne, Australia, with 996 participants (63% women; mean age 74 years) and with various impairment types. PC-PART items assessed as one scale, and original PC-PART domains evaluated as separate scales, demonstrated poor fit to the Rasch model. Adequate fit to the Rasch model was achieved in two newly formed PC-PART scales: Self-Care (16 items) and Domestic Life (14 items). Both scales were unidimensional, had acceptable internal consistency (PSI = 0.85 and 0.76, respectively), and had well-targeted items. Rasch analysis did not support conventional summation of all PC-PART item scores to create a total score. However, the internal construct validity of the newly formed PC-PART scales, Self-Care and Domestic Life, was supported. Their Rasch-derived scores provided interval-level measurement, enabling summation of scores to form a total score on each scale. These scales may assist clinicians, managers, and researchers in rehabilitation settings to assess and measure changes in ADL participation restrictions relevant to community living.
Data used in this research were gathered during a registered randomised controlled trial: Australian and New Zealand Clinical Trials Registry ACTRN12609000973213. Ethics committee approval was gained for secondary analysis of data for this study.
An NCME Instructional Module on Polytomous Item Response Theory Models
ERIC Educational Resources Information Center
Penfield, Randall David
2014-01-01
A polytomous item is one for which the responses are scored according to three or more categories. Given the increasing use of polytomous items in assessment practices, item response theory (IRT) models specialized for polytomous items are becoming increasingly common. The purpose of this ITEMS module is to provide an accessible overview of…
Dietary patterns and odds of Type 2 diabetes in Beirut, Lebanon: a case-control study.
Naja, Farah; Hwalla, Nahla; Itani, Leila; Salem, Maya; Azar, Sami T; Zeidan, Maya Nabhani; Nasreddine, Lara
2012-12-27
In Lebanon, Type 2 diabetes (T2D) has a major public health impact through high disease prevalence, significant downstream pathophysiologic effects, and enormous financial liabilities. Diet is an important environmental factor in the development and prevention of T2D. Dietary patterns may exert greater effects on health than individual foods, nutrients, or food groups. The objective of this study is to examine the association between dietary patterns and the odds of T2D among Lebanese adults. Fifty-eight recently diagnosed cases of T2D and 116 population-based control participants, matched for age, sex, and place of residence, were interviewed. Data collection included a standard socio-demographic and lifestyle questionnaire. Dietary intake was evaluated by a semi-quantitative 97-item food frequency questionnaire. Anthropometric measurements, including weight, height, waist circumference, and percent body fat, were also obtained. Dietary patterns were identified by factor analysis. Multivariate logistic regression analysis was used to evaluate the associations of the extracted patterns with T2D. Pearson correlations between these patterns and obesity markers, energy, and nutrient intakes were also examined. Four dietary patterns were identified: Refined Grains & Desserts, Traditional Lebanese, Fast Food, and Meat & Alcohol. While scores of the "Refined Grains & Desserts" pattern had the highest correlations with energy (r = 0.74) and carbohydrates (r = 0.22), those of the "Fast Food" pattern had the highest correlation with fat intake (r = 0.34). After adjustment for socio-demographic and lifestyle characteristics, scores of the Refined Grains & Desserts and Fast Food patterns were associated with higher odds of T2D (OR: 3.85, CI: 1.13-11.23 and OR: 2.80, CI: 1.14-5.59, respectively), and scores of the Traditional Lebanese pattern were inversely associated with the odds of T2D (OR: 0.46, CI: 0.22-0.97).
The findings of this study demonstrate direct associations of the Refined Grains & Desserts and Fast Food patterns with T2D and an inverse association between the Traditional Lebanese pattern and the disease among Lebanese adults. These results may guide the development of nutrition interventions for the prevention and management of T2D among Lebanese adults.
An Isotonic Partial Credit Model for Ordering Subjects on the Basis of Their Sum Scores
ERIC Educational Resources Information Center
Ligtvoet, Rudy
2012-01-01
In practice, the sum of the item scores is often used as a basis for comparing subjects. For items that have more than two ordered score categories, only the partial credit model (PCM) and special cases of this model imply that the subjects are stochastically ordered on the common latent variable. However, the PCM is very restrictive with respect…
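For readers unfamiliar with the PCM, a minimal Python sketch of its category probabilities under the usual parameterization: an item scored 0..m has step difficulties delta_1..delta_m, and P(X = x | theta) is proportional to exp(sum over j <= x of (theta - delta_j)), with an empty sum for x = 0. The step difficulties below are hypothetical. The sketch only illustrates the PCM itself; it does not implement the isotonic model proposed in the article.

```python
import math


def pcm_probs(theta, deltas):
    """Category probabilities P(X = 0..m | theta) under the partial credit model.

    deltas are the step difficulties delta_1..delta_m of one item.
    """
    # Category x has exponent sum_{j<=x} (theta - delta_j); category 0 has 0.
    exponents = [0.0]
    for d in deltas:
        exponents.append(exponents[-1] + (theta - d))
    top = max(exponents)  # subtract the max before exp() for numerical stability
    weights = [math.exp(e - top) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


def expected_item_score(theta, deltas):
    """Expected score on the item: sum of x * P(X = x | theta)."""
    return sum(x * p for x, p in enumerate(pcm_probs(theta, deltas)))


deltas = [-1.0, 0.5, 1.5]  # hypothetical item scored 0-3
for theta in (-2.0, 0.0, 2.0):
    print(theta, round(expected_item_score(theta, deltas), 3))
```

The expected item score increases with theta, which is the monotonicity that sum-score orderings of subjects build on.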
ERIC Educational Resources Information Center
Sung, Kyung Hee; Noh, Eun Hee; Chon, Kyong Hee
2017-01-01
With increased use of constructed response items in large scale assessments, the cost of scoring has been a major consideration (Noh et al. in KICE Report RRE 2012-6, 2012; Wainer and Thissen in "Applied Measurement in Education" 6:103-118, 1993). In response to the scoring cost issues, various forms of automated system for scoring…
A Note on Stochastic Ordering of the Latent Trait Using the Sum of Polytomous Item Scores
ERIC Educational Resources Information Center
van der Ark, L. Andries; Bergsma, Wicher P.
2010-01-01
In contrast to dichotomous item response theory (IRT) models, most well-known polytomous IRT models do not imply stochastic ordering of the latent trait by the total test score (SOL). This has been thought to make the ordering of respondents on the latent trait using the total test score questionable and throws doubt on the justifiability of using…
ERIC Educational Resources Information Center
Loukina, Anastassia; Zechner, Klaus; Yoon, Su-Youn; Zhang, Mo; Tao, Jidong; Wang, Xinhao; Lee, Chong Min; Mulholland, Matthew
2017-01-01
This report presents an overview of the "SpeechRater" automated scoring engine model building and evaluation process for several item types with a focus on a low-English-proficiency test-taker population. We discuss each stage of speech scoring, including automatic speech recognition, filtering models for nonscorable responses, and…
Pelzer, Jacquelyn M; Hodgson, Jennifer L; Werre, Stephen R
2014-03-24
The Dundee Ready Education Environment Measure (DREEM) has been widely used to evaluate the learning environment within health sciences education. However, this tool has not been applied in veterinary medical education. The aim of this study was to evaluate the reliability and validity of the DREEM tool in a veterinary medical program and to determine veterinary students' perceptions of their learning environment. The DREEM is a survey tool that quantitatively measures students' perceptions of their learning environment. The survey consists of 50 items, each scored 0-4 on a Likert scale. The 50 items are subsequently analysed within five subscales related to students' perceptions of learning, faculty (teachers), academic atmosphere, and self-perceptions (academic and social). An overall score is obtained by summing the mean score for each subscale, with an overall possible score of 200. All students in the program were asked to complete the DREEM. Means and standard deviations were calculated for the 50 items, the five subscale scores, and the overall score. Cronbach's alpha was determined for the five subscales and the overall score to evaluate reliability. Confirmatory factor analysis was used to evaluate construct validity. A total of 224 responses (53%) were received. Cronbach's alpha was 0.93 for the overall score, and for the five subscales it was: perceptions of learning 0.85, perceptions of faculty 0.79, perceptions of atmosphere 0.81, academic self-perceptions 0.68, and social self-perceptions 0.72. Construct validity was determined to be acceptable (p < 0.001), and all items contributed to the overall validity of the DREEM. The overall DREEM score was 128.9/200, which is a positive result based on the developers' descriptors and comparable to other health science education programs. Four individual items of concern were identified by students.
In this setting, the DREEM was a reliable and valid tool to measure veterinary students' perceptions of their learning environment. The four items identified as concerning originated from four of the five subscales, but all related to workload. Negative perceptions regarding workload are a common concern among students in health education programs. If not addressed, this perception may have an unfavourable impact on veterinary students' learning environment.
Gopinath, Bamini; Russell, Joanna; Flood, Victoria M; Burlutsky, George; Mitchell, Paul
2014-02-01
Nutritional parameters could influence self-perceived health and functional status of older adults. We prospectively determined the association between diet quality and quality of life and activities of daily living. This was an observational cohort study in which total diet scores, reflecting adherence to dietary guidelines, were determined. Dietary intakes were assessed using a food frequency questionnaire at baseline. Total diet scores were allocated for intake of selected food groups and nutrients for each participant as described in the Australian Guide to Healthy Eating. Higher scores indicated closer adherence to dietary guidelines. In Sydney, Australia, 1,305 and 895 participants (aged ≥ 55 years) with complete data were examined over 5 and 10 years, respectively. The 36-Item Short-Form Survey assesses quality of life and has eight subscales representing dimensions of health and well-being; higher scores reflect better quality of life. Functional status was determined once at the 10-year follow-up by the Older Americans Resources and Services activities of daily living scale. This scale has 14 items: seven items assess basic activities of daily living (eg, eating and walking) and seven items assess instrumental activities of daily living (eg, shopping or housework). Normalized 36-Item Short-Form Survey component scores were used in analysis of covariance to calculate multivariable adjusted mean scores. Logistic regression analysis was used to calculate adjusted odds ratios and 95% CIs to demonstrate the association between total diet score with the 5-year incidence of impaired activities of daily living. Participants in the highest vs lowest quartile of baseline total diet scores had adjusted mean scores 5.6, 4.0, 5.3, and 2.6 units higher in these 36-Item Short-Form Survey domains 5 years later: physical function (P trend=0.003), general health (P trend=0.02), vitality (P trend=0.001), and physical composite score (P trend=0.003), respectively. 
Participants in the highest vs lowest quartile of baseline total diet scores had a 50% reduced risk of impaired instrumental activities of daily living at follow-up (multivariable-adjusted P trend=0.03). Higher diet quality was prospectively associated with better quality of life and functional ability. Copyright © 2014 Academy of Nutrition and Dietetics. Published by Elsevier Inc. All rights reserved.
Brännström, K Jonas; Lantz, Johannes; Nielsen, Lars Holme; Olsen, Steen Østergaard
2014-02-01
Outcome measures can be used to improve the quality of rehabilitation by identifying and understanding which variables influence the outcome. This information can be used to improve outcomes for clients. In clinical practice, pure-tone audiometry, speech reception thresholds (SRTs), and speech discrimination scores (SDSs) in quiet or in noise are common assessments made prior to hearing aid (HA) fittings. It is not known whether SRT and SDS in quiet relate to HA outcome measured with the International Outcome Inventory for Hearing Aids (IOI-HA). The aim of the present study was to investigate the relationship between pure-tone average (PTA), SRT, and SDS in quiet and IOI-HA in both first-time and experienced HA users. SRT and SDS were measured in a sample of HA users who also responded to the IOI-HA. The sample comprised 58 Danish-speaking adult HA users. The psychometric properties were evaluated and compared to previous studies using the IOI-HA. The associations and differences between the outcome scores and a number of descriptive variables (age, gender, fitted monaurally/binaurally with HA, first-time/experienced HA users, years of HA use, time since last HA fitting, best ear PTA, best ear SRT, or best ear SDS) were examined. A multiple forward stepwise regression analysis was conducted using scores on the separate IOI-HA items, the global score, and scores on the introspection and interaction subscales as dependent variables to examine whether the descriptive variables could predict these outcome measures. Scores on single IOI-HA items, the global score, and scores on the introspection (items 1, 2, 4, and 7) and interaction (items 3, 5, and 6) subscales closely resemble those previously reported.
Multiple regression analysis showed that the best ear SDS explains about 18-19% of the variance in items 3 and 5 separately, and about 16% of the variance in the interaction subscale (sum of items 3, 5, and 6). The best ear SDS explains some of the variance in the IOI-HA global score and the interaction subscale. The relation between SDS and IOI-HA suggests that a poor unaided SDS might in itself be a limiting factor for HA rehabilitation efficacy and hence the IOI-HA outcome. The clinician could use this information to align the user's HA expectations with what is realistically achievable. American Academy of Audiology.
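The subscale structure described above can be sketched in Python. This sketch assumes each of the seven items has already been scored numerically (commonly on a 1-5 scale), and that the introspection subscale and the global score are sums just as the interaction subscale is; the abstract only defines the interaction subscale explicitly as a sum, so those two definitions are assumptions. The answers are hypothetical.

```python
INTROSPECTION_ITEMS = (1, 2, 4, 7)
INTERACTION_ITEMS = (3, 5, 6)


def ioi_ha_scores(answers):
    """Subscale and global sums for one IOI-HA respondent.

    answers maps item number (1-7) to that item's numeric score.
    """
    if set(answers) != set(range(1, 8)):
        raise ValueError("expected scores for items 1-7")
    return {
        "introspection": sum(answers[i] for i in INTROSPECTION_ITEMS),  # assumed sum
        "interaction": sum(answers[i] for i in INTERACTION_ITEMS),
        "global": sum(answers.values()),  # assumed: sum of all seven items
    }


# Hypothetical respondent.
answers = {1: 4, 2: 3, 3: 5, 4: 2, 5: 4, 6: 3, 7: 5}
print(ioi_ha_scores(answers))
# {'introspection': 14, 'interaction': 12, 'global': 26}
```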
Lee, Kyoung Suk; Moser, Debra K; Pelter, Michele; Biddle, Martha J; Dracup, Kathleen
2017-05-01
Comorbid depression in patients with heart failure is associated with increased risk for death. In order to effectively identify depressed patients with cardiac disease, the American Heart Association suggests a 2-step screening method: administering the 2-item Patient Health Questionnaire first and then the 9-item Patient Health Questionnaire. However, whether the 2-step method is better for predicting poor prognosis in heart failure than either the 2-item or the 9-item tool alone is not known. The objective was to determine whether the 2-step method is better than either the 2-item or the 9-item questionnaire alone for predicting all-cause mortality in heart failure. During a 2-year period, 562 patients with heart failure were assessed for depression using the 2-step method. With the 2-step method, results are considered positive if patients endorse either depressed mood or anhedonia on the 2-item screen and have a score of 10 or higher on the 9-item screen. Screening results with the 2-step method were not associated with all-cause mortality. Patients with scores positive for depression on either the 2-item or the 9-item screen alone had 53% and 60% greater risk, respectively, for all-cause death than did patients with scores negative for depression, after adjustment for covariates (hazard ratio, 1.530; 95% CI, 1.029-2.274 for the 2-item screen; hazard ratio, 1.603; 95% CI, 1.079-2.383 for the 9-item screen). The 2-step method has no clear advantages compared with the 2-item screen alone or the 9-item screen alone for predicting adverse prognostic effects of depressive symptoms in heart failure. ©2017 American Association of Critical-Care Nurses.
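The 2-step decision rule described above can be written as a small Python function. Two details are assumptions, not stated in the abstract: that a 2-item (PHQ-2) question counts as "endorsed" when scored 1 or higher on its 0-3 scale, and that the 9-item (PHQ-9) total is supplied as a precomputed integer. Both thresholds are exposed as parameters so they can be changed.

```python
def two_step_positive(mood, anhedonia, phq9_total, endorse_at=1, phq9_cutoff=10):
    """Return True if the 2-step PHQ-2 -> PHQ-9 depression screen is positive.

    mood, anhedonia: scores on the two PHQ-2 items (0-3 each).
    endorse_at: assumed minimum item score that counts as endorsement.
    phq9_total: total PHQ-9 score (0-27).
    """
    step1 = mood >= endorse_at or anhedonia >= endorse_at  # either core symptom
    step2 = phq9_total >= phq9_cutoff                       # PHQ-9 score of 10+
    return step1 and step2


print(two_step_positive(mood=2, anhedonia=0, phq9_total=12))  # True
print(two_step_positive(mood=0, anhedonia=0, phq9_total=14))  # False
print(two_step_positive(mood=1, anhedonia=1, phq9_total=9))   # False
```

Because both conditions must hold, every 2-step positive is also positive on the PHQ-9 cutoff alone, but not the reverse.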
A Non-Parametric Item Response Theory Evaluation of the CAGE Instrument Among Older Adults.
Abdin, Edimansyah; Sagayadevan, Vathsala; Vaingankar, Janhavi Ajit; Picco, Louisa; Chong, Siow Ann; Subramaniam, Mythily
2018-02-23
The validity of the CAGE has not yet been examined using item response theory (IRT) in an older adult population. This study aims to investigate the psychometric properties of the CAGE using both non-parametric and parametric IRT models, to assess whether there is any differential item functioning (DIF) by age, gender, and ethnicity, and to examine the measurement precision at the cut-off scores. We used data from the Well-being of the Singapore Elderly study to conduct Mokken scaling analysis (MSA) and to fit dichotomous Rasch and 2-parameter logistic IRT models. The measurement precision at the cut-off scores was evaluated using classification accuracy (CA) and classification consistency (CC). The MSA showed an overall scalability H index of 0.459, indicating a medium-strength scale. All items were found to be homogeneous, measuring the same construct and able to discriminate well between respondents with high levels of the construct and those with lower levels. The item discrimination ranged from 1.07 to 6.73, while the item difficulty ranged from 0.33 to 2.80. Significant DIF was found for two items across ethnic groups. More than 90% of the respondents (CC and CA ranged from 92.5% to 94.3%) were consistently and accurately classified by the CAGE cut-off scores of 2 and 3. The current study provides new evidence on the validity of the CAGE from the IRT perspective. This study provides valuable information on each item in the assessment of the overall severity of alcohol problems and on the precision of the cut-off scores in the older adult population.
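The 2-parameter logistic (2PL) model mentioned above gives the probability of endorsing a dichotomous item as P(theta) = 1 / (1 + exp(-a(theta - b))), where a is the item discrimination and b the item difficulty. A minimal Python sketch follows. It omits the optional 1.7 scaling constant some software uses, and the (a, b) pairs are hypothetical values taken from the reported ranges (1.07-6.73 and 0.33-2.80), not the fitted CAGE item parameters.

```python
import math


def p_2pl(theta, a, b):
    """Probability of endorsing a dichotomous item under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information of a 2PL item: a^2 * P * (1 - P), maximal at theta = b."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)


# Hypothetical items: a weakly and a strongly discriminating one, same difficulty.
items = {"low a": (1.07, 0.33), "high a": (6.73, 0.33)}
for name, (a, b) in items.items():
    probs = [round(p_2pl(t, a, b), 2) for t in (-1.0, 0.33, 1.0)]
    print(name, probs, round(item_information(b, a, b), 2))
```

At theta = b both items give a probability of 0.5; the higher-discrimination item changes much faster around b and carries far more information there, which is why a very high a concentrates measurement precision near the item's difficulty.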
Grassi, Mario; Nucera, Andrea
2010-01-01
The objective of this study was twofold: 1) to confirm the hypothesized eight scales and two component summaries of the Short Form 36 Health Survey (SF-36) questionnaire, and 2) to evaluate the performance of two alternative measures to the original physical component summary (PCS) and mental component summary (MCS). We performed principal component analysis (PCA) based on 35 items, after optimal scaling via multiple correspondence analysis (MCA), and subsequently on eight scales, after standard summative scoring. Item-based summary measures were planned. Data from the European Community Respiratory Health Survey II follow-up of 8854 subjects from 25 centers were analyzed to cross-validate the original and the novel PCS and MCS. Overall, the scale- and item-based comparison indicated that the SF-36 scales and summaries meet the supposed dimensionality. However, the vitality, social functioning, and general health items did not fit the data optimally. The novel measures, derived a posteriori by unit-rule from an oblique (correlated) MCA/PCA solution, are simple item sums or weighted scale sums where the weights are the raw scale ranges. These item-based scores yielded consistent scale-summary results for outlier profiles and showed the expected known-groups validity. We were able to confirm the hypothesized dimensionality of eight scales and two summaries of the SF-36. The alternative scoring meets at least the same standards as the original scoring. In addition, it can reduce item-scale inconsistencies without loss of predictive validity.
Yau, David T W; Wong, May C M; Lam, K F; McGrath, Colman
2015-08-19
The four-factor structure of the two 8-item short forms of the Child Perceptions Questionnaire CPQ11-14 (RSF:8 and ISF:8) has been confirmed. However, sum scores are typically reported in practice as a proxy of oral health-related quality of life (OHRQoL), which implies a unidimensional structure. This study first assessed the unidimensionality of the 8-item short forms of CPQ11-14. Item response theory (IRT) was employed to offer an alternative and complementary approach to validation and to overcome the limitations of classical test theory assumptions. A random sample of 649 12-year-old school children in Hong Kong was analyzed. Unidimensionality of the scale was tested by confirmatory factor analysis (CFA), principal component analysis (PCA), and the local dependency (LD) statistic. A graded response model was fitted to the data. The contribution of each item to the scale was assessed by the item information function (IIF). Reliability of the scale was assessed by the test information function (TIF). Differential item functioning (DIF) across gender was identified by the Wald test and expected score functions. Neither CPQ11-14 RSF:8 nor ISF:8 deviated much from the unidimensionality assumption. Results from CFA indicated acceptable fit of the one-factor model. PCA indicated that the first principal component explained >30 % of the total variation, with high factor loadings for both RSF:8 and ISF:8. Almost all LD statistics were <10, indicating the absence of local dependency. Flat and low IIFs were observed for the oral symptoms items, suggesting that they contribute little information to the scale and that their removal has little practical impact. Comparing the TIFs, RSF:8 showed slightly better information than ISF:8. In addition to the oral symptoms items, the item "Concerned with what other people think" demonstrated uniform DIF (p < 0.001). The expected score functions were not much different between boys and girls.
Items related to oral symptoms were not informative about OHRQoL, and deletion of these items is suggested. The impact of DIF across gender on the overall score was minimal. CPQ11-14 RSF:8 performed slightly better than ISF:8 in measurement precision. The 6-item short forms suggested by IRT validation should be further investigated to ensure their robustness, responsiveness, and discriminative performance.
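The item and test information functions used in this study can be illustrated with a minimal sketch of Samejima's graded response model. This is not the authors' analysis code. The slope `a` and thresholds `b` in the usage lines are hypothetical values, chosen only to show that a low, flat item information curve corresponds to a weakly discriminating item.

```python
import math
from typing import List, Sequence, Tuple


def _cumulative(theta: float, a: float, thresholds: Sequence[float]) -> List[float]:
    """Boundary curves P*(X >= k), padded with 1 (k = 0) and 0 (k = m + 1)."""
    inner = [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    return [1.0] + inner + [0.0]


def grm_category_probs(theta: float, a: float, thresholds: Sequence[float]) -> List[float]:
    """Probability of each response category 0..m under the graded response model."""
    cum = _cumulative(theta, a, thresholds)
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]


def grm_item_information(theta: float, a: float, thresholds: Sequence[float]) -> float:
    """Item information: sum over categories of (dP_k/dtheta)^2 / P_k."""
    cum = _cumulative(theta, a, thresholds)
    # Derivative of each logistic boundary is a * P*(1 - P*); the padded 1 and 0 give 0.
    dcum = [a * c * (1.0 - c) for c in cum]
    info = 0.0
    for k in range(len(cum) - 1):
        p = cum[k] - cum[k + 1]
        dp = dcum[k] - dcum[k + 1]
        if p > 0.0:
            info += dp * dp / p
    return info


def scale_information(theta: float, items: Sequence[Tuple[float, Sequence[float]]]) -> float:
    """Test (scale) information is the sum of the item information functions."""
    return sum(grm_item_information(theta, a, b) for a, b in items)


# Hypothetical 4-category items: a steep item and a flat, weakly discriminating one.
steep = (2.0, [-1.0, 0.0, 1.0])
flat = (0.4, [-1.0, 0.0, 1.0])
for theta in (-2.0, 0.0, 2.0):
    print(theta, round(grm_item_information(theta, *steep), 3),
          round(grm_item_information(theta, *flat), 3))
```

For a dichotomous item (one threshold), the sum reduces to the familiar a²P(1 − P).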
Handling Dynamic Weights in Weighted Frequent Pattern Mining
NASA Astrophysics Data System (ADS)
Ahmed, Chowdhury Farhan; Tanbeer, Syed Khairuzzaman; Jeong, Byeong-Soo; Lee, Young-Koo
Even though weighted frequent pattern (WFP) mining is more effective than traditional frequent pattern mining because it can consider the different semantic significances (weights) of items, existing WFP algorithms assume that each item has a fixed weight. In real-world scenarios, however, the weight (price or significance) of an item can vary over time. Reflecting these changes in item weight is necessary in several mining applications, such as retail market data analysis and web click-stream analysis. In this paper, we introduce the concept of a dynamic weight for each item and propose an algorithm, DWFPM (dynamic weighted frequent pattern mining), that makes use of this concept. Our algorithm can handle situations in which the weight (price or significance) of an item varies dynamically. It exploits a pattern-growth mining technique to avoid the level-wise candidate generation-and-test methodology. Furthermore, it requires only one database scan, which makes it suitable for stream data mining. An extensive performance analysis shows that our algorithm is efficient and scalable for WFP mining with dynamic weights.
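The sketch below gives a rough illustration of why dynamic weights matter. It scores a pattern by summing, over the transactions that contain the pattern, the mean weight its items had in the time batch of that transaction. This scoring rule, the batch-indexed weight table, and all names are assumptions made for illustration. It is a brute-force count, not the DWFPM algorithm, which the abstract describes as a single-scan pattern-growth method.

```python
from typing import Dict, FrozenSet, List, Tuple

# (batch id, items in the transaction); the batch id selects the weight table in force.
Transaction = Tuple[int, FrozenSet[str]]
WeightTable = Dict[int, Dict[str, float]]


def dynamic_weighted_support(pattern: FrozenSet[str],
                             transactions: List[Transaction],
                             weights: WeightTable) -> float:
    """Sum, over transactions containing `pattern`, of the mean item weight at that time."""
    total = 0.0
    for batch, items in transactions:
        if pattern <= items:
            w = weights[batch]
            total += sum(w[i] for i in pattern) / len(pattern)
    return total


def is_weighted_frequent(pattern: FrozenSet[str], transactions: List[Transaction],
                         weights: WeightTable, min_wsup: float) -> bool:
    """A pattern is weighted-frequent if its dynamic weighted support reaches min_wsup."""
    return dynamic_weighted_support(pattern, transactions, weights) >= min_wsup


# Hypothetical data: item "a" becomes more valuable in batch 1.
txns = [(0, frozenset({"a", "b"})), (1, frozenset({"a"})), (1, frozenset({"a", "b", "c"}))]
w = {0: {"a": 0.5, "b": 1.0, "c": 0.2}, 1: {"a": 0.9, "b": 0.3, "c": 0.2}}
print(dynamic_weighted_support(frozenset({"a"}), txns, w))       # 0.5 + 0.9 + 0.9
print(dynamic_weighted_support(frozenset({"a", "b"}), txns, w))  # 0.75 + 0.6
```

A superset can contain heavier items than its subsets, so a weighted support defined this way is not anti-monotone and cannot by itself drive Apriori-style pruning. WFP algorithms commonly prune with an upper bound instead, such as the support count multiplied by the maximum item weight.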
Marfeo, Elizabeth E; Ni, Pengsheng; Chan, Leighton; Rasch, Elizabeth K; Jette, Alan M
2014-07-01
The goal of this article was to compare the functioning of frequency versus agreement rating scales in two subdomains of the newly developed Work Disability Functional Assessment Battery: the Mood & Emotions and Behavioral Control scales. We performed a psychometric study comparing rating-scale performance, embedded in a cross-sectional survey used to develop a new instrument measuring behavioral health functioning among adults applying for disability benefits in the United States. Within the sample of 1,017 respondents, the range of response category endorsement was similar for frequency and agreement items on both scales. There were fewer missing values for the frequency items than for the agreement items. Both frequency and agreement items showed acceptable reliability. The frequency items were most effective within approximately 1-2 standard deviations of the mean score, whereas the agreement items performed better at the extremes of the score range. These findings suggest that an optimal response format requires a mix of agreement-based and frequency-based items. Frequency items perform better in the normal range of responses, capturing specific behaviors, reactions, or situations that may elicit a specific response. Agreement items do better for respondents with more extreme scores and capture subjective content related to general attitudes, behaviors, or feelings about work-related behavioral health functioning. Copyright © 2014 Elsevier Inc. All rights reserved.
Paige, Samantha R; Krieger, Janice L; Stellefson, Michael; Alber, Julia M
2017-02-01
Patients with chronic disease are affected by low computer and health literacy, which negatively affects their ability to benefit from access to online health information. This study aimed to estimate reliability and confirm model specifications for eHealth Literacy Scale (eHEALS) scores among patients with chronic disease using Classical Test Theory (CTT) and Item Response Theory (IRT) techniques. A stratified sample of Black/African American (N=341) and Caucasian (N=343) adults with chronic disease completed an online survey that included the eHEALS. Item discrimination was explored using bivariate correlations, and internal consistency using Cronbach's alpha. A categorical confirmatory factor analysis tested a one-factor structure of eHEALS scores. Item characteristic curves, infit/outfit statistics, the omega coefficient, and item reliability and separation estimates were computed. A one-factor structure of eHEALS was confirmed by statistically significant standardized item loadings, acceptable model fit indices (CFI/TLI>0.90), and 70% of variance explained by the model. Item response categories increased with higher theta levels, and there was evidence of acceptable reliability (ω=0.94; item reliability=0.89; item separation=8.54). eHEALS scores are a valid and reliable measure of self-reported eHealth literacy among Internet-using patients with chronic disease. Providers can use eHEALS to help identify patients' eHealth literacy skills. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
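For readers unfamiliar with the omega coefficient reported above, here is a minimal sketch of McDonald's omega for a one-factor model with standardized loadings and uncorrelated errors. The loadings are hypothetical, not the eHEALS estimates. The sketch also ignores the categorical (ordinal) estimation used in the study.

```python
from typing import Sequence


def mcdonald_omega(loadings: Sequence[float]) -> float:
    """Omega = (sum of loadings)^2 / [(sum of loadings)^2 + sum of error variances].

    Assumes standardized loadings, so each item's error variance is 1 - loading^2.
    """
    common = sum(loadings) ** 2
    unique = sum(1.0 - lam * lam for lam in loadings)
    return common / (common + unique)


# Hypothetical loadings for an 8-item, one-factor scale.
print(round(mcdonald_omega([0.8] * 8), 3))
```

Omega rises with both the size of the loadings and the number of items, which is why an 8-item scale with strong loadings can reach values above 0.9.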
Validation of Gujarati Version of ABILOCO-Kids Questionnaire.
Diwan, Shraddha; Diwan, Jasmin; Patel, Pankaj; Bansal, Ankita B
2015-10-01
ABILOCO-Kids is a measure of locomotion ability for children with cerebral palsy (CP) aged 6 to 15 years and is available in English and French. This study aimed to validate a Gujarati version of the ABILOCO-Kids questionnaire for use in clinical research in the Gujarati population. The questionnaire was translated from English into Gujarati using a forward-backward-forward method. To establish the face and content validity of the Gujarati version by a group consensus method, each item was examined by a group of experts with a mean experience of 24.62 years in paediatrics and paediatric physiotherapy. Each item was analysed for content, meaning, wording, format, ease of administration, and scoring, and was rated by the expert group as accepted, rejected, or accepted with modification. The procedure continued until 80% consensus was reached for all items. Concurrent validity was examined in 55 children with CP (6-15 years) across all Gross Motor Function Classification System (GMFCS) levels and all clinical types by correlating ABILOCO-Kids scores with the Gross Motor Function Measure (GMFM) and GMFCS. In phase 1 of validation, 16 items were accepted as is, 22 items were accepted with modification, and 3 items went on to phase 2 validation. For concurrent validity, a highly significant positive correlation was found between ABILOCO-Kids scores and total GMFM (r=0.713, p<0.005), and a highly significant negative correlation with GMFCS (r= -0.778, p<0.005). The Gujarati version of the ABILOCO-Kids questionnaire has good face, content, and concurrent validity and can be used to measure caregiver-reported locomotion ability in children with CP.
Application of the diligence inventory in dental education.
Jasinevicius, T R; Bernard, H; Schuttenberg, E M
1998-04-01
The fifty-five-item Diligence Inventory for Higher Education (DI-HE) was applied to a new subject group: 190 dental students. After item and factor analysis, a fifty-item (four-subscale) inventory best reflected this group. The DI-HE's split-half reliability was 0.81 (p < 0.001), the pre- and post-test reliability coefficient was 0.68 (p < 0.01), and coefficient alpha was 0.90. DI-HE scores were high, with no statistically significant differences among the four classes. Overall, significant relationships were found between grade point averages (GPAs) and DI-HE total and subscale scores, with r values as high as 0.44. While female students' DI-HE scores were significantly higher (p = 0.023) than male students' scores, no correlations between DI-HE scores and GPAs were found for females. The results suggest that the DI-HE may be useful for assessment purposes in professional education.
An analysis of the masking of speech by competing speech using self-report data.
Agus, Trevor R; Akeroyd, Michael A; Noble, William; Bhullar, Navjot
2009-01-01
Many of the items in the "Speech, Spatial, and Qualities of Hearing" scale questionnaire [S. Gatehouse and W. Noble, Int. J. Audiol. 43, 85-99 (2004)] are concerned with speech understanding against a variety of backgrounds, both speech and nonspeech. To study whether these self-report data reflected informational masking, previously collected data from 414 people were analyzed. The lowest scores (greatest difficulties) were found for the two items in which there were two speech targets, with successively higher scores for competing speech (six items), energetic masking (one item), and no masking (three items). The results suggest significant masking by competing speech in everyday listening situations.
[Development of a cell phone addiction scale for Korean adolescents].
Koo, Hyun Young
2009-12-01
This study was conducted to develop a cell phone addiction scale for Korean adolescents. The process included construction of a conceptual framework, generation of initial items, verification of content validity, selection of secondary items, a preliminary study, and extraction of final items. The participants were 577 adolescents in two middle schools and three high schools. Item analysis, factor analysis, criterion-related validity, and internal consistency were used to analyze the data. Twenty items were selected for the final scale and categorized into 3 factors explaining 55.45% of the total variance. The factors were labeled withdrawal/tolerance (7 items), life dysfunction (6 items), and compulsion/persistence (7 items). Scale scores were significantly correlated with self-control, impulsiveness, and cell phone use. Cronbach's alpha coefficient for the 20 items was .92. Scale scores classified students as cell phone addicted, heavy users, or average users. These findings indicate that the cell phone addiction scale has good validity and reliability when used with Korean adolescents.
Chen, Wei; Shu, Liang; Wang, Qian; Pan, Hui; Wu, Jing; Fang, Jie; Sun, Xu-Hong; Zhai, Yu; Dong, You-Rong; Liu, Jian-Ren
2016-08-01
Studies validating the Dizziness Handicap Inventory (DHI) sub-scales (5-item and 2-item) and total scores as possible screening instruments for benign paroxysmal positional vertigo (BPPV) are rare in China. From May 2014 to December 2014, 108 patients (55 with and 53 without BPPV) from a vertigo outpatient clinic who complained of episodic vertigo in the past week were enrolled for DHI evaluation, along with demographic and other clinical data. Objective BPPV was subsequently determined by positional evoking maneuvers recorded with optical Frenzel glasses. Cronbach's coefficient α was used to evaluate the reliability of the psychometric scales. The validity of the DHI total, 5-item, and 2-item questionnaires for screening BPPV was assessed by receiver operating characteristic (ROC) curves. The DHI 5-item questionnaire had good internal consistency (Cronbach's coefficient α = 0.72). The areas under the curve of the total DHI, 5-item, and 2-item scores for discriminating patients with BPPV from those without were 0.678 (95 % CI 0.578-0.778), 0.873 (95 % CI 0.807-0.940), and 0.895 (95 % CI 0.836-0.953), respectively. With a cutoff value of 12, the 5-item questionnaire separated patients with and without BPPV with 74.5 % sensitivity and 88.7 % specificity. With a cutoff value of 6, the 2-item questionnaire had a sensitivity of 78.2 % and a specificity of 88.7 %. The present study indicates that both the 5-item and 2-item questionnaires in the Chinese version of the DHI may be more valid than the DHI total score for screening objective BPPV and merit further application in clinical practice in China.
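The cutoff-based figures above follow from simple counts. The sketch below computes sensitivity and specificity for a cutoff. It also computes the area under the ROC curve as the probability that a randomly chosen patient with BPPV scores higher than one without, with ties counted as one half. The sketch assumes that higher DHI scores indicate BPPV and that a score at or above the cutoff counts as screen-positive; the abstract does not state the exact rule. The scores in the usage lines are hypothetical. As a consistency check on the reported percentages, 74.5 % sensitivity among 55 patients with BPPV matches 41 true positives, and 88.7 % specificity among 53 patients without BPPV matches 47 true negatives.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(pos: Sequence[float], neg: Sequence[float],
                            cutoff: float) -> Tuple[float, float]:
    """Screen-positive means score >= cutoff (an assumed rule)."""
    tp = sum(1 for s in pos if s >= cutoff)
    tn = sum(1 for s in neg if s < cutoff)
    return tp / len(pos), tn / len(neg)


def roc_auc(pos: Sequence[float], neg: Sequence[float]) -> float:
    """Mann-Whitney estimate of the area under the ROC curve."""
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical 5-item DHI scores.
bppv = [14, 12, 9, 15]
no_bppv = [3, 12, 5, 8]
print(sensitivity_specificity(bppv, no_bppv, cutoff=12))
print(roc_auc(bppv, no_bppv))
print(41 / 55, 47 / 53)  # reconstructed proportions behind 74.5 % and 88.7 %
```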
Tulsky, David S; Kisala, Pamela A; Kalpakjian, Claire Z; Bombardier, Charles H; Pohlig, Ryan T; Heinemann, Allen W; Carle, Adam; Choi, Seung W
2015-05-01
To develop a calibrated spinal cord injury-quality of life (SCI-QOL) item bank, computer adaptive test (CAT), and short form to assess depressive symptoms experienced by individuals with SCI, to transform scores to the Patient Reported Outcomes Measurement Information System (PROMIS) metric, and to create a crosswalk to the Patient Health Questionnaire (PHQ)-9. We used grounded-theory-based qualitative item development methods, large-scale item calibration field testing, confirmatory factor analysis, item response theory (IRT) analyses, and statistical linking techniques to transform scores to a PROMIS metric and to provide a crosswalk with the PHQ-9. The study was conducted at five SCI Model System centers and one Department of Veterans Affairs medical center in the United States, with adults with traumatic SCI as participants. The outcome measure was the Spinal Cord Injury-Quality of Life (SCI-QOL) Depression Item Bank. Individuals with SCI were involved in all phases of SCI-QOL development. A sample of 716 individuals with traumatic SCI completed 35 items assessing depression, 18 of which were PROMIS items. After 7 non-PROMIS items were removed, factor analyses confirmed a unidimensional pool of items. We used a graded response IRT model to estimate slopes and thresholds for the 28 retained items. The SCI-QOL Depression measure correlated 0.76 with the PHQ-9. The SCI-QOL Depression item bank provides a reliable and sensitive measure of depressive symptoms, with scores reported in terms of general population norms. We provide a crosswalk to the PHQ-9 to facilitate comparisons between measures. The item bank may be administered as a CAT or as a short form and is suitable for research and clinical applications.
Systematic evaluation of clinical practice guidelines for pharmacogenomics.
Beckett, Robert D; Kisor, David F; Smith, Thomas; Vonada, Brooke
2018-06-01
To systematically assess the methodological quality of pharmacogenomics clinical practice guidelines. Guidelines published through 2017 were reviewed by at least three independent reviewers using the AGREE II instrument, which consists of 23 items grouped into 6 domains and 2 items representing an overall assessment. Items were assessed on a seven-point rating scale, and aggregate quality scores were calculated. Thirty-one articles were included. All guidelines were published as peer-reviewed articles, and 90% (n = 28) were endorsed by professional organizations. Mean AGREE II domain scores (maximum score 100%) ranged from 46.6 ± 11.5% ('applicability') to 78.9 ± 11.4% ('clarity of presentation'). The median overall quality score was 72.2% (IQR: 61.1-77.8%). The quality of pharmacogenomics guidelines was generally high, but variable, for most AGREE II domains.
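In AGREE II, domain quality is conventionally expressed as a scaled domain score. The ratings are summed across items and appraisers, rescaled between the minimum possible total (all 1s) and the maximum possible total (all 7s), and expressed as a percentage. A minimal sketch follows. The ratings are hypothetical, and the abstract does not say whether the authors applied exactly this formula.

```python
from typing import Sequence


def agree_scaled_domain_score(ratings: Sequence[Sequence[int]]) -> float:
    """Scaled domain score (%) from ratings[appraiser][item] on the 1-7 scale."""
    n_appraisers = len(ratings)
    n_items = len(ratings[0])
    if any(len(row) != n_items for row in ratings):
        raise ValueError("every appraiser must rate every item in the domain")
    obtained = sum(sum(row) for row in ratings)
    min_possible = 1 * n_items * n_appraisers
    max_possible = 7 * n_items * n_appraisers
    return 100.0 * (obtained - min_possible) / (max_possible - min_possible)


# Hypothetical ratings: three appraisers scoring a three-item domain.
ratings = [[5, 6, 7], [4, 6, 6], [5, 5, 7]]
print(round(agree_scaled_domain_score(ratings), 1))
```

Rescaling by the possible range, rather than averaging raw ratings, makes domains with different numbers of items directly comparable on a 0-100% scale.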
Using a Linear Regression Method to Detect Outliers in IRT Common Item Equating
ERIC Educational Resources Information Center
He, Yong; Cui, Zhongmin; Fang, Yu; Chen, Hanwei
2013-01-01
Common test items play an important role in equating alternate test forms under the common item nonequivalent groups design. When the item response theory (IRT) method is applied in equating, inconsistent item parameter estimates among common items can lead to large bias in equated scores. It is prudent to evaluate inconsistency in parameter…
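The truncated abstract does not describe the method in detail. One common way to operationalize regression-based screening is sketched here as an assumption, not as the procedure of He et al. The new-form difficulty (b) estimates of the common items are regressed on their old-form estimates, and items whose standardized residuals exceed a critical value are flagged. The parameter values and the critical value of 2 are hypothetical.

```python
import math
import statistics
from typing import List, Sequence


def flag_inconsistent_items(b_old: Sequence[float], b_new: Sequence[float],
                            z_crit: float = 2.0) -> List[int]:
    """Return indices of common items whose standardized residual exceeds z_crit."""
    n = len(b_old)
    mx, my = statistics.fmean(b_old), statistics.fmean(b_new)
    sxx = sum((x - mx) ** 2 for x in b_old)
    sxy = sum((x - mx) * (y - my) for x, y in zip(b_old, b_new))
    slope = sxy / sxx
    intercept = my - slope * mx
    resid = [y - (intercept + slope * x) for x, y in zip(b_old, b_new)]
    # Residual standard deviation with n - 2 degrees of freedom.
    s = math.sqrt(sum(r * r for r in resid) / (n - 2))
    if s == 0.0:
        return []  # perfect linear fit: nothing to flag
    return [i for i, r in enumerate(resid) if abs(r) / s > z_crit]


# Hypothetical estimates: a uniform 0.3 shift, except item 5, which drifted further.
b_old = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
b_new = [b + 0.3 for b in b_old]
b_new[5] += 1.2
print(flag_inconsistent_items(b_old, b_new))
```

Flagged items could then be reviewed, and possibly removed from the common-item set, before the equating transformation is re-estimated.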
Doostvandi, Tayebeh; Bahadoran, Zahra; Mozaffari-Khosravi, Hassan; Tahmasebinejad, Zhaleh; Mirmiran, Parvin; Azizi, Fereidoun
2017-05-01
The aim of this study was to investigate the relationship between major dietary patterns and the risk of insulin resistance (IR) in an urban Iranian population. In this longitudinal study, 802 adult men and women were studied within the framework of the Tehran Lipid and Glucose Study. Fasting serum insulin and glucose were measured at baseline and again after 3 years of follow-up. Usual dietary intake was assessed using a validated 168-item semi-quantitative food frequency questionnaire, and major dietary patterns were derived using principal component analysis. Logistic regression models were used to estimate the occurrence of IR across tertiles of dietary pattern scores, with adjustment for potential confounding variables. The mean age of participants was 39.0±11.2 years, and 45.5% were men. Three major dietary patterns (Western, traditional, and healthy) were extracted, which together explained 25.3% of the total variance in food intake. The healthy dietary pattern, which loaded heavily on vegetable oils, fresh and dried fruits, low-fat dairy, nuts, and seeds, was associated with a 51% (OR=0.49, 95% CI=0.30-0.81) and an 81% (OR=0.19, 95% CI=0.10-0.36) reduced risk of insulin resistance in the second and third tertiles, respectively (p trend=0.001). When all dietary pattern scores were included in the logistic regression model, a 45% reduced risk of IR was observed per 1-unit increase in the healthy dietary pattern score. These findings confirmed the protective effect of a plant-based, low-fat dietary pattern against the development of insulin resistance, a main risk factor for type 2 diabetes and metabolic disorders.
A Rasch analysis of the Manchester Foot Pain and Disability Index
Muller, Sara; Roddy, Edward
2009-01-01
Background There is currently no interval-level measure of foot-related disability and this has hampered research in this area. The Manchester Foot Pain and Disability Index (FPDI) could potentially fill this gap. Objective To assess the fit of the three subscales (function, pain, appearance) of the FPDI to the Rasch unidimensional measurement model in order to form interval-level scores. Methods A two-stage postal survey at a general practice in the UK collected data from 149 adults aged 50 years and over with foot pain. The 17 FPDI items, in three subscales, were assessed for their fit to the Rasch model. Checks were carried out for differential item functioning by age and gender. Results The function and pain items fit the Rasch model and interval-level scores can be constructed. There were too few people without extreme scores on the appearance subscale to allow fit to the Rasch model to be tested. Conclusion The items from the FPDI function and pain subscales can be used to obtain interval level scores for these factors for use in future research studies in older adults. Further work is needed to establish the interval nature of these subscale scores in more diverse populations and to establish the measurement properties of these interval-level scores. PMID:19878536
Braend, Anja Maria; Gran, Sarah Frandsen; Frich, Jan C; Lindbaek, Morten
2010-01-01
Formative assessment of medical students' clinical performance during a general practice clerkship is necessary for learning consultation skills. Our aim was to triangulate feedback using patient questionnaires, written self-assessment, and teachers' observation-based assessment, and to describe the content of this feedback. We developed StudentPEP, a 15-item version of EUROPEP, a tool for measuring patients' evaluation of quality in general practice. The teacher and student forms consisted of five StudentPEP items and open-ended questions asking for positive feedback and needed improvements on four aspects. Quantitative scores were analyzed statistically. Free-text comments were analyzed and categorized as 'specific and concrete' versus 'general and unspecific'. One hundred seventy-three students returned data from 2643 consultations. Mean patient scores for the 15 items were 4.3-4.8 on a five-point Likert scale. Mean teacher scores were 4.4 on five items, whereas students' mean self-assessments were 3.6-3.8. In an analysis of 380 consultations, students were more specific and concrete in their self-evaluations than teachers were (p < 0.01). Patients rated students' performance higher than students rated themselves. Teachers' scores were in accordance with patients' scores. Teachers' written evaluations of students were often general. There is potential for improving teachers' feedback by making their comments more specific and concrete.
Statistical power as a function of Cronbach alpha of instrument questionnaire items.
Heo, Moonseong; Kim, Namhee; Faith, Myles S
2015-10-14
In countless clinical trials, outcome measurements rely on instrument questionnaire items, which often suffer from measurement error that in turn affects the statistical power of study designs. The Cronbach alpha, or coefficient alpha, here denoted by C(α), can be used as a measure of the internal consistency of parallel instrument items developed to measure a target unidimensional outcome construct. The scale score for the target construct is often represented by the sum of the item scores. However, power functions based on C(α) have been lacking for various study designs. We formulate a statistical model for parallel items to derive power functions of C(α) under several study designs. To this end, we adopt a fixed true-score variance assumption, as opposed to the usual fixed total variance assumption. This assumption is critical, and practically relevant, for showing that measurement error is inversely associated with inter-item correlation, and thus that greater C(α) is associated with greater statistical power. We compare the derived theoretical statistical power with empirical power obtained through Monte Carlo simulations for the following comparisons: one-sample comparison of pre- and post-treatment mean differences, two-sample comparison of pre-post mean differences between groups, and two-sample comparison of mean differences between groups. We show that C(α) equals a test-retest correlation of the scale scores of parallel items, which enables significance testing of C(α). Closed-form power functions and sample size determination formulas are derived in terms of C(α) for all of the aforementioned comparisons. Power is shown to be an increasing function of C(α) regardless of the comparison of interest. The derived power functions are well validated by simulation studies, in which theoretical power is virtually identical to empirical power.
Regardless of research design or setting, developing and using instruments with greater C(α), or equivalently with greater inter-item correlations, is crucial for increasing statistical power in trials that use questionnaire items to measure research outcomes. Further development of power functions for binary or ordinal item scores, and under more general item correlation structures that reflect real-world situations, would be a valuable direction for future study.
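The sketch below covers the two quantities the abstract connects: Cronbach's alpha computed from item scores, and a normal-approximation power for the two-sample comparison of group means. The power expression is an illustrative derivation under the parallel-items, fixed true-score variance assumption described above; it is not taken from the paper and is not claimed to reproduce the authors' closed-form formulas. Under that assumption the reliability of the sum score equals C(α), so a true-score effect size δ shrinks to δ·√C(α) on the observed sum-score scale. The responses and effect sizes in the usage lines are hypothetical.

```python
import math
from statistics import NormalDist, pvariance
from typing import Sequence


def cronbach_alpha(scores: Sequence[Sequence[float]]) -> float:
    """scores[respondent][item]; alpha = k/(k-1) * (1 - sum of item variances / total variance)."""
    k = len(scores[0])
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1.0 - sum(item_vars) / total_var)


def two_sample_power(delta_true: float, alpha_c: float, n_per_group: int,
                     sig_level: float = 0.05) -> float:
    """Approximate power of a two-sided z-test comparing mean sum scores.

    delta_true: group difference in true scores divided by the true-score SD.
    alpha_c: Cronbach's alpha of the scale (equal to reliability for parallel items).
    """
    z = NormalDist()
    observed_effect = delta_true * math.sqrt(alpha_c)
    z_crit = z.inv_cdf(1.0 - sig_level / 2.0)
    # The opposite rejection region contributes negligibly and is ignored.
    return z.cdf(observed_effect * math.sqrt(n_per_group / 2.0) - z_crit)


# Hypothetical responses of five people to three items.
items = [[3, 4, 3], [2, 2, 3], [4, 5, 4], [1, 2, 2], [3, 3, 4]]
print(round(cronbach_alpha(items), 3))
for a in (0.6, 0.8, 0.95):
    print(a, round(two_sample_power(0.5, a, n_per_group=64), 3))
```

The loop shows the qualitative result stated in the abstract: holding the true effect and sample size fixed, power increases with C(α).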
Kallen, Michael A; Cook, Karon F; Amtmann, Dagmar; Knowlton, Elizabeth; Gershon, Richard C
2018-05-05
To evaluate the degree to which applying alternative stopping rules would reduce response burden while maintaining score precision in computer adaptive testing (CAT). Analyses were conducted on secondary data comprising CATs administered in a clinical setting at multiple time points (baseline and up to two follow-ups) to 417 study participants who had back pain (51.3%) and/or depression (47.0%). Participants' mean age was 51.3 years (SD = 17.2; range 18 to 86). Participants tended to be white (84.7%), relatively well educated (77% with at least some college), female (63.9%), and married or living in a committed relationship (57.4%). The unit of analysis was individual assessment histories (i.e., CAT item response histories) from the parent study. Data were first aggregated across all individuals, domains, and time points into an omnibus dataset of assessment histories and then disaggregated by measure for domain-specific analyses. Finally, assessment histories within a "clinically relevant range" (score ≥ 1 SD from the mean in the direction of poorer health) were analyzed separately to explore score-level-specific findings. Two sets of CAT administration rules were compared. The original CAT (CAT-ORIG) rules required that at least four and no more than 12 items be administered; if the score standard error (SE) fell below 3 points (T-score metric) before 12 items had been administered, the CAT was stopped. We simulated applying alternative stopping rules (CAT-ALT), which removed the requirement that a minimum of four items be administered and stopped a CAT if responses to the first two items were both associated with best health, if the SE was < 3, if the SE change was < 0.1 (T-score metric), or if 12 items had been administered. We then compared score fidelity and response burden, defined as the number of items administered, between CAT-ORIG and CAT-ALT.
CAT-ORIG and CAT-ALT scores varied little, especially within the clinically relevant range, and response burden was substantially lower under CAT-ALT (e.g., 41.2% savings in the omnibus dataset). Alternative stopping rules resulted in substantial reductions in response burden with minimal sacrifice in score precision.
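The two rule sets compared in this study can be sketched as functions that replay a recorded item-response history and return how many items each rule would have administered. The record format is hypothetical: for each item, whether the response was the best-health option, and the SE after that item on the T-score metric. "SE change" is interpreted here as the absolute change in SE from the previous item, which the abstract does not define precisely.

```python
from typing import List, Tuple

# One entry per administered item: (response was the best-health option, SE after the item).
Step = Tuple[bool, float]


def items_original(history: List[Step], max_items: int = 12) -> int:
    """Original rules: at least 4 and at most 12 items; stop once SE < 3."""
    for n, (_, se) in enumerate(history, start=1):
        if n >= max_items or (n >= 4 and se < 3.0):
            return n
    return len(history)


def items_alternative(history: List[Step], max_items: int = 12) -> int:
    """Alternative rules: no minimum; stop on two best-health responses,
    SE < 3, SE change < 0.1, or 12 items."""
    for n, (_, se) in enumerate(history, start=1):
        if n == 2 and history[0][0] and history[1][0]:
            return n
        if se < 3.0:
            return n
        if n >= 2 and abs(history[n - 2][1] - se) < 0.1:
            return n
        if n >= max_items:
            return n
    return len(history)


# Hypothetical histories: a healthy respondent and one whose SE plateaus above 3.
healthy = [(True, 6.0), (True, 5.0), (False, 4.2), (False, 3.5), (False, 2.9), (False, 2.8)]
plateau = [(False, 6.0), (False, 5.0), (False, 4.95)] + [(False, 4.9 - 0.05 * i) for i in range(9)]
orig = items_original(healthy) + items_original(plateau)
alt = items_alternative(healthy) + items_alternative(plateau)
print(orig, alt, f"savings: {1 - alt / orig:.1%}")
```

Response burden savings are then the proportional reduction in total items administered, computed the same way over all histories in a dataset.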
Attitudes of medical students toward psychiatry in a Chilean medical school.
Valdivieso, Sergio; Sirhan, Marisol; Aguirre, Constanza; Ivelic, Jose Antonio; Aillach, Emilio; Villarroel, Luis
2014-06-01
The authors assess the attitudes of seventh-year medical students with regard to psychiatry and patients with psychiatric illness during the psychiatry clerkship. A 32-item questionnaire regarding attitudes toward psychiatry and patients with psychiatric illness was administered at the beginning of the psychiatry clerkship. One hundred and ten seventh-year students participated in the study, providing responses anonymously. Average negative attitude item score was 2.45 ± 0.3 (range 1.7-3.3). Eighty-three students (75 %) responded to all the questions with an average negative attitude item score of 2.43 ± 0.3 (range 1.7-3.3) and a total negative attitude item score of 77.9 ± 10.3 (range 55-104). Undergraduate students of a Chilean medical school showed fairly positive attitudes toward psychiatry and toward patients with psychiatric illness.
Three-dimensional structural representation of the sleep-wake adaptability.
Putilov, Arcady A
2016-01-01
Various characteristics of the sleep-wake cycle can determine the success or failure of individual adjustment to certain temporal conditions of today's society. However, it remains to be explored how many such characteristics can be self-assessed and how they are interrelated. The aim of the present report was to apply a three-dimensional structural representation of sleep-wake adaptability in the form of a "rugby cake" (scalene or triaxial ellipsoid) to explain the pattern of correlations between responses to the initial 320-item list of a new inventory and scores on the six scales designed for multidimensional self-assessment of sleep-wake adaptability (Morning and Evening Lateness, Anytime and Nighttime Sleepability, and Anytime and Daytime Wakeability). The results obtained for a sample of 149 respondents were confirmed by a similar analysis of previously collected responses of 139 respondents to the same list of 320 items and of 1213 respondents to the 72 items of one of the previously established questionnaire tools. Empirical evidence supported the model-driven prediction that items can be identified for as many as 36 narrow (6 core and 30 mixed) adaptabilities of the sleep-wake cycle. The results enabled the selection of 168 items for self-assessment of all the adaptabilities predicted by the rugby cake model.
Citrome, Leslie; Landbloom, Ronald; Chang, Cheng-Tao; Earley, Willie
2017-01-01
Bipolar disorder is associated with an increased risk of aggression. However, effective management of hostility and/or agitation symptoms may prevent patients from becoming violent. This analysis investigated the efficacy of the antipsychotic asenapine on hostility and agitation in patients with bipolar I disorder. Data were pooled from three randomized, double-blind, placebo-controlled, Phase III trials of asenapine in adults with manic or mixed episodes of bipolar I disorder (NCT00159744, NCT00159796, and NCT00764478). Post hoc analyses assessed the changes from baseline to day 21 on the Young Mania Rating Scale (YMRS) and the Positive and Negative Syndrome Scale (PANSS) hostility-related item scores in asenapine- or placebo-treated patients with at least minimal or mild symptom severity and on the PANSS-excited component (PANSS-EC) total score in agitated patients. Changes were adjusted for improvements in overall mania symptoms to investigate direct effects on hostility. Significantly greater changes in favor of asenapine versus placebo were observed in YMRS hostility-related item scores (irritability: least squares mean difference [95% confidence interval] =-0.5 [-0.87, -0.22], P =0.001; disruptive-aggressive behavior: -0.7 [-0.99, -0.37], P <0.0001), PANSS hostility item score (-0.2 [-0.44, -0.04]; P =0.0181), and PANSS-EC total score (-1.4 [-2.4, -0.4]; P =0.0055). Changes in the YMRS disruptive-aggressive behavior score and the sum of the hostility-related items remained significant after adjusting for improvements in other YMRS item scores. Asenapine significantly reduced hostility and agitation in patients with bipolar I disorder; improvement was at least partially independent of overall improvement on mania symptoms.
Kasper, Judith D.; Brandt, Jason; Pezzin, Liliana E.
2012-01-01
Objective. To examine the measurement equivalence of items on disability across three international surveys of aging. Method. Data for persons aged 65 and older were drawn from the Health and Retirement Survey (HRS, n = 10,905), English Longitudinal Study of Aging (ELSA, n = 5,437), and Survey of Health, Ageing and Retirement in Europe (SHARE, n = 13,408). Differential item functioning (DIF) was assessed using item response theory (IRT) methods for activities of daily living (ADL) and instrumental activities of daily living (IADL) items. Results. HRS and SHARE exhibited measurement equivalence, but 6 of 11 items in ELSA demonstrated meaningful DIF. At the scale level, this item-level DIF affected scores reflecting greater disability. IRT methods also spread out score distributions and shifted scores higher (toward greater disability). Results for mean disability differences by demographic characteristics, using original and DIF-adjusted scores, were the same overall but differed for some subgroup comparisons involving ELSA. Discussion. Testing and adjusting for DIF is one means of minimizing measurement error in cross-national survey comparisons. IRT methods were used to evaluate potential measurement bias in disability comparisons across three international surveys of aging. The analysis also suggested DIF was mitigated for scales including both ADL and IADL and that summary indexes (counts of limitations) likely underestimate mean disability in these international populations. PMID:22156662
Rustemeyer, Jan; Gregersen, Johanne
2012-07-01
The objective of this prospective study was to assess changes of Quality of Life (QoL) in patients undergoing bimaxillary orthognathic surgery. Questionnaires were based on the Oral Health Impact Profile (OHIP, items OH-1-OH-14) and three additional questions (items AD-1-3), and were completed by patients (n=50; mean age 26.9±9.9 years) on average 9.1±2.4 months before surgery, and 12.1±1.4 months after surgery, using a scoring scale. Item scores describing functional limitation, physical pain, physical disability and chewing function did not change significantly, whereas item scores covering psychological discomfort and social disability domains revealed significant decreases following surgery. AD-2 "dissatisfying aesthetics" revealed the greatest difference between pre- and post-surgical scores (p<0.001). If there was a perception of aesthetic improvement of facial features post-surgery, the benefit in QoL was generally high. The significant correlation of the pre- to post-surgical changes of item OH-5 "self conscious" to nearly all other item changes suggested that OH-5 was the most sensitive indicator for post-surgical improvement of QoL. Psychological factors and aesthetics exerted a strong influence on the patients' QoL, and determined major changes more than functional aspects did. Copyright © 2011 European Association for Cranio-Maxillo-Facial Surgery. Published by Elsevier Ltd. All rights reserved.
Assessing depression outcome in patients with moderate dementia: sensitivity of the HoNOS65+ scale.
Canuto, Alessandra; Rudhard-Thomazic, Valérie; Herrmann, François R; Delaloye, Christophe; Giannakopoulos, Panteleimon; Weber, Kerstin
2009-08-15
To date, there is no widely accepted clinical scale for monitoring the evolution of depressive symptoms in demented patients. We assessed the sensitivity to treatment of a validated French version of the Health of the Nation Outcome Scale (HoNOS) 65+ compared with five routinely used scales. Thirty elderly inpatients with an ICD-10 diagnosis of dementia and depression were evaluated at admission and at discharge, and changes were tested using paired t-tests. Using the Brief Psychiatric Rating Scale (BPRS) "depressive mood" item as the gold standard, a receiver operating characteristic (ROC) curve analysis assessed the validity of changes in the HoNOS65+F "depressive symptoms" item score. Unlike Geriatric Depression Scale, Mini Mental State Examination and Activities of Daily Living scores, BPRS scores decreased and Global Assessment of Functioning Scale scores increased significantly from admission to discharge. Among the HoNOS65+F items, "behavioural disturbance", "depressive symptoms", "activities of daily life" and "drug management" showed highly significant changes between the first and last day of hospitalization. The ROC analysis revealed that changes in the HoNOS65+F "depressive symptoms" item correctly classified 93% of the cases, with good sensitivity (0.95) and specificity (0.88). These data suggest that the HoNOS65+F "depressive symptoms" item may provide a valid assessment of the evolution of depressive symptoms in demented patients.
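The three classification figures reported above (proportion correctly classified, sensitivity, specificity) all come from a 2×2 table that cross-tabulates the item-change classification against the reference item. A minimal Python sketch, assuming such a table has already been formed at some ROC cut-off (choosing the cut-off is not shown); the counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """2x2 counts of a classifier (here: item change) against a reference rating."""
    tp: int  # reference positive, classified positive
    fn: int  # reference positive, classified negative
    fp: int  # reference negative, classified positive
    tn: int  # reference negative, classified negative

    @property
    def sensitivity(self) -> float:
        # Share of reference-positive cases that are classified positive.
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        # Share of reference-negative cases that are classified negative.
        return self.tn / (self.tn + self.fp)

    @property
    def accuracy(self) -> float:
        # Share of all cases classified in agreement with the reference.
        return (self.tp + self.tn) / (self.tp + self.fn + self.fp + self.tn)


# Hypothetical counts for illustration only; these are not the study's data.
c = Confusion(tp=16, fn=4, fp=1, tn=9)
print(round(c.sensitivity, 2), round(c.specificity, 2), round(c.accuracy, 2))
```

Sensitivity and specificity each condition on the reference status, so they trade off as the cut-off moves along the ROC curve, while accuracy also depends on how many reference-positive cases the sample contains.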
Physical performance testing in mucopolysaccharidosis I: a pilot study.
Dumas, Helene M; Fragala, Maria A; Haley, Stephen M; Skrinar, Alison M; Wraith, James E; Cox, Gerald F
2004-01-01
To develop and field-test a physical performance measure (MPS-PPM) for individuals with Mucopolysaccharidosis I (MPS I), a rare genetic disorder. Motor performance and endurance items were developed based on a literature review, clinician feedback, feasibility, and equipment and training needs. A standardized testing protocol and scoring rules were created. The MPS-PPM includes Arm Function (7 items), Leg Function (5 items), and Endurance (2 items) sub-tests. Pilot data were collected for 10 subjects (ages 5-29 years). We calculated Spearman's rho correlations between age, severity, and summary z-scores on the MPS-PPM. Subjects had variable presentations: correlations among the three sub-test scores were not significant. Increasing age was related to greater severity in physical performance (r = 0.72, p<0.05) and to lower scores on the Leg Function (r = -0.67, p<0.05) and Endurance (r = -0.65, p<0.05) sub-tests. The MPS-PPM was sensitive to physical performance deficits: six subjects could not complete the full battery of Arm Function items, and eight subjects were unable to complete all Leg Function items. Subjects walked more slowly and expended more energy than typically developing peers. Individuals with MPS I have difficulty with arm and leg function and reduced endurance. The MPS-PPM is a clinically feasible measure that detects limitations in physical performance and may have the potential to quantify changes in function following intervention. Copyright 2004 Taylor and Francis Ltd.
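Spearman's rho, as used above for age, severity and sub-test z-scores, is the Pearson correlation computed on ranks, which makes it suitable for small, non-normal samples like this pilot. A minimal Python sketch using the standard library only; tied values receive the average of the positions they occupy.

```python
from statistics import mean


def average_ranks(values: list[float]) -> list[float]:
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x: list[float], y: list[float]) -> float:
    """Spearman's rho as the Pearson correlation of the (tie-averaged) ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```

Because only ranks enter the calculation, any strictly increasing relationship gives rho = 1, whether or not it is linear.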
When less is more: validating a brief scale to rate interprofessional team competencies.
Lie, Désirée A; Richter-Lagha, Regina; Forest, Christopher P; Walsh, Anne; Lohenry, Kevin
2017-01-01
There is a need for validated, easy-to-apply, behavior-based tools for assessing interprofessional team competencies in clinical settings. The seven-item observer-based Modified McMaster-Ottawa scale was developed for the Team Objective Structured Clinical Encounter (TOSCE) to assess individual and team performance in interprofessional patient encounters. We aimed to improve the scale's usability in clinical settings by reducing the number of items while maintaining generalizability, and to explore the minimum number of observed cases required to achieve modest generalizability for giving feedback. In April 2016 we administered a two-station TOSCE to 63 students split into 16 newly formed teams, each consisting of four professions. The stations were of similar difficulty. We trained 16 faculty members to rate two teams each. We examined individual and team performance scores using generalizability (G) theory and principal component analysis (PCA). The seven-item scale showed modest generalizability (.75) for individual scores. PCA revealed multicollinearity and singularity among scale items, and we identified three potential items for removal. Reducing the items for individual scores from seven to four (measuring Collaboration, Roles, Patient/Family-centeredness, and Conflict Management) changed scale generalizability from .75 to .73. Performance assessment with two cases is associated with reasonable generalizability (.73). Students in newly formed interprofessional teams show a learning curve after one patient encounter. Team scores from a two-station TOSCE demonstrate low generalizability whether the scale consists of four (.53) or seven items (.55). The four-item Modified McMaster-Ottawa scale for assessing individual performance in interprofessional teams retains the generalizability and validity of the seven-item scale. Observing students in teams interacting with two different patients provides reasonably reliable ratings for giving feedback. 
The four-item scale has potential for assessing individual student skills and the impact of IPE curricula in clinical practice settings. IPE: Interprofessional education; SP: Standardized patient; TOSCE: Team objective structured clinical encounter.
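In generalizability theory, a decision (D) study projects how reliability changes when the number of items or cases changes. For the simplest design, in which every person is rated on the same n items (a one-facet crossed p × i design), the relative G coefficient is σ²_p / (σ²_p + σ²_pi,e / n). A minimal Python sketch under that assumption; the TOSCE study also involved raters, teams and stations, which a full analysis would model as additional facets, and the variance components below are hypothetical.

```python
def g_coefficient(var_person: float, var_residual: float, n_conditions: int) -> float:
    """Relative G coefficient for a one-facet crossed p x i design.

    var_person:   universe-score variance (sigma^2_p)
    var_residual: person-by-condition interaction confounded with error (sigma^2_pi,e)
    n_conditions: number of items (or cases) averaged over in the D study
    """
    relative_error = var_residual / n_conditions
    return var_person / (var_person + relative_error)


# Hypothetical variance components, for illustration only.
for n in (1, 2, 4, 7):
    print(n, round(g_coefficient(0.30, 0.60, n), 3))
```

Averaging over more conditions shrinks only the error term, so the gain from each additional item or case diminishes. This is why removing redundant, highly collinear items can cost little generalizability.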
Functional recovery in patients with schizophrenia: recommendations from a panel of experts.
Lahera, Guillermo; Gálvez, José L; Sánchez, Pedro; Martínez-Roig, Miguel; Pérez-Fuster, J V; García-Portilla, Paz; Herrera, Berta; Roca, Miquel
2018-06-05
The management of schizophrenia is evolving towards a more comprehensive model based on functional recovery. The concept of functional recovery goes beyond clinical remission and encompasses multiple aspects of the patient's life, making it difficult to settle on a definition and to develop reliable assessment criteria. In this consensus process based on a panel of experts in schizophrenia, we aimed to provide useful insights into functional recovery and its role in clinical practice and clinical research. After a literature review of functional recovery in schizophrenia, a scientific committee of 8 members prepared a 75-item questionnaire with 6 sections: (I) the concept of functional recovery (9 items), (II) assessment of functional recovery (23 items), (III) factors influencing functional recovery (16 items), (IV) psychosocial interventions and functional recovery (8 items), (V) pharmacological treatment and functional recovery (14 items), and (VI) the perspective of patients and their relatives on functional recovery (5 items). The questionnaire was sent to a panel of 53 experts, who rated each item on a 9-point Likert scale. Consensus was sought through a two-round Delphi process, using median (interquartile range) scores to define consensus in either agreement (scores 7-9) or disagreement (scores 1-3). Items not achieving consensus in the first round were sent back to the experts for reconsideration. After the two rounds, consensus was achieved on 64 items (85.3%): 61 items (81.3%) in agreement and 3 (4.0%) in disagreement, all three of the latter from section II (assessment of functional recovery). Items not reaching consensus were related to the concept of functional recovery (1 item, 1.3%), functional assessment (5 items, 6.7%), factors influencing functional recovery (3 items, 4.0%), and psychosocial interventions (2 items, 2.7%). 
Despite the lack of a well-defined concept of functional recovery, we identified a trend towards a common archetype of the definition and factors associated with functional recovery, as well as its applicability in clinical practice and clinical research.
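The Delphi consensus rule described above maps each item's panel median onto agreement (7-9), disagreement (1-3) or no consensus. A minimal Python sketch, assuming the rule uses the median alone. The abstract also reports interquartile ranges, but it does not say how they entered the decision, so they are not used here.

```python
from statistics import median


def delphi_consensus(ratings: list[int]) -> str:
    """Classify one item's 9-point Likert ratings by the panel median.

    Assumed rule: median 7-9 -> agreement, median 1-3 -> disagreement,
    anything else (including medians such as 3.5 or 6.5) -> no consensus.
    """
    m = median(ratings)
    if 7 <= m <= 9:
        return "agreement"
    if 1 <= m <= 3:
        return "disagreement"
    return "no consensus"


def percent(count: int, total: int = 75) -> float:
    """Share of the 75-item questionnaire, rounded to one decimal place."""
    return round(100 * count / total, 1)


print(delphi_consensus([8, 9, 7, 6, 9]), percent(64), percent(61), percent(3))
```

Here `percent(64)`, `percent(61)` and `percent(3)` reproduce the 85.3%, 81.3% and 4.0% figures, since all the reported shares use the 75 questionnaire items as the denominator.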
Modeling Incorrect Responses to Multiple-Choice Items with Multilinear Formula Score Theory.
ERIC Educational Resources Information Center
Drasgow, Fritz; And Others
This paper addresses the information revealed by incorrect option selection on multiple-choice items. Multilinear Formula Scoring (MFS), a theory providing methods for solving long-standing psychological measurement problems, is first used to estimate option characteristic curves for the Armed Services Vocational Aptitude Battery Arithmetic…
Hidden Item Variance in Multiple Mini-Interview Scores
ERIC Educational Resources Information Center
Zaidi, Nikki L.; Swoboda, Christopher M.; Kelcey, Benjamin M.; Manuel, R. Stephen
2017-01-01
The extant literature has largely ignored a potentially significant source of variance in multiple mini-interview (MMI) scores by "hiding" the variance attributable to the sample of attributes used on an evaluation form. This potential source of hidden variance can be defined as rating items, which typically comprise an MMI evaluation…
34 CFR 200.8 - Assessment reports.
Code of Federal Regulations, 2013 CFR
2013-07-01
... assessment is given; (ii) In an understandable and uniform format, including an alternative format (e.g... understand. (b) Itemized score analyses for LEAs and schools. (1) A State's academic assessment system must produce and report to LEAs and schools itemized score analyses, consistent with § 200.2(b)(4), so that...
Whitney, Susan L; Marchetti, Gregory F; Morris, Laura O
2005-09-01
The purpose of the study was to determine whether a newly developed subscale of the Dizziness Handicap Inventory (DHI) could assist in screening for benign paroxysmal positional vertigo (BPPV). Retrospective case review. Tertiary balance referral center. Charts of 383 patients (mean age, 61 yr) with a variety of vestibular diagnoses (peripheral and central) were reviewed. Patients completed the DHI before the onset of physical therapy intervention. A BPPV subscale derived from existing DHI items was computed to determine whether its score could help the practitioner identify individuals with BPPV. Individuals with BPPV had significantly higher mean scores on the new BPPV subscale of the DHI (p < 0.01). The five-item BPPV score was a significant predictor of the likelihood of having BPPV (χ² = 8.35; p < 0.01). On the two-item BPPV scale, individuals who scored 8 of 8 were 4.3 times more likely to have BPPV than individuals who scored 0. Items on the DHI appear to be helpful in determining the likelihood that an individual has BPPV.
Calibration of the Spanish PROMIS Smoking Item Banks.
Huang, Wenjing; Stucky, Brian D; Edelen, Maria O; Tucker, Joan S; Shadel, William G; Hansen, Mark; Cai, Li
2016-07-01
The Patient-Reported Outcomes Measurement Information System (PROMIS) Smoking Initiative has developed item banks for assessing six smoking behaviors and biopsychosocial correlates of smoking among adult cigarette smokers. The goal of this study is to evaluate the performance of the Spanish version of the PROMIS smoking item banks as compared to the original banks developed in English. The six PROMIS banks for daily smokers were translated into Spanish and administered to a sample of Spanish-speaking adult daily smokers in the United States (N = 302). We first evaluated the unidimensionality of each bank using confirmatory factor analysis. We then conducted a two-group item response theory calibration, including an item response theory-based Differential Item Functioning (DIF) analysis by language of administration (Spanish vs. English). Finally, we generated full bank and short form scores for the translated banks and evaluated their psychometric performance. Unidimensionality of the Spanish smoking item banks was supported by confirmatory factor analysis results. Out of a total of 109 items that were evaluated for language DIF, seven items in three of the six banks were identified as having levels of DIF that exceeded an established criterion. The psychometric performance of the Spanish daily smoker banks is largely comparable to that of the English versions. The Spanish PROMIS smoking item banks are highly similar, but not entirely equivalent, to the original English versions. The parameters from these two-group calibrations can be used to generate comparable bank scores across the two language versions. In this study, we developed a Spanish version of the PROMIS smoking toolkit, which was originally designed and developed for English speakers. With the growing Spanish-speaking population, it is important to make the toolkit more accessible by translating the items and calibrating the Spanish version to be comparable with English-language scores. 
This study provided the translated item banks and short forms, comparable unbiased scores for Spanish speakers and evaluations of the psychometric properties of the new Spanish toolkit. © The Author 2016. Published by Oxford University Press on behalf of the Society for Research on Nicotine and Tobacco. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Multiple determinants of lifespan memory differences.
Henson, Richard N; Campbell, Karen L; Davis, Simon W; Taylor, Jason R; Emery, Tina; Erzinclioglu, Sharon; Kievit, Rogier A
2016-09-07
Memory problems are among the most common complaints as people grow older. Using structural equation modeling of commensurate scores of anterograde memory from a large (N = 315), population-derived sample (www.cam-can.org), we provide evidence for three memory factors that are supported by distinct brain regions and show differential sensitivity to age. Associative memory and item memory are dramatically affected by age, even after adjusting for education level and fluid intelligence, whereas visual priming is not. Associative memory and item memory are differentially affected by emotional valence, and the age-related decline in associative memory is faster for negative than for positive or neutral stimuli. Gray-matter volume in the hippocampus, parahippocampus and fusiform cortex, and a white-matter index for the fornix, uncinate fasciculus and inferior longitudinal fasciculus, show differential contributions to the three memory factors. Together, these data demonstrate the extent to which differential ageing of the brain leads to differential patterns of memory loss.
Alcohol-Related Facebook Activity Predicts Alcohol Use Patterns in College Students
Marczinski, Cecile A.; Hertzenberg, Heather; Goddard, Perilou; Maloney, Sarah F.; Stamates, Amy L.; O’Connor, Kathleen
2016-01-01
The purpose of this study was to determine whether a brief 10-item alcohol-related Facebook® activity (ARFA) questionnaire would predict alcohol use patterns in college students (N = 146). During a single laboratory session, participants first privately logged on to their Facebook® profiles while they completed the ARFA measure, which queries postings from the past 30 days related to alcohol use and intoxication. Participants were then asked to complete five additional questionnaires: three measures of alcohol use (the Alcohol Use Disorders Identification Test [AUDIT], the Timeline Follow-Back [TLFB], and the Personal Drinking Habits Questionnaire [PDHQ]), the Barratt Impulsiveness Scale (BIS-11), and the Marlowe-Crowne Social Desirability Scale (MC-SDS). Regression analyses revealed that total ARFA scores were significant predictors of recent drinking behaviors, as assessed by the AUDIT, TLFB, and PDHQ measures. Moreover, impulsivity (BIS-11) and social desirability (MC-SDS) did not predict recent drinking behaviors when ARFA total scores were included in the regressions. The findings suggest that social media activity measured via the ARFA scale may be useful as a research tool for identifying risky alcohol use. PMID:28138317
Applications of computerized adaptive testing (CAT) to the assessment of headache impact.
Ware, John E; Kosinski, Mark; Bjorner, Jakob B; Bayliss, Martha S; Batenhorst, Alice; Dahlöf, Carl G H; Tepper, Stewart; Dowson, Andrew
2003-12-01
To evaluate the feasibility of computerized adaptive testing (CAT) and the reliability and validity of CAT-based estimates of headache impact scores in comparison with 'static' surveys. Responses to the 54-item Headache Impact Test (HIT) were re-analyzed for recent headache sufferers (n = 1016) who completed telephone interviews during the National Survey of Headache Impact (NSHI). Item response theory (IRT) calibrations and the computerized dynamic health assessment (DYNHA) software were used to simulate CAT assessments by selecting the most informative items for each person and estimating impact scores according to pre-set precision standards (CAT-HIT). Results were compared with IRT estimates based on all items (total-HIT), computerized 6-item dynamic estimates (CAT-HIT-6), and a developmental version of a 'static' 6-item form (HIT-6-D). Analyses focused on respondent burden (survey length and administration time), score distributions ('ceiling' and 'floor' effects), reliability and standard errors, and clinical validity (diagnosis, level of severity). A random sample (n = 245) was re-assessed to test responsiveness. A second study (n = 1103) compared actual CAT surveys and an improved 'static' HIT-6 among current headache sufferers sampled on the Internet. Respondents completed measures from the first study and the generic SF-8 Health Survey; some (n = 540) were re-tested on the Internet after 2 weeks. In the first study, simulated CAT-HIT and total-HIT scores were highly correlated (r = 0.92), with no 'ceiling' or 'floor' effects and a substantial reduction (90.8%) in respondent burden. Six of the 54 items accounted for the great majority of item administrations (3603/5028, 71.7%). 
CAT-HIT reliability estimates were very high (0.975-0.992) in the range where 95% of respondents scored, and relative validity (RV) coefficients were high for diagnosis (RV = 0.87) and severity (RV = 0.89); patient-level classifications were 91.3% accurate for a diagnosis of migraine. For all three criteria of change, CAT-HIT scores were more responsive than all other measures. In the second study, estimates of respondent burden, item usage, reliability and clinical validity were replicated. The test-retest reliability of CAT-HIT was 0.79, and alternate-forms coefficients ranged from 0.85 to 0.91. All correlations with the generic SF-8 were negative. CAT-based administrations of headache impact items achieved very large reductions in respondent burden without compromising validity for purposes of patient screening or monitoring changes in headache impact over time. IRT models and CAT-based dynamic health assessments warrant testing among patients with other conditions.
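The CAT step described above, "selecting the most informative items for each person", is commonly implemented as maximum-information selection at the current trait estimate. The same information sum then gives the standard error that a precision-based stopping rule checks. A minimal Python sketch, assuming dichotomous two-parameter logistic (2PL) items with hypothetical parameters. The HIT items are polytomous and the DYNHA software's actual rules are not described here, so this illustrates the general technique only.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Item2PL:
    name: str
    a: float  # discrimination
    b: float  # location (difficulty)


def p_endorse(item: Item2PL, theta: float) -> float:
    """2PL probability of the keyed response at trait level theta."""
    return 1.0 / (1.0 + math.exp(-item.a * (theta - item.b)))


def information(item: Item2PL, theta: float) -> float:
    """Fisher information of a 2PL item: a^2 * P * (1 - P)."""
    p = p_endorse(item, theta)
    return item.a ** 2 * p * (1.0 - p)


def next_item(bank: list[Item2PL], used: set[str], theta: float) -> Item2PL:
    """Pick the unused item with maximum information at the current estimate."""
    candidates = [it for it in bank if it.name not in used]
    return max(candidates, key=lambda it: information(it, theta))


def standard_error(items: list[Item2PL], theta: float) -> float:
    """SE of theta from the summed information of the administered items."""
    return 1.0 / math.sqrt(sum(information(it, theta) for it in items))


# Hypothetical three-item bank with equal discrimination.
bank = [Item2PL("i1", 1.0, -1.0), Item2PL("i2", 1.0, 0.0), Item2PL("i3", 1.0, 1.0)]
print(next_item(bank, set(), theta=0.9).name)  # i3: its location is closest to 0.9
```

A precision rule would stop once `standard_error` falls below a preset threshold. With a well-targeted bank this can happen after a handful of items, which is consistent with the large burden reductions reported above.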
Mulvey, Matthew R; Fawkner, Helen J; Johnson, Mark I
2015-12-01
The aim of this study was to investigate the strength of perceptual embodiment achieved during an adapted version of the rubber hand illusion (RHI) in response to a series of modified transcutaneous electrical nerve stimulation (TENS) pulse patterns with dynamic temporal and spatial characteristics, which are more akin to the mechanical brush stroke used in the original RHI. A repeated-measures, counterbalanced experimental study was conducted in which each participant was exposed to four TENS interventions: continuous pattern TENS; burst pattern TENS (fixed frequency of 2 bursts per second, with 100 pulses per second within each burst); amplitude-modulated pattern TENS (intensity increasing from zero to a preset level, then back to zero again in a cyclical fashion); and sham (no current) TENS. Participants rated the intensity of the RHI using a three-item numerical rating scale (each item rated from 0 to 10). Friedman's analysis of ranks (one-factor repeated measures) was used to test differences in perceptual embodiment between TENS interventions; alpha was set at p ≤ 0.05. There were statistically significant differences in the intensity of misattribution and perceptual embodiment between sham and active TENS interventions, but no significant differences among the three active TENS conditions (amplitude-modulated TENS, burst TENS, and continuous TENS). Amplitude-modulated and burst TENS produced significantly higher intensity scores for misattribution sensation and perceptual embodiment than sham (no current) TENS, whereas continuous TENS did not. The findings provide tentative, but not definitive, evidence that TENS parameters with dynamic spatial and temporal characteristics may produce more intense misattribution sensations and perceptual embodiment than parameters with static characteristics (e.g., continuous pulse patterns). © 2015 International Neuromodulation Society.
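Friedman's analysis of ranks, used above for the four within-participant TENS conditions, ranks each participant's scores across the k conditions and compares the rank sums R_j. The statistic is Q = 12 / (n k (k + 1)) · Σ R_j² − 3 n (k + 1). A minimal Python sketch of that statistic, assuming no tie correction. The p-value would come from a chi-square distribution with k − 1 degrees of freedom, which is not computed here; SciPy's `scipy.stats.friedmanchisquare` provides the full test.

```python
def friedman_statistic(data: list[list[float]]) -> float:
    """Friedman chi-square statistic for n subjects (rows) x k conditions (columns).

    Scores are ranked within each row; tied scores share the average rank.
    No tie correction is applied to the final statistic.
    """
    n, k = len(data), len(data[0])
    rank_sums = [0.0] * k
    for row in data:
        order = sorted(range(k), key=lambda j: row[j])
        i = 0
        while i < k:
            j = i
            # Extend j over a run of tied scores within this subject.
            while j + 1 < k and row[order[j + 1]] == row[order[i]]:
                j += 1
            avg = (i + j) / 2 + 1
            for t in range(i, j + 1):
                rank_sums[order[t]] += avg
            i = j + 1
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
```

When every subject orders the conditions identically, Q reaches its maximum of n(k − 1). When rankings are unrelated across subjects, the rank sums converge and Q approaches 0.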
Shafeghat, Hossein; Jafari, Mehdi; Monavarian, Abbas; Shafayi, Maryam; Dehnavieh, Reza
2014-02-01
Labor laws and regulations inevitably affect employees' work motivation as well as the overall efficiency and productivity of the organization. This study was conducted to assess the effects of the "Countrywide Services Management Law" on the work motivation of employees of the Iranian Ministry of Health. This cross-sectional study was carried out in 2011 in Iran's Ministry of Health. Data were collected with a 51-item Likert-scale questionnaire covering five domains: organizational structure, information technology, training patterns, salary and bonus system, and re-engineering process. The reliability and validity of the questionnaire were evaluated (Cronbach's alpha = 0.96). Data were analyzed using descriptive and inferential statistics (t-test). Of the 192 respondents, 55.2% were female, 88 (45.8%) had a BS degree and 116 (60.4%) had less than 10 years' experience. The mean scores in the domains of organizational structure, information technology, training patterns, salary and bonus system and re-engineering process were 3.11, 3.51, 3.05, 3.21 and 3.14, respectively. The relationship between the manpower-related items in the "Countrywide Services Management Law" and employees' work motivation was significant (P < 0.0001). The training patterns did not show a significant relation (P < 0.26) with any of the five domains. According to our results and the views of the employees of the Iranian Ministry of Health, the "Countrywide Services Management Law" positively affected the personnel's work motivation with respect to all the factors associated with motivation, including organizational structure, information technology, training patterns, salary and bonus system and re-engineering process. Finally, to enhance workforce motivation and satisfaction, rules and regulations should be applied and implemented based on organizational needs.
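The Cronbach's alpha reported above measures internal consistency by comparing the sum of the item variances with the variance of the total score. The formula is α = k / (k − 1) · (1 − Σ σ²_item / σ²_total). A minimal Python sketch using the standard library, assuming a complete respondents × items matrix with no missing answers.

```python
from statistics import pvariance


def cronbach_alpha(responses: list[list[float]]) -> float:
    """Cronbach's alpha for a respondents (rows) x items (columns) matrix.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    Population variances are used throughout; the ratio is unchanged with
    sample variances because the n / (n - 1) factor cancels.
    """
    k = len(responses[0])
    item_vars = [pvariance([row[j] for row in responses]) for j in range(k)]
    total_var = pvariance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Toy 2-item example: three respondents.
print(round(cronbach_alpha([[1, 2], [2, 1], [3, 3]]), 3))
```

Alpha rises with both the number of items and their average inter-item correlation. A value as high as 0.96 on a 51-item instrument therefore partly reflects its length.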
Engeset, Dagrun; Hofoss, Dag; Nilsson, Lena M; Olsen, Anja; Tjønneland, Anne; Skeie, Guri
2015-04-01
To identify dietary patterns with whole grains as the main focus and to determine whether a similar whole-grain pattern exists in the three Scandinavian countries: Denmark, Sweden and Norway. Another objective was to determine whether items suggested for a Nordic Food Index form a typical Nordic pattern in factor analysis. The HELGA study population is based on samples from existing cohorts: the Norwegian Women and Cancer Study, the Swedish Västerbotten cohort and the Danish Diet, Cancer and Health study. The HELGA study aims to generate knowledge about the health effects of whole-grain foods. The study included a total of 119 913 participants. Associations among food variables from the FFQ were investigated by principal component analysis. Only food groups common to all three cohorts were included. A high factor loading indicates that a food item is strongly correlated with the specific dietary pattern. The main whole grain in Denmark and Sweden was rye, while Norway had the highest consumption of wheat. Three similar patterns were found: a cereal pattern, a meat pattern and a bread pattern. However, even though the patterns look similar, the food items belonging to them differ between countries. High loadings on breakfast cereals and whole-grain oat were common to the cereal patterns of all three countries. Thus, the cereal pattern may be considered a common Scandinavian whole-grain pattern. Food items belonging to a Nordic Food Index were distributed across different patterns.
Dalton, Megan; Davidson, Megan; Keating, Jenny
2011-01-01
Is the Assessment of Physiotherapy Practice (APP) a valid instrument for assessing entry-level competence in physiotherapy students? Cross-sectional study with Rasch analysis of initial (n=326) and validation (n=318) samples. Students were assessed on completion of 4-, 5-, or 6-week clinical placements across one university semester. 298 clinical educators and 456 physiotherapy students at nine universities in Australia and New Zealand provided 644 completed APP instruments. APP data in both samples showed overall fit to a Rasch model of expected item functioning for interval-scale measurement. Item 6 (Written communication) exhibited misfit in both samples but was retained as an important element of competence. The hierarchy of item difficulty was the same in both samples, with items related to professional behaviour and communication the easiest to achieve and items related to clinical reasoning the most difficult. Item difficulty was well targeted to person ability. No differential item functioning was identified, indicating that the scale performed comparably regardless of the student's age, gender or amount of prior clinical experience; the educator's age, gender or experience as an educator; or the type of facility, university or clinical area. The instrument demonstrated unidimensionality, confirming the appropriateness of summing the item scores to provide an overall score of clinical competence, and was able to discriminate four levels of professional competence (Person Separation Index = 0.96). Person ability and raw APP scores had a linear relationship (r² = 0.99). Rasch analysis supports the interpretation that a student's APP score indicates their underlying level of professional competence in workplace practice. Copyright © 2011 Australian Physiotherapy Association. Published by .. All rights reserved.
Sunderland, Matthew; Slade, Tim; Krueger, Robert F; Markon, Kristian E; Patrick, Christopher J; Kramer, Mark D
2017-07-01
The development of the Externalizing Spectrum Inventory (ESI) was motivated by the need to comprehensively assess the interrelated nature of externalizing psychopathology and personality using an empirically driven framework. The ESI measures 23 theoretically distinct yet related unidimensional facets of externalizing, which are structured under 3 superordinate factors representing general externalizing, callous aggression, and substance abuse. One limitation of the ESI is its length: 415 items. To facilitate the use of the ESI in busy clinical and research settings, the current study examined the efficiency and accuracy of a computerized adaptive version of the ESI. Data were collected over 3 waves and totaled 1,787 participants recruited from undergraduate psychology courses as well as male and female state prisons. A series of 6 algorithms with different termination rules were simulated to determine the efficiency and accuracy of each test under 3 different assumed distributions. Scores generated using an optimal adaptive algorithm showed high correlations (r > .9) with scores generated using the full ESI, the brief ESI item-based factor scales, and the 23 facet scales. The adaptive algorithms for the facets administered a combined average of 115 items, a 72% decrease compared with the full ESI. Similarly, scores on the item-based factor scales of the ESI-brief form (57 items) were generated using an average of 17 items, a 70% decrease. The current study demonstrates that an adaptive algorithm can generate similar scores for the ESI and the 3 item-based factor scales using a fraction of the total item pool. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
MacKeown, Jennifer M; Faber, Mieke
2005-03-01
The objective of the study was to compare the frequency of consumption of cariogenic food items among 4-month-old to 24-month-old children in two neighbouring rural areas in KwaZulu-Natal Province, South Africa: Nyuswa/Embo (Area A) (n = 127) and Ndunakazi (Area B) (n = 105). Dietary intake was assessed using a food frequency questionnaire. Mothers or caregivers were interviewed by a team of Zulu-speaking fieldworkers. The percentage of children consuming each food item (consumers) and the weekly consumption among consumers were calculated separately for the two areas. The food items were ranked in descending order according to the combined group of children and reported for each area within five selected food groups (carbohydrates, sugars, fruit and vegetables, milk and milk products, and other foods and snacks). Food items were 'flagged' according to their cariogenic potential. Fisher's exact test on absolute numbers was used to test for significant differences between the two areas in the frequency of intake of individual food items. Significance was set at P < 0.05. The frequency of consumption of certain listed cariogenic food items differed significantly between the two areas. A higher percentage of children in Area A than in Area B consumed most of the food items, and they consumed them more frequently. Children mainly consumed foods with a cariogenic score of 2, solid foods with 8-20% sugars as well as foods high in starch with less than 10% sugars. This knowledge is essential for gaining insight into the eating patterns of rural communities and will provide a baseline for developing and adapting dietary advice specifically for young rural South African children, with particular emphasis on the prevention of dental caries.
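Fisher's exact test, used above to compare the two areas, is usually applied to a 2×2 table of counts. An example for one food item is consumers versus non-consumers by area; that layout is an assumption, since the abstract does not give the exact tables. With the margins fixed, the top-left cell follows a hypergeometric distribution. The two-sided p-value sums the probabilities of every table that is no more likely than the observed one. A minimal Python sketch using the standard library; SciPy's `scipy.stats.fisher_exact` offers a library implementation.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x: int) -> float:
        # Hypergeometric probability that the top-left cell equals x.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    total = 0.0
    for x in range(lo, hi + 1):
        p = prob(x)
        # Small relative tolerance so tables tied with the observed one are counted.
        if p <= observed * (1 + 1e-7):
            total += p
    return min(total, 1.0)


# Hypothetical counts: consumers / non-consumers of one item in Area A and Area B.
print(round(fisher_exact_two_sided(3, 0, 0, 3), 3))
```

Because the test enumerates the exact distribution instead of relying on a large-sample approximation, it remains valid when some cells are small, as they can be for rarely consumed items.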
Designing P-Optimal Item Pools in Computerized Adaptive Tests with Polytomous Items
ERIC Educational Resources Information Center
Zhou, Xuechun
2012-01-01
Current CAT applications consist of predominantly dichotomous items, and CATs with polytomously scored items are limited. To ascertain the best approach to polytomous CAT, a significant amount of research has been conducted on item selection, ability estimation, and impact of termination rules based on polytomous IRT models. Few studies…
Real and Artificial Differential Item Functioning in Polytomous Items
ERIC Educational Resources Information Center
Andrich, David; Hagquist, Curt
2015-01-01
Differential item functioning (DIF) for an item between two groups is present if, for the same person location on a variable, persons from different groups have different expected values for their responses. Applying only to dichotomously scored items in the popular Mantel-Haenszel (MH) method for detecting DIF in which persons are classified by…
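In the Mantel-Haenszel approach for a dichotomous item, persons are matched on total score, and a 2x2 table (group by correct/incorrect) is formed in each score stratum. The sketch below computes the common odds ratio and the ETS delta-scale transformation MH D-DIF = -2.35 ln(alpha_MH). It is a generic illustration with hypothetical names, not the procedure proposed in the truncated abstract.

```python
import math


def mantel_haenszel(strata):
    """Mantel-Haenszel common odds ratio and MH D-DIF for one dichotomous item.

    Each stratum (persons with the same matching score) is a tuple
    (A, B, C, D): reference correct, reference incorrect,
    focal correct, focal incorrect.
    """
    num = den = 0.0
    for A, B, C, D in strata:
        n = A + B + C + D
        if n == 0:
            continue  # an empty score group contributes nothing
        num += A * D / n
        den += B * C / n
    alpha = num / den
    # ETS delta-scale transformation; 0 means no DIF, negative favours the reference group
    return alpha, -2.35 * math.log(alpha)
```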
Calibration of Automatically Generated Items Using Bayesian Hierarchical Modeling.
ERIC Educational Resources Information Center
Johnson, Matthew S.; Sinharay, Sandip
For complex educational assessments, there is an increasing use of "item families," which are groups of related items. However, calibration or scoring for such an assessment requires fitting models that take into account the dependence structure inherent among the items that belong to the same item family. C. Glas and W. van der Linden…
Online Calibration of Polytomous Items Under the Generalized Partial Credit Model
Zheng, Yi
2016-01-01
Online calibration is a technology-enhanced architecture for item calibration in computerized adaptive tests (CATs). Many CATs are administered continuously over a long term and rely on large item banks. To ensure test validity, these item banks need to be frequently replenished with new items, and these new items need to be pretested before being used operationally. Online calibration dynamically embeds pretest items in operational tests and calibrates their parameters as response data are gradually obtained through the continuous test administration. This study extends existing formulas, procedures, and algorithms for dichotomous item response theory models to the generalized partial credit model, a popular model for items scored in more than two categories. A simulation study was conducted to investigate the developed algorithms and procedures under a variety of conditions, including two estimation algorithms, three pretest item selection methods, three seeding locations, two numbers of score categories, and three calibration sample sizes. Results demonstrated acceptable estimation accuracy for the two estimation algorithms in some of the simulated conditions. Various findings were also revealed regarding the interaction effects of the included factors, and corresponding recommendations were made. PMID:29881063
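Under the generalized partial credit model, an item with discrimination a and step parameters b_1, ..., b_m has category probabilities proportional to exp(sum over v <= k of a(theta - b_v)), with an empty sum for category 0. A minimal sketch of those probabilities and the expected item score, with illustrative names and no connection to the study's calibration code:

```python
import math


def gpcm_probs(theta, a, steps):
    """Category probabilities P(X = k | theta), k = 0..m, under the GPCM.

    steps holds the step parameters b_1..b_m. The log-numerator for category k
    is sum_{v=1..k} a * (theta - b_v); for k = 0 the sum is empty (zero).
    """
    logits = [0.0]
    for b in steps:
        logits.append(logits[-1] + a * (theta - b))
    top = max(logits)  # subtract the maximum to avoid overflow
    expo = [math.exp(z - top) for z in logits]
    total = sum(expo)
    return [e / total for e in expo]


def expected_score(theta, a, steps):
    """Expected item score, sum_k k * P(X = k | theta)."""
    return sum(k * p for k, p in enumerate(gpcm_probs(theta, a, steps)))
```

In an online-calibration setting, functions like these would sit inside the likelihood that is maximized for each pretest item; the estimation step itself is not shown.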
Household Food Security in Isfahan Based on Current Population Survey Adapted Questionnaire
Rafiei, Morteza; Rastegari, Hosein Ali; Ghiasi, Mojdeh; Shahsanaie, Vahid
2013-01-01
Background: Food security is a state in which all people at all times have physical and economic access to adequate food to meet their nutritional needs and live a healthy and active life. This study was therefore performed to quantitatively evaluate household food security in Esfahan using the localized version of the US Household Food Security Survey Module (US HFSSM). Methods: This descriptive cross-sectional study was performed in 2006 on 3000 households in Esfahan. The study instrument was the 18-item US food security module, adapted into a localized 15-item questionnaire. The study was performed in two stages: families with no children (under 18 years old) and families with children over 18 years old. Results: The results reported the item severity coefficients, the ratio of responses given by households, and the item infit and outfit coefficients for the adults' and children's questionnaires, respectively. According to the data obtained, a scale score of +3 in the adult group is described as the threshold of slight food insecurity and +6 as the threshold of severe food insecurity. For the children's group, a scale score of +2 is defined as the threshold of slight food insecurity and +5 as the threshold of severe food insecurity. Conclusions: The main hypothesis of this survey analysis is based on the raw scale score of the USFSSM. The items “lack of enough money for buying food” (item 2) and “lack of balanced meal” (item 3) have the lowest severity coefficients. Item severity then increases through the 1st and 4th items and keeps rising up to the 10th item. PMID:24498498
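Item severity and infit/outfit statistics of this kind come from a Rasch analysis. Assuming a dichotomous Rasch model, P(affirm) = 1 / (1 + exp(-(theta - b))), the sketch below computes the two mean-square fit statistics for one item from hypothetical household measures. It illustrates the definitions only and does not reproduce the study's estimates.

```python
import math


def rasch_p(theta, b):
    """Probability of affirming an item of severity b for measure theta."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def item_fit(responses, thetas, b):
    """Infit and outfit mean squares for one dichotomous item.

    responses: 0/1 answers; thetas: household measures (hypothetical here).
    Outfit averages squared standardized residuals; infit divides the sum of
    squared residuals by the sum of model variances (information-weighted).
    """
    outfit_terms = []
    sq_resid = 0.0
    var_sum = 0.0
    for x, t in zip(responses, thetas):
        p = rasch_p(t, b)
        var = p * (1.0 - p)
        resid = x - p
        outfit_terms.append(resid * resid / var)
        sq_resid += resid * resid
        var_sum += var
    return sq_resid / var_sum, sum(outfit_terms) / len(outfit_terms)
```

Mean squares near 1 indicate responses about as variable as the model expects; outfit is more sensitive to unexpected answers from households far from the item's severity.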
Development and initial validation of the Bedside Paediatric Early Warning System score
2009-01-01
Introduction Adverse outcomes following clinical deterioration in children admitted to hospital wards are frequently preventable. Identification of children for referral to critical care experts remains problematic. Our objective was to develop and validate a simple bedside score to quantify severity of illness in hospitalized children. Methods A case-control design was used to evaluate 11 candidate items and identify a pragmatic score for routine bedside use. Case-patients were urgently admitted to the intensive care unit (ICU). Control-patients had no 'code blue', ICU admission or care restrictions. Validation was performed using two prospectively collected datasets. Results Data from 60 case-patients and 120 control-patients were obtained. Four of the eleven candidate items were removed. The seven-item Bedside Paediatric Early Warning System (PEWS) score ranges from 0–26. The mean maximum scores were 10.1 in case-patients and 3.4 in control-patients. The area under the receiver operating characteristics curve was 0.91, compared with 0.84 for the retrospective nurse-rating of patient risk for near or actual cardiopulmonary arrest. At a score of 8 the sensitivity and specificity were 82% and 93%, respectively. The score increased over the 24 hours preceding urgent paediatric intensive care unit (PICU) admission (P < 0.0001). In 436 urgent consultations, the Bedside PEWS score was higher in patients admitted to the ICU than in patients who were not admitted (P < 0.0001). Conclusions We developed and performed the initial validation of the Bedside PEWS score. This 7-item score can quantify severity of illness in hospitalized children and identify critically ill children with at least one hour's notice. Prospective validation in other populations is required before clinical application. PMID:19678924
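Sensitivity and specificity at a cutoff, and the area under the ROC curve, can be computed directly from case and control scores. The sketch below assumes that a score at or above the cutoff flags a patient and is meant for illustration with made-up scores, not the study's data; the AUC is computed as the Mann-Whitney probability that a case outscores a control, counting ties as one half.

```python
def sens_spec(case_scores, control_scores, cutoff):
    """Sensitivity and specificity when a score >= cutoff counts as positive."""
    true_pos = sum(s >= cutoff for s in case_scores)
    true_neg = sum(s < cutoff for s in control_scores)
    return true_pos / len(case_scores), true_neg / len(control_scores)


def auc(case_scores, control_scores):
    """ROC area as P(case score > control score), ties counted as 1/2."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))
```

Applied to the maximum Bedside PEWS scores of the 60 case-patients and 120 control-patients with a cutoff of 8, `sens_spec` would give the kind of sensitivity/specificity pair reported above.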
Feagan, B G; Hanauer, S B; Coteur, G; Schreiber, S
2011-05-01
Successful treatment of systemic inflammatory symptoms is essential for improving health-related quality of life in patients with active Crohn's disease. Patient-reported outcomes provide unique perspectives on the impact of chronic disease. It is unknown whether a combination of different instruments might improve sensitivity to clinically relevant changes in health status. The aim was to develop a composite score based upon Crohn's Disease Activity Index (CDAI) and Inflammatory Bowel Disease Questionnaire (IBDQ) items. Patients from the PRECiSE 2 trial who responded at week 6 to certolizumab pegol (CZP) were randomised to receive treatment with CZP 400 mg or placebo for up to 26 weeks. IBDQ and CDAI scores were assessed at weeks 0, 6, 16 and 26. A 'daily practice' composite score (DP-6) containing two items from the CDAI and four items from the IBDQ was constructed. Correlation coefficients between the CDAI score and IBDQ total score at baseline and at week 26 were -0.344 and -0.603, respectively (P<0.05). All IBDQ items improved following CZP treatment. Compared with the other scores, the DP-6 had the highest responsiveness in assessing response to treatment, relative to the CDAI total score. The DP-6 composite score could be used to optimise the use of existing instruments by serving as an index of symptoms due to systemic inflammation. Additional studies are needed to determine if the DP-6 composite score differentiates the impact of different treatments on patient-reported outcomes, and to determine if the use of the DP-6 improves the care of patients in clinical practice. © 2011 Blackwell Publishing Ltd.
The Effects of Judgment-Based Stratum Classifications on the Efficiency of Stratum Scored CATs.
ERIC Educational Resources Information Center
Finney, Sara J.; Smith, Russell W.; Wise, Steven L.
Two operational item pools were used to investigate the performance of stratum computerized adaptive tests (CATs) when items were assigned to strata based on empirical estimates of item difficulty or human judgments of item difficulty. Items from the first data set consisted of 54 5-option multiple choice items from a form of the ACT mathematics…
ERIC Educational Resources Information Center
Li, Yanmei
2012-01-01
In a common-item (anchor) equating design, the common items should be evaluated for item parameter drift. Drifted items are often removed. For a test that contains mostly dichotomous items and only a small number of polytomous items, removing some drifted polytomous anchor items may result in anchor sets that no longer resemble mini-versions of…
Aikawa, Ken; Kataoka, Masao; Ogawa, Soichiro; Akaihata, Hidenori; Sato, Yuichi; Yabe, Michihiro; Hata, Junya; Koguchi, Tomoyuki; Kojima, Yoshiyuki; Shiragasawa, Chihaya; Kobayashi, Toshimitsu; Yamaguchi, Osamu
2015-08-01
To present a new grouping of male patients with lower urinary tract symptoms (LUTS) based on symptom patterns and clarify whether the therapeutic effect of α1-blocker differs among the groups. We performed secondary analysis of anonymous data from 4815 patients enrolled in a postmarketing surveillance study of tamsulosin in Japan. Data on 7 International Prostate Symptom Score (IPSS) items at the initial visit were used in the cluster analysis. IPSS and quality of life (QOL) scores before and after tamsulosin treatment for 12 weeks were assessed in each cluster. Partial correlation coefficients were also obtained for IPSS and QOL scores based on changes before and after treatment. Five symptom groups were identified by cluster analysis of IPSS. Based on their symptom profiles, the clusters were labeled as minimal type (cluster 1), multiple severe type (cluster 2), weak stream type (cluster 3), storage type (cluster 4), and voiding type (cluster 5). Prevalence and the mean symptom score were significantly improved for almost all symptoms in all clusters by tamsulosin treatment. Nocturia and weak stream had the strongest effect on QOL in clusters 1, 2, and 4 and clusters 3 and 5, respectively. Cluster analysis of IPSS identified 5 characteristic symptom patterns in male patients with LUTS. Tamsulosin improved various symptoms and QOL in each symptom group. The study reports that many male patients with LUTS were satisfied with tamsulosin monotherapy and suggests the usefulness of α1-blockers as first-choice drugs. Copyright © 2015 Elsevier Inc. All rights reserved.
The factor structure and screening utility of the Social Interaction Anxiety Scale.
Rodebaugh, Thomas L; Woods, Carol M; Heimberg, Richard G; Liebowitz, Michael R; Schneier, Franklin R
2006-06-01
The widely used Social Interaction Anxiety Scale (SIAS; R. P. Mattick & J. C. Clarke, 1998) possesses favorable psychometric properties, but questions remain concerning its factor structure and item properties. Analyses included 445 people with social anxiety disorder and 1,689 undergraduates. Simple unifactorial models fit poorly, and models that accounted for differences due to item wording (i.e., reverse scoring) provided superior fit. It was further found that clients and undergraduates approached some items differently, and the SIAS may be somewhat overly conservative in selecting analogue participants from an undergraduate sample. Overall, this study provides support for the excellent properties of the SIAS's straightforwardly worded items, although questions remain regarding its reverse-scored items. Copyright 2006 APA, all rights reserved.
Murray, Aja Louise; Allison, Carrie; Smith, Paula L; Baron-Cohen, Simon; Booth, Tom; Auyeung, Bonnie
2017-05-01
Diagnostic bias is a concern in autism spectrum conditions (ASC) where prevalence and presentation differ by sex. To ensure that females with ASC are not under-identified, it is important that ASC screening tools do not systematically underestimate autistic traits in females relative to males. We evaluated whether the AQ-10, a brief screen for ASC recommended by the National Institute of Clinical Excellence in cases of suspected ASC, exhibits such a bias. Using an item response theory approach, we evaluated differential item functioning and differential test functioning. We found that although individual items showed some sex bias, these biases at times favored males and at other times favored females. Thus, at the level of test scores the item-level biases cancelled out to give an unbiased overall score. Results support the continued use of the AQ-10 sum score in its current form; however, they suggest that caution should be exercised when interpreting responses to individual items. The nature of the item-level biases could serve as a guide for future research into how ASC affects males and females differently. Autism Res 2017, 10: 790-800. © 2016 International Society for Autism Research, Wiley Periodicals, Inc.
Vermeulen, Esther; Stronks, Karien; Snijder, Marieke B; Schene, Aart H; Lok, Anja; de Vries, Jeanne H; Visser, Marjolein; Brouwer, Ingeborg A; Nicolaou, Mary
2017-09-01
To identify a high-sugar (HS) dietary pattern, a high-saturated-fat (HF) dietary pattern and a combined high-sugar and high-saturated-fat (HSHF) dietary pattern and to explore if these dietary patterns are associated with depressive symptoms. We used data from the HELIUS (Healthy Life in an Urban Setting) study and included 4969 individuals aged 18-70 years. Diet was assessed using four ethnic-specific FFQs. Dietary patterns were derived using reduced rank regression with mono- and disaccharides, saturated fat and total fat as response variables. The nine-item Patient Health Questionnaire (PHQ-9) was used to assess depressive symptoms by using continuous scores and depressed mood (identified using the cut-off point: PHQ-9 sum score ≥10). The study was set in the Netherlands. Three dietary patterns were identified: an HSHF dietary pattern (including chocolates, red meat, added sugars, high-fat dairy products, fried foods, creamy sauces), an HS dietary pattern (including sugar-sweetened beverages, added sugars, fruit (juices)) and an HF dietary pattern (including high-fat dairy products, butter). When comparing extreme quartiles, consumption of an HSHF dietary pattern was associated with more depressive symptoms (Q1 v. Q4: β=0·18, 95 % CI 0·07, 0·30, P=0·001) and with higher odds of depressed mood (Q1 v. Q4: OR=2·36, 95 % CI 1·19, 4·66, P=0·014). No associations were found between consumption of the remaining dietary patterns and depressive symptoms. Higher consumption of an HSHF dietary pattern is associated with more depressive symptoms and with depressed mood. Our findings reinforce the idea that the focus should be on dietary patterns that are high in both sugar and saturated fat.
Wang, Ye; Tan, Ngiap-Chuan; Tay, Ee-Guan; Thumboo, Julian; Luo, Nan
2015-07-16
This study aimed to assess the measurement equivalence of the 5-level EQ-5D (EQ-5D-5L) among the English, Chinese, and Malay versions. A convenience sample of patients with type 2 diabetes mellitus were enrolled from a public primary health care institution in Singapore. The survey questionnaire comprised the EQ-5D-5L and questions assessing participants' socio-demographic and clinical characteristics. Multiple linear regression models were used to assess the difference in EQ-5D-5L index (calculated using an interim algorithm) and EQ-visual analog scale (EQ-VAS) scores across survey language (Chinese vs. English, Malay vs. English, and Malay vs. Chinese). Measurement equivalence was examined by comparing the 90% confidence interval of difference in the EQ-5D-5L index and EQ-VAS scores with a pre-determined equivalence margin. Multiple logistic regression models were used to assess the response patterns of the 5 Likert-type items of the EQ-5D-5L across survey language. Equivalence was demonstrated between the Chinese and English versions and between the Malay and English versions of the EQ-5D-5L index scores. Equivalence was also demonstrated between the Chinese and English versions and between the Malay and Chinese versions of the EQ-VAS scores. Equivalence could not be determined between the Malay and Chinese versions of the EQ-5D-5L index score and between the Malay and English versions of the EQ-VAS score. No significant difference was found in responses to EQ-5D-5L items between any languages, except that patients who chose to complete the Chinese version were more likely to report "no problems" in mobility compared to those who completed the Malay version of the questionnaire. This study provided evidence for the measurement equivalence of the different language versions of EQ-5D-5L in Singapore.
1986-08-01
most examinees. Therefore it appears psychometrically acceptable for the CAT-ASVAB project to proceed without item recalibration based on... MEMORANDUM DETERMINING THE SENSITIVITY OF CAT-ASVAB SCORES TO CHANGES IN ITEM RESPONSE CURVES WITH THE MEDIUM OF ADMINISTRATION D. R. Divgi... Subj: Center for Naval Analyses Research Memorandum 86-189 Encl: (1) CNA Research Memorandum 86-189, "Determining the Sensitivity of CAT-ASVAB
Can business and economics students perform elementary arithmetic?
Standing, Lionel G; Sproule, Robert A; Leung, Ambrose
2006-04-01
Business and economics majors (N=146) were tested on the D'Amore Test of Elementary Arithmetic, which employs third-grade test items from 1932. Only 40% of the subjects passed the test, which required answering all 10 items correctly. Self-predicted scores were a good predictor of actual scores, but performance was not associated with demographic variables, grades in calculus courses, liking for science or computers, or mathematics anxiety. Scores decreased over the subjects' initial years on campus. The hardest test item, with an error rate of 23%, required the subject to evaluate (36 x 7) + (33 x 7). The results are similar to those of Standing in 2006, despite methodological changes intended to maximize performance.
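The hardest item can be checked by hand; factoring out the common 7 gives the same result with a single multiplication:

$$
(36 \times 7) + (33 \times 7) = 252 + 231 = 483, \qquad (36 + 33) \times 7 = 69 \times 7 = 483.
$$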
Validation of Gujarati Version of ABILOCO-Kids Questionnaire
Diwan, Jasmin; Patel, Pankaj; Bansal, Ankita B.
2015-01-01
Background ABILOCO-Kids is a measure of locomotion ability for children with cerebral palsy (CP) aged 6 to 15 years & is available in English & French. Aim To validate the Gujarati version of the ABILOCO-Kids questionnaire for use in clinical research on the Gujarati population. Materials and Methods The ABILOCO-Kids questionnaire was translated from English into Gujarati using the forward-backward-forward method. To ensure the face & content validity of the Gujarati version using the group consensus method, each item was examined by a group of experts with a mean experience of 24.62 years in the fields of paediatrics and paediatric physiotherapy. Each item was analysed for content, meaning, wording, format, ease of administration & scoring. Each item was rated by the expert group as accepted, rejected, or accepted with modification. The procedure continued until 80% consensus was reached for all items. Concurrent validity was examined in 55 children with cerebral palsy (6-15 years) across all Gross Motor Function Classification System (GMFCS) levels & all clinical types by correlating ABILOCO-Kids scores with the Gross Motor Function Measure (GMFM) & GMFCS. Result In phase 1 of validation, 16 items were accepted as is, 22 items were accepted with modification & 3 items proceeded to phase 2 validation. For concurrent validity, a highly significant positive correlation was found between ABILOCO-Kids scores & total GMFM (r=0.713, p<0.005) & a highly significant negative correlation with GMFCS (r= -0.778, p<0.005). Conclusion The Gujarati version of the ABILOCO-Kids questionnaire has good face & content validity as well as concurrent validity, and can be used to measure caregiver-reported locomotion ability in children with CP. PMID:26557603
Jahn, Danielle R; Dressel, Jeffrey A; Gavett, Brandon E; O'Bryant, Sid E
2015-01-01
The Executive Interview (EXIT25) is an effective measure of executive dysfunction, but may be inefficient due to the time it takes to complete 25 interview-based items. The current study aimed to examine psychometric properties of the EXIT25, with a specific focus on determining whether a briefer version of the measure could comprehensively assess executive dysfunction. The current study applied a graded response model (a type of item response theory model for polytomous categorical data) to identify items that were most closely related to the underlying construct of executive functioning and best discriminated between varying levels of executive functioning. Participants were 660 adults aged 40 to 96 years living in West Texas, who were recruited through an ongoing epidemiological study of rural health and aging, called Project FRONTIER. The EXIT25 was the primary measure examined. Participants also completed the Trail Making Test and Controlled Oral Word Association Test, among other measures, to examine the convergent validity of a brief form of the EXIT25. Eight items were identified that provided the majority of the information about the underlying construct of executive functioning; total scores on these items were associated with total scores on other measures of executive functioning and were able to differentiate between cognitively healthy, mildly cognitively impaired, and demented participants. In addition, cutoff scores were recommended based on sensitivity and specificity of scores. A brief, eight-item version of the EXIT25 may be an effective and efficient screening tool for executive dysfunction among older adults.
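In a graded response model, each item has a discrimination a and increasing thresholds b_1 < ... < b_m; the probability of responding in category k or higher follows a 2PL curve, and category probabilities are differences of adjacent curves. Item information, the quantity behind statements that some items "provide the majority of the information", sums (dP_k/dtheta)^2 / P_k over categories. A minimal sketch with hypothetical parameters and illustrative function names:

```python
import math


def _cumulative(theta, a, thresholds):
    """P*(X >= k) for k = 0..m+1, padded with 1 at the bottom and 0 at the top."""
    inner = [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    return [1.0] + inner + [0.0]


def grm_probs(theta, a, thresholds):
    """Category probabilities P(X = k), k = 0..m; thresholds must increase."""
    cum = _cumulative(theta, a, thresholds)
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]


def grm_information(theta, a, thresholds):
    """Item information: sum over categories of (dP_k/dtheta)^2 / P_k."""
    cum = _cumulative(theta, a, thresholds)
    # derivative of each cumulative curve; zero for the constant 1 and 0 ends
    slope = [a * c * (1.0 - c) for c in cum]
    info = 0.0
    for k in range(len(cum) - 1):
        p = cum[k] - cum[k + 1]
        if p > 0.0:
            info += (slope[k] - slope[k + 1]) ** 2 / p
    return info
```

With a single threshold the information reduces to the familiar 2PL value a^2 P(1 - P); summing `grm_information` over a candidate subset of items across the trait range is one way to compare short forms.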
Effects of prenatal substance exposure on infant temperament vary by context.
Locke, Robin L; Lagasse, Linda L; Seifer, Ronald; Lester, Barry M; Shankaran, Seetha; Bada, Henrietta S; Bauer, Charles R
2016-05-01
This was a prospective longitudinal multisite study of the effects of prenatal cocaine and/or opiate exposure on temperament in 4-month-olds of the Maternal Lifestyle Study (N = 958: 366 cocaine exposed, 37 opiate exposed, 33 exposed to both drugs, 522 matched comparison). The study evaluated positivity and negativity during The Behavior Assessment of Infant Temperament (Garcia Coll et al., 1988). Parents rated temperament (Infant Behavior Questionnaire; Rothbart, 1981). Cocaine-exposed infants showed less positivity overall, mainly during activity and threshold items, more negativity during sociability items, and less negativity during irritability and threshold items. Latent profile analysis indicated individual temperament patterns were best described by three groups: low/moderate overall reactivity, high social negative reactivity, and high nonsocial negative reactivity. Infants with heavy cocaine exposure were more likely to be in the high social negative reactivity profile, were less negative during threshold items, and required longer soothing intervention. Cocaine- and opiate-exposed infants scored lower on the Infant Behavior Questionnaire smiling and laughter and duration of orienting scales. Opiate-exposed infants were rated as less responsive to soothing. By including a multitask measure of temperament we were able to show context-specific behavioral dysregulation in prenatally cocaine-exposed infants. The findings indicate flatter temperament may be specific to nonsocial contexts, whereas social interactions may be more distressing for cocaine-exposed infants.
Capturing specific abilities as a window into human individuality: the example of face recognition.
Wilmer, Jeremy B; Germine, Laura; Chabris, Christopher F; Chatterjee, Garga; Gerbasi, Margaret; Nakayama, Ken
2012-01-01
Proper characterization of each individual's unique pattern of strengths and weaknesses requires good measures of diverse abilities. Here, we advocate combining our growing understanding of neural and cognitive mechanisms with modern psychometric methods in a renewed effort to capture human individuality through a consideration of specific abilities. We articulate five criteria for the isolation and measurement of specific abilities, then apply these criteria to face recognition. We cleanly dissociate face recognition from more general visual and verbal recognition. This dissociation stretches across ability as well as disability, suggesting that specific developmental face recognition deficits are a special case of a broader specificity that spans the entire spectrum of human face recognition performance. Item-by-item results from 1,471 web-tested participants, included as supplementary information, fuel item analyses, validation, norming, and item response theory (IRT) analyses of our three tests: (a) the widely used Cambridge Face Memory Test (CFMT); (b) an Abstract Art Memory Test (AAMT), and (c) a Verbal Paired-Associates Memory Test (VPMT). The availability of this data set provides a solid foundation for interpreting future scores on these tests. We argue that the allied fields of experimental psychology, cognitive neuroscience, and vision science could fuel the discovery of additional specific abilities to add to face recognition, thereby providing new perspectives on human individuality.
Testing manifest monotonicity using order-constrained statistical inference.
Tijmstra, Jesper; Hessen, David J; van der Heijden, Peter G M; Sijtsma, Klaas
2013-01-01
Most dichotomous item response models share the assumption of latent monotonicity, which states that the probability of a positive response to an item is a nondecreasing function of a latent variable intended to be measured. Latent monotonicity cannot be evaluated directly, but it implies manifest monotonicity across a variety of observed scores, such as the restscore, a single item score, and in some cases the total score. In this study, we show that manifest monotonicity can be tested by means of the order-constrained statistical inference framework. We propose a procedure that uses this framework to determine whether manifest monotonicity should be rejected for specific items. This approach provides a likelihood ratio test for which the p-value can be approximated through simulation. A simulation study is presented that evaluates the Type I error rate and power of the test, and the procedure is applied to empirical data.
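Manifest monotonicity with respect to the restscore can be inspected descriptively by grouping respondents on their score on the other items and checking whether the proportion of positive responses ever decreases. The sketch below performs only that descriptive check; it does not implement the order-constrained likelihood ratio test or its simulated p-value, and the function names are illustrative.

```python
def restscore_proportions(data, item):
    """P(X_item = 1 | restscore = r) for each observed restscore r.

    data: list of 0/1 response vectors; the restscore is the sum of all
    other items. Returned keys are in increasing restscore order.
    """
    groups = {}
    for row in data:
        rest = sum(row) - row[item]
        n, k = groups.get(rest, (0, 0))
        groups[rest] = (n + 1, k + row[item])
    return {r: k / n for r, (n, k) in sorted(groups.items())}


def monotonicity_violations(props):
    """Adjacent restscore groups where the observed proportion decreases."""
    keys = list(props)
    return [(r0, r1) for r0, r1 in zip(keys, keys[1:]) if props[r1] < props[r0]]
```

In real data, small restscore groups are usually merged before such a check, and a sample decrease is not by itself evidence against latent monotonicity; that is the question the proposed test addresses.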
Lawton IADL scale in dementia: can item response theory make it more informative?
McGrory, Sarah; Shenkin, Susan D; Austin, Elizabeth J; Starr, John M
2014-07-01
Impairment of functional abilities represents a crucial component of dementia diagnosis. Current functional measures rely on the traditional aggregate method of summing raw scores. While this summary score provides a quick representation of a person's ability, it disregards useful information at the item level. The aim was to use item response theory (IRT) methods to increase the interpretive power of the Lawton Instrumental Activities of Daily Living (IADL) scale by establishing a hierarchy of item 'difficulty' and 'discrimination'. This cross-sectional study applied IRT methods to the analysis of IADL outcomes. Participants were 202 members of the Scottish Dementia Research Interest Register (mean age = 76.39, range = 56-93, SD = 7.89 years) with complete itemised data available. A Mokken scale with good reliability (Molenaar-Sijtsma statistic 0.79) was obtained, satisfying the IRT assumption that the items comprise a single unidimensional scale. The eight items in the scale could be placed on a hierarchy of 'difficulty' (H coefficient = 0.55), with 'Shopping' being the most 'difficult' item and 'Telephone use' being the least 'difficult' item. 'Shopping' was the most discriminatory item, differentiating well between patients of different levels of ability. IRT methods are capable of providing more information about functional impairment than a summed score. 'Shopping' and 'Telephone use' were identified as items that reveal key information about a patient's level of ability, and could be useful screening questions for clinicians. © The Author 2013. Published by Oxford University Press on behalf of the British Geriatrics Society. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
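Loevinger's scalability coefficient H, used in Mokken scaling, compares the observed inter-item covariances with the maximum covariances allowed by the item marginals. A minimal sketch for dichotomously scored items follows; dichotomous scoring is an assumption here, since the abstract does not state how the IADL items were scored, and the function name is illustrative.

```python
def loevinger_h(data):
    """Scale-level Loevinger's H for dichotomous (0/1) items.

    H = sum of observed inter-item covariances / sum of their maxima given
    the item marginals, where max cov(i, j) = min(p_i, p_j) - p_i * p_j.
    Items that everyone (or no one) endorses have zero maximum covariance
    and should be removed first.
    """
    n = len(data)
    k = len(data[0])
    p = [sum(row[i] for row in data) / n for i in range(k)]
    cov_sum = 0.0
    max_sum = 0.0
    for i in range(k):
        for j in range(i + 1, k):
            p_ij = sum(row[i] * row[j] for row in data) / n
            cov_sum += p_ij - p[i] * p[j]
            max_sum += min(p[i], p[j]) - p[i] * p[j]
    return cov_sum / max_sum
```

A perfect Guttman pattern gives H = 1; a value of 0.5 or above is conventionally read as a strong Mokken scale.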
[Severe intimate partner violence risk prediction scale-revised].
Echeburúa, Enrique; Amor, Pedro Javier; Loinaz, Ismael; de Corral, Paz
2010-11-01
The aim of this study was to describe the psychometric properties of the Severe Intimate Partner Violence Risk Prediction Scale and to revise it in order to weight the 20 items according to their discriminant capacity and to address the problem of missing items. The sample for this study consisted of 450 male batterers who had been reported to the police. The victims were classified as high-risk (18.2%), moderate-risk (45.8%) and low-risk (36%), depending on the cutoff scores in the original scale. Internal consistency (Cronbach's alpha=.72) and interrater reliability (r=.73) were acceptable. The point-biserial correlation coefficient between each item and the corrected total score of the 20-item scale was calculated to determine the most discriminative items, which were associated with the context of intimate partner violence in the last month, with the male batterer's profile and with the victim's vulnerability. A revised scale (EPV-R) with new cutoff scores and guidance on how to handle missing items was proposed in accordance with these results. This easy-to-use tool appears to suit the requirements of criminal justice professionals and is intended for use in safety planning. Implications of these results for further research are discussed.
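The corrected item-total correlation correlates each item with the sum of the remaining items; for 0/1 items the Pearson correlation is the point-biserial coefficient. A standard-library sketch of that statistic and of Cronbach's alpha follows, assuming a hypothetical persons-by-items matrix of scores; the function names are illustrative.

```python
import math
import statistics


def cronbach_alpha(data):
    """Cronbach's alpha for a persons-by-items score matrix."""
    k = len(data[0])
    item_vars = [statistics.pvariance([row[i] for row in data]) for i in range(k)]
    total_var = statistics.pvariance([sum(row) for row in data])
    return k / (k - 1) * (1.0 - sum(item_vars) / total_var)


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    return sxy / math.sqrt(sxx * syy)


def corrected_item_total(data, item):
    """Correlation of one item with the total of the remaining items."""
    x = [row[item] for row in data]
    rest = [sum(row) - row[item] for row in data]
    return pearson(x, rest)
```

Excluding the item from the total avoids inflating its correlation with itself, which matters for short scales such as a 20-item instrument.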
Dimensionality Assessment for Dichotomously Scored Items Using Multidimensional Scaling.
ERIC Educational Resources Information Center
Jones, Patricia B.; And Others
In order to determine the effectiveness of multidimensional scaling (MDS) in recovering the dimensionality of a set of dichotomously-scored items, data were simulated in one, two, and three dimensions for a variety of correlations with the underlying latent trait. Similarity matrices were constructed from these data using three margin-sensitive…
Comparison of Reliability Measures under Factor Analysis and Item Response Theory
ERIC Educational Resources Information Center
Cheng, Ying; Yuan, Ke-Hai; Liu, Cheng
2012-01-01
Reliability of test scores is one of the most pervasive psychometric concepts in measurement. Reliability coefficients based on a unifactor model for continuous indicators include maximal reliability rho and an unweighted sum score-based omega, among many others. With increasing popularity of item response theory, a parallel reliability measure pi…
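For a one-factor model with loadings lambda_i and unique variances theta_i (factor variance fixed at 1), omega for the unweighted sum score is (sum lambda_i)^2 / ((sum lambda_i)^2 + sum theta_i), and maximal reliability rho is S / (1 + S) with S = sum lambda_i^2 / theta_i. A minimal sketch, assuming the parameters are already available from a fitted model; the function names are illustrative.

```python
def omega(loadings, uniques):
    """Omega for an unweighted sum score under a one-factor model (factor variance 1)."""
    common = sum(loadings) ** 2
    return common / (common + sum(uniques))


def maximal_reliability(loadings, uniques):
    """Maximal reliability rho of the optimally weighted composite."""
    s = sum(l * l / u for l, u in zip(loadings, uniques))
    return s / (1.0 + s)
```

The two coincide when all items have equal loadings and equal unique variances; otherwise rho, which allows optimal item weights, is at least as large as omega.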
The Effects of Item by Item Feedback Given during an Ability Test.
ERIC Educational Resources Information Center
Whetton, C.; Childs, R.
1981-01-01
Answer-until-correct (AUC) is a procedure for providing feedback during a multiple-choice test, giving an increased range of scores. The performance of secondary students on a verbal ability test using AUC procedures was compared with a group using conventional instructions. AUC scores considerably enhanced reliability but not validity.…
Observed-Score Equating as a Test Assembly Problem.
ERIC Educational Resources Information Center
van der Linden, Wim J.; Luecht, Richard M.
1998-01-01
Derives a set of linear conditions of item-response functions that guarantees identical observed-score distributions on two test forms. The conditions can be added as constraints to a linear programming model for test assembly. An example illustrates the use of the model for an item pool from the Law School Admissions Test (LSAT). (SLD)
Developing and Evaluating a Machine-Scorable, Constrained Constructed-Response Item.
ERIC Educational Resources Information Center
Braun, Henry I.; And Others
The use of constructed response items in large scale standardized testing has been hampered by the costs and difficulties associated with obtaining reliable scores. The advent of expert systems may signal the eventual removal of this impediment. This study investigated the accuracy with which expert systems could score a new, non-multiple choice…
Gasser, Constantine E; Mensah, Fiona K; Kerr, Jessica A; Wake, Melissa
2017-12-01
Social patterning of dietary-related diseases may partly be explained by population disparities in children's diets. This study aimed to determine which early life socioeconomic factors best predict dietary trajectories across childhood. For waves 2-6 of the Baby (B) Cohort (ages 2-3 to 10-11 years) and waves 1-6 of the Kindergarten (K) Cohort (ages 4-5 to 14-15 years) of the Longitudinal Study of Australian Children, we constructed trajectories of dietary scores and of empirically derived dietary patterns. Dietary scores, based on the Australian Dietary Guidelines, summed children's consumption frequencies of seven groups of foods or drinks over the last 24 hours. Dietary patterns at each wave were derived using factor analyses of 12-16 food or drink items. Using multinomial logistic regression analyses, we examined associations of baseline single (parental education, remoteness area, parental employment, income, food security and home ownership) and composite (socioeconomic position and neighbourhood disadvantage) factors with adherence to dietary trajectories. All dietary trajectory outcomes across both cohorts showed profound gradients by composite socioeconomic position but not by neighbourhood disadvantage. For example, odds for children in the lowest relative to highest socioeconomic position quintile being in the 'never healthy' relative to the 'always healthy' score trajectory were OR=16.40, 95% CI 9.40 to 28.61 (B Cohort). Among the single variables, only parental education consistently predicted dietary trajectories. Child dietary trajectories vary profoundly by family socioeconomic position. If causal, reducing dietary inequities may require researching underlying pathways, tackling socioeconomic inequities and targeting health promoting interventions to less educated families. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. 
No commercial use is permitted unless otherwise expressly granted.
Franchignoni, F; Tesio, L; Martino, M T; Benevolo, E; Castagna, M
1998-01-01
A model for prediction of length of stay (LOS, in days) of stroke rehabilitation inpatients was developed, based on patients' age (years) and function at admission (scored on the Functional Independence Measure, FIM℠). One hundred and twenty-nine cases, consecutively admitted to three free-standing rehabilitation centres in Italy, were analyzed. A multiple linear regression using a forward stepwise selection procedure was adopted. Median admission and discharge scores were: 57 and 75 for the total FIM score, 29 and 48 for the 13-item motor FIM subscore, 29 and 30 for the 5-item cognitive FIM subscore (potential range: 18-126, 13-91, 5-35, respectively). Median LOS was 44 days (interquartile range 30-62). The logLOS predictive model included three FIM items ("toilet transfer", TTr; "social interaction"; "expression") and patient's age (R² = 0.48). TTr alone explained 31.3% of the variance of logLOS. These results are consistent with previous American studies, showing that FIM scores at admission are strong predictors of patients' LOS, with the transfer items having the greatest predictive power.
Nilsson, Lena Maria; Winkvist, Anna; Brustad, Magritt; Jansson, Jan-Håkan; Johansson, Ingegerd; Lenner, Per; Lindahl, Bernt; Van Guelpen, Bethany
2012-05-04
To examine the relationship between "traditional Sami" dietary pattern and mortality in a general northern Swedish population. Population-based cohort study. We examined 77,319 subjects from the Västerbotten Intervention Program (VIP) cohort. A traditional Sami diet score was constructed by adding 1 point for intake above the median level of red meat, fatty fish, total fat, berries and boiled coffee, and 1 point for intake below the median of vegetables, bread and fibre. Hazard ratios (HR) for mortality were calculated by Cox regression. Increasing traditional Sami diet scores were associated with slightly elevated all-cause mortality in men [Multivariate HR per 1-point increase in score 1.04 (95% CI 1.01-1.07), p=0.018], but not in women [Multivariate HR 1.03 (95% CI 0.99-1.07), p=0.130]. This increased risk was approximately equally attributable to cardiovascular disease and cancer, though somewhat more apparent for cardiovascular disease mortality in men free from diabetes, hypertension and obesity at baseline [Multivariate HR 1.10 (95% CI 1.01-1.20), p=0.023]. A weakly increased all-cause mortality was observed in men with higher traditional Sami diet scores. However, due to the complexity in defining a "traditional Sami" diet, and the limitations of our questionnaire for this purpose, the study should be considered exploratory, a first attempt to relate a "traditional Sami" dietary pattern to health endpoints. Further investigation of cohorts with more detailed information on dietary and lifestyle items relevant for traditional Sami culture is warranted.
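The scoring rule described above (one point per food on the "traditional" side of the cohort median, giving a 0-8 range) can be sketched as follows. The dictionary keys are hypothetical names, and the handling of intakes exactly at the median (scored 0 here) is an assumption, since the abstract does not specify it.

```python
# Foods that earn 1 point when intake is ABOVE the cohort median
ABOVE_MEDIAN_FOODS = ("red_meat", "fatty_fish", "total_fat", "berries", "boiled_coffee")
# Foods that earn 1 point when intake is BELOW the cohort median
BELOW_MEDIAN_FOODS = ("vegetables", "bread", "fibre")


def sami_diet_score(intake, medians):
    """Traditional Sami diet score (0-8); intake exactly at the median earns no point (assumption)."""
    above = sum(1 for food in ABOVE_MEDIAN_FOODS if intake[food] > medians[food])
    below = sum(1 for food in BELOW_MEDIAN_FOODS if intake[food] < medians[food])
    return above + below
```

Because each component is dichotomised at the median, the score counts how many foods lean toward the traditional pattern, not how far intake departs from the median.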
Reliability of the Melbourne assessment of unilateral upper limb function.
Randall, M; Carlin, J B; Chondros, P; Reddihough, D
2001-11-01
This study examines the reliability of the Melbourne Assessment of Unilateral Upper Limb Function: a quantitative test of quality of movement in children with neurological impairment. The assessment was administered to 20 children aged from 5 to 16 years (mean age 9 years 10 months, SD 2 years 10 months) who had various types and degrees of cerebral palsy (CP). The performances of the 20 children during assessment were videotaped for subsequent scoring by 15 occupational therapists. Scores were analyzed for internal consistency of test items, inter- and intrarater reliability of scorings of the same videotapes, and test-retest reliability using repeat videotaping. Results revealed very high internal consistency of test items (alpha=0.96), moderate to high agreement both within and between raters for all test items (intraclass correlations of at least 0.7) apart from item 16 (hand to mouth and down), and high interrater reliability (0.95) and intrarater reliability (0.97) for total test scores. Test-retest results revealed moderate to high intrarater reliability for item totals (mean of 0.83 and 0.79) for each rater and high reliability for test totals (0.98 and 0.97). These findings indicate that the Melbourne Assessment of Unilateral Upper Limb Function is a reliable tool for measuring the quality of unilateral upper-limb movement in children with CP.
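The abstract reports intraclass correlations without naming the ICC form. As one common variant, the one-way random-effects, single-rater ICC(1,1) can be computed from a targets-by-raters table as sketched below. The data layout and example ratings are hypothetical, and the study may have used a different ICC form.

```python
from statistics import mean


def icc_oneway(ratings):
    """ICC(1,1): one-way random-effects, single-rater intraclass correlation.

    ratings: one row per target (e.g. a videotaped child), one column per rater.
    """
    n, k = len(ratings), len(ratings[0])
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    # Between-target and within-target mean squares from a one-way ANOVA
    msb = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    msw = sum((x - m) ** 2 for row, m in zip(ratings, row_means) for x in row) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)
```

Perfect agreement between raters gives 1; rater disagreement that is large relative to the spread between targets drives the coefficient toward or below 0.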
Krusinska, Beata; Hawrysz, Iwona; Wadolowska, Lidia; Slowinska, Malgorzata Anna; Biernacki, Maciej; Czerwinska, Anna; Golota, Janusz Jacek
2018-04-11
Lung cancer in men and breast cancer in women are the most commonly diagnosed cancers in Poland and worldwide. Results of studies involving dietary patterns (DPs) and breast or lung cancer risk in European countries outside the Mediterranean Sea region are limited and inconclusive. This study aimed to develop a 'Polish-adapted Mediterranean Diet' ('Polish-aMED') score, and then to study the associations of the 'Polish-aMED' score and of a posteriori-derived dietary patterns with breast or lung cancer risk in adult Poles. This pooled analysis of two case-control studies involved 560 subjects (280 men, 280 women) aged 40-75 years from Northeastern Poland. The cases comprised 140 women diagnosed with breast cancer and 140 men diagnosed with lung cancer. The food frequency consumption of 21 selected food groups was collected using a 62-item Food Frequency Questionnaire (FFQ)-6. The 'Polish-adapted Mediterranean Diet' score, which included eight items (vegetables, fruit, whole grain, fish, legumes, nuts and seeds, the ratio of vegetable oils to animal fat, and red and processed meat), was developed (range: 0-8 points). Three DPs were identified in a Principal Component Analysis: 'Prudent', 'Non-healthy', 'Dressings and sweetened-low-fat dairy'. In a multiple logistic regression analysis, two models were created: crude, and adjusted for age, sex, type of cancer, Body Mass Index (BMI), socioeconomic status (SES) index, overall physical activity, smoking status and alcohol abuse. The risk of breast or lung cancer was lower in the average (3-5 points) and high (6-8 points) levels of the 'Polish-aMED' score compared to the low (0-2 points) level by 51% (odds ratio (OR): 0.49; 95% confidence interval (CI): 0.30-0.80; p < 0.01; adjusted) and 63% (OR: 0.37; 95% CI: 0.21-0.64; p < 0.001; adjusted), respectively. 
In the middle and upper tertiles compared to the bottom tertile of the 'Prudent' DP, the risk of cancer was lower by 38-43% (crude), but the difference was not significant after adjustment for confounders. In the upper compared to the bottom tertile of the 'Non-healthy' DP, the risk of cancer was higher by 65% (OR: 1.65; 95% CI: 1.05-2.59; p < 0.05; adjusted). In conclusion, the Polish adaptation of the Mediterranean diet could be considered for adults living in non-Mediterranean countries for the prevention of breast or lung cancer. Future studies should explore the role of a traditional Mediterranean diet fitted to local dietary patterns of non-Mediterranean Europeans in cancer prevention.
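The score bands and the percentages quoted above follow from simple arithmetic. The sketch below maps a 0-8 'Polish-aMED' score to the low, average and high levels, and turns an odds ratio into the percent change quoted in the abstract. Strictly, (OR − 1) × 100 is a relative change in odds rather than in risk. How each of the eight items is scored is not given in the abstract, so the sketch starts from a total score.

```python
def amed_level(score):
    """Map a 0-8 'Polish-aMED' score to the levels used in the abstract."""
    if not 0 <= score <= 8:
        raise ValueError("score must be between 0 and 8")
    if score <= 2:
        return "low"
    if score <= 5:
        return "average"
    return "high"


def percent_change_from_or(odds_ratio):
    """Odds ratio as a rounded percent change, e.g. OR 0.49 -> -51, OR 1.65 -> 65."""
    return round((odds_ratio - 1) * 100)
```

Applied to the reported estimates, ORs of 0.49, 0.37 and 1.65 give the 51% and 63% reductions and the 65% increase stated in the text.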
Large-Scale Constraint-Based Pattern Mining
ERIC Educational Resources Information Center
Zhu, Feida
2009-01-01
We studied the problem of constraint-based pattern mining for three different data formats (item-set, sequence and graph), and focused on mining large patterns. Colossal patterns in each data format are studied to discover pruning properties that are useful for direct mining of these patterns. For item-set data, we observed robustness of…
Polytomous Latent Scales for the Investigation of the Ordering of Items
ERIC Educational Resources Information Center
Ligtvoet, Rudy; van der Ark, L. Andries; Bergsma, Wicher P.; Sijtsma, Klaas
2011-01-01
We propose three latent scales within the framework of nonparametric item response theory for polytomously scored items. Latent scales are models that imply an invariant item ordering, meaning that the order of the items is the same for each measurement value on the latent scale. This ordering property may be important in, for example,…
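The truncated abstract does not describe the three proposed latent scales. It does define the property they imply: invariant item ordering, meaning that the items keep the same order at every value of the latent scale. The sketch below illustrates only that definition. It assumes the expected item scores have already been evaluated on a grid of latent values, and the example numbers are hypothetical.

```python
def has_invariant_item_ordering(irfs):
    """Return True if no pair of items changes order anywhere on the latent grid.

    irfs[i][t]: expected score on item i at the t-th latent value. Ties are allowed;
    a violation is a pair of items where each one is higher somewhere on the grid.
    """
    n_items = len(irfs)
    for i in range(n_items):
        for j in range(i + 1, n_items):
            diffs = [a - b for a, b in zip(irfs[i], irfs[j])]
            if any(d > 0 for d in diffs) and any(d < 0 for d in diffs):
                return False
    return True


# Hypothetical expected item scores at two latent values
ordered = [[0.2, 0.5], [0.3, 0.6]]    # item 2 is "easier" everywhere
crossing = [[0.2, 0.7], [0.3, 0.6]]   # the two items swap order
```

In practice, expected scores would be estimated nonparametrically (for example, from rest-score groups), and sampling error would need a tolerance rather than the exact sign test used here.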
Saulle, R; Del Prete, G; Stelmach-Mardas, M; De Giusti, M; La Torre, G
2016-01-01
To investigate dietary habits among young people in the Mediterranean lands, exactly where the health benefits of the Mediterranean diet (MD) were discovered by Ancel Keys. A cross-sectional study design. A 10-item food-frequency questionnaire was administered to 1117 students in the schools of the Cilento area. Adherence to the MD was appraised according to a scale of 0-10. A logistic regression model was used to identify possible factors associated with "Following an unhealthy diet". Results were expressed as Odds Ratio with 95% confidence interval and the level of significance was set at p<0.05. Overall, 63.8% of students scored under six, indicating that the majority did not respect the rules of the Mediterranean diet, and only 36.2% (n = 371) exceeded a score of 6, adhering to it in varying degrees. In the logistic regression analysis, smokers had almost double the risk of departing from the Mediterranean dietary pattern (OR = 1.93; 95% CI 1.44-2.57); conversely, those with a higher PCS12 (Physical Component Summary score) were at lower risk of moving away from the MD style (OR = 0.98; 95% CI = 0.96-0.99). Despite its increasing popularity worldwide, adherence to the MD model is decreasing. The new generation of young people does not adhere to the MD pattern although they live in the lands characterized by the tradition and culture of healthy diet and where the benefits from this pattern were initially discovered. Interventions and specific education about the healthy diet may be useful to bring students' dietary patterns back in line with the old eating tradition.
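Logistic regression odds ratios are normally reported with Wald intervals built on the log-odds scale. Assuming that was done here (the abstract does not say), the standard error implied by a reported interval can be recovered and used to rebuild the interval. This gives a quick consistency check on numbers such as OR = 1.93 (95% CI 1.44-2.57).

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # two-sided 95% normal quantile, about 1.96


def se_from_or_ci(lower, upper, z=Z95):
    """Standard error of log(OR) implied by a 95% CI that is symmetric on the log scale."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def or_ci(log_or, se, z=Z95):
    """Wald confidence interval for an odds ratio from a log-odds coefficient and its SE."""
    return math.exp(log_or - z * se), math.exp(log_or + z * se)


se = se_from_or_ci(1.44, 2.57)
lower, upper = or_ci(math.log(1.93), se)
```

The rebuilt interval lands within rounding error of the reported one, consistent with a Wald interval around the stated estimate.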
Barnett, Carolina; Merkies, Ingemar S J; Katzberg, Hans; Bril, Vera
2015-09-02
The Quantitative Myasthenia Gravis Score and the Myasthenia Gravis Composite are two commonly used outcome measures in Myasthenia Gravis. So far, their measurement properties have not been compared, so we aimed to study their psychometric properties using the Rasch model. 251 patients with stable myasthenia gravis were assessed with both scales, and 211 patients returned for a second assessment. We studied fit to the Rasch model at the first visit, and compared item fit, thresholds, differential item functioning, local dependence, person separation index, and tests for unidimensionality. We also assessed test-retest reliability and estimated the Minimal Detectable Change. Neither scale fit the Rasch model (χ² p < 0.05). The Myasthenia Gravis Composite had lower discrimination properties than the Quantitative Myasthenia Gravis Score (Person Separation Index: 0.14 and 0.7). There was local dependence in both scales, as well as differential item functioning for ocular and generalized disease. Disordered thresholds were found in 6 (60%) items of the Myasthenia Gravis Composite and in 4 (31%) of the Quantitative Myasthenia Gravis Score. Both tools had adequate test-retest reliability (ICCs >0.8). The minimally detectable change was 4.9 points for the Myasthenia Gravis Composite and 4.3 points for the Quantitative Myasthenia Gravis Score. Neither scale fulfilled Rasch model expectations. The Quantitative Myasthenia Gravis Score has higher discrimination than the Myasthenia Gravis Composite. Both tools have items with disordered thresholds, differential item functioning and local dependency. There was evidence of multidimensionality in the QMGS. The minimal detectable change values are higher than the minimal significant change reported in previous studies. These findings might inform future modifications of these tools.
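The abstract reports minimal detectable change (MDC) values but not how they were derived. A widely used definition combines the standard error of measurement (SEM) from a test-retest reliability coefficient with a 95% criterion for a difference between two measurements. The sketch below follows that convention, and its SD and ICC inputs are hypothetical.

```python
import math


def sem(sd, reliability):
    """Standard error of measurement from a score SD and a reliability coefficient (e.g. ICC)."""
    return sd * math.sqrt(1 - reliability)


def mdc95(sd, reliability):
    """Minimal detectable change at 95% confidence for the difference of two measurements."""
    return 1.96 * math.sqrt(2) * sem(sd, reliability)


# Hypothetical inputs: score SD of 1 point and test-retest ICC of 0.75
print(round(mdc95(1.0, 0.75), 3))
```

The sqrt(2) term reflects that a change score carries measurement error from both occasions. For this reason, even tools with ICCs above 0.8 can have MDCs of several points.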
Assessing for suicidal behavior in youth using the Achenbach System of Empirically Based Assessment.
Van Meter, Anna R; Algorta, Guillermo Perez; Youngstrom, Eric A; Lechtman, Yana; Youngstrom, Jen K; Feeny, Norah C; Findling, Robert L
2018-02-01
This study investigated the clinical utility of the Achenbach System of Empirically Based Assessment (ASEBA) for identifying youth at risk for suicide. Specifically, we investigated how well the Total Problems scores and the sum of two suicide-related items (#18 "Deliberately harms self or attempts suicide" and #91 "Talks about killing self") were able to distinguish youth with a history of suicidal behavior. Youth (N = 1117) aged 5-18 were recruited for two studies of mental illness. History of suicidal behavior was assessed by semi-structured interviews (K-SADS) with youth and caregivers. Youth, caregivers, and a primary teacher each completed the appropriate form (YSR, CBCL, and TRF, respectively) of the ASEBA. Areas under the curve (AUCs) from ROC analyses and diagnostic likelihood ratios (DLRs) were used to measure the ability of both Total Problems T scores, as well as the summed score of two suicide-related items, to identify youth with a history of suicidal behavior. The Suicide Items from the CBCL and YSR performed well (AUCs = 0.85 and 0.70, respectively). The TRF Suicide Items did not perform better than chance, AUC = 0.45. The AUCs for the Total Problems scores were poor-to-fair (0.33-0.65). The CBCL Suicide Items outperformed all other scores (ps = 0.04 to <0.0005). Combining the CBCL and YSR items did not lead to incremental improvement in prediction over the CBCL alone. The sum of two questions from a commonly used assessment tool can offer important information about a youth's risk for suicidal behavior. The low burden of this approach could facilitate widespread screening for suicide in an increasingly at-risk population.
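The two accuracy statistics used above can be computed directly from scores in the two groups. The AUC equals the probability that a randomly chosen youth with a history of suicidal behavior scores higher than one without, with ties counted as one half. A DLR for a score band is the ratio of the proportions of each group falling in that band. The sketch below uses hypothetical summed item scores.

```python
def auc(pos_scores, neg_scores):
    """ROC AUC as P(positive case scores higher than negative case), ties counting 1/2."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def dlr(pos_scores, neg_scores, low, high):
    """Diagnostic likelihood ratio for scores in [low, high].

    P(score in band | history of suicidal behavior) / P(score in band | no history).
    Undefined (ZeroDivisionError) if no negative case falls in the band.
    """
    def in_band(s):
        return low <= s <= high

    p_pos = sum(map(in_band, pos_scores)) / len(pos_scores)
    p_neg = sum(map(in_band, neg_scores)) / len(neg_scores)
    return p_pos / p_neg
```

This pairwise formulation also shows why AUCs below 0.5, such as the 0.33-0.45 values reported, mean the score ranks cases in the wrong direction more often than not.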
The Bergen Shopping Addiction Scale: reliability and validity of a brief screening test
Andreassen, Cecilie S.; Griffiths, Mark D.; Pallesen, Ståle; Bilder, Robert M.; Torsheim, Torbjørn; Aboujaoude, Elias
2015-01-01
Although excessive and compulsive shopping has been increasingly placed within the behavioral addiction paradigm in recent years, items in existing screens arguably do not assess the core criteria and components of addiction. To date, assessment screens for shopping disorders have primarily been rooted within the impulse-control or obsessive-compulsive disorder paradigms. Furthermore, existing screens use the terms ‘shopping,’ ‘buying,’ and ‘spending’ interchangeably, and do not necessarily reflect contemporary shopping habits. Consequently, a new screening tool for assessing shopping addiction was developed. Initially, 28 items, four for each of seven addiction criteria (salience, mood modification, conflict, tolerance, withdrawal, relapse, and problems), were constructed. These items and validated scales (i.e., Compulsive Buying Measurement Scale, Mini-International Personality Item Pool, Hospital Anxiety and Depression Scale, Rosenberg Self-Esteem Scale) were then administered to 23,537 participants (M age = 35.8 years, SD age = 13.3). The highest-loading item from each set of four pooled items reflecting the seven addiction criteria was retained in the final scale, The Bergen Shopping Addiction Scale (BSAS). The factor structure of the BSAS was good (RMSEA = 0.064, CFI = 0.983, TLI = 0.973) and coefficient alpha was 0.87. The scores on the BSAS converged with scores on the Compulsive Buying Measurement Scale (CBMS; 0.80), and were positively correlated with extroversion and neuroticism, and negatively with conscientiousness, agreeableness, and intellect/imagination. The scores of the BSAS were positively associated with anxiety, depression, and low self-esteem and inversely related to age. Females scored higher than males on the BSAS. The BSAS is the first scale to fully embed shopping addiction within an addiction paradigm. A recommended cutoff score for the new scale and future research directions are discussed. PMID:26441749
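The item-reduction step described above keeps one item per addiction criterion: the one with the highest factor loading among its four candidates. A minimal sketch follows, assuming loadings are already available per criterion. The item labels and loading values are hypothetical, not taken from the study.

```python
def select_highest_loading(loadings_by_criterion):
    """For each criterion, keep the candidate item with the highest factor loading.

    loadings_by_criterion: {criterion: {item_label: loading, ...}, ...}
    """
    return {
        criterion: max(items, key=items.get)
        for criterion, items in loadings_by_criterion.items()
    }


# Hypothetical loadings for two of the seven criteria
example = {
    "salience": {"sal_1": 0.61, "sal_2": 0.78, "sal_3": 0.55, "sal_4": 0.70},
    "tolerance": {"tol_1": 0.72, "tol_2": 0.49, "tol_3": 0.66, "tol_4": 0.58},
}
```

Applied to all seven criteria, this yields a seven-item scale like the BSAS.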
ERIC Educational Resources Information Center
Michaelides, Michalis P.
2006-01-01
Consistent behavior is a desirable characteristic that common items are expected to have when administered to different groups. Findings from the literature have established that items do not always behave in consistent ways; item indices and IRT item parameter estimates of the same items differ when obtained from different administrations.…
Time manages interference in visual short-term memory.
Smith, Amy V; McKeown, Denis; Bunce, David
2017-09-01
Emerging evidence suggests that age-related declines in memory may reflect a failure in pattern separation, a process that is believed to reduce the encoding overlap between similar stimulus representations during memory encoding. Indeed, behavioural pattern separation may be indexed by a visual continuous recognition task in which items are presented in sequence and observers report for each whether it is novel, previously viewed (old), or whether it shares features with a previously viewed item (similar). In comparison to young adults, older adults show decreased pattern separation when the number of items between "old" and "similar" items is increased. However, the mechanisms of forgetting underpinning this type of recognition task have yet to be explored in a cognitively homogeneous group, with careful control over the parameters of the task, including elapsing time (a critical variable in models of forgetting). By extending the inter-item intervals, number of intervening items and overall decay interval, we observed in a young adult sample (N = 35, M age = 19.56 years) that the critical factor governing performance was inter-item interval. We argue that tasks using behavioural continuous recognition to index pattern separation in immediate memory will benefit from generous inter-item spacing, offering protection from inter-item interference.
2000-12-01
A skip flag indicating the result of checking the response on the parent (screening) item against the response(s) on the items within the skip pattern. See Table D-5, Note 2, in Appendix D.
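The codebook entry above describes a consistency flag but not its values. The sketch below is a hypothetical version of such a check: it compares the screening (parent) answer with whether the items inside the skip pattern were answered. The flag codes, the "yes" routing value, and the use of None for skipped items are all assumptions, and the codebook's actual codes in Table D-5 are not reproduced here.

```python
# Hypothetical flag codes (not the codebook's actual values)
CONSISTENT, SHOULD_HAVE_SKIPPED, SHOULD_HAVE_ANSWERED = 0, 1, 2


def skip_flag(parent_response, child_responses, enter_value="yes"):
    """Check a screening (parent) item against the items within its skip pattern.

    child_responses: answers to the items inside the skip pattern, None if skipped.
    enter_value: the parent answer that routes a respondent INTO the skip pattern.
    """
    answered = [r is not None for r in child_responses]
    if parent_response == enter_value:
        # Routed in: every item inside the pattern should have an answer
        return CONSISTENT if all(answered) else SHOULD_HAVE_ANSWERED
    # Routed out: every item inside the pattern should have been skipped
    return SHOULD_HAVE_SKIPPED if any(answered) else CONSISTENT
```

Keeping the flag separate from the responses lets analysts decide later whether to trust the parent item or the follow-up items when they disagree.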
Bourdel-Marchasson, Isabelle; Diallo, Abou; Bellera, Carine; Blanc-Bisson, Christelle; Durrieu, Jessica; Germain, Christine; Mathoulin-Pélissier, Simone; Soubeyran, Pierre; Rainfray, Muriel; Fonck, Mariane; Doussau, Adelaïde
2016-01-01
The MNA (Mini Nutritional Assessment) is known to be a prognostic factor in older populations. We analyzed the prognostic value for one-year mortality of MNA items in older patients with cancer treated with chemotherapy as the basis of a simplified prognostic score. The prospective derivation cohort included 606 patients older than 70 years with an indication of chemotherapy for cancers. The endpoint to predict was one-year mortality. The 18 items of the Full MNA, age, gender, weight loss, cancer origin, TNM, performance status and lymphocyte count were considered to construct the prognostic model. MNA items were analyzed with a backward step-by-step multivariate logistic regression and other items were added in a forward step-by-step regression. External validation was performed on an independent cohort of 229 patients. At one year 266 deaths had occurred. Decreased dietary intake (p = 0.0002), decreased protein-rich food intake (p = 0.025), 3 or more prescribed drugs (p = 0.023), calf circumference <31 cm (p = 0.0002), tumor origin (p<0.0001), metastatic status (p = 0.0007) and lymphocyte count <1500/mm3 (p = 0.029) were found to be associated with 1-year mortality in the final model and were used to construct a prognostic score. The area under curve (AUC) of the score was 0.793, which was higher than the Full MNA AUC (0.706). The AUC of the score in validation cohort (229 subjects, 137 deaths) was 0.698. Key predictors of one-year mortality included cancer cachexia clinical features, comorbidities, the origin and the advanced status of the tumor. The prognostic value of this model combining a subset of MNA items and cancer related items was better than the full MNA, thus providing a simple score to predict 1-year mortality in older patients with an indication of chemotherapy.
Relationship between cognitive and non-cognitive symptoms of delirium.
Rajlakshmi, Aarya Krishnan; Mattoo, Surendra Kumar; Grover, Sandeep
2013-04-01
To study the relationship between the cognitive and the non-cognitive symptoms of delirium. Eighty-four patients who were referred to psychiatry liaison services and met DSM-IV-TR criteria for delirium were assessed using the Delirium Rating Scale Revised-1998 (DRS-R-98) and the Cognitive Test for Delirium (CTD). The mean DRS-R-98 severity score was 17.19 and the mean DRS-R-98 total score was 23.36. The mean total score on CTD was 11.75. The mean scores on CTD were highest for comprehension (3.47) and lowest for vigilance (1.71). Poor attention was associated with significantly higher motor retardation and higher DRS-R-98 severity scores minus the attention scores. There were no significant differences between those with and without poor attention. Higher attention deficits were associated with higher dysfunction on all other domains of cognition on CTD. There was significant correlation between cognitive functions as assessed on CTD and total DRS-R-98 score, DRS-R-98 severity score and DRS-R-98 severity score without the attention item score. However, few correlations emerged between the CTD domain and total scores and either the cognitive symptom total score of the DRS-R-98 (items 9-13) or its non-cognitive symptom total score (items 1-8). Our study suggests that in delirium, cognitive deficits are quite prevalent and correlate with overall severity of delirium. Attention deficit is a core symptom of delirium. Copyright © 2012 Elsevier B.V. All rights reserved.
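The DRS-R-98 subscores compared above can be derived from the 13 severity item ratings. The abstract fixes the cognitive items as 9-13 and the non-cognitive items as 1-8. The position of the attention item is not stated, so `ATTENTION_ITEM = 10` below is an assumption that should be checked against the scale manual.

```python
NONCOGNITIVE_ITEMS = range(1, 9)   # items 1-8, as given in the abstract
COGNITIVE_ITEMS = range(9, 14)     # items 9-13, as given in the abstract
ATTENTION_ITEM = 10                # assumption: item number of the attention rating


def drs_subscores(item_scores):
    """Subscores from DRS-R-98 severity item ratings.

    item_scores: dict mapping severity item number (1-13) to its rating.
    """
    severity = sum(item_scores[i] for i in range(1, 14))
    return {
        "severity": severity,
        "severity_without_attention": severity - item_scores[ATTENTION_ITEM],
        "cognitive": sum(item_scores[i] for i in COGNITIVE_ITEMS),
        "noncognitive": sum(item_scores[i] for i in NONCOGNITIVE_ITEMS),
    }
```

Removing the attention item from the severity score, as the study did, avoids inflating correlations with cognitive tests that themselves measure attention.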
Heinik, Jeremia; Solomesh, Isaac
2007-03-01
The Cambridge Cognitive Examination-Revised introduces 2 new executive items (Ideational Fluency and Visual Reasoning), which separately or combined with 2 executive items in the former version (word list generation and similarities) might constitute an Executive Function Score (EFS). The authors studied the validity of these new EFSs in 51 demented (dementia of the Alzheimer's type, vascular dementia) and nondemented individuals (depressives and normals). The new EFSs were found valid to accurately differentiate between demented and nondemented subjects; however, they were considerably less so when specific diagnoses were considered. Correlations between the variously combined executive scores and the cognitive scales and subscales studied were predominantly low to moderate, and ranged from high and significant to low and nonsignificant when the 4 executive items were correlated with each other. The ability of the executive scores to discriminate demented from nondemented individuals was lower compared with the Cambridge Cognitive Examination-Revised scores. The EFS was found to be internally consistent.
Thyø, A; Emmertsen, K J; Pinkney, T D; Christensen, P; Laurberg, S
2017-01-01
The aim was to develop and validate a simple scoring system evaluating the impact of colostomy dysfunction on quality of life (QOL) in patients with a permanent stoma after rectal cancer treatment. In this population-based study, 610 patients with a permanent colostomy after previous rectal cancer treatment during the period 2001-2007 completed two questionnaires: (i) the basic stoma questionnaire consisting of 22 items about stoma function with one anchor question addressing the overall stoma impact on QOL and (ii) the European Organization for Research and Treatment of Cancer Quality of Life Questionnaire (EORTC QLQ) C30. Answers from half of the cohort were used to develop the score, which was subsequently validated on the remaining half. Logistic regression analyses identified and selected items for the score and multivariate analysis established the score value allocated to each item. The colostomy impact score includes seven items with a total range from 0 to 38 points. A score of ≥ 10 indicates major colostomy impact (Major CI). The score has a sensitivity of 85.7% for detecting patients with significant stoma impact on QOL. Using the EORTC QLQ scales, patients with Major CI experienced significant impairment in their QOL compared to the Minor CI group. This new scoring system appears valid for the assessment of the impact on QOL from having a permanent colostomy in a Danish rectal cancer population. It requires validation in non-Danish populations prior to its acceptance as a valuable patient-reported outcome measure for patients internationally. Colorectal Disease © 2016 The Association of Coloproctology of Great Britain and Ireland.
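The abstract gives the score range (0-38) and the Major CI cutoff (≥ 10) but not the per-item point values. The sketch below therefore starts from a total score. It classifies that score and shows how a sensitivity such as 85.7% is computed from true-positive and false-negative counts, which here are hypothetical.

```python
MAJOR_CI_CUTOFF = 10  # a total score >= 10 indicates major colostomy impact


def colostomy_impact_category(score):
    """Classify a colostomy impact total score (0-38) as Major or Minor CI."""
    if not 0 <= score <= 38:
        raise ValueError("colostomy impact score ranges from 0 to 38")
    return "Major CI" if score >= MAJOR_CI_CUTOFF else "Minor CI"


def sensitivity(true_positives, false_negatives):
    """Share of patients with significant stoma impact on QOL that the score flags as Major CI."""
    return true_positives / (true_positives + false_negatives)
```

Sensitivity alone does not describe how many patients without significant impact are also flagged. That would require specificity, which the abstract does not report.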
The medial tibial stress syndrome score: a new patient-reported outcome measure.
Winters, Marinus; Moen, Maarten H; Zimmermann, Wessel O; Lindeboom, Robert; Weir, Adam; Backx, Frank Jg; Bakker, Eric Wp
2016-10-01
At present, there is no validated patient-reported outcome measure (PROM) for patients with medial tibial stress syndrome (MTSS). Our aim was to select and validate previously generated items and create a valid, reliable and responsive PROM for patients with MTSS: the MTSS score. A prospective cohort study was performed in multiple sports medicine, physiotherapy and military facilities in the Netherlands. Participants with MTSS filled out the previously generated items for the MTSS score on 3 occasions. From previously generated items, we selected the best items. We assessed the MTSS score for its validity, reliability and responsiveness. The MTSS score was filled out by 133 participants with MTSS. Factor analysis showed the MTSS score to exhibit a single-factor structure with acceptable internal consistency (α=0.58) and good test-retest reliability (intraclass correlation coefficient=0.81). The MTSS score ranges from 0 to 10 points. The smallest detectable change in our sample was 0.69 at the group level and 4.80 at the individual level. Construct validity analysis showed significant moderate-to-large correlations (r=0.34-0.52, p<0.01). Responsiveness of the MTSS score was confirmed by a significant relation with the global perceived effect scale (β=-0.288, R²=0.21, p<0.001). The MTSS score is a valid, reliable and responsive PROM to measure the severity of MTSS. It is designed to evaluate treatment outcomes in clinical studies. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://www.bmj.com/company/products-services/rights-and-licensing/
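The abstract reports a smallest detectable change (SDC) at two levels. A common convention is that the group-level SDC is the individual-level SDC divided by the square root of the number of people averaged. The abstract does not state the formula or the n used for the group value, so the sketch below only illustrates that convention, with a hypothetical n.

```python
import math


def sdc_group(sdc_individual, n):
    """Smallest detectable change for the mean of n people, given the individual-level SDC."""
    return sdc_individual / math.sqrt(n)


# Hypothetical: individual-level SDC of 4.80 averaged over 4 people
print(sdc_group(4.80, 4))
```

This explains why a change of a few points can be meaningful for a group mean in a trial while being within measurement error for a single patient on the 0-10 MTSS score.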
Measuring the Reliability of Picture Story Exercises like the TAT
Gruber, Nicole; Kreuzpointner, Ludwig
2013-01-01
As frequently reported, psychometric assessments of Picture Story Exercises, especially variations of the Thematic Apperception Test, mostly reveal inadequate scores for internal consistency. We demonstrate that this apparent shortcoming is caused not by the coding system itself but by the incorrect use of internal consistency coefficients, especially Cronbach’s α. This problem could be eliminated by using the category-scores as items instead of the picture-scores. In addition to a theoretical explanation, we prove mathematically why the use of category-scores produces an adequate internal consistency estimation and examine our idea empirically with the original data set of the Thematic Apperception Test by Heckhausen and two additional data sets. We found generally higher values when using the category-scores as items instead of picture-scores. From an empirical and theoretical point of view, the estimated reliability is also superior to that obtained when each category within a picture is used as an item. When comparing our suggestion with a multifaceted Rasch model, we provide evidence that our procedure better fits the underlying principles of PSE. PMID:24348902
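The argument above turns on which units are treated as the "items" when computing Cronbach's α. The function below implements the standard α formula for a respondents-by-items table. It can be applied with picture-scores or with category-scores as the columns, which is the comparison the authors make. The example data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha; scores has one row per respondent and one column per item."""
    k = len(scores[0])
    columns = list(zip(*scores))                          # one tuple per item
    item_var_sum = sum(variance(col) for col in columns)  # sum of item variances
    total_var = variance([sum(row) for row in scores])    # variance of total scores
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical data: three respondents, two items
print(cronbach_alpha([[1, 1], [2, 3], [3, 2]]))
```

Because α depends on the covariances among whatever columns are supplied, regrouping the same codes from pictures into categories can change the estimate substantially, as the abstract reports.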
Sim, Joong Hiong; Tong, Wen Ting; Hong, Wei-Han; Vadivelu, Jamuna; Hassan, Hamimah
2015-01-01
Assessment environment, synonymous with climate or atmosphere, is multifaceted. Although there are valid and reliable instruments for measuring the educational environment, there is no validated instrument for measuring the assessment environment in medical programs. This study aimed to develop an instrument for measuring students' perceptions of the assessment environment in an undergraduate medical program and to examine the psychometric properties of the new instrument. The Assessment Environment Questionnaire (AEQ), a 40-item, four-point (1=Strongly Disagree to 4=Strongly Agree) Likert scale instrument designed by the authors, was administered to medical undergraduates from the authors' institution. The response rate was 626/794 (78.84%). To establish construct validity, exploratory factor analysis (EFA) with principal component analysis and varimax rotation was conducted. To examine the internal consistency reliability of the instrument, Cronbach's α was computed. Mean scores for the entire AEQ and for each factor/subscale were calculated. Mean AEQ scores were compared across academic years and between sexes. Six hundred and eleven completed questionnaires were analysed. EFA extracted four factors: feedback mechanism (seven items), learning and performance (five items), information on assessment (five items), and assessment system/procedure (three items), which together explained 56.72% of the variance. Based on the four extracted factors/subscales, the AEQ was reduced to 20 items. Cronbach's α for the 20-item AEQ was 0.89, whereas Cronbach's α for the four factors/subscales ranged from 0.71 to 0.87. The mean score for the AEQ was 2.68/4.00. The factor/subscale of 'feedback mechanism' recorded the lowest mean (2.39/4.00), whereas the factor/subscale of 'assessment system/procedure' scored the highest mean (2.92/4.00). Significant differences were found among the AEQ scores of students from different academic years. 
The AEQ is a valid and reliable instrument. Initial validation supports its use to measure students' perceptions of the assessment environment in an undergraduate medical program.
van der Maas, Nico Arie
2017-03-16
The Multiple Sclerosis Questionnaire for Physical Therapists (MSQPT) is a patient-rated outcome questionnaire for evaluating the rehabilitation of persons with multiple sclerosis (MS). Responsiveness was evaluated, and minimal important difference (MID) estimates were calculated to provide thresholds for clinical change for four items, three sections and the total score of the MSQPT. This multicentre study used a combined distribution- and anchor-based approach with multiple anchors and multiple rating of change questions. Responsiveness was evaluated using effect size, standardized response mean (SRM), modified SRM and relative efficiency. For distribution-based MID estimates, 0.2 and 0.33 standard deviations (SD), standard error of measurement (SEM) and minimal detectable change were used. Triangulation of anchor- and distribution-based MID estimates provided a range of MID values for each of the four items, the three sections and the total score of the MSQPT. The MID values were tested for their sensitivity and specificity for amelioration and deterioration for each of the four items, the three sections and the total score of the MSQPT. For each item, each section and the total score, the MID value with the best sensitivity and specificity was selected as the threshold for clinical change. The outcome measures were the MSQPT, Hamburg Quality of Life Questionnaire for Multiple Sclerosis (HAQUAMS), rating of change questionnaires, Expanded Disability Status Scale, 6-metre timed walking test, Berg Balance Scale and 6-minute walking test. The effect size ranged from 0.46 to 1.49. The SRM data showed comparable results. The modified SRM ranged from 0.00 to 0.60. Anchor-based MID estimates were very low and were comparable with SD- and SEM-based estimates. The MSQPT was more responsive than the HAQUAMS in detecting improvement but less responsive in detecting deterioration. 
The best MID estimates of the items, sections and total score, expressed as a percentage of their maximum score, ranged from 5.4% (activity) to 22% (item 10) for improvement and from 5.7% (total score) to 22% (item 10) for deterioration. The MSQPT is a responsive questionnaire with an adequate MID that may be used as a threshold for change during the rehabilitation of MS patients. This trial was registered retrospectively (01/24/2015) in ClinicalTrials.gov as NCT02346279.
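The responsiveness indices and distribution-based MID criteria named in this abstract follow standard formulas. A minimal sketch, assuming effect size = mean change / baseline SD, SRM = mean change / SD of change, SEM = SD × √(1 − reliability) and MDC95 = 1.96 × √2 × SEM. Choosing the threshold by Youden's J (sensitivity + specificity − 1) is our assumption about how "best sensitivity and specificity" was operationalized; the modified SRM and relative efficiency are not shown, and all function names are hypothetical.

```python
import math
from statistics import mean, stdev


def responsiveness(baseline, follow_up):
    """Effect size (mean change / baseline SD) and SRM (mean change / SD of change)."""
    change = [f - b for b, f in zip(baseline, follow_up)]
    effect_size = mean(change) / stdev(baseline)
    srm = mean(change) / stdev(change)
    return effect_size, srm


def distribution_mids(baseline, reliability):
    """Distribution-based MID candidates from the baseline SD and a reliability coefficient."""
    sd = stdev(baseline)
    sem = sd * math.sqrt(1.0 - reliability)
    return {
        "0.2 SD": 0.2 * sd,
        "0.33 SD": 0.33 * sd,
        "SEM": sem,
        "MDC95": 1.96 * math.sqrt(2.0) * sem,
    }


def best_threshold(changes, improved, candidates):
    """Pick the candidate MID that maximizes Youden's J against an anchor.

    improved[i] is True when the anchor (e.g. a rating-of-change question)
    says patient i improved; the Youden criterion is an assumption.
    """
    pos = [c for c, imp in zip(changes, improved) if imp]
    neg = [c for c, imp in zip(changes, improved) if not imp]

    def youden(t):
        sensitivity = sum(c >= t for c in pos) / len(pos)
        specificity = sum(c < t for c in neg) / len(neg)
        return sensitivity + specificity - 1.0

    return max(candidates, key=youden)
```

In practice the candidates passed to `best_threshold` would be the triangulated anchor- and distribution-based MID values for one item, section or total score.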
Chen, Liuxi; Xu, Kai; Fu, Lingyun; Xu, Shaofang; Gao, Qianqian; Wang, Wei
2015-01-01
Consistent results have shown a relationship between the psychological world of children and their perceived parental bonding or family attachment style, but to date there is no single measure covering both styles. The authors designed a statement matrix with 116 items for this purpose and compared it with the Parental Bonding Instrument (PBI) in a study with 718 university students. After exploratory and confirmatory factor analyses, five factors (scales) were defined to form a Family Relationship Questionnaire (FRQ): Paternal/Maternal Encouragement (5 items each), Paternal/Maternal Abuse (5 items each), Paternal/Maternal Freedom Release (5 items each), General Attachment (5 items), and Paternal/Maternal Dominance (4 items each). The internal consistency coefficients (alphas) of the factors ranged from .64 to .83, and their congruency coefficients were .93 to .98 in the father and mother samples. Women scored significantly higher than men on FRQ General Attachment and Maternal Encouragement and lower on Paternal Abuse; participants who were only children scored significantly higher on Paternal and Maternal Encouragement than those with siblings. Women also scored significantly higher on PBI Paternal Autonomy Denial; only children scored significantly higher on PBI Paternal and Maternal Care and Maternal Autonomy Denial. All intercorrelations between FRQ scales were low to medium, and some correlations between FRQ and PBI scales were medium to high. This study demonstrates that the FRQ has a five-factor structure with satisfactory discriminant and convergent validity, which might help to characterize family relationships in healthy and clinical populations.
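The congruency coefficients reported above compare factor loadings obtained in the father and mother samples. Assuming they are Tucker's congruence coefficients (the abstract does not name the index), a minimal sketch with hypothetical loadings is:

```python
import math


def tucker_congruence(x, y):
    """Tucker's congruence coefficient (phi) between two factor-loading vectors."""
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return num / den


# Hypothetical loadings of one 5-item scale, estimated separately in the
# "father" and "mother" samples.
father = [0.71, 0.65, 0.58, 0.62, 0.69]
mother = [0.68, 0.70, 0.55, 0.60, 0.72]
print(round(tucker_congruence(father, mother), 3))
```

Unlike a Pearson correlation, the coefficient is computed on the raw loadings without centering, so it is sensitive to the overall level of the loadings as well as to their pattern.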
Filipino Nurses' Spirituality and Provision of Spiritual Nursing Care.
Labrague, Leodoro J; McEnroe-Petitte, Denise M; Achaso, Romeo H; Cachero, Geifsonne S; Mohammad, Mary Rose A
2016-12-01
This study aimed to explore Filipino nurses' perceptions of their spirituality and their provision of spiritual nursing care. A descriptive, cross-sectional, quantitative design was adopted. The study was conducted in the Philippines using a convenience sample of 245 nurses. The Nurses' Spirituality and Delivery of Spiritual Care (NSDSC) instrument was the main instrument. The NSDSC items related to nurses' perception of spirituality that had higher mean scores were Item 7, "I believe that God loves me and cares for me," and Item 8, "Prayer is an important part of my life," with mean scores of 4.87 (SD = 1.36) and 4.88 (SD = 1.34), respectively. The NSDSC items related to the practice of spiritual care that had higher mean scores were Item 26, "I usually comfort clients spiritually (e.g., reading books, prayers, music, etc.)," and Item 25, "I refer the client to his/her spiritual counselor (e.g., hospital chaplain) if needed," with mean scores of 3.16 (SD = 1.54) and 2.92 (SD = 1.59), respectively. Nurses' spirituality correlated significantly with their understanding of spiritual nursing care (r = .3376, p ≤ .05) and their delivery of spiritual nursing care (r = .3980, p ≤ .05). A significant positive correlation was also found between understanding of spiritual nursing care and delivery of spiritual nursing care (r = .3289, p ≤ .05). For nurses to better provide spiritual nursing care, they must care for themselves through self-awareness, self-reflection, and developing a sense of satisfaction and contentment. © The Author(s) 2015.
2018-01-01
Background To further understand the relationship between anxiety and depression, this study examined the factor structure of the combined items from two validated measures of anxiety and depression. Methods The participants were 406 patients with mixed psychiatric diagnoses, including anxiety and depressive disorders, from a psychiatric outpatient unit at a university-affiliated medical center. Responses to the Beck Anxiety Inventory (BAI), Beck Depression Inventory (BDI)-II, and Symptom Checklist-90-Revised (SCL-90-R) were analyzed. We conducted an exploratory factor analysis of the 42 items from the BAI and BDI-II. Correlational analyses were performed between subscale scores of the SCL-90-R and the factors derived from the factor analysis. Scores on individual items of the BAI and BDI-II were also compared between patients with anxiety disorders (n = 185) and those with depressive disorders (n = 123). Results Exploratory factor analysis revealed the following five factors explaining 56.2% of the total variance: somatic anxiety (factor 1), cognitive depression (factor 2), somatic depression (factor 3), subjective anxiety (factor 4), and autonomic anxiety (factor 5). The depression group had significantly higher scores on 12 items of the BDI-II, while the anxiety group had higher scores on six items of the BAI. Conclusion Our results suggest that anxiety and depressive symptoms as measured by the BAI and BDI-II can be empirically differentiated, and that items of the cognitive domain in depression and those of the physical domain in anxiety are particularly noteworthy. PMID:29651821
DIF Trees: Using Classification Trees to Detect Differential Item Functioning
ERIC Educational Resources Information Center
Vaughn, Brandon K.; Wang, Qiu
2010-01-01
A nonparametric tree classification procedure is used to detect differential item functioning for items that are dichotomously scored. Classification trees are shown to be an alternative to the traditional Mantel-Haenszel and logistic regression procedures for detecting differential item functioning. A nonparametric…
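The classification-tree procedure itself is not reproduced here, but the traditional Mantel-Haenszel statistic it is compared against is short enough to sketch. Assuming examinees are stratified by total score and each stratum is summarized as a 2×2 table of group by correct/incorrect, the common odds ratio and its ETS delta transformation (−2.35 ln α_MH) can be computed as follows; the tuple layout and function name are illustrative.

```python
import math


def mantel_haenszel_dif(strata):
    """Mantel-Haenszel common odds ratio and ETS delta for one dichotomous item.

    Each stratum (one total-score level) is a tuple (A, B, C, D):
    reference correct, reference incorrect, focal correct, focal incorrect.
    """
    num = 0.0
    den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # empty score levels carry no information
        num += a * d / n
        den += b * c / n
    alpha_mh = num / den
    delta_mh = -2.35 * math.log(alpha_mh)  # ETS delta scale; 0 means no DIF
    return alpha_mh, delta_mh
```

An α_MH above 1 (negative delta) indicates that, at matched total scores, the reference group has higher odds of answering the item correctly.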
The representation of order information in auditory-verbal short-term memory.
Kalm, Kristjan; Norris, Dennis
2014-05-14
Here we investigate how order information is represented in auditory-verbal short-term memory (STM). We used fMRI and a serial recall task to dissociate neural activity patterns representing the phonological properties of the items stored in STM from the patterns representing their order. For this purpose, we analyzed fMRI activity patterns elicited by different item sets and different orderings of those items. These fMRI activity patterns were compared with the predictions made by positional and chaining models of serial order. The positional models encode associations between items and their positions in a sequence, whereas the chaining models encode associations between successive items and retain no position information. We show that a set of brain areas in the postero-dorsal stream of auditory processing stores associations between items and order as predicted by a positional model. The chaining model of order representation generates a different pattern similarity prediction, which was shown to be inconsistent with the fMRI data. Our results thus favor a neural model of order representation that stores item codes, position codes, and the mapping between them. This study provides the first fMRI evidence for a specific model of order representation in the human brain. Copyright © 2014 the authors.
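The two models of serial order make different predictions about which orderings should evoke similar activity patterns. The toy sketch below is not the authors' representational similarity analysis: it scores every pair of orderings of three hypothetical items by shared item-position bindings (positional model) or by shared successive-item pairs (chaining model).

```python
from itertools import permutations


def positional_similarity(seq_a, seq_b):
    """Positional model: number of items occupying the same position in both sequences."""
    return sum(1 for x, y in zip(seq_a, seq_b) if x == y)


def chaining_similarity(seq_a, seq_b):
    """Chaining model: number of successive-item links (x followed by y) shared by both."""
    links_a = set(zip(seq_a, seq_a[1:]))
    links_b = set(zip(seq_b, seq_b[1:]))
    return len(links_a & links_b)


def similarity_matrix(sequences, measure):
    return [[measure(a, b) for b in sequences] for a in sequences]


orders = ["".join(p) for p in permutations("ABC")]
pos = similarity_matrix(orders, positional_similarity)
chain = similarity_matrix(orders, chaining_similarity)
for name, row_p, row_c in zip(orders, pos, chain):
    print(name, "positional:", row_p, "chaining:", row_c)
```

For example, ABC and CAB share no item positions but share the A-to-B link, whereas ABC and ACB share the position of A but no links, so the two predicted similarity matrices differ.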
Andreeva, Valentina A; Martin, Christophe; Issanchou, Sylvie; Hercberg, Serge; Kesse-Guyot, Emmanuelle; Méjean, Caroline
2013-08-01
Certain beneficial foods taste bitter (e.g., cruciferous vegetables) and might be aversive to consumers. Here, individual characteristics were assessed according to bitter food consumption patterns. The study included 2327 participants in the SU.VI.MAX antioxidant-based randomized controlled trial (1994-2002). The sample was drawn from the general French population. Dietary data were obtained from a minimum of twelve 24-h dietary records provided during the first 2 years of follow-up. Two bitter food consumption scores were computed: one assessing the variety of items consumed (unweighted score) and the other reflecting exposure to bitterness estimated via complementary sensory panel data from the EpiPref project (weighted score). Associations with sociodemographic, health, and lifestyle factors were analyzed with multiple linear regression. Among men, the variety of bitter foods consumed was positively associated with educational level and alcohol intake and inversely associated with physical activity and rural area of residence. Among women, the same outcome was positively associated with alcohol intake and inversely associated with diabetes. In turn, Body Mass Index displayed a significant inverse association with the bitterness-weighted score in both sexes, whereas an association with educational level was supported only in women. This study adds to the presently scant knowledge about non-genetic determinants or moderators of actual bitter food intake. Future studies should elucidate the impact of diabetes and body size on bitter food intake patterns. Copyright © 2013 Elsevier Ltd. All rights reserved.
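The contrast between the unweighted and the bitterness-weighted scores can be sketched as follows. This assumes the variety score counts distinct bitter items reported at least once and the weighted score sums a per-item bitterness rating standing in for the EpiPref panel data; the item names and ratings are hypothetical, and the published weighting may also account for the amounts consumed.

```python
def bitter_scores(consumed, bitterness):
    """Return (variety score, bitterness-weighted score) for one participant.

    consumed: set of food items reported at least once in the dietary records.
    bitterness: hypothetical mapping item -> bitterness intensity (panel rating);
    only items present in this mapping count as bitter foods.
    """
    bitter_items = [item for item in consumed if item in bitterness]
    unweighted = len(bitter_items)
    weighted = sum(bitterness[item] for item in bitter_items)
    return unweighted, weighted


# Hypothetical ratings on an arbitrary intensity scale.
ratings = {"endive": 4.0, "coffee": 6.5, "grapefruit": 5.0}
print(bitter_scores({"endive", "coffee", "bread"}, ratings))
```

Two participants with the same variety score can therefore differ on the weighted score if one of them consumes more intensely bitter items.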
Krekels, E H J; Novakovic, A M; Vermeulen, A M; Friberg, L E; Karlsson, M O
2017-08-01
As biomarkers are lacking, multi-item questionnaire-based tools like the Positive and Negative Syndrome Scale (PANSS) are used to quantify disease severity in schizophrenia. Analyzing composite PANSS scores as continuous data discards information and violates the numerical nature of the scale. Here, a longitudinal analysis based on Item Response Theory is presented using PANSS data from phase III clinical trials. Latent disease severity variables were derived from item-level data on each of the positive, negative, and general PANSS subscales. On all subscales, the time course of the placebo response was best described with Weibull models, and paliperidone's effect was described with dose-independent functions, using exponential models to describe the onset of the full effect. Placebo and drug effects were most pronounced on the positive subscale. The final model successfully describes the time course of treatment effects at the individual PANSS item level, at all PANSS subscale levels, and at the total score level. © 2017 The Authors CPT: Pharmacometrics & Systems Pharmacology published by Wiley Periodicals, Inc. on behalf of American Society for Clinical Pharmacology and Therapeutics.
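The structural components named above can be written as simple time-course functions. A minimal sketch, assuming a Weibull-type placebo curve, an exponential onset toward a dose-independent full drug effect, and additive effects on the latent severity scale. The parameter names and the additive combination are assumptions, and the IRT measurement model that links latent severity to the item responses is omitted.

```python
import math


def placebo_response(t, p_max, scale, shape):
    """Weibull-type placebo time course: 0 at t = 0, approaching p_max over time."""
    return p_max * (1.0 - math.exp(-((t / scale) ** shape)))


def drug_effect(t, e_full, k_onset):
    """Dose-independent drug effect reaching its full size e_full with exponential onset."""
    return e_full * (1.0 - math.exp(-k_onset * t))


def latent_severity(t, baseline, p_max, scale, shape, e_full, k_onset):
    """Hypothetical additive model: both effects lower latent disease severity."""
    return (
        baseline
        - placebo_response(t, p_max, scale, shape)
        - drug_effect(t, e_full, k_onset)
    )


# Hypothetical parameters (time in weeks, severity in latent-variable units).
for week in (0, 1, 2, 4, 6):
    print(week, round(latent_severity(week, 1.5, 0.4, 2.0, 1.3, 0.3, 0.8), 3))
```

The shape parameter of the Weibull term lets the placebo response start slowly or quickly, which a plain exponential cannot do; the drug term here has a single onset rate.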
Ling, Ying; Zhang, Minqiang; Locke, Kenneth D; Li, Guangming; Li, Zonglong
2016-01-01
The Circumplex Scales of Interpersonal Values (CSIV) is a 64-item self-report measure of goals from each octant of the interpersonal circumplex. We used item response theory methods to compare whether dominance models or ideal point models best described how people respond to CSIV items. Specifically, we fit a polytomous dominance model called the generalized partial credit model and an ideal point model of similar complexity called the generalized graded unfolding model to the responses of 1,893 college students. The results of both graphical comparisons of item characteristic curves and statistical comparisons of model fit suggested that an ideal point model best describes the process of responding to CSIV items. The different models produced different rank orderings of high-scoring respondents, but overall the models did not differ in their prediction of criterion variables (agentic and communal interpersonal traits and implicit motives).
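The two response processes compared above differ in the shape of their category response functions. The sketch below assumes the standard GPCM and GGUM parameterizations, with hypothetical parameter values and no estimation: under the dominance model the probability of the highest category keeps rising with θ, whereas under the ideal point model it peaks where θ equals the item location δ and falls off symmetrically on both sides.

```python
import math


def gpcm_probs(theta, alpha, steps):
    """Generalized partial credit (dominance) model.

    steps: step parameters for categories 1..C; category 0 has an empty sum.
    """
    logits = [0.0]
    for b in steps:
        logits.append(logits[-1] + alpha * (theta - b))
    denom = sum(math.exp(v) for v in logits)
    return [math.exp(v) / denom for v in logits]


def ggum_probs(theta, alpha, delta, taus):
    """Generalized graded unfolding (ideal point) model.

    taus: thresholds tau_1..tau_C (tau_0 = 0); M = 2C + 1.
    """
    c = len(taus)
    m = 2 * c + 1
    dist = theta - delta
    cum_tau = [0.0]
    for tau in taus:
        cum_tau.append(cum_tau[-1] + tau)
    terms = [
        math.exp(alpha * (z * dist - cum_tau[z]))
        + math.exp(alpha * ((m - z) * dist - cum_tau[z]))
        for z in range(c + 1)
    ]
    denom = sum(terms)
    return [t / denom for t in terms]


# Probability of endorsing the top category as theta moves past the item.
for theta in (-3, -1, 0, 1, 3):
    print(theta,
          round(gpcm_probs(theta, 1.0, [-1.0, 0.0, 1.0])[-1], 3),
          round(ggum_probs(theta, 1.0, 0.0, [-1.0])[-1], 3))
```

Fitting these curves to the 1,893 response sets and comparing fit is what the study did; the sketch only shows why the two families can rank high scorers differently.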
Fischer, H Felix; Rose, Matthias
2016-10-19
Recently, a growing number of Item-Response Theory (IRT) models have been published that allow estimation of a common latent variable from data derived by different Patient Reported Outcomes (PROs). When using data from different PROs, direct estimation of the latent variable has some advantages over the use of sum score conversion tables. However, fitting such models with contemporary IRT software requires substantial proficiency in psychometrics. We developed a web application (http://www.common-metrics.org) that makes it easier to estimate latent variable scores using IRT models that calibrate different measures on instrument-independent scales. Currently, the application allows estimation using six different IRT models for Depression, Anxiety, and Physical Function. Based on published item parameters, users of the application can directly obtain latent trait estimates using expected a posteriori (EAP) estimation for sum scores as well as for specific response patterns, Bayes modal (MAP), weighted likelihood estimation (WLE), and maximum likelihood (ML) methods, and under three different prior distributions. The obtained estimates can be downloaded and analyzed using standard statistical software. This application enhances the usability of IRT modeling for researchers by allowing comparison of latent trait estimates across different PROs, such as the Patient Health Questionnaire Depression (PHQ-9) and Anxiety (GAD-7) scales, the Center for Epidemiologic Studies Depression Scale (CES-D), the Beck Depression Inventory (BDI), PROMIS Anxiety and Depression Short Forms and others. Advantages of this approach include comparability of data derived with different measures and tolerance of missing values. The validity of the underlying models needs to be investigated in the future.
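EAP scoring of a specific response pattern, and the tolerance of missing responses mentioned above, can be illustrated with a short quadrature routine. This is a sketch under simplifying assumptions: it uses a dichotomous 2PL model, a standard normal prior and hypothetical item parameters, rather than the models and published parameters the application relies on.

```python
import math


def eap_2pl(responses, a, b, n_points=81, lo=-4.0, hi=4.0):
    """EAP estimate of theta under a 2PL model with a standard normal prior.

    responses: list of 0/1 values, or None for a missing item (skipped).
    Returns (posterior mean, posterior SD) computed on an evenly spaced grid.
    """
    step = (hi - lo) / (n_points - 1)
    grid = [lo + i * step for i in range(n_points)]
    weights = []
    for theta in grid:
        w = math.exp(-0.5 * theta * theta)  # unnormalized N(0, 1) prior
        for x, ai, bi in zip(responses, a, b):
            if x is None:
                continue  # missing response: contributes no likelihood term
            p = 1.0 / (1.0 + math.exp(-ai * (theta - bi)))
            w *= p if x == 1 else 1.0 - p
        weights.append(w)
    total = sum(weights)
    post_mean = sum(t * w for t, w in zip(grid, weights)) / total
    post_var = sum((t - post_mean) ** 2 * w for t, w in zip(grid, weights)) / total
    return post_mean, math.sqrt(post_var)


# Hypothetical 4-item scale; the third response is missing.
print(eap_2pl([1, 0, None, 1], a=[1.2, 0.9, 1.5, 1.1], b=[-0.5, 0.3, 0.8, 1.0]))
```

Because missing items simply drop out of the likelihood, the estimate still lies on the same latent scale, only with a larger posterior SD.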
Xiao, Yuan-mei; Wang, Zhi-ming; Wang, Mian-zhen; Lan, Ya-jia
2005-06-01
This study aimed to test the reliability and validity of two mental workload assessment scales, i.e., the subjective workload assessment technique (SWAT) and the NASA task load index (NASA-TLX). A total of 1,268 mental workers from various occupations, such as scientific research, education, administration and medicine, were sampled by randomized cluster sampling. Reliability was tested with test-retest reliability, split-half reliability, Cronbach's alpha coefficient and the correlation coefficients between item scores and the total score. Validity was tested in terms of structural validity. The test-retest reliability coefficients of the two scales and their items ranged from 0.516 to 0.753 (P < 0.01), indicating that the two scales had good test-retest reliability. The split-half reliability of SWAT was 0.645, its Cronbach's alpha coefficient was more than 0.80, and all the correlation coefficients between its item scores and total score were more than 0.70. For NASA-TLX, both the split-half reliability and Cronbach's alpha coefficient were more than 0.80, and the correlation coefficients between its item scores and total score were all more than 0.60 (P < 0.01) except for the performance item. Both scales had good internal consistency. The Pearson correlation coefficient between the two scales was 0.492 (P < 0.01), implying that the two scales gave consistent results. Factor analysis showed that the two scales had good structural validity. Both SWAT and NASA-TLX have good reliability and validity and may be used as valid tools to assess mental workload in China after appropriate revision.
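Two of the reliability indices listed above can be computed directly from an item-score matrix. A minimal sketch, assuming an odd-even split of items for the split-half coefficient, stepped up with the Spearman-Brown formula 2r / (1 + r), and uncorrected item-total correlations; the abstract does not say how the halves were formed, and the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def split_half_reliability(items):
    """Odd-even split-half reliability; items[j][i] is person i's score on item j."""
    odd = [sum(vals) for vals in zip(*items[0::2])]
    even = [sum(vals) for vals in zip(*items[1::2])]
    r_half = pearson(odd, even)
    return 2 * r_half / (1 + r_half)  # Spearman-Brown step-up to full length


def item_total_correlations(items):
    """Correlation of each item with the total score (item not removed)."""
    totals = [sum(vals) for vals in zip(*items)]
    return [pearson(item, totals) for item in items]


# Hypothetical ratings of 5 workers on a 4-item scale.
ratings = [[3, 5, 2, 4, 1], [2, 5, 3, 4, 2], [3, 4, 2, 5, 1], [2, 4, 1, 5, 2]]
print(round(split_half_reliability(ratings), 3))
print([round(r, 3) for r in item_total_correlations(ratings)])
```

Because each item is part of the total, uncorrected item-total correlations are somewhat inflated; a corrected version would correlate each item with the total of the remaining items.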
Haley, Stephen M.; Ni, Pengsheng; Dumas, Helene M.; Fragala-Pinkham, Maria A.; Hambleton, Ronald K.; Montpetit, Kathleen; Bilodeau, Nathalie; Gorton, George E.; Watson, Kyle; Tucker, Carole A
2009-01-01
Purpose The purpose of this study was to apply a bi-factor model for the determination of test dimensionality and a multidimensional CAT using computer simulations of real data for the assessment of a new global physical health measure for children with cerebral palsy (CP). Methods Parent respondents of 306 children with cerebral palsy were recruited from four pediatric rehabilitation hospitals and outpatient clinics. We compared confirmatory factor analysis results across four models: (1) one-factor unidimensional; (2) two-factor multidimensional (MIRT); (3) bi-factor MIRT with fixed slopes; and (4) bi-factor MIRT with varied slopes. We tested whether the general and content (fatigue and pain) person score estimates could discriminate across severity and types of CP, and whether score estimates from a simulated CAT were similar to estimates based on the total item bank, and whether they correlated as expected with external measures. Results Confirmatory factor analysis suggested separate pain and fatigue sub-factors; all 37 items were retained in the analyses. From the bi-factor MIRT model with fixed slopes, the full item bank scores discriminated across levels of severity and types of CP, and compared favorably to external instruments. CAT scores based on 10- and 15-item versions accurately captured the global physical health scores. Conclusions The bi-factor MIRT CAT application, especially the 10- and 15-item version, yielded accurate global physical health scores that discriminated across known severity groups and types of CP, and correlated as expected with concurrent measures. The CATs have potential for collecting complex data on the physical health of children with CP in an efficient manner. PMID:19221892
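The CAT simulation described above can be sketched in a simplified, unidimensional form. This assumes a 2PL item bank with hypothetical parameters, maximum-information item selection, grid-based EAP scoring and a fixed length of 10 items; it is not the bi-factor MIRT CAT used in the study. As in a simulation on real data, the answer callback returns the respondent's recorded response to the full item bank.

```python
import math


def p_2pl(theta, a, b):
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information of a 2PL item: a^2 * P * (1 - P)."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)


def eap(answered, bank, n_points=61):
    """Grid-based EAP (standard normal prior) over the (item, response) pairs so far."""
    grid = [-3.0 + 6.0 * i / (n_points - 1) for i in range(n_points)]
    post = []
    for theta in grid:
        w = math.exp(-0.5 * theta * theta)
        for idx, x in answered:
            p = p_2pl(theta, *bank[idx])
            w *= p if x == 1 else 1.0 - p
        post.append(w)
    return sum(t * w for t, w in zip(grid, post)) / sum(post)


def simulate_cat(bank, answer, max_items=10):
    """Administer up to max_items, each time choosing the most informative unused item."""
    answered = []
    theta = 0.0
    for _ in range(min(max_items, len(bank))):
        used = {idx for idx, _ in answered}
        nxt = max(
            (i for i in range(len(bank)) if i not in used),
            key=lambda i: item_information(theta, *bank[i]),
        )
        answered.append((nxt, answer(nxt)))
        theta = eap(answered, bank)
    return theta, [idx for idx, _ in answered]


# Hypothetical 17-item bank of (a, b) pairs and one respondent's recorded answers.
bank = [(1.2, -2.0 + 0.25 * i) for i in range(17)]
recorded = [1 if b < 0.7 else 0 for _, b in bank]
cat_theta, administered = simulate_cat(bank, recorded.__getitem__)
full_theta = eap(list(enumerate(recorded)), bank)
print(round(cat_theta, 2), round(full_theta, 2), administered)
```

Comparing the CAT estimate with the full-bank estimate for every respondent is the kind of accuracy check the abstract describes for the 10- and 15-item versions.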
Snowdon, John; Halliday, Graeme; Hunt, Glenn E
2013-07-01
Most people who collect and hoard, and then have difficulty discarding items, do not live in squalor, even though accumulation of hoarded items can make cleaning very difficult. Commonly, people living in squalor accumulate garbage, but relatively few fulfill proposed criteria for "hoarding disorder." We examined the overlap between hoarding and squalor among people referred because of unacceptable living conditions. Ongoing collection of data by a Squalor Project team, including ratings on the Environmental Cleanliness and Clutter Scale (ECCS), allowed (1) description of characteristics of cases and (2) examination of ratings of uncleanliness, and of the effect of accumulation of items or material on access within dwellings. Principal component analysis was used to examine latent variables underlying the ECCS. The mean age of the referred occupants (108 male, 95 female) was 61.9 years. The mean ECCS score in 186 rated cases was 18.5. Factor analysis of ECCS data showed a two-factor solution as the most plausible. Factor 1, comprising seven squalor items, accounted for 33.7% of the variance. Factor 2 comprised reduced accessibility and accumulation of items of little value (variance 17.6%). Accumulation of garbage loaded equally on the two factors. High levels of squalor and/or accumulation were recorded in 105 (56%) of the 186 dwellings. One-third scored high on accumulation/hoarding, while 38% scored high on squalor; 15% scored high on both squalor and accumulation. A quarter of those scoring high on squalor scored low on hoarding/accumulation. The ECCS is useful when describing whether referred cases show high levels of squalor, hoarding, or both.
Kempen, Jiska C E; Doorenbosch, Caroline A M; Knol, Dirk L; de Groot, Vincent; Beckerman, Heleen
2016-11-01
Limited walking ability is an important problem for patients with multiple sclerosis. A better understanding of how gait impairments lead to limited walking ability may help to develop more targeted interventions. Although gait classifications are available in cerebral palsy and stroke, relevant knowledge in MS is scarce. The aims of this study were: (1) to identify distinctive gait patterns in patients with MS based on a combined evaluation of kinematics, gait features, and muscle activity during walking and (2) to determine the clinical relevance of these gait patterns. This was a cross-sectional study of 81 patients with MS of mild-to-moderate severity (Expanded Disability Status Scale [EDSS] median score=3.0, range=1.0-7.0) and an age range of 28 to 69 years. The patients participated in 2-dimensional video gait analysis, with concurrent measurement of surface electromyography and ground reaction forces. A score chart of 73 gait items was used to rate each gait analysis. A single rater performed the scoring. Latent class analysis was used to identify gait classes. Analysis of the 73 gait variables revealed that 9 variables could distinguish 3 clinically meaningful gait classes. The 9 variables were: (1) heel-rise in terminal stance, (2) push-off, (3) clearance in initial swing, (4) plantar-flexion position in mid-swing, (5) pelvic rotation, (6) arm-trunk movement, (7) activity of the gastrocnemius muscle in pre-swing, (8) M-wave, and (9) propulsive force. The EDSS score and gait speed worsened in ascending classes. Most participants had mild-to-moderate limitations in walking ability based on their EDSS scores, and the number of walkers who were severely limited was small. Based on a small set of 9 variables measured with 2-dimensional clinical gait analysis, patients with MS could be divided into 3 different gait classes. The gait variables are suggestive of insufficient ankle push-off. © 2016 American Physical Therapy Association.
Measuring the Success of a Pipeline Program to Increase Nursing Workforce Diversity.
Katz, Janet R; Barbosa-Leiker, Celestina; Benavides-Vaello, Sandra
2016-01-01
The purpose of this study was to understand changes in the knowledge and opinions of underserved American Indian and Hispanic high school students after they attended a 2-week summer pipeline program, using and testing a pre/post survey. The research aims were to (a) psychometrically analyze the survey to determine whether scale items could be summed to create a total scale score or subscale scores; (b) assess change in scores from pre- to postprogram; and (c) examine the survey to make suggestions for modifications and further testing to develop a valid tool to measure changes in student perceptions about going to college and nursing as a result of pipeline programs. Psychometric analysis indicated poor model fit for a 1-factor model for the total scale and the majority of subscales. Nonparametric tests indicated statistically significant increases in 13 items and decreases in 2 items. Therefore, while total scores or subscale scores cannot be used to assess changes in perceptions from pre- to postprogram, the survey can be used to examine changes over time in each item. Students did not have an accurate view of nursing and college and underestimated the support needed to attend college. However, students realized that nursing was a profession with autonomy, respect, and honor. Copyright © 2016 Elsevier Inc. All rights reserved.
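Because summed scores were not supported, change was analysed item by item with nonparametric tests. The abstract does not name the test; as one simple option, an exact two-sided sign test on paired pre/post responses to a single item can be sketched as follows. Ties are dropped, and the function name and data are hypothetical.

```python
from math import comb


def sign_test(pre, post):
    """Exact two-sided sign test for paired pre/post scores on one item.

    Returns (number of increases, number of decreases, p-value); ties are dropped.
    """
    diffs = [b - a for a, b in zip(pre, post) if b != a]
    n = len(diffs)
    k = sum(1 for d in diffs if d > 0)
    tail = min(k, n - k)
    p = 2 * sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    return k, n - k, min(p, 1.0)


# Hypothetical 5-point Likert responses from 12 students before and after the program.
pre = [2, 3, 3, 2, 4, 1, 3, 2, 2, 3, 4, 2]
post = [3, 4, 3, 3, 5, 2, 4, 2, 3, 4, 4, 3]
print(sign_test(pre, post))
```

The sign test uses only the direction of each change, which suits single Likert items where differences between adjacent categories may not be equal.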
Confirmatory Factor Analysis of the Minnesota Nicotine Withdrawal Scale
Toll, Benjamin A.; O’Malley, Stephanie S.; McKee, Sherry A.; Salovey, Peter; Krishnan-Sarin, Suchitra
2008-01-01
The authors examined the factor structure of the Minnesota Nicotine Withdrawal Scale (MNWS) using confirmatory factor analysis in clinical research samples of smokers trying to quit (n = 723). Three confirmatory factor analytic models, based on previous research, were tested with each of the 3 study samples at multiple points in time. A unidimensional model including all 8 MNWS items was found to be the best explanation of the data. This model produced fair to good internal consistency estimates. Additionally, these data revealed that craving should be included in the total score of the MNWS. Factor scores derived from this single-factor, 8-item model showed that increases in withdrawal were associated with poor smoking outcome for 2 of the clinical studies. Confirmatory factor analyses of change scores showed that the MNWS symptoms cohere as a syndrome over time. Future investigators should report a total score using all of the items from the MNWS. PMID:17563141
An examination of the interrater reliability between practitioners and researchers on the static-99.
Quesada, Stephen P; Calkins, Cynthia; Jeglic, Elizabeth L
2014-11-01
Many studies have validated the psychometric properties of the Static-99, the most widely used measure of sexual offender recidivism risk. However, much of this research relied on instrument coding completed by well-trained researchers. This study is the first to examine the interrater reliability (IRR) of the Static-99 between practitioners in the field and researchers. Using archival data from a sample of 1,973 formerly incarcerated sex offenders, field raters' scores on the Static-99 were compared with those of researchers. Overall, clinicians and researchers had excellent IRR on Static-99 total scores, with IRR coefficients ranging from "substantial" to "outstanding" for the 10 individual items of the scale. The most common causes of discrepancies were coding manual errors, followed by item subjectivity, inaccurate item scoring, and calculation errors. These results offer important data on the frequency and perceived nature of scoring errors. © The Author(s) 2013.
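Item-level agreement between field raters and researchers can be expressed with a chance-corrected coefficient. The abstract does not name the IRR statistic; assuming Cohen's kappa for a categorical item, a minimal sketch with hypothetical codes is:

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters coding the same cases on one categorical item.

    Undefined (division by zero) when expected agreement equals 1, i.e. when both
    raters use one and the same code for every case.
    """
    n = len(rater_a)
    observed = sum(1 for x, y in zip(rater_a, rater_b) if x == y) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    return (observed - expected) / (1.0 - expected)


# Hypothetical 0/1 codes on one Static-99 item for 8 offenders.
field = [1, 0, 1, 1, 0, 0, 1, 0]
research = [1, 0, 1, 0, 0, 0, 1, 0]
print(round(cohens_kappa(field, research), 3))
```

Kappa discounts the agreement expected by chance from each rater's marginal code frequencies, which raw percentage agreement does not.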
Feeding Practices in Infancy Associated with Caries Incidence in Early Childhood
Chaffee, Benjamin W.; Feldens, Carlos Alberto; Rodrigues, Priscila Humbert; Vítolo, Márcia Regina
2015-01-01
Early-life feeding behaviors foretell later dietary habits and health outcomes. Few studies have examined infant dietary patterns and caries occurrence prospectively. OBJECTIVE Assess whether patterns in food and drink consumption before age 12 months are associated with caries incidence by preschool age. METHODS We collected early-life feeding data within a birth cohort from low-income families in Porto Alegre, Brazil. Three dietary indexes were defined, based on refined sugar content and/or previously reported caries associations: a count of sweet foods or drinks introduced before 6 months (e.g., candy, cookies, soft drinks), a count of other, non-sweet items introduced before 6 months (e.g., beans, meat), and a count of sweet items consumed at 12 months. Incidence of severe early childhood caries (S-ECC) at age 38 months (N=458) was compared by score tertile on each index, adjusted for family, maternal, and child characteristics using regression modeling. RESULTS Introduction to a greater number of presumably cariogenic items in infancy was positively associated with future caries. S-ECC incidence was highest in the uppermost tertile of the “6-month sweet index” (adjusted cumulative incidence ratio, RR, versus lowest tertile: 1.46; 95% CI: 0.97, 2.04) and the uppermost tertile of the “12-month sweet index” (RR: 1.55; 95% CI: 1.17, 2.23). The association was specific for sweet items: caries incidence did not differ by tertile of the “6-month non-sweet index” (RR: 1.00; 95% CI: 0.70, 1.40). Additionally, each one-unit increase on the 6-month and the 12-month sweet indexes, but not the 6-month non-sweet index, was statistically significantly associated with greater S-ECC incidence and associated with more decayed, missing or restored teeth. Results were robust to minor changes in the items constituting each index and persisted if liquid items were excluded. 
CONCLUSIONS Dietary factors observed before age 12 months were associated with S-ECC at preschool age, highlighting a need for timely, multi-level intervention. PMID:25753518
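The tertile comparison described above can be sketched in unadjusted form. This assumes rank-based tertiles (ties split by input order) and a crude ratio of cumulative incidence in the uppermost versus the lowest tertile; the study's estimates were adjusted for family, maternal and child characteristics through regression modeling, which this sketch does not attempt. Function names and data are hypothetical.

```python
def assign_tertiles(scores):
    """Rank-based tertile labels 0, 1, 2; ties are broken by input order (an assumption)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    labels = [0] * len(scores)
    for rank, i in enumerate(order):
        labels[i] = min(3 * rank // len(scores), 2)
    return labels


def incidence_ratio_top_vs_bottom(scores, cases):
    """Crude cumulative incidence ratio, uppermost vs lowest tertile.

    cases[i] is 1 if child i developed S-ECC by follow-up, else 0.
    Raises ZeroDivisionError if the lowest tertile has no cases.
    """
    labels = assign_tertiles(scores)

    def incidence(group):
        members = [c for c, g in zip(cases, labels) if g == group]
        return sum(members) / len(members)

    return incidence(2) / incidence(0)


# Hypothetical sweet-index counts and caries outcomes for 9 children.
index = [0, 1, 1, 2, 2, 3, 4, 4, 5]
caries = [0, 1, 0, 0, 1, 1, 1, 0, 1]
print(round(incidence_ratio_top_vs_bottom(index, caries), 2))
```

With count indexes many children share the same score, so real analyses need an explicit rule for tied values at the tertile cut points.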
Feeding practices in infancy associated with caries incidence in early childhood.
Chaffee, Benjamin W; Feldens, Carlos Alberto; Rodrigues, Priscila Humbert; Vítolo, Márcia Regina
2015-08-01
Early-life feeding behaviors foretell later dietary habits and health outcomes. Few studies have examined infant dietary patterns and caries occurrence prospectively. The aim was to assess whether patterns in food and drink consumption before age 12 months are associated with caries incidence by preschool age. We collected early-life feeding data within a birth cohort from low-income families in Porto Alegre, Brazil. Three dietary indexes were defined, based on refined sugar content and/or previously reported caries associations: a count of sweet foods or drinks introduced before 6 months (e.g., candy, cookies, soft drinks), a count of other, nonsweet items introduced before 6 months (e.g., beans, meat), and a count of sweet items consumed at 12 months. Incidence of severe early childhood caries (S-ECC) at age 38 months (N = 458) was compared by score tertile on each index, adjusted for family, maternal, and child characteristics using regression modeling. Introduction to a greater number of presumably cariogenic items in infancy was positively associated with future caries. S-ECC incidence was highest in the uppermost tertile of the '6-month sweet index' (adjusted cumulative incidence ratio, RR, versus lowest tertile: 1.46; 95% CI: 0.97, 2.04) and the uppermost tertile of the '12-month sweet index' (RR: 1.55; 95% CI: 1.17, 2.23). The association was specific for sweet items: caries incidence did not differ by tertile of the '6-month nonsweet index' (RR: 1.00; 95% CI: 0.70, 1.40). Additionally, each one-unit increase on the 6-month and the 12-month sweet indexes, but not the 6-month nonsweet index, was statistically significantly associated with greater S-ECC incidence and associated with more decayed, missing, or restored teeth. Results were robust to minor changes in the items constituting each index and persisted if liquid items were excluded. Dietary factors observed before age 12 months were associated with S-ECC at preschool age, highlighting a need for timely, multilevel intervention. 
© 2015 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Associations Between Sleep Duration Patterns and Behavioral/Cognitive Functioning at School Entry
Touchette, Évelyne; Petit, Dominique; Séguin, Jean R.; Boivin, Michel; Tremblay, Richard E.; Montplaisir, Jacques Y.
2007-01-01
Objective: The aim of the study was to investigate the associations between longitudinal sleep duration patterns and behavioral/cognitive functioning at school entry. Design, Setting, and Participants: Hyperactivity-impulsivity (HI), inattention, and daytime sleepiness scores were measured by questionnaire at 6 years of age in a sample of births from 1997 to 1998 in a Canadian province (N=1492). The Peabody Picture Vocabulary Test - Revised (PPVT-R) was administered at 5 years of age and the Block Design subtest (WISC-III) was administered at 6 years of age. Sleep duration was reported yearly by the children's mothers from age 2.5 to 6 years. A group-based semiparametric mixture model was used to estimate developmental patterns of sleep duration. The relationships between sleep duration patterns and both behavioral items and neurodevelopmental tasks were tested using weighted multivariate logistic regression models to control for potentially confounding psychosocial factors. Results: Four sleep duration patterns were identified: short persistent (6.0%), short increasing (4.8%), 10-hour persistent (50.3%), and 11-hour persistent (38.9%). The association of short sleep duration patterns with high HI scores (P=0.001), low PPVT-R performance (P=0.002), and low Block Design subtest performance (P=0.004) remained significant after adjusting for potentially confounding variables. Conclusions: Shortened sleep duration, especially before the age of 41 months, is associated with externalizing problems such as HI and lower cognitive performance on neurodevelopmental tests. Results highlight the importance of giving a child the opportunity to sleep at least 10 hours per night throughout early childhood. Citation: Touchette E; Petit D; Séguin JR; Boivin M; Tremblay RE; Montplaisir JY. Associations between sleep duration patterns and behavioral/cognitive functioning at school entry. SLEEP 2007;30(9):1213-1219. PMID:17910393
ERIC Educational Resources Information Center
Wang, Wei
2013-01-01
Mixed-format tests containing both multiple-choice (MC) items and constructed-response (CR) items are now widely used in many testing programs. Mixed-format tests often are considered to be superior to tests containing only MC items although the use of multiple item formats leads to measurement challenges in the context of equating conducted under…
Pick-N Multiple Choice-Exams: A Comparison of Scoring Algorithms
ERIC Educational Resources Information Center
Bauer, Daniel; Holzer, Matthias; Kopp, Veronika; Fischer, Martin R.
2011-01-01
To compare different scoring algorithms for Pick-N multiple correct answer multiple-choice (MC) exams regarding test reliability, student performance, total item discrimination and item difficulty. Data from six end-of-term exams in internal medicine taken by 3rd-year medical students at Munich University from 2005 to 2008 were analysed (1,255 students,…
Network Approach to Autistic Traits: Group and Subgroup Analyses of ADOS Item Scores
ERIC Educational Resources Information Center
Anderson, George M.; Montazeri, Farhad; de Bildt, Annelies
2015-01-01
A network conceptualization might contribute to understanding the occurrence and interacting nature of behavioral traits in the autism realm. Networks were constructed based on correlations of item scores of the Autism Diagnostic Observation Schedule for Modules 1, 2 and 3 obtained for a group of 477 Dutch individuals with developmental disorders.…
1983-07-01
be a useful tool for assessing knowledge, but there are several problems with this item format. These problems include the possibility of an examinee... 1959. Kane, M. T., & Moloney, J. M. The effect of SSM grading on reliability when residual items have no discriminating power. Paper presented at
The outdoor situational fear inventory: a newer measure of an older instrument
Anderson B. Young; Alan Ewert; Sharon Todd; Thomas Steele; Thomas Quinn
1995-01-01
This study examined the relationship of two methods of scaling the Outdoor Situational Fear Inventory - continuum scaling and the more easily scored certainty method of scaling. Although item-by-item correlations varied widely, overall and subscale score relationships were strong. The data also suggested ways to clarify interpretations of earlier continuum scaled OSFI...
ERIC Educational Resources Information Center
Randall, Jennifer; Engelhard, George, Jr.
2010-01-01
The psychometric properties and multigroup measurement invariance of scores across subgroups, items, and persons on the "Reading for Meaning" items from the Georgia Criterion Referenced Competency Test (CRCT) were assessed in a sample of 778 seventh-grade students. Specifically, we sought to determine the extent to which score-based…
ERIC Educational Resources Information Center
van Ginkel, Joost R.; van der Ark, L. Andries; Sijtsma, Klaas
2007-01-01
The performance of five simple multiple imputation methods for dealing with missing data was compared. In addition, random imputation and multivariate normal imputation were used as lower and upper benchmarks, respectively. Test data were simulated and item scores were deleted such that they were either missing completely at random, missing at…
An Evaluation of a New Method of IRT Scaling
ERIC Educational Resources Information Center
Ragland, Shelley
2010-01-01
In order to be able to fairly compare scores derived from different forms of the same test within the Item Response Theory framework, all individual item parameters must be on the same scale. A new approach, the RPA method, which is based on transformations of predicted score distributions was evaluated here and was shown to produce results…
Brondani, Juliana Tabarelli; Luna, Stelio Pacca Loureiro; Padovani, Carlos Roberto
2011-02-01
To refine and test construct validity and reliability of a composite pain scale for use in assessing acute postoperative pain in cats undergoing ovariohysterectomy. 40 cats that underwent ovariohysterectomy in a previous study. In a previous randomized, double-blind, placebo-controlled study, a composite pain scale was developed to assess postoperative pain in cats that received a placebo or an analgesic (tramadol, vedaprofen, or tramadol-vedaprofen combination). In the present study, the scale was refined via item analysis (distribution frequency and occurrence), a nonparametric ANOVA, and item-to-total score correlation. Construct validity was assessed via factor analysis and known-groups discrimination, and reliability was measured by assessing internal consistency. Respiratory rate and respiratory pattern were rejected after item analysis. Factor analysis resulted in 5 dimensions (F1 [psychomotor change], posture, comfort, activity, mental status, and miscellaneous behaviors; F2 [protection of wound area], reaction to palpation of the surgical wound and palpation of the abdomen and flank; F3 [physiologic variables], systolic arterial blood pressure and appetite; F4 [vocal expression of pain], vocalization; and F5 [heart rate]). Internal consistency was excellent for the overall scale and for F1, F2, and F3; very good for F4; and unacceptable for F5. Except for heart rate, the identified factors and scale total score could be used to detect differences between the analgesic and placebo groups and differences among the analgesic treatments. Results provided initial evidence of construct validity and reliability of a multidimensional composite tool for use in assessing acute postoperative pain in cats undergoing ovariohysterectomy.
The Utility of the Family Empowerment Scale With Custodial Grandmothers
Hayslip, Bert; Smith, Gregory C.; Montoro-Rodriguez, Julian; Streider, Frederick H.; Merchant, William
2016-01-01
The Family Empowerment Scale (FES) was developed specifically to assess empowerment in families with emotional disorders. Its relevance to custodial grandfamilies is reflected in the difficulties in grandchildren's social, emotional, and behavioral functioning, wherein such difficulties may be explained via either reactions to changes in their family structure or in their responses to the newly formed family unit. Utilizing 27 items derived from the 34-item version of the FES, which had represented differential levels of empowerment (family, service system, community) as indexed by one's attitudes, knowledge, and behavior, we explored the factor structure, internal consistency, construct, and convergent validity of the FES with grandparent caregivers. Three-hundred forty-three (M age = 58.45, SD = 8.22, n Caucasian = 152, n African American = 149, n Hispanic = 38) custodial grandmothers caring for grandchildren between ages 4 and 12 years completed the 27 FES items and various measures of their psychological well-being, grandchild psychological difficulties, emotional support, and parenting practices. Factor analysis revealed three factors that differed slightly from the originally proposed FES subscales: Parental Self-Efficacy/Self-Confidence, Service Activism, and Service Knowledge. Each of the factors was internally consistent, and derived factor scores were moderately interrelated, speaking to the question of convergent validity. The construct validity of these three factors was evidenced by meaningful patterns of statistically significant correlations with grandmothers’ psychological well-being, grandchild psychological difficulties, emotional support, and parenting practices. These factor scores were independent of grandmother age, health, and education. These findings suggest the newly identified FES factors to be valuable in understanding empowerment among grandmother caregivers. PMID:26452627
The Utility of the Family Empowerment Scale With Custodial Grandmothers.
Hayslip, Bert; Smith, Gregory C; Montoro-Rodriguez, Julian; Streider, Frederick H; Merchant, William
2017-03-01
The Family Empowerment Scale (FES) was developed specifically to assess empowerment in families with emotional disorders. Its relevance to custodial grandfamilies is reflected in the difficulties in grandchildren's social, emotional, and behavioral functioning, wherein such difficulties may be explained via either reactions to changes in their family structure or in their responses to the newly formed family unit. Utilizing 27 items derived from the 34-item version of the FES, which had represented differential levels of empowerment (family, service system, community) as indexed by one's attitudes, knowledge, and behavior, we explored the factor structure, internal consistency, construct, and convergent validity of the FES with grandparent caregivers. Three-hundred forty-three (M age = 58.45, SD = 8.22, n Caucasian = 152, n African American = 149, n Hispanic = 38) custodial grandmothers caring for grandchildren between ages 4 and 12 years completed the 27 FES items and various measures of their psychological well-being, grandchild psychological difficulties, emotional support, and parenting practices. Factor analysis revealed three factors that differed slightly from the originally proposed FES subscales: Parental Self-Efficacy/Self-Confidence, Service Activism, and Service Knowledge. Each of the factors was internally consistent, and derived factor scores were moderately interrelated, speaking to the question of convergent validity. The construct validity of these three factors was evidenced by meaningful patterns of statistically significant correlations with grandmothers' psychological well-being, grandchild psychological difficulties, emotional support, and parenting practices. These factor scores were independent of grandmother age, health, and education. These findings suggest the newly identified FES factors to be valuable in understanding empowerment among grandmother caregivers.
Item Review and the Rearrangement Procedure: Its Process and Its Results
ERIC Educational Resources Information Center
Papanastasiou, Elena C.
2005-01-01
Permitting item review benefits examinees, who typically increase their test scores when allowed to review items. However, testing companies are reluctant to permit item review, since it does not follow the logic on which adaptive tests are based and is prone to cheating strategies. Consequently, item review is not permitted in many adaptive…
Item Reliabilities for a Family of Answer-Until-Correct (AUC) Scoring Rules.
ERIC Educational Resources Information Center
Kane, Michael T.; Moloney, James M.
The Answer-Until-Correct (AUC) procedure has been proposed in order to increase the reliability of multiple-choice items. A model for examinees' behavior when they must respond to each item until they answer it correctly is presented. An expression for the reliability of AUC items, as a function of the characteristics of the item and the scoring…
A large-scale, long-term study of scale drift: The micro view and the macro view
NASA Astrophysics Data System (ADS)
He, W.; Li, S.; Kingsbury, G. G.
2016-11-01
The development of measurement scales for use across years and grades in educational settings provides unique challenges, as instructional approaches, instructional materials, and content standards all change periodically. This study examined the measurement stability of a set of Rasch measurement scales that have been in place for almost 40 years. In order to investigate the stability of these scales, item responses were collected from a large set of students who took operational adaptive tests using items calibrated to the measurement scales. For the four scales that were examined, item samples ranged from 2183 to 7923 items. Each item was administered to at least 500 students in each grade level, resulting in approximately 3000 responses per item. Stability was examined at the micro level analysing change in item parameter estimates that have occurred since the items were first calibrated. It was also examined at the macro level, involving groups of items and overall test scores for students. Results indicated that individual items had changes in their parameter estimates, which require further analysis and possible recalibration. At the same time, the results at the total score level indicate substantial stability in the measurement scales over the span of their use.
Pedersen, Eric R; Huang, Wenjing; Dvorak, Robert D; Prince, Mark A; Hummer, Justin F
2017-08-01
Given recent state legislation legalizing marijuana for recreational purposes and majority popular opinion favoring these laws, we developed the Protective Behavioral Strategies for Marijuana scale (PBSM) to identify strategies that may mitigate the harms related to marijuana use among those young people who choose to use the drug. In the current study, we expand on the initial exploratory study of the PBSM to further validate the measure with a large and geographically diverse sample (N = 2,117; 60% women, 30% non-White) of college students from 11 different universities across the United States. We sought to develop a psychometrically sound item bank for the PBSM and to create a short assessment form that minimizes respondent burden and time. Quantitative item analyses, including exploratory and confirmatory factor analyses with item response theory (IRT) and evaluation of differential item functioning (DIF), revealed an item bank of 36 items that was examined for unidimensionality and good content coverage, as well as a short form of 17 items that is free of bias in terms of gender (men vs. women), race (White vs. non-White), ethnicity (Hispanic vs. non-Hispanic), and recreational marijuana use legal status (state recreational marijuana was legal for 25.5% of participants). We also provide a scoring table for easy transformation from sum scores to IRT scale scores. The PBSM item bank and short form associated strongly and negatively with past month marijuana use and consequences. The measure may be useful to researchers and clinicians conducting intervention and prevention programs with young adults. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
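The abstract mentions a scoring table that converts sum scores to IRT scale scores. One standard way to build such a table is the Lord-Wingersky recursion followed by expected a posteriori (EAP) estimation for each sum score. The sketch below assumes dichotomous 2PL items and a standard normal prior purely for illustration; the PBSM items and the authors' calibration model may differ, and the item parameters are hypothetical.

```python
import math

def p_2pl(theta, a, b):
    """2PL probability of endorsing a dichotomous item (illustrative model)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def sum_score_likelihoods(theta, items):
    """Lord-Wingersky recursion: P(sum score = s | theta) for s = 0..n."""
    lik = [1.0]  # before any item is added, a score of 0 has probability 1
    for a, b in items:
        p = p_2pl(theta, a, b)
        new = [0.0] * (len(lik) + 1)
        for s, val in enumerate(lik):
            new[s] += val * (1 - p)   # item not endorsed: score unchanged
            new[s + 1] += val * p     # item endorsed: score increases by 1
        lik = new
    return lik

def eap_table(items, n_quad=81, lo=-4.0, hi=4.0):
    """EAP theta and posterior SD for every sum score under a N(0,1) prior."""
    step = (hi - lo) / (n_quad - 1)
    grid = [lo + i * step for i in range(n_quad)]
    prior = [math.exp(-0.5 * t * t) for t in grid]  # unnormalized N(0,1) density
    liks = [sum_score_likelihoods(t, items) for t in grid]
    table = []
    for s in range(len(items) + 1):
        w = [liks[q][s] * prior[q] for q in range(n_quad)]
        total = sum(w)
        mean = sum(wq * t for wq, t in zip(w, grid)) / total
        var = sum(wq * (t - mean) ** 2 for wq, t in zip(w, grid)) / total
        table.append((s, mean, math.sqrt(var)))
    return table

# Hypothetical (a, b) parameters, not PBSM estimates
items = [(1.2, -1.0), (1.2, 0.0), (1.2, 1.0)]
table = eap_table(items)
for s, eap, sd in table:
    print(f"sum={s}  EAP={eap:+.3f}  SD={sd:.3f}")
```

Because the table depends only on the item parameters, it can be computed once and published, which is what makes a sum-score lookup convenient for users.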
Zeng, F F; Xue, W Q; Cao, W T; Wu, B H; Xie, H L; Fan, F; Zhu, H L; Chen, Y M
2014-08-01
This case-control study compared the associations of four widely used diet-quality scoring systems with the risk of hip fractures and assessed their utility in elderly Chinese. We found that elderly Chinese individuals who avoid a low-quality diet have a lower risk of hip fractures. Few studies have examined the associations of diet-quality scores with bone health, and none has been conducted in Asians or compared the validity and utility of several scores within one study. We assessed the associations and utility of four widely used diet-quality scoring systems with the risk of hip fractures. A case-control study of 726 patients with hip fractures (diagnosed within 2 weeks) aged 55-80 years and 726 age- (within 3 years) and gender-matched controls was conducted in Guangdong, China (2009-2013). Dietary intake was assessed using a 79-item food frequency questionnaire with face-to-face interviews, and the Healthy Eating Index-2005 (HEI-2005, 12 items), the alternate Healthy Eating Index (aHEI, 8 items), the Diet Quality Index-International (DQI-I, 17 items), and the alternate Mediterranean Diet Score (aMed, 9 items; the simplest) were calculated. Higher values of all four diet-quality scores were significantly associated with a similarly decreased risk of hip fractures (all p trends <0.001). The multivariate-adjusted odds ratios (ORs) and 95% confidence intervals (95% CIs) comparing the extreme groups of diet-quality scores were 0.29 (0.18, 0.46) (HEI-2005), 0.20 (0.12, 0.33) (aHEI), 0.25 (0.16, 0.39) (DQI-I), and 0.28 (0.18, 0.43) (aMed) in total subjects; the corresponding ORs ranged from 0.04 to 0.27 for men and from 0.26 to 0.44 for women (all p trends <0.05). Avoiding a low-quality diet is associated with a lower risk of hip fractures, and the aMed score is the best scoring system due to its equivalent performance and simplicity for the user.
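The results above are reported as odds ratios with 95% CIs. As a minimal illustration of what those numbers mean, a crude odds ratio and Woolf (log-scale) CI can be computed from a 2x2 exposure-by-case table. This is an assumption-laden sketch: it ignores the matched design and the multivariate adjustment the study used (which would typically call for conditional logistic regression), and the counts are hypothetical.

```python
import math

def odds_ratio_woolf(a, b, c, d, z=1.96):
    """Crude odds ratio with a Woolf (log-scale) confidence interval.

    a = exposed cases,   b = exposed controls,
    c = unexposed cases, d = unexposed controls.
    Ignores matching and covariates; illustration only.
    """
    or_ = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - z * se_log_or)
    upper = math.exp(math.log(or_) + z * se_log_or)
    return or_, lower, upper

# Hypothetical counts: "exposure" = highest diet-quality group
or_, lo, hi = odds_ratio_woolf(40, 120, 160, 120)
print(f"OR = {or_:.2f} (95% CI {lo:.2f}, {hi:.2f})")
```

An OR below 1 with an upper CI bound also below 1, as for each score reported above, corresponds to a statistically significant inverse association.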
Acute and long-term treatment of late-life major depressive disorder: duloxetine versus placebo.
Robinson, Michael; Oakes, Tina Myers; Raskin, Joel; Liu, Peng; Shoemaker, Scarlett; Nelson, J Craig
2014-01-01
To compare the efficacy of duloxetine with placebo on depression in elderly patients with major depressive disorder. Multicenter, 24-week (12-week short-term and 12-week continuation), randomized, placebo-controlled, double-blind trial. United States, France, Mexico, Puerto Rico. Age 65 years or more with major depressive disorder diagnosis (one or more previous episodes); Mini-Mental State Examination score ≥20; Montgomery-Asberg Depression Rating Scale total score ≥20. Duloxetine 60 or 120 mg/day or placebo; placebo rescue possible. Primary outcome: Maier subscale of the 17-item Hamilton Depression Rating Scale (HAMD-17) at week 12. Secondary outcomes: Geriatric Depression Scale, HAMD-17 total score, cognitive measures, Brief Pain Inventory (BPI), Numeric Rating Scales (NRS) for pain, Clinical Global Impression-Severity scale, Patient Global Impression of Improvement in acute phase and acute plus continuation phase of treatment. Compared with placebo, duloxetine did not show significantly greater improvement from baseline on Maier subscale at 12 weeks, but did show significantly greater improvement at weeks 4, 8, 16, and 20. Similar patterns for Geriatric Depression Scale and Clinical Global Impression-Severity scale emerged, with significance also seen at week 24. There was a significant treatment effect for all BPI items and 4 of 6 NRS pain measures in the acute phase, and for most BPI items and half of the NRS measures in the continuation phase. More duloxetine-treated patients completed the study (63% versus 55%). A significantly higher percentage of duloxetine-treated patients versus placebo discontinued due to adverse events (15.3% versus 5.8%). Although the antidepressant efficacy of duloxetine was not confirmed by the primary outcome, several secondary measures at multiple time points suggested efficacy. Duloxetine had significant and meaningful beneficial effects on pain. Copyright © 2014 American Association for Geriatric Psychiatry. Published by Elsevier Inc. All rights reserved.
Fitting measurement models to vocational interest data: are dominance models ideal?
Tay, Louis; Drasgow, Fritz; Rounds, James; Williams, Bruce A
2009-09-01
In this study, the authors examined the item response process underlying 3 vocational interest inventories: the Occupational Preference Inventory (C.-P. Deng, P. I. Armstrong, & J. Rounds, 2007), the Interest Profiler (J. Rounds, T. Smith, L. Hubert, P. Lewis, & D. Rivkin, 1999; J. Rounds, C. M. Walker, et al., 1999), and the Interest Finder (J. E. Wall & H. E. Baker, 1997; J. E. Wall, L. L. Wise, & H. E. Baker, 1996). Item response theory (IRT) dominance models, such as the 2-parameter and 3-parameter logistic models, assume that item response functions (IRFs) are monotonically increasing as the latent trait increases. In contrast, IRT ideal point models, such as the generalized graded unfolding model, have IRFs that peak where the latent trait matches the item. Ideal point models are expected to fit better because vocational interest inventories ask about typical behavior, as opposed to requiring maximal performance. Results show that across all 3 interest inventories, the ideal point model provided better descriptions of the response process. The importance of specifying the correct item response model for precise measurement is discussed. In particular, scores computed by a dominance model were shown to be sometimes illogical: individuals endorsing mostly realistic or mostly social items were given similar scores, whereas scores based on an ideal point model were sensitive to which type of items respondents endorsed.
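The contrast between dominance and ideal point response processes can be seen directly in the shape of the item response functions. The sketch below compares a 2PL dominance IRF, which rises monotonically with the latent trait, against a deliberately simplified single-peaked function. The peaked function is a hypothetical stand-in chosen for readability; it is not the generalized graded unfolding model used in the study, and the parameter values are illustrative.

```python
import math

def dominance_irf(theta, a, b):
    """2PL dominance IRF: endorsement probability rises monotonically with theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def ideal_point_irf(theta, delta, tau=1.0):
    """Simplified single-peaked IRF (illustrative only, NOT the GGUM).

    Probability is highest when theta equals the item location delta and
    decreases symmetrically with the squared distance between them.
    """
    return math.exp(-((theta - delta) ** 2) / (2 * tau ** 2))

thetas = [-2, -1, 0, 1, 2]
dom = [dominance_irf(t, a=1.5, b=0.0) for t in thetas]
ideal = [ideal_point_irf(t, delta=0.0) for t in thetas]
print([round(p, 3) for p in dom])
print([round(p, 3) for p in ideal])
```

Under the dominance curve a respondent far above the item location is the most likely to endorse it; under the peaked curve, respondents far above or far below are both unlikely to endorse it, which is why a dominance model can assign similar scores to people endorsing quite different item types.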
Maples, Jessica L; Guan, Li; Carter, Nathan T; Miller, Joshua D
2014-12-01
There has been a substantial increase in the use of personality assessment measures constructed using items from the International Personality Item Pool (IPIP) such as the 300-item IPIP-NEO (Goldberg, 1999), a representation of the Revised NEO Personality Inventory (NEO PI-R; Costa & McCrae, 1992). The IPIP-NEO is free to use and can be modified to accommodate its users' needs. Despite the substantial interest in this measure, there is still a dearth of data demonstrating its convergence with the NEO PI-R. The present study represents an investigation of the reliability and validity of scores on the IPIP-NEO. Additionally, we used item response theory (IRT) methodology to create a 120-item version of the IPIP-NEO. Using an undergraduate sample (n = 359), we examined the reliability, as well as the convergent and criterion validity, of scores from the 300-item IPIP-NEO, a previously constructed 120-item version of the IPIP-NEO (Johnson, 2011), and the newly created IRT-based IPIP-120 in comparison to the NEO PI-R across a range of outcomes. Scores from all 3 IPIP measures demonstrated strong reliability and convergence with the NEO PI-R and a high degree of similarity with regard to their correlational profiles across the criterion variables (rICC = .983, .972, and .976, respectively). The replicability of these findings was then tested in a community sample (n = 757), and the results closely mirrored the findings from Sample 1. These results provide support for the use of the IPIP-NEO and both 120-item IPIP-NEO measures as assessment tools for measurement of the five-factor model. (c) 2014 APA, all rights reserved.
Couvy-Duchesne, Baptiste; Davenport, Tracey A; Martin, Nicholas G; Wright, Margaret J; Hickie, Ian B
2017-08-01
The Somatic and Psychological HEalth REport (SPHERE) is a 34-item self-report questionnaire that assesses symptoms of mental distress and persistent fatigue. As it was developed as a screening instrument for use mainly in primary care-based clinical settings, its validity and psychometric properties have not been studied extensively in population-based samples. We used non-parametric Item Response Theory to assess scale validity and item properties of the SPHERE-34 scales, collected through four waves of the Brisbane Longitudinal Twin Study (N = 1707, mean age = 12, 51% females; N = 1273, mean age = 14, 50% females; N = 1513, mean age = 16, 54% females; N = 1263, mean age = 18, 56% females). We estimated the heritability of the new scores, their genetic correlation, and their predictive ability in a sub-sample (N = 1993) who completed the Composite International Diagnostic Interview. After excluding items most responsible for noise, sex or wave bias, the SPHERE-34 questionnaire was reduced to 21 items (SPHERE-21), comprising a 14-item scale for anxiety-depression and a 10-item scale for chronic fatigue (3 items overlapping). These new scores showed high internal consistency (alpha > 0.78), moderate three-month reliability (ICC = 0.47-0.58) and item scalability (Hi > 0.23), and were positively correlated (phenotypic correlations r = 0.57-0.70; rG = 0.77-1.00). Heritability estimates ranged from 0.27 to 0.51. In addition, both scores were associated with later DSM-IV diagnoses of MDD, social anxiety and alcohol dependence (ORs 1.23-1.47). Finally, a post-hoc comparison showed that several psychometric properties of the SPHERE-21 were similar to those of the Beck Depression Inventory. The scales of SPHERE-21 measure valid and comparable constructs across sex and age groups (from 9 to 28 years). SPHERE-21 scores are heritable, genetically correlated and show good predictive ability of mental health in an Australian-based population sample of young people.
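Internal consistency in the abstract above is summarized with Cronbach's alpha. A minimal sketch of the standard formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), is shown below for a respondents-by-items matrix. The response data are hypothetical; population variances are used throughout, which leaves the ratio unchanged as long as the same estimator is used for items and totals.

```python
from statistics import pvariance

def cronbach_alpha(rows):
    """Cronbach's alpha for a respondents x items score matrix (list of lists)."""
    k = len(rows[0])  # number of items
    item_vars = [pvariance([r[j] for r in rows]) for j in range(k)]
    total_var = pvariance([sum(r) for r in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical responses: 5 respondents x 4 items scored 0-3
responses = [
    [0, 1, 0, 1],
    [1, 1, 2, 1],
    [2, 2, 2, 3],
    [3, 2, 3, 3],
    [1, 0, 1, 1],
]
print(round(cronbach_alpha(responses), 3))
```

Alpha approaches 1 when items covary strongly relative to their individual variances; the function assumes complete data and at least two items with nonzero total-score variance.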
Dietary patterns and odds of Type 2 diabetes in Beirut, Lebanon: a case–control study
2012-01-01
Background In Lebanon, Type 2 diabetes (T2D) has a major public health impact through high disease prevalence, significant downstream pathophysiologic effects, and enormous financial liabilities. Diet is an important environmental factor in the development and prevention of T2D. Dietary patterns may exert greater effects on health than individual foods, nutrients, or food groups. The objective of this study is to examine the association between dietary patterns and the odds of T2D among Lebanese adults. Methods Fifty-eight recently diagnosed cases of T2D and 116 population-based age, sex, and place of residence matched control participants were interviewed. Data collection included a standard socio-demographic and lifestyle questionnaire. Dietary intake was evaluated by a semi-quantitative 97-item food frequency questionnaire. Anthropometric measurements including weight, height, waist circumference, and percent body fat were also obtained. Dietary patterns were identified by factor analysis. Multivariate logistic regression analysis was used to evaluate the associations of extracted patterns with T2D. Pearson correlations between these patterns and obesity markers, energy, and nutrient intakes were also examined. Results Four dietary patterns were identified: Refined Grains & Desserts, Traditional Lebanese, Fast Food and Meat & Alcohol. While scores of the “Refined Grains & Desserts” had the highest correlations with energy (r = 0.74) and carbohydrates (r = 0.22), those of the “Fast Food” had the highest correlation with fat intake (r = 0.34). After adjustment for socio-demographic and lifestyle characteristics, scores of the Refined Grains & Desserts and Fast Food patterns were associated with higher odds of T2D (OR: 3.85, CI: 1.13-11.23 and OR: 2.80, CI: 1.14-5.59; respectively) and scores of the Traditional Lebanese pattern were inversely associated with the odds of T2D (OR: 0.46, CI: 0.22-0.97). 
Conclusions The findings of this study demonstrate direct associations of the Refined Grains & Desserts and Fast Food patterns with T2D and an inverse association between the Traditional Lebanese pattern and the disease among Lebanese adults. These results may guide the development of nutrition interventions for the prevention and management of T2D among Lebanese adults. PMID:23270372
On Multidimensional Item Response Theory: A Coordinate-Free Approach. Research Report. ETS RR-07-30
ERIC Educational Resources Information Center
Antal, Tamás
2007-01-01
A coordinate-free definition of complex-structure multidimensional item response theory (MIRT) for dichotomously scored items is presented. The point of view taken emphasizes the possibilities and subtleties of understanding MIRT as a multidimensional extension of the classical unidimensional item response theory models. The main theorem of the…
Differential Item Functioning: Its Consequences. Research Report. ETS RR-10-01
ERIC Educational Resources Information Center
Lee, Yi-Hsuan; Zhang, Jinming
2010-01-01
This report examines the consequences of differential item functioning (DIF) using simulated data. Its impact on total score, item response theory (IRT) ability estimate, and test reliability was evaluated in various testing scenarios created by manipulating the following four factors: test length, percentage of DIF items per form, sample sizes of…
A Methodology for Zumbo's Third Generation DIF Analyses and the Ecology of Item Responding
ERIC Educational Resources Information Center
Zumbo, Bruno D.; Liu, Yan; Wu, Amery D.; Shear, Benjamin R.; Olvera Astivia, Oscar L.; Ark, Tavinder K.
2015-01-01
Methods for detecting differential item functioning (DIF) and item bias are typically used in the process of item analysis when developing new measures; adapting existing measures for different populations, languages, or cultures; or more generally validating test score inferences. In 2007 in "Language Assessment Quarterly," Zumbo…
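One widely used conventional DIF detection method is the Mantel-Haenszel procedure, which compares reference and focal groups item by item within strata of matched total score. The sketch below computes the Mantel-Haenszel common odds ratio and the ETS delta-scale statistic (MH D-DIF = -2.35 ln alpha_MH). It is offered only as background for the kind of item analysis described, not as the third-generation methodology proposed in the paper, and the stratum counts are hypothetical.

```python
import math

def mantel_haenszel_dif(strata):
    """Mantel-Haenszel common odds ratio and ETS MH D-DIF for one item.

    strata: list of (A, B, C, D) tuples, one per matched total-score level:
      A, B = reference group correct, incorrect
      C, D = focal group correct, incorrect
    """
    num = sum(A * D / (A + B + C + D) for A, B, C, D in strata)
    den = sum(B * C / (A + B + C + D) for A, B, C, D in strata)
    alpha_mh = num / den
    # Negative MH D-DIF means the item favors the reference group
    mh_d_dif = -2.35 * math.log(alpha_mh)
    return alpha_mh, mh_d_dif

# Hypothetical counts for two score strata
strata = [(10, 5, 8, 7), (12, 3, 9, 6)]
alpha_mh, d_dif = mantel_haenszel_dif(strata)
print(f"alpha_MH = {alpha_mh:.3f}, MH D-DIF = {d_dif:.3f}")
```

Stratifying on total score is what separates DIF from a simple group difference in ability: groups are compared only among examinees who performed similarly on the test overall.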
Item Vector Plots for the Multidimensional Three-Parameter Logistic Model
ERIC Educational Resources Information Center
Bryant, Damon; Davis, Larry
2011-01-01
This brief technical note describes how to construct item vector plots for dichotomously scored items fitting the multidimensional three-parameter logistic model (M3PLM). As multidimensional item response theory (MIRT) shows promise of being a very useful framework in the test development life cycle, graphical tools that facilitate understanding…
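Item vector plots for compensatory MIRT models are commonly drawn using Reckase's summary quantities: the multidimensional discrimination MDISC (the vector's length), the multidimensional difficulty MDIFF (the signed distance from the origin to the point of steepest slope), and the vector's angles with the trait axes. The sketch below assumes an M3PL parameterized as P = c + (1 - c) / (1 + exp(-(a·theta + d))); the lower asymptote c does not move the direction or location of steepest slope, so it is not needed for the arrow. This is an assumed general construction, not necessarily the exact procedure in the note.

```python
import math

def item_vector(a, d):
    """Reckase-style summary of a compensatory M2PL/M3PL item.

    a: discrimination parameters (one per dimension); d: intercept.
    Returns MDISC, MDIFF, and the angle (degrees) with each theta axis.
    """
    mdisc = math.sqrt(sum(ak * ak for ak in a))
    mdiff = -d / mdisc
    angles = [math.degrees(math.acos(ak / mdisc)) for ak in a]
    return mdisc, mdiff, angles

def vector_endpoints(a, d):
    """Tail and head coordinates for drawing the item arrow in theta space."""
    mdisc, mdiff, _ = item_vector(a, d)
    direction = [ak / mdisc for ak in a]  # unit vector of direction cosines
    tail = [mdiff * u for u in direction]  # arrow starts at the steepest-slope point
    head = [t + mdisc * u for t, u in zip(tail, direction)]  # length equals MDISC
    return tail, head

# Hypothetical two-dimensional item
mdisc, mdiff, angles = item_vector([3.0, 4.0], -5.0)
tail, head = vector_endpoints([3.0, 4.0], -5.0)
print(mdisc, mdiff, [round(x, 2) for x in angles], tail, head)
```

Plotting one such arrow per item (for example with any 2-D plotting library) shows at a glance which dimension each item measures best and where in the trait space it discriminates most.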
An Item Response Theory Model for Test Bias.
ERIC Educational Resources Information Center
Shealy, Robin; Stout, William
This paper presents a conceptualization of test bias for standardized ability tests which is based on multidimensional, non-parametric, item response theory. An explanation of how individually-biased items can combine through a test score to produce test bias is provided. It is contended that bias, although expressed at the item level, should be…