item response analyses: Topics by Science.gov

Sample records for item response analyses

Item response theory in personality assessment: a demonstration using the MMPI-2 depression scale.

PubMed

Childs, R A; Dahlstrom, W G; Kemp, S M; Panter, A T

2000-03-01

Item response theory (IRT) analyses have, over the past 3 decades, added much to our understanding of the relationships among and characteristics of test items, as revealed in examinees' response patterns. However, assessment instruments used outside the educational context have only infrequently been analyzed using IRT. This study demonstrates the relevance of IRT to personality data through analyses of Scale 2 (the Depression Scale) of the revised Minnesota Multiphasic Personality Inventory (MMPI-2). A rich set of hypotheses regarding the items on this scale, including contrasts among the Harris-Lingoes and Wiener-Harmon subscales and differences in the items' measurement characteristics for men and women, is investigated through the IRT analyses.
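The abstract does not say how differences in item measurement characteristics for men and women were tested. As one illustration of checking a true/false item for differential item functioning (DIF), the sketch below computes the Mantel-Haenszel common odds ratio, a widely used non-IRT DIF statistic; it is not necessarily the procedure used in this study. It assumes respondents have already been matched on total score and that each score level's counts are supplied as (reference keyed, reference not keyed, focal keyed, focal not keyed). The function name and all counts are hypothetical.

```python
import math

def mantel_haenszel_dif(strata):
    """Mantel-Haenszel DIF statistics for one dichotomous item.

    strata: iterable of (ref_keyed, ref_not_keyed, focal_keyed, focal_not_keyed)
            counts, one tuple per matched total-score level.
    Returns (alpha_mh, delta_mh): the common odds ratio and its ETS delta-scale
    transform, delta_mh = -2.35 * ln(alpha_mh).
    """
    num = 0.0
    den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # an empty score level contributes nothing
        num += a * d / n  # reference keyed x focal not keyed
        den += b * c / n  # reference not keyed x focal keyed
    if num == 0 or den == 0:
        raise ValueError("odds ratio undefined: a sum of cross-products is zero")
    alpha_mh = num / den
    return alpha_mh, -2.35 * math.log(alpha_mh)

# Hypothetical counts for one item; men as reference group, women as focal group.
strata = [(30, 10, 20, 20), (25, 15, 22, 18), (12, 28, 10, 30)]
alpha_mh, delta_mh = mantel_haenszel_dif(strata)
print(f"alpha_MH = {alpha_mh:.2f}, MH D-DIF = {delta_mh:.2f}")
```

An odds ratio near 1 (delta near 0) suggests the item behaves similarly for both groups at matched trait levels; on the delta scale, negative values indicate an item that favors the reference group.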
Item response theory analysis of the life orientation test-revised: age and gender differential item functioning analyses.

PubMed

Steca, Patrizia; Monzani, Dario; Greco, Andrea; Chiesi, Francesca; Primi, Caterina

2015-06-01

This study is aimed at testing the measurement properties of the Life Orientation Test-Revised (LOT-R) for the assessment of dispositional optimism by employing item response theory (IRT) analyses. The LOT-R was administered to a large sample of 2,862 Italian adults. First, confirmatory factor analyses demonstrated the theoretical conceptualization of the construct measured by the LOT-R as a single bipolar dimension. Subsequently, IRT analyses for polytomous, ordered response category data were applied to investigate the items' properties. The equivalence of the items across gender and age was assessed by analyzing differential item functioning. Discrimination and severity parameters indicated that all items were able to distinguish people with different levels of optimism and adequately covered the spectrum of the latent trait. Additionally, the LOT-R appears to be gender invariant and, with minor exceptions, age invariant. Results provided evidence that the LOT-R is a reliable and valid measure of dispositional optimism. © The Author(s) 2014.
Measuring pain phenomena after spinal cord injury: Development and psychometric properties of the SCI-QOL Pain Interference and Pain Behavior assessment tools.

PubMed

Cohen, Matthew L; Kisala, Pamela A; Dyson-Hudson, Trevor A; Tulsky, David S

2018-05-01

To develop modern patient-reported outcome measures that assess pain interference and pain behavior after spinal cord injury (SCI). Grounded-theory based qualitative item development; large-scale item calibration field-testing; confirmatory factor analyses; graded response model item response theory analyses; statistical linking techniques to transform scores to the Patient Reported Outcome Measurement Information System (PROMIS) metric. Five SCI Model Systems centers and one Department of Veterans Affairs medical center in the United States. Adults with traumatic SCI. N/A. Spinal Cord Injury - Quality of Life (SCI-QOL) Pain Interference item bank, SCI-QOL Pain Interference short form, and SCI-QOL Pain Behavior scale. Seven hundred fifty-seven individuals with traumatic SCI completed 58 items addressing various aspects of pain. Items were then separated by whether they assessed pain interference or pain behavior, and poorly functioning items were removed. Confirmatory factor analyses confirmed that each set of items was unidimensional, and item response theory analyses were used to estimate slopes and thresholds for the items. Ultimately, 7 items (4 from PROMIS) comprised the Pain Behavior scale and 25 items (18 from PROMIS) comprised the Pain Interference item bank. Ten of these 25 items were selected to form the Pain Interference short form. The SCI-QOL Pain Interference item bank and the SCI-QOL Pain Behavior scale demonstrated robust psychometric properties. The Pain Interference item bank is available as a computer adaptive test or short form for research and clinical applications, and scores are transformed to the PROMIS metric.
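The graded response model used for these item banks describes each polytomous item by one slope and a set of ordered thresholds. A minimal Python sketch of Samejima's category probabilities follows, assuming the logistic form without a 1.7 scaling constant; the function name and the slope and threshold values are hypothetical, not SCI-QOL calibrations.

```python
import math

def grm_category_probs(theta, a, thresholds):
    """Category probabilities under Samejima's graded response model.

    theta: latent trait value
    a: item slope (discrimination)
    thresholds: strictly increasing boundaries b_1 < ... < b_{m-1}
    Returns m probabilities for response categories 0..m-1.
    """
    if any(t2 <= t1 for t1, t2 in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    # Cumulative probabilities P(X >= k), padded with 1 for k = 0 and 0 for k = m.
    cum = [1.0]
    cum += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cum.append(0.0)
    # Each category probability is the difference of adjacent cumulative curves.
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]

# Hypothetical 5-category item evaluated at the PROMIS-style mean (theta = 0).
probs = grm_category_probs(theta=0.0, a=1.5, thresholds=[-1.0, 0.0, 1.0, 2.0])
print([round(p, 3) for p in probs])
```

Each threshold b_k is the trait level at which a respondent has a 50% chance of responding in category k or higher; a steeper slope makes these transitions sharper.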
Vegetable parenting practices scale. Item response modeling analyses

PubMed Central

Chen, Tzu-An; O’Connor, Teresia; Hughes, Sheryl; Beltran, Alicia; Baranowski, Janice; Diep, Cassandra; Baranowski, Tom

2015-01-01

Objective To evaluate the psychometric properties of a vegetable parenting practices scale using multidimensional polytomous item response modeling, which enables assessing item fit to latent variables and the distributional characteristics of the items in comparison to the respondents. We also tested for differences in the ways items function (called differential item functioning) across child’s gender, ethnicity, age, and household income groups. Method Parents of 3–5-year-old children completed a self-reported vegetable parenting practices scale online. Vegetable parenting practices consisted of 14 effective vegetable parenting practices and 12 ineffective vegetable parenting practices items, each with three subscales (responsiveness, structure, and control). Multidimensional polytomous item response modeling was conducted separately on effective vegetable parenting practices and ineffective vegetable parenting practices. Results One effective vegetable parenting practice item did not fit the model well in the full sample or across demographic groups, and another was a misfit in differential item functioning analyses across child’s gender. Significant differential item functioning was detected across children’s age and ethnicity groups, and more among effective vegetable parenting practices than ineffective vegetable parenting practices items. Wright maps showed that the items covered only parts of the latent trait distribution. The harder- and easier-to-respond ends of the construct were not covered by items for effective vegetable parenting practices and ineffective vegetable parenting practices, respectively. Conclusions Several effective vegetable parenting practices and ineffective vegetable parenting practices scale items functioned differently on the basis of child’s demographic characteristics; therefore, researchers should use these vegetable parenting practices scales with caution. Item response modeling should be incorporated in analyses of parenting practice questionnaires to better assess differences across demographic characteristics. PMID:25895694
Vegetable parenting practices scale: Item response modeling analyses

USDA-ARS's Scientific Manuscript database

Our objective was to evaluate the psychometric properties of a vegetable parenting practices scale using multidimensional polytomous item response modeling, which enables assessing item fit to latent variables and the distributional characteristics of the items in comparison to the respondents. We al...
The Effect of Response Format on the Psychometric Properties of the Narcissistic Personality Inventory: Consequences for Item Meaning and Factor Structure.

PubMed

Ackerman, Robert A; Donnellan, M Brent; Roberts, Brent W; Fraley, R Chris

2016-04-01

The Narcissistic Personality Inventory (NPI) is currently the most widely used measure of narcissism in social/personality psychology. It is also relatively unique because it uses a forced-choice response format. We investigate the consequences of changing the NPI's response format for item meaning and factor structure. Participants were randomly assigned to one of three conditions: 40 forced-choice items (n = 2,754), 80 single-stimulus dichotomous items (i.e., separate true/false responses for each item; n = 2,275), or 80 single-stimulus rating scale items (i.e., 5-point Likert-type response scales for each item; n = 2,156). Analyses suggested that the "narcissistic" and "nonnarcissistic" response options from the Entitlement and Superiority subscales refer to independent personality dimensions rather than high and low levels of the same attribute. In addition, factor analyses revealed that although the Leadership dimension was evident across formats, dimensions with entitlement and superiority were not as robust. Implications for continued use of the NPI are discussed. © The Author(s) 2015.
A HO-IRT Based Diagnostic Assessment System with Constructed Response Items

ERIC Educational Resources Information Center

Yang, Chih-Wei; Kuo, Bor-Chen; Liao, Chen-Huei

2011-01-01

The aim of the present study was to develop an online assessment system with constructed response items in the context of the elementary mathematics curriculum. The system recorded the problem-solving process for constructed response items and transferred the process into response codes for further analyses. An inference mechanism based on artificial…
Evaluation of Northwest University, Kano Post-UTME Test Items Using Item Response Theory

ERIC Educational Resources Information Center

Bichi, Ado Abdu; Hafiz, Hadiza; Bello, Samira Abdullahi

2016-01-01

High-stakes testing is used for the purposes of providing results that have important consequences. Validity is the cornerstone upon which all measurement systems are built. This study applied Item Response Theory principles to analyse the Northwest University Kano Post-UTME Economics test items. The fifty (50) economics test items developed were…
Examination of Polytomous Items' Psychometric Properties According to Nonparametric Item Response Theory Models in Different Test Conditions

ERIC Educational Resources Information Center

Sengul Avsar, Asiye; Tavsancil, Ezel

2017-01-01

This study analysed polytomous items' psychometric properties according to nonparametric item response theory (NIRT) models. To this end, simulated datasets were generated with three different test lengths (10, 20 and 30 items), three sample distributions (normal, right skewed and left skewed) and three sample sizes (100, 250 and 500) by conducting 20…
Psychometric properties of the Triarchic Psychopathy Measure: An item response theory approach.

PubMed

Shou, Yiyun; Sellbom, Martin; Xu, Jing

2018-05-01

There is cumulative evidence for the cross-cultural validity of the Triarchic Psychopathy Measure (TriPM; Patrick, 2010) among non-Western populations. Recent studies using correlational and regression analyses show promising construct validity of the TriPM in Chinese samples. However, little is known about the efficiency of the TriPM items in assessing the proposed latent traits. The current study evaluated the psychometric properties of the Chinese TriPM at the item level using item response theory analyses. It also examined the measurement invariance of the TriPM between Chinese and U.S. student samples by applying differential item functioning analyses under the item response theory framework. The results supported the unidimensional nature of the Disinhibition and Meanness scales. Both scales measured their respective underlying constructs more precisely at the positive ends. The two scales, however, had several items that were weakly associated with their respective latent traits in the Chinese student sample. Boldness, on the other hand, was found to be multidimensional and reflected a more normally distributed range of variation. The examination of measurement bias via differential item functioning analyses revealed that a number of TriPM items were not equivalent across the Chinese and U.S. samples. Some modification and adaptation of items might be considered for improving the precision of the TriPM for Chinese participants. (PsycINFO Database Record (c) 2018 APA, all rights reserved).
Development and validation of an item response theory-based Social Responsiveness Scale short form.

PubMed

Sturm, Alexandra; Kuhfeld, Megan; Kasari, Connie; McCracken, James T

2017-09-01

Research and practice in autism spectrum disorder (ASD) rely on quantitative measures, such as the Social Responsiveness Scale (SRS), for characterization and diagnosis. Like many ASD diagnostic measures, SRS scores are influenced by factors unrelated to ASD core features. This study further interrogates the psychometric properties of the SRS using item response theory (IRT), and demonstrates a strategy to create a psychometrically sound short form by applying IRT results. Social Responsiveness Scale analyses were conducted on a large sample (N = 21,426) of youth from four ASD databases. Items were subjected to item factor analyses and evaluation of item bias by gender, age, expressive language level, behavior problems, and nonverbal IQ. Item selection based on item psychometric properties, DIF analyses, and substantive validity produced a reduced item SRS short form that was unidimensional in structure, highly reliable (α = .96), and free of gender, age, expressive language, behavior problems, and nonverbal IQ influence. The short form also showed strong relationships with established measures of autism symptom severity (ADOS, ADI-R, Vineland). Degree of association between all measures varied as a function of expressive language. Results identified specific SRS items that are more vulnerable to non-ASD-related traits. The resultant 16-item SRS short form may possess superior psychometric properties compared to the original scale and emerge as a more precise measure of ASD core symptom severity, facilitating research and practice. Future research using IRT is needed to further refine existing measures of autism symptomatology. © 2017 Association for Child and Adolescent Mental Health.
Psychometric properties of the Epworth Sleepiness Scale: A factor analysis and item-response theory approach.

PubMed

Pilcher, June J; Switzer, Fred S; Munc, Alec; Donnelly, Janet; Jellen, Julia C; Lamm, Claus

2018-04-01

The purpose of this study is to examine the psychometric properties of the Epworth Sleepiness Scale (ESS) in two languages, German and English. Students from a university in Austria (N = 292; 55 males; mean age = 18.71 ± 1.71 years; 237 females; mean age = 18.24 ± 0.88 years) and a university in the US (N = 329; 128 males; mean age = 18.71 ± 0.88 years; 201 females; mean age = 21.59 ± 2.27 years) completed the ESS. An exploratory factor analysis was completed to examine the dimensionality of the ESS. Item response theory (IRT) analyses were used to provide information about response rates on the ESS items and to conduct differential item functioning (DIF) analyses examining whether the items were interpreted differently in the two languages. The factor analyses suggest that the ESS measures two distinct sleepiness constructs. These constructs indicate that the ESS is probing sleepiness in settings requiring active versus passive responding. The IRT analyses found that, overall, the items on the ESS perform well as a measure of sleepiness. However, Item 8 and, to a lesser extent, Item 6 were being interpreted differently by respondents in comparison to the other items. In addition, the DIF analyses showed that the responses in German and English were very similar, indicating that there are only minor measurement differences between the two language versions of the ESS. These findings suggest that the ESS provides a reliable measure of propensity to sleepiness; however, it does convey a two-factor approach to sleepiness. Researchers and clinicians can use the German and English versions of the ESS but may wish to exclude Item 8 when calculating a total sleepiness score.
Bifactor and Item Response Theory Analyses of Interviewer Report Scales of Cognitive Impairment in Schizophrenia

PubMed Central

Reise, Steven P.; Ventura, Joseph; Keefe, Richard S. E.; Baade, Lyle E.; Gold, James M.; Green, Michael F.; Kern, Robert S.; Mesholam-Gately, Raquelle; Nuechterlein, Keith H.; Seidman, Larry J.; Bilder, Robert

2011-01-01

We conducted psychometric analyses of two interview-based measures of cognitive deficits: the 21-item Clinical Global Impression of Cognition in Schizophrenia (CGI-CogS; Ventura et al., 2008), and the 20-item Schizophrenia Cognition Rating Scale (SCoRS; Keefe et al., 2006), which were administered on two occasions to a sample of people with schizophrenia. Traditional psychometrics, bifactor analysis, and item response theory (IRT) methods were used to explore item functioning and dimensionality and to compare instruments. Despite containing similar item content, responses to the CGI-CogS demonstrated superior psychometric properties (e.g., higher inter-item correlations, better spread of ratings across response categories), relative to the SCoRS. We argue that these differences arise mainly from the differential use of prompts and how the items are phrased and scored. Bifactor analysis demonstrated that although both measures capture a broad range of cognitive functioning (e.g., working memory, social cognition), the common variance on each is overwhelmingly explained by a single general factor. IRT analyses of the combined pool of 41 items showed that measurement precision is peaked in the mild to moderate range of cognitive impairment. Finally, simulated adaptive testing revealed that only about 10 to 12 items are necessary to achieve latent trait level estimates with reasonably small standard errors for most individuals. This suggests that these interview-based measures of cognitive deficits could be shortened without loss of measurement precision. PMID:21381848
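The simulated adaptive testing result rests on two ideas: administer the unused item that is most informative at the current trait estimate, and stop once the estimate's standard error is small enough. The sketch below illustrates that loop under simplifying assumptions that are not taken from the study: dichotomous 2PL items (the CGI-CogS and SCoRS actually use rating scales), expected a posteriori (EAP) scoring on a grid with a standard normal prior, a hypothetical 41-item bank, and a hypothetical stopping rule of posterior SD ≤ 0.4. All function names are illustrative.

```python
import math
import random

def p2pl(theta, a, b):
    """2PL probability of a keyed (1) response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_info(theta, a, b):
    """Fisher information of a 2PL item at theta."""
    p = p2pl(theta, a, b)
    return a * a * p * (1.0 - p)

def eap(responses, items, grid):
    """EAP trait estimate and posterior SD under a standard normal prior."""
    weights = []
    for t in grid:
        w = math.exp(-0.5 * t * t)  # unnormalized N(0, 1) prior density
        for idx, x in responses:
            p = p2pl(t, *items[idx])
            w *= p if x == 1 else 1.0 - p
        weights.append(w)
    total = sum(weights)
    mean = sum(w * t for w, t in zip(weights, grid)) / total
    var = sum(w * (t - mean) ** 2 for w, t in zip(weights, grid)) / total
    return mean, math.sqrt(var)

def simulate_cat(true_theta, items, se_target=0.4, max_items=None, seed=0):
    """Simulate one adaptive administration; returns (estimate, SE, items used)."""
    rng = random.Random(seed)
    grid = [-4.0 + 0.05 * i for i in range(161)]  # quadrature points from -4 to 4
    if max_items is None:
        max_items = len(items)
    remaining = set(range(len(items)))
    responses = []
    theta_hat, se = 0.0, 1.0  # prior mean and SD before any item is given
    while remaining and len(responses) < max_items and se > se_target:
        # Pick the unused item with maximum information at the current estimate.
        nxt = max(remaining, key=lambda i: item_info(theta_hat, *items[i]))
        remaining.remove(nxt)
        # Draw a simulated response from the respondent's true trait level.
        x = 1 if rng.random() < p2pl(true_theta, *items[nxt]) else 0
        responses.append((nxt, x))
        theta_hat, se = eap(responses, items, grid)
    return theta_hat, se, len(responses)

# Hypothetical bank: slopes 1.0 to 1.95, difficulties spread from -2 to 2.
bank = [(1.0 + 0.05 * (i % 20), -2.0 + 0.1 * i) for i in range(41)]
theta_hat, se, n_used = simulate_cat(true_theta=0.5, items=bank, se_target=0.4, seed=1)
print(f"estimate={theta_hat:.2f}  SE={se:.2f}  items used={n_used}")
```

Maximum-information selection is the simplest rule; operational adaptive tests usually add exposure control and content balancing, which this sketch omits.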
Modeling the Severity of Drinking Consequences in First-Year College Women: An Item Response Theory Analysis of the Rutgers Alcohol Problem Index*

PubMed Central

Cohn, Amy M.; Hagman, Brett T.; Graff, Fiona S.; Noel, Nora E.

2011-01-01

Objective: The present study examined the latent continuum of alcohol-related negative consequences among first-year college women using methods from item response theory and classical test theory. Method: Participants (N = 315) were college women in their freshman year who reported consuming any alcohol in the past 90 days and who completed assessments of alcohol consumption and alcohol-related negative consequences using the Rutgers Alcohol Problem Index. Results: Item response theory analyses showed poor model fit for five items identified in the Rutgers Alcohol Problem Index. Two-parameter item response theory logistic models were applied to the remaining 18 items to examine estimates of item difficulty (i.e., severity) and discrimination parameters. The item difficulty parameters ranged from 0.591 to 2.031, and the discrimination parameters ranged from 0.321 to 2.371. Classical test theory analyses indicated that the omission of the five misfit items did not significantly alter the psychometric properties of the construct. Conclusions: Findings suggest that those consequences that had greater severity and discrimination parameters may be used as screening items to identify female problem drinkers at risk for an alcohol use disorder. PMID:22051212
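In the two-parameter logistic (2PL) model referred to above, an item's difficulty (severity) b is the trait level at which the endorsement probability is 0.5, and its discrimination a sets how sharply that probability rises. The Python sketch below assumes the logistic metric without a 1.7 scaling constant (the abstract does not state which metric was used) and pairs endpoint values from the reported ranges into two hypothetical items; they are not actual Rutgers Alcohol Problem Index item estimates.

```python
import math

def irf_2pl(theta, a, b):
    """Probability of endorsing an item under the 2PL model (no 1.7 constant)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def info_2pl(theta, a, b):
    """Item information for the 2PL model: a^2 * P * (1 - P)."""
    p = irf_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

# Hypothetical (a, b) pairs built from the endpoints of the reported ranges.
items = {"low a, low b": (0.321, 0.591), "high a, high b": (2.371, 2.031)}
for name, (a, b) in items.items():
    for theta in (0.0, 1.0, 2.0, 3.0):
        print(f"{name}: theta={theta:.1f}  P={irf_2pl(theta, a, b):.3f}  "
              f"info={info_2pl(theta, a, b):.3f}")
```

Because 2PL item information a²P(1 − P) peaks at θ = b with height a²/4, an item with both high severity and high discrimination concentrates its precision among respondents high on the consequences continuum, which is consistent with the authors' suggestion to use such items for screening.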
PROC IRT: A SAS Procedure for Item Response Theory

PubMed Central

Matlock Cole, Ki; Paek, Insu

2017-01-01

This article reviews the item response theory procedure (PROC IRT) in SAS/STAT 14.1 for conducting item response theory (IRT) analyses of dichotomous and polytomous datasets that are unidimensional or multidimensional. The review provides an overview of available features, including models, estimation procedures, interfacing, input, and output files. A small-scale simulation study evaluates the IRT model parameter recovery of PROC IRT. The IRT procedure in Statistical Analysis Software (SAS) may be useful for researchers who frequently use SAS for analyses, research, and teaching.
Measuring sexual orientation in adolescent health surveys: evaluation of eight school-based surveys.

PubMed

Saewyc, Elizabeth M; Bauer, Greta R; Skay, Carol L; Bearinger, Linda H; Resnick, Michael D; Reis, Elizabeth; Murphy, Aileen

2004-10-01

To examine the performance of various items measuring sexual orientation within 8 school-based adolescent health surveys in the United States and Canada from 1986 through 1999. Analyses examined nonresponse and unsure responses to sexual orientation items compared with other survey items, demographic differences in responses, tests for response set bias, and congruence of responses to multiple orientation items; analytical methods included frequencies, contingency tables with Chi-square, and ANOVA with least significant differences (LSD) post hoc tests; all analyses were conducted separately by gender. In all surveys, nonresponse rates for orientation questions were similar to, but not higher than, those for other sexual questions; younger students, immigrants, and students with learning disabilities were more likely to skip items or select "unsure." Sexual behavior items had the lowest nonresponse, but fewer than half of all students reported sexual behavior, limiting its usefulness for indicating orientation. Item placement in the survey, wording, and response set bias all appeared to influence nonresponse and unsure rates. Specific recommendations include standardizing wording across future surveys, and pilot testing items with diverse ages and ethnic groups of teens before use. All three dimensions of orientation should be assessed where possible; when limited to single items, sexual attraction may be the best choice. Specific wording suggestions are offered for future surveys.
Measuring stigma after spinal cord injury: Development and psychometric characteristics of the SCI-QOL Stigma item bank and short form.

PubMed

Kisala, Pamela A; Tulsky, David S; Pace, Natalie; Victorson, David; Choi, Seung W; Heinemann, Allen W

2015-05-01

To develop a calibrated item bank and computer adaptive test (CAT) to assess the effects of stigma on health-related quality of life in individuals with spinal cord injury (SCI). Grounded-theory based qualitative item development methods, large-scale item calibration field testing, confirmatory factor analysis, and item response theory (IRT)-based psychometric analyses. Five SCI Model System centers and one Department of Veterans Affairs medical center in the United States. Adults with traumatic SCI. SCI-QOL Stigma Item Bank. A sample of 611 individuals with traumatic SCI completed 30 items assessing SCI-related stigma. After 7 items were iteratively removed, factor analyses confirmed a unidimensional pool of items. Graded Response Model IRT analyses were used to estimate slopes and thresholds for the final 23 items. The SCI-QOL Stigma item bank is unique not only in the assessment of SCI-related stigma but also in the inclusion of individuals with SCI in all phases of its development. Use of confirmatory factor analytic and IRT methods provides flexibility and precision of measurement. The item bank may be administered as a CAT or as a 10-item fixed-length short form and can be used for research and clinical applications.
Measuring stigma after spinal cord injury: Development and psychometric characteristics of the SCI-QOL Stigma item bank and short form

PubMed Central

Kisala, Pamela A.; Tulsky, David S.; Pace, Natalie; Victorson, David; Choi, Seung W.; Heinemann, Allen W.

2015-01-01

Objective To develop a calibrated item bank and computer adaptive test (CAT) to assess the effects of stigma on health-related quality of life in individuals with spinal cord injury (SCI). Design Grounded-theory based qualitative item development methods, large-scale item calibration field testing, confirmatory factor analysis, and item response theory (IRT)-based psychometric analyses. Setting Five SCI Model System centers and one Department of Veterans Affairs medical center in the United States. Participants Adults with traumatic SCI. Main Outcome Measures SCI-QOL Stigma Item Bank. Results A sample of 611 individuals with traumatic SCI completed 30 items assessing SCI-related stigma. After 7 items were iteratively removed, factor analyses confirmed a unidimensional pool of items. Graded Response Model IRT analyses were used to estimate slopes and thresholds for the final 23 items. Conclusions The SCI-QOL Stigma item bank is unique not only in the assessment of SCI-related stigma but also in the inclusion of individuals with SCI in all phases of its development. Use of confirmatory factor analytic and IRT methods provides flexibility and precision of measurement. The item bank may be administered as a CAT or as a 10-item fixed-length short form and can be used for research and clinical applications. PMID:26010973
An item response theory analysis of the narcissistic personality inventory.

PubMed

Ackerman, Robert A; Donnellan, M Brent; Robins, Richard W

2012-01-01

This research uses item response theory methods to evaluate the Narcissistic Personality Inventory (NPI; Raskin & Terry, 1988). Analyses using the 2-parameter logistic model were conducted on the total score and the Corry, Merritt, Mrug, and Pamp (2008) and Ackerman et al. (2011) subscales for the NPI. In addition to offering precise information about the psychometric properties of the NPI item pool, these analyses generated insights that can be used to develop new measures of the personality constructs embedded within this frequently used inventory.
Item Response Analyses of the Educational Opportunities Survey 9th Grade Student Questionnaire.

ERIC Educational Resources Information Center

Weinfeld, Frederic D.; and others

This report presents the analysis of questionnaire item responses from the ninth-grade student questionnaire administered as part of the Educational Opportunities Survey. The analyses were performed to document some of the basic data from the survey, to make them available to interested educational researchers, and to rework the basic data for…

Measuring anxiety after spinal cord injury: Development and psychometric characteristics of the SCI-QOL Anxiety item bank and linkage with GAD-7.

PubMed

Kisala, Pamela A; Tulsky, David S; Kalpakjian, Claire Z; Heinemann, Allen W; Pohlig, Ryan T; Carle, Adam; Choi, Seung W

2015-05-01

To develop a calibrated item bank and computer adaptive test to assess anxiety symptoms in individuals with spinal cord injury (SCI), transform scores to the Patient Reported Outcomes Measurement Information System (PROMIS) metric, and create a statistical linkage with the Generalized Anxiety Disorder (GAD)-7, a widely used anxiety measure. Grounded-theory based qualitative item development methods; large-scale item calibration field testing; confirmatory factor analysis; graded response model item response theory analyses; statistical linking techniques to transform scores to a PROMIS metric; and linkage with the GAD-7. Setting Five SCI Model System centers and one Department of Veterans Affairs medical center in the United States. Participants Adults with traumatic SCI. Spinal Cord Injury-Quality of Life (SCI-QOL) Anxiety Item Bank. Seven hundred sixteen individuals with traumatic SCI completed 38 items assessing anxiety, 17 of which were PROMIS items. After 13 items (including 2 PROMIS items) were removed, factor analyses confirmed unidimensionality. Item response theory analyses were used to estimate slopes and thresholds for the final 25 items (15 from PROMIS). The observed Pearson correlation between the SCI-QOL Anxiety and GAD-7 scores was 0.67. The SCI-QOL Anxiety item bank demonstrates excellent psychometric properties and is available as a computer adaptive test or short form for research and clinical applications. SCI-QOL Anxiety scores have been transformed to the PROMIS metric, and we provide a method to link SCI-QOL Anxiety scores with those of the GAD-7.
Qualitative Development of the PROMIS® Pediatric Stress Response Item Banks

PubMed Central

Gardner, William; Pajer, Kathleen; Riley, Anne W.; Forrest, Christopher B.

2013-01-01

Objective To describe the qualitative development of the Patient-Reported Outcome Measurement Information System (PROMIS®) Pediatric Stress Response item banks. Methods Stress response concepts were specified through a literature review and interviews with content experts, children, and parents. A library comprising 2,677 items derived from 71 instruments was developed. Items were classified into conceptual categories; new items were written and redundant items were removed. Items were then revised based on cognitive interviews (n = 39 children), readability analyses, and translatability reviews. Results Two pediatric Stress Response sub-domains were identified: somatic experiences (43 items) and psychological experiences (64 items). Final item pools cover the full range of children’s stress experiences. Items are comprehensible among children aged ≥8 years and ready for translation. Conclusions Child- and parent-report versions of the item banks assess children’s somatic and psychological states when demands tax their adaptive capabilities. PMID:23124904
An Assessment of Character and Leadership Development Latent Factor Structures through Confirmatory Factor, Item Response Theory, and Latent Class Analyses

ERIC Educational Resources Information Center

Higginbotham, David L.

2013-01-01

This study leveraged the complementary nature of confirmatory factor (CFA), item response theory (IRT), and latent class (LCA) analyses to strengthen the rigor and sophistication of evaluation of two new measures of the Air Force Academy's "leader of character" definition--the Character Mosaic Virtues (CMV) and the Leadership Mosaic…
The Relationship of Item-Level Response Times with Test-Taker and Item Variables in an Operational CAT Environment. LSAC Research Report Series.

ERIC Educational Resources Information Center

Swygert, Kimberly A.

In this study, data from an operational computerized adaptive test (CAT) were examined in order to gather information concerning item response times in a CAT environment. The CAT under study included multiple-choice items measuring verbal, quantitative, and analytical reasoning. The analyses included the fitting of regression models describing the…
Influence of Skip Patterns on Item Non-Response in a Substance Use Survey of 7th to 12th Grade Students

ERIC Educational Resources Information Center

Ding, Kele; Olds, R. Scott; Thombs, Dennis L.

2009-01-01

This retrospective case study assessed the influence of item non-response error on subsequent response to questionnaire items assessing adolescent alcohol and marijuana use. Post-hoc analyses were conducted on survey results obtained from 4,371 7th to 12th grade students in Ohio in 2005. A skip pattern design in a conventional questionnaire…
An HIV/AIDS Knowledge Scale for Adolescents: Item Response Theory Analyses Based on Data from a Study in South Africa and Tanzania

ERIC Educational Resources Information Center

Aaro, Leif E.; Breivik, Kyrre; Klepp, Knut-Inge; Kaaya, Sylvia; Onya, Hans E.; Wubs, Annegreet; Helleve, Arnfinn; Flisher, Alan J.

2011-01-01

A 14-item human immunodeficiency virus/acquired immunodeficiency syndrome knowledge scale was used among school students in 80 schools in 3 sites in Sub-Saharan Africa (Cape Town and Mankweng, South Africa, and Dar es Salaam, Tanzania). For each item, an incorrect or "don't know" response was coded as 0 and a correct response as 1. Exploratory factor…
Research applications for an Object and Action Naming Battery to assess naming skills in adult Spanish-English bilingual speakers.

PubMed

Edmonds, Lisa A; Donovan, Neila J

2014-06-01

Virtually no valid materials are available to evaluate confrontation naming in Spanish-English bilingual adults in the U.S. In a recent study, a large group of young Spanish-English bilingual adults was evaluated on An Object and Action Naming Battery (Edmonds & Donovan in Journal of Speech, Language, and Hearing Research 55:359-381, 2012). Rasch analyses of the responses resulted in evidence for the content and construct validity of the retained items. However, the scope of that study did not allow for extensive examination of individual item characteristics, group analyses of participants, or the provision of testing and scoring materials or raw data, thereby limiting the ability of researchers to administer the test to Spanish-English bilinguals and to score the items with confidence. In this study, we present the in-depth information described above on the basis of further analyses, including (1) online searchable spreadsheets with extensive empirical (e.g., accuracy and name agreeability) and psycholinguistic item statistics; (2) answer sheets and instructions for scoring and interpreting the responses to the Rasch items; (3) tables of alternative correct responses for English and Spanish; (4) ability strata determined for all naming conditions (English and Spanish nouns and verbs); and (5) comparisons of accuracy across proficiency groups (i.e., Spanish dominant, English dominant, and balanced). These data indicate that the Rasch items from An Object and Action Naming Battery are valid and sensitive for the evaluation of naming in young Spanish-English bilingual adults. Additional information based on participant responses for all of the items on the battery can provide researchers with valuable information to aid in stimulus development and response interpretation for experimental studies in this population.
Gender and Minority Achievement Gaps in Science in Eighth Grade: Item Analyses of Nationally Representative Data. Research Report. ETS RR-17-36

ERIC Educational Resources Information Center

Qian, Xiaoyu; Nandakumar, Ratna; Glutting, Joseph; Ford, Danielle; Fifield, Steve

2017-01-01

In this study, we investigated gender and minority achievement gaps on 8th-grade science items employing a multilevel item response methodology. Both gaps were wider on physics and earth science items than on biology and chemistry items. Larger gender gaps were found on items with specific topics favoring male students than other items, for…
Assessing Patients’ Experiences with Communication Across the Cancer Care Continuum

PubMed Central

Mazor, Kathleen M.; Street, Richard L.; Sue, Valerie M.; Williams, Andrew E.; Rabin, Borsika A.; Arora, Neeraj K.

2016-01-01

Objective To evaluate the relevance, performance and potential usefulness of the Patient Assessment of cancer Communication Experiences (PACE) items. Methods Items focusing on specific communication goals related to exchanging information, fostering healing relationships, responding to emotions, making decisions, enabling self-management, and managing uncertainty were tested via a retrospective, cross-sectional survey of adults who had been diagnosed with cancer. Analyses examined response frequencies, inter-item correlations, and coefficient alpha. Results A total of 366 adults were included in the analyses. Relatively few selected “Does Not Apply”, suggesting that items tap relevant communication experiences. Ratings of whether specific communication goals were achieved were strongly correlated with overall ratings of communication, suggesting item content reflects important aspects of communication. Coefficient alpha was ≥.90 for each item set, indicating excellent reliability. Variations in the percentage of respondents selecting the most positive response across items suggest results can identify strengths and weaknesses. Conclusion The PACE items tap relevant, important aspects of communication during cancer care, and may be useful to cancer care teams desiring detailed feedback. PMID:26979476
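The PACE analyses report coefficient alpha for each item set. For k items, coefficient (Cronbach's) alpha is α = k/(k − 1) · (1 − Σσ²ᵢ / σ²_X), where σ²ᵢ are the item variances and σ²_X is the variance of respondents' total scores. Below is a minimal Python sketch of that standard formula. The response matrix is hypothetical and is not PACE data.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Coefficient alpha for a respondents-by-items matrix (list of rows).

    Population variances are used throughout; the n/(n-1) correction
    would cancel in the ratio, so the choice does not change alpha.
    """
    k = len(scores[0])  # number of items
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item across respondents
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    # Variance of respondents' total scores
    total_var = pvariance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical ratings: 5 respondents x 4 items on a 1-5 scale
ratings = [
    [5, 4, 5, 5],
    [3, 3, 4, 3],
    [4, 4, 4, 5],
    [2, 1, 2, 2],
    [5, 5, 4, 5],
]
print(round(cronbach_alpha(ratings), 3))
```

Alpha rises as items covary more strongly relative to their individual variances. Identical items give exactly 1, and uncorrelated items give 0.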
Item response theory analysis of Centers for Disease Control and Prevention Health-Related Quality of Life (CDC HRQOL) items in adults with arthritis.

PubMed

Mielenz, Thelma J; Callahan, Leigh F; Edwards, Michael C

2016-03-12

Examine the feasibility of performing an item response theory (IRT) analysis on two of the Centers for Disease Control and Prevention health-related quality of life (CDC HRQOL) modules: the 4-item Healthy Days Core Module (HDCM) and the 5-item Healthy Days Symptoms Module (HDSM). Previous principal components analyses confirm that the two scales both assess a mix of mental (CDC-MH) and physical health (CDC-PH). The purpose is to conduct IRT analysis on the CDC-MH and CDC-PH scales separately. A total of 2,182 patients with self-reported or physician-diagnosed arthritis completed a cross-sectional survey including HDCM and HDSM items. Besides global health, the other 8 items ask the number of days that some statement was true; we chose to recode the data into 8 categories based on observed clustering. The IRT assumptions were assessed using confirmatory factor analysis, and the data could be modeled using a unidimensional IRT model. The graded response model was used for IRT analyses, and the CDC-MH and CDC-PH scales were analyzed separately in flexMIRT. The IRT parameter estimates for the five-item CDC-PH all appeared reasonable. The three-item CDC-MH did not have reasonable parameter estimates. The CDC-PH scale is amenable to IRT analysis, but the existing CDC-MH scale is not. We suggest either using the 4-item HDCM and the 5-item HDSM as they currently stand or using the CDC-PH scale alone if the primary goal is to measure physical health-related HRQOL.
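The graded response model used in this study treats an item with m ordered categories as m − 1 logistic curves. Each curve gives the probability of responding in category k or higher, and the probability of each category is the difference between adjacent curves. Below is a minimal Python sketch of Samejima's logistic form without a scaling constant. The parameter values are hypothetical and are not the flexMIRT estimates from this study. The sketch only evaluates probabilities for known parameters. It does not estimate them.

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Category probabilities under Samejima's graded response model.

    theta: latent trait value
    a: item discrimination
    thresholds: strictly increasing category boundaries b_1 < ... < b_{m-1}
    Returns m probabilities for categories 0..m-1.
    """
    if any(b1 >= b2 for b1, b2 in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    # Cumulative P(X >= k) for k = 1..m-1, padded with P(X >= 0) = 1 and 0
    cumulative = (
        [1.0]
        + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
        + [0.0]
    )
    # Each category's probability is the gap between adjacent cumulative curves
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]


# Hypothetical 8-category item, e.g. a recoded "number of days" response
probs = grm_category_probs(0.5, a=2.0, thresholds=[-3, -2, -1, 0, 1, 2, 3])
print([round(p, 3) for p in probs])
```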
Bayesian Modal Estimation of the Four-Parameter Item Response Model in Real, Realistic, and Idealized Data Sets.

PubMed

Waller, Niels G; Feuerstahler, Leah

2017-01-01

In this study, we explored item and person parameter recovery of the four-parameter model (4PM) in over 24,000 real, realistic, and idealized data sets. In the first set of analyses, we fit the 4PM and three alternative models to data from three Minnesota Multiphasic Personality Inventory-Adolescent form factor scales using Bayesian modal estimation (BME). Our results indicated that the 4PM fits these scales better than simpler item response theory (IRT) models. Next, using the parameter estimates from these real data analyses, we estimated 4PM item parameters in 6,000 realistic data sets to establish minimum sample size requirements for accurate item and person parameter recovery. Using a factorial design that crossed discrete levels of item parameters, sample size, and test length, we also fit the 4PM to an additional 18,000 idealized data sets to extend our parameter recovery findings. Our combined results demonstrated that 4PM item parameters and parameter functions (e.g., item response functions) can be accurately estimated using BME in moderate to large samples (N ⩾ 5,000) and person parameters can be accurately estimated in smaller samples (N ⩾ 1,000). In the supplemental files, we report annotated R code that shows how to estimate 4PM item and person parameters in mirt (Chalmers, 2012).
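The four-parameter model adds an upper asymptote d below 1 to the familiar three-parameter logistic curve. The resulting item response function is P(θ) = c + (d − c) / (1 + exp(−a(θ − b))), where a is discrimination, b is location, and c and d are the lower and upper asymptotes. Below is a minimal Python sketch of that function in the logistic metric, assuming no 1.7 scaling constant (parameterizations differ across software). Setting d = 1 gives the 3PL, and additionally setting c = 0 gives the 2PL. The item parameters in the usage loop are hypothetical.

```python
import math


def irf_4pm(theta, a, b, c=0.0, d=1.0):
    """Four-parameter logistic item response function (no 1.7 scaling constant).

    c is the lower asymptote (guessing-like floor); d is the upper asymptote
    (a ceiling below 1, e.g. for endorsement that never becomes certain).
    """
    if not 0.0 <= c < d <= 1.0:
        raise ValueError("need 0 <= c < d <= 1")
    return c + (d - c) / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical item: a = 1.2, b = 0, floor 0.05, ceiling 0.95
for theta in (-2.0, 0.0, 2.0):
    print(theta, round(irf_4pm(theta, a=1.2, b=0.0, c=0.05, d=0.95), 3))
```

At θ = b the curve sits halfway between the two asymptotes, (c + d) / 2, rather than at 0.5. This is one reason the 4PM can fit personality items whose endorsement rates plateau short of 1.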
Item Response Theory Analyses of the Cambridge Face Memory Test (CFMT)

PubMed Central

Cho, Sun-Joo; Wilmer, Jeremy; Herzmann, Grit; McGugin, Rankin; Fiset, Daniel; Van Gulick, Ana E.; Ryan, Katie; Gauthier, Isabel

2014-01-01

We evaluated the psychometric properties of the Cambridge Face Memory Test (CFMT; Duchaine & Nakayama, 2006). First, we assessed the dimensionality of the test with a bi-factor exploratory factor analysis (EFA). This EFA revealed a general factor and three specific factors clustered by the CFMT targets. However, the three specific factors appeared to be minor factors that can be ignored. Second, we fit a unidimensional item response model. This item response model showed that the CFMT items could discriminate individuals at different ability levels and covered a wide range of the ability continuum. We found the CFMT to be particularly precise for a wide range of ability levels. Third, we implemented item response theory (IRT) differential item functioning (DIF) analyses for each gender group and two age groups (Age ≤ 20 versus Age > 21). This DIF analysis suggested little evidence of consequential differential functioning on the CFMT for these groups, supporting the use of the test to compare older to younger, or male to female, individuals. Fourth, we tested for a gender difference on the latent facial recognition ability with an explanatory item response model. We found a significant but small gender difference on the latent ability for face recognition, which was higher for women than men by 0.184, at the mean age of 23.2, controlling for linear and quadratic age effects. Finally, we discuss the practical considerations of the use of total scores versus IRT scale scores in applications of the CFMT. PMID:25642930
Development of the multiple sclerosis (MS) early mobility impairment questionnaire (EMIQ).

PubMed

Ziemssen, Tjalf; Phillips, Glenn; Shah, Ruchit; Mathias, Adam; Foley, Catherine; Coon, Cheryl; Sen, Rohini; Lee, Andrew; Agarwal, Sonalee

2016-10-01

The Early Mobility Impairment Questionnaire (EMIQ) was developed to facilitate early identification of mobility impairments in multiple sclerosis (MS) patients. We describe the initial development of the EMIQ with a focus on the psychometric evaluation of the questionnaire using classical and item response theory methods. The initial 20-item EMIQ was constructed by clinical specialists and qualitatively tested among people with MS and physicians via cognitive interviews. Data from an observational study were used to make additional updates to the instrument based on exploratory factor analysis (EFA) and item response theory (IRT) analysis, and psychometric analyses were performed to evaluate the reliability and validity of the final instrument's scores and screening properties (i.e., sensitivity and specificity). Based on qualitative interview analyses, a revised 15-item EMIQ was included in the observational study. EFA, IRT and item-to-item correlation analyses revealed redundant items, which were removed, leading to the final nine-item EMIQ. The nine-item EMIQ performed well with respect to: test-retest reliability (ICC = 0.858); internal consistency (α = 0.893); convergent validity; and known-groups methods for construct validity. A cut-point of 41 on the 0-to-100 scale resulted in sufficient sensitivity and specificity statistics for viably identifying patients with mobility impairment. The EMIQ is a content-valid and psychometrically sound instrument for capturing MS patients' experience with mobility impairments in a clinical practice setting. Additional research is suggested to further confirm the EMIQ's screening properties over time.
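The screening properties reported for the EMIQ come from classifying each respondent at a cut-point and comparing the result with a reference classification. Sensitivity is the share of impaired patients who are flagged. Specificity is the share of unimpaired patients who are not flagged. The abstract does not say whether scores above or below 41 indicate impairment, so the direction is an explicit parameter in this minimal Python sketch. The scores and reference labels are hypothetical.

```python
def sensitivity_specificity(scores, is_impaired, cut_point, higher_means_impaired=True):
    """Sensitivity and specificity of a screening cut-point.

    scores: questionnaire scores (e.g. on a 0-100 scale)
    is_impaired: reference-standard labels (True = impaired)
    higher_means_impaired: assumed direction of the scale (not given in the source)
    """
    tp = fn = tn = fp = 0
    for score, truth in zip(scores, is_impaired):
        if higher_means_impaired:
            flagged = score >= cut_point
        else:
            flagged = score <= cut_point
        if truth and flagged:
            tp += 1  # impaired and flagged
        elif truth:
            fn += 1  # impaired but missed
        elif flagged:
            fp += 1  # not impaired but flagged
        else:
            tn += 1  # not impaired and not flagged
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one impaired and one unimpaired case")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical respondents, cut-point 41
sens, spec = sensitivity_specificity(
    [10, 35, 41, 60, 80, 20], [False, False, True, True, True, True], 41
)
print(sens, spec)
```

Moving the cut-point trades one statistic against the other, which is why a single cut-point is reported together with both values.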
Item Response Theory Analyses of the Parent and Teacher Ratings of the DSM-IV ADHD Rating Scale

ERIC Educational Resources Information Center

Gomez, Rapson

2008-01-01

The graded response model (GRM), which is based on item response theory (IRT), was used to evaluate the psychometric properties of the inattention and hyperactivity/impulsivity symptoms in an ADHD rating scale. To accomplish this, parents and teachers completed the DSM-IV ADHD Rating Scale (DARS; Gomez et al., "Journal of Child Psychology and…
Dyadic confirmatory factor analysis of the inflammatory bowel disease family responsibility questionnaire.

PubMed

Greenley, Rachel Neff; Reed-Knight, Bonney; Blount, Ronald L; Wilson, Helen W

2013-09-01

Evaluate the factor structure of youth and maternal involvement ratings on the Inflammatory Bowel Disease Family Responsibility Questionnaire, a measure of family allocation of condition management responsibilities in pediatric inflammatory bowel disease. Participants included 251 youth aged 11-18 years with inflammatory bowel disease and their mothers. Item-level descriptive analyses, subscale internal consistency estimates, and confirmatory factor analyses of youth and maternal involvement were conducted using a dyadic data-analytic approach. Results supported the validity of 4 conceptually derived subscales including general health maintenance, social aspects, condition management tasks, and nutrition domains. Additionally, results indicated adequate support for the factor structure of a 21-item youth involvement measure and strong support for a 16-item maternal involvement measure. Additional empirical support for the validity of the Inflammatory Bowel Disease Family Responsibility Questionnaire was provided. Future research to replicate current findings and to examine the measure's clinical utility is warranted.
Development and psychometric evaluation of the PROMIS Pediatric Life Satisfaction item banks, child-report, and parent-proxy editions.

PubMed

Forrest, Christopher B; Devine, Janine; Bevans, Katherine B; Becker, Brandon D; Carle, Adam C; Teneralli, Rachel E; Moon, JeanHee; Tucker, Carole A; Ravens-Sieberer, Ulrike

2018-01-01

To describe the psychometric evaluation and item response theory calibration of the PROMIS Pediatric Life Satisfaction item banks, child-report, and parent-proxy editions. A pool of 55 life satisfaction items was administered to 1992 children 8-17 years old and 964 parents of children 5-17 years old. Analyses included descriptive statistics, reliability, factor analysis, differential item functioning, and assessment of construct validity. Thirteen items were deleted because of poor psychometric performance. An 8-item short form was administered to a national sample of 996 children 8-17 years old, and 1294 parents of children 5-17 years old. The combined sample (2988 children and 2258 parents) was used in item response theory (IRT) calibration analyses. The final item banks were unidimensional, the items were locally independent, and the items were free from impactful differential item functioning. The 8-item and 4-item short form scales showed excellent reliability, convergent validity, and discriminant validity. Life satisfaction decreased with declining socio-economic status, presence of a special health care need, and increasing age for girls, but not boys. After IRT calibration, we found that 4- and 8-item short forms had a high degree of precision (reliability) across a wide range (>4 SD units) of the latent variable. The PROMIS Pediatric Life Satisfaction item banks and their short forms provide efficient, precise, and valid assessments of life satisfaction in children and youth.
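The PROMIS abstracts describe short forms as having high precision (reliability) across a wide range of the latent variable. In IRT, precision varies with θ. The standard error is SE(θ) = 1/√I(θ), where I(θ) is the sum of the item information functions. When θ is scaled to unit variance, a common approximation is reliability ≈ 1 − SE(θ)². Below is a minimal Python sketch of that relation. For simplicity it uses dichotomous 2PL items, whose information is a²P(1 − P), even though the PROMIS items are polytomous. The item parameters are hypothetical.

```python
import math


def info_2pl(theta, a, b):
    """Fisher information of one dichotomous 2PL item at theta."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)


def conditional_reliability(theta, items):
    """Approximate reliability and SE at theta for (a, b) item pairs.

    Assumes theta has unit variance, so reliability ~= 1 - SE**2.
    Values below 0 mean the items carry less information than 1 at this theta.
    """
    info = sum(info_2pl(theta, a, b) for a, b in items)
    se = 1.0 / math.sqrt(info)
    return 1.0 - se ** 2, se


# Hypothetical 8-item short form with locations spread across the trait range
short_form = [(2.5, b) for b in (-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0)]
for theta in (-3.0, -2.0, 0.0, 2.0, 3.0):
    rel, se = conditional_reliability(theta, short_form)
    print(theta, round(rel, 2), round(se, 2))
```

Spreading item locations widens the θ range over which reliability stays high. Concentrating them near one point gives higher peak precision over a narrower range.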
Development and Evaluation of the PROMIS® Pediatric Positive Affect Item Bank, Child-Report and Parent-Proxy Editions.

PubMed

Forrest, Christopher B; Ravens-Sieberer, Ulrike; Devine, Janine; Becker, Brandon D; Teneralli, Rachel; Moon, JeanHee; Carle, Adam; Tucker, Carole A; Bevans, Katherine B

2018-03-01

The purpose of this study is to describe the psychometric evaluation and item response theory calibration of the PROMIS Pediatric Positive Affect item bank, child-report and parent-proxy editions. The initial item pool comprising 53 items, previously developed using qualitative methods, was administered to 1,874 children 8-17 years old and 909 parents of children 5-17 years old. Analyses included descriptive statistics, reliability, factor analysis, differential item functioning, and construct validity. A total of 14 items were deleted because of poor psychometric performance, and an 8-item short form constructed from the remaining 39 items was administered to a national sample of 1,004 children 8-17 years old, and 1,306 parents of children 5-17 years old. The combined sample was used in item response theory (IRT) calibration analyses. The final item bank appeared unidimensional, the items appeared locally independent, and the items were free from differential item functioning. The scales showed excellent reliability and convergent and discriminant validity. Positive affect decreased with children's age and was lower for those with a special health care need. After IRT calibration, we found that 4- and 8-item short forms had a high degree of precision (reliability) across a wide range of the latent trait (>4 SD units). The PROMIS Pediatric Positive Affect item bank and its short forms provide an efficient, precise, and valid assessment of positive affect in children and youth.
Evaluating and Refining the Construct of Sexual Quality With Item Response Theory: Development of the Quality of Sex Inventory.

PubMed

Shaw, Amanda M; Rogge, Ronald D

2016-02-01

This study took a critical look at the construct of sexual quality. The 65 items of four well-validated self-report measures of sexual satisfaction (the Index of Sexual Satisfaction [ISS], Hudson, Harrison, & Crosscup, 1981; the Global Measure of Sexual Satisfaction [GMSEX], Lawrance & Byers, 1995; the Pinney Sexual Satisfaction Inventory [PSSI], Pinney, Gerrard, & Denney, 1987; the Young Sexual Satisfaction Scale [YSSS], Young, Denny, Luquis, & Young, 1998) and an additional 74 potential sexual quality items were given to 3060 online participants. Using Item Response Theory (IRT), we demonstrated that the ISS, YSSS, and PSSI scales provided suboptimal levels of precision in assessing sexual quality, particularly given the length of those scales. Exploratory factor analyses, IRT, differential item functioning analyses, and longitudinal responsiveness analyses were used to develop and evaluate the Quality of Sex Inventory (QSI). Results suggested that, in comparison to existing scales, the QSI (1) offers investigators and clinicians more theoretically focused scales, (2) distinguishes sexual satisfaction from sexual dissatisfaction, and (3) offers greater precision and power for detecting differences with (4) comparably high levels of responsiveness for detecting change over time despite being notably shorter than most of the existing scales. The QSI-satisfaction subscales demonstrated strong convergent validity with other measures of sexual satisfaction and excellent construct validity with anchor scales from the nomological net surrounding that construct, suggesting that they continue to assess the same theoretical construct as prior scales. Implications for research are discussed.
Item Response Theory Analysis of the Psychopathic Personality Inventory-Revised.

PubMed

Eichenbaum, Alexander E; Marcus, David K; French, Brian F

2017-06-01

This study examined item and scale functioning in the Psychopathic Personality Inventory-Revised (PPI-R) using an item response theory analysis. PPI-R protocols from 1,052 college student participants (348 male, 704 female) were analyzed. Analyses were conducted on the 131 self-report items comprising the PPI-R's eight content scales, using a graded response model. Scales collected a majority of their information about respondents possessing higher than average levels of the traits being measured. Each scale contained at least some items that evidenced limited ability to differentiate between respondents with differing levels of the trait being measured. Moreover, 80 items (61.1%) yielded significantly different responses between men and women presumably possessing similar levels of the trait being measured. Item performance was also influenced by the scoring format (directly scored vs. reverse-scored) of the items. Overall, the results suggest that the PPI-R, despite identifying psychopathic personality traits in individuals possessing high levels of those traits, may not identify these traits equally well for men and women, and scores are likely influenced by the scoring format of the individual item and scale.
Validating the European Health Literacy Survey Questionnaire in people with type 2 diabetes: Latent trait analyses applying multidimensional Rasch modelling and confirmatory factor analysis.

PubMed

Finbråten, Hanne Søberg; Pettersen, Kjell Sverre; Wilde-Larsson, Bodil; Nordström, Gun; Trollvik, Anne; Guttersrud, Øystein

2017-11-01

To validate the European Health Literacy Survey Questionnaire (HLS-EU-Q47) in people with type 2 diabetes mellitus. The HLS-EU-Q47 latent variable is outlined in a framework with four cognitive domains integrated in three health domains, implying 12 theoretically defined subscales. Valid and reliable health literacy measures are crucial to effectively adapt health communication and education to individuals and groups of patients. Cross-sectional study applying confirmatory latent trait analyses. Using a paper-and-pencil self-administered approach, 388 adults responded in March 2015. The data were analysed using the Rasch methodology and confirmatory factor analysis. Response violation (response dependency) and trait violation (multidimensionality) of local independence were identified. Fitting the "multidimensional random coefficients multinomial logit" model, 1-, 3- and 12-dimensional Rasch models were applied and compared. Poor model fit and differential item functioning were present in some items, and several subscales suffered from poor targeting and low reliability. Despite multidimensional data, we did not observe any unordered response categories. Interpreting the domains as distinct but related latent dimensions, the data fit a 12-dimensional Rasch model and a 12-factor confirmatory factor model best. Therefore, the analyses did not support the estimation of one overall "health literacy score." To support the plausibility of claims based on the HLS-EU score(s), we suggest: removing the health care aspect to reduce the magnitude of multidimensionality; rejecting redundant items to avoid response dependency; adding "harder" items and applying a six-point rating scale to improve subscale targeting and reliability; and revising items to improve model fit and avoid bias owing to person factors. © 2017 John Wiley & Sons Ltd.

Assessing patients' experiences with communication across the cancer care continuum.

PubMed

Mazor, Kathleen M; Street, Richard L; Sue, Valerie M; Williams, Andrew E; Rabin, Borsika A; Arora, Neeraj K

2016-08-01

To evaluate the relevance, performance and potential usefulness of the Patient Assessment of cancer Communication Experiences (PACE) items. Items focusing on specific communication goals related to exchanging information, fostering healing relationships, responding to emotions, making decisions, enabling self-management, and managing uncertainty were tested via a retrospective, cross-sectional survey of adults who had been diagnosed with cancer. Analyses examined response frequencies, inter-item correlations, and coefficient alpha. A total of 366 adults were included in the analyses. Relatively few selected "Does Not Apply", suggesting that items tap relevant communication experiences. Ratings of whether specific communication goals were achieved were strongly correlated with overall ratings of communication, suggesting item content reflects important aspects of communication. Coefficient alpha was ≥.90 for each item set, indicating excellent reliability. Variations in the percentage of respondents selecting the most positive response across items suggest results can identify strengths and weaknesses. The PACE items tap relevant, important aspects of communication during cancer care, and may be useful to cancer care teams desiring detailed feedback. The PACE is a new tool for eliciting patients' perspectives on communication during cancer care. It is freely available online for practitioners, researchers and others. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Using Item Response Theory and Adaptive Testing in Online Career Assessment

ERIC Educational Resources Information Center

Betz, Nancy E.; Turner, Brandon M.

2011-01-01

The present article describes the potential utility of item response theory (IRT) and adaptive testing for scale evaluation and for web-based career assessment. The article describes the principles of both IRT and adaptive testing and then illustrates these with reference to data analyses and simulation studies of the Career Confidence Inventory…
Practical methods for dealing with 'not applicable' item responses in the AMC Linear Disability Score project

PubMed Central

Holman, Rebecca; Glas, Cees AW; Lindeboom, Robert; Zwinderman, Aeilko H; de Haan, Rob J

2004-01-01

Background Whenever questionnaires are used to collect data on constructs, such as functional status or health related quality of life, it is unlikely that all respondents will respond to all items. This paper examines ways of dealing with responses in a 'not applicable' category to items included in the AMC Linear Disability Score (ALDS) project item bank. Methods The data examined in this paper come from the responses of 392 respondents to 32 items and form part of the calibration sample for the ALDS item bank. The data are analysed using the one-parameter logistic item response theory model. The four practical strategies for dealing with this type of response are: cold deck imputation; hot deck imputation; treating the missing responses as if these items had never been offered to those individual patients; and using a model which takes account of the 'tendency to respond to items'. Results The item and respondent population parameter estimates were very similar for the strategies involving hot deck imputation; treating the missing responses as if these items had never been offered to those individual patients; and using a model which takes account of the 'tendency to respond to items'. The estimates obtained using the cold deck imputation method were substantially different. Conclusions The cold deck imputation method was not considered suitable for use in the ALDS item bank. The other three methods described can be usefully implemented in the ALDS item bank, depending on the purpose of the data analysis to be carried out. These three methods may be useful for other data sets examining similar constructs, when item response theory based methods are used. PMID:15200681
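One strategy compared here treats 'not applicable' responses as if the items had never been offered. Under a one-parameter logistic model with known item difficulties, that means those items are simply left out of the respondent's likelihood. Below is a minimal Python sketch that estimates one respondent's θ by grid-search maximum likelihood. The difficulties are hypothetical, and this is person scoring only. It is not the calibration procedure used for the ALDS item bank, which also estimated the item parameters.

```python
import math


def loglik_1pl(theta, responses, difficulties):
    """Log-likelihood of one respondent's responses under the 1PL model.

    responses: 1, 0, or None; None marks a 'not applicable' response and is
    skipped, i.e. treated as an item that was never offered.
    """
    total = 0.0
    for x, b in zip(responses, difficulties):
        if x is None:
            continue  # contributes nothing to the likelihood
        p = 1.0 / (1.0 + math.exp(-(theta - b)))
        total += math.log(p) if x == 1 else math.log(1.0 - p)
    return total


def estimate_theta(responses, difficulties, lo=-4.0, hi=4.0, steps=801):
    """Maximum-likelihood theta by grid search over [lo, hi].

    All-1 or all-0 patterns have no finite ML estimate; the search then
    returns an endpoint of the grid.
    """
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda t: loglik_1pl(t, responses, difficulties))


# Hypothetical respondent: the second item was answered 'not applicable'
print(estimate_theta([1, None, 0], [-1.0, 5.0, 1.0]))
```

Because skipped items drop out entirely, the estimate matches the one obtained by deleting those items from the respondent's record. This is the sense in which the item was "never offered".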
Better assessment of physical function: item improvement is neglected but essential

PubMed Central

2009-01-01

Introduction Physical function is a key component of patient-reported outcome (PRO) assessment in rheumatology. Modern psychometric methods, such as Item Response Theory (IRT) and Computerized Adaptive Testing, can materially improve measurement precision at the item level. We present the qualitative and quantitative item-evaluation process for developing the Patient Reported Outcomes Measurement Information System (PROMIS) Physical Function item bank. Methods The process was stepwise: we searched extensively to identify extant Physical Function items and then classified and selectively reduced the item pool. We evaluated retained items for content, clarity, relevance and comprehension, reading level, and translation ease by experts and patient surveys, focus groups, and cognitive interviews. We then assessed items by using classical test theory and IRT, used confirmatory factor analyses to estimate item parameters, and graded response modeling for parameter estimation. We retained the 20 Legacy (original) Health Assessment Questionnaire Disability Index (HAQ-DI) and the 10 SF-36's PF-10 items for comparison. Subjects were from rheumatoid arthritis, osteoarthritis, and healthy aging cohorts (n = 1,100) and a national Internet sample of 21,133 subjects. Results We identified 1,860 items. After qualitative and quantitative evaluation, 124 newly developed PROMIS items composed the PROMIS item bank, which included revised Legacy items with good fit that met IRT model assumptions. Results showed that the clearest and best-understood items were simple, in the present tense, and straightforward. Basic tasks (like dressing) were more relevant and important than complex ones (like dancing). Revised HAQ-DI and PF-10 items with five response options had higher item-information content than did comparable original Legacy items with fewer response options.
IRT analyses showed that the Physical Function domain satisfied general criteria for unidimensionality with one-, two-, three-, and four-factor models having comparable model fits. Correlations between factors in the test data sets were > 0.90. Conclusions Item improvement must underlie attempts to improve outcome assessment. The clear, personally important and relevant, ability-framed items in the PROMIS Physical Function item bank perform well in PRO assessment. They will benefit from further study and application in a wider variety of rheumatic diseases in diverse clinical groups, including those at the extremes of physical functioning, and in different administration modes. PMID:20015354
Better assessment of physical function: item improvement is neglected but essential.

PubMed

Bruce, Bonnie; Fries, James F; Ambrosini, Debbie; Lingala, Bharathi; Gandek, Barbara; Rose, Matthias; Ware, John E

2009-01-01

Physical function is a key component of patient-reported outcome (PRO) assessment in rheumatology. Modern psychometric methods, such as Item Response Theory (IRT) and Computerized Adaptive Testing, can materially improve measurement precision at the item level. We present the qualitative and quantitative item-evaluation process for developing the Patient Reported Outcomes Measurement Information System (PROMIS) Physical Function item bank. The process was stepwise: we searched extensively to identify extant Physical Function items and then classified and selectively reduced the item pool. We evaluated retained items for content, clarity, relevance and comprehension, reading level, and translation ease by experts and patient surveys, focus groups, and cognitive interviews. We then assessed items by using classical test theory and IRT, used confirmatory factor analyses to estimate item parameters, and graded response modeling for parameter estimation. We retained the 20 Legacy (original) Health Assessment Questionnaire Disability Index (HAQ-DI) and the 10 SF-36's PF-10 items for comparison. Subjects were from rheumatoid arthritis, osteoarthritis, and healthy aging cohorts (n = 1,100) and a national Internet sample of 21,133 subjects. We identified 1,860 items. After qualitative and quantitative evaluation, 124 newly developed PROMIS items composed the PROMIS item bank, which included revised Legacy items with good fit that met IRT model assumptions. Results showed that the clearest and best-understood items were simple, in the present tense, and straightforward. Basic tasks (like dressing) were more relevant and important than complex ones (like dancing). Revised HAQ-DI and PF-10 items with five response options had higher item-information content than did comparable original Legacy items with fewer response options.
IRT analyses showed that the Physical Function domain satisfied general criteria for unidimensionality with one-, two-, three-, and four-factor models having comparable model fits. Correlations between factors in the test data sets were > 0.90. Item improvement must underlie attempts to improve outcome assessment. The clear, personally important and relevant, ability-framed items in the PROMIS Physical Function item bank perform well in PRO assessment. They will benefit from further study and application in a wider variety of rheumatic diseases in diverse clinical groups, including those at the extremes of physical functioning, and in different administration modes.
Determining the Sensitivity of CAT-ASVAB (Computerized Adaptive Testing-Armed Services Vocational Aptitude Battery) Scores to Changes in Item Response Curves with the Medium of Administration

DTIC Science & Technology

1986-08-01

...most examinees. Therefore it appears psychometrically acceptable for the CAT-ASVAB project to proceed without item recalibration based on...MEMORANDUM DETERMINING THE SENSITIVITY OF CAT-ASVAB SCORES TO CHANGES IN ITEM RESPONSE CURVES WITH THE MEDIUM OF ADMINISTRATION D. R. Divgi...Subj: Center for Naval Analyses Research Memorandum 86-189 End: (1) CNA Research Memorandum 86-189, "Determining the Sensitivity of CAT-ASVAB
Development of the PROMIS positive emotional and sensory expectancies of smoking item banks.

PubMed

Tucker, Joan S; Shadel, William G; Edelen, Maria Orlando; Stucky, Brian D; Li, Zhen; Hansen, Mark; Cai, Li

2014-09-01

The positive emotional and sensory expectancies of cigarette smoking include improved cognitive abilities, positive affective states, and pleasurable sensorimotor sensations. This paper describes the development of Positive Emotional and Sensory Expectancies of Smoking item banks that will serve to standardize the assessment of this construct among daily and nondaily cigarette smokers. Data came from daily (N = 4,201) and nondaily (N = 1,183) smokers who completed an online survey. To identify a unidimensional set of items, we conducted item factor analyses, item response theory analyses, and differential item functioning analyses. Additionally, we evaluated the performance of fixed-item short forms (SFs) and computer adaptive tests (CATs) to efficiently assess the construct. Eighteen items were included in the item banks (15 common across daily and nondaily smokers, 1 unique to daily, 2 unique to nondaily). The item banks are strongly unidimensional, highly reliable (reliability = 0.95 for both), and perform similarly across gender, age, and race/ethnicity groups. A SF common to daily and nondaily smokers consists of 6 items (reliability = 0.86). Results from simulated CATs indicated that, on average, fewer than 8 items are needed to assess the construct with adequate precision using the item banks. These analyses identified a new set of items that can assess the positive emotional and sensory expectancies of smoking in a reliable and standardized manner. Considerable efficiency in assessing this construct can be achieved by using the item bank SF, employing computer adaptive tests, or selecting subsets of items tailored to specific research or clinical purposes. © The Author 2014. Published by Oxford University Press on behalf of the Society for Research on Nicotine and Tobacco. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
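The simulated CATs described here administer items one at a time. Each item is chosen to be maximally informative at the respondent's current trait estimate, and testing stops once precision is adequate. Below is a minimal Python sketch of that general loop. Several parts are assumptions, not details taken from the study:

- dichotomous 2PL items (the actual item banks may use polytomous models);
- maximum-information item selection;
- EAP scoring on a grid with a standard normal prior;
- a hypothetical stopping rule: standard error below 0.3, or 10 items.

The item bank and the simulated answer function are hypothetical.

```python
import math


def p_2pl(theta, a, b):
    """Probability of a positive response to a 2PL item."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_info(theta, a, b):
    """Fisher information of a 2PL item at theta."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)


def eap_theta(answered, bank, grid_size=161):
    """EAP estimate and posterior SD on a [-4, 4] grid with an N(0, 1) prior."""
    grid = [-4.0 + 8.0 * i / (grid_size - 1) for i in range(grid_size)]
    weights = []
    for t in grid:
        w = math.exp(-0.5 * t * t)  # unnormalized standard normal prior
        for idx, x in answered.items():
            p = p_2pl(t, *bank[idx])
            w *= p if x == 1 else 1.0 - p
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(grid, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(grid, weights)) / total
    return mean, math.sqrt(var)


def run_cat(bank, answer, max_items=10, se_target=0.3):
    """Administer items adaptively; answer(i) returns the 0/1 response to item i."""
    answered = {}  # item index -> response, in administration order
    theta, se = eap_theta(answered, bank)
    while len(answered) < max_items and se > se_target:
        remaining = [i for i in range(len(bank)) if i not in answered]
        if not remaining:
            break
        # Pick the unused item with the most information at the current estimate
        nxt = max(remaining, key=lambda i: item_info(theta, *bank[i]))
        answered[nxt] = answer(nxt)
        theta, se = eap_theta(answered, bank)
    return theta, se, list(answered)


# Hypothetical bank of (a, b) pairs and a respondent who endorses every item
bank = [(1.5, -2.0), (1.5, 0.0), (1.5, 2.0), (1.5, 1.0), (1.5, -1.0)]
print(run_cat(bank, answer=lambda i: 1, max_items=3))
```

The stopping rule is what trades test length for precision. A looser SE target or a bank with more discriminating items lets the loop stop after fewer items.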
Development of a new Rasch-based scoring algorithm for the National Eye Institute Visual Functioning Questionnaire to improve its interpretability.

PubMed

Petrillo, Jennifer; Bressler, Neil M; Lamoureux, Ecosse; Ferreira, Alberto; Cano, Stefan

2017-08-14

The NEI VFQ-25 has undergone psychometric evaluation in patients with varying ocular conditions and in the general population. However, important limitations that may affect the interpretation of clinical trial results have been previously identified, such as concerns with reliability and validity. The purpose of this study was to evaluate the National Eye Institute Visual Functioning Questionnaire (NEI VFQ-25) and make recommendations for a revised scoring structure, with a view to improving its psychometric performance and interpretability. Rasch Measurement Theory analyses were conducted in two stages using pooled baseline NEI VFQ-25 data for 2487 participants with retinal diseases enrolled in six clinical trials. In stage 1, we examined scale-to-sample targeting, thresholds for item response options, item fit statistics, stability, local dependence, and reliability. In stage 2, a post-hoc revision of the scoring structure (NEI VFQ-28-R) was created and psychometrically re-evaluated. In stage 1, we found that the NEI VFQ-25 was mis-targeted to the sample and had disordered response thresholds (15/25 items) and misfitting items (8/25 items). However, items appeared to be stable (differential item functioning for three items), to have minimal item dependency (one pair of items), and to have good reliability (person-separation index, 0.93). In stage 2, the modified Rasch-scored NEI VFQ-28-R was assessed. It comprised two broad domains: Activity Limitation (19 items) and Socio-Emotional Functioning (nine items). The NEI VFQ-28-R demonstrated improved performance with fewer disordered response thresholds (no items), less item misfit (three items) and improved population targeting (reduced ceiling effect) compared with the NEI VFQ-25. Compared with the original version, the proposed NEI VFQ-28-R, with Rasch-based scoring and a two-domain structure, appears to offer improved psychometric performance and interpretability of the vision-related quality of life scale for the population analysed.
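Disordered thresholds are easiest to see in the category probabilities of a polytomous Rasch model. The sketch below is an illustration only, assuming the partial credit model parameterization with item thresholds delta_1, ..., delta_m on the logit scale; the function names and threshold values are hypothetical and nothing here reproduces the VFQ analysis. Thresholds count as disordered when they fail to increase with category number, which means some middle category is never the most probable response.

```python
import math
from typing import List


def pcm_probs(theta: float, thresholds: List[float]) -> List[float]:
    """Category probabilities (categories 0..m) under the partial credit model.

    The log-numerator for category k is sum_{j<=k} (theta - delta_j);
    for category 0 the sum is empty, so its log-numerator is 0.
    """
    log_num = [0.0]
    for delta in thresholds:
        log_num.append(log_num[-1] + (theta - delta))
    top = max(log_num)  # subtract the maximum for numerical stability
    expd = [math.exp(v - top) for v in log_num]
    total = sum(expd)
    return [v / total for v in expd]


def has_disordered_thresholds(thresholds: List[float]) -> bool:
    """True if any threshold is not larger than the one before it."""
    return any(later <= earlier for earlier, later in zip(thresholds, thresholds[1:]))


# Hypothetical thresholds: the first set is disordered, the second is ordered.
print(has_disordered_thresholds([0.5, -0.5, 1.0]))  # True
print(has_disordered_thresholds([-1.0, 0.5, 1.5]))  # False
print([round(p, 3) for p in pcm_probs(0.0, [-1.0, 1.0])])  # [0.212, 0.576, 0.212]
```

At a person location equal to a threshold, the two adjacent categories are equally likely; with reversed thresholds that crossing happens in the wrong order.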
Slower is not always better: Response-time evidence clarifies the limited role of miserly information processing in the Cognitive Reflection Test

PubMed Central

Pitchford, Melanie; Ball, Linden J.; Hunt, Thomas E.; Steel, Richard

2017-01-01

We report a study examining the role of ‘cognitive miserliness’ as a determinant of poor performance on the standard three-item Cognitive Reflection Test (CRT). The cognitive miserliness hypothesis proposes that people often respond incorrectly on CRT items because of an unwillingness to go beyond default, heuristic processing and invest time and effort in analytic, reflective processing. Our analysis (N = 391) focused on people’s response times to CRT items to determine whether predicted associations are evident between miserly thinking and the generation of incorrect, intuitive answers. Evidence indicated only a weak correlation between CRT response times and accuracy. Item-level analyses also failed to demonstrate predicted response-time differences between correct analytic and incorrect intuitive answers for two of the three CRT items. We question whether participants who give incorrect intuitive answers on the CRT can legitimately be termed cognitive misers and whether the three CRT items measure the same general construct. PMID:29099840
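A correlation between response time and accuracy, like the one reported above, can be computed as a point-biserial correlation, which is simply a Pearson correlation in which one variable is coded 0/1. The sketch below uses hypothetical data and is not the authors' analysis; in practice response times are often log-transformed first because their distribution is skewed.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; with y coded 0/1 this is the point-biserial r."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical data: seconds spent on one item and whether the answer was correct.
seconds = [10.0, 20.0, 30.0, 40.0]
correct = [0, 0, 1, 1]
log_seconds = [math.log(t) for t in seconds]
print(round(pearson_r(seconds, correct), 3))  # 0.894
print(round(pearson_r(log_seconds, correct), 3))
```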
Development of the PROMIS nicotine dependence item banks.

PubMed

Shadel, William G; Edelen, Maria Orlando; Tucker, Joan S; Stucky, Brian D; Hansen, Mark; Cai, Li

2014-09-01

Nicotine dependence is a core construct important for understanding cigarette smoking and smoking cessation behavior. This article describes analyses conducted to develop and evaluate item banks for assessing nicotine dependence among daily and nondaily smokers. Using data from a sample of daily (N = 4,201) and nondaily (N = 1,183) smokers, we conducted a series of item factor analyses, item response theory analyses, and differential item functioning analyses (according to gender, age, and race/ethnicity) to arrive at a unidimensional set of nicotine dependence items for daily and nondaily smokers. We also evaluated performance of short forms (SFs) and computer adaptive tests (CATs) to efficiently assess dependence. A total of 32 items were included in the Nicotine Dependence item banks; 22 items are common across daily and nondaily smokers, 5 are unique to daily smokers, and 5 are unique to nondaily smokers. For both daily and nondaily smokers, the Nicotine Dependence item banks are strongly unidimensional, highly reliable (reliability = 0.97 for both), and perform similarly across gender, age, and race/ethnicity groups. SFs common to daily and nondaily smokers consist of 8 and 4 items (reliability = 0.91 and 0.81, respectively). Results from simulated CATs showed that dependence can be assessed with very good precision for most respondents using fewer than 6 items adaptively selected from the item banks. Nicotine dependence on cigarettes can be assessed on the basis of these item banks via one of the SFs, by using CATs, or through a tailored set of items selected for a specific research purpose. © The Author 2014. Published by Oxford University Press on behalf of the Society for Research on Nicotine and Tobacco. All rights reserved.
Development of the PROMIS negative psychosocial expectancies of smoking item banks.

PubMed

Stucky, Brian D; Edelen, Maria Orlando; Tucker, Joan S; Shadel, William G; Cerully, Jennifer; Kuhfeld, Megan; Hansen, Mark; Cai, Li

2014-09-01

Negative psychosocial expectancies of smoking include aspects of social disapproval and disappointment in oneself. This paper describes analyses conducted to develop and evaluate item banks for assessing psychosocial expectancies among daily and nondaily smokers. Using data from a sample of daily (N = 4,201) and nondaily (N = 1,183) smokers, we conducted a series of item factor analyses, item response theory analyses, and differential item functioning analyses (according to gender, age, and race/ethnicity) to arrive at a unidimensional set of psychosocial expectancies items for daily and nondaily smokers. We also evaluated performance of short forms (SFs) and computer adaptive tests (CATs) to efficiently assess psychosocial expectancies. A total of 21 items were included in the Psychosocial Expectancies item banks: 14 items are common across daily and nondaily smokers, 6 are unique to daily, and 1 is unique to nondaily. For both daily and nondaily smokers, the Psychosocial Expectancies item banks are strongly unidimensional, highly reliable (reliability = 0.95 and 0.93, respectively), and perform similarly across gender, age, and race/ethnicity groups. An SF common to daily and nondaily smokers consists of 6 items (reliability = 0.85). Results from simulated CATs showed that, on average, fewer than 8 items are needed to assess psychosocial expectancies with adequate precision when using the item banks. Psychosocial expectancies of smoking can be assessed on the basis of these item banks via the SF, by using CAT, or through a tailored set of items selected for a specific research purpose. © The Author 2014. Published by Oxford University Press on behalf of the Society for Research on Nicotine and Tobacco. All rights reserved.
Evaluation properties of the French version of the OUT-PATSAT35 satisfaction with care questionnaire according to classical and item response theory analyses.

PubMed

Panouillères, M; Anota, A; Nguyen, T V; Brédart, A; Bosset, J F; Monnier, A; Mercier, M; Hardouin, J B

2014-09-01

The present study investigates, using classical test theory (CTT) and item response theory (IRT), the properties of the French version of the OUT-PATSAT35 questionnaire, which evaluates outpatients' satisfaction with care in oncology. This cross-sectional multicenter study included 692 patients who completed the questionnaire at the end of their ambulatory treatment. CTT analyses tested the main psychometric properties (convergent and divergent validity, and internal consistency). IRT analyses were conducted separately for each OUT-PATSAT35 domain (the doctors, the nurses or the radiation therapists, and the services/organization) using models from the Rasch family. We examined the fit of the data to the model expectations and tested whether the model assumptions of unidimensionality, monotonicity and local independence were respected. A total of 605 (87.4%) respondents, with a mean age of 64 years (range 29-88), were analyzed. Internal consistency for all scales separately and for the three main domains was good (Cronbach's α 0.74-0.98). IRT analyses were performed with the partial credit model. No disordered thresholds of polytomous items were found. Each domain showed high reliability but fitted the Rasch models poorly. Three items in particular, the item about "promptness" in the doctors' domain and the items about "accessibility" and "environment" in the services/organization domain, showed the greatest lack of fit. An acceptable fit to the Rasch model can be obtained by dropping these items. Most of the local dependence concerned items about "information provided" in each domain. A major deviation from unidimensionality was found in the nurses' domain. CTT showed good psychometric properties of the OUT-PATSAT35. However, the Rasch analysis revealed some misfitting and redundant items. Taking these problems into consideration, the questionnaire could usefully be refined in a future study.
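The internal-consistency figures reported in this abstract come from Cronbach's alpha, alpha = k/(k - 1) * (1 - sum of item variances / variance of total scores). The sketch below implements that standard formula for a complete respondents-by-items matrix with no missing responses; the ratings are hypothetical.

```python
from typing import List, Sequence


def sample_variance(xs: Sequence[float]) -> float:
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


def cronbach_alpha(scores: List[List[float]]) -> float:
    """scores[i][j] is respondent i's answer to item j (no missing values)."""
    k = len(scores[0])
    item_vars = [sample_variance([row[j] for row in scores]) for j in range(k)]
    total_var = sample_variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical 4-point ratings from five respondents on three items.
ratings = [[4, 4, 3], [3, 3, 3], [2, 2, 1], [1, 2, 1], [4, 3, 4]]
print(round(cronbach_alpha(ratings), 3))  # 0.927
```

Because the same variance denominator is used for items and totals, the choice of n - 1 versus n does not change alpha.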
Perception that "everything requires a lot of effort": transcultural SCL-25 item validation.

PubMed

Moreau, Nicolas; Hassan, Ghayda; Rousseau, Cécile; Chenguiti, Khalid

2009-09-01

This brief report illustrates how the migration context can affect the validity of specific items in mental health measures. The SCL-25 was administered to 432 recently settled immigrants (220 Haitians and 212 Arabs). We performed descriptive analyses as well as Infit and Outfit statistical analyses using the WINSTEPS Rasch measurement software, which is based on item response theory. The participants' comments about the SCL-25 item "You feel everything requires a lot of effort" were also qualitatively analyzed. Results revealed that this item is an outlier and does not fit in an expected and valid fashion with the other items in its cluster, as it is over-endorsed by healthy Haitian and Arab participants. Our study thus shows that, in transcultural mental health research, the cultural and migratory contexts may interact and significantly influence the meaning of some symptom items and, consequently, the validity of symptom scales.
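Infit and outfit are mean-square residual statistics. A minimal sketch for a single item follows, assuming dichotomous (0/1) responses for simplicity and person and item locations that are already estimated. It uses the textbook definitions (outfit is the unweighted mean of squared standardized residuals; infit weights squared residuals by their model variance) and does not reproduce WINSTEPS' estimation or its standardized fit statistics. Values near 1 indicate responses about as noisy as the model expects.

```python
import math
from typing import Sequence, Tuple


def rasch_prob(theta: float, b: float) -> float:
    """Probability of a 1 under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def item_fit(responses: Sequence[int], thetas: Sequence[float], b: float) -> Tuple[float, float]:
    """Return (infit, outfit) mean squares for one item with difficulty b."""
    sq_residuals, variances, z_squared = [], [], []
    for x, theta in zip(responses, thetas):
        p = rasch_prob(theta, b)
        var = p * (1 - p)           # model variance of the response
        sq = (x - p) ** 2           # squared raw residual
        sq_residuals.append(sq)
        variances.append(var)
        z_squared.append(sq / var)  # squared standardized residual
    outfit = sum(z_squared) / len(z_squared)    # unweighted: sensitive to outliers
    infit = sum(sq_residuals) / sum(variances)  # information-weighted
    return infit, outfit


# Hypothetical: a person 3 logits above the item who nevertheless answers 0
# contributes exp(3), about 20.1, to the outfit mean square.
print(item_fit([0], [3.0], 0.0))
```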
Examining the Impact of Unscorable Item Responses on the Validity and Interpretability of MMPI-2/MMPI-2-RF Restructured Clinical (RC) Scale Scores

ERIC Educational Resources Information Center

Dragon, Wendy R.; Ben-Porath, Yossef S.; Handel, Richard W.

2012-01-01

This article examined the impact of unscorable item responses on the psychometric validity and practical interpretability of scores on the Restructured Clinical (RC) Scales of the Minnesota Multiphasic Personality Inventory-2/Minnesota Multiphasic Personality Inventory-2-Restructured Form (MMPI-2/MMPI-2-RF). In analyses conducted with five…
The Shortened Raven Standard Progressive Matrices: Item Response Theory-Based Psychometric Analyses and Normative Data

ERIC Educational Resources Information Center

Van der Elst, Wim; Ouwehand, Carolijn; van Rijn, Peter; Lee, Nikki; Van Boxtel, Martin; Jolles, Jelle

2013-01-01

The purpose of the present study was to evaluate the psychometric properties of a shortened version of the Raven Standard Progressive Matrices (SPM) under an item response theory framework (the one- and two-parameter logistic models). The shortened Raven SPM was administered to N = 453 cognitively healthy adults aged between 24 and 83 years. The…
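The one- and two-parameter logistic models named above differ only in whether items share a common discrimination. The sketch below gives the 2PL item response function and its Fisher information; the 1PL is the special case with the same discrimination for every item. It uses the plain logistic metric without the 1.7 scaling constant that some software applies, and the parameter values are hypothetical, not estimates from the Raven SPM study.

```python
import math


def p_2pl(theta: float, a: float, b: float) -> float:
    """P(correct | theta) under the 2PL: logistic in a * (theta - b)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def info_2pl(theta: float, a: float, b: float) -> float:
    """Fisher information of a 2PL item: a^2 * P * (1 - P)."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)


def p_1pl(theta: float, b: float, a_common: float = 1.0) -> float:
    """1PL: every item shares the same discrimination a_common."""
    return p_2pl(theta, a_common, b)


# At theta == b the success probability is 0.5 and information peaks at a^2 / 4.
for a in (0.8, 2.0):
    print(a, p_2pl(0.0, a, 0.0), info_2pl(0.0, a, 0.0))
```

Higher discrimination makes an item more informative near its difficulty and less informative far from it, which is one reason the two models can rank items differently.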
Measuring the ICF components of impairment, activity limitation and participation restriction: an item analysis using classical test theory and item response theory

PubMed Central

Pollard, Beth; Dixon, Diane; Dieppe, Paul; Johnston, Marie

2009-01-01

Background The International Classification of Functioning, Disability and Health (ICF) proposes three main health outcomes, Impairment (I), Activity Limitation (A) and Participation Restriction (P), but good measures of these constructs are needed. The aim of this study was to use both Classical Test Theory (CTT) and Item Response Theory (IRT) methods to carry out an item analysis to improve measurement of these three components in patients having joint replacement surgery mainly for osteoarthritis (OA). Methods A geographical cohort of patients about to undergo lower limb joint replacement was invited to participate. Five hundred and twenty-four patients completed ICF items that had been previously identified as measuring only a single ICF construct in patients with osteoarthritis. There were 13 I, 26 A and 20 P items. The SF-36 was used to explore the construct validity of the resultant I, A and P measures. The CTT and IRT analyses were run separately to identify items for inclusion or exclusion in the measurement of each construct. The results from both analyses were compared and contrasted. Results Overall, the item analysis resulted in the removal of 4 I items, 9 A items and 11 P items. CTT and IRT identified the same 14 items for removal, with CTT additionally excluding 3 items, and IRT a further 7 items. In a preliminary exploration of reliability and validity, the new measures appeared acceptable. Conclusion New measures were developed that reflect the ICF components of Impairment, Activity Limitation and Participation Restriction for patients with advanced arthritis. The resulting Aberdeen IAP measures (Ab-IAP) comprising I (Ab-I, 9 items), A (Ab-A, 17 items), and P (Ab-P, 9 items) met the criteria of conventional psychometric (CTT) analyses and the additional criteria (information and discrimination) of IRT. The use of both methods was more informative than the use of only one of these methods. Thus, combining CTT and IRT appears to be a valuable tool in the development of measures. PMID:19422677
Missing data in FFQs: making assumptions about item non-response.

PubMed

Lamb, Karen E; Olstad, Dana Lee; Nguyen, Cattram; Milte, Catherine; McNaughton, Sarah A

2017-04-01

FFQs are a popular method of capturing dietary information in epidemiological studies and may be used to derive dietary exposures such as nutrient intake or overall dietary patterns and diet quality. As FFQs can involve large numbers of questions, participants may fail to respond to all questions, leaving researchers to decide how to deal with missing data when deriving intake measures. The aim of the present commentary is to discuss the current practice for dealing with item non-response in FFQs and to propose a research agenda for reporting and handling missing data in FFQs. Single imputation techniques, such as zero imputation (assuming no consumption of the item) or mean imputation, are commonly used to deal with item non-response in FFQs. However, single imputation methods make strong assumptions about the missing data mechanism and do not reflect the uncertainty created by the missing data. This can lead to incorrect inference about associations between diet and health outcomes. Although the use of multiple imputation methods in epidemiology has increased, these have seldom been used in the field of nutritional epidemiology to address missing data in FFQs. We discuss methods for dealing with item non-response in FFQs, highlighting the assumptions made under each approach. Researchers analysing FFQs should ensure that missing data are handled appropriately and clearly report how missing data were treated in analyses. Simulation studies are required to enable systematic evaluation of the utility of various methods for handling item non-response in FFQs under different assumptions about the missing data mechanism.
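The two single-imputation rules the commentary mentions can be written in a few lines, which makes their assumptions explicit. The sketch below uses hypothetical weekly servings for one FFQ item across respondents, with None marking non-response. Zero imputation assumes a skipped item means no consumption; mean imputation assumes non-responders eat like responders. Both fill each gap with a single value, so neither carries the extra uncertainty that multiple imputation would represent.

```python
from typing import List, Optional


def zero_impute(values: List[Optional[float]]) -> List[float]:
    """Treat every missing answer as 'never consumed'."""
    return [0.0 if v is None else v for v in values]


def mean_impute(values: List[Optional[float]]) -> List[float]:
    """Replace every missing answer with the mean of the observed answers."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


servings = [2.0, None, 4.0, None, 6.0]  # hypothetical weekly servings
zeros = zero_impute(servings)
means = mean_impute(servings)
print(zeros, sum(zeros) / len(zeros))  # [2.0, 0.0, 4.0, 0.0, 6.0] 2.4
print(means, sum(means) / len(means))  # [2.0, 4.0, 4.0, 4.0, 6.0] 4.0
```

Even in this toy example the two rules give different mean intakes (2.4 versus 4.0), which is why the choice of method matters and needs to be reported.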
Development of the PROMIS coping expectancies of smoking item banks.

PubMed

Shadel, William G; Edelen, Maria Orlando; Tucker, Joan S; Stucky, Brian D; Hansen, Mark; Cai, Li

2014-09-01

Smoking is a coping strategy for many smokers who then have difficulty finding new ways to cope with negative affect when they quit. This paper describes analyses conducted to develop and evaluate item banks for assessing the coping expectancies of smoking for daily and nondaily smokers. Using data from a large sample of daily (N = 4,201) and nondaily (N = 1,183) smokers, we conducted a series of item factor analyses, item response theory analyses, and differential item functioning (DIF) analyses (according to gender, age, and ethnicity) to arrive at a unidimensional set of items for daily and nondaily smokers. We also evaluated performance of short forms (SFs) and computer adaptive tests (CATs) for assessing coping expectancies of smoking. For both daily and nondaily smokers, the unidimensional Coping Expectancies item banks (21 items) are relatively DIF free and are highly reliable (0.96 and 0.97, respectively). A common 4-item SF for daily and nondaily smokers also showed good reliability (0.85). Adaptive tests required an average of 4.3 and 3.7 items for simulated daily and nondaily respondents, respectively, and achieved reliabilities of 0.91 for both when the maximum test length was 10 items. This research provides a new set of items that can be used to reliably assess coping expectancies of smoking, through a SF, CAT, or a tailored set selected for a specific research purpose. © The Author 2014. Published by Oxford University Press on behalf of the Society for Research on Nicotine and Tobacco. All rights reserved.
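The simulated CATs described here administer items adaptively until a precision target or a maximum test length is reached. The sketch below illustrates that loop under simplifying assumptions that are not taken from the paper: dichotomous 2PL items (PROMIS item banks typically use multi-category rating scales), maximum-information item selection, EAP scoring with a standard normal prior on a fixed grid, and a stopping rule of posterior SD at most 0.3 or 10 items. The 0.3 target is chosen only because 1 - 0.3^2 = 0.91 under the common convention that theta has unit variance; the item bank is hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

GRID = [-4.0 + 0.1 * i for i in range(81)]  # quadrature points from -4 to 4


def p_2pl(theta: float, a: float, b: float) -> float:
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def information(theta: float, a: float, b: float) -> float:
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)


def eap(answered: Sequence[Tuple[float, float, int]]) -> Tuple[float, float]:
    """Posterior mean and SD of theta, standard normal prior, on GRID."""
    weights = []
    for t in GRID:
        w = math.exp(-0.5 * t * t)  # unnormalized N(0, 1) prior density
        for a, b, x in answered:
            p = p_2pl(t, a, b)
            w *= p if x == 1 else 1.0 - p
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, weights)) / total
    return mean, math.sqrt(var)


def simulate_cat(bank: List[Tuple[float, float]], true_theta: float,
                 se_target: float = 0.3, max_items: int = 10,
                 seed: int = 0) -> Tuple[float, float, int]:
    """Return (theta estimate, posterior SD, number of items administered)."""
    rng = random.Random(seed)
    unused = set(range(len(bank)))
    answered: List[Tuple[float, float, int]] = []
    theta_hat, se = eap(answered)  # prior mean and SD before any item
    while unused and len(answered) < max_items and se > se_target:
        # pick the unused item with maximum information at the current estimate
        best = max(unused, key=lambda i: information(theta_hat, *bank[i]))
        unused.remove(best)
        a, b = bank[best]
        x = 1 if rng.random() < p_2pl(true_theta, a, b) else 0  # simulated answer
        answered.append((a, b, x))
        theta_hat, se = eap(answered)
    return theta_hat, se, len(answered)


bank = [(1.8, -2.0 + 0.25 * i) for i in range(17)]  # hypothetical item bank
print(simulate_cat(bank, true_theta=0.5))
```

Running many simulated respondents through `simulate_cat` and averaging the item counts would give a figure comparable in spirit to the average test lengths reported above, though not to the paper's actual numbers.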
Rasch analysis of the Mini-Mental Adjustment to Cancer Scale (mini-MAC) among a heterogeneous sample of long-term cancer survivors: A cross-sectional study

PubMed Central

2012-01-01

Background The mini-Mental Adjustment to Cancer Scale (mini-MAC) is a well-recognised, popular measure of coping in psycho-oncology and assesses five cancer-specific coping strategies. It has been suggested that these five subscales could be grouped to form over-arching adaptive and maladaptive coping subscales to facilitate the interpretation and clinical application of the scale. Despite the popularity of the mini-MAC, few studies have examined its psychometric properties among long-term cancer survivors, and further validation of the mini-MAC is needed to substantiate its use with the growing population of survivors. Therefore, this study examined the psychometric properties and dimensionality of the mini-MAC in a sample of long-term cancer survivors using Rasch analysis. Methods RUMM 2030 was used to analyse the mini-MAC data (n = 851). Separate Rasch analyses were conducted for each of the original mini-MAC subscales as well as the over-arching adaptive and maladaptive coping subscales to examine summary and individual model fit statistics, person separation index (PSI), response format, local dependency, targeting, item bias (or differential item functioning, DIF), and dimensionality. Results For the fighting spirit, fatalism, and helplessness-hopelessness subscales, a revised three-point response format appeared better suited than the original four-point response format. To achieve model fit, items were deleted from four of the five subscales: Anxious Preoccupation items 7, 25, and 29; Cognitive Avoidance items 11 and 17; Fighting Spirit item 18; and Helplessness-Hopelessness items 16 and 20. For those subscales with sufficient items, analyses supported unidimensionality. Combining items to form the adaptive and maladaptive subscales was partially supported. Conclusions The original five subscales required item deletion and/or rescaling to improve goodness of fit to the Rasch model. While evidence was found for over-arching subscales of adaptive and maladaptive coping, extensive modifications were necessary to achieve this result. Further exploration and validation of over-arching subscales assessing adaptive and maladaptive coping is necessary with cancer survivors. PMID:22607052
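The person separation index (PSI) reported by RUMM is a reliability-like statistic computed from person location estimates and their standard errors. A minimal sketch follows, assuming the common definition PSI = (observed variance of person estimates - mean error variance) / observed variance; the estimates are hypothetical, and RUMM's own computation may differ in details such as the variance denominator.

```python
from typing import Sequence


def person_separation_index(thetas: Sequence[float], ses: Sequence[float]) -> float:
    """(variance of person estimates - mean squared SE) / variance of estimates."""
    n = len(thetas)
    mean = sum(thetas) / n
    observed_var = sum((t - mean) ** 2 for t in thetas) / (n - 1)
    error_var = sum(se ** 2 for se in ses) / n
    return (observed_var - error_var) / observed_var


# Hypothetical person locations (logits) and their standard errors.
print(person_separation_index([-1.0, 0.0, 1.0], [0.5, 0.5, 0.5]))  # 0.75
```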
A 67-Item Stress Resilience item bank showing high content validity was developed in a psychosomatic sample.

PubMed

Obbarius, Nina; Fischer, Felix; Obbarius, Alexander; Nolte, Sandra; Liegl, Gregor; Rose, Matthias

2018-04-10

To develop the first item bank to measure Stress Resilience (SR) in clinical populations. Qualitative item development resulted in an initial pool of 131 items covering a broad theoretical SR concept. These items were tested in n = 521 patients at a psychosomatic outpatient clinic. Exploratory and Confirmatory Factor Analysis (CFA), as well as other state-of-the-art item analyses and IRT, were used for item evaluation and calibration of the final item bank. Of the initial pool of 131 items, we excluded 64 (54 for factor loadings < .5, 4 for residual correlations > .3, 2 for non-discriminative item response curves, and 4 for differential item functioning). The final set of 67 items indicated sufficient model fit in CFA and IRT analyses. Additionally, a 10-item short form with high measurement precision (SE ≤ .32 in a theta range between -1.8 and +1.5) was derived. Both the SR item bank and the SR short form were highly correlated with an existing static legacy tool (Connor-Davidson Resilience Scale). The final SR item bank and 10-item short form showed good psychometric properties. When further validated, they will be ready to be used within a framework of Computer-Adaptive Tests for a comprehensive assessment of the stress construct. Copyright © 2018. Published by Elsevier Inc.
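The SE ≤ .32 criterion can be translated into a reliability-like value. Assuming the convention, common in item-bank work, that theta is scaled to unit variance in the reference population, reliability at a given theta is approximately one minus the squared standard error:

```latex
\rho(\theta) \approx 1 - \frac{SE(\theta)^2}{\sigma_\theta^2},
\qquad \sigma_\theta^2 = 1
\;\Rightarrow\;
\rho \approx 1 - 0.32^2 = 1 - 0.1024 = 0.8976 \approx 0.90
```

Under that scaling assumption, the short form's precision target corresponds to a reliability of roughly 0.90 across the stated theta range.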

Development and Initial Validation of the Medical Fear Survey-Short Version

ERIC Educational Resources Information Center

Olatunji, Bunmi O.; Ebesutani, Chad; Sawchuk, Craig N.; McKay, Dean; Lohr, Jeffrey M.; Kleinknecht, Ronald A.

2012-01-01

The present investigation employs item response theory (IRT) to develop an abbreviated Medical Fear Survey (MFS). Application of IRT analyses in Study 1 (n = 931) to the original 50-item MFS resulted in a 25-item shortened version. Examination of the location parameters also resulted in a reduction of the Likert-type scaling of the MFS by removing…
Bifactor and Item Response Theory Analyses of Interviewer Report Scales of Cognitive Impairment in Schizophrenia

ERIC Educational Resources Information Center

Reise, Steven P.; Ventura, Joseph; Keefe, Richard S. E.; Baade, Lyle E.; Gold, James M.; Green, Michael F.; Kern, Robert S.; Mesholam-Gately, Raquelle; Nuechterlein, Keith H.; Seidman, Larry J.; Bilder, Robert

2011-01-01

A psychometric analysis of 2 interview-based measures of cognitive deficits was conducted: the 21-item Clinical Global Impression of Cognition in Schizophrenia (CGI-CogS; Ventura et al., 2008), and the 20-item Schizophrenia Cognition Rating Scale (SCoRS; Keefe et al., 2006), which were administered on 2 occasions to a sample of people with…
Assessment of Computer and Information Literacy in ICILS 2013: Do Different Item Types Measure the Same Construct?

ERIC Educational Resources Information Center

Ihme, Jan Marten; Senkbeil, Martin; Goldhammer, Frank; Gerick, Julia

2017-01-01

The combination of different item formats is found quite often in large-scale assessments, and analyses of dimensionality often indicate that tests are multidimensional with respect to task format. In ICILS 2013, three different item types (information-based response tasks, simulation tasks, and authoring tasks) were used to measure computer and…
Developing the Communicative Participation Item Bank: Rasch Analysis Results From a Spasmodic Dysphonia Sample

PubMed Central

Baylor, Carolyn R.; Yorkston, Kathryn M.; Eadie, Tanya L.; Miller, Robert M.; Amtmann, Dagmar

2011-01-01

Purpose The purpose of this study was to conduct the initial psychometric analyses of the Communicative Participation Item Bank—a new self-report instrument designed to measure the extent to which communication disorders interfere with communicative participation. This item bank is intended for community-dwelling adults across a range of communication disorders. Method A set of 141 candidate items was administered to 208 adults with spasmodic dysphonia. Participants rated the extent to which their condition interfered with participation in various speaking communication situations. Questionnaires were administered online or in a paper version per participant preference. Participants also completed the Voice Handicap Index (B. H. Jacobson et al., 1997) and a demographic questionnaire. Rasch analyses were conducted using Winsteps software (J. M. Linacre, 1991). Results The results show that items functioned better when the 5-category response format was recoded to a 4-category format. After removing 8 items that did not fit the Rasch model, the remaining 133 items demonstrated strong evidence of sufficient unidimensionality, with the model accounting for 89.3% of variance. Item location values ranged from −2.73 to 2.20 logits. Conclusions Preliminary Rasch analyses of the Communicative Participation Item Bank show strong psychometric properties. Further testing in populations with other communication disorders is needed. PMID:19717652
The Divergent Meanings of Life Satisfaction: Item Response Modeling of the Satisfaction with Life Scale in Greenland and Norway

ERIC Educational Resources Information Center

Vitterso, Joar; Biswas-Diener, Robert; Diener, Ed

2005-01-01

Cultural differences in responses to the Satisfaction With Life Scale (SWLS) items are investigated. Data were fit to a mixed Rasch model in order to identify latent classes of participants in a combined sample of Norwegians (N = 461) and Greenlanders (N = 180). Initial analyses showed no mean difference in life satisfaction between the two…
Practical Implications of Test Dimensionality for Item Response Theory Calibration of the Medical College Admission Test. MCAT Monograph.

ERIC Educational Resources Information Center

Childs, Ruth A.; Oppler, Scott H.

The use of item response theory (IRT) in the Medical College Admission Test (MCAT) testing program has been limited. This study provides a basis for future IRT analyses of the MCAT by exploring the dimensionality of each of the MCAT's three multiple-choice test sections (Verbal Reasoning, Physical Sciences, and Biological Sciences) and the…
Adjusting the Adjusted χ²/df Ratio Statistic for Dichotomous Item Response Theory Analyses: Does the Model Fit?

ERIC Educational Resources Information Center

Tay, Louis; Drasgow, Fritz

2012-01-01

Two Monte Carlo simulation studies investigated the effectiveness of the mean adjusted χ²/df statistic proposed by Drasgow and colleagues and, because of problems with the method, a new approach for assessing the goodness of fit of an item response theory model was developed. It has been previously recommended that mean adjusted…
Item response analysis of the Positive and Negative Syndrome Scale

PubMed Central

Santor, Darcy A; Ascher-Svanum, Haya; Lindenmayer, Jean-Pierre; Obenchain, Robert L

2007-01-01

Background Statistical models based on item response theory were used to examine (a) the performance of individual Positive and Negative Syndrome Scale (PANSS) items and their options, (b) the effectiveness of various subscales in discriminating among individual differences in symptom severity, and (c) the appropriateness of cutoff scores recently recommended by Andreasen and her colleagues (2005) to establish symptom remission. Methods Option characteristic curves were estimated using a nonparametric item response model to examine the probability of endorsing each of 7 options within each of 30 PANSS items as a function of standardized, overall symptom severity. Our data were baseline PANSS scores from 9205 patients with schizophrenia or schizoaffective disorder who were enrolled between 1995 and 2003 in either a large, naturalistic, observational study or in 1 of 12 randomized, double-blind clinical trials comparing olanzapine with other antipsychotic drugs. Results Our analyses show that the majority of items forming the Positive and Negative subscales of the PANSS perform very well. We also identified key areas for improvement or revision in items and options within the General Psychopathology subscale. The Positive and Negative subscale scores are not only more discriminating of individual differences in symptom severity than the General Psychopathology subscale score, but are also more efficient on average than the 30-item total score. Of the 8 items recently recommended to establish symptom remission, 1 performed markedly differently from the 7 others and should either be deleted or rescored, requiring that patients achieve a lower score of 2 (rather than 3) to signal remission. Conclusion This first item response analysis of the PANSS supports its sound psychometric properties; most PANSS items were either very good or good at assessing overall severity of illness. These analyses did identify some items which might be further improved for measuring individual severity differences or for defining remission thresholds. Findings also suggest that the Positive and Negative subscales are more sensitive to change than the PANSS total score and, thus, may constitute a "mini PANSS" that may be more reliable, require shorter administration and training time, and possibly reduce sample sizes needed for future research. PMID:18005449
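The option characteristic curves in this study were estimated nonparametrically. One common nonparametric approach, sketched here as an assumption rather than a description of the authors' software, is Gaussian kernel (Nadaraya-Watson) smoothing: at each point on a standardized severity scale, the curve is a weighted proportion of patients who chose the option, with weights that shrink as severity moves away from that point. The bandwidth of 0.3 and the data are hypothetical.

```python
import math
from typing import List, Sequence


def kernel_option_curve(severity: Sequence[float], chose: Sequence[int],
                        grid: Sequence[float], bandwidth: float = 0.3) -> List[float]:
    """Smoothed P(option chosen | severity) at each grid point.

    severity: standardized overall severity per patient (z-scores).
    chose:    1 if the patient picked this option on the item, else 0.
    Grid points should lie within the range of the data; far outside it
    every weight underflows to zero.
    """
    curve = []
    for g in grid:
        weights = [math.exp(-0.5 * ((s - g) / bandwidth) ** 2) for s in severity]
        curve.append(sum(w * c for w, c in zip(weights, chose)) / sum(weights))
    return curve


# Hypothetical: the option is chosen mostly by more severely ill patients.
z = [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5]
picked = [0, 0, 0, 1, 0, 1, 1]
print([round(p, 2) for p in kernel_option_curve(z, picked, [-1.0, 0.0, 1.0])])
```

Repeating this for every option of an item gives a set of curves whose shapes show whether higher options really are chosen by more severely ill patients.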
A semi-parametric within-subject mixture approach to the analyses of responses and response times.

PubMed

Molenaar, Dylan; Bolsinova, Maria; Vermunt, Jeroen K

2018-05-01

In item response theory, modelling the item response times in addition to the item responses may improve the detection of possible between- and within-subject differences in the process that resulted in the responses. For instance, if respondents rely on rapid guessing on some items but not on all, the joint distribution of the responses and response times will be a multivariate within-subject mixture distribution. Suitable parametric methods to detect these within-subject differences have been proposed. In these approaches, a distribution needs to be assumed for the within-class response times. In this paper, it is demonstrated that these parametric within-subject approaches may produce false positives and biased parameter estimates if the assumption concerning the response time distribution is violated. A semi-parametric approach is proposed that uses categorized response times. This approach is shown to produce hardly any false positives or parameter bias. In addition, the semi-parametric approach results in approximately the same power as the parametric approach. © 2017 The British Psychological Society.
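The semi-parametric approach works with categorized rather than raw response times. The abstract does not say how the categories are formed; a simple hypothetical choice, shown here only as a preprocessing illustration, is a within-item median split into "fast" (0) and "slow" (1), so that each item's own time intensity is respected. The mixture model itself is not implemented.

```python
from statistics import median
from typing import List


def categorize_response_times(rt: List[List[float]]) -> List[List[int]]:
    """rt[i][j] = seconds person i spent on item j; return 1 = slow, 0 = fast.

    Each item is split at its own median, so a long item is not
    automatically 'slow' for everyone.
    """
    n_items = len(rt[0])
    item_medians = [median([row[j] for row in rt]) for j in range(n_items)]
    return [[1 if row[j] > item_medians[j] else 0 for j in range(n_items)] for row in rt]


times = [[3.1, 12.0], [5.4, 30.5], [9.8, 18.2]]  # hypothetical seconds
print(categorize_response_times(times))  # [[0, 0], [0, 1], [1, 0]]
```

Categorizing trades some information for robustness: no particular response-time distribution has to be assumed within each latent class.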
Analysing task design and students' responses to context-based problems through different analytical frameworks

NASA Astrophysics Data System (ADS)

Broman, Karolina; Bernholt, Sascha; Parchmann, Ilka

2015-05-01

Background: Context-based learning approaches are used to enhance students' interest in, and knowledge about, science. According to different empirical studies, students' interest is improved by applying these less conventional approaches, while effects on learning outcomes are less coherent. Hence, further insights are needed into the structure of context-based problems in comparison to traditional problems, and into students' problem-solving strategies. Therefore, a suitable framework is necessary for the analysis of both tasks and strategies. Purpose: The aim of this paper is to explore traditional and context-based tasks as well as students' responses to exemplary tasks to identify a suitable framework for future design and analyses of context-based problems. The paper discusses different established frameworks and applies the Higher-Order Cognitive Skills/Lower-Order Cognitive Skills (HOCS/LOCS) taxonomy and the Model of Hierarchical Complexity in Chemistry (MHC-C) to analyse traditional tasks and students' responses. Sample: Upper secondary students (n = 236) at the Natural Science Programme, i.e. possible future scientists, are investigated to explore learning outcomes when they solve both conventional chemistry tasks and context-based chemistry problems. Design and methods: A typical chemistry examination test has been analysed, first the test items themselves (n = 36), and thereafter 236 students' responses to one representative context-based problem. Content analysis using HOCS/LOCS and MHC-C frameworks has been applied to analyse both quantitative and qualitative data, allowing us to describe different problem-solving strategies. Results: The empirical results show that both frameworks are suitable to identify students' strategies, which mainly focus on recall of memorized facts when solving chemistry test items. Almost all test items also assessed lower-order thinking. The combination of frameworks with the chemistry syllabus has been found successful in analysing both the test items and students' responses in a systematic way. The framework can therefore be applied in the design of new tasks, the analysis and assessment of students' responses, and as a tool for teachers to scaffold students in their problem-solving process. Conclusions: This paper offers implications for practice and for future research, both for developing new context-based problems in a structured way and for providing analytical tools for investigating students' higher-order thinking in their responses to these tasks.
Methodology for Developing and Evaluating the PROMIS® Smoking Item Banks

PubMed Central

Cai, Li; Stucky, Brian D.; Tucker, Joan S.; Shadel, William G.; Edelen, Maria Orlando

2014-01-01

Introduction: This article describes the procedures used in the PROMIS® Smoking Initiative for the development and evaluation of item banks, short forms (SFs), and computerized adaptive tests (CATs) for the assessment of 6 constructs related to cigarette smoking: nicotine dependence, coping expectancies, emotional and sensory expectancies, health expectancies, psychosocial expectancies, and social motivations for smoking. Methods: Analyses were conducted using response data from a large national sample of smokers. Items related to each construct were subjected to extensive item factor analyses and evaluation of differential item functioning (DIF). Final item banks were calibrated, and SF assessments were developed for each construct. The performance of the SFs and the potential use of the item banks for CAT administration were examined through a simulation study. Results: Item selection based on dimensionality assessment and DIF analyses produced item banks that were essentially unidimensional in structure and free of bias. Simulation studies demonstrated that the constructs could be accurately measured with a relatively small number of carefully selected items, either through fixed SFs or CAT-based assessment. Illustrative results are presented, and subsequent articles provide detailed discussion of each item bank in turn. Conclusions: The development of the PROMIS smoking item banks provides researchers with new tools for measuring smoking-related constructs. The use of the calibrated item banks and suggested SF assessments will enhance the quality of score estimates, thus advancing smoking research. Moreover, the methods used in the current study, including innovative approaches to item selection and SF construction, may have general relevance to item bank development and evaluation. PMID:23943843
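CAT administration of the kind simulated above typically picks, at each step, the unused item that is most informative at the respondent's current trait estimate. A minimal sketch of maximum-information selection, assuming dichotomous items under a two-parameter logistic (2PL) model for brevity (the PROMIS smoking items are polytomous, so this is a simplification) and a hypothetical three-item bank; `p_2pl`, `item_information`, and `select_next_item` are illustrative names, not PROMIS software:

```python
import math


def p_2pl(theta, a, b):
    """2PL probability of endorsing an item (logistic metric, no 1.7 scaling)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information of a 2PL item at theta: a^2 * P * (1 - P)."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)


def select_next_item(bank, theta, administered):
    """Return the id of the unused item with maximum information at theta."""
    remaining = [k for k in bank if k not in administered]
    return max(remaining, key=lambda k: item_information(theta, *bank[k]))


# Hypothetical item bank: id -> (slope a, location b)
bank = {"i1": (1.0, -1.0), "i2": (2.0, 0.0), "i3": (1.5, 1.0)}
print(select_next_item(bank, 0.0, set()))   # i2
print(select_next_item(bank, 0.0, {"i2"}))  # i3
```

Pure maximum-information selection is only one option; operational CATs often add item-exposure control or content balancing, which this sketch omits.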
Development of a Computer-Adaptive Physical Function Instrument for Social Security Administration Disability Determination

PubMed Central

Ni, Pengsheng; McDonough, Christine M.; Jette, Alan M.; Bogusz, Kara; Marfeo, Elizabeth E.; Rasch, Elizabeth K.; Brandt, Diane E.; Meterko, Mark; Chan, Leighton

2014-01-01

Objectives To develop and test an instrument to assess physical function (PF) for Social Security Administration (SSA) disability programs, the SSA-PF. Item Response Theory (IRT) analyses were used to 1) create a calibrated item bank for each of the factors identified in prior factor analyses, 2) assess the fit of the items within each scale, 3) develop separate Computer-Adaptive Test (CAT) instruments for each scale, and 4) conduct initial psychometric testing. Design Cross-sectional data collection; IRT analyses; CAT simulation. Setting Telephone and internet survey. Participants Two samples: 1,017 SSA claimants and 999 adults from the US general population. Interventions None. Main Outcome Measure Model fit statistics, correlation and reliability coefficients. Results IRT analyses resulted in five unidimensional SSA-PF scales: Changing & Maintaining Body Position, Whole Body Mobility, Upper Body Function, Upper Extremity Fine Motor, and Wheelchair Mobility, for a total of 102 items. High CAT accuracy was demonstrated by strong correlations between simulated CAT scores and those from the full item banks. When the simulated CATs were compared with the full item banks, very little loss of reliability or precision was noted, except at the lower and upper ranges of each scale. No difference in response patterns by age or sex was noted. The distributions of claimant scores were shifted to the lower end of each scale compared with those of a sample of US adults. Conclusions The SSA-PF instrument contributes important new methodology for measuring the physical function of adults applying to the SSA disability programs. Initial evaluation revealed that the SSA-PF instrument achieved considerable breadth of coverage in each content domain and demonstrated noteworthy psychometric properties. PMID:23578594
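Simulated CAT scores, like those correlated with full-bank scores above, are commonly computed as expected a posteriori (EAP) estimates. A minimal sketch, assuming dichotomous 2PL items, a standard normal prior, and a fixed, evenly spaced quadrature grid; the item parameters are hypothetical, and the abstract does not describe the SSA-PF's own scoring model:

```python
import math


def p_2pl(theta, a, b):
    """2PL probability of a positive (1) response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def eap_score(responses, items, n_points=61, lo=-4.0, hi=4.0):
    """EAP estimate and posterior SD of theta under a N(0, 1) prior.

    responses: list of 0/1 answers; items: matching list of (a, b) pairs.
    The posterior is evaluated on an evenly spaced grid (rectangle rule).
    """
    step = (hi - lo) / (n_points - 1)
    nodes = [lo + i * step for i in range(n_points)]
    weights = []
    for t in nodes:
        w = math.exp(-0.5 * t * t)  # unnormalized standard normal prior
        for x, (a, b) in zip(responses, items):
            p = p_2pl(t, a, b)
            w *= p if x == 1 else 1.0 - p
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(nodes, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(nodes, weights)) / total
    return mean, math.sqrt(var)


items = [(1.2, -0.5), (1.8, 0.0), (1.0, 0.8)]  # hypothetical (a, b) values
print(eap_score([1, 1, 0], items))
```

The posterior SD doubles as the score's standard error, which is how a CAT simulation can track precision as items are added.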
Development of a computer-adaptive physical function instrument for Social Security Administration disability determination.

PubMed

Ni, Pengsheng; McDonough, Christine M; Jette, Alan M; Bogusz, Kara; Marfeo, Elizabeth E; Rasch, Elizabeth K; Brandt, Diane E; Meterko, Mark; Haley, Stephen M; Chan, Leighton

2013-09-01

To develop and test an instrument to assess physical function for Social Security Administration (SSA) disability programs, the SSA-Physical Function (SSA-PF) instrument. Item response theory (IRT) analyses were used to (1) create a calibrated item bank for each of the factors identified in prior factor analyses, (2) assess the fit of the items within each scale, (3) develop separate computer-adaptive testing (CAT) instruments for each scale, and (4) conduct initial psychometric testing. Cross-sectional data collection; IRT analyses; CAT simulation. Telephone and Internet survey. Two samples: SSA claimants (n=1017) and adults from the U.S. general population (n=999). None. Model fit statistics, correlation, and reliability coefficients. IRT analyses resulted in 5 unidimensional SSA-PF scales: Changing & Maintaining Body Position, Whole Body Mobility, Upper Body Function, Upper Extremity Fine Motor, and Wheelchair Mobility for a total of 102 items. High CAT accuracy was demonstrated by strong correlations between simulated CAT scores and those from the full item banks. On comparing the simulated CATs with the full item banks, very little loss of reliability or precision was noted, except at the lower and upper ranges of each scale. No difference in response patterns by age or sex was noted. The distributions of claimant scores were shifted to the lower end of each scale compared with those of a sample of U.S. adults. The SSA-PF instrument contributes important new methodology for measuring the physical function of adults applying to the SSA disability programs. Initial evaluation revealed that the SSA-PF instrument achieved considerable breadth of coverage in each content domain and demonstrated noteworthy psychometric properties. Copyright © 2013 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
The Job Responsibilities Scale: Invariance in a Longitudinal Prospective Study.

ERIC Educational Resources Information Center

Ludlow, Larry H.; Lunz, Mary E.

1998-01-01

The degree of invariance of the Job Responsibilities Scale for medical technologists was studied for 1993 and 1995, conducting factor analyses of data from each year (1063 and 665 individuals, respectively). Nearly identical factor patterns were found, and Rasch rating scale analyses found nearly identical pairs of item estimates. Implications are…
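Whether two calibrations produce "nearly identical pairs of item estimates" can be checked item by item once each set of estimates is centred, since a Rasch logit scale has an arbitrary origin. The sketch below assumes difficulty estimates and standard errors are already available from each year's calibration; the function name, the 1.96 cutoff, and the numbers are illustrative choices, not the procedure reported by Ludlow and Lunz:

```python
import math


def compare_item_estimates(d1, se1, d2, se2, crit=1.96):
    """Flag items whose Rasch difficulty shifts between two calibrations.

    Each set of difficulties is centred on its own mean first, because the
    origin of a Rasch logit scale is arbitrary. Returns (index, t, flagged).
    """
    m1 = sum(d1) / len(d1)
    m2 = sum(d2) / len(d2)
    results = []
    for i, (x1, s1, x2, s2) in enumerate(zip(d1, se1, d2, se2)):
        t = ((x1 - m1) - (x2 - m2)) / math.sqrt(s1 ** 2 + s2 ** 2)
        results.append((i, t, abs(t) > crit))
    return results


# Hypothetical estimates: the second calibration is the first shifted by +0.5 logits
year1 = [-1.0, 0.0, 1.0]
year2 = [-0.5, 0.5, 1.5]
se = [0.1, 0.1, 0.1]
for row in compare_item_estimates(year1, se, year2, se):
    print(row)  # every t is 0.0: a uniform shift is not evidence of non-invariance
```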
Measuring depression after spinal cord injury: Development and psychometric characteristics of the SCI-QOL Depression item bank and linkage with PHQ-9.

PubMed

Tulsky, David S; Kisala, Pamela A; Kalpakjian, Claire Z; Bombardier, Charles H; Pohlig, Ryan T; Heinemann, Allen W; Carle, Adam; Choi, Seung W

2015-05-01

To develop a calibrated spinal cord injury-quality of life (SCI-QOL) item bank, computer adaptive test (CAT), and short form to assess depressive symptoms experienced by individuals with SCI, transform scores to the Patient Reported Outcomes Measurement Information System (PROMIS) metric, and create a crosswalk to the Patient Health Questionnaire (PHQ)-9. We used grounded-theory-based qualitative item development methods, large-scale item calibration field testing, confirmatory factor analysis, item response theory (IRT) analyses, and statistical linking techniques to transform scores to a PROMIS metric and to provide a crosswalk with the PHQ-9. Five SCI Model System centers and one Department of Veterans Affairs medical center in the United States. Adults with traumatic SCI. Spinal Cord Injury--Quality of Life (SCI-QOL) Depression Item Bank. Individuals with SCI were involved in all phases of SCI-QOL development. A sample of 716 individuals with traumatic SCI completed 35 items assessing depression, 18 of which were PROMIS items. After removing 7 non-PROMIS items, factor analyses confirmed a unidimensional pool of items. We used a graded response IRT model to estimate slopes and thresholds for the 28 retained items. The SCI-QOL Depression measure correlated 0.76 with the PHQ-9. The SCI-QOL Depression item bank provides a reliable and sensitive measure of depressive symptoms with scores reported in terms of general population norms. We provide a crosswalk to the PHQ-9 to facilitate comparisons between measures. The item bank may be administered as a CAT or as a short form and is suitable for research and clinical applications.
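The graded response model named above turns one slope and a set of ordered thresholds per item into category probabilities, as differences between adjacent boundary curves. A minimal sketch of that computation plus the PROMIS-style T-score transformation (mean 50, SD 10); the parameter values are hypothetical, and the linking step that places SCI-QOL parameters on the PROMIS metric is not shown:

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Samejima's graded response model for one item.

    a: slope; thresholds: ordered b_1 < ... < b_m.
    Returns probabilities of response categories 0..m.
    """
    # Boundary curves P(X >= k), padded with P(X >= 0) = 1 and P(X >= m+1) = 0
    cum = [1.0]
    cum += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cum.append(0.0)
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]


def to_t_score(theta):
    """Express theta (mean 0, SD 1 in the reference population) as a T-score."""
    return 50.0 + 10.0 * theta


probs = grm_category_probs(0.0, 1.5, [-1.0, 0.0, 1.0])
print([round(p, 4) for p in probs])  # [0.1824, 0.3176, 0.3176, 0.1824]
```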
Fighting bias with statistics: Detecting gender differences in responses to items on a preschool science assessment

NASA Astrophysics Data System (ADS)

Greenberg, Ariela Caren

Differential item functioning (DIF) and differential distractor functioning (DDF) are methods used to screen for item bias (Camilli & Shepard, 1994; Penfield, 2008). Using an applied empirical example, this mixed-methods study examined the congruency and relationship of DIF and DDF methods in screening multiple-choice items. Data for Study I were drawn from item responses of 271 female and 236 male low-income children on a preschool science assessment. Item analyses employed a common statistical approach, the Mantel-Haenszel log-odds ratio (MH-LOR), to detect DIF in dichotomously scored items (Holland & Thayer, 1988), and extended the approach to identify DDF (Penfield, 2008). Findings demonstrated that using MH-LOR to detect DIF and DDF supported the theoretical relationship that the magnitude and form of DIF are dependent on the DDF effects, and demonstrated the advantages of studying DIF and DDF in multiple-choice items. A total of 4 items with DIF and DDF and 5 items with only DDF were detected. Study II incorporated an item content review, an important but often overlooked and under-published step of DIF and DDF studies (Camilli & Shepard). Interviews with 25 female and 22 male low-income preschool children and an expert review helped to interpret the DIF and DDF results and their comparison, and determined that a content review process of studied items can reveal reasons for potential item bias that are often congruent with the statistical results. Patterns emerged and are discussed in detail. The quantitative and qualitative analyses were conducted in an applied framework of examining the validity of the preschool science assessment scores for evaluating science programs serving low-income children; however, the techniques can be generalized for use with measures across various disciplines of research.
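The MH-LOR statistic pools 2x2 tables (group by correct/incorrect) across levels of a matching score. A minimal sketch for one dichotomous item, assuming counts are already tabulated per stratum; the counts are hypothetical, and Penfield's extension to distractors (DDF) is not shown. The ETS "MH D-DIF" rescaling, -2.35 times the log-odds ratio, is included for reference:

```python
import math


def mantel_haenszel_dif(tables):
    """Mantel-Haenszel common odds ratio for one dichotomous item.

    tables: one (A, B, C, D) tuple per matched score level, where
      A, B = reference group correct, incorrect
      C, D = focal group correct, incorrect
    Returns (alpha_MH, MH-LOR, ETS MH D-DIF).
    """
    num = den = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        if n == 0:
            continue  # empty stratum contributes nothing
        num += a * d / n
        den += b * c / n
    alpha = num / den
    lor = math.log(alpha)  # > 0 means the item favours the reference group
    return alpha, lor, -2.35 * lor  # ETS delta-scale transformation


# Hypothetical counts at three total-score levels
strata = [(20, 10, 20, 10), (30, 5, 30, 5), (12, 12, 12, 12)]
alpha, lor, d_dif = mantel_haenszel_dif(strata)
print(alpha, lor)  # alpha_MH is 1.0 here: equal odds in every stratum, so no DIF
```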
Development of the Contact Lens User Experience: CLUE Scales

PubMed Central

Wirth, R. J.; Edwards, Michael C.; Henderson, Michael; Henderson, Terri; Olivares, Giovanna; Houts, Carrie R.

2016-01-01

Purpose The field of optometry has become increasingly interested in patient-reported outcomes, reflecting a common trend across the spectrum of healthcare. This article reviews the development of the Contact Lens User Experience: CLUE system, designed to assess patient evaluations of contact lenses. CLUE was built using modern psychometric methods such as factor analysis and item response theory. Methods The qualitative process through which relevant domains were identified is outlined, as is the process of creating initial item banks. Psychometric analyses were conducted on the initial item banks, and refinements were made to the domains and items. Following this data-driven refinement phase, a second round of data was collected to further refine the items and obtain final item response theory item parameter estimates. Results Extensive qualitative work identified three key areas patients consider important when describing their experience with contact lenses. Based on item content and psychometric dimensionality assessments, the developing CLUE instruments were ultimately focused on four domains: comfort, vision, handling, and packaging. Item response theory parameters were estimated for the CLUE item banks (377 items), and the resulting scales were found to provide precise and reliable assignment of scores detailing users’ subjective experiences with contact lenses. Conclusions The CLUE family of instruments, as it currently exists, exhibits excellent psychometric properties. PMID:27383257
Introduction to bifactor polytomous item response theory analysis.

PubMed

Toland, Michael D; Sulis, Isabella; Giambona, Francesca; Porcu, Mariano; Campbell, Jonathan M

2017-02-01

A bifactor item response theory model can be used to aid in the interpretation of the dimensionality of a multifaceted questionnaire that assumes continuous latent variables underlying the propensity to respond to items. This model can be used to describe the locations of people on a general continuous latent variable as well as on continuous orthogonal specific traits that characterize responses to groups of items. The bifactor graded response (bifac-GR) model is presented in contrast to a correlated traits (or multidimensional GR) model and a unidimensional GR model. Bifac-GR model specification, assumptions, estimation, and interpretation are demonstrated with a reanalysis of data (Campbell, 2008) on the Shared Activities Questionnaire. We also show the importance of marginalizing the slopes for interpretation purposes, and we extend the concept to the interpretation of the information function. To go along with the illustrative example analyses, we have made available supplementary files that include command file (syntax) examples and outputs from flexMIRT, IRTPRO, R, Mplus, and STATA. Supplementary data to this article can be found online at http://dx.doi.org/10.1016/j.jsp.2016.11.001. Data needed to reproduce analyses in this article are available as supplemental materials (online only) in the Appendix of this article. Copyright © 2016 Society for the Study of School Psychology. Published by Elsevier Ltd. All rights reserved.
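Marginalizing a slope means averaging the item's response curve over the specific factor so that the general-factor slope can be read like a unidimensional one. In the normal-ogive metric, with a standard normal specific factor, this has the closed form a_G / sqrt(1 + a_S^2) (the intercept is divided by the same quantity). The sketch below, with hypothetical parameter values, checks that closed form against brute-force numerical integration. For logistic-metric slopes, a commonly used approximation replaces 1 + a_S^2 with 1 + (a_S / 1.702)^2; that variant is not coded here:

```python
import math


def phi_cdf(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def marginalize(a_g, a_s, c):
    """Marginal general-factor slope and intercept (normal-ogive metric) after
    integrating a N(0, 1) specific factor out of Phi(a_g*g + a_s*s + c)."""
    scale = math.sqrt(1.0 + a_s ** 2)
    return a_g / scale, c / scale


def marginal_prob_numeric(g, a_g, a_s, c, n=2001, lim=8.0):
    """Brute-force check: average the conditional curve over s ~ N(0, 1)."""
    step = 2.0 * lim / (n - 1)
    num = den = 0.0
    for i in range(n):
        s = -lim + i * step
        w = math.exp(-0.5 * s * s)  # unnormalized N(0, 1) weight
        num += w * phi_cdf(a_g * g + a_s * s + c)
        den += w
    return num / den


a_m, c_m = marginalize(1.2, 0.9, 0.3)  # hypothetical bifactor parameters
print(phi_cdf(a_m * 0.5 + c_m), marginal_prob_numeric(0.5, 1.2, 0.9, 0.3))
```

The two printed probabilities agree closely, and the marginal slope is always smaller than the conditional general-factor slope whenever a_S is nonzero.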
Validity of personality measurement in adults with anxiety disorders: psychometric properties of the Spanish NEO-FFI-R using Rasch analyses

PubMed Central

Inchausti, Felix; Mole, Joe; Fonseca-Pedrero, Eduardo; Ortuño-Sierra, Javier

2015-01-01

The aim of this study was to analyse the psychometric properties of the Spanish NEO Five Factor Inventory–Revised (NEO-FFI-R) using Rasch analyses, in order to test its rating scale functioning, the reliability of scores, internal structure, and differential item functioning (DIF) by gender in a psychiatric sample. The NEO-FFI-R responses of 433 Spanish adults (154 males) with an anxiety disorder as primary diagnosis were analysed using the Rasch model for rating scales. Two intermediate categories of response (‘neutral’ and ‘agree’) malfunctioned in the Neuroticism and Conscientiousness scales. In addition, model reliabilities were lower than expected in Agreeableness and Neuroticism, and the item fit values indicated each scale had items that did not achieve moderate to high discrimination on its dimension, particularly in the Agreeableness scale. Concerning unidimensionality, the five NEO-FFI-R scales showed large first components of unexplained variance. Finally, DIF by gender was detected in many items. The results suggest that the scores of the Spanish NEO-FFI-R are unreliable in psychiatric samples and cannot be generalized between males and females, especially in the Openness, Conscientiousness, and Agreeableness scales. Future directions for testing and refinement should be developed before the NEO-FFI-R can be used reliably in clinical samples. PMID:25954224
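Rasch item fit values of the kind used above to judge discrimination are usually reported as outfit (unweighted) and infit (information-weighted) mean-squares. A minimal sketch for a dichotomous Rasch item, assuming person measures and item difficulty are already estimated; the NEO-FFI-R analysis used the rating scale model, so this dichotomous version and its data are illustrative only:

```python
import math


def rasch_p(theta, b):
    """Dichotomous Rasch probability of a positive response."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def item_fit(responses, thetas, b):
    """Outfit and infit mean-squares for one dichotomous Rasch item.

    responses: 0/1 per person; thetas: person measures; b: item difficulty.
    Values near 1 indicate the expected amount of noise; values well above 1
    indicate unexpected (noisy) responses.
    """
    z2_sum = 0.0  # sum of squared standardized residuals
    r2_sum = 0.0  # sum of squared raw residuals
    w_sum = 0.0   # sum of model variances (information)
    for x, t in zip(responses, thetas):
        p = rasch_p(t, b)
        w = p * (1.0 - p)
        z2_sum += (x - p) ** 2 / w
        r2_sum += (x - p) ** 2
        w_sum += w
    outfit = z2_sum / len(responses)  # unweighted mean square
    infit = r2_sum / w_sum            # information-weighted mean square
    return outfit, infit


# Hypothetical data: a low-trait person endorsing a hard item inflates outfit
print(item_fit([1, 1, 0, 1], [-2.5, 0.5, -0.5, 1.5], 1.0))
```

Here outfit comes out much larger than infit, because outfit gives full weight to a single off-target surprise while infit down-weights responses from persons far from the item.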
Measuring pregnancy planning: An assessment of the London Measure of Unplanned Pregnancy among urban, south Indian women

PubMed Central

Rocca, Corinne H.; Krishnan, Suneeta; Barrett, Geraldine; Wilson, Mark

2010-01-01

We evaluated the psychometric properties of the London Measure of Unplanned Pregnancy among Indian women using classical methods and Item Response Modeling. The scale exhibited good internal consistency and internal structure, with overall scores correlating well with each item’s response categories. Items performed similarly for pregnant and non-pregnant women, and scores decreased with increasing parity, providing evidence for validity. Analyses also detected limitations, including infrequent selection of middle response categories and some evidence of differential item functioning by parity. We conclude that the LMUP represents an improvement over existing measures but recommend steps for enhancing scale performance for this cultural context. PMID:21170147
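Internal consistency in classical analyses like this one is usually summarized with Cronbach's alpha, k/(k-1) times (1 minus the sum of item variances over the total-score variance). A minimal sketch assuming complete data; the 0-2 scored responses are hypothetical and are not LMUP data:

```python
def cronbach_alpha(rows):
    """Cronbach's alpha from complete data: one list of item scores per respondent."""
    k = len(rows[0])

    def sample_var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    item_vars = [sample_var([r[j] for r in rows]) for j in range(k)]
    total_var = sample_var([sum(r) for r in rows])
    return k / (k - 1) * (1.0 - sum(item_vars) / total_var)


# Hypothetical 0-2 scored responses from five respondents to four items
data = [[2, 2, 1, 2], [1, 1, 1, 0], [0, 0, 0, 1], [2, 1, 2, 2], [0, 1, 0, 0]]
print(round(cronbach_alpha(data), 3))
```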

Development of and Field-Test Results for the CAHPS PCMH Survey

PubMed Central

Scholle, Sarah Hudson; Vuong, Oanh; Ding, Lin; Fry, Stephanie; Gallagher, Patricia; Brown, Julie A.; Hays, Ron D.; Cleary, Paul D.

2017-01-01

Objective To develop and evaluate survey questions that assess processes of care relevant to Patient-Centered Medical Homes (PCMHs). Research Design We convened expert panels, reviewed evidence on effective care practices and existing surveys, elicited broad public input, and conducted cognitive interviews and a field test to develop items relevant to PCMHs that could be added to the CAHPS® Clinician & Group (CG-CAHPS) 1.0 Survey. Surveys were tested using a two-contact mail protocol in 10 adult and 33 pediatric practices (both private and community health centers) in Massachusetts. A total of 4,875 completed surveys were received (overall response rate of 25%). Analyses We calculated the rate of valid responses for each item. We conducted exploratory factor analyses and estimated item-to-total correlations, individual and site level reliability, and correlations among proposed multi-item composites. Results Ten items in four new domains (Comprehensiveness, Information, Self-Management Support, and Shared Decision-Making) and four items in two existing domains (Access and Coordination of Care) were selected as supplemental items for use with the adult CG-CAHPS 1.0 survey. For the child version, four items in each of two new domains (Information and Self-Management Support) and five items in existing domains (Access, Comprehensiveness-Prevention, Coordination of Care) were selected. Conclusions This study provides support for the reliability and validity of new items to supplement the CG-CAHPS 1.0 survey to assess aspects of primary care that are important attributes of Patient-Centered Medical Homes. PMID:23064272
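Item-to-total correlations for a composite are usually "corrected": each item is correlated with the sum of the other items, so it does not inflate its own correlation. A minimal sketch with hypothetical data; the abstract does not say whether the CAHPS analysis used the corrected or uncorrected form:

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def corrected_item_total(rows):
    """Correlate each item with the sum of the remaining items in the composite."""
    k = len(rows[0])
    result = []
    for j in range(k):
        item = [r[j] for r in rows]
        rest = [sum(r) - r[j] for r in rows]
        result.append(pearson(item, rest))
    return result


# Hypothetical composite: the third item runs against the others,
# so its corrected item-total correlation is negative
rows = [[1, 2, 3], [2, 2, 2], [3, 4, 1], [4, 4, 2]]
print([round(r, 2) for r in corrected_item_total(rows)])
```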
Methodology for developing and evaluating the PROMIS smoking item banks.

PubMed

Hansen, Mark; Cai, Li; Stucky, Brian D; Tucker, Joan S; Shadel, William G; Edelen, Maria Orlando

2014-09-01

This article describes the procedures used in the PROMIS Smoking Initiative for the development and evaluation of item banks, short forms (SFs), and computerized adaptive tests (CATs) for the assessment of 6 constructs related to cigarette smoking: nicotine dependence, coping expectancies, emotional and sensory expectancies, health expectancies, psychosocial expectancies, and social motivations for smoking. Analyses were conducted using response data from a large national sample of smokers. Items related to each construct were subjected to extensive item factor analyses and evaluation of differential item functioning (DIF). Final item banks were calibrated, and SF assessments were developed for each construct. The performance of the SFs and the potential use of the item banks for CAT administration were examined through a simulation study. Item selection based on dimensionality assessment and DIF analyses produced item banks that were essentially unidimensional in structure and free of bias. Simulation studies demonstrated that the constructs could be accurately measured with a relatively small number of carefully selected items, either through fixed SFs or CAT-based assessment. Illustrative results are presented, and subsequent articles provide detailed discussion of each item bank in turn. The development of the PROMIS smoking item banks provides researchers with new tools for measuring smoking-related constructs. The use of the calibrated item banks and suggested SF assessments will enhance the quality of score estimates, thus advancing smoking research. Moreover, the methods used in the current study, including innovative approaches to item selection and SF construction, may have general relevance to item bank development and evaluation. © The Author 2013. Published by Oxford University Press on behalf of the Society for Research on Nicotine and Tobacco. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Development of the PROMIS health expectancies of smoking item banks.

PubMed

Edelen, Maria Orlando; Tucker, Joan S; Shadel, William G; Stucky, Brian D; Cerully, Jennifer; Li, Zhen; Hansen, Mark; Cai, Li

2014-09-01

Smokers' health-related outcome expectancies are associated with a number of important constructs in smoking research, yet there are no measures currently available that focus exclusively on this domain. This paper describes the development and evaluation of item banks for assessing the health expectancies of smoking. Using data from a sample of daily (N = 4,201) and nondaily (N = 1,183) smokers, we conducted a series of item factor analyses, item response theory analyses, and differential item functioning analyses (according to gender, age, and race/ethnicity) to arrive at a unidimensional set of health expectancies items for daily and nondaily smokers. We also evaluated the performance of short forms (SFs) and computer adaptive tests (CATs) to efficiently assess health expectancies. A total of 24 items were included in the Health Expectancies item banks; 13 items are common across daily and nondaily smokers, 6 are unique to daily, and 5 are unique to nondaily. For both daily and nondaily smokers, the Health Expectancies item banks are unidimensional, reliable (reliability = 0.95 and 0.96, respectively), and perform similarly across gender, age, and race/ethnicity groups. An SF common to daily and nondaily smokers consists of 6 items (reliability = 0.87). Results from simulated CATs showed that health expectancies can be assessed with good precision with an average of 5-6 items adaptively selected from the item banks. Health expectancies of smoking can be assessed on the basis of these item banks via SFs, CATs, or through a tailored set of items selected for a specific research purpose. © The Author 2014. Published by Oxford University Press on behalf of the Society for Research on Nicotine and Tobacco. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
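The reliabilities quoted above (0.95, 0.96, and 0.87) can be translated into standard errors if one assumes the common IRT convention reliability = 1 - SE^2 / var(theta), with theta scaled to unit variance. The abstract does not say how its reliabilities were computed, so this is an illustrative conversion only; the same relation is often used to set a CAT stopping rule (stop once SE falls below a target):

```python
import math


def reliability_from_se(se, trait_var=1.0):
    """Reliability implied by a standard error when theta has variance trait_var."""
    return 1.0 - se ** 2 / trait_var


def se_for_reliability(rel, trait_var=1.0):
    """Largest standard error compatible with a target reliability."""
    return math.sqrt((1.0 - rel) * trait_var)


for rel in (0.87, 0.95, 0.96):
    se = se_for_reliability(rel)
    # On a T-score metric (SD = 10) the same SE is multiplied by 10
    print(rel, round(se, 3), round(10 * se, 2))
```

Under this convention, a reliability of 0.95 corresponds to an SE of about 0.22 theta units (about 2.2 T-score points).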
A Psychometric Analysis of the Italian Version of the eHealth Literacy Scale Using Item Response and Classical Test Theory Methods

PubMed Central

Dima, Alexandra Lelia; Schulz, Peter Johannes

2017-01-01

Background The eHealth Literacy Scale (eHEALS) is a tool to assess consumers’ comfort and skills in using information technologies for health. Although evidence exists of reliability and construct validity of the scale, less agreement exists on structural validity. Objective The aim of this study was to validate the Italian version of the eHealth Literacy Scale (I-eHEALS) in a community sample with a focus on its structural validity, by applying psychometric techniques that account for item difficulty. Methods Two Web-based surveys were conducted among a total of 296 people living in the Italian-speaking region of Switzerland (Ticino). After examining the latent variables underlying the observed variables of the Italian scale via principal component analysis (PCA), fit indices for two alternative models were calculated using confirmatory factor analysis (CFA). The scale structure was examined via parametric and nonparametric item response theory (IRT) analyses accounting for differences between items regarding the proportion of answers indicating high ability. Convergent validity was assessed by correlations with theoretically related constructs. Results CFA showed a suboptimal model fit for both models. IRT analyses confirmed all items measure a single dimension as intended. Reliability and construct validity of the final scale were also confirmed. The contrasting results of factor analysis (FA) and IRT analyses highlight the importance of considering differences in item difficulty when examining health literacy scales. Conclusions The findings support the reliability and validity of the translated scale and its use for assessing Italian-speaking consumers’ eHealth literacy. PMID:28400356
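A widely used nonparametric IRT approach is Mokken scaling, whose central statistic is Loevinger's scalability coefficient H: the sum of observed inter-item covariances divided by the largest covariances the item popularities allow. The abstract does not name the specific nonparametric method, so this is an assumption. The sketch handles dichotomous (0/1) items only; the eHEALS uses a 5-point response format, so the data here are hypothetical dichotomized responses. A rule of thumb often quoted in the Mokken literature treats H of at least 0.3 as a minimally acceptable scale:

```python
def loevinger_h(rows):
    """Scale-level Loevinger H for dichotomous (0/1) items.

    H = (sum of observed inter-item covariances) / (sum of the maximum
    covariances attainable given the item popularities). Constant items
    (p = 0 or 1) contribute nothing and should be removed first.
    """
    n = len(rows)
    k = len(rows[0])
    p = [sum(r[j] for r in rows) / n for j in range(k)]
    cov_sum = 0.0
    cov_max_sum = 0.0
    for i in range(k):
        for j in range(i + 1, k):
            p_ij = sum(r[i] * r[j] for r in rows) / n
            cov_sum += p_ij - p[i] * p[j]
            cov_max_sum += min(p[i], p[j]) - p[i] * p[j]
    return cov_sum / cov_max_sum


# A perfect Guttman pattern: anyone passing a harder item passes all easier ones
guttman = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(loevinger_h(guttman))  # 1.0
```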
Item response theory analyses of the Delis-Kaplan Executive Function System card sorting subtest.

PubMed

Spencer, Mercedes; Cho, Sun-Joo; Cutting, Laurie E

2018-02-02

In the current study, we examined the dimensionality of the 16-item Card Sorting subtest of the Delis-Kaplan Executive Function System assessment in a sample of 264 native English-speaking children between the ages of 9 and 15 years. We also tested for measurement invariance for these items across age and gender groups using item response theory (IRT). Results of the exploratory factor analysis indicated that a two-factor model that distinguished between verbal and perceptual items provided the best fit to the data. Although the items demonstrated measurement invariance across age groups, measurement invariance was violated for gender groups, with two items demonstrating differential item functioning for males and females. Multigroup analysis using all 16 items indicated that the items were more effective for individuals whose IRT scale scores were relatively high. A single-group explanatory IRT model using 14 non-differential item functioning items showed that for perceptual ability, females scored higher than males and that scores increased with age for both males and females; for verbal ability, the observed increase in scores across age differed for males and females. The implications of these findings are discussed.
Rasch analyses of the Activities-specific Balance Confidence Scale with individuals 50 years and older with lower limb amputations

PubMed Central

Sakakibara, Brodie M.; Miller, William C.; Backman, Catherine L.

2012-01-01

Objective To explore shortened response formats for use with the Activities-specific Balance Confidence scale and then: 1) evaluate the unidimensionality of the scale; 2) evaluate the item difficulty; 3) evaluate the scale for redundancy and content gaps; and 4) evaluate the item standard error of measurement (SEM) and internal consistency reliability among aging individuals (≥50 years) with a lower-limb amputation living in the community. Design Secondary analysis of cross-sectional survey and chart review data. Setting Out-patient amputee clinics, Ontario, Canada. Participants Four hundred forty-eight community-living adults, at least 50 years old (mean = 68 years), who have used a prosthesis for at least 6 months for a major unilateral lower limb amputation. Three hundred twenty-five (72.5%) were men. Intervention N/A Main Outcome Measure(s) Activities-specific Balance Confidence Scale. Results A 5-option response format outperformed 4- and 6-option formats. Factor analyses confirmed a unidimensional scale. The distance between response options is not the same for all items on the scale, as evidenced by the Partial Credit Model (PCM) fitting the data better than the Rating Scale Model. Two items, however, did not fit the PCM within acceptable statistical limits. Revising the wording of these two items may resolve the misfit, improve construct validity, and lower the SEM. Overall, the difficulty of the scale's items is appropriate for use with aging individuals with lower-limb amputation, and the scale is most reliable (Cronbach's α = 0.94) for use with individuals with moderately low balance confidence levels. Conclusions The ABC-scale with a simplified 5-option response format is a valid and reliable measure of balance confidence for use with individuals aging with a lower limb amputation. PMID:21704978
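The PCM-versus-RSM comparison above turns on how step difficulties are parameterized: the Partial Credit Model gives every item its own steps, whereas the Rating Scale Model forces all items to share one set of thresholds, shifted by an item location. A minimal sketch of both, with hypothetical values for a 5-option item (4 steps):

```python
import math


def pcm_probs(theta, deltas):
    """Partial Credit Model: category probabilities 0..m for one item with
    step difficulties deltas = [d_1, ..., d_m] (logit metric)."""
    # Category k has log-numerator sum_{j<=k} (theta - d_j); category 0 has 0
    logits = [0.0]
    for d in deltas:
        logits.append(logits[-1] + (theta - d))
    mx = max(logits)
    expd = [math.exp(v - mx) for v in logits]  # subtract max for numerical stability
    s = sum(expd)
    return [e / s for e in expd]


def rsm_deltas(item_location, shared_taus):
    """Rating Scale Model as a constrained PCM: step difficulty = item location + tau_k,
    with the same thresholds tau_k for every item."""
    return [item_location + t for t in shared_taus]


# Hypothetical 5-option item: free PCM steps vs. RSM steps built from shared thresholds
print(pcm_probs(0.0, [-1.8, -0.2, 0.4, 2.1]))
print(pcm_probs(0.0, rsm_deltas(0.1, [-1.5, -0.5, 0.5, 1.5])))
```

In the PCM, the ratio of adjacent category probabilities P(k)/P(k-1) equals exp(theta - d_k), which is why unequal spacing between response options shows up directly as item-specific step difficulties.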
A New Look at the Psychometrics of the Parenting Scale through the Lens of Item Response Theory

PubMed Central

Lorber, Michael F.; Xu, Shu; Smith Slep, Amy M.; Bulling, Lisanne; O'Leary, Susan G.

2015-01-01

The psychometrics of the Parenting Scale's Overreactivity and Laxness subscales were evaluated using item response theory (IRT) techniques. The IRT analyses were based on two community samples of cohabiting parents of 3- to 8-year-old children, combined to yield an N of 852 families. The results supported the utility of the Overreactivity and Laxness subscales, particularly in discriminating among parents in the mid to upper reaches of each construct. The original versions of the Overreactivity and Laxness subscales were more reliable than alternative, shorter versions identified in replicated factor analyses from previously published research and in IRT analyses in the present research. Moreover, in several cases, the original versions of these subscales, in comparison with the shortened versions, exhibited greater six-month stabilities and correlations with child externalizing behavior and couple relationship satisfaction. Reliability was greater for the Laxness than for the Overreactivity subscale. Item performance on each subscale was highly variable. Together, the present findings are generally supportive of the psychometrics of the Parenting Scale, particularly for clinical research and practice. They also suggest areas for further development. PMID:24828855
A new look at the psychometrics of the parenting scale through the lens of item response theory.

PubMed

Lorber, Michael F; Xu, Shu; Slep, Amy M Smith; Bulling, Lisanne; O'Leary, Susan G

2014-01-01

The psychometrics of the Parenting Scale's Overreactivity and Laxness subscales were evaluated using item response theory (IRT) techniques. The IRT analyses were based on 2 community samples of cohabiting parents of 3- to 8-year-old children, combined to yield a total sample size of 852 families. The results supported the utility of the Overreactivity and Laxness subscales, particularly in discriminating among parents in the mid to upper reaches of each construct. The original versions of the Overreactivity and Laxness subscales were more reliable than alternative, shorter versions identified in replicated factor analyses from previously published research and in IRT analyses in the present research. Moreover, in several cases, the original versions of these subscales, in comparison with the shortened versions, exhibited greater 6-month stabilities and correlations with child externalizing behavior and couple relationship satisfaction. Reliability was greater for the Laxness than for the Overreactivity subscale. Item performance on each subscale was highly variable. Together, the present findings are generally supportive of the psychometrics of the Parenting Scale, particularly for clinical research and practice. They also suggest areas for further development.
Extending item response theory to online homework

NASA Astrophysics Data System (ADS)

Kortemeyer, Gerd

2014-06-01

Item response theory (IRT) is becoming an increasingly important tool for analyzing "big data" gathered from online educational venues. However, IRT was originally developed for traditional exam settings, and several of its assumptions are violated when it is deployed in the online realm. For a large-enrollment physics course for scientists and engineers, the study compares outcomes from IRT analyses of exam and homework data and then investigates the effects of each confounding factor introduced in the online realm. It is found that IRT yields the correct trends for learner ability and meaningful item parameters, yet overall agreement with exam data is moderate. It is also found that learner ability and item discrimination are robust over a wide range of model assumptions and introduced noise. Item difficulty is also robust, but over a narrower range.
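The learner-ability estimates discussed here come from finding the trait value that makes a student's observed response pattern most likely, given the item parameters. The abstract does not specify the model or estimator. The following is a minimal sketch, assuming a 2PL model, already-calibrated hypothetical item parameters, and a plain grid search bounded to [-4, 4].

```python
import math

def p_correct(theta, a, b):
    """2PL probability that a learner with ability theta answers the item correctly."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def log_likelihood(theta, responses, items):
    """Log-likelihood of a 0/1 response pattern at ability theta."""
    total = 0.0
    for x, (a, b) in zip(responses, items):
        p = p_correct(theta, a, b)
        total += math.log(p) if x == 1 else math.log(1.0 - p)
    return total

def estimate_ability(responses, items, lo=-4.0, hi=4.0, step=0.01):
    """Maximum-likelihood ability by grid search over [lo, hi].

    All-correct or all-wrong patterns have no finite MLE and land on a bound.
    """
    n_steps = int(round((hi - lo) / step))
    grid = [lo + i * step for i in range(n_steps + 1)]
    return max(grid, key=lambda t: log_likelihood(t, responses, items))

# Hypothetical calibrated items as (discrimination a, difficulty b).
ITEMS = [(1.0, -1.0), (1.2, 0.0), (0.8, 0.5), (1.5, 1.0)]

if __name__ == "__main__":
    for pattern in ([1, 0, 0, 0], [1, 1, 0, 0], [1, 1, 1, 0]):
        print(pattern, round(estimate_ability(pattern, ITEMS), 2))
```

A grid search is used for transparency. Newton-Raphson, or a Bayesian estimator (EAP or MAP), is the more usual choice and avoids the unbounded estimates for perfect patterns.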
The Protective Behavioral Strategies for Marijuana Scale: Further examination using item response theory.

PubMed

Pedersen, Eric R; Huang, Wenjing; Dvorak, Robert D; Prince, Mark A; Hummer, Justin F

2017-08-01

Given recent state legislation legalizing marijuana for recreational purposes and majority popular opinion favoring these laws, we developed the Protective Behavioral Strategies for Marijuana scale (PBSM) to identify strategies that may mitigate the harms related to marijuana use among those young people who choose to use the drug. In the current study, we expand on the initial exploratory study of the PBSM to further validate the measure with a large and geographically diverse sample (N = 2,117; 60% women, 30% non-White) of college students from 11 different universities across the United States. We sought to develop a psychometrically sound item bank for the PBSM and to create a short assessment form that minimizes respondent burden and time. Quantitative item analyses, including exploratory and confirmatory factor analyses with item response theory (IRT) and evaluation of differential item functioning (DIF), revealed an item bank of 36 items that was examined for unidimensionality and good content coverage, as well as a short form of 17 items that is free of bias in terms of gender (men vs. women), race (White vs. non-White), ethnicity (Hispanic vs. non-Hispanic), and recreational marijuana use legal status (state recreational marijuana was legal for 25.5% of participants). We also provide a scoring table for easy transformation from sum scores to IRT scale scores. The PBSM item bank and short form associated strongly and negatively with past-month marijuana use and consequences. The measure may be useful to researchers and clinicians conducting intervention and prevention programs with young adults. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
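A sum-score-to-IRT-score table of this kind is commonly built with the Lord-Wingersky recursion. The recursion gives the probability of each summed score at each trait level. Averaging the trait over a standard-normal prior then yields an expected a posteriori (EAP) score, with its posterior SD, for every sum score.

The sketch below assumes dichotomous 2PL items with hypothetical parameters. It is not the authors' scoring procedure. The PBSM items are polytomous, and the same recursion extends to that case by adding one term per response category.

```python
import math

def p_2pl(theta, a, b):
    """Probability of endorsing a dichotomous item under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def summed_score_likelihoods(theta, items):
    """Lord-Wingersky recursion: P(sum score = s | theta) for s = 0..n."""
    probs = [1.0]
    for a, b in items:
        p = p_2pl(theta, a, b)
        new = [0.0] * (len(probs) + 1)
        for s, prob in enumerate(probs):
            new[s] += prob * (1.0 - p)  # this item not endorsed: score unchanged
            new[s + 1] += prob * p      # this item endorsed: score + 1
        probs = new
    return probs

def eap_score_table(items, n_nodes=81, lo=-4.0, hi=4.0):
    """Return (sum score, EAP theta, posterior SD) rows under a N(0, 1) prior."""
    nodes = [lo + (hi - lo) * q / (n_nodes - 1) for q in range(n_nodes)]
    weights = [math.exp(-0.5 * t * t) for t in nodes]  # unnormalized normal prior
    likes = [summed_score_likelihoods(t, items) for t in nodes]
    table = []
    for s in range(len(items) + 1):
        den = sum(w * L[s] for w, L in zip(weights, likes))
        mean = sum(t * w * L[s] for t, w, L in zip(nodes, weights, likes)) / den
        var = sum((t - mean) ** 2 * w * L[s]
                  for t, w, L in zip(nodes, weights, likes)) / den
        table.append((s, mean, math.sqrt(var)))
    return table

# Hypothetical (a, b) parameters for a five-item short form.
ITEMS = [(1.4, -1.0), (1.1, -0.3), (1.6, 0.2), (1.2, 0.8), (0.9, 1.5)]

if __name__ == "__main__":
    for s, eap, sd in eap_score_table(ITEMS):
        print(f"sum score {s}: EAP = {eap:+.2f}, SD = {sd:.2f}")
```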
Using Anchoring Vignettes to Assess Group Differences in General Self-Rated Health

ERIC Educational Resources Information Center

Grol-Prokopczyk, Hanna; Freese, Jeremy; Hauser, Robert M.

2011-01-01

This article addresses a potentially serious problem with the widely used self-rated health (SRH) survey item: that different groups have systematically different ways of using the item's response categories. Analyses based on unadjusted SRH may thus yield misleading results. The authors evaluate anchoring vignettes as a possible solution to this…
Average Revisited in Context

ERIC Educational Resources Information Center

Watson, Jane; Chick, Helen

2012-01-01

This paper analyses the responses of 247 middle school students to items requiring the concept of average in three different contexts: a city's weather reported in maximum daily temperature, the number of children in a family, and the price of houses. The mixed but overall disappointing performance on the six items in the three contexts indicates…
Measurement Equivalence of the Patient Reported Outcomes Measurement Information System® (PROMIS®) Pain Interference Short Form Items: Application to Ethnically Diverse Cancer and Palliative Care Populations.

PubMed

Teresi, Jeanne A; Ocepek-Welikson, Katja; Cook, Karon F; Kleinman, Marjorie; Ramirez, Mildred; Reid, M Carrington; Siu, Albert

2016-01-01

Reducing the response burden of standardized pain measures is desirable, particularly for individuals who are frail or live with chronic illness, e.g., those suffering from cancer and those in palliative care. The Patient Reported Outcome Measurement Information System® (PROMIS®) project addressed this issue with the provision of computerized adaptive tests (CAT) and short form measures that can be used clinically and in research. Although there has been substantial evaluation of PROMIS item banks, little is known about the performance of PROMIS short forms, particularly in ethnically diverse groups. Reviewed in this article are findings related to the differential item functioning (DIF) and reliability of the PROMIS pain interference short forms across diverse sociodemographic groups. DIF hypotheses were generated for the PROMIS short form pain interference items. Initial analyses tested item response theory (IRT) model assumptions of unidimensionality and local independence. Dimensionality was evaluated using factor analytic methods; local dependence (LD) was tested using IRT-based LD indices. Wald tests were used to examine group differences in IRT parameters, and to test DIF hypotheses. A second DIF-detection method used in sensitivity analyses was based on ordinal logistic regression with a latent IRT-derived conditioning variable. Magnitude and impact of DIF were investigated, and reliability and item and scale information statistics were estimated. The reliability of the short form item set was excellent. However, there were a few items with high local dependency, which affected the estimation of the final discrimination parameters. As a result, the item "How much did pain interfere with enjoyment of social activities?" was excluded from the DIF analyses for all subgroup comparisons.
No items were hypothesized to show DIF for race and ethnicity; however, five items showed DIF after adjustment for multiple comparisons in both primary and sensitivity analyses: ability to concentrate, enjoyment of recreational activities, tasks away from home, participation in social activities, and socializing with others. The magnitude of DIF was small and the impact negligible. Three items were consistently identified with DIF for education: enjoyment of life, ability to concentrate, and enjoyment of recreational activities. No item showed DIF above the magnitude threshold, and the impact of DIF on the overall measure was minimal. No item showed gender DIF after correction for multiple comparisons in the primary analyses. Four items showed consistent age DIF: enjoyment of life, ability to concentrate, day-to-day activities, and enjoyment of recreational activities, none with primary magnitude values above threshold. Conditional on the pain state, Spanish speakers were hypothesized to report less pain interference on one item, enjoyment of life. The DIF findings confirmed the hypothesis; however, the magnitude was small. When an arbitrary cutoff of theta (θ) ≥ 1.0 was used to classify respondents as having acute pain interference, the largest number of classification changes occurred in the education group analyses. There were 231 respondents (4% of the total sample) who changed from the designation of no acute pain interference to acute interference after the DIF adjustment. There was no change in the designations for race/ethnic subgroups, and a small number of changes for respondents aged 65 to 84. Although significant DIF was observed after correction for multiple comparisons, all DIF was of low magnitude and impact. However, some individual-level impact was observed for low education groups. Reliability estimates were high.
Thus, the PROMIS short form pain items examined in this ethnically diverse sample performed relatively well, although one item was problematic and was removed from the analyses. It is concluded that the majority of the PROMIS pain interference short form items can be recommended for use among ethnically diverse groups, including those in palliative care and with cancer and chronic illness.
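The logistic regression approach to DIF can be illustrated in its simpler binary form. The first model predicts an item response from the trait alone. The second adds group membership and a trait-by-group interaction. Twice the log-likelihood difference between them is referred to a chi-square distribution with 2 degrees of freedom.

This is a minimal sketch, not the study's procedure. It assumes a dichotomous item, an observed trait score as the conditioning variable (the study used a latent IRT-derived one, with ordinal models), and NumPy. The simulated data are hypothetical.

```python
import numpy as np

def fit_logistic(X, y, max_iter=50, tol=1e-10):
    """Fit a logistic regression by Newton-Raphson; return (coefficients, log-likelihood)."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        gradient = X.T @ (y - p)
        hessian = X.T @ (X * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(hessian, gradient)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    loglik = float(np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))
    return beta, loglik

def lr_dif_statistic(theta, group, y):
    """LR statistic for uniform + nonuniform DIF; about chi-square(2) if no DIF."""
    ones = np.ones_like(theta)
    base = np.column_stack([ones, theta])                       # trait only
    full = np.column_stack([ones, theta, group, theta * group])  # + group, interaction
    _, ll_base = fit_logistic(base, y)
    _, ll_full = fit_logistic(full, y)
    return 2.0 * (ll_full - ll_base)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 4000
    theta = rng.normal(size=n)
    group = rng.integers(0, 2, size=n).astype(float)
    # Hypothetical item: the focal group (group == 1) is 0.8 logits less likely to endorse.
    logit = 1.2 * theta - 0.8 * group
    y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(float)
    stat = lr_dif_statistic(theta, group, y)
    print(f"LR statistic: {stat:.1f} (chi-square(2) critical value at .001 is about 13.8)")
```

As in the abstract, a significant statistic is only the first step. Magnitude (for example, the size of the group coefficient) and impact on scores decide whether the DIF matters in practice.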
Measurement Equivalence of the Patient Reported Outcomes Measurement Information System® (PROMIS®) Pain Interference Short Form Items: Application to Ethnically Diverse Cancer and Palliative Care Populations

PubMed Central

Teresi, Jeanne A.; Ocepek-Welikson, Katja; Cook, Karon F.; Kleinman, Marjorie; Ramirez, Mildred; Reid, M. Carrington; Siu, Albert

2017-01-01

Reducing the response burden of standardized pain measures is desirable, particularly for individuals who are frail or live with chronic illness, e.g., those suffering from cancer and those in palliative care. The Patient Reported Outcome Measurement Information System® (PROMIS®) project addressed this issue with the provision of computerized adaptive tests (CAT) and short form measures that can be used clinically and in research. Although there has been substantial evaluation of PROMIS item banks, little is known about the performance of PROMIS short forms, particularly in ethnically diverse groups. Reviewed in this article are findings related to the differential item functioning (DIF) and reliability of the PROMIS pain interference short forms across diverse sociodemographic groups. Methods: DIF hypotheses were generated for the PROMIS short form pain interference items. Initial analyses tested item response theory (IRT) model assumptions of unidimensionality and local independence. Dimensionality was evaluated using factor analytic methods; local dependence (LD) was tested using IRT-based LD indices. Wald tests were used to examine group differences in IRT parameters, and to test DIF hypotheses. A second DIF-detection method used in sensitivity analyses was based on ordinal logistic regression with a latent IRT-derived conditioning variable. Magnitude and impact of DIF were investigated, and reliability and item and scale information statistics were estimated. Results: The reliability of the short form item set was excellent. However, there were a few items with high local dependency, which affected the estimation of the final discrimination parameters. As a result, the item “How much did pain interfere with enjoyment of social activities?” was excluded from the DIF analyses for all subgroup comparisons.
No items were hypothesized to show DIF for race and ethnicity; however, five items showed DIF after adjustment for multiple comparisons in both primary and sensitivity analyses: ability to concentrate, enjoyment of recreational activities, tasks away from home, participation in social activities, and socializing with others. The magnitude of DIF was small and the impact negligible. Three items were consistently identified with DIF for education: enjoyment of life, ability to concentrate, and enjoyment of recreational activities. No item showed DIF above the magnitude threshold, and the impact of DIF on the overall measure was minimal. No item showed gender DIF after correction for multiple comparisons in the primary analyses. Four items showed consistent age DIF: enjoyment of life, ability to concentrate, day-to-day activities, and enjoyment of recreational activities, none with primary magnitude values above threshold. Conditional on the pain state, Spanish speakers were hypothesized to report less pain interference on one item, enjoyment of life. The DIF findings confirmed the hypothesis; however, the magnitude was small. When an arbitrary cutoff of theta (θ) ≥ 1.0 was used to classify respondents as having acute pain interference, the largest number of classification changes occurred in the education group analyses. There were 231 respondents (4% of the total sample) who changed from the designation of no acute pain interference to acute interference after the DIF adjustment. There was no change in the designations for race/ethnic subgroups, and a small number of changes for respondents aged 65 to 84. Conclusions: Although significant DIF was observed after correction for multiple comparisons, all DIF was of low magnitude and impact. However, some individual-level impact was observed for low education groups. Reliability estimates were high.
Thus, the PROMIS short form pain items examined in this ethnically diverse sample performed relatively well, although one item was problematic and was removed from the analyses. It is concluded that the majority of the PROMIS pain interference short form items can be recommended for use among ethnically diverse groups, including those in palliative care and with cancer and chronic illness. PMID:28983449
Household item ownership and self-rated health: material and psychosocial explanations

PubMed Central

Pikhart, Hynek; Bobak, Martin; Rose, Richard; Marmot, Michael

2003-01-01

Background: There has been an ongoing debate about whether the effects of socioeconomic factors on health are due to absolute poverty and material factors or to relative deprivation and psychosocial factors. In the present analyses, we examined the importance for health of material factors, which may have a direct effect on health, and of those that may affect health indirectly, through psychosocial mechanisms. Methods: Random national samples of men and women in Hungary (n = 973) and Poland (n = 1141) were interviewed (response rates 58% and 59%, respectively). The subjects reported their self-rated health, socioeconomic circumstances, including ownership of different household items, and perceived control over life. Household items were categorised as "basic needs", "socially oriented", and "luxury". We examined the association between the ownership of different groups of items and self-rated health. Since the lists of household items were different in Hungary and Poland, we conducted parallel, identical analyses of the Hungarian and Polish data. Results: The overall prevalence of poor or very poor health was 13% in Poland and 25% in Hungary. Education, material deprivation and the number of household items were all associated with poor health in bivariate analyses. All three groups of household items were positively related to self-rated health in age-adjusted analyses. The relation of basic needs items to poor health disappeared after controlling for other socioeconomic variables (mainly material deprivation). The relation of socially oriented and luxury items to poor health, however, persisted in multivariate models. The results were similar in both datasets. Conclusions: These data suggest that health is influenced by both material and psychosocial aspects of socioeconomic factors. PMID:14641929
Responding to Nonwords in the Lexical Decision Task: Insights from the English Lexicon Project

PubMed Central

Yap, Melvin J.; Sibley, Daragh E.; Balota, David A.; Ratcliff, Roger; Rueckl, Jay

2014-01-01

Researchers have extensively documented how various statistical properties of words (e.g., word-frequency) influence lexical processing. However, the impact of lexical variables on nonword decision-making performance is less clear. This gap is surprising, since a better specification of the mechanisms driving nonword responses may provide valuable insights into early lexical processes. In the present study, item-level and participant-level analyses were conducted on the trial-level lexical decision data for almost 37,000 nonwords in the English Lexicon Project in order to identify the influence of different psycholinguistic variables on nonword lexical decision performance, and to explore individual differences in how participants respond to nonwords. Item-level regression analyses reveal that nonword response time was positively correlated with number of letters, number of orthographic neighbors, number of affixes, and baseword number of syllables, and negatively correlated with Levenshtein orthographic distance and baseword frequency. Participant-level analyses also point to within- and between-session stability in nonword responses across distinct sets of items, and intriguingly reveal that higher vocabulary knowledge is associated with less sensitivity to some dimensions (e.g., number of letters) but more sensitivity to others (e.g., baseword frequency). The present findings provide well-specified and interesting new constraints for informing models of word recognition and lexical decision. PMID:25329078
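Levenshtein distance counts the minimum number of single-letter insertions, deletions, and substitutions needed to turn one letter string into another. Lexical-decision research often summarizes it as the mean distance from an item to its k closest words in a lexicon; OLD20 uses k = 20. The sketch below assumes that operationalization and a toy, hypothetical lexicon. The exact measure computed for the English Lexicon Project nonwords may differ in detail.

```python
def levenshtein(s, t):
    """Edit distance via dynamic programming, keeping one row at a time."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def old_k(item, lexicon, k=20):
    """Mean Levenshtein distance from item to its k nearest lexicon entries."""
    dists = sorted(levenshtein(item, w) for w in lexicon if w != item)
    if not dists:
        raise ValueError("lexicon has no entries other than the item")
    nearest = dists[:k]
    return sum(nearest) / len(nearest)

if __name__ == "__main__":
    toy_lexicon = ["bit", "blot", "slit", "table", "blip", "split"]  # hypothetical
    print(old_k("blit", toy_lexicon, k=3))
```

Lower values mean the nonword sits in a dense orthographic neighborhood. In the abstract, greater distance was associated with faster nonword rejections.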
Development and validation of the Overall Depression Severity and Impairment Scale.

PubMed

Bentley, Kate H; Gallagher, Matthew W; Carl, Jenna R; Barlow, David H

2014-09-01

The need to capture severity and impairment of depressive symptomatology is widespread. Existing depression scales are lengthy and largely focus on individual symptoms rather than resulting impairment. The Overall Depression Severity and Impairment Scale (ODSIS) is a 5-item, continuous measure designed for use across heterogeneous mood disorders and with subthreshold depressive symptoms. This study examined the psychometric properties of the ODSIS in outpatients in a clinic for emotional disorders (N = 100), undergraduate students (N = 566), and community-based adults (N = 189). Internal consistency, latent structure, item response theory, classification accuracy, convergent and discriminant validity, and differential item functioning analyses were conducted. ODSIS scores exhibited excellent internal consistency, and confirmatory factor analyses supported a unidimensional structure. Item response theory results demonstrated that the ODSIS provides more information about individuals with high levels of depression than about those with low levels of depression. Responses on the ODSIS discriminated well between individuals with and without a mood disorder, and across levels of depression-related severity, both clinical and subclinical. A cut score of 8 correctly classified 82% of outpatients as having or not having a mood disorder; it showed a favorable balance of sensitivity and specificity and of positive and negative predictive values. The ODSIS demonstrated good convergent and discriminant validity, and results indicate that items function similarly across clinical and nonclinical samples. Overall, findings suggest that the ODSIS is a valid tool for measuring depression-related severity and impairment. The brevity and ease of use of the ODSIS support its utility for screening and monitoring treatment response across a variety of settings. PsycINFO Database Record (c) 2014 APA, all rights reserved.
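Sensitivity, specificity, and the positive and negative predictive values at a cut score all come from the same 2x2 table of screening result against diagnosis. The following is a minimal sketch. It assumes that a score at or above the cut (here 8) counts as a positive screen. The toy scores and diagnoses are hypothetical, not ODSIS data.

```python
def screening_metrics(scores, has_disorder, cut=8):
    """2x2 screening metrics, counting score >= cut as a positive screen (assumed)."""
    tp = fp = tn = fn = 0
    for score, disorder in zip(scores, has_disorder):
        positive = score >= cut
        if positive and disorder:
            tp += 1
        elif positive:
            fp += 1
        elif disorder:
            fn += 1
        else:
            tn += 1

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # diagnosed cases screened positive
        "specificity": ratio(tn, tn + fp),  # non-cases screened negative
        "ppv": ratio(tp, tp + fp),          # positives who have the disorder
        "npv": ratio(tn, tn + fn),          # negatives who do not
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
    }

if __name__ == "__main__":
    toy_scores = [10, 9, 7, 12, 3, 8, 2, 5]
    toy_dx = [True, True, True, False, False, False, False, False]
    print(screening_metrics(toy_scores, toy_dx))
```

Predictive values depend on the prevalence of the disorder in the sample. A cut score that balances them well in a clinic may behave differently in a community sample.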
Using Rasch Analysis to Evaluate the Reliability and Validity of the Swallowing Quality of Life Questionnaire: An Item Response Theory Approach.

PubMed

Cordier, Reinie; Speyer, Renée; Schindler, Antonio; Michou, Emilia; Heijnen, Bas Joris; Baijens, Laura; Karaduman, Ayşe; Swan, Katina; Clavé, Pere; Joosten, Annette Veronica

2018-02-01

The Swallowing Quality of Life questionnaire (SWAL-QOL) is widely used clinically and in research to evaluate quality of life related to swallowing difficulties. It has been described as a valid and reliable tool, but was developed and tested using classical test theory. This study describes the reliability and validity of the SWAL-QOL using item response theory (IRT; Rasch analysis). SWAL-QOL data were gathered from 507 participants at risk of oropharyngeal dysphagia (OD) across four European countries. OD was confirmed in 75.7% of participants via videofluoroscopy and/or fiberoptic endoscopic evaluation, or a clinical diagnosis based on meeting selected criteria. Patients with esophageal dysphagia were excluded. Data were analysed using Rasch analysis. Item and person reliability were good for all the items combined. However, person reliability was poor for 8 subscales and item reliability was poor for one subscale. Eight subscales exhibited poor person separation and two exhibited poor item separation. Overall item and person fit statistics were acceptable. However, at the individual item fit level, results indicated unpredictable item responses for 28 items and item redundancy for 10 items. The item-person dimensionality map confirmed these findings. Results from the overall Rasch model fit and Principal Component Analysis were suggestive of a second dimension. For all the items combined, none of the item categories were 'category', 'threshold' or 'step' disordered; however, all subscales demonstrated category disordered functioning. Findings suggest an urgent need to further investigate the underlying structure of the SWAL-QOL and its psychometric characteristics using IRT.
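"Unpredictable item responses" and "item redundancy" correspond to Rasch underfit and overfit. These are usually judged from outfit (unweighted) and infit (information-weighted) mean-square statistics, whose expected value is 1. Values well above 1 signal noise and values well below 1 signal overly predictable responses.

This minimal sketch uses the dichotomous Rasch model and assumes person and item estimates are already available. The SWAL-QOL uses polytomous items, so this is a simplification, and the example data are hypothetical.

```python
import math

def rasch_p(theta, b):
    """Dichotomous Rasch probability of a positive response."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def item_fit(responses, thetas, b):
    """Return (outfit, infit) mean squares for one item with difficulty b."""
    sq_std_resid = 0.0  # sum of squared standardized residuals (outfit)
    sq_resid = 0.0      # sum of squared raw residuals (infit numerator)
    variance = 0.0      # sum of model variances (infit denominator)
    for x, theta in zip(responses, thetas):
        p = rasch_p(theta, b)
        var = p * (1.0 - p)
        sq_std_resid += (x - p) ** 2 / var
        sq_resid += (x - p) ** 2
        variance += var
    return sq_std_resid / len(responses), sq_resid / variance

if __name__ == "__main__":
    thetas = [-2.0, -1.0, 0.0, 1.0, 2.0]            # hypothetical person measures
    print(item_fit([0, 0, 1, 1, 1], thetas, 0.0))   # Guttman-like: overfit (< 1)
    print(item_fit([1, 1, 0, 0, 0], thetas, 0.0))   # reversed: underfit (> 1)
```

Outfit is sensitive to surprising responses from persons far from the item's difficulty. Infit down-weights those responses and is more sensitive to misfit near the item's location.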
Use of a safety climate questionnaire in UK health care: factor structure, reliability and usability.

PubMed

Hutchinson, A; Cooper, K L; Dean, J E; McIntosh, A; Patterson, M; Stride, C B; Laurence, B E; Smith, C M

2006-10-01

This study aimed to explore the factor structure, reliability, and potential usefulness of a patient safety climate questionnaire in UK health care. The study was set in four acute hospital trusts and nine primary care trusts in England. The questionnaire used was the 27-item Teamwork and Safety Climate Survey. Thirty-three healthcare staff commented on its wording and relevance. The questionnaire was then sent to 3,650 staff within the 13 NHS trusts, with the aim of obtaining at least 600 responses as the basis for the factor analysis. In all, 1,307 questionnaires were returned (36% response rate). Factor analyses and reliability analyses were carried out on 897 responses from staff involved in direct patient care, to explore how consistently the questions measured the underlying constructs of safety climate and teamwork. Some questionnaire items related to multiple factors or did not relate strongly to any factor. Five items were discarded. Two teamwork factors were derived from the remaining 11 teamwork items, and three safety climate factors were derived from the remaining 11 safety items. Internal consistency reliabilities were satisfactory to good (Cronbach's alpha ≥ 0.69 for all five factors). This is one of the few studies to undertake a detailed evaluation of a patient safety climate questionnaire in UK health care and possibly the first to do so in primary as well as secondary care. The results indicate that a 22-item version of this safety climate questionnaire is usable as a research instrument in both settings, but they also demonstrate a more general need for thorough validation of safety climate questionnaires before widespread use.
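Cronbach's alpha compares the sum of the item variances with the variance of the total score: alpha = k / (k - 1) * (1 - sum of item variances / total-score variance), for k items. Below is a minimal pure-Python sketch. The response matrix in the usage example is hypothetical.

```python
def cronbach_alpha(data):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(data[0])

    def sample_var(xs):
        mean = sum(xs) / len(xs)
        return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)

    item_vars = [sample_var([row[j] for row in data]) for j in range(k)]
    total_var = sample_var([sum(row) for row in data])
    return k / (k - 1) * (1.0 - sum(item_vars) / total_var)

if __name__ == "__main__":
    responses = [  # hypothetical 5-point ratings: 5 staff x 4 items
        [4, 5, 4, 4],
        [2, 3, 2, 3],
        [5, 5, 4, 5],
        [3, 3, 3, 2],
        [1, 2, 2, 1],
    ]
    print(round(cronbach_alpha(responses), 3))
```

Alpha rises with both the number of items and their average inter-item correlation. A short factor can therefore show a modest alpha, such as the lower bound of 0.69 reported above, even when its items cohere reasonably well.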
Sexual orientation in the 2013 national health interview survey: a quality assessment.

PubMed

Dahlhamer, James M; Galinsky, Adena M; Joestl, Sarah S; Ward, Brian W

2014-12-01

Objective: This report presents a set of quality analyses of sexual orientation data collected in the 2013 National Health Interview Survey (NHIS). NHIS sexual orientation estimates are compared with those from the National Survey of Family Growth (NSFG) and the National Health and Nutrition Examination Survey (NHANES). Selected health outcomes by sexual orientation are compared between NHIS and NSFG. Assessments of item nonresponse, item response times, and responses to follow-up questions to the sexual orientation question are also presented. Methods: NHIS is a multipurpose health survey conducted continuously throughout the year by the Centers for Disease Control and Prevention's National Center for Health Statistics. Analyses in this report were based on NHIS data collected in 2013 from 34,557 adults aged 18 and over. Sampling weights were used to produce national estimates that are representative of the civilian noninstitutionalized U.S. adult population. Data from the 2006-2010 NSFG and 2009-2012 NHANES were used for the comparisons. Results: Based on the 2013 NHIS data, 96.6% of adults identified as straight, 1.6% identified as gay/lesbian, and 0.7% identified as bisexual. The remaining 1.1% of adults identified as "something else," stated "I don't know the answer," or refused to answer. Responses to follow-up questions suggest that the sexual orientation question is producing little classification error. In addition, largely similar patterns of association between sexual orientation and health were observed for NHIS and NSFG. Analyses of item nonresponse rates revealed few data quality issues, although item response times suggest possible shortcutting of the question and comprehension problems for select respondents. All material appearing in this report is in the public domain and may be reproduced or copied without permission; citation as to source, however, is appreciated.
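Weighted national percentages like those above divide the summed sampling weights of respondents in a category by the summed weights of all respondents. This is a minimal sketch of that point estimate only. Design-based standard errors would also need the survey's strata and primary sampling units, which are ignored here. The example values and weights are hypothetical.

```python
def weighted_percentage(values, weights, category):
    """Survey-weighted percentage of respondents in `category` (point estimate only)."""
    total = sum(weights)
    in_category = sum(w for v, w in zip(values, weights) if v == category)
    return 100.0 * in_category / total

if __name__ == "__main__":
    answers = ["straight", "gay/lesbian", "straight", "bisexual"]  # hypothetical
    weights = [100.0, 50.0, 150.0, 200.0]                           # hypothetical
    for cat in ("straight", "gay/lesbian", "bisexual"):
        print(cat, weighted_percentage(answers, weights, cat))
```

Because each respondent counts in proportion to the population they represent, a weighted percentage can differ noticeably from the unweighted share of the 34,557 respondents.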

The Communicative Participation Item Bank (CPIB): Item bank calibration and development of a disorder-generic short form

PubMed Central

Baylor, Carolyn; Yorkston, Kathryn; Eadie, Tanya; Kim, Jiseon; Chung, Hyewon; Amtmann, Dagmar

2015-01-01

Purpose: The purpose of this study was to calibrate the items for the Communicative Participation Item Bank (CPIB) using Item Response Theory (IRT). One overriding objective was to examine whether the IRT item parameters would be consistent across different diagnostic groups, thereby allowing creation of a disorder-generic instrument. The intended outcomes were the final item bank and a short form ready for clinical and research applications. Methods: Self-report data were collected from 701 individuals representing four diagnoses: multiple sclerosis, Parkinson’s disease, amyotrophic lateral sclerosis and head and neck cancer. Participants completed the CPIB and additional self-report questionnaires. CPIB data were analyzed using the IRT Graded Response Model (GRM). Results: The initial set of 94 candidate CPIB items was reduced to an item bank of 46 items demonstrating unidimensionality, local independence, good item fit, and good measurement precision. Differential item functioning (DIF) analyses detected no meaningful differences across diagnostic groups. A 10-item, disorder-generic short form was generated. Conclusions: The CPIB provides speech-language pathologists with a unidimensional, self-report outcomes measurement instrument dedicated to the construct of communicative participation. This instrument may be useful to clinicians and researchers wanting to implement measures of communicative participation in their work. PMID:23816661
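Samejima's Graded Response Model gives each ordered-category item one discrimination a and a set of increasing thresholds. The probability of responding in category k or higher follows a logistic curve, and each category's probability is the difference between adjacent curves. This is a minimal sketch with a hypothetical four-category item, not CPIB calibration values.

```python
import math

def grm_category_probs(theta, a, thresholds):
    """Samejima GRM: P(X = k | theta) for k = 0..m, with m = len(thresholds)."""
    if any(b1 >= b2 for b1, b2 in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    # Boundary curves P(X >= k); P(X >= 0) = 1 and P(X >= m + 1) = 0.
    cumulative = ([1.0]
                  + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
                  + [0.0])
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]

def expected_item_score(theta, a, thresholds):
    """Expected category score (0..m) at theta."""
    return sum(k * p for k, p in enumerate(grm_category_probs(theta, a, thresholds)))

if __name__ == "__main__":
    a, bs = 1.5, [-1.0, 0.0, 1.2]  # hypothetical item with four ordered categories
    for theta in (-2.0, 0.0, 2.0):
        probs = grm_category_probs(theta, a, bs)
        print(theta, [round(p, 3) for p in probs],
              round(expected_item_score(theta, a, bs), 2))
```

Checking whether parameters like these are equal across diagnostic groups is what the DIF analyses in this abstract test.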
Measuring the quality of life in hypertension according to Item Response Theory

PubMed Central

Borges, José Wicto Pereira; Moreira, Thereza Maria Magalhães; Schmitt, Jeovani; de Andrade, Dalton Francisco; Barbetta, Pedro Alberto; de Souza, Ana Célia Caetano; Lima, Daniele Braz da Silva; Carvalho, Irialda Saboia

2017-01-01

OBJECTIVE: To analyze the Miniquestionário de Qualidade de Vida em Hipertensão Arterial (MINICHAL – Mini-questionnaire of Quality of Life in Hypertension) using Item Response Theory. METHODS: This is an analytical study conducted with 712 persons with hypertension treated in thirteen primary health care units of Fortaleza, State of Ceará, Brazil, in 2015. The Item Response Theory analysis comprised three steps: evaluation of dimensionality, estimation of item parameters, and construction of the scale. Dimensionality was studied using the polychoric correlation matrix and confirmatory factor analysis. Item parameters were estimated with Samejima's Graded Response Model. The analyses were conducted in the free software R using the psych and mirt packages. RESULTS: The analysis allowed visualization of the item parameters and of their individual contributions to measuring the latent trait, generating more information and allowing the construction of a scale with an interpretative model that describes the worsening of quality of life in five levels. Regarding the item parameters, the items related to somatic state performed well, showing better power to discriminate individuals with worse quality of life. The items related to mental state contributed the least psychometric information to the MINICHAL. CONCLUSIONS: We conclude that the instrument is suitable for identifying the worsening of quality of life in hypertension. Analyzing the MINICHAL with Item Response Theory allowed us to identify aspects of this instrument that had not been addressed in previous studies. PMID:28492764
Cognitive diagnosis modelling incorporating item response times.

PubMed

Zhan, Peida; Jiao, Hong; Liao, Dandan

2018-05-01

To provide more refined diagnostic feedback with collateral information in item response times (RTs), this study proposed joint modelling of attributes and response speed using item responses and RTs simultaneously for cognitive diagnosis. For illustration, an extended deterministic input, noisy 'and' gate (DINA) model was proposed for joint modelling of responses and RTs. Model parameter estimation was explored using the Bayesian Markov chain Monte Carlo (MCMC) method. The PISA 2012 computer-based mathematics data were analysed first. These real data estimates were treated as true values in a subsequent simulation study. A follow-up simulation study with ideal testing conditions was conducted as well to further evaluate model parameter recovery. The results indicated that model parameters could be well recovered using the MCMC approach. Further, incorporating RTs into the DINA model would improve attribute and profile correct classification rates and result in more accurate and precise estimation of the model parameters. © 2017 The British Psychological Society.
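The abstract does not give the extended model's equations. The sketch below therefore shows only two standard building blocks that joint response/RT models of this kind commonly combine, and it is not the authors' specification.

- **The DINA response model.** A correct response has probability 1 - slip if the examinee masters every attribute the item requires (per its Q-matrix row), and probability guess otherwise.
- **A lognormal response-time model in the style of van der Linden.** log T ~ Normal(beta - tau, (1/alpha)^2), with item time intensity beta, person speed tau, and item precision alpha.

The parameter names and values, and the assumption that response and RT are conditionally independent given the person parameters, are illustrative.

```python
import math

def dina_prob(alpha, q_row, slip, guess):
    """DINA: 1 - slip if all attributes required by q_row are mastered, else guess."""
    mastered_all = all(a == 1 for a, q in zip(alpha, q_row) if q == 1)
    return 1.0 - slip if mastered_all else guess

def lognormal_rt_density(t, tau, beta, alpha_rt):
    """Density of RT t when log T ~ Normal(beta - tau, sd = 1 / alpha_rt)."""
    z = alpha_rt * (math.log(t) - (beta - tau))
    return alpha_rt / (t * math.sqrt(2.0 * math.pi)) * math.exp(-0.5 * z * z)

def joint_log_likelihood(x, t, alpha, q_row, slip, guess, tau, beta, alpha_rt):
    """Log-likelihood of one (response, RT) pair, assuming conditional independence."""
    p = dina_prob(alpha, q_row, slip, guess)
    log_p = math.log(p if x == 1 else 1.0 - p)
    return log_p + math.log(lognormal_rt_density(t, tau, beta, alpha_rt))

if __name__ == "__main__":
    profile = [1, 0, 1]  # hypothetical attribute mastery pattern
    print(dina_prob(profile, [1, 0, 1], slip=0.1, guess=0.2))  # all required mastered
    print(dina_prob(profile, [1, 1, 0], slip=0.1, guess=0.2))  # attribute 2 missing
    print(joint_log_likelihood(1, 30.0, profile, [1, 0, 1], 0.1, 0.2,
                               tau=0.2, beta=3.6, alpha_rt=1.5))
```

In an MCMC estimation like the one described, such likelihood terms are combined with priors over attribute profiles, speed, and item parameters. Any link between speed and attribute mastery is part of the model design that the sketch leaves out.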
The Australian Racism, Acceptance, and Cultural-Ethnocentrism Scale (RACES): item response theory findings.

PubMed

Grigg, Kaine; Manderson, Lenore

2016-03-17

Racism and associated discrimination are pervasive and persistent challenges with multiple cumulative deleterious effects contributing to inequities in various health outcomes. Globally, research over the past decade has shown consistent associations between racism and negative health concerns. Such research confirms that race endures as one of the strongest predictors of poor health. Due to the lack of validated Australian measures of racist attitudes, RACES (Racism, Acceptance, and Cultural-Ethnocentrism Scale) was developed. Here, we examine RACES' psychometric properties, including the latent structure, utilising Item Response Theory (IRT). Unidimensional and Multidimensional Rating Scale Model (RSM) Rasch analyses were utilised with 296 Victorian primary school students and 182 adolescents and 220 adults from the Australian community. RACES was demonstrated to be a robust 24-item three-dimensional scale of Accepting Attitudes (12 items), Racist Attitudes (8 items), and Ethnocentric Attitudes (4 items). RSM Rasch analyses provide strong support for the instrument as a robust measure of racist attitudes in the Australian context, and for the overall factorial and construct validity of RACES across primary school children, adolescents, and adults. RACES provides a reliable and valid measure that can be utilised across the lifespan to evaluate attitudes towards all racial, ethnic, cultural, and religious groups. A core function of RACES is to assess the effectiveness of interventions to reduce community levels of racism and in turn inequities in health outcomes within Australia.
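In Andrich's Rating Scale Model, every item shares the same set of category thresholds (tau), and items differ only in their location (delta). This is what distinguishes it from the partial credit model, where each item has its own thresholds. The probability of category k is proportional to exp of the running sum of (theta - delta - tau_j) up to k. This is a minimal sketch with hypothetical parameter values, not RACES estimates.

```python
import math

def rsm_probs(theta, delta, taus):
    """Rating Scale Model category probabilities for categories 0..len(taus)."""
    logits = [0.0]  # category 0 has an empty sum
    running = 0.0
    for tau in taus:
        running += theta - delta - tau
        logits.append(running)
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

if __name__ == "__main__":
    shared_taus = [-1.0, 0.0, 1.0]  # hypothetical thresholds shared by all items
    for delta in (-0.5, 0.5):       # two hypothetical item locations
        print(delta, [round(p, 3) for p in rsm_probs(0.0, delta, shared_taus)])
```

The multidimensional variant used in the study applies the same category structure within each dimension (for example, accepting, racist, and ethnocentric attitudes), with each item loading on one dimension.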
A large-scale, long-term study of scale drift: The micro view and the macro view

NASA Astrophysics Data System (ADS)

He, W.; Li, S.; Kingsbury, G. G.

2016-11-01

The development of measurement scales for use across years and grades in educational settings presents unique challenges, as instructional approaches, instructional materials, and content standards all change periodically. This study examined the measurement stability of a set of Rasch measurement scales that have been in place for almost 40 years. In order to investigate the stability of these scales, item responses were collected from a large set of students who took operational adaptive tests using items calibrated to the measurement scales. For the four scales that were examined, item samples ranged from 2183 to 7923 items. Each item was administered to at least 500 students in each grade level, resulting in approximately 3000 responses per item. Stability was examined at the micro level by analysing changes in item parameter estimates since the items were first calibrated. It was also examined at the macro level, using groups of items and students' overall test scores. Results indicated that individual items showed changes in their parameter estimates, which require further analysis and possible recalibration. At the same time, the results at the total score level indicate substantial stability in the measurement scales over the span of their use.
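One way to separate the micro and macro views is to compare each item's original and recalibrated difficulty. The mean shift across items summarizes scale-level movement. Items whose change departs from that mean by more than a tolerance are flagged individually. This is a minimal sketch. The item IDs, estimates, and the 0.3-logit tolerance are hypothetical choices, not values or criteria from the study.

```python
def flag_drift(original, recalibrated, tolerance=0.3):
    """Return (mean shift, flagged item IDs) for items present in both calibrations."""
    shifts = {item: recalibrated[item] - original[item]
              for item in original if item in recalibrated}
    mean_shift = sum(shifts.values()) / len(shifts)  # macro view: scale-level movement
    flagged = sorted(item for item, d in shifts.items()
                     if abs(d - mean_shift) > tolerance)  # micro view: item-level drift
    return mean_shift, flagged

if __name__ == "__main__":
    first_calibration = {"i1": 0.0, "i2": 1.0, "i3": -0.5}  # hypothetical logits
    recalibration = {"i1": 0.1, "i2": 1.6, "i3": -0.4}
    print(flag_drift(first_calibration, recalibration))
```

Centering on the mean shift treats a uniform change as a linking issue rather than item drift. This mirrors the abstract's contrast between individual items that moved and total scores that stayed stable.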
Measurement Equivalence of the Patient Reported Outcomes Measurement Information System® (PROMIS®) Applied Cognition – General Concerns, Short Forms in Ethnically Diverse Groups

PubMed Central

Fieo, Robert; Ocepek-Welikson, Katja; Kleinman, Marjorie; Eimicke, Joseph P.; Crane, Paul K.; Cella, David; Teresi, Jeanne A.

2017-01-01

Aims The goals of these analyses were to examine the psychometric properties and measurement equivalence of a self-reported cognition measure, the Patient Reported Outcome Measurement Information System® (PROMIS®) Applied Cognition – General Concerns short form. These items are also found in the PROMIS Cognitive Function (version 2) item bank. This scale consists of eight items related to subjective cognitive concerns. Differential item functioning (DIF) analyses of gender, education, race, age, and (Spanish) language were performed using an ethnically diverse sample (n = 5,477) of individuals with cancer. This is the first analysis examining DIF in this item set across ethnic and racial groups. Methods DIF hypotheses were derived by asking content experts to indicate whether they posited DIF for each item and to specify the direction. The principal DIF analytic model was item response theory (IRT) using the graded response model for polytomous data, with accompanying Wald tests and measures of magnitude. Sensitivity analyses were conducted using ordinal logistic regression (OLR) with a latent conditioning variable. IRT-based reliability, precision and information indices were estimated. Results DIF was identified consistently only for the item “brain not working as well as usual.” After correction for multiple comparisons, this item showed significant DIF in both the primary and sensitivity analyses. Compared with White non-Hispanic respondents, Black and Hispanic respondents had a lower conditional probability of endorsing this item. The same pattern was observed for the education grouping variable: compared with those with a graduate degree, and conditioning on overall level of subjective cognitive concerns, those with less than high school education also had a lower probability of endorsing this item. 
DIF was also observed for age for two items after correction for multiple comparisons for both the IRT and OLR-based models: “I have had to work really hard to pay attention or I would make a mistake” and “I have had trouble shifting back and forth between different activities that require thinking”. For both items, conditional on cognitive complaints, older respondents had a higher likelihood than younger respondents of endorsing the item in the cognitive complaints direction. The magnitude and impact of DIF were minimal. The scale showed high precision along much of the subjective cognitive concerns continuum; the overall IRT-based reliability estimate for the total sample was 0.88 and the estimates for subgroups ranged from 0.87 to 0.92. Conclusion Little DIF of high magnitude or impact was observed in the PROMIS Applied Cognition – General Concerns short form item set. One item, “It has seemed like my brain was not working as well as usual” might be singled out for further study. However, in general the short form item set was highly reliable, informative, and invariant across differing race/ethnic, educational, age, gender, and language groups. PMID:28523238
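The abstract reports Wald tests for DIF followed by correction for multiple comparisons. Below is a simplified sketch of both steps:

- It tests a single item parameter at a time. IRT software typically tests the a and b parameters jointly, so this is a deliberate simplification.
- It assumes the two groups' estimates are already on a linked metric and are independent.
- Benjamini-Hochberg is used as one common correction; the abstract does not name the adjustment actually used.
- Function names and inputs are hypothetical.

```python
import math

def wald_dif(est_ref, se_ref, est_focal, se_focal):
    """1-df Wald chi-square for a reference-vs-focal difference in one item
    parameter, assuming linked, independent estimates. Returns (chi2, p)."""
    z = (est_ref - est_focal) / math.sqrt(se_ref ** 2 + se_focal ** 2)
    chi2 = z ** 2
    # Upper tail of a 1-df chi-square: P(X > x) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical location estimates for one item in two groups.
chi2, p = wald_dif(0.5, 0.1, 0.0, 0.1)
```

A significant adjusted p-value only says DIF is detectable. The abstract separately assesses magnitude and impact, which is how it can report significant DIF alongside minimal practical consequences.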
Measurement Equivalence of the Patient Reported Outcomes Measurement Information System® (PROMIS®) Applied Cognition - General Concerns, Short Forms in Ethnically Diverse Groups.

PubMed

Fieo, Robert; Ocepek-Welikson, Katja; Kleinman, Marjorie; Eimicke, Joseph P; Crane, Paul K; Cella, David; Teresi, Jeanne A

2016-01-01

The goals of these analyses were to examine the psychometric properties and measurement equivalence of a self-reported cognition measure, the Patient Reported Outcome Measurement Information System® (PROMIS®) Applied Cognition - General Concerns short form. These items are also found in the PROMIS Cognitive Function (version 2) item bank. This scale consists of eight items related to subjective cognitive concerns. Differential item functioning (DIF) analyses of gender, education, race, age, and (Spanish) language were performed using an ethnically diverse sample (n = 5,477) of individuals with cancer. This is the first analysis examining DIF in this item set across ethnic and racial groups. DIF hypotheses were derived by asking content experts to indicate whether they posited DIF for each item and to specify the direction. The principal DIF analytic model was item response theory (IRT) using the graded response model for polytomous data, with accompanying Wald tests and measures of magnitude. Sensitivity analyses were conducted using ordinal logistic regression (OLR) with a latent conditioning variable. IRT-based reliability, precision and information indices were estimated. DIF was identified consistently only for the item "brain not working as well as usual." After correction for multiple comparisons, this item showed significant DIF in both the primary and sensitivity analyses. Compared with White non-Hispanic respondents, Black and Hispanic respondents had a lower conditional probability of endorsing this item. The same pattern was observed for the education grouping variable: compared with those with a graduate degree, and conditioning on overall level of subjective cognitive concerns, those with less than high school education also had a lower probability of endorsing this item. 
DIF was also observed for age for two items after correction for multiple comparisons for both the IRT and OLR-based models: "I have had to work really hard to pay attention or I would make a mistake" and "I have had trouble shifting back and forth between different activities that require thinking". For both items, conditional on cognitive complaints, older respondents had a higher likelihood than younger respondents of endorsing the item in the cognitive complaints direction. The magnitude and impact of DIF were minimal. The scale showed high precision along much of the subjective cognitive concerns continuum; the overall IRT-based reliability estimate for the total sample was 0.88 and the estimates for subgroups ranged from 0.87 to 0.92. Little DIF of high magnitude or impact was observed in the PROMIS Applied Cognition - General Concerns short form item set. One item, "It has seemed like my brain was not working as well as usual" might be singled out for further study. However, in general the short form item set was highly reliable, informative, and invariant across differing race/ethnic, educational, age, gender, and language groups.
Development and initial validation of a brief self-report measure of cognitive dysfunction in fibromyalgia.

PubMed

Kratz, Anna L; Schilling, Stephen G; Goesling, Jenna; Williams, David A

2015-06-01

Pain is often the focus of research and clinical care in fibromyalgia (FM); however, cognitive dysfunction is also a common, distressing, and disabling symptom in FM. Current efforts to address this problem are limited by the lack of a comprehensive, valid measure of subjective cognitive dysfunction in FM that is easily interpretable, accessible, and brief. The purpose of this study was to leverage cognitive functioning item banks that were developed as part of the Patient Reported Outcomes Measurement Information System (PROMIS) to devise a 10-item short form measure of cognitive functioning for use in FM. In study 1, a nationwide (U.S.) sample of 1,035 adults with FM (age range = 18-82, 95.2% female) completed 2 cognitive item pools. Factor analyses and item response theory analyses were used to identify dimensionality and optimally performing items. A recommended 10-item measure, called the Multidimensional Inventory of Subjective Cognitive Impairment (MISCI) was created. In study 2, 232 adults with FM completed the MISCI and a legacy measure of cognitive functioning that is used in FM clinical trials, the Multiple Ability Self-Report Questionnaire (MASQ). The MISCI showed excellent internal reliability, low ceiling/floor effects, and good convergent validity with the MASQ (r = -.82). This paper presents the MISCI, a 10-item measure of cognitive dysfunction in FM, developed through classical test theory and item response theory. This brief but comprehensive measure shows evidence of excellent construct validity through large correlations with a lengthy legacy measure of cognitive functioning. Copyright © 2015 American Pain Society. Published by Elsevier Inc. All rights reserved.
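Among its properties, the MISCI is reported to have low ceiling and floor effects. These effects are usually summarised as the share of respondents at the lowest and highest possible total scores. Below is a minimal sketch under stated assumptions:

- The 10-50 score range (10 items scored 1-5) and the example scores are hypothetical.
- The 15% flag is a commonly used convention, not a criterion stated in this abstract.

```python
def floor_ceiling(scores, min_score, max_score):
    """Proportions of respondents at the minimum and maximum possible scores."""
    n = len(scores)
    floor = sum(1 for s in scores if s == min_score) / n
    ceiling = sum(1 for s in scores if s == max_score) / n
    return floor, ceiling

# Hypothetical totals for a 10-item measure scored 1-5 per item (range 10-50).
scores = [10, 22, 35, 50, 18, 27, 41, 33, 29, 10]
floor, ceiling = floor_ceiling(scores, 10, 50)
# Flag an effect if more than 15% of respondents sit at an extreme (a common convention).
flags = [name for name, share in (("floor", floor), ("ceiling", ceiling)) if share > 0.15]
```

Large floor or ceiling effects would limit how well a short form separates respondents at the extremes. This is one reason the property is reported alongside reliability.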
Improving measures of work-related physical functioning.

PubMed

McDonough, Christine M; Ni, Pengsheng; Peterik, Kara; Marfeo, Elizabeth E; Marino, Molly E; Meterko, Mark; Rasch, Elizabeth K; Brandt, Diane E; Jette, Alan M; Chan, Leighton

2017-03-01

To expand content of the physical function domain of the Work Disability Functional Assessment Battery (WD-FAB), developed for the US Social Security Administration's (SSA) disability determination process. Newly developed questions were administered to 3532 recent SSA applicants for work disability benefits and 2025 US adults. Factor analyses and item response theory (IRT) methods were used to calibrate and link the new items to the existing WD-FAB, and computer-adaptive test simulations were conducted. Factor and IRT analyses supported integration of 44 new items into three existing WD-FAB scales and the addition of a new 11-item scale (Community Mobility). The final physical function domain, consisting of Basic Mobility (56 items), Upper Body Function (34 items), Fine Motor Function (45 items), and Community Mobility (11 items), demonstrated acceptable psychometric properties. The WD-FAB offers an important tool for enhancement of work disability determination. The WD-FAB could provide relevant information about work-related functioning for the initial assessment of claimants, for identifying denied applicants who may benefit from interventions to improve work and health outcomes, for enhancing periodic review of work disability beneficiaries, and for assessing outcomes of policies, programs and services targeting people with work disability.
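The abstract says the new items were calibrated and linked to the existing WD-FAB but does not name the linking method. Mean/sigma linking is one standard way to place a new calibration onto an existing scale through anchor items, and it is shown here as a swapped-in illustration. The anchor values and function names are hypothetical.

```python
import statistics

def mean_sigma_constants(b_anchor_new, b_anchor_base):
    """Mean/sigma constants (A, B) with theta_base = A * theta_new + B,
    estimated from anchor-item difficulties on both scales."""
    A = statistics.stdev(b_anchor_base) / statistics.stdev(b_anchor_new)
    B = statistics.mean(b_anchor_base) - A * statistics.mean(b_anchor_new)
    return A, B

def transform_item(a, b, A, B):
    """Rescale a logistic-model item's discrimination and location onto the base scale."""
    return a / A, A * b + B

# Hypothetical anchor difficulties from a new calibration and the base scale.
A, B = mean_sigma_constants([-1.0, 0.0, 1.0], [-1.5, 0.5, 2.5])
new_item_on_base = transform_item(1.2, 0.3, A, B)
```

Concurrent calibration of old and new items in one run is a common alternative. Which approach fits best depends on sample overlap and on how many anchor items are available.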
Improving Measures of Work-Related Physical Functioning

PubMed Central

McDonough, Christine M.; Ni, Pengsheng; Peterik, Kara; Marfeo, Elizabeth E.; Marino, Molly E.; Meterko, Mark; Rasch, Elizabeth K; Brandt, Diane E.; Jette, Alan M; Chan, Leighton

2016-01-01

Purpose To expand content of the physical function domain of the Work Disability Functional Assessment Battery (WD-FAB), developed for the US Social Security Administration’s (SSA) disability determination process. Methods Newly developed questions were administered to 3,532 recent SSA applicants for work disability benefits and 2,025 US adults. Factor analyses and item response theory (IRT) methods were used to calibrate and link the new items to the existing WD-FAB, and computer-adaptive test simulations were conducted. Results Factor and IRT analyses supported integration of 44 new items into 3 existing WD-FAB scales and the addition of a new 11-item scale (Community Mobility). The final physical function domain, consisting of Basic Mobility (56 items), Upper Body Function (34 items), Fine Motor Function (45 items), and Community Mobility (11 items), demonstrated acceptable psychometric properties. Conclusions The WD-FAB offers an important tool for enhancement of work disability determination. The WD-FAB could provide relevant information about work-related functioning for the initial assessment of claimants, for identifying denied applicants who may benefit from interventions to improve work and health outcomes, for enhancing periodic review of work disability beneficiaries, and for assessing outcomes of policies, programs and services targeting people with work disability. PMID:28005243
A Psychometric Analysis of the Italian Version of the eHealth Literacy Scale Using Item Response and Classical Test Theory Methods.

PubMed

Diviani, Nicola; Dima, Alexandra Lelia; Schulz, Peter Johannes

2017-04-11

The eHealth Literacy Scale (eHEALS) is a tool to assess consumers' comfort and skills in using information technologies for health. Although evidence exists of reliability and construct validity of the scale, less agreement exists on structural validity. The aim of this study was to validate the Italian version of the eHealth Literacy Scale (I-eHEALS) in a community sample with a focus on its structural validity, by applying psychometric techniques that account for item difficulty. Two Web-based surveys were conducted among a total of 296 people living in the Italian-speaking region of Switzerland (Ticino). After examining the latent variables underlying the observed variables of the Italian scale via principal component analysis (PCA), fit indices for two alternative models were calculated using confirmatory factor analysis (CFA). The scale structure was examined via parametric and nonparametric item response theory (IRT) analyses accounting for differences between items regarding the proportion of answers indicating high ability. Convergent validity was assessed by correlations with theoretically related constructs. CFA showed a suboptimal model fit for both models. IRT analyses confirmed all items measure a single dimension as intended. Reliability and construct validity of the final scale were also confirmed. The contrasting results of factor analysis (FA) and IRT analyses highlight the importance of considering differences in item difficulty when examining health literacy scales. The findings support the reliability and validity of the translated scale and its use for assessing Italian-speaking consumers' eHealth literacy. ©Nicola Diviani, Alexandra Lelia Dima, Peter Johannes Schulz. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 11.04.2017.
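The abstract mentions nonparametric IRT that accounts for differences in the proportion of high-ability answers per item. Mokken scale analysis is a widely used nonparametric approach, and its Loevinger H coefficient is built on exactly those proportions. The abstract does not confirm that this coefficient was used. The sketch below assumes dichotomised (0/1) item scores, a simplification of the 5-point eHEALS items; the data are hypothetical.

```python
def loevinger_h(data):
    """Scale-level Loevinger H for dichotomous (0/1) item scores.

    data: list of respondent rows, each a list of 0/1 item scores.
    H = 1 - (observed Guttman errors) / (Guttman errors expected under independence).
    """
    n = len(data)
    k = len(data[0])
    p = [sum(row[j] for row in data) / n for j in range(k)]
    observed = 0.0
    expected = 0.0
    for i in range(k):
        for j in range(i + 1, k):
            # Order the pair so `easy` is the item with the higher endorsement rate.
            easy, hard = (i, j) if p[i] >= p[j] else (j, i)
            # Guttman error: endorsing the harder item while not endorsing the easier one.
            observed += sum(1 for row in data if row[hard] == 1 and row[easy] == 0)
            expected += n * p[hard] * (1 - p[easy])
    return 1 - observed / expected

# Hypothetical perfect Guttman pattern: no errors, so H = 1.
guttman = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]]
h = loevinger_h(guttman)
```

In Mokken scaling, H of at least 0.3 is the conventional lower bound for a (weak) scale. Unlike a CFA on inter-item correlations, H explicitly uses item difficulty ordering, which is consistent with the abstract's point that FA and IRT results can diverge.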
Grooming a CAT: customizing CAT administration rules to increase response efficiency in specific research and clinical settings.

PubMed

Kallen, Michael A; Cook, Karon F; Amtmann, Dagmar; Knowlton, Elizabeth; Gershon, Richard C

2018-05-05

To evaluate the degree to which applying alternative stopping rules would reduce response burden while maintaining score precision in the context of computer adaptive testing (CAT). Analyses were conducted on secondary data composed of CATs administered in a clinical setting at multiple time points (baseline and up to two follow-ups) to 417 study participants who had back pain (51.3%) and/or depression (47.0%). Participant mean age was 51.3 years (SD = 17.2), ranging from 18 to 86. Participants tended to be white (84.7%), relatively well educated (77% with at least some college), female (63.9%), and married or living in a committed relationship (57.4%). The unit of analysis was individual assessment histories (i.e., CAT item response histories) from the parent study. Data were first aggregated across all individuals, domains, and time points in an omnibus dataset of assessment histories and then disaggregated by measure for domain-specific analyses. Finally, assessment histories within a "clinically relevant range" (score ≥ 1 SD from the mean in the direction of poorer health) were analyzed separately to explore score level-specific findings. Two different sets of CAT administration rules were compared. The original CAT (CAT_ORIG) rules required that at least four and no more than 12 items be administered; if the score standard error (SE) fell below 3 points (T score metric) before 12 items were administered, the CAT was stopped. We simulated applying alternative stopping rules (CAT_ALT), removing the requirement that a minimum of four items be administered, and stopped a CAT if responses to the first two items were both associated with best health, if the SE was < 3, if the SE change was < 0.1 (T score metric), or if 12 items had been administered. We then compared score fidelity and response burden, defined as the number of items administered, between CAT_ORIG and CAT_ALT. 
CAT_ORIG and CAT_ALT scores varied little, especially within the clinically relevant range, and response burden was substantially lower under CAT_ALT (e.g., 41.2% savings in the omnibus dataset). The alternative stopping rules resulted in substantial reductions in response burden with minimal sacrifice in score precision.
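The two rule sets described above can be replayed against a recorded CAT history. The sketch below does this under stated assumptions:

- The thresholds come from the abstract: a minimum of 4 and maximum of 12 items for CAT_ORIG, SE < 3, SE change < 0.1, and the best-health shortcut after two items.
- "SE change" is interpreted here as the absolute difference from the previous item's SE.
- The history format (a list of SEs after each item) and the function names are hypothetical.

```python
def stop_orig(n_items, se):
    """CAT_ORIG: at least 4 and at most 12 items; stop once SE < 3 (T-score metric)."""
    if n_items >= 12:
        return True
    return n_items >= 4 and se < 3.0

def stop_alt(n_items, se, prev_se, first_two_best):
    """CAT_ALT: no minimum; stop if the first two responses were both the
    best-health option, if SE < 3, if SE changed by < 0.1, or at 12 items."""
    if n_items >= 12:
        return True
    if n_items == 2 and first_two_best:
        return True
    if se < 3.0:
        return True
    if prev_se is not None and abs(prev_se - se) < 0.1:
        return True
    return False

def items_used(se_path, rule, first_two_best=False):
    """Replay a CAT history (SE after each successive item) and return how many
    items would have been administered under the chosen rule set."""
    for n, se in enumerate(se_path, start=1):
        prev = se_path[n - 2] if n > 1 else None
        if rule == "orig":
            done = stop_orig(n, se)
        else:
            done = stop_alt(n, se, prev, first_two_best)
        if done:
            return n
    return len(se_path)

# Hypothetical SE trajectory (T-score metric) for one assessment.
history = [8.0, 5.0, 2.9, 2.8, 2.7]
burden_orig = items_used(history, "orig")
burden_alt = items_used(history, "alt")
```

Summing these counts over many histories gives the kind of response-burden comparison the study reports. Score fidelity would additionally require rescoring each truncated history.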
Development and Initial Validation of Military Deployment-Related TBI Quality-of-Life Item Banks.

PubMed

Toyinbo, Peter A; Vanderploeg, Rodney D; Donnell, Alison J; Mutolo, Sandra A; Cook, Karon F; Kisala, Pamela A; Tulsky, David S

2016-01-01

To investigate unique factors that affect health-related quality of life (QOL) in individuals with military deployment-related traumatic brain injury (MDR-TBI) and to develop appropriate assessment tools, consistent with the TBI-QOL/PROMIS/Neuro-QOL systems. Three focus groups from each of the 4 Veterans Administration (VA) Polytrauma Rehabilitation Centers, consisting of 20 veterans with mild to severe MDR-TBI, and 36 VA providers were involved in the early stage of developing the new item banks. The item banks were field tested in a sample (N = 485) of veterans enrolled in VA and diagnosed with an MDR-TBI. The study design combined focus groups and a survey, and the main measures were the newly developed item banks and short forms for Guilt, Posttraumatic Stress Disorder/Trauma, and Military-Related Loss. Three new item banks representing unique domains of MDR-TBI health outcomes were created: 15 new Posttraumatic Stress Disorder items plus 16 SCI-QOL legacy Trauma items, 37 new Military-Related Loss items plus 18 TBI-QOL legacy Grief/Loss items, and 33 new Guilt items. Exploratory and confirmatory factor analyses plus bifactor analysis of the items supported sufficient unidimensionality of the new item pools. Convergent and discriminant analyses results, as well as known group comparisons, provided initial support for the validity and clinical utility of the new item response theory-calibrated item banks and their short forms. This work provides a unique opportunity to identify issues specific to individuals with MDR-TBI and ensure that they are captured in QOL assessment, thus extending the existing TBI-QOL measurement system.
Solving the measurement invariance anchor item problem in item response theory.

PubMed

Meade, Adam W; Wright, Natalie A

2012-09-01

The efficacy of tests of differential item functioning (measurement invariance) has been well established. It is clear that when properly implemented, these tests can successfully identify differentially functioning (DF) items when they exist. However, an assumption of these analyses is that the metric for different groups is linked using anchor items that are invariant. In practice, however, it is impossible to be certain which items are DF and which are invariant. This problem of anchor items, or referent indicators, has long plagued invariance research, and a multitude of suggested approaches have been put forth. Unfortunately, the relative efficacy of these approaches has not been tested. This study compares 11 variations on 5 qualitatively different approaches from recent literature for selecting optimal anchor items. A large-scale simulation study indicates that for nearly all conditions, an easily implemented 2-stage procedure recently put forth by Lopez Rivas, Stark, and Chernyshenko (2009) provided optimal power while maintaining nominal Type I error. With this approach, appropriate anchor items can be easily and quickly located, resulting in more efficacious invariance tests. Recommendations for invariance testing are illustrated using a pedagogical example of employee responses to an organizational culture measure.
Two objective measures of self-esteem.

PubMed

Lorr, M; Wunderlich, R A

1986-01-01

Two scales were constructed to assess self-esteem, conceptualized as reflecting (a) feelings of competence and efficacy, and (b) perceived positive appraisal from significant others. To control for response bias a paired choice format was chosen for the items constructed. A buffer scale designed to measure social assertiveness was also included. Data were collected on three samples of high school boys. The item intercorrelations were subjected to principal component analyses followed by Varimax rotations. In each of the three analyses factors of Confidence, Popularity (Social Approval), and Social Assertiveness emerged. The revised self-esteem scales, each defined by 11 items, have been shown to have acceptable reliability and some concurrent validity based on correlations with the well-known Rosenberg Self-Esteem Scale.
Response pattern of depressive symptoms among college students: What lies behind items of the Beck Depression Inventory-II?

PubMed

de Sá Junior, Antonio Reis; de Andrade, Arthur Guerra; Andrade, Laura Helena; Gorenstein, Clarice; Wang, Yuan-Pang

2018-07-01

This study examines the response pattern of depressive symptoms in a nationwide student sample, through item analyses of a rating scale by both classical test theory (CTT) and item response theory (IRT). The 21-item Beck Depression Inventory-II (BDI-II) was administered to 12,711 college students. First, the psychometric properties of the scale were described. Thereafter, the endorsement probability of each depressive symptom (scale item) was analyzed through CTT and IRT. Graphical plots depicted the endorsement probability of scale items and intensity of depression. Three items of different difficulty level were compared through the CTT and IRT approaches. Four in five students reported the presence of depressive symptoms. The BDI-II items presented good reliability and were distributed along the symptomatic continuum of depression. In both CTT and IRT approaches, the item 'changes in sleep' was easily endorsed, 'loss of interest' moderately, and 'suicidal thoughts' rarely. Graphical representations of the BDI-II from both methods showed close agreement in terms of item discrimination and item difficulty. The item characteristic curve of the IRT method provided an informative evaluation of item performance. A limitation is that the inventory was administered only to college students. Depressive symptoms were frequent psychopathological manifestations among college students. The performance of the BDI-II items indicated convergent results from both methods of analysis. While CTT was easy to understand and to apply, IRT was more complex to understand and to implement. Comprehensive assessment of the functioning of each BDI-II item might be helpful in efficient detection of depressive conditions in college students. Copyright © 2018 Elsevier B.V. All rights reserved.
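An item characteristic curve (ICC) plots the probability of endorsing an item against the latent trait. This sketch illustrates the easy, moderate and hard endorsement pattern under stated assumptions:

- It uses a two-parameter logistic model on dichotomised endorsement, a simplification of the 0-3 BDI-II response format.
- The abstract does not state which IRT model was fitted.
- The discrimination (a) and location (b) values are hypothetical, not BDI-II estimates.

```python
import math

def icc_2pl(theta, a, b):
    """Probability of endorsing an item under a two-parameter logistic model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical parameters for an easily, moderately and rarely endorsed symptom.
items = {"easy": (1.5, -1.0), "moderate": (1.5, 0.5), "hard": (2.0, 2.0)}
grid = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
curves = {name: [icc_2pl(t, a, b) for t in grid] for name, (a, b) in items.items()}
```

The location b is the trait level at which endorsement reaches 0.5, which is why an item like 'suicidal thoughts' sits far to the right. The discrimination a controls how sharply the curve rises around that point.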
What can we learn from PISA?: Investigating PISA's approach to scientific literacy

NASA Astrophysics Data System (ADS)

Schwab, Cheryl Jean

This dissertation is an investigation of the relationship between the multidimensional conception of scientific literacy and its assessment. The Programme for International Student Assessment (PISA), developed under the auspices of the Organization for Economic Cooperation and Development (OECD), offers a unique opportunity to evaluate the assessment of scientific literacy. PISA developed a continuum of performance for scientific literacy across three competencies (i.e., process, content, and situation). Foundational to the interpretation of PISA science assessment is PISA's definition of scientific literacy, which I argue incorporates three themes drawn from history: (a) scientific way of thinking, (b) everyday relevance of science, and (c) scientific literacy for all students. Three coordinated studies were conducted to investigate the validity of PISA science assessment and offer insight into the development of items to assess scientific literacy. Multidimensional models of the internal structure of the PISA 2003 science items were found not to reflect the complex character of PISA's definition of scientific literacy. Although the multidimensional models across the three competencies significantly decreased the G² statistic relative to the unidimensional model, high correlations between the dimensions suggest that the dimensions are similar. A cognitive analysis of student verbal responses to PISA science items revealed that students were using competencies of scientific literacy, but the competencies were not elicited by the PISA science items at the depth required by PISA's definition of scientific literacy. Although student responses contained only knowledge of scientific facts and simple scientific concepts, students were using more complex skills to interpret and communicate their responses. Finally, the investigation of different scoring approaches and item response models illustrated different ways to interpret student responses to assessment items. 
These analyses highlighted the complexities of students' responses to the PISA science items and the use of the ordered partition model to accommodate different but equal item responses. The results of the three investigations are used to discuss ways to improve the development and interpretation of PISA's science items.
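The dissertation compares unidimensional and multidimensional models by the drop in the G² statistic, the likelihood-ratio deviance. Below is a minimal sketch of the two quantities involved, G² from observed and expected counts and the deviance difference between nested models. All numbers and function names are hypothetical.

```python
import math

def g_squared(observed, expected):
    """Likelihood-ratio statistic G^2 = 2 * sum(O * ln(O / E)); cells with O = 0 contribute 0."""
    return 2.0 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)

def delta_g2(loglik_restricted, loglik_general):
    """Drop in deviance when moving from a nested, more restricted model
    (e.g. unidimensional) to a more general one (e.g. multidimensional)."""
    return 2.0 * (loglik_general - loglik_restricted)

# Hypothetical log-likelihoods for a unidimensional and a three-dimensional fit.
drop = delta_g2(loglik_restricted=-1050.0, loglik_general=-1040.0)
```

The drop is referred to a chi-square distribution whose degrees of freedom equal the number of extra parameters, mainly the added dimension variances and correlations. With large samples even a trivial improvement can be significant, so very high estimated correlations between dimensions can still suggest they are practically the same construct.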
Development of a Self-Determination Measure for College Students: Validity Evidence for the Basic Needs Satisfaction at College Scale

ERIC Educational Resources Information Center

Jenkins-Guarnieri, Michael A.; Vaughan, Angela L.; Wright, Stephen L.

2015-01-01

We adapted a work self-determination measure to create the Basic Needs Satisfaction at College Scale. Confirmatory factor analysis and item response theory analyses with data from 525 adults supported a 3-factor model with 13 items most sensitive for lower to middle range levels of the autonomy, competence, and relatedness constructs.
Relationship between Measures of Working Memory Capacity and the Time Course of Short-Term Memory Retrieval and Interference Resolution

ERIC Educational Resources Information Center

Oztekin, Ilke; McElree, Brian

2010-01-01

The response-signal speed-accuracy trade-off (SAT) procedure was used to investigate the relationship between measures of working memory capacity and the time course of short-term item recognition. High- and low-span participants studied sequentially presented 6-item lists, immediately followed by a recognition probe. Analyses of composite list…
Item Response Theory Modeling and Categorical Regression Analyses of the Five-Factor Model Rating Form: A Study on Italian Community-Dwelling Adolescent Participants and Adult Participants.

PubMed

Fossati, Andrea; Widiger, Thomas A; Borroni, Serena; Maffei, Cesare; Somma, Antonella

2017-06-01

To extend the evidence on the reliability and construct validity of the Five-Factor Model Rating Form (FFMRF) in its self-report version, two independent samples of Italian participants, which were composed of 510 adolescent high school students and 457 community-dwelling adults, respectively, were administered the FFMRF in its Italian translation. Adolescent participants were also administered the Italian translation of the Borderline Personality Features Scale for Children-11 (BPFSC-11), whereas adult participants were administered the Italian translation of the Triarchic Psychopathy Measure (TriPM). Cronbach α values were consistent with previous findings; in both samples, average interitem r values indicated acceptable internal consistency for all FFMRF scales. A multidimensional graded item response theory model indicated that the majority of FFMRF items had adequate discrimination parameters; information indices supported the reliability of the FFMRF scales. Both categorical (i.e., item-level) and scale-level regression analyses suggested that the FFMRF scores may predict a nonnegligible amount of variance in the BPFSC-11 total score in adolescent participants, and in the TriPM scale scores in adult participants.
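The FFMRF analyses report Cronbach α and average inter-item r as indices of internal consistency. A minimal standard-library sketch of both follows. The input layout (respondents by items) and the example data are hypothetical. As a hedged guideline, average inter-item correlations roughly between .15 and .50 are often cited as acceptable for broad constructs.

```python
import statistics

def cronbach_alpha(rows):
    """Cronbach's alpha from a respondents-by-items score matrix."""
    k = len(rows[0])
    columns = list(zip(*rows))
    item_var = sum(statistics.variance(col) for col in columns)
    total_var = statistics.variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - item_var / total_var)

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def mean_interitem_r(rows):
    """Average of the correlations over all distinct item pairs."""
    columns = list(zip(*rows))
    rs = [pearson(columns[i], columns[j])
          for i in range(len(columns)) for j in range(i + 1, len(columns))]
    return sum(rs) / len(rs)

# Hypothetical 5-point ratings: 5 respondents x 3 items.
ratings = [[1, 2, 1], [2, 2, 3], [3, 4, 3], [4, 4, 5], [5, 5, 4]]
alpha = cronbach_alpha(ratings)
avg_r = mean_interitem_r(ratings)
```

Alpha rises with the number of items even when items are only weakly related. Average inter-item r does not depend on scale length, which is why the abstract reports both for short FFMRF scales.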

A cost-effective method to characterize variation in clinical practice.

PubMed

Chang, K; Sauereisen, S; Dlutowski, M; Veloski, J J; Nash, D B

1999-06-01

This study's objective was to measure variation in physicians' practice styles and policies. Family physicians and general internists were surveyed about evidence-based medicine in the areas of asthma, congestive heart failure, and diabetes mellitus. They were asked about clinical recommendations where standards of practice were uncertain, controversial, or changing in response to published guidelines. Also included were items dealing with managed care. Although there was wide variation in responses to 20 of 36 items, some responses were consistent with practice guidelines. Responses to several items indicated a tendency to overuse expensive tests. Overall, the results indicate that a brief, open-ended survey can assess practice variation quickly and economically, as contrasted with more expensive analyses of medical records or claims data. With proper validation such assessments can be used as baselines to guide interventions, as well as measures of the outcomes of these interventions to change practice styles.
Responding to nonwords in the lexical decision task: Insights from the English Lexicon Project.

PubMed

Yap, Melvin J; Sibley, Daragh E; Balota, David A; Ratcliff, Roger; Rueckl, Jay

2015-05-01

Researchers have extensively documented how various statistical properties of words (e.g., word frequency) influence lexical processing. However, the impact of lexical variables on nonword decision-making performance is less clear. This gap is surprising, because a better specification of the mechanisms driving nonword responses may provide valuable insights into early lexical processes. In the present study, item-level and participant-level analyses were conducted on the trial-level lexical decision data for almost 37,000 nonwords in the English Lexicon Project in order to identify the influence of different psycholinguistic variables on nonword lexical decision performance and to explore individual differences in how participants respond to nonwords. Item-level regression analyses reveal that nonword response time was positively correlated with number of letters, number of orthographic neighbors, number of affixes, and base-word number of syllables, and negatively correlated with Levenshtein orthographic distance and base-word frequency. Participant-level analyses also point to within- and between-session stability in nonword responses across distinct sets of items, and intriguingly reveal that higher vocabulary knowledge is associated with less sensitivity to some dimensions (e.g., number of letters) but more sensitivity to others (e.g., base-word frequency). The present findings provide well-specified and interesting new constraints for informing models of word recognition and lexical decision. (c) 2015 APA, all rights reserved.
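Levenshtein orthographic distance counts the single-letter edits needed to turn one string into another. In English Lexicon Project work it is commonly summarised as OLD20, the mean distance to the 20 closest words. The abstract does not spell out the exact operationalisation, so treat that as an assumption. The sketch uses a tiny hypothetical lexicon; real analyses use a full word list.

```python
def levenshtein(s, t):
    """Minimum number of single-character insertions, deletions or substitutions."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def old_n(word, lexicon, n=20):
    """Mean Levenshtein distance from `word` to its n closest lexicon entries."""
    distances = sorted(levenshtein(word, w) for w in lexicon if w != word)
    nearest = distances[:n]
    return sum(nearest) / len(nearest)

# Hypothetical mini-lexicon and nonword; n is reduced to fit the toy list.
mini_lexicon = ["flat", "bloat", "cat", "blast"]
score = old_n("blat", mini_lexicon, n=2)
```

A nonword with a small OLD score closely resembles many real words. The negative correlation reported above means such word-like nonwords took longer to reject.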
Evaluation of measurement equivalence of the Family Satisfaction with the End-of-Life Care in an ethnically diverse cohort: Tests of differential item functioning

PubMed Central

Teresi, Jeanne A; Ocepek-Welikson, Katja; Ramirez, Mildred; Kleinman, Marjorie; Ornstein, Katherine; Siu, Albert

2016-01-01

Background The Family Satisfaction with End-of-Life Care is an internationally used measure of satisfaction with cancer care. However, the Family Satisfaction with End-of-Life Care has not been studied for equivalence of item endorsement across different socio-demographic groups using differential item functioning. Aims The aims of this secondary data analysis were (1) to examine potential differential item functioning in the family satisfaction item set with respect to type of caregiver, race, and patient age, gender, and education and (2) to provide parameters and documentation of differential item functioning for an item bank. Design A mixed qualitative and quantitative analysis was conducted. A priori hypotheses regarding potential group differences in item response were established. Item response theory and Wald tests were used for the analyses of differential item functioning, accompanied by magnitude and impact measures. Results Very little significant differential item functioning was observed for patient's age and gender. For race, 13 items showed differential item functioning after multiple comparison adjustment, 10 with non-uniform differential item functioning. No items evidenced differential item functioning of high magnitude, and the impact was negligible. For education, 5 items evidenced uniform differential item functioning after adjustment, none of high magnitude. Differential item functioning impact was trivial. One item evidenced differential item functioning for the caregiver relationship variable. Conclusion Differential item functioning was observed primarily for race and education. No differential item functioning of high magnitude was observed for any item, and the overall impact of differential item functioning was negligible. One item, satisfaction with “the patient's pain relief,” might be singled out for further study, given that this item was both hypothesized and observed to show differential item functioning for race and education. 
PMID:25160692
The Development of a Multiple-Item Annoyance Scale (MIAS) for Transportation Noise Annoyance

PubMed Central

Belke, Christin; Spilski, Jan

2018-01-01

In 2001, Team#6 of the International Commission on Biological Effects of Noise (ICBEN) recommended the use of two single international standardised questions and response scales. This recommendation has been widely accepted in the scientific community. Nevertheless, annoyance can be regarded as a multidimensional construct comprising the three elements: (1) experience of an often repeated noise-related disturbance and the behavioural response to cope with it, (2) an emotional/attitudinal response to the sound and its disturbing impact, and (3) the perceived control or coping capacity with regard to the noise situation. The psychometric properties of items reflecting these three elements have been explored for aircraft noise annoyance. Analyses were conducted using data of the NORAH-Study (Noise-Related Annoyance, Cognition, and Health), and a multi-item noise annoyance scale (MIAS) has been developed and tested post hoc by using a stepwise process (exploratory and confirmatory factor analyses). Preliminary results were presented to the 12th ICBEN Congress in 2017. In this study, the validation of MIAS is done for aircraft noise and extended to railway and road traffic noise. The results largely confirm the concept of MIAS as a second-order construct of annoyance for all of the investigated transportation noise sources; however, improvements can be made, in particular with regard to items addressing the perceived coping capacity. PMID:29757228
Item response theory analysis of the Amyotrophic Lateral Sclerosis Functional Rating Scale-Revised in the Pooled Resource Open-Access ALS Clinical Trials Database.

PubMed

Bacci, Elizabeth D; Staniewska, Dorota; Coyne, Karin S; Boyer, Stacey; White, Leigh Ann; Zach, Neta; Cedarbaum, Jesse M

2016-01-01

Our objective was to examine the dimensionality and item-level performance of the Amyotrophic Lateral Sclerosis Functional Rating Scale-Revised (ALSFRS-R) across time using classical and modern test theory approaches. Confirmatory factor analysis (CFA) and item response theory (IRT) analyses were conducted using data from patients with amyotrophic lateral sclerosis (ALS) in the Pooled Resource Open-Access ALS Clinical Trials (PRO-ACT) database who had complete ALSFRS-R data (n = 888) at three time points (Time 0; Time 1, 6 months; Time 2, 1 year). In this population of 888 patients, mean age was 54.6 years, 64.4% were male, and 93.7% were Caucasian. The CFA supported a four-domain structure (bulbar, gross motor, fine motor, and respiratory domains). IRT analysis within each domain revealed misfitting items and overlapping item response category thresholds at all time points, particularly for the gross motor and respiratory domain items. Results indicate that many ALSFRS-R items may distinguish suboptimally among levels of the disability assessed by each domain, particularly in patients with less severe disability. Measure performance improved over time as the severity of patient disability increased. In conclusion, modifying selected ALSFRS-R items may improve the instrument's specificity to disability level and its sensitivity to treatment effects.
Detecting Test Tampering Using Item Response Theory

ERIC Educational Resources Information Center

Wollack, James A.; Cohen, Allan S.; Eckerly, Carol A.

2015-01-01

Test tampering, especially on tests for educational accountability, is an unfortunate reality, necessitating that the state (or its testing vendor) perform data forensic analyses, such as erasure analyses, to look for signs of possible malfeasance. Few statistical approaches exist for detecting fraudulent erasures, and those that do largely do not…
Response Time Differences between Computers and Tablets

ERIC Educational Resources Information Center

Kong, Xiaojing; Davis, Laurie Laughlin; McBride, Yuanyuan; Morrison, Kristin

2018-01-01

Item response time data were used in investigating the differences in student test-taking behavior between two device conditions: computer and tablet. Analyses were conducted to address the questions of whether or not the device condition had a differential impact on rapid guessing and solution behaviors (with response time effort used as an…
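The abstract above is truncated, but it refers to response time effort. As that index is commonly defined in the response-time literature, it is the proportion of items on which a test taker's response time reaches an item-specific rapid-guessing threshold, so the response counts as "solution behavior" rather than a rapid guess. The sketch below assumes that definition. The function name, the thresholds and the data are hypothetical, and the study's actual thresholds are not given here.

```python
from typing import Sequence

def response_time_effort(rts: Sequence[float], thresholds: Sequence[float]) -> float:
    """Proportion of items answered with solution behavior.

    A response counts as solution behavior when its response time is at or above
    the item's rapid-guessing threshold (hypothetical thresholds supplied by caller).
    """
    if not rts or len(rts) != len(thresholds):
        raise ValueError("need one threshold per response time")
    solution = [rt >= t for rt, t in zip(rts, thresholds)]
    return sum(solution) / len(solution)

# Hypothetical data: seconds spent on five items, with a 3-second threshold on each.
rts = [12.4, 1.1, 8.0, 2.5, 30.2]
thresholds = [3.0] * 5
print(response_time_effort(rts, thresholds))  # 0.6
```

Comparing this index between device conditions (computer vs. tablet) is one way the rapid-guessing question in the abstract could be operationalized.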
How We Know It Hurts: Item Analysis of Written Narratives Reveals Distinct Neural Responses to Others' Physical Pain and Emotional Suffering

PubMed Central

Bruneau, Emile; Dufour, Nicholas; Saxe, Rebecca

2013-01-01

People are often called upon to witness, and to empathize with, the pain and suffering of others. In the current study, we directly compared neural responses to others' physical pain and emotional suffering by presenting participants (n = 41) with 96 verbal stories, each describing a protagonist's physical and/or emotional experience, ranging from neutral to extremely negative. A separate group of participants rated “how much physical pain”, and “how much emotional suffering” the protagonist experienced in each story, as well as how “vivid and movie-like” the story was. Although ratings of Pain, Suffering and Vividness were positively correlated with each other across stories, item-analyses revealed that each scale was correlated with activity in distinct brain regions. Even within regions of the “Shared Pain network” identified using a separate data set, responses to others' physical pain and emotional suffering were distinct. More broadly, item analyses with continuous predictors provided a high-powered method for identifying brain regions associated with specific aspects of complex stimuli – like verbal descriptions of physical and emotional events. PMID:23638181
A signal detection-item response theory model for evaluating neuropsychological measures.

PubMed

Thomas, Michael L; Brown, Gregory G; Gur, Ruben C; Moore, Tyler M; Patt, Virginie M; Risbrough, Victoria B; Baker, Dewleen G

2018-02-05

Models from signal detection theory are commonly used to score neuropsychological test data, especially tests of recognition memory. Here we show that certain item response theory models can be formulated as signal detection theory models, thus linking two complementary but distinct methodologies. We then use the approach to evaluate the validity (construct representation) of commonly used research measures, demonstrate the impact of conditional error on neuropsychological outcomes, and evaluate measurement bias. Signal detection-item response theory (SD-IRT) models were fitted to recognition memory data for words, faces, and objects. The sample consisted of U.S. Infantry Marines and Navy Corpsmen participating in the Marine Resiliency Study. Data comprised item responses to the Penn Face Memory Test (PFMT; N = 1,338), Penn Word Memory Test (PWMT; N = 1,331), and Visual Object Learning Test (VOLT; N = 1,249), and self-report of past head injury with loss of consciousness. SD-IRT models adequately fitted recognition memory item data across all modalities. Error varied systematically with ability estimates, and distributions of residuals from the regression of memory discrimination onto self-report of past head injury were positively skewed towards regions of larger measurement error. Analyses of differential item functioning revealed little evidence of systematic bias by level of education. SD-IRT models benefit both from the measurement rigor of item response theory, which permits the modeling of item difficulty and examinee ability, and from signal detection theory, which provides an interpretive framework encompassing the experimentally validated constructs of memory discrimination and response bias. We used this approach to validate the construct representation of commonly used research measures and to demonstrate how nonoptimized item parameters can lead to erroneous conclusions when interpreting neuropsychological test data. 
Future work might include the development of computerized adaptive tests and integration with mixture and random-effects models.
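Memory discrimination and response bias can be illustrated with the classical equal-variance Gaussian signal detection indices. These are d' = z(H) - z(F) and c = -(z(H) + z(F)) / 2, where H is the hit rate on old items and F is the false-alarm rate on new items. This is a minimal sketch of the plain signal detection side only. It is not the authors' SD-IRT model, which additionally models item and examinee parameters. The rates used are hypothetical.

```python
from statistics import NormalDist
from typing import Tuple

def sdt_indices(hit_rate: float, fa_rate: float) -> Tuple[float, float]:
    """Equal-variance Gaussian signal detection indices.

    d' = z(H) - z(F)          memory discrimination
    c  = -(z(H) + z(F)) / 2   response bias (positive = conservative, negative = liberal)
    """
    if not (0.0 < hit_rate < 1.0 and 0.0 < fa_rate < 1.0):
        raise ValueError("rates must lie strictly between 0 and 1")
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    zh, zf = z(hit_rate), z(fa_rate)
    return zh - zf, -(zh + zf) / 2.0

# Hypothetical recognition test: 80% hits on old items, 20% false alarms on new items.
d_prime, c = sdt_indices(0.80, 0.20)
print(round(d_prime, 3))  # 1.683
```

Rates of exactly 0 or 1 give infinite z-scores, so in practice they are usually adjusted before these formulas are applied.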
Assessing Hopelessness in Terminally Ill Cancer Patients: Development of the Hopelessness Assessment in Illness Questionnaire

PubMed Central

Rosenfeld, Barry; Pessin, Hayley; Lewis, Charles; Abbey, Jennifer; Olden, Megan; Sachs, Emily; Amakawa, Lia; Kolva, Elissa; Brescia, Robert; Breitbart, William

2013-01-01

Hopelessness has become an increasingly important construct in palliative care research, yet concerns exist regarding the utility of existing measures when applied to patients with a terminal illness. This article describes a series of studies focused on the exploration, development, and analysis of a measure of hopelessness specifically intended for use with terminally ill cancer patients. The 1st stage of measure development involved interviews with 13 palliative care experts and 30 terminally ill patients. Qualitative analysis of the patient interviews culminated in the development of a set of potential questionnaire items. In the 2nd study phase, we evaluated these preliminary items with a sample of 314 participants, using item response theory and classical test theory to identify optimal items and response format. These analyses generated an 8-item measure that we tested in a final study phase, using a 3rd sample (n = 228) to assess reliability and concurrent validity. These analyses demonstrated strong support for the Hopelessness Assessment in Illness Questionnaire providing greater explanatory power than existing measures of hopelessness and found little evidence that this assessment was confounded by illness-related variables (e.g., prognosis). In summary, these 3 studies suggest that this brief measure of hopelessness is particularly useful for palliative care settings. Further research is needed to assess the applicability of the measure to other populations and contexts. PMID:21443366
A Comparison of Three IRT Approaches to Examinee Ability Change Modeling in a Single-Group Anchor Test Design

ERIC Educational Resources Information Center

Paek, Insu; Park, Hyun-Jeong; Cai, Li; Chi, Eunlim

2014-01-01

Typically a longitudinal growth modeling based on item response theory (IRT) requires repeated measures data from a single group with the same test design. If operational or item exposure problems are present, the same test may not be employed to collect data for longitudinal analyses and tests at multiple time points are constructed with unique…
Development of a cross-cultural item bank for measuring quality of life related to mental health in multiple sclerosis patients.

PubMed

Michel, Pierre; Auquier, Pascal; Baumstarck, Karine; Pelletier, Jean; Loundou, Anderson; Ghattas, Badih; Boyer, Laurent

2015-09-01

Quality of life (QoL) measurements are considered important outcome measures both for research on multiple sclerosis (MS) and in clinical practice. Computerized adaptive testing (CAT) can improve the precision of measurements made using QoL instruments while reducing the burden of testing on patients. Moreover, a cross-cultural approach is also necessary to guarantee the wide applicability of CAT. The aim of this preliminary study was to develop a calibrated item bank that is available in multiple languages and measures QoL related to mental health by combining one generic (SF-36) and one disease-specific questionnaire (MusiQoL). Patients with MS were enrolled in this international, multicenter, cross-sectional study. The psychometric properties of the item bank were evaluated using classical test theory and item response theory approaches, including the evaluation of unidimensionality, item response theory model fitting, and analyses of differential item functioning (DIF). Convergent and discriminant validities of the item bank were examined according to socio-demographic, clinical, and QoL features. A total of 1,992 patients with MS from 15 countries were enrolled to calibrate the 22-item bank developed in this study. The strict monotonicity of the Cronbach's alpha curve, the high eigenvalue ratio estimator (5.50), and the adequate confirmatory factor analysis (CFA) model fit (RMSEA = 0.07 and CFI = 0.95) indicated that a strong assumption of unidimensionality was warranted. The infit mean square statistic ranged from 0.76 to 1.27, indicating satisfactory item fit. DIF analyses revealed no item biases across geographical areas, confirming the cross-cultural equivalence of the item bank. External validity testing revealed that the item bank scores correlated significantly with QoL scores but also showed discriminant validity for socio-demographic and clinical characteristics. 
This work demonstrated satisfactory psychometric characteristics for a QoL item bank for MS in multiple languages. This work may offer a common measure for the assessment of QoL in different cultural contexts and for international studies conducted on MS.
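The infit mean square reported above is an information-weighted fit statistic. It is the sum of squared residuals divided by the sum of model variances. The outfit mean square is the unweighted mean of squared standardized residuals, and values near 1 indicate good fit for both. The sketch below assumes the simplest case, a dichotomous Rasch item, whereas the MS item bank itself is polytomous. The abilities, difficulty and responses are hypothetical.

```python
import math
from typing import Sequence, Tuple

def rasch_prob(theta: float, b: float) -> float:
    """Probability of a correct/endorsed response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def item_fit(responses: Sequence[int], thetas: Sequence[float], b: float) -> Tuple[float, float]:
    """Return (infit MNSQ, outfit MNSQ) for one dichotomous item.

    infit  = sum((x - p)^2) / sum(p * (1 - p))     information-weighted
    outfit = mean((x - p)^2 / (p * (1 - p)))       unweighted
    """
    sq_resid = variances = z_sq = 0.0
    for x, theta in zip(responses, thetas):
        p = rasch_prob(theta, b)
        w = p * (1.0 - p)            # model variance of this response
        sq_resid += (x - p) ** 2
        variances += w
        z_sq += (x - p) ** 2 / w     # squared standardized residual
    return sq_resid / variances, z_sq / len(responses)

# Hypothetical: three persons at theta = -1, 0, 1 answering an item with b = 0.
infit, outfit = item_fit([0, 1, 1], [-1.0, 0.0, 1.0], b=0.0)
print(round(infit, 3), round(outfit, 3))  # 0.614 0.579
```

Values well below 1, as in this tiny example, suggest responses that are more predictable than the model expects. Values well above 1 suggest noise.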
Integrating competing dimensional models of personality: linking the SNAP, TCI, and NEO using Item Response Theory.

PubMed

Stepp, Stephanie D; Yu, Lan; Miller, Joshua D; Hallquist, Michael N; Trull, Timothy J; Pilkonis, Paul A

2012-04-01

Mounting evidence suggests that several inventories assessing both normal personality and personality disorders measure common dimensional personality traits (i.e., Antagonism, Constraint, Emotional Instability, Extraversion, and Unconventionality), albeit providing unique information along the underlying trait continuum. We used Widiger and Simonsen's (2005) pantheoretical integrative model of dimensional personality assessment as a guide to create item pools. We then used Item Response Theory (IRT) to compare the assessment of these five personality traits across three established dimensional measures of personality: the Schedule for Nonadaptive and Adaptive Personality (SNAP), the Temperament and Character Inventory (TCI), and the Revised NEO Personality Inventory (NEO PI-R). We found that items from each inventory map onto these five common personality traits in predictable ways. The IRT analyses, however, documented considerable variability in the item and test information derived from each inventory. Our findings support the notion that the integration of multiple perspectives will provide greater information about personality while minimizing the weaknesses of any single instrument.
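The item and test information compared across inventories can be sketched for the two-parameter logistic (2PL) model. An item's Fisher information is a²P(1 - P), which peaks at θ = b. Test information is the sum over items, and the conditional standard error of measurement is 1/√information. This is a dichotomous illustration with hypothetical item parameters. The three inventories include polytomous items, and the study's exact IRT models are not given in the abstract.

```python
import math
from typing import Sequence, Tuple

def p_2pl(theta: float, a: float, b: float) -> float:
    """2PL probability of endorsing the keyed response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_info_2pl(theta: float, a: float, b: float) -> float:
    """Fisher information of one 2PL item: a^2 * P * (1 - P), maximal at theta = b."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

def total_information(theta: float, items: Sequence[Tuple[float, float]]) -> float:
    """Test information: the sum of item informations at theta."""
    return sum(item_info_2pl(theta, a, b) for a, b in items)

# Hypothetical (slope, location) pairs for three items from one inventory.
items = [(1.5, -1.0), (1.0, 0.0), (2.0, 1.0)]
for theta in (-2.0, 0.0, 2.0):
    info = total_information(theta, items)
    # Conditional standard error of measurement = 1 / sqrt(information).
    print(theta, round(info, 3), round(1.0 / math.sqrt(info), 3))
```

Plotting these curves for items from two inventories that load on the same trait shows where along the trait continuum each one is most informative.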
Item response theory detects differential item functioning between healthy and ill children in QoL measures

PubMed Central

Langer, Michelle M.; Hill, Cheryl D.; Thissen, David; Burwinkle, Tasha M.; Varni, James W.; DeWalt, Darren A.

2008-01-01

Objective To demonstrate the value of item response theory (IRT) and differential item functioning (DIF) methods in examining a health-related quality of life (HRQOL) measure in children and adolescents. Study Design and Setting This illustration uses data from 5,429 children using the four subscales of the PedsQL™ 4.0 Generic Core Scales. The IRT model-based likelihood ratio test was used to detect and evaluate DIF between healthy children and children with a chronic condition. Results DIF was detected for a majority of items but cancelled out at the total test score level due to opposing directions of DIF. Post-hoc analysis indicated that this pattern of results may be due to multidimensionality. We discuss issues in detecting and handling DIF. Conclusion This paper describes how to perform DIF analyses in validating a questionnaire to ensure that scores have equivalent meaning across subgroups. It offers insight into ways information gained through the analysis can be used to evaluate an existing scale. PMID:18226750
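The IRT likelihood ratio test for DIF compares two multiple-group models. In one, an item's parameters are constrained equal across groups; in the other, they are freed. The statistic G² = 2(log L_free - log L_constrained) is referred to a chi-square distribution whose degrees of freedom equal the number of freed parameters, for example 2 for a 2PL item (slope and location). The sketch below shows only that final computation. The log-likelihood values are hypothetical, and model fitting and anchor-item selection are not shown. The helper supports only even degrees of freedom, where the chi-square tail has a closed form. For general df, `scipy.stats.chi2.sf` is the usual choice.

```python
import math
from typing import Tuple

def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail chi-square probability for positive even df:
    exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!"""
    if df <= 0 or df % 2:
        raise ValueError("this helper only handles positive even df")
    half = x / 2.0
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i          # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total

def lr_dif_test(loglik_constrained: float, loglik_free: float, df: int = 2) -> Tuple[float, float]:
    """G^2 and its p-value for freeing `df` item parameters across groups."""
    g2 = 2.0 * (loglik_free - loglik_constrained)
    return g2, chi2_sf_even_df(g2, df)

# Hypothetical log-likelihoods for one studied item (healthy vs. chronic-condition groups).
g2, p = lr_dif_test(-10234.7, -10231.2)
print(round(g2, 2), round(p, 4))
```

A significant G² flags the item for DIF. As the abstract notes, DIF in opposite directions can still cancel out at the total-score level.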
Integrating Competing Dimensional Models of Personality: Linking the SNAP, TCI, and NEO Using Item Response Theory

PubMed Central

Stepp, Stephanie D.; Yu, Lan; Miller, Joshua D.; Hallquist, Michael N.; Trull, Timothy J.; Pilkonis, Paul A.

2013-01-01

Mounting evidence suggests that several inventories assessing both normal personality and personality disorders measure common dimensional personality traits (i.e., Antagonism, Constraint, Emotional Instability, Extraversion, and Unconventionality), albeit providing unique information along the underlying trait continuum. We used Widiger and Simonsen’s (2005) pantheoretical integrative model of dimensional personality assessment as a guide to create item pools. We then used Item Response Theory (IRT) to compare the assessment of these five personality traits across three established dimensional measures of personality: the Schedule for Nonadaptive and Adaptive Personality (SNAP), the Temperament and Character Inventory (TCI), and the Revised NEO Personality Inventory (NEO PI-R). We found that items from each inventory map onto these five common personality traits in predictable ways. The IRT analyses, however, documented considerable variability in the item and test information derived from each inventory. Our findings support the notion that the integration of multiple perspectives will provide greater information about personality while minimizing the weaknesses of any single instrument. PMID:22452759
Applicability to Youth Sports of the Leadership Scale for Sports.

ERIC Educational Resources Information Center

Chelladurai, P.; Carron, Albert V.

1981-01-01

Item analyses of the responses of 54 high school wrestlers and 193 high school basketball players to the Leadership Scale for Sports support the instrument's applicability in high school sports. The scale taps highly similar response dimensions in varsity and high school athletes. (Author)
Development and psychometric characteristics of the SCI-QOL Pressure Ulcers scale and short form.

PubMed

Kisala, Pamela A; Tulsky, David S; Choi, Seung W; Kirshblum, Steven C

2015-05-01

To develop a self-reported measure of the subjective impact of pressure ulcers on health-related quality of life (HRQOL) in individuals with spinal cord injury (SCI) as part of the SCI quality of life (SCI-QOL) measurement system. Methods comprised grounded-theory-based qualitative item development, large-scale item calibration testing, confirmatory factor analysis (CFA), and item response theory (IRT)-based psychometric analysis. The setting was five SCI Model System centers and one Department of Veterans Affairs medical center in the United States. Participants were adults with traumatic SCI, and the outcome measure was the SCI-QOL Pressure Ulcers scale. In total, 189 individuals with traumatic SCI who had experienced a pressure ulcer within the past 7 days completed 30 items related to pressure ulcers. CFA confirmed a unidimensional pool of items, and IRT analyses were then conducted. A constrained graded response model with a constant slope parameter was used to estimate item thresholds for the 12 retained items. The 12-item SCI-QOL Pressure Ulcers scale is unique in that it is specifically targeted to individuals with spinal cord injury and has included input from individuals with SCI at every stage of development. Furthermore, the use of CFA and IRT methods provides flexibility and precision of measurement. The scale may be administered in its entirety or as a 7-item "short form" and is available for both research and clinical practice.
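In Samejima's graded response model, each threshold b_k defines a cumulative probability P(X ≥ k) = 1/(1 + exp(-a(θ - b_k))). Category probabilities are differences of adjacent cumulative curves. In the constrained version described above, all items share one slope a and only the thresholds differ. The sketch below assumes that parameterization, without the 1.7 scaling constant. The slope value, item names and thresholds are hypothetical, not the published estimates.

```python
import math
from typing import List, Sequence

def grm_category_probs(theta: float, a: float, thresholds: Sequence[float]) -> List[float]:
    """Samejima's graded response model for one item with m + 1 ordered categories.

    P*(X >= k) = 1 / (1 + exp(-a * (theta - b_k))) for k = 1..m, with P*(X >= 0) = 1
    and P*(X >= m + 1) = 0; then P(X = k) = P*(X >= k) - P*(X >= k + 1).
    """
    if any(hi <= lo for lo, hi in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    cumulative = [1.0]
    cumulative += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cumulative.append(0.0)
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]

# Constrained GRM: every item shares one slope; only the thresholds vary by item.
COMMON_SLOPE = 2.0  # hypothetical value
item_thresholds = {  # hypothetical items and thresholds
    "item_a": [-1.5, -0.5, 0.5],
    "item_b": [-0.5, 0.5, 1.5],
}
for name, bs in item_thresholds.items():
    probs = grm_category_probs(0.0, COMMON_SLOPE, bs)
    print(name, [round(p, 3) for p in probs])
```

Fixing the slope means the items differ only in where they measure (their thresholds) rather than in how sharply they discriminate.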
Procedures to develop a computerized adaptive test to assess patient-reported physical functioning.

PubMed

McCabe, Erin; Gross, Douglas P; Bulut, Okan

2018-06-07

The purpose of this paper is to demonstrate the procedures to develop and implement a computerized adaptive patient-reported outcome (PRO) measure using secondary analysis of a dataset and items from fixed-format legacy measures. We conducted secondary analysis of a dataset of responses from 1,429 persons with work-related lower extremity impairment. We calibrated three measures of physical functioning on the same metric, based on item response theory (IRT). We evaluated the efficiency and measurement precision of various computerized adaptive test (CAT) designs using computer simulations. IRT and confirmatory factor analyses support combining the items from the three scales into a CAT item bank of 31 items. The IRT item parameters were calculated using the generalized partial credit model. CAT simulations show that the test length can be reduced from the full 31 items to a maximum of 8 or 20 items without a significant loss of information (95% and 99% correlation, respectively, with legacy measure scores). We demonstrated the feasibility and efficiency of using CAT for PRO measurement of physical functioning. The procedures we outlined are straightforward and can be applied to other PRO measures. Additionally, we have included all the information necessary to implement the CAT of physical functioning in the electronic supplementary material of this paper.
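A common CAT rule pairs the generalized partial credit model (GPCM) with maximum-information item selection. Under the GPCM, P(X = k) is proportional to exp(Σ_{v≤k} a(θ - b_v)). Item information equals a² times the conditional variance of the item score. The next item is the unused one that is most informative at the current ability estimate. The sketch below assumes that selection rule and omits the ability re-estimation step (for example EAP) that a full CAT would run after each response. The item names and parameters are hypothetical, and the paper's exact CAT settings are not reproduced. With every slope fixed at 1, the GPCM reduces to the partial credit model.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

def gpcm_probs(theta: float, a: float, steps: Sequence[float]) -> List[float]:
    """GPCM category probabilities; category 0 has an empty (zero) exponent sum."""
    exponents, running = [0.0], 0.0
    for b in steps:
        running += a * (theta - b)
        exponents.append(running)
    top = max(exponents)                          # subtract max for numerical stability
    weights = [math.exp(z - top) for z in exponents]
    total = sum(weights)
    return [w / total for w in weights]

def gpcm_info(theta: float, a: float, steps: Sequence[float]) -> float:
    """GPCM item information: a^2 * Var(X | theta)."""
    probs = gpcm_probs(theta, a, steps)
    mean = sum(k * p for k, p in enumerate(probs))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    return a * a * var

def next_item(theta_hat: float, bank: Dict[str, Tuple[float, List[float]]],
              administered: Set[str]) -> str:
    """Maximum-information selection among items not yet administered."""
    candidates = [name for name in bank if name not in administered]
    return max(candidates, key=lambda name: gpcm_info(theta_hat, *bank[name]))

# Hypothetical three-item bank: name -> (slope, step parameters).
bank = {
    "stairs": (1.8, [-1.0, 0.0, 1.0]),
    "walk": (1.2, [-2.0, -1.0]),
    "run": (2.0, [1.0, 2.0]),
}
print(next_item(1.5, bank, administered={"stairs"}))  # run
```

At θ = 1.5 the "walk" item's steps lie well below the respondent, so nearly all of its probability falls in the top category and it carries little information. That is why the harder "run" item is selected.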
Measuring emotion socialization in families affected by pediatric cancer: Refinement and reduction of the Parents' Beliefs about Children's Emotions questionnaire.

PubMed

Beitra, Danette; El-Behadli, Ana F; Faith, Melissa A

2018-01-01

The aim of this study is to conduct a multimethod psychometric reduction in the Parents' Beliefs about Children's Emotions (PBCE) questionnaire using an item response theory framework with a pediatric oncology sample. Participants were 216 pediatric oncology caregivers who completed the PBCE. The PBCE contains 105 items (11 subscales) rated on a 6-point Likert-type scale. We evaluated the PBCE subscale performance by applying a partial credit model in WINSTEPS. Sixty-six statistically weak items were removed, creating a 44-item PBCE questionnaire with 10 subscales and 3 response options per item. The refined scale displayed good psychometric properties and correlated .910 with the original PBCE. Additional analyses examined dimensionality, item-level (e.g. difficulty), and person-level (e.g. ethnicity) characteristics. The refined PBCE questionnaire provides better test information, improves instrument reliability, and reduces burden on families, providers, and researchers. With this improved measure, providers can more easily identify families who may benefit from psychosocial interventions targeting emotion socialization. The results of the multistep approach presented should be considered preliminary, given the limited sample size.
Perceptual chunking and its effect on memory in speech processing: ERP and behavioral evidence

PubMed Central

Gilbert, Annie C.; Boucher, Victor J.; Jemel, Boutheina

2014-01-01

We examined how perceptual chunks of varying size in utterances can influence immediate memory of heard items (monosyllabic words). Using behavioral measures and event-related potentials (N400), we evaluated the quality of the memory trace for targets taken from perceived temporal groups (TGs) of three and four items. Variations in the amplitude of the N400 showed a better memory trace for items presented in TGs of three than for those in groups of four. Analyses of behavioral responses along with P300 components also revealed effects of chunk position in the utterance. This is the first study to measure the online effects of perceptual chunks on the memory trace of spoken items. Taken together, the N400 and P300 responses demonstrate that the perceptual chunking of speech facilitates information buffering and processing on a chunk-by-chunk basis. PMID:24678304

Perceptual chunking and its effect on memory in speech processing: ERP and behavioral evidence.

PubMed

Gilbert, Annie C; Boucher, Victor J; Jemel, Boutheina

2014-01-01

We examined how perceptual chunks of varying size in utterances can influence immediate memory of heard items (monosyllabic words). Using behavioral measures and event-related potentials (N400), we evaluated the quality of the memory trace for targets taken from perceived temporal groups (TGs) of three and four items. Variations in the amplitude of the N400 showed a better memory trace for items presented in TGs of three than for those in groups of four. Analyses of behavioral responses along with P300 components also revealed effects of chunk position in the utterance. This is the first study to measure the online effects of perceptual chunks on the memory trace of spoken items. Taken together, the N400 and P300 responses demonstrate that the perceptual chunking of speech facilitates information buffering and processing on a chunk-by-chunk basis.
Item response theory analysis of the Lichtenberg Financial Decision Screening Scale.

PubMed

Teresi, Jeanne A; Ocepek-Welikson, Katja; Lichtenberg, Peter A

2017-01-01

The focus of these analyses was to examine the psychometric properties of the Lichtenberg Financial Decision Screening Scale (LFDSS). The purpose of the screen was to evaluate the decisional abilities and vulnerability to exploitation of older adults. Adults aged 60 and over were interviewed by social, legal, financial, or health services professionals who underwent in-person training on the administration and scoring of the scale. Professionals provided a rating of the decision-making abilities of the older adult. The analytic sample included 213 individuals with an average age of 76.9 (SD = 10.1). The majority (57%) were female. Data were analyzed using item response theory (IRT) methodology. The results supported the unidimensionality of the item set. Several IRT models were tested. Ten ordinal and binary items evidenced a slightly higher reliability estimate (0.85) than other versions and better coverage in terms of the range of reliable measurement across the continuum of financial incapacity.
Development and validation of a socioculturally competent trust in physician scale for a developing country setting.

PubMed

Gopichandran, Vijayaprasad; Wouters, Edwin; Chetlapalli, Satish Kumar

2015-05-03

Trust in physicians is the unwritten covenant between the patient and the physician that the physician will do what is in the best interest of the patient. This forms the undercurrent of all healthcare relationships. Several scales exist for assessment of trust in physicians in developed healthcare settings, but to our knowledge none of these have been developed in a developing country context. The aim was to develop and validate a new trust in physician scale for a developing country setting. Dimensions of trust in physicians, which were identified in a previous qualitative study in the same setting, were used to develop a scale. This scale was administered among 616 adults selected from urban and rural areas of Tamil Nadu, south India, using a multistage sampling cross-sectional survey method. The individual items were analysed using a classical test approach as well as item response theory. Cronbach's α was calculated and the item-total correlation of each item was assessed. After testing for unidimensionality and absence of local dependence, a two-parameter logistic Samejima graded response model was fit and item characteristics assessed. Competence, assurance of treatment, respect for the physician and loyalty to the physician were important dimensions of trust. A total of 31 items were developed using these dimensions. Of these, 22 were selected for final analysis. The Cronbach's α was 0.928. The item-total correlations were acceptable for all 22 items. The item response analysis revealed good item characteristic curves and item information for all the items. Based on the item parameters and item information, a final 12-item scale was developed. The scale performs optimally in the low to moderate trust range. The final 12-item trust in physician scale has good construct validity and internal consistency. Published by the BMJ Publishing Group Limited. 
Development and validation of a socioculturally competent trust in physician scale for a developing country setting

PubMed Central

Gopichandran, Vijayaprasad; Wouters, Edwin; Chetlapalli, Satish Kumar

2015-01-01

Trust in physicians is the unwritten covenant between the patient and the physician that the physician will do what is in the best interest of the patient. This forms the undercurrent of all healthcare relationships. Several scales exist for assessment of trust in physicians in developed healthcare settings, but to our knowledge none of these have been developed in a developing country context. Objectives To develop and validate a new trust in physician scale for a developing country setting. Methods Dimensions of trust in physicians, which were identified in a previous qualitative study in the same setting, were used to develop a scale. This scale was administered among 616 adults selected from urban and rural areas of Tamil Nadu, south India, using a multistage sampling cross-sectional survey method. The individual items were analysed using a classical test approach as well as item response theory. Cronbach's α was calculated and the item-total correlation of each item was assessed. After testing for unidimensionality and absence of local dependence, a two-parameter logistic Samejima graded response model was fit and item characteristics assessed. Results Competence, assurance of treatment, respect for the physician and loyalty to the physician were important dimensions of trust. A total of 31 items were developed using these dimensions. Of these, 22 were selected for final analysis. The Cronbach's α was 0.928. The item-total correlations were acceptable for all 22 items. The item response analysis revealed good item characteristic curves and item information for all the items. Based on the item parameters and item information, a final 12-item scale was developed. The scale performs optimally in the low to moderate trust range. Conclusions The final 12-item trust in physician scale has good construct validity and internal consistency. PMID:25941182
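The classical-test statistics mentioned here can be computed directly from an item-response matrix. Cronbach's α = k/(k - 1) · (1 - Σ item variances / total-score variance). The corrected item-total correlation is the Pearson correlation between an item and the sum of the remaining items. This is a minimal sketch using population variances throughout; the ratio in α is the same with sample variances. The response matrix is hypothetical and is not the study's data. A matrix whose total scores do not vary would make α undefined.

```python
import math
from statistics import fmean, pvariance
from typing import List, Sequence

def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a persons x items score matrix."""
    k = len(rows[0])
    item_columns = list(zip(*rows))
    totals = [sum(r) for r in rows]
    item_var = sum(pvariance(col) for col in item_columns)
    return k / (k - 1) * (1.0 - item_var / pvariance(totals))

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def corrected_item_total(rows: Sequence[Sequence[float]], j: int) -> float:
    """Correlation of item j with the total of all *other* items."""
    item: List[float] = [r[j] for r in rows]
    rest: List[float] = [sum(r) - r[j] for r in rows]
    return pearson(item, rest)

# Hypothetical 5-point Likert responses: five respondents x four items.
rows = [[4, 3, 4, 5], [2, 2, 3, 2], [5, 4, 5, 5], [3, 3, 2, 3], [1, 2, 1, 2]]
print(round(cronbach_alpha(rows), 3))
print([round(corrected_item_total(rows, j), 3) for j in range(4)])
```

Excluding the item from the total avoids inflating its correlation with a score that already contains it.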
Mokken scaling of the Myocardial Infarction Dimensional Assessment Scale (MIDAS).

PubMed

Thompson, David R; Watson, Roger

2011-02-01

The purpose of this study was to examine the hierarchical and cumulative nature of the 35 items of the Myocardial Infarction Dimensional Assessment Scale (MIDAS), a disease-specific health-related quality of life measure. Data from 668 participants who completed the MIDAS were analysed using the Mokken Scaling Procedure, which is a computer program that searches polychotomous data for hierarchical and cumulative scales on the basis of a range of diagnostic criteria. Fourteen MIDAS items were retained in a Mokken scale and these items included physical activity, insecurity, emotional reaction and dependency items but excluded items related to diet, medication or side-effects. Item difficulty, in item response theory terms, ran from physical activity items (low difficulty) to insecurity, suggesting that the most severe quality of life effect of myocardial infarction is loneliness and isolation. Items from the MIDAS form a strong and reliable Mokken scale, which provides new insight into the relationship between items in the MIDAS and the measurement of quality of life after myocardial infarction. © 2010 Blackwell Publishing Ltd.
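Mokken scaling rests on Loevinger's scalability coefficient H. For dichotomous items this is the sum of observed inter-item covariances divided by the sum of the maximum covariances possible given the item proportions, where the maximum corresponds to a perfect Guttman pattern with no "errors". H = 1 for a perfect Guttman scale. By common convention, H ≥ 0.3 is the minimum for a scale and H ≥ 0.5 counts as strong. The sketch below covers the dichotomous case only, whereas the MIDAS items are polytomous; the R package `mokken` (for example its `coefH` function) handles that generalization. The data are hypothetical.

```python
from itertools import combinations
from typing import Sequence

def loevinger_h(data: Sequence[Sequence[int]]) -> float:
    """Scale-level Loevinger H for 0/1 items (persons x items).

    H = sum_{i<j} Cov(X_i, X_j) / sum_{i<j} Cov_max(X_i, X_j),
    Cov_max = p_harder * (1 - p_easier), the covariance under a perfect Guttman pattern.
    Items answered 0 or 1 by everyone make Cov_max zero and should be removed first.
    """
    n, k = len(data), len(data[0])
    p = [sum(row[j] for row in data) / n for j in range(k)]
    observed = maximum = 0.0
    for i, j in combinations(range(k), 2):
        p_ij = sum(row[i] * row[j] for row in data) / n
        observed += p_ij - p[i] * p[j]
        maximum += min(p[i], p[j]) * (1.0 - max(p[i], p[j]))
    return observed / maximum

# Perfect cumulative (Guttman) pattern: anyone endorsing a harder item endorses easier ones.
guttman = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]]
print(loevinger_h(guttman))
```

Mokken's automated item selection procedure adds items one at a time while H (and each item's own H_i) stays above the chosen threshold. That is how a subset such as the 14 retained MIDAS items can emerge.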
Factor structure and gender stability in the multidimensional condom attitudes scale.

PubMed

Starosta, Amy J; Berghoff, Christopher R; Earleywine, Mitch

2015-06-01

Sexually transmitted infections continue to trouble the United States and can be attenuated through increased condom use. Attitudes about condoms are an important multidimensional factor that can affect sexual health choices and have been successfully measured using the Multidimensional Condom Attitudes Scale (MCAS). Such attitudes have the potential to vary between men and women, yet little work has been undertaken to determine whether the MCAS accurately captures attitudes without being influenced by underlying gender biases. We examined the factor structure and gender invariance of the MCAS using confirmatory factor analysis and within-subscale differential item functioning analyses based on item response theory. More than 770 participants provided data via the Internet. Differential item functioning analyses identified three items as functioning differently between the genders, and removal of these items is recommended. Findings confirmed the previously hypothesized multidimensional nature of condom attitudes and the five-factor structure of the MCAS even after the removal of the three problematic items. In general, comparisons across genders using the MCAS seem reasonable from a methodological standpoint. Results are discussed in terms of improving sexual health research and interventions. © The Author(s) 2014.
The Impact of Non-attempted and Dually-Attempted Items on Person Abilities Using Item Response Theory

PubMed Central

Sideridis, Georgios D.; Tsaousis, Ioannis; Al Harbi, Khaleel

2016-01-01

The purpose of the present study was to relate response strategy to person ability estimates. Two behavioral strategies were examined: (a) skipping items to save time on timed tests, and (b) selecting two responses on an item in the hope that one of them would be considered correct. Participants were 4,422 individuals who were administered a standardized achievement measure covering math, biology, chemistry, and physics; only the physics subscale was used in the present evaluation. Two analyses were conducted: (a) a person-based analysis to identify differences between groups and potential correlates of those differences, and (b) a measure-based analysis to identify the parts of the measure responsible for potential group differentiation. For (a), person abilities were estimated with the 2-PL model and subsequently with the 3-PL and 4-PL models in order to estimate lower and upper asymptotes. For (b), differential item functioning, differential test functioning, and differential distractor functioning were investigated. Results indicated significant differences between groups, with completers having higher ability than both non-attempters and dual responders. There were no significant differences between non-attempters and dual responders. The present findings have implications for response strategy efficacy and for measure evaluation, revision, and construction. PMID:27790174
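The 2-PL, 3-PL and 4-PL models named in this abstract differ only in their asymptotes. The 3-PL adds a lower asymptote and the 4-PL adds an upper one. Below is a minimal sketch of the item response function. It uses the conventional parameter names a, b, c and d, which are assumed notation and are not taken from the study.

```python
import math


def irf_4pl(theta: float, a: float, b: float, c: float = 0.0, d: float = 1.0) -> float:
    """Probability of a correct response under the four-parameter logistic model.

    a: discrimination, b: difficulty, c: lower asymptote (e.g. guessing),
    d: upper asymptote (e.g. slips by able examinees).
    c = 0 and d = 1 gives the 2-PL; d = 1 alone gives the 3-PL.
    """
    logistic = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return c + (d - c) * logistic


# At theta == b the logistic term equals 0.5, so P = c + (d - c) / 2.
print(round(irf_4pl(0.0, a=1.2, b=0.0), 3))                # 0.5  (2-PL)
print(round(irf_4pl(0.0, a=1.2, b=0.0, c=0.2), 3))         # 0.6  (3-PL)
print(round(irf_4pl(0.0, a=1.2, b=0.0, c=0.2, d=0.9), 3))  # 0.55 (4-PL)
# Far below the item's difficulty, P approaches c rather than 0.
print(round(irf_4pl(-6.0, a=1.2, b=0.0, c=0.2, d=0.9), 3))
```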
A Shortened Version of the Suicide Cognitions Scale for Identifying Chronic Pain Patients at Risk for Suicide.

PubMed

Bryan, Craig J; Kanzler, Kathryn E; Grieser, Emily; Martinez, Annette; Allison, Sybil; McGeary, Donald

2017-03-01

Research in psychiatric outpatient and inpatient populations supports the utility of the Suicide Cognitions Scale (SCS) as an indicator of current and future risk for suicidal thoughts and behaviors. Designed to assess suicide-specific thoughts and beliefs, the SCS has yet to be evaluated among chronic pain patients, a group with elevated risk for suicide. The purpose of the present study was to develop and test a shortened version of the SCS (the SCS-S). A total of 228 chronic pain patients from three outpatient medical clinics (pain medicine, orofacial pain, and clinical health psychology) completed a battery of self-report surveys before or after a scheduled appointment. Data were analyzed using confirmatory factor analysis, multivariate regression, and graded item response theory models. Results of the CFAs suggested that a 3-factor solution was optimal. A shortened 9-item scale was identified based on the results of the graded item response theory model analyses. Correlation and multivariate analyses supported the construct and incremental validity of the SCS-S. Results support the reliability and validity of the SCS-S among chronic pain patients and suggest that the scale may be a useful method for identifying high-risk patients in medical settings. © 2016 World Institute of Pain.
Development of the Statistical Reasoning in Biology Concept Inventory (SRBCI)

PubMed Central

Deane, Thomas; Nomme, Kathy; Jeffery, Erica; Pollock, Carol; Birol, Gülnur

2016-01-01

We followed established best practices in concept inventory design and developed a 12-item inventory to assess student ability in statistical reasoning in biology (Statistical Reasoning in Biology Concept Inventory [SRBCI]). It is important to assess student thinking in this conceptual area, because statistical reasoning is a fundamental requirement of being statistically literate and the associated skills are needed in almost all walks of life. Despite this, previous work shows that non–expert-like thinking in statistical reasoning is common, even after instruction. As science educators, our goal should be to move students along a novice-to-expert spectrum, which could be achieved with growing experience in statistical reasoning. We used item response theory analyses (the one-parameter Rasch model and associated analyses) to assess responses gathered from biology students in two populations at a large research university in Canada in order to test SRBCI’s robustness and sensitivity in capturing useful data relating to the students’ conceptual ability in statistical reasoning. Our analyses indicated that SRBCI measures a unidimensional construct, with items that vary widely in difficulty and provide useful information about students' ability. SRBCI should be useful as a diagnostic tool in a variety of biology settings and as a means of measuring the success of teaching interventions designed to improve statistical reasoning skills. PMID:26903497
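Under the Rasch model used for SRBCI, the raw score is a sufficient statistic for ability. Once item difficulties are calibrated, each raw score therefore maps to a single ability estimate. Below is a minimal sketch of that mapping using Newton-Raphson maximum likelihood. It assumes the difficulties are already known, and the function names and difficulty values are hypothetical, not SRBCI calibrations.

```python
import math
from typing import Sequence


def rasch_prob(theta: float, b: float) -> float:
    """Rasch (one-parameter logistic) probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def rasch_ability(raw_score: int, difficulties: Sequence[float],
                  tol: float = 1e-8, max_iter: int = 100) -> float:
    """Maximum-likelihood ability for a raw score, given item difficulties.

    The MLE solves sum_j P_j(theta) = raw_score. Newton-Raphson steps use
    the test information sum_j P_j (1 - P_j) as the (negated) second
    derivative of the log-likelihood. Zero and perfect scores have no
    finite MLE.
    """
    n = len(difficulties)
    if not 0 < raw_score < n:
        raise ValueError("MLE is infinite for zero or perfect scores")
    theta = 0.0
    for _ in range(max_iter):
        probs = [rasch_prob(theta, b) for b in difficulties]
        expected = sum(probs)
        information = sum(p * (1.0 - p) for p in probs)
        step = (raw_score - expected) / information
        theta += step
        if abs(step) < tol:
            break
    return theta


# Four items of equal difficulty 0: a score of 3 implies P = 0.75, so theta = ln 3.
print(round(rasch_ability(3, [0.0] * 4), 4))  # 1.0986
# Hypothetical spread of difficulties:
print([round(rasch_ability(r, [-1.0, -0.5, 0.0, 0.5, 1.0]), 2) for r in range(1, 5)])
```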
Can contingency learning alone account for item-specific control? Evidence from within- and between-language ISPC effects.

PubMed

Atalay, Nart Bedin; Misirlisoy, Mine

2012-11-01

The item-specific proportion congruence (ISPC) manipulation (Jacoby, Lindsay, & Hessels, 2003) produces larger Stroop interference for mostly congruent items than for mostly incongruent items. This effect has been attributed to dynamic control over word-reading processes. However, the proportion congruence of an item in the ISPC manipulation is completely confounded with response contingency, suggesting the alternative hypothesis that the ISPC effect results from learning response contingencies (Schmidt & Besner, 2008). The current study asks whether the ISPC effect can be explained by a pure stimulus-response contingency-learning account, or whether other control processes also play a role, by comparing within- and between-language conditions in a bilingual task. Experiment 1 showed that contingency learning for noncolor words was larger in the within-language than in the between-language condition. Experiment 2 revealed significant ISPC effects in both within- and between-language conditions; importantly, the effect was larger in the former. The results of the contingency analyses for Experiment 2 paralleled those of Experiment 1 and did not show an interaction between contingency and congruency. Taken together, these results support the view that contingency-learning processes dominate color-word ISPC effects.
The Swiss Health Literacy Survey: development and psychometric properties of a multidimensional instrument to assess competencies for health

PubMed Central

Wang, Jen; Thombs, Brett D.; Schmid, Margareta R.

2012-01-01

Background: Growing recognition of the role of citizens and patients in health and health care has placed a spotlight on health literacy and patient education. Objective: To identify specific competencies for health in definitions of health literacy and patient‐centred concepts and to test their dimensionality empirically in the general population. Methods: A thorough review of the literature on health literacy, self‐management, patient empowerment, patient education and shared decision making revealed considerable conceptual overlap among these concepts and identified a corpus of 30 generic competencies for health. A questionnaire containing 127 items covering the 30 competencies was fielded as a telephone interview in German, French and Italian among 1255 respondents randomly selected from the resident population in Switzerland. Findings: Analyses in Mplus, used to model items with mixed response categories, showed that the items do not load onto a single factor. Multifactorial models with good fit were obtained for each of five dimensions defined a priori and their corresponding competencies: information and knowledge (four competencies, 17 items), general cognitive skills (four competencies, 17 items), social roles (two competencies, seven items), medical management (four competencies, 27 items) and healthy lifestyle (two competencies, six items). Multiple indicators and multiple causes models identified problematic differential item functioning for only six items belonging to two competencies. Conclusions: The psychometric analyses of this instrument support a broader conceptualization of health literacy, not as a single competence but as a package of competencies for health. PMID:22390287
[Cross-cultural adaptation and validation of the PROMIS Global Health scale in the Portuguese language].

PubMed

Zumpano, Camila Eugênia; Mendonça, Tânia Maria da Silva; Silva, Carlos Henrique Martins da; Correia, Helena; Arnold, Benjamin; Pinto, Rogério de Melo Costa

2017-01-23

This study aimed to perform the cross-cultural adaptation and validation of the Patient-Reported Outcomes Measurement Information System (PROMIS) Global Health scale in the Portuguese language. The ten Global Health items were cross-culturally adapted using the method proposed by the Functional Assessment of Chronic Illness Therapy (FACIT). The final Portuguese version of the instrument was self-administered by 1,010 participants in Brazil. The scale's precision was verified through analysis of floor and ceiling effects, internal consistency reliability, and test-retest reliability. Exploratory and confirmatory factor analyses were used to assess construct validity and the instrument's dimensionality. The items were calibrated with the Graded Response Model proposed by Samejima. Four global items required adjustments after the pretest. Analysis of the psychometric properties showed that the Global Health scale has good reliability, with a Cronbach's alpha of 0.83 and an intra-class correlation of 0.89. Exploratory and confirmatory factor analyses showed good fit for the previously established two-dimensional model. The Global Physical Health and Global Mental Health scales showed good latent trait coverage according to the Graded Response Model. The PROMIS Global Health items showed equivalence in Portuguese compared with the original version and satisfactory psychometric properties for application in clinical practice and research in the Brazilian population.
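Samejima's Graded Response Model treats an item with K ordered categories as K - 1 cumulative logistic boundary curves. Each category probability is the difference between adjacent boundary curves. Below is a minimal sketch for a single item. The function name and parameter values are hypothetical and are not the PROMIS calibrations.

```python
import math
from typing import List


def grm_category_probs(theta: float, a: float, thresholds: List[float]) -> List[float]:
    """Category probabilities for one item under Samejima's graded response model.

    thresholds: strictly increasing boundary locations b_1 < ... < b_{K-1}
    for K ordered categories. P*(X >= k) = logistic(a * (theta - b_k)),
    and P(X = k) = P*(X >= k) - P*(X >= k + 1).
    """
    if any(b2 <= b1 for b1, b2 in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    # Cumulative probabilities, padded with P*(X >= 0) = 1 and P*(X >= K) = 0.
    cumulative = ([1.0]
                  + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
                  + [0.0])
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]


# A 4-category item with symmetric thresholds, evaluated at theta = 0.
probs = grm_category_probs(theta=0.0, a=1.5, thresholds=[-1.0, 0.0, 1.0])
print([round(p, 3) for p in probs])  # [0.182, 0.318, 0.318, 0.182]
```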
Developing an item bank and short forms that assess the impact of asthma on quality of life.

PubMed

Stucky, Brian D; Edelen, Maria Orlando; Sherbourne, Cathy D; Eberhart, Nicole K; Lara, Marielena

2014-02-01

The present work describes the process of developing an item bank and short forms that measure the impact of asthma on quality of life (QoL) while avoiding confounding QoL with asthma symptomatology and functional impairment. Using a diverse national sample of adults with asthma (N = 2032) we conducted exploratory and confirmatory factor analyses, and item response theory and differential item functioning analyses to develop a 65-item unidimensional item bank and separate short form assessments. A psychometric evaluation of the RAND Impact of Asthma on QoL item bank (RAND-IAQL) suggests that though the concept of asthma impact on QoL is multi-faceted, it may be measured as a single underlying construct. The performance of the bank was then evaluated with a real-data simulated computer adaptive test. From the RAND-IAQL item bank we then developed two short forms consisting of 4 and 12 items (reliability = 0.86 and 0.93, respectively). A real-data simulated computer adaptive test suggests that as few as 4-5 items from the bank are needed to obtain highly precise scores. Preliminary validity results indicate that the RAND-IAQL measures distinguish between levels of asthma control. To measure the impact of asthma on QoL, users of these items may choose from two highly reliable short forms, computer adaptive test administration, or content-specific subsets of items from the bank tailored to their specific needs. Copyright © 2013 Elsevier Ltd. All rights reserved.
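The abstract does not state how the simulated computer adaptive test chose its items. For illustration only, this sketch assumes a common rule: the next item administered is the unused one with the greatest Fisher information at the current ability estimate. It uses dichotomous 2-PL items and a hypothetical three-item bank. A bank of polytomous items would need graded-response information functions instead.

```python
import math
from typing import Dict, Set, Tuple


def info_2pl(theta: float, a: float, b: float) -> float:
    """Fisher information of a 2-PL item at ability theta: a^2 * P * (1 - P)."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)


def next_item(theta: float, bank: Dict[str, Tuple[float, float]],
              administered: Set[str]) -> str:
    """Return the not-yet-administered item with maximum information at theta.

    bank maps a (hypothetical) item name to its (a, b) parameters.
    """
    candidates = {name: ab for name, ab in bank.items() if name not in administered}
    if not candidates:
        raise ValueError("item bank exhausted")
    return max(candidates, key=lambda name: info_2pl(theta, *candidates[name]))


bank = {"item_a": (1.0, -1.0), "item_b": (2.0, 0.0), "item_c": (1.5, 1.0)}
print(next_item(0.0, bank, set()))       # item_b
print(next_item(0.0, bank, {"item_b"}))  # item_c
```

In a full CAT, the ability estimate is updated after each response. The loop stops once a precision target or an item limit is reached, and that loop is omitted here.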
Toward DSM-V: an item response theory analysis of the diagnostic process for DSM-IV alcohol abuse and dependence in adolescents.

PubMed

Gelhorn, Heather; Hartman, Christie; Sakai, Joseph; Stallings, Michael; Young, Susan; Rhee, Soo Hyun; Corley, Robin; Hewitt, John; Hopfer, Christian; Crowley, Thomas

2008-11-01

Item response theory analyses were used to examine alcohol abuse and dependence symptoms and diagnoses in adolescents. Previous research suggests that the DSM-IV alcohol use disorder (AUD) symptoms in adolescents may be characterized by a single dimension. The present study extends prior research with a larger and more comprehensive sample and an examination of an alternative diagnostic algorithm for AUDs. Approximately 5,587 adolescents between the ages of 12 and 18 years from adjudicated, clinical, and community samples were administered structured clinical interviews. Analyses were conducted to examine the severity of alcohol abuse and dependence symptoms and the severity of alcohol use problems (AUDs) within the diagnostic categories created by the DSM-IV. Although the DSM-IV diagnostic categories differ in severity of AUDs, there is substantial overlap and inconsistency in AUD severity of persons across these categories. Item Response Theory-based AUD severity estimates suggest that many persons diagnosed with abuse have AUD severity greater than persons with dependence. Similarly, many persons who endorse some symptoms but do not qualify for a diagnosis (i.e., diagnostic orphans) have more severe AUDs than persons with an abuse diagnosis. Additionally, two dependence items, "tolerance" and "larger/longer," show differences in severity between samples. The distinction between DSM-IV abuse and dependence based on severity can be improved using an alternative diagnostic algorithm that considers all of the alcohol abuse and dependence symptoms conjointly.
Measuring grief and loss after spinal cord injury: Development, validation and psychometric characteristics of the SCI-QOL Grief and Loss item bank and short form

PubMed Central

Kalpakjian, Claire Z.; Tulsky, David S.; Kisala, Pamela A.; Bombardier, Charles H.

2015-01-01

Objective: To develop an item response theory (IRT) calibrated Grief and Loss item bank as part of the Spinal Cord Injury – Quality of Life (SCI-QOL) measurement system. Design: A literature review guided the development of a grief/loss framework. New items were created from focus groups, revised on the basis of expert review and patient feedback, and then field tested. Analyses included confirmatory factor analysis (CFA), graded response IRT modeling and evaluation of differential item functioning (DIF). Setting: We tested a 20-item pool at several rehabilitation centers across the United States, including the University of Michigan, Kessler Foundation, Rehabilitation Institute of Chicago, the University of Washington, Craig Hospital and the James J. Peters/Bronx Department of Veterans Affairs hospital. Participants: A total of 717 individuals with SCI answered the grief and loss questions. Results: The final calibrated item bank retained 17 items. A unidimensional model was supported (CFI = 0.976; RMSEA = 0.078) and measurement precision was good (theta range from −1.48 to 2.48). Ten items were flagged for DIF; however, examination of effect sizes showed the DIF to be negligible, with little practical impact on score estimates. Conclusions: This study indicates that the SCI-QOL Grief and Loss item bank is a psychometrically robust measurement tool. Short-form items are also suggested, and computer adaptive tests are available. PMID:26010969
A Psychometric Evaluation of the DSM-IV Criteria for Antisocial Personality Disorder: Dimensionality, Local Reliability, and Differential Item Functioning Across Gender.

PubMed

Paap, Muirne C S; Braeken, Johan; Pedersen, Geir; Urnes, Øyvind; Karterud, Sigmund; Wilberg, Theresa; Hummelen, Benjamin

2017-12-01

This study aims to evaluate the psychometric properties of the antisocial personality disorder (ASPD) criteria in a large sample of patients, most of whom had one or more personality disorders (PD). PD diagnoses were assessed by experienced clinicians using the Structured Clinical Interview for Diagnostic and Statistical Manual of Mental Disorders, 4th edition, Axis II PDs. Analyses were performed within an item response theory framework. Results indicated that ASPD is a unidimensional construct that can be measured reliably at the upper range of the latent trait scale. Differential item functioning across gender was restricted to two criteria and had little impact on the latent ASPD trait level. Patients fulfilling both the adult ASPD criteria and the conduct disorder criteria had latent trait distributions similar to those of patients fulfilling only the adult ASPD criteria. Overall, the ASPD items fit the purpose of a diagnostic instrument well, that is, distinguishing patients with moderate from those with high antisocial personality scores.
Scale Refinement and Initial Evaluation of a Behavioral Health Function Measurement Tool for Work Disability Evaluation

PubMed Central

Marfeo, Elizabeth E.; Ni, Pengsheng; Bogusz, Kara; Meterko, Mark; McDonough, Christine M.; Chan, Leighton; Rasch, Elizabeth K.; Brandt, Diane E.; Jette, Alan M.

2014-01-01

Objectives: To use item response theory (IRT) data simulations to construct and perform initial psychometric testing of a newly developed instrument, the Social Security Administration Behavioral Health Function (SSA-BH) instrument, which aims to assess behavioral health functioning relevant to the context of work. Design: Cross-sectional survey followed by IRT calibration data simulations. Setting: Community. Participants: A sample of individuals applying for SSA disability benefits (claimants; N=1015) and a normative comparative sample of US adults (N=1000). Interventions: None. Main Outcome Measure: Social Security Administration Behavioral Health Function (SSA-BH) measurement instrument. Results: Item response theory analyses supported the unidimensionality of four SSA-BH scales: Mood and Emotions (35 items), Self-Efficacy (23 items), Social Interactions (6 items), and Behavioral Control (15 items). All SSA-BH scales demonstrated strong psychometric properties, including reliability, accuracy, and breadth of coverage. High correlations of the simulated 5- or 10-item CATs with the full item bank indicated a robust ability of the CAT approach to characterize behavioral health function comprehensively along four distinct dimensions. Conclusions: Initial testing and evaluation of the SSA-BH instrument demonstrated good accuracy, reliability, and content coverage along all four scales. Behavioral function profiles of SSA claimants were generated and compared with age- and sex-matched norms along four scales: Mood and Emotions, Behavioral Control, Social Interactions, and Self-Efficacy. The CAT-based approach offers the ability to collect standardized, comprehensive functional information about claimants efficiently, which may prove useful in the context of the SSA’s work disability programs. PMID:23542404
Development and Validation of the PROMIS Pediatric Sleep Disturbance and Sleep-Related Impairment Item Banks.

PubMed

Forrest, Christopher B; Meltzer, Lisa J; Marcus, Carole L; de la Motte, Anna; Kratchman, Amy; Buysse, Daniel J; Pilkonis, Paul A; Becker, Brandon D; Bevans, Katherine B

2018-03-13

To develop and evaluate the measurement properties of child-report and parent-proxy versions of the PROMIS® Pediatric Sleep Disturbance and Sleep-Related Impairment item banks. A national sample of 1,104 children (8-17 years old) and 1,477 parents of children aged 5-17 years was recruited from an internet panel to evaluate the psychometric properties of 43 sleep health items. A convenience sample of children and parents recruited from a pediatric sleep clinic was obtained to provide evidence of the measures' validity; polysomnography data were collected from a subgroup of these children. Factor analyses suggested two dimensions: sleep disturbance and daytime sleep-related impairment. The final item banks included 15 items for Sleep Disturbance and 13 for Sleep-Related Impairment. Items were calibrated using the graded response model from item response theory. Of the 28 items, 16 are included in the parallel PROMIS adult sleep health measures. Reliability of the measures exceeded 0.90. Validity was supported by correlations with existing measures of pediatric sleep health and by higher sleep disturbance and sleep-related impairment scores for children with sleep problems and for those with chronic and neurodevelopmental disorders. The sleep health measures were not correlated with results from polysomnography. The PROMIS Pediatric Sleep Disturbance and Sleep-Related Impairment item banks provide subjective assessments of a child's difficulties falling and staying asleep as well as daytime sleepiness and its impact on functioning. They may prove useful in the future for clinical research and practice. Future research should evaluate their responsiveness to clinical change in diverse patient populations.
Which kind of psychometrics is adequate for patient satisfaction questionnaires?

PubMed

Konerding, Uwe

2016-01-01

The construction and psychometric analysis of patient satisfaction questionnaires are discussed. The discussion is based upon the classification of multi-item questionnaires into scales or indices. Scales consist of items that describe the effects of the latent psychological variable to be measured, and indices consist of items that describe the causes of this variable. Whether patient satisfaction questionnaires should be constructed and analyzed as scales or as indices depends upon the purpose for which these questionnaires are required. If the final aim is improving care with regard to patients' preferences, then these questionnaires should be constructed and analyzed as indices. This implies two requirements: 1) items for patient satisfaction questionnaires should be selected in such a way that the universe of possible causes of patient satisfaction is covered optimally and 2) Cronbach's alpha, principal component analysis, exploratory factor analysis, confirmatory factor analysis, and analyses with models from item response theory, such as the Rasch Model, should not be applied for psychometric analyses. Instead, multivariate regression analyses with a direct rating of patient satisfaction as the dependent variable and the individual questionnaire items as independent variables should be performed. The coefficients produced by such an analysis can be applied for selecting the best items and for weighting the selected items when a sum score is determined. The lower boundaries of the validity of the unweighted and the weighted sum scores can be estimated by their correlations with the direct satisfaction rating. While the first requirement is fulfilled in the majority of the previous patient satisfaction questionnaires, the second one deviates from previous practice. 
Hence, if patient satisfaction is actually measured with the final aim of improving care with regard to patients' preferences, then future practice should be changed so that the second requirement is also fulfilled.

A Computer-Adaptive Disability Instrument for Lower Extremity Osteoarthritis Research Demonstrated Promising Breadth, Precision and Reliability

PubMed Central

Jette, Alan M.; McDonough, Christine M.; Haley, Stephen M.; Ni, Pengsheng; Olarsch, Sippy; Latham, Nancy; Hambleton, Ronald K.; Felson, David; Kim, Young-jo; Hunter, David

2012-01-01

Objective: To develop and evaluate a prototype measure (OA-DISABILITY-CAT) for osteoarthritis research using Item Response Theory (IRT) and Computer Adaptive Test (CAT) methodologies. Study Design and Setting: We constructed an item bank consisting of 33 activities commonly affected by lower extremity (LE) osteoarthritis. A sample of 323 adults with LE osteoarthritis reported their degree of limitation in performing everyday activities and completed the Health Assessment Questionnaire-II (HAQ-II). We used confirmatory factor analyses to assess scale unidimensionality and IRT methods to calibrate the items and examine the fit of the data. Using CAT simulation analyses, we examined the performance of OA-DISABILITY-CATs of different lengths compared with the full item bank and the HAQ-II. Results: One distinct disability domain was identified. The 10-item OA-DISABILITY-CAT demonstrated a high degree of accuracy compared with the full item bank (r=0.99). The item bank and the HAQ-II scales covered a similar estimated scoring range. In terms of reliability, 95% of OA-DISABILITY reliability estimates were over 0.83, versus 0.60 for the HAQ-II. Except at the highest scores, the 10-item OA-DISABILITY-CAT demonstrated superior precision to the HAQ-II. Conclusion: The prototype OA-DISABILITY-CAT demonstrated promising measurement properties compared with the HAQ-II and is recommended for use in LE osteoarthritis research. PMID:19216052
Development of a tool to assess adherence to a model of the division of responsibility in feeding young children: using response mapping to capacitate validation measures.

PubMed

Lohse, Barbara; Satter, Ellyn; Arnold, Kristen

2014-04-01

Accurate early assessment and targeted intervention with problematic parent/child feeding dynamics is critical for the prevention and treatment of child obesity. The division of responsibility in feeding (sDOR), articulated by the Satter Feeding Dynamics Model (fdSatter), has been demonstrated clinically as an effective approach to reduce child feeding problems, including those leading to obesity. Lack of a tested instrument to examine adherence to fdSatter stimulated initial construction of the Satter Feeding Dynamics Inventory (fdSI). The aim of this project was to refine the item pool to establish translational validity, making the fdSI suitable for advanced psychometric analysis. Cognitive interviews (n = 80) with caregivers of varied socioeconomic strata informed revisions that demonstrated face and content validity. fdSI responses were mapped to interviews using an iterative, multi-phase thematic approach to provide an instrument ready for construct validation. fdSI development required five interview phases over 32 months: Foundational; Refinement; Transitional; Assurance; and Launching. Each phase was associated with item reduction and revision. Thirteen items were removed from the 38-item Foundational phase and seven were revised in the Refinement phase. Revisions, deletions, and additions prompted by Transitional and Assurance phase interviews resulted in the 15-item Launching phase fdSI. Only one Foundational phase item was carried through all development phases, emphasizing the need to test for item comprehension and interpretation before psychometric analyses. Psychometric studies of item pools without encrypted meanings will facilitate progress toward a tool that accurately detects adherence to sDOR. Ability to measure sDOR will facilitate focus on feeding behaviors associated with reduced risk of childhood obesity.
Refining the Pediatric Evaluation of Disability Inventory-Patient-Reported Outcome (PEDI-PRO) item candidates: interpretation of a self-reported outcome measure of functional performance by young people with neurodevelopmental disabilities.

PubMed

Kramer, Jessica M; Schwartz, Ariel

2017-10-01

This study examined the item interpretability and rating scale use of the Pediatric Evaluation of Disability Inventory-Patient-Reported Outcome (PEDI-PRO) by young people with developmental disabilities. The PEDI-PRO assesses the functional performance of discrete functional tasks in the context of everyday life situations. A two-phase cognitive interview design was implemented with a convenience sample of 37 young people (mean age 19y, SD 2y 5mo; 13 males and 24 females; 68% with intellectual disability) with developmental disabilities. In phase I, 182 item candidates were each reviewed by an average of four young people. In phase II, 103 items were carried forward or revised and each reviewed by an average of seven additional young people. Two raters coded responses for intended item interpretation and performance quality; codes were analysed using descriptive statistics. Qualitative analysis explored young people's self-evaluation process. Items were interpreted as intended by most young people (mean 86%). Young people can use PEDI-PRO response categories appropriately to describe their performance: 94% of positive performance descriptions coincided with a positive response category choice; 73% of negative descriptions coincided with a negative response category choice. Young people interpreted items in a literal manner, and their self-evaluation incorporated the use of supports that facilitate functional performance. The PEDI-PRO's measurement framework appears to support the self-evaluation of functional performance of young people with developmental disabilities. © 2017 Mac Keith Press.
French Norms for the Harvard Group Scale of Hypnotic Susceptibility, Form A.

PubMed

Anlló, Hernán; Becchio, Jean; Sackur, Jérôme

2017-01-01

The authors present French norms for the Harvard Group Scale of Hypnotic Susceptibility, Form A (HGSHS:A). They administered an adapted translation of Shor and Orne's original text (1962) to a group of 126 paid volunteers. Participants also rated their own responses using the authors' translation of Kihlstrom's Scale of Involuntariness (2006). Item pass rates, score distributions, and reliability were calculated and compared with several other reference samples. Analyses show that the present French norms are consistent with the reference samples. Interestingly, the pass rate for some items drops significantly if "entirely voluntary" responses (as identified by Kihlstrom's scale) are scored as "fail." Copies of the translated scales and response booklet are available online.
A preliminary quality of life questionnaire-bronchiectasis: a patient-reported outcome measure for bronchiectasis.

PubMed

Quittner, Alexandra L; Marciel, Kristen K; Salathe, Matthias A; O'Donnell, Anne E; Gotfried, Mark H; Ilowite, Jonathan S; Metersky, Mark L; Flume, Patrick A; Lewis, Sandra A; McKevitt, Matthew; Montgomery, A Bruce; O'Riordan, Thomas G; Barker, Alan F

2014-08-01

The Quality of Life Questionnaire-Bronchiectasis (QOL-B) is the first disease-specific, patient-reported outcome measure for patients with bronchiectasis. Content validity, cognitive testing, responsivity to open-label treatment, and psychometric analyses are presented. Reviews of literature, existing measures, and physician input were used to generate the initial QOL-B. Modifications following preliminary cognitive testing (N = 35 patients with bronchiectasis) generated version (V) 1.0. An open-ended patient interview study (N = 28) provided additional information and was content analyzed to derive saturation matrices, which summarized all disease-related topics mentioned by each participant. This resulted in QOL-B V2.0. Psychometric analyses were carried out using results from an open-label phase 2 trial, in which 89 patients were enrolled and treated with aztreonam for inhalation solution. Responsivity to open-label treatment was observed. Additional analyses generated QOL-B V3.0, with 37 items on eight scales: respiratory symptoms; physical, role, emotional, and social functioning; vitality; health perceptions; and treatment burden. For each scale, scores are standardized on a 0-to-100-point scale; higher scores indicate better health-related quality of life. No total score is calculated. A final cognitive testing study (N = 40) resulted in a minor change to one social functioning scale item (QOL-B V3.1). Content validity, cognitive testing, responsivity to open-label treatment, and initial psychometric analyses supported QOL-B items and structure. This interim QOL-B is a promising tool for evaluating the efficacy of new therapies for patients with bronchiectasis and for measuring symptoms, functioning, and quality of life in these patients on a routine basis. A final psychometric validation study is needed and is forthcoming. ClinicalTrials.gov; No.: NCT00805025; URL: www.clinicaltrials.gov.
The Caregiver Contribution to Heart Failure Self-Care (CACHS): Further Psychometric Testing of a Novel Instrument.

PubMed

Buck, Harleah G; Harkness, Karen; Ali, Muhammad Usman; Carroll, Sandra L; Kryworuchko, Jennifer; McGillion, Michael

2017-04-01

Caregivers (CGs) contribute important assistance with heart failure (HF) self-care, including daily maintenance, symptom monitoring, and management. Until CGs' contributions to self-care can be quantified, it is impossible to characterize them, account for their impact on patient outcomes, or perform meaningful cost analyses. The purpose of this study was to conduct psychometric testing and item reduction on the recently developed 34-item Caregiver Contribution to Heart Failure Self-care (CACHS) instrument using classical and item response theory methods. Fifty CGs (mean age 63 ± 12.84 years; 70% female) recruited from a HF clinic completed the CACHS in 2014, and the results were evaluated using classical test theory and item response theory. Items were to be deleted for low (<.05) or high (>.95) endorsement, low (<.3) or high (>.7) corrected item-total correlations, significant pairwise correlation coefficients, floor or ceiling effects, relatively low latent trait and item information function levels (<1.5 and p > .5), and differential item functioning. After analysis, 14 items were excluded, resulting in a 20-item instrument (self-care maintenance, eight items; monitoring, seven items; management, five items). Most items demonstrated moderate to high discrimination (median 2.13, minimum .77, maximum 5.05) and appropriate item difficulty (-2.7 to 1.4). Internal consistency reliability was excellent (Cronbach α = .94, average inter-item correlation = .41) with no ceiling effects. The newly developed 20-item version of the CACHS is supported by rigorous instrument development and represents a novel instrument to measure CGs' contribution to HF self-care. © 2016 Wiley Periodicals, Inc.
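As a rough illustration of two of the screening rules listed above (endorsement rate and corrected item-total correlation), the Python sketch below flags items using the stated cut-offs. It assumes dichotomous 0/1 item scores and uses hypothetical function names and toy data; it is not the CACHS authors' code, and the remaining criteria (pairwise correlations, information functions, DIF) are not modeled.

```python
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation; returns NaN if either variable is constant."""
    mx, my = mean(x), mean(y)
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return float("nan")
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (sx * sy)


def screen_items(responses, endorse=(0.05, 0.95), item_total=(0.3, 0.7)):
    """Flag items by endorsement rate and corrected item-total correlation.

    responses: list of respondent rows, each a list of 0/1 item scores
    (hypothetical format). Returns {item_index: (p, r, reasons)}.
    """
    flags = {}
    for j in range(len(responses[0])):
        item = [row[j] for row in responses]
        # "Corrected" total: the sum of all other items, excluding item j.
        rest = [sum(row) - row[j] for row in responses]
        p = mean(item)
        r = pearson(item, rest)
        reasons = []
        if p < endorse[0] or p > endorse[1]:
            reasons.append("endorsement")
        # r != r is True only for NaN (a constant item has no correlation).
        if r != r or r < item_total[0] or r > item_total[1]:
            reasons.append("item-total")
        flags[j] = (p, r, reasons)
    return flags


# Toy data: 4 respondents x 3 items (made up for illustration).
toy = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [1, 0, 1]]
for j, (p, r, reasons) in screen_items(toy).items():
    print(j, round(p, 2), r, reasons)
```

In the toy data, item 0 is endorsed by everyone, so it fails the endorsement rule and has no defined item-total correlation; the other two items are uncorrelated with their rest scores and fail the item-total rule.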
The emotion dysregulation inventory: Psychometric properties and item response theory calibration in an autism spectrum disorder sample.

PubMed

Mazefsky, Carla A; Yu, Lan; White, Susan W; Siegel, Matthew; Pilkonis, Paul A

2018-06-01

Individuals with autism spectrum disorder (ASD) often present with prominent emotion dysregulation that requires treatment but can be difficult to measure. The Emotion Dysregulation Inventory (EDI) was created using methods developed by the Patient-Reported Outcomes Measurement Information System (PROMIS®) to capture observable indicators of poor emotion regulation. Caregivers of 1,755 youth with ASD completed 66 candidate EDI items, and the final 30 items were selected based on classical test theory and item response theory (IRT) analyses. The analyses identified two factors: (a) Reactivity, characterized by intense, rapidly escalating, sustained, and poorly regulated negative emotional reactions, and (b) Dysphoria, characterized by anhedonia, sadness, and nervousness. The final items did not show differential item functioning (DIF) based on gender, age, intellectual ability, or verbal ability. Because the final items were calibrated using IRT, even a small number of items offers high precision, minimizing respondent burden. IRT co-calibration of the EDI with related measures demonstrated its superiority in assessing the severity of emotion dysregulation with as few as seven items. Validity of the EDI was supported by expert review, its association with related constructs (e.g., anxiety and depression symptoms, aggression), higher scores in psychiatric inpatients with ASD compared to a community ASD sample, and demonstration of test-retest stability and sensitivity to change. In sum, the EDI provides an efficient and sensitive method to measure emotion dysregulation for clinical assessment, monitoring, and research in youth with ASD of any level of cognitive or verbal ability. Autism Res 2018, 11: 928-941. © 2018 International Society for Autism Research, Wiley Periodicals, Inc. This paper describes a new measure of poor emotional control called the Emotion Dysregulation Inventory (EDI). 
Caregivers of 1,755 youth with ASD completed candidate items, and advanced statistical techniques were applied to identify the best final items. The EDI is unique because it captures common emotional problems in ASD and is appropriate for both nonverbal and verbal youth. It is an efficient and sensitive measure for use in clinical assessments, monitoring, and research with youth with ASD.
Measurement issues in research on social support and health.

PubMed Central

Dean, K; Holst, E; Kreiner, S; Schoenborn, C; Wilson, R

1994-01-01

STUDY OBJECTIVE--The aims were: (1) to identify methodological problems that may explain the inconsistencies and contradictions in the research evidence on social support and health, and (2) to validate a frequently used measure of social support in order to determine whether or not it could be used in multivariate analyses of population data in research on social support and health. DESIGN AND METHODS--Secondary analysis of data collected in a cross-sectional survey of a multistage cluster sample of the population of the United States, designed to study relationships among behavioural, social support and health variables. Statistical models based on item response theory and graph theory were used to validate the measure of social support to be used in subsequent analyses. PARTICIPANTS--Data on 1755 men and women aged 20 to 64 years were available for the scale validation. RESULTS--Massive evidence of item bias was found for all items of a group membership subscale. The most serious problems were found in relation to an item measuring membership in work-related groups. Using that item in the social network scale in multivariate analyses would distort findings on the statistical effects of education, employment status, and household income. Evidence of item bias was also found for a sociability subscale. When marital status was included to create what is called an intimate contacts subscale, the confounding grew worse. CONCLUSIONS--The composite measure of social network is not valid and would seriously distort the findings of analyses attempting to study relationships between the index and other variables. The findings show that valid measurement is a methodological issue that must be addressed in scientific research on population health. PMID:8189179
Indicators of Family Care for Development for Use in Multicountry Surveys

PubMed Central

Kariger, Patricia; Engle, Patrice; Britto, Pia M. Rebello; Sywulka, Sara M.; Menon, Purnima

2012-01-01

Indicators of family care for development are essential for ascertaining whether families are providing their children with an environment that leads to positive developmental outcomes. This project aimed to develop indicators from a set of items, measuring family care practices and resources important for caregiving, for use in epidemiologic surveys in developing countries. A mixed-method (quantitative and qualitative) design was used for item selection and evaluation. Qualitative and quantitative analyses were conducted to examine the validity of candidate items in several country samples. Qualitative methods included the use of global expert panels to identify and evaluate the performance of each candidate item as well as in-country focus groups to test the content validity of the items. The quantitative methods included analyses of item-response distributions, using bivariate techniques. The selected items measured two family care practices (support for learning/stimulating environment and limit-setting techniques) and caregiving resources (adequacy of the alternate caregiver when the mother worked). Six play-activity items, indicative of support for learning/stimulating environment, were included in the core module of UNICEF's Multiple Indicator Cluster Survey 3. The other items were included in optional modules. This project provided, for the first time, a globally relevant set of items for assessing family care practices and resources in epidemiological surveys. These items have multiple uses, including national monitoring and cross-country comparisons of the status of family care for development. The obtained information will reinforce attention to efforts to improve the support for development of children. PMID:23304914
Development and evaluation of the PI-G: a three-scale measure based on the German translation of the PROMIS ® pain interference item bank.

PubMed

Farin, Erik; Nagl, Michaela; Gramm, Lukas; Heyduck, Katja; Glattacker, Manuela

2014-05-01

The aim of this study was to translate the PROMIS(®) pain interference (PI) item bank (41 items) into German, test its psychometric properties in patients with chronic low back pain, and develop static subforms. We surveyed N = 262 patients undergoing rehabilitation, who were asked to fill out questionnaires at the beginning of rehabilitation and 2 weeks after its end; in addition to the PROMIS(®) PI items, these included the Oswestry Disability Index (ODI) and the Pain Disability Index (PDI). For psychometric testing, a 1-parameter item response theory (IRT) model was used. Exploratory and confirmatory factor analyses as well as reliability and construct validity analyses were conducted. The assumptions regarding IRT scaling of the translated PROMIS(®) PI item bank as a whole were not confirmed. However, we succeeded in devising three static subforms (PI-G scales: PI mental 13 items, PI functional 11 items, PI physical 4 items), revealing good psychometric properties. The PI-G scales in their static form can be recommended for use in German-speaking countries. Their strengths versus the ODI and PDI are that pain interference is assessed in a differentiated manner and that several psychometric values are somewhat better than those associated with the ODI and PDI (distribution properties, IRT model fit, reliability). To develop an IRT-scaled item bank of the German translations of the PROMIS(®) PI items, it would be useful to have additional studies (e.g., with larger sample sizes and using a 2-parameter IRT model).
Caregiver Appraisals of Functional Dependence in Individuals With Dementia and Associated Caregiver Upset: Psychometric Properties of a New Scale and Response Patterns by Caregiver and Care Recipient Characteristics

PubMed Central

Gitlin, Laura N.; Roth, David L.; Burgio, Louis D.; Loewenstein, David A.; Winter, Laraine; Nichols, Linda; Argüelles, Soledad; Corcoran, Mary; Burns, Robert; Martindale, Jennifer

2008-01-01

Objective To evaluate psychometric properties and response patterns of the Caregiver Assessment of Function and Upset (CAFU), a 15-item multidimensional measure of dependence in dementia patients and caregiver reaction. Method 640 families were administered the CAFU (53% White, 43% African American, and 4% mixed race and ethnicity). We created a random split of the sample and conducted exploratory factor analyses on Sample 1 and confirmatory factor analyses on Sample 2. Convergent and discriminant validity were evaluated using Spearman rank correlation coefficients. Results A two-factor structure for functional items was derived, and excellent factorial validity was obtained. Convergent and discriminant validity were obtained for function and upset measures. Differential response patterns for dependence and caregiver upset were found for caregiver race, relationship, and care recipient gender but not for caregiver gender. Discussion The CAFU is easily administered, reliable, and valid for evaluating appraisals of dependencies and upsetting care areas. PMID:15750049
The diagnostic utility of separation anxiety disorder symptoms: An item response theory analysis

PubMed Central

Cooper-Vince, Christine E.; Emmert-Aronson, Benjamin O.; Pincus, Donna B.; Comer, Jonathan S.

2013-01-01

At present, it is not clear whether the current definition of separation anxiety disorder (SAD) is the optimal classification of developmentally inappropriate, severe, and interfering separation anxiety in youth. Much remains to be learned about the relative contributions of individual SAD symptoms for informing diagnosis. Two-parameter logistic item response theory analyses were conducted on the eight core SAD symptoms in an outpatient anxiety sample of treatment-seeking children (N = 359, 59.3% female, mean age = 11.2 years) and their parents to determine the diagnostic utility of each of these symptoms. Analyses considered values of item threshold, which characterizes the SAD severity level at which each symptom has a 50% chance of being endorsed, and item discrimination, which characterizes how well each symptom distinguishes individuals with higher and lower levels of SAD. Distress related to separation and fear of being alone without major attachment figures showed the strongest discrimination properties and the lowest thresholds for being endorsed. In contrast, worry about harm befalling attachment figures showed the poorest discrimination properties, and nightmares about separation showed the highest threshold for being endorsed. Distress related to separation demonstrated crossing differential item functioning associated with age: at lower separation anxiety levels, excessive fear at separation was more likely to be endorsed by children ≥9 years, whereas at higher levels this symptom was more likely to be endorsed by children <9 years. Implications are discussed for optimizing the taxonomy of SAD in youth. PMID:23963543
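The threshold and discrimination parameters described above correspond to the standard two-parameter logistic (2PL) item characteristic curve, P(endorse | θ) = 1 / (1 + exp(-a(θ - b))), where b is the threshold and a the discrimination. The Python sketch below is illustrative only: the parameter values are hypothetical, not the estimates from this study, and it omits the optional 1.7 scaling constant used in some parameterizations.

```python
import math


def p_endorse_2pl(theta, a, b):
    """2PL item characteristic curve.

    theta: latent severity (e.g., separation anxiety level)
    a: discrimination (slope); b: threshold (location).
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical items: one highly discriminating, one weakly discriminating,
# both with the same threshold b = 0.5.
sharp = dict(a=2.0, b=0.5)
flat = dict(a=0.5, b=0.5)

for theta in (-1.0, 0.5, 2.0):
    print(theta, round(p_endorse_2pl(theta, **sharp), 3),
          round(p_endorse_2pl(theta, **flat), 3))
```

At θ = b both curves give exactly 0.5, which is what the threshold means. Away from b, the item with the larger a changes more steeply, so it separates lower- and higher-severity respondents more sharply.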
Analyses of Children's Mathematics Proficiency from ECLS-K 1998 and 2010 Cohorts: Why Early Mathematics?

ERIC Educational Resources Information Center

Lee, Joohi; Pant, Mohan D.

2017-01-01

This article presents correlation analyses of mathematics item response theory scores from the Early Childhood Longitudinal Study, Kindergarten Class of 1998 and 2010 data, and argues for the critical need for systematic efforts to improve the quality of pre- and in-service teachers of young children in teaching mathematics.
Development and Analyses of the Coping Stress Inventory

ERIC Educational Resources Information Center

Gadzella, Bernadette M.; Pierce, Devin; Young, Adena

2008-01-01

This is a report on the development of a coping stress inventory and the analyses of the data collected from 344 participants. The Coping Stress Inventory (CSI) consists of 16 intercorrelated items in three categories (Behavioral, Emotional, and Cognitive Appraisal). The internal consistency for the CSI was 0.77. Responses to the CSI were compared (a)…
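Internal consistency figures such as the 0.77 reported above are commonly Cronbach's alpha, α = k/(k - 1) · (1 - Σσ²ᵢ / σ²ₓ), where k is the number of items, σ²ᵢ the item variances, and σ²ₓ the variance of the total score. The abstract does not name the coefficient, so treating it as alpha is an assumption. The Python sketch below computes alpha from a respondents-by-items score matrix; the toy data are made up.

```python
from statistics import pvariance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondent rows of item scores."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = [pvariance([row[j] for row in responses]) for j in range(k)]
    total_var = pvariance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Two perfectly consistent items give alpha of 1, and two unrelated items
# give alpha of 0 (up to floating-point rounding).
print(cronbach_alpha([[1, 1], [2, 2], [3, 3]]))
print(cronbach_alpha([[1, 1], [1, 2], [2, 1], [2, 2]]))
```

Using population variances throughout is a deliberate simplification: the n versus n - 1 denominator is the same for the item and total variances, so it cancels in the ratio.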
Measurement Invariance in Careers Research: Using IRT to Study Gender Differences in Medical Students' Specialization Decisions

ERIC Educational Resources Information Center

Behrend, Tara S.; Thompson, Lori Foster; Meade, Adam W.; Newton, Dale A.; Grayson, Martha S.

2008-01-01

The current study demonstrates the use of item response theory (IRT) to conduct measurement invariance analyses in careers research. A self-report survey was used to assess the importance 1,363 fourth-year medical students placed on opportunities to provide comprehensive patient care when choosing a career specialty. IRT analyses supported…
Item response modeling: a psychometric assessment of the children's fruit, vegetable, water, and physical activity self-efficacy scales among Chinese children.

PubMed

Wang, Jing-Jing; Chen, Tzu-An; Baranowski, Tom; Lau, Patrick W C

2017-09-16

This study aimed to evaluate the psychometric properties of four self-efficacy scales (i.e., self-efficacy for fruit (FSE), vegetable (VSE), and water (WSE) intakes, and physical activity (PASE)) and to investigate their differences in item functioning across sex, age, and body weight status groups using item response modeling (IRM) and differential item functioning (DIF). Four self-efficacy scales were administered to 763 Hong Kong Chinese children (55.2% boys) aged 8-13 years. Classical test theory (CTT) was used to examine the reliability and factorial validity of the scales. IRM was conducted and DIF analyses were performed to assess the characteristics of item parameter estimates on the basis of children's sex, age and body weight status. All self-efficacy scales demonstrated adequate to excellent internal consistency reliability (Cronbach's α: 0.79-0.91). One FSE misfit item and one PASE misfit item were detected. Small DIF was found for all the scale items across children's age groups. Items with medium to large DIF were detected in different sex and body weight status groups, which will require modification. A Wright map revealed that items covered the range of the distribution of participants' self-efficacy for each scale except VSE. Several self-efficacy scales' items functioned differently by children's sex and body weight status. Additional research is required to modify the four self-efficacy scales to minimize these moderating influences for application.
Development and psychometric characteristics of the SCI-QOL Pressure Ulcers scale and short form

PubMed Central

Kisala, Pamela A.; Tulsky, David S.; Choi, Seung W.; Kirshblum, Steven C.

2015-01-01

Objective To develop a self-reported measure of the subjective impact of pressure ulcers on health-related quality of life (HRQOL) in individuals with spinal cord injury (SCI) as part of the SCI quality of life (SCI-QOL) measurement system. Design Grounded-theory-based qualitative item development methods, large-scale item calibration testing, confirmatory factor analysis (CFA), and item response theory-based psychometric analysis. Setting Five SCI Model System centers and one Department of Veterans Affairs medical center in the United States. Participants Adults with traumatic SCI. Main Outcome Measures SCI-QOL Pressure Ulcers scale. Results 189 individuals with traumatic SCI who experienced a pressure ulcer within the past 7 days completed 30 items related to pressure ulcers. CFA confirmed a unidimensional pool of items. IRT analyses were conducted. A constrained Graded Response Model with a constant slope parameter was used to estimate item thresholds for the 12 retained items. Conclusions The 12-item SCI-QOL Pressure Ulcers scale is unique in that it is specifically targeted to individuals with spinal cord injury and at every stage of development has included input from individuals with SCI. Furthermore, the use of CFA and IRT methods provides flexibility and precision of measurement. The scale may be administered in its entirety or as a 7-item “short form” and is available for both research and clinical practice. PMID:26010965
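In Samejima's graded response model, an item with m ordered categories has m - 1 increasing thresholds b_1 < b_2 < ... < b_(m-1). The probability of responding in category k or above is a 2PL curve in θ, and each category's probability is the difference between adjacent cumulative curves. The constrained version described above fixes the slope a to one common value for all items. The Python sketch below assumes a hypothetical common slope and hypothetical thresholds; these are not the SCI-QOL calibration values.

```python
import math

COMMON_SLOPE = 2.0  # hypothetical shared slope for all items (constrained GRM)


def grm_category_probs(theta, thresholds, a=COMMON_SLOPE):
    """Category probabilities under Samejima's graded response model.

    thresholds: increasing list b_1 < ... < b_(m-1) for an item with m
    ordered categories (scored 0..m-1). Returns a list of m probabilities.
    """
    # Cumulative P(X >= k | theta) for k = 1..m-1, padded with
    # P(X >= 0) = 1 and P(X >= m) = 0.
    cum = [1.0]
    cum += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cum.append(0.0)
    # Probability of exactly category k is the drop between cumulative curves.
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]


# Two hypothetical 5-category items sharing the same slope.
item_thresholds = {
    "item_A": [-1.5, -0.5, 0.5, 1.5],
    "item_B": [0.0, 0.8, 1.6, 2.4],
}
for name, bs in item_thresholds.items():
    probs = grm_category_probs(0.0, bs)
    print(name, [round(p, 3) for p in probs])
```

Because every item shares the same slope, items in this constrained model differ only in where their thresholds sit along θ.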
Item Analyses of Memory Differences

PubMed Central

Salthouse, Timothy A.

2017-01-01

Objective Although performance on memory and other cognitive tests is usually assessed with a score aggregated across multiple items, potentially valuable information is also available at the level of individual items. Method The current study illustrates how analyses of variance with item as one of the factors, and memorability analyses in which item accuracy in one group is plotted as a function of item accuracy in another group, can provide a more detailed characterization of the nature of group differences in memory. Data are reported for two memory tasks, word recall and story memory, across age, ability, repetition, delay, and longitudinal contrasts. Results The item-level analyses revealed evidence for largely uniform differences across items in the age, ability, and longitudinal contrasts, but differential patterns across items in the repetition contrast, and unsystematic item relations in the delay contrast. Conclusion Analyses at the level of individual items have the potential to indicate the manner by which group differences in the aggregate test score are achieved. PMID:27618285
[KON-2006--Neurotic Personality Questionnaire].

PubMed

Aleksandrowicz, Jerzy W; Klasa, Katarzyna; Sobański, Jerzy A; Stolarska, Dorota

2007-01-01

Construction of a questionnaire describing personality traits connected to the occurrence and persistence of neurotic disorders. Responses of 794 patients (before treatment) and 520 persons from the control group on items of the constructed personality questionnaire and the symptom checklist "0". Analyses of subscale reliability and item-scale correlations, test-retest and split-half reliability. Factor analyses estimating the internal reliability of the questionnaire. Cross-validation with the KO "0" symptom checklist. The psychometric properties of the KON-2006 questionnaire indicate that it is sufficiently consistent and reliable. Validity analyses indicate a high probability that the X-KON coefficient reflects personality dysfunctions related to neurotic disorders. The Neurotic Personality Questionnaire KON-2006 may serve to estimate personality traits connected to the occurrence and persistence of neurotic disorders as well as changes resulting from psychotherapy.
The self in conflict: the role of executive processes during truthful and deceptive responses about attitudes.

PubMed

Johnson, Ray; Henkell, Heather; Simon, Elizabeth; Zhu, John

2008-01-01

This study sought to extend previous results regarding deceptions about specific memories by investigating the role of executive processes in deceptions about evaluative judgments. In addition, given that previous studies of deception have not included valence manipulations, we also wanted to determine whether the goodness/badness aspect of the items would affect the processes used during deception. Thus, we compared behavioral and event-related potential (ERP) activity while participants made truthful and directed lie (i.e., press opposite of the truth) responses about attitude items with which they either strongly agreed or disagreed. Consistent with previous results, deceptive responses required greater cognitive control as indicated by slower RTs, larger medial frontal negativities (MFN) and smaller late positive components than truthful responses. Furthermore, the magnitude of these deception-related effects was dependent on the valence that participants assigned to the items (i.e., agree/disagree). Directed lie responses about attitudes also resulted in greatly reduced pre-response positivities, an indication that participants strategically monitored their responses even in the absence of explicit task demands. Item valence also differentially affected the amplitude of three ERP components in a 650 ms pre-response interval, independently of whether truthful or deceptive responses were made. Analyses using dipole locations based on results from fMRI studies of evaluative judgments and deception indicated a high degree of overlap between the ERP and fMRI results and revealed the possible temporal characteristics of the hemodynamic activations.

The Role of Content and Context in PISA Interest Scales: A study of the embedded interest items in the PISA 2006 science assessment

NASA Astrophysics Data System (ADS)

Drechsel, Barbara; Carstensen, Claus; Prenzel, Manfred

2011-01-01

This paper focuses on interest in science as one of the attitudinal aspects of scientific literacy. Large-scale data from the Programme for International Student Assessment (PISA) 2006 are analysed in order to describe student interest more precisely. So far, the analyses have provided a general indicator of interest, aggregated over all contexts and contents in the science test. With its innovative approach, PISA embeds interest items within the cognitive test units and their contents and contexts. The main difference from conventional interest measures is that in most questionnaires, a relatively small number of interest items cover broad fields of contents and contexts. The science units represent a number of systematically differentiated scientific contexts and contents. The units' stimulus texts allow for concrete descriptions of relevant content aspects, applications, and contexts. In the analyses, multidimensional item response models are applied in order to disentangle student interest. The results indicate that multidimensional models fit the data. A two-dimensional model separating interest into two different knowledge of science dimensions described in the PISA science framework is further analysed with respect to gender, performance differences, and country. The findings give a comprehensive description of students' interest in science. The paper deals with methodological problems and describes requirements of the test construction for further assessments. The results are discussed with regard to their significance for science education.
Validation of the conceptual research utilization scale: an application of the standards for educational and psychological testing in healthcare.

PubMed

Squires, Janet E; Estabrooks, Carole A; Newburn-Cook, Christine V; Gierl, Mark

2011-05-19

There is a lack of acceptable, reliable, and valid survey instruments to measure conceptual research utilization (CRU). In this study, we investigated the psychometric properties of a newly developed scale (the CRU Scale). We used the Standards for Educational and Psychological Testing as a validation framework to assess four sources of validity evidence: content, response processes, internal structure, and relations to other variables. A panel of nine international research utilization experts performed a formal content validity assessment. To determine response process validity, we conducted a series of one-on-one scale administration sessions with 10 healthcare aides. Validity evidence for internal structure and relations to other variables was examined using CRU Scale response data from a sample of 707 healthcare aides working in 30 urban Canadian nursing homes. Principal components analysis and confirmatory factor analyses were conducted to determine internal structure. Relations to other variables were examined using: (1) bivariate correlations; (2) change in mean values of CRU with increasing levels of other kinds of research utilization; and (3) multivariate linear regression. Content validity index scores for the five items ranged from 0.55 to 1.00. The principal components analysis suggested a 5-item 1-factor model. This was inconsistent with the findings from the confirmatory factor analysis, which showed best fit for a 4-item 1-factor model. Bivariate associations between CRU and other kinds of research utilization were statistically significant (p < 0.01) for the latent CRU scale score and all five CRU items. The CRU scale score was also shown to be a significant predictor of overall research utilization in multivariate linear regression. The CRU scale showed acceptable initial psychometric properties with respect to responses from healthcare aides in nursing homes. 
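The abstract does not say how the content validity index was computed. A common definition, assumed here, is the item-level CVI: the proportion of experts who rate an item 3 or 4 on a 4-point relevance scale, with a scale-level index often taken as the mean of the item CVIs. The Python sketch below implements that definition; the function names and ratings are hypothetical, not the nine experts' actual ratings.

```python
def item_cvi(ratings, relevant=(3, 4)):
    """Proportion of experts rating the item as relevant (3 or 4 on a 1-4 scale)."""
    return sum(1 for r in ratings if r in relevant) / len(ratings)


def scale_cvi_ave(all_ratings):
    """Scale-level CVI (averaging method): mean of the item-level CVIs."""
    cvis = [item_cvi(r) for r in all_ratings]
    return sum(cvis) / len(cvis)


# Hypothetical ratings from nine experts for two items.
panel = [
    [4, 4, 3, 3, 4, 2, 1, 4, 3],  # 7 of 9 experts rate it relevant
    [4, 4, 4, 4, 3, 4, 4, 3, 4],  # all 9 experts rate it relevant
]
for i, ratings in enumerate(panel, start=1):
    print(f"item {i}: I-CVI = {item_cvi(ratings):.2f}")
print(f"S-CVI/Ave = {scale_cvi_ave(panel):.2f}")
```

With nine experts, each item-level index is a multiple of 1/9, so a single dissenting rating moves an item's index by about 0.11.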
Based on our validity, reliability, and acceptability analyses, we recommend using a reduced (four-item) version of the CRU scale to yield sound assessments of CRU by healthcare aides. Refinement to the wording of one item is also needed. Planned future research will include: latent scale scoring, identification of variables that predict and are outcomes to conceptual research use, and longitudinal work to determine CRU Scale sensitivity to change.
Item response theory analysis applied to the Spanish version of the Personal Outcomes Scale.

PubMed

Guàrdia-Olmos, J; Carbó-Carreté, M; Peró-Cebollero, M; Giné, C

2017-11-01

The study of measurements of quality of life (QoL) is one of the great challenges of modern psychology and psychometric approaches. This issue has greater importance when examining QoL in populations that were historically treated on the basis of their deficiency, and recently, the focus has shifted to what each person values and desires in their life, as in cases of people with intellectual disability (ID). Many studies of QoL scales applied in this area have attempted to improve the validity and reliability of their components by incorporating various sources of information to achieve consistency in the data obtained. The adaptation of the Personal Outcomes Scale (POS) in Spanish has shown excellent psychometric attributes, and its administration has three sources of information: self-assessment, practitioner and family. The study of possible congruence or incongruence of observed distributions of each item between sources is therefore essential to ensure a correct interpretation of the measure. The aim of this paper was to analyse the observed distribution of items and dimensions from the three Spanish POS information sources cited earlier, using item response theory. We studied a sample of 529 people with ID and their respective practitioners and family members, and in each case, we analysed items and factors using Samejima's model for polytomous ordinal scales. The results indicated a substantial number of items with differential effects regarding sources, and in some cases, they indicated significant differences in the distribution of items, factors and sources of information. As a result of this analysis, we must affirm that the administration of the POS, considering three sources of information, was adequate overall, but a correct interpretation of the results requires considering much more information, including some specific items in specific dimensions. If these considerations are ignored, the overall ratings could be biased. 
© 2017 MENCAP and International Association of the Scientific Study of Intellectual and Developmental Disabilities and John Wiley & Sons Ltd.
What Do You Think You Are Measuring? A Mixed-Methods Procedure for Assessing the Content Validity of Test Items and Theory-Based Scaling

PubMed Central

Koller, Ingrid; Levenson, Michael R.; Glück, Judith

2017-01-01

The valid measurement of latent constructs is crucial for psychological research. Here, we present a mixed-methods procedure for improving the precision of construct definitions, determining the content validity of items, evaluating the representativeness of items for the target construct, generating test items, and analyzing items on a theoretical basis. To illustrate the mixed-methods content-scaling-structure (CSS) procedure, we analyze the Adult Self-Transcendence Inventory, a self-report measure of wisdom (ASTI, Levenson et al., 2005). A content-validity analysis of the ASTI items was used as the basis of psychometric analyses using multidimensional item response models (N = 1215). We found that the new procedure produced important suggestions concerning five subdimensions of the ASTI that were not identifiable using exploratory methods. The study shows that the application of the suggested procedure leads to a deeper understanding of latent constructs. It also demonstrates the advantages of theory-based item analysis. PMID:28270777
Evaluating Differential Item Functioning in the English General Practice Patient Survey: Comparison of South Asian and White British Subgroups.

PubMed

Setodji, Claude M; Elliott, Marc N; Abel, Gary; Burt, Jenni; Roland, Martin; Campbell, John

2015-09-01

To evaluate two 5-item patient experience scales from the English General Practice (GP) Patient Survey for evidence of differential item functioning (DIF) given prior evidence of substantially worse reported health care experiences for South Asian compared with white British respondents. A national survey of English patients' primary care experiences. We used classical test theory and item response theory analyses to examine the possibility of DIF by patient ethnicity (South Asian, white British) after controlling for age, sex, health status, and quality of life in the English GP Patient Survey conducted in 2011/2012. Data were available for 873,051 respondents (818,219 white British/54,832 South Asian from 7795 English practices) who answered items relating to experiences of GP or nurses' care. Internal consistency reliability was high and similar for South Asian and white British patients. White British patients reported better average experiences than South Asians, but there was no evidence of DIF or different item response curves for white British and South Asian respondents, even in sensitivity analyses using matched samples. All communication items in the English GP Patient Survey showed similar South Asian versus white British differences, with no evidence of DIF. In contrast, differences due to scale use or expectations are typically variable rather than constant across scales. While other possibilities remain, these findings increase the likelihood that the observed negative responses of South Asian patients to this national survey reflect true differences in their experiences of care.
Identification of high school students' ability level of constructing free body diagrams to solve restricted and structured response items in force matter

NASA Astrophysics Data System (ADS)

Rahmaniar, Andinisa; Rusnayati, Heni; Sutiadi, Asep

2017-05-01

When solving physics problems, particularly those involving forces, students need to be able to construct free body diagrams, which help them analyse every force acting on an object, the length of its vector, and the name of each force. A mixed-methods approach was used to explain the results, with no special treatment applied to participants. The participants were 35 first-grade high school students. The purpose of this study is to identify students' ability level in constructing free body diagrams when solving restricted and structured response items. Based on the two types of test, each student was classified into one of four ability levels for constructing free body diagrams, each level having different characteristics, and some students were interviewed while solving the test to find out how they solved the problems. On restricted response items, students' ability to construct free body diagrams was distributed as about 34.86% at the no-evidence level, 24.11% at the inadequate level, 29.14% at the needs-improvement level, and 4.0% at the adequate level. On structured response items, the distribution was about 16.59% at the no-evidence level, 23.99% at the inadequate level, 36% at the needs-improvement level, and 13.71% at the adequate level. The researchers found that students who constructed free body diagrams first, and constructed them correctly, were more successful in solving restricted and structured response items.
Evaluating the Dimensionality of Self-Determination Theory's Relative Autonomy Continuum.

PubMed

Sheldon, Kennon M; Osin, Evgeny N; Gordeeva, Tamara O; Suchkov, Dmitry D; Sychev, Oleg A

2017-09-01

We conducted a theoretical and psychometric evaluation of self-determination theory's "relative autonomy continuum" (RAC), an important aspect of the theory whose validity has recently been questioned. We first derived a Comprehensive Relative Autonomy Index (C-RAI) containing six subscales and 24 items by conducting a paired paraphrase content analysis of existing RAI measures. We administered the C-RAI to multiple U.S. and Russian samples, assessing motivation to attend class, study a major, and take responsibility. Item-level and scale-level multidimensional scaling analyses, confirmatory factor analyses, and simplex/circumplex modeling analyses reaffirmed the validity of the RAC across multiple samples, stems, and studies. Validation analyses predicting subjective well-being and trait autonomy from the six separate subscales, in combination with various higher-order composites (weighted and unweighted), showed that an aggregate unweighted RAI score provides the most unbiased and efficient indicator of the overall quality of motivation within the behavioral domain being assessed.
Thyroid-specific questions on work ability showed known-groups validity among Danes with thyroid diseases.

PubMed

Nexo, Mette Andersen; Watt, Torquil; Bonnema, Steen Joop; Hegedüs, Laszlo; Rasmussen, Åse Krogh; Feldt-Rasmussen, Ulla; Bjorner, Jakob Bue

2015-07-01

We aimed to identify the best approach to work ability assessment in patients with thyroid disease by evaluating the factor structure, measurement equivalence, known-groups validity, and predictive validity of a broad set of work ability items. Based on the literature and interviews with thyroid patients, 24 work ability items were selected from previous questionnaires, revised, or developed anew. Items were tested among 632 patients with thyroid disease (non-toxic goiter, toxic nodular goiter, Graves' disease (with or without orbitopathy), autoimmune hypothyroidism, and other thyroid diseases), 391 of whom had participated in a study 5 years previously. Responses to selected items were compared with general population data. We used confirmatory factor analyses for categorical data, logistic regression analyses and tests of differential item functioning, and head-to-head comparisons of relative validity in distinguishing known groups. Although all work ability items loaded on a common factor, the optimal factor solution included five factors: role physical, role emotional, thyroid-specific limitations, work limitations (without disease attribution), and work performance. The scale on thyroid-specific limitations showed the most power in distinguishing clinical groups and time since diagnosis. A global single item proved useful for comparisons with the general population, and a thyroid-specific item predicted labor market exclusion within the next 5 years (OR 5.0, 95 % CI 2.7-9.1). Items on work limitations with attribution to thyroid disease were most effective in detecting impact on work ability and showed good predictive validity. Generic work ability items remain useful for general population comparisons.
Depression symptoms across cultures: an IRT analysis of standard depression symptoms using data from eight countries.

PubMed

Haroz, E E; Bolton, P; Gross, A; Chan, K S; Michalopoulos, L; Bass, J

2016-07-01

Prevalence estimates of depression vary between countries, possibly due to differential functioning of items between settings. This study compared the performance of the widely used Hopkins symptom checklist 15-item depression scale (HSCL-15) across multiple settings using item response theory analyses. Data came from adult populations in the low- and middle-income countries (LMIC) of Colombia, Indonesia, Kurdistan Iraq, Rwanda, Iraq, Thailand (Burmese refugees), and Uganda (N = 4732). Item parameters based on a graded response model were compared across LMIC settings. Differential item functioning (DIF) by setting was evaluated using multiple indicators multiple causes (MIMIC) models. Most items performed well across settings except items related to suicidal ideation and "loss of sexual interest or pleasure," which had low discrimination parameters (suicide: a = 0.31 in Thailand to a = 2.49 in Indonesia; sexual interest: a = 0.74 in Rwanda to a = 1.26 in one region of Kurdistan). Most items showed some degree of DIF, but DIF only impacted aggregate scale-level scores in Indonesia. Thirteen of the 15 HSCL depression items performed well across diverse settings, with most items showing a strong relationship to the underlying trait of depression. The results support the cross-cultural applicability of most of these depression symptoms across LMIC settings. DIF impacted aggregate depression scores in one setting, illustrating a possible source of measurement non-invariance in prevalence estimates.
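The discrimination parameters quoted above (for example a = 0.31 versus a = 2.49) come from a graded response model. As a minimal sketch of what that parameter does, the function below computes Samejima's graded response model category probabilities for one item. The thresholds and the four-category layout are hypothetical; only the two slope values are taken from the abstract.

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Category probabilities under Samejima's graded response model.

    theta: latent trait value; a: discrimination; thresholds: increasing
    boundaries b_1 < ... < b_{K-1} for an item with K ordered categories.
    """
    # Cumulative P*(X >= k) = 1 / (1 + exp(-a * (theta - b_k))),
    # padded with P*(X >= 0) = 1 and P*(X >= K) = 0.
    cum = [1.0]
    cum += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cum.append(0.0)
    # P(X = k) is the difference between adjacent cumulative probabilities.
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]


# Hypothetical thresholds for a 4-category item; the two slopes are the
# lowest and highest suicide-item values quoted in the abstract.
THRESHOLDS = [-1.0, 0.0, 1.0]
for a in (0.31, 2.49):
    low = grm_category_probs(-2.0, a, THRESHOLDS)
    high = grm_category_probs(2.0, a, THRESHOLDS)
    print(a, [round(p, 2) for p in low], [round(p, 2) for p in high])
```

With the low slope, the response distribution barely changes between low and high trait levels. That flatness is why an item with a = 0.31 carries little information about depression severity.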
Item-level psychometrics and predictors of performance for Spanish/English bilingual speakers on an object and action naming battery.

PubMed

Edmonds, Lisa A; Donovan, Neila J

2012-04-01

There is a pressing need for psychometrically sound naming materials for Spanish/English bilingual adults. To address this need, in this study the authors examined the psychometric properties of An Object and Action Naming Battery (An O&A Battery; Druks & Masterson, 2000) in bilingual speakers. Ninety-one Spanish/English bilinguals named O&A Battery items in English and Spanish. Responses underwent a Rasch analysis. Using correlation and regression analyses, the authors evaluated the effect of psycholinguistic (e.g., imageability) and participant (e.g., proficiency ratings) variables on accuracy. Rasch analysis indicated unidimensionality across English and Spanish nouns and verbs and showed robust item-level psychometric properties, providing evidence for content validity. Few items misfit the model, there were no ceiling or floor effects after uninformative and misfitting items were removed, and items reflected a range of difficulty. Reliability coefficients were high, and the number of statistically distinct ability levels provided an index of sensitivity. Regression analyses revealed significant correlations between psycholinguistic variables and accuracy, providing preliminary construct validity. The participant variables that contributed most to accuracy were proficiency ratings and time of language use. Results suggest adequate content and construct validity of O&A items retained in the analysis for Spanish/English bilingual adults and support future efforts to evaluate naming in older bilinguals and persons with bilingual aphasia.
An item response curves analysis of the Force Concept Inventory

NASA Astrophysics Data System (ADS)

Morris, Gary A.; Harshman, Nathan; Branum-Martin, Lee; Mazur, Eric; Mzoughi, Taha; Baker, Stephen D.

2012-09-01

Several years ago, we introduced the idea of item response curves (IRC), a simplistic form of item response theory (IRT), to the physics education research community as a way to examine item performance on diagnostic instruments such as the Force Concept Inventory (FCI). We noted that a full-blown analysis using IRT would be a next logical step, which several authors have since taken. In this paper, we show that our simple approach not only yields similar conclusions in the analysis of the performance of items on the FCI to the more sophisticated and complex IRT analyses but also permits additional insights by characterizing both the correct and incorrect answer choices. Our IRC approach can be applied to a variety of multiple-choice assessments but, as applied to a carefully designed instrument such as the FCI, allows us to probe student understanding as a function of ability level through an examination of each answer choice. We imagine that physics teachers could use IRC analysis to identify prominent misconceptions and tailor their instruction to combat those misconceptions, fulfilling the FCI authors' original intentions for its use. Furthermore, the IRC analysis can assist test designers to improve their assessments by identifying nonfunctioning distractors that can be replaced with distractors attractive to students at various ability levels.
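The item response curve approach described above plots, for each ability level, how often each answer choice was selected. The sketch below is a minimal version of that tabulation. It assumes that total test score serves as the ability proxy and that the responses are hypothetical; it is not the authors' own code.

```python
from collections import Counter, defaultdict


def item_response_curves(total_scores, choices):
    """Share of examinees at each total score who chose each answer option.

    total_scores[i]: examinee i's total test score (the ability proxy).
    choices[i]: the option examinee i selected on the item being studied.
    Returns {score: {option: fraction}} with scores in ascending order.
    """
    by_score = defaultdict(Counter)
    for score, choice in zip(total_scores, choices):
        by_score[score][choice] += 1
    curves = {}
    for score in sorted(by_score):
        counts = by_score[score]
        n = sum(counts.values())
        curves[score] = {opt: cnt / n for opt, cnt in sorted(counts.items())}
    return curves


# Hypothetical responses to one five-option item whose correct answer is "C".
scores = [5, 5, 12, 12, 25, 25]
picks = ["A", "C", "A", "C", "C", "C"]
for score, shares in item_response_curves(scores, picks).items():
    print(score, shares)
```

A distractor whose share stays near zero at every score level is a candidate "nonfunctioning distractor". A distractor that peaks at middle scores may point to a misconception held by intermediate students.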
The Recovery Knowledge Inventory for Measurement of Nursing Student Views on Recovery-oriented Mental Health Services.

PubMed

Happell, Brenda; Byrne, Louise; Platania-Phung, Chris

2015-01-01

Recovery-oriented services are a goal for policy and practice in the Australian mental health service system. Evidence-based reform requires an instrument to measure knowledge of recovery concepts. The Recovery Knowledge Inventory (RKI) was designed for this purpose; however, its suitability and validity for student health professionals have not been evaluated. The purpose of the current article is to report the psychometric features of the RKI for measuring nursing students' views on recovery. The RKI, a self-report measure, consists of four scales: (I) Roles and Responsibilities, (II) Non-Linearity of the Recovery Process, (III) Roles of Self-Definition and Peers, and (IV) Expectations Regarding Recovery. Confirmatory and exploratory factor analyses of the baseline data (n = 167) were applied to assess validity and reliability. Exploratory factor analyses generally replicated the item structure suggested by the three main scales; however, more stringent analyses (confirmatory factor analysis) did not provide strong support for convergent validity. A refined 16-item RKI had internal reliabilities of α = .75 for Roles and Responsibilities, α = .49 for Roles of Self-Definition and Peers, and α = .72 for Recovery as Non-Linear Process. If the RKI is to be applied to nursing student populations, the conceptual underpinning of the instrument needs to be reworked, and new items should be generated to evaluate and improve scale validity and reliability.
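The internal reliabilities reported above are Cronbach's alpha coefficients. The sketch below implements the standard alpha formula for a respondents-by-items matrix. The ratings are hypothetical and are not RKI data.

```python
from statistics import pvariance


def cronbach_alpha(responses):
    """Cronbach's alpha for a respondents-by-items score matrix.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    Population variances are used throughout; the n vs. n - 1 choice cancels
    in the ratio.
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_columns = list(zip(*responses))  # transpose to items x respondents
    item_var_sum = sum(pvariance(col) for col in item_columns)
    total_var = pvariance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical 1-5 ratings from four respondents on three items.
ratings = [[4, 5, 4], [2, 2, 3], [3, 3, 3], [5, 4, 5]]
print(round(cronbach_alpha(ratings), 3))
```

Alpha rises with the number of items as well as with inter-item correlation. A short subscale can therefore show a low value such as .49 partly because it has few items.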
Stroke Self-efficacy Questionnaire: a Rasch-refined measure of confidence post stroke.

PubMed

Riazi, Afsane; Aspden, Trefor; Jones, Fiona

2014-05-01

Measuring self-efficacy during rehabilitation provides an important insight into understanding recovery post stroke. A Rasch analysis of the Stroke Self-efficacy Questionnaire (SSEQ) was undertaken to establish its use as a clinically meaningful and scientifically rigorous measure. One hundred and eighteen stroke patients completed the SSEQ with the help of an interviewer. Participants were recruited from local acute stroke units and community stroke rehabilitation teams. Data were analysed with confirmatory factor analysis conducted using AMOS and Rasch analysis conducted using RUMM2030 software. Confirmatory factor analysis and Rasch analyses demonstrated the presence of two separate scales that measure stroke survivors' self-efficacy with: i) self-management and ii) functional activities. Guided by Rasch analyses, the response categories of these two scales were collapsed from an 11-point to a 4-point scale. Modified scales met the expectations of the Rasch model. Items satisfied the Rasch requirements (overall and individual item fit, local response independence, differential item functioning, unidimensionality). Furthermore, the two subscales showed evidence of good construct validity. The new SSEQ has good psychometric properties and is a clinically useful assessment of self-efficacy after stroke. The scale measures stroke survivors' self-efficacy with self-management and activities as two unidimensional constructs. It is recommended for use in clinical and research interventions, and in evaluating stroke self-management interventions.
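The abstract reports that the SSEQ response categories were collapsed from an 11-point to a 4-point scale, but it does not give the mapping. The sketch below shows how such a rescoring can be applied. The cut points are purely hypothetical and are not the published SSEQ scheme.

```python
from bisect import bisect_right


def collapse_categories(score, cut_points):
    """Map an original ordinal score onto fewer categories.

    cut_points[i] is the lowest original score placed in collapsed category
    i + 1; anything below cut_points[0] becomes category 0.
    """
    return bisect_right(cut_points, score)


# Hypothetical mapping of a 0-10 rating onto 0-3:
# 0-2 -> 0, 3-5 -> 1, 6-8 -> 2, 9-10 -> 3.
CUTS = [3, 6, 9]
print([collapse_categories(s, CUTS) for s in range(11)])
```

Storing the mapping as sorted cut points keeps the rescoring monotone, so collapsing can merge adjacent categories but never reorder them.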
Psychometric Evaluation of the Ford Insomnia Response to Stress Test (FIRST) in Early Pregnancy.

PubMed

Gelaye, Bizu; Zhong, Qiu-Yue; Barrios, Yasmin V; Redline, Susan; Drake, Christopher L; Williams, Michelle A

2016-04-15

To evaluate the construct validity and factor structure of the Spanish-language version of the Ford Insomnia Response to Stress Test questionnaire (FIRST-S) when used in early pregnancy. A cohort of 647 women was interviewed at ≤ 16 weeks of gestation to collect information regarding lifestyle, demographic, and sleep characteristics. The factorial structure of the FIRST-S was tested through exploratory and confirmatory factor analyses (EFA and CFA). Internal consistency and construct validity were also assessed by evaluating the association between the FIRST-S with symptoms of depression, anxiety, and sleep quality. Item response theory (IRT) analyses were conducted to complement classical test theory (CTT) analytic approaches. The mean score of the FIRST-S was 13.8 (range: 9-33). The results of the EFA showed that the FIRST-S contained a one-factor solution that accounted for 69.8% of the variance. The FIRST-S items showed good internal consistency (Cronbach α = 0.81). CFA results corroborated the one-factor structure found in the EFA and yielded measures indicating goodness of fit (comparative fit index of 0.902) and accuracy (root mean square error of approximation of 0.057). The FIRST-S had good construct validity as demonstrated by statistically significant associations of FIRST-S scores with sleep quality, antepartum depression and anxiety symptoms. Finally, results from IRT analyses suggested excellent item infit and outfit measures. The FIRST-S was found to have good construct validity and internal consistency for assessing vulnerability to insomnia during early pregnancy. © 2016 American Academy of Sleep Medicine.
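The abstract refers to item infit and outfit measures. The sketch below computes the two mean-square statistics for a single dichotomous Rasch item. This is a simplification, because the FIRST-S items are polytomous. Person measures, the item difficulty, and the responses are all hypothetical.

```python
import math


def rasch_prob(theta, b):
    """Dichotomous Rasch model: P(X = 1) = 1 / (1 + exp(-(theta - b)))."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def item_fit(responses, thetas, b):
    """Outfit and infit mean-squares for one dichotomous item of difficulty b.

    outfit: unweighted mean of squared standardized residuals (outlier-sensitive).
    infit:  squared residuals summed and divided by summed model variances
            (information-weighted). Both have expectation near 1.0 under the model.
    """
    z_squared = []
    resid_sq_sum = 0.0
    var_sum = 0.0
    for x, theta in zip(responses, thetas):
        p = rasch_prob(theta, b)
        var = p * (1.0 - p)          # binomial variance of the response
        z_squared.append((x - p) ** 2 / var)
        resid_sq_sum += (x - p) ** 2
        var_sum += var
    outfit = sum(z_squared) / len(z_squared)
    infit = resid_sq_sum / var_sum
    return outfit, infit


# Hypothetical: five persons answering an item of difficulty 0.0.
print(item_fit([0, 1, 0, 1, 1], [-1.5, -0.5, 0.0, 0.8, 2.0], 0.0))
```

Outfit is inflated by a few surprising responses from persons far from the item's difficulty. Infit down-weights those persons, so it mainly reflects misfit among respondents near the item's location.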
Psychometric Properties and Performance of the Patient Reported Outcomes Measurement Information System® (PROMIS®) Depression Short Forms in Ethnically Diverse Groups

PubMed Central

Teresi, Jeanne A.; Ocepek-Welikson, Katja; Kleinman, Marjorie; Ramirez, Mildred; Kim, Giyeon

2017-01-01

Short form measures from the Patient Reported Outcomes Measurement Information System® (PROMIS®) are used widely. The present study was among the first to examine differential item functioning (DIF) in the PROMIS Depression short form scales in a sample of over 5000 racially/ethnically diverse patients with cancer. DIF analyses were conducted across different racial/ethnic, educational, age, gender and language groups. Methods DIF hypotheses, generated by content experts, informed the evaluation of the DIF analyses. The graded item response theory (IRT) model was used to evaluate the five-level ordinal items. The primary tests of DIF were Wald tests; sensitivity analyses were conducted using the IRT ordinal logistic regression procedure. Magnitude was evaluated using expected item score functions, and the non-compensatory differential item functioning (NCDIF) and T1 indexes, both based on group differences in the item curves. Aggregate impact was evaluated with expected scale score (test) response functions; individual impact was assessed through examination of differences in DIF adjusted and unadjusted depression estimates. Results Many items evidenced DIF; however, only a few had slightly elevated magnitude. No items evidenced salient DIF with respect to NCDIF and the scale-level impact was minimal for all group comparisons. The following short form items might be targeted for further study because they were also hypothesized to evidence DIF. One item showed slightly higher magnitude of DIF for age: nothing to look forward to; conditional on depression, this item was more likely to be endorsed in the depressed direction by individuals in older groups as contrasted with the cohort aged 21 to 49. This item was also hypothesized to show age DIF. Only one item (failure) showed DIF of slightly higher magnitude (just above threshold) for Whites vs. Asians/Pacific Islanders in the direction of higher likelihood of endorsement for Asians/Pacific Islanders. 
This item was also hypothesized to show DIF for minority groups. The impact of DIF was negligible. Conditional on depression, the items worthless and hopeless were more likely to be endorsed in the depressed direction by respondents with less than high school education vs. those with a graduate degree; the magnitude of DIF was slightly above the T1 threshold, but not the NCDIF threshold. These items were also hypothesized to show DIF in the direction of more feelings of worthlessness in groups with lower education. While the magnitude and aggregate impact of DIF were small, individual impact was observed in a few instances. Information was relatively high, particularly from the middle to the upper (depressed) tail of the distribution. Reliability estimates were high (> 0.90) across all studied groups, regardless of estimation method. Conclusions This was the first study to evaluate measurement equivalence of the PROMIS Depression short forms across large samples of ethnically diverse groups. There were few items with DIF, and none of high magnitude, thus supporting the use of PROMIS Depression short form measures across such groups. These results could be informative for those using the short forms in minority populations or clinicians evaluating individuals with the depression short forms. PMID:28553573
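The NCDIF index mentioned above compares an item's expected score function under focal-group and reference-group parameters. The minimal sketch below assumes a graded response model with parameters already linked to a common metric, and averages the squared gap over focal-group trait values. All parameter values are hypothetical, and no significance cutoff is implied.

```python
import math


def grm_expected_score(theta, a, thresholds):
    """Expected item score (0..K-1) under the graded response model.

    Equals the sum over thresholds of the cumulative P*(X >= k).
    """
    return sum(1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds)


def ncdif(focal_thetas, ref_params, focal_params):
    """Non-compensatory DIF for one item.

    Mean, over focal-group trait values, of the squared difference between
    expected item scores computed with focal and with reference parameters.
    Both parameter sets must already be linked to a common metric.
    """
    gaps = [
        (grm_expected_score(t, *focal_params) - grm_expected_score(t, *ref_params)) ** 2
        for t in focal_thetas
    ]
    return sum(gaps) / len(gaps)


# Hypothetical five-category item whose focal-group thresholds sit 0.2 lower.
REF = (2.0, [-1.0, 0.0, 1.0, 2.0])
FOCAL = (2.0, [-1.2, -0.2, 0.8, 1.8])
thetas = [-1.0, -0.5, 0.0, 0.5, 1.0]
print(round(ncdif(thetas, REF, FOCAL), 4))
```

Because the gap is squared before averaging, differences in opposite directions at different trait levels cannot cancel. That is the "non-compensatory" property the index is named for.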
Modeling Student Test-Taking Motivation in the Context of an Adaptive Achievement Test

ERIC Educational Resources Information Center

Wise, Steven L.; Kingsbury, G. Gage

2016-01-01

This study examined the utility of response time-based analyses in understanding the behavior of unmotivated test takers. For the data from an adaptive achievement test, patterns of observed rapid-guessing behavior and item response accuracy were compared to the behavior expected under several types of models that have been proposed to represent…
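This abstract is truncated, so the models it compares cannot be recovered here. As a hedged illustration of the kind of response-time-based analysis it refers to, the sketch below flags rapid guesses using per-item time thresholds. It then summarizes a test taker's effort as the share of responses that are not rapid guesses, an index often called response-time effort in this literature. The thresholds and times are hypothetical.

```python
def flag_rapid_guesses(times, thresholds):
    """True where a response time falls below that item's rapid-guessing threshold."""
    return [t < th for t, th in zip(times, thresholds)]


def response_time_effort(times, thresholds):
    """Share of items answered with solution behavior (time at or above threshold)."""
    flags = flag_rapid_guesses(times, thresholds)
    return 1.0 - sum(flags) / len(flags)


# Hypothetical response times (seconds) and per-item thresholds for five items.
times = [2.1, 35.0, 48.2, 1.5, 27.9]
thresholds = [5.0, 5.0, 8.0, 5.0, 6.0]
print(flag_rapid_guesses(times, thresholds))
print(response_time_effort(times, thresholds))
```

Per-item thresholds are used instead of a single global cutoff because reading and solution time differ between items.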
Using root cause analysis to promote critical thinking in final year Bachelor of Midwifery students.

PubMed

Carter, Amanda G; Sidebotham, Mary; Creedy, Debra K; Fenwick, Jennifer; Gamble, Jenny

2014-06-01

Midwives require well-developed critical thinking to practice autonomously. However, multiple factors impinge on students' deep learning in the clinical context. Analysis of actual case scenarios using root cause analysis may foster students' critical thinking and application of 'best practice' principles in complex clinical situations. To examine the effectiveness of an innovative teaching strategy involving root cause analysis to develop students' perceptions of their critical thinking abilities. A descriptive, mixed-methods design was used. Final-year (third-year) undergraduate midwifery students (n=22) worked in teams to complete and present an assessment item based on root cause analysis. The cases were adapted from coroners' reports. After graduation, 17 (77%) students evaluated the course using a standard university assessment tool. In addition, 12 (54%) students provided specific feedback on the teaching strategy using a 16-item survey tool based on the domain concepts of Educational Acceptability, Educational Impact, and Preparation for Practice. Survey responses were on a 5-point Likert scale and analysed using descriptive statistics. Open-ended responses were analysed using content analysis. The majority of students perceived the course and this teaching strategy positively. The domain mean scores were high for Educational Acceptability (mean=4.3, SD=.49) and Educational Impact (mean=4.19, SD=.75) but slightly lower for Preparation for Practice (mean=3.7, SD=.77). Overall student responses to each item were positive with no item mean less than 3.42. Students found the root cause analysis challenging and time consuming but reported development of critical thinking skills about the complexity of practice, clinical governance and risk management principles. Analysing complex real life clinical cases to determine a root cause enhanced midwifery students' perceptions of their critical thinking. 
Teaching and assessment strategies to promote critical thinking need to be made explicit to students in order to foster ongoing development. © 2013.
Parietal lobe critically supports successful paired immediate and single-item delayed memory for targets.

PubMed

Krumm, Sabine; Kivisaari, Sasa L; Monsch, Andreas U; Reinhardt, Julia; Ulmer, Stephan; Stippich, Christoph; Kressig, Reto W; Taylor, Kirsten I

2017-05-01

The parietal lobe is important for successful recognition memory, but its role is not yet fully understood. We investigated the parietal lobes' contribution to immediate paired-associate memory and delayed item-recognition memory separately for hits (targets) and correct rejections (distractors). We compared the behavioral performance of 56 patients with known parietal and medial temporal lobe dysfunction (i.e. early Alzheimer's Disease) to 56 healthy control participants in an immediate paired and delayed single item object memory task. Additionally, we performed voxel-based morphometry analyses to investigate the functional-neuroanatomic relationships between performance and voxel-based estimates of atrophy in whole-brain analyses. Behaviorally, all participants performed better identifying targets than rejecting distractors. The voxel-based morphometry analyses associated atrophy in the right ventral parietal cortex with fewer correct responses to familiar items (i.e. hits) in the immediate and delayed conditions. Additionally, medial temporal lobe integrity correlated with better performance in rejecting distractors, but not in identifying targets, in the immediate paired-associate task. Our findings suggest that the parietal lobe critically supports successful immediate and delayed target recognition memory, and that the ventral aspect of the parietal cortex and the medial temporal lobe may have complementary preferences for identifying targets and rejecting distractors, respectively, during recognition memory. Copyright © 2017. Published by Elsevier Inc.
Adaptation of the Practice Environment Scale for military nurses: a psychometric analysis.

PubMed

Swiger, Pauline A; Raju, Dheeraj; Breckenridge-Sproat, Sara; Patrician, Patricia A

2017-09-01

The aim of this study was to confirm the psychometric properties of Practice Environment Scale of the Nursing Work Index in a military population. This study also demonstrates association rule analysis, a contemporary exploratory technique. One of the instruments most commonly used to evaluate the nursing practice environment is the Practice Environment Scale of the Nursing Work Index. Although the instrument has been widely used, the reliability, validity and individual item function are not commonly evaluated. Gaps exist with regard to confirmatory evaluation of the subscale factors, individual item analysis and evaluation in the outpatient setting and with non-registered nursing staff. This was a secondary data analysis of existing survey data. Multiple psychometric methods were used for this analysis using survey data collected in 2014. First, descriptive analyses were conducted, including exploration using association rules. Next, internal consistency was tested and confirmatory factor analysis was performed to test the factor structure. The specified factor structure did not hold; therefore, exploratory factor analysis was performed. Finally, item analysis was executed using item response theory. The differential item functioning technique allowed the comparison of responses by care setting and nurse type. The results of this study indicate that responses differ between groups and that several individual items could be removed without altering the psychometric properties of the instrument. The instrument functions moderately well in a military population; however, researchers may want to consider nurse type and care setting during analysis to identify any meaningful variation in responses. © 2017 John Wiley & Sons Ltd.
The construct validity of the Major Depression Inventory: A Rasch analysis of a self-rating scale in primary care.

PubMed

Nielsen, Marie Germund; Ørnbøl, Eva; Vestergaard, Mogens; Bech, Per; Christensen, Kaj Sparle

2017-06-01

We aimed to assess the measurement properties of the ten-item Major Depression Inventory when used on clinical suspicion in general practice by performing a Rasch analysis. General practitioners asked consecutive persons to respond to the web-based Major Depression Inventory on clinical suspicion of depression. We included 22 practices and 245 persons. Rasch analysis was performed using RUMM2030 software. The Rasch model fit suggests that all items contribute to a single underlying trait (defined as internal construct validity). Mokken analysis was used to test dimensionality and scalability. Our Rasch analysis showed misfit concerning the sleep and appetite items (items 9 and 10). The response categories were disordered for eight items. After modifying the original six-point to a four-point scoring system for all items, we achieved ordered response categories for all ten items. The person separation reliability was acceptable (0.82) for the initial model. Dimensionality testing did not support combining the ten items to create a total score. The scale appeared to be well targeted to this clinical sample. No significant differential item functioning was observed for gender, age, work status and education. The Rasch and Mokken analyses revealed two dimensions, but the Major Depression Inventory showed fit to one scale if items 9 and 10 were excluded. Our study indicated scalability problems in the current version of the Major Depression Inventory. The conducted analysis revealed better statistical fit when items 9 and 10 were excluded. Copyright © 2017 Elsevier Inc. All rights reserved.
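Disordered response categories, as reported here for eight items, are typically diagnosed from the threshold parameters of a polytomous Rasch model. The sketch below uses the Rasch partial credit model to compute category probabilities and checks whether the step thresholds are ordered. The threshold values are hypothetical, and RUMM2030's estimation routine is not reproduced.

```python
import math


def pcm_category_probs(theta, deltas):
    """Rasch partial credit model probabilities for categories 0..len(deltas).

    The unnormalized log-probability of category k is sum_{j<=k} (theta - delta_j);
    category 0 has log-numerator 0.
    """
    logits = [0.0]
    for d in deltas:
        logits.append(logits[-1] + (theta - d))
    top = max(logits)  # subtract the max before exponentiating for stability
    weights = [math.exp(v - top) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


def thresholds_ordered(deltas):
    """True if the step thresholds strictly increase.

    With disordered thresholds, at least one middle category is never the
    single most probable response at any trait level.
    """
    return all(lo < hi for lo, hi in zip(deltas, deltas[1:]))


ordered = [-1.5, -0.5, 0.5, 1.5]
disordered = [-0.5, 1.0, 0.2, 1.5]
print(thresholds_ordered(ordered), thresholds_ordered(disordered))
```

When a category is never modal, respondents are not using it as the scale design intends. That is the usual motivation for collapsing it with a neighbour, as done here when moving from six to four scoring points.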

A new Integrated Negative Symptom structure of the Positive and Negative Syndrome Scale (PANSS) in schizophrenia using item response analysis.

PubMed

Khan, Anzalee; Lindenmayer, Jean-Pierre; Opler, Mark; Yavorsky, Christian; Rothman, Brian; Lucic, Luka

2013-10-01

Debate persists with regard to how best to categorize the syndromal dimension of negative symptoms in schizophrenia. The aim was, first, to review published Principal Components Analyses (PCA) of the PANSS and extract the items most frequently included in the negative domain, and second, to examine the quality of items using Item Response Theory (IRT) to select items that best represent a measurable dimension (or dimensions) of negative symptoms. First, 22 factor analyses and PCAs met inclusion criteria and were included. Second, using a large dataset (n=7187) of participants in clinical trials with chronic schizophrenia, we extracted items loading in one or more PCAs. Third, items not loading with a value of ≥ 0.5, or loading on more than one component with values of ≥ 0.5, were discarded. Fourth, the resulting items were entered into a non-parametric IRT analysis and retained based on Option Characteristic Curves (OCCs) and Item Characteristic Curves (ICCs). Fifteen items loaded on a negative domain in at least one study, with Emotional Withdrawal loading in all studies. Non-parametric IRT retained nine items as an Integrated Negative Factor: Emotional Withdrawal, Blunted Affect, Passive/Apathetic Social Withdrawal, Poor Rapport, Lack of Spontaneity/Conversation Flow, Active Social Avoidance, Disturbance of Volition, Stereotyped Thinking and Difficulty in Abstract Thinking. This is the first study to use a psychometric IRT process to arrive at a set of negative symptom items. Future steps will include further examination of these nine items in terms of their stability, sensitivity to change, and correlations with functional and cognitive outcomes. © 2013 Elsevier B.V. All rights reserved.
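The third screening step above is a simple rule: keep an item only if exactly one loading reaches 0.5. The sketch below applies that rule to a hypothetical loadings table. Item names and loadings are invented, and loadings are compared as signed values, following the rule as worded.

```python
def retain_items(loadings, cutoff=0.5):
    """Screening rule as worded in the abstract, applied to a loadings table.

    loadings: {item: [loading on component 1, component 2, ...]}.
    An item is kept only if exactly one loading reaches the cutoff; items with
    no loading >= cutoff, or with >= cutoff on several components, are dropped.
    """
    kept = []
    for item, row in loadings.items():
        salient = sum(1 for value in row if value >= cutoff)
        if salient == 1:
            kept.append(item)
    return kept


# Hypothetical loadings on two components.
table = {
    "item_a": [0.72, 0.10],   # kept: one salient loading
    "item_b": [0.55, 0.52],   # dropped: cross-loads on both components
    "item_c": [0.20, 0.31],   # dropped: no salient loading
}
print(retain_items(table))
```

Counting salient loadings per item handles both exclusion conditions (none, or more than one) with a single test.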
Measurement Equivalence of the K6 Scale: The Effects of Race/Ethnicity and Language

PubMed Central

Kim, Giyeon; DeCoster, Jamie; Bryant, Ami N.; Ford, Katy L.

2017-01-01

This study examined the measurement equivalence of the K6 across diverse racial/ethnic and linguistic groups in the U.S. Differential item functioning analyses using item response theory were conducted among 44,846 U.S. adults drawn from the California Health Interview Survey. Results show that four items (“nervous,” “restless,” “depressed,” and “everything an effort”) varied significantly across races/ethnicities and four items (“nervous,” “hopeless,” “restless,” and “depressed”) varied significantly across languages. In additional effect size analyses designed to separate effects of race/ethnicity from language, the structure of the White English group was substantially different from both the Hispanic/Latino English group and Hispanic/Latino Spanish group, whereas the Hispanic/Latino Spanish group was not different from the Hispanic/Latino English group. The findings suggest that there was evident measurement nonequivalence in the K6 among racially/ethnically and linguistically diverse adults and that the observed nonequivalence in the K6 appears to be driven by language rather than race/ethnicity. PMID:26282779
Should the SCOPA-COG be modified? A Rasch analysis perspective.

PubMed

Forjaz, M J; Frades-Payo, B; Rodriguez-Blazquez, C; Ayala, A; Martinez-Martin, P

2010-02-01

The SCales for Outcomes in PArkinson's disease-Cognition (SCOPA-COG) is a specific measure of cognitive function for Parkinson's disease (PD) patients. Previous studies, under the frame of classical test theory, indicate satisfactory psychometric properties. The Rasch model, an item response theory approach, provides new information about the scale and yields linear measures. This study aims to analyse the SCOPA-COG according to the Rasch model and, on the basis of the results, to suggest modifications to the SCOPA-COG. Fit to the Rasch model was analysed using a sample of 384 PD patients. A good fit was obtained after rescoring for disordered thresholds. The person separation index, a reliability measure, was 0.83. Differential item functioning was observed by age for three items and by gender for one item. The SCOPA-COG is a unidimensional measure of global cognitive function in PD patients, with good scale targeting and no empirical evidence for use of the subscale scores. Its adequate reliability and internal construct validity were supported. The SCOPA-COG, with the proposed scoring scheme, generates true linear interval scores.
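The person separation index (0.83 here) is a Rasch reliability coefficient. One common form expresses it as the share of observed variance in person estimates that is not attributable to measurement error. The sketch below uses that form with population variances, which is an assumption because software packages may differ in detail. The person estimates are hypothetical.

```python
from statistics import fmean, pvariance


def person_separation_index(thetas, standard_errors):
    """Person separation index for Rasch person estimates.

    (observed variance of estimates - mean error variance) / observed variance,
    i.e. the share of observed person variance not attributable to measurement error.
    """
    observed_var = pvariance(thetas)
    error_var = fmean(se ** 2 for se in standard_errors)
    return (observed_var - error_var) / observed_var


# Hypothetical person estimates (logits) with their standard errors.
thetas = [-2.0, -1.0, 0.0, 1.0, 2.0]
ses = [0.5] * 5
print(person_separation_index(thetas, ses))
```

Read like Cronbach's alpha, a value of 0.83 means most of the spread in person estimates reflects real differences in the trait rather than error.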
Do animals and furniture items elicit different brain responses in human infants?

PubMed

Jeschonek, Susanna; Marinovic, Vesna; Hoehl, Stefanie; Elsner, Birgit; Pauen, Sabina

2010-11-01

One of the earliest categorical distinctions to be made by preverbal infants is the animate-inanimate distinction. To explore the neural basis for this distinction in 7-8-month-olds, an equal number of animal and furniture pictures was presented in an ERP paradigm. A total of 118 pictures, all looking different from each other, were presented in a semi-randomized order for 1000 ms each. Infants' brain responses to exemplars from both categories differed systematically in the negative central component (Nc: 400-600 ms) at anterior channels. More specifically, the Nc was enhanced for animals in one subgroup of infants and for furniture items in another subgroup. Exploratory analyses related to categorical priming further revealed category-specific differences in brain responses in the late time window (650-1550 ms) at right frontal channels: unprimed stimuli (preceded by a different-category item) elicited a more positive response than primed stimuli (preceded by a same-category item). In sum, these findings suggest that the infant's brain discriminates exemplars from both global domains. Given the design of our task, we conclude that processes of category identification are more likely to account for our findings than processes of on-line category formation during the experimental session. Copyright © 2009 Elsevier B.V. All rights reserved.
Exploratory Factor Analyses of the CAHPS® Hospital Pilot Survey Responses across and within Medical, Surgical, and Obstetric Services

PubMed Central

O'Malley, A James; Zaslavsky, Alan M; Hays, Ron D; Hepner, Kimberly A; Keller, San; Cleary, Paul D

2005-01-01

Objectives: To estimate the associations among hospital-level scores from the Consumer Assessments of Healthcare Providers and Systems (CAHPS®) Hospital pilot survey within and across different services (surgery, obstetrics, medical), and to evaluate differences between hospital- and patient-level analyses. Data Source: CAHPS Hospital pilot survey data provided by the Centers for Medicare and Medicaid Services. Study Design: Responses to 33 questionnaire items were analyzed using patient- and hospital-level exploratory factor analytic (EFA) methods to identify both patient-level and hospital-level composite structures for the CAHPS Hospital survey. The hospital-level EFA was corrected for patient-level sampling variability using a hierarchical model. We compared the results of these analyses with each other and with separate EFAs conducted at the service level. To quantify the similarity of assessments across services, we compared correlations of different composites within the same service with correlations of the same composite across different services. Data Collection: Cross-sectional data were collected during the summer of 2003 via mail and telephone from 19,720 patients discharged from November 2002 through January 2003 from 132 hospitals in three states. Principal Findings: Six factors provided the best description of inter-item covariation at the patient level. Analyses that assessed variability across both services and hospitals suggested that three dimensions provide a parsimonious summary of inter-item covariation at the hospital level. Hospital-level factor structures also differed across services; as much variation in quality reports was explained by service as by composite. Conclusions: Variability of CAHPS scores across hospitals can be reported parsimoniously using a limited number of composites. There is at least as much distinct information in composite scores from different services as in different composite scores within each service. 
Because items cluster slightly differently in the different services, service-specific composites may be more informative when comparing patients in a given service across hospitals. When studying individual-level variability, a more differentiated structure is probably more appropriate. PMID:16316439
Older adults' drug benefit beliefs: construct definition and measure development.

PubMed

Cline, Richard R; Gupta, Kiran; Singh, Reshmi L

2008-03-01

The Medicare Prescription Drug, Improvement and Modernization Act of 2003 provides coverage of outpatient prescription drugs for Medicare beneficiaries. Although much has been learned since the program's implementation, a context within which this information can be understood is lacking. The purpose of this study was to develop a reliable and valid multi-item instrument measuring beliefs about Medicare prescription drug benefits. Survey items were generated using focus group transcripts, other surveys on the Medicare Part "D" program, and past studies of choice and satisfaction in drug insurance programs. Using data from the survey pilot test, item and reliability analyses were used to reduce and refine an initial pool of items. Data then were collected from a cross-sectional, mail survey of older adults living in Minnesota. Data were analyzed using exploratory factor analysis. Summated rating scales then were constructed and assessed further using reliability analyses. Construct validity of summated scales was examined by comparing scale scores across response categories of survey items that collected information on general political attitudes, perceptions of the Medicare Part "D" program, health status, and health care utilization and demographics. The adjusted response rate for the main survey was 55.98% (744/1329). Iterative factor analysis produced 2 interpretable scales. The first, termed "access/equity" (13 items, Cronbach's alpha=0.89) measures beliefs that a Medicare drug benefit should both provide affordable prescription drugs for beneficiaries and do this in a manner that is equitable for all participants. The second, termed "comprehensibility" (6 items, Cronbach's alpha=0.80) assesses beliefs that regulations governing a Medicare drug benefit should be easily understood. Discriminant validity tests suggest that these measures behave in a manner consistent with related research in these areas. 
Measures of 2 facets of older adults' drug benefit beliefs were developed using a multiple step procedure. Future research could focus on developing a better understanding of other facets of these beliefs and sound methods of measurement.
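The reliability figures above are Cronbach's alpha values. As a reminder of what that statistic computes, here is a minimal Python sketch of the standard formula, alpha = k/(k - 1) * (1 - sum of item variances / variance of total scores); the function name and toy data are illustrative only and are not the study's data.

```python
from statistics import variance

def cronbach_alpha(item_scores):
    """Cronbach's alpha for a summated rating scale.

    item_scores: one list per item, each holding the respondents' scores
    in the same respondent order.
    """
    k = len(item_scores)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Summated (total) score for each respondent.
    totals = [sum(resp) for resp in zip(*item_scores)]
    item_var_sum = sum(variance(item) for item in item_scores)
    return k / (k - 1) * (1 - item_var_sum / variance(totals))

if __name__ == "__main__":
    # Toy example: three items answered by five respondents.
    items = [[4, 5, 3, 2, 4], [4, 4, 3, 1, 5], [5, 5, 2, 2, 4]]
    print(round(cronbach_alpha(items), 3))
```

Sample variances are used throughout; because the same estimator appears in numerator and denominator, using population variances instead gives the same alpha.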
Validation of the brief version of the Recovery Self-Assessment (RSA-B) using Rasch measurement theory.

PubMed

Barbic, Skye P; Kidd, Sean A; Davidson, Larry; McKenzie, Kwame; O'Connell, Maria J

2015-12-01

In psychiatry, the recovery paradigm is increasingly identified as the overarching framework for service provision. Currently, the Recovery Self-Assessment (RSA), a 36-item rating scale, is commonly used to assess the uptake of a recovery orientation in clinical services. However, the consumer version of the RSA has been found challenging to complete because of its length and the reading level required. In response to this feedback, a brief 12-item version of the RSA (RSA-B) was developed. This article describes the development of the modified instrument and the application of traditional psychometric analysis and Rasch Measurement Theory to test the psychometric properties of the RSA-B. Data from a multisite study of adults with serious mental illnesses (n = 1256) who were followed by assertive community treatment teams were examined for reliability, clinical meaning, targeting, response categories, model fit, dependency, and raw interval-level measurement. Analyses were performed using the Rasch Unidimensional Measurement Model (RUMM 2030). Adequate fit to the Rasch model was observed (χ2 = 112.46, df = 90, p = .06) and internal consistency was good (r = .86). However, Rasch analysis revealed limitations of the 12-item version, with items covering only 39% of the targeted theoretical continuum, 2 misfitting items, and strong evidence that the 5 response option categories were not working as intended. This study revealed areas for improvement in the 12-item RSA-B. A revisit of the conceptual model and the original 36-item rating scale is encouraged, in order to select items that will help practitioners and researchers measure the full range of recovery orientation. (c) 2015 APA, all rights reserved.
Correspondence of verbal descriptor and numeric rating scales for pain intensity: an item response theory calibration.

PubMed

Edelen, Maria Orlando; Saliba, Debra

2010-07-01

Assessing pain intensity in older adults is critical and challenging. There is debate about the most effective way to ask older adults to describe their pain severity, and clinicians vary in their preferred approaches, making comparison of pain intensity scores across settings difficult. A total of 3,676 residents from 71 community nursing homes across eight states were asked about pain presence. The 1,960 residents who reported pain within the past 5 days (53% of total; 70% female; age: M = 77.9, SD = 12.4) were included in the analyses. Those who reported pain were also asked to rate their pain intensity using a verbal descriptor scale (VDS; mild, moderate, severe, and very severe/horrible), a numeric rating scale (NRS; 0 = no pain to 10 = worst pain imaginable), or both. We used item response theory (IRT) methods to identify the correspondence between the VDS and NRS response options by estimating item parameters for these and five additional pain items. The sample reported moderate amounts of pain on average. Examination of the IRT location parameters for the pain intensity items indicated the following approximate correspondence: VDS mild ≈ NRS 1-4, VDS moderate ≈ NRS 5-7, VDS severe ≈ NRS 8-9, and VDS very severe/horrible ≈ NRS 10. This IRT calibration provides a crosswalk between the two response scales, so that either can be used in practice depending on the preference of the clinician and respondent.
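A crosswalk of this kind is essentially a lookup table. The Python sketch below encodes only the approximate bands reported in the abstract; the names are hypothetical, and NRS 0 ("no pain") is deliberately left unmapped because the reported correspondence starts at NRS 1.

```python
# Approximate VDS <-> NRS bands as reported in the abstract (IRT location parameters).
VDS_TO_NRS = {
    "mild": range(1, 5),                   # NRS 1-4
    "moderate": range(5, 8),               # NRS 5-7
    "severe": range(8, 10),                # NRS 8-9
    "very severe/horrible": range(10, 11), # NRS 10
}

def nrs_to_vds(nrs):
    """Return the approximate VDS label for an NRS rating, or None if unmapped (e.g. 0)."""
    for label, band in VDS_TO_NRS.items():
        if nrs in band:
            return label
    return None

if __name__ == "__main__":
    for score in (0, 3, 6, 9, 10):
        print(score, nrs_to_vds(score))
```

The bands are approximate by construction, so a mapped label should be read as "roughly equivalent", not as an exact score conversion.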
Psychometric Evaluation of the Hypogonadism Impact of Symptoms Questionnaire Short Form (HIS-Q-SF).

PubMed

Gelhorn, Heather L; Roberts, Laurie J; Khandelwal, Nikhil; Revicki, Dennis A; DeRogatis, Leonard R; Dobs, Adrian; Hepp, Zsolt; Miller, Michael G

2017-08-01

The Hypogonadism Impact of Symptoms Questionnaire Short Form (HIS-Q-SF) is a patient-reported outcome measure designed to evaluate the symptoms of hypogonadism. The HIS-Q-SF is an abbreviated version comprising 17 items from the original 28-item HIS-Q. The aims were to conduct item analyses and item reduction, evaluate the psychometric properties of the HIS-Q-SF, and provide guidance on score interpretation. A 12-week observational longitudinal study of hypogonadal men was conducted as part of the original HIS-Q psychometric evaluation. Participants completed the original HIS-Q every 2 weeks. Blood samples were collected to evaluate testosterone levels. Participants completed the Aging Male's Symptoms Scale, the International Index of Erectile Function, the Short Form-12, and the PROMIS Sexual Activity, Satisfaction with Sex Life, Sleep Disturbance, and Applied Cognition Scales (at baseline and weeks 6 and 12). Clinicians completed the Clinical Global Impression of Severity and Change scales and a clinical form. Item performance was evaluated using descriptive statistics and Rasch analyses. Reliability (internal consistency and test-retest), validity (concurrent and known-groups), and responsiveness were assessed. One hundred seventy-seven men participated (mean age = 54.1 years, range = 23-83). Similar to the full HIS-Q, the final abbreviated HIS-Q-SF instrument includes five domains (sexual, energy, sleep, cognition, and mood) with two sexual subdomains (libido and sexual function). For key domains, test-retest reliability was very good, and construct validity was good for all domains. Known-groups validity was demonstrated for all domain scores, subdomain scores, and the total score based on the Clinical Global Impression-Severity. All domains and subdomains were responsive to change based on patient-rated anchor questions. The HIS-Q-SF could be a useful tool in clinical practice, epidemiologic studies, and other academic research settings. 
Careful consideration was given to the selection of the final HIS-Q-SF items based on quantitative data and clinical expert feedback. Overall, the reduced set of items demonstrated strong psychometric properties. Testosterone levels for the participating men were not as low as anticipated, which could have limited the ability to examine the relations between the HIS-Q-SF and testosterone levels. Further, the analyses used data collected through administration of the full HIS-Q, and future studies should administer the standalone HIS-Q-SF to replicate the psychometric analyses reported in the present study. Similar to the original HIS-Q, the HIS-Q-SF has evidence supporting reliability, validity, and responsiveness. The short form includes a smaller set of items that might be more suitable for use in clinical practice or academic research settings. Gelhorn HL, Roberts LJ, Khandelwal N, et al. Psychometric Evaluation of the Hypogonadism Impact of Symptoms Questionnaire Short Form (HIS-Q-SF). J Sex Med 2017;14:1046-1058. Copyright © 2017 International Society for Sexual Medicine. Published by Elsevier Inc. All rights reserved.
Maximum Marginal Likelihood Estimation of a Monotonic Polynomial Generalized Partial Credit Model with Applications to Multiple Group Analysis.

PubMed

Falk, Carl F; Cai, Li

2016-06-01

We present a semi-parametric approach to estimating item response functions (IRFs) that is useful when the true IRF does not strictly follow commonly used functions. Our approach replaces the linear predictor of the generalized partial credit model with a monotonic polynomial. The model includes the regular generalized partial credit model as the lowest-order polynomial. Our approach extends Liang's (A semi-parametric approach to estimate IRFs, Unpublished doctoral dissertation, 2007) method for dichotomous item responses to the case of polytomous data. Furthermore, item parameter estimation is implemented with maximum marginal likelihood using the Bock-Aitkin EM algorithm, thereby facilitating multiple-group analyses useful in operational settings. Our approach is demonstrated on both educational and psychological data. We present simulation results comparing our approach with more standard IRF estimation approaches and with other non-parametric and semi-parametric alternatives.
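The core idea, swapping the GPCM's linear predictor for a monotonic polynomial, can be sketched in a few lines of Python with NumPy. This is only an illustration under stated assumptions: the GPCM is written in slope-intercept form, P(X = k | θ) ∝ exp(k·m(θ) + c_k) with c_0 = 0, and monotonicity is enforced with a simplified, hypothetical parameterisation in which m′(θ) = exp(ω)·∏((θ − α_i)² + exp(τ_i)) > 0. This is not claimed to be Falk and Cai's exact parameterisation, and no estimation (EM or otherwise) is shown.

```python
import numpy as np
from numpy.polynomial import polynomial as P

def monotonic_poly_coefs(omega, alphas=(), taus=()):
    """Coefficients (lowest order first) of an increasing polynomial m with m(0) = 0.

    Assumed parameterisation: m'(theta) = exp(omega) * prod_i ((theta - alpha_i)^2 + exp(tau_i)),
    which is strictly positive. With no (alpha, tau) pairs, m(theta) = exp(omega) * theta,
    i.e. the ordinary GPCM slope a = exp(omega).
    """
    deriv = np.array([np.exp(omega)])
    for alpha, tau in zip(alphas, taus):
        # (theta - alpha)^2 + exp(tau) = (alpha^2 + exp(tau)) - 2*alpha*theta + theta^2
        factor = np.array([alpha**2 + np.exp(tau), -2.0 * alpha, 1.0])
        deriv = P.polymul(deriv, factor)
    return P.polyint(deriv)  # integration constant 0, so m(0) = 0

def gpcm_probs(theta, intercepts, m_coefs):
    """Category probabilities P(X = k | theta), k = 0..K, with a*theta replaced by m(theta)."""
    m = P.polyval(theta, m_coefs)
    c = np.concatenate(([0.0], np.asarray(intercepts, dtype=float)))  # c_0 = 0
    k = np.arange(len(c))
    z = k * m + c
    z = z - z.max()  # numerically stable softmax
    w = np.exp(z)
    return w / w.sum()

if __name__ == "__main__":
    linear = monotonic_poly_coefs(np.log(1.5))                  # ordinary GPCM, a = 1.5
    cubic = monotonic_poly_coefs(0.0, alphas=[0.5], taus=[-1.0])  # order-3 monotonic polynomial
    print(gpcm_probs(0.7, [0.3, -0.5], linear))
    print(gpcm_probs(0.7, [0.3, -0.5], cubic))
```

Because each added factor is a positive quadratic, every extra (α, τ) pair raises the polynomial order by two while keeping the item response function monotone in θ.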
The value of item response theory in clinical assessment: a review.

PubMed

Thomas, Michael L

2011-09-01

Item response theory (IRT) and related latent variable models represent modern psychometric theory, the successor to classical test theory in psychological assessment. Although IRT has become prevalent in the measurement of ability and achievement, its contributions to clinical domains have been less extensive. Applications of IRT to clinical assessment are reviewed to appraise its current and potential value. Benefits of IRT include comprehensive analyses and reduction of measurement error, creation of computer adaptive tests, meaningful scaling of latent variables, objective calibration and equating, evaluation of test and item bias, greater accuracy in the assessment of change due to therapeutic intervention, and evaluation of model and person fit. The theory may soon reinvent the manner in which tests are selected, developed, and scored. Although challenges remain to the widespread implementation of IRT, its application to clinical assessment holds great promise. Recommendations for research, test development, and clinical practice are provided.
Calibrating well-being, quality of life and common mental disorder items: psychometric epidemiology in public mental health research.

PubMed

Böhnke, Jan R; Croudace, Tim J

2016-08-01

The assessment of 'general health and well-being' in public mental health research stimulates debates around the relative merits of questionnaire instruments and their items. Little evidence regarding the alignment or differential advantages of instruments or items has appeared to date. We conducted a population-based psychometric study of items employed in public mental health narratives. Multidimensional item response theory was applied to General Health Questionnaire (GHQ-12), Warwick-Edinburgh Mental Well-being Scale (WEMWBS) and EQ-5D items (Health Survey for England, 2010-2012; n = 19 290). A bifactor model provided the best account of the data and showed that the GHQ-12 and WEMWBS items assess mainly the same construct. Only one item of the EQ-5D (anxiety/depression) showed relevant overlap with this dimension. Findings were corroborated by comparisons with alternative models and cross-validation analyses. The consequences of this lack of differentiation (GHQ-12 v. WEMWBS) for mental health and well-being narratives deserve discussion to enrich debates on priorities in public mental health and its assessment. © The Royal College of Psychiatrists 2015.
World Health Assembly agendas and trends of international health issues for the last 43 years: analysis of World Health Assembly agendas between 1970 and 2012.

PubMed

Kitamura, Tomomi; Obara, Hiromi; Takashima, Yoshihiro; Takahashi, Kenzo; Inaoka, Kimiko; Nagai, Mari; Endo, Hiroyoshi; Jimba, Masamine; Sugiura, Yasuo

2013-05-01

To analyse the trends and characteristics of international health issues through agenda items of the World Health Assembly (WHA) from 1970 to 2012. Agendas in Committees A/B of the WHA were classified as Administrative or as Technical and Health Matters. Agenda items classified as Health Matters were sorted into the five categories defined by the WHO reform at the 65th WHA. The agenda items in each category and sub-category were counted. There were 1647 agenda items, including 423 Health Matters items, which were sorted into five categories: communicable diseases (107, 25.3%), health systems (81, 19.1%), noncommunicable diseases (59, 13.9%), preparedness, surveillance and response (58, 13.7%), and health through the life course (36, 8.5%). Among the sub-categories, HIV/AIDS, noncommunicable diseases in general, health for all, the millennium development goals, influenza, and the international health regulations were discussed frequently and appeared to be associated with public health milestones, whereas maternal and child health was discussed only three times. The number of agenda items differed for each Director-General's term of office. The WHA agendas cover a variety of items but do not always reflect international health issues in terms of disease burden. The Member States of WHO should take responsibility for proposing more balanced agenda items. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.
Evaluating Instrument Quality in Science Education: Rasch-based analyses of a Nature of Science test

NASA Astrophysics Data System (ADS)

Neumann, Irene; Neumann, Knut; Nehm, Ross

2011-07-01

Given the central importance of the Nature of Science (NOS) and Scientific Inquiry (SI) in national and international science standards and science learning, empirical support for the theoretical delineation of these constructs is of considerable significance. Furthermore, tests of the effects of varying magnitudes of NOS knowledge on domain-specific science understanding and belief require the application of instruments validated in accordance with AERA, APA, and NCME assessment standards. Our study explores three interrelated aspects of a recently developed NOS instrument: (1) validity and reliability; (2) instrument dimensionality; and (3) item scales, properties, and qualities within the context of Classical Test Theory and Item Response Theory (Rasch modeling). A construct analysis revealed that the instrument did not match published operationalizations of NOS concepts. Rasch analysis of the original instrument, as well as of a reduced item set, indicated that a two-dimensional Rasch model fit significantly better than a one-dimensional model in both cases. Thus, our study revealed that NOS and SI are supported as two separate dimensions, corroborating theoretical distinctions in the literature. Item quality analyses were used to identify items with unacceptable fit values. A Wright Map revealed that few items sufficiently distinguished high performers in the sample and that an excessive number of items were located at the low end of the performance scale. Overall, our study outlines an approach for how Rasch modeling may be used to evaluate and improve Likert-type instruments in science education.
A Mixed Effects Randomized Item Response Model

ERIC Educational Resources Information Center

Fox, J.-P.; Wyrick, Cheryl

2008-01-01

The randomized response technique ensures that individual item responses, denoted as true item responses, are randomized before observing them and so-called randomized item responses are observed. A relationship is specified between randomized item response data and true item response data. True item response data are modeled with a (non)linear…
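The relationship between randomized and true item responses can be illustrated with a minimal Python sketch. The abstract is truncated and does not name the randomizing device or the response model, so two assumptions are labelled here: a forced-response design (answer truthfully with probability p_truth, otherwise give a forced "yes" with probability p_forced_yes), and a two-parameter logistic (2PL) model for the true response, chosen purely for illustration. All names and parameter values are hypothetical.

```python
import math

def irt_2pl(theta, a, b):
    """Assumed 2PL probability of a true 'yes' (illustrative choice, not from the abstract)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def randomized_yes_prob(p_true, p_truth=0.8, p_forced_yes=0.5):
    """Observed 'yes' probability under an assumed forced-response design."""
    return p_truth * p_true + (1.0 - p_truth) * p_forced_yes

def true_yes_prob(p_observed, p_truth=0.8, p_forced_yes=0.5):
    """Invert the design to recover the true-response probability."""
    return (p_observed - (1.0 - p_truth) * p_forced_yes) / p_truth

if __name__ == "__main__":
    p_true = irt_2pl(theta=0.4, a=1.2, b=0.0)
    p_obs = randomized_yes_prob(p_true)
    print(p_true, p_obs, true_yes_prob(p_obs))
```

Because the design probabilities are known, the observed-response probability is a known linear function of the true-response probability, which is what lets a model for true responses be fitted to randomized data.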
Computer-adaptive test to measure community reintegration of Veterans.

PubMed

Resnik, Linda; Tian, Feng; Ni, Pengsheng; Jette, Alan

2012-01-01

The Community Reintegration of Injured Service Members (CRIS) measure consists of three scales measuring extent of, perceived limitations in, and satisfaction with community reintegration. Length of the CRIS may be a barrier to its widespread use. Using item response theory (IRT) and computer-adaptive test (CAT) methodologies, this study developed and evaluated a briefer community reintegration measure called the CRIS-CAT. Large item banks for each CRIS scale were constructed. A convenience sample of 517 Veterans responded to all items. Exploratory and confirmatory factor analyses (CFAs) were used to identify the dimensionality within each domain, and IRT methods were used to calibrate items. Accuracy and precision of CATs of different lengths were compared with the full-item bank, and data were examined for differential item functioning (DIF). CFAs supported unidimensionality of scales. Acceptable item fit statistics were found for final models. Accuracy of 10-, 15-, 20-, and variable-item CATs for all three scales was 0.88 or above. CAT precision increased with number of items administered and decreased at the upper ranges of each scale. Three items exhibited moderate DIF by sex. The CRIS-CAT demonstrated promising measurement properties and is recommended for use in community reintegration assessment.
The Multidimensional Assessment of Interoceptive Awareness (MAIA)

PubMed Central

Mehling, Wolf E.; Price, Cynthia; Daubenmier, Jennifer J.; Acree, Mike; Bartmess, Elizabeth; Stewart, Anita

2012-01-01

This paper describes the development of a multidimensional self-report measure of interoceptive body awareness. The systematic mixed-methods process involved reviewing the current literature, specifying a multidimensional conceptual framework, evaluating prior instruments, developing items, and analyzing focus group responses to scale items by instructors and patients of body awareness-enhancing therapies. Following refinement by cognitive testing, items were field-tested in students and instructors of mind-body approaches. Final item selection was achieved by submitting the field test data to an iterative process using multiple validation methods, including exploratory cluster and confirmatory factor analyses, comparison between known groups, and correlations with established measures of related constructs. The resulting 32-item multidimensional instrument assesses eight concepts. The psychometric properties of these final scales suggest that the Multidimensional Assessment of Interoceptive Awareness (MAIA) may serve as a starting point for research and further collaborative refinement. PMID:23133619
Rasch validation of the Arabic version of the lower extremity functional scale.

PubMed

Alnahdi, Ali H

2018-02-01

The purpose of this study was to examine the internal construct validity of the Arabic version of the Lower Extremity Functional Scale (20-item Arabic LEFS) using Rasch analysis. Patients (n = 170) with lower extremity musculoskeletal dysfunction were recruited. Rasch analysis of the 20-item Arabic LEFS was performed. Once the initial Rasch analysis indicated that the 20-item Arabic LEFS did not fit the Rasch model, follow-up analyses were conducted to improve the fit of the scale to the Rasch measurement model. These modifications included removing misfitting individuals, changing the item scoring structure, removing misfitting items, and addressing bias caused by response dependency between items and by differential item functioning (DIF). Initial analysis indicated deviation of the 20-item Arabic LEFS from the Rasch model. Disordered thresholds in eight items and response dependency between six items were detected, and the scale as a whole did not meet the requirement of unidimensionality. Refinements led to a 15-item Arabic LEFS that demonstrated excellent internal consistency (person separation index [PSI] = 0.92) and satisfied all the requirements of the Rasch model. Rasch analysis did not support the 20-item Arabic LEFS as a unidimensional measure of lower extremity function. The refined 15-item Arabic LEFS met all the requirements of the Rasch model and hence is a valid objective measure of lower extremity function. The Rasch-validated 15-item Arabic LEFS needs to be further tested in an independent sample to confirm its fit to the Rasch measurement model. Implications for Rehabilitation: The validity of the 20-item Arabic Lower Extremity Functional Scale to measure lower extremity function is not supported. The 15-item Arabic version of the LEFS is a valid measure of lower extremity function and can be used to quantify lower extremity function in patients with lower extremity musculoskeletal disorders.
The Quality of Working Life Questionnaire for Cancer Survivors (QWLQ-CS): a Pre-test Study.

PubMed

de Jong, Merel; Tamminga, Sietske J; de Boer, Angela G E M; Frings-Dresen, Monique H W

2016-06-02

Returning to and continuing work is important to many cancer survivors, but also represents a challenge. We know little about subjective work outcomes and how cancer survivors perceive their return to work. Therefore, we developed the Quality of Working Life Questionnaire for Cancer Survivors (QWLQ-CS). Our aim was to pre-test the items of the initial QWLQ-CS for acceptability and comprehensiveness. In addition, item retention was performed by pre-assessing the relevance scores and response distributions of the items in the QWLQ-CS. Semi-structured interviews were conducted after cancer survivors who had returned to work filled in the 102 items of the QWLQ-CS. To improve acceptability and comprehensiveness, the semi-structured interview asked about items that were annoying, difficult, confusing, twofold or redundant. If cancer survivors had difficulty explaining their opinion or emotion about an item, the interviewer used a verbal probing technique to investigate the cancer survivor's underlying thoughts. The cancer survivors' comments on the items were analysed, and the items were revised accordingly. Decisions on item retention regarding the relevance of items and the response distributions were made by means of pre-set decision rules. The 19 cancer survivors (53 % male) had a mean age of 51 ± 11 years. They were diagnosed between 2009 and 2013 with lymphoma, leukaemia, prostate cancer, breast cancer, or colon cancer. Acceptability of the QWLQ-CS was good (none of the items were considered annoying), but 73 items were considered difficult, confusing, twofold or redundant. To improve acceptability, for instance, the authors replaced the word 'disease' with 'health situation' in several items. Comprehensiveness was improved by rephrasing and adjusting items, for example by adding clarifying words such as 'in the work situation'. 
The pre-assessment of the relevance scores resulted in a sufficient number of cancer survivors indicating the items as relevant to their quality of working life, and no evident indication for uneven response distributions. Therefore, all items were retained. The 104 items of the preliminary QWLQ-CS were found relevant, acceptable and comprehensible by cancer survivors who have returned to work. The QWLQ-CS is now suitable for larger sample sizes of cancer survivors, which is necessary to test the psychometric properties of this questionnaire.
Evaluation of the Irritable Bowel Syndrome Quality of Life (IBS-QOL) questionnaire in diarrheal-predominant irritable bowel syndrome patients

PubMed Central

2013-01-01

Background Diarrhea-predominant irritable bowel syndrome (IBS-d) significantly diminishes the health-related quality of life (HRQOL) of patients. Psychological and social impacts are common with many IBS-d patients reporting comorbid depression, anxiety, decreased intimacy, and lost working days. The Irritable Bowel Syndrome Quality of Life (IBS-QOL) questionnaire is a 34-item instrument developed and validated for measurement of HRQOL in non-subtyped IBS patients. The current paper assesses this previously-validated instrument employing data collected from 754 patients who participated in a randomized clinical trial of a novel treatment, eluxadoline, for IBS-d. Methods Psychometric methods common to HRQOL research were employed to evaluate the IBS-QOL. Many of the historical analyses of the IBS-QOL validations were used. Other techniques that extended the original methods were applied where more appropriate for the current dataset. In IBS-d patients, we analyzed the items and substructure of the IBS-QOL via item reduction, factor structure, internal consistency, reproducibility, construct validity, and ability to detect change. Results This study supports the IBS-QOL as a psychometrically valid measure. Factor analyses suggested that IBS-specific QOL as measured by the IBS-QOL is a unidimensional construct. Construct validity was further buttressed by significant correlations between IBS-QOL total scores and related measures of IBS-d severity including the historically-relevant Irritable Bowel Syndrome Adequate Relief (IBS-AR) item and the FDA’s Clinical Responder definition. The IBS-QOL also showed a significant ability to detect change as evidenced by analysis of treatment effects. A minority of the items, unrelated to the IBS-d, performed less well by the standards set by the original authors. Conclusions We established that the IBS-QOL total score is a psychometrically valid measure of HRQOL in IBS-d patients enrolled in this study. 
Our analyses suggest that the IBS-QOL items demonstrate very good construct validity and ability to detect changes due to treatment effects. Furthermore, our analyses suggest that the IBS-QOL items measure a unidimensional construct, and we believe further modeling of the IBS-QOL from an item response theory (IRT) approach under both non-treatment and treatment conditions would greatly further our understanding, as item-based methods could be used to develop a short form. PMID:24330412

The emotion regulation questionnaire in women with cancer: A psychometric evaluation and an item response theory analysis.

PubMed

Brandão, Tânia; Schulz, Marc S; Gross, James J; Matos, Paula Mena

2017-10-01

Emotion regulation is thought to play an important role in adaptation to cancer. However, the emotion regulation questionnaire (ERQ), a widely used instrument to assess emotion regulation, has not yet been validated in this context. This study addresses this gap by examining the psychometric properties of the ERQ in a sample of Portuguese women with cancer. The ERQ was administered to 204 women with cancer (mean age = 48.89 years, SD = 7.55). Confirmatory factor analysis and item response theory analysis were used to examine psychometric properties of the ERQ. Confirmatory factor analysis confirmed the 2-factor solution proposed by the original authors (expressive suppression and cognitive reappraisal). This solution was invariant across age and type of cancer. Item response theory analyses showed that all items were moderately to highly discriminant and that items are better suited for identifying moderate levels of expressive suppression and cognitive reappraisal. Support was found for the internal consistency and test-retest reliability of the ERQ. The pattern of relationships with emotional control, alexithymia, emotional self-efficacy, attachment, and quality of life provided evidence of the convergent and concurrent validity for both dimensions of the ERQ. Overall, the ERQ is a psychometrically sound approach for assessing emotion regulation strategies in the oncological context. Clinical implications are discussed. Copyright © 2016 John Wiley & Sons, Ltd.
The reliability and validity of the SF-8 with a conflict-affected population in northern Uganda.

PubMed

Roberts, Bayard; Browne, John; Ocaka, Kaducu Felix; Oyok, Thomas; Sondorp, Egbert

2008-12-02

The SF-8 is a health-related quality of life instrument that could provide a useful means of assessing general physical and mental health amongst populations affected by conflict. The purpose of this study was to test the validity and reliability of the SF-8 with a conflict-affected population in northern Uganda. A cross-sectional, multi-stage random cluster survey was conducted with 1206 adults in camps for internally displaced persons (IDPs) in Gulu and Amuru districts of northern Uganda. Data quality was assessed by analysing the number of incomplete responses to SF-8 items. Response distribution was analysed using aggregate endorsement frequency. Test-retest reliability was assessed in a separate smaller survey using the intraclass correlation test. Construct validity was measured using principal component analysis, and the Pearson correlation test for item-summary score correlation and inter-instrument correlations. Known groups validity was assessed using a two-sample t-test to evaluate the ability of the SF-8 to discriminate between groups known to have, and not have, physical and mental health problems. The SF-8 showed excellent data quality. It showed acceptable item response distribution based upon analysis of aggregate endorsement frequencies. Test-retest showed a good intraclass correlation of 0.61 for the physical component summary (PCS) and 0.68 for the mental component summary (MCS). The principal component analysis indicated strong construct validity and concurred with the results of the validity tests by the SF-8 developers. The SF-8 also showed strong construct validity between the 8 items and the PCS and MCS summary scores, moderate inter-instrument validity, and strong known groups validity. This study provides evidence on the reliability and validity of the SF-8 amongst IDPs in northern Uganda.
The reliability and validity of the SF-8 with a conflict-affected population in northern Uganda

PubMed Central

Roberts, Bayard; Browne, John; Ocaka, Kaducu Felix; Oyok, Thomas; Sondorp, Egbert

2008-01-01

Background The SF-8 is a health-related quality of life instrument that could provide a useful means of assessing general physical and mental health amongst populations affected by conflict. The purpose of this study was to test the validity and reliability of the SF-8 with a conflict-affected population in northern Uganda. Methods A cross-sectional, multi-stage random cluster survey was conducted with 1206 adults in camps for internally displaced persons (IDPs) in Gulu and Amuru districts of northern Uganda. Data quality was assessed by analysing the number of incomplete responses to SF-8 items. Response distribution was analysed using aggregate endorsement frequency. Test-retest reliability was assessed in a separate smaller survey using the intraclass correlation test. Construct validity was measured using principal component analysis, and the Pearson correlation test for item-summary score correlation and inter-instrument correlations. Known groups validity was assessed using a two-sample t-test to evaluate the ability of the SF-8 to discriminate between groups known to have, and not have, physical and mental health problems. Results The SF-8 showed excellent data quality. It showed acceptable item response distribution based upon analysis of aggregate endorsement frequencies. Test-retest showed a good intraclass correlation of 0.61 for the physical component summary (PCS) and 0.68 for the mental component summary (MCS). The principal component analysis indicated strong construct validity and concurred with the results of the validity tests by the SF-8 developers. The SF-8 also showed strong construct validity between the 8 items and the PCS and MCS summary scores, moderate inter-instrument validity, and strong known groups validity. Conclusion This study provides evidence on the reliability and validity of the SF-8 amongst IDPs in northern Uganda. PMID:19055716
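The SF-8 records report test-retest reliability as an intraclass correlation (ICC) but do not name the ICC form used. As an illustration only, the sketch below assumes the two-way random-effects, absolute-agreement, single-measure form, ICC(2,1) in Shrout and Fleiss's notation, computed from a subjects-by-occasions score table. The function name `icc_2_1` and the example scores are hypothetical and are not taken from the study.

```python
def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    scores: list of rows, one per subject; each row holds that subject's
    score on each of k occasions (e.g. test and retest, so k = 2).
    """
    n = len(scores)       # number of subjects
    k = len(scores[0])    # number of occasions
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Sums of squares for a two-way ANOVA without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between occasions
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    ss_err = ss_total - ss_rows - ss_cols                    # residual

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Hypothetical test-retest scores for three respondents.
print(icc_2_1([[1, 1], [2, 2], [3, 3]]))  # identical occasions: 1.0
print(icc_2_1([[1, 2], [2, 3], [3, 4]]))  # constant +1 shift at retest: about 0.667
```

Because this form measures absolute agreement, a systematic shift between test and retest lowers the coefficient even when respondents keep the same rank order; a consistency-type ICC would not penalize that shift.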
Capturing the true burden of dystonia on patients: the Cervical Dystonia Impact Profile (CDIP-58).

PubMed

Cano, S J; Warner, T T; Linacre, J M; Bhatia, K P; Thompson, A J; Fitzpatrick, R; Hobart, J C

2004-11-09

To develop a new rating scale for measuring the health impact of cervical dystonia (CD) that includes patients' perceptions and complements existing observer dependent clinician rating scales. Scale development was in three stages. In Stage 1, a large pool of items was generated from patient interviews (n = 25), expert opinion, and literature review. In Stage 2, these items were administered by postal survey to people with CD. The resulting data were analyzed using Rasch item analysis to construct, from the item pool, a rating scale that satisfied criteria for rigorous measurement. In Stage 3, the measurement properties of this rating scale were examined in an independent sample of people with CD. In Stage 1, 150 items concerning the health impact of CD were generated. In Stage 2, 556 people completed questionnaires (87% response rate) and a 58-item rating scale measuring the health impact of CD in eight areas was constructed (CD Impact Profile, CDIP-58). In Stage 3, CDIP-58 data from 391 people (87% response rate) were received. Analyses supported the measurement of eight unidimensional constructs (infit mean square range 0.62 to 1.50), item calibration (33.37 to 67.56), and patient separation statistics (2.59 to 3.38). Items demonstrated stable calibrations in subgroups of people with CD supporting the stability of the CDIP-58. The CDIP-58 is a reliable and valid patient-based rating scale measuring the health impact of CD in eight health dimensions.
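The CDIP-58 record uses Rasch infit mean squares as a fit criterion. The sketch below shows how an infit mean square is computed for a single item under the dichotomous Rasch model. Squared residuals are summed over persons and divided by the summed model variances, so values near 1 indicate responses about as noisy as the model expects. This is a simplification, since the CDIP-58 is a rating scale and would call for a polytomous Rasch model. The function names, person measures, and responses are hypothetical.

```python
import math


def rasch_p(theta, b):
    """Dichotomous Rasch model: P(x = 1) for person measure theta and
    item difficulty b, both in logits."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def infit_mnsq(responses, thetas, b):
    """Information-weighted (infit) mean square for one item.

    responses: observed 0/1 scores, one per person
    thetas:    person measures in logits, aligned with responses
    b:         item difficulty in logits
    """
    sq_resid = 0.0  # sum of squared residuals (x - p)^2
    variance = 0.0  # sum of model variances p * (1 - p)
    for x, theta in zip(responses, thetas):
        p = rasch_p(theta, b)
        sq_resid += (x - p) ** 2
        variance += p * (1.0 - p)
    return sq_resid / variance


# Hypothetical persons at -2 and +2 logits answering an item of difficulty 0.
print(infit_mnsq([0, 1], [-2.0, 2.0], 0.0))  # below 1: more predictable than expected
print(infit_mnsq([1, 0], [-2.0, 2.0], 0.0))  # above 1: surprising responses
```

Weighting by the model variance means that responses from persons far from the item's difficulty contribute little, which is why infit is often preferred to the unweighted outfit statistic for judging how an item behaves among well-targeted respondents.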
More relevant, precise, and efficient items for assessment of physical function and disability: moving beyond the classic instruments

PubMed Central

Fries, J F; Bruce, B; Bjorner, J; Rose, M

2006-01-01

Objectives Patient reported outcomes (PROs) have become standard study endpoints. However, little attention has been given to using item improvement to advance PRO performance which could improve precision, clarity, patient relevance, and information content of “physical function/disability” items and thus the performance of resulting instruments. Methods The present study included 1,860 physical function/disability items from 165 instruments. Item formulations were assessed by frequency of use, modified Delphi consensus, respondent judgement of clarity and importance, and item response theory (IRT). Data from 1100 rheumatoid arthritis, osteoarthritis, and normal ageing subjects, using qualitative item review, focus groups, cognitive interviews, and patient survey were used to achieve a unique item pool that was clear, reliable, sensitive to change, readily translatable, devoid of floor and ceiling limitations, contained unidimensional subdomains, and had maximal information content. Results A “present tense” time frame was used most frequently, better understood, more readily translated, and more directly estimated the latent trait of disability. Items in the “past tense” had 80–90% false negatives (p<0.001). The best items were brief, clear, and contained a single construct. Responses with four to five options were preferred by both experts and respondents. The term physical function may be preferable to the term disability because of fewer floor effects. IRT analyses of “disability” suggest four independent subdomains (mobility, dexterity, axial, and compound) with factor loadings of 0.81–0.99. Conclusions Major improvement in performance of items and instruments is possible, and may have the effect of substantially reducing sample size requirements for clinical trials. PMID:17038464
Measurement invariance across Genders on the Childhood Illness Attitude Scales (CIAS).

PubMed

Thorisdottir, Audur S; Villadsen, Anna; LeBouthillier, Daniel M; Rask, Charlotte Ulrikka; Wright, Kristi D; Walker, John R; Feldgaier, Steven; Asmundson, Gordon J G

2017-07-01

The Childhood Illness Attitude Scales (CIAS) were created as a developmentally appropriate measure for symptoms of health anxiety (HA) in school-aged children. Despite overall sound psychometric properties reported in previous studies, more comprehensive examination of the latent structure and potential response bias in the CIAS is needed. The purpose of the present study was to cross-validate the latent structure of the CIAS across genders and to examine gender-specific variations in CIAS scores. The sample comprised data from 602 Canadian and Danish school-aged children (M age = 10.54, SD = 0.99; 52.5% girls). Confirmatory factor analyses were conducted to test 3-, modified 3-, and 4-factor models in both samples. Multigroup confirmatory factor analysis was performed to test factor structure invariance across boys and girls in a combined sample. Differential Item Functioning (DIF) was assessed using test characteristic curves. A modified 3-factor solution (i.e., fears = 11 items, help-seeking = 6 items, and symptom effects = 4 items) provided the best fit to the data (χ²(364, N = 602) = 681.7, p < 0.001; χ²/df = 1.803; RMSEA = 0.037; CFI = 0.926). The factor structure was stable, well-fitting, and indicated measurement invariance across groups. DIF analyses revealed no gender-based response bias at the scale level. Results support a revised 3-factor version of the CIAS that can be used with confidence to assess symptoms of HA in school-aged boys and girls. Copyright © 2017 Elsevier Inc. All rights reserved.
Well-being as a moving target: measurement equivalence of the Bradburn Affect Balance Scale.

PubMed

Maitland, S B; Dixon, R A; Hultsch, D F; Hertzog, C

2001-03-01

Although the Bradburn Affect Balance Scale (ABS) is a frequently used two-factor indicator of well-being in later life, its measurement and invariance properties are not well documented. We examined these issues using confirmatory factor analyses of cross-sectional (adults aged 54-87 years) and longitudinal data from the Victoria Longitudinal Study. Stability of the positive and negative affect factors was moderate across a 3-year period. Overall, factor loadings for positive affect items were invariant over time, with the exception of the "pleased" item. Negative affect items were time invariant. However, age-group comparisons between young-old and old-old groups revealed age differences in loadings for the "upset" item at Time 1. Finally, gender groups differed in loadings for the "top of the world" and "going your way" items. Thus, a pattern of partial measurement equivalence characterized item responses to the ABS. Our results suggest that group comparisons and longitudinal change in ABS scale scores of positive and negative affect should be interpreted with caution.
Development and Psychometric Evaluation of the Gay Male Sexual Difficulties Scale.

PubMed

McDonagh, Lorraine K; Stewart, Ian; Morrison, Melanie A; Morrison, Todd G

2016-08-01

Sexual difficulties (i.e., disturbances in normal sexual responding) have the potential to significantly and negatively affect men's social and psychological well-being. However, a review of published measurement tools indicates that most have limited applicability to gay men, and none offer a nuanced understanding of sexual difficulties as experienced by members of this population. To address this omission, the Gay Male Sexual Difficulties Scale (GMSDS) was developed using a sequential mixed-methods approach. The 25-item GMSDS uses a 6-point frequency Likert-type response format and examines: difficulties with receptive and insertive anal intercourse (5 items each); erectile difficulties (4 items); foreskin difficulties (4 items); body embarrassment (4 items); and seminal fluid concerns (3 items). The measure's scale score dimensionality, assessed using both exploratory and confirmatory factor analyses, as well as its scale score reliability and validity (e.g., known-groups and convergent), were tested and deemed satisfactory. Limitations of the current series of studies and directions for future research are discussed.
An Item Response Theory (IRT) analysis of the Short Inventory of Problems-Alcohol and Drugs (SIP-AD) among non-treatment seeking men-who-have-sex-with-men: evidence for a shortened 10-item SIP-AD.

PubMed

Hagman, Brett T; Kuerbis, Alexis N; Morgenstern, Jon; Bux, Donald A; Parsons, Jeffrey T; Heidinger, Bram E

2009-11-01

The Short Inventory of Problems-Alcohol and Drugs (SIP-AD) is a 15-item measure that concurrently assesses negative consequences associated with alcohol and illicit drug use. Current psychometric evaluation has been limited to classical test theory (CTT) statistics, and the measure has not been validated among non-treatment seeking men-who-have-sex-with-men (MSM). Methods from Item Response Theory (IRT) can improve upon CTT by providing an in-depth analysis of how each item performs across the underlying latent trait that it is purported to measure. The present study examined the psychometric properties of the SIP-AD using methods from both IRT and CTT among a non-treatment seeking MSM sample (N=469). Participants were recruited from the New York City area and were asked to participate in a series of studies examining club drug use. Results indicated that five items on the SIP-AD demonstrated poor item fit or significant differential item functioning (DIF) across race/ethnicity and HIV status. These five items were dropped, and two-parameter IRT analyses were conducted on the remaining 10 items, which indicated a restricted range of item location parameters (-.15 to -.99) plotted at the lower end of the latent negative consequences severity continuum, and reasonably high discrimination parameters (1.30 to 2.22). Additional CTT statistics were compared between the original 15-item SIP-AD and the refined 10-item SIP-AD and suggested that the differences were negligible, with the refined 10-item SIP-AD indicating a high degree of reliability and validity. Findings suggest the SIP-AD can be shortened to 10 items and appears to be an unbiased, reliable, and valid measure among non-treatment seeking MSM.
Validation of the conceptual research utilization scale: an application of the standards for educational and psychological testing in healthcare

PubMed Central

2011-01-01

Background There is a lack of acceptable, reliable, and valid survey instruments to measure conceptual research utilization (CRU). In this study, we investigated the psychometric properties of a newly developed scale (the CRU Scale). Methods We used the Standards for Educational and Psychological Testing as a validation framework to assess four sources of validity evidence: content, response processes, internal structure, and relations to other variables. A panel of nine international research utilization experts performed a formal content validity assessment. To determine response process validity, we conducted a series of one-on-one scale administration sessions with 10 healthcare aides. Validity evidence based on internal structure and relations to other variables was examined using CRU Scale response data from a sample of 707 healthcare aides working in 30 urban Canadian nursing homes. Principal components analysis and confirmatory factor analyses were conducted to determine internal structure. Relations to other variables were examined using: (1) bivariate correlations; (2) change in mean values of CRU with increasing levels of other kinds of research utilization; and (3) multivariate linear regression. Results Content validity index scores for the five items ranged from 0.55 to 1.00. The principal components analysis indicated a 5-item 1-factor model. This was inconsistent with the findings from the confirmatory factor analysis, which showed best fit for a 4-item 1-factor model. Bivariate associations between CRU and other kinds of research utilization were statistically significant (p < 0.01) for the latent CRU scale score and all five CRU items. The CRU scale score was also shown to be a significant predictor of overall research utilization in multivariate linear regression. Conclusions The CRU scale showed acceptable initial psychometric properties with respect to responses from healthcare aides in nursing homes.
Based on our validity, reliability, and acceptability analyses, we recommend using a reduced (four-item) version of the CRU scale to yield sound assessments of CRU by healthcare aides. Refinement to the wording of one item is also needed. Planned future research will include: latent scale scoring, identification of variables that predict and are outcomes to conceptual research use, and longitudinal work to determine CRU Scale sensitivity to change. PMID:21595888
Do impulsive individuals benefit more from food go/no-go training? Testing the role of inhibition capacity in the no-go devaluation effect.

PubMed

Chen, Zhang; Veling, Harm; Dijksterhuis, Ap; Holland, Rob W

2018-05-01

Not responding to food items in a go/no-go task can lead to devaluation of these food items, which may help people regulate their eating behavior. The Behavior Stimulus Interaction (BSI) theory explains this devaluation effect by assuming that inhibiting impulses triggered by appetitive foods elicits negative affect, which in turn devalues the food items. BSI theory further predicts that the devaluation effect will be stronger when food items are more appetitive and when individuals have low inhibition capacity. To test these hypotheses, we manipulated the appetitiveness of food items and measured individual inhibition capacity with the stop-signal task. Food items were consistently paired with either go or no-go cues, so that participants responded to go items and not to no-go items. Evaluations of these items were measured before and after go/no-go training. Across two preregistered experiments, we consistently found no-go foods were liked less after the training compared to both go foods and foods not used in the training. Unexpectedly, this devaluation effect occurred for both appetitive and less appetitive food items. Exploratory signal detection analyses suggest this latter finding might be explained by increased learning of stimulus-response contingencies for the less appetitive items when they are presented among appetitive items. Furthermore, the strength of devaluation did not consistently correlate with individual inhibition capacity, and Bayesian analyses combining data from both experiments provided moderate support for the null hypothesis. The current project demonstrated the devaluation effect induced by the go/no-go training, but failed to obtain further evidence for BSI theory. Since the devaluation effect was reliably obtained across experiments, the results do reinforce the notion that the go/no-go training is a promising tool to help people regulate their eating behavior. Copyright © 2017 Elsevier Ltd. All rights reserved.
INTRODUCTION TO PATIENT-REPORTED OUTCOME ITEM BANKS: ISSUES IN MINORITY AGING RESEARCH

PubMed Central

Templin, Thomas N; Hays, Ron D; Gershon, Richard C; Rothrock, Nan; Jones, Richard N; Teresi, Jeanne A; Stewart, Anita; Weech-Maldonado, Robert; Wallace, Steve

2014-01-01

In 2004, NIH awarded contracts to initiate the development of high-quality psychological and neuropsychological outcome measures for improved assessment of health-related outcomes. The workshop introduced these measurement development initiatives, the measures created, and the NIH-supported resource (Assessment Center) for internet- or tablet-based test administration and scoring. Presentations covered: (a) item response theory (IRT) and assessment of test bias, (b) construction of item banks and computerized adaptive testing, and (c) the different ways in which qualitative analyses contribute to the definition of construct domains and the refinement of outcome constructs. The panel discussion included questions about the representativeness of samples and the assessment of cultural bias. PMID:23570428
Immediate list recall as a measure of short-term episodic memory: insights from the serial position effect and item response theory.

PubMed

Gavett, Brandon E; Horwitz, Julie E

2012-03-01

The serial position effect shows that two interrelated cognitive processes underlie immediate recall of a supraspan word list. The current study used item response theory (IRT) methods to determine whether the serial position effect poses a threat to the construct validity of immediate list recall as a measure of verbal episodic memory. Archival data were obtained from a national sample of 4,212 volunteers aged 28-84 in the Midlife Development in the United States study. Telephone assessment yielded item-level data for a single immediate recall trial of the Rey Auditory Verbal Learning Test (RAVLT). Two-parameter logistic IRT procedures were used to estimate item parameters, and the Q1 statistic was used to evaluate item fit. A two-dimensional model fit the data better than a unidimensional model, supporting the notion that list recall is influenced by two underlying cognitive processes. IRT analyses revealed that 4 of the 15 RAVLT items (1, 12, 14, and 15) showed significant misfit (p < .05). Item characteristic curves for items 14 and 15 decreased monotonically, implying an inverse relationship between ability level and the probability of recall. Eliminating the four misfitting items improved fit to the data and met the necessary IRT assumptions. Performance on a supraspan list learning test is influenced by multiple cognitive abilities; failure to account for the serial position of words decreases the construct validity of the test as a measure of episodic memory and may provide misleading results. IRT methods can ameliorate these problems and improve construct validity.
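The RAVLT record fits two-parameter logistic (2PL) item characteristic curves. The minimal sketch below assumes the common logistic form without the 1.7 scaling constant. It shows why a negative discrimination parameter produces a monotonically decreasing curve like those reported for items 14 and 15. It also includes the 2PL item information function, which peaks at the item's location parameter. The function names and parameter values are illustrative, not estimates from the study.

```python
import math


def p_2pl(theta, a, b):
    """2PL item characteristic curve: probability of a correct response
    (here, recalling the word) at ability theta, given discrimination a
    and location (difficulty) b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """2PL item information a^2 * P * (1 - P); it is largest at theta = b."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)


thetas = [-2.0, -1.0, 0.0, 1.0, 2.0]
# a > 0: probability of recall rises with ability (the usual case).
typical = [p_2pl(t, 1.5, 0.0) for t in thetas]
# a < 0: probability of recall falls as ability rises, a decreasing curve.
reversed_ = [p_2pl(t, -0.8, 0.0) for t in thetas]

print([round(p, 3) for p in typical])
print([round(p, 3) for p in reversed_])
```

A decreasing curve means higher-ability examinees are less likely to succeed on the item, which is hard to reconcile with a single ability driving all items. The information function also explains why items whose locations cluster at one end of the trait, as in the SIP-AD record, measure precisely only in that region.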
Post Hoc Analyses of Anxiety Measures in Adult Patients With Generalized Anxiety Disorder Treated With Vilazodone

PubMed Central

Khan, Arif; Durgam, Suresh; Tang, Xiongwen; Ruth, Adam; Mathews, Maju; Gommoll, Carl P.

2016-01-01

Objective To investigate vilazodone, currently approved for major depressive disorder in adults, for generalized anxiety disorder (GAD). Method Three randomized, double-blind, placebo-controlled studies showing positive results for vilazodone (20–40 mg/d) in adult patients with GAD (DSM-IV-TR) were pooled for analyses; data were collected from June 2012 to March 2014. Post hoc outcomes in the pooled intent-to-treat population (n = 1,462) included mean change from baseline to week 8 in Hamilton Anxiety Rating Scale (HARS) total score, psychic and somatic anxiety subscale scores, and individual item scores; HARS response (≥ 50% total score improvement) and remission (total score ≤ 7) at week 8; and category shifts, defined as HARS item score ≥ 2 at baseline (moderate to very severe symptoms) and score of 0 at week 8 (no symptoms). Results The least squares mean difference was statistically significant for vilazodone versus placebo in change from baseline to week 8 in HARS total score (−1.83, P < .0001) and in psychic anxiety (−1.21, P < .0001) and somatic anxiety (−0.63, P < .01) subscale scores; differences from placebo were significant on 11 of 14 HARS items (P < .05). Response rates were higher with vilazodone than placebo (48% vs 39%, P < .001), as were remission rates (27% vs 21%, P < .01). The percentage of patients who shifted to no symptoms was significant for vilazodone on several items: anxious mood, tension, intellectual, depressed mood, somatic-muscular, somatic-sensory, cardiovascular, respiratory, and autonomic symptoms (P < .05). Conclusions Treatment with vilazodone versus placebo was effective in adult GAD patients, with significant differences between treatment groups found on both psychic and somatic HARS items. Trial Registration ClinicalTrials.gov identifiers: NCT01629966, NCT01766401, NCT01844115. PMID:27486544
Post Hoc Analyses of Anxiety Measures in Adult Patients With Generalized Anxiety Disorder Treated With Vilazodone.

PubMed

Khan, Arif; Durgam, Suresh; Tang, Xiongwen; Ruth, Adam; Mathews, Maju; Gommoll, Carl P

2016-01-01

To investigate vilazodone, currently approved for major depressive disorder in adults, for generalized anxiety disorder (GAD). Three randomized, double-blind, placebo-controlled studies showing positive results for vilazodone (20–40 mg/d) in adult patients with GAD (DSM-IV-TR) were pooled for analyses; data were collected from June 2012 to March 2014. Post hoc outcomes in the pooled intent-to-treat population (n = 1,462) included mean change from baseline to week 8 in Hamilton Anxiety Rating Scale (HARS) total score, psychic and somatic anxiety subscale scores, and individual item scores; HARS response (≥ 50% total score improvement) and remission (total score ≤ 7) at week 8; and category shifts, defined as HARS item score ≥ 2 at baseline (moderate to very severe symptoms) and score of 0 at week 8 (no symptoms). The least squares mean difference was statistically significant for vilazodone versus placebo in change from baseline to week 8 in HARS total score (-1.83, P < .0001) and in psychic anxiety (-1.21, P < .0001) and somatic anxiety (-0.63, P < .01) subscale scores; differences from placebo were significant on 11 of 14 HARS items (P < .05). Response rates were higher with vilazodone than placebo (48% vs 39%, P < .001), as were remission rates (27% vs 21%, P < .01). The percentage of patients who shifted to no symptoms was significant for vilazodone on several items: anxious mood, tension, intellectual, depressed mood, somatic-muscular, somatic-sensory, cardiovascular, respiratory, and autonomic symptoms (P < .05). Treatment with vilazodone versus placebo was effective in adult GAD patients, with significant differences between treatment groups found on both psychic and somatic HARS items. ClinicalTrials.gov identifiers: NCT01629966, NCT01766401, NCT01844115.
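The vilazodone records define three per-patient HARS outcomes. Response is an improvement of at least 50% in total score, remission is a week-8 total score of 7 or less, and a category shift is an item score of 2 or more at baseline that falls to 0 at week 8. The sketch below encodes those three rules as stated. The function names are hypothetical, and it assumes integer scores and a positive baseline total.

```python
def hars_response(baseline_total: int, week8_total: int) -> bool:
    """Response: total score improved by at least 50% from baseline."""
    if baseline_total <= 0:
        raise ValueError("baseline total must be positive")
    # Integer form of (baseline - week8) / baseline >= 0.5, avoiding float rounding.
    return 2 * (baseline_total - week8_total) >= baseline_total


def hars_remission(week8_total: int) -> bool:
    """Remission: total score of 7 or less at week 8."""
    return week8_total <= 7


def shifted_to_no_symptoms(baseline_item: int, week8_item: int) -> bool:
    """Category shift: item score >= 2 (moderate to very severe) at
    baseline and 0 (no symptoms) at week 8."""
    return baseline_item >= 2 and week8_item == 0


# Hypothetical patient: HARS total 24 at baseline, 12 at week 8.
print(hars_response(24, 12), hars_remission(12))  # True False
```

Response and remission are evaluated independently, so a patient can meet one criterion without the other, as in the example: a halved total score counts as a response but still sits above the remission threshold.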
Further evaluation of leisure items in the attention condition of functional analyses.

PubMed

Roscoe, Eileen M; Carreau, Abbey; MacDonald, Jackie; Pence, Sacha T

2008-01-01

Research suggests that including leisure items in the attention condition of a functional analysis may produce engagement that masks sensitivity to attention. In this study, 4 individuals' initial functional analyses indicated that behavior was maintained by nonsocial variables (n = 3) or by attention (n = 1). A preference assessment was used to identify items for subsequent functional analyses. Four conditions were compared: attention with and without leisure items, and control with and without leisure items. Following this, either high- or low-preference items were included in the attention condition. Problem behavior was more probable during the attention condition when no leisure items or low-preference items were included, and lower levels of problem behavior were observed during the attention condition when high-preference leisure items were included. These findings suggest how preferred items may hinder detection of behavioral function.
Santa Clara Strength of Religious Faith Questionnaire: Psychometric analysis in older adults

PubMed Central

Cummings, Jeremy P.; Carson, Cody S.; Shrestha, Srijana; Kunik, Mark E.; Armento, Maria E.; Stanley, Melinda A.; Amspoker, Amber B.

2014-01-01

Objectives To assist researchers and clinicians considering using the Santa Clara Strength of Religious Faith Questionnaire (SCSRFQ) with older-adult samples, the current study analyzed the psychometrics of SCSRFQ scores in two older-adult samples. Method Adults age 55 or older who had formerly participated in studies of cognitive-behavioral therapy for anxiety and/or depression were recruited to complete questionnaires. In Study 1 (N = 66), the authors assessed the relations between the SCSRFQ and other measures of religiousness/spirituality, mental health, and demographic variables, using bivariate correlations and nonparametric tests. In Study 2 (N = 223), the authors also conducted confirmatory and exploratory factor analyses of the SCSRFQ, as well as an Item Response Theory analysis. Results The SCSRFQ was moderately to highly positively correlated with all measures of religiousness/spirituality. Relations with mental health were weak and differed across samples. Ethnic minorities scored higher than White participants on the SCSRFQ, but only in Study 2. Factor analyses showed that a single-factor model fit the SCSRFQ best. According to Item Response Theory analysis, SCSRFQ items discriminated well between participants with low-to-moderate levels of the construct but provided little information at higher levels. Conclusion Although the SCSRFQ scores had adequate psychometric characteristics, the measure’s usefulness may be limited in samples of older adults. PMID:24892461
A Rasch measure of teachers' views of teacher-student relationships in the primary school.

PubMed

Leitao, Natalie; Waugh, Russell F

2012-01-01

This study investigated teacher-student relationships from the teachers' point of view at Perth metropolitan schools in Western Australia. The study identified three key social and emotional aspects that affect teacher-student relationships, namely, Connectedness, Availability and Communication. Data were collected by questionnaire (N = 139) with stem items answered in three perspectives: (1) Idealistic: this is what I would like to happen; (2) Capability: this is what I am capable of; and (3) Behaviour: this is what actually happens, using four ordered response categories: not at all (score 1), some of the time (score 2), most of the time (score 3), and almost always (score 4). Data were analysed with a Rasch measurement model, and a unidimensional, linear scale with 24 items, ordered from easy to hard, was created. The data were shown to be highly reliable, so that valid inferences could be made from the scale. The Person Separation Index (akin to a reliability index) was 0.93; there was good global teacher and item fit to the measurement model; the targeting of the item difficulties against the teacher measures was good; and the response categories were answered consistently and logically. Teachers said that the ideal items were all easier than their corresponding capability items, which were in turn easier than the behaviour items (where the items fitted the model), as conceptualized. The easiest ideal items were "I like this child" and "This child and I get along well together." The hardest ideal item (but still easy) was "I am available for this child." The easiest behaviour item (but still hard) was "This child and I get along well together." The hardest behaviour item (and very hard) was "I am interested to learn about this child's personal thoughts, feelings and experiences." The difficulties of the items supported the conceptual structure of the variable.
Development and initial evaluation of the SCI-FI/AT

PubMed Central

Jette, Alan M.; Slavin, Mary D.; Ni, Pengsheng; Kisala, Pamela A.; Tulsky, David S.; Heinemann, Allen W.; Charlifue, Susie; Tate, Denise G.; Fyffe, Denise; Morse, Leslie; Marino, Ralph; Smith, Ian; Williams, Steve

2015-01-01

Objectives To describe the domain structure and calibration of the Spinal Cord Injury Functional Index for samples using Assistive Technology (SCI-FI/AT) and report the initial psychometric properties of each domain. Design Cross sectional survey followed by computerized adaptive test (CAT) simulations. Setting Inpatient and community settings. Participants A sample of 460 adults with traumatic spinal cord injury (SCI) stratified by level of injury, completeness of injury, and time since injury. Interventions None Main outcome measure SCI-FI/AT Results Confirmatory factor analysis (CFA) and item response theory (IRT) analyses identified 4 unidimensional SCI-FI/AT domains: Basic Mobility (41 items), Self-care (71 items), Fine Motor Function (35 items), and Ambulation (29 items). High correlations of full item banks with 10-item simulated CATs indicated high accuracy of each CAT in estimating a person's function, and there was high measurement reliability for the simulated CAT scales compared with the full item bank. SCI-FI/AT item difficulties in the domains of Self-care, Fine Motor Function, and Ambulation were less difficult than the same items in the original SCI-FI item banks. Conclusion With the development of the SCI-FI/AT, clinicians and investigators have available multidimensional assessment scales that evaluate function for users of AT to complement the scales available in the original SCI-FI. PMID:26010975
Development and initial evaluation of the SCI-FI/AT.

PubMed

Jette, Alan M; Slavin, Mary D; Ni, Pengsheng; Kisala, Pamela A; Tulsky, David S; Heinemann, Allen W; Charlifue, Susie; Tate, Denise G; Fyffe, Denise; Morse, Leslie; Marino, Ralph; Smith, Ian; Williams, Steve

2015-05-01

To describe the domain structure and calibration of the Spinal Cord Injury Functional Index for samples using Assistive Technology (SCI-FI/AT) and report the initial psychometric properties of each domain. Cross-sectional survey followed by computerized adaptive test (CAT) simulations. Inpatient and community settings. A sample of 460 adults with traumatic spinal cord injury (SCI) stratified by level of injury, completeness of injury, and time since injury. Interventions: none. Main outcome measure: SCI-FI/AT. RESULTS: Confirmatory factor analysis (CFA) and item response theory (IRT) analyses identified 4 unidimensional SCI-FI/AT domains: Basic Mobility (41 items), Self-care (71 items), Fine Motor Function (35 items), and Ambulation (29 items). High correlations of full item banks with 10-item simulated CATs indicated high accuracy of each CAT in estimating a person's function, and measurement reliability for the simulated CAT scales was high compared with the full item bank. SCI-FI/AT items in the domains of Self-care, Fine Motor Function, and Ambulation were less difficult than the same items in the original SCI-FI item banks. With the development of the SCI-FI/AT, clinicians and investigators have available multidimensional assessment scales that evaluate function for users of AT, complementing the scales available in the original SCI-FI.

Development of a questionnaire for assessing the childbirth experience (QACE).

PubMed

Carquillat, Pierre; Vendittelli, Françoise; Perneger, Thomas; Guittier, Marie-Julia

2017-08-30

Because of its potential impact on women's psychological health, assessing women's perceptions of their childbirth experience is important. The aim of this study was to develop a multidimensional self-report questionnaire to evaluate the childbirth experience. Factors influencing the childbirth experience were identified from a literature review and the results of a previous qualitative study. A total of 25 items were either drawn from existing instruments or created de novo. A draft version was pilot tested for face validity with 30 women and then submitted, for evaluation of its construct validity, to 477 primiparous women at one month postpartum. Recruitment took place in two obstetric clinics in Swiss and French university hospitals. To evaluate content validity, we compared item responses with a general assessment of the childbirth experience on a 0-to-10 numeric rating scale, dichotomized into two groups: "0 to 7" and "8 to 10". We performed an exploratory factor analysis to identify underlying dimensions. In total, 291 women completed the questionnaire (response rate = 61%). Responses to 22 items differed significantly between the 0-to-7 and 8-to-10 groups on the general childbirth experience assessment. The exploratory factor analysis yielded four sub-scales, labelled "relationship with staff" (4 items), "emotional status" (3 items), "first moments with the newborn" (3 items), and "feelings at one month postpartum" (3 items). All four sub-scales had satisfactory internal consistency (alpha coefficients from 0.70 to 0.85). The full 25-item version can be used to analyse each item individually, and the short four-dimension version can be scored to summarize the general assessment of the childbirth experience. The Questionnaire for Assessing the Childbirth Experience (QACE) could be useful as a screening instrument to identify women with negative childbirth experiences. It can be used both as a research instrument in its short version and as a questionnaire for clinical practice in its full version.
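Several records in this listing report Cronbach's alpha (here, 0.70 to 0.85 for the four QACE sub-scales). A minimal sketch of the standard formula, alpha = k/(k − 1) × (1 − Σ item variances / total-score variance), may help; the respondent scores below are hypothetical.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a persons x items score matrix (list of rows)."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of the summed scale score.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical 4 respondents x 3 items from one sub-scale.
data = [[1, 2, 2], [2, 3, 3], [3, 3, 4], [4, 4, 5]]
print(round(cronbach_alpha(data), 3))  # 0.971
```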
The Mindful Attention Awareness Scale: Further Examination of Dimensionality, Reliability, and Concurrent Validity Estimates.

PubMed

Osman, Augustine; Lamis, Dorian A; Bagge, Courtney L; Freedenthal, Stacey; Barnes, Sean M

2016-01-01

We examined the factor structure and psychometric properties of the Mindful Attention Awareness Scale (MAAS) in a sample of 810 undergraduate students. Using common exploratory factor analysis (EFA), we obtained evidence for a 1-factor solution (41.84% common variance). To confirm the unidimensionality of the 15-item MAAS, we conducted a 1-factor confirmatory factor analysis (CFA). Results of both the EFA and the CFA supported a unidimensional model. Using differential item functioning analysis methods within item response theory modeling (IRT-based DIF), we found that individuals with high and low levels of nonattachment responded similarly to the MAAS items. Following a detailed item analysis, we proposed a 5-item short version of the instrument and presented descriptive statistics and composite score reliability for the short and full versions of the MAAS. Finally, correlation analyses showed that scores on the full and short versions of the MAAS were associated with measures assessing related constructs. The 5-item MAAS is as useful as the original MAAS in enhancing our understanding of the mindfulness construct.
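The abstract does not define "composite score reliability"; assuming it is the usual omega-type coefficient from a one-factor CFA, omega = (Σλ)² / ((Σλ)² + Σθ), a minimal sketch looks like this. The standardized loadings are hypothetical, and error variances default to 1 − λ² under that standardization assumption.

```python
def composite_reliability(loadings, error_variances=None):
    """Omega-type composite reliability for a one-factor model.

    Assumes standardized loadings; if error variances are not supplied,
    each is taken as 1 - loading**2.
    """
    if error_variances is None:
        error_variances = [1.0 - lam * lam for lam in loadings]
    common = sum(loadings) ** 2  # variance due to the common factor
    return common / (common + sum(error_variances))


# Hypothetical 5-item short form with equal standardized loadings of 0.7.
print(round(composite_reliability([0.7] * 5), 3))  # 0.828
```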
Geriatric Anxiety Scale: item response theory analysis, differential item functioning, and creation of a ten-item short form (GAS-10).

PubMed

Mueller, Anne E; Segal, Daniel L; Gavett, Brandon; Marty, Meghan A; Yochim, Brian; June, Andrea; Coolidge, Frederick L

2015-07-01

The Geriatric Anxiety Scale (GAS; Segal, June, Payne, Coolidge, & Yochim, 2010, Journal of Anxiety Disorders, 24, 709-714, doi:10.1016/j.janxdis.2010.05.002) is a self-report measure of anxiety that was designed to address unique issues associated with anxiety assessment in older adults. This study is the first to use item response theory (IRT) to examine the psychometric properties of a measure of anxiety in older adults. A large sample of older adults (n = 581; mean age = 72.32 years, SD = 7.64 years, range = 60 to 96 years; 64% women; 88% European American) completed the GAS. IRT properties were examined. The presence of differential item functioning (DIF), or measurement bias, by age and sex was assessed, and a ten-item short form of the GAS (called the GAS-10) was created. All GAS items had discrimination parameters of 1.07 or greater. Items from the somatic subscale tended to have lower discrimination parameters than items on the cognitive or affective subscales. Two items were flagged for DIF, but the impact of the DIF was negligible. Women scored significantly higher than men on the GAS and its subscales. Participants in the young-old group (60 to 79 years old) scored significantly higher on the cognitive subscale than participants in the old-old group (80 years old and older). Results from the IRT analyses indicated that the GAS and GAS-10 have strong psychometric properties among older adults. We conclude by discussing implications and future research directions.
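The abstract does not name the IRT model. For ordered Likert-type anxiety items, Samejima's graded response model is a common choice, so this sketch assumes it. It shows how a discrimination parameter such as the reported minimum of 1.07 enters the category probabilities; the thresholds are hypothetical.

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Graded response model category probabilities.

    thresholds must be strictly increasing; m thresholds give m + 1 categories.
    """
    if any(b2 <= b1 for b1, b2 in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    # Cumulative probabilities P(X >= k) for k = 1..m, bracketed by 1 and 0.
    cum = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds] + [0.0]
    # Each category probability is the difference of adjacent cumulative curves.
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]


# Hypothetical 4-category item with the minimum reported discrimination.
probs = grm_category_probs(theta=0.5, a=1.07, thresholds=[-0.5, 0.8, 2.0])
print([round(p, 3) for p in probs])
```

Larger values of `a` make the cumulative curves steeper, so the item separates respondents more sharply around its thresholds; this is why the somatic items, with lower discriminations, carry less information.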
Reporting and methodological quality of meta-analyses in urological literature.

PubMed

Xia, Leilei; Xu, Jing; Guzzo, Thomas J

2017-01-01

To assess the overall quality of published urological meta-analyses and identify predictive factors for high quality. We systematically searched PubMed to identify meta-analyses published from January 1st, 2011 to December 31st, 2015 in 10 predetermined major paper-based urology journals. The characteristics of the included meta-analyses were collected, and their reporting and methodological qualities were assessed with the PRISMA checklist (27 items) and the AMSTAR tool (11 items), respectively. Descriptive statistics were used for individual items as a measure of overall compliance, and PRISMA and AMSTAR scores were calculated as the sum of adequately reported domains. Logistic regression was used to identify predictive factors for high quality. A total of 183 meta-analyses were included. The mean PRISMA and AMSTAR scores were 22.74 ± 2.04 and 7.57 ± 1.41, respectively. PRISMA item 5 (protocol and registration), items 15 and 22 (risk of bias across studies), and items 16 and 23 (additional analyses) had less than 50% adherence. AMSTAR item 1 ("a priori" design), item 5 (list of studies), and item 10 (publication bias) had less than 50% adherence. Logistic regression analyses showed that funding support and "a priori" design were associated with superior reporting quality, and that following the PRISMA guideline and "a priori" design were associated with superior methodological quality. Reporting and methodological qualities of recently published meta-analyses in major paper-based urology journals are generally good. Further improvement could potentially be achieved by strictly adhering to the PRISMA guideline and having an "a priori" protocol.
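The scoring described above, a checklist score as the sum of adequately reported items plus per-item adherence with items under 50% flagged, can be sketched as below. The ratings matrix and the function name are hypothetical; real PRISMA and AMSTAR ratings involve reviewer judgment, not just booleans.

```python
def checklist_summary(ratings, threshold=0.5):
    """Score a checklist and flag low-adherence items.

    ratings: one list of booleans per meta-analysis (True = adequately reported).
    Returns (score per meta-analysis, adherence per item, 1-based low items).
    """
    n_items = len(ratings[0])
    scores = [sum(row) for row in ratings]  # True counts as 1
    adherence = [sum(row[j] for row in ratings) / len(ratings) for j in range(n_items)]
    low = [j + 1 for j, rate in enumerate(adherence) if rate < threshold]
    return scores, adherence, low


# Hypothetical 4 meta-analyses rated on a 5-item checklist.
ratings = [
    [True, True, False, True, True],
    [True, False, False, True, True],
    [True, True, True, True, False],
    [True, True, False, False, True],
]
scores, adherence, low = checklist_summary(ratings)
print(scores, adherence, low)
```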
A psychometric evaluation of the Arm Motor Ability Test.

PubMed

O'Dell, Michael W; Kim, Grace; Rivera, Lisa; Fieo, Robert; Christos, Paul; Polistena, Caitlin; Fitzgerald, Kerri; Gorga, Delia

2013-06-01

To further examine the psychometric properties of a 9-item version of the Arm Motor Ability Test (AMAT-9) in persons with stroke. Thirty-two community-dwelling persons > 6 months post-stroke undergoing robotics treatment (mean age = 56.0 years, time post-stroke = 4.1 years, National Institutes of Health Stroke Scale score = 4.1, and AMAT-9 score = 1.22). Construct validity (including Rasch analyses) used baseline data prior to treatment (n = 32). Standardized response mean was calculated for subjects completing the protocol (n = 29). The Wolf Motor Function Test (WMFT), Fugl-Meyer Assessment (FMA), Action Research Arm Test (ARAT), and Stroke Impact Scale (SIS) were also administered. Spearman rank correlation coefficients between the AMAT-9 and the WMFT, FMA, and ARAT were strong (0.78-0.79, all p < 0.001). The correlation between the AMAT-9 and the SIS Hand Function sub-score was stronger than that between the AMAT-9 and the Communication sub-score (0.40, p = 0.025, and -0.16, p = 0.39, respectively). Rasch analyses provided evidence for an appropriate hierarchical structure of item difficulties, unidimensionality, and good reliability. The AMAT demonstrated a comparable standardized response mean of 0.98. The AMAT-9 is valid and responsive among subjects scoring in the lower range of the scale. It has the advantage of assessing function, and because the standing item from the previous iteration has been eliminated, it may be more easily used with severely impaired patients.
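The standardized response mean mentioned above is the mean change divided by the standard deviation of change. A minimal sketch with hypothetical pre/post scores follows; the sign depends on the scoring direction of the instrument.

```python
from statistics import mean, stdev


def standardized_response_mean(baseline, follow_up):
    """SRM = mean of paired change scores / sample SD of those changes."""
    changes = [post - pre for pre, post in zip(baseline, follow_up)]
    return mean(changes) / stdev(changes)


# Hypothetical pre/post scores for three participants.
print(standardized_response_mean([1.0, 1.5, 2.0], [2.0, 3.5, 5.0]))
```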
Secondary School Students' Views of Inhibiting Factors in Seeking Counselling

ERIC Educational Resources Information Center

Chan, Stephanie; Quinn, Philip

2012-01-01

This study examines secondary school students' perceptions of inhibiting factors in seeking counselling. Responses to a questionnaire completed by 1346 secondary school students were analysed using quantitative and qualitative methods. Exploratory factor analysis highlighted that within 21 pre-defined inhibiting factors, items loaded strongly on…
Symbolic meanings of sex in relationships: Developing the Meanings of Sexual Behavior Inventory.

PubMed

Shaw, Amanda M; Rogge, Ronald D

2017-10-01

Consistent with symbolic interactionism and motivation research, the study explored the meanings of sexual behavior in romantic relationships in a sample of 3,003 online respondents. Starting with a pool of 104 respondent-generated items, Exploratory and Confirmatory Factor analyses in separate sample halves revealed a stable set of 9 dimensions within that item pool that formed 2 higher-order factors representing positive (to share pleasure, to bond, to de-stress, to energize the relationship, to learn more about each other) and negative (to manage conflict, as an incentive, to express anger, and to control partner) meanings of sexual behavior within relationships. Item Response Theory analyses helped select the 4-5 most effective items of each dimension for inclusion in the Meanings of Sexual Behavior Inventory (MoSBI). Generalizability analyses suggested that the MoSBI subscale scores continued to show high levels of internal consistency across a broad range of demographic subgroups (e.g., racial/ethnic groups, gay and lesbian respondents, and various levels of education). The MoSBI subscales demonstrated moderate and distinct patterns of association with a range of conceptual boundary scales (e.g., relationship and sexual satisfaction, emotional support, negative conflict behavior, and frequency of sexual behavior) suggesting that these scales represent novel relationship processes. Consistent with this, analyses in the 862 respondents completing a 2-month follow-up assessment suggested that the meanings of sexual behavior predicted residual change in relationship satisfaction, even after controlling for frequency of sexual behavior within the relationships. Implications are discussed. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Mayo-Portland adaptability inventory: comparing psychometrics in cerebrovascular accident to traumatic brain injury.

PubMed

Malec, James F; Kean, Jacob; Altman, Irwin M; Swick, Shannon

2012-12-01

(1) To evaluate the measurement reliability and construct validity of the Mayo-Portland Adaptability Inventory, 4th revision (MPAI-4) in a sample consisting exclusively of patients with cerebrovascular accident (CVA) using single-parameter (Rasch) item-response methods; (2) to examine differential item functioning (DIF) by sex within the CVA population; and (3) to examine DIF and differential test functioning (DTF) across traumatic brain injury (TBI) and CVA samples. Retrospective psychometric analysis of rating scale data. Home- and community-based brain injury rehabilitation program. Individuals post-CVA (n=861) and individuals with TBI (n=603). Not applicable. MPAI-4. Item data on admission to community-based rehabilitation were submitted to Rasch, DIF, and DTF analyses. The final calibration in the CVA sample revealed satisfactory reliability/separation for persons (.91/3.16) and items (1.00/23.64). DIF showed that items for pain, anger, audition, and memory were associated with higher levels of disability for CVA than TBI patients, whereas self-care, mobility, and use of hands indicated greater overall disability for TBI patients. DTF analyses showed a high degree of association between the 2 sets of items (R = .92; R² = .85) and, at most, a 3.7-point difference in raw scores. The MPAI-4 demonstrates satisfactory psychometric properties for use with individuals with CVA applying for interdisciplinary posthospital rehabilitation. DIF reveals clinically meaningful differences between CVA and TBI groups that should be considered when interpreting results at the item and subscale level. Copyright © 2012 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
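The reliability/separation pairs reported above are linked by the standard Rasch relation R = G²/(1 + G²), which is assumed here. This quick check reproduces the .91 and 1.00 reliabilities from separations of 3.16 and 23.64, and shows that R = .92 gives R² of about .85.

```python
import math


def separation_to_reliability(g):
    """Rasch reliability implied by a separation index G: R = G^2 / (1 + G^2)."""
    return g * g / (1.0 + g * g)


def reliability_to_separation(r):
    """Inverse relation: G = sqrt(R / (1 - R)), valid for 0 <= R < 1."""
    return math.sqrt(r / (1.0 - r))


print(round(separation_to_reliability(3.16), 2))   # 0.91 (persons)
print(round(separation_to_reliability(23.64), 2))  # 1.0 (items)
print(round(0.92 ** 2, 2))                          # 0.85 (DTF R squared)
```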
Ubiquitous testing using tablets: its impact on medical student perceptions of and engagement in learning.

PubMed

Kim, Kyong-Jee; Hwang, Jee-Young

2016-03-01

Ubiquitous testing has the potential to affect medical education by enhancing the authenticity of assessment through multimedia items. This study explored medical students' experience with ubiquitous testing and its impact on student learning. A cohort (n=48) of third-year students at a medical school in South Korea participated in this study. The students were divided into two groups and were given different versions of 10 content-matched items: one in text version (the text group) and the other in multimedia version (the multimedia group). Multimedia items were delivered using tablets. Item response analyses were performed to compare item characteristics between the two versions. Additionally, focus group interviews were held to investigate the students' experiences of ubiquitous testing. The mean test score was significantly higher in the text group. Item difficulty and discrimination did not differ between text and multimedia items. The participants generally responded positively to ubiquitous testing. Still, they felt that the lectures they had taken in their preclinical years had not prepared them enough for this type of assessment and that clinical encounters during clerkships were more helpful. To be better prepared, the participants felt that they needed to engage more actively in learning in clinical clerkships and have more access to multimedia learning resources. Ubiquitous testing can positively affect student learning by reinforcing the importance of being able to understand and apply knowledge in clinical contexts, which drives students to engage more actively in learning in clinical settings.
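The abstract does not say which item difficulty and discrimination indices were compared. Assuming the classical ones, proportion correct and the corrected item-total correlation, a minimal sketch on hypothetical 0/1 scores looks like this.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation; returns nan if either variable is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / (sxx * syy) ** 0.5


def item_analysis(responses):
    """Classical difficulty (proportion correct) and corrected item-total r."""
    n_items = len(responses[0])
    results = []
    for j in range(n_items):
        item = [row[j] for row in responses]
        rest = [sum(row) - row[j] for row in responses]  # total excluding item j
        results.append((mean(item), pearson(item, rest)))
    return results


# Hypothetical 0/1 scores: 5 students x 3 items.
scores = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 1, 0], [0, 0, 0]]
for difficulty, discrimination in item_analysis(scores):
    print(round(difficulty, 2), round(discrimination, 2))
```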
The Pieper-Zulkowski pressure ulcer knowledge test.

PubMed

Pieper, Barbara; Zulkowski, Karen

2014-09-01

To describe the development and initial testing of the Pieper-Zulkowski Pressure Ulcer Knowledge Test (PZ-PUKT). Cross-sectional, instrument testing. Hospital association pressure ulcer educational program conference. Pressure ulcer research and guidelines from the last 5 years were examined for test item content. The initial PZ-PUKT had 115 items; response options were "true," "false," and "don't know." Registered nurses (N = 108) were randomly divided into 2 groups to take either the 60 prevention/risk and staging items or the 55 wound description items. Analyses of these responses resulted in 72 items, which were administered in total to a second cohort of 98 nurses for reliability. Cronbach's α was .80 for the 72-item PZ-PUKT. Cronbach's α values for the subscales were as follows: staging, .67; wound description, .64; and prevention/risk, .56. The mean correct scores were as follows: total, 80%; prevention, 77%; staging, 86%; and wound description, 77%. Nurses with wound care certification scored significantly higher on the PZ-PUKT than did nurses with other clinical certifications or nurses who lacked certification. The PZ-PUKT has updated content about pressure ulcer prevention/risk, staging, and wound description. Reliability values are highest for the total test. Further use of the instrument in diverse settings will add to reliability testing and may provide direction for determination of a passing cutoff score.
Construction of a memory battery for computerized administration, using item response theory.

PubMed

Ferreira, Aristides I; Almeida, Leandro S; Prieto, Gerardo

2012-10-01

Using item response theory (IRT), a computerized memory battery with six tests was constructed for use in the Portuguese adult population. A factor analysis was conducted to assess the internal structure of the tests (N = 547 undergraduate students). Several confirmatory factor models derived from the literature were evaluated. Results showed better fit of a model with two independent latent variables corresponding to verbal and non-verbal factors, reproducing the initial battery organization. Internal consistency reliabilities for the six tests ranged from alpha = .72 to .89. IRT analyses (Rasch and partial credit models) yielded good Infit and Outfit measures and high precision for parameter estimation. The potential utility of these memory tasks for psychological research and practice will be discussed.
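The partial credit model named above gives each score category a probability proportional to exp of the cumulative sum of (θ − δ_k) over the steps reached. A minimal sketch with hypothetical step difficulties follows. With a single step it reduces to the dichotomous Rasch model, which is the case in the usage line.

```python
import math


def pcm_category_probs(theta, deltas):
    """Masters' partial credit model probabilities for categories 0..m.

    deltas are the item's step (threshold) difficulties in logits.
    """
    # Exponents: 0 for category 0, then cumulative sums of (theta - delta_k).
    exponents = [0.0]
    for d in deltas:
        exponents.append(exponents[-1] + (theta - d))
    top = max(exponents)  # subtract the max for numerical stability
    nums = [math.exp(e - top) for e in exponents]
    total = sum(nums)
    return [n / total for n in nums]


# With one step the PCM reduces to the dichotomous Rasch model.
print(pcm_category_probs(theta=0.0, deltas=[0.0]))  # [0.5, 0.5]
```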
Old-fashioned responses in an updating memory task.

PubMed

Ruiz, M; Elosúa, M R; Lechuga, M T

2005-07-01

Errors in a running memory task are analysed. Participants were presented with a variable-length list of items and were asked to report the last four items. It has been proposed (Morris & Jones, 1990) that this task requires two mechanisms: the temporal storage of the target set by the articulatory loop and its updating by the central executive. Two implicit assumptions in this proposal are (a) the preservation of serial order, and (b) participants' capacity to discard earlier items from the target subset as list presentation is running, and new items are appended. Order preservation within the updated target list and the inhibition of the outdated list items should imply a relatively higher rate of location errors for items from the medial positions of the target list and a lower rate of intrusion errors from the outdated and inhibited items from the pretarget positions. Contrary to these expectations, for both consonants (Experiment 1) and words (Experiment 2) we found recency effects and a relatively high rate of intrusions from the final pretarget positions, most of them from the very last. Similar effects were apparent with the embedded four-item lists for catch trials. These results are clearly at odds with the presumed updating by the central executive.
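The error categories discussed in this abstract can be made concrete with a hypothetical scoring function. This sketch assumes unique list items and strict serial scoring of the last four items. It is an illustration only, not the authors' coding scheme.

```python
def classify_running_memory(presented, reported, span=4):
    """Classify reported items in one running memory trial (hypothetical scheme).

    The target set is the last `span` presented items. Each reported item is
    labelled 'correct' (right item, right serial position), 'location'
    (target item in the wrong position), 'intrusion' (a pretarget item, with
    its distance back from the target set), or 'extralist'.
    """
    target = presented[-span:]
    pretarget = presented[:-span]
    labels = []
    for pos, item in enumerate(reported):
        if pos < len(target) and item == target[pos]:
            labels.append(("correct", None))
        elif item in target:
            labels.append(("location", None))
        elif item in pretarget:
            # Distance 1 = the last pretarget position, just before the targets.
            last_index = max(i for i, x in enumerate(pretarget) if x == item)
            labels.append(("intrusion", len(pretarget) - last_index))
        else:
            labels.append(("extralist", None))
    return labels


# Hypothetical consonant list; the target set is Q R T X.
print(classify_running_memory(list("BFKMQRTX"), ["M", "R", "T", "X"]))
```

Under this scheme, the pattern the authors report would appear as many intrusions with distance 1.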
Testing whether the DSM-5 personality disorder trait model can be measured with a reduced set of items: An item response theory investigation of the Personality Inventory for DSM-5.

PubMed

Maples, Jessica L; Carter, Nathan T; Few, Lauren R; Crego, Cristina; Gore, Whitney L; Samuel, Douglas B; Williamson, Rachel L; Lynam, Donald R; Widiger, Thomas A; Markon, Kristian E; Krueger, Robert F; Miller, Joshua D

2015-12-01

The fifth edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM-5) includes an alternative model of personality disorders (PDs) in Section III, consisting in part of a pathological personality trait model. To date, the 220-item Personality Inventory for DSM-5 (PID-5; Krueger, Derringer, Markon, Watson, & Skodol, 2012) is the only extant self-report instrument explicitly developed to measure this pathological trait model. The present study used item response theory-based analyses in a large sample (n = 1,417) to investigate whether a reduced set of 100 items could be identified from the PID-5 that could measure the 25 traits and 5 domains. This reduced set of PID-5 items was then tested in a community sample of adults currently receiving psychological treatment (n = 109). Across a wide range of criterion variables including NEO PI-R domains and facets, DSM-5 Section II PD scores, and externalizing and internalizing outcomes, the correlational profiles of the original and reduced versions of the PID-5 were nearly identical (rICC = .995). These results provide strong support for the hypothesis that an abbreviated set of PID-5 items can be used to reliably, validly, and efficiently assess these personality disorder traits. The ability to assess the DSM-5 Section III traits using only 100 items has important implications in that it suggests these traits could still be measured in settings in which assessment-related resources (e.g., time, compensation) are limited. (c) 2015 APA, all rights reserved.
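The abstract does not define rICC. Assuming it denotes the double-entry intraclass correlation often used to compare correlational profiles, this sketch shows the computation: each profile pair is entered twice in swapped order and the Pearson r of the stacked columns is taken. The profiles are hypothetical.

```python
from statistics import mean


def double_entry_icc(x, y):
    """Double-entry intraclass correlation between two profiles.

    Unlike Pearson r, it is lowered by differences in profile elevation and
    scatter, not only by differences in shape.
    """
    if len(x) != len(y):
        raise ValueError("profiles must have the same length")
    col1, col2 = list(x) + list(y), list(y) + list(x)
    m = mean(col1)  # both stacked columns share the same mean
    num = sum((a - m) * (b - m) for a, b in zip(col1, col2))
    den = sum((a - m) ** 2 for a in col1)  # both columns share this sum of squares
    return num / den


# Hypothetical profiles with the same shape but a one-point elevation shift.
print(round(double_entry_icc([1, 2, 3], [2, 3, 4]), 3))
```

Pearson r for these two profiles would be 1.0, while the double-entry ICC is well below 1, which is why it is a stricter index of profile agreement.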
The Effect of Repeaters on Equating

ERIC Educational Resources Information Center

Kim, HeeKyoung; Kolen, Michael J.

2010-01-01

Test equating might be affected by including in the equating analyses examinees who have taken the test previously. This study evaluated the effect of including such repeaters on Medical College Admission Test (MCAT) equating using a population invariance approach. Three-parameter logistic (3-PL) item response theory (IRT) true score and…
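The abstract above is truncated, but it names three-parameter logistic (3PL) IRT true-score equating. A minimal sketch of the standard procedure follows: find the θ at which the form X true score (the sum of 3PL probabilities) equals a given value, then evaluate the form Y true score at that θ. It assumes item parameters are already on a common scale and uses the scaling constant D = 1.7. All item parameters are hypothetical.

```python
import math

D = 1.7  # logistic scaling constant; an assumption for this sketch


def p_3pl(theta, a, b, c):
    """3PL probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def true_score(theta, items):
    """IRT true score: sum of 3PL probabilities over (a, b, c) items."""
    return sum(p_3pl(theta, a, b, c) for a, b, c in items)


def theta_for_true_score(tau, items, lo=-6.0, hi=6.0, tol=1e-8):
    """Invert the monotone test characteristic curve by bisection."""
    if not sum(c for _, _, c in items) < tau < len(items):
        raise ValueError("true score must lie between sum(c) and the number of items")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if true_score(mid, items) < tau:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def equate_true_score(tau_x, form_x, form_y):
    """Map a form X true score to the form Y scale through a common theta."""
    return true_score(theta_for_true_score(tau_x, form_x), form_y)


# Hypothetical 3-item forms (a, b, c) on a common scale.
form_x = [(1.0, 0.0, 0.2), (1.2, 0.5, 0.2), (0.8, -0.5, 0.25)]
form_y = [(1.1, 0.2, 0.2), (0.9, 0.4, 0.2), (1.0, -0.3, 0.2)]
print(round(equate_true_score(1.5, form_x, form_y), 3))
```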
Dimensions of Acculturation in Native American College Students

ERIC Educational Resources Information Center

Reynolds, Amy L.; Sodano, Sandro M.; Ecklund, Timothy R.; Guyker, Wendy

2012-01-01

Exploratory and confirmatory factor analyses were applied to the responses of two respective independent samples of Native American college students on the Native American Acculturation Scale (NAAS). Three correlated dimensions were found to underlie NAAS items and these dimensions may also comprise a broader higher order dimension of Native…
Validity Study of the Thinking Styles Inventory

ERIC Educational Resources Information Center

Black, Anne C.; McCoach, D. Betsy

2008-01-01

This article examines the psychometric properties of the 104-item Thinking Styles Inventory (TSI; Sternberg & Wagner, 1992) using responses from 789 students from 4 high schools in Connecticut. Twelve of the 13 subscales identified in mental self-government (MSG) theory (Sternberg, 1988, 1997) were included in all analyses. Both subscale- and…
Reporting and methodological quality of meta-analyses in urological literature

PubMed Central

Xu, Jing

2017-01-01

Purpose: To assess the overall quality of published urological meta-analyses and identify predictive factors for high quality. Materials and Methods: We systematically searched PubMed to identify meta-analyses published from January 1st, 2011 to December 31st, 2015 in 10 predetermined major paper-based urology journals. The characteristics of the included meta-analyses were collected, and their reporting and methodological qualities were assessed with the PRISMA checklist (27 items) and the AMSTAR tool (11 items), respectively. Descriptive statistics were used for individual items as a measure of overall compliance, and PRISMA and AMSTAR scores were calculated as the sum of adequately reported domains. Logistic regression was used to identify predictive factors for high quality. Results: A total of 183 meta-analyses were included. The mean PRISMA and AMSTAR scores were 22.74 ± 2.04 and 7.57 ± 1.41, respectively. PRISMA item 5 (protocol and registration), items 15 and 22 (risk of bias across studies), and items 16 and 23 (additional analyses) had less than 50% adherence. AMSTAR item 1 (“a priori” design), item 5 (list of studies), and item 10 (publication bias) had less than 50% adherence. Logistic regression analyses showed that funding support and “a priori” design were associated with superior reporting quality, and that following the PRISMA guideline and “a priori” design were associated with superior methodological quality. Conclusions: Reporting and methodological qualities of recently published meta-analyses in major paper-based urology journals are generally good. Further improvement could potentially be achieved by strictly adhering to the PRISMA guideline and having an “a priori” protocol. PMID:28439452
The Nature of Science Instrument-Elementary (NOSI-E): Using Rasch principles to develop a theoretically grounded scale to measure elementary student understanding of the nature of science

NASA Astrophysics Data System (ADS)

Peoples, Shelagh

The purpose of this study was to determine which of three competing models would provide reliable, interpretable, and responsive measures of elementary students' understanding of the nature of science (NOS). The Nature of Science Instrument-Elementary (NOSI-E), a 28-item Rasch-based instrument, was used to assess students' NOS understanding. The NOS construct was conceptualized using five construct dimensions (Empirical, Inventive, Theory-laden, Certainty, and Socially & Culturally Embedded). The competing models represent three alternative internal structures for the NOS construct. The first postulates that the NOS construct is unidimensional, with one latent construct explaining the relationship among the 28 items of the NOSI-E. The second holds that the NOS construct is composed of five independent unidimensional constructs (the consecutive approach). The third holds that the NOS construct is multidimensional and composed of five inter-related but separate dimensions. A validity argument was developed that hypothesized that the internal structure of the NOS construct is best represented by the multidimensional Rasch model. Four sets of analyses were performed in which the three representations were compared. These analyses addressed five aspects of construct validity (content, substantive, generalizability, structural, and external). The vast body of evidence supported the claim that the NOS construct is composed of five separate but inter-related dimensions that are best represented by the multidimensional Rasch model. The results of the multidimensional analyses indicated that the items of the five subscales were of excellent technical quality, exhibited no differential item functioning (based on gender), and had an item hierarchy that conformed to theoretical expectations; together they formed subscales of reasonable reliability (> 0.7 on each subscale) that were responsive to change in the construct. Theory-laden scores from the multidimensional model predicted students' science achievement, and scores from all five NOS dimensions significantly predicted students' perceptions of the constructivist nature of their classroom learning environment. The NOSI-E instrument is a theoretically grounded scale that can measure elementary students' NOS understanding and appears suitable for use in science education research.
Development and validation of a patient-reported outcome measure for stroke patients.

PubMed

Luo, Yanhong; Yang, Jie; Zhang, Yanbo

2015-05-08

Family support and patient satisfaction with treatment are crucial for aiding recovery from stroke. However, currently validated stroke-specific questionnaires may not adequately capture the impact of these two variables on patients undergoing clinical trials of new drugs. Therefore, the aim of this study was to develop and evaluate a new stroke patient-reported outcome measure (Stroke-PROM) instrument for capturing more comprehensive effects of stroke on patients participating in clinical trials of new drugs. A conceptual framework and a pool of items for the preliminary Stroke-PROM were generated by consulting the relevant literature and other questionnaires created in China and other countries, and by interviewing 20 patients and 4 experts to ensure that all germane parameters were included. During the first item-selection phase, classical test theory and item response theory were applied to an initial scale completed by 133 patients with stroke. During the item re-evaluation phase, classical test theory and item response theory were used again, this time with 475 patients with stroke and 104 healthy participants. During the scale assessment phase, confirmatory factor analysis was applied to the final scale of the Stroke-PROM using the same study population as in the second item-selection phase. Reliability, validity, responsiveness, and feasibility of the final scale were tested. The final scale of the Stroke-PROM contained 46 items describing four domains (physiology, psychology, society, and treatment). These four domains were subdivided into 10 subdomains. Cronbach's α coefficients for the four domains ranged from 0.861 to 0.908. Confirmatory factor analysis supported the validity of the final scale, and the model fit index satisfied the criterion. Differences in the Stroke-PROM mean scores between patients with stroke and healthy participants were significant in nine subdomains (P < 0.001), indicating that the scale showed good responsiveness. The Stroke-PROM is a multidimensional patient-reported outcome questionnaire developed especially for clinical trials of new drugs and focused on issues of family support and patient satisfaction with treatment. Extensive data analyses supported the validity, reliability, and responsiveness of the Stroke-PROM.
"Don't know" responses to risk perception measures: implications for underserved populations.

PubMed

Waters, Erika A; Hay, Jennifer L; Orom, Heather; Kiviniemi, Marc T; Drake, Bettina F

2013-02-01

Risk perceptions are legitimate targets for behavioral interventions because they can motivate medical decisions and health behaviors. However, some survey respondents may not know (or may not indicate) their risk perceptions. The scope of "don't know" (DK) responding is unknown. This study examined the prevalence and correlates of responding DK to items assessing perceived risk of colorectal cancer. Data came from two nationally representative, population-based, cross-sectional surveys (2005 National Health Interview Survey [NHIS]; 2005 Health Information National Trends Survey [HINTS]) and one primary care clinic-based survey of individuals from low-income communities. Analyses included 31,202 (NHIS), 1,937 (HINTS), and 769 (clinic) individuals. Five items assessed perceived risk of colorectal cancer. Four of the items differed in format and/or response scale: comparative risk (NHIS, HINTS); absolute risk (HINTS, clinic); and "likelihood" and "chance" response scales (clinic). Only the clinic-based survey included an explicit DK response option. "Don't know" responding was 6.9% (NHIS), 7.5% (HINTS-comparative), and 8.7% (HINTS-absolute). "Don't know" responding was 49.1% and 69.3% for the "chance" and "likely" response options (clinic). Correlates of DK responding were characteristics generally associated with disparities (e.g., low education), but the pattern of results varied among samples, question formats, and response scales. The surveys were developed independently and employed different methodologies and items. Consequently, the results were not directly comparable. There may be multiple explanations for differences in the magnitude and characteristics of DK responding. "Don't know" responding is more prevalent in populations affected by health disparities. Either not assessing or not analyzing DK responses could further disenfranchise these populations and negatively affect the validity of research and the efficacy of interventions seeking to eliminate health disparities.
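The abstract's point that DK answers should be retained and analysed, not dropped, can be illustrated with a small sketch. It computes the overall DK rate and the rate within each level of a grouping variable such as education. The response labels, groups, and data are hypothetical.

```python
from collections import defaultdict

DK = "don't know"  # hypothetical label for the explicit DK option


def dk_rates(responses, groups):
    """Share of 'don't know' answers overall and within each group."""
    counts = defaultdict(lambda: [0, 0])  # group -> [DK count, total]
    for answer, group in zip(responses, groups):
        counts[group][0] += int(answer == DK)
        counts[group][1] += 1
    overall = sum(int(a == DK) for a in responses) / len(responses)
    by_group = {g: dk / n for g, (dk, n) in counts.items()}
    return overall, by_group


# Hypothetical answers to one risk item, grouped by education level.
answers = ["likely", DK, DK, "unlikely", DK, "likely"]
education = ["low", "low", "low", "high", "high", "high"]
print(dk_rates(answers, education))
```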

A symptom profile of depression among Asian Americans: is there evidence for differential item functioning of depressive symptoms?

PubMed

Kalibatseva, Z; Leong, F T L; Ham, E H

2014-09-01

Theoretical and clinical publications suggest the existence of cultural differences in the expression and experience of depression. Measurement non-equivalence remains a potential methodological explanation for the lower prevalence of depression among Asian Americans compared to European Americans. This study compared DSM-IV depressive symptoms among Asian Americans and European Americans using secondary data analysis of the Collaborative Psychiatric Epidemiology Surveys (CPES). The Composite International Diagnostic Interview (CIDI) was used for the assessment of depressive symptoms. Of the entire sample, 310 Asian Americans and 1974 European Americans reported depressive symptoms and were included in the analyses. Measurement invariance was examined with an item response theory differential item functioning (IRT DIF) analysis. χ² analyses indicated that, compared to Asian Americans, European American participants more frequently endorsed affective symptoms such as 'feeling depressed', 'feeling discouraged' and 'cried more often'. The IRT analysis detected DIF for four of the 15 depression symptom items. At equal levels of depression, Asian Americans endorsed feeling worthless and appetite changes more easily than European Americans, and European Americans endorsed feeling nervous and crying more often than Asian Americans. Asian Americans did not seem to over-report somatic symptoms; however, European Americans seemed to report more affective symptoms than Asian Americans. The results suggest measurement non-invariance in a few of the depression items.
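The finding that, "at equal levels of depression," one group endorses an item "more easily" is the defining feature of uniform DIF. It can be pictured as two item characteristic curves that differ only in location. The sketch below assumes a two-parameter logistic (2PL) model, and all parameter values are hypothetical. It is not the study's estimates, nor the DIF detection procedure the authors used.

```python
import math

def p_endorse(theta, a, b):
    """Two-parameter logistic (2PL) probability of endorsing a symptom item.

    theta: latent depression level; a: discrimination; b: location (severity).
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical parameters for one item, calibrated separately in two groups.
# Uniform DIF: same slope, different location, so the curves never cross.
item_group_1 = {"a": 1.5, "b": 0.8}   # reference group (illustrative values)
item_group_2 = {"a": 1.5, "b": 0.3}   # focal group (illustrative values)

for theta in (-1.0, 0.0, 1.0):
    p1 = p_endorse(theta, **item_group_1)
    p2 = p_endorse(theta, **item_group_2)
    # At the same theta, a lower b means the item is endorsed "more easily".
    print(f"theta={theta:+.1f}  group1={p1:.3f}  group2={p2:.3f}")
```

If the slopes `a` also differed between groups, the curves could cross. That pattern is usually called non-uniform DIF.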
Psychometric assessment of the Behavior and Attitudes Questionnaire for Healthy Habits: measuring parents' views on food and physical activity.

PubMed

Henry, Beverly W; Smith, Thomas J; Ahmad, Saadia

2014-05-01

To assess parents' perspectives on their home environments and to establish the validity of scores from the Behavior and Attitudes Questionnaire for Healthy Habits (BAQ-HH). In the present descriptive study, we surveyed a cross-sectional sample of parents of pre-school children. Questionnaire items developed in an iterative process with community-based programming addressed parents' knowledge/awareness, attitudes/concerns and behaviours about healthy foods and physical activity habits with 6-point rating scales. Exploratory and confirmatory factor analyses were used to psychometrically evaluate scores from the scales. English and Spanish versions of the BAQ-HH were administered at parent-teacher conferences for pre-school children at ten Head Start centres across a five-county agency in autumn 2010. From 672 families with pre-school children, 532 parents provided responses to the BAQ-HH (79 % response rate). Most respondents were female (83 %); 66 % were Hispanic and 16 % white; and 85 % were aged 20 to 39 years. Exploratory and confirmatory analyses revealed a knowledge scale (seven items), an attitude scale (four items) and three behaviour subscales (three items each). Parents' perceptions of home activities correlated with their reports of children's habits. Differences were identified by gender and ethnicity groupings. As a first step in psychometric testing, the dimensionality of each of the three scales (Knowledge, Attitudes and Behaviours) was identified and scale scores were related to other indicators of child behaviours and parents' demographic characteristics. This questionnaire offers a method to measure parents' views to inform planning and monitoring of obesity-prevention education programmes.
The impact of intrinsic and extrinsic factors on the job satisfaction of dentists.

PubMed

Goetz, K; Campbell, S M; Broge, B; Dörfer, C E; Brodowski, M; Szecsenyi, J

2012-10-01

The Two-Factor Theory of job satisfaction distinguishes between intrinsic-motivation factors (e.g. recognition, responsibility) and extrinsic-hygiene factors (e.g. job security, salary, working conditions). The presence of intrinsic motivators facilitates higher satisfaction and performance, whereas adequate extrinsic factors help prevent dissatisfaction. Considering these factors and their impact on dentists' job satisfaction is essential for recruiting and retaining dentists. The study aimed to assess the level of job satisfaction of German dentists and the factors associated with it. This cross-sectional study was based on a job satisfaction survey. Data were collected from 147 dentists working in 106 dental practices. Job satisfaction was measured with the 10-item Warr-Cook-Wall job satisfaction scale, and organizational characteristics were measured with two items. Linear regression analyses were performed with each of the nine items of the job satisfaction scale (excluding overall satisfaction) as the dependent variable. A stepwise linear regression analysis was then performed with overall job satisfaction as the dependent variable and, as predictors, the nine job satisfaction items and the two organizational-characteristic items, controlling for age and gender. The response rate was 95.0%. Dentists were satisfied with 'freedom of working method' and least satisfied with their 'income'; both are extrinsic factors. The regression analyses identified five variables that were significantly associated with each item of the job satisfaction scale: 'age', 'mean weekly working time', 'period in the practice', 'number of dentist's assistant' and 'working atmosphere'. In the stepwise linear regression analysis, the intrinsic factor 'opportunity to use abilities' (β = 0.687) explained the most variance in overall job satisfaction (R² = 0.468). 
With respect to the Two-Factor Theory of job satisfaction, both components, intrinsic and extrinsic, are essential for dentists, but intrinsic motivating factors such as the opportunity to use abilities have the most positive impact on job satisfaction. The findings of this study should help inform further efforts to improve dentists' working conditions and to ensure quality of care. © 2012 John Wiley & Sons A/S.
Responses to Three USARIEM Job Analysis Questionnaires (JAQs) Conducted with Cavalry Scouts and Armor Crewmen (MOSs 19D and 19K)

DTIC Science & Technology

2016-11-18

Researchers from the U.S. Army Research Institute of Environmental Medicine (USARIEM) designed and conducted a total of three web-administered job...USARIEM) and Human Performance Systems, Inc. designed three web-administered job analysis questionnaires (JAQs) to be completed by Army cavalry scouts and...responses from Soldiers in many Army MOSs. This may have affected the quality of some item responses. 3) This survey was web-administered, and
Examining the Measurement Precision and Invariance of the Revised Get Ready to Read!

PubMed Central

Farrington, Amber L.; Lonigan, Christopher J.

2016-01-01

Children's emergent literacy skills are highly predictive of later reading abilities. To determine which children have weaker emergent literacy skills and are in need of intervention, it is necessary to assess emergent literacy skills accurately and reliably. In this study, 1,351 children were administered the Revised Get Ready to Read! (GRTR-R), and an item response theory analysis was used to evaluate the item-level reliability of the measure. Differential item functioning (DIF) analyses were conducted to examine whether items function similarly between subpopulations of children. The GRTR-R had acceptable reliability for children whose ability level was just below the mean. DIF for a small number of items was present for only two comparisons—children who were older versus younger and children who were White versus African American. These results demonstrate that the GRTR-R has acceptable reliability and limited DIF, enabling the screener to identify those at risk for developing reading problems. PMID:23851136
The (mis)measurement of the Dark Triad Dirty Dozen: exploitation at the core of the scale

PubMed Central

Kajonius, Petri J.; Persson, Björn N.; Rosenberg, Patricia

2016-01-01

Background. The dark side of human character has been conceptualized in the Dark Triad Model: Machiavellianism, psychopathy, and narcissism. These three dark traits are often measured using a separate long instrument for each trait. Nevertheless, psychological research needs short and valid personality measures. As an independent research group, we replicated the factor structure, convergent validity and item responses for one of the most recent and widely used short measures of these malevolent traits, namely, Jonason’s Dark Triad Dirty Dozen. We aimed to expand the understanding of what the Dirty Dozen really captures, given the mixed results on construct validity in previous research. Method. We used the largest sample to date to respond to the Dirty Dozen (N = 3,698). First, we investigated the factor structure using Confirmatory Factor Analysis and an exploratory distribution analysis of the items in the Dirty Dozen. Second, using a sub-sample (n = 500) and correlation analyses, we investigated the convergent validity of the Dirty Dozen dark traits with Machiavellianism measured by the Mach-IV, psychopathy measured by Eysenck’s Personality Questionnaire Revised, narcissism measured by the Narcissistic Personality Inventory, and both neuroticism and extraversion from Eysenck’s questionnaire. Finally, in addition to these Classical Test Theory analyses, we analyzed the responses to each Dirty Dozen item using Item Response Theory (IRT). Results. The results confirmed previous findings of a bi-factor model fit: one latent core dark trait and three dark traits. All three Dirty Dozen traits had a striking bimodal distribution, which might indicate unconcealed social undesirability of the items. The three Dirty Dozen traits did converge, although not strongly, with the corresponding single Dark Triad scales (r between .41 and .49). 
The probabilities of endorsing successive response steps on the Dirty Dozen narcissism items were much higher than on the Dirty Dozen items for Machiavellianism and psychopathy. Overall, the Dirty Dozen instrument delivered the most predictive value for persons with average and high Dark Triad traits (theta > −0.5). Moreover, the Dirty Dozen scale was better conceptualized as a combined Machiavellianism-psychopathy factor, not narcissism, and is well captured by item 4: ‘I tend to exploit others towards my own end.’ Conclusion. The Dirty Dozen showed a consistent factor structure and convergent validity comparable to that found in earlier studies. Narcissism measured using the Dirty Dozen, however, did not contribute information to the core of the Dirty Dozen construct. More importantly, the results imply that the core of the Dirty Dozen scale, a manipulative and anti-social trait, can be measured by a Single Item Dirty Dark Dyad (SIDDD). PMID:26966673
The (mis)measurement of the Dark Triad Dirty Dozen: exploitation at the core of the scale.

PubMed

Kajonius, Petri J; Persson, Björn N; Rosenberg, Patricia; Garcia, Danilo

2016-01-01

Background. The dark side of human character has been conceptualized in the Dark Triad Model: Machiavellianism, psychopathy, and narcissism. These three dark traits are often measured using a separate long instrument for each trait. Nevertheless, psychological research needs short and valid personality measures. As an independent research group, we replicated the factor structure, convergent validity and item responses for one of the most recent and widely used short measures of these malevolent traits, namely, Jonason's Dark Triad Dirty Dozen. We aimed to expand the understanding of what the Dirty Dozen really captures, given the mixed results on construct validity in previous research. Method. We used the largest sample to date to respond to the Dirty Dozen (N = 3,698). First, we investigated the factor structure using Confirmatory Factor Analysis and an exploratory distribution analysis of the items in the Dirty Dozen. Second, using a sub-sample (n = 500) and correlation analyses, we investigated the convergent validity of the Dirty Dozen dark traits with Machiavellianism measured by the Mach-IV, psychopathy measured by Eysenck's Personality Questionnaire Revised, narcissism measured by the Narcissistic Personality Inventory, and both neuroticism and extraversion from Eysenck's questionnaire. Finally, in addition to these Classical Test Theory analyses, we analyzed the responses to each Dirty Dozen item using Item Response Theory (IRT). Results. The results confirmed previous findings of a bi-factor model fit: one latent core dark trait and three dark traits. All three Dirty Dozen traits had a striking bimodal distribution, which might indicate unconcealed social undesirability of the items. The three Dirty Dozen traits did converge, although not strongly, with the corresponding single Dark Triad scales (r between .41 and .49). 
The probabilities of endorsing successive response steps on the Dirty Dozen narcissism items were much higher than on the Dirty Dozen items for Machiavellianism and psychopathy. Overall, the Dirty Dozen instrument delivered the most predictive value for persons with average and high Dark Triad traits (theta > -0.5). Moreover, the Dirty Dozen scale was better conceptualized as a combined Machiavellianism-psychopathy factor, not narcissism, and is well captured by item 4: 'I tend to exploit others towards my own end.' Conclusion. The Dirty Dozen showed a consistent factor structure and convergent validity comparable to that found in earlier studies. Narcissism measured using the Dirty Dozen, however, did not contribute information to the core of the Dirty Dozen construct. More importantly, the results imply that the core of the Dirty Dozen scale, a manipulative and anti-social trait, can be measured by a Single Item Dirty Dark Dyad (SIDDD).
More to it than meets the eye: how eye movements can elucidate the development of episodic memory.

PubMed

Pathman, Thanujeni; Ghetti, Simona

2016-07-01

The ability to recognise past events along with the contexts in which they occurred is a hallmark of episodic memory, a critical capacity. Eye movements have been shown to track veridical memory for the associations between events and their contexts (relational binding). Such eye-movement effects emerge several seconds before, or in the absence of, explicit response, and are linked to the integrity and function of the hippocampus. Drawing from research from infancy through late childhood, and by comparing to investigations from typical adults, patient populations, and animal models, it seems increasingly clear that eye movements reflect item-item, item-temporal, and item-spatial associations in developmental populations. We analyse this line of work, identify missing pieces in the literature and outline future avenues of research, in order to help elucidate the development of episodic memory.
Clinical vs. Self-report Versions of the Quick Inventory of Depressive Symptomatology in a Public Sector Sample

PubMed Central

Bernstein, Ira H.; Rush, A. John; Carmody, Thomas J.; Woo, Ada; Trivedi, Madhukar H.

2007-01-01

Objectives Recent work using classical test theory (CTT) and item response theory (IRT) has found that the self-report (QIDS-SR16) and clinician-rated (QIDS-C16) versions of the 16-item Quick Inventory of Depressive Symptomatology were generally comparable in outpatients with nonpsychotic major depressive disorder (MDD). Using IRT and selected classical test analyses, this report extends this comparison to a less well-educated, more treatment-resistant sample that included more ethnic/racial minorities. Methods The QIDS-SR16 and QIDS-C16 were obtained in a sample of 441 outpatients with nonpsychotic MDD seen in the public sector in the Texas Medication Algorithm Project (TMAP). The Samejima graded response IRT model was used to compare the QIDS-SR16 and QIDS-C16. Results The nine symptom domains in the QIDS-SR16 and QIDS-C16 related well to overall depression. The slopes of the item response functions (a), which index the strength of relationship between overall depression and each symptom, were extremely similar for the two measures. Likewise, the CTT and IRT indices of symptom frequency (item means and locations of the item response functions, b_i) were also similar for these two measures. For example, sad mood and difficulty with concentration/decision making were highly related to the overall depression severity with both the QIDS-C16 and QIDS-SR16. Likewise, sleeping difficulties were commonly reported, even though they were not as strongly related to overall magnitude of depression. Conclusion In this less educated, socially disadvantaged sample, differences between the QIDS-C16 and QIDS-SR16 were minor. The QIDS-SR16 is a satisfactory substitute for the more time-consuming QIDS-C16 in a broad range of adult, nonpsychotic, depressed outpatients. PMID:16716351
Clinical vs. self-report versions of the quick inventory of depressive symptomatology in a public sector sample.

PubMed

Bernstein, Ira H; Rush, A John; Carmody, Thomas J; Woo, Ada; Trivedi, Madhukar H

2007-01-01

Recent work using classical test theory (CTT) and item response theory (IRT) has found that the self-report (QIDS-SR(16)) and clinician-rated (QIDS-C(16)) versions of the 16-item quick inventory of depressive symptomatology were generally comparable in outpatients with nonpsychotic major depressive disorder (MDD). Using IRT and selected classical test analyses, this report extends this comparison to a less well-educated, more treatment-resistant sample that included more ethnic/racial minorities. The QIDS-SR(16) and QIDS-C(16) were obtained in a sample of 441 outpatients with nonpsychotic MDD seen in the public sector in the Texas Medication Algorithm Project (TMAP). The Samejima graded response IRT model was used to compare the QIDS-SR(16) and QIDS-C(16). The nine symptom domains in the QIDS-SR(16) and QIDS-C(16) related well to overall depression. The slopes of the item response functions (a), which index the strength of relationship between overall depression and each symptom, were extremely similar for the two measures. Likewise, the CTT and IRT indices of symptom frequency (item means and locations of the item response functions, b(i)) were also similar for these two measures. For example, sad mood and difficulty with concentration/decision making were highly related to the overall depression severity with both the QIDS-C(16) and QIDS-SR(16). Likewise, sleeping difficulties were commonly reported, even though they were not as strongly related to overall magnitude of depression. In this less educated, socially disadvantaged sample, differences between the QIDS-C(16) and QIDS-SR(16) were minor. The QIDS-SR(16) is a satisfactory substitute for the more time-consuming QIDS-C(16) in a broad range of adult, nonpsychotic, depressed outpatients.
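The QIDS comparison above rests on Samejima's graded response model. In this model, each item has one slope (a) and a set of ordered locations (b) that separate adjacent response categories.

The sketch below computes category probabilities by differencing the cumulative boundary curves P*(X >= k). It assumes a single item scored 0 to 3 with hypothetical parameter values. It does not estimate parameters, which would require dedicated IRT software.

```python
import math

def grm_category_probs(theta, a, thresholds):
    """Samejima graded response model: P(X = k | theta) for k = 0..K.

    thresholds must be increasing: b_1 < b_2 < ... < b_K.
    """
    # Cumulative boundary curves P*(X >= k); P*(X >= 0) = 1, P*(X >= K+1) = 0.
    cum = [1.0]
    for b in thresholds:
        cum.append(1.0 / (1.0 + math.exp(-a * (theta - b))))
    cum.append(0.0)
    # Category probability = difference of adjacent boundary curves.
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]

# Hypothetical 4-category (0-3) symptom item.
probs = grm_category_probs(theta=0.5, a=2.0, thresholds=[-1.0, 0.5, 1.5])
print([round(p, 3) for p in probs])
```

A larger `a` makes the boundary curves steeper, so the item separates respondents more sharply. The `thresholds` shift where on the depression continuum each category becomes likely.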
Cross-cultural adaptation of the Work Role Functioning Questionnaire 2.0 to Norwegian and Danish.

PubMed

Johansen, Thomas; Lund, Thomas; Jensen, Chris; Momsen, Anne-Mette Hedeager; Eftedal, Monica; Øyeflaten, Irene; Braathen, Tore N; Stapelfeldt, Christina M; Amick, Ben; Labriola, Merete

2018-01-01

A healthy and productive working life has attracted attention owing to future employment and demographic challenges. The aim was to translate and adapt the Work Role Functioning Questionnaire (WRFQ) 2.0 to Norwegian and Danish. The WRFQ is a self-administered tool developed to identify health-related work limitations. Standardised cross-cultural adaptation procedures were followed in both countries' translation processes. Direct translation, synthesis, back translation and consolidation were carried out successfully. A pre-test among 78 employees who had returned to work after sickness absence found idiomatic issues requiring reformulation in the instructions, in four items of the Norwegian version, and in three items of the Danish version. In the final versions, seven items were adjusted in each country. Psychometric properties were analysed for the Norwegian sample (n = 40) and preliminary Cronbach's alpha coefficients were satisfactory. A final consensus process was performed to achieve similar titles and introductions. The WRFQ 2.0 cross-cultural adaptation to Norwegian and Danish was performed and consensus was obtained. Future validation studies will examine validity, reliability, responsiveness and differential item response. The WRFQ can be used to elucidate both individual and work environmental factors leading to a more holistic approach in work rehabilitation.
Testing measurement invariance of the patient-reported outcomes measurement information system pain behaviors score between the US general population sample and a sample of individuals with chronic pain.

PubMed

Chung, Hyewon; Kim, Jiseon; Cook, Karon F; Askew, Robert L; Revicki, Dennis A; Amtmann, Dagmar

2014-02-01

To test the difference between group means, the construct measured must have the same meaning for all groups under investigation. This study examined the measurement invariance of responses to the patient-reported outcomes measurement information system (PROMIS) pain behavior (PB) item bank in two samples: the PROMIS calibration sample (Wave 1, N = 426) and a sample recruited from the American Chronic Pain Association (ACPA, N = 750). The ACPA data were collected to increase the number of participants with higher levels of pain. Multi-group confirmatory factor analysis (MG-CFA) and two item response theory (IRT)-based differential item functioning (DIF) approaches were employed to evaluate measurement invariance. MG-CFA results supported metric invariance of the PROMIS-PB, indicating that unstandardized factor loadings were equal across samples. DIF analyses revealed that the impact of the 6 items flagged for DIF was negligible. Based on the results of both the MG-CFA and the IRT-based DIF approaches, we recommend retaining the original parameter estimates obtained from the combined samples.
African media coverage of tobacco industry corporate social responsibility initiatives.

PubMed

McDaniel, Patricia A; Cadman, Brie; Malone, Ruth E

2018-02-01

Guidelines for implementing the World Health Organization's Framework Convention on Tobacco Control (FCTC) recommend prohibiting tobacco industry corporate social responsibility (CSR) initiatives, but few African countries have done so. We examined African media coverage of tobacco industry CSR initiatives to understand whether and how such initiatives were presented to the public and policymakers. We searched two online media databases (Lexis Nexis and Access World News) for all news items published from 1998 to 2013, coding retrieved items through a collaborative, iterative process. We analysed the volume, type, provenance, slant and content of coverage, including the presence of tobacco control or tobacco interest themes. We found 288 news items; most were news stories published in print newspapers. The majority of news stories relied solely on tobacco industry representatives as news sources, and portrayed tobacco industry CSR positively. When public health voices and tobacco control themes were included, news items were less likely to have a positive slant. This suggests that there is a foundation on which to build media advocacy efforts. Drawing links between implementing the FCTC and prohibiting or curtailing tobacco industry CSR programmes may result in more public dialogue in the media about the negative impacts of tobacco company CSR initiatives.
Guideline appraisal with AGREE II: online survey of the potential influence of AGREE II items on overall assessment of guideline quality and recommendation for use.

PubMed

Hoffmann-Eßer, Wiebke; Siering, Ulrich; Neugebauer, Edmund A M; Brockhaus, Anne Catharina; McGauran, Natalie; Eikermann, Michaela

2018-02-27

The AGREE II instrument is the most commonly used guideline appraisal tool. It includes 23 appraisal criteria (items) organized within six domains. AGREE II also includes two overall assessments (overall guideline quality, recommendation for use). Our aim was to investigate how strongly the 23 AGREE II items influence the two overall assessments. An online survey of authors of publications on guideline appraisals with AGREE II and guideline users from a German scientific network was conducted between 10th February 2015 and 30th March 2015. Participants were asked to rate the influence of the AGREE II items on a Likert scale (0 = no influence to 5 = very strong influence). The frequencies of responses and their dispersion were presented descriptively. Fifty-eight of the 376 persons contacted (15.4%) participated in the survey, and the data of the 51 respondents with prior knowledge of AGREE II were analysed. Items 7-12 of Domain 3 (rigour of development) and both items of Domain 6 (editorial independence) had the strongest influence on the two overall assessments. In addition, Items 15-17 (clarity of presentation) had a strong influence on the recommendation for use. Ratings for the other items varied widely. The main limitation of the survey is the low response rate. In guideline appraisals using AGREE II, items representing rigour of guideline development and editorial independence seem to have the strongest influence on the two overall assessments. To ensure a transparent approach to reaching the overall assessments, we suggest adding a recommendation to the AGREE II user manual on how to consider item and domain scores. For instance, the manual could include an a priori weighting of those items and domains that should have the strongest influence on the two overall assessments. The relevance of these assessments within AGREE II could thereby be further specified.
International field testing of the psychometric properties of an EORTC quality of life module for oral health: the EORTC QLQ-OH15.

PubMed

Hjermstad, Marianne J; Bergenmar, Mia; Bjordal, Kristin; Fisher, Sheila E; Hofmeister, Dirk; Montel, Sébastien; Nicolatou-Galitis, Ourania; Pinto, Monica; Raber-Durlacher, Judith; Singer, Susanne; Tomaszewska, Iwona M; Tomaszewski, Krzysztof A; Verdonck-de Leeuw, Irma; Yarom, Noam; Winstanley, Julie B; Herlofson, Bente B

2016-09-01

This international EORTC validation study (phase IV) is aimed at testing the psychometric properties of a quality of life (QoL) module related to oral health problems in cancer patients. The phase III module comprised 17 items with four hypothesized multi-item scales and three single items. In phase IV, patients with mixed cancers in different treatment phases from 10 countries completed the EORTC QLQ-C30, the QLQ-OH module, and a debriefing interview. The hypothesized structure was tested using combinations of classical test theory and item response theory, following EORTC guidelines. Test-retest assessments and responsiveness to change analysis (RCA) were performed after 2 weeks. Five hundred seventy-two patients (median age 60.3, 54 % females) were analyzed. Completion took <10 min for 84 % of patients, and 40 % expressed satisfaction that these issues were addressed. Analyses suggested a revision of the phase III hypothesized scale structure. Two items were deleted based on a high degree of item misfit, together with negative patient feedback. The remaining 15 items formed one eight-item scale named OH-QoL score, a two-item information scale, a two-item scale regarding dentures, and three single items (sticky saliva/mouth soreness/sensitivity to food/drink). Face and convergent validity and internal consistency were confirmed. Test-retest reliability (n = 60) was demonstrated, as was RCA for patients undergoing chemotherapy (n = 117; p = 0.06). The resulting QLQ-OH15 discriminated between clinically distinct patient groups, e.g., low performance status vs. higher (p < 0.001), and head-and-neck cancer versus other cancers (p < 0.03). The EORTC module QLQ-OH15 is a short, well-accepted assessment tool focusing on oral problems and QoL to improve clinical management. ClinicalTrials.gov Identifier: NCT01724333.
Rasch analysis of the Chedoke-McMaster Attitudes towards Children with Handicaps scale.

PubMed

Armstrong, Megan; Morris, Christopher; Tarrant, Mark; Abraham, Charles; Horton, Mike C

2017-02-01

Aim To assess whether the Chedoke-McMaster Attitudes towards Children with Handicaps (CATCH) 36-item total scale and subscales fit the unidimensional Rasch model. Method The CATCH was administered to 1881 children, aged 7-16 years in a cross-sectional survey. Data were used from a random sample of 416 for the initial Rasch analysis. The analysis was performed on the 36-item scale and then separately for each subscale. The analysis explored fit to the Rasch model in terms of overall scale fit, individual item fit, item response categories, and unidimensionality. Item bias for gender and school level was also assessed. Revised scales were then tested on an independent second random sample of 415 children. Results Analyses indicated that the 36-item overall scale was not unidimensional and did not fit the Rasch model. Two scales of affective attitudes and behavioural intention were retained after four items were removed from each due to misfit to the Rasch model. Additionally, the scaling was improved when the two most negative response categories were aggregated. There was no item bias by gender or school level on the revised scales. Items assessing cognitive attitudes did not fit the Rasch model and had low internal consistency as a scale. Conclusion Affective attitudes and behavioural intention CATCH sub-scales should be treated separately. Caution should be exercised when using the cognitive subscale. Implications for Rehabilitation The 36-item Chedoke-McMaster Attitudes towards Children with Handicaps (CATCH) scale as a whole did not fit the Rasch model, indicating that it is multidimensional. Researchers should use two revised eight-item subscales of affective attitudes and behavioural intentions when exploring interventions aiming to improve children's attitudes towards disabled people or factors associated with those attitudes. Researchers should use the cognitive subscale with caution, as it did not create a unidimensional and internally consistent scale. 
Therefore, conclusions drawn from this scale may not accurately reflect children's attitudes.
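The CATCH analysis improved scaling by aggregating the two most negative response categories. In polytomous Rasch models, a common motivation for this is disordered thresholds, where a higher category is never the most likely response at any trait level.

The sketch below assumes the partial credit model, one common polytomous Rasch variant; the abstract does not say which variant or software was used. It also assumes responses are scored 0..K with 0 the most negative category, and all threshold values are hypothetical.

```python
import math

def pcm_category_probs(theta, deltas):
    """Partial credit (polytomous Rasch) model: P(X = k | theta), k = 0..len(deltas).

    deltas[j] is the threshold between categories j and j + 1.
    """
    # Log-numerator for category k is the sum over j < k of (theta - deltas[j]).
    logits = [0.0]
    for d in deltas:
        logits.append(logits[-1] + (theta - d))
    top = max(logits)  # subtract the max before exponentiating, for stability
    weights = [math.exp(v - top) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]

def thresholds_ordered(deltas):
    """True if thresholds strictly increase, i.e. no disordered categories."""
    return all(lo < hi for lo, hi in zip(deltas, deltas[1:]))

def merge_two_lowest(responses):
    """Recode 0..K scores so categories 0 and 1 merge: 0,1 -> 0 and k -> k - 1."""
    return [max(r - 1, 0) for r in responses]

# Hypothetical 5-category item whose first two thresholds are disordered.
deltas = [-0.4, -0.9, 0.3, 1.2]
print(thresholds_ordered(deltas))                      # False
print(merge_two_lowest([0, 1, 2, 3, 4]))               # [0, 0, 1, 2, 3]
print(round(sum(pcm_category_probs(0.5, deltas)), 6))  # 1.0
```

The recode only prepares the data. The thresholds of the merged item would then have to be re-estimated with Rasch software, which this sketch does not attempt.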
A New Approach to Response Sets in Analysis of a Test of Motivation to Achieve. A Section of the Final Report for 1969-70.

ERIC Educational Resources Information Center

Adkins, Dorothy C.; Ballif, Bonnie L.

Gumpgookies, an objective-projective test of school achievement motivation for children 3 1/2 to 8 years old, was reduced from 100 to 75 items following extensive factor analyses. This revised test attempted to dissipate the effects of response sets of the subjects and was prepared in three versions--an individual form, a group form for non-readers,…
Development of the Abbreviated Masculine Gender Role Stress Scale

PubMed Central

Swartout, Kevin M.; Parrott, Dominic J.; Cohn, Amy M.; Hagman, Brett T.; Gallagher, Kathryn E.

2014-01-01

Data gathered from six independent samples (n = 1,729) that assessed masculine gender role stress in college and community men were aggregated and used to determine the reliability and validity of an abbreviated version of the Masculine Gender Role Stress Scale (MGRS scale). The 15 items with the highest item-to-total scale correlations were used to create an abbreviated MGRS scale. Psychometric properties of each of the 15 items were examined with Item Response Theory (IRT) analysis, using the discrimination and threshold parameters. IRT results showed that the abbreviated scale may hold promise at capturing the same amount of information as the full 40-item scale. Relative to the 40-item scale, the total score of the abbreviated MGRS scale demonstrated comparable convergent validity using the measurement domains of masculine identity, hyper-masculinity, trait anger, anger expression, and alcohol involvement. An abbreviated MGRS scale may be recommended for use in clinical practice and research settings to reduce cost, time, and patient/participant burden. Additionally, IRT analyses identified items with higher discrimination and threshold parameters that may be used to screen for problematic gender role stress in men who may be seen in routine clinical or medical practice. PMID:25528163
Development of the Abbreviated Masculine Gender Role Stress Scale.

PubMed

Swartout, Kevin M; Parrott, Dominic J; Cohn, Amy M; Hagman, Brett T; Gallagher, Kathryn E

2015-06-01

Data gathered from 6 independent samples (n = 1,729) that assessed masculine gender role stress among college and community men were aggregated and used to determine the reliability and validity of an abbreviated version of the Masculine Gender Role Stress (MGRS) Scale. The 15 items with the highest item-to-total scale correlations were used to create an abbreviated MGRS Scale. Psychometric properties of each of the 15 items were examined with item response theory (IRT) analysis, using the discrimination and threshold parameters. IRT results showed that the abbreviated scale may hold promise for capturing the same amount of information as the full 40-item scale. Relative to the 40-item scale, the total score of the abbreviated MGRS Scale demonstrated comparable convergent validity across the measurement domains of masculine identity, hypermasculinity, trait anger, anger expression, and alcohol involvement. An abbreviated MGRS Scale may be recommended for use in clinical practice and research settings to reduce cost, time, and patient/participant burden. Additionally, IRT analyses identified items with higher discrimination and threshold parameters that may be used to screen for problematic gender role stress in men seen in routine clinical or medical practice. (c) 2015 APA, all rights reserved.
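The abbreviated MGRS scale was built by keeping the 15 items with the highest item-to-total correlations. The Python sketch below illustrates that selection rule on a respondents-by-items score matrix. It is an illustration only: the function names are hypothetical, the data are invented, and because the abstract does not say whether each item was removed from the total before correlating (a "corrected" item-total correlation), that choice is exposed as an assumed parameter.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def item_total_correlations(scores, corrected=True):
    """One item-to-total correlation per item.

    scores: list of respondent rows, each a list of item scores.
    corrected=True (an assumption) subtracts the item from the total first.
    """
    totals = [sum(row) for row in scores]
    result = []
    for j in range(len(scores[0])):
        item = [row[j] for row in scores]
        total = [t - x for t, x in zip(totals, item)] if corrected else totals
        result.append(pearson(item, total))
    return result

def select_top_items(scores, k, corrected=True):
    """Indices of the k items with the highest item-to-total correlations."""
    r = item_total_correlations(scores, corrected)
    return sorted(range(len(r)), key=lambda j: r[j], reverse=True)[:k]

# Invented data: items 0-2 move together, item 3 is noisy.
scores = [[1, 1, 1, 0], [2, 2, 2, 3], [3, 3, 3, 1], [4, 4, 4, 2]]
print(select_top_items(scores, 3))
```

With real data the same call would be made with k = 15 on the 40-item matrix.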
[Instruments for evaluating oral health knowledge, attitudes and practice for parents/caregivers of small children].

PubMed

Martignon, Stefania; Bautista-Mendoza, Gloria; González-Carrera, María; Lafaurie-Villamil, Gloria; Morales, Veicy; Santamaría, Ruth

2008-01-01

This study aimed to design three instruments for evaluating oral health knowledge, attitudes and practice in parents/caregivers of 0-5-year-olds of low socio-economic status, and to evaluate the instruments' reliability in terms of internal consistency through item analysis. Three instruments were constructed for evaluating the oral health knowledge, attitudes and practice of parents/caregivers of low socio-economic status 0-5-year-olds in the municipality of Usaquén, Bogotá, Colombia. Forty-seven parents/caregivers were given a test establishing the instruments' reliability in terms of internal consistency and the adults' level of knowledge, attitudes and practice. A sub-sample was analysed qualitatively (content verification and understanding). Reliability was evaluated using Cronbach's alpha coefficient. Items were analysed to improve the construction and comprehensibility of the questions, taking four criteria into account: corrected homogeneity index (CHI), response trend, correlation between items and qualitative analysis. Cronbach's alpha coefficients for knowledge, attitudes and practice were 0.82, 0.80 and 0.62, respectively. Participants' levels of knowledge, attitudes and practice were acceptable (60%, 55% and 91%, respectively). Two of the three evaluated instruments (knowledge and attitudes) were found to be reliable; all three were then redesigned. The resulting instruments represent a valuable tool that can be used in future studies for describing and evaluating preventative programmes.
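Cronbach's alpha, the reliability coefficient reported here, is computed from the item variances and the variance of the total score: alpha = k/(k - 1) * (1 - sum of item variances / total-score variance) for k items. A minimal Python sketch, assuming complete data in a respondents-by-items list; the function name is hypothetical and the example scores are invented, not taken from the study.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items score matrix (list of rows)."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item column and of the respondents' total scores.
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1.0 - sum(item_vars) / total_var)

# Invented 0-2 scores for 5 respondents on 3 items (not study data).
scores = [[2, 2, 1], [1, 1, 1], [0, 1, 0], [2, 1, 2], [0, 0, 0]]
print(round(cronbach_alpha(scores), 2))
```

Because the same variance estimator is used in numerator and denominator, switching to population variances would not change the result.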

Associations between Prior Disability-Focused Training and Disability-Related Attitudes and Perceptions among University Faculty

ERIC Educational Resources Information Center

Murray, Christopher; Lombardi, Allison; Wren, Carol T.; Keys, Christopher

2009-01-01

This investigation examined the relationship between prior disability-focused training and university faculty members' attitudes towards students with learning disabilities (LD). A survey containing items designed to measure faculty attitudes was sent to all full-time faculty at one university. Analyses of 198 responses indicated that faculty who…
Revisiting the Factor Structure of the Strengths and Difficulties Questionnaire: United States, 2001.

ERIC Educational Resources Information Center

Dickey, Wayne C.; Blumberg, Stephen J.

2004-01-01

Objective: The Strengths and Difficulties Questionnaire is a 25-item instrument developed to assess emotional and behavioral problems. The current study attempted to replicate previous European structural analyses and to describe the latent dimensions that underlie responses to the parent-reported version of the Strengths and Difficulties…
Using Rasch Analysis to Identify Uncharacteristic Responses to Undergraduate Assessments

ERIC Educational Resources Information Center

Edwards, Antony; Alcock, Lara

2010-01-01

Rasch Analysis is a statistical technique that is commonly used to analyse both test data and Likert survey data, to construct and evaluate question item banks, and to evaluate change in longitudinal studies. In this article, we introduce the dichotomous Rasch model, briefly discussing its assumptions. Then, using data collected in an…
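In the dichotomous Rasch model, the probability that a person of ability theta answers an item of difficulty b correctly is exp(theta - b) / (1 + exp(theta - b)). One common way to flag uncharacteristic responses under this model is through standardized residuals and their unweighted (outfit) mean square. The truncated abstract does not say which statistic the authors used, so the Python sketch below is a generic illustration with known, not estimated, person and item parameters; all names and values are hypothetical.

```python
from math import exp, sqrt

def rasch_prob(theta, b):
    """P(X = 1) under the dichotomous Rasch model (theta and b in logits)."""
    return 1.0 / (1.0 + exp(-(theta - b)))

def standardized_residuals(responses, theta, difficulties):
    """(observed - expected) / SD for each 0/1 response of one person."""
    z = []
    for x, b in zip(responses, difficulties):
        p = rasch_prob(theta, b)
        z.append((x - p) / sqrt(p * (1.0 - p)))
    return z

def outfit_mean_square(responses, theta, difficulties):
    """Mean squared standardized residual; values well above 1 suggest an unexpected pattern."""
    z = standardized_residuals(responses, theta, difficulties)
    return sum(v * v for v in z) / len(z)

difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]
typical = [1, 1, 1, 0, 0]   # passes easy items, fails hard ones
aberrant = [0, 0, 1, 1, 1]  # the reverse pattern
print(round(outfit_mean_square(typical, 0.0, difficulties), 2))
print(round(outfit_mean_square(aberrant, 0.0, difficulties), 2))
```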
The Dimensionality of Cognitive Structure: A MIRT Approach and the Use of Subscores

ERIC Educational Resources Information Center

Cheng, Yi-Ling

2016-01-01

The present study explored the dimensionality of cognitive structure using two approaches. The first approach used a well-known relation between Visual Spatial Working Memory (VSWM) and calculation to demonstrate multidimensional item response analysis when the true dimensions are unknown. The second approach explored the detectability of dimensions by…
1999 Survey of Active Duty Personnel: Administration, Datasets, and Codebook. Appendix G: Frequency and Percentage Distributions for Variables in the Survey Analysis Files.

DTIC Science & Technology

2000-12-01

A skip flag indicating the result of checking the response on the parent (screening) item against the response(s) on the items within the skip pattern. See Table D-5, Note 2, in Appendix D.
Ubiquitous testing using tablets: its impact on medical student perceptions of and engagement in learning

PubMed Central

Kim, Kyong-Jee; Hwang, Jee-Young

2016-01-01

Purpose: Ubiquitous testing has the potential to affect medical education by enhancing the authenticity of assessment through multimedia items. This study explored medical students’ experience with ubiquitous testing and its impact on student learning. Methods: A cohort (n=48) of third-year students at a medical school in South Korea participated in this study. The students were divided into two groups and given different versions of 10 content-matched items: one group received a text version (the text group) and the other a multimedia version (the multimedia group). Multimedia items were delivered using tablets. Item response analyses were performed to compare item characteristics between the two versions. Additionally, focus group interviews were held to investigate the students’ experiences of ubiquitous testing. Results: The mean test score was significantly higher in the text group. Item difficulty and discrimination did not differ between text and multimedia items. The participants generally responded positively to ubiquitous testing. Still, they felt that the lectures they had taken in their preclinical years had not adequately prepared them for this type of assessment, and that clinical encounters during clerkships were more helpful. To be better prepared, the participants felt that they needed to engage more actively in learning during clinical clerkships and to have more access to multimedia learning resources. Conclusion: Ubiquitous testing can positively affect student learning by reinforcing the importance of being able to understand and apply knowledge in clinical contexts, which drives students to engage more actively in learning in clinical settings. PMID:26838569
Detecting When “Quality of Life” Has Been “Enhanced”: Estimating Change in Quality of Life Ratings

PubMed Central

Tractenberg, Rochelle E.; Yumoto, Futoshi; Aisen, Paul S.

2015-01-01

Objective: To demonstrate challenges in estimating change in quality of life (QOL). Methods: Data were taken from a completed clinical trial with negative results. Responses to 13 QOL items were obtained 12 months apart from 258 persons with Alzheimer’s disease (AD) participating in a randomized, placebo-controlled clinical trial with two treatment arms. Two analyses to estimate whether “change” in QOL occurred over 12 months are described. A simple difference (later - earlier) was calculated from total scores (standard approach). A Qualified Change algorithm (novel approach) was applied to each item: differences in ratings were classified as improved, worsened, stayed poor, or stayed “positive” (fair, good, excellent). The strength of evidence supporting a claim that “QOL changed”, derived from the two analyses, was compared by considering plausible alternative explanations for, and interpretations of, the results obtained under each approach. Results: Total score approach: QOL total scores decreased on average in both treatment groups (both −1.0, p < 0.05) but not in the placebo group (−0.59, p > 0.3). Qualified change approach: roughly 60% of all change in QOL items was worsening in every arm; 17%-42% of all subjects experienced change in each item. Conclusions: Totalling the subjective QOL item ratings collapses over items and suggests a potentially misleading “overall” level of change (or no change, as in the placebo arm). Leaving the items as the individual components of “quality” of life they were intended to capture, and qualifying the direction and amount of change in each, suggests that at least 17% of any group experienced change on every item, with 60% of all observed change being worsening. Discussion: Summarizing QOL item ratings as a total “score” collapses over the face-valid, multi-dimensional components of the construct “quality of life”. Qualified Change provides robust evidence of changes to QOL or “enhancements of” life quality.
PMID:26213645
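The Qualified Change classification described in this record can be sketched as a small lookup over paired item ratings. This Python sketch assumes the four ordered labels poor < fair < good < excellent and a plain comparison of the two ratings; the published algorithm may treat moves within the "positive" categories differently, and the function names and example pairs are hypothetical.

```python
from collections import Counter

# Assumed ordering of the rating labels (poor < fair < good < excellent).
LEVELS = {"poor": 1, "fair": 2, "good": 3, "excellent": 4}

def qualify_change(earlier, later):
    """Classify one item's pair of ratings taken 12 months apart."""
    e, l = LEVELS[earlier], LEVELS[later]
    if l > e:
        return "improved"
    if l < e:
        return "worsened"
    return "stayed poor" if earlier == "poor" else "stayed positive"

def summarize(pairs):
    """Counts per class, plus the share of all observed change that was worsening."""
    counts = Counter(qualify_change(e, l) for e, l in pairs)
    changed = counts["improved"] + counts["worsened"]
    share_worsened = counts["worsened"] / changed if changed else 0.0
    return counts, share_worsened

pairs = [("good", "fair"), ("fair", "good"), ("poor", "poor"),
         ("excellent", "good"), ("good", "good")]
counts, share = summarize(pairs)
print(dict(counts), round(share, 2))
```

Unlike a total-score difference, the per-item counts keep improvements and worsenings separate instead of letting them cancel.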
Psychometric analyses and internal consistency of the PHEEM questionnaire to measure the clinical learning environment in the clerkship of a Medical School in Chile.

PubMed

Riquelme, Arnoldo; Herrera, Cristian; Aranis, Carolina; Oporto, Jorge; Padilla, Oslando

2009-06-01

The Spanish version of the Postgraduate Hospital Educational Environment Measure (PHEEM) was evaluated in this study to determine its psychometric properties, validity and internal consistency in measuring the clinical learning environment in the hospital setting of the Pontificia Universidad Católica de Chile Medical School's internship. The 40-item PHEEM questionnaire was translated from English to Spanish and back-translated into English. Content validity was tested by a focus group, and minor differences in meaning were adjusted. The PHEEM was administered to clerks in years 6 and 7. Construct validity was assessed using exploratory factor analysis followed by Varimax rotation. Internal consistency was measured using Cronbach's alpha. A total of 125 of 220 students responded to the PHEEM, an overall response rate of 56.8%; completion rates for individual items ranged from 99.2% to 100%. Analyses indicated a five-factor structure accounting for 58% of the variance, and the internal consistency of the 40-item questionnaire was 0.955 (Cronbach's alpha). The 40-item questionnaire had a mean score of 98.21 +/- 21.2 (maximum score of 160). The Spanish version of PHEEM is a multidimensional, valid and highly reliable instrument for measuring the educational environment among undergraduate medical students working in hospital-based clerkships.
Capturing specific abilities as a window into human individuality: the example of face recognition.

PubMed

Wilmer, Jeremy B; Germine, Laura; Chabris, Christopher F; Chatterjee, Garga; Gerbasi, Margaret; Nakayama, Ken

2012-01-01

Proper characterization of each individual's unique pattern of strengths and weaknesses requires good measures of diverse abilities. Here, we advocate combining our growing understanding of neural and cognitive mechanisms with modern psychometric methods in a renewed effort to capture human individuality through a consideration of specific abilities. We articulate five criteria for the isolation and measurement of specific abilities, then apply these criteria to face recognition. We cleanly dissociate face recognition from more general visual and verbal recognition. This dissociation stretches across ability as well as disability, suggesting that specific developmental face recognition deficits are a special case of a broader specificity that spans the entire spectrum of human face recognition performance. Item-by-item results from 1,471 web-tested participants, included as supplementary information, fuel item analyses, validation, norming, and item response theory (IRT) analyses of our three tests: (a) the widely used Cambridge Face Memory Test (CFMT); (b) an Abstract Art Memory Test (AAMT); and (c) a Verbal Paired-Associates Memory Test (VPMT). The availability of this data set provides a solid foundation for interpreting future scores on these tests. We argue that the allied fields of experimental psychology, cognitive neuroscience, and vision science could fuel the discovery of additional specific abilities to add to face recognition, thereby providing new perspectives on human individuality.
Development of an item bank for computerized adaptive test (CAT) measurement of pain.

PubMed

Petersen, Morten Aa; Aaronson, Neil K; Chie, Wei-Chu; Conroy, Thierry; Costantini, Anna; Hammerlid, Eva; Hjermstad, Marianne J; Kaasa, Stein; Loge, Jon H; Velikova, Galina; Young, Teresa; Groenvold, Mogens

2016-01-01

Patient-reported outcomes should ideally be adapted to the individual patient while maintaining comparability of scores across patients. This is achievable using computerized adaptive testing (CAT). The aim here was to develop an item bank for CAT measurement of the pain domain as measured by the EORTC QLQ-C30 questionnaire. The development process consisted of four steps: (1) literature search, (2) formulation of new items and expert evaluations, (3) pretesting and (4) field-testing and psychometric analyses for the final selection of items. In step 1, we identified 337 pain items from the literature. Twenty-nine new items fitting the QLQ-C30 item style were formulated in step 2 and reduced to 26 items by expert evaluation. Based on interviews with 31 patients from Denmark, France and the UK, the list was further reduced to 21 items in step 3. In step 4, responses were obtained from 1103 cancer patients from five countries. Psychometric evaluations showed that 16 items could be retained in a unidimensional item bank. Evaluations indicated that use of the CAT measure may reduce sample size requirements by 15-25% compared with using the QLQ-C30 pain scale. We have established an item bank of 16 items suitable for CAT measurement of pain. While being backward compatible with the QLQ-C30, the new item bank will significantly improve measurement precision of pain. We recommend initiating CAT measurement by screening for pain using the two original QLQ-C30 pain items. The EORTC pain CAT is currently available for "experimental" purposes.
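Computerized adaptive testing typically picks, at each step, the unused item that is most informative at the current estimate of the latent trait. The record does not describe the EORTC pain CAT's item model or selection rule, so this Python sketch simply assumes dichotomous two-parameter logistic (2PL) items, whose Fisher information is a^2 * P * (1 - P), and a maximum-information rule; the item names and parameters are invented.

```python
from math import exp

def info_2pl(theta, a, b):
    """Fisher information of a 2PL item (discrimination a, location b) at theta."""
    p = 1.0 / (1.0 + exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def next_item(theta_hat, bank, administered):
    """Unused item with maximum information at theta_hat, or None if none remain.

    bank maps a hypothetical item name to its (a, b) parameters.
    """
    candidates = [name for name in bank if name not in administered]
    if not candidates:
        return None
    return max(candidates, key=lambda name: info_2pl(theta_hat, *bank[name]))

# Illustrative bank; parameters are invented, not EORTC estimates.
bank = {"pain_a": (1.2, -1.5), "pain_b": (1.8, 0.0), "pain_c": (1.0, 1.5)}
print(next_item(0.0, bank, set()))  # pain_b
```

In a full CAT loop the trait estimate would be updated after each response before the next call; that step is omitted here.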
Attention! Can choices for low value food over high value food be trained?

PubMed

Zoltak, Michael J; Veling, Harm; Chen, Zhang; Holland, Rob W

2018-05-01

People choose high value food items over low value food items because food choices are guided by a comparison of the values placed on the choice alternatives. This value comparison process is also influenced by the amount of attention people allocate to different items. Recent research shows that choices for food items can be increased by training attention toward these items, with a paradigm named cued-approach training (CAT). However, previous work has only examined the influence of CAT on choices between two equally valued items. It has remained unclear whether CAT can increase choices for low value items when people choose between a low and a high value food item. To address this question, participants in the current study were cued to make rapid responses in CAT to certain low and high value items. Next, they made binary choices between low and high value items, where we systematically varied whether the low and high value items were cued or uncued. In two experiments, we found that participants overall preferred high over low value food items for real consumption. More importantly, their choices for low value items increased when only the low value item had been cued in CAT compared to when neither the low nor the high value item had been cued. Exploratory analyses revealed that this effect was more pronounced for participants with a relatively small value difference between low and high value items. The present research thus suggests that CAT may be used to boost the choice and consumption of low value items via enhanced attention toward these items, as long as the value difference is not too large. Implications for facilitating choices for healthy food are discussed. Copyright © 2017 Elsevier Ltd. All rights reserved.
Rasch measurement: the Arm Activity measure (ArmA) passive function sub-scale.

PubMed

Ashford, Stephen; Siegert, Richard J; Alexandrescu, Roxana

2016-01-01

To evaluate the conformity of the Arm Activity measure (ArmA) passive function sub-scale to the Rasch model. A consecutive cohort of patients (n = 92) undergoing rehabilitation, including upper limb rehabilitation and spasticity management, at two specialist rehabilitation units was included. Rasch analysis was used to examine scaling and conformity to the model. Responses were analysed using Rasch unidimensional measurement models (RUMM 2030). The following aspects were considered: overall model and individual item fit statistics and fit residuals, internal reliability, item response threshold ordering, item bias, local dependency and unidimensionality. ArmA contains both active and passive function sub-scales, but in this analysis only the passive function sub-scale was considered. Four of the seven items in the ArmA passive function sub-scale initially had disordered thresholds. These items were rescored to four response options, which resulted in ordered thresholds for all items. Once the items with disordered thresholds had been rescored, no item bias was identified for age, global disability level or diagnosis, although one item showed a small difference in difficulty between males and females. Local dependency was not observed, the unidimensionality of the sub-scale was supported, and good fit to the Rasch model was identified. The person separation index (PSI) was 0.95, indicating that the scale is able to reliably differentiate at least two groups of patients. The ArmA passive function sub-scale was shown in this evaluation to conform to the Rasch model once disordered thresholds had been addressed. Using the logit scores produced by the Rasch model, it was possible to convert scores back to the original scale range.
Implications for Rehabilitation: The ArmA passive function sub-scale was shown, in this evaluation, to conform to the Rasch model once disordered thresholds had been addressed, and therefore to be a clinically applicable and potentially useful hierarchical measure. Using Rasch logit scores, it has been possible to convert back to the original ordinal scale range and provide an indication of real change, enabling evaluation of clinical outcomes of importance to patients and clinicians.
Response Mixture Modeling: Accounting for Heterogeneity in Item Characteristics across Response Times.

PubMed

Molenaar, Dylan; de Boeck, Paul

2018-06-01

In item response theory modeling of responses and response times, it is commonly assumed that the item responses have the same characteristics across the response times. However, heterogeneity might arise in the data if subjects resort to different response processes when solving the test items. These differences may be within-subject effects; that is, a subject might use a certain process on some of the items and a different process with different item characteristics on the other items. If the probability of using one process over the other depends on the subject's response time, within-subject heterogeneity of the item characteristics across the response times arises. In this paper, the method of response mixture modeling is presented to account for such heterogeneity. Unlike traditional mixture modeling, in which full response vectors are classified, response mixture modeling classifies the individual elements of the response vector. In a simulation study, the response mixture model is shown to be viable in terms of parameter recovery. In addition, the response mixture model is applied to a real dataset to illustrate its use in investigating within-subject heterogeneity in the item characteristics across response times.
A general theoretical framework for interpreting patient-reported outcomes estimated from ordinally scaled item responses.

PubMed

Massof, Robert W

2014-10-01

A simple theoretical framework explains patient responses to items in rating scale questionnaires. Fixed latent variables position each patient and each item on the same linear scale. Item responses are governed by a set of fixed category thresholds, one for each ordinal response category. A patient's item responses are magnitude estimates of the difference between the patient variable and the patient's estimate of the item variable, relative to his/her personally defined response category thresholds. Differences between patients in their personal estimates of the item variable and in their personal choices of category thresholds are represented by random variables added to the corresponding fixed variables. Effects of intervention correspond to changes in the patient variable, the patient's response bias, and/or latent item variables for a subset of items. Intervention effects on patients' item responses were simulated by assuming the random variables are normally distributed with a constant scalar covariance matrix. Rasch analysis was used to estimate latent variables from the simulated responses. The simulations demonstrate that changes in the patient variable and changes in response bias produce indistinguishable effects on item responses and manifest only as changes in the estimated patient variable. Changes in a subset of item variables manifest as intervention-specific differential item functioning and as changes in the estimated person variable that equal the average of the changes in the item variables. Simulations also demonstrate that intervention-specific differential item functioning produces inefficiencies and inaccuracies in computer adaptive testing. © The Author(s) 2013.
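The response-generating part of Massof's framework can be simulated in a few lines. This Python sketch assumes independent normal noise on the patient's estimate of the item variable and on each category threshold (a scalar covariance matrix, as in the abstract), and returns the number of perturbed thresholds that the perceived person-item difference exceeds; the function and parameter names are hypothetical, and the Rasch estimation step is not shown. Because a shift in the patient variable and an opposite shift in all thresholds enter the comparison identically, the two produce the same simulated responses, which is the indistinguishability the abstract reports.

```python
import random

def simulate_response(theta, delta, taus, sigma_item, sigma_tau, rng, bias=0.0):
    """Simulate one ordinal response (0..len(taus)).

    theta      -- fixed patient variable
    delta      -- fixed item variable
    taus       -- fixed category thresholds (ascending)
    sigma_item -- SD of the patient's error in estimating the item variable
    sigma_tau  -- SD of the patient's personal threshold deviations
    bias       -- shift applied to all of this patient's thresholds (response bias)
    """
    perceived_item = delta + rng.gauss(0.0, sigma_item)
    d = theta - perceived_item
    # Count the personally perturbed thresholds that the difference exceeds.
    return sum(d > tau + bias + rng.gauss(0.0, sigma_tau) for tau in taus)

taus = [-1.0, 0.0, 1.0]
rng_a, rng_b = random.Random(7), random.Random(7)
# Raise the patient variable by 0.5 ...
shifted_person = [simulate_response(0.75, 0.0, taus, 0.8, 0.4, rng_a) for _ in range(200)]
# ... or lower every threshold by 0.5 (response bias), with the same random draws.
shifted_bias = [simulate_response(0.25, 0.0, taus, 0.8, 0.4, rng_b, bias=-0.5) for _ in range(200)]
print(shifted_person == shifted_bias)
```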
Evaluation and performance of a newly developed patient-reported outcome instrument for diarrhea-predominant irritable bowel syndrome in a clinical study population

PubMed Central

Delgado-Herrera, Leticia; Lasch, Kathryn; Zeiher, Bernhardt; Lembo, Anthony J.; Drossman, Douglas A.; Banderas, Benjamin; Rosa, Kathleen; Lademacher, Christopher; Arbuckle, Rob

2017-01-01

Background: To evaluate the psychometric properties of the newly developed seven-item Irritable Bowel Syndrome – Diarrhea predominant (IBS-D) Daily Symptom Diary and four-item Event Log using phase II clinical trial safety and efficacy data in patients with IBS-D. This instrument measures diarrhea (stool frequency and stool consistency), abdominal pain related to IBS-D (stomach pain, abdominal pain, abdominal cramps), immediate need to have a bowel movement (immediate need and accident occurrence), bloating, pressure, gas, and incomplete evacuation. Methods: Psychometric properties and responsiveness of the instrument were evaluated in a clinical trial population [ClinicalTrials.gov identifier: NCT01494233]. Results: A total of 434 patients were included in the analyses. Significant differences were found among severity groups (p < 0.01) defined by the IBS Patient Global Impression of Severity (PGI-S) and the IBS Patient Global Impression of Change (PGI-C). Severity scores were calculated for each Diary and Event Log item and for five-item, four-item, and three-item summary scores. Between-group differences in changes over time were significant for all summary scores in groups stratified by changes in PGI-S (p < 0.05), for two of six Diary items, and for three of four Event Log items; a one-grade change in PGI-S was considered a meaningful difference, with mean change scores on all Diary items ranging from −0.13 to −0.86 [standard deviation (SD) 0.79–1.39]. Similarly, for patients who reported being ‘slightly improved’ (considered a clinically meaningful difference) on the PGI-C, mean change scores on Diary items ranged from −0.45 to −1.55 (SD 0.69–1.39). All estimates of clinically important change for each item and all summary scores were small and should be considered preliminary. These results are aligned with the previous standalone psychometric study regarding reliability and validity tests.
Conclusions: These analyses provide evidence of the psychometric properties of the IBS-D Daily Symptom Diary and Event Log in a clinical trial population. PMID:28932269
The Piper Fatigue Scale-12 (PFS-12): psychometric findings and item reduction in a cohort of breast cancer survivors.

PubMed

Reeve, Bryce B; Stover, Angela M; Alfano, Catherine M; Smith, Ashley Wilder; Ballard-Barbash, Rachel; Bernstein, Leslie; McTiernan, Anne; Baumgartner, Kathy B; Piper, Barbara F

2012-11-01

Brief, valid measures of fatigue, a prevalent and distressing cancer symptom, are needed for use in research. This study's primary aim was to create a shortened version of the revised Piper Fatigue Scale (PFS-R) based on data from a diverse cohort of breast cancer survivors. A secondary aim was to determine whether the PFS captured multiple distinct aspects of fatigue (a multidimensional model) or a single overall fatigue factor (a unidimensional model). Breast cancer survivors (n = 799; stages in situ through IIIa; ages 29-86 years) were recruited through three SEER registries (New Mexico, Western Washington, and Los Angeles, CA) as part of the Health, Eating, Activity, and Lifestyle (HEAL) study. Fatigue was measured approximately 3 years post-diagnosis using the 22-item PFS-R that has four subscales (Behavior, Affect, Sensory, and Cognition). Confirmatory factor analysis was used to compare unidimensional and multidimensional models. Six criteria were used to make item selections to shorten the PFS-R: scale's content validity, items' relationship with fatigue, content redundancy, differential item functioning by race and/or education, scale reliability, and literacy demand. Factor analyses supported the original 4-factor structure. There was also evidence from the bi-factor model for a dominant underlying fatigue factor. Six items tested positive for differential item functioning between African-American and Caucasian survivors. Four additional items either showed poor association, local dependence, or content validity concerns. After removing these 10 items, the reliability of the PFS-12 subscales ranged from 0.87 to 0.89, compared to 0.90-0.94 prior to item removal. The newly developed PFS-12 can be used to assess fatigue in African-American and Caucasian breast cancer survivors and reduces response burden without compromising reliability or validity. 
This is the first study to determine PFS literacy demand and to compare PFS-R responses in African-Americans and Caucasian breast cancer survivors. Further testing in diverse populations is warranted.
Confirmatory factor analysis and measurement invariance of the Child Feeding Questionnaire in low-income Hispanic and African-American mothers with preschool-age children.

PubMed

Kong, Angela; Vijayasiri, Ganga; Fitzgibbon, Marian L; Schiffer, Linda A; Campbell, Richard T

2015-07-01

Validation work of the Child Feeding Questionnaire (CFQ) in low-income minority samples suggests a need for further conceptual refinement of this instrument. Using confirmatory factor analysis, this study evaluated 5- and 6-factor models on a large sample of African-American and Hispanic mothers with preschool-age children (n = 962). The 5-factor model included: 'perceived responsibility', 'concern about child's weight', 'restriction', 'pressure to eat', and 'monitoring' and the 6-factor model also tested 'food as a reward'. Multi-group analysis assessed measurement invariance by race/ethnicity. In the 5-factor model, two low-loading items from 'restriction' and one low-variance item from 'perceived responsibility' were dropped to achieve fit. Only removal of the low-variance item was needed to achieve fit in the 6-factor model. Invariance analyses demonstrated differences in factor loadings. This finding suggests African-American and Hispanic mothers may vary in their interpretation of some CFQ items and use of cognitive interviews could enhance item interpretation. Our results also demonstrated that 'food as a reward' is a plausible construct among a low-income minority sample and adds to the evidence that this factor resonates conceptually with parents of preschoolers; however, further testing is needed to determine the validity of this factor with older age groups. Copyright © 2015 Elsevier Ltd. All rights reserved.
The SF-8 Spanish Version for Health-Related Quality of Life Assessment: Psychometric Study with IRT and CFA Models.

PubMed

Tomás, José M; Galiana, Laura; Fernández, Irene

2018-03-22

The aim of the current research is to analyze the psychometric properties of the Spanish version of the SF-8, overcoming previous shortcomings. Two lines of analysis were used: competing structural equation models to establish factorial validity, and item response theory to analyze item psychometric characteristics and information. A total of 593 people aged 60 years or older attending lifelong learning programs at the University were surveyed. Their ages ranged from 60 to 92 years; 67.6% were women. The survey included scales on personality dimensions, attitudes, perceptions, and behaviors related to aging. Competing confirmatory models pointed to two factors (physical and mental health) as the best representation of the data: χ2(13) = 72.37 (p < .01); CFI = .99; TLI = .98; RMSEA = .08 (.06, .10). Item 5 was removed because of unreliability and cross-loading. Graded response models based on the two-parameter logistic model showed appropriate fit for both the physical and the mental dimensions. Item information curves and test information functions indicated that the SF-8 was more informative at low levels of health. The Spanish SF-8 has adequate psychometric properties and is better represented by two dimensions once Item 5 is removed. Gathering evidence on patient-reported outcome measures is of crucial importance, as these types of measurement instruments are increasingly used in the clinical arena.
Impact of IRT item misfit on score estimates and severity classifications: an examination of PROMIS depression and pain interference item banks.

PubMed

Zhao, Yue

2017-03-01

In patient-reported outcome research that utilizes item response theory (IRT), using statistical significance tests to detect misfit is usually the focus of IRT model-data fit evaluations. However, such evaluations rarely address the impact/consequence of using misfitting items on the intended clinical applications. This study was designed to evaluate the impact of IRT item misfit on score estimates and severity classifications and to demonstrate a recommended process of model-fit evaluation. Using secondary data sources collected from the Patient-Reported Outcome Measurement Information System (PROMIS) wave 1 testing phase, analyses were conducted based on PROMIS depression (28 items; 782 cases) and pain interference (41 items; 845 cases) item banks. The identification of misfitting items was assessed using Orlando and Thissen's summed-score item-fit statistics and graphical displays. The impact of misfit was evaluated according to the agreement of both IRT-derived T-scores and severity classifications between inclusion and exclusion of misfitting items. The examination of the presence and impact of misfit suggested that item misfit had a negligible impact on the T-score estimates and severity classifications with the general population sample in the PROMIS depression and pain interference item banks, implying that the impact of item misfit was insignificant. Findings support the T-score estimates in the two item banks as robust against item misfit at both the group and individual levels and add confidence to the use of T-scores for severity diagnosis in the studied sample. Recommendations on approaches for identifying item misfit (statistical significance) and assessing the misfit impact (practical significance) are given.
"When I saw walking I just kind of took it as wheeling": interpretations of mobility-related items in generic, preference-based health state instruments in the context of spinal cord injury.

PubMed

Michel, Yvonne Anne; Engel, Lidia; Rand-Hendriksen, Kim; Augestad, Liv Ariane; Whitehurst, David Gt

2016-11-28

In health economic analyses, health states are typically valued using instruments with few items per dimension. Due to the generic (and often reductionist) nature of such instruments, certain groups of respondents may experience challenges in describing their health state. This study is concerned with generic, preference-based health state instruments that provide information for decisions about the allocation of resources in health care. Unlike physical measurement instruments, preference-based health state instruments provide health state values that are dependent on how respondents interpret the items. This study investigates how individuals with spinal cord injury (SCI) interpret mobility-related items contained within six preference-based health state instruments. Secondary analysis of focus group transcripts originally collected in Vancouver, Canada, explored individuals' perceptions and interpretations of mobility-related items contained within the 15D, Assessment of Quality of Life 8-dimension (AQoL-8D), EQ-5D-5L, Health Utilities Index (HUI), Quality of Well-Being Scale Self-Administered (QWB-SA), and the 36-item Short Form health survey version 2 (SF-36v2). Ritchie and Spencer's 'Framework Approach' was used to perform thematic analysis that focused on participants' comments concerning the mobility-related items only. Fifteen individuals participated in three focus groups (five per focus group). Four themes emerged: wording of mobility (e.g., 'getting around' vs 'walking'), reference to aids and appliances, lack of suitable response options, and reframing of items (e.g., replacing 'walking' with 'wheeling'). These themes reflected item features that respondents perceived as relevant in enabling them to describe their mobility, and response strategies that respondents could use when faced with inaccessible items. 
Investigating perceptions of mobility-related items within the context of SCI highlights substantial variation in item interpretation across six preference-based health state instruments. Studying respondents' interpretations of items can help to explain discrepancies in the health state descriptions and values obtained from different instruments. This line of research warrants closer attention in the health economics and quality of life literature.

Development of the Statistical Reasoning in Biology Concept Inventory (SRBCI).

PubMed

Deane, Thomas; Nomme, Kathy; Jeffery, Erica; Pollock, Carol; Birol, Gülnur

2016-01-01

We followed established best practices in concept inventory design and developed a 12-item inventory to assess student ability in statistical reasoning in biology (Statistical Reasoning in Biology Concept Inventory [SRBCI]). It is important to assess student thinking in this conceptual area, because it is a fundamental requirement of being statistically literate and associated skills are needed in almost all walks of life. Despite this, previous work shows that non-expert-like thinking in statistical reasoning is common, even after instruction. As science educators, our goal should be to move students along a novice-to-expert spectrum, which could be achieved with growing experience in statistical reasoning. We used item response theory analyses (the one-parameter Rasch model and associated analyses) to assess responses gathered from biology students in two populations at a large research university in Canada in order to test SRBCI's robustness and sensitivity in capturing useful data relating to the students' conceptual ability in statistical reasoning. Our analyses indicated that SRBCI is a unidimensional construct, with items that vary widely in difficulty and provide useful information about such student ability. SRBCI should be useful as a diagnostic tool in a variety of biology settings and as a means of measuring the success of teaching interventions designed to improve statistical reasoning skills. © 2016 T. Deane et al. CBE—Life Sciences Education © 2016 The American Society for Cell Biology. This article is distributed by The American Society for Cell Biology under license from the author(s). It is available to the public under an Attribution–Noncommercial–Share Alike 3.0 Unported Creative Commons License (http://creativecommons.org/licenses/by-nc-sa/3.0).
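The one-parameter Rasch model used for the SRBCI links the probability of a correct answer to the gap between student ability θ and item difficulty b. Under this model, an item is most informative for students whose ability equals its difficulty. The sketch below is a minimal Python illustration of that model. The difficulties in the usage note are hypothetical, not SRBCI estimates.

```python
import math

def rasch_prob(theta, b):
    """Rasch model: P(correct) = exp(theta - b) / (1 + exp(theta - b))."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def expected_score(theta, difficulties):
    # Expected number-correct score on a set of Rasch items.
    return sum(rasch_prob(theta, b) for b in difficulties)

def item_information(theta, b):
    # Fisher information of a Rasch item, P * (1 - P); it peaks at 0.25 when theta == b.
    p = rasch_prob(theta, b)
    return p * (1.0 - p)
```

With hypothetical difficulties `[-1.0, 0.0, 1.0]`, a student at θ = 0 has an expected score of 1.5. Spreading item difficulties widely, as reported for the SRBCI, spreads information across the ability range.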
Developing an Experiential Definition of Recovery: Participatory Research with Recovering Substance Abusers from Multiple Pathways

PubMed Central

Borkman, Thomasina J.; Stunz, Aina; Kaskutas, Lee Ann

2016-01-01

Background: The What is Recovery? (WIR) study identified specific elements of a recovery definition that people in substance abuse recovery from multiple pathways would endorse. Objectives: To explain how participatory research contributed to the development of a comprehensive pool of items defining recovery, and to identify the commonality between the specific items endorsed by participants as defining recovery and the abstract components of recovery found in four important broad recovery definitions. Methods: A four-step, mixed-methods, iterative process was used to develop and pretest items (August 2010 to February 2012). Online survey participants (n=238) were recruited via email lists of individuals in recovery and electronic advertisements; 54 were selected for in-depth telephone interviews. Analyses using experientially based and survey research criteria resulted in a revised pool of 47 refined, specific items. The WIR items were matched with the components of four important definitions. Results: Recovering participants (1) proposed and validated new items; (2) developed an alternative response category to the Likert scale; (3) suggested criteria for eliminating items irrelevant to recovery. Matching the WIR items with the components of important abstract definitions revealed extensive commonality. Conclusions, importance: The WIR items define recovery as ways of being, as a growth and learning process involving internal values and self-awareness with moral dimensions. This is the first wide-scale research identifying specific items that define recovery, and these items can be used to guide service provision in Recovery-Oriented Systems of Care. PMID:27159851
Developing an Experiential Definition of Recovery: Participatory Research With Recovering Substance Abusers From Multiple Pathways.

PubMed

Borkman, Thomasina Jo; Stunz, Aina; Kaskutas, Lee Ann

2016-07-28

The What is Recovery? (WIR) study identified specific elements of a recovery definition that people in substance abuse recovery from multiple pathways would endorse. The aims were to explain how participatory research contributed to the development of a comprehensive pool of items defining recovery, and to identify the commonality between the specific items endorsed by participants as defining recovery and the abstract components of recovery found in four important broad recovery definitions. A four-step, mixed-methods, iterative process was used to develop and pretest items (August 2010 to February 2012). Online survey participants (n = 238) were recruited via email lists of individuals in recovery and electronic advertisements; 54 were selected for in-depth telephone interviews. Analyses using experientially based and survey research criteria resulted in a revised pool of 47 refined, specific items. The WIR items were matched with the components of four important definitions. Recovering participants (1) proposed and validated new items; (2) developed an alternative response category to the Likert scale; (3) suggested criteria for eliminating items irrelevant to recovery. Matching the WIR items with the components of important abstract definitions revealed extensive commonality. The WIR items define recovery as ways of being, as a growth and learning process involving internal values and self-awareness with moral dimensions. This is the first wide-scale research identifying specific items that define recovery, and these items can be used to guide service provision in Recovery-Oriented Systems of Care.
Representation of item position in immediate serial recall: Evidence from intrusion errors.

PubMed

Fischer-Baum, Simon; McCloskey, Michael

2015-09-01

In immediate serial recall, participants are asked to recall novel sequences of items in the correct order. Theories of the representations and processes required for this task differ in how order information is maintained. Some have argued that order is represented through item-to-item associations. Others have argued that each item is coded for its position in a sequence, with position defined either by distance from the start of the sequence or by distance from both the start and the end of the sequence. Previous researchers have used error analyses to adjudicate between these proposals, but these attempts have not allowed researchers to examine the full set of alternatives. In the current study, we analyzed errors produced in two immediate serial recall experiments that differ in the modality of input (visual vs. aural presentation of words) and the modality of output (typed vs. spoken responses). We used new analysis methods that allow a greater number of alternative hypotheses to be considered. We find evidence that sequence positions are represented relative to both the start and the end of the sequence, and show a contribution of the end-based representation beyond the final item in the sequence. We also find limited evidence for item-to-item associations, suggesting that both a start-end positional scheme and item-to-item associations play a role in representing item order in immediate serial recall. © 2015 APA, all rights reserved.
Applications of computerized adaptive testing (CAT) to the assessment of headache impact.

PubMed

Ware, John E; Kosinski, Mark; Bjorner, Jakob B; Bayliss, Martha S; Batenhorst, Alice; Dahlöf, Carl G H; Tepper, Stewart; Dowson, Andrew

2003-12-01

This study evaluated the feasibility of computerized adaptive testing (CAT) and the reliability and validity of CAT-based estimates of headache impact scores in comparison with 'static' surveys. Responses to the 54-item Headache Impact Test (HIT) were re-analyzed for recent headache sufferers (n = 1016) who completed telephone interviews during the National Survey of Headache Impact (NSHI). Item response theory (IRT) calibrations and the computerized dynamic health assessment (DYNHA) software were used to simulate CAT assessments (CAT-HIT). The simulation selected the most informative items for each person and estimated impact scores according to pre-set precision standards. Results were compared with IRT estimates based on all items (total-HIT), computerized 6-item dynamic estimates (CAT-HIT-6), and a developmental version of a 'static' 6-item form (HIT-6-D). Analyses focused on respondent burden (survey length and administration time), score distributions ('ceiling' and 'floor' effects), reliability and standard errors, and clinical validity (diagnosis, level of severity). A random sample (n = 245) was re-assessed to test responsiveness. A second study (n = 1103) compared actual CAT surveys and an improved 'static' HIT-6 among current headache sufferers sampled on the Internet. Respondents completed measures from the first study and the generic SF-8 Health Survey; some (n = 540) were re-tested on the Internet after 2 weeks. In the first study, simulated CAT-HIT and total-HIT scores were highly correlated (r = 0.92), showed no 'ceiling' or 'floor' effects, and yielded a substantial reduction (90.8%) in respondent burden. Six of the 54 items accounted for the great majority of item administrations (3603/5028, 77.6%).
CAT-HIT reliability estimates were very high (0.975-0.992) in the range where 95% of respondents scored, and relative validity (RV) coefficients were high for diagnosis (RV = 0.87) and severity (RV = 0.89); patient-level classifications were 91.3% accurate for a diagnosis of migraine. For all three criteria of change, CAT-HIT scores were more responsive than all other measures. In the second study, estimates of respondent burden, item usage, reliability and clinical validity were replicated. The test-retest reliability of CAT-HIT was 0.79, and alternate-forms coefficients ranged from 0.85 to 0.91. All correlations with the generic SF-8 were negative. CAT-based administrations of headache impact items achieved very large reductions in respondent burden without compromising validity for patient screening or for monitoring changes in headache impact over time. IRT models and CAT-based dynamic health assessments warrant testing among patients with other conditions.
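The abstract says the CAT selected "the most informative items for each person" but does not describe the DYNHA selection rule. The sketch below therefore assumes a two-parameter logistic (2PL) item bank and the common maximum-Fisher-information rule. It picks the unused item that is most informative at the current trait estimate. Item identifiers and parameters are hypothetical.

```python
import math

def info_2pl(theta, a, b):
    """Fisher information of a 2PL item at theta: a^2 * P * (1 - P)."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def next_item(theta_hat, bank, administered):
    """Return the id of the not-yet-administered item with maximum
    information at the current estimate theta_hat, or None if none remain.

    bank -- dict mapping a (hypothetical) item id to its (a, b) parameters
    """
    candidates = [
        (info_2pl(theta_hat, a, b), item_id)
        for item_id, (a, b) in bank.items()
        if item_id not in administered
    ]
    if not candidates:
        return None
    return max(candidates)[1]
```

A full CAT would alternate this step with re-estimating θ after each response. It would stop once the standard error meets the pre-set precision standard, which is why a few highly informative items can dominate administrations.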
Development and Psychometric Properties of the Math and Me Survey: Measuring Third through Sixth Graders' Attitudes toward Mathematics

ERIC Educational Resources Information Center

Adelson, Jill L.; McCoach, D. Betsy

2011-01-01

The Math and Me Survey was designed to measure elementary students' attitudes toward mathematics. The authors conducted content validation, exploratory factor analysis, confirmatory factor analysis, item response theory, reliability, and external validity analyses to improve it and to test its psychometric properties. The final Math and Me Survey…
Using IRT Trait Estimates versus Summated Scores in Predicting Outcomes

ERIC Educational Resources Information Center

Xu, Ting; Stone, Clement A.

2012-01-01

It has been argued that item response theory trait estimates should be used in analyses rather than number right (NR) or summated scale (SS) scores. Thissen and Orlando postulated that IRT scaling tends to produce trait estimates that are linearly related to the underlying trait being measured. Therefore, IRT trait estimates can be more useful…
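The sketch below illustrates the contrast between IRT trait estimates and summed scores. Under a two-parameter logistic (2PL) model, two response patterns with the same summed score can receive different trait estimates, because endorsing a highly discriminating item is stronger evidence. The sketch is a minimal expected a posteriori (EAP) estimator with a standard normal prior on a fixed grid. The item parameters in the usage note are hypothetical, and the sketch is not taken from Xu and Stone's study.

```python
import math

def p2pl(theta, a, b):
    # 2PL probability of endorsing an item.
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def eap_theta(responses, items, n_points=81, lo=-4.0, hi=4.0):
    """EAP trait estimate for a 0/1 response pattern.

    responses -- list of 0/1 responses
    items     -- list of (a, b) parameters, same order as responses
    Uses an evenly spaced grid and an unnormalized N(0, 1) prior.
    """
    step = (hi - lo) / (n_points - 1)
    num = den = 0.0
    for k in range(n_points):
        theta = lo + k * step
        weight = math.exp(-0.5 * theta * theta)  # prior density (unnormalized)
        for x, (a, b) in zip(responses, items):
            p = p2pl(theta, a, b)
            weight *= p if x == 1 else 1.0 - p   # multiply in the likelihood
        num += theta * weight
        den += weight
    return num / den
```

With hypothetical items `[(0.5, 0.0), (2.0, 0.0)]`, the patterns `[1, 0]` and `[0, 1]` both have a summed score of 1. The EAP estimate is higher for `[0, 1]`, where the more discriminating item was endorsed.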
A Framework for Analysing Quality in Education Settings

ERIC Educational Resources Information Center

Mahapatra, S. S.; Khan, M. S.

2007-01-01

In this paper, an attempt has been made to propose a measuring instrument known as EduQUAL for evaluating quality in the Technical Education System (TES). Factor analysis has been carried out on responses to various items obtained through a cross-sectional questionnaire survey to validate the dimensionality of the instrument, and it is found that 28 items…
Impact of Aging on the Dynamics of Memory Retrieval: A Time-Course Analysis

ERIC Educational Resources Information Center

Oztekin, Ilke; Gungor, Nur Zeynep; Badre, David

2012-01-01

The response-signal speed-accuracy trade-off (SAT) procedure was used to provide an in-depth investigation of the impact of aging on the dynamics of short-term memory retrieval. Young and older adults studied sequentially presented 3-item lists, immediately followed by a recognition probe. Analyses of composite list and serial position SAT…
Using Person Response Functions to Investigate Areas of Person Misfit Related to Item Characteristics

ERIC Educational Resources Information Center

Walker, A. Adrienne; Jennings, Jeremy Kyle; Engelhard, George, Jr.

2018-01-01

Individual person fit analyses provide important information regarding the validity of test score inferences for an "individual" test taker. In this study, we use data from an undergraduate statistics test (N = 1135) to illustrate a two-step method that researchers and practitioners can use to examine individual person fit. First, person…
Developing and testing the CHORDS: Characteristics of Responsible Drinking Survey.

PubMed

Barry, Adam E; Goodson, Patricia

2011-01-01

This report describes the development and psychometric testing of a theoretically and evidence-grounded instrument, the Characteristics of Responsible Drinking Survey (CHORDS). The instrument was subjected to four phases of pretesting (cognitive validity, cognitive and motivational qualities, pilot test, and item evaluation) and a final posttest implementation. The setting was a large public university in Texas, and participants were a randomly selected convenience sample (n = 729) of currently enrolled students. This 78-item questionnaire measures individuals' responsible drinking beliefs, motivations, intentions, and behaviors. Cronbach's α, split-half reliability, principal components analysis, and Spearman's ρ were used to investigate reliability, stability, and validity. Measures in the CHORDS exhibited high internal consistency reliability and strong split-half reliability correlations. Factor analyses indicated that five distinct scales were present, as proposed in the theoretical model. Subscale composite scores were also correlated with alcohol consumption behaviors, indicating concurrent validity. The CHORDS is the first instrument specifically designed to assess responsible drinking beliefs and behaviors, and it was found to elicit valid and reliable data in a college student sample. This instrument holds much promise for practitioners who wish to investigate dimensions of responsible drinking empirically.
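Cronbach's α, used here and in several other records in this listing, compares the sum of the item variances with the variance of the total score. The formula is α = k/(k − 1) · (1 − Σσ²ᵢ / σ²ₜ) for k items. The sketch below is a minimal Python version that assumes a complete respondents-by-items matrix with no missing data. Population variances are used throughout, so the n versus n − 1 factor cancels in the ratio.

```python
from statistics import pvariance

def cronbach_alpha(rows):
    """Cronbach's alpha for a respondents x items score matrix (list of lists).

    Assumes at least two items, no missing values, and non-zero total-score variance.
    """
    k = len(rows[0])
    # Variance of each item across respondents.
    item_vars = [pvariance([r[j] for r in rows]) for j in range(k)]
    # Variance of respondents' total scores.
    total_var = pvariance([sum(r) for r in rows])
    return (k / (k - 1)) * (1.0 - sum(item_vars) / total_var)
```

For example, three perfectly parallel items yield α = 1. The small binary matrix `[[1, 0], [1, 1], [0, 0], [1, 1]]` yields α = 8/11 ≈ 0.73.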
Psychometric properties of the Danish student well-being questionnaire assessed in >250,000 student responders.

PubMed

Niclasen, Janni; Keilow, Maria; Obel, Carsten

2018-05-01

Well-being is considered a prerequisite for learning. The Danish Ministry of Education initiated the development of a new 40-item student well-being questionnaire in 2014 to monitor well-being among all Danish public school students on a yearly basis. The aim of this study was to investigate the basic psychometric properties of this questionnaire. We used the data from the 2015 Danish student well-being survey for 268,357 students in grades 4-9 (about 85% of the study population). Descriptive statistics, exploratory factor analyses, confirmatory factor analyses and Cronbach's α reliability measures were used in the analyses. The factor analyses did not unambiguously support one particular factor structure. However, based on the basic descriptive statistics, exploratory factor analyses, confirmatory factor analyses, the semantics of the individual items and Cronbach's α, we propose a four-factor structure including 27 of the 40 items originally proposed. The four scales measure school connectedness, learning self-efficacy, learning environment and classroom management. Two bullying items and two psychosomatic items should be considered separately, leaving 31 items in the questionnaire. The proposed four-factor structure addresses central aspects of well-being, which, if used constructively, may support public schools' work to increase levels of student well-being.
The views of healthcare professionals, drug developers and regulators on information about older people needed for rational drug prescription.

PubMed

Beers, Erna; Egberts, Toine C G; Leufkens, Hubert G M; Jansen, Paul A F

2013-01-01

The ICH E7 guideline is intended to improve knowledge about medicines in geriatric patients. As a legislative document, it might not reflect the needs of healthcare professionals. This study investigated what information healthcare professionals, regulatory agencies and pharmaceutical industries consider necessary for rational drug prescribing to older individuals. A 29-item questionnaire was compiled covering representation in trials, pharmacokinetics, efficacy, safety, and convenience of use in older individuals, with space for additions. Forty-three European professionals with an interest in medication for older individuals were included. To investigate their relevance, five items were included in a second questionnaire, together with 11 control items. Median scores, differences between clinical and non-clinical respondents, and response consistency were analysed. Responses were consistent for 10 control items; therefore, all items of the first questionnaire and the five additional items were analysed. Thirty-seven (86%) respondents returned the first questionnaire, and 31/37 (84%) returned the second. Most respondents considered the following information necessary: age-related differences in adverse events, locomotor effects, drug-disease interactions, dosing instructions, and the proportion of included patients aged 65+. Clinicians rated the following information as significantly more important than non-clinical respondents did: inclusion of patients aged 75+, time-until-benefit in older people, anticholinergic effects, drug-disease interactions, and convenience of use. The main study limitation is the focus on information for daily practice, whereas the ICH E7 guideline is a legislative document focused on market approval of new medicines. Also, a questionnaire with a Likert scale has its limitations; this was addressed by providing space for comments. This study reveals that items considered necessary are currently not included in the ICH E7 guideline.
Also, clinicians' and non-clinicians' opinions differed significantly on 15% of the items. Therefore, all stakeholders should collaborate to improve the availability of information for rational prescribing to older individuals.
The Views of Healthcare Professionals, Drug Developers and Regulators on Information about Older People Needed for Rational Drug Prescription

PubMed Central

Beers, Erna; Egberts, Toine C. G.; Leufkens, Hubert G. M.; Jansen, Paul A. F.

2013-01-01

Background: The ICH E7 guideline is intended to improve knowledge about medicines in geriatric patients. As a legislative document, it might not reflect the needs of healthcare professionals. This study investigated what information healthcare professionals, regulatory agencies and pharmaceutical industries consider necessary for rational drug prescribing to older individuals. Methods and Findings: A 29-item questionnaire was compiled covering representation in trials, pharmacokinetics, efficacy, safety, and convenience of use in older individuals, with space for additions. Forty-three European professionals with an interest in medication for older individuals were included. To investigate their relevance, five items were included in a second questionnaire, together with 11 control items. Median scores, differences between clinical and non-clinical respondents, and response consistency were analysed. Responses were consistent for 10 control items; therefore, all items of the first questionnaire and the five additional items were analysed. Thirty-seven (86%) respondents returned the first questionnaire, and 31/37 (84%) returned the second. Most respondents considered the following information necessary: age-related differences in adverse events, locomotor effects, drug-disease interactions, dosing instructions, and the proportion of included patients aged 65+. Clinicians rated the following information as significantly more important than non-clinical respondents did: inclusion of patients aged 75+, time-until-benefit in older people, anticholinergic effects, drug-disease interactions, and convenience of use. The main study limitation is the focus on information for daily practice, whereas the ICH E7 guideline is a legislative document focused on market approval of new medicines. Also, a questionnaire with a Likert scale has its limitations; this was addressed by providing space for comments.
Conclusions: This study reveals that items considered necessary are currently not included in the ICH E7 guideline. Also, clinicians’ and non-clinicians’ opinions differed significantly on 15% of the items. Therefore, all stakeholders should collaborate to improve the availability of information for rational prescribing to older individuals. PMID:23977208
Development of a brief tool for monitoring aberrant behaviours among patients receiving long-term opioid therapy: The Opioid-Related Behaviours In Treatment (ORBIT) scale.

PubMed

Larance, Briony; Bruno, Raimondo; Lintzeris, Nicholas; Degenhardt, Louisa; Black, Emma; Brown, Amanda; Nielsen, Suzanne; Dunlop, Adrian; Holland, Rohan; Cohen, Milton; Mattick, Richard P

2016-02-01

Early identification of problems is essential in minimising the unintended consequences of opioid therapy. This study aimed to develop a brief scale that identifies and quantifies recent aberrant behaviour among diverse patient populations receiving long-term opioid treatment. Forty scale items were generated via literature review and an expert panel (N=19). They were tested in surveys of (i) N=41 key experts and (ii) N=426 patients prescribed opioids for >3 months (222 pain patients and 204 opioid substitution therapy (OST) patients). We employed item and scale psychometrics (exploratory factor analyses, confirmatory factor analyses and item response theory statistics) to refine the items into a brief scale. Problematic items were removed for poor retest reliability or wording, semantic redundancy, differential item functioning, collinearity or rarity. Iterative factor analytic procedures then identified a 10-item unifactorial scale with good model fit in the total sample (N=426; CFI=0.981, TLI=0.975, RMSEA=0.057), the pain subgroup (CFI=0.969, TLI=0.960, RMSEA=0.062) and the OST subgroup (CFI=0.989, TLI=0.986, RMSEA=0.051). The 10 items provided good discrimination between groups. They demonstrated acceptable test-retest reliability (ICC 0.80, 95% CI 0.60-0.89) and internal consistency (Cronbach's alpha=0.89). They were moderately correlated with related constructs, including opioid dependence (SDS), depression and stress (DASS subscales), and the Social Relationships and Environment domains of the WHO-QoL. They also had strong face validity among advising clinicians. The Opioid-Related Behaviours In Treatment (ORBIT) scale is brief, reliable and validated for use in diverse patient groups receiving opioids. The ORBIT has potential applications as a checklist to prompt clinical discussions and as a tool to quantify aberrant behaviour and assess change over time. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
Using existing questionnaires in latent class analysis: should we use summary scores or single items as input? A methodological study using a cohort of patients with low back pain.

PubMed

Nielsen, Anne Molgaard; Vach, Werner; Kent, Peter; Hestbaek, Lise; Kongsted, Alice

2016-01-01

Latent class analysis (LCA) is increasingly being used in health research, but optimal approaches to handling complex clinical data are unclear. One issue is that commonly used questionnaires are multidimensional but expressed as summary scores. Using the example of low back pain (LBP), this study aimed to explore and descriptively compare LCA subgrouping of patients on multidimensional data when questionnaire summary scores versus single items are used as input. Baseline data from 928 LBP patients in an observational study were classified into four health domains (psychology, pain, activity, and participation) using the World Health Organization's International Classification of Functioning, Disability, and Health framework. LCA was performed within each health domain using the summary-score and single-item strategies. The resulting subgroups were descriptively compared using statistical measures and clinical interpretability. For each health domain, the preferred model solution ranged from five to seven subgroups for the summary-score strategy and from seven to eight subgroups for the single-item strategy. There was considerable overlap between the results of the two strategies, indicating that they reflected the same underlying data structure. However, in three of the four health domains, the single-item strategy resulted in a more nuanced description, with more subgroups and more distinct clinical characteristics. In these data, both strategies produced clinically interpretable LCA subgroups, but the single-item strategy generally revealed more distinguishing characteristics.
These results 1) warrant further analyses in other data sets to determine the consistency of this finding, and 2) warrant investigation in longitudinal data to test whether the finer detail provided by the single-item strategy results in improved prediction of outcomes and treatment response.
Rasch-family models are more valuable than score-based approaches for analysing longitudinal patient-reported outcomes with missing data.

PubMed

de Bock, Élodie; Hardouin, Jean-Benoit; Blanchin, Myriam; Le Neel, Tanguy; Kubis, Gildas; Bonnaud-Antignac, Angélique; Dantan, Étienne; Sébille, Véronique

2016-10-01

The objective was to compare classical test theory and Rasch-family models derived from item response theory for analysing longitudinal patient-reported outcome data with possibly informative intermittent missing items. A simulation study assessed and compared the performance of classical test theory and the Rasch model in terms of bias, control of the type I error, and power of the test of time effect. The type I error was controlled for both classical test theory and the Rasch model, whether data were complete or some items were missing. Both methods were unbiased and displayed similar power with complete data. When items were missing, the Rasch model remained unbiased and displayed higher power than classical test theory. The Rasch model therefore performed better than the classical test theory approach for analysing longitudinal patient-reported outcomes with possibly informative intermittent missing items, mainly in terms of power. This study highlights the value of Rasch-based models in clinical research and epidemiology for the analysis of incomplete patient-reported outcome data. © The Author(s) 2013.
Screening for adolescents' internalizing symptoms in primary care: item response theory analysis of the behavior health screen depression, anxiety, and suicidal risk scales.

PubMed

Bevans, Katherine B; Diamond, Guy; Levy, Suzanne

2012-05-01

This study applied a modern psychometric approach to validate the Behavioral Health Screen (BHS) Depression, Anxiety, and Suicidal Risk Scales among adolescents in primary care. Psychometric analyses were conducted using data collected from 426 adolescents aged 12 to 21 years (mean = 15.8, SD = 2.2). Rasch-Masters partial credit models were fit to the data to determine whether the items supported comprehensive measurement of internalizing symptoms with minimal gaps and redundancies. The scales were reduced so that each measured a single dimension (generalized anxiety, depressed affect, or suicidal risk) both comprehensively and efficiently. Although gender bias was observed for some depression and anxiety items, this differential item functioning did not affect overall subscale scores. Future revisions to the BHS should include additional items that assess low-level internalizing symptoms. The BHS is an accurate and efficient tool for identifying adolescents with internalizing symptoms in primary care settings. Access to psychometrically sound and cost-effective behavioral health screening tools is essential for meeting the increasing demands for adolescent behavioral health screening in primary/ambulatory care.
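The Rasch-Masters partial credit model extends the Rasch model to items with more than two ordered categories. The probability of category k is proportional to exp(Σⱼ₌₁ᵏ (θ − δⱼ)), where δⱼ are the item's step difficulties and category 0 corresponds to an empty sum. The sketch below is a minimal Python version of that formula. The step difficulties in the usage note are hypothetical, not BHS estimates.

```python
import math

def pcm_probs(theta, step_difficulties):
    """Masters' partial credit model: probabilities for categories 0..m."""
    # Running sums of (theta - delta_j); category 0 has an empty sum of 0.
    logits = [0.0]
    for delta in step_difficulties:
        logits.append(logits[-1] + (theta - delta))
    top = max(logits)  # subtract the maximum for numerical stability
    expo = [math.exp(z - top) for z in logits]
    total = sum(expo)
    return [e / total for e in expo]
```

With a single step, the model reduces to the dichotomous Rasch model. At θ equal to a step difficulty δⱼ, the two adjacent categories j − 1 and j are equally likely.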
Development of a survey instrument to measure connectivity to evaluate national public health preparedness and response performance.

PubMed

Dorn, Barry C; Savoia, Elena; Testa, Marcia A; Stoto, Michael A; Marcus, Leonard J

2007-01-01

Survey instruments for evaluating public health preparedness have focused on measuring the structure and capacity of local, state, and federal agencies rather than the linkages among structure, process, and outcomes. To focus evaluation on these linkages, we evaluated the connections among individuals, organizations, and systems using the construct of "connectivity" and developed a measurement instrument. Focus groups of emergency preparedness first responders generated 62 items, which were used in a development sample of 187 respondents. Item reduction and factor analyses were conducted to confirm the scale's components. The 62 items were reduced to 28. Five scales explained 70% of the total variance (number of items, percent variance explained, Cronbach's alpha), including connectivity with the system (8, 45%, 0.94), coworkers (7, 7%, 0.91), organization (7, 12%, 0.93), and perceptions (6, 6%, 0.90). Discriminant validity was found to be consistent with the factor structure. We developed a Connectivity Measurement Tool for the public health workforce consisting of a 34-item questionnaire found to be a reliable measure of connectivity, with preliminary evidence of construct validity.
Assessing item fit for unidimensional item response theory models using residuals from estimated item response functions.

PubMed

Haberman, Shelby J; Sinharay, Sandip; Chon, Kyong Hee

2013-07-01

Residual analysis (e.g. Hambleton & Swaminathan, Item response theory: principles and applications, Kluwer Academic, Boston, 1985; Hambleton, Swaminathan, & Rogers, Fundamentals of item response theory, Sage, Newbury Park, 1991) is a popular method for assessing the fit of item response theory (IRT) models. We suggest a form of residual analysis that may be applied to assess item fit for unidimensional IRT models. The residual analysis consists of a comparison of the maximum-likelihood estimate of the item characteristic curve with an alternative ratio estimate of the item characteristic curve. The large-sample distribution of the residual is proved to be standard normal when the IRT model fits the data. We compare the performance of our suggested residual to the standardized residual of Hambleton et al. (Fundamentals of item response theory, Sage, Newbury Park, 1991) in a detailed simulation study. We then calculate our suggested residuals using data from an operational test. The residuals appear to be useful in assessing item fit for unidimensional IRT models.
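For orientation, the sketch below computes the classical grouped standardized residual of the kind attributed above to Hambleton et al. It does not compute the authors' ratio-estimate residual, whose exact form the abstract does not give. The sketch assumes a 2PL item characteristic curve and examinees already grouped into ability intervals. The grouping scheme and the function names are illustrative assumptions.

```python
import math


def icc_2pl(theta, a, b):
    """Two-parameter logistic item characteristic curve (no 1.7 scaling constant)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def standardized_residuals(groups, a, b):
    """Grouped standardized residuals for one item.

    groups -- (mean_theta, n_examinees, n_correct) for each ability interval
    a, b   -- the item's estimated 2PL discrimination and difficulty
    """
    residuals = []
    for mean_theta, n, n_correct in groups:
        expected = icc_2pl(mean_theta, a, b)  # model-implied proportion correct
        observed = n_correct / n              # empirical proportion correct
        se = math.sqrt(expected * (1.0 - expected) / n)  # binomial standard error
        residuals.append((observed - expected) / se)
    return residuals
```

Under a well-fitting model these residuals are roughly standard normal. Values far outside about plus or minus 2 therefore point to misfit in that ability region.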

Development and initial validation of the Pharmacist Frequency of Interprofessional Collaboration Instrument (FICI-P) in primary care.

PubMed

Van, Connie; Costa, Daniel; Mitchell, Bernadette; Abbott, Penny; Krass, Ines

2012-01-01

Existing validated measures of pharmacist-physician collaboration focus on attitudes toward collaboration and do not measure the frequency of collaborative interactions. The aim was to develop and validate an instrument to measure the frequency of collaboration between pharmacists and general practitioners (GPs) from the pharmacist's perspective. An 11-item Pharmacist Frequency of Interprofessional Collaboration Instrument (FICI-P) was developed and administered to 586 pharmacists in 8 divisions of general practice in New South Wales, Australia. The initial items were informed by a review of the literature and by interviews with pharmacists and GPs. Items were subjected to principal component and Rasch analyses to determine the psychometric properties of each item and of the overall measure, and to identify any needed refinements. Two hundred and twenty-four (38%) of the pharmacist surveys were completed and returned. Principal component analysis suggested removal of 1 item, giving a final 1-factor solution. The refined 10-item FICI-P demonstrated internal consistency reliability, with Cronbach's alpha = 0.90. After the original 5-point response scale was collapsed to a 4-point response scale, the refined FICI-P demonstrated fit to the Rasch model. Criterion validity of the FICI-P was supported by the correlation of FICI-P scores with scores on a previously validated Physician-Pharmacist Collaboration Instrument. Validity was also supported by predicted differences in FICI-P scores between subgroups of respondents stratified by age, colocation with GPs, and interactions during the intern-training period. The refined 10-item FICI-P was shown to have good internal consistency, criterion validity, and fit to the Rasch model. Such a tool may allow measurement of the impact of interventions designed to improve interprofessional collaboration between GPs and pharmacists. Copyright © 2012 Elsevier Inc. All rights reserved.
Methodological issues regarding power of classical test theory (CTT) and item response theory (IRT)-based approaches for the comparison of patient-reported outcomes in two groups of patients - a simulation study

PubMed Central

2010-01-01

Background: Patient-reported outcomes (PRO) are increasingly used in clinical and epidemiological research. Two main types of analytical strategies can be found for these data: classical test theory (CTT), based on the observed scores, and models from item response theory (IRT). However, whether IRT or CTT is the more appropriate method for analysing PRO data remains unknown. The statistical properties of CTT and IRT regarding power and corresponding effect sizes were compared. Methods: Two-group cross-sectional studies were simulated for the comparison of PRO data using IRT- or CTT-based analysis. For IRT, different scenarios were investigated according to whether item or person parameters were assumed to be known (for item parameters, with precision ranging from good to poor) or unknown and therefore estimated. The power obtained with IRT or CTT was compared, and the parameters with the strongest impact on it were identified. Results: When person parameters were assumed to be unknown and item parameters to be either known or not, the power achieved using IRT or CTT was similar and always lower than the expected power from the well-known sample size formula for normally distributed endpoints. The number of items had a substantial impact on power for both methods. Conclusion: Without any missing data, IRT and CTT seem to provide comparable power. The classical sample size formula for CTT seems to be adequate under some conditions but is not appropriate for IRT. In IRT, it seems important to take the number of items into account to obtain an accurate formula. PMID:20338031
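The benchmark in this study is the classical sample size formula for normally distributed endpoints. For a two-sided test at level alpha with standardized effect size delta, it gives n per group = 2(z_(1-alpha/2) + z_(1-beta))^2 / delta^2. The sketch below computes that n and the corresponding expected power using Python's `statistics.NormalDist`. It reproduces only the textbook benchmark, not the study's IRT or CTT simulations, and the effect sizes used in any call are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Per-group n for a two-sided two-sample comparison of means
    (normal approximation, standardized effect size)."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = _STD_NORMAL.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)


def expected_power(effect_size, n, alpha=0.05):
    """Approximate power with n subjects per group; the far rejection tail is ignored."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return _STD_NORMAL.cdf(effect_size * math.sqrt(n / 2) - z_alpha)


n = n_per_group(0.5)
print(n, round(expected_power(0.5, n), 3))
```

The study reports that the power actually achieved with estimated person parameters fell below the value `expected_power` would suggest. This is why a formula that accounts for the number of items is recommended for IRT.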
Does dependency make a difference? The role of convenience, social influence, facilitating condition and self-efficacy on student's purchase behaviour of smartphone

NASA Astrophysics Data System (ADS)

Jaganathan, Mathivannan; Mustapa, Azrain Nasyrah; Hasan, Wan Azlina Wan; Mat, Nik Kamariah Nik; Alekam, Jamal Mohammed Esmail

2014-12-01

It is an undeniable fact that smartphone penetration, usage, and sales have increased dramatically over the past few years; penetration has reached almost 60 percent of the total population. Despite the high penetration of smartphones, previous studies have reported inconsistent findings regarding the behavioural intention to use smartphones, especially among university students. Thus, the purpose of this study is to examine smartphone purchasing behaviour among students. Five antecedents of purchasing behaviour were identified from the literature. Each variable was measured using a 7-point Likert scale: convenience (10 items), social influence (6 items), self-efficacy (10 items), facilitating condition (11 items), dependency (14 items), and purchasing behaviour (4 items). Using primary data collection, 400 questionnaires were distributed to the target respondents at one of the public higher education institutions in the northern region. A total of 350 completed questionnaires were collected, representing an 87.5 percent response rate. The data were analysed with Structural Equation Modeling (SEM) using AMOS. Confirmatory factor analysis of the measurement models indicated adequate goodness of fit after a few items were eliminated on the basis of modification indices. The generated structural model also showed adequate goodness of fit. This study established four significant direct causal effects, (1) convenience and dependency, (2) social influence and dependency, (3) facilitating condition and purchase behaviour, and (4) dependency and purchase behaviour, and two significant mediating effects: (1) dependency mediates the relationship between convenience and purchase behaviour, and (2) dependency mediates the relationship between social influence and purchase behaviour. Thus, the findings suggest that convenience, social influence, and dependency play a role in determining students' smartphone purchase behaviour.
The researchers hope the findings of this study will contribute theoretically and practically to scholars, marketers, and smartphone manufacturers.
The Research Identity Scale: Psychometric Analyses and Scale Refinement

ERIC Educational Resources Information Center

Jorgensen, Maribeth F.; Schweinle, William E.

2018-01-01

The 68-item Research Identity Scale (RIS) was informed through qualitative exploration of research identity development in master's-level counseling students and practitioners. Classical psychometric analyses revealed the items had strong validity and reliability and a single factor. A one-parameter Rasch analysis and item review was used to…
Detecting Item Drift in Large-Scale Testing

ERIC Educational Resources Information Center

Guo, Hongwen; Robin, Frederic; Dorans, Neil

2017-01-01

The early detection of item drift is an important issue for frequently administered testing programs because items are reused over time. Unfortunately, operational data tend to be very sparse and do not lend themselves to frequent monitoring analyses, particularly for on-demand testing. Building on existing residual analyses, the authors propose…
An NCME Instructional Module on Polytomous Item Response Theory Models

ERIC Educational Resources Information Center

Penfield, Randall David

2014-01-01

A polytomous item is one for which the responses are scored according to three or more categories. Given the increasing use of polytomous items in assessment practices, item response theory (IRT) models specialized for polytomous items are becoming increasingly common. The purpose of this ITEMS module is to provide an accessible overview of…
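Samejima's graded response model is one concrete polytomous IRT model, and it is also the model applied in the PedsQL DIF study in these records. It builds category probabilities from cumulative logistic curves. The sketch below assumes a single discrimination per item and strictly increasing thresholds. It is an illustrative implementation and is not taken from the module itself.

```python
import math


def grm_probs(theta, a, thresholds):
    """Category probabilities for one item under Samejima's graded response model.

    a          -- item discrimination
    thresholds -- ordered between-category thresholds b_1 < ... < b_m;
                  the item has m + 1 categories scored 0..m
    """
    if any(lo >= hi for lo, hi in zip(thresholds, thresholds[1:])):
        raise ValueError("GRM thresholds must be strictly increasing")
    # Cumulative curves P(X >= k), padded with 1 for k = 0 and 0 for k = m + 1.
    cumulative = [1.0]
    cumulative += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cumulative.append(0.0)
    # Each category probability is the gap between adjacent cumulative curves.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]
```

Unlike the partial credit model, the GRM models cumulative ("k or higher") boundaries directly. For this reason it requires the thresholds to be ordered.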
Personality assessment of adolescents: an analysis of the Junior Self-Monitoring Scale.

PubMed

Howells, G N; Fishfader, V L

1995-04-01

The factor structure and reliability of Graziano, Musser, Leone, and Lautenschlager's 1987 Junior Self-monitoring Scale were examined using the responses of 1279 students in Grades 6 to 9. Analyses suggested that the scale contains two main factors, which represent Concern for Social Appropriateness and Ability to Modify Self-presentation. We suggest using a 20-item version of the scale (rather than the original 24-item version) to provide increased reliability. We also suggest that the scale may be more appropriate than Pledger's Adolescent Self-monitoring Scale for use with younger children, because it is easier to read and has abundant situational cues.
Scaling of theory-of-mind tasks.

PubMed

Wellman, Henry M; Liu, David

2004-01-01

Two studies address the sequence of understandings evident in preschoolers' developing theory of mind. The first, preliminary study provides a meta-analysis of research comparing different types of mental state understandings (e.g., desires vs. beliefs, ignorance vs. false belief). The second, primary study tests a theory-of-mind scale for preschoolers. In this study, 75 children (aged 2 years, 11 months to 6 years, 6 months) were tested on 7 tasks tapping different aspects of understanding persons' mental states. Responses formed a consistent developmental progression: most children who passed a later item also passed all earlier items, as confirmed by Guttman and Rasch measurement model analyses.
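A Guttman analysis checks how closely observed pass/fail patterns follow the ideal cumulative order. The sketch below computes a coefficient of reproducibility, 1 - errors / (respondents x items), and counts errors against the ideal pattern with the same total score. This is one of several error-counting conventions, and it assumes the items are already ordered from easiest to hardest. Values of about 0.90 or higher are commonly cited as indicating a scalable set.

```python
def guttman_reproducibility(patterns):
    """Coefficient of reproducibility for dichotomous (pass = 1, fail = 0) patterns.

    patterns -- one list per child, with items ordered from easiest (first)
                to hardest (last)
    Errors are mismatches between each observed pattern and the ideal
    cumulative pattern with the same total score.
    """
    n_items = len(patterns[0])
    errors = 0
    for row in patterns:
        score = sum(row)
        ideal = [1] * score + [0] * (n_items - score)
        errors += sum(o != i for o, i in zip(row, ideal))
    return 1.0 - errors / (len(patterns) * n_items)
```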
Applying Item Response Theory methods to design a learning progression-based science assessment

NASA Astrophysics Data System (ADS)

Chen, Jing

Learning progressions are used to describe how students' understanding of a topic progresses over time and to classify students' progress into steps or levels. This study applies item response theory (IRT) based methods to investigate how to design learning progression-based science assessments. The research questions of this study are: (1) how to use items in different formats to classify students into levels on the learning progression, (2) how to design a test that gives good information about students' progress through the learning progression of a particular construct, and (3) what characteristics of test items support their use for assessing students' levels. Data used for this study were collected from 1500 elementary and secondary school students during 2009-2010. The written assessment was developed in several formats, such as Constructed Response (CR), Ordered Multiple Choice (OMC), and Multiple True or False (MTF) items. The main findings from this study are as follows. The OMC, MTF, and CR items might measure different components of the construct. A single construct explained most of the variance in students' performances. However, additional dimensions in terms of item format can explain a certain amount of the variance in student performance. These additional dimensions therefore need to be considered when we want to capture the differences in students' performances on different types of items targeting the understanding of the same underlying progression. Items in each format need to be improved in certain ways to classify students more accurately into the learning progression levels. This study establishes some general steps that can be followed to design other learning progression-based tests as well. For example, first, the boundaries between levels on the IRT scale can be defined by using the means of the item thresholds across a set of good items.
Second, items in multiple formats can be selected to achieve the information criterion at all the defined boundaries. This ensures the accuracy of the classification. Third, when item threshold parameters vary somewhat, the scoring rubrics and the items need to be reviewed to make the threshold parameters similar across items. This is because one important design criterion for learning progression-based items is that, ideally, a student should be at the same level across items, which means that the item threshold parameters (d1, d2, and d3) should be similar across items. To design a learning progression-based science assessment, we need to understand whether the assessment measures a single construct or several constructs and how items are associated with the constructs being measured. Results from dimensionality analyses indicate that items on different carbon-transforming processes measure different aspects of the carbon cycle construct. However, items on different practices assess the same construct. In general, there are high correlations among the different processes and practices. It is not clear whether the strong correlations are due to inherent links among these process/practice dimensions or to the fact that the student sample does not show much variation in these dimensions. Further data are needed to examine dimensionality in terms of process/practice in detail. Finally, based on item characteristics analysis, recommendations are made for writing more discriminating CR items and better OMC and MTF options. Item writers can follow these recommendations to write better learning progression-based items.
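The first design step described above can be sketched directly. Each threshold (d1, d2, d3) is averaged across the selected items to give the level boundaries, and a student's ability estimate is then placed between them. Item IDs, threshold values, and the 1-based level numbering are hypothetical. The sketch also assumes that every item has the same number of ordered thresholds.

```python
from bisect import bisect_right
from statistics import mean


def level_boundaries(item_thresholds):
    """Boundary k is the mean of the k-th threshold across the selected items.

    item_thresholds -- dict mapping a (hypothetical) item id to its ordered
                       thresholds, e.g. (d1, d2, d3), on the IRT logit scale
    If every item's thresholds are ordered, the averaged boundaries are too.
    """
    per_step = zip(*item_thresholds.values())  # groups all d1s, all d2s, ...
    return [mean(step) for step in per_step]


def classify(theta, boundaries):
    """Learning-progression level (1-based) for an ability estimate theta."""
    return bisect_right(boundaries, theta) + 1


bounds = level_boundaries({"item_A": (-1.0, 0.0, 1.0), "item_B": (-0.5, 0.5, 1.5)})
print(bounds, classify(0.0, bounds))
```

The second and third steps in the abstract would then check test information at each boundary and flag items whose thresholds sit far from these averages.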
The Curiosity and Exploration Inventory-II: Development, Factor Structure, and Psychometrics

PubMed Central

Kashdan, Todd B.; Gallagher, Matthew W.; Silvia, Paul J.; Winterstein, Beate P.; Breen, William E.; Terhar, Daniel; Steger, Michael F.

2009-01-01

Given curiosity’s fundamental role in motivation, learning, and well-being, we sought to refine the measurement of trait curiosity with an improved version of the Curiosity and Exploration Inventory (CEI; Kashdan, Rose, & Fincham, 2004). A preliminary pool of 36 items was administered to 311 undergraduate students, who also completed measures of emotion, emotion regulation, personality, and well-being. Factor analyses indicated a two-factor model: motivation to seek out knowledge and new experiences (Stretching; 5 items) and a willingness to embrace the novel, uncertain, and unpredictable nature of everyday life (Embracing; 5 items). In two additional samples (ns = 150 and 119), we cross-validated this factor structure and provided initial evidence for construct validity. This includes positive correlations with personal growth, openness to experience, autonomy, purpose in life, self-acceptance, psychological flexibility, positive affect, and positive social relations, among others. Applying item response theory (IRT) to these samples (n = 578), we showed that the items have good discrimination and a desirable breadth of difficulty. The item information functions and test information function were centered near zero, indicating that the scale assesses the mid-range of the latent curiosity trait most reliably. The findings thus far provide good evidence for the psychometric properties of the 10-item CEI-II. PMID:20160913
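The statement that the information functions are centered near zero can be made concrete with item and test information. The abstract does not state which IRT model was fitted, so the dichotomous two-parameter logistic (2PL) item information a^2 P(1 - P) used here is an assumption, and the (a, b) values are hypothetical. Test information is the sum of item information, and its inverse square root is the conditional standard error.

```python
import math


def item_information_2pl(theta, a, b):
    """Fisher information of a 2PL item at ability theta: a^2 * P * (1 - P)."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)


def total_information(theta, items):
    """Test information: the sum of item information over (a, b) pairs."""
    return sum(item_information_2pl(theta, a, b) for a, b in items)


def standard_error(theta, items):
    """Conditional standard error of measurement, 1 / sqrt(test information)."""
    return 1.0 / math.sqrt(total_information(theta, items))


# Hypothetical items with difficulties clustered around zero.
items = [(1.2, -0.5), (1.0, 0.0), (1.5, 0.5)]
for theta in (-2.0, 0.0, 2.0):
    print(theta, round(total_information(theta, items), 3))
```

When item difficulties cluster near zero, information peaks there and the standard error is smallest in the mid-range of the trait, which matches the pattern the abstract describes.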
Assessing whether parents and children perceive the meaning of the items in the PedsQL™ 4.0 quality of life instrument consistently: a differential item functioning analysis.

PubMed

Jafari, Peyman; Bagheri, Zahra; Hashemi, Seyyedeh Zahra; Shalileh, Keivan

2013-06-06

Few studies have examined the effect of differential item functioning (DIF) on comparisons of health-related quality of life (HRQoL) scores across child self-reports and parent proxy-reports. This study aims to determine whether parents and children respond differently to the items in the Persian version of the PedsQL™ 4.0 measure. The PedsQL™ 4.0 Generic Core Scales were completed by 938 child-parent dyads. The graded response model (GRM) was used to detect DIF between parents and children. The IRT analyses were conducted using IRTPRO 2.1. On the whole, our findings showed that 50% (4 out of 8) of the items in the physical subscale and 40% (2 out of 5) of the items in each of the emotional and school subscales were flagged with DIF. Among the DIF items, 62.5% (5 out of 8) were uniform and the remaining 37.5% (3 out of 8) were non-uniform. Parents and children interpret certain items of the PedsQL™ 4.0 in different ways, except in the social subscale. Hence, we should be cautious about using parent proxy-reports as a substitute for a child's ratings.
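The study detected DIF with the graded response model in IRTPRO. As a simpler illustration of what a DIF test compares, the sketch below swaps in a different technique: the Mantel-Haenszel procedure for a dichotomous (for example, dichotomized) item, with respondents matched on total score. Treating children as the reference group and parents as the focal group is an assumption made only for the example, and the counts are hypothetical.

```python
import math


def mantel_haenszel_dif(strata):
    """Mantel-Haenszel common odds ratio and ETS delta-scale DIF statistic.

    strata -- one (A, B, C, D) tuple per matched-score level, where
              A/B = reference group endorsing / not endorsing the item and
              C/D = focal group endorsing / not endorsing it.
    Returns (alpha_mh, mh_d_dif) with mh_d_dif = -2.35 * ln(alpha_mh);
    negative values indicate the item favors the reference group.
    """
    num = den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # an empty score level contributes nothing
        num += a * d / n
        den += b * c / n
    if num == 0 or den == 0:
        raise ValueError("odds ratio undefined for these counts")
    alpha_mh = num / den
    return alpha_mh, -2.35 * math.log(alpha_mh)
```

An odds ratio of 1 (D-DIF of 0) means the two groups have the same odds of endorsing the item at every matched score level. This is the no-DIF condition that the GRM-based test also targets.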
The Health Education Impact Questionnaire (heiQ): an outcomes and evaluation measure for patient education and self-management interventions for people with chronic conditions.

PubMed

Osborne, Richard H; Elsworth, Gerald R; Whitfield, Kathryn

2007-05-01

This paper describes the development and validation of the Health Education Impact Questionnaire (heiQ). The aim was to develop a user-friendly, relevant, and psychometrically sound instrument for the comprehensive evaluation of patient education programs, which can be applied across a broad range of chronic conditions. Item development for the heiQ was guided by a Program Logic Model, Concept Mapping, interviews with stakeholders and psychometric analyses. Construction (N=591) and confirmatory (N=598) samples were drawn from consumers of patient education programs and hospital outpatients. The properties of the heiQ were investigated using item response theory and structural equation modeling. Over 90 candidate items were generated, with 42 items selected for inclusion in the final scale. Eight independent dimensions were derived: Positive and Active Engagement in Life (five items, Cronbach's alpha (alpha)=0.86); Health Directed Behavior (four items, alpha=0.80); Skill and Technique Acquisition (five items, alpha=0.81); Constructive Attitudes and Approaches (five items, alpha=0.81); Self-Monitoring and Insight (seven items, alpha=0.70); Health Service Navigation (five items, alpha=0.82); Social Integration and Support (five items, alpha=0.86); and Emotional Wellbeing (six items, alpha=0.89). The heiQ has high construct validity and is a reliable measure of a broad range of patient education program benefits. The heiQ will provide valuable information to clinicians, researchers, policymakers and other stakeholders about the value of patient education programs in chronic disease management.
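Each heiQ subscale is summarized above by Cronbach's alpha, alpha = k/(k - 1) x (1 - sum of item variances / variance of the total score), for a subscale with k items. The sketch below computes it from a respondents-by-items score matrix using sample variances from Python's `statistics` module. Any score matrix passed to it is a hypothetical example.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items matrix of item scores.

    scores -- list of rows, one per respondent, each a list of item scores
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = [variance(col) for col in zip(*scores)]   # per-item variance
    total_var = variance([sum(row) for row in scores])    # variance of sum scores
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```

Values around 0.70 and above, like the subscale coefficients reported here, are conventionally read as acceptable internal consistency.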
Science Teachers' Thinking About the Nature of Science: A New Methodological Approach to Its Assessment

NASA Astrophysics Data System (ADS)

Vázquez-Alonso, Ángel; García-Carmona, Antonio; Manassero-Mas, María Antonia; Bennàssar-Roig, Antoni

2013-04-01

This paper describes Spanish science teachers' thinking about issues concerning the nature of science (NOS) and the relationships connecting science, technology, and society (STS). The sample consisted of 774 in-service and pre-service teachers. The participants responded to a selection of items from the Questionnaire of Opinions on Science, Technology & Society in a multiple response model. These data were processed to generate the invariant indices that are used as the bases for subsequent quantitative and qualitative analyses. The overall results reflect moderately informed conceptions, and a detailed analysis by items, categories, and positions reveals a range of positive and negative conceptions about the topics of NOS dealt with in the questionnaire items. The implications of the findings for teaching and teacher training on the themes of NOS are discussed.
Decision analysis for a data collection system of patient-controlled analgesia with a multi-attribute utility model.

PubMed

Lee, I-Jung; Huang, Shih-Yu; Tsou, Mei-Yung; Chan, Kwok-Hon; Chang, Kuang-Yi

2010-10-01

Data collection systems are very important for the practice of patient-controlled analgesia (PCA). This study aimed to evaluate 3 PCA data collection systems and to select the most favorable system with the aid of multiattribute utility (MAU) theory. Based on MAU theory, we developed a questionnaire with 10 items to evaluate the PCA data collection systems and 1 item for overall satisfaction. Three systems were compared in the questionnaire: a paper record, an optical card reader, and a personal digital assistant (PDA). A pilot study demonstrated good internal consistency and test-retest reliability of the questionnaire. A weighted utility score, combining the relative importance each participant assigned to the individual items with their responses to each question, was calculated for each system. Sensitivity analyses with distinct weighting protocols were conducted to evaluate the stability of the final results. Thirty potential users of a PCA data collection system were recruited in the study. The item "easy to use" had the highest median rank and received the heaviest mean weight among all items. MAU analysis showed that the PDA system had a higher utility score than the other 2 systems. Sensitivity analyses revealed that both inverse and reciprocal weighting processes favored the PDA system. High correlations between overall satisfaction and MAU scores from the various weighting protocols suggested good predictive validity of our MAU-based questionnaire. The PDA system was selected as the most favorable PCA data collection system by the MAU analysis. The item "easy to use" was the most important attribute of the PCA data collection system. MAU theory can evaluate alternatives by taking into account the individual preferences of stakeholders and can aid in better decision-making. Copyright © 2010 Elsevier. Published by Elsevier B.V. All rights reserved.
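The weighted utility score described above is an additive multi-attribute utility. Each system's score is the sum, over questionnaire items, of an item weight times that system's rating. The sketch assumes the ratings have already been rescaled to a common 0-1 utility scale. It uses rank-reciprocal weights (weight proportional to 1/rank) as one possible sensitivity protocol. The abstract does not define its "inverse" and "reciprocal" schemes, so this choice is an assumption, and all numbers in the example call are hypothetical.

```python
def normalize(weights):
    """Rescale non-negative weights so that they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


def rank_reciprocal_weights(ranks):
    """Weights proportional to 1 / rank, where rank 1 is the most important item."""
    return normalize([1.0 / r for r in ranks])


def mau_score(weights, ratings):
    """Additive multi-attribute utility: sum of normalized weight * item utility."""
    return sum(w * u for w, u in zip(normalize(weights), ratings))


# Hypothetical: 3 items ranked 1, 2, 3 by one participant; two systems rated 0-1.
w = rank_reciprocal_weights([1, 2, 3])
print(round(mau_score(w, [0.9, 0.6, 0.8]), 3), round(mau_score(w, [0.5, 0.9, 0.7]), 3))
```

Re-running the comparison under several weighting functions and checking whether the same system stays on top corresponds to the sensitivity analysis the abstract reports.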
Psychometric Evaluation of the Adolescent Health Promotion Scale in Chile: Differences by Socioeconomic Status and Gender.

PubMed

Rojas-Barahona, Cristian A; Gaete, Jorge; Olivares, Esterbina; Förster, Carla E; Chandia, Eugenio; Chen, Mei-Yen

2017-12-01

The promotion of healthy behaviors is a relevant issue worldwide, especially among adolescent populations, as this is the developmental stage when most unhealthy behaviors become ingrained. The aim of this study was to analyze the psychometric properties of the Spanish version of the Adolescent Health Promotion Scale (AHPS) in a Chilean sample of early adolescents. The sample was composed of 1,156 adolescents aged 10-14 years from schools in San Felipe, Chile. Item structure was assessed using exploratory and confirmatory factor analyses; reliability was measured using Cronbach's alpha; and differences in terms of gender, age, and socioeconomic status (SES) were established using analysis of variance. The analyses of item structure identified all six original factors (nutrition behaviors, health responsibility, social support, life appreciation, stress management, and exercise behavior) as significant. However, eight items did not fit the Chilean population well. Therefore, the AHPS in Chile has been reduced to 32 items. The Cronbach's alpha of the 32-item Chilean AHPS was .95, with subscale coefficients ranging from .76 to .94. In addition, female subjects performed better than male subjects, and individuals of higher SES scored higher than the middle and lower socioeconomic groups. No differences in AHPS scores were found across age groups. The AHPS appears to have good psychometric properties in terms of item structure and reliability. Consistent with studies carried out in other countries, health promotion behavioral differences were observed in association with gender and SES. The results support the Chilean version of the AHPS as an appropriate instrument for measuring the health promotion behaviors of early adolescents in Chile and for comparing results with those from other countries.
On the validity of measuring change over time in routine clinical assessment: a close examination of item-level response shifts in psychosomatic inpatients.

PubMed

Nolte, S; Mierke, A; Fischer, H F; Rose, M

2016-06-01

Significant life events such as severe health status changes or intensive medical treatment often trigger response shifts in individuals that may hamper the comparison of measurements over time. Drawing on the Oort model, this study aims to detect response shift at the item level in psychosomatic inpatients and to evaluate its impact on the validity of comparing repeated measurements. Complete pretest and posttest data were available from 1188 patients who had filled out the ICD-10 Symptom Rating (ISR) scale at admission and at discharge, on average 24 days after intake. Reconceptualization, reprioritization, and recalibration response shifts were explored by applying tests of measurement invariance. In the item-level approach, all model parameters were constrained to be equal between pretest and posttest. Any non-invariances detected were linked to the different types of response shift. When across-occasion model parameters were constrained, model fit worsened, as indicated by a significant Satorra-Bentler chi-square difference test, suggesting the potential presence of response shifts. A close examination revealed the presence of two types of response shift, i.e., (non)uniform recalibration and both higher- and lower-level reconceptualization response shifts, leading to four model adjustments. Our analyses suggest that psychosomatic inpatients experienced some response shifts during their hospital stay. According to the hierarchy of measurement invariance, however, only one of the detected non-invariances is critical for unbiased mean comparisons over time, and it did not have a substantial impact on estimating change. Hence, the use of the ISR can be recommended for outcomes assessment in clinical routine, as change score estimates do not seem to be hampered by response shift effects.
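The Satorra-Bentler chi-square difference test mentioned above compares the constrained model (parameters equal across occasions) with a less constrained one when the data may be non-normal. The sketch below implements the widely used Satorra and Bentler (2001) scaled-difference formula. The inputs are each model's ordinary ML chi-square, degrees of freedom, and scaling correction factor, and the numbers in the example call are made up.

```python
def sb_scaled_chi2_difference(t0, df0, c0, t1, df1, c1):
    """Satorra-Bentler (2001) scaled chi-square difference statistic.

    t0, df0, c0 -- ML chi-square, degrees of freedom, and scaling correction
                   factor (T_ML / T_SB) of the more constrained model
    t1, df1, c1 -- the same quantities for the less constrained model
    Returns (scaled difference statistic, degrees of freedom of the test).
    """
    if df0 <= df1:
        raise ValueError("the constrained model must have more degrees of freedom")
    cd = (df0 * c0 - df1 * c1) / (df0 - df1)  # scaling factor for the difference
    if cd <= 0:
        raise ValueError("scaling factor is not positive; the test is undefined")
    return (t0 - t1) / cd, df0 - df1


stat, df = sb_scaled_chi2_difference(120.0, 50, 1.2, 100.0, 45, 1.1)
print(round(stat, 3), df)
```

The statistic is referred to a chi-square distribution with df0 - df1 degrees of freedom (for example, with `scipy.stats.chi2.sf(stat, df)` if SciPy is available). A significant result means the equality constraints worsen fit, which is how the study flagged potential response shift.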
Ramsay-Curve Item Response Theory for the Three-Parameter Logistic Item Response Model

ERIC Educational Resources Information Center

Woods, Carol M.

2008-01-01

In Ramsay-curve item response theory (RC-IRT), the latent variable distribution is estimated simultaneously with the item parameters of a unidimensional item response model using marginal maximum likelihood estimation. This study evaluates RC-IRT for the three-parameter logistic (3PL) model with comparisons to the normal model and to the empirical…
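In RC-IRT the latent density is estimated alongside the item parameters, while the item model itself is the 3PL response function. The sketch below shows that function together with a grid approximation of the marginal probability of a response pattern, using a density that the caller supplies. Passing a standard normal density corresponds to the conventional normal model. In RC-IRT a Ramsay-curve density estimate, which is not implemented here, would take its place. Grid spacing and function names are illustrative assumptions.

```python
import math


def irf_3pl(theta, a, b, c, scale=1.0):
    """Three-parameter logistic item response function.

    a -- discrimination, b -- difficulty, c -- lower asymptote (pseudo-guessing);
    scale -- optional constant (1.7 is often used to approximate the normal ogive)
    """
    return c + (1.0 - c) / (1.0 + math.exp(-scale * a * (theta - b)))


def marginal_pattern_prob(responses, items, density, nodes):
    """Grid approximation of the marginal probability of a 0/1 response pattern.

    responses -- list of 0/1 scores, one per item
    items     -- list of (a, b, c) tuples in the same order
    density   -- latent density function, e.g. a standard normal pdf
    nodes     -- equally spaced quadrature points spanning the latent scale
    """
    step = nodes[1] - nodes[0]
    total = 0.0
    for t in nodes:
        likelihood = 1.0
        for u, (a, b, c) in zip(responses, items):
            p = irf_3pl(t, a, b, c)
            likelihood *= p if u == 1 else 1.0 - p
        total += likelihood * density(t) * step
    return total
```

Marginal maximum likelihood maximizes the product of such pattern probabilities over examinees. Replacing the fixed normal `density` with an estimated curve is the change RC-IRT makes.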
Using the Nominal Response Model to Evaluate Response Category Discrimination in the PROMIS Emotional Distress Item Pools

ERIC Educational Resources Information Center

Preston, Kathleen; Reise, Steven; Cai, Li; Hays, Ron D.

2011-01-01

The authors used a nominal response item response theory model to estimate category boundary discrimination (CBD) parameters for items drawn from the Emotional Distress item pools (Depression, Anxiety, and Anger) developed in the Patient-Reported Outcomes Measurement Information Systems (PROMIS) project. For polytomous items with ordered response…
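Under Bock's nominal response model, each category k has its own slope a_k and intercept c_k. The log-odds of category k versus category k - 1 is therefore linear in theta with slope a_k - a_(k-1). The sketch below computes the category probabilities and those adjacent-slope differences. Treating the differences as the paper's category boundary discrimination parameters is an assumption about its parameterization, since the abstract is truncated before defining them. All parameter values in the example call are hypothetical.

```python
import math


def nrm_probs(theta, slopes, intercepts):
    """Category probabilities under Bock's nominal response model."""
    z = [a * theta + c for a, c in zip(slopes, intercepts)]
    top = max(z)
    weights = [math.exp(v - top) for v in z]  # shift by the max for stability
    total = sum(weights)
    return [w / total for w in weights]


def boundary_discriminations(slopes):
    """Slopes of the adjacent-category log-odds, a_k - a_(k-1)."""
    return [hi - lo for lo, hi in zip(slopes, slopes[1:])]


# Hypothetical 4-category item.
slopes, intercepts = [0.0, 0.8, 2.0, 2.4], [0.0, 0.5, 0.2, -0.6]
print([round(p, 3) for p in nrm_probs(0.0, slopes, intercepts)])
print([round(d, 2) for d in boundary_discriminations(slopes)])
```

Positive differences indicate that higher categories become more likely as theta increases. Small differences flag category boundaries that discriminate poorly, which is the kind of pattern the study examined in the PROMIS items.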
Depressive symptomatology among Mexican-American adults: an examination with the CES-D Scale.

PubMed

Garcia, M; Marks, G

1989-02-01

The presence and persistence of specific depressive symptomatology among a large sample of Mexican-American adults (n = 3,084) were examined with the Center for Epidemiologic Studies Depression (CES-D) Scale. Compared to studies of Anglos, a substantially larger percentage reported persistent hopelessness about the future (29%), self-depreciation (21%), and lack of enjoyment out of life (14%). The prevalence of these symptoms was higher among those who had not adapted to mainstream American society and among older participants. Women were generally more distressed than men. Factor analyses of the items demonstrated a slightly different factor structure than previously obtained with Anglos. For both sexes and for those under age 30 and ages 30-59, the items "loneliness," "sadness," and "crying" loaded on a common factor. The tendency for these items to group together was stronger for those exhibiting a low or medium degree of cultural adaptation than for those exhibiting a high degree of adaptation. Discussion focuses on the cultural variation of response to items on the CES-D.
An event-related potential study of deception to self preferences.

PubMed

Tu, Shen; Li, Hong; Jou, Jerwen; Zhang, Qinglin; Wang, Ting; Yu, Caiyun; Qiu, Jiang

2009-01-09

The spatiotemporal analysis of brain activation during the execution of deceptive decision-making was performed in 14 normal young adult subjects by using high-density event-related brain potentials (ERPs) with a delayed-response paradigm (subjects were required to hide their true attitudes for a moment). Our results showed that between 400 and 700 ms after stimulus onset, Deceptive items elicited a more negative ERP deflection (N400-700) than Truthful items, and between 1000 and 2000 ms, Deceptive items elicited a more positive ERP deflection (P1000-2000) than Truthful items. Analyses using dipole locations indicated that: (1) the generators of N400-700 were localized in the medial frontal gyrus (GFM) and middle temporal gyrus (GTM), which might be involved in conflict detection and control during deceptive decision-making; and (2) the generators of P1000-2000 were localized near the cuneus (CU) and the cingulate gyrus, which might be involved in conflict coordination in working memory due to deception.

Analysis of the construct of dignity and content validity of the patient dignity inventory

PubMed Central

2011-01-01

Background: Maintaining dignity, the quality of being worthy of esteem or respect, is considered a goal of palliative care. The aim of this study was to analyse the construct of personal dignity and to assess the content validity of the Patient Dignity Inventory (PDI) in people with an advance directive in the Netherlands. Methods: Data were collected within the framework of an advance directives cohort study. This cohort study aims to gain better insight into how decisions are made at the end of life with regard to advance directives in the Netherlands. One half of the cohort (n = 2404) received an open-ended question concerning factors relevant to dignity. Content labels were assigned to issues mentioned in the responses to the open-ended question. The other half of the cohort (n = 2537) received a written questionnaire including the PDI. The relevance and comprehensiveness of the PDI items were assessed with the COSMIN checklist ('COnsensus-based Standards for the selection of health status Measurement INstruments'). Results: The majority of the PDI items were found to be relevant for the construct to be measured, the study population, and the purpose of the study, but the items were not completely comprehensive. The responses to the open-ended question indicated that communication and care-related aspects were also important for dignity. Conclusions: This study demonstrated that the PDI items were relevant for people with an advance directive in the Netherlands. The comprehensiveness of the items can be improved by including items concerning communication and care. PMID:21682924
Analysis of the construct of dignity and content validity of the patient dignity inventory.

PubMed

Albers, Gwenda; Pasman, H Roeline W; Rurup, Mette L; de Vet, Henrica C W; Onwuteaka-Philipsen, Bregje D

2011-06-19

Maintaining dignity, the quality of being worthy of esteem or respect, is considered a goal of palliative care. The aim of this study was to analyse the construct of personal dignity and to assess the content validity of the Patient Dignity Inventory (PDI) in people with an advance directive in the Netherlands. Data were collected within the framework of an advance directives cohort study. This cohort study aims to gain better insight into how decisions are made at the end of life with regard to advance directives in the Netherlands. One half of the cohort (n = 2404) received an open-ended question concerning factors relevant to dignity. Content labels were assigned to issues mentioned in the responses to the open-ended question. The other half of the cohort (n = 2537) received a written questionnaire including the PDI. The relevance and comprehensiveness of the PDI items were assessed with the COSMIN checklist ('COnsensus-based Standards for the selection of health status Measurement INstruments'). The majority of the PDI items were found to be relevant to the construct to be measured, the study population, and the purpose of the study, but the items were not completely comprehensive. The responses to the open-ended question indicated that communication and care-related aspects were also important for dignity. This study demonstrated that the PDI items were relevant for people with an advance directive in the Netherlands. The comprehensiveness of the items can be improved by including items concerning communication and care.
Emotional Intelligence and Nurse Recruitment: Rasch and confirmatory factor analysis of the trait emotional intelligence questionnaire short form.

PubMed

Snowden, Austyn; Watson, Roger; Stenhouse, Rosie; Hale, Claire

2015-12-01

To examine the construct validity of the Trait Emotional Intelligence Questionnaire-Short Form. Emotional intelligence involves the identification and regulation of our own emotions and the emotions of others. It is therefore a potentially useful construct in the investigation of recruitment and retention in nursing, and many questionnaires have been constructed to measure it. Secondary analysis of an existing dataset of responses to the Trait Emotional Intelligence Questionnaire-Short Form, using concurrent application of Rasch analysis and confirmatory factor analysis. First-year undergraduate nursing and computing students completed the Trait Emotional Intelligence Questionnaire-Short Form in September 2013. Responses were analysed by synthesising the results of Rasch analysis and confirmatory factor analysis. Participants (N = 938) completed the Trait Emotional Intelligence Questionnaire-Short Form. Rasch analysis showed that the majority of the Trait Emotional Intelligence Questionnaire-Short Form items made a unique contribution to the latent trait of emotional intelligence. Five items did not fit the model, and differential item functioning by gender accounted for this misfit. Confirmatory factor analysis revealed a four-factor structure consisting of self-confidence, empathy, uncertainty and social connection. All five misfitting items from the Rasch analysis belonged to the 'social connection' factor. The concurrent use of Rasch and factor analysis allowed for a novel interpretation of the Trait Emotional Intelligence Questionnaire-Short Form. Much of the response variation in the Trait Emotional Intelligence Questionnaire-Short Form can be accounted for by the social connection factor. Implications for practice are discussed. © 2015 John Wiley & Sons Ltd.
Upper-extremity and mobility subdomains from the Patient-Reported Outcomes Measurement Information System (PROMIS) adult physical functioning item bank.

PubMed

Hays, Ron D; Spritzer, Karen L; Amtmann, Dagmar; Lai, Jin-Shei; Dewitt, Esi Morgan; Rothrock, Nan; Dewalt, Darren A; Riley, William T; Fries, James F; Krishnan, Eswar

2013-11-01

To create upper-extremity and mobility subdomain scores from the Patient-Reported Outcomes Measurement Information System (PROMIS) physical functioning adult item bank. Expert reviews were used to identify upper-extremity and mobility items from the PROMIS item bank. Psychometric analyses were conducted to assess empirical support for scoring upper-extremity and mobility subdomains. Data were collected from the U.S. general population and multiple disease groups via self-administered surveys. The sample (N=21,773) included 21,133 English-speaking adults who participated in the PROMIS wave 1 data collection and 640 Spanish-speaking Latino adults recruited separately. Interventions: not applicable. We used English- and Spanish-language data and existing PROMIS item parameters for the physical functioning item bank to estimate upper-extremity and mobility scores. In addition, we fit graded response models to calibrate the upper-extremity items and mobility items separately, to compare separate with combined calibrations, and to produce subdomain scores. After items were eliminated because of local dependency, 16 items remained to assess upper extremity and 17 items to assess mobility. The estimated correlation between upper extremity and mobility was .59 using existing PROMIS physical functioning item parameters (r=.60 using parameters calibrated separately for upper-extremity and mobility items). Upper-extremity and mobility subdomains shared about 35% of the variance in common and produced comparable scores whether calibrated separately or together. The identification of the subset of items tapping these 2 aspects of physical functioning and scored using the existing PROMIS parameters provides the option of scoring these subdomains in addition to the overall physical functioning score. Copyright © 2013 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
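The "about 35% of the variance" figure is consistent with squaring the reported correlation, since the proportion of variance two scores share is conventionally taken as r². A quick worked check using the two correlations given above (the abstract does not say which of the two the 35% was derived from):

```latex
r^2 = (.59)^2 \approx .348 \;(\approx 35\%), \qquad r^2 = (.60)^2 = .36 \;(36\%)
```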
Reliability and Validity of the Telephone-Based eHealth Literacy Scale Among Older Adults: Cross-Sectional Survey.

PubMed

Stellefson, Michael; Paige, Samantha R; Tennant, Bethany; Alber, Julia M; Chaney, Beth H; Chaney, Don; Grossman, Suzanne

2017-10-26

Only a handful of studies have examined reliability and validity evidence of scores produced by the 8-item eHealth Literacy Scale (eHEALS) among older adults. Older adults are generally more comfortable responding to survey items when asked by a real person rather than by completing self-administered paper-and-pencil or online questionnaires. However, no studies have explored the psychometrics of this scale when administered to older adults over the telephone. The objective of our study was to examine the reliability and internal structure of eHEALS data collected from older adults aged 50 years or older responding to items over the telephone. Respondents (N=283) completed eHEALS as part of a cross-sectional landline telephone survey. Exploratory structural equation modeling (E-SEM) analyses examined model fit of eHEALS scores with 1-, 2-, and 3-factor structures. Subsequent analyses based on the partial credit model explored the internal structure of eHEALS data. Compared with 1- and 2-factor models, the 3-factor eHEALS structure showed the best global E-SEM model fit indices (root mean square error of approximation=.07; comparative fit index=1.0; Tucker-Lewis index=1.0). Nonetheless, the 3 factors were highly correlated (r range .36 to .65). Item analyses revealed that eHEALS items 2 through 5 were overfit to a minor degree (mean square infit/outfit values <1.0; t statistics less than -2.0), but the internal structure of Likert scale response options functioned as expected. Overfitting eHEALS items (2-5) displayed a similar degree of information for respondents at similar points on the latent continuum. Test information curves suggested that eHEALS may capture more information about older adults at the higher end of the latent continuum (i.e., those with high eHealth literacy) than at the lower end of the continuum (i.e., those with low eHealth literacy). 
Item reliability (value=.92) and item separation (value=11.31) estimates indicated that eHEALS responses were reliable and stable. Results support administering eHEALS over the telephone when surveying older adults regarding their use of the Internet for health information. eHEALS scores best captured 3 factors (or subscales) to measure eHealth literacy in older adults; however, statistically significant correlations between these 3 factors suggest an overarching unidimensional structure with 3 underlying dimensions. As older adults continue to use the Internet more frequently to find and evaluate health information, it will be important to consider modifying the original eHEALS to adequately measure societal shifts in online health information seeking among aging populations. ©Michael Stellefson, Samantha R Paige, Bethany Tennant, Julia M Alber, Beth H Chaney, Don Chaney, Suzanne Grossman. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 26.10.2017.
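The item-fit language above (mean square infit/outfit below 1.0 read as overfit) comes from Rasch-family models such as the partial credit model. The sketch below is a minimal Python illustration under stated assumptions: the function names (`pcm_probs`, `expected_score_and_variance`, `item_fit`), the five-category item, its step difficulties, and the person measures are all hypothetical rather than eHEALS estimates, and person measures are treated as known instead of being jointly estimated as Rasch software would do. It computes category probabilities under the partial credit model, then the standard unweighted (outfit) and information-weighted (infit) mean squares for one item.

```python
import math
from typing import List, Sequence, Tuple


def pcm_probs(theta: float, thresholds: Sequence[float]) -> List[float]:
    """Category probabilities for one item under the partial credit model.

    thresholds[j] is the (hypothetical) step difficulty for moving from
    category j to category j + 1, so categories run 0..len(thresholds).
    """
    # Category k's log-numerator is sum_{j<=k} (theta - delta_j); category 0 is 0.
    log_num = [0.0]
    for delta in thresholds:
        log_num.append(log_num[-1] + (theta - delta))
    # Subtract the largest term before exponentiating to avoid overflow.
    top = max(log_num)
    weights = [math.exp(v - top) for v in log_num]
    total = sum(weights)
    return [w / total for w in weights]


def expected_score_and_variance(theta: float,
                                thresholds: Sequence[float]) -> Tuple[float, float]:
    """Model-expected item score and its variance at a given person measure."""
    probs = pcm_probs(theta, thresholds)
    mean = sum(k * p for k, p in enumerate(probs))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    return mean, var


def item_fit(observed: Sequence[int], thetas: Sequence[float],
             thresholds: Sequence[float]) -> Tuple[float, float]:
    """Return (infit, outfit) mean squares for one item across respondents.

    Outfit is the unweighted mean of squared standardized residuals;
    infit weights each squared residual by its model variance.
    Values well below 1.0 suggest overfit; values well above 1.0 suggest misfit.
    """
    sq_std_resid = []
    sum_sq_resid = 0.0
    sum_var = 0.0
    for x, theta in zip(observed, thetas):
        mean, var = expected_score_and_variance(theta, thresholds)
        resid = x - mean
        sq_std_resid.append(resid ** 2 / var)
        sum_sq_resid += resid ** 2
        sum_var += var
    outfit = sum(sq_std_resid) / len(sq_std_resid)
    infit = sum_sq_resid / sum_var
    return infit, outfit


# Hypothetical 5-category item and person measures (not eHEALS estimates).
steps = [-1.5, -0.5, 0.5, 1.5]
people = [-3.0, -1.0, 0.0, 1.0, 3.0]
print(item_fit([0, 1, 2, 3, 4], people, steps))  # very orderly pattern
print(item_fit([4, 3, 2, 1, 0], people, steps))  # reversed pattern
```

A perfectly ordered response pattern yields mean squares well below 1.0 (overfit), while the reversed pattern yields values far above 1.0. The t statistics cited in the abstract are standardized transformations of these mean squares and are not computed here.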
A Bifactor Multidimensional Item Response Theory Model for Differential Item Functioning Analysis on Testlet-Based Items

ERIC Educational Resources Information Center

Fukuhara, Hirotaka; Kamata, Akihito

2011-01-01

A differential item functioning (DIF) detection method for testlet-based data was proposed and evaluated in this study. The proposed DIF model is an extension of a bifactor multidimensional item response theory (MIRT) model for testlets. Unlike traditional item response theory (IRT) DIF models, the proposed model takes testlet effects into…
Item Response Models for Examinee-Selected Items

ERIC Educational Resources Information Center

Wang, Wen-Chung; Jin, Kuan-Yu; Qiu, Xue-Lan; Wang, Lei

2012-01-01

In some tests, examinees are required to choose a fixed number of items from a set of given items to answer. This practice creates a challenge to standard item response models, because more capable examinees may have an advantage by making wiser choices. In this study, we developed a new class of item response models to account for the choice…
Detecting Differential Item Discrimination (DID) and the Consequences of Ignoring DID in Multilevel Item Response Models

ERIC Educational Resources Information Center

Lee, Woo-yeol; Cho, Sun-Joo

2017-01-01

Cross-level invariance in a multilevel item response model can be investigated by testing whether the within-level item discriminations are equal to the between-level item discriminations. Testing the cross-level invariance assumption is important to understand constructs in multilevel data. However, in most multilevel item response model…
An NCME Instructional Module on Latent DIF Analysis Using Mixture Item Response Models

ERIC Educational Resources Information Center

Cho, Sun-Joo; Suh, Youngsuk; Lee, Woo-yeol

2016-01-01

The purpose of this ITEMS module is to provide an introduction to differential item functioning (DIF) analysis using mixture item response models. The mixture item response models for DIF analysis involve comparing item profiles across latent groups, instead of manifest groups. First, an overview of DIF analysis based on latent groups, called…
Capturing specific abilities as a window into human individuality: The example of face recognition

PubMed Central

Wilmer, Jeremy B.; Germine, Laura; Chabris, Christopher F.; Chatterjee, Garga; Gerbasi, Margaret; Nakayama, Ken

2013-01-01

Proper characterization of each individual's unique pattern of strengths and weaknesses requires good measures of diverse abilities. Here, we advocate combining our growing understanding of neural and cognitive mechanisms with modern psychometric methods in a renewed effort to capture human individuality through a consideration of specific abilities. We articulate five criteria for the isolation and measurement of specific abilities, then apply these criteria to face recognition. We cleanly dissociate face recognition from more general visual and verbal recognition. This dissociation stretches across ability as well as disability, suggesting that specific developmental face recognition deficits are a special case of a broader specificity that spans the entire spectrum of human face recognition performance. Item-by-item results from 1,471 web-tested participants, included as supplementary information, fuel item analyses, validation, norming, and item response theory (IRT) analyses of our three tests: (a) the widely used Cambridge Face Memory Test (CFMT); (b) an Abstract Art Memory Test (AAMT), and (c) a Verbal Paired-Associates Memory Test (VPMT). The availability of this data set provides a solid foundation for interpreting future scores on these tests. We argue that the allied fields of experimental psychology, cognitive neuroscience, and vision science could fuel the discovery of additional specific abilities to add to face recognition, thereby providing new perspectives on human individuality. PMID:23428079
Factor analyses of an Adult Epilepsy Self-Management Measurement Instrument (AESMMI).

PubMed

Escoffery, Cam; Bamps, Yvan; LaFrance, W Curt; Stoll, Shelley; Shegog, Ross; Buelow, Janice; Shafer, Patricia; Thompson, Nancy J; McGee, Robin E; Hatfield, Katherine

2015-09-01

The purpose of this study was to test the psychometric properties of an enhanced Adult Epilepsy Self-Management Measurement Instrument (AESMMI). An instrument of 113 items, covering 10 a priori self-management domains, was generated through a multiphase process, based on a review of the literature, validated epilepsy and other chronic condition self-management scales and expert input. Reliability and exploratory factor analyses were conducted on data collected from 422 adults with epilepsy. The instrument was reduced to 65 items, converging on 11 factors: Health-care Communication, Coping, Treatment Management, Seizure Tracking, Social Support, Seizure Response, Wellness, Medication Adherence, Safety, Stress Management, and Proactivity. Exploratory factor analysis supported the construct validity of 6 a priori domains, albeit with significant changes in the retained items or in their scope, and yielded 3 new factors. One a priori domain was split into 2 subscales pertaining to treatment. The configuration of the 11 factors provides additional insight into epilepsy self-management behaviors. Internal consistency reliability of the 65-item instrument was high (α=.935). Correlations with independent measures of health status, quality of life, depression, seizure severity, and life impact of epilepsy further validated the instrument. This instrument shows potential for use in research and clinical settings and for assessing intervention outcomes and self-management behaviors in adults with epilepsy. Copyright © 2015 Elsevier Inc. All rights reserved.
Forced-Choice Assessment of Work-Related Maladaptive Personality Traits: Preliminary Evidence From an Application of Thurstonian Item Response Modeling.

PubMed

Guenole, Nigel; Brown, Anna A; Cooper, Andrew J

2018-06-01

This article describes an investigation of whether Thurstonian item response modeling is a viable method for assessment of maladaptive traits. Forced-choice responses from 420 working adults to a broad-range personality inventory assessing six maladaptive traits were considered. The Thurstonian item response model's fit to the forced-choice data was adequate, while the fit of a counterpart item response model to responses to the same items but arranged in a single-stimulus design was poor. Monotrait heteromethod correlations indicated corresponding traits in the two formats overlapped substantially, although they did not measure equivalent constructs. A better goodness of fit and higher factor loadings for the Thurstonian item response model, coupled with a clearer conceptual alignment to the theoretical trait definitions, suggested that the single-stimulus item responses were influenced by biases that the independent clusters measurement model did not account for. Researchers may wish to consider forced-choice designs and appropriate item response modeling techniques such as Thurstonian item response modeling for personality questionnaire applications in industrial psychology, especially when assessing maladaptive traits. We recommend further investigation of this approach in actual selection situations and with different assessment instruments.
The tinnitus functional index: development of a new clinical measure for chronic, intrusive tinnitus.

PubMed

Meikle, Mary B; Henry, James A; Griest, Susan E; Stewart, Barbara J; Abrams, Harvey B; McArdle, Rachel; Myers, Paula J; Newman, Craig W; Sandridge, Sharon; Turk, Dennis C; Folmer, Robert L; Frederick, Eric J; House, John W; Jacobson, Gary P; Kinney, Sam E; Martin, William H; Nagler, Stephen M; Reich, Gloria E; Searchfield, Grant; Sweetow, Robert; Vernon, Jack A

2012-01-01

Chronic subjective tinnitus is a prevalent condition that causes significant distress to millions of Americans. Effective tinnitus treatments are urgently needed, but evaluating them is hampered by the lack of standardized measures that are validated for both intake assessment and evaluation of treatment outcomes. This work was designed to develop a new self-report questionnaire, the Tinnitus Functional Index (TFI), that would have documented validity both for scaling the severity and negative impact of tinnitus for use in intake assessment and for measuring treatment-related changes in tinnitus (responsiveness), and that would provide comprehensive coverage of multiple tinnitus severity domains. To use preexisting knowledge concerning tinnitus-related problems, an Item Selection Panel (17 expert judges) surveyed the content (175 items) of nine widely used tinnitus questionnaires. From those items, the Panel identified 13 separate domains of tinnitus distress and selected 70 items most likely to be responsive to treatment effects. Eliminating redundant items while retaining good content validity, and adding new items to achieve the recommended minimum of 3 to 4 items per domain, yielded 43 items, which were then used to construct TFI Prototype 1. Prototype 1 was tested at five clinics. The 326 participants included consecutive patients receiving tinnitus treatment who provided informed consent, constituting a convenience sample. Construct validity of Prototype 1 as an outcome measure was evaluated by measuring responsiveness of the overall scale and its individual items at 3 and 6 mo follow-up with 65 and 42 participants, respectively. Using a predetermined list of criteria, the 30 best-functioning items were selected to construct TFI Prototype 2. Prototype 2 was tested at four clinics with 347 participants, including 155 and 86 who provided 3 and 6 mo follow-up data, respectively. Analyses were the same as for Prototype 1. 
Results were used to select the 25 best-functioning items for the final TFI. Both prototypes and the final TFI displayed strong measurement properties, with few missing data, high validity for scaling of tinnitus severity, and good reliability. All TFI versions exhibited the same eight factors characterizing tinnitus severity and negative impact. Responsiveness, evaluated by computing effect sizes for responses at follow-up, was satisfactory in all TFI versions. In the final TFI, Cronbach's alpha was 0.97 and test-retest reliability was 0.78. Convergent validity (r = 0.86 with Tinnitus Handicap Inventory [THI]; r = 0.75 with Visual Analog Scale [VAS]) and discriminant validity (r = 0.56 with Beck Depression Inventory-Primary Care [BDI-PC]) were good. The final TFI was successful at detecting improvement from the initial clinic visit to 3 mo with moderate to large effect sizes and from initial to 6 mo with large effect sizes. Effect sizes for the TFI were generally larger than those obtained for the VAS and THI. After careful evaluation, a 13-point reduction was considered a preliminary criterion for meaningful reduction in TFI outcome scores. The TFI should be useful in both clinical and research settings because of its responsiveness to treatment-related change, validity for scaling the overall severity of tinnitus, and comprehensive coverage of multiple domains of tinnitus severity.
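Several records on this page report internal consistency as Cronbach's alpha (0.97 for the final TFI). As a minimal sketch of the standard formula, alpha = k/(k - 1) * (1 - sum of item variances / total-score variance), the hypothetical helper `cronbach_alpha` below assumes a complete respondents-by-items score matrix with no missing data; the demo data are invented for illustration and are not TFI responses.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(scores: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items score matrix.

    scores[r][i] is respondent r's score on item i. Item and total-score
    variances use the same (sample) estimator, so the ratio is unaffected
    by the n versus n - 1 choice.
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha requires at least two items")
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    # alpha = k/(k-1) * (1 - sum of item variances / variance of total score)
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Illustrative data only: 4 respondents x 3 items on a 0-10 scale.
demo = [[2, 3, 2], [5, 5, 6], [7, 8, 7], [9, 9, 10]]
print(round(cronbach_alpha(demo), 3))
```

Perfectly redundant items give alpha = 1, and uncorrelated items give alpha = 0; real scales such as the TFI fall in between, with values near 1 indicating highly consistent items.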
Cross-cultural validity of four quality of life scales in persons with spinal cord injury

PubMed Central

2010-01-01

Background Quality of life (QoL) in persons with spinal cord injury (SCI) has been found to differ across countries. However, comparability of measurement results between countries depends on the cross-cultural validity of the applied instruments. The study examined the metric quality and cross-cultural validity of the Satisfaction with Life Scale (SWLS), the Life Satisfaction Questionnaire (LISAT-9), the Personal Well-Being Index (PWI) and the 5-item World Health Organization Quality of Life Assessment (WHOQoL-5) across six countries in a sample of persons with SCI. Methods A cross-sectional multi-centre study was conducted, and the data of 243 out-patients with SCI from study centers in Australia, Brazil, Canada, Israel, South Africa, and the United States were analyzed using Rasch-based methods. Results The analyses showed high reliability for all 4 instruments (person reliability index .78-.92). Unidimensionality of measurement was supported for the WHOQoL-5 (χ² = 16.43, df = 10, p = .088), partially supported for the PWI (χ² = 15.62, df = 16, p = .480), but rejected for the LISAT-9 (χ² = 50.60, df = 18, p < .001) and the SWLS (χ² = 78.54, df = 10, p < .001) based on overall and item-wise χ² tests, principal components analyses and independent t-tests. The response scales showed the expected ordering for the WHOQoL-5 and the PWI, but not for the other two instruments. Using differential item functioning (DIF) analyses, potential cross-country bias was found in two items of the SWLS and the WHOQoL-5, three items of the LISAT-9 and four items of the PWI. However, applying Rasch-based statistical methods, especially subtest analyses, it was possible to identify optimal strategies to enhance the metric properties and the cross-country equivalence of the instruments post-hoc. Following the post-hoc procedures, the WHOQOL-5 and the PWI worked in a consistent and expected way in all countries. 
Conclusions QoL assessment using the summary scores of the WHOQOL-5 and the PWI appeared cross-culturally valid in persons with SCI. In contrast, summary scores of the LISAT-9 and the SWLS have to be interpreted with caution. The findings of the current study can be especially helpful to select instruments for international research projects in SCI. PMID:20815864
Semantics bias in cross-national comparative analyses: is it good or bad to have "fair" health?

PubMed

Schnohr, Christina W; Gobina, Inese; Santos, Teresa; Mazur, Joanna; Alikasifuglu, Mujgan; Välimaa, Raili; Corell, Maria; Hagquist, Curt; Dalmasso, Paola; Movseyan, Yeva; Cavallo, Franco; van Dorsselaer, Saskia; Torsheim, Torbjørn

2016-05-04

The Health Behavior in School-aged Children is a cross-national study collecting data on social and health indicators among adolescents in 43 countries. The study provides comparable data on health behaviors and health outcomes through the use of a common protocol, which has been the backbone of the study since its initiation in 1983. In recent years, researchers within the study have noticed questionable comparability of the widely used item on self-rated health. One of the four response categories to the item "Would you say your health is….?" showed particular variation, as the proportion choosing the response category "Fair" varied from 20% in Latvia and Moldova to 3-4% in Bulgaria and Macedonia. A qualitative mini-survey of the back-translations showed that the response category "Fair" had a negative slant in 25 countries, a positive slant in 10 countries and was considered neutral in 9 countries. This finding indicates that what may be called semantic issues affect comparability in international studies, since the same original word (in an English original) is interpreted differently across countries and cultures. The paper tests and discusses a few possible explanations for this but can only advise future studies to take a cautious approach to international comparisons when working with the self-rated health item with four response categories.
When Interference Helps: Increasing Executive Load to Facilitate Deception Detection in the Concealed Information Test

PubMed Central

Visu-Petra, George; Varga, Mihai; Miclea, Mircea; Visu-Petra, Laura

2013-01-01

The possibility of enhancing the detection efficiency of the Concealed Information Test (CIT) by increasing executive load was investigated using an interference design. After learning and executing a mock crime scenario, subjects underwent three deception detection tests: an RT-based CIT, an RT-based CIT plus a concurrent memory task (CITMem), and an RT-based CIT plus a concurrent set-shifting task (CITShift). The concealed information effect, consisting of increased RT and lower response accuracy for probe items compared to irrelevant items, was evident across all three conditions. The group analyses indicated a larger difference between RTs to probe and irrelevant items in the dual-task conditions, but this difference did not translate into significantly increased detection efficiency at the individual level. Signal detection parameters based on comparison with a simulated innocent group showed accurate discrimination for all conditions. Overall response accuracy was highest on the CITMem, and the difference between response accuracy to probes and irrelevants was smallest in this condition. Accuracy on the concurrent tasks (Mem and Shift) was high, and responses on these tasks were significantly influenced by CIT stimulus type (probes vs. irrelevants). The findings are interpreted in relation to the cognitive load/dual-task interference literature, generating important insights for research on the involvement of executive functions in deceptive behavior. PMID:23543918
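The abstract does not specify which signal detection parameters were computed. One common choice is the sensitivity index d', the difference between the z-transformed hit and false-alarm rates. The sketch below assumes d' with a log-linear correction for rates of 0 or 1; the function name `d_prime` and the counts (for example, guilty versus simulated innocent protocols classified at some cutoff) are hypothetical, and this is not necessarily the authors' procedure.

```python
from statistics import NormalDist


def d_prime(hits: int, misses: int, false_alarms: int, correct_rejections: int) -> float:
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).

    A log-linear correction (add 0.5 to each count and 1 to each total) keeps
    both rates strictly between 0 and 1, where the inverse normal CDF is finite.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # standard normal quantile function
    return z(hit_rate) - z(fa_rate)


# Hypothetical counts: 20 guilty and 20 innocent protocols classified by some cutoff.
print(round(d_prime(hits=17, misses=3, false_alarms=2, correct_rejections=18), 2))
```

A d' of 0 means the classifier cannot separate the two groups; larger positive values mean better discrimination.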
[Measuring job satisfaction: development of a multidimensional scale].

PubMed

Faraci, Palmira; Valenti, Giusy

2016-01-01

Although numerous studies have been conducted on the topic of job satisfaction, Italian research lacks specific psychometric instruments for measuring it. The present paper aims to develop a scale to measure job satisfaction that is suited to the Italian cultural context. Participants were 222 workers (36.5% males, 63.5% females) with an average age of 38.39 years (SD = 10.91). Items were selected from a large item pool on the basis of evaluation by a group of expert judges and an item analysis procedure. To establish test validity, the following instruments were also administered: Occupational Stress Indicator, Satisfaction With Life Scale, Rosenberg Self-Esteem Scale, Multidimensional Scale of Perceived Social Support, and Beck Depression Inventory. Both exploratory and confirmatory factor analyses highlighted a 6-factor structure. These factors accounted for 51.30% of the total variance. Reliability analyses indicated satisfactory internal consistency (alpha ranging from .73 to .86). Construct validity was supported by correlations with theoretically associated variables. Our findings suggest promising psychometric properties for the presented measure. The instrument could be used in specific programs developed to promote well-being in work settings.
The Survey of Treatment Entry Pressures (STEP): identifying client's reasons for entering substance abuse treatment.

PubMed

Dugosh, Karen Leggett; Festinger, David S; Lynch, Kevin G; Marlowe, Douglas B

2014-10-01

Systematically identifying reasons that clients enter substance abuse treatment may allow clinicians to immediately focus on issues of greatest relevance to the individual and enhance treatment engagement. We developed the Survey of Treatment Entry Pressures (STEP) to identify the specific factors that precipitated an individual's treatment entry. The instrument contains 121 items from 6 psychosocial domains (i.e., family, financial, social, medical, psychiatric, legal). The current study examined the STEP's psychometric properties. A total of 761 participants from various treatment settings and modalities completed the STEP prior to treatment admission and 4-7 days later. Analyses were performed to examine the instrument's psychometric properties, including item response rates, test-retest reliability, internal consistency, and factor structure. The items displayed adequate test-retest reliability and internal consistency within each psychosocial domain. Generally, results from exploratory and confirmatory factor analyses support a 2-factor structure reflecting type of reinforcement schedule. The study provides preliminary support for the psychometric properties of the STEP. The STEP may provide a reliable way for clinicians to characterize and capitalize on a client's treatment motivation early on, which may serve to improve treatment retention and therapeutic outcomes. © 2014 Wiley Periodicals, Inc.
Preliminary development and psychometric evaluation of an unmet needs measure for adolescents and young adults with cancer: the Cancer Needs Questionnaire - Young People (CNQ-YP).

PubMed

Clinton-McHarg, Tara; Carey, Mariko; Sanson-Fisher, Rob; D'Este, Catherine; Shakeshaft, Anthony

2012-01-30

Adolescents and young adult (AYA) cancer survivors may have unique physical, psychological and social needs due to their cancer occurring at a critical phase of development. The aim of this study was to develop a psychometrically rigorous measure of unmet need to capture the specific needs of this group. Items were developed following a comprehensive literature review, focus groups with AYAs, and feedback from health care providers, researchers and other professionals. The measure was pilot tested with 32 AYA cancer survivors recruited through a state-based cancer registry to establish face and content validity. A main sample of 139 AYA cancer patients and survivors were recruited through seven treatment centres and invited to complete the questionnaire. To establish test-retest reliability, a sub-sample of 34 participants completed the measure a second time. Exploratory factor analysis was performed and the measure was assessed for internal consistency, discriminative validity, potential responsiveness and acceptability. The Cancer Needs Questionnaire - Young People (CNQ-YP) has established face and content validity, and acceptability. The final measure has 70 items and six factors: Treatment Environment and Care (33 items); Feelings and Relationships (14 items); Daily Life (12 items); Information and Activities (5 items); Education (3 items); and Work (3 items). All domains achieved Cronbach's alpha values greater than 0.80. Item-to-item test-retest reliability was also high, with all but four items reaching weighted kappa values above 0.60. The CNQ-YP is the first multi-dimensional measure of unmet need which has been developed specifically for AYA cancer patients and survivors. The measure displays a strong factor structure, and excellent internal consistency and test-retest reliability. However, the small sample size has implications for the reliability of the statistical analyses undertaken, particularly the exploratory factor analysis. 
Future studies with a larger sample are recommended to confirm the factor structure of the measure. Longitudinal studies to establish responsiveness and predictive validity should also be undertaken.
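Test-retest agreement for the CNQ-YP items is reported as weighted kappa, but the abstract does not state the weighting scheme. The sketch below assumes Cohen's weighted kappa with linear disagreement weights by default (quadratic as an option); `weighted_kappa` and the example ratings are hypothetical and do not come from the study.

```python
from typing import Sequence


def weighted_kappa(first: Sequence[int], second: Sequence[int],
                   n_categories: int, scheme: str = "linear") -> float:
    """Cohen's weighted kappa for two ratings coded 0..n_categories - 1.

    Disagreement weights are |i - j| / (k - 1) for "linear" and the square
    of that for "quadratic". kappa = 1 - observed / chance-expected disagreement.
    """
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty rating lists of equal length")
    if scheme not in ("linear", "quadratic"):
        raise ValueError("scheme must be 'linear' or 'quadratic'")
    k = n_categories
    n = len(first)
    # Joint proportions of (first rating, second rating).
    joint = [[0.0] * k for _ in range(k)]
    for a, b in zip(first, second):
        joint[a][b] += 1 / n
    row_marg = [sum(joint[i]) for i in range(k)]
    col_marg = [sum(joint[i][j] for i in range(k)) for j in range(k)]

    def weight(i: int, j: int) -> float:
        d = abs(i - j) / (k - 1)
        return d if scheme == "linear" else d * d

    observed = sum(weight(i, j) * joint[i][j] for i in range(k) for j in range(k))
    expected = sum(weight(i, j) * row_marg[i] * col_marg[j]
                   for i in range(k) for j in range(k))
    if expected == 0:
        raise ValueError("kappa is undefined when chance disagreement is zero")
    return 1 - observed / expected


# Hypothetical test and retest answers to one item on a 4-point scale.
time1 = [0, 1, 2, 3, 1, 2, 0, 3]
time2 = [0, 1, 2, 2, 1, 3, 0, 3]
print(round(weighted_kappa(time1, time2, 4), 2))
```

Weighting means that answering one category apart at retest is penalized less than answering at opposite ends of the scale, which suits ordered response options.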
Preliminary development and psychometric evaluation of an unmet needs measure for adolescents and young adults with cancer: the Cancer Needs Questionnaire - Young People (CNQ-YP)

PubMed Central

2012-01-01

Background Adolescents and young adult (AYA) cancer survivors may have unique physical, psychological and social needs due to their cancer occurring at a critical phase of development. The aim of this study was to develop a psychometrically rigorous measure of unmet need to capture the specific needs of this group. Methods Items were developed following a comprehensive literature review, focus groups with AYAs, and feedback from health care providers, researchers and other professionals. The measure was pilot tested with 32 AYA cancer survivors recruited through a state-based cancer registry to establish face and content validity. A main sample of 139 AYA cancer patients and survivors were recruited through seven treatment centres and invited to complete the questionnaire. To establish test-retest reliability, a sub-sample of 34 participants completed the measure a second time. Exploratory factor analysis was performed and the measure was assessed for internal consistency, discriminative validity, potential responsiveness and acceptability. Results The Cancer Needs Questionnaire - Young People (CNQ-YP) has established face and content validity, and acceptability. The final measure has 70 items and six factors: Treatment Environment and Care (33 items); Feelings and Relationships (14 items); Daily Life (12 items); Information and Activities (5 items); Education (3 items); and Work (3 items). All domains achieved Cronbach's alpha values greater than 0.80. Item-to-item test-retest reliability was also high, with all but four items reaching weighted kappa values above 0.60. Conclusions The CNQ-YP is the first multi-dimensional measure of unmet need which has been developed specifically for AYA cancer patients and survivors. The measure displays a strong factor structure, and excellent internal consistency and test-retest reliability. 
However, the small sample size has implications for the reliability of the statistical analyses undertaken, particularly the exploratory factor analysis. Future studies with a larger sample are recommended to confirm the factor structure of the measure. Longitudinal studies to establish responsiveness and predictive validity should also be undertaken. PMID:22284545
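The two reliability statistics reported in this abstract are Cronbach's alpha for each domain and item-level weighted kappa for test-retest agreement. Both can be sketched in Python as follows. This is a minimal illustration, not the authors' code. The abstract does not say which kappa weighting scheme was used, so quadratic weights are an assumption here (linear weights are available through the `weights` argument).

```python
from statistics import pvariance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(responses[0])
    # Variance of each item across respondents (population form; the n/(n-1)
    # factor would cancel in the ratio below anyway).
    item_vars = [pvariance([row[j] for row in responses]) for j in range(k)]
    # Variance of the respondents' total (sum) scores.
    total_var = pvariance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def weighted_kappa(ratings1, ratings2, categories, weights="quadratic"):
    """Weighted kappa between two administrations on ordered categories."""
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(ratings1)
    # Observed joint proportions (time-1 category by time-2 category).
    observed = [[0.0] * k for _ in range(k)]
    for r1, r2 in zip(ratings1, ratings2):
        observed[index[r1]][index[r2]] += 1 / n
    row_marg = [sum(observed[i]) for i in range(k)]
    col_marg = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i, j):
        # 0 on the diagonal, 1 for the most distant pair of categories.
        d = abs(i - j) / (k - 1)
        return d * d if weights == "quadratic" else d

    obs_dis = sum(disagreement(i, j) * observed[i][j]
                  for i in range(k) for j in range(k))
    exp_dis = sum(disagreement(i, j) * row_marg[i] * col_marg[j]
                  for i in range(k) for j in range(k))
    return 1 - obs_dis / exp_dis
```

The abstract's benchmarks are alpha above 0.80 for each domain and weighted kappa above 0.60 for each item.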

Item Response Theory Analyses of Parent and Teacher Ratings of the ADHD Symptoms for Recoded Dichotomous Scores

ERIC Educational Resources Information Center

Gomez, Rapson; Vance, Alasdair; Gomez, Andre

2011-01-01

Objective: The two-parameter logistic model (2PLM) was used to evaluate the psychometric properties of the inattention (IA) and hyperactivity/impulsivity (HI) symptoms. Method: To accomplish this, parents and teachers completed the Disruptive Behavior Rating Scale (DBRS) for a group of 934 primary school-aged children. Results: The results for the…
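Under the two-parameter logistic model named in this abstract, the probability that a child is rated 1 (symptom present) on an item is a logistic function of the latent trait θ, with item discrimination a and location b. A minimal Python sketch follows. The recoding rule is hypothetical: it assumes a 0 to 3 rating scale and counts ratings of 2 or higher as present. The authors' exact recoding is not stated in the truncated abstract.

```python
import math


def recode_dichotomous(rating, cutoff=2):
    """Hypothetical recode: ratings at or above `cutoff` count as present (1)."""
    return 1 if rating >= cutoff else 0


def p_2pl(theta, a, b):
    """2PL probability of a 1 on an item with discrimination a and location b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def pattern_loglik(theta, responses, items):
    """Log-likelihood of a 0/1 response pattern; items is a list of (a, b)."""
    ll = 0.0
    for x, (a, b) in zip(responses, items):
        p = p_2pl(theta, a, b)
        ll += math.log(p) if x == 1 else math.log(1.0 - p)
    return ll
```

Maximizing `pattern_loglik` over θ would give a maximum-likelihood trait estimate for one rater's scores.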
The PRISMA extension statement for reporting of systematic reviews incorporating network meta-analyses of health care interventions: checklist and explanations.

PubMed

Hutton, Brian; Salanti, Georgia; Caldwell, Deborah M; Chaimani, Anna; Schmid, Christopher H; Cameron, Chris; Ioannidis, John P A; Straus, Sharon; Thorlund, Kristian; Jansen, Jeroen P; Mulrow, Cynthia; Catalá-López, Ferrán; Gøtzsche, Peter C; Dickersin, Kay; Boutron, Isabelle; Altman, Douglas G; Moher, David

2015-06-02

The PRISMA statement is a reporting guideline designed to improve the completeness of reporting of systematic reviews and meta-analyses. Authors have used this guideline worldwide to prepare their reviews for publication. In the past, these reports typically compared 2 treatment alternatives. With the evolution of systematic reviews that compare multiple treatments, some of them only indirectly, authors face novel challenges for conducting and reporting their reviews. This extension of the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-analyses) statement was developed specifically to improve the reporting of systematic reviews incorporating network meta-analyses. A group of experts participated in a systematic review, Delphi survey, and face-to-face discussion and consensus meeting to establish new checklist items for this extension statement. Current PRISMA items were also clarified. A modified, 32-item PRISMA extension checklist was developed to address what the group considered to be immediately relevant to the reporting of network meta-analyses. This document presents the extension and provides examples of good reporting, as well as elaborations regarding the rationale for new checklist items and the modification of previously existing items from the PRISMA statement. It also highlights educational information related to key considerations in the practice of network meta-analysis. The target audience includes authors and readers of network meta-analyses, as well as journal editors and peer reviewers.
A Quasi-Parametric Method for Fitting Flexible Item Response Functions

ERIC Educational Resources Information Center

Liang, Longjuan; Browne, Michael W.

2015-01-01

If standard two-parameter item response functions are employed in the analysis of a test with some newly constructed items, it can be expected that, for some items, the item response function (IRF) will not fit the data well. This lack of fit can also occur when standard IRFs are fitted to personality or psychopathology items. When investigating…
Point and Click, Carefully: Investigating Inconsistent Response Styles in Middle School and College Students Involved in Web-Based Longitudinal Substance Use Research

PubMed Central

Wardell, Jeffrey D.; Rogers, Michelle L.; Simms, Leonard J.; Jackson, Kristina M.; Read, Jennifer P.

2014-01-01

This study investigated inconsistent responding to survey items by participants involved in longitudinal, web-based substance use research. We also examined cross-sectional and prospective predictors of inconsistent responding. Middle school (N = 1,023) and college students (N = 995) from multiple sites in the United States responded to online surveys assessing substance use and related variables in three waves of data collection. We applied a procedure for creating an index of inconsistent responding at each wave that involved identifying pairs of items with considerable redundancy and calculating discrepancies in responses to these items. Inconsistent responding was generally low in the Middle School sample and moderate in the College sample, with individuals showing only modest stability in inconsistent responding over time. Multiple regression analyses identified several baseline variables—including demographic, personality, and behavioral variables—that were uniquely associated with inconsistent responding both cross-sectionally and prospectively. Alcohol and substance involvement showed some bivariate associations with inconsistent responding, but these associations largely were accounted for by other factors. The results suggest that high levels of carelessness or inconsistency do not appear to characterize participants’ responses to longitudinal web-based surveys of substance use and support the use of inconsistency indices as a tool for identifying potentially problematic responders. PMID:24092819
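The inconsistency index described above pairs items with considerable redundancy and summarizes the discrepancies between responses to each pair. A minimal Python sketch follows, assuming the summary is the mean absolute difference across pairs. The abstract does not say whether the authors averaged or summed, and the item IDs in the test are hypothetical.

```python
def inconsistency_index(responses, redundant_pairs):
    """Mean absolute discrepancy across pairs of highly redundant items.

    responses: dict mapping item id -> numeric response for one participant.
    redundant_pairs: list of (item_a, item_b) tuples chosen for redundancy.
    Pairs with a missing response are skipped.
    """
    diffs = [abs(responses[a] - responses[b])
             for a, b in redundant_pairs
             if a in responses and b in responses]
    if not diffs:
        return float("nan")  # no scorable pairs for this participant
    return sum(diffs) / len(diffs)
```

Computing the index separately at each wave allows its stability over time to be examined.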
Reading Ability and Print Exposure: Item Response Theory Analysis of the Author Recognition Test

PubMed Central

Moore, Mariah; Gordon, Peter C.

2015-01-01

In the Author Recognition Test (ART), participants are presented with a series of names and foils and are asked to indicate which ones they recognize as authors. The test is a strong predictor of reading skill, and this predictive ability is generally explained as occurring because author knowledge is likely acquired through reading or other forms of print exposure. This large-scale study (1012 college student participants) used Item Response Theory (IRT) to analyze item (author) characteristics in order to facilitate identification of the determinants of item difficulty, provide a basis for further test development, and optimize scoring of the ART. Factor analysis suggests a potential two-factor structure of the ART, differentiating between literary and popular authors. Effective and ineffective author names were identified so as to facilitate future revisions of the ART. Analyses showed that the ART is a highly significant predictor of time spent encoding words as measured using eye-tracking during reading. The relationship between the ART and time spent reading provided a basis for implementing a higher penalty for selecting foils, rather than the standard method of ART scoring (names selected minus foils selected). The findings provide novel support for the view that the ART is a valid indicator of reading volume. Further, they show that frequency data can be used to select items of appropriate difficulty and that frequency data from corpora based on particular time periods and types of text may allow test adaptation for different populations. PMID:25410405
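The standard ART score is the number of names selected minus the number of foils selected. The study argues for a heavier foil penalty. The sketch below makes the penalty a parameter. The value 2.0 in the test is purely illustrative, because the abstract does not report the penalty weight.

```python
def art_score(selected, authors, foils, foil_penalty=1.0):
    """Score one Author Recognition Test response.

    selected: set of names the participant checked.
    authors / foils: sets of real author names and non-author foils.
    foil_penalty=1.0 reproduces the standard score (names minus foils);
    a value above 1.0 penalizes foil selections more heavily.
    """
    hits = len(selected & authors)
    false_alarms = len(selected & foils)
    return hits - foil_penalty * false_alarms
```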
Reading ability and print exposure: item response theory analysis of the author recognition test.

PubMed

Moore, Mariah; Gordon, Peter C

2015-12-01

In the author recognition test (ART), participants are presented with a series of names and foils and are asked to indicate which ones they recognize as authors. The test is a strong predictor of reading skill, and this predictive ability is generally explained as occurring because author knowledge is likely acquired through reading or other forms of print exposure. In this large-scale study (1,012 college student participants), we used item response theory (IRT) to analyze item (author) characteristics in order to facilitate identification of the determinants of item difficulty, provide a basis for further test development, and optimize scoring of the ART. Factor analysis suggested a potential two-factor structure of the ART, differentiating between literary and popular authors. Effective and ineffective author names were identified so as to facilitate future revisions of the ART. Analyses showed that the ART is a highly significant predictor of the time spent encoding words, as measured using eyetracking during reading. The relationship between the ART and time spent reading provided a basis for implementing a higher penalty for selecting foils, rather than the standard method of ART scoring (names selected minus foils selected). The findings provide novel support for the view that the ART is a valid indicator of reading volume. Furthermore, they show that frequency data can be used to select items of appropriate difficulty, and that frequency data from corpora based on particular time periods and types of texts may allow adaptations of the test for different populations.
Age and gender differences in depression across adolescence: real or 'bias'?

PubMed

van Beek, Yolanda; Hessen, David J; Hutteman, Roos; Verhulp, Esmée E; van Leuven, Mirande

2012-09-01

Since developmental psychologists are interested in explaining age and gender differences in depression across adolescence, it is important to investigate to what extent these observed differences can be attributed to measurement bias. Measurement bias may arise when the phenomenology of depression varies with age or gender, i.e., when younger versus older adolescents or girls versus boys differ in the way depression is experienced or expressed. The Children's Depression Inventory (CDI) was administered to a large school population (N = 4048) aged 8-17 years. A 4-factor model was selected by means of factor analyses for ordered categorical measures. For each of the four factor scales measurement invariance with respect to gender and age (late childhood, early and middle adolescence) was tested using item response theory analyses. Subsequently, to examine which items contributed to measurement bias, all items were studied for differential item functioning (DIF). Finally, it was investigated how developmental patterns changed if measurement biases were accounted for. For each of the factors Self-Deprecation, Dysphoria, School Problems, and Social Problems measurement bias with respect to both gender and age was found and many items showed DIF. Developmental patterns changed profoundly when measurement bias was taken into account. The CDI seemed to particularly overestimate depression in late childhood, and underestimate depression in middle adolescent boys. For scientific as well as clinical use of the CDI, measurement bias with respect to gender and age should be accounted for. © 2012 The Authors. Journal of Child Psychology and Psychiatry © 2012 Association for Child and Adolescent Mental Health.
Psychometric characteristics of daily diaries for the Patient-Reported Outcomes Measurement Information System (PROMIS®): a preliminary investigation.

PubMed

Schneider, Stefan; Choi, Seung W; Junghaenel, Doerte U; Schwartz, Joseph E; Stone, Arthur A

2013-09-01

The Patient-Reported Outcomes (PRO) Measurement Information System (PROMIS(®)) has developed assessment tools for numerous PROs, most using a 7-day recall format. We examined whether modifying the recall period for use in daily diary research would affect the psychometric characteristics of several PROMIS measures. Daily versions of short-forms for three PROMIS domains (pain interference, fatigue, depression) were administered to a general population sample (n = 100) for 28 days. Analyses used multilevel item response theory (IRT) models. We examined differential item functioning (DIF) across recall periods by comparing the IRT parameters from the daily data with the PROMIS 7-day recall IRT parameters. Additionally, we examined whether the IRT parameters for day-to-day within-person changes are invariant to those for between-person (cross-sectional) differences in PROs. Dimensionality analyses of the daily data suggested a single dimension for each PRO domain, consistent with PROMIS instruments. One-third of the daily items showed uniform DIF when compared with PROMIS 7-day recall, but the impact of DIF on the scale level was minor. IRT parameters for within-person changes differed from between-person parameters for 3 depression items, which were more sensitive for measuring within-person change than between-person differences; no such differences were found for the pain interference and fatigue items. Notably, mean scores from daily diaries were significantly lower than the PROMIS 7-day recall norms. The results provide initial evidence supporting the adaptation of PROMIS measures for daily diary research. However, scores from daily diaries cannot be directly interpreted on PROMIS norms established for 7-day recall.
Is the General Self-Efficacy Scale a Reliable Measure to be used in Cross-Cultural Studies? Results from Brazil, Germany and Colombia.

PubMed

Damásio, Bruno F; Valentini, Felipe; Núñes-Rodriguez, Susana I; Kliem, Soeren; Koller, Sílvia H; Hinz, Andreas; Brähler, Elmar; Finck, Carolyn; Zenger, Markus

2016-05-26

This study evaluated cross-cultural measurement invariance for the General Self-efficacy Scale (GSES) in a large Brazilian sample (N = 2,394) and representative German (N = 2,046) and Colombian (N = 1,500) samples. Initially, multiple-indicators multiple-causes (MIMIC) analyses showed that sex and age were biasing item responses in the total sample (2 and 10 items, respectively). After controlling for these two covariates, a multigroup confirmatory factor analysis (MGCFA) was employed. Configural invariance was attested. However, metric invariance was not supported for five of the 10 items, and scalar invariance was not supported for any item. We also evaluated the differences between the latent scores estimated by two models: MIMIC, and MGCFA with the non-equivalent parameters left unconstrained across countries. The average difference in the estimated latent scores was |.07|, and 22.8% of the scores were biased by at least .10 standardized points. Bias effects were above the mean for the German group, for which the average difference was |.09| and 33.7% of the scores were biased by at least .10. In sum, the GSES did not provide evidence of measurement invariance for use in this cross-cultural study. Moreover, our results showed that even when controlling for sex and age effects, failing to account for differences in item parameters across countries in the MGCFA analyses would bias the estimation of latent scores, with a larger effect for the German population.
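The bias summary reported above compares each person's latent score under two models. It reports the average absolute difference and the share of scores differing by at least .10 standardized points. A minimal Python sketch follows, assuming the two score lists are already estimated and aligned person by person. The numbers in the test are made up.

```python
def latent_score_bias(scores_model_a, scores_model_b, threshold=0.10):
    """Mean absolute difference and proportion of persons with |diff| >= threshold."""
    diffs = [abs(a - b) for a, b in zip(scores_model_a, scores_model_b)]
    mean_abs = sum(diffs) / len(diffs)
    prop_biased = sum(d >= threshold for d in diffs) / len(diffs)
    return mean_abs, prop_biased
```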
Relationship between Item Responses of Negative Affect Items and the Distribution of the Sum of the Item Scores in the General Population

PubMed Central

Kawasaki, Yohei; Ide, Kazuki; Akutagawa, Maiko; Yamada, Hiroshi; Furukawa, Toshiaki A.; Ono, Yutaka

2016-01-01

Background Several studies have shown that total depressive symptom scores in the general population approximate an exponential pattern, except for the lower end of the distribution. The Center for Epidemiologic Studies Depression Scale (CES-D) consists of 20 items, each of which may take on four scores: “rarely,” “some,” “occasionally,” and “most of the time.” Recently, we reported that the item responses for 16 negative affect items commonly exhibit exponential patterns, except for the level of “rarely,” leading us to hypothesize that the item responses at the level of “rarely” may be related to the non-exponential pattern typical of the lower end of the distribution. To verify this hypothesis, we investigated how the item responses contribute to the distribution of the sum of the item scores. Methods Data collected from 21,040 subjects who had completed the CES-D questionnaire as part of a Japanese national survey were analyzed. To assess the item responses of negative affect items, we used a parameter r, which denotes the ratio of “rarely” to “some” in each item response. The distributions of the sum of negative affect items in various combinations were analyzed using log-normal scales and curve fitting. Results The sum of the item scores approximated an exponential pattern regardless of the combination of items, whereas, at the lower end of the distributions, there was a clear divergence between the actual data and the predicted exponential pattern. At the lower end of the distributions, the sum of the item scores with high values of r exhibited higher scores compared to those predicted from the exponential pattern, whereas the sum of the item scores with low values of r exhibited lower scores compared to those predicted. Conclusions The distributional pattern of the sum of the item scores could be predicted from the item responses of such items. PMID:27806132
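The parameter r is the ratio of "rarely" to "some" responses for an item. An exponential pattern in the distribution of sum scores appears as a straight line on a logarithmic frequency axis. The Python sketch below computes r and fits that line by ordinary least squares on log frequencies. The least-squares fit is an assumed simplification, since the abstract does not detail the authors' curve-fitting procedure.

```python
import math
from collections import Counter


def rarely_to_some_ratio(item_responses):
    """r for one item: count of "rarely" responses divided by count of "some"."""
    counts = Counter(item_responses)
    return counts["rarely"] / counts["some"]


def log_linear_slope(freq_by_score):
    """Least-squares slope of log(frequency) on total score.

    An exponential pattern, frequency proportional to exp(-lam * score),
    is a straight line on a log scale with slope -lam.
    Scores with zero frequency are skipped because log(0) is undefined.
    """
    xs = [s for s, f in freq_by_score.items() if f > 0]
    ys = [math.log(freq_by_score[s]) for s in xs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx
```

Fitting the line only above some minimum score, then checking how far the lowest scores fall from it, would mirror the divergence at the lower end that the abstract describes.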
Differential item functioning analysis with ordinal logistic regression techniques. DIFdetect and difwithpar.

PubMed

Crane, Paul K; Gibbons, Laura E; Jolley, Lance; van Belle, Gerald

2006-11-01

We present an ordinal logistic regression model for identification of items with differential item functioning (DIF) and apply this model to a Mini-Mental State Examination (MMSE) dataset. We employ item response theory ability estimation in our models. Three nested ordinal logistic regression models are applied to each item. Model testing begins with examination of the statistical significance of the interaction term between ability and the group indicator, consistent with nonuniform DIF. Then we turn our attention to the coefficient of the ability term in models with and without the group term. If including the group term has a marked effect on that coefficient, we declare that it has uniform DIF. We examined DIF related to language of test administration in addition to self-reported race, Hispanic ethnicity, age, years of education, and sex. We used PARSCALE for IRT analyses and STATA for ordinal logistic regression approaches. We used an iterative technique for adjusting IRT ability estimates on the basis of DIF findings. Five items were found to have DIF related to language. These same items also had DIF related to other covariates. The ordinal logistic regression approach to DIF detection, when combined with IRT ability estimates, provides a reasonable alternative for DIF detection. There appear to be several items with significant DIF related to language of test administration in the MMSE. More attention needs to be paid to the specific criteria used to determine whether an item has DIF, not just the technique used to identify DIF.
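The two decision rules in this abstract can be sketched in Python, given log-likelihoods and coefficients from already-fitted ordinal logistic regressions. Nonuniform DIF is tested through the ability-by-group interaction, and uniform DIF through the change in the ability coefficient when the group term is added. Several labels are assumptions. The model names m1 to m3 are hypothetical. The 10% relative-change cutoff is an assumed value, since the abstract only says "a marked effect". The 1-df likelihood-ratio test assumes a single binary group indicator.

```python
import math


def lr_pvalue_df1(loglik_reduced, loglik_full):
    """p-value of a likelihood-ratio test with 1 degree of freedom.

    For a chi-square variable with 1 df, P(X > x) = erfc(sqrt(x / 2)).
    """
    stat = 2.0 * (loglik_full - loglik_reduced)
    return math.erfc(math.sqrt(max(stat, 0.0) / 2.0))


def dif_flags(ll_m2, ll_m3, beta_m1, beta_m2, alpha=0.05, rel_change_cutoff=0.10):
    """Apply the two DIF checks to one item.

    Hypothetical model labels: m1 = ability only, m2 = ability + group,
    m3 = ability + group + ability x group (all fitted elsewhere).
    ll_* are log-likelihoods; beta_m1 / beta_m2 are the ability
    coefficients from m1 and m2.
    """
    # Nonuniform DIF: does the ability x group interaction improve fit?
    nonuniform = lr_pvalue_df1(ll_m2, ll_m3) < alpha
    # Uniform DIF: does adding the group term markedly shift the ability coefficient?
    rel_change = abs((beta_m1 - beta_m2) / beta_m1)
    uniform = rel_change > rel_change_cutoff
    return {"nonuniform": nonuniform, "uniform": uniform}
```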
Stochastic Approximation Methods for Latent Regression Item Response Models

ERIC Educational Resources Information Center

von Davier, Matthias; Sinharay, Sandip

2010-01-01

This article presents an application of a stochastic approximation expectation maximization (EM) algorithm using a Metropolis-Hastings (MH) sampler to estimate the parameters of an item response latent regression model. Latent regression item response models are extensions of item response theory (IRT) to a latent variable model with covariates…
Validation of the cardiac health behavior scale for Korean adults with cardiovascular risks or diseases.

PubMed

Song, Rhayun; Oh, Hyunkyoung; Ahn, Sukhee; Moorhead, Sue

2018-02-01

The purpose of this study was to validate the Cardiac Health Behavior Scale for Korean adults (CHB-K) to determine its validity and reliability. Cardiovascular diseases (CVDs) are one of the most important chronic diseases due to their high prevalence and mortality rates. Patients with cardiovascular risks or diseases need to perform appropriate cardiac health behaviors that help to prevent the progression of the disease and improve their health status. This secondary analysis obtained data from two clinical trials of cardiac rehabilitation. Data from 298 patients with cardiovascular risks or diseases were analyzed for validation. Data analyses included correlation coefficients, t-tests, and exploratory and confirmatory factor analyses using SPSS (version WIN 22.0) and AMOS (version 20.0). The Self-Efficacy Scale (SES) was used to assess convergent validity, while reliability was assessed using Cronbach's alpha coefficients. Five main factors were verified: health responsibility, physical activity, diet habit (eating habit and food choice), stress management, and smoking cessation. A set of 21 items from the 25-item scale was verified after performing item analysis, factor analyses, and critical evaluation of the statistical results. The 21-item CHB-K (CHB-K21) exhibited acceptable validity, and the model of the CHB-K21 provided a good fit to the data. Most of the factors were found to be moderately correlated with SES scores (r=0.45-0.52, p<0.001). The CHB-K21 also demonstrated acceptable reliability (Cronbach's alpha=0.83). The CHB-K21 demonstrates strong validity and reliability. It can be used to assess cardiac health behaviors in Korean adults with cardiovascular risks or diseases. Copyright © 2017 Elsevier Inc. All rights reserved.
Computerized Adaptive Test (CAT) Applications and Item Response Theory Models for Polytomous Items

ERIC Educational Resources Information Center

Aybek, Eren Can; Demirtasli, R. Nukhet

2017-01-01

This article aims to provide a theoretical framework for computerized adaptive tests (CAT) and item response theory models for polytomous items. It also aims to introduce simulation and live CAT software to interested researchers. Computerized adaptive test algorithm, assumptions of item response theory models, nominal response…
Measuring Collaboration and Communication to Increase Implementation of Evidence-Based Practices: The Cultural Exchange Inventory

ERIC Educational Resources Information Center

Palinkas, Lawrence A.; Garcia, Antonio; Aarons, Gregory; Finno-Velasquez, Megan; Fuentes, Dahlia; Holloway, Ian; Chamberlain, Patricia

2018-01-01

The Cultural Exchange Inventory (CEI) is a 15-item instrument designed to measure the process (7 items) and outcomes (8 items) of exchanges of knowledge, attitudes and practices between members of different organisations collaborating in implementing evidence-based practice. We conducted principal axis factor analyses and parallel analyses of data…
An Evaluation of "Intentional" Weighting of Extended-Response or Constructed-Response Items in Tests with Mixed Item Types.

ERIC Educational Resources Information Center

Ito, Kyoko; Sykes, Robert C.

This study investigated the practice of weighting a type of test item, such as constructed response, more than other types of items, such as selected response, to compute student scores for a mixed-item type of test. The study used data from statewide writing field tests in grades 3, 5, and 8 and considered two contexts, that in which a single…
Psychometric properties and podiatric medical student perceptions of USMLE-style items in a general anatomy course.

PubMed

D'Antoni, Anthony V; DiLandro, Anthony C; Chusid, Eileen D; Trepal, Michael J

2012-01-01

In 2010, the New York College of Podiatric Medicine general anatomy course was redesigned to emphasize clinical anatomy. Over a 2-year period, United States Medical Licensing Examination (USMLE)-style items were used in lecture assessments with two cohorts of students (N = 200). Items were single-best-answer and extended-matching formats. Psychometric properties of items and assessments were evaluated, and anonymous student post-course surveys were administered. Mean grades for each assessment were recorded over time and compared between cohorts using analysis of variance. Correlational analyses were used to investigate the relationship between final course grades and lecture examinations. Post-course survey response rates for the cohorts were 71 of 97 (73%) and 81 of 103 (79%). The USMLE-style items had strong psychometric properties. Point-biserial correlations were 0.20 and greater, and the proportion of students answering each item correctly ranged from 25% to 75%. Examinations were highly reliable, with Kuder-Richardson 20 coefficients of 0.71 to 0.76. Students (>80%) reported that single-best-answer items were easier than extended-matching items. Students (>76%) believed that the items on the quizzes/examinations were similar to those found on USMLE Step 1. Most students (>84%) believed that they would do well on the anatomy section of their boards (American Podiatric Medical Licensing Examination [APMLE] Part I). Students valued USMLE-style items. These data, coupled with the psychometric data, suggest that USMLE-style items can be successfully incorporated into a basic science course in podiatric medical education. Outcomes from students who recently took the APMLE Part I suggest that incorporation of USMLE-style items into the general anatomy course was a successful measure and prepared them well.
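The two item and test statistics cited above are Kuder-Richardson 20 for examination reliability and the point-biserial correlation for item discrimination. Both can be computed from a 0/1 scored response matrix, as in the Python sketch below. This sketch uses the uncorrected item-total correlation, with the item included in the total. That choice is an assumption, since the abstract does not say which form was reported.

```python
import math
from statistics import mean, pstdev, pvariance


def kr20(scores):
    """KR-20 for a list of respondents, each a list of 0/1 item scores."""
    k = len(scores[0])
    n = len(scores)
    # Proportion answering each item correctly.
    p = [sum(row[j] for row in scores) / n for j in range(k)]
    pq = sum(pj * (1 - pj) for pj in p)
    # Population variance of total scores, consistent with p*q item variances.
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - pq / total_var)


def point_biserial(item, totals):
    """Correlation between a 0/1 item and the total score."""
    p = mean(item)
    m1 = mean(t for x, t in zip(item, totals) if x == 1)  # mean total, correct
    m0 = mean(t for x, t in zip(item, totals) if x == 0)  # mean total, incorrect
    return (m1 - m0) / pstdev(totals) * math.sqrt(p * (1 - p))
```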
A Comparison of Linking and Concurrent Calibration under the Graded Response Model.

ERIC Educational Resources Information Center

Kim, Seock-Ho; Cohen, Allan S.

Applications of item response theory to practical testing problems, including equating, differential item functioning, and computerized adaptive testing, require that item parameter estimates be placed onto a common metric. In this study, two methods for developing a common metric for the graded response model under item response theory were…
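The Python sketch below shows what the graded response model computes and how a separate-calibration linking places items on a common metric. The mean/sigma method is one standard linking approach and is used here only as an illustration. The truncated abstract does not say which linking method the study compared with concurrent calibration.

```python
import math
from statistics import mean, pstdev


def grm_category_probs(theta, a, thresholds):
    """Samejima graded response model: probability of each ordered category.

    thresholds: increasing b_1 < ... < b_{m-1} for an item with m categories.
    """
    # P*(k): probability of responding in category k or higher; P*(0) = 1
    # and the probability of exceeding the top category is 0.
    cum = ([1.0]
           + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
           + [0.0])
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]


def mean_sigma_constants(b_source, b_target):
    """Mean/sigma linking constants from the same anchor thresholds on two scales."""
    A = pstdev(b_target) / pstdev(b_source)
    B = mean(b_target) - A * mean(b_source)
    return A, B


def rescale_item(a, thresholds, A, B):
    """Place one GRM item on the target metric: a* = a / A, b* = A * b + B."""
    return a / A, [A * b + B for b in thresholds]
```

Concurrent calibration avoids this step by estimating all items in a single run, which puts them on one metric directly.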
Writing, Evaluating and Assessing Data Response Items in Economics.

ERIC Educational Resources Information Center

Trotman-Dickenson, D. I.

1989-01-01

Describes some of the problems in writing data response items in economics for use by A Level and General Certificate of Secondary Education (GCSE) students. Examines the experience of two series of workshops on writing items, evaluating them and assessing responses from schools. Offers suggestions for producing packages of data response items as…
Item Response Modeling with Sum Scores

ERIC Educational Resources Information Center

Johnson, Timothy R.

2013-01-01

One of the distinctions between classical test theory and item response theory is that the former focuses on sum scores and their relationship to true scores, whereas the latter concerns item responses and their relationship to latent scores. Although item response theory is often viewed as the richer of the two theories, sum scores are still…
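One standard bridge between item responses and sum scores is the Lord-Wingersky recursion. Given each item's probability of a correct response at a fixed latent score, it builds the conditional distribution of the sum score one item at a time. The Python sketch below is an illustration for dichotomous items only. The truncated abstract does not indicate whether this article uses the recursion.

```python
def sum_score_distribution(item_probs):
    """Lord-Wingersky recursion for dichotomous items.

    item_probs: probability of a correct response to each item at one theta.
    Returns a list whose entry s is P(sum score = s | theta).
    """
    dist = [1.0]  # with zero items, a sum score of 0 has probability 1
    for p in item_probs:
        new = [0.0] * (len(dist) + 1)
        for s, prob in enumerate(dist):
            new[s] += prob * (1 - p)  # item answered incorrectly
            new[s + 1] += prob * p    # item answered correctly
        dist = new
    return dist
```

Averaging these conditional distributions over a latent-score distribution would give the model-implied marginal distribution of sum scores.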

Direct Care Workers in the National Drug Abuse Treatment Clinical Trials Network: Characteristics, Opinions, and Beliefs

PubMed Central

McCarty, Dennis; Fuller, Bret E.; Arfken, Cynthia; Miller, Michael; Nunes, Edward V.; Edmundson, Eldon; Copersino, Marc; Floyd, Anthony; Forman, Robert; Laws, Reesa; Magruder, Kathy M.; Oyama, Mark; Sindelar, Jody; Wendt, William W.

2010-01-01

Objective Individuals with direct care responsibilities in 348 drug abuse treatment units were surveyed to obtain a description of the workforce and to assess support for evidence-based therapies. Methods Surveys were distributed to 112 programs participating in the National Drug Abuse Treatment Clinical Trials Network (CTN). Descriptive analyses characterized the workforce. Analyses of covariance tested the effects of job category (counselors, medical staff, manager-supervisors, and support staff) on opinions about evidence-based practices and controlled for the effects of education, modality (outpatient or residential), race, and gender. Results Women made up two-thirds of the CTN workforce. One-third of the workforce had a master’s or doctoral degree. Responses from 1,757 counselors, 908 support staff, 522 managers-supervisors, and 511 medical staff (71% of eligible participants) suggested that the variables that most consistently influenced responses were job category (19 of 22 items) and education (20 of 22 items). Managers-supervisors were the most supportive of evidence-based therapies, and support staff were the least supportive. Generally, individuals with graduate degrees had more positive opinions about evidence-based therapies. Support for using medications and contingency management was modest across job categories. Conclusions The relatively traditional beliefs of support staff could inhibit the introduction of evidence-based practices. Programs initiating changes in therapeutic approaches may benefit from including all employees in change efforts. PMID:17287373
Item-level informant discrepancies across obese-overweight children and their parents on the PedsQL™ 4.0 instrument: an iterative hybrid ordinal logistic regression.

PubMed

Jafari, Peyman; Allahyari, Elahe; Salarzadeh, Mina; Bagheri, Zahra

2016-01-01

Child obesity has become a major health concern worldwide. In order to provide successful intervention strategies, it is necessary to understand how obese-overweight children and their parents perceive obesity and its consequences on child's health-related quality of life (HRQoL). This study aimed to assess measurement equivalence of the PedsQL™ 4.0 across obese-overweight children and their parents. The items in the PedsQL™ 4.0 were analysed for differential item functioning (DIF) across obese-overweight children and their parents using an iterative hybrid ordinal logistic regression/item response theory approach. The sample included 647 overweight-obese children and their parents, who completed child and parent reports of the PedsQL™ 4.0, respectively. Overall, 17 out of 23 (74%) items were flagged with DIF across the two groups: eight items exhibited uniform DIF and nine items non-uniform DIF. In addition, parents of obese children rated the child's HRQoL significantly lower than the children themselves did in all domains of the PedsQL™ 4.0, and this finding did not change whether or not items with uniform DIF were included. Although obese-overweight children and their parents interpret items of the PedsQL™ 4.0 in a conceptually different manner, removing or retaining DIF items in the subscales had no significant effects on group differences. Accordingly, it appears that observed differences in HRQoL scores across child and parent reports are a true difference and not a reflection of measurement artefact.
A Model-Free Diagnostic for Single-Peakedness of Item Responses Using Ordered Conditional Means

ERIC Educational Resources Information Center

Polak, Marike; De Rooij, Mark; Heiser, Willem J.

2012-01-01

In this article we propose a model-free diagnostic for single-peakedness (unimodality) of item responses. Presuming a unidimensional unfolding scale and a given item ordering, we approximate item response functions of all items based on ordered conditional means (OCM). The proposed OCM methodology is based on Thurstone & Chave's (1929) "criterion…
Using the Patient Health Questionnaire-9 to measure depression among racially and ethnically diverse primary care patients.

PubMed

Huang, Frederick Y; Chung, Henry; Kroenke, Kurt; Delucchi, Kevin L; Spitzer, Robert L

2006-06-01

The Patient Health Questionnaire depression scale (PHQ-9) is a well-validated, Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (DSM-IV) criterion-based measure for diagnosing depression, assessing severity and monitoring treatment response. The performance of most depression scales including the PHQ-9, however, has not been rigorously evaluated in different racial/ethnic populations. Therefore, we compared the factor structure of the PHQ-9 between different racial/ethnic groups as well as the rates of endorsement and differential item functioning (DIF) of the 9 items of the PHQ-9. The presence of DIF would indicate that responses to an individual item differ significantly between groups, controlling for the level of depression. A combined dataset from 2 separate studies of 5,053 primary care patients including non-Hispanic white (n=2,520), African American (n=598), Chinese American (n=941), and Latino (n=974) patients was used for our analysis. Exploratory principal components factor analysis was used to derive the factor structure of the PHQ-9 in each of the 4 racial/ethnic groups. A generalized Mantel-Haenszel statistic was used to test for DIF. One main factor that included all PHQ-9 items was found in each racial/ethnic group with alpha coefficients ranging from 0.79 to 0.89. Although endorsement rates of individual items were generally similar among the 4 groups, evidence of DIF was found for some items. Our analyses indicate that in African American, Chinese American, Latino, and non-Hispanic white patient groups the PHQ-9 measures a common concept of depression and can be effective for the detection and monitoring of depression in these diverse populations.
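The abstract tests for DIF with a generalized Mantel-Haenszel statistic, which handles polytomous items such as those of the PHQ-9. A minimal sketch of the simpler dichotomous case may help show the idea: respondents are stratified on a matching score (here assumed to be the total score), and a common odds ratio compares reference and focal groups within strata. The function name, inputs and toy data are hypothetical illustrations, not the authors' code, and this is not the generalized statistic itself.

```python
from collections import defaultdict


def mantel_haenszel_dif(item, group, total):
    """Mantel-Haenszel DIF check for one dichotomous (0/1) item.

    item:  0/1 responses to the studied item
    group: 0 = reference group, 1 = focal group
    total: matching score (e.g. sum score) used to form strata
    Returns (common odds ratio, continuity-corrected MH chi-square).
    """
    # Per stratum: [A, B, C, D] = [ref endorse, ref not, focal endorse, focal not]
    strata = defaultdict(lambda: [0, 0, 0, 0])
    for x, g, s in zip(item, group, total):
        if g == 0:
            idx = 0 if x == 1 else 1
        else:
            idx = 2 if x == 1 else 3
        strata[s][idx] += 1

    num = den = 0.0                       # odds-ratio numerator / denominator
    sum_a = sum_expected = sum_var = 0.0  # pieces of the MH chi-square
    for a, b, c, d in strata.values():
        n = a + b + c + d
        if n < 2:
            continue  # a single respondent carries no information
        num += a * d / n
        den += b * c / n
        n_ref, n_foc = a + b, c + d
        m1, m0 = a + c, b + d             # endorsers / non-endorsers in stratum
        sum_a += a
        sum_expected += n_ref * m1 / n
        sum_var += n_ref * n_foc * m1 * m0 / (n * n * (n - 1))

    odds_ratio = num / den if den > 0 else float("inf")
    if sum_var == 0:
        return odds_ratio, 0.0
    chi2 = max(0.0, abs(sum_a - sum_expected) - 0.5) ** 2 / sum_var
    return odds_ratio, chi2


# Hypothetical toy data: one stratum, the reference group endorses more often.
item = [1, 1, 1, 0, 1, 0, 0, 0]
group = [0, 0, 0, 0, 1, 1, 1, 1]
total = [5] * 8
print(mantel_haenszel_dif(item, group, total))
```

An odds ratio near 1 suggests the item behaves similarly in both groups once respondents are matched on the total score; values far from 1 flag possible DIF.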
Assessing the impact of growth hormone deficiency and treatment in adults: development of a new disease-specific measure.

PubMed

Brod, Meryl; Højbjerre, Lise; Adalsteinsson, Johan Erpur; Rasmussen, Michael Højby

2014-04-01

Approximately 50 000 adults in the United States are diagnosed with growth hormone (GH) deficiency, which has negative impacts on cognitive functioning, psychological well-being, and quality of life. This paper presents the development and validation of a patient-reported outcome measure (PRO), the Treatment-Related Impact Measure-Adult Growth Hormone Deficiency (TRIM-AGHD). The TRIM-AGHD was developed to measure the impact of GH deficiency and its treatment. The development and validation of the TRIM-AGHD were conducted according to the Food and Drug Administration guidance on the development of PROs. Concept elicitation, conducted in three countries, included interviews with patients and clinical experts and a literature review. Qualitative data were analyzed based on grounded theory principles, and draft items were cognitively debriefed. The measure underwent psychometric validation in a US clinic-based population. An a priori statistical analysis plan included assessment of the measurement model, reliability, and validity. Item functioning was reviewed using item response theory analyses. Forty-eight patients and six clinical experts participated in concept elicitation, and 169 patients completed the validation study. TRIM-AGHD was measured. Factor analysis resulted in four domains: energy level, physical health, emotional health, and cognitive ability. The item response theory analyses confirmed adequate item fit and placement within their domains. Internal consistency ranged from 0.82 to 0.95, and test-retest reliability ranged from 0.80 to 0.92. All prespecified hypotheses for convergent validity and all but two for discriminant validity were met. The final 26-item TRIM-AGHD can be considered a reliable and valid PRO of the impact of disease and treatment for adult GH deficiency.
Assessing the quality of life of adults with chronic respiratory diseases in routine primary care: construction and first validation of the 10-Item Respiratory Illness Questionnaire-monitoring 10 (RIQ-MON10).

PubMed

Jacobs, J E; Maillé, A R; Akkermans, R P; van Weel, C; Grol, R P T M

2004-08-01

As doctors' judgements about the burden of a disease often differ from patients' own assessments, a manageable method to incorporate the latter into routine care might support patient-centered decision-making. For this purpose we shortened the 55-item Quality of Life for Respiratory Illness Questionnaire (QoL-RIQ). Secondary analyses were performed on the data of 3 controlled studies (n = 328, 502 and 555), using inter-item correlations, scale distributions, Cronbach's alpha and factor analysis. Dyspnoea, forced expiratory volume in 1 s (FEV1), COOP/WONCA charts, the Medical Research Council-ECCS symptoms questionnaire and the MOS-SF 36 served as criteria to test validity and responsiveness. Item reduction resulted in a 10-item short form (alphas 0.87-0.90), consisting of two 5-item factors: (1) physical and emotional complaints and (2) physical and social limitations. The correlations of the short form with dyspnoea (r from 0.57 to 0.60), the generic health status instruments (r from 0.39 to 0.59) and lung function (r from 0.10 to 0.15) fulfilled the criteria. Further results included a clinically relevant score difference (> 0.5) between the upper and lower quartiles of the convergent instruments, an intraclass correlation of 0.82 between repeated scores in a stable group, and a standardised response mean of 0.86 in a group of patients who improved. The short form (RIQ-MON10) maintained the psychometric properties of the original instrument and is promising for assessing quality of life (QoL) during routine primary care visits.
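The responsiveness result above is expressed as a standardised response mean (SRM). A minimal sketch of the usual definition, the mean within-person change divided by the standard deviation of that change, is shown below; the function name and the sample (n - 1) standard deviation are assumptions, and the scores are hypothetical rather than taken from the study.

```python
from statistics import mean, stdev


def standardized_response_mean(baseline, follow_up):
    """Standardised response mean: mean within-person change divided by the
    standard deviation of that change (sample SD, n - 1 denominator)."""
    if len(baseline) != len(follow_up) or len(baseline) < 2:
        raise ValueError("need two paired score lists of equal length >= 2")
    changes = [after - before for before, after in zip(baseline, follow_up)]
    return mean(changes) / stdev(changes)


# Hypothetical scores for four patients before and after improvement.
print(standardized_response_mean([1, 2, 3, 4], [2, 4, 4, 6]))
```

Because the denominator is the spread of change scores rather than of baseline scores, a consistent change across patients yields a large SRM even when the average change is modest.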
Identifying shortcomings in the measurement of service quality.

PubMed

Fogarty, G; Catts, R; Forlin, C

2000-01-01

SERVPERF, the performance component of the Service Quality Scale (SERVQUAL), has been shown to measure five underlying dimensions corresponding to Tangibles, Reliability, Responsiveness, Assurance, and Empathy (Parasuraman, Zeithaml, & Berry, 1988). This paper describes three separate studies employing SERVPERF in an Australian context. In the first of these studies (N = 113), a shortened 15-item version of the SERVPERF scale (SERVPERF-R) was found to be suitable for use in an Australian small business setting. A five-factor structure was identifiable but the factors were highly correlated, suggesting that they were not clearly distinct. The tendency for marked negative skewness observed by other researchers was also noted here. A follow-up study involving three other small businesses (N = 212) used Rasch analysis to test assumptions about the spread of items on the underlying continuum. These analyses indicated that there is an even, though narrow, spread of items across the continuum. The Rasch analysis suggested that the items in both SERVPERF and SERVPERF-R are too easy to rate highly and that more "difficult" items need to be added to the scale. The third study (N = 122) was conducted using a version of SERVPERF-R that included seven new items intended to extend the range of the scale. The new items, however, did not achieve this desirable outcome. The implications for service quality assessment are discussed.
Prevalence of responsible hospitality policies in licensed premises that are associated with alcohol-related harm.

PubMed

Daly, Justine B; Campbell, Elizabeth M; Wiggers, John H; Considine, Robyn J

2002-06-01

This study aimed to determine the prevalence of responsible hospitality policies in a group of licensed premises associated with alcohol-related harm. During March 1999, 108 licensed premises with one or more police-identified alcohol-related incidents in the previous 3 months received a visit from a police officer. A 30-item audit checklist was used to determine the responsible hospitality policies being undertaken by each premises within eight policy domains: display required signage (three items); responsible host practices to prevent intoxication and under-age drinking (five items); written policies and guidelines for responsible service (three items); discouraging inappropriate promotions (three items); safe transport (two items); responsible management issues (seven items); physical environment (three items) and entry conditions (four items). No premises were undertaking all 30 items. Eighty per cent of the premises were undertaking 20 of the 30 items. All premises were undertaking at least 17 of the items. The proportion of premises undertaking individual items ranged from 16% to 100%. Premises were less likely to report having and providing written responsible hospitality documentation to staff, using door charges and having entry/re-entry rules. Significant differences between rural and urban premises were evident for four policies. Clubs were significantly more likely than hotels to have a written responsible service of alcohol policy and to clearly display codes of dress and conditions of entry. This study provides an indication of the extent and nature of responsible hospitality policies in a sample of licensed premises that are associated with a broad range of alcohol-related harms. The finding that a large majority of such premises appear to adopt responsible hospitality policies suggests a need to assess the validity and reliability of tools used in the routine assessment of such policies, and of the potential for harm from licensed premises.
Item Response Data Analysis Using Stata Item Response Theory Package

ERIC Educational Resources Information Center

Yang, Ji Seung; Zheng, Xiaying

2018-01-01

The purpose of this article is to introduce and review the capability and performance of the Stata item response theory (IRT) package that is available from Stata v.14, 2015. Using a simulated data set and a publicly available item response data set extracted from the Programme for International Student Assessment (PISA), we review the IRT package from…
Item Response Models for Local Dependence among Multiple Ratings

ERIC Educational Resources Information Center

Wang, Wen-Chung; Su, Chi-Ming; Qiu, Xue-Lan

2014-01-01

Ratings given to the same item response may have a stronger correlation than those given to different item responses, especially when raters interact with one another before giving ratings. The rater bundle model was developed to account for such local dependence by forming multiple ratings given to an item response as a bundle and assigning…
Item response theory - A first approach

NASA Astrophysics Data System (ADS)

Nunes, Sandra; Oliveira, Teresa; Oliveira, Amílcar

2017-07-01

Item Response Theory (IRT) has become one of the most popular scoring frameworks for measurement data and is frequently used in computerized adaptive testing, cognitively diagnostic assessment and test equating. According to Andrade et al. (2000), IRT can be defined as a set of mathematical models (Item Response Models, IRM) constructed to represent the probability of an individual giving the right answer to an item of a particular test. The number of Item Response Models available for measurement analysis has increased considerably in the last fifteen years, due to increasing computer power and to a demand for accuracy and more meaningful inferences grounded in complex data. Developments in modeling with Item Response Theory were related to developments in estimation theory, most remarkably Bayesian estimation with Markov chain Monte Carlo algorithms (Patz & Junker, 1999). The popularity of Item Response Theory has also led to numerous overviews in books and journals, and many connections between IRT and other statistical estimation procedures, such as factor analysis and structural equation modeling, have been made repeatedly (van der Linden & Hambleton, 1997). As stated before, Item Response Theory covers a variety of measurement models, ranging from basic one-dimensional models for dichotomously and polytomously scored items and their multidimensional analogues to models that incorporate information about the cognitive sub-processes that influence the overall item response process. The aim of this work is to introduce the main concepts associated with one-dimensional models of Item Response Theory, to specify the logistic models with one, two and three parameters, to discuss some properties of these models and to present the main estimation procedures.
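The one-, two- and three-parameter logistic models mentioned above differ only in which item parameters are free: difficulty b, discrimination a, and a lower asymptote c. A minimal sketch, assuming known item parameters, is given below together with a crude maximum-likelihood ability estimate found by grid search; the function names, the default scaling constant D = 1 and the grid bounds are illustrative assumptions, and grid search stands in for the Newton-type or Bayesian procedures normally used in practice.

```python
import math


def irt_prob(theta, b, a=1.0, c=0.0, D=1.0):
    """Probability of a correct response under the logistic IRT models.

    1PL: only the difficulty b varies (a = 1, c = 0).
    2PL: discrimination a and difficulty b vary (c = 0).
    3PL: adds a lower asymptote c ("guessing").
    D is a scaling constant; 1.7 is often used to approximate the normal ogive.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def estimate_theta(responses, items, lo=-4.0, hi=4.0, steps=801):
    """Maximum-likelihood ability estimate by grid search, with item
    parameters treated as known. items is a list of (a, b, c) tuples."""
    best_theta, best_ll = lo, -math.inf
    for k in range(steps):
        theta = lo + k * (hi - lo) / (steps - 1)
        ll = 0.0
        for u, (a, b, c) in zip(responses, items):
            p = irt_prob(theta, b, a, c)
            ll += math.log(p) if u == 1 else math.log(1.0 - p)
        if ll > best_ll:
            best_theta, best_ll = theta, ll
    return best_theta


# Hypothetical 1PL items with difficulties -1, 0 and 1.
items = [(1.0, -1.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
print(irt_prob(0.0, b=0.0))               # 0.5: ability equals difficulty
print(estimate_theta([1, 1, 0], items))   # two of three items correct
```

Under the 1PL model the total score is a sufficient statistic, so response patterns with the same number correct yield the same ability estimate; all-correct or all-wrong patterns have no finite maximum and simply hit the edge of the grid.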
Consensus recommendations for improvement of unmet clinical needs--the example of chronic graft-versus-host disease: a systematic review and meta-analysis.

PubMed

Olivieri, Jacopo; Manfredi, Lucia; Postacchini, Laura; Tedesco, Silvia; Leoni, Pietro; Gabrielli, Armando; Rambaldi, Alessandro; Bacigalupo, Andrea; Olivieri, Attilio; Pomponio, Giovanni

2015-07-01

Consensus recommendations are used to improve the methodology of research about rare disorders, but their uptake is unknown. We studied the uptake of consensus recommendations in steroid-refractory chronic graft-versus-host disease (SR-cGVHD). Although in 2006 the National Institutes of Health (NIH) cGVHD consensus project produced recommendations for clinical trials, guidelines have emphasised the scarcity of valuable evidence for all tested interventions. We searched Medline (PubMed) between Jan 1, 1998, and Oct 1, 2013, for non-randomised studies of systemic treatment for SR-cGVHD. To measure adherence to NIH recommendations, we applied a 61-item checklist derived from the NIH consensus document. We did a meta-analysis to measure the pooled effect size for overall response rate (ORR) and meta-regression analyses to measure the effect of deviations from NIH recommendations on the pooled effect size. We included 82 studies related to nine interventions. Conformity to NIH recommendations was uniformly low across the analysed timeframe (1998-2013) and did not change significantly after publication of the NIH recommendations. The pooled effect size for ORR for systemic treatment of SR-cGVHD was 0.66 (95% CI 0.62-0.70). Increased adherence to NIH recommendations in a score of items defining correct response assessment was associated with a significant reduction in ORR (-4.2%, 95% CI -6.6 to -1.9; p=0.001). We recorded no significant association between ORR and sets of items related to correct diagnostic definition of SR-cGVHD (change in ORR -3.1%, 95% CI -7.7 to 1.5), specification of primary intervention (0, -3.8 to 3.6), or concomitant treatments (-1.6%, -5.4 to 2.3). The score of items defining correct response assessment increased after publication of the NIH recommendations. Our findings show evidence of bias in the reported efficacy of treatment of SR-cGVHD. The overall effect of NIH recommendations on the scientific literature has been limited; however, NIH recommendations improved assessment of response, possibly reducing the overestimation bias. Better implementation of NIH recommendations might reduce false expectations about new interventions, and thus prevent clinical studies with ineffective treatments. Funding: none. Copyright © 2015 Elsevier Ltd. All rights reserved.
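The abstract reports a pooled ORR of 0.66 (95% CI 0.62-0.70) but does not state the pooling model. As an illustration only, a minimal fixed-effect inverse-variance pooling of study proportions with a Wald interval is sketched below; the authors may well have used a random-effects model or a transformed scale, the 0.5 continuity correction is an assumption, and the event counts are hypothetical, so this does not reproduce their estimate.

```python
import math


def pooled_proportion_fixed(events, totals, z=1.96):
    """Fixed-effect inverse-variance pooled proportion with a Wald CI.

    Each study contributes p_i = e_i / n_i with variance p_i(1 - p_i)/n_i.
    Studies with 0 or n_i events get a 0.5 continuity correction so the
    variance is not zero.
    """
    weights, props = [], []
    for e, n in zip(events, totals):
        if e == 0 or e == n:
            e_adj, n_adj = e + 0.5, n + 1.0  # continuity correction
        else:
            e_adj, n_adj = float(e), float(n)
        p = e_adj / n_adj
        var = p * (1.0 - p) / n_adj
        weights.append(1.0 / var)
        props.append(p)
    w_sum = sum(weights)
    pooled = sum(w * p for w, p in zip(weights, props)) / w_sum
    se = math.sqrt(1.0 / w_sum)
    return pooled, pooled - z * se, pooled + z * se


# Hypothetical responders / patients in three single-arm studies.
print(pooled_proportion_fixed([20, 33, 41], [30, 50, 60]))
```

Larger and more extreme studies receive more weight because their binomial variance is smaller; a random-effects model would add a between-study variance term to each weight and usually widen the interval.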
Is it nutrients, food items, diet quality or eating behaviours that are responsible for the association of children's diet with sleep?

PubMed

Khan, Mohammad K A; Faught, Erin L; Chu, Yen Li; Ekwaru, John P; Storey, Kate E; Veugelers, Paul J

2017-08-01

Both diet quality and sleep duration of children have declined in the past decades. Several studies have suggested that diet and sleep are associated; however, it is not established which aspects of the diet are responsible for this association. Is it nutrients, food items, diet quality or eating behaviours? We surveyed 2261 grade 5 children on their dietary intake and eating behaviours, and their parents on their sleep duration and sleep quality. We performed factor analysis to identify and quantify the essential factors among 57 nutrients, 132 food items and 19 eating behaviours. We considered these essential factors along with a diet quality score in multivariate regression analyses to assess their independent associations with sleep. Nutrients, food items and diet quality did not exhibit independent associations with sleep, whereas two groupings of eating behaviours did. 'Unhealthy eating habits and environments' was independently associated with sleep. For each standard deviation increase in their factor score, children had 6 min less sleep and were 12% less likely to have sleep of good quality. 'Snacking between meals and after supper' was independently associated with sleep quality. For each standard deviation increase in its factor score, children were 7% less likely to have good quality sleep. This study demonstrates that eating behaviours are responsible for the associations of diet with sleep among children. Health promotion programmes aiming to improve sleep should therefore focus on discouraging eating behaviours such as eating alone or in front of the TV, and snacking between meals and after supper. © 2016 European Sleep Research Society.
Factor structure and convergent validity of the Derriford Appearance Scale-24 using standard scoring versus treating ‘not applicable’ responses as missing data: a Scleroderma Patient-centered Intervention Network (SPIN) cohort study

PubMed Central

Merz, Erin L; Kwakkenbos, Linda; Carrier, Marie-Eve; Gholizadeh, Shadi; Mills, Sarah D; Fox, Rina S; Jewett, Lisa R; Williamson, Heidi; Harcourt, Diana; Assassi, Shervin; Furst, Daniel E; Gottesman, Karen; Mayes, Maureen D; Moss, Tim P; Thombs, Brett D; Malcarne, Vanessa L

2018-01-01

Objective Valid measures of appearance concern are needed in systemic sclerosis (SSc), a rare, disfiguring autoimmune disease. The Derriford Appearance Scale-24 (DAS-24) assesses appearance-related distress related to visible differences. There is uncertainty regarding its factor structure, possibly due to its scoring method. Design Cross-sectional survey. Setting Participants with SSc were recruited from 27 centres in Canada, the USA and the UK. Participants who self-identified as having visible differences were recruited from community and clinical settings in the UK. Participants Two samples were analysed (n=950 participants with SSc; n=1265 participants with visible differences). Primary and secondary outcome measures The DAS-24 factor structure was evaluated using two scoring methods. Convergent validity was evaluated with measures of social interaction anxiety, depression, fear of negative evaluation, social discomfort and dissatisfaction with appearance. Results When items marked by respondents as ‘not applicable’ were scored as 0, per standard DAS-24 scoring, a one-factor model fit poorly; when treated as missing data, the one-factor model fit well. Convergent validity analyses revealed strong correlations that were similar across scoring methods. Conclusions Treating ‘not applicable’ responses as missing improved the measurement model, but did not substantively influence practical inferences that can be drawn from DAS-24 scores. Indications of item redundancy and poorly performing items suggest that the DAS-24 could be improved and potentially shortened. PMID:29511009
An item response theory analysis of DSM-IV criteria for hallucinogen abuse and dependence in adolescents

PubMed Central

Wu, Li-Tzy; Pan, Jeng-Jong; Yang, Chongming; Reeve, Bryce B.; Blazer, Dan G.

2009-01-01

Aim This study applied both item response theory (IRT) and multiple indicators–multiple causes (MIMIC) methods to evaluate item-level psychometric properties of diagnostic questions for hallucinogen use disorders (HUDs), differential item functioning (DIF), and predictors of latent HUD. Methods Data were drawn from 2004–2006 National Surveys on Drug Use and Health. Analyses were based on 1548 past-year hallucinogen users aged 12–17 years. Substance use and symptoms were assessed by audio computer-assisted self-interviewing methods. Results Abuse and dependence criteria empirically were arrayed along a single continuum of severity. All abuse criteria indicated middle-to-high severity on the IRT-defined HUD continuum, while dependence criteria captured a wider range from the lowest (tolerance and time spent) to the highest (taking larger amounts and inability to cut down) severity levels. There was indication of DIF by hallucinogen users’ age, gender, race/ethnicity, and ecstasy use status. Adjusting for DIF, ecstasy users (vs. non-ecstasy hallucinogen users), females (vs. males), and whites (vs. Hispanics) exhibited increased odds of HUD. Conclusions Symptoms of hallucinogen abuse and dependence empirically do not reflect two discrete conditions in adolescents. Trends and problems related to hallucinogen use among girls and whites should be examined further to inform the designs of effective gender-appropriate and culturally sensitive prevention programs. PMID:19896773
Development of a Research Participants’ Perception Survey to Improve Clinical Research

PubMed Central

Yessis, Jennifer L.; Kost, Rhonda G.; Lee, Laura M.; Coller, Barry S.; Henderson, David K.

2012-01-01

Abstract Introduction: Clinical research participants’ perceptions regarding their experiences during research protocols provide outcome‐based insights into the effectiveness of efforts to protect rights and safety, and opportunities to enhance participants’ clinical research experiences. Use of validated surveys measuring patient‐centered outcomes is standard in hospitals, yet no instruments exist to assess outcomes of clinical research processes. Methods: We derived survey questions from data obtained from focus groups composed of research participants and professionals. We assessed the survey for face/content validity, and privacy/confidentiality protections and fielded it to research participants at 15 centers. We conducted analyses of response rates, sample characteristics, and psychometrics, including survey and item completion and analysis, internal consistency, item internal consistency, criterion‐related validity, and item usefulness. Responses were tested for fit into existing patient‐centered dimensions of care and new clinical research dimensions using Cronbach's alpha coefficient. Results: Surveys were mailed to 18,890 individuals; 4,961 were returned (29%). Survey completion was 89% overall; completion rates exceeded 90% for 88 of 93 evaluable items. Questions fit into three dimensions of patient‐centered care and two novel clinical research dimensions (Cronbach's alpha for dimensions: 0.69–0.85). Conclusions: The validated survey offers a new method for assessing and improving outcomes of clinical research processes. Clin Trans Sci 2012; Volume 5: 452–460 PMID:23253666
Validation of the French version of the Hospital Survey on Patient Safety Culture questionnaire.

PubMed

Occelli, P; Quenon, J-L; Kret, M; Domecq, S; Delaperche, F; Claverie, O; Castets-Fontaine, B; Amalberti, R; Auroy, Y; Parneix, P; Michel, P

2013-09-01

To assess the psychometric properties of the French version of the Hospital Survey on Patient Safety Culture questionnaire (HSOPSC) and study the hierarchical structure of the measured dimensions. This was a cross-sectional survey of the safety culture in 18 acute care units of seven hospitals in south-western France. Participants were full- and part-time healthcare providers who worked in the units; there was no intervention. Item responses were measured with 5-point agreement or frequency scales. Data analyses: a principal component analysis was used to identify the emerging components. Two structural equation modeling methods [LInear Structural RELations (LISREL) and Partial Least Square (PLS)] were used to verify the model and to study the relative importance of the dimensions. Internal consistency of the retained dimensions was studied. A test-retest was performed to assess reproducibility of the items. Overall response rate was 77% (n = 401). A 40-item structure grouped into 10 dimensions was proposed. The LISREL approach showed acceptable data fit for the proposed structure. The PLS approach indicated that three dimensions had the most impact on the safety culture: 'Supervisor/manager expectations & actions promoting safety', 'Organizational learning-continuous improvement' and 'Overall perceptions of safety'. Internal consistency was above 0.70 for six dimensions. Reproducibility was considered good for four items. The French HSOPSC questionnaire showed acceptable psychometric properties. Classification of the dimensions should guide future development of safety culture improving action plans.
Examining sex differences in DSM-IV-TR narcissistic personality disorder symptom expression using Item Response Theory (IRT).

PubMed

Hoertel, Nicolas; Peyre, Hugo; Lavaud, Pierre; Blanco, Carlos; Guerin-Langlois, Christophe; René, Margaux; Schuster, Jean-Pierre; Lemogne, Cédric; Delorme, Richard; Limosin, Frédéric

2017-12-14

The limited published literature on the subject suggests that there may be differences in how females and males experience narcissistic personality disorder (NPD) symptoms. The aim of this study was to use methods based on item response theory to examine whether, when equating for levels of NPD symptom severity, there are sex differences in the likelihood of reporting DSM-IV-TR NPD symptoms. We conducted these analyses using a large, nationally representative sample from the USA (n=34,653), the second wave of the National Epidemiologic Survey on Alcohol and Related Conditions (NESARC). There were statistically and clinically significant sex differences for 2 out of the 9 DSM-IV-TR NPD symptoms. We found that males were more likely to endorse the item 'lack of empathy' at lower levels of narcissistic personality disorder severity than females. The item 'being envious' was a better indicator of NPD severity in males than in females. There were no clinically significant sex differences on the remaining NPD symptoms. Overall, our findings indicate substantial sex differences in narcissistic personality disorder symptom expression. Although our results may reflect sex-bias in diagnostic criteria, they are consistent with recent views suggesting that narcissistic personality disorder may be underpinned by shared and sex-specific mechanisms. Copyright © 2017 Elsevier B.V. All rights reserved.
Frontoparietal network involved in successful retrieval from episodic memory. Spatial and temporal analyses using fMRI and ERP.

PubMed

Iidaka, Tetsuya; Matsumoto, Atsushi; Nogawa, Junpei; Yamamoto, Yukiko; Sadato, Norihiro

2006-09-01

The neural basis for successful recognition of previously studied items, referred to as "retrieval success," has been investigated using either neuroimaging or brain potentials; however, few studies have used both modalities. Our study combined event-related functional magnetic resonance imaging (fMRI) and event-related potential (ERP) in separate groups of subjects. The neural responses were measured while the subjects performed an old/new recognition task with pictures that had been previously studied in either a deep- or shallow-encoding condition. The fMRI experiment showed that among the frontoparietal regions involved in retrieval success, the inferior frontal gyrus and intraparietal sulcus were crucial to conscious recollection because the activity of these regions was influenced by the depth of memory at encoding. The activity of the right parietal region in response to a repeated item was modulated by the repetition lag, indicating that this area would be critical to familiarity-based judgment. The results of structural equation modeling revealed that the functional connectivity among the regions in the left hemisphere was more significant than that in the right hemisphere. The results of the ERP experiment and independent component analysis paralleled those of the fMRI experiment and demonstrated that the repeated item produced an earlier peak than the hit item by approximately 50 ms.
Development of a mobbing short scale in the Gutenberg Health Study.

PubMed

Garthus-Niegel, Susan; Nübling, Matthias; Letzel, Stephan; Hegewald, Janice; Wagner, Mandy; Wild, Philipp S; Blettner, Maria; Zwiener, Isabella; Latza, Ute; Jankowiak, Sylvia; Liebers, Falk; Seidler, Andreas

2016-01-01

Despite its highly detrimental potential, most standard questionnaires assessing psychosocial stress at work do not include mobbing as a risk factor. In the German standard version of COPSOQ, mobbing is assessed with a single item. In the Gutenberg Health Study, this version was used together with a newly developed short scale based on the Leymann Inventory of Psychological Terror. The purpose of the present study was to evaluate the psychometric properties of these two measures, to compare them and to test their differential impact on relevant outcome parameters. This analysis is based on a population-based sample of 1441 employees participating in the Gutenberg Health Study. Exploratory and confirmatory factor analyses and reliability analyses were used to assess the mobbing scale. To determine their predictive validities, multiple linear regression analyses with six outcome parameters and log-binomial regression models for two of the outcome aspects were run. Factor analyses of the five-item scale confirmed a one-factor solution, reliability was α = 0.65. Both the single-item and the five-item scales were associated with all six outcome scales. Effect sizes were similar for both mobbing measures. Mobbing is an important risk factor for health-related outcomes. For the purpose of psychosocial risk assessment in the workplace, both the single-item and the five-item constructs were psychometrically appropriate. Associations with outcomes were about equivalent. However, the single item has the advantage of parsimony, whereas the five-item construct depicts several distinct forms of mobbing.
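Several abstracts in this listing report internal consistency as Cronbach's alpha, for example α = 0.65 for the five-item mobbing scale above. A minimal sketch of the standard formula, k/(k - 1) times one minus the ratio of summed item variances to total-score variance, is shown below; the function name and the toy ratings are hypothetical and unrelated to the Gutenberg Health Study data.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a respondents x items score matrix.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score)),
    using sample variances throughout (the n - 1 choice cancels in the ratio).
    """
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two respondents")
    items = list(zip(*rows))                      # columns = items
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1.0 - item_var_sum / total_var)


# Hypothetical responses of five people to a three-item scale (1-5 ratings).
scores = [[1, 2, 1], [2, 2, 3], [3, 4, 3], [4, 4, 5], [5, 5, 4]]
print(round(cronbach_alpha(scores), 3))
```

Alpha rises with both the number of items and their average inter-item correlation, which is one reason a short five-item scale can show a modest alpha even when its items are coherent.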

Measuring and understanding the attitudes of Australian gay and bisexual men towards biomedical HIV prevention using cross-sectional data and factor analyses.

PubMed

Wilkinson, Anna L; Draper, Bridget L; Pedrana, Alisa E; Asselin, Jason; Holt, Martin; Hellard, Margaret E; Stoové, Mark

2017-11-21

Contemporary responses to HIV embrace biomedical prevention, particularly treatment as prevention (TasP) and pre-exposure prophylaxis (PrEP). However, large-scale implementation of biomedical prevention should ideally be preceded by assessments of its community acceptability. We aimed to understand contemporary attitudes of gay and bisexual men (GBM) in Australia towards biomedical-based HIV prevention and propose a framework for their measurement and ongoing monitoring. A cross-sectional, online survey of GBM ≥18 years has been conducted annually in Victoria, Australia, since 2008. In 2016, 35 attitudinal items on biomedical HIV prevention were added. Items were scored on five-point Likert scales. We used principal factor analysis to identify key constructs related to GBM's attitudes to biomedical HIV prevention and used these to characterise levels of support for TasP and PrEP. A total of 462 HIV-negative or HIV-status-unknown men, not using PrEP, provided valid responses for all 35 attitudinal items. We extracted four distinct and interpretable factors we named: 'Confidence in PrEP', 'Judicious approach to PrEP', 'Treatment as prevention optimism' and 'Support for early treatment'. High levels of agreement were seen across PrEP-related items; 77.9% of men agreed that PrEP prevented HIV acquisition and 83.6% of men agreed that users were protecting themselves. However, the agreement levels for HIV TasP items were considerably lower, with <20% of men agreeing that treatment (undetectable viral load) reduced HIV transmission risk. Better understanding of community attitudes is crucial for shaping policy and informing initiatives that aim to improve knowledge, acceptance and uptake of biomedical prevention. Our analyses suggest confidence in, acceptability of and community support for PrEP among GBM. However, strategies to address scepticism towards HIV treatment when used for prevention may be needed to optimise combination biomedical HIV prevention.
© Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
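The agreement figures above (e.g., 77.9% agreeing that PrEP prevents HIV acquisition) are proportions of respondents at the "agree" end of a five-point Likert item. The Python sketch below assumes a coding of 1 = strongly disagree to 5 = strongly agree and counts 4 or 5 as agreement. The abstract does not state its exact cut-off, and the responses are invented for illustration.

```python
from typing import Iterable

# Assumed coding: 1 = strongly disagree ... 5 = strongly agree.
AGREE_CODES = frozenset({4, 5})


def percent_agree(responses: Iterable[int], agree_codes: frozenset = AGREE_CODES) -> float:
    """Return the percentage of responses that fall in `agree_codes`."""
    responses = list(responses)
    if not responses:
        raise ValueError("no responses to summarise")
    n_agree = sum(1 for r in responses if r in agree_codes)
    return 100.0 * n_agree / len(responses)


# Hypothetical answers from ten respondents to a single PrEP item.
item_responses = [5, 4, 4, 3, 5, 2, 4, 5, 1, 4]
print(f"{percent_agree(item_responses):.1f}% agreed")  # 70.0% agreed
```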
Development and Validation of the Human Papillomavirus Attitudes and Beliefs Scale in a National Canadian Sample.

PubMed

Perez, Samara; Shapiro, Gilla K; Tatar, Ovidiu; Joyal-Desmarais, Keven; Rosberger, Zeev

2016-10-01

Parents' human papillomavirus (HPV) vaccination decision-making is strongly influenced by their attitudes and beliefs toward vaccination. To date, psychometrically evaluated HPV vaccination attitude scales have measured a narrow range of beliefs and have often been limited to attitudes surrounding female HPV vaccination. The study aimed to develop a comprehensive, validated and reliable HPV vaccination attitudes and beliefs scale among parents of boys. Data were collected from Canadian parents of 9- to 16-year-old boys using an online questionnaire completed in 2 waves with a 7-month interval. Based on existing vaccination attitudes scales, a set of 61 attitude and belief items was developed. Exploratory and confirmatory factor analyses were conducted. Internal consistency was evaluated with Cronbach's α and stability over time with intraclass correlations. The HPV Attitudes and Beliefs Scale (HABS) was informed by 3117 responses at time 1 and 1427 at time 2. The HABS contains 46 items organized in 9 factors: Benefits (10 items), Threat (3 items), Influence (8 items), Harms (6 items), Risk (3 items), Affordability (3 items), Communication (5 items), Accessibility (4 items), and General Vaccination Attitudes (4 items). Model fit indices at time 2 were: χ²/df = 3.13, standardized root mean square residual = 0.056, root mean square error of approximation (confidence interval) = 0.039 (0.037-0.04), comparative fit index = 0.962 and Tucker-Lewis index = 0.957. Cronbach's αs were greater than 0.8 and intraclass correlations of factors were greater than 0.6. The HABS is the first psychometrically tested scale of HPV attitudes and beliefs among parents of boys available for use in English and French. Further testing among parents of girls and young adults and assessment of predictive validity are warranted.
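Internal consistency here was summarised with Cronbach's α. The minimal Python sketch below computes α from a respondents × items score matrix using the standard formula α = k/(k − 1) · (1 − Σ item variances / variance of total score). The responses are hypothetical and are not HABS data.

```python
from statistics import variance


def cronbach_alpha(scores: list[list[float]]) -> float:
    """Cronbach's alpha for a respondents x items matrix of item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
    Sample variances (n - 1 denominator) are used throughout; the ratio is
    unchanged if population variances are used consistently instead.
    """
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two respondents")
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical 5-point responses from four parents to a three-item factor.
responses = [
    [4, 5, 4],
    [2, 2, 3],
    [3, 3, 3],
    [5, 4, 5],
]
print(round(cronbach_alpha(responses), 3))  # 0.916
```

A value above 0.8, as reported for the HABS factors, is usually read as good internal consistency.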
Design and validation of a questionnaire to assess organizational culture in French hospital wards.

PubMed

Saillour-Glénisson, F; Domecq, S; Kret, M; Sibe, M; Dumond, J P; Michel, P

2016-09-17

Although many organizational culture questionnaires have been developed, there is no validated multidimensional questionnaire that assesses organizational culture at the hospital ward level and is adapted to the health care context. To fill this gap, a multidisciplinary team designed and validated a dimensional organizational culture questionnaire for healthcare settings, to be administered at ward level. A database of organizational culture items and themes was created after an extensive literature review. Items were grouped into dimensions and subdimensions (classification validated by experts). Pre-testing and face validation were conducted with 15 health care professionals. In a stratified cluster random sample of hospitals, psychometric validation was conducted in three phases on a sample of 859 healthcare professionals from 36 multidisciplinary medicine services: 1) the exploratory phase included a description of response saturation levels, factor and correlation analyses, and an internal consistency analysis (Cronbach's alpha coefficient); 2) the confirmatory phase used structural equation modeling (SEM); 3) reproducibility was studied by test-retest. The overall response rate was 80 %; the average completion rate was 97 %. The metrological results were: a global Cronbach's alpha coefficient of 0.93, with alpha higher than 0.70 for 12 sub-dimensions; all Dillon-Goldstein's rho coefficients higher than 0.70; and an excellent quality of the external model, with a Goodness of Fit (GoF) criterion of 0.99. Seventy percent of the items had a reproducibility ranging from moderate (intraclass correlation coefficient, ICC, between 50 and 70 % for 25 items) to good (ICC higher than 70 % for 33 items). The COMEt (Contexte Organisationnel et Managérial en Etablissement de Santé, i.e., organizational and managerial context in healthcare facilities) questionnaire is a validated multidimensional organizational culture questionnaire made up of 6 dimensions, 21 sub-dimensions and 83 items.
It is the first dimensional organizational culture questionnaire, specific to healthcare context, for a unit level assessment showing robust psychometric properties (validity and reliability). This tool is suited for research purposes, especially for assessing organizational context in research analysing the effectiveness of hospital quality improvement strategies. Our tool is also suited for an overall assessment of ward culture and could be a powerful trigger to improve management and clinical performance. Its psychometric properties in other health systems need to be tested.
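Dillon-Goldstein's rho (also called composite reliability) is usually computed from the standardized loadings of a block's items on its latent variable as (Σλ)² / ((Σλ)² + Σ(1 − λ²)). The sketch below assumes standardized loadings for a single sub-dimension. The loadings are invented and are not COMEt estimates.

```python
def dillon_goldstein_rho(loadings: list[float]) -> float:
    """Composite reliability from standardized loadings of one block.

    rho = (sum of loadings)^2 / ((sum of loadings)^2 + sum(1 - loading^2))
    where 1 - loading^2 is each item's error variance under standardization.
    """
    if not loadings:
        raise ValueError("need at least one loading")
    squared_sum = sum(loadings) ** 2
    error_var = sum(1.0 - lam * lam for lam in loadings)
    return squared_sum / (squared_sum + error_var)


# Hypothetical standardized loadings for a four-item sub-dimension.
rho = dillon_goldstein_rho([0.70, 0.80, 0.60, 0.75])
print(f"rho = {rho:.3f}, above 0.70: {rho > 0.70}")  # rho = 0.807, above 0.70: True
```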
Are adolescents with anorexia nervosa better at reading minds?

PubMed

Laghi, Fiorenzo; Pompili, Sara; Zanna, Valeria; Castiglioni, Maria Chiara; Criscuolo, Michela; Chianello, Ilenia; Baumgartner, Emma; Baiocco, Roberto

2015-01-01

The present study aimed to investigate mindreading abilities in female adolescent patients with anorexia nervosa (AN) compared to healthy controls (HCs), analysing differences by the emotional valence of facial stimuli. The Eating Disorder Inventory, which evaluates psychological traits associated with eating disorders, and the Children's version of the Reading the Mind in the Eyes Test, which evaluates mindreading abilities, were administered to 40 Italian female patients (mean age = 14.93; SD = 1.48) with a diagnosis of restrictive AN and 40 healthy females (mean age = 14.88; SD = 0.56). No significant differences between the AN group and HCs were found for the Eyes Total score. Even when the emotional valence of the items was analysed, the two groups were equally successful in recognising positive, negative and neutral emotions from faces. A significant difference was found in the percentage of correct responses to items 10 and 15, where the AN group was less able than HCs to identify the target descriptor ('Not believing') over the foils. A significant difference also emerged when affective emotions were distinguished from cognitive states: patients with AN performed better than controls on the mindreading task for affective states, but not for cognitive states. Our study highlights the importance of analysing and distinguishing the different valences of facial stimuli when assessing mindreading abilities in adolescents with AN, so that more precise and specific treatment approaches can be developed for female adolescents with AN.
The five item Barthel index

PubMed Central

Hobart, J; Thompson, A

2001-01-01

OBJECTIVES: Routine data collection is now considered mandatory. Therefore, staff-rated clinical scales that consist of multiple items should have the minimum number of items necessary for rigorous measurement. This study explores the possibility of developing a short-form Barthel index (BI), suitable for use in clinical trials, epidemiological studies, and audit, that satisfies criteria for rigorous measurement and is psychometrically equivalent to the 10-item instrument. METHODS: Data were analysed from 844 consecutive admissions to a neurological rehabilitation unit in London. Random half samples were generated. Short forms were developed in one sample (n=419), by selecting items with the best measurement properties, and tested in the other (n=418). For each of the 10 items of the BI, item-total correlations and effect sizes were computed and rank ordered. The best items were defined as those with the lowest cross product of these rank orderings. The acceptability, reliability, validity, and responsiveness of three short-form BIs (five-, four-, and three-item) were determined and compared with the 10-item BI. Agreement between scores generated by the short forms and the 10-item BI was determined using intraclass correlation coefficients and the method of Bland and Altman. RESULTS: The five best items in this sample were transfers, bathing, toilet use, stairs, and mobility. Of the three short forms examined, the five-item BI had the best measurement properties and was psychometrically equivalent to the 10-item BI. Agreement between scores generated by the two measures for individual patients was excellent (ICC=0.90) but not identical (limits of agreement=1.84±3.84). CONCLUSIONS: The five-item short-form BI may be a suitable outcome measure for group comparison studies in comparable samples. Further evaluations are needed.
Results demonstrate a fundamental difference between assessment and measurement and the importance of incorporating psychometric methods in the development and evaluation of health measures. PMID:11459898
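The item-selection rule in the METHODS can be sketched in Python, together with the Bland and Altman limits of agreement (bias ± 1.96 SD of the paired differences). The rule ranks item-total correlations and effect sizes, multiplies each item's two ranks, and keeps the items with the smallest products. Ranking the largest value as best and breaking ties by item name are assumptions of this sketch. The item statistics are invented, not the study's.

```python
from statistics import mean, stdev


def rank_descending(values: dict[str, float]) -> dict[str, int]:
    """Rank items so the largest value gets rank 1 (ties broken by item name)."""
    ordered = sorted(values, key=lambda item: (-values[item], item))
    return {item: position + 1 for position, item in enumerate(ordered)}


def select_items(item_total_r: dict[str, float],
                 effect_size: dict[str, float],
                 n_keep: int) -> list[str]:
    """Keep the n_keep items with the lowest product of the two rank orderings."""
    r_rank = rank_descending(item_total_r)
    es_rank = rank_descending(effect_size)
    product = {item: r_rank[item] * es_rank[item] for item in item_total_r}
    return sorted(product, key=lambda item: (product[item], item))[:n_keep]


def bland_altman_limits(a: list[float], b: list[float]) -> tuple[float, float, float]:
    """Mean difference (bias) and 95% limits of agreement, bias +/- 1.96 SD."""
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)
    half_width = 1.96 * stdev(diffs)
    return bias, bias - half_width, bias + half_width


# Hypothetical statistics for four BI items.
r = {"transfers": 0.85, "bathing": 0.70, "stairs": 0.80, "feeding": 0.60}
es = {"transfers": 1.1, "bathing": 1.3, "stairs": 0.9, "feeding": 0.5}
print(select_items(r, es, n_keep=2))  # ['transfers', 'bathing']

# Hypothetical rescaled short-form vs. full-form scores for three patients.
print(bland_altman_limits([10, 12, 14], [9, 11, 12]))
```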
Development and initial validation of the assessment of caregiver experience with neuromuscular disease.

PubMed

Matsumoto, Hiroko; Clayton-Krasinski, Debora A; Klinge, Stephen A; Gomez, Jaime A; Booker, Whitney A; Hyman, Joshua E; Roye, David P; Vitale, Michael G

2011-01-01

Orthopaedic intervention can have a wide range of functional and psychosocial effects on children with neuromuscular disease (NMD). In the multihandicapped child (Gross Motor Function Classification System [GMFCS] IV/V), functional status, pain, psychosocial function, and health-related quality of life also have effects on the families of these children. The purpose of this study is to report the development and initial validation of an outcomes instrument specifically designed to assess the caregiver impact experienced by parents raising severely affected children with NMD: the Assessment of Caregiver Experience with Neuromuscular Disease (ACEND). In the first part of this prospective study, 61 children with NMD and their parents were administered a range of previously validated pediatric health measures. A framework technique was used to select the most appropriate and relevant subset of questions from this large set. Sensitivity analyses guided the development of a master question list measuring caregiver impact, excluding items with low relevance and modifying unclear questions. In the second part of the study, the ACEND was administered to the caregivers of 46 children with moderate-to-severe NMD. Statistical analyses were conducted to determine the validity of the instrument. The resulting ACEND instrument included 2 domains, 7 subdomains, and 41 items. Domain 1, which examines physical impact, includes 4 subdomains: feeding/grooming/dressing (6 items), sitting/play (5 items), transfers (5 items), and mobility (7 items). Domain 2, which examines general caregiver impact, includes 3 subdomains: time (4 items), emotion (9 items), and finance (5 items). The mean overall relevance rating was 6.21 ± 0.37 and the mean clarity rating was 6.68 ± 0.52 (scale 0 to 7). Multiple floor effects in patients with GMFCS V and ceiling effects in patients with GMFCS III were identified, almost exclusively in motor-based items.
Virtually no floor or ceiling effects were identified in the time, emotion or finance subdomains across GMFCS levels. The initial validation demonstrated that the ACEND is a valid, disease-specific measure for quantifying the impact on caregivers of children with NMD. Larger groups of patients across NMD disease types are currently being tested to strengthen the validity findings. Additionally, the ACEND is now being administered before and after orthopaedic interventions to determine responsiveness, which is critical to health outcomes research. LEVEL OF EVIDENCE/RELEVANCE: IIc.
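Floor and ceiling effects are usually reported as the percentage of respondents scoring at the lowest or highest possible value. The sketch below computes both percentages. It flags them against a widely used 15% rule of thumb, which is an assumption here because the abstract does not state the criterion used. The 0-7 scores are hypothetical and are not ACEND data.

```python
def floor_ceiling(scores: list[float], min_score: float, max_score: float) -> tuple[float, float]:
    """Percentages of respondents at the lowest and highest possible score."""
    if not scores:
        raise ValueError("no scores")
    n = len(scores)
    floor_pct = 100.0 * sum(1 for s in scores if s == min_score) / n
    ceiling_pct = 100.0 * sum(1 for s in scores if s == max_score) / n
    return floor_pct, ceiling_pct


# Assumed flag: more than 15% of respondents at an extreme of the range.
FLAG_PCT = 15.0

# Hypothetical scores on a 0-7 range for ten caregivers.
scores = [0, 0, 0, 1, 3, 5, 7, 7, 2, 0]
floor_pct, ceiling_pct = floor_ceiling(scores, 0, 7)
print(floor_pct, ceiling_pct)                         # 40.0 20.0
print(floor_pct > FLAG_PCT, ceiling_pct > FLAG_PCT)   # True True
```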
Development and Initial Validation of the Five-Factor Model Adolescent Personality Questionnaire (FFM-APQ).

PubMed

Rogers, Mary E; Glendon, A Ian

2018-01-01

This research reports on the 4-phase development of the 25-item Five-Factor Model Adolescent Personality Questionnaire (FFM-APQ). The purpose was to develop and determine initial evidence for validity of a brief adolescent personality inventory using a vocabulary that could be understood by adolescents up to 18 years old. Phase 1 (N = 48) consisted of item generation and expert (N = 5) review of items; Phase 2 (N = 179) involved item analyses; in Phase 3 (N = 496) exploratory factor analysis assessed the underlying structure; in Phase 4 (N = 405) confirmatory factor analyses resulted in a 25-item inventory with 5 subscales.
The ABC’s of Suicide Risk Assessment: Applying a Tripartite Approach to Individual Evaluations

PubMed Central

Harris, Keith M.; Syu, Jia-Jia; Lello, Owen D.; Chew, Y. L. Eileen; Willcox, Christopher H.; Ho, Roger H. M.

2015-01-01

There is considerable need for accurate suicide risk assessment for clinical, screening, and research purposes. This study applied the tripartite affect-behavior-cognition theory, the suicidal barometer model, classical test theory, and item response theory (IRT) to develop a brief self-report measure of suicide risk that is theoretically grounded, reliable and valid. An initial survey (n = 359) applied an iterative process to an item pool, resulting in the six-item Suicidal Affect-Behavior-Cognition Scale (SABCS). Three additional studies tested the SABCS and a highly endorsed comparison measure. These included two online surveys (Ns = 1007 and 713) and one prospective clinical survey (n = 72; Time 2, n = 54). Factor analyses supported SABCS construct validity by demonstrating unidimensionality. Internal reliability was high (α = .86-.93, split-half = .90-.94). The scale was predictive of future suicidal behaviors and suicidality (r = .68 and .73, respectively), showed convergent validity, and the SABCS-4 demonstrated clinically relevant sensitivity to change. IRT analyses revealed that the SABCS captured more information than the comparison measure and better defined participants at low, moderate, and high risk. The SABCS is the first suicide risk measure to demonstrate no differential item functioning by sex, age, or ethnicity. In all comparisons, the SABCS showed incremental improvements over a highly endorsed scale through stronger predictive ability, reliability, and other properties. With this publication, the SABCS is in the public domain, and it is suitable for clinical evaluations, public screening, and research. PMID:26030590
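The abstract reports split-half coefficients alongside α. One common way to compute split-half reliability is to correlate odd- and even-item totals and then step the half-test correlation up to full length with the Spearman-Brown formula, 2r / (1 + r). That method is assumed here because the abstract does not say how the halves were formed. The responses are hypothetical.

```python
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_brown(r_half: float, length_factor: float = 2.0) -> float:
    """Projected reliability when test length is multiplied by length_factor."""
    return length_factor * r_half / (1 + (length_factor - 1) * r_half)


def split_half_reliability(scores: list[list[float]]) -> float:
    """Odd-even split-half reliability, stepped up with Spearman-Brown."""
    odd_totals = [sum(row[0::2]) for row in scores]   # items 1, 3, 5, ...
    even_totals = [sum(row[1::2]) for row in scores]  # items 2, 4, 6, ...
    return spearman_brown(pearson_r(odd_totals, even_totals))


# Hypothetical six-item responses from five respondents.
data = [
    [0, 1, 0, 0, 1, 0],
    [2, 2, 3, 2, 2, 3],
    [4, 5, 4, 4, 5, 3],
    [6, 5, 6, 7, 6, 6],
    [8, 7, 8, 8, 7, 8],
]
print(round(split_half_reliability(data), 3))
```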
Psychometric Properties of the Cognitive Emotion Regulation Questionnaire (CERQ) in Patients with Fibromyalgia Syndrome.

PubMed

Feliu-Soler, Albert; Reche-Camba, Elvira; Borràs, Xavier; Pérez-Aranda, Adrián; Andrés-Rodríguez, Laura; Peñarrubia-María, María T; Navarro-Gil, Mayte; García-Campayo, Javier; Bellón, Juan A; Luciano, Juan V

2017-01-01

Given that Fibromyalgia Syndrome (FMS) is associated with problems in emotion regulation, the importance of assessing this construct is widely acknowledged by clinical psychologists and pain specialists. Although the Cognitive Emotion Regulation Questionnaire (CERQ) is a self-report measure used worldwide, there are no data on its psychometric properties in patients with FMS. This study analyzed the dimensionality, reliability, and validity of the CERQ in a sample of 231 patients with FMS. Because "fibrofog" is one of the most disabling FMS symptoms, the CERQ items in the present study were grouped by dimension. This change in item presentation was intended to facilitate responding by making clear what the items in each dimension are attempting to measure. The following battery of measures was administered: the CERQ, the Revised Fibromyalgia Impact Questionnaire, the Pain Catastrophizing Scale, the Center for Epidemiologic Studies Depression Scale, and the State-Trait Anxiety Inventory. Four models of the CERQ structure were examined, and confirmatory factor analyses supported the original factor model, consisting of nine factors: Self-blame, Acceptance, Rumination, Positive refocusing, Refocus on planning, Positive reappraisal, Putting into perspective, Catastrophizing, and Other-blame. There was minimal overlap between CERQ subscales, and their internal consistency was adequate. Correlational and regression analyses supported the construct validity of the CERQ. Our findings indicate that the CERQ (items-grouped version) is a sound instrument for assessing cognitive emotion regulation in patients with FMS.
Psychometric Properties of the Cognitive Emotion Regulation Questionnaire (CERQ) in Patients with Fibromyalgia Syndrome

PubMed Central

Feliu-Soler, Albert; Reche-Camba, Elvira; Borràs, Xavier; Pérez-Aranda, Adrián; Andrés-Rodríguez, Laura; Peñarrubia-María, María T.; Navarro-Gil, Mayte; García-Campayo, Javier; Bellón, Juan A.; Luciano, Juan V.

2017-01-01

Given that Fibromyalgia Syndrome (FMS) is associated with problems in emotion regulation, the importance of assessing this construct is widely acknowledged by clinical psychologists and pain specialists. Although the Cognitive Emotion Regulation Questionnaire (CERQ) is a self-report measure used worldwide, there are no data on its psychometric properties in patients with FMS. This study analyzed the dimensionality, reliability, and validity of the CERQ in a sample of 231 patients with FMS. Because “fibrofog” is one of the most disabling FMS symptoms, the CERQ items in the present study were grouped by dimension. This change in item presentation was intended to facilitate responding by making clear what the items in each dimension are attempting to measure. The following battery of measures was administered: the CERQ, the Revised Fibromyalgia Impact Questionnaire, the Pain Catastrophizing Scale, the Center for Epidemiologic Studies Depression Scale, and the State-Trait Anxiety Inventory. Four models of the CERQ structure were examined, and confirmatory factor analyses supported the original factor model, consisting of nine factors: Self-blame, Acceptance, Rumination, Positive refocusing, Refocus on planning, Positive reappraisal, Putting into perspective, Catastrophizing, and Other-blame. There was minimal overlap between CERQ subscales, and their internal consistency was adequate. Correlational and regression analyses supported the construct validity of the CERQ. Our findings indicate that the CERQ (items-grouped version) is a sound instrument for assessing cognitive emotion regulation in patients with FMS. PMID:29321750
A Multidimensional Ideal Point Item Response Theory Model for Binary Data

ERIC Educational Resources Information Center

Maydeu-Olivares, Albert; Hernandez, Adolfo; McDonald, Roderick P.

2006-01-01

We introduce a multidimensional item response theory (IRT) model for binary data based on a proximity response mechanism. Under the model, a respondent at the mode of the item response function (IRF) endorses the item with probability one. The mode of the IRF is the ideal point, or in the multidimensional case, an ideal hyperplane. The model…
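The abstract is truncated, so it does not show the exact parameterization. As an illustration only, one functional form with the properties it describes is a Gaussian-shaped curve in a linear combination of the latent traits. That form gives probability one at the mode, and in several dimensions the mode forms an ideal hyperplane. The Python sketch below uses this assumed form, and the paper's own model may be parameterized differently. The item parameters are hypothetical.

```python
from math import exp


def ideal_point_prob(theta: list[float], alpha: list[float], gamma: float) -> float:
    """Endorsement probability under an assumed Gaussian-shaped ideal point IRF.

    P(endorse | theta) = exp(-0.5 * (gamma + alpha . theta) ** 2)
    P equals 1 exactly on the hyperplane gamma + alpha . theta = 0 and
    falls off symmetrically with distance from it.
    """
    z = gamma + sum(a * t for a, t in zip(alpha, theta))
    return exp(-0.5 * z * z)


# Hypothetical two-dimensional item: ideal hyperplane theta1 + theta2 = 0.5.
alpha, gamma = [1.0, 1.0], -0.5
print(ideal_point_prob([0.25, 0.25], alpha, gamma))  # on the hyperplane: 1.0
print(ideal_point_prob([1.5, 1.0], alpha, gamma))    # far from it: exp(-2), about 0.135
```

Unlike a monotone (dominance) IRF, this curve falls off on both sides of the ideal hyperplane. That non-monotone shape is what distinguishes a proximity response mechanism.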
The trucker strain monitor: an occupation-specific questionnaire measuring psychological job strain.

PubMed

De Croon, E M; Blonk, R W; Van der Beek, J; Frings-Dresen, M H

2001-08-01

To develop and validate a short and user-friendly questionnaire measuring psychological job strain in truck drivers. In cooperation with an occupational physician in the Dutch road transport industry, we developed items on the basis of face validity and information from existing questionnaires on the subject. These items were pilot-tested, by means of interviews, in 15 truck drivers. Study I examined the factorial structure of the initial 30-item trucker strain monitor (TSM) in a sample of 153 truck drivers. Subsequently, the number of items per factor was reduced on the basis of reliability analyses (Cronbach's alpha). Study II examined the construct and criterion validity of the TSM in a randomly selected group of 2,000 truck drivers, of whom 1,111 participated (adjusted response = 63%). Additionally, sensitivity and specificity were assessed by examining the ability of the TSM to identify truck drivers with or without self-reported sickness absence in the past 12 months because of psychological complaints. Factor analyses of the initial 30-item TSM revealed a two-factor solution. Item reduction resulted in a six-item work-related fatigue scale and a four-item sleeping problems scale with high internal consistency. The results of study II confirmed the internal consistency of the TSM scales and provided support for construct and criterion validity. The composite, work-related fatigue, and sleeping problems scales had sensitivities of 83%, 80% and 71%, respectively, in identifying truck drivers with prior sickness absence because of psychological complaints. Specificity rates were 72%, 73% and 72%, respectively. Despite methodological limitations, the results suggest that the TSM is a reliable and valid indicator of psychological job strain in truck drivers. In particular, the composite and work-related fatigue scales identified drivers with prior absenteeism because of psychological complaints quite accurately.
Future longitudinal research in specific sub-groups of truck drivers, including both self-reported and objective psychological health measures, should establish whether (1) the distinction between the two indicators of psychological job strain is useful, and whether (2) the TSM can be used to screen out truck drivers at risk of developing psychological health problems.
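The sensitivity and specificity figures above follow the usual definitions: sensitivity = TP / (TP + FN) among drivers with prior absence, and specificity = TN / (TN + FP) among drivers without it. The minimal sketch below computes both from paired flags. The cut-off score and the data are hypothetical.

```python
def sensitivity_specificity(flagged: list[bool], condition: list[bool]) -> tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(flagged, condition))
    tp = sum(1 for f, c in pairs if f and c)
    fn = sum(1 for f, c in pairs if not f and c)
    tn = sum(1 for f, c in pairs if not f and not c)
    fp = sum(1 for f, c in pairs if f and not c)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical: did a driver report psychological sickness absence, and
# did their scale score exceed an (assumed) cut-off?
absence = [True] * 5 + [False] * 5
above_cutoff = [True, True, True, True, False, True, False, False, False, False]
sens, spec = sensitivity_specificity(above_cutoff, absence)
print(f"sensitivity {sens:.0%}, specificity {spec:.0%}")  # sensitivity 80%, specificity 80%
```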
Improving the Factor Structure of Psychological Scales

PubMed Central

Zhang, Xijuan; Savalei, Victoria

2015-01-01

Many psychological scales written in the Likert format include reverse worded (RW) items in order to control acquiescence bias. However, studies have shown that RW items often contaminate the factor structure of the scale by creating one or more method factors. The present study examines an alternative scale format, called the Expanded format, which replaces each response option in the Likert scale with a full sentence. We hypothesized that this format would result in a cleaner factor structure as compared with the Likert format. We tested this hypothesis on three popular psychological scales: the Rosenberg Self-Esteem scale, the Conscientiousness subscale of the Big Five Inventory, and the Beck Depression Inventory II. Scales in both formats showed comparable reliabilities. However, scales in the Expanded format had better (i.e., lower and more theoretically defensible) dimensionalities than scales in the Likert format, as assessed by both exploratory factor analyses and confirmatory factor analyses. We encourage further study and wider use of the Expanded format, particularly when a scale’s dimensionality is of theoretical interest. PMID:27182074
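In the Likert format, RW items are typically reverse-scored before summing, so that higher scores point in the same direction for every item. The sketch below shows that step for a generic scale. The item names, the 1-4 range, and the choice of reversed items are hypothetical. Reverse-scoring aligns item direction, but it does not remove the method factors described above.

```python
def reverse_score(response: int, low: int, high: int) -> int:
    """Map a reverse-worded Likert response onto the scale's direction."""
    if not low <= response <= high:
        raise ValueError(f"response {response} outside {low}-{high}")
    return low + high - response


def scale_total(responses: dict[str, int], reversed_items: set[str],
                low: int, high: int) -> int:
    """Sum item scores after reversing the reverse-worded (RW) items."""
    return sum(
        reverse_score(value, low, high) if item in reversed_items else value
        for item, value in responses.items()
    )


# Hypothetical 4-point responses; q2 and q4 are treated as reverse worded.
answers = {"q1": 4, "q2": 1, "q3": 3, "q4": 2}
print(scale_total(answers, {"q2", "q4"}, low=1, high=4))  # 4 + 4 + 3 + 3 = 14
```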
The SPAIC-11 and SPAICP-11: Two brief child- and parent-rated measures of social anxiety.

PubMed

Bunnell, Brian E; Beidel, Deborah C; Liu, Liwen; Joseph, Dana L; Higa-McMillan, Charmaine

2015-12-01

The Social Phobia and Anxiety Inventory for Children-11 (SPAIC-11) and the Social Phobia and Anxiety Inventory for Children's Parents-11 (SPAICP-11) were developed via item response theory (IRT), using child and parent reports of social anxiety, as brief versions of the child and parent versions of the Social Phobia and Anxiety Inventory. IRT analyses of data from a sample of 496 children identified 11 items that exhibit measurement equivalence across parent and child reports. Descriptive and psychometric data are provided for the child, parent, and combined total scores. Discriminant validity was demonstrated using logistic regression and receiver operating characteristic curve analyses. The SPAIC-11 and SPAICP-11 are psychometrically sound measures that can measure social anxiety invariantly across children and their parents. These brief measures, which combine parent and child perceptions of the child's social anxiety, may provide notable benefits to clinical research. Copyright © 2015 Elsevier Ltd. All rights reserved.
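ROC analyses of discriminant validity are often summarised by the area under the curve (AUC). The minimal sketch below computes the AUC from its Mann-Whitney interpretation, assuming that a higher scale score indicates more social anxiety. The scores are invented, and the study may also have reported other ROC outputs, such as cut-off scores.

```python
def roc_auc(case_scores: list[float], control_scores: list[float]) -> float:
    """AUC as the probability that a random case outscores a random control.

    Equivalent to the Mann-Whitney U statistic divided by n_cases * n_controls;
    ties count as one half.
    """
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical total scores for children with and without social anxiety disorder.
cases = [30, 25, 20]
controls = [10, 22, 5]
print(round(roc_auc(cases, controls), 3))  # 0.889
```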
Analysis of the psychometric properties of the Multiple Sclerosis Impact Scale-29 (MSIS-29) in relapsing–remitting multiple sclerosis using classical and modern test theory

PubMed Central

Wyrwich, KW; Phillips, GA; Vollmer, T; Guo, S

2016-01-01

Background Investigations using classical test theory support the psychometric properties of the original version of the Multiple Sclerosis Impact Scale (MSIS-29v1), a disease-specific measure of multiple sclerosis (MS) impact (physical and psychological subscales). Later, assessments of the MSIS-29v1 in an MS community-based sample using Rasch analysis led to revisions of the instrument’s response options (MSIS-29v2). Objective The objective of this paper is to evaluate the psychometric properties of the MSIS-29v1 in a clinical trial cohort of relapsing–remitting MS patients (RRMS). Methods Data from 600 patients with RRMS enrolled in the SELECT clinical trial were used. Assessments were performed at baseline and at Weeks 12, 24, and 52. In addition to traditional psychometric analyses, Item Response Theory (IRT) and Rasch analysis were used to evaluate the measurement properties of the MSIS-29v1. Results Both MSIS-29v1 subscales demonstrated strong reliability, construct validity, and responsiveness. The IRT and Rasch analysis showed overall support for response category threshold ordering, person-item fit, and item fit for both subscales. Conclusions Both MSIS-29v1 subscales demonstrated robust measurement properties using classical, IRT, and Rasch techniques. Unlike previous research using a community-based sample, the MSIS-29v1 was found to be psychometrically sound for assessing physical and psychological impairments in a clinical trial sample of patients with RRMS. PMID:28607741
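Threshold ordering asks whether the points at which adjacent response categories become equally likely fall in the expected order along the trait. The sketch below computes category probabilities and checks whether a set of thresholds is ordered. It uses the Rasch partial credit model, which is an assumption because the abstract does not name the Rasch variant. The threshold values are hypothetical.

```python
from math import exp


def pcm_category_probs(theta: float, thresholds: list[float]) -> list[float]:
    """Category probabilities (0..m) under the Rasch partial credit model.

    P(X = k) is proportional to exp(sum_{j<=k} (theta - delta_j)),
    with the empty sum for k = 0 giving exp(0) = 1.
    """
    numerators = [1.0]
    cumulative = 0.0
    for delta in thresholds:
        cumulative += theta - delta
        numerators.append(exp(cumulative))
    total = sum(numerators)
    return [n / total for n in numerators]


def thresholds_ordered(thresholds: list[float]) -> bool:
    """True if each threshold lies above the previous one."""
    return all(lo < hi for lo, hi in zip(thresholds, thresholds[1:]))


# Hypothetical thresholds for a five-category item.
ordered = [-2.0, -0.5, 0.7, 1.9]
disordered = [-1.0, 0.8, 0.3, 1.5]
print(thresholds_ordered(ordered), thresholds_ordered(disordered))  # True False
print([round(p, 3) for p in pcm_category_probs(0.0, ordered)])
```

At θ equal to a threshold δ_k, categories k − 1 and k are equally probable. Disordered thresholds mean that some category is never the most likely response at any trait level.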
Analysis of the psychometric properties of the Multiple Sclerosis Impact Scale-29 (MSIS-29) in relapsing-remitting multiple sclerosis using classical and modern test theory.

PubMed

Bacci, E D; Wyrwich, K W; Phillips, G A; Vollmer, T; Guo, S

2016-01-01

Investigations using classical test theory support the psychometric properties of the original version of the Multiple Sclerosis Impact Scale (MSIS-29v1), a disease-specific measure of multiple sclerosis (MS) impact (physical and psychological subscales). Later, assessments of the MSIS-29v1 in an MS community-based sample using Rasch analysis led to revisions of the instrument's response options (MSIS-29v2). The objective of this paper is to evaluate the psychometric properties of the MSIS-29v1 in a clinical trial cohort of relapsing-remitting MS patients (RRMS). Data from 600 patients with RRMS enrolled in the SELECT clinical trial were used. Assessments were performed at baseline and at Weeks 12, 24, and 52. In addition to traditional psychometric analyses, Item Response Theory (IRT) and Rasch analysis were used to evaluate the measurement properties of the MSIS-29v1. Both MSIS-29v1 subscales demonstrated strong reliability, construct validity, and responsiveness. The IRT and Rasch analysis showed overall support for response category threshold ordering, person-item fit, and item fit for both subscales. Both MSIS-29v1 subscales demonstrated robust measurement properties using classical, IRT, and Rasch techniques. Unlike previous research using a community-based sample, the MSIS-29v1 was found to be psychometrically sound for assessing physical and psychological impairments in a clinical trial sample of patients with RRMS.
Evaluation of the Parent-Report Inventory of Callous-Unemotional Traits in a Sample of Children Recruited from Intimate Partner Violence Services: A Multidimensional Rasch Analysis.

PubMed

McDonald, Shelby Elaine; Ma, Lin; Green, Kathy E; Hitti, Stephanie A; Cody, Anna M; Donovan, Courtney; Williams, James Herbert; Ascione, Frank R

2018-03-01

Our study applied multidimensional item response theory (MIRT) to compare structural models of the parent-report version of the Inventory of Callous and Unemotional Traits (ICU; English and North American Spanish translations). A total of 291 maternal caregivers were recruited from community-based domestic violence services and reported on their children (77.9% ethnic minority; 47% female), who ranged in age from 7 to 12 years (mean = 9.07, standard deviation = 1.64). We compared 9 models that were based on prior psychometric evaluations of the ICU. MIRT analyses indicated that a revised 18-item version comprising 2 factors (callous-unemotional and empathic-prosocial) was more suitable for our sample. Differential item functioning was found for several items across ethnic and language groups, but not for child gender or age. Evidence of construct validity was found. We recommend continued research and revisions to the ICU to better assess the presence of callous-unemotional traits in community samples of school-age children. © 2017 Wiley Periodicals, Inc.
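The study tested differential item functioning (DIF) within its MIRT framework. As a simpler illustration of the DIF idea, swapped in here and not the authors' method, the Mantel-Haenszel procedure compares two groups' odds of endorsing an item among respondents matched on total score. The sketch assumes dichotomized responses (endorsed vs. not) and uses invented counts. An MH D-DIF near 0 suggests little DIF.

```python
from math import log


def mantel_haenszel_dif(strata: list[tuple[int, int, int, int]]) -> tuple[float, float]:
    """Mantel-Haenszel common odds ratio and ETS delta (MH D-DIF).

    Each stratum (a matched total-score level) is (A, B, C, D):
    A = reference endorsed, B = reference not endorsed,
    C = focal endorsed,     D = focal not endorsed.
    """
    num = den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # empty score level contributes nothing
        num += a * d / n
        den += b * c / n
    if num == 0.0 or den == 0.0:
        raise ValueError("odds ratio undefined for these counts")
    alpha_mh = num / den
    return alpha_mh, -2.35 * log(alpha_mh)


# Hypothetical counts in two total-score strata: (A, B, C, D).
strata = [(20, 10, 10, 20), (15, 15, 15, 15)]
alpha_mh, mh_d_dif = mantel_haenszel_dif(strata)
print(f"alpha_MH = {alpha_mh:.3f}, MH D-DIF = {mh_d_dif:.2f}")
```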
Self-report measure of financial exploitation of older adults.

PubMed

Conrad, Kendon J; Iris, Madelyn; Ridings, John W; Langley, Kate; Wilber, Kathleen H

2010-12-01

This study was designed to improve the measurement of financial exploitation (FE) by testing the psychometric properties of the Older Adult Financial Exploitation Measure (OAFEM), a client self-report instrument. Rasch item response theory and traditional validation approaches were used. Questionnaires were administered by 22 adult protective services investigators from 7 agencies in Illinois to 227 substantiated abuse clients. Analyses included tests of dimensionality, model fit, and additional construct validation. Results from the OAFEM were also compared with the abuse substantiation decision and with investigators' assessments of FE using a staff report version. Hypotheses were generated to test the theorized relationships. The OAFEM, including the original 79-, 54-, and 30-item measures, met stringent Rasch analysis fit and unidimensionality criteria and had high internal consistency and item reliability. The validation results were supportive, while leading to reconsideration of aspects of the hypothesized theoretical hierarchy. Thresholds were suggested to demonstrate levels of severity. The measure is now available to aid both clinicians and researchers in assessing FE of older adults. Theoretical refinements developed using the empirically generated item hierarchy may help to improve assessment and intervention.
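Rasch fit criteria of this kind are commonly checked with infit and outfit mean-square statistics, which have an expected value near 1.0 when responses fit the model. The sketch below assumes dichotomous items and uses person measures and item difficulty that have already been estimated; the estimation step itself is omitted. The data are invented, and the abstract does not state which fit thresholds were applied.

```python
from math import exp


def rasch_prob(theta: float, b: float) -> float:
    """Dichotomous Rasch model: P(X = 1) = 1 / (1 + exp(-(theta - b)))."""
    return 1.0 / (1.0 + exp(-(theta - b)))


def item_fit(responses: list[int], thetas: list[float], b: float) -> tuple[float, float]:
    """Infit and outfit mean-squares for one item, given person measures.

    outfit = mean of squared standardized residuals (sensitive to outliers)
    infit  = sum of squared residuals / sum of model variances (information-weighted)
    """
    probs = [rasch_prob(t, b) for t in thetas]
    variances = [p * (1.0 - p) for p in probs]
    sq_resid = [(x - p) ** 2 for x, p in zip(responses, probs)]
    outfit = sum(r / v for r, v in zip(sq_resid, variances)) / len(probs)
    infit = sum(sq_resid) / sum(variances)
    return infit, outfit


# Hypothetical yes/no answers to one item from six clients with known measures.
thetas = [-2.0, -1.0, 0.0, 0.5, 1.0, 2.0]
answers = [0, 0, 1, 0, 1, 1]
infit, outfit = item_fit(answers, thetas, b=0.0)
print(f"infit {infit:.2f}, outfit {outfit:.2f}")
```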
The case for an international patient-reported outcomes measurement information system (PROMIS®) initiative

PubMed Central

2013-01-01

Patient-reported outcomes (PROs) play an increasingly important role in clinical practice and research. Modern psychometric methods such as item response theory (IRT) enable the creation of item banks that support fixed-length forms as well as computerized adaptive testing (CAT), often resulting in improved measurement precision and responsiveness. Here we describe and discuss the case for developing an international core set of PROs building from the US PROMIS® network. PROMIS is a U.S.-based cooperative group of research sites and centers of excellence convened to develop and standardize PRO measures across studies and settings. If extended to a global collaboration, PROMIS has the potential to transform PRO measurement by creating a shared, unifying terminology and metric for reporting common symptoms and functional life domains. Extending a common set of standardized PRO measures to the international community offers great potential for improving patient-centered research, clinical trials reporting, population monitoring, and health care worldwide. Benefits of such standardization include the possibility of: international syntheses (such as meta-analyses) of research findings; international population monitoring and policy development; access for health services administrators and planners to relevant information on the populations they serve; better assessment and monitoring of patients by providers; and improved shared decision making. The goal of the current PROMIS International initiative is to ensure that item banks are translated and culturally adapted for use in adults and children in as many countries as possible. The process includes 3 key steps: translation/cultural adaptation, calibration, and validation. A universal translation, an approach focusing on commonalities, rather than differences across versions developed in regions or countries speaking the same language, is proposed to ensure conceptual equivalence for all items.
International item calibration using nationally representative samples of adults and children within countries is essential to demonstrate that all items possess expected strong measurement properties. Finally, it is important to demonstrate that the PROMIS measures are valid, reliable and responsive to change when used in an international context. IRT item banking will allow for tailoring within countries and facilitate growth and evolution of PROs through contributions from the international measurement community. A number of opportunities and challenges of international development of PROs item banks are discussed. PMID:24359143
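The abstract notes that IRT item banks support computerized adaptive testing (CAT). The sketch below is a rough illustration of the mechanism. After each answer, it picks the unused item with the most Fisher information at the current expected a posteriori (EAP) estimate of θ.

Assumptions:
- Items are dichotomous two-parameter logistic (2PL) items with made-up parameters, held in a hypothetical `BANK`.
- Real PRO banks usually hold polytomous items.
- Real banks typically stop on a standard-error rule rather than a fixed length.

```python
import math

# Hypothetical 2PL item bank: (discrimination a, difficulty b). Not real PROMIS parameters.
BANK = [(1.2, -1.5), (1.8, -0.5), (2.0, 0.0), (1.5, 0.7), (1.1, 1.5), (2.2, 2.0)]

GRID = [-4 + 0.1 * i for i in range(81)]  # quadrature points for theta

def p_2pl(theta, a, b):
    """Probability of endorsing an item under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def info_2pl(theta, a, b):
    """Fisher information of a 2PL item: a^2 * P * (1 - P)."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

def eap(responses):
    """EAP estimate of theta with a N(0, 1) prior; responses maps item index -> 0/1."""
    num = den = 0.0
    for t in GRID:
        w = math.exp(-0.5 * t * t)  # unnormalized prior density
        for j, x in responses.items():
            p = p_2pl(t, *BANK[j])
            w *= p if x == 1 else 1.0 - p
        num += t * w
        den += w
    return num / den

def next_item(theta, used):
    """Select the unused item with maximum information at the current theta."""
    candidates = [j for j in range(len(BANK)) if j not in used]
    return max(candidates, key=lambda j: info_2pl(theta, *BANK[j]))

def run_cat(answer, max_items=4):
    """Administer up to max_items items; answer(j) returns the 0/1 response to item j."""
    responses = {}
    theta = 0.0
    for _ in range(max_items):
        j = next_item(theta, responses)
        responses[j] = answer(j)
        theta = eap(responses)
    return theta, responses
```

For example, `run_cat(lambda j: 1)` simulates a respondent who endorses every administered item. It returns a positive θ estimate.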
The case for an international patient-reported outcomes measurement information system (PROMIS®) initiative.

PubMed

Alonso, Jordi; Bartlett, Susan J; Rose, Matthias; Aaronson, Neil K; Chaplin, John E; Efficace, Fabio; Leplège, Alain; Lu, Aiping; Tulsky, David S; Raat, Hein; Ravens-Sieberer, Ulrike; Revicki, Dennis; Terwee, Caroline B; Valderas, Jose M; Cella, David; Forrest, Christopher B

2013-12-20

Patient-reported outcomes (PROs) play an increasingly important role in clinical practice and research. Modern psychometric methods such as item response theory (IRT) enable the creation of item banks that support fixed-length forms as well as computerized adaptive testing (CAT), often resulting in improved measurement precision and responsiveness. Here we describe and discuss the case for developing an international core set of PROs building from the US PROMIS® network. PROMIS is a U.S.-based cooperative group of research sites and centers of excellence convened to develop and standardize PRO measures across studies and settings. If extended to a global collaboration, PROMIS has the potential to transform PRO measurement by creating a shared, unifying terminology and metric for reporting of common symptoms and functional life domains. Extending a common set of standardized PRO measures to the international community offers great potential for improving patient-centered research, clinical trials reporting, population monitoring, and health care worldwide. Benefits of such standardization include the possibility of: international syntheses (such as meta-analyses) of research findings; international population monitoring and policy development; health services administrators' and planners' access to relevant information on the populations they serve; better assessment and monitoring of patients by providers; and improved shared decision making. The goal of the current PROMIS International initiative is to ensure that item banks are translated and culturally adapted for use in adults and children in as many countries as possible. The process includes 3 key steps: translation/cultural adaptation, calibration, and validation. A universal translation, an approach focusing on commonalities, rather than differences across versions developed in regions or countries speaking the same language, is proposed to ensure conceptual equivalence for all items.
International item calibration using nationally representative samples of adults and children within countries is essential to demonstrate that all items possess the expected strong measurement properties. Finally, it is important to demonstrate that the PROMIS measures are valid, reliable and responsive to change when used in an international context. IRT item banking will allow for tailoring within countries and facilitate growth and evolution of PROs through contributions from the international measurement community. A number of opportunities and challenges of international development of PRO item banks are discussed.

Student beliefs and learning environments: Developing a survey of factors related to conceptual change

NASA Astrophysics Data System (ADS)

Hanrahan, Mary

1994-12-01

This paper presents a model for the type of classroom environment believed to facilitate scientific conceptual change. A survey based on this model contains items about students' motivational beliefs, their study approach and their perceptions of their teacher's actions and learning goal orientation. Results obtained from factor analyses, correlations and analyses of variance, based on responses from 113 students, suggest that an empowering interpersonal teacher-student relationship is related to a deep approach to learning, a positive attitude to science, and positive self-efficacy beliefs, and may be increased by a constructivist approach to teaching.
Psychometric evaluation of an inpatient consumer survey measuring satisfaction with psychiatric care.

PubMed

Ortiz, Glorimar; Schacht, Lucille

2012-01-01

Measurement of consumers' satisfaction in psychiatric settings is important because it has been correlated with improved clinical outcomes and administrative measures of high-quality care. These consumer satisfaction measurements are actively used as performance measures required by the accreditation process and for quality improvement activities. Our objectives were (i) to re-evaluate, through exploratory factor analysis (EFA) and confirmatory factor analysis (CFA), the structure of an instrument intended to measure consumers' satisfaction with care in psychiatric settings and (ii) to examine and publish the psychometric characteristics (validity and reliability) of the Inpatient Consumer Survey (ICS). To test the structure of the ICS psychometrically, 34,878 survey results submitted by 90 psychiatric hospitals in 2008 were extracted from the Behavioral Healthcare Performance Measurement System (BHPMS). Basic descriptive item-response and correlation analyses were performed on all surveys. Two datasets were randomly created for analysis. A random sample of 8229 survey results was used for EFA. Another random sample of 8261 consumer survey results was used for CFA; this same sample was used for the validity and reliability analyses. The item-response analysis showed that item means on the five-point disagree/agree scale ranged from 3.10 to 3.94. Correlation analysis showed strong relationships between items. Six domains (dignity, rights, environment, empowerment, participation, and outcome), with internal reliabilities ranging from good to moderate (0.73-0.87), were shown to be related to overall care satisfaction. Overall reliability for the instrument was excellent (0.94). Results from CFA supported the domain structure of the ICS proposed through EFA. The overall findings from this study provide evidence that the ICS is a reliable measure of consumer satisfaction in psychiatric inpatient settings.
The analysis has shown the ICS to provide valid and reliable results and to focus on the specific concerns of consumers of psychiatric inpatient care. Scores by item indicate that opportunity for improvement exists across healthcare organizations.
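The abstract does not name the internal-reliability coefficient. Assuming it is Cronbach's alpha, the usual choice for such scales, the sketch below computes α = k/(k - 1) · (1 - Σσ²ᵢ/σ²_total) from complete item-level responses. Handling of missing responses, common in survey data, is not shown.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha; rows is a list of respondents, each a list of item scores.

    Assumes complete data (no missing responses) and at least two items.
    """
    k = len(rows[0])
    items = list(zip(*rows))                      # transpose to item columns
    item_vars = sum(variance(col) for col in items)
    total_var = variance([sum(r) for r in rows])  # variance of summed scores
    return k / (k - 1) * (1 - item_vars / total_var)
```

The two extremes illustrate the behavior. Perfectly parallel items give α = 1, and mutually uncorrelated items give α = 0.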
"Reactivity to Stimuli” Is a Temperamental Factor Contributing to Canine Aggression

PubMed Central

Arata, Sayaka; Takeuchi, Yukari; Inoue, Mai; Mori, Yuji

2014-01-01

Canine aggression is one of the most frequent problems in veterinary behavioral medicine and in severe cases may result in relinquishment or euthanasia. Because it is important to identify the factors underlying aggression for both treatment and prevention, we recently developed a questionnaire on aggression and temperamental traits and found that “reactivity to stimuli” was associated with aggression toward owners, children, strangers, and other dogs in the Shiba Inu breed. To examine whether these associations were consistent in other breeds, we asked the owners of dogs insured by Anicom Insurance Inc. to complete our questionnaire. The 17 breeds with the most insurance contracts were included. The questionnaire consisted of general information about the dog, four items related to aggression toward owners, children, strangers, and other dogs, and 20 other behavioral items. Aggression-related and behavioral items were rated on a five-point frequency scale. Valid responses (n = 5610) from owners of dogs aged 1 through 10 years were collected. Factor analyses of 18 behavioral items (response rate over 95%) extracted five factors that were largely consistent across 14 breeds: “sociability with humans,” “fear of sounds,” “chase proneness,” “reactivity to stimuli,” and “avoidance of aversive events.” In stepwise multiple regression analyses using Schwarz's Bayesian information criterion (BIC), with aggression scores as dependent variables and general information and temperamental factor scores as explanatory variables, “reactivity to stimuli,” i.e., physical reactivity to sudden movement or sound at home, was significantly associated with owner-directed aggression in 13 breeds, child-directed aggression in eight breeds, stranger-directed aggression in nine breeds, and dog-directed aggression in five breeds. These results suggest that “reactivity to stimuli” is simultaneously involved in several types of aggression.
Therefore, it would be worth taking “reactivity to stimuli” into account in the treatment and prevention of canine aggression. PMID:24972077
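The breed-wise models used stepwise multiple regression with Schwarz's BIC. The abstract does not give the exact stepping rules, so the sketch below shows a forward-only variant:

- It starts from an intercept-only ordinary least squares model.
- At each step, it adds whichever candidate predictor lowers BIC = n·ln(RSS/n) + k·ln(n) the most.
- It stops when no candidate lowers BIC.

The predictor names in the test data are hypothetical.

```python
import numpy as np

def bic(y, X):
    """Gaussian-likelihood BIC (up to an additive constant) for an OLS fit of y on X."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    return n * np.log(rss / n) + k * np.log(n)

def forward_bic(y, predictors):
    """Forward selection by BIC; predictors maps name -> 1-D array.

    Returns the selected predictor names in order of entry.
    """
    n = len(y)
    selected = []
    X = np.ones((n, 1))  # intercept-only starting model
    best = bic(y, X)
    while True:
        scores = {
            name: bic(y, np.column_stack([X, col]))
            for name, col in predictors.items()
            if name not in selected
        }
        if not scores:
            break
        name = min(scores, key=scores.get)
        if scores[name] >= best:
            break  # no remaining predictor lowers BIC
        best = scores[name]
        selected.append(name)
        X = np.column_stack([X, predictors[name]])
    return selected
```

A fully stepwise procedure would also test, after each addition, whether removing an earlier predictor lowers BIC.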
Adult Attachment Ratings (AAR): an item response theory analysis.

PubMed

Pilkonis, Paul A; Kim, Yookyung; Yu, Lan; Morse, Jennifer Q

2014-01-01

The Adult Attachment Ratings (AAR) include 3 scales for anxious, ambivalent attachment (excessive dependency, interpersonal ambivalence, and compulsive care-giving), 3 for avoidant attachment (rigid self-control, defensive separation, and emotional detachment), and 1 for secure attachment. The scales include items (ranging from 6 to 16 in their original form) scored by raters using a 3-point format (0 = absent, 1 = present, and 2 = strongly present) and summed to produce a total score. Item response theory (IRT) analyses were conducted with data from 414 participants recruited from psychiatric outpatient, medical, and community settings to identify the most informative items from each scale. The IRT results allowed us to shorten the scales to 5-item versions that are more precise and easier to rate because of their brevity. In general, the effective range of measurement for the scales was 0 to +2 SDs for each of the attachment constructs; that is, from average to high levels of attachment problems. Evidence for convergent and discriminant validity of the scales was investigated by comparing them with the Experiences of Close Relationships-Revised (ECR-R) scale and the Kobak Attachment Q-sort. The best consensus among self-reports on the ECR-R, informant ratings on the ECR-R, and expert judgments on the Q-sort and the AAR emerged for anxious, ambivalent attachment. Given the good psychometric characteristics of the scale for secure attachment, however, this measure alone might provide a simple alternative to more elaborate procedures for some measurement purposes. Conversion tables are provided for the 7 scales to facilitate transformation from raw scores to IRT-calibrated (theta) scores.
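The abstract mentions conversion tables from raw scores to IRT-calibrated θ scores. One common way to build such tables is the Lord-Wingersky recursion; this is an assumption, not a method confirmed by the paper. It works in two steps:

1. For each θ on a quadrature grid, compute the distribution of summed scores over items scored 0/1/2 under a graded response model.
2. Take EAP[θ | summed score] under a standard normal prior.

The five parameter triples in `AAR_ITEMS` are hypothetical.

```python
import math

# Hypothetical graded-response parameters (a, b1, b2) for five 0/1/2 items; b1 < b2.
AAR_ITEMS = [(1.5, -0.5, 0.8), (2.0, 0.0, 1.0), (1.2, 0.3, 1.5), (1.8, -0.2, 0.9), (1.6, 0.5, 1.8)]

def grm_probs(theta, a, b1, b2):
    """Category probabilities (0, 1, 2) for a 3-category graded response item."""
    s1 = 1 / (1 + math.exp(-a * (theta - b1)))  # P(X >= 1)
    s2 = 1 / (1 + math.exp(-a * (theta - b2)))  # P(X >= 2)
    return [1 - s1, s1 - s2, s2]

def summed_score_likelihoods(theta, items):
    """Lord-Wingersky recursion: L[s] = P(summed score = s | theta)."""
    L = [1.0]
    for a, b1, b2 in items:
        p = grm_probs(theta, a, b1, b2)
        new = [0.0] * (len(L) + 2)  # each item can add 0, 1 or 2 points
        for s, ls in enumerate(L):
            for x, px in enumerate(p):
                new[s + x] += ls * px
        L = new
    return L

def eap_table(items, n_nodes=81, lo=-4.0, hi=4.0):
    """Map every possible raw score to EAP[theta | score] under a N(0, 1) prior."""
    nodes = [lo + (hi - lo) * i / (n_nodes - 1) for i in range(n_nodes)]
    max_score = 2 * len(items)
    num = [0.0] * (max_score + 1)
    den = [0.0] * (max_score + 1)
    for t in nodes:
        w = math.exp(-0.5 * t * t)  # unnormalized prior weight
        for s, ls in enumerate(summed_score_likelihoods(t, items)):
            num[s] += t * w * ls
            den[s] += w * ls
    return [n / d for n, d in zip(num, den)]
```

`eap_table(AAR_ITEMS)` returns one θ value for each raw score from 0 to 10.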
Development and psychometric analysis of the Brief DSM-5 Alcohol Use Disorder Diagnostic Assessment: Towards effective diagnosis in college students.

PubMed

Hagman, Brett T

2017-11-01

The Diagnostic and Statistical Manual of Mental Disorders (5th edition) Alcohol Use Disorder (DSM-5 AUD) criteria have been modified to reflect a single, continuous disorder. It is critical to develop brief measures that can accurately assess DSM-5 AUD criteria in college students to support screening, referral, and brief intervention services on college campuses. The present study sought to develop a brief 13-item measure designed to capture the full spectrum of the DSM-5 AUD criteria and to assess its psychometric properties in a sample of college students. Participants were past-year drinkers (N = 923) aged 18 to 30 and enrolled at 3 universities. Respondents completed a 30-min anonymous online battery of questionnaires. The Brief DSM-5 AUD Assessment consisted of 13 items designed to reflect the DSM-5 AUD criteria. Results indicated a high degree of internal consistency reliability with high item-to-scale correlations. Confirmatory factor analyses indicated that a dominant single factor emerged with good model fit. The Item Response Theory (IRT) analyses indicated that the difficulty parameters for each criterion were intermixed along the upper portion of the underlying AUD severity continuum and that the discrimination parameters were all high. Additional analysis indicated that those with a DSM-5 AUD had greater levels of alcohol and other drug use and problem severity in comparison to those without a DSM-5 AUD. Study findings provide empirical support for the reliability and validity of the Brief 13-item DSM-5 Assessment. It should be routinely included in research and clinical practice efforts. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Safety climate in Swiss hospital units: Swiss version of the Safety Climate Survey

PubMed Central

Gehring, Katrin; Mascherek, Anna C.; Bezzola, Paula

2015-01-01

Rationale, aims and objectives: Safety climate measurements are a broadly used element of improvement initiatives. To provide a sound and easy-to-administer instrument for use in Swiss hospitals, we translated the Safety Climate Survey into German and French. Methods: After translating the Safety Climate Survey into French and German, a cross-sectional survey study was conducted with health care professionals (HCPs) in operating room (OR) teams and on OR-related wards in 10 Swiss hospitals. Validity of the instrument was examined by means of Cronbach's alpha and missing rates of the single items. Item-descriptive statistics, group differences, and the percentage of 'problematic responses' (PPR) were calculated. Results: 3153 HCPs completed the survey (response rate: 63.4%). 1308 individuals were excluded from the analyses because their profession was other than doctor or nurse or because their answers were invalid, leaving n = 1845 (nurses = 1321, doctors = 523). Internal consistency of the translated Safety Climate Survey was good (Cronbach's alpha: German = 0.86; French = 0.84). Missing rates at item level were rather low (0.23–4.3%). We found significant group differences in safety climate values regarding profession, managerial function, work area and time spent in direct patient care. At item level, 14 out of 21 items showed a PPR higher than 10%. Conclusions: Results indicate that the French and German translations of the Safety Climate Survey might be a useful measurement instrument for safety climate in Swiss hospital units. Analyses at item level allow for differentiating facets of safety climate into more positive and critical safety climate aspects. PMID:25656302
The Piper Fatigue Scale-12 (PFS-12): Psychometric Findings and Item Reduction in a Cohort of Breast Cancer Survivors

PubMed Central

Reeve, Bryce B.; Stover, Angela M.; Alfano, Catherine M.; Smith, Ashley Wilder; Ballard-Barbash, Rachel; Bernstein, Leslie; McTiernan, Anne; Baumgartner, Kathy B.; Piper, Barbara F.

2013-01-01

Purpose: Brief, valid measures of fatigue, a prevalent and distressing cancer symptom, are needed for use in research. This study’s primary aim was to create a shortened version of the revised Piper Fatigue Scale (PFS-R) based on data from a diverse cohort of breast cancer survivors. A secondary aim was to determine whether the PFS captured multiple distinct aspects of fatigue (a multidimensional model) or a single overall fatigue factor (a unidimensional model). Methods: Breast cancer survivors (n=799; stages in situ through IIIa; ages 29–86 yrs) were recruited through 3 SEER registries (New Mexico, Western Washington, and Los Angeles, CA) as part of the Health, Eating, Activity, and Lifestyle (HEAL) study. Fatigue was measured approximately 3 years post-diagnosis using the 22-item PFS-R, which has 4 subscales (Behavior, Affect, Sensory, and Cognition). Confirmatory factor analysis was used to compare unidimensional and multidimensional models. Six criteria were used to select items for the shortened PFS-R: the scale’s content validity, the items’ relationship with fatigue, content redundancy, differential item functioning by race and/or education, scale reliability, and literacy demand. Results: Factor analyses supported the original 4-factor structure. There was also evidence from the bi-factor model for a dominant underlying fatigue factor. Six items tested positive for differential item functioning between African-American and Caucasian survivors. Four additional items showed poor association, local dependence, or content validity concerns. After removing these 10 items, the reliability of the PFS-12 subscales ranged from 0.87–0.89, compared to 0.90–0.94 prior to item removal. Conclusion: The newly developed PFS-12 can be used to assess fatigue in African-American and Caucasian breast cancer survivors and reduces response burden without compromising reliability or validity.
This is the first study to determine PFS literacy demand and to compare PFS-R responses in African-American and Caucasian breast cancer survivors. Further testing in diverse populations is warranted. PMID:22933027
Analyses related to the development of DSM-5 criteria for substance use related disorders: 1. Toward amphetamine, cocaine and prescription drug use disorder continua using Item Response Theory.

PubMed

Saha, Tulshi D; Compton, Wilson M; Chou, S Patricia; Smith, Sharon; Ruan, W June; Huang, Boji; Pickering, Roger P; Grant, Bridget F

2012-04-01

Prior research has demonstrated the unidimensionality of alcohol, nicotine and cannabis use disorder criteria. The purpose of this study was to examine the unidimensionality of DSM-IV cocaine, amphetamine and prescription drug abuse and dependence criteria and to determine the impact of eliminating the legal problems criterion on the information value of the aggregate criteria. Factor analyses and Item Response Theory (IRT) analyses were used to explore the unidimensionality and psychometric properties of the illicit drug use criteria using a large representative sample of the U.S. population. All illicit drug abuse and dependence criteria formed unidimensional latent traits. For amphetamines, cocaine, sedatives, tranquilizers and opioids, IRT models without the legal problems criterion fit better than models with it, and there were no differences in information value between the IRT models with and without the legal problems criterion, supporting the elimination of that criterion. Consistent with findings for alcohol, nicotine and cannabis, amphetamine, cocaine, sedative, tranquilizer and opioid abuse and dependence criteria reflect underlying unitary dimensions of severity. The legal problems criterion associated with each of these substance use disorders can be eliminated with no loss in informational value and an advantage of parsimony. Taken together, these findings support the changes to substance use disorder diagnoses recommended by the American Psychiatric Association's DSM-5 Substance and Related Disorders Workgroup. Published by Elsevier Ireland Ltd.
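One way to make "information value" operational is to compare the area under the test information curve with and without a criterion. Under the 2PL, an item's information integrates to its discrimination a over the whole θ line. A weakly discriminating, rarely endorsed criterion therefore adds little within the θ range of interest.

The parameters in `CRITERIA` are hypothetical, including the stand-in "legal" criterion. The study's actual comparison method may differ.

```python
import math

# Hypothetical 2PL parameters (a, b); "legal" mimics a weak, rarely endorsed criterion.
CRITERIA = {"tolerance": (2.0, 0.5), "withdrawal": (2.3, 1.0), "craving": (1.9, 0.8), "legal": (0.9, 2.8)}

def info_2pl(theta, a, b):
    """2PL item information a^2 * P * (1 - P)."""
    p = 1 / (1 + math.exp(-a * (theta - b)))
    return a * a * p * (1 - p)

def total_information(items, lo=-4.0, hi=4.0, n=801):
    """Area under the test information curve on [lo, hi] (trapezoid rule)."""
    step = (hi - lo) / (n - 1)
    vals = [sum(info_2pl(lo + i * step, a, b) for a, b in items) for i in range(n)]
    return step * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

with_legal = total_information(list(CRITERIA.values()))
without_legal = total_information([v for k, v in CRITERIA.items() if k != "legal"])
```

Here `without_legal / with_legal` is about 0.9. Dropping the weak criterion costs roughly a tenth of the information on [-4, 4].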
Confirmatory Factor Analysis of the Patient Reported Outcomes Measurement Information System (PROMIS) Adult Domain Framework Using Item Response Theory Scores.

PubMed

Carle, Adam C; Riley, William; Hays, Ron D; Cella, David

2015-10-01

To guide measure development, National Institutes of Health-supported Patient-Reported Outcomes Measurement Information System (PROMIS) investigators developed a hierarchical domain framework. The framework specifies health domains at multiple levels. The initial PROMIS domain framework specified that physical function and symptoms such as Pain and Fatigue indicate Physical Health (PH); Depression, Anxiety, and Anger indicate Mental Health (MH); and Social Role Performance and Social Satisfaction indicate Social Health (SH). We used confirmatory factor analyses to evaluate the fit of the hypothesized framework to data collected from a large sample. We used data (n=14,098) from PROMIS's wave 1 field test and estimated domain scores using the PROMIS item response theory parameters. We then used confirmatory factor analyses to test whether the domains corresponded to the PROMIS domain framework as expected. A model corresponding to the domain framework did not provide ideal fit [root mean square error of approximation (RMSEA)=0.13; comparative fit index (CFI)=0.92; Tucker Lewis Index (TLI)=0.88; standardized root mean square residual (SRMR)=0.09]. On the basis of modification indices and exploratory factor analyses, we allowed Fatigue to load on both PH and MH. This model fit the data acceptably (RMSEA=0.08; CFI=0.97; TLI=0.96; SRMR=0.03). Our findings generally support the PROMIS domain framework. Allowing Fatigue to load on both PH and MH improved fit considerably.
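The fit indices quoted above have simple closed forms. The sketch computes RMSEA, CFI and TLI from the model and baseline (independence) chi-square values, using their common definitions.

- Some programs use N rather than N - 1 in RMSEA.
- SRMR needs the residual correlation matrix and is omitted.
- The abstract does not report chi-square values, so the sketch makes no attempt to reproduce its numbers.

```python
import math

def rmsea(chi2, df, n):
    """Point estimate of RMSEA from the model chi-square, its df and the sample size."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

def cfi(chi2_m, df_m, chi2_b, df_b):
    """Comparative fit index of the model relative to the baseline model."""
    d_m = max(chi2_m - df_m, 0.0)
    d_b = max(chi2_b - df_b, d_m)
    return 1.0 if d_b == 0 else 1.0 - d_m / d_b

def tli(chi2_m, df_m, chi2_b, df_b):
    """Tucker-Lewis index (non-normed fit index)."""
    return (chi2_b / df_b - chi2_m / df_m) / (chi2_b / df_b - 1.0)
```

For example, a model with χ² = 110 on 10 df, against a baseline with χ² = 1010 on 10 df, gives CFI = TLI = 0.9.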
A Two-Decision Model for Responses to Likert-Type Items

ERIC Educational Resources Information Center

Thissen-Roe, Anne; Thissen, David

2013-01-01

Extreme response set, the tendency to prefer the lowest or highest response option when confronted with a Likert-type response scale, can lead to misfit of item response models such as the generalized partial credit model. Recently, a series of intrinsically multidimensional item response models have been hypothesized, wherein tendency toward…
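Extreme response set matters because the generalized partial credit model (GPCM) orders categories along a single dimension. It has no separate term for a preference for the end points of the scale.

The sketch computes GPCM category probabilities, P(X = k | θ) ∝ exp(Σ_{v≤k} a(θ - b_v)). The step parameters in the test are hypothetical. The two-decision multidimensional models the paper proposes are not implemented here.

```python
import math

def gpcm_probs(theta, a, steps):
    """Generalized partial credit model: probabilities for categories 0..len(steps)."""
    logits = [0.0]  # category 0 has an empty sum
    for b in steps:
        logits.append(logits[-1] + a * (theta - b))
    m = max(logits)  # subtract the max for numerical stability
    expd = [math.exp(z - m) for z in logits]
    total = sum(expd)
    return [e / total for e in expd]
```

At θ equal to a step parameter b_k, categories k - 1 and k are equally likely. This is the usual interpretation of GPCM step locations.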
Reevaluation of the Amsterdam Inventory for Auditory Disability and Handicap Using Item Response Theory.

PubMed

Boeschen Hospers, J Mirjam; Smits, Niels; Smits, Cas; Stam, Mariska; Terwee, Caroline B; Kramer, Sophia E

2016-04-01

We reevaluated the psychometric properties of the Amsterdam Inventory for Auditory Disability and Handicap (AIADH; Kramer, Kapteyn, Festen, & Tobi, 1995) using item response theory. Item response theory describes item functioning along an ability continuum. Cross-sectional data from 2,352 adults with and without hearing impairment, aged 18-70 years, were analyzed. They completed the AIADH in the web-based prospective cohort study "Netherlands Longitudinal Study on Hearing." A graded response model was fitted to the AIADH data. Category response curves, item information curves, and the standard error as a function of self-reported hearing ability were plotted. The graded response model showed a good fit. Item information was highest for adults who reported hearing disability and lower for adults with normal hearing. The standard error plot showed that self-reported hearing ability is most reliably measured for adults reporting mild to moderate hearing disability. This is one of the few item response theory studies on audiological self-reports. All AIADH items could be hierarchically placed on the self-reported hearing ability continuum, meaning they measure the same construct. This provides a promising basis for developing a clinically useful computerized adaptive test, where item selection adapts to the hearing ability of individuals, resulting in efficient assessment of hearing disability.
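The quantities plotted in this study can be computed directly from a graded response model (GRM):

- Category probabilities are differences of adjacent cumulative logistic curves.
- Item information sums (P′_k)²/P_k over the categories.
- SE(θ) = 1/√(test information).

The item parameters in the test are hypothetical. The sketch does not model which direction of θ means better hearing.

```python
import math

def _cumulative(theta, a, thresholds):
    """Cumulative curves P(X >= k), padded with 1 (k = 0) and 0 (k = m + 1)."""
    return [1.0] + [1 / (1 + math.exp(-a * (theta - b))) for b in thresholds] + [0.0]

def grm_category_probs(theta, a, thresholds):
    """Graded response model: P(X = k | theta) for k = 0..len(thresholds)."""
    cum = _cumulative(theta, a, thresholds)
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]

def grm_item_information(theta, a, thresholds):
    """GRM item information: sum over categories of (dP_k/dtheta)^2 / P_k."""
    cum = _cumulative(theta, a, thresholds)
    dcum = [a * c * (1 - c) for c in cum]  # derivatives; the padded ends are flat
    info = 0.0
    for k in range(len(thresholds) + 1):
        p = cum[k] - cum[k + 1]
        dp = dcum[k] - dcum[k + 1]
        if p > 0:
            info += dp * dp / p
    return info

def standard_error(theta, items):
    """SE(theta) = 1 / sqrt(test information) for a list of (a, thresholds) items."""
    return 1 / math.sqrt(sum(grm_item_information(theta, a, t) for a, t in items))
```

With a single threshold, the GRM information reduces to the 2PL's a²P(1 - P). This is a handy check on the implementation.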
On the Relationship Between Classical Test Theory and Item Response Theory: From One to the Other and Back.

PubMed

Raykov, Tenko; Marcoulides, George A

2016-04-01

The frequently neglected and often misunderstood relationship between classical test theory and item response theory is discussed for the unidimensional case with binary measures and no guessing. It is pointed out that popular item response models can be directly obtained from classical test theory-based models by accounting for the discrete nature of the observed items. Two distinct observational equivalence approaches are outlined that render the item response models from corresponding classical test theory-based models, and can each be used to obtain the former from the latter models. Similarly, classical test theory models can be furnished using the reverse application of either of those approaches from corresponding item response models.
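The sketch below shows one standard route from a classical, continuous-measure model to an item response model. It is an illustration, not necessarily either of the two approaches the authors outline.

- Each binary item is treated as a thresholded latent response X* = λθ + E, with Var(X*) = Var(θ) = 1 and X = 1 when X* > τ.
- Then P(X = 1 | θ) = Φ((λθ - τ)/√(1 - λ²)).
- This is a normal-ogive model with a = λ/√(1 - λ²) and b = τ/λ.

The loading and threshold values in the test are arbitrary.

```python
import math

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def ogive_params(loading, threshold):
    """Convert a standardized loading and threshold of X* = loading * theta + E
    (with Var X* = 1) into normal-ogive discrimination a and difficulty b."""
    a = loading / math.sqrt(1 - loading ** 2)
    b = threshold / loading
    return a, b

def p_from_threshold_model(theta, loading, threshold):
    """P(X* > threshold | theta), computed directly from the latent-response model."""
    return phi((loading * theta - threshold) / math.sqrt(1 - loading ** 2))

def p_normal_ogive(theta, a, b):
    """Normal-ogive item response function."""
    return phi(a * (theta - b))
```

Both routes give the same response probability, which shows that the two parameterizations describe one model.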
The continuity between DSM-5 obsessive-compulsive personality disorder traits and obsessive-compulsive symptoms in adolescence: an item response theory study.

PubMed

De Caluwé, Elien; Rettew, David C; De Clercq, Barbara

2014-11-01

Various studies have shown that obsessive-compulsive symptoms exist as part of not only obsessive-compulsive disorder (OCD) but also obsessive-compulsive personality disorder (OCPD). Despite these shared characteristics, there is an ongoing debate on the inclusion of OCPD into the recently developed DSM-5 obsessive-compulsive and related disorders (OCRDs) category. The current study aims to clarify whether this inclusion can be justified from an item response theory approach. The validity of the continuity model for understanding the association between OCD and OCPD was explored in 787 Dutch community and referred adolescents (70% female, 12-20 years old, mean age = 16.16 years, SD = 1.40) studied between July 2011 and January 2013, relying on item response theory (IRT) analyses of self-reported OCD symptoms (Youth Obsessive-Compulsive Symptoms Scale [YOCSS]) and OCPD traits (Personality Inventory for DSM-5 [PID-5]). The results support the continuity hypothesis, indicating that both OCD and OCPD can be represented along a single underlying spectrum. OCD, and especially the obsessive symptom domain, can be considered the extreme end of OCPD traits. The current study empirically supports the classification of OCD and OCPD along a single dimension. This integrative perspective in OC-related pathology addresses the dimensional nature of traits and psychopathology and may improve the transparency and validity of assessment procedures. © Copyright 2014 Physicians Postgraduate Press, Inc.
A comparison of Italian, Japanese and American students' responses to the Adolescent Reinforcement Survey Schedule.

PubMed

Galeazzi, A; Franceschina, E; Cautela, J; Holmes, G R; Sakano, Y

1998-02-01

The Italian form of the Adolescent Reinforcement Survey Schedule (ARSS-I) was administered to high school boys and girls (N = 648) from northern and central Italy. Their responses were factor analyzed using principal components with VARIMAX rotation (SAS Institute, Inc., 1990). The 10 interpretable factors from the Italian data were compared and contrasted with factor analytic results from Holmes's (1991, 1994) studies of American and Japanese students. Additionally, the Italian data analyses include an examination by gender using t tests for each of the ARSS-I items and an ANOVA for age and age-gender effects on responses to the ARSS-I.
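The study used a principal components solution with VARIMAX rotation in SAS. The rotation step alone can be sketched with NumPy, using the familiar SVD-based iteration that R's `stats::varimax` also uses.

Caveats:
- This version omits the Kaiser row normalization that `stats::varimax` applies by default.
- The toy loading matrix in the test is made up.

```python
import numpy as np

def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-8):
    """Orthogonal (raw) varimax rotation of a p x k loading matrix.

    Returns the rotated loadings and the orthogonal rotation matrix R.
    """
    p, k = loadings.shape
    R = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        L = loadings @ R
        # Gradient-like target for the orthomax criterion (gamma = 1 gives varimax)
        target = L ** 3 - (gamma / p) * L @ np.diag(np.sum(L ** 2, axis=0))
        u, s, vt = np.linalg.svd(loadings.T @ target)
        R = u @ vt  # closest orthogonal matrix to the target direction
        d_old, d = d, np.sum(s)
        if d_old != 0 and d / d_old < 1 + tol:
            break  # criterion has stopped improving
    return loadings @ R, R
```

Because R is orthogonal, each variable's communality (row sum of squared loadings) is unchanged by the rotation. Only the distribution of loadings across factors changes.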
[Instrument to measure adherence in hypertensive patients: contribution of Item Response Theory].

PubMed

Rodrigues, Malvina Thaís Pacheco; Moreira, Thereza Maria Magalhaes; Vasconcelos, Alexandre Meira de; Andrade, Dalton Francisco de; Silva, Daniele Braz da; Barbetta, Pedro Alberto

2013-06-01

To analyze, by means of Item Response Theory, an instrument to measure adherence to treatment for hypertension. Analytical study with 406 hypertensive patients with associated complications seen in primary care in Fortaleza, CE, Northeastern Brazil, in 2011, using Item Response Theory. The stages were: dimensionality testing, item calibration, data processing, and scale creation, analyzed using the graded response model. The dimensionality of the instrument was studied by analyzing the polychoric correlation matrix and by full-information factor analysis. Multilog software was used to calibrate items and estimate the scores. Items relating to drug therapy are the most directly related to adherence, while those relating to drug-free therapy need to be reworked because they have less psychometric information and low discrimination. The independence of items, the small number of levels in the scale and the low explained variance in the adjustment of the models show the main weaknesses of the instrument analyzed. Item Response Theory proved to be a relevant analysis technique because it evaluated respondents' adherence to treatment for hypertension, the level of difficulty of the items, and their ability to discriminate between individuals with different levels of adherence, which generates a greater amount of information. As the Item Response Theory analysis of its items shows, the instrument analyzed is limited in measuring adherence to hypertension treatment and needs adjustment. The proper formulation of items is important to measure the desired latent trait accurately.
The Consequences of Ignoring Item Parameter Drift in Longitudinal Item Response Models

ERIC Educational Resources Information Center

Lee, Wooyeol; Cho, Sun-Joo

2017-01-01

Utilizing a longitudinal item response model, this study investigated the effect of item parameter drift (IPD) on item parameters and person scores via a Monte Carlo study. Item parameter recovery was investigated for various IPD patterns in terms of bias and root mean-square error (RMSE), and percentage of time the 95% confidence interval covered…
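The recovery criteria named here are straightforward to compute once replicated estimates are available. For one parameter with true value θ₀ and R replications, the sketch returns three quantities:

- bias;
- RMSE;
- the share of nominal 95% Wald intervals (estimate ± 1.96·SE) that contain θ₀.

Simulating item parameter drift itself is not shown, and the inputs in the test are invented.

```python
import math

def recovery_summary(true_value, estimates, std_errors, z=1.959964):
    """Bias, RMSE and 95% CI coverage of replicated estimates of one parameter."""
    r = len(estimates)
    bias = sum(e - true_value for e in estimates) / r
    rmse = math.sqrt(sum((e - true_value) ** 2 for e in estimates) / r)
    covered = sum(
        1 for e, se in zip(estimates, std_errors)
        if e - z * se <= true_value <= e + z * se
    )
    return bias, rmse, covered / r
```

Coverage well below 0.95 signals that standard errors understate the real estimation error. Ignoring drift is one possible cause.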
Assessing the Item Response Theory with Covariate (IRT-C) Procedure for Ascertaining Differential Item Functioning

ERIC Educational Resources Information Center

Tay, Louis; Vermunt, Jeroen K.; Wang, Chun

2013-01-01

We evaluate the item response theory with covariates (IRT-C) procedure for assessing differential item functioning (DIF) without preknowledge of anchor items (Tay, Newman, & Vermunt, 2011). This procedure begins with a fully constrained baseline model, and candidate items are tested for uniform and/or nonuniform DIF using the Wald statistic.…
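The Wald statistic used to flag DIF compares estimated DIF effects with their sampling covariance: W = d′V⁻¹d. W is referred to a chi-square distribution with as many degrees of freedom as tested effects.

The sketch assumes the DIF estimates and covariance come from an already fitted IRT-C model, which is not shown. It handles one effect (e.g., uniform DIF) or two (uniform and nonuniform jointly), the cases where closed-form chi-square tail probabilities exist.

```python
import math

def wald_dif(estimates, cov):
    """Wald chi-square and p-value for H0: all DIF effects are zero (1 or 2 effects)."""
    if len(estimates) == 1:
        w = estimates[0] ** 2 / cov[0][0]
        p = math.erfc(math.sqrt(w / 2))  # chi-square survival function, 1 df
    elif len(estimates) == 2:
        (a, b), (c, d) = cov
        det = a * d - b * c
        inv = [[d / det, -b / det], [-c / det, a / det]]
        x, y = estimates
        w = x * (inv[0][0] * x + inv[0][1] * y) + y * (inv[1][0] * x + inv[1][1] * y)
        p = math.exp(-w / 2)  # chi-square survival function, 2 df
    else:
        raise ValueError("this sketch handles 1 or 2 DIF effects only")
    return w, p
```

A single effect estimated at 1.96 standard errors from zero gives W ≈ 3.84 and p ≈ 0.05, as expected.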
Examining the psychometric properties of a sport-related concussion survey: a Rasch measurement approach.

PubMed

Hecimovich, Mark; Marais, Ida

2017-06-26

Awareness of sport-related concussion (SRC) is an essential step in increasing the number of athletes or parents who report SRC. This awareness is important because there are no established data on medical care in youth-level sports, and such care may be limited to individuals with only first-aid training. In this circumstance, aside from the coach, it is the players and their parents who need to be aware of possible signs and symptoms. The aim of this study was to examine the psychometric properties of a parent and player concussion survey intended for use before and after an education campaign regarding SRC. 1441 questionnaires were received from parents and 284 from players. The responses to the sixteen-item 'recognition of signs and symptoms' section of the questionnaire were submitted to psychometric analysis using the dichotomous and polytomous Rasch models via the Rasch Unidimensional Measurement Model software RUMM2030. The Rasch model of Modern Test Theory can be considered a refinement of, or advance on, traditional analyses of an instrument's psychometric properties. The main finding is that these sixteen items measure two factors: items that are symptoms of concussion and items that are not. Parents and athletes were able to identify most or all of the symptoms but were not as good at recognizing items that are not symptoms of concussion. Analysis of these responses revealed differential item functioning (DIF) between parents and athletes on the non-symptom items. When the DIF was resolved, a significant difference was found between parents and athletes. The main finding is that the items measure two 'dimensions' in concussion symptom recognition. The first dimension consists of those items that are symptoms of concussion and the second dimension of those items that are not symptoms of concussion.
Parents and players were able to identify most or all of the symptoms of concussion, so one would not expect to detect any positive change on these items after an education campaign. Parents and players were not as good at recognizing items that are not symptoms of concussion. It is on these items that improvement may be expected to manifest, so to evaluate the effectiveness of an education campaign it would pay to look for improvement in recognizing items that are not symptoms of concussion.
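The dichotomous Rasch model gives P(X = 1) = exp(θ - δ)/(1 + exp(θ - δ)) for person location θ and item location δ. A defining property of the model is that the log-odds difference between two persons is the same on every item. The sketch checks this numerically. RUMM2030 itself is not used, and the polytomous form is omitted.

```python
import math

def rasch_p(theta, delta):
    """Dichotomous Rasch model: probability of an affirming response."""
    return 1 / (1 + math.exp(-(theta - delta)))

def logit(p):
    """Log-odds of a probability."""
    return math.log(p / (1 - p))

# The log-odds gap between persons at 1.0 and -0.5 is 1.5 on every item.
gaps = [logit(rasch_p(1.0, d)) - logit(rasch_p(-0.5, d)) for d in (-2.0, 0.0, 1.5)]
```

DIF, as found here for the non-symptom items, is a departure from this property. The item's location δ effectively differs between groups such as parents and players.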
Students' approaches to learning in a clinical practicum: A psychometric evaluation based on item response theory.

PubMed

Zhao, Yue; Kuan, Hoi Kei; Chung, Joyce O K; Chan, Cecilia K Y; Li, William H C

2018-07-01

The investigation of learning approaches in the clinical workplace context has remained an under-researched area. Despite the validation of learning approach instruments and their application in various clinical contexts, little is known about the extent to which an individual item that reflects a specific learning strategy and motive contributes effectively to characterizing students' learning approaches. This study aimed to measure nursing students' approaches to learning in a clinical practicum using the Approaches to Learning at Work Questionnaire (ALWQ). A survey research design was used. A sample of year 3 nursing students (n = 208) who undertook a 6-week clinical practicum course participated in the study. Factor analyses were conducted, followed by an item response theory analysis that included model assumption evaluation (unidimensionality and local independence), item calibration, and goodness-of-fit assessment. Two subscales, deep and surface, were derived. Findings suggested that: (a) items measuring the deep motive of intrinsic interest and the deep strategies of relating new ideas to similar situations and of concept mapping served as the strongest discriminating indicators; (b) the surface strategy of memorizing facts and details without an overall picture exhibited the highest discriminating power among all surface items; and (c) both subscales appeared to be informative in assessing a broad range of the corresponding latent trait. The 21-item ALWQ derived from this study presented an efficient, internally consistent and precise measure. Findings provided a useful psychometric evaluation of the ALWQ in the clinical practicum context, added evidence to the utility of the ALWQ for nursing education practice and research, and echoed the discussions from previous studies on the role of contextual factors in influencing students' choices of different learning strategies.
They provided insights for clinical educators to measure nursing students' approaches to learning and facilitate their learning in the clinical practicum setting. Copyright © 2018. Published by Elsevier Ltd.
The Health and Functioning ICF-60: Development and Psychometric Properties

PubMed Central

Tutelyan, V A; Chatterji, S; Baturin, A K; Pogozheva, A V; Kishko, O N; Akolzina, S E

2014-01-01

Background This paper describes the development and psychometric properties of the Health and Functioning ICF-60 (HF-ICF-60) measure, based on the World Health Organization (WHO) ‘International Classification of Functioning, Disability and Health: ICF’ (2001). The aims of the present study were to test the psychometric properties of the HF-ICF-60, which was developed to be responsive to change in functioning through changes in health and nutritional status and to serve as a prospective measure for monitoring the health and nutritional status of populations, and to explore the relationship of the HF-ICF-60 with quality of life measures, such as the World Health Organization WHOQOL-BREF quality of life assessment, in relation to non-communicable diseases. Methods The HF-ICF-60 measure consists of 60 items selected from the ICF by an expert panel: 18 items that cover Body Functions and 21 items that cover Activities and Participation, rated on five-point scales, and 21 items that cover Environmental Factors (seven items cover Individual Environmental Factors and 14 items cover Societal Environmental Factors), rated on nine-point scales. The HF-ICF-60 measure was administered to a nationally representative Russian sample within the Russian National Population Quality of Life, Health and Nutrition Survey in 2004 (n = 9807) and 2005 (n = 9560), as part of two waves of the Russian Longitudinal Monitoring Survey (RLMS). The statistical analyses used both classical psychometric methods, such as factor analysis, and modern methods based on Item Response Theory. Results The HF-ICF-60 questionnaire is a new measure derived directly from the ICF and covers the ICF components as follows: Body Functions, Activities and Participation, and Environmental Factors (Individual Environmental Factors and Societal Environmental Factors).
The results from the factor analyses (both Exploratory Factor Analyses and Confirmatory Factor Analyses) show good support for the proposed structure together with an overall higher-order factor for each scale of the measure. The measure has good reliability and validity, and sensitivity to change in the health and nutritional status of respondents over time. Normative values were developed for the Russian adult population. Conclusions The HF-ICF-60 has shown good psychometric properties in the two waves of the nationally representative RLMS, which provided considerable support for using the HF-ICF-60 data as the normative health and functioning values for the Russian population. Similarly, the administration of the WHOQOL-BREF in the same two waves of the nationally representative RLMS has allowed the normative quality of life values for the Russian population to be obtained. Therefore, the objective assessment of health and functioning of the HF-ICF-60 could be mapped onto the subjective evaluation of quality of life of the WHOQOL-BREF to increase the potential usefulness of the surveys in relation to non-communicable diseases. © 2014 The Authors. Clinical Psychology & Psychotherapy. Published by John Wiley & Sons, Ltd. Key Practitioner Message The HF-ICF-60 offers a new perspective in measuring change in functioning through changes in lifestyle and diet. The HF-ICF-60 can be combined with the WHOQOL-BREF to map the objective assessment of health and functioning onto the subjective evaluation of quality of life. Combined use of the HF-ICF-60 and the WHOQOL-BREF can be especially useful for national and global monitoring and surveillance of implementation of measures to reduce risk factors of non-communicable diseases and to promote healthy lifestyles and healthy diets. PMID:24931300

On Multidimensional Item Response Theory: A Coordinate-Free Approach. Research Report. ETS RR-07-30

ERIC Educational Resources Information Center

Antal, Tamás

2007-01-01

A coordinate-free definition of complex-structure multidimensional item response theory (MIRT) for dichotomously scored items is presented. The point of view taken emphasizes the possibilities and subtleties of understanding MIRT as a multidimensional extension of the classical unidimensional item response theory models. The main theorem of the…
Missouri Assessment Program (MAP), Spring 2000: Elementary Health/Physical Education, Released Items, Grade 5.

ERIC Educational Resources Information Center

Missouri State Dept. of Elementary and Secondary Education, Jefferson City.

This document presents 10 released items from the Health/Physical Education Missouri Assessment Program (MAP) test given in the spring of 2000 to fifth graders. Items from the test sessions include: selected-response (multiple choice), constructed-response, and a performance event. The selected-response items consist of individual questions…
Reevaluation of the Amsterdam Inventory for Auditory Disability and Handicap Using Item Response Theory

ERIC Educational Resources Information Center

Hospers, J. Mirjam Boeschen; Smits, Niels; Smits, Cas; Stam, Mariska; Terwee, Caroline B.; Kramer, Sophia E.

2016-01-01

Purpose: We reevaluated the psychometric properties of the Amsterdam Inventory for Auditory Disability and Handicap (AIADH; Kramer, Kapteyn, Festen, & Tobi, 1995) using item response theory. Item response theory describes item functioning along an ability continuum. Method: Cross-sectional data from 2,352 adults with and without hearing…
The Relationship of Expert-System Scored Constrained Free-Response Items to Multiple-Choice and Open-Ended Items.

ERIC Educational Resources Information Center

Bennett, Randy Elliot; And Others

1990-01-01

The relationship of an expert-system-scored constrained free-response item type to multiple-choice and free-response items was studied using data for 614 students on the College Board's Advanced Placement Computer Science (APCS) Examination. Implications for testing and the APCS test are discussed. (SLD)
Evaluating the healthiness of chain-restaurant menu items using crowdsourcing: a new method.

PubMed

Lesser, Lenard I; Wu, Leslie; Matthiessen, Timothy B; Luft, Harold S

2017-01-01

This study aimed to develop a technology-based method for evaluating the nutritional quality of chain-restaurant menus to increase the efficiency and lower the cost of large-scale data analysis of food items. Using a Modified Nutrient Profiling Index (MNPI), we assessed chain-restaurant items from the MenuStat database with a process involving three steps: (i) testing 'extreme' scores; (ii) crowdsourcing to analyse fruit, nut and vegetable (FNV) amounts; and (iii) analysis of the ambiguous items by a registered dietitian. In applying the approach to assess 22 422 foods, only 3566 could not be scored automatically based on MenuStat data and required further evaluation to determine healthiness. Items for which there was low agreement between trusted crowd workers, or where the FNV amount was estimated to be >40 %, were sent to a registered dietitian. Crowdsourcing was able to evaluate 3199, leaving only 367 to be reviewed by the registered dietitian. Overall, 7 % of items were categorized as healthy. The healthiest category was soups (26 % healthy), while desserts were the least healthy (2 % healthy). An algorithm incorporating crowdsourcing and a dietitian can quickly and efficiently analyse restaurant menus, allowing public health researchers to analyse the healthiness of menu items.
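The three-step routing described above can be sketched as a small decision function. This is a hypothetical illustration, not the authors' code: the scoring function, the healthiness cutoff, the crowd-agreement tolerance and the use of the median crowd estimate are all assumptions; only the 40% FNV review limit comes from the abstract.

```python
from statistics import median
from typing import Callable, List

def triage(score_at_fnv: Callable[[float], float],
           healthy_cutoff: float,
           crowd_fnv: List[float],
           max_spread: float = 20.0,
           fnv_review_limit: float = 40.0) -> str:
    """Route one menu item to 'auto', 'crowd' or 'dietitian'.

    score_at_fnv: hypothetical function returning the item's nutrient-profile
        score for a given fruit/nut/vegetable (FNV) percentage, 0-100, assumed
        to increase with FNV.
    healthy_cutoff: hypothetical score at or above which the item counts as healthy.
    crowd_fnv: FNV percentage estimates from trusted crowd workers.
    max_spread: hypothetical tolerance (percentage points) for crowd agreement.
    """
    # Step (i), 'extreme' scores: if the verdict is the same at 0% and at 100% FNV,
    # the FNV amount cannot change it, so the item is scored automatically.
    if (score_at_fnv(0.0) >= healthy_cutoff) == (score_at_fnv(100.0) >= healthy_cutoff):
        return "auto"
    # Steps (ii)-(iii): low crowd agreement (or no estimates) goes to the dietitian.
    if not crowd_fnv or max(crowd_fnv) - min(crowd_fnv) > max_spread:
        return "dietitian"
    # High estimated FNV content is also reviewed by the dietitian.
    if median(crowd_fnv) > fnv_review_limit:
        return "dietitian"
    return "crowd"
```

Applied to the reported counts, the crowd resolved 3199 of the 3566 items that could not be scored automatically, leaving 3566 - 3199 = 367 for the dietitian.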
Perceived freedom-responsibility covariation among Cypriot adolescents.

PubMed

Frangou, Georgia; Wilkerson, Keith; McGahan, Joseph R

2008-04-01

Participants were 67 Cypriot adolescents who responded to propositions regarding positive, negative, and noncontingent relations between freedom and responsibility. The authors framed items so that half dealt with freedom given responsibility, and the other half dealt with responsibility given freedom. Results indicated participants were more likely to endorse positive-contingency items than they were negative and noncontingency items when items were framed around freedom given responsibility. However, when items were framed around responsibility given freedom, no such differences emerged. The authors discuss results relative to cultural and sociopolitical differences and similarities between children in Cyprus and participants in the United States and implications concerning the present study and previous studies regarding these constructs.
Psychometric validation of the dysmenorrhea daily diary (DysDD): a patient-reported outcome for dysmenorrhea.

PubMed

Nguyen, Allison M; Arbuckle, Rob; Korver, Tjeerd; Chen, Fang; Taylor, Beverley; Turnbull, Alice; Norquist, Josephine M

2017-08-01

The objective of this study was to evaluate the psychometric properties of the Dysmenorrhea Daily Diary (DysDD), an electronic patient-reported outcome, in a sample of 355 women with primary dysmenorrhea enrolled in a phase IIb, multicenter, randomized, partially blinded, placebo-controlled trial for treatment of dysmenorrhea. Subjects completed the DysDD over three menstrual cycles, one pre-treatment baseline cycle and two treatment cycles. The DysDD was administered alongside the Menstrual Distress Questionnaire (MDQ), the Short-Form 36 Version 2.0 (SF-36v2), and a Global Assessment of Change (GAC). Item response distributions, test-retest reliability, concurrent and known groups validity, responsiveness, and minimally important difference (MID) were evaluated for the DysDD. As expected, item response distributions varied throughout the menstrual period for all items, with the response scales fully utilized. Within-cycle test-retest reliability was adequate (weighted kappa: 0.5-0.7), although between-cycle test-retest was poor (weighted kappa: 0.1-0.5), most likely due to the highly variable nature of dysmenorrhea between cycles rather than limitations of the measure. Correlations with the MDQ and SF-36v2 were low-moderate, but in the predicted direction, supporting concurrent validity. There were significant differences in DysDD scores across severity groups based on pain medication use. The DysDD was responsive to changes in patients' dysmenorrhea with significantly different changes in scores between change groups (p < 0.0001). MID analyses suggest changes on the DysDD 0-10 pelvic pain score of three points can be considered clinically meaningful. Overall, findings indicate that the DysDD has acceptable reliability and is a valid and responsive instrument for assessing dysmenorrhea.
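Weighted kappa, the test-retest statistic reported above, credits partial agreement between ordinal ratings. A minimal Python sketch of Cohen's weighted kappa follows; the abstract does not say whether linear or quadratic weights were used, so both are offered, and the function name is hypothetical.

```python
from typing import Sequence

def weighted_kappa(r1: Sequence[int], r2: Sequence[int], n_cat: int,
                   weights: str = "quadratic") -> float:
    """Cohen's weighted kappa for two ordinal ratings of the same cases,
    coded 0..n_cat-1 (for example, one diary item at test and at retest)."""
    if len(r1) != len(r2) or not r1 or n_cat < 2:
        raise ValueError("need two equal-length, non-empty rating lists and n_cat >= 2")
    n = len(r1)
    # Observed joint proportions of (first rating, second rating).
    obs = [[0.0] * n_cat for _ in range(n_cat)]
    for a, b in zip(r1, r2):
        obs[a][b] += 1.0 / n
    # Marginal proportions define the agreement expected by chance.
    row = [sum(obs[i]) for i in range(n_cat)]
    col = [sum(obs[i][j] for i in range(n_cat)) for j in range(n_cat)]

    def disagreement(i: int, j: int) -> float:
        # 0 on the diagonal, 1 for the most distant pair of categories.
        d = abs(i - j) / (n_cat - 1)
        return d * d if weights == "quadratic" else d

    cells = [(i, j) for i in range(n_cat) for j in range(n_cat)]
    observed = sum(disagreement(i, j) * obs[i][j] for i, j in cells)
    expected = sum(disagreement(i, j) * row[i] * col[j] for i, j in cells)
    if expected == 0:
        raise ValueError("kappa is undefined when both raters use a single category")
    return 1.0 - observed / expected
```

With two categories both weighting schemes reduce to the unweighted Cohen's kappa; with more categories, quadratic weights penalize large disagreements more heavily than linear ones.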
Dealing with Omitted and Not-Reached Items in Competence Tests: Evaluating Approaches Accounting for Missing Responses in Item Response Theory Models

ERIC Educational Resources Information Center

Pohl, Steffi; Gräfe, Linda; Rose, Norman

2014-01-01

Data from competence tests usually show a number of missing responses on test items due to both omitted and not-reached items. Different approaches for dealing with missing responses exist, and there are no clear guidelines on which of those to use. While classical approaches rely on an ignorable missing data mechanism, the most recently developed…
A cross-national study on the multidimensional characteristics of the five-item psychological demands scale of the Job Content Questionnaire.

PubMed

Choi, BongKyoo; Kawakami, Norito; Chang, SeiJin; Koh, SangBaek; Bjorner, Jakob; Punnett, Laura; Karasek, Robert

2008-01-01

The five-item psychological demands scale of the Job Content Questionnaire (JCQ) has been assumed to be one-dimensional in practice. This study examined whether the scale has sufficient internal consistency and external validity to be treated as a single scale, using the cross-national JCQ datasets from the United States, Korea, and Japan. Methods comprised exploratory factor analyses with 22 JCQ items, confirmatory factor analyses with the five psychological demands items, and correlation analyses with mental health indexes. Generally, exploratory factor analyses displayed the predicted demand/control/support structure with three and four factors extracted. However, at more detailed levels of exploratory and confirmatory factor analyses, the demands scale showed clear evidence of multi-factor structure. The correlations of items and subscales of the demands scale with mental health indexes were similar to those of the full scale in the Korean and Japanese datasets, but not in the U.S. data. In 4 out of 16 sub-samples of the U.S. data, several significant correlations of the components of the demands scale with job dissatisfaction and life dissatisfaction were obscured by the full scale. The multidimensionality of the psychological demands scale should be considered in psychometric analysis and interpretation, occupational epidemiologic studies, and future scale extension.
Low Reporting Quality of the Meta-Analyses in Diagnostic Pathology.

PubMed

Liu, Xulei; Kinzler, Michael; Yuan, Jiangfan; He, Guozhong; Zhang, Lanjing

2017-03-01

Little is known regarding the reporting quality of meta-analyses in diagnostic pathology. This study compared the reporting quality of meta-analyses in diagnostic pathology and in medicine and examined factors associated with the reporting quality of diagnostic pathology meta-analyses. Meta-analyses were identified using PubMed in 12 major diagnostic pathology journals without restriction on year and in 4 major medicine journals in 2006 and 2011. Reporting quality was evaluated using the 27-item checklist of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement published in 2009; a higher PRISMA score indicates higher reporting quality. Forty-one diagnostic pathology meta-analyses and 118 medicine meta-analyses were included. Overall, reporting quality of meta-analyses in diagnostic pathology was lower than that in medicine (median [interquartile range] = 22 [15, 25] versus 27 [23, 28], P < .001). Compared with medicine meta-analyses, diagnostic pathology meta-analyses were less likely to report 23 of the 27 items (85.2%) on the PRISMA checklist but more likely to report the data items. Higher reporting quality of diagnostic pathology meta-analyses was associated with recent publication years (later than 2009 versus 2009 or earlier, P = .002) and non-North American first authors (versus North American, P = .001), but not with the journal publisher's location (P = .11). Interestingly, reporting quality was not associated with the adjusted citation ratio for meta-analyses in either diagnostic pathology or medicine (P = .40 and P = .09, respectively). Meta-analyses in diagnostic pathology had lower reporting quality than those in medicine. Reporting quality of diagnostic pathology meta-analyses is linked to publication year and the first author's location, but not to the journal publisher's location or the article's adjusted citation ratio.
More research and education on meta-analysis methodology may improve the reporting quality of diagnostic pathology meta-analyses.
Measuring Alcohol Marketing Engagement: The Development and Psychometric Properties of the Alcohol Marketing Engagement Scale.

PubMed

Robertson, Angela; Morse, David T; Hood, Kristina; Walker, Courtney

Ample evidence exists in support of the influence of media, both traditional and electronic, on perceptions and engagement with alcohol marketing. We describe the development, calibration, and evidence for technical quality and utility for a new measure, the Alcohol Marketing Engagement Scale. Using two samples of college undergraduates (n1 = 199, n2 = 732), we collected field test responses to a total of 13 items. Initial support for scale validity is presented via correlations with attributes previously shown to be related to alcohol engagement. While the joint map of estimated scale locations of items and respondents indicates the need for further scale development, the results of the present analyses are promising. Implications for use in research are discussed.
IRTs of the ABCs: Children's Letter Name Acquisition

PubMed Central

Piasta, Shayne B.; Anthony, Jason L.; Lonigan, Christopher J.; Francis, David J.

2015-01-01

We examined the developmental sequence of letter name knowledge acquisition by children from 2 to 5 years of age. Data from 2 samples representing diverse regions, ethnicity, and socioeconomic backgrounds (ns = 1074 & 500) were analyzed using item response theory (IRT) and differential item functioning techniques. Results from factor analyses indicated that letter name knowledge represented a unidimensional skill; IRT results yielded significant differences between letters in both difficulty and discrimination. Results also indicated an approximate developmental sequence in letter name learning for the simplest and most challenging letters to learn, but with no clear sequence between these extremes. Findings also suggested that children were most likely to learn their own first initial before other letters. We discuss implications for assessment and instruction. PMID:22710016
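Difficulty and discrimination, the two item properties compared across letters, correspond to the b and a parameters of the two-parameter logistic (2PL) IRT model. The sketch below assumes that parameterization (the abstract does not name the exact model) and omits the D = 1.7 scaling constant used in some software; the function names are hypothetical.

```python
import math

def p_correct_2pl(theta: float, a: float, b: float) -> float:
    """2PL item response function: probability that a child at ability theta
    names a letter with discrimination a and difficulty b (logit metric)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information_2pl(theta: float, a: float, b: float) -> float:
    """Fisher information of a 2PL item at theta: a^2 * P * (1 - P)."""
    p = p_correct_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

# Two hypothetical letters of equal difficulty (b = 0) but different discrimination.
for a in (0.8, 2.0):
    print(a, round(p_correct_2pl(1.0, a, 0.0), 3), item_information_2pl(0.0, a, 0.0))
```

At θ = b every letter gives a 50% chance of being named, but the probability for the more discriminating letter rises more steeply around that point, and its peak information, a²/4, is correspondingly higher.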
An introduction to mixture item response theory models.

PubMed

De Ayala, R J; Santiago, S Y

2017-02-01

Mixture item response theory (IRT) allows one to address situations that involve a mixture of latent subpopulations that are qualitatively different but within which a measurement model based on a continuous latent variable holds. In this modeling framework, one can characterize students both by their location on a continuous latent variable and by their latent class membership. For example, in a study of risky youth behavior this approach would make it possible to estimate an individual's propensity to engage in risky youth behavior (i.e., on a continuous scale) and to use these estimates to identify youth who might be at the greatest risk given their class membership. Mixture IRT can be used with binary response data (e.g., true/false, agree/disagree, endorsement/nonendorsement, correct/incorrect, presence/absence of a behavior), Likert response scales, partial-credit scoring, nominal scales, or rating scales. In the following, we present mixture IRT modeling and two examples of its use. Data needed to reproduce analyses in this article are available as supplemental online materials at http://dx.doi.org/10.1016/j.jsp.2016.01.002. Copyright © 2016 Society for the Study of School Psychology. Published by Elsevier Ltd. All rights reserved.
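The core idea of mixture IRT, a continuous latent trait within each of several latent classes, can be illustrated with a mixture Rasch model, one member of the family described above. This Python sketch computes posterior class membership for one response pattern. As a simplification it treats θ as known, whereas a full mixture IRT analysis integrates over θ within each class and estimates class proportions and item parameters from data; all names and values are hypothetical.

```python
import math
from typing import List, Sequence

def rasch_pattern_likelihood(x: Sequence[int], theta: float, b: Sequence[float]) -> float:
    """Likelihood of a 0/1 response pattern x under the Rasch model,
    for location theta and item difficulties b."""
    lik = 1.0
    for xj, bj in zip(x, b):
        p = 1.0 / (1.0 + math.exp(-(theta - bj)))
        lik *= p if xj == 1 else 1.0 - p
    return lik

def class_posterior(x: Sequence[int], theta: float,
                    class_b: Sequence[Sequence[float]],
                    class_pi: Sequence[float]) -> List[float]:
    """Posterior latent-class membership in a mixture Rasch model:
    pi_g * L(x | theta, b_g), normalized over classes g.
    Simplification: theta is treated as known instead of integrated out."""
    joint = [pi * rasch_pattern_likelihood(x, theta, b)
             for pi, b in zip(class_pi, class_b)]
    total = sum(joint)
    return [j / total for j in joint]

# Two hypothetical classes for which the first and last item pairs, respectively, are easy.
post = class_posterior([1, 1, 0, 0], 0.0,
                       class_b=[[-2, -2, 2, 2], [2, 2, -2, -2]],
                       class_pi=[0.5, 0.5])
print([round(p, 4) for p in post])
```

The class-specific difficulties b_g are what make the subpopulations qualitatively different: the same response pattern is far more probable under one class's item ordering than the other's, so nearly all posterior mass goes to the first class here.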
Identifying predictors of physics item difficulty: A linear regression approach

NASA Astrophysics Data System (ADS)

Mesic, Vanes; Muratovic, Hasnija

2011-06-01

Large-scale assessments of student achievement in physics are often approached with the intention of discriminating among students based on the attained level of their physics competencies. Therefore, for purposes of test design, it is important that items display acceptable discriminatory behavior. To that end, it is recommended to avoid extremely difficult and very easy items. Knowing the factors that influence physics item difficulty makes it possible to model item difficulty even before the first pilot study is conducted. Thus, by identifying predictors of physics item difficulty, we can improve the test-design process. Furthermore, we get additional qualitative feedback regarding the basic aspects of student cognitive achievement in physics that are directly responsible for the obtained, quantitative test results. In this study, we conducted a secondary analysis of data that came from two large-scale assessments of student physics achievement at the end of compulsory education in Bosnia and Herzegovina. First, we explored the concept of “physics competence” and performed a content analysis of 123 physics items that were included within the above-mentioned assessments. Thereafter, an item database was created. Items were described by variables that reflect some basic cognitive aspects of physics competence. For each of the assessments, Rasch item difficulties were calculated in separate analyses. In order to make the item difficulties from different assessments comparable, a virtual test equating procedure had to be implemented. Finally, a regression model of physics item difficulty was created.
The model explained 61.2% of item difficulty variance through factors that reflect the automaticity, complexity, and modality of the knowledge structure relevant for generating the most probable correct solution, as well as the divergence of required thinking and interference effects between intuitive and formal physics knowledge structures. The identified predictors point to the fundamental cognitive dimensions of student physics achievement at the end of compulsory education in Bosnia and Herzegovina, whose level of development influenced the test results within the conducted assessments.
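The 61.2% figure is a coefficient of determination (R²): the proportion of variance in the Rasch item difficulties accounted for by the regression on item-level predictors. A minimal pure-Python sketch of ordinary least squares and R² follows; the function names and any predictor coding are hypothetical, and the study's actual predictors and virtual equating step are not reproduced.

```python
from typing import List, Sequence

def ols_fit(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Ordinary least squares via the normal equations (X'X) beta = X'y.
    An intercept column is prepended; returns [intercept, b_1, ..., b_k]."""
    rows = [[1.0, *map(float, r)] for r in X]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        if abs(A[c][c]) < 1e-12:
            raise ValueError("predictors are collinear")
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for r in range(k):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]

def r_squared(X: Sequence[Sequence[float]], y: Sequence[float],
              beta: Sequence[float]) -> float:
    """Proportion of the variance in y accounted for by the fitted model."""
    pred = [beta[0] + sum(b * x for b, x in zip(beta[1:], r)) for r in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot
```

In this framing, each row of X would hold one item's coded cognitive features and y its equated Rasch difficulty; an R² of 0.612 would mean the remaining 38.8% of difficulty variance is left unexplained by those features.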
A Comparison of Limited-Information and Full-Information Methods in Mplus for Estimating Item Response Theory Parameters for Nonnormal Populations

ERIC Educational Resources Information Center

DeMars, Christine E.

2012-01-01

In structural equation modeling software, either limited-information (bivariate proportions) or full-information item parameter estimation routines could be used for the 2-parameter item response theory (IRT) model. Limited-information methods assume the continuous variable underlying an item response is normally distributed. For skewed and…
Estimation of Item Response Theory Parameters in the Presence of Missing Data

ERIC Educational Resources Information Center

Finch, Holmes

2008-01-01

Missing data are a common problem in a variety of measurement settings, including responses to items on both cognitive and affective assessments. Researchers have shown that such missing data may create problems in the estimation of item difficulty parameters in the Item Response Theory (IRT) context, particularly if they are ignored. At the same…
Examination of Different Item Response Theory Models on Tests Composed of Testlets

ERIC Educational Resources Information Center

Kogar, Esin Yilmaz; Kelecioglu, Hülya

2017-01-01

The purpose of this research is to first estimate the item and ability parameters and the standard error values related to those parameters obtained from Unidimensional Item Response Theory (UIRT), bifactor (BIF) and Testlet Response Theory models (TRT) in the tests including testlets, when the number of testlets, number of independent items, and…
A Semiparametric Model for Jointly Analyzing Response Times and Accuracy in Computerized Testing

ERIC Educational Resources Information Center

Wang, Chun; Fan, Zhewen; Chang, Hua-Hua; Douglas, Jeffrey A.

2013-01-01

The item response times (RTs) collected from computerized testing represent an underutilized type of information about items and examinees. In addition to knowing the examinees' responses to each item, we can investigate the amount of time examinees spend on each item. Current models for RTs mainly focus on parametric models, which have the…
Missouri Assessment Program (MAP), Spring 2000: High School Health/Physical Education, Released Items, Grade 9.

ERIC Educational Resources Information Center

Missouri State Dept. of Elementary and Secondary Education, Jefferson City.

This document presents 10 released items from the Health/Physical Education Missouri Assessment Program (MAP) test given in the spring of 2000 to ninth graders. Items from the test sessions include: selected-response (multiple choice), constructed-response, and a performance event. The selected-response items consist of individual questions…
Bi-dimensional acculturation and cultural response set in CES-D among Korean immigrants

PubMed Central

Kim, Eunjung; Seo, Kumin; Cain, Kevin C.

2017-01-01

This study examined a cultural response set to positive affect items and depressive symptom items in CES-D among 172 Korean immigrants. A bi-dimensional acculturation approach, which considers maintenance of Korean Orientation and adoption of American Orientation, was used. As Korean immigrants increased in American Orientation, they tended to score higher on positive affect items, while no changes occurred in depressive symptom items. Korean Orientation was not related to either positive affect items or depressive symptom items. Korean immigrants have a response bias toward positive affect items in CES-D, which decreases as they adopt more American Orientation. CES-D lacks cultural equivalence for Korean immigrants. PMID:20701420

Efficacy of aripiprazole augmentation in Japanese patients with major depressive disorder: a subgroup analysis and Montgomery-Åsberg Depression Rating Scale and Hamilton Rating Scale for Depression item analyses of the Aripiprazole Depression Multicenter Efficacy study.

PubMed

Ozaki, Norio; Otsubo, Tempei; Kato, Masaki; Higuchi, Teruhiko; Ono, Hiroaki; Kamijima, Kunitoshi

2015-01-01

Results from this randomized, placebo-controlled study of aripiprazole augmentation to antidepressant therapy (ADT) in Japanese patients with major depressive disorder (MDD) (the Aripiprazole Depression Multicenter Efficacy [ADMIRE] study) revealed that aripiprazole augmentation was superior to ADT alone and was well tolerated. In subgroup analyses, we investigated the influence of demographic- and disease-related factors on the observed responses. We also examined how individual symptom improvement was related to overall improvement in MDD. Data from the ADMIRE study were analyzed. Subgroup analyses were performed on the primary outcome measure, the mean change in the Montgomery-Åsberg Depression Rating Scale (MADRS) total score from the end of selective serotonin reuptake inhibitor (SSRI)/serotonin norepinephrine reuptake inhibitor (SNRI) treatment to the end of the randomized treatment. Changes in the MADRS total scores were consistently greater with aripiprazole than with placebo in each of the subgroups. Efficacy was not related to sex, age, number of adequate ADT trials in the current episode, MDD diagnosis, number of depressive episodes, duration of the current episode, age at first depressive episode, time since the first depressive episode, type of SSRI/SNRI, or severity at the end of the SSRI/SNRI treatment phase. Compared to placebo, aripiprazole resulted in significant and rapid improvement on seven of the 10 MADRS items, including sadness. These post-hoc analyses indicated that aripiprazole was effective for a variety of Japanese patients with MDD who had exhibited inadequate responses to ADT. Additionally, we suggest that aripiprazole significantly and rapidly improved the core depressive symptoms. © 2014 The Authors. Psychiatry and Clinical Neurosciences © 2014 Japanese Society of Psychiatry and Neurology.
A Model-Free Diagnostic for Single-Peakedness of Item Responses Using Ordered Conditional Means.

PubMed

Polak, Marike; de Rooij, Mark; Heiser, Willem J

2012-09-01

In this article we propose a model-free diagnostic for single-peakedness (unimodality) of item responses. Presuming a unidimensional unfolding scale and a given item ordering, we approximate item response functions of all items based on ordered conditional means (OCM). The proposed OCM methodology is based on Thurstone & Chave's (1929) criterion of irrelevance, which is a graphical, exploratory method for evaluating the "relevance" of dichotomous attitude items. We generalized this criterion to graded response items and quantified the relevance by fitting a unimodal smoother. The resulting goodness-of-fit was used to determine item fit and aggregated scale fit. Based on a simulation procedure, cutoff values were proposed for the measures of item fit. These cutoff values showed high power rates and acceptable Type I error rates. We present 2 applications of the OCM method. First, we apply the OCM method to personality data from the Developmental Profile; second, we analyze attitude data collected by Roberts and Laughlin (1996) concerning opinions of capital punishment.
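The OCM diagnostic rests on fitting a unimodal smoother to approximated item response functions and judging fit by the residuals. One simple unimodal smoother is least-squares unimodal regression, built from the pool-adjacent-violators algorithm by trying every possible peak position. The Python sketch below assumes that choice, which may differ from the smoother, fit measure and cutoff values used in the article; the input is taken to be a hypothetical vector of conditional mean responses at ordered scale positions.

```python
from typing import List, Optional, Sequence, Tuple

def pava(y: Sequence[float], w: Sequence[float]) -> List[float]:
    """Weighted non-decreasing least-squares fit (pool-adjacent-violators)."""
    blocks: List[list] = []  # each block: [weighted mean, total weight, length]
    for yi, wi in zip(y, w):
        blocks.append([float(yi), float(wi), 1])
        # Pool adjacent blocks while the non-decreasing constraint is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2, n1 + n2])
    fit: List[float] = []
    for mean, _, length in blocks:
        fit.extend([mean] * length)
    return fit

def unimodal_fit(y: Sequence[float],
                 w: Optional[Sequence[float]] = None) -> Tuple[List[float], float]:
    """Least-squares single-peaked fit: non-decreasing up to some point,
    non-increasing after it. Returns (fitted values, weighted residual SS)."""
    y = list(y)
    n = len(y)
    if n == 0:
        return [], 0.0
    w = [1.0] * n if w is None else list(w)
    best: Tuple[List[float], float] = (list(y), float("inf"))
    for m in range(n):
        rising = pava(y[:m + 1], w[:m + 1])
        # A non-increasing fit is a reversed non-decreasing fit of the reversed tail.
        falling = pava(y[m + 1:][::-1], w[m + 1:][::-1])[::-1]
        fit = rising + falling
        sse = sum(wi * (yi - fi) ** 2 for yi, wi, fi in zip(y, w, fit))
        if sse < best[1]:
            best = (fit, sse)
    return best
```

A residual sum of squares near zero indicates an approximately single-peaked response function; larger values flag items that depart from unimodality.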
Developing an interactive mobile phone self-report system for self-management of hypertension. Part 2: content validity and usability.

PubMed

Bengtsson, Ulrika; Kjellgren, Karin; Höfer, Stefan; Taft, Charles; Ring, Lena

2014-10-01

Self-management support tools using technology may improve adherence to hypertension treatment. There is a need for user-friendly tools that help patients understand the interconnections between blood pressure, wellbeing and lifestyle. This study aimed to examine comprehension, comprehensiveness and relevance of items, and further to evaluate the usability and reliability of an interactive hypertension-specific mobile phone self-report system. Areas important in supporting self-management and candidate items were derived from five focus group interviews with patients and healthcare professionals (n = 27), supplemented by a literature review. Items and response formats were drafted to meet specifications for mobile phone administration and were integrated into a mobile phone data-capture system. Content validity and usability were assessed iteratively in four rounds of cognitive interviews with patients (n = 21) and healthcare professionals (n = 4). Reliability was examined using a test-retest design. Focus group analyses yielded six areas covered by 16 items. The cognitive interviews showed satisfactory item comprehension, relevance and coverage; however, one item was added. The mobile phone self-report system was reliable and perceived as easy to use. The mobile phone self-report system appears to capture efficiently the information relevant to patients' self-management of hypertension. Future studies need to evaluate the effectiveness of this tool in improving self-management of hypertension in clinical practice.
Competitive foods available in Pennsylvania public high schools.

PubMed

Probart, Claudia; McDonnell, Elaine; Weirich, J Elaine; Hartman, Terryl; Bailey-Davis, Lisa; Prabhakher, Vaheedha

2005-08-01

This study examined the types and extent of competitive foods available in public high schools in Pennsylvania. We developed, pilot tested, and distributed surveys to school foodservice directors in a random sample of 271 high schools in Pennsylvania. Two hundred twenty-eight surveys were returned, for a response rate of 84%. Descriptive statistics were used to examine the extent of competitive food sales in Pennsylvania public high schools. The survey data were analyzed using SPSS software version 11.5.1 (2002, SPSS base 11.0 for Windows, SPSS Inc, Chicago, IL). A la carte sales provide almost $700/day to school foodservice programs, almost 85% of which receive no financial support from their school districts. The top-selling a la carte items are "hamburgers, pizza, and sandwiches." Ninety-four percent of respondents indicated that vending machines are accessible to students. The item most commonly offered in vending machines is bottled water (71.5%). While food items are less often available through school stores and club fund-raisers, candy is the item most commonly offered through these sources. Competitive foods are widely available in high schools. Although many of the items available are low in nutritional value, we found several of the top-selling a la carte options to be nutritious, and bottled water to be the item most often identified as available through vending machines.
Differential Item Functioning of the Boston Naming Test in Cognitively Normal African American and Caucasian Older Adults

PubMed Central

Pedraza, Otto; Graff-Radford, Neill R.; Smith, Glenn E.; Ivnik, Robert J.; Willis, Floyd B.; Petersen, Ronald C.; Lucas, John A.

2010-01-01

Scores on the Boston Naming Test (BNT) are frequently lower for African American adults than for Caucasian adults. Although demographically-based norms can mitigate the impact of this discrepancy on the likelihood of erroneous diagnostic impressions, a growing consensus suggests that group norms do not sufficiently address or advance our understanding of the underlying psychometric and sociocultural factors that lead to between-group score discrepancies. Using item response theory and methods to detect differential item functioning (DIF), the current investigation moves beyond comparisons of the summed total score to examine whether the conditional probability of responding correctly to individual BNT items differs between African American and Caucasian adults. Participants included 670 adults age 52 and older who took part in Mayo's Older Americans and Older African Americans Normative Studies. Under a 2-parameter logistic IRT framework and after correction for the false discovery rate, 12 items were shown to demonstrate DIF. Six of these 12 items (“dominoes,” “escalator,” “muzzle,” “latch,” “tripod,” and “palette”) were also identified in additional analyses using hierarchical logistic regression models and represent the strongest evidence for race/ethnicity-based DIF. These findings afford a finer characterization of the psychometric properties of the BNT and expand our understanding of between-group performance. PMID:19570311
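Under the 2-parameter logistic (2PL) model named above, the probability of a correct response is a logistic function of the latent trait θ, with item discrimination a and difficulty b. The sketch below is minimal and assumes the common form without the 1.7 scaling constant. The helper `dif_gap` and the parameter values are hypothetical. It represents uniform DIF simply as a group difference in b at the same θ. It is not the authors' DIF procedure, which also involved false-discovery-rate correction and logistic regression.

```python
import math

def p_2pl(theta: float, a: float, b: float) -> float:
    """2PL item response function: P(correct | theta) = 1 / (1 + exp(-a * (theta - b)))."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def dif_gap(theta: float, a: float, b_reference: float, b_focal: float) -> float:
    """Hypothetical helper: difference in the probability of a correct response
    between a reference and a focal group at the same theta, when only the
    item difficulty differs between groups (uniform DIF)."""
    return p_2pl(theta, a, b_reference) - p_2pl(theta, a, b_focal)

# An item assumed to be harder for the focal group (b = 0.5 versus 0.0), compared at theta = 0
print(round(dif_gap(0.0, 1.2, 0.0, 0.5), 3))
```

A positive gap means that examinees with equal standing on the latent trait have different chances of naming the item correctly, which is what DIF refers to.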
Improving Assessment of Work Related Mental Health Function Using the Work Disability Functional Assessment Battery (WD-FAB).

PubMed

Marfeo, Elizabeth E; Ni, Pengsheng; McDonough, Christine; Peterik, Kara; Marino, Molly; Meterko, Mark; Rasch, Elizabeth K; Chan, Leighton; Brandt, Diane; Jette, Alan M

2018-03-01

Purpose: To improve the mental health component of the Work Disability Functional Assessment Battery (WD-FAB), developed for the US Social Security Administration's (SSA) disability determination process. Specifically, our goal was to expand the WD-FAB scales of mood & emotions, resilience, social interactions, and behavioral control to improve the depth and breadth of the current scales, and to expand the content coverage to include aspects of cognition & communication function. Methods: Data were collected from a random, stratified sample of 1695 claimants applying for SSA work disability benefits and from a general population sample of 2025 working-age adults. A total of 169 new items were developed to replenish the WD-FAB scales and were analyzed using factor analysis and item response theory (IRT) analysis to construct unidimensional scales. We conducted computer adaptive test (CAT) simulations to examine the psychometric properties of the WD-FAB. Results: Analyses supported the inclusion of four mental health subdomains: Cognition & Communication (68 items), Self-Regulation (34 items), Resilience & Sociability (29 items) and Mood & Emotions (34 items). All scales yielded acceptable psychometric properties. Conclusions: IRT methods were effective in expanding the WD-FAB to assess mental health function. The WD-FAB has the potential to enhance work disability assessment both within the context of the SSA disability programs and in other clinical and vocational rehabilitation settings.
Comparison of the medical students' perceived self-efficacy and the evaluation of the observers and patients.

PubMed

Ammentorp, Jette; Thomsen, Janus Laust; Jarbøl, Dorte Ejg; Holst, René; Øvrehus, Anne Lindebo Holm; Kofoed, Poul-Erik

2013-04-08

The accuracy of self-assessment has been questioned in studies comparing physicians' self-assessments with observed assessments; however, none of these studies used self-efficacy as a method of self-assessment. The aim of the study was to investigate how medical students' perceived self-efficacy in specific communication skills corresponds to the evaluations of simulated patients and observers. All medical students who signed up for an Objective Structured Clinical Examination (OSCE) were included. As part of the OSCE, student performance in the "parent-physician interaction" was evaluated by a simulated patient and an observer at one of the stations. After the examination, the students were asked to assess their self-efficacy on the same specific communication skills. The Calgary Cambridge Observation Guide formed the basis for the outcome measures used in the questionnaires. A total of 12 items were rated on a Likert scale from 1 to 5 (strongly disagree to strongly agree). We used extended Rasch models to compare the groups of questionnaire responses. Group comparisons were conducted on dichotomized responses. Eighty-four students participated in the examination, 87% (73/84) of whom responded to the questionnaire. The response rate for the simulated patients and the observers was 100%. Significantly more items were scored in the highest categories (4 and 5) by the observers and simulated patients than by the students (observers versus students: -0.23; SE: 0.112; p=0.002; patients versus students: 0.177; SE: 0.109; p=0.037). When the items were analysed individually, a statistically significant difference existed for only two items. This study showed that students rated their communication skills lower than the observers or simulated patients did. The differences were driven by only 2 of the 12 items. The results of this study indicate that self-efficacy based on the Calgary Cambridge Observation Guide may be a reliable tool.
Item response theory analysis of the Pain Self-Efficacy Questionnaire.

PubMed

Costa, Daniel S J; Asghari, Ali; Nicholas, Michael K

2017-01-01

The Pain Self-Efficacy Questionnaire (PSEQ) is a 10-item instrument designed to assess the extent to which a person in pain believes s/he is able to accomplish various activities despite their pain. There is strong evidence for the validity and reliability of both the full-length PSEQ and a 2-item version. The purpose of this study is to further examine the properties of the PSEQ using an item response theory (IRT) approach. We used the two-parameter graded response model to examine the category probability curves, and location and discrimination parameters of the 10 PSEQ items. In item response theory, responses to a set of items are assumed to be probabilistically determined by a latent (unobserved) variable. In the graded-response model specifically, item response threshold (the value of the latent variable for which adjacent response categories are equally likely) and discrimination parameters are estimated for each item. Participants were 1511 mixed, chronic pain patients attending for initial assessment at a tertiary pain management centre. All items except item 7 ('I can cope with my pain without medication') performed well in IRT analysis, and the category probability curves suggested that participants used the 7-point response scale consistently. Items 6 ('I can still do many of the things I enjoy doing, such as hobbies or leisure activity, despite pain'), 8 ('I can still accomplish most of my goals in life, despite the pain') and 9 ('I can live a normal lifestyle, despite the pain') captured higher levels of the latent variable with greater precision. The results from this IRT analysis add to the body of evidence based on classical test theory illustrating the strong psychometric properties of the PSEQ. Despite the relatively poor performance of Item 7, its clinical utility warrants its retention in the questionnaire. The strong psychometric properties of the PSEQ support its use as an effective tool for assessing self-efficacy in people with pain. 
Copyright © 2016 Scandinavian Association for the Study of Pain. Published by Elsevier B.V. All rights reserved.
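The graded response model referred to above can be illustrated with a minimal sketch. In the usual parameterization of Samejima's model, each threshold b_k is the θ at which the probability of responding in category k or above equals 0.5. Each category probability is then the difference between adjacent cumulative curves. The function name and the parameter values for a 7-point item are hypothetical. They are not the PSEQ estimates reported in the study.

```python
import math

def grm_category_probs(theta, a, thresholds):
    """Graded response model: probabilities of categories 0..m for one item.

    thresholds (b_1 < ... < b_m) are the points where P(X >= k) = 0.5.
    """
    if any(t1 >= t2 for t1, t2 in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    # Cumulative boundary curves P(X >= k), padded with P(X >= 0) = 1 and P(X >= m + 1) = 0
    cumulative = [1.0]
    cumulative += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cumulative.append(0.0)
    # Probability of category k = P(X >= k) - P(X >= k + 1)
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]

# A hypothetical 7-point item (6 thresholds), evaluated at theta = 0
probs = grm_category_probs(0.0, 1.5, [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5])
print([round(p, 3) for p in probs])
```

Evaluating such a function over a grid of θ values produces the category probability curves the authors inspected. Well-separated thresholds mean that every response option is the most likely one somewhere along the trait.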
Biases and power for groups comparison on subjective health measurements.

PubMed

Hamel, Jean-François; Hardouin, Jean-Benoit; Le Neel, Tanguy; Kubis, Gildas; Roquelaure, Yves; Sébille, Véronique

2012-01-01

Subjective health measurements are increasingly used in clinical research, particularly for comparing groups of patients. Two main types of analytical strategy can be used for such data: classical test theory (CTT), which relies on observed scores, and models from Item Response Theory (IRT), which rely on a response model relating the item responses to a latent parameter, often called the latent trait. Whether IRT or CTT is the more appropriate method for comparing two independent groups of patients on a patient-reported outcome measure remains unknown, and this question was investigated using simulations. For CTT-based analyses, groups were compared using a t-test on the scores. For IRT-based analyses, several methods were compared, according to whether the Rasch model was considered with random effects or with fixed effects, and whether the group effect was included as a covariate. Individual latent trait values were estimated using either a deterministic method or stochastic approaches. Latent traits were then compared with a t-test. Finally, a two-step method was performed to compare the latent trait distributions, and a Wald test was performed to test the group effect in the Rasch model including group covariates. The only unbiased IRT-based method was the group-covariate Wald test performed on the random-effects Rasch model. This model displayed the highest observed power, which was similar to the power of the score t-test. These results need to be extended to the case, frequently encountered in practice, where data are missing and possibly informative.
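The CTT arm of such a simulation can be sketched in a few lines. The sketch draws latent traits for two groups, generates dichotomous responses from a Rasch model, and compares the sum scores with a t-statistic. It is a minimal illustration under stated assumptions. The item difficulties, sample sizes, group shift, and the Welch form of the t-statistic are hypothetical choices. The random-effects Rasch Wald test that the authors found unbiased requires a mixed-model fitter and is not shown.

```python
import math
import random
from statistics import mean, variance

def rasch_prob(theta, b):
    """Rasch model: P(X = 1 | theta) = exp(theta - b) / (1 + exp(theta - b))."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def simulate_scores(n, group_mean, difficulties, rng):
    """Sum scores for n persons whose latent traits are drawn from N(group_mean, 1)."""
    scores = []
    for _ in range(n):
        theta = rng.gauss(group_mean, 1.0)
        scores.append(sum(rng.random() < rasch_prob(theta, b) for b in difficulties))
    return scores

def welch_t(x, y):
    """Two-sample t-statistic with unpooled (Welch) variances."""
    return (mean(x) - mean(y)) / math.sqrt(variance(x) / len(x) + variance(y) / len(y))

rng = random.Random(2012)  # fixed seed so the run is reproducible
difficulties = [-1.5, -0.5, 0.0, 0.5, 1.5]
control = simulate_scores(500, 0.0, difficulties, rng)
treated = simulate_scores(500, 1.0, difficulties, rng)
print(round(welch_t(treated, control), 2))
```

Repeating this many times under a true group difference of zero estimates the Type I error rate of the score t-test. Repeating it under a non-zero difference estimates its power, which is the kind of comparison the study reports.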
Examination of an eHealth literacy scale and a health literacy scale in a population with moderate to high cardiovascular risk: Rasch analyses.

PubMed

Richtering, Sarah S; Morris, Rebecca; Soh, Sze-Ee; Barker, Anna; Bampi, Fiona; Neubeck, Lis; Coorey, Genevieve; Mulley, John; Chalmers, John; Usherwood, Tim; Peiris, David; Chow, Clara K; Redfern, Julie

2017-01-01

Electronic health (eHealth) strategies are evolving, making it important to have valid scales to assess eHealth and health literacy. Item response theory methods, such as the Rasch measurement model, are increasingly used for the psychometric evaluation of scales. This paper aims to examine the internal construct validity of an eHealth literacy scale and a health literacy scale using Rasch analysis in a population with moderate to high cardiovascular disease risk. The first 397 participants of the CONNECT study completed the eHealth Literacy Scale (eHEALS) and the Health Literacy Questionnaire (HLQ). Overall Rasch model fit and five key psychometric properties were analysed: unidimensionality, response thresholds, targeting, differential item functioning and internal consistency. The eHEALS had good overall model fit (χ2 = 54.8, p = 0.06), ordered response thresholds, reasonable targeting and good internal consistency (person separation index (PSI) 0.90). It did, however, appear to measure two constructs of eHealth literacy. The HLQ subscales (except subscale 5) did not fit the Rasch model (χ2: 18.18-60.60, p: 0.00-0.58) and had suboptimal targeting for most subscales. Subscales 6 to 9 displayed disordered thresholds, indicating that participants had difficulty distinguishing between response options. All subscales did, nonetheless, demonstrate moderate to good internal consistency (PSI: 0.62-0.82). Rasch analyses demonstrated that the eHEALS has good internal construct validity, although it appears to capture different aspects of eHealth literacy (e.g. using eHealth and understanding eHealth). Whilst further studies are required to confirm this finding, it may be necessary to score these constructs of the eHEALS separately. The nine HLQ subscales were shown to measure a single construct of health literacy. However, participants' scores may not represent their actual level of ability, as the distinction between response categories was unclear for the last four subscales. Reducing the number of response categories in these subscales may improve the ability of the HLQ to distinguish between different levels of health literacy.
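The person separation index reported above is commonly computed in Andrich's formulation. Under that formulation, it is the proportion of observed variance in the person location estimates that is not attributable to measurement error. The sketch below is minimal and assumes that person estimates and their standard errors are already available from a Rasch fit. Using the sample variance (n - 1 denominator) is an assumption here, and Rasch software may differ in such details.

```python
from statistics import mean, variance

def person_separation_index(estimates, standard_errors):
    """PSI = (observed variance of person estimates - mean error variance) / observed variance."""
    if len(estimates) != len(standard_errors) or len(estimates) < 2:
        raise ValueError("need matching estimates and standard errors for at least 2 persons")
    observed_var = variance(estimates)
    error_var = mean(se ** 2 for se in standard_errors)
    return (observed_var - error_var) / observed_var

# Hypothetical person locations (logits) and their standard errors
print(person_separation_index([-1.0, 0.0, 1.0], [0.5, 0.5, 0.5]))
```

Values close to 1, such as the 0.90 reported for the eHEALS, indicate that most of the spread in person estimates reflects true differences rather than measurement noise.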
Overview of Classical Test Theory and Item Response Theory for Quantitative Assessment of Items in Developing Patient-Reported Outcome Measures

PubMed Central

Cappelleri, Joseph C.; Lundy, J. Jason; Hays, Ron D.

2014-01-01

Introduction: The U.S. Food and Drug Administration’s patient-reported outcome (PRO) guidance document defines content validity as “the extent to which the instrument measures the concept of interest” (FDA, 2009, p. 12). “Construct validity is now generally viewed as a unifying form of validity for psychological measurements, subsuming both content and criterion validity” (Strauss & Smith, 2009, p. 7). Hence both qualitative and quantitative information are essential in evaluating the validity of measures. Methods: We review classical test theory and item response theory approaches to evaluating PRO measures including frequency of responses to each category of the items in a multi-item scale, the distribution of scale scores, floor and ceiling effects, the relationship between item response options and the total score, and the extent to which hypothesized “difficulty” (severity) order of items is represented by observed responses. Conclusion: Classical test theory and item response theory can be useful in providing a quantitative assessment of items and scales during the content validity phase of patient-reported outcome measures. Depending on the particular type of measure and the specific circumstances, either one or both approaches should be considered to help maximize the content validity of PRO measures. PMID:24811753
Preferred Reporting Items for Systematic Review and Meta-Analyses of individual participant data: the PRISMA-IPD Statement.

PubMed

Stewart, Lesley A; Clarke, Mike; Rovers, Maroeska; Riley, Richard D; Simmonds, Mark; Stewart, Gavin; Tierney, Jayne F

2015-04-28

Systematic reviews and meta-analyses of individual participant data (IPD) aim to collect, check, and reanalyze individual-level data from all studies addressing a particular research question, and are therefore considered a gold standard approach to evidence synthesis. They are likely to be used with increasing frequency as current initiatives to share clinical trial data gain momentum, and may be particularly important in reviewing controversial therapeutic areas. The objective was to develop PRISMA-IPD as a stand-alone extension to the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) Statement, tailored to the specific requirements of reporting systematic reviews and meta-analyses of IPD. Although developed primarily for reviews of randomized trials, many items will apply in other contexts, including reviews of diagnosis and prognosis. Development of PRISMA-IPD followed the EQUATOR Network framework guidance and used the existing standard PRISMA Statement as a starting point to draft additional relevant material. A web-based survey informed discussion at an international workshop that included researchers, clinicians, methodologists experienced in conducting systematic reviews and meta-analyses of IPD, and journal editors. The statement was drafted and iterative refinements were made by the project, advisory, and development groups. The PRISMA-IPD Development Group reached agreement on the PRISMA-IPD checklist and flow diagram by consensus. Compared with standard PRISMA, the PRISMA-IPD checklist includes 3 new items that address (1) methods of checking the integrity of the IPD (such as pattern of randomization, data consistency, baseline imbalance, and missing data), (2) reporting any important issues that emerge, and (3) exploring variation (such as whether certain types of individual benefit more from the intervention than others). One further item was created by reorganizing standard PRISMA items relating to interpreting results. Wording was modified in 23 items to reflect the IPD approach. PRISMA-IPD provides guidelines for reporting systematic reviews and meta-analyses of IPD.
Item Response Theory Using Hierarchical Generalized Linear Models

ERIC Educational Resources Information Center

Ravand, Hamdollah

2015-01-01

Multilevel models (MLMs) are flexible in that they can be employed to obtain item and person parameters, test for differential item functioning (DIF) and capture both local item and person dependence. Papers on the MLM analysis of item response data have focused mostly on theoretical issues where applications have been add-ons to simulation…
Item Response Theory Equating Using Bayesian Informative Priors.

ERIC Educational Resources Information Center

de la Torre, Jimmy; Patz, Richard J.

This paper seeks to extend the application of Markov chain Monte Carlo (MCMC) methods in item response theory (IRT) to include the estimation of equating relationships along with the estimation of test item parameters. A method is proposed that incorporates estimation of the equating relationship in the item calibration phase. Item parameters from…
Instrument Formatting with Computer Data Entry in Mind.

ERIC Educational Resources Information Center

Boser, Judith A.; And Others

Different formats for four types of research items were studied for ease of computer data entry. The types were: (1) numeric response items; (2) individual multiple choice items; (3) multiple choice items with the same response items; and (4) card column indicator placement. Each of the 13 experienced staff members of a major university's Data…
EXTENDING THE FLOOR AND THE CEILING FOR ASSESSMENT OF PHYSICAL FUNCTION

PubMed Central

Fries, James F.; Lingala, Bharathi; Siemons, Liseth; Glas, Cees A. W.; Cella, David; Hussain, Yusra N; Bruce, Bonnie; Krishnan, Eswar

2014-01-01

Objective: The objective of the current study was to improve the assessment of physical function by improving the precision of assessment at the floor (extremely poor function) and at the ceiling (extremely good health) of the health continuum. Methods: Under the NIH PROMIS program, we developed new physical function floor and ceiling items to supplement the existing item bank. Using item response theory (IRT) and the standard PROMIS methodology, we developed 30 floor items and 26 ceiling items and administered them during a 12-month prospective observational study of 737 individuals at the extremes of health status. Change over time was compared across anchor instruments and across items by means of effect sizes. Using the observed changes in scores, we back-calculated sample size requirements for the new and comparison measures. Results: We studied 444 subjects with chronic illness and/or extreme age, and 293 generally fit subjects including athletes in training. IRT analyses confirmed that the new floor and ceiling items outperformed reference items (p<0.001). The estimated post-hoc sample size requirements were reduced by a factor of two to four at the floor and a factor of two at the ceiling. Conclusion: Extending the range of physical function measurement can substantially improve measurement quality, can reduce sample size requirements and improve research efficiency. The paradigm shift from Disability to Physical Function includes the entire spectrum of physical function, signals improvement in the conceptual base of outcome assessment, and may be transformative as medical goals more closely approach societal goals for health. PMID:24782194
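The link between effect size and required sample size can be illustrated with the standard normal-approximation formula for comparing two groups. The formula gives n = 2(z_{1-α/2} + z_{1-β})²/d² per group. This is a generic sketch and is not the authors' back-calculation. Their exact formula, which may have been for within-person change, is not given in the abstract. The sketch shows why the required n scales with 1/d², so that doubling an effect size cuts the required sample size by roughly a factor of four.

```python
import math
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Per-group sample size for a two-sided, two-sample comparison of means
    (normal approximation), given a standardized effect size d."""
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)

# A measure that detects change with twice the effect size needs about a quarter of the sample
print(n_per_group(0.5), n_per_group(1.0))
```
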
Item parameters dissociate between expectation formats: a regression analysis of time-frequency decomposed EEG data

PubMed Central

Monsalve, Irene F.; Pérez, Alejandro; Molinaro, Nicola

2014-01-01

During language comprehension, semantic contextual information is used to generate expectations about upcoming items. This has been commonly studied through the N400 event-related potential (ERP), as a measure of facilitated lexical retrieval. However, the associative relationships in multi-word expressions (MWE) may enable the generation of a categorical expectation, leading to lexical retrieval before target word onset. Processing of the target word would thus reflect a target-identification mechanism, possibly indexed by a P3 ERP component. However, given their time overlap (200–500 ms post-stimulus onset), differentiating between N400/P3 ERP responses (averaged over multiple linguistically variable trials) is problematic. In the present study, we analyzed EEG data from a previous experiment, which compared ERP responses to highly expected words that were placed either in a MWE or a regular non-fixed compositional context, and to low predictability controls. We focused on oscillatory dynamics and regression analyses, in order to dissociate between the two contexts by modeling the electrophysiological response as a function of item-level parameters. A significant interaction between word position and condition was found in the regression model for power in a theta range (~7–9 Hz), providing evidence for the presence of qualitative differences between conditions. Power levels within this band were lower for MWE than compositional contexts when the target word appeared later on in the sentence, confirming that in the former lexical retrieval would have taken place before word onset. On the other hand, gamma-power (~50–70 Hz) was also modulated by predictability of the item in all conditions, which is interpreted as an index of a similar “matching” sub-step for both types of contexts, binding an expected representation and the external input. PMID:25161630
The existence of parenting styles in the owner-dog relationship.

PubMed

Herwijnen, Ineke R van; van der Borg, Joanne A M; Naguib, Marc; Beerda, Bonne

2018-01-01

Parents interact with children following specific styles, known to influence child development. These styles represent variations in the dimensions of demandingness and responsiveness, resulting in authoritarian, authoritative, permissive or uninvolved parenting. Given the similarities between the parent-child and owner-dog relationships, we determined the extent to which parenting styles exist in the owner-dog relationship using the existing Parenting Styles and Dimensions Questionnaire for the parent-child relationship and an adapted version for dog owners. Items on the parenting of children/dogs were rated for applicability on a five-point Likert scale by 518 Dutch dog-owning parents. Principal Component Analyses grouped parenting propensities into styles, with some marked differences between the findings for children and dogs. Dog-directed items grouped into an authoritarian-correction orientated style, incorporating variation in demandingness and focusing on correcting a dog's behaviour verbally/physically, and into two styles based on authoritative items. An authoritative-intrinsic value orientated style reflected variation mainly in responsiveness and was oriented towards the assumed needs and emotions of the animal. A second authoritative-item based style captured variations in demandingness and responsiveness. We labelled this style authoritative-training orientated, as it was oriented towards manners in teaching a dog how to behave in social situations. Thus, we defined dog-directed parenting styles and constructed a Dog-Directed Parenting Styles and Dimensions Questionnaire along the lines of the existing theoretical framework on parenting styles. We did not find a permissive or uninvolved dog-directed parenting style, which we attribute to a study population of devoted dog owners; our findings should be interpreted with this specific study population in mind. We found evidence of dog-directed parenting styles and provide a foundation for determining their possible impact on the different aspects of a dog's life.
Predicting Item Difficulty of Science National Curriculum Tests: The Case of Key Stage 2 Assessments

ERIC Educational Resources Information Center

El Masri, Yasmine H.; Ferrara, Steve; Foltz, Peter W.; Baird, Jo-Anne

2017-01-01

Predicting item difficulty is highly important in education for both teachers and item writers. Despite identifying a large number of explanatory variables, predicting item difficulty remains a challenge in educational assessment with empirical attempts rarely exceeding 25% of variance explained. This paper analyses 216 science items of key stage…
Comparing five depression measures in depressed Chinese patients using item response theory: an examination of item properties, measurement precision and score comparability.

PubMed

Zhao, Yue; Chan, Wai; Lo, Barbara Chuen Yee

2017-04-04

Item response theory (IRT) has been increasingly applied to patient-reported outcome (PRO) measures. The purpose of this study is to apply IRT to examine item properties (discrimination and severity of depressive symptoms), measurement precision and score comparability across five depression measures, which is the first study of its kind in the Chinese context. A clinical sample of 207 Hong Kong Chinese outpatients was recruited. Data analyses were performed including classical item analysis, IRT concurrent calibration and IRT true score equating. The IRT assumptions of unidimensionality and local independence were tested respectively using confirmatory factor analysis and chi-square statistics. The IRT linking assumptions of construct similarity, equity and subgroup invariance were also tested. The graded response model was applied to concurrently calibrate all five depression measures in a single IRT run, resulting in the item parameter estimates of these measures being placed onto a single common metric. IRT true score equating was implemented to perform the outcome score linking and construct score concordances so as to link scores from one measure to corresponding scores on another measure for direct comparability. Findings suggested that (a) symptoms on depressed mood, suicidality and feeling of worthlessness served as the strongest discriminating indicators, and symptoms concerning suicidality, changes in appetite, depressed mood, feeling of worthlessness and psychomotor agitation or retardation reflected high levels of severity in the clinical sample. (b) The five depression measures contributed to various degrees of measurement precision at varied levels of depression. (c) After outcome score linking was performed across the five measures, the cut-off scores led to either consistent or discrepant diagnoses for depression. 
The study provides additional evidence regarding the psychometric properties and clinical utility of the five depression measures, offers methodological contributions to the appropriate use of IRT in PRO measures, and helps elucidate cultural variation in depressive symptomatology. The approach of concurrently calibrating and linking multiple PRO measures can be applied to the assessment of PROs other than the depression context.
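IRT true-score equating, as described above, maps a score on one measure to the latent trait value at which that score is the expected (true) score. It then reads off the expected score on the other measure at the same θ. The sketch below is minimal and assumes dichotomous 2PL items that are already calibrated on a common metric. The study itself calibrated polytomous items with the graded response model. The item parameters and function names are hypothetical.

```python
import math

def p_2pl(theta, a, b):
    """2PL item response function."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def true_score(theta, items):
    """Test characteristic curve: expected number-correct score at theta."""
    return sum(p_2pl(theta, a, b) for a, b in items)

def theta_for_score(score, items, lo=-8.0, hi=8.0, tol=1e-10):
    """Invert the test characteristic curve by bisection (it increases in theta when all a > 0)."""
    if not true_score(lo, items) < score < true_score(hi, items):
        raise ValueError("score outside the range reachable on [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if true_score(mid, items) < score:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def equate(score_x, items_x, items_y):
    """Form-X true score -> theta -> expected form-Y score at the same theta."""
    return true_score(theta_for_score(score_x, items_x), items_y)

# Hypothetical (a, b) parameters on a common metric; form Y is uniformly harder by 0.5
form_x = [(1.0, -1.0), (1.2, 0.0), (0.8, 1.0)]
form_y = [(a, b + 0.5) for a, b in form_x]
print(round(equate(1.5, form_x, form_y), 3))
```

Applying `equate` to each cut-off score is one way to see whether diagnoses based on different measures agree or diverge after linking.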

Consequences of Ignoring Guessing when Estimating the Latent Density in Item Response Theory

ERIC Educational Resources Information Center

Woods, Carol M.

2008-01-01

In Ramsay-curve item response theory (RC-IRT), the latent variable distribution is estimated simultaneously with the item parameters. In extant Monte Carlo evaluations of RC-IRT, the item response function (IRF) used to fit the data is the same one used to generate the data. The present simulation study examines RC-IRT when the IRF is imperfectly…
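The difference between an IRF with guessing and one without, which is the kind of misspecification this study examines, can be seen by comparing the three-parameter logistic (3PL) function with its c = 0 special case (the 2PL). This minimal sketch uses hypothetical parameter values. It illustrates only the shape of the two IRFs and not the Ramsay-curve estimation of the latent density.

```python
import math

def p_3pl(theta, a, b, c):
    """3PL item response function with lower asymptote (pseudo-guessing) c."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def p_2pl(theta, a, b):
    """2PL as the special case c = 0, i.e. an IRF that ignores guessing."""
    return p_3pl(theta, a, b, 0.0)

# For low-ability examinees the 2PL curve heads to 0, while the 3PL curve levels off near c
for theta in (-4.0, -2.0, 0.0):
    print(theta, round(p_2pl(theta, 1.0, 0.0), 3), round(p_3pl(theta, 1.0, 0.0, 0.2), 3))
```

When data generated with c > 0 are fitted with the 2PL, correct responses by low-θ examinees cannot be explained by the IRF. That unexplained mass is what can distort an estimated latent density.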
Asymptotic Properties of Induced Maximum Likelihood Estimates of Nonlinear Models for Item Response Variables: The Finite-Generic-Item-Pool Case.

ERIC Educational Resources Information Center

Jones, Douglas H.

The progress of modern mental test theory depends very much on the techniques of maximum likelihood estimation, and many popular applications make use of likelihoods induced by logistic item response models. While, in reality, item responses are nonreplicate within a single examinee and the logistic models are only ideal, practitioners make…
Limits on Log Cross-Product Ratios for Item Response Models. Research Report. ETS RR-06-10

ERIC Educational Resources Information Center

Haberman, Shelby J.; Holland, Paul W.; Sinharay, Sandip

2006-01-01

Bounds are established for log cross-product ratios (log odds ratios) involving pairs of items for item response models. First, expressions for bounds on log cross-product ratios are provided for unidimensional item response models in general. Then, explicit bounds are obtained for the Rasch model and the two-parameter logistic (2PL) model.…
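The quantity whose bounds are derived here is the log cross-product ratio (log odds ratio) of the 2x2 table of joint responses to two dichotomous items. A minimal sketch follows that computes the observed value from 0/1 response vectors. No continuity correction is applied, so an empty cell raises an error. The function name is hypothetical, and the sketch does not implement the bounds from the report.

```python
import math

def log_cross_product_ratio(item_i, item_j):
    """Log odds ratio log((n11 * n00) / (n10 * n01)) for two 0/1 items answered
    by the same examinees. Responses must be coded 0 or 1."""
    if len(item_i) != len(item_j):
        raise ValueError("response vectors must have equal length")
    counts = {(1, 1): 0, (1, 0): 0, (0, 1): 0, (0, 0): 0}
    for x, y in zip(item_i, item_j):
        counts[(x, y)] += 1
    if 0 in counts.values():
        raise ValueError("empty cell in the 2x2 table")
    return math.log(counts[(1, 1)] * counts[(0, 0)] / (counts[(1, 0)] * counts[(0, 1)]))

# Hypothetical responses of eight examinees to two items
print(log_cross_product_ratio([1, 1, 1, 0, 0, 0, 1, 0], [1, 1, 0, 0, 0, 1, 1, 0]))
```

Observed values can then be compared against the model-implied bounds for each item pair as a check on model fit.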
Reporting completeness and transparency of meta-analyses of depression screening tool accuracy: A comparison of meta-analyses published before and after the PRISMA statement.

PubMed

Rice, Danielle B; Kloda, Lorie A; Shrier, Ian; Thombs, Brett D

2016-08-01

Meta-analyses that are conducted rigorously and reported completely and transparently can provide accurate evidence to inform the best possible healthcare decisions. Guideline makers have raised concerns about the utility of existing evidence on the diagnostic accuracy of depression screening tools. The objective of our study was to evaluate the transparency and completeness of reporting in meta-analyses of the diagnostic accuracy of depression screening tools using the PRISMA tool adapted for diagnostic test accuracy meta-analyses. We searched MEDLINE and PsycINFO from January 1, 2005 through March 13, 2016 for recent meta-analyses in any language on the diagnostic accuracy of depression screening tools. Two reviewers independently assessed the transparency of reporting using the PRISMA tool, with appropriate adaptations made for studies of diagnostic test accuracy. We identified 21 eligible meta-analyses. Twelve of 21 meta-analyses complied with at least 50% of adapted PRISMA items. Of 30 adapted PRISMA items, 11 were fulfilled by ≥80% of included meta-analyses, 3 by 50-79% of meta-analyses, 7 by 25-45% of meta-analyses, and 9 by <25%. On average, post-PRISMA meta-analyses complied with 17 of 30 items compared to 13 of 30 items pre-PRISMA. Deficiencies in the transparency of reporting in meta-analyses of the diagnostic test accuracy of depression screening tools were identified. Authors, reviewers, and editors should adhere to the PRISMA statement to improve the reporting of meta-analyses of the diagnostic accuracy of depression screening tools. Copyright © 2016 Elsevier Inc. All rights reserved.
Improving the Reliability of Student Scores from Speeded Assessments: An Illustration of Conditional Item Response Theory Using a Computer-Administered Measure of Vocabulary.

PubMed

Petscher, Yaacov; Mitchell, Alison M; Foorman, Barbara R

2015-01-01

A growing body of literature suggests that response latency, the amount of time it takes an individual to respond to an item, may be an important factor to consider when using assessment data to estimate the ability of an individual. Considering that tests of passage and list fluency are being adapted to a computer administration format, it is possible that accounting for individual differences in response times may be an increasingly feasible option to strengthen the precision of individual scores. The present research evaluated the differential reliability of scores when using classical test theory and item response theory as compared to a conditional item response model which includes response time as an item parameter. Results indicated that the precision of student ability scores increased by an average of 5 % when using the conditional item response model, with greater improvements for those who were average or high ability. Implications for measurement models of speeded assessments are discussed.
Improving the Reliability of Student Scores from Speeded Assessments: An Illustration of Conditional Item Response Theory Using a Computer-Administered Measure of Vocabulary

PubMed Central

Petscher, Yaacov; Mitchell, Alison M.; Foorman, Barbara R.

2016-01-01

A growing body of literature suggests that response latency, the amount of time it takes an individual to respond to an item, may be an important factor to consider when using assessment data to estimate the ability of an individual. Considering that tests of passage and list fluency are being adapted to a computer administration format, it is possible that accounting for individual differences in response times may be an increasingly feasible option to strengthen the precision of individual scores. The present research evaluated the differential reliability of scores when using classical test theory and item response theory as compared to a conditional item response model which includes response time as an item parameter. Results indicated that the precision of student ability scores increased by an average of 5 % when using the conditional item response model, with greater improvements for those who were average or high ability. Implications for measurement models of speeded assessments are discussed. PMID:27721568
Mini-Mental Status Examination: mixed Rasch model item analysis derived two different cognitive dimensions of the MMSE.

PubMed

Schultz-Larsen, Kirsten; Kreiner, Svend; Lomholt, Rikke Kirstine

2007-03-01

This study, published in two companion papers, assesses the properties of the Mini-Mental State Examination (MMSE) with the aim of improving the efficiency of screening methods for cognitive impairment and dementia. An item analysis by conventional and mixed Rasch models was used to explore empirically derived cognitive dimensions of the MMSE, to assess item bias, and to construct diagnostic cut-points. The scores of 1,189 elderly residents were analyzed. Two dimensions of cognitive function, which are statistically and conceptually different from those obtained in previous studies, were derived. The corresponding sum scales were (1) an age-correlated MMSE scale (A-MMSE scale: orientation to time, attention/calculation, naming, repetition, and three-stage command) and (2) a non-age-correlated MMSE scale (B-MMSE scale: orientation to place, registration, recall, reading, and copying). The "writing" item was excluded because of differential effects of age and sex. The analysis also showed that the study sample consisted of two cognitively different groups of elderly people. The findings indicate that a two-scale solution is a stable and statistically supported framework for interpreting data obtained by means of the MMSE. Supplementary analyses are presented in the companion paper to explore the performance of this item response theory calibration as a screening test for dementia.
What Aspect of Dependence Does the Fagerström Test for Nicotine Dependence Measure?

PubMed Central

DiFranza, Joseph R.; Wellman, Robert J.; Savageau, Judith A.; Beccia, Ariel; Ursprung, W. W. Sanouri A.; McMillen, Robert

2013-01-01

Although the Fagerström Test for Nicotine Dependence (FTND) and the Heaviness of Smoking Index (HSI) are widely used, there is uncertainty regarding what these scales measure. We examined associations between these instruments and items assessing different aspects of dependence. Adult current smokers (n = 422, mean age 33.3 years, 61.9% female) completed a web-based survey composed of items related to demographics and smoking behavior plus (1) the FTND and HSI; (2) the Autonomy over Tobacco Scale (AUTOS) with subscales measuring Withdrawal, Psychological Dependence, and Cue-Induced Cravings; (3) 6 questions tapping smokers' wanting, craving, or needing experiences in response to withdrawal and the latency to each experience during abstinence; (4) 3 items concerning how smokers prepare to cope with periods of abstinence. In regression analyses the Withdrawal subscale of the AUTOS was the strongest predictor of FTND and HSI scores, followed by taking precautions not to run out of cigarettes or smoking extra to prepare for abstinence. The FTND and its six items, including the HSI, consistently showed the strongest correlations with withdrawal, suggesting that the behaviors described by the items of the FTND are primarily indicative of difficulty maintaining abstinence because of withdrawal symptoms. PMID:25969829
What aspect of dependence does the fagerström test for nicotine dependence measure?

PubMed

DiFranza, Joseph R; Wellman, Robert J; Savageau, Judith A; Beccia, Ariel; Ursprung, W W Sanouri A; McMillen, Robert

2013-01-01

Although the Fagerström Test for Nicotine Dependence (FTND) and the Heaviness of Smoking Index (HSI) are widely used, there is uncertainty regarding what these scales measure. We examined associations between these instruments and items assessing different aspects of dependence. Adult current smokers (n = 422, mean age 33.3 years, 61.9% female) completed a web-based survey composed of items related to demographics and smoking behavior plus (1) the FTND and HSI; (2) the Autonomy over Tobacco Scale (AUTOS) with subscales measuring Withdrawal, Psychological Dependence, and Cue-Induced Cravings; (3) 6 questions tapping smokers' wanting, craving, or needing experiences in response to withdrawal and the latency to each experience during abstinence; (4) 3 items concerning how smokers prepare to cope with periods of abstinence. In regression analyses the Withdrawal subscale of the AUTOS was the strongest predictor of FTND and HSI scores, followed by taking precautions not to run out of cigarettes or smoking extra to prepare for abstinence. The FTND and its six items, including the HSI, consistently showed the strongest correlations with withdrawal, suggesting that the behaviors described by the items of the FTND are primarily indicative of difficulty maintaining abstinence because of withdrawal symptoms.
Evaluating a Modular Design Approach to Collecting Survey Data Using Text Messages

PubMed Central

West, Brady T.; Ghimire, Dirgha; Axinn, William G.

2015-01-01

This article presents analyses of data from a pilot study in Nepal that was designed to provide an initial examination of the errors and costs associated with an innovative methodology for survey data collection. We embedded a randomized experiment within a long-standing panel survey, collecting data on a small number of items with varying sensitivity from a probability sample of 450 young Nepalese adults. Survey items ranged from simple demographics to indicators of substance abuse and mental health problems. Sampled adults were randomly assigned to one of three different modes of data collection: 1) a standard one-time telephone interview, 2) a “single sitting” back-and-forth interview with an interviewer using text messaging, and 3) an interview using text messages within a modular design framework (which generally involves breaking the survey response task into distinct parts over a short period of time). Respondents in the modular group were asked to respond (via text message exchanges with an interviewer) to only one question on a given day, rather than complete the entire survey. Both bivariate and multivariate analyses demonstrate that the two text messaging modes increased the probability of disclosing sensitive information relative to the telephone mode, and that respondents in the modular design group, while responding less frequently, found the survey to be significantly easier. Further, those who responded in the modular group were not unique in terms of available covariates, suggesting that the reduced item response rates introduced only limited nonresponse bias. Future research should consider enhancing this methodology, applying it with other modes of data collection (e.g., web surveys), and continuously evaluating its effectiveness from a total survey error perspective. PMID:26322137
Risky Business: Factor Analysis of Survey Data – Assessing the Probability of Incorrect Dimensionalisation

PubMed Central

van der Eijk, Cees; Rose, Jonathan

2015-01-01

This paper undertakes a systematic assessment of the extent to which factor analysis identifies the correct number of latent dimensions (factors) when applied to ordered-categorical survey items (so-called Likert items). We simulate 2400 data sets of uni-dimensional Likert items that vary systematically over a range of conditions such as the underlying population distribution, the number of items, the level of random error, and characteristics of items and item-sets. Each of these datasets is factor analysed in a variety of ways that are frequently used in the extant literature, or that are recommended in current methodological texts. These include exploratory factor retention heuristics such as Kaiser’s criterion, Parallel Analysis and a non-graphical scree test, and (for exploratory and confirmatory analyses) evaluations of model fit. These analyses are conducted on the basis of Pearson and polychoric correlations. We find that, irrespective of the particular mode of analysis, factor analysis applied to ordered-categorical survey data very often leads to over-dimensionalisation. The magnitude of this risk depends on the specific way in which factor analysis is conducted, the number of items, the properties of the set of items, and the underlying population distribution. The paper concludes with a discussion of the consequences of over-dimensionalisation, and a brief mention of alternative modes of analysis that are much less prone to such problems. PMID:25789992
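Kaiser's criterion is one of the retention heuristics evaluated above. It retains as many factors as there are eigenvalues of the item correlation matrix greater than 1.

The following is a minimal, standard-library-only sketch. It computes eigenvalues with the cyclic Jacobi method for symmetric matrices. The function names and the example matrix are hypothetical. In practice, a linear-algebra library would normally supply the eigenvalues. For Likert items, the input would often be a polychoric rather than a Pearson matrix.

```python
import math


def jacobi_eigenvalues(matrix, tol=1e-20, max_sweeps=100):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations, largest first."""
    n = len(matrix)
    a = [list(row) for row in matrix]  # work on a copy
    for _ in range(max_sweeps):
        off_diag = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off_diag < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Choose the rotation that zeroes the (p, q) element.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                # A <- A P: update columns p and q.
                for k in range(n):
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                # A <- P^T A: update rows p and q.
                for k in range(n):
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def kaiser_count(corr):
    """Number of factors retained by Kaiser's criterion (eigenvalues > 1)."""
    return sum(1 for value in jacobi_eigenvalues(corr) if value > 1.0)


# Hypothetical example: four items that all intercorrelate at 0.5.
corr = [[1.0 if i == j else 0.5 for j in range(4)] for i in range(4)]
print(jacobi_eigenvalues(corr))  # approximately [2.5, 0.5, 0.5, 0.5]
print(kaiser_count(corr))        # 1
```

Kaiser's criterion relies on the sample eigenvalues alone. Parallel analysis instead compares them with eigenvalues from random data of the same size.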
The Chinese version of the Myocardial Infarction Dimensional Assessment Scale (MIDAS): Mokken scaling

PubMed Central

2012-01-01

Background Hierarchical scales are very useful in clinical practice due to their ability to discriminate precisely between individuals, and the original English version of the Myocardial Infarction Dimensional Assessment Scale has been shown to contain a hierarchy of items. The purpose of this study was to analyse a Mandarin Chinese translation of the Myocardial Infarction Dimensional Assessment Scale for a hierarchy of items according to the criteria of Mokken scaling. Data from 180 Chinese participants who completed the Chinese translation of the Myocardial Infarction Dimensional Assessment Scale were analysed using the Mokken Scaling Procedure and the R statistical programme, using the diagnostics available in these programmes. Correlation between Mandarin Chinese items and a Chinese translation of the Short Form (36) Health Survey was also analysed. Findings Fifteen items from the Mandarin Chinese Myocardial Infarction Dimensional Assessment Scale were retained in a strong and reliable Mokken scale; invariant item ordering was not evident and the Mokken scaled items of the Chinese Myocardial Infarction Dimensional Assessment Scale correlated with the Short Form (36) Health Survey. Conclusions Items from the Mandarin Chinese Myocardial Infarction Dimensional Assessment Scale form a Mokken scale, and this offers further insight into how the items of the Myocardial Infarction Dimensional Assessment Scale relate to the measurement of health-related quality of life of people with a myocardial infarction. PMID:22221696
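Mokken scaling rests on the scalability coefficient H. H compares the observed inter-item covariances with the largest covariances the items' marginal score distributions would allow.

The sketch below computes the total-scale H in plain Python. It takes each pair's maximum covariance from pairing both items' scores in sorted order. It is an illustration, not the Mokken Scaling Procedure or the R code used in the study, and the function names and data are hypothetical. By the usual Mokken convention, H ≥ 0.5 is read as a "strong" scale.

```python
from itertools import combinations


def covariance(x, y):
    """Population covariance of two equal-length score sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n


def scale_h(responses):
    """Loevinger/Mokken scalability coefficient H for a whole set of items.

    `responses` is a list of respondents, each a list of item scores.
    For each item pair, the maximum covariance given the marginal distributions
    comes from pairing both items' scores in sorted (comonotone) order.
    """
    items = [list(column) for column in zip(*responses)]
    observed = maximum = 0.0
    for x, y in combinations(items, 2):
        observed += covariance(x, y)
        maximum += covariance(sorted(x), sorted(y))
    return observed / maximum


# Hypothetical dichotomous data forming a perfect Guttman pattern: H = 1.
guttman = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]]
print(scale_h(guttman))
```

Guttman errors lower H. For example, a respondent who endorses the hardest item while failing easier ones pulls H below 1.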
Development and validation of a new condition-specific instrument for evaluation of smile esthetics-related quality of life.

PubMed

Saltovic, Ema; Lajnert, Vlatka; Saltovic, Sabina; Kovacevic Pavicic, Daniela; Pavlic, Andrej; Spalj, Stjepan

2018-03-01

Orofacial esthetics raises psychosocial issues. The purpose of this study was to create and validate a new short instrument for assessing the psychosocial impacts of altered smile esthetics. A team consisting of an orthodontist, two prosthodontists, a psychologist, and a dental student generated items intended to capture specific hypothesized psychosocial dimensions (69 items initially, 39 in the final analysis). The sample consisted of 261 Caucasian subjects aged 14 to 28 years (26% male), attending local high schools and a university, who self-administered the questionnaire. Factor analysis, Cronbach's alpha, Pearson correlation, paired-samples t-tests, and analysis of variance were used to assess internal consistency, construct validity, responsiveness, and test-retest stability. Three dimensions of psychosocial impacts of altered smile esthetics were identified: dental self-consciousness, dental self-confidence, and social contacts, best fitted by 12 items, 4 in each dimension. Internal consistency was good (α in the range 0.85-0.89). Good test-retest stability was confirmed. In responsiveness testing, tooth whitening increased dental self-confidence (P = 0.002) but produced no significant changes in the other dimensions. The new instrument, the Smile Esthetics-Related Quality of Life (SERQoL) questionnaire, is short and has proven to be a good indicator of psychosocial dimensions related to the perception of smile esthetics. The SERQoL questionnaire might have practical validity when applied in esthetic dental clinical procedures. © 2017 Wiley Periodicals, Inc.
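The internal-consistency values reported above are Cronbach's alpha coefficients, computed as α = k/(k − 1) · (1 − Σ item variances / total-score variance). The sketch below shows that computation in plain Python. It uses population variances throughout; the ratio, and hence α, is the same with sample variances. The function name and data are hypothetical.

```python
from statistics import pvariance


def cronbach_alpha(responses):
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of total score).

    `responses` is a list of respondents, each a list of k item scores.
    """
    k = len(responses[0])
    items = list(zip(*responses))
    totals = [sum(row) for row in responses]
    item_var = sum(pvariance(column) for column in items)
    return k / (k - 1) * (1.0 - item_var / pvariance(totals))


# Hypothetical two-item example: item variances 2/3 each, total variance 2.
print(cronbach_alpha([[1, 2], [2, 1], [3, 3]]))  # about 0.667
```

Here α = 2 · (1 − (4/3)/2) = 2/3.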
Multiple sensitive estimation and optimal sample size allocation in the item sum technique.

PubMed

Perri, Pier Francesco; Rueda García, María Del Mar; Cobo Rodríguez, Beatriz

2018-01-01

For surveys of sensitive issues in life sciences, statistical procedures can be used to reduce nonresponse and social desirability response bias. Both of these phenomena provoke nonsampling errors that are difficult to deal with and can seriously flaw the validity of the analyses. The item sum technique (IST) is a very recent indirect questioning method derived from the item count technique that seeks to procure more reliable responses on quantitative items than direct questioning while preserving respondents' anonymity. This article addresses two important questions concerning the IST: (i) its implementation when two or more sensitive variables are investigated and efficient estimates of their unknown population means are required; (ii) the determination of the optimal sample size to achieve minimum variance estimates. These aspects are of great relevance for survey practitioners engaged in sensitive research and, to the best of our knowledge, have not been studied so far. In this article, theoretical results for multiple estimation and optimal allocation are obtained under a generic sampling design and then particularized to simple random sampling and stratified sampling designs. Theoretical considerations are integrated with a number of simulation studies, based on data from two real surveys, conducted to ascertain the efficiency gain derived from optimal allocation in different situations. One of the surveys concerns cannabis consumption among university students. Our findings highlight some methodological advances that can be obtained in life sciences IST surveys when optimal allocation is achieved. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
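In the basic single-variable IST, the sample is split into two groups:

- A long-list group reports the sum of the sensitive item and several innocuous items.
- A short-list group reports the sum of the innocuous items only.

The sensitive mean is then estimated by the difference between the two group means.

The sketch below assumes simple random sampling and independent groups. Under that assumption, the classical allocation that minimises the variance of a difference of two means for a fixed total sample size sets each group's size proportional to its standard deviation. This is the textbook result for one variable. It is not a reproduction of the article's multi-variable or stratified derivations. The function names are hypothetical, and the standard deviations would have to come from prior information such as a pilot survey.

```python
from statistics import mean


def ist_mean_estimate(long_list_totals, short_list_totals):
    """Basic item sum technique estimate of a sensitive mean.

    Long-list respondents report (sensitive item + innocuous items);
    short-list respondents report (innocuous items) only.
    """
    return mean(long_list_totals) - mean(short_list_totals)


def estimator_variance(sd_long, sd_short, n_long, n_short):
    """Variance of the difference of two independent simple-random-sample means."""
    return sd_long ** 2 / n_long + sd_short ** 2 / n_short


def optimal_split(n_total, sd_long, sd_short):
    """Allocation with group sizes proportional to the group standard deviations."""
    n_long = round(n_total * sd_long / (sd_long + sd_short))
    return n_long, n_total - n_long


print(ist_mean_estimate([5, 7, 9], [2, 4, 6]))  # 3
n_long, n_short = optimal_split(100, sd_long=3.0, sd_short=1.0)
print(n_long, n_short)  # 75 25
print(estimator_variance(3.0, 1.0, n_long, n_short), estimator_variance(3.0, 1.0, 50, 50))
```

The long list usually has the larger spread because it includes the sensitive variable, so this allocation typically puts more respondents on the long list than an equal split would.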
Development and validation of a novel patient-reported treatment satisfaction measure for hyperfunctional facial lines: facial line satisfaction questionnaire.

PubMed

Pompilus, Farrah; Burgess, Somali; Hudgens, Stacie; Banderas, Benjamin; Daniels, Selena

2015-12-01

Facial lines or wrinkles are among the most visible signs of aging, and minimally invasive cosmetic procedures are becoming increasingly popular. The aim of this study was to develop and validate the Facial Line Satisfaction Questionnaire (FLSQ) for use in adults with upper facial lines (UFL). A literature review, concept elicitation interviews (n = 33), and cognitive debriefing interviews (n = 23) of adults with UFL were conducted to develop the FLSQ. The FLSQ comprises Baseline and Follow-up versions and was field-tested with 150 subjects in a US observational study designed to assess its psychometric performance. Analyses included acceptability (item and scale distribution [i.e. missingness, floor, and ceiling effects]), reliability, and validity (including concurrent validity). In total, 69 concepts were elicited during patient interviews. Following cognitive debriefing interviews, the FLSQ-Baseline version included 11 items and the Follow-up version included 13 items. Response rates for the FLSQ were 100% and 73% at baseline and follow-up, respectively; no items had excessive missing data. Questionnaire scale scores were normally distributed. Most domain scores demonstrated good internal consistency reliability (Cronbach's α ≥ 0.70). Most items within their respective domains exhibited good convergent (item-scale correlations > 0.40) and discriminant (items had higher correlation with their hypothesized scales than other scales) validity. Concurrent validity correlation coefficients of the FLSQ domain scores with the associated concurrent measures were acceptable (range: r = 0.40-0.70). Six FLSQ items demonstrated reliability and validity as stand-alone items outside their domains. The FLSQ is a valid questionnaire for assessing treatment expectations, satisfaction, impact, and preference in adults with UFL. © 2015 The Authors. Journal of Cosmetic Dermatology Published by Wiley Periodicals, Inc.
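The abstract reports item-scale correlations above 0.40 as evidence of convergent validity. The sketch below computes corrected item-total correlations, where each item is correlated with the sum of the other items in its domain. Whether the study corrected for item overlap is not stated here, so the corrected version is an assumption. The function names and data are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def corrected_item_total(responses):
    """Correlation of each item with the sum of the remaining items in its scale."""
    totals = [sum(row) for row in responses]
    result = []
    for i in range(len(responses[0])):
        item = [row[i] for row in responses]
        rest = [total - x for total, x in zip(totals, item)]
        result.append(pearson(item, rest))
    return result


# Hypothetical 4-respondent, 3-item domain; flag items meeting the 0.40 threshold.
domain = [[1, 2, 1], [2, 2, 3], [3, 4, 3], [4, 3, 4]]
print([r > 0.40 for r in corrected_item_total(domain)])
```

A discriminant check would apply the same correlation to the sums of the other domains and require each item to correlate most strongly with its own domain.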
The EORTC CAT Core-The computer adaptive version of the EORTC QLQ-C30 questionnaire.

PubMed

Petersen, Morten Aa; Aaronson, Neil K; Arraras, Juan I; Chie, Wei-Chu; Conroy, Thierry; Costantini, Anna; Dirven, Linda; Fayers, Peter; Gamper, Eva-Maria; Giesinger, Johannes M; Habets, Esther J J; Hammerlid, Eva; Helbostad, Jorunn; Hjermstad, Marianne J; Holzner, Bernhard; Johnson, Colin; Kemmler, Georg; King, Madeleine T; Kaasa, Stein; Loge, Jon H; Reijneveld, Jaap C; Singer, Susanne; Taphoorn, Martin J B; Thamsborg, Lise H; Tomaszewski, Krzysztof A; Velikova, Galina; Verdonck-de Leeuw, Irma M; Young, Teresa; Groenvold, Mogens

2018-06-21

To optimise measurement precision, relevance to patients and flexibility, patient-reported outcome measures (PROMs) should ideally be adapted to the individual patient/study while retaining direct comparability of scores across patients/studies. This is achievable using item banks and computerised adaptive tests (CATs). The European Organisation for Research and Treatment of Cancer (EORTC) Quality of Life Questionnaire Core 30 (QLQ-C30) is one of the most widely used PROMs in cancer research and clinical practice. Here we provide an overview of the research program to develop CAT versions of the QLQ-C30's 14 functional and symptom domains. The EORTC Quality of Life Group's strategy for developing CAT item banks consists of: literature search to identify potential candidate items; formulation of new items compatible with the QLQ-C30 item style; expert evaluations and patient interviews; field-testing and psychometric analyses, including factor analysis, item response theory calibration and simulation of measurement properties. In addition, software for setting up, running and scoring CAT has been developed. Across eight rounds of data collection, 9782 patients were recruited from 12 countries for the field-testing. The four phases of development resulted in a total of 260 unique items across the 14 domains. Each item bank consists of 7-34 items. Psychometric evaluations indicated higher measurement precision and increased statistical power of the CAT measures compared to the QLQ-C30 scales. Using CAT, sample size requirements may be reduced by approximately 20-35% on average without loss of power. The EORTC CAT Core represents a more precise, powerful and flexible measurement system than the QLQ-C30. It is currently being validated in a large independent, international sample of cancer patients. Copyright © 2018 Elsevier Ltd. All rights reserved.
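A CAT of the kind described above repeats two steps: select the next item from the calibrated bank, then update the score. The sketch below illustrates that loop under deliberate simplifications:

- Items are dichotomous 2PL items, whereas QLQ-C30-style items have ordered response categories.
- The next item is the one with maximum Fisher information at the current estimate.
- Scoring is an expected a posteriori (EAP) estimate with a standard normal prior.

The item bank, function names and selection rule are hypothetical. They are not the EORTC CAT software.

```python
import math


def p_endorse(theta, a, b):
    """2PL probability of endorsing an item with discrimination a and location b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def information(theta, a, b):
    """Fisher information of a 2PL item at theta: a^2 * P * (1 - P)."""
    p = p_endorse(theta, a, b)
    return a * a * p * (1.0 - p)


def eap_estimate(responses, bank, lo=-4.0, hi=4.0, n_grid=161):
    """EAP estimate of theta on a grid with an (unnormalised) N(0, 1) prior.

    `responses` maps item index -> 0/1 answer; `bank` is a list of (a, b) pairs.
    """
    num = den = 0.0
    for k in range(n_grid):
        theta = lo + k * (hi - lo) / (n_grid - 1)
        weight = math.exp(-0.5 * theta * theta)
        for index, x in responses.items():
            p = p_endorse(theta, *bank[index])
            weight *= p if x == 1 else 1.0 - p
        num += theta * weight
        den += weight
    return num / den


def next_item(theta, bank, administered):
    """Choose the not-yet-administered item with maximum information at theta."""
    candidates = [i for i in range(len(bank)) if i not in administered]
    return max(candidates, key=lambda i: information(theta, *bank[i]))


bank = [(1.0, -2.0), (1.2, 0.0), (0.8, 1.0), (1.5, 2.0)]  # hypothetical (a, b) pairs
answers = {}
theta = 0.0
for _ in range(2):
    item = next_item(theta, bank, answers)
    answers[item] = 1  # pretend the respondent endorses each administered item
    theta = eap_estimate(answers, bank)
    print(item, round(theta, 3))
```

Real CATs add a stopping rule, for example a target standard error or a maximum number of items. This is where the reported precision and sample-size gains over fixed-length scales come from.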
The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate healthcare interventions: explanation and elaboration

PubMed Central

Liberati, Alessandro; Altman, Douglas G; Tetzlaff, Jennifer; Mulrow, Cynthia; Gøtzsche, Peter C; Ioannidis, John P A; Clarke, Mike; Devereaux, P J; Kleijnen, Jos; Moher, David

2009-01-01

Systematic reviews and meta-analyses are essential to summarise evidence relating to efficacy and safety of healthcare interventions accurately and reliably. The clarity and transparency of these reports, however, are not optimal. Poor reporting of systematic reviews diminishes their value to clinicians, policy makers, and other users. Since the development of the QUOROM (quality of reporting of meta-analysis) statement—a reporting guideline published in 1999—there have been several conceptual, methodological, and practical advances regarding the conduct and reporting of systematic reviews and meta-analyses. Also, reviews of published systematic reviews have found that key information about these studies is often poorly reported. Realising these issues, an international group that included experienced authors and methodologists developed PRISMA (preferred reporting items for systematic reviews and meta-analyses) as an evolution of the original QUOROM guideline for systematic reviews and meta-analyses of evaluations of health care interventions. The PRISMA statement consists of a 27-item checklist and a four-phase flow diagram. The checklist includes items deemed essential for transparent reporting of a systematic review. In this explanation and elaboration document, we explain the meaning and rationale for each checklist item. For each item, we include an example of good reporting and, where possible, references to relevant empirical studies and methodological literature. The PRISMA statement, this document, and the associated website (www.prisma-statement.org/) should be helpful resources to improve reporting of systematic reviews and meta-analyses. PMID:19622552
Performance of the Swedish version of the Revised Piper Fatigue Scale.

PubMed

Jakobsson, Sofie; Taft, Charles; Östlund, Ulrika; Ahlberg, Karin

2013-12-01

The Revised Piper Fatigue Scale (RPFS) is one of the most widely used instruments internationally to assess cancer-related fatigue. The aim of the present study was to evaluate selected psychometric properties of a Swedish version of the RPFS (SPFS). An earlier translation of the SPFS was further evaluated and developed. The new version was mailed to 300 patients undergoing curative radiotherapy. Internal validity was assessed using Principal Axis Factor Analysis with oblimin rotation and multitrait analysis. External validity was examined in relation to the Multidimensional Fatigue Inventory-20 (MFI-20) and in known-groups analyses. In total, 196 patients (response rate = 65%) returned evaluable questionnaires. Principal axis factoring yielded three factors (74% of the variance) rather than four as in the original RPFS. Multitrait analyses confirmed the adequacy of scaling assumptions. Known-groups analyses failed to support the discriminative validity. Concurrent validity was satisfactory. The new Swedish version of the RPFS showed good acceptability, reliability and convergent and discriminant item-scale validity. Our results converge with other international versions of the RPFS in failing to support the four-dimension conceptual model of the instrument. Hence, the suitability of the RPFS for use in international comparisons may be limited, which may also have implications for the cross-cultural validity of the newly released 12-item version of the RPFS. Further research on the Swedish version should address reasons for high missing rates for certain items in the affective meaning subscale, further evaluate the discriminative validity, and assess its sensitivity in detecting changes over time. Copyright © 2013 Elsevier Ltd. All rights reserved.
The effect of computer-mediated administration on self-disclosure of problems on the addiction severity index.

PubMed

Butler, Stephen F; Villapiano, Albert; Malinow, Andrew

2009-12-01

People tend to disclose more personal information when communication is mediated through the use of a computer. This study was conducted to examine the impact of this phenomenon on the way respondents answer questions during computer-mediated, self-administration of the Addiction Severity Index (ASI) called the Addiction Severity Index-Multimedia Version® (ASI-MV®). A sample of 142 clients in substance abuse treatment was administered the ASI via an interviewer and the computerized ASI-MV®, three to five days apart in a counterbalanced order. Seven composite scores were compared between the two test administrations using paired t-tests. Post hoc analyses examined interviewer effects. Comparisons of composite scores for each of the domains between the face-to-face administered and computer-mediated, self-administered ASI revealed that significantly greater problem severity was reported by clients in five of the seven domains during administration of the computer-mediated, self-administered version compared to the trained interviewer version. Item analyses identified certain items as responsible for significant differences, especially those asking clients to rate need for treatment. All items that were significantly different between the two modes of administration revealed greater problem severity reported on the ASI-MV® as compared to the interview administered assessment. Post hoc analyses yielded significant interviewer effects on four of the five domains where differences were observed. These data support a growing literature documenting a tendency for respondents to be more self-disclosing in a computer-mediated format over a face-to-face interview. Differences in interviewer skill in establishing rapport may account for these observations.
Score Metric Equivalence of the Psychopathy Checklist-Revised (PCL-R) across Criminal Offenders in North America and the United Kingdom. A Critique of Cooke, Michie, Hart, and Clark (2005) and New Analyses

ERIC Educational Resources Information Center

Bolt, Daniel M.; Hare, Robert D.; Neumann, Craig S.

2007-01-01

David Cooke and colleagues have published a series of item response theory (IRT) studies investigating the equivalence of the Psychopathy Checklist-Revised (PCL-R) for European versus North American (NA) male criminal offenders. They have consistently concluded that PCL-R scores are not equivalent, with European offenders receiving scores up to…
