constructed response items: Topics by Science.gov

Sample records for constructed response items

A HO-IRT Based Diagnostic Assessment System with Constructed Response Items

ERIC Educational Resources Information Center

Yang, Chih-Wei; Kuo, Bor-Chen; Liao, Chen-Huei

2011-01-01

The aim of the present study was to develop an on-line assessment system with constructed response items in the context of the elementary mathematics curriculum. The system recorded the problem solving process of constructed response items and transferred the process to response codes for further analyses. An inference mechanism based on artificial…
An Evaluation of "Intentional" Weighting of Extended-Response or Constructed-Response Items in Tests with Mixed Item Types.

ERIC Educational Resources Information Center

Ito, Kyoko; Sykes, Robert C.

This study investigated the practice of weighting a type of test item, such as constructed response, more than other types of items, such as selected response, to compute student scores for a mixed-item type of test. The study used data from statewide writing field tests in grades 3, 5, and 8 and considered two contexts, that in which a single…
Item Construction and Psychometric Models Appropriate for Constructed Responses

DTIC Science & Technology

1991-08-01

which involve only one attribute per item. This is especially true when we are dealing with constructed-response items, we have to measure much more...
Applying the Nominal Response Model within a Longitudinal Framework to Construct the Positive Family Relationships Scale

ERIC Educational Resources Information Center

Preston, Kathleen Suzanne Johnson; Parral, Skye N.; Gottfried, Allen W.; Oliver, Pamella H.; Gottfried, Adele Eskeles; Ibrahim, Sirena M.; Delany, Danielle

2015-01-01

A psychometric analysis was conducted using the nominal response model under the item response theory framework to construct the Positive Family Relationships scale. Using data from the Fullerton Longitudinal Study, this scale was constructed within a long-term longitudinal framework spanning middle childhood through adolescence. Items tapping…
Dynamic Testing of Analogical Reasoning in 5- to 6-Year-Olds: Multiple-Choice versus Constructed-Response Training Items

ERIC Educational Resources Information Center

Stevenson, Claire E.; Heiser, Willem J.; Resing, Wilma C. M.

2016-01-01

Multiple-choice (MC) analogy items are often used in cognitive assessment. However, in dynamic testing, where the aim is to provide insight into potential for learning and the learning process, constructed-response (CR) items may be of benefit. This study investigated whether training with CR or MC items leads to differences in the strategy…
Measurement Properties of Two Innovative Item Formats in a Computer-Based Test

ERIC Educational Resources Information Center

Wan, Lei; Henly, George A.

2012-01-01

Many innovative item formats have been proposed over the past decade, but little empirical research has been conducted on their measurement properties. This study examines the reliability, efficiency, and construct validity of two innovative item formats--the figural response (FR) and constructed response (CR) formats used in a K-12 computerized…
Detecting Differential Item Discrimination (DID) and the Consequences of Ignoring DID in Multilevel Item Response Models

ERIC Educational Resources Information Center

Lee, Woo-yeol; Cho, Sun-Joo

2017-01-01

Cross-level invariance in a multilevel item response model can be investigated by testing whether the within-level item discriminations are equal to the between-level item discriminations. Testing the cross-level invariance assumption is important to understand constructs in multilevel data. However, in most multilevel item response model…
The Effect of the Multiple-Choice Item Format on the Measurement of Knowledge of Language Structure

ERIC Educational Resources Information Center

Currie, Michael; Chiramanee, Thanyapa

2010-01-01

Noting the widespread use of multiple-choice items in tests in English language education in Thailand, this study compared their effect against that of constructed-response items. One hundred and fifty-two university undergraduates took a test of English structure first in constructed-response format, and later in three, stem-equivalent…
A Quasi-Parametric Method for Fitting Flexible Item Response Functions

ERIC Educational Resources Information Center

Liang, Longjuan; Browne, Michael W.

2015-01-01

If standard two-parameter item response functions are employed in the analysis of a test with some newly constructed items, it can be expected that, for some items, the item response function (IRF) will not fit the data well. This lack of fit can also occur when standard IRFs are fitted to personality or psychopathology items. When investigating…
Developing a Machine-Supported Coding System for Constructed-Response Items in PISA. Research Report. ETS RR-17-47

ERIC Educational Resources Information Center

Yamamoto, Kentaro; He, Qiwei; Shin, Hyo Jeong; von Davier, Mattias

2017-01-01

Approximately a third of the Programme for International Student Assessment (PISA) items in the core domains (math, reading, and science) are constructed-response items and require human coding (scoring). This process is time-consuming, expensive, and prone to error as often (a) humans code inconsistently, and (b) coding reliability in…
Identification of high school students' ability level of constructing free body diagrams to solve restricted and structured response items in force matter

NASA Astrophysics Data System (ADS)

Rahmaniar, Andinisa; Rusnayati, Heni; Sutiadi, Asep

2017-05-01

Solving physics problems, particularly on the topic of force, requires the ability to construct free body diagrams, which help students analyse every force acting on an object, the length of its vector, and the name of each force. A mixed-methods design was used to describe the results, with no special treatment given to participants. The participants were 35 first-grade high school students. The purpose of this study is to identify students' ability level in constructing free body diagrams when solving restricted and structured response items. Based on the two types of test, every student was classified into one of four levels of ability in constructing free body diagrams, each with different characteristics, and some students were interviewed while solving the test in order to learn how they solved the problems. The results showed that, on restricted response items, about 34.86% were at the no evidence level, 24.11% at the inadequate level, 29.14% at the needs improvement level, and 4.0% at the adequate level. On structured response items, about 16.59% were at the no evidence level, 23.99% at the inadequate level, 36% at the needs improvement level, and 13.71% at the adequate level. The researchers found that students who constructed free body diagrams first, and constructed them correctly, were more successful in solving restricted and structured response items.
Construct Validity Evidence for Single-Response Items to Estimate Physical Activity Levels in Large Sample Studies

ERIC Educational Resources Information Center

Jackson, Allen W.; Morrow, James R., Jr.; Bowles, Heather R.; FitzGerald, Shannon J.; Blair, Steven N.

2007-01-01

Valid measurement of physical activity is important for studying the risks for morbidity and mortality. The purpose of this study was to examine evidence of construct validity of two similar single-response items assessing physical activity via self-report. Both items are based on the stages of change model. The sample was 687 participants (men =…
Missouri Assessment Program (MAP), Spring 2000: Elementary Health/Physical Education, Released Items, Grade 5.

ERIC Educational Resources Information Center

Missouri State Dept. of Elementary and Secondary Education, Jefferson City.

This document presents 10 released items from the Health/Physical Education Missouri Assessment Program (MAP) test given in the spring of 2000 to fifth graders. Items from the test sessions include: selected-response (multiple choice), constructed-response, and a performance event. The selected-response items consist of individual questions…
Developing a Strategy for Using Technology-Enhanced Items in Large-Scale Standardized Tests

ERIC Educational Resources Information Center

Bryant, William

2017-01-01

As large-scale standardized tests move from paper-based to computer-based delivery, opportunities arise for test developers to make use of items beyond traditional selected and constructed response types. Technology-enhanced items (TEIs) have the potential to provide advantages over conventional items, including broadening construct measurement,…
Evaluation of the Psychometric Properties of the Asian Adolescent Depression Scale and Construction of a Short Form: An Item Response Theory Analysis.

PubMed

Lo, Barbara Chuen Yee; Zhao, Yue; Kwok, Alice Wai Yee; Chan, Wai; Chan, Calais Kin Yuen

2017-07-01

The present study applied item response theory to examine the psychometric properties of the Asian Adolescent Depression Scale and to construct a short form among 1,084 teenagers recruited from secondary schools in Hong Kong. Findings suggested that some items of the full form reflected higher levels of severity and were more discriminating than others, and the Asian Adolescent Depression Scale was useful in measuring a broad range of depressive severity in community youths. Differential item functioning emerged in several items where females reported higher depressive severity than males. In the short form construction, preliminary validation suggested that, relative to the 20-item full form, our derived short form offered significantly greater diagnostic performance and stronger discriminatory ability in differentiating depressed and nondepressed groups, and simultaneously maintained adequate measurement precision with a reduced response burden in assessing depression in the Asian adolescents. Cultural variance in depressive symptomatology and clinical implications are discussed.
Missouri Assessment Program (MAP), Spring 2000: High School Health/Physical Education, Released Items, Grade 9.

ERIC Educational Resources Information Center

Missouri State Dept. of Elementary and Secondary Education, Jefferson City.

This document presents 10 released items from the Health/Physical Education Missouri Assessment Program (MAP) test given in the spring of 2000 to ninth graders. Items from the test sessions include: selected-response (multiple choice), constructed-response, and a performance event. The selected-response items consist of individual questions…
Validation of Automated Scoring of Science Assessments

ERIC Educational Resources Information Center

Liu, Ou Lydia; Rios, Joseph A.; Heilman, Michael; Gerard, Libby; Linn, Marcia C.

2016-01-01

Constructed response items can both measure the coherence of student ideas and serve as reflective experiences to strengthen instruction. We report on new automated scoring technologies that can reduce the cost and complexity of scoring constructed-response items. This study explored the accuracy of c-rater-ML, an automated scoring engine…
Constructed-Response Problems

ERIC Educational Resources Information Center

Swinford, Ashleigh

2016-01-01

With rigor outlined in state and Common Core standards and the addition of constructed-response test items to most state tests, math constructed-response questions have become increasingly popular in today's classroom. Although constructed-response problems can present a challenge for students, they do offer a glimpse of students' learning through…
Firestar-"D": Computerized Adaptive Testing Simulation Program for Dichotomous Item Response Theory Models

ERIC Educational Resources Information Center

Choi, Seung W.; Podrabsky, Tracy; McKinney, Natalie

2012-01-01

Computerized adaptive testing (CAT) enables efficient and flexible measurement of latent constructs. The majority of educational and cognitive measurement constructs are based on dichotomous item response theory (IRT) models. An integral part of developing various components of a CAT system is conducting simulations using both known and empirical…
Developing and Evaluating a Machine-Scorable, Constrained Constructed-Response Item.

ERIC Educational Resources Information Center

Braun, Henry I.; And Others

The use of constructed response items in large scale standardized testing has been hampered by the costs and difficulties associated with obtaining reliable scores. The advent of expert systems may signal the eventual removal of this impediment. This study investigated the accuracy with which expert systems could score a new, non-multiple choice…

Single- versus Double-Scoring of Trend Responses in Trend Score Equating with Constructed-Response Tests. Research Report. ETS RR-10-12

ERIC Educational Resources Information Center

Tan, Xuan; Ricker, Kathryn L.; Puhan, Gautam

2010-01-01

This study examines the differences in equating outcomes between two trend score equating designs resulting from two different scoring strategies for trend scoring when operational constructed-response (CR) items are double-scored--the single group (SG) design, where each trend CR item is double-scored, and the nonequivalent groups with anchor…
Practical methods for dealing with 'not applicable' item responses in the AMC Linear Disability Score project

PubMed Central

Holman, Rebecca; Glas, Cees AW; Lindeboom, Robert; Zwinderman, Aeilko H; de Haan, Rob J

2004-01-01

Background: Whenever questionnaires are used to collect data on constructs, such as functional status or health related quality of life, it is unlikely that all respondents will respond to all items. This paper examines ways of dealing with responses in a 'not applicable' category to items included in the AMC Linear Disability Score (ALDS) project item bank. Methods: The data examined in this paper come from the responses of 392 respondents to 32 items and form part of the calibration sample for the ALDS item bank. The data are analysed using the one-parameter logistic item response theory model. The four practical strategies for dealing with this type of response are: cold deck imputation; hot deck imputation; treating the missing responses as if these items had never been offered to those individual patients; and using a model which takes account of the 'tendency to respond to items'. Results: The item and respondent population parameter estimates were very similar for the strategies involving hot deck imputation; treating the missing responses as if these items had never been offered to those individual patients; and using a model which takes account of the 'tendency to respond to items'. The estimates obtained using the cold deck imputation method were substantially different. Conclusions: The cold deck imputation method was not considered suitable for use in the ALDS item bank. The other three methods described can be usefully implemented in the ALDS item bank, depending on the purpose of the data analysis to be carried out. These three methods may be useful for other data sets examining similar constructs, when item response theory based methods are used. PMID:15200681
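The two imputation strategies named in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the ALDS project's procedure: the `NA` marker, the function names, the toy data and the all-zero `reference` values are hypothetical. Cold deck imputation is taken here to mean filling a gap with a fixed value from an external source, and hot deck imputation to mean copying the answer of a randomly chosen respondent in the same data set who did answer that item.

```python
import random

NA = None  # assumed marker for a 'not applicable' response


def cold_deck(responses, reference):
    """Replace each NA with a fixed per-item value from an external source."""
    return [
        [reference[j] if x is NA else x for j, x in enumerate(row)]
        for row in responses
    ]


def hot_deck(responses, rng):
    """Replace each NA with the answer of a random donor from the same data."""
    n_items = len(responses[0])
    # For every item, collect the observed answers that can serve as donors.
    donors = [
        [row[j] for row in responses if row[j] is not NA]
        for j in range(n_items)
    ]
    return [
        [
            # An NA stays in place only if nobody answered that item.
            rng.choice(donors[j]) if x is NA and donors[j] else x
            for j, x in enumerate(row)
        ]
        for row in responses
    ]


# Hypothetical data: 4 respondents, 3 dichotomous items.
data = [
    [1, 0, NA],
    [1, NA, 0],
    [0, 0, 1],
    [NA, 1, 1],
]
cold = cold_deck(data, reference=[0, 0, 0])
hot = hot_deck(data, random.Random(0))
```

Observed answers are never altered by either function; only the `NA` cells differ between `cold` and `hot`.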
Applying Item Response Theory methods to design a learning progression-based science assessment

NASA Astrophysics Data System (ADS)

Chen, Jing

Learning progressions are used to describe how students' understanding of a topic progresses over time and to classify the progress of students into steps or levels. This study applies Item Response Theory (IRT) based methods to investigate how to design learning progression-based science assessments. The research questions of this study are: (1) how to use items in different formats to classify students into levels on the learning progression, (2) how to design a test to give good information about students' progress through the learning progression of a particular construct and (3) what characteristics of test items support their use for assessing students' levels. Data used for this study were collected from 1500 elementary and secondary school students during 2009--2010. The written assessment was developed in several formats, such as Constructed Response (CR), Ordered Multiple Choice (OMC) and Multiple True or False (MTF) items. The following are the main findings from this study. The OMC, MTF and CR items might measure different components of the construct. A single construct explained most of the variance in students' performances. However, additional dimensions in terms of item format can explain a certain amount of the variance in student performance. So additional dimensions need to be considered when we want to capture the differences in students' performances on different types of items targeting the understanding of the same underlying progression. Items in each item format need to be improved in certain ways to classify students more accurately into the learning progression levels. This study establishes some general steps that can be followed to design other learning progression-based tests as well. For example, first, the boundaries between levels on the IRT scale can be defined by using the means of the item thresholds across a set of good items. Second, items in multiple formats can be selected to achieve the information criterion at all the defined boundaries. This ensures the accuracy of the classification. Third, when item threshold parameters vary a bit, the scoring rubrics and the items need to be reviewed to make the threshold parameters similar across items. This is because one important design criterion of the learning progression-based items is that ideally, a student should be at the same level across items, which means that the item threshold parameters (d1, d2 and d3) should be similar across items. To design a learning progression-based science assessment, we need to understand whether the assessment measures a single construct or several constructs and how items are associated with the constructs being measured. Results from dimension analyses indicate that items of different carbon transforming processes measure different aspects of the carbon cycle construct. However, items of different practices assess the same construct. In general, there are high correlations among different processes or practices. It is not clear whether the strong correlations are due to the inherent links among these process/practice dimensions or due to the fact that the student sample does not show much variation in these process/practice dimensions. Future data are needed to examine the dimensionalities in terms of process/practice in detail. Finally, based on item characteristics analysis, recommendations are made to write more discriminative CR items and better OMC and MTF options. Item writers can follow these recommendations to write better learning progression-based items.
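The first design step in this abstract, defining level boundaries as the means of the item thresholds, might look something like the sketch below. The threshold values, the function names and the rule that a student sitting exactly on a boundary is placed in the higher level are all assumptions made for illustration; the abstract does not give them.

```python
from bisect import bisect_right
from statistics import mean


def level_boundaries(item_thresholds):
    """Average each threshold (d1, d2, d3) across items to get three cut points."""
    return [mean(column) for column in zip(*item_thresholds)]


def classify(theta, boundaries):
    """Map an ability estimate to a level 1..4, given ascending boundaries."""
    return bisect_right(boundaries, theta) + 1


# Hypothetical thresholds (d1, d2, d3) for three items on the IRT scale.
thresholds = [
    (-1.0, 0.0, 1.0),
    (-1.2, 0.2, 1.4),
    (-0.8, -0.2, 0.6),
]
cuts = level_boundaries(thresholds)
levels = [classify(theta, cuts) for theta in (-2.0, -0.5, 0.5, 1.5)]
```

The lookup assumes the averaged thresholds come out in ascending order, which is the ordinary case when d1 < d2 < d3 holds for each item.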
An Extension of IRT-Based Equating to the Dichotomous Testlet Response Theory Model

ERIC Educational Resources Information Center

Tao, Wei; Cao, Yi

2016-01-01

Current procedures for equating number-correct scores using traditional item response theory (IRT) methods assume local independence. However, when tests are constructed using testlets, one concern is the violation of the local item independence assumption. The testlet response theory (TRT) model is one way to accommodate local item dependence.…
Measuring Constructs in Family Science: How Can Item Response Theory Improve Precision and Validity?

PubMed Central

Gordon, Rachel A.

2014-01-01

This article provides family scientists with an understanding of contemporary measurement perspectives and the ways in which item response theory (IRT) can be used to develop measures with desired evidence of precision and validity for research uses. The article offers a nontechnical introduction to some key features of IRT, including its orientation toward locating items along an underlying dimension and toward estimating precision of measurement for persons with different levels of that same construct. It also offers a didactic example of how the approach can be used to refine conceptualization and operationalization of constructs in the family sciences, using data from the National Longitudinal Survey of Youth 1979 (n = 2,732). Three basic models are considered: (a) the Rasch and (b) two-parameter logistic models for dichotomous items and (c) the Rating Scale Model for multicategory items. Throughout, the author highlights the potential for researchers to elevate measurement to a level on par with theorizing and testing about relationships among constructs. PMID:25663714
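As a rough illustration of the two dichotomous models named in this abstract, the sketch below uses the standard logistic forms: the two-parameter logistic model gives the probability of endorsing an item as 1 / (1 + exp(-a(theta - b))), and the Rasch model is treated here as the special case with discrimination a = 1. The function names and example values are hypothetical, and the Rating Scale Model is not covered.

```python
import math


def p_2pl(theta, b, a=1.0):
    """Probability of a positive response under the two-parameter logistic model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def p_rasch(theta, b):
    """Rasch model, written as the 2PL with discrimination fixed at 1."""
    return p_2pl(theta, b, a=1.0)


def item_information(theta, b, a=1.0):
    """Fisher information of one item at ability theta: a^2 * P * (1 - P)."""
    p = p_2pl(theta, b, a)
    return a * a * p * (1.0 - p)


def standard_error(theta, items):
    """Standard error of measurement at theta for a list of (a, b) items."""
    total = sum(item_information(theta, b, a) for a, b in items)
    return 1.0 / math.sqrt(total)


# Hypothetical three-item scale, each item given as (a, b).
scale = [(1.0, -1.0), (1.5, 0.0), (2.0, 1.0)]
se_low = standard_error(-3.0, scale)
se_mid = standard_error(0.0, scale)
```

Comparing `se_low` with `se_mid` shows the point about precision varying by person level: a scale measures most precisely where its items are located.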
Developing Form Assembly Specifications for Exams with Multiple Choice and Constructed Response Items: Balancing Reliability and Validity Concerns

ERIC Educational Resources Information Center

Hendrickson, Amy; Patterson, Brian; Ewing, Maureen

2010-01-01

The psychometric considerations and challenges associated with including constructed response items on tests are discussed along with how these issues affect the form assembly specifications for mixed-format exams. Reliability and validity, security and fairness, pretesting, content and skills coverage, test length and timing, weights, statistical…
Fixed or mixed: a comparison of three, four and mixed-option multiple-choice tests in a Fetal Surveillance Education Program

PubMed Central

2013-01-01

Background: Despite the widespread use of multiple-choice assessments in medical education assessment, current practice and published advice concerning the number of response options remains equivocal. This article describes an empirical study contrasting the quality of three 60 item multiple-choice test forms within the Royal Australian and New Zealand College of Obstetricians and Gynaecologists (RANZCOG) Fetal Surveillance Education Program (FSEP). The three forms are described below. Methods: The first form featured four response options per item. The second form featured three response options, having removed the least functioning option from each item in the four-option counterpart. The third test form was constructed by retaining the best performing version of each item from the first two test forms. It contained both three and four option items. Results: Psychometric and educational factors were taken into account in formulating an approach to test construction for the FSEP. The four-option test performed better than the three-option test overall, but some items were improved by the removal of options. The mixed-option test demonstrated better measurement properties than the fixed-option tests, and has become the preferred test format in the FSEP program. The criteria used were reliability, errors of measurement and fit to the item response model. Conclusions: The position taken is that decisions about the number of response options be made at the item level, with plausible options being added to complete each item on both psychometric and educational grounds rather than complying with a uniform policy. The point is to construct the better performing item in providing the best psychometric and educational information. PMID:23453056
Fixed or mixed: a comparison of three, four and mixed-option multiple-choice tests in a Fetal Surveillance Education Program.

PubMed

Zoanetti, Nathan; Beaves, Mark; Griffin, Patrick; Wallace, Euan M

2013-03-04

Despite the widespread use of multiple-choice assessments in medical education assessment, current practice and published advice concerning the number of response options remains equivocal. This article describes an empirical study contrasting the quality of three 60 item multiple-choice test forms within the Royal Australian and New Zealand College of Obstetricians and Gynaecologists (RANZCOG) Fetal Surveillance Education Program (FSEP). The three forms are described below. The first form featured four response options per item. The second form featured three response options, having removed the least functioning option from each item in the four-option counterpart. The third test form was constructed by retaining the best performing version of each item from the first two test forms. It contained both three and four option items. Psychometric and educational factors were taken into account in formulating an approach to test construction for the FSEP. The four-option test performed better than the three-option test overall, but some items were improved by the removal of options. The mixed-option test demonstrated better measurement properties than the fixed-option tests, and has become the preferred test format in the FSEP program. The criteria used were reliability, errors of measurement and fit to the item response model. The position taken is that decisions about the number of response options be made at the item level, with plausible options being added to complete each item on both psychometric and educational grounds rather than complying with a uniform policy. The point is to construct the better performing item in providing the best psychometric and educational information.
Assessing Construct Validity Using Multidimensional Item Response Theory.

ERIC Educational Resources Information Center

Ackerman, Terry A.

The concept of a user-specified validity sector is discussed. The idea of the validity sector combines the work of M. D. Reckase (1986) and R. Shealy and W. Stout (1991). Reckase developed a methodology to represent an item in a multidimensional latent space as a vector. Item vectors are computed using multidimensional item response theory item…
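Representing an item as a vector in a multidimensional latent space is commonly done with the quantities sketched below: the vector's length is the multidimensional discrimination, sqrt(sum of a_k squared), its direction is given by the angles whose cosines are a_k divided by that length, and its distance from the origin is -d divided by that length. The parameterisation (slopes `a` and intercept `d` in a compensatory model with logit a·theta + d) and all names are assumptions; the truncated abstract does not state which form was used.

```python
import math


def item_vector(a, d):
    """Return (length, distance, angles in degrees) for one item's vector.

    a: discrimination parameters, one per latent dimension (assumed).
    d: intercept of the item in the logit a . theta + d (assumed).
    """
    mdisc = math.sqrt(sum(ak * ak for ak in a))   # multidimensional discrimination
    mdiff = -d / mdisc                            # multidimensional difficulty
    angles = [math.degrees(math.acos(ak / mdisc)) for ak in a]
    return mdisc, mdiff, angles


# Hypothetical two-dimensional item.
mdisc, mdiff, angles = item_vector(a=(3.0, 4.0), d=-5.0)
```

An item whose angle with the first axis is small measures mainly the first dimension; a validity sector, as the abstract uses the term, would then be a user-chosen range of such angles.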
Evaluation of Internal Construct Validity and Unidimensionality of the Brachial Assessment Tool, A Patient-Reported Outcome Measure for Brachial Plexus Injury.

PubMed

Hill, Bridget; Pallant, Julie; Williams, Gavin; Olver, John; Ferris, Scott; Bialocerkowski, Andrea

2016-12-01

To evaluate the internal construct validity and dimensionality of a new patient-reported outcome measure for people with traumatic brachial plexus injury (BPI) based on the International Classification of Functioning, Disability and Health definition of activity. Cross-sectional study. Outpatient clinics. Adults (age range, 18-82y) with a traumatic BPI (N=106). There were 106 people with BPI who completed a 51-item 5-response questionnaire. Responses were analyzed in 4 phases (missing responses, item correlations, exploratory factor analysis, and Rasch analysis) to evaluate the properties of fit to the Rasch model, threshold response, local dependency, dimensionality, differential item functioning, and targeting. Not applicable, as this study addresses the development of an outcome measure. Six items were deleted for missing responses, and 10 were deleted for high interitem correlations >.81. The remaining 35 items, while demonstrating fit to the Rasch model, showed evidence of local dependency and multidimensionality. Items were divided into 3 subscales: dressing and grooming (8 items), arm and hand (17 items), and no hand (6 items). All 3 subscales demonstrated fit to the model with no local dependency, minimal disordered thresholds, no unidimensionality or differential item functioning for age, time postinjury, or self-selected dominance. Subscales were combined into 3 subtests and demonstrated fit to the model, no misfit, and unidimensionality, allowing calculation of a summary score. This preliminary analysis supports the internal construct validity of the Brachial Assessment Tool, a unidimensional targeted 4-response patient-reported outcome measure designed to solely assess activity after traumatic BPI regardless of level of injury, age at recruitment, premorbid limb dominance, and time postinjury. Further examination is required to determine test-retest reliability and responsiveness. Copyright © 2016 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
The Construction of a Long Variable of Conceptual Development in Social Education.

ERIC Educational Resources Information Center

Doig, Brian

This paper demonstrates a method for constructing long variables using items that elicit partially correct responses across ages. Long variables may be defined by students at different ages (year levels) attempting common items within a test containing other items considered to be appropriate for each age or year level. A developmental model of…
Evaluating and Refining the Construct of Sexual Quality With Item Response Theory: Development of the Quality of Sex Inventory.

PubMed

Shaw, Amanda M; Rogge, Ronald D

2016-02-01

This study took a critical look at the construct of sexual quality. The 65 items of four well-validated self-report measures of sexual satisfaction (the Index of Sexual Satisfaction [ISS], Hudson, Harrison, & Crosscup, 1981; the Global Measure of Sexual Satisfaction [GMSEX], Lawrance & Byers, 1995; the Pinney Sexual Satisfaction Inventory [PSSI], Pinney, Gerrard, & Denney, 1987; the Young Sexual Satisfaction Scale [YSSS], Young, Denny, Luquis, & Young, 1998) and an additional 74 potential sexual quality items were given to 3060 online participants. Using Item Response Theory (IRT), we demonstrated that the ISS, YSSS, and PSSI scales provided suboptimal levels of precision in assessing sexual quality, particularly given the length of those scales. Exploratory factor analyses, IRT, differential item functioning analyses, and longitudinal responsiveness analyses were used to develop and evaluate the Quality of Sex Inventory. Results suggested that, in comparison to existing scales, the QSI (1) offers investigators and clinicians more theoretically focused scales, (2) distinguishes sexual satisfaction from sexual dissatisfaction, and (3) offers greater precision and power for detecting differences with (4) comparably high levels of responsiveness for detecting change over time despite being notably shorter than most of the existing scales. The QSI-satisfaction subscales demonstrated strong convergent validity with other measures of sexual satisfaction and excellent construct validity with anchor scales from the nomological net surrounding that construct, suggesting that they continue to assess the same theoretical construct as prior scales. Implications for research are discussed.
Extreme Response Style: Which Model Is Best?

ERIC Educational Resources Information Center

Leventhal, Brian

2017-01-01

More robust and rigorous psychometric models, such as multidimensional Item Response Theory models, have been advocated for survey applications. However, item responses may be influenced by construct-irrelevant variance factors such as preferences for extreme response options. Through empirical and simulation methods, this study evaluates the use…
Calibration and Validation of the Dutch-Flemish PROMIS Pain Interference Item Bank in Patients with Chronic Pain

PubMed Central

Crins, Martine H. P.; Roorda, Leo D.; Smits, Niels; de Vet, Henrica C. W.; Westhovens, Rene; Cella, David; Cook, Karon F.; Revicki, Dennis; van Leeuwen, Jaap; Boers, Maarten; Dekker, Joost; Terwee, Caroline B.

2015-01-01

The Dutch-Flemish PROMIS Group translated the adult PROMIS Pain Interference item bank into Dutch-Flemish. The aims of the current study were to calibrate the parameters of these items using an item response theory (IRT) model, to evaluate the cross-cultural validity of the Dutch-Flemish translations compared to the original English items, and to evaluate their reliability and construct validity. The 40 items in the bank were completed by 1085 Dutch chronic pain patients. Before calibrating the items, IRT model assumptions were evaluated using confirmatory factor analysis (CFA). Items were calibrated using the graded response model (GRM), an IRT model appropriate for items with more than two response options. To evaluate cross-cultural validity, differential item functioning (DIF) for language (Dutch vs. English) was examined. Reliability was evaluated based on standard errors and Cronbach’s alpha. To evaluate construct validity correlations with scores on legacy instruments (e.g., the Disabilities of the Arm, Shoulder and Hand Questionnaire) were calculated. Unidimensionality of the Dutch-Flemish PROMIS Pain Interference item bank was supported by CFA tests of model fit (CFI = 0.986, TLI = 0.986). Furthermore, the data fit the GRM and showed good coverage across the pain interference continuum (threshold-parameters range: -3.04 to 3.44). The Dutch-Flemish PROMIS Pain Interference item bank has good cross-cultural validity (only two out of 40 items showing DIF), good reliability (Cronbach’s alpha = 0.98), and good construct validity (Pearson correlations between 0.62 and 0.75). A computer adaptive test (CAT) and Dutch-Flemish PROMIS short forms of the Dutch-Flemish PROMIS Pain Interference item bank can now be developed. PMID:26214178
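Two of the quantities in this abstract can be illustrated with short sketches. The first function gives category probabilities under the graded response model in its usual logistic form, where the probability of responding in category k or higher is 1 / (1 + exp(-a(theta - b_k))) and a category's probability is the difference between adjacent cumulative probabilities. The second computes Cronbach's alpha from a respondents-by-items score table. The parameter values and toy scores are hypothetical and are not taken from the study.

```python
import math
from statistics import variance


def grm_category_probs(theta, a, thresholds):
    """Category probabilities for one item; thresholds must be ascending."""
    cumulative = (
        [1.0]
        + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
        + [0.0]
    )
    # P(X = k) = P(X >= k) - P(X >= k + 1)
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]


def cronbach_alpha(scores):
    """Cronbach's alpha for rows of respondents and columns of items."""
    k = len(scores[0])
    item_variances = [variance([row[j] for row in scores]) for j in range(k)]
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1.0 - sum(item_variances) / total_variance)


# Hypothetical three-category item and a toy two-item score table.
probs = grm_category_probs(theta=0.0, a=1.0, thresholds=[-1.0, 1.0])
alpha = cronbach_alpha([[1, 1], [2, 2], [3, 3]])
```

In the toy table the two items are perfectly consistent, so alpha comes out at its maximum of 1; real item banks such as the one described would be evaluated on the full response data.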
Calibration and Validation of the Dutch-Flemish PROMIS Pain Interference Item Bank in Patients with Chronic Pain.

PubMed

Crins, Martine H P; Roorda, Leo D; Smits, Niels; de Vet, Henrica C W; Westhovens, Rene; Cella, David; Cook, Karon F; Revicki, Dennis; van Leeuwen, Jaap; Boers, Maarten; Dekker, Joost; Terwee, Caroline B

2015-01-01

The Dutch-Flemish PROMIS Group translated the adult PROMIS Pain Interference item bank into Dutch-Flemish. The aims of the current study were to calibrate the parameters of these items using an item response theory (IRT) model, to evaluate the cross-cultural validity of the Dutch-Flemish translations compared to the original English items, and to evaluate their reliability and construct validity. The 40 items in the bank were completed by 1085 Dutch chronic pain patients. Before calibrating the items, IRT model assumptions were evaluated using confirmatory factor analysis (CFA). Items were calibrated using the graded response model (GRM), an IRT model appropriate for items with more than two response options. To evaluate cross-cultural validity, differential item functioning (DIF) for language (Dutch vs. English) was examined. Reliability was evaluated based on standard errors and Cronbach's alpha. To evaluate construct validity correlations with scores on legacy instruments (e.g., the Disabilities of the Arm, Shoulder and Hand Questionnaire) were calculated. Unidimensionality of the Dutch-Flemish PROMIS Pain Interference item bank was supported by CFA tests of model fit (CFI = 0.986, TLI = 0.986). Furthermore, the data fit the GRM and showed good coverage across the pain interference continuum (threshold-parameters range: -3.04 to 3.44). The Dutch-Flemish PROMIS Pain Interference item bank has good cross-cultural validity (only two out of 40 items showing DIF), good reliability (Cronbach's alpha = 0.98), and good construct validity (Pearson correlations between 0.62 and 0.75). A computer adaptive test (CAT) and Dutch-Flemish PROMIS short forms of the Dutch-Flemish PROMIS Pain Interference item bank can now be developed.
Discriminant content validity: a quantitative methodology for assessing content of theory-based measures, with illustrative applications.

PubMed

Johnston, Marie; Dixon, Diane; Hart, Jo; Glidewell, Liz; Schröder, Carin; Pollard, Beth

2014-05-01

In studies involving theoretical constructs, it is important that measures have good content validity and that there is not contamination of measures by content from other constructs. While reliability and construct validity are routinely reported, to date, there has not been a satisfactory, transparent, and systematic method of assessing and reporting content validity. In this paper, we describe a methodology of discriminant content validity (DCV) and illustrate its application in three studies. Discriminant content validity involves six steps: construct definition, item selection, judge identification, judgement format, single-sample test of content validity, and assessment of discriminant items. In three studies, these steps were applied to a measure of illness perceptions (IPQ-R) and control cognitions. The IPQ-R performed well with most items being purely related to their target construct, although timeline and consequences had small problems. By contrast, the study of control cognitions identified problems in measuring constructs independently. In the final study, direct estimation response formats for theory of planned behaviour constructs were found to have as good DCV as Likert format. The DCV method allowed quantitative assessment of each item and can therefore inform the content validity of the measures assessed. The methods can be applied to assess content validity before or after collecting data to select the appropriate items to measure theoretical constructs. Further, the data reported for each item in Appendix S1 can be used in item or measure selection. Statement of contribution What is already known on this subject? There are agreed methods of assessing and reporting construct validity of measures of theoretical constructs, but not their content validity. Content validity is rarely reported in a systematic and transparent manner. What does this study add? 
The paper proposes discriminant content validity (DCV), a systematic and transparent method of assessing and reporting whether items assess the intended theoretical construct and only that construct. In three studies, DCV was applied to measures of illness perceptions, control cognitions, and theory of planned behaviour response formats. Appendix S1 gives content validity indices for each item of each questionnaire investigated. Discriminant content validity is ideally applied while the measure is being developed, before using to measure the construct(s), but can also be applied after using a measure. © 2014 The British Psychological Society.
Effect of response format on cognitive reflection: Validating a two- and four-option multiple choice question version of the Cognitive Reflection Test.

PubMed

Sirota, Miroslav; Juanchich, Marie

2018-03-27

The Cognitive Reflection Test, measuring intuition inhibition and cognitive reflection, has become extremely popular because it reliably predicts reasoning performance, decision-making, and beliefs. Across studies, the response format of CRT items sometimes differs, based on the assumed construct equivalence of tests with open-ended versus multiple-choice items (the equivalence hypothesis). Evidence and theoretical reasons, however, suggest that the cognitive processes measured by these response formats and their associated performances might differ (the nonequivalence hypothesis). We tested the two hypotheses experimentally by assessing the performance in tests with different response formats and by comparing their predictive and construct validity. In a between-subjects experiment (n = 452), participants answered stem-equivalent CRT items in an open-ended, a two-option, or a four-option response format and then completed tasks on belief bias, denominator neglect, and paranormal beliefs (benchmark indicators of predictive validity), as well as on actively open-minded thinking and numeracy (benchmark indicators of construct validity). We found no significant differences between the three response formats in the numbers of correct responses, the numbers of intuitive responses (with the exception of the two-option version, which had a higher number than the other tests), and the correlational patterns of the indicators of predictive and construct validity. All three test versions were similarly reliable, but the multiple-choice formats were completed more quickly. We speculate that the specific nature of the CRT items helps build construct equivalence among the different response formats. We recommend using the validated multiple-choice version of the CRT presented here, particularly the four-option CRT, for practical and methodological reasons. Supplementary materials and data are available at https://osf.io/mzhyc/ .
What Do You Think You Are Measuring? A Mixed-Methods Procedure for Assessing the Content Validity of Test Items and Theory-Based Scaling

PubMed Central

Koller, Ingrid; Levenson, Michael R.; Glück, Judith

2017-01-01

The valid measurement of latent constructs is crucial for psychological research. Here, we present a mixed-methods procedure for improving the precision of construct definitions, determining the content validity of items, evaluating the representativeness of items for the target construct, generating test items, and analyzing items on a theoretical basis. To illustrate the mixed-methods content-scaling-structure (CSS) procedure, we analyze the Adult Self-Transcendence Inventory, a self-report measure of wisdom (ASTI, Levenson et al., 2005). A content-validity analysis of the ASTI items was used as the basis of psychometric analyses using multidimensional item response models (N = 1215). We found that the new procedure produced important suggestions concerning five subdimensions of the ASTI that were not identifiable using exploratory methods. The study shows that the application of the suggested procedure leads to a deeper understanding of latent constructs. It also demonstrates the advantages of theory-based item analysis. PMID:28270777
Using a MaxEnt Classifier for the Automatic Content Scoring of Free-Text Responses

NASA Astrophysics Data System (ADS)

Sukkarieh, Jana Z.

2011-03-01

Criticisms against multiple-choice item assessments in the USA have prompted researchers and organizations to move towards constructed-response (free-text) items. Constructed-response (CR) items pose many challenges to the education community, one of which is that they are expensive to score by humans. At the same time, there has been widespread movement towards computer-based assessment and hence, assessment organizations are competing to develop automatic content scoring engines for such item types, which we view as a textual entailment task. This paper describes how MaxEnt Modeling is used to help solve the task. MaxEnt has been used in many natural language tasks but this is the first application of the MaxEnt approach to textual entailment and automatic content scoring.
Optimal Item Selection with Credentialing Examinations.

ERIC Educational Resources Information Center

Hambleton, Ronald K.; And Others

The study compared two promising item response theory (IRT) item-selection methods, optimal and content-optimal, with two non-IRT item selection methods, random and classical, for use in fixed-length certification exams. The four methods were used to construct 20-item exams from a pool of approximately 250 items taken from a 1985 certification…

Development and Standardization of the Diagnostic Adaptive Behavior Scale: Application of Item Response Theory to the Assessment of Adaptive Behavior

ERIC Educational Resources Information Center

Tassé, Marc J.; Schalock, Robert L.; Thissen, David; Balboni, Giulia; Bersani, Henry, Jr.; Borthwick-Duffy, Sharon A.; Spreat, Scott; Widaman, Keith F.; Zhang, Dalun; Navas, Patricia

2016-01-01

The Diagnostic Adaptive Behavior Scale (DABS) was developed using item response theory (IRT) methods and was constructed to provide the most precise and valid adaptive behavior information at or near the cutoff point of making a decision regarding a diagnosis of intellectual disability. The DABS initial item pool consisted of 260 items. Using IRT…
Development and Application of Methods for Estimating Operating Characteristics of Discrete Test Item Responses without Assuming any Mathematical Form.

ERIC Educational Resources Information Center

Samejima, Fumiko

In latent trait theory the latent space, or space of the hypothetical construct, is usually represented by some unidimensional or multi-dimensional continuum of real numbers. Like the latent space, the item response can either be treated as a discrete variable or as a continuous variable. Latent trait theory relates the item response to the latent…
Perceived freedom-responsibility covariation among Cypriot adolescents.

PubMed

Frangou, Georgia; Wilkerson, Keith; McGahan, Joseph R

2008-04-01

Participants were 67 Cypriot adolescents who responded to propositions regarding positive, negative, and noncontingent relations between freedom and responsibility. The authors framed items so that half dealt with freedom given responsibility, and the other half dealt with responsibility given freedom. Results indicated participants were more likely to endorse positive-contingency items than they were negative and noncontingency items when items were framed around freedom given responsibility. However, when items were framed around responsibility given freedom, no such differences emerged. The authors discuss results relative to cultural and sociopolitical differences and similarities between children in Cyprus and participants in the United States and implications concerning the present study and previous studies regarding these constructs.
A signal detection-item response theory model for evaluating neuropsychological measures.

PubMed

Thomas, Michael L; Brown, Gregory G; Gur, Ruben C; Moore, Tyler M; Patt, Virginie M; Risbrough, Victoria B; Baker, Dewleen G

2018-02-05

Models from signal detection theory are commonly used to score neuropsychological test data, especially tests of recognition memory. Here we show that certain item response theory models can be formulated as signal detection theory models, thus linking two complementary but distinct methodologies. We then use the approach to evaluate the validity (construct representation) of commonly used research measures, demonstrate the impact of conditional error on neuropsychological outcomes, and evaluate measurement bias. Signal detection-item response theory (SD-IRT) models were fitted to recognition memory data for words, faces, and objects. The sample consisted of U.S. Infantry Marines and Navy Corpsmen participating in the Marine Resiliency Study. Data comprised item responses to the Penn Face Memory Test (PFMT; N = 1,338), Penn Word Memory Test (PWMT; N = 1,331), and Visual Object Learning Test (VOLT; N = 1,249), and self-report of past head injury with loss of consciousness. SD-IRT models adequately fitted recognition memory item data across all modalities. Error varied systematically with ability estimates, and distributions of residuals from the regression of memory discrimination onto self-report of past head injury were positively skewed towards regions of larger measurement error. Analyses of differential item functioning revealed little evidence of systematic bias by level of education. SD-IRT models benefit from the measurement rigor of item response theory, which permits the modeling of item difficulty and examinee ability, and from signal detection theory, which provides an interpretive framework encompassing the experimentally validated constructs of memory discrimination and response bias. We used this approach to validate the construct representation of commonly used research measures and to demonstrate how nonoptimized item parameters can lead to erroneous conclusions when interpreting neuropsychological test data. Future work might include the development of computerized adaptive tests and integration with mixture and random-effects models.
Algorithms for the Construction of Parallel Tests by Zero-One Programming. Project Psychometric Aspects of Item Banking No. 7. Research Report 86-7.

ERIC Educational Resources Information Center

Boekkooi-Timminga, Ellen

Nine methods for automated test construction are described. All are based on the concepts of information from item response theory. Two general kinds of methods for the construction of parallel tests are presented: (1) sequential test design; and (2) simultaneous test design. Sequential design implies that the tests are constructed one after the…
Methodology for Developing and Evaluating the PROMIS® Smoking Item Banks

PubMed Central

Cai, Li; Stucky, Brian D.; Tucker, Joan S.; Shadel, William G.; Edelen, Maria Orlando

2014-01-01

Introduction: This article describes the procedures used in the PROMIS® Smoking Initiative for the development and evaluation of item banks, short forms (SFs), and computerized adaptive tests (CATs) for the assessment of 6 constructs related to cigarette smoking: nicotine dependence, coping expectancies, emotional and sensory expectancies, health expectancies, psychosocial expectancies, and social motivations for smoking. Methods: Analyses were conducted using response data from a large national sample of smokers. Items related to each construct were subjected to extensive item factor analyses and evaluation of differential item functioning (DIF). Final item banks were calibrated, and SF assessments were developed for each construct. The performance of the SFs and the potential use of the item banks for CAT administration were examined through simulation study. Results: Item selection based on dimensionality assessment and DIF analyses produced item banks that were essentially unidimensional in structure and free of bias. Simulation studies demonstrated that the constructs could be accurately measured with a relatively small number of carefully selected items, either through fixed SFs or CAT-based assessment. Illustrative results are presented, and subsequent articles provide detailed discussion of each item bank in turn. Conclusions: The development of the PROMIS smoking item banks provides researchers with new tools for measuring smoking-related constructs. The use of the calibrated item banks and suggested SF assessments will enhance the quality of score estimates, thus advancing smoking research. Moreover, the methods used in the current study, including innovative approaches to item selection and SF construction, may have general relevance to item bank development and evaluation. PMID:23943843
Estimating the Nominal Response Model under Nonnormal Conditions

ERIC Educational Resources Information Center

Preston, Kathleen Suzanne Johnson; Reise, Steven Paul

2014-01-01

The nominal response model (NRM), a much understudied polytomous item response theory (IRT) model, provides researchers the unique opportunity to evaluate within-item category distinctions. Polytomous IRT models, such as the NRM, are frequently applied to psychological assessments representing constructs that are unlikely to be normally…
Item Estimates under Low-Stakes Conditions: How Should Omits Be Treated?

ERIC Educational Resources Information Center

DeMars, Christine

Using data from a pilot test of science and math from students in 30 high schools, item difficulties were estimated with a one-parameter model (partial-credit model for the multi-point items). Some items were multiple-choice items, and others were constructed-response items (open-ended). Four sets of estimates were obtained: estimates for males…
A Multidimensional Partial Credit Model with Associated Item and Test Statistics: An Application to Mixed-Format Tests

ERIC Educational Resources Information Center

Yao, Lihua; Schwarz, Richard D.

2006-01-01

Multidimensional item response theory (IRT) models have been proposed for better understanding the dimensional structure of data or to define diagnostic profiles of student learning. A compensatory multidimensional two-parameter partial credit model (M-2PPC) for constructed-response items is presented that is a generalization of those proposed to…
Covariates of the Rating Process in Hierarchical Models for Multiple Ratings of Test Items

ERIC Educational Resources Information Center

Mariano, Louis T.; Junker, Brian W.

2007-01-01

When constructed response test items are scored by more than one rater, the repeated ratings allow for the consideration of individual rater bias and variability in estimating student proficiency. Several hierarchical models based on item response theory have been introduced to model such effects. In this article, the authors demonstrate how these…
Measurement of activity limitations and participation restrictions: examination of ICF-linked content and scale properties of the FIM and PC-PART instruments.

PubMed

Darzins, Susan W; Imms, Christine; Di Stefano, Marilyn

2017-05-01

To explore the operationalization of activity and participation-related measurement constructs through comparison of item phrasing, item response categories and scoring (scale properties) for two separate instruments targeting activities of daily living. Personal Care Participation Assessment and Resource Tool (PC-PART) item content was linked to ICF categories using established linking rules. Previously reported ICF-linked FIM content categories and ICF-linked PC-PART content categories were compared to identify common ICF categories between the instruments. Scale properties of both instruments were compared using a patient scenario to explore the instruments' separate measurement constructs. The PC-PART and FIM shared 15 of the 53 level two ICF-linked categories identified across both instruments. Examination of the instruments' scale properties for items with overlapping ICF content, and exploration through a patient scenario, provided supportive evidence that the instruments measure different constructs. While the PC-PART and FIM share common ICF-linked content, they measure separate constructs. Measurement construct was influenced by the instruments' scale properties. The FIM was observed to measure activity limitations and the PC-PART measured participation restrictions. Scrutiny of instruments' scale properties in addition to item content is critical in the operationalization of activity and participation-related measurement constructs. Implications for Rehabilitation When selecting outcome measures for use in rehabilitation it is necessary to examine both the content of the instruments' items and item phrasing, response categories and scoring, to clarify the construct being measured. Measurement of activity limitations as well as participation restrictions in activities of daily living required for community life provides a more comprehensive measurement of rehabilitation outcomes than measurement of either construct alone. 
To measure the effects of interventions used in rehabilitation, it is necessary to select measures with relevant content and scale properties that enable evaluation of change in the constructs that are expected to change, as a result of the rehabilitation intervention.
23 CFR 635.116 - Subcontracting and contractor responsibilities.

Code of Federal Regulations, 2010 CFR

2010-04-01

... TRAFFIC OPERATIONS CONSTRUCTION AND MAINTENANCE Contract Procedures § 635.116 Subcontracting and contractor responsibilities. (a) Contracts for projects shall specify the minimum percentage of work that a... total original contract price excluding any identified specialty items. Specialty items may be performed...
Item response theory scoring and the detection of curvilinear relationships.

PubMed

Carter, Nathan T; Dalal, Dev K; Guan, Li; LoPilato, Alexander C; Withrow, Scott A

2017-03-01

Psychologists are increasingly positing theories of behavior that suggest psychological constructs are curvilinearly related to outcomes. However, results from empirical tests for such curvilinear relations have been mixed. We propose that correctly identifying the response process underlying responses to measures is important for the accuracy of these tests. Indeed, past research has indicated that item responses to many self-report measures follow an ideal point response process, wherein respondents agree only to items that reflect their own standing on the measured variable, as opposed to a dominance process, wherein stronger agreement, regardless of item content, is always indicative of higher standing on the construct. We test whether item response theory (IRT) scoring appropriate for the underlying response process to self-report measures results in more accurate tests for curvilinearity. In 2 simulation studies, we show that, regardless of the underlying response process used to generate the data, using the traditional sum-score generally results in high Type 1 error rates or low power for detecting curvilinearity, depending on the distribution of item locations. With few exceptions, appropriate power and Type 1 error rates are achieved when dominance-based and ideal point-based IRT scoring are correctly used to score dominance and ideal point response data, respectively. We conclude that (a) researchers should be theory-guided when hypothesizing and testing for curvilinear relations; (b) correctly identifying whether responses follow an ideal point versus dominance process, particularly when items are not extreme, is critical; and (c) IRT model-based scoring is crucial for accurate tests of curvilinearity. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Methodology for developing and evaluating the PROMIS smoking item banks.

PubMed

Hansen, Mark; Cai, Li; Stucky, Brian D; Tucker, Joan S; Shadel, William G; Edelen, Maria Orlando

2014-09-01

This article describes the procedures used in the PROMIS Smoking Initiative for the development and evaluation of item banks, short forms (SFs), and computerized adaptive tests (CATs) for the assessment of 6 constructs related to cigarette smoking: nicotine dependence, coping expectancies, emotional and sensory expectancies, health expectancies, psychosocial expectancies, and social motivations for smoking. Analyses were conducted using response data from a large national sample of smokers. Items related to each construct were subjected to extensive item factor analyses and evaluation of differential item functioning (DIF). Final item banks were calibrated, and SF assessments were developed for each construct. The performance of the SFs and the potential use of the item banks for CAT administration were examined through simulation study. Item selection based on dimensionality assessment and DIF analyses produced item banks that were essentially unidimensional in structure and free of bias. Simulation studies demonstrated that the constructs could be accurately measured with a relatively small number of carefully selected items, either through fixed SFs or CAT-based assessment. Illustrative results are presented, and subsequent articles provide detailed discussion of each item bank in turn. The development of the PROMIS smoking item banks provides researchers with new tools for measuring smoking-related constructs. The use of the calibrated item banks and suggested SF assessments will enhance the quality of score estimates, thus advancing smoking research. Moreover, the methods used in the current study, including innovative approaches to item selection and SF construction, may have general relevance to item bank development and evaluation. © The Author 2013. Published by Oxford University Press on behalf of the Society for Research on Nicotine and Tobacco. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Ramsay-Curve Differential Item Functioning

ERIC Educational Resources Information Center

Woods, Carol M.

2011-01-01

Differential item functioning (DIF) occurs when an item on a test, questionnaire, or interview has different measurement properties for one group of people versus another, irrespective of true group-mean differences on the constructs being measured. This article is focused on item response theory based likelihood ratio testing for DIF (IRT-LR or…
The Work Instability Scale for Rheumatoid Arthritis (RA-WIS): Does it work in osteoarthritis?

PubMed

Tang, Kenneth; Beaton, Dorcas E; Lacaille, Diane; Gignac, Monique A M; Zhang, Wei; Anis, Aslam H; Bombardier, Claire

2010-09-01

To validate the 23-item Work Instability Scale for Rheumatoid Arthritis (RA-WIS) for use in osteoarthritis (OA) using both classical test theory and item response theory approaches. Baseline and 12-month follow-up data were collected from workers with OA recruited from community and clinical settings (n = 130). Fit of RA-WIS data to the Rasch model was evaluated by item- and person-fit statistics (size of residual, chi-sq), assessments of differential item functioning, and tests of unidimensionality and local independence. Internal consistency was assessed by KR-20. Convergent construct validity (Spearman r, known-groups) was evaluated against theoretical constructs that assess impact of health on work. Responsiveness to global indicators of change was assessed by standardized response means (SRM) and area under the receiver operating characteristic curves. Data structure of the RA-WIS showed adequate fit to the Rasch model (chi-sq = 83.2, P = 0.03) after addressing local dependency in three item pairs by creating testlets. High internal consistency (KR-20 = 0.93) and convergent validity with work-oriented constructs (|r| = 0.55-0.77) were evident. The RA-WIS correlated most strongly with the concept of illness intrusiveness (r = 0.77) and was highly responsive to changes (SRM = 1.05 [deterioration]; -0.78 [improvement]). Although developed for RA, the RA-WIS is psychometrically sound for OA and demonstrates interval-level property.
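The internal-consistency statistic reported in the abstract above (KR-20) can be sketched as follows for dichotomously scored items. This is an illustrative implementation only: the function name is hypothetical, the population variance of total scores is assumed (some texts use the sample variance instead), and the example data in the usage note are made up rather than drawn from the RA-WIS study.

```python
from statistics import pvariance


def kr20(responses):
    """Kuder-Richardson formula 20 for a list of 0/1 response vectors.

    responses -- one list per respondent, each with the same number of items
    Assumes at least two items and non-zero variance in the total scores.
    """
    n_items = len(responses[0])
    n_people = len(responses)
    # Proportion endorsing each item, and the sum of p * (1 - p) over items.
    p = [sum(row[i] for row in responses) / n_people for i in range(n_items)]
    sum_pq = sum(pi * (1.0 - pi) for pi in p)
    # Population variance of total scores (an assumption; see lead-in).
    total_var = pvariance([sum(row) for row in responses])
    return (n_items / (n_items - 1)) * (1.0 - sum_pq / total_var)
```

With the made-up pattern [[1,1,1,1],[1,1,1,0],[1,1,0,0],[1,0,0,0],[0,0,0,0]], the item variances sum to 0.8, the total-score variance is 2, and the coefficient works out to (4/3) x (1 - 0.4) = 0.8.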
Confirmatory Factor Analysis of the Finnish Job Content Questionnaire (JCQ) in 590 Professional Musicians.

PubMed

Vastamäki, Heidi; Vastamäki, Martti; Laimi, Katri; Saltychev, Michail

2017-07-01

Poorly functioning work environments may lead to dissatisfaction for the employees and financial loss for the employers. The Job Content Questionnaire (JCQ) was designed to measure social and psychological characteristics of work environments. To investigate the factor construct of the Finnish 14-item version of JCQ when applied to professional orchestra musicians. In a cross-sectional survey, the questionnaire was sent by mail to 1550 orchestra musicians and students. 630 responses were received. Full data were available for 590 respondents (response rate 38%). The questionnaire also contained questions on demographics, job satisfaction, health status, health behaviors, and intensity of playing music. Confirmatory factor analysis of the 2-factor model of JCQ was conducted. Of the 5 estimates, JCQ items in the "job demand" construct, the "conflicting demands" (question 5) explained most of the total variance in this construct (79%) demonstrating almost perfect correlation of 0.63. In the construct of "job control," "opinions influential" (question 10) demonstrated a perfect correlation index of 0.84 and the items "little decision freedom" (question 14) and "allows own decisions" (question 6) showed substantial correlations of 0.77 and 0.65. The 2-factor model of the Finnish 14-item version of JCQ proposed in this study fitted well into the observed data. The "conflicting demands," "opinions influential," "little decision freedom," and "allows own decisions" items demonstrated the strongest correlations with latent factors suggesting that in a population similar to the studied one, especially these items should be taken into account when observed in the response of a population.
Forced-Choice Assessment of Work-Related Maladaptive Personality Traits: Preliminary Evidence From an Application of Thurstonian Item Response Modeling.

PubMed

Guenole, Nigel; Brown, Anna A; Cooper, Andrew J

2018-06-01

This article describes an investigation of whether Thurstonian item response modeling is a viable method for assessment of maladaptive traits. Forced-choice responses from 420 working adults to a broad-range personality inventory assessing six maladaptive traits were considered. The Thurstonian item response model's fit to the forced-choice data was adequate, while the fit of a counterpart item response model to responses to the same items but arranged in a single-stimulus design was poor. Monotrait heteromethod correlations indicated corresponding traits in the two formats overlapped substantially, although they did not measure equivalent constructs. A better goodness of fit and higher factor loadings for the Thurstonian item response model, coupled with a clearer conceptual alignment to the theoretical trait definitions, suggested that the single-stimulus item responses were influenced by biases that the independent clusters measurement model did not account for. Researchers may wish to consider forced-choice designs and appropriate item response modeling techniques such as Thurstonian item response modeling for personality questionnaire applications in industrial psychology, especially when assessing maladaptive traits. We recommend further investigation of this approach in actual selection situations and with different assessment instruments.
Item Vector Plots for the Multidimensional Three-Parameter Logistic Model

ERIC Educational Resources Information Center

Bryant, Damon; Davis, Larry

2011-01-01

This brief technical note describes how to construct item vector plots for dichotomously scored items fitting the multidimensional three-parameter logistic model (M3PLM). As multidimensional item response theory (MIRT) shows promise of being a very useful framework in the test development life cycle, graphical tools that facilitate understanding…
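The truncated abstract above does not spell out the plotting convention, so the following is only a sketch of one widely used way to summarize a multidimensional item as a vector: multidimensional discrimination (MDISC) as the length, direction cosines as the orientation, and multidimensional difficulty (MDIFF) as the signed distance from the origin. The function names are hypothetical, the intercept parameterization a.theta + d is an assumption, and the lower asymptote is assumed not to affect the vector.

```python
import math


def item_vector(a, d):
    """Summarize one item with discrimination vector a and intercept d.

    Returns (mdisc, mdiff, cosines) under the assumed a.theta + d form.
    """
    mdisc = math.sqrt(sum(ak * ak for ak in a))   # vector length
    cosines = [ak / mdisc for ak in a]            # direction in the latent space
    mdiff = -d / mdisc                            # signed distance from the origin
    return mdisc, mdiff, cosines


def vector_endpoints(a, d):
    """Start and end coordinates of the item's arrow in the latent space."""
    mdisc, mdiff, cosines = item_vector(a, d)
    start = [mdiff * c for c in cosines]
    end = [(mdiff + mdisc) * c for c in cosines]
    return start, end
```

The two endpoint lists can be passed to any plotting library as an arrow; for a made-up item with a = [3, 4] and d = -2.5, the arrow has length 5 and starts 0.5 units from the origin along the direction (0.6, 0.8).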
Environmental Knowledge and Beliefs among Grade 10 Students in Australia.

ERIC Educational Resources Information Center

Eyers, Vivian George

To develop environmental education in Australia, a survey of tenth-grade students was undertaken. Thirty knowledge items and ten belief items were constructed. A panel of environmentalists and educators identified best responses for the knowledge items, and a common reference point, preservation of homo sapiens, for the belief items, so a…

Measuring organizational effectiveness in information and communication technology companies using item response theory.

PubMed

Trierweiller, Andréa Cristina; Peixe, Blênio César Severo; Tezza, Rafael; Pereira, Vera Lúcia Duarte do Valle; Pacheco, Waldemar; Bornia, Antonio Cezar; de Andrade, Dalton Francisco

2012-01-01

The aim of this paper is to measure the effectiveness of Information and Communication Technology (ICT) organizations from the manager's point of view, using Item Response Theory (IRT). There is a need to verify the effectiveness of these organizations, which are normally associated with complex, dynamic, and competitive environments. In the academic literature, there is disagreement surrounding the concept of organizational effectiveness and its measurement. A construct was elaborated based on dimensions of effectiveness to guide the construction of the questionnaire items, which were submitted to specialists for evaluation. The questionnaire proved viable for measuring the organizational effectiveness of ICT companies from a manager's point of view using the Two-Parameter Logistic Model (2PLM) of IRT. This modeling permits us to evaluate the quality and properties of each item and to place items and respondents on a single scale, which is not possible when using other similar tools.
Preschool Gifted Education: Perceived Challenges Associated with Program Development

ERIC Educational Resources Information Center

Kettler, Todd; Oveross, Mattie E.; Salman, Rania C.

2017-01-01

This descriptive study investigated the challenges related to implementing gifted education services in preschool centers. Participants were 254 licensed preschool center directors in a southern state. Participants completed a researcher-created survey including both selected response items and constructed response items to examine the perceived…
Using the Rasch Measurement Model in Psychometric Analysis of the Family Effectiveness Measure

PubMed Central

McCreary, Linda L.; Conrad, Karen M.; Conrad, Kendon J.; Scott, Christy K; Funk, Rodney R.; Dennis, Michael L.

2013-01-01

Background Valid assessment of family functioning can play a vital role in optimizing client outcomes. Because family functioning is influenced by family structure, socioeconomic context, and culture, existing measures of family functioning--primarily developed with nuclear, middle class European American families--may not be valid assessments of families in diverse populations. The Family Effectiveness Measure was developed to address this limitation. Objectives To test the Family Effectiveness Measure with data from a primarily low-income African American convenience sample, using the Rasch measurement model. Method A sample of 607 adult women completed the measure. Rasch analysis was used to assess unidimensionality, response category functioning, item fit, person reliability, differential item functioning by race and parental status, and item hierarchy. Criterion-related validity was tested using correlations with five other variables related to family functioning. Results The Family Effectiveness Measure measures two separate constructs: The effective family functioning construct was a psychometrically sound measure of the target construct that was more efficient due to the deletion of 22 items. The ineffective family functioning construct consisted of 16 of those deleted items but was not as strong psychometrically. Items in both constructs evidenced no differential item functioning by race. Criterion-related validity was supported for both. Discussion In contrast to the prevailing conceptualization that family functioning is a single construct, assessed by positively and negatively worded items, use of the Rasch analysis suggested the existence of two constructs. While the effective family functioning is a strong and efficient measure of family functioning, the ineffective family functioning will require additional item development and psychometric testing. PMID:23636342
Calibration of the Dutch-Flemish PROMIS Pain Behavior item bank in patients with chronic pain.

PubMed

Crins, M H P; Roorda, L D; Smits, N; de Vet, H C W; Westhovens, R; Cella, D; Cook, K F; Revicki, D; van Leeuwen, J; Boers, M; Dekker, J; Terwee, C B

2016-02-01

The aims of the current study were to calibrate the item parameters of the Dutch-Flemish PROMIS Pain Behavior item bank using a sample of Dutch patients with chronic pain and to evaluate cross-cultural validity between the Dutch-Flemish and the US PROMIS Pain Behavior item banks. Furthermore, reliability and construct validity of the Dutch-Flemish PROMIS Pain Behavior item bank were evaluated. The 39 items in the bank were completed by 1042 Dutch patients with chronic pain. To evaluate unidimensionality, a one-factor confirmatory factor analysis (CFA) was performed. A graded response model (GRM) was used to calibrate the items. To evaluate cross-cultural validity, Differential item functioning (DIF) for language (Dutch vs. English) was evaluated. Reliability of the item bank was also examined and construct validity was studied using several legacy instruments, e.g. the Roland Morris Disability Questionnaire. CFA supported the unidimensionality of the Dutch-Flemish PROMIS Pain Behavior item bank (CFI = 0.960, TLI = 0.958), the data also fit the GRM, and demonstrated good coverage across the pain behavior construct (threshold parameters range: -3.42 to 3.54). Analysis showed good cross-cultural validity (only six DIF items), reliability (Cronbach's α = 0.95) and construct validity (all correlations ≥0.53). The Dutch-Flemish PROMIS Pain Behavior item bank was found to have good cross-cultural validity, reliability and construct validity. The development of the Dutch-Flemish PROMIS Pain Behavior item bank will serve as the basis for Dutch-Flemish PROMIS short forms and computer adaptive testing (CAT). © 2015 European Pain Federation - EFIC®
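As a rough illustration of the graded response model mentioned in the abstract above, the sketch below computes category probabilities for one polytomous item. The function name, the discrimination value, and the threshold values are hypothetical and chosen only for illustration; they are not parameters from the study.

```python
import math

def grm_category_probs(theta, a, thresholds):
    """Category probabilities under a graded response model.

    theta: latent trait value; a: discrimination; thresholds: increasing b_k.
    """
    # Cumulative probabilities P(X >= k), bracketed by 1 (lowest category)
    # and 0 (beyond the highest category).
    cum = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds] + [0.0]
    # Each category probability is the difference of adjacent cumulative curves.
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]

# Hypothetical item: a = 2.0, thresholds picked inside the reported -3.42 to 3.54 range.
probs = grm_category_probs(theta=0.0, a=2.0, thresholds=[-1.5, 0.0, 1.5])
```

With three thresholds the item has four response categories, and the probabilities sum to one for any value of theta.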
Measuring the ICF components of impairment, activity limitation and participation restriction: an item analysis using classical test theory and item response theory

PubMed Central

Pollard, Beth; Dixon, Diane; Dieppe, Paul; Johnston, Marie

2009-01-01

Background The International Classification of Functioning, Disability and Health (ICF) proposes three main health outcomes, Impairment (I), Activity Limitation (A) and Participation Restriction (P), but good measures of these constructs are needed. The aim of this study was to use both Classical Test Theory (CTT) and Item Response Theory (IRT) methods to carry out an item analysis to improve measurement of these three components in patients having joint replacement surgery mainly for osteoarthritis (OA). Methods A geographical cohort of patients about to undergo lower limb joint replacement was invited to participate. Five hundred and twenty four patients completed ICF items that had been previously identified as measuring only a single ICF construct in patients with osteoarthritis. There were 13 I, 26 A and 20 P items. The SF-36 was used to explore the construct validity of the resultant I, A and P measures. The CTT and IRT analyses were run separately to identify items for inclusion or exclusion in the measurement of each construct. The results from both analyses were compared and contrasted. Results Overall, the item analysis resulted in the removal of 4 I items, 9 A items and 11 P items. CTT and IRT identified the same 14 items for removal, with CTT additionally excluding 3 items, and IRT a further 7 items. In a preliminary exploration of reliability and validity, the new measures appeared acceptable. Conclusion New measures were developed that reflect the ICF components of Impairment, Activity Limitation and Participation Restriction for patients with advanced arthritis. The resulting Aberdeen IAP measures (Ab-IAP) comprising I (Ab-I, 9 items), A (Ab-A, 17 items), and P (Ab-P, 9 items) met the criteria of conventional psychometric (CTT) analyses and the additional criteria (information and discrimination) of IRT. The use of both methods was more informative than the use of only one of these methods. Thus combining CTT and IRT appears to be a valuable tool in the development of measures. PMID:19422677
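The item counts reported in the abstract above can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers stated in the abstract; the variable names are illustrative.

```python
# Item counts reported in the abstract, keyed by ICF component.
initial = {"I": 13, "A": 26, "P": 20}
removed = {"I": 4, "A": 9, "P": 11}

# Items retained per component after the item analysis.
retained = {k: initial[k] - removed[k] for k in initial}

# Removals broken down by which method flagged them.
flagged_by_both, ctt_only, irt_only = 14, 3, 7

total_removed = sum(removed.values())
total_flagged = flagged_by_both + ctt_only + irt_only
```

The retained counts match the reported Ab-I, Ab-A and Ab-P lengths (9, 17 and 9 items), and the per-component removals add up to the same total as the per-method breakdown.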
A Multivariate Multilevel Approach to the Modeling of Accuracy and Speed of Test Takers

ERIC Educational Resources Information Center

Klein Entink, R. H.; Fox, J. P.; van der Linden, W. J.

2009-01-01

Response times on test items are easily collected in modern computerized testing. When collecting both (binary) responses and (continuous) response times on test items, it is possible to measure the accuracy and speed of test takers. To study the relationships between these two constructs, the model is extended with a multivariate multilevel…
Test of Achievement in Quantitative Economics for Secondary Schools: Construction and Validation Using Item Response Theory

ERIC Educational Resources Information Center

Eleje, Lydia I.; Esomonu, Nkechi P. M.

2018-01-01

A test to measure achievement in quantitative economics among secondary school students was developed and validated in this study. The test is made up of 20 multiple-choice test items constructed based on quantitative economics sub-skills. Six research questions guided the study. Preliminary validation was done by two experienced teachers in…
Measuring the quality of life in hypertension according to Item Response Theory

PubMed Central

Borges, José Wicto Pereira; Moreira, Thereza Maria Magalhães; Schmitt, Jeovani; de Andrade, Dalton Francisco; Barbetta, Pedro Alberto; de Souza, Ana Célia Caetano; Lima, Daniele Braz da Silva; Carvalho, Irialda Saboia

2017-01-01

ABSTRACT OBJECTIVE To analyze the Miniquestionário de Qualidade de Vida em Hipertensão Arterial (MINICHAL – Mini-questionnaire of Quality of Life in Hypertension) using the Item Response Theory. METHODS This is an analytical study conducted with 712 persons with hypertension treated in thirteen primary health care units of Fortaleza, State of Ceará, Brazil, in 2015. The steps of the analysis by the Item Response Theory were: evaluation of dimensionality, estimation of parameters of items, and construction of scale. The study of dimensionality was carried out on the polychoric correlation matrix and confirmatory factor analysis. To estimate the item parameters, we used Samejima's Graded Response Model. The analyses were conducted using the free software R with the aid of the psych and mirt packages. RESULTS The analysis has allowed the visualization of item parameters and their individual contributions in the measurement of the latent trait, generating more information and allowing the construction of a scale with an interpretative model that demonstrates the evolution of the worsening of the quality of life in five levels. Regarding the item parameters, the items related to the somatic state have had a good performance, as they have presented better power to discriminate individuals with worse quality of life. The items related to mental state have been those which contributed with less psychometric data in the MINICHAL. CONCLUSIONS We conclude that the instrument is suitable for the identification of the worsening of the quality of life in hypertension. The analysis of the MINICHAL using the Item Response Theory has allowed us to identify new sides of this instrument that have not yet been addressed in previous studies. PMID:28492764
Mixed-Format Test Score Equating: Effect of Item-Type Multidimensionality, Length and Composition of Common-Item Set, and Group Ability Difference

ERIC Educational Resources Information Center

Wang, Wei

2013-01-01

Mixed-format tests containing both multiple-choice (MC) items and constructed-response (CR) items are now widely used in many testing programs. Mixed-format tests often are considered to be superior to tests containing only MC items although the use of multiple item formats leads to measurement challenges in the context of equating conducted under…
The Performance of IRT Model Selection Methods with Mixed-Format Tests

ERIC Educational Resources Information Center

Whittaker, Tiffany A.; Chang, Wanchen; Dodd, Barbara G.

2012-01-01

When tests consist of multiple-choice and constructed-response items, researchers are confronted with the question of which item response theory (IRT) model combination will appropriately represent the data collected from these mixed-format tests. This simulation study examined the performance of six model selection criteria, including the…
Multiple-Choice versus Constructed-Response Tests in the Assessment of Mathematics Computation Skills.

ERIC Educational Resources Information Center

Gadalla, Tahany M.

The equivalence of multiple-choice (MC) and constructed response (discrete) (CR-D) response formats as applied to mathematics computation at grade levels two to six was tested. The difference between total scores from the two response formats was tested for statistical significance, and the factor structure of items in both response formats was…
Psychometric properties of the Triarchic Psychopathy Measure: An item response theory approach.

PubMed

Shou, Yiyun; Sellbom, Martin; Xu, Jing

2018-05-01

There is cumulative evidence for the cross-cultural validity of the Triarchic Psychopathy Measure (TriPM; Patrick, 2010) among non-Western populations. Recent studies using correlational and regression analyses show promising construct validity of the TriPM in Chinese samples. However, little is known about the efficiency of items in TriPM in assessing the proposed latent traits. The current study evaluated the psychometric properties of the Chinese TriPM at the item level using item response theory analyses. It also examined the measurement invariance of the TriPM between the Chinese and the U.S. student samples by applying differential item functioning analyses under the item response theory framework. The results supported the unidimensional nature of the Disinhibition and Meanness scales. Both scales had a greater level of precision in the respective underlying constructs at the positive ends. The two scales, however, had several items that were weakly associated with their respective latent traits in the Chinese student sample. Boldness, on the other hand, was found to be multidimensional, and reflected a more normally distributed range of variation. The examination of measurement bias via differential item functioning analyses revealed that a number of items of the TriPM were not equivalent across the Chinese and the U.S. Some modification and adaptation of items might be considered for improving the precision of the TriPM for Chinese participants. (PsycINFO Database Record (c) 2018 APA, all rights reserved).
The Effect of Year-to-Year Rater Variation on IRT Linking

ERIC Educational Resources Information Center

Yen, Shu Jing; Ochieng, Charles; Michaels, Hillary; Friedman, Greg

2005-01-01

Year-to-year rater variation may result in constructed response (CR) parameter changes, making CR items inappropriate to use in anchor sets for linking or equating. This study demonstrates how rater severity affected the writing and reading scores. Rater adjustments were made to statewide results using an item response theory (IRT) methodology…
Pesticide applicators questionnaire content validation: A fuzzy delphi method.

PubMed

Manakandan, S K; Rosnah, I; Mohd Ridhuan, J; Priya, R

2017-08-01

The most crucial step in forming a survey questionnaire is deciding on the appropriate items in a construct. Retaining irrelevant items and removing important items will certainly mislead the direction of a particular study. This article demonstrates the Fuzzy Delphi method (FDM) as a scientific analysis technique for consolidating consensus within a panel of experts on each item's appropriateness. This method reduces the ambiguity, diversity, and discrepancy of opinions among the experts and hence enhances the quality of the selected items. The main purpose of this study was to obtain experts' consensus on the suitability of the preselected items on the questionnaire. The panel consisted of sixteen experts from the Occupational and Environmental Health Unit of the Ministry of Health, the Vector-borne Disease Control Unit of the Ministry of Health, and the Occupational and Safety Health Units of both public and private universities. A set of questionnaires related to noise and chemical exposure was compiled based on a literature search. There was a total of six constructs with 60 items: three constructs for knowledge, attitude, and practice of noise exposure and three constructs for knowledge, attitude, and practice of chemical exposure. The validation process replicated a recent Fuzzy Delphi method that uses the concept of Triangular Fuzzy Numbers and a defuzzification process. A 100% response rate was obtained from the sixteen experts, with an average Likert score of four to five. After FDM analysis, the first prerequisite was fulfilled with a threshold value (d) ≤ 0.2, hence all six constructs were accepted. For the second prerequisite, three items (21%) from the noise-attitude construct and four items (40%) from the chemical-practice construct had expert consensus of less than 75%, amounting to about 12% of the total items in the questionnaire. The third prerequisite was used to rank the items within the constructs by calculating the average fuzzy numbers. The seven items that did not fulfill the second prerequisite also had lower ranks in the analysis, and therefore those items were discarded from the final draft. After FDM analysis, the experts' consensus on the suitability of the preselected items in the questionnaire set was obtained, and the questionnaire is now ready for a further construct validation process.
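The three prerequisites described in the abstract above (threshold d ≤ 0.2, expert consensus of at least 75%, and ranking by average fuzzy number) can be sketched for a single item as follows. This is a minimal sketch under stated assumptions: the Likert-to-fuzzy-number mapping and the distance formula are common Fuzzy Delphi conventions assumed here, not details given in the abstract, and the function name and ratings are hypothetical.

```python
import math

# ASSUMPTION: a common 5-point Likert-to-triangular-fuzzy-number mapping;
# the abstract does not state which scale the authors used.
LIKERT_TO_TFN = {
    1: (0.0, 0.0, 0.2),
    2: (0.0, 0.2, 0.4),
    3: (0.2, 0.4, 0.6),
    4: (0.4, 0.6, 0.8),
    5: (0.6, 0.8, 1.0),
}

def fuzzy_delphi_item(ratings, d_max=0.2):
    """Summarise one item's expert ratings with the three checks named in the abstract."""
    tfns = [LIKERT_TO_TFN[r] for r in ratings]
    n = len(tfns)
    # Average fuzzy number: component-wise mean over experts.
    avg = tuple(sum(t[i] for t in tfns) / n for i in range(3))
    # ASSUMPTION: vertex-method distance between each expert's TFN and the average.
    dists = [math.sqrt(sum((t[i] - avg[i]) ** 2 for i in range(3)) / 3) for t in tfns]
    return {
        "threshold_d": sum(dists) / n,  # first check: d <= 0.2
        "consensus_pct": 100 * sum(d <= d_max for d in dists) / n,  # second check: >= 75%
        "defuzzified": sum(avg) / 3,  # score used to rank items
    }

# Hypothetical ratings from sixteen experts for a single item.
result = fuzzy_delphi_item([5, 5, 4, 5, 4, 4, 5, 5, 4, 5, 5, 4, 5, 4, 5, 5])
```

Under these assumptions an item would be retained when `threshold_d` is at most 0.2 and `consensus_pct` is at least 75, with `defuzzified` used to order the retained items within a construct.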
Development of a PROMIS item bank to measure pain interference.

PubMed

Amtmann, Dagmar; Cook, Karon F; Jensen, Mark P; Chen, Wen-Hung; Choi, Seung; Revicki, Dennis; Cella, David; Rothrock, Nan; Keefe, Francis; Callahan, Leigh; Lai, Jin-Shei

2010-07-01

This paper describes the psychometric properties of the PROMIS-pain interference (PROMIS-PI) bank. An initial candidate item pool (n=644) was developed and evaluated based on the review of existing instruments, interviews with patients, and consultation with pain experts. From this pool, a candidate item bank of 56 items was selected and responses to the items were collected from large community and clinical samples. A total of 14,848 participants responded to all or a subset of candidate items. The responses were calibrated using an item response theory (IRT) model. A final 41-item bank was evaluated with respect to IRT assumptions, model fit, differential item function (DIF), precision, and construct and concurrent validity. Items of the revised bank had good fit to the IRT model (CFI and NNFI/TLI ranged from 0.974 to 0.997), and the data were strongly unidimensional (e.g., ratio of first and second eigenvalue=35). Nine items exhibited statistically significant DIF. However, adjusting for DIF had little practical impact on score estimates and the items were retained without modifying scoring. Scores provided substantial information across levels of pain; for scores in the T-score range 50-80, the reliability was equivalent to 0.96-0.99. Patterns of correlations with other health outcomes supported the construct validity of the item bank. The scores discriminated among persons with different numbers of chronic conditions, disabling conditions, levels of self-reported health, and pain intensity (p<0.0001). The results indicated that the PROMIS-PI items constitute a psychometrically sound bank. Computerized adaptive testing and short forms are available. Copyright 2010 International Association for the Study of Pain. All rights reserved.
Development and Evaluation of the PROMIS® Pediatric Positive Affect Item Bank, Child-Report and Parent-Proxy Editions.

PubMed

Forrest, Christopher B; Ravens-Sieberer, Ulrike; Devine, Janine; Becker, Brandon D; Teneralli, Rachel; Moon, JeanHee; Carle, Adam; Tucker, Carole A; Bevans, Katherine B

2018-03-01

The purpose of this study is to describe the psychometric evaluation and item response theory calibration of the PROMIS Pediatric Positive Affect item bank, child-report and parent-proxy editions. The initial item pool comprising 53 items, previously developed using qualitative methods, was administered to 1,874 children 8-17 years old and 909 parents of children 5-17 years old. Analyses included descriptive statistics, reliability, factor analysis, differential item functioning, and construct validity. A total of 14 items were deleted, because of poor psychometric performance, and an 8-item short form constructed from the remaining 39 items was administered to a national sample of 1,004 children 8-17 years old, and 1,306 parents of children 5-17 years old. The combined sample was used in item response theory (IRT) calibration analyses. The final item bank appeared unidimensional, the items appeared locally independent, and the items were free from differential item functioning. The scales showed excellent reliability and convergent and discriminant validity. Positive affect decreased with children's age and was lower for those with a special health care need. After IRT calibration, we found that 4 and 8 item short forms had a high degree of precision (reliability) across a wide range of the latent trait (>4 SD units). The PROMIS Pediatric Positive Affect item bank and its short forms provide an efficient, precise, and valid assessment of positive affect in children and youth.
Capturing the true burden of dystonia on patients: the Cervical Dystonia Impact Profile (CDIP-58).

PubMed

Cano, S J; Warner, T T; Linacre, J M; Bhatia, K P; Thompson, A J; Fitzpatrick, R; Hobart, J C

2004-11-09

To develop a new rating scale for measuring the health impact of cervical dystonia (CD) that includes patients' perceptions and complements existing observer dependent clinician rating scales. Scale development was in three stages. In Stage 1, a large pool of items was generated from patient interviews (n = 25), expert opinion, and literature review. In Stage 2, these items were administered by postal survey to people with CD. The resulting data were analyzed using Rasch item analysis to construct, from the item pool, a rating scale that satisfied criteria for rigorous measurement. In Stage 3, the measurement properties of this rating scale were examined in an independent sample of people with CD. In Stage 1, 150 items concerning the health impact of CD were generated. In Stage 2, 556 people completed questionnaires (87% response rate) and a 58-item rating scale measuring the health impact of CD in eight areas was constructed (CD Impact Profile, CDIP-58). In Stage 3, CDIP-58 data from 391 people (87% response rate) were received. Analyses supported the measurement of eight unidimensional constructs (infit mean square range 0.62 to 1.50), item calibration (33.37 to 67.56), and patient separation statistics (2.59 to 3.38). Items demonstrated stable calibrations in subgroups of people with CD supporting the stability of the CDIP-58. The CDIP-58 is a reliable and valid patient-based rating scale measuring the health impact of CD in eight health dimensions.
Applying modern psychometric techniques to melodic discrimination testing: Item response theory, computerised adaptive testing, and automatic item generation.

PubMed

Harrison, Peter M C; Collins, Tom; Müllensiefen, Daniel

2017-06-15

Modern psychometric theory provides many useful tools for ability testing, such as item response theory, computerised adaptive testing, and automatic item generation. However, these techniques have yet to be integrated into mainstream psychological practice. This is unfortunate, because modern psychometric techniques can bring many benefits, including sophisticated reliability measures, improved construct validity, avoidance of exposure effects, and improved efficiency. In the present research we therefore use these techniques to develop a new test of a well-studied psychological capacity: melodic discrimination, the ability to detect differences between melodies. We calibrate and validate this test in a series of studies. Studies 1 and 2 respectively calibrate and validate an initial test version, while Studies 3 and 4 calibrate and validate an updated test version incorporating additional easy items. The results support the new test's viability, with evidence for strong reliability and construct validity. We discuss how these modern psychometric techniques may also be profitably applied to other areas of music psychology and psychological science in general.
Immediate list recall as a measure of short-term episodic memory: insights from the serial position effect and item response theory.

PubMed

Gavett, Brandon E; Horwitz, Julie E

2012-03-01

The serial position effect shows that two interrelated cognitive processes underlie immediate recall of a supraspan word list. The current study used item response theory (IRT) methods to determine whether the serial position effect poses a threat to the construct validity of immediate list recall as a measure of verbal episodic memory. Archival data were obtained from a national sample of 4,212 volunteers aged 28-84 in the Midlife Development in the United States study. Telephone assessment yielded item-level data for a single immediate recall trial of the Rey Auditory Verbal Learning Test (RAVLT). Two parameter logistic IRT procedures were used to estimate item parameters and the Q(1) statistic was used to evaluate item fit. A two-dimensional model better fit the data than a unidimensional model, supporting the notion that list recall is influenced by two underlying cognitive processes. IRT analyses revealed that 4 of the 15 RAVLT items (1, 12, 14, and 15) were misfit (p < .05). Item characteristic curves for items 14 and 15 decreased monotonically, implying an inverse relationship between the ability level and the probability of recall. Elimination of the four misfit items provided better fit to the data and met necessary IRT assumptions. Performance on a supraspan list learning test is influenced by multiple cognitive abilities; failure to account for the serial position of words decreases the construct validity of the test as a measure of episodic memory and may provide misleading results. IRT methods can ameliorate these problems and improve construct validity.
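To make the point about decreasing item characteristic curves concrete, the sketch below evaluates a two-parameter logistic curve with a positive and a negative discrimination parameter. The parameter values are hypothetical and are not estimates from the study; the function name is illustrative.

```python
import math

def icc_2pl(theta, a, b):
    """Two-parameter logistic item characteristic curve: P(recall | theta)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

thetas = [-2.0, -1.0, 0.0, 1.0, 2.0]

# Hypothetical parameters: one positive and one negative discrimination.
typical = [icc_2pl(t, a=1.2, b=0.0) for t in thetas]
reversed_item = [icc_2pl(t, a=-0.8, b=0.0) for t in thetas]
```

A negative discrimination makes the curve fall as ability rises, which is the inverse relationship the abstract reports for items 14 and 15.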
Development of the PROMIS positive emotional and sensory expectancies of smoking item banks.

PubMed

Tucker, Joan S; Shadel, William G; Edelen, Maria Orlando; Stucky, Brian D; Li, Zhen; Hansen, Mark; Cai, Li

2014-09-01

The positive emotional and sensory expectancies of cigarette smoking include improved cognitive abilities, positive affective states, and pleasurable sensorimotor sensations. This paper describes development of Positive Emotional and Sensory Expectancies of Smoking item banks that will serve to standardize the assessment of this construct among daily and nondaily cigarette smokers. Data came from daily (N = 4,201) and nondaily (N = 1,183) smokers who completed an online survey. To identify a unidimensional set of items, we conducted item factor analyses, item response theory analyses, and differential item functioning analyses. Additionally, we evaluated the performance of fixed-item short forms (SFs) and computer adaptive tests (CATs) to efficiently assess the construct. Eighteen items were included in the item banks (15 common across daily and nondaily smokers, 1 unique to daily, 2 unique to nondaily). The item banks are strongly unidimensional, highly reliable (reliability = 0.95 for both), and perform similarly across gender, age, and race/ethnicity groups. A SF common to daily and nondaily smokers consists of 6 items (reliability = 0.86). Results from simulated CATs indicated that, on average, fewer than 8 items are needed to assess the construct with adequate precision using the item banks. These analyses identified a new set of items that can assess the positive emotional and sensory expectancies of smoking in a reliable and standardized manner. Considerable efficiency in assessing this construct can be achieved by using the item bank SF, employing computer adaptive tests, or selecting subsets of items tailored to specific research or clinical purposes. © The Author 2014. Published by Oxford University Press on behalf of the Society for Research on Nicotine and Tobacco. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

Automatically Scoring Short Essays for Content. CRESST Report 836

ERIC Educational Resources Information Center

Kerr, Deirdre; Mousavi, Hamid; Iseli, Markus R.

2013-01-01

The Common Core assessments emphasize short essay constructed response items over multiple choice items because they are more precise measures of understanding. However, such items are too costly and time consuming to be used in national assessments unless a way is found to score them automatically. Current automatic essay scoring techniques are…
A Two-Parameter Latent Trait Model. Methodology Project.

ERIC Educational Resources Information Center

Choppin, Bruce

On well-constructed multiple-choice tests, the most serious threat to measurement is not variation in item discrimination, but the guessing behavior that may be adopted by some students. Ways of ameliorating the effects of guessing are discussed, especially for problems in latent trait models. A new item response model, including an item parameter…
A Multidimensional Scaling Approach to Dimensionality Assessment for Measurement Instruments Modeled by Multidimensional Item Response Theory

ERIC Educational Resources Information Center

Toro, Maritsa

2011-01-01

The statistical assessment of dimensionality provides evidence of the underlying constructs measured by a survey or test instrument. This study focuses on educational measurement, specifically tests comprised of items described as multidimensional. That is, items that require examinee proficiency in multiple content areas and/or multiple cognitive…
Development and Standardization of the Diagnostic Adaptive Behavior Scale: Application of Item Response Theory to the Assessment of Adaptive Behavior.

PubMed

Tassé, Marc J; Schalock, Robert L; Thissen, David; Balboni, Giulia; Bersani, Henry Hank; Borthwick-Duffy, Sharon A; Spreat, Scott; Widaman, Keith F; Zhang, Dalun; Navas, Patricia

2016-03-01

The Diagnostic Adaptive Behavior Scale (DABS) was developed using item response theory (IRT) methods and was constructed to provide the most precise and valid adaptive behavior information at or near the cutoff point of making a decision regarding a diagnosis of intellectual disability. The DABS initial item pool consisted of 260 items. Using IRT modeling and a nationally representative standardization sample, the item set was reduced to 75 items that provide the most precise adaptive behavior information at the cutoff area determining the presence or not of significant adaptive behavior deficits across conceptual, social, and practical skills. The standardization of the DABS is described and discussed.
Measuring Constructs in Family Science: How Can Item Response Theory Improve Precision and Validity?

ERIC Educational Resources Information Center

Gordon, Rachel A.

2015-01-01

This article provides family scientists with an understanding of contemporary measurement perspectives and the ways in which item response theory (IRT) can be used to develop measures with desired evidence of precision and validity for research uses. The article offers a nontechnical introduction to some key features of IRT, including its…
Looking Closer at the Effects of Framing on Risky Choice: An Item Response Theory Analysis.

PubMed

Zickar; Highhouse

1998-07-01

Item response theory (IRT) methodology allowed an in-depth examination of several issues that would be difficult to explore using traditional methodology. IRT models were estimated for 4 risky-choice items, answered by students under either a gain or loss frame. Results supported the typical framing finding of risk-aversion for gains and risk-seeking for losses but also suggested that a latent construct we label preference for risk was influential in predicting risky choice. Also, the Asian Disease item, most often used in framing research, was found to have anomalous statistical properties when compared to other framing items. Copyright 1998 Academic Press.
INTRODUCTION TO PATIENT-REPORTED OUTCOME ITEM BANKS: ISSUES IN MINORITY AGING RESEARCH

PubMed Central

Templin, Thomas N; Hays, Ron D; Gershon, Richard C; Rothrock, Nan; Jones, Richard N; Teresi, Jeanne A; Stewart, Anita; Weech-Maldonado, Robert; Wallace, Steve

2014-01-01

In 2004 NIH awarded contracts to initiate the development of high quality psychological and neuropsychological outcome measures for improved assessment of health-related outcomes. The workshop introduced these measurement development initiatives, the measures created, and the NIH supported resource (Assessment Center) for internet or tablet-based test administration and scoring. Presentation covered: (a) item response theory (IRT) and assessment of test bias, (b) construction of item banks and computerized adaptive testing, and (c) the different ways in which qualitative analyses contribute to the definition of construct domains and the refinement of outcome constructs. The panel discussion included questions about representativeness of samples, and assessment of cultural bias. PMID:23570428
The Usability of CAT System for Assessing the Depressive Level of Japanese-A Study on Psychometric Properties and Response Behavior.

PubMed

Iwata, Noboru; Kikuchi, Kenichi; Fujihara, Yuya

2016-08-01

An innovative measurement system using a computerized adaptive testing technique based on item response theory (CAT) has been increasingly used to measure mental health status. However, little is known about its measurement properties based on empirical data. Moreover, response time (RT) data, which are not available from paper-and-pencil measurement but are available from computerized measurement, are worth investigating to explore response behavior. We aimed to construct a CAT to measure depressive symptomatology in a community population and to explore its measurement properties. We also examined the relationships between RTs, individual item responses, and depressive levels. To construct the CAT system, responses of 2061 workers and university students to 24 depression scale items plus four negatively revised positive affect items were subjected to a polytomous IRT analysis. The stopping rule was set at a standard error of estimation < 0.30 or a maximum of 15 items displayed. The CAT and a non-adaptive computer-based test (CBT) were administered to 209 undergraduates, and 168 of them were tested again after 1 week. On average, the CAT converged after 10.4 items. The θ values estimated by the CAT and CBT were highly correlated with each other (r = 0.94 and 0.95 for the 1st and 2nd measurements) and with the traditional scoring procedures (r's > 0.90). The test-retest reliability was at a satisfactory level (r = 0.86). RTs to some items correlated significantly with the θ estimates. The mean RT varied by item content and wording; i.e., positive affect items required 2 s or more longer than the other subscale items. The CAT would be a reliable and practical measurement tool for various purposes, including stress checks in the workplace.
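The stopping rule in this abstract (stop once the standard error of estimation drops below 0.30, or after 15 items) can be sketched as below. This is only an illustration: the function names and the sequence of standard errors are hypothetical, and only the two thresholds are taken from the abstract.

```python
def should_stop(standard_error, items_administered,
                se_threshold=0.30, max_items=15):
    """Return True when the adaptive test should end.

    Thresholds default to the values reported in the abstract;
    everything else here is an illustrative assumption.
    """
    return standard_error < se_threshold or items_administered >= max_items


def items_until_stop(se_sequence, se_threshold=0.30, max_items=15):
    """Count how many items are shown before the stopping rule fires.

    se_sequence[i] is the (hypothetical) standard error of the trait
    estimate after i + 1 items have been answered.
    """
    for count, se in enumerate(se_sequence, start=1):
        if should_stop(se, count, se_threshold, max_items):
            return count
    # The rule never fired within the supplied sequence.
    return len(se_sequence)
```

For example, a respondent whose standard error falls quickly stops early, while one whose standard error stays high is cut off at the 15-item cap.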
Reevaluation of the Amsterdam Inventory for Auditory Disability and Handicap Using Item Response Theory.

PubMed

Boeschen Hospers, J Mirjam; Smits, Niels; Smits, Cas; Stam, Mariska; Terwee, Caroline B; Kramer, Sophia E

2016-04-01

We reevaluated the psychometric properties of the Amsterdam Inventory for Auditory Disability and Handicap (AIADH; Kramer, Kapteyn, Festen, & Tobi, 1995) using item response theory. Item response theory describes item functioning along an ability continuum. Cross-sectional data from 2,352 adults with and without hearing impairment, ages 18-70 years, were analyzed. They completed the AIADH in the web-based prospective cohort study "Netherlands Longitudinal Study on Hearing." A graded response model was fitted to the AIADH data. Category response curves, item information curves, and the standard error as a function of self-reported hearing ability were plotted. The graded response model showed a good fit. Item information curves were most reliable for adults who reported having hearing disability and less reliable for adults with normal hearing. The standard error plot showed that self-reported hearing ability is most reliably measured for adults reporting mild up to moderate hearing disability. This is one of the few item response theory studies on audiological self-reports. All AIADH items could be hierarchically placed on the self-reported hearing ability continuum, meaning they measure the same construct. This provides a promising basis for developing a clinically useful computerized adaptive test, where item selection adapts to the hearing ability of individuals, resulting in efficient assessment of hearing disability.
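As a rough illustration of what a graded response model computes, the sketch below returns the category response probabilities for one item at a given trait level. It assumes the common logistic (Samejima) form with one discrimination parameter and ordered thresholds; the abstract does not state the parameterization used, and the function and parameter names are hypothetical.

```python
import math


def grm_category_probabilities(theta, discrimination, thresholds):
    """Category probabilities under a logistic graded response model.

    thresholds must be in increasing order; K thresholds define
    K + 1 ordered response categories.
    """
    # Cumulative probabilities P(X >= k), bracketed by 1 for the lowest
    # category and 0 beyond the highest one.
    cumulative = [1.0]
    for b in thresholds:
        cumulative.append(1.0 / (1.0 + math.exp(-discrimination * (theta - b))))
    cumulative.append(0.0)
    # Each category probability is the difference of adjacent cumulatives.
    return [cumulative[k] - cumulative[k + 1]
            for k in range(len(thresholds) + 1)]
```

Plotting these probabilities over a range of theta values gives category response curves of the kind mentioned in the abstract.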
The Nature of Science Instrument-Elementary (NOSI-E): Using Rasch principles to develop a theoretically grounded scale to measure elementary student understanding of the nature of science

NASA Astrophysics Data System (ADS)

Peoples, Shelagh

The purpose of this study was to determine which of three competing models will provide reliable, interpretable, and responsive measures of elementary students' understanding of the nature of science (NOS). The Nature of Science Instrument-Elementary (NOSI-E), a 28-item Rasch-based instrument, was used to assess students' NOS understanding. The NOS construct was conceptualized using five construct dimensions (Empirical, Inventive, Theory-laden, Certainty and Socially & Culturally Embedded). The competing models represent three internal models for the NOS construct. One postulate is that the NOS construct is unidimensional where one latent construct explains the relationship between the 28 items of the NOSI-E. Alternatively, the NOS construct is composed of five independent unidimensional constructs (the consecutive approach). Lastly, the NOS construct is multidimensional and composed of five inter-related but separate dimensions. A validity argument was developed that hypothesized that the internal structure of the NOS construct is best represented by the multidimensional Rasch model. Four sets of analyses were performed in which the three representations were compared. These analyses addressed five validity aspects (content, substantive, generalizability, structural and external) of construct validity. The vast body of evidence supported the claim that the NOS construct is composed of five separate but inter-related dimensions and is best represented by the multidimensional Rasch model. The results of the multidimensional analyses indicated that the items of the five subscales were of excellent technical quality, exhibited no differential item functioning (based on gender), had an item hierarchy that conformed to theoretical expectations, and together formed subscales of reasonable reliability (> 0.7 on each subscale) that were responsive to change in the construct.
Theory-laden scores from the multidimensional model predicted students' science achievement with scores from all five NOS dimensions significantly predicting students' perceptions of the constructivist nature of their classroom learning environment. The NOSI-E instrument is a theoretically grounded scale that can measure elementary students' NOS understanding and appears suitable for use in science education research.
Profile-likelihood Confidence Intervals in Item Response Theory Models.

PubMed

Chalmers, R Philip; Pek, Jolynn; Liu, Yang

2017-01-01

Confidence intervals (CIs) are fundamental inferential devices which quantify the sampling variability of parameter estimates. In item response theory, CIs have been primarily obtained from large-sample Wald-type approaches based on standard error estimates, derived from the observed or expected information matrix, after parameters have been estimated via maximum likelihood. An alternative approach to constructing CIs is to quantify sampling variability directly from the likelihood function with a technique known as profile-likelihood confidence intervals (PL CIs). In this article, we introduce PL CIs for item response theory models, compare PL CIs to classical large-sample Wald-type CIs, and demonstrate important distinctions among these CIs. CIs are then constructed for parameters directly estimated in the specified model and for transformed parameters which are often obtained post-estimation. Monte Carlo simulation results suggest that PL CIs perform consistently better than Wald-type CIs for both non-transformed and transformed parameters.
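For orientation, the two interval types contrasted in this abstract can be written in their textbook forms. The notation is an assumption introduced here, not taken from the article: \psi is the parameter of interest, \lambda collects the remaining (nuisance) parameters, \ell is the log-likelihood, and hats denote maximum likelihood estimates.

```latex
% Wald-type interval: symmetric around the estimate, built from its standard error
\hat{\psi} \;\pm\; z_{1-\alpha/2}\,\widehat{\mathrm{SE}}(\hat{\psi})

% Profile-likelihood interval: all values \psi_0 not rejected by a likelihood-ratio test,
% where \hat{\lambda}(\psi_0) maximizes \ell with \psi held fixed at \psi_0
\left\{ \psi_0 \;:\; 2\left[\ell(\hat{\psi},\hat{\lambda}) - \ell\big(\psi_0,\hat{\lambda}(\psi_0)\big)\right] \le \chi^2_{1,\,1-\alpha} \right\}
```

Unlike the Wald form, the profile-likelihood interval need not be symmetric around the estimate, which is one reason the two can behave differently for transformed parameters.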
An item response theory analysis of the narcissistic personality inventory.

PubMed

Ackerman, Robert A; Donnellan, M Brent; Robins, Richard W

2012-01-01

This research uses item response theory methods to evaluate the Narcissistic Personality Inventory (NPI; Raskin & Terry, 1988). Analyses using the 2-parameter logistic model were conducted on the total score and the Corry, Merritt, Mrug, and Pamp (2008) and Ackerman et al. (2011) subscales for the NPI. In addition to offering precise information about the psychometric properties of the NPI item pool, these analyses generated insights that can be used to develop new measures of the personality constructs embedded within this frequently used inventory.
IRT-LR-DIF with Estimation of the Focal-Group Density as an Empirical Histogram

ERIC Educational Resources Information Center

Woods, Carol M.

2008-01-01

Item response theory-likelihood ratio-differential item functioning (IRT-LR-DIF) is used to evaluate the degree to which items on a test or questionnaire have different measurement properties for one group of people versus another, irrespective of group-mean differences on the construct. Usually, the latent distribution is presumed normal for both…
Asymptotic Standard Errors of Observed-Score Equating with Polytomous IRT Models

ERIC Educational Resources Information Center

Andersson, Björn

2016-01-01

In observed-score equipercentile equating, the goal is to make scores on two scales or tests measuring the same construct comparable by matching the percentiles of the respective score distributions. If the tests consist of different items with multiple categories for each item, a suitable model for the responses is a polytomous item response…
Sexual Assault Prevention and Response Climate DEOCS 4.1 Construct Validity Summary

DTIC Science & Technology

2017-08-01

DEOCS, (7) examining variance and descriptive statistics (8) examining the relationship among items/areas to reduce multicollinearity, and (9...selecting items that demonstrate the strongest scale properties. Included is a review of the 4.0 description and items, followed by the proposed...Tables 1 – 7 for the description of each measure and corresponding items. Table 1. DEOCS 4.0 Perceptions of Safety Measure Description
The Effects of Item Format and Cognitive Domain on Students' Science Performance in TIMSS 2011

NASA Astrophysics Data System (ADS)

Liou, Pey-Yan; Bulut, Okan

2017-12-01

The purpose of this study was to examine eighth-grade students' science performance in terms of two test design components: item format and cognitive domain. The Taiwanese data came from the 2011 administration of the Trends in International Mathematics and Science Study (TIMSS), one of the major international large-scale assessments in science. The item difficulty analysis was initially applied to show the proportion of correct items. A regression-based cumulative link mixed modeling (CLMM) approach was further utilized to estimate the impact of item format, cognitive domain, and their interaction on the students' science scores. The results of the proportion-correct statistics showed that constructed-response items were more difficult than multiple-choice items, and that the reasoning cognitive domain items were more difficult compared to the items in the applying and knowing domains. In terms of the CLMM results, students tended to obtain higher scores when answering constructed-response items as well as items in the applying cognitive domain. When the two predictors and the interaction term were included together, the directions and magnitudes of the predictors on student science performance changed substantially. Plausible explanations for the complex nature of the effects of the two test-design predictors on student science performance are discussed. The results provide practical, empirically based evidence for test developers, teachers, and stakeholders to be aware of the differential function of item format, cognitive domain, and their interaction in students' science performance.
Overview of Classical Test Theory and Item Response Theory for Quantitative Assessment of Items in Developing Patient-Reported Outcome Measures

PubMed Central

Cappelleri, Joseph C.; Lundy, J. Jason; Hays, Ron D.

2014-01-01

Introduction The U.S. Food and Drug Administration’s patient-reported outcome (PRO) guidance document defines content validity as “the extent to which the instrument measures the concept of interest” (FDA, 2009, p. 12). “Construct validity is now generally viewed as a unifying form of validity for psychological measurements, subsuming both content and criterion validity” (Strauss & Smith, 2009, p. 7). Hence both qualitative and quantitative information are essential in evaluating the validity of measures. Methods We review classical test theory and item response theory approaches to evaluating PRO measures including frequency of responses to each category of the items in a multi-item scale, the distribution of scale scores, floor and ceiling effects, the relationship between item response options and the total score, and the extent to which hypothesized “difficulty” (severity) order of items is represented by observed responses. Conclusion Classical test theory and item response theory can be useful in providing a quantitative assessment of items and scales during the content validity phase of patient-reported outcome measures. Depending on the particular type of measure and the specific circumstances, either one or both approaches should be considered to help maximize the content validity of PRO measures. PMID:24811753
The Assignment of Raters to Items: Controlling for Rater Effects.

ERIC Educational Resources Information Center

Sykes, Robert C.; Heidorn, Mark; Lee, Guemin

A study was conducted to evaluate the effect of different modes (modalities) of assigning raters to test items. The impact on total constructed response (c.r.) score, and subsequently on total test score, of assigning a single versus multiple raters to an examination reading of a student's set of c.r. responses was evaluated for several mixed-item…
Modeling the Severity of Drinking Consequences in First-Year College Women: An Item Response Theory Analysis of the Rutgers Alcohol Problem Index*

PubMed Central

Cohn, Amy M.; Hagman, Brett T.; Graff, Fiona S.; Noel, Nora E.

2011-01-01

Objective: The present study examined the latent continuum of alcohol-related negative consequences among first-year college women using methods from item response theory and classical test theory. Method: Participants (N = 315) were college women in their freshman year who reported consuming any alcohol in the past 90 days and who completed assessments of alcohol consumption and alcohol-related negative consequences using the Rutgers Alcohol Problem Index. Results: Item response theory analyses showed poor model fit for five items identified in the Rutgers Alcohol Problem Index. Two-parameter item response theory logistic models were applied to the remaining 18 items to examine estimates of item difficulty (i.e., severity) and discrimination parameters. The item difficulty parameters ranged from 0.591 to 2.031, and the discrimination parameters ranged from 0.321 to 2.371. Classical test theory analyses indicated that the omission of the five misfit items did not significantly alter the psychometric properties of the construct. Conclusions: Findings suggest that those consequences that had greater severity and discrimination parameters may be used as screening items to identify female problem drinkers at risk for an alcohol use disorder. PMID:22051212
Test-retest reliability and construct validity of the ENERGY-child questionnaire on energy balance-related behaviours and their potential determinants: the ENERGY-project.

PubMed

Singh, Amika S; Vik, Froydis N; Chinapaw, Mai J M; Uijtdewilligen, Léonie; Verloigne, Maïté; Fernández-Alvira, Juan M; Stomfai, Sarolta; Manios, Yannis; Martens, Marloes; Brug, Johannes

2011-12-09

Insight in children's energy balance-related behaviours (EBRBs) and their determinants is important to inform obesity prevention research. Therefore, reliable and valid tools to measure these variables in large-scale population research are needed. To examine the test-retest reliability and construct validity of the child questionnaire used in the ENERGY-project, measuring EBRBs and their potential determinants among 10-12 year old children. We collected data among 10-12 year old children (n = 730 in the test-retest reliability study; n = 96 in the construct validity study) in six European countries, i.e. Belgium, Greece, Hungary, the Netherlands, Norway, and Spain. Test-retest reliability was assessed using the intra-class correlation coefficient (ICC) and percentage agreement comparing scores from two measurements, administered one week apart. To assess construct validity, the agreement between questionnaire responses and a subsequent face-to-face interview was assessed using ICC and percentage agreement. Of the 150 questionnaire items, 115 (77%) showed good to excellent test-retest reliability as indicated by ICCs > .60 or percentage agreement ≥ 75%. Test-retest reliability was moderate for 34 items (23%) and poor for one item. Construct validity appeared to be good to excellent for 70 (47%) of the 150 items, as indicated by ICCs > .60 or percentage agreement ≥ 75%. From the other 80 items, construct validity was moderate for 39 (26%) and poor for 41 items (27%). Our results demonstrate that the ENERGY-child questionnaire, assessing EBRBs of the child as well as personal, family, and school-environmental determinants related to these EBRBs, has good test-retest reliability and moderate to good construct validity for the large majority of items.
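One of the two agreement statistics used here, percentage agreement between two administrations, can be sketched as follows. This is a minimal illustration that assumes agreement means an exact match of the two answers to an item; the abstract does not spell out the scoring, and the function names are hypothetical. Only the 75% cut-off is taken from the abstract.

```python
def percentage_agreement(first, second):
    """Percentage of respondents giving the same answer on both occasions.

    first and second are equal-length sequences holding one item's
    responses at the first and second measurement.
    """
    if len(first) != len(second) or len(first) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    matches = sum(a == b for a, b in zip(first, second))
    return 100.0 * matches / len(first)


def meets_agreement_criterion(first, second, cutoff=75.0):
    """Apply the >= 75% agreement criterion reported in the abstract."""
    return percentage_agreement(first, second) >= cutoff
```

The intra-class correlation coefficient, the other statistic mentioned, has several variants and is not sketched here.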

Test-retest reliability and construct validity of the ENERGY-child questionnaire on energy balance-related behaviours and their potential determinants: the ENERGY-project

PubMed Central

2011-01-01

Background Insight in children's energy balance-related behaviours (EBRBs) and their determinants is important to inform obesity prevention research. Therefore, reliable and valid tools to measure these variables in large-scale population research are needed. Objective To examine the test-retest reliability and construct validity of the child questionnaire used in the ENERGY-project, measuring EBRBs and their potential determinants among 10-12 year old children. Methods We collected data among 10-12 year old children (n = 730 in the test-retest reliability study; n = 96 in the construct validity study) in six European countries, i.e. Belgium, Greece, Hungary, the Netherlands, Norway, and Spain. Test-retest reliability was assessed using the intra-class correlation coefficient (ICC) and percentage agreement comparing scores from two measurements, administered one week apart. To assess construct validity, the agreement between questionnaire responses and a subsequent face-to-face interview was assessed using ICC and percentage agreement. Results Of the 150 questionnaire items, 115 (77%) showed good to excellent test-retest reliability as indicated by ICCs > .60 or percentage agreement ≥ 75%. Test-retest reliability was moderate for 34 items (23%) and poor for one item. Construct validity appeared to be good to excellent for 70 (47%) of the 150 items, as indicated by ICCs > .60 or percentage agreement ≥ 75%. From the other 80 items, construct validity was moderate for 39 (26%) and poor for 41 items (27%). Conclusions Our results demonstrate that the ENERGY-child questionnaire, assessing EBRBs of the child as well as personal, family, and school-environmental determinants related to these EBRBs, has good test-retest reliability and moderate to good construct validity for the large majority of items. PMID:22152048
Cross-cultural adaptation and construct validity of the Korean version of a physical activity measure for community-dwelling elderly.

PubMed

Choi, Bongsam

2018-01-01

[Purpose] This study aimed to cross-culturally adapt and validate the Korean version of a physical activity measure (K-PAM) for community-dwelling elderly. [Subjects and Methods] One hundred and thirty-eight community-dwelling elderly people, 32 males and 106 females, participated in the study. All participants were asked to fill out a fifty-one item questionnaire measuring perceived difficulty in the activities of daily living (ADL) for the elderly. The one-parameter model of item response theory (Rasch analysis) was applied to determine the construct validity and to inspect item-level psychometric properties of the 51 ADL items of the K-PAM. [Results] Person separation reliability (analogous to Cronbach's alpha) for internal consistency ranged from 0.93 to 0.94. A total of 16 items misfit the Rasch model. After deletion of the misfitting items, 35 ADL items of the K-PAM were placed in an empirically meaningful hierarchy from easy to hard. The item-person map analysis showed that item difficulty was well matched to elderly people with moderate and low ability, except for a ceiling effect among those with high ability. [Conclusion] The cross-culturally adapted K-PAM was shown to be sufficient for establishing construct validity and stable psychometric properties, as confirmed by person separation reliability and fit statistics.
Development of a tool to assess adherence to a model of the division of responsibility in feeding young children: using response mapping to capacitate validation measures.

PubMed

Lohse, Barbara; Satter, Ellyn; Arnold, Kristen

2014-04-01

Accurate early assessment and targeted intervention with problematic parent/child feeding dynamics is critical for the prevention and treatment of child obesity. The division of responsibility in feeding (sDOR), articulated by the Satter Feeding Dynamics Model (fdSatter), has been demonstrated clinically as an effective approach to reduce child feeding problems, including those leading to obesity. Lack of a tested instrument to examine adherence to fdSatter stimulated initial construction of the Satter Feeding Dynamics Inventory (fdSI). The aim of this project was to refine the item pool to establish translational validity, making the fdSI suitable for advanced psychometric analysis. Cognitive interviews (n = 80) with caregivers of varied socioeconomic strata informed revisions that demonstrated face and content validity. fdSI responses were mapped to interviews using an iterative, multi-phase thematic approach to provide an instrument ready for construct validation. fdSI development required five interview phases over 32 months: Foundational; Refinement; Transitional; Assurance; and Launching. Each phase was associated with item reduction and revision. Thirteen items were removed from the 38-item Foundational phase and seven were revised in the Refinement phase. Revisions, deletions, and additions prompted by Transitional and Assurance phase interviews resulted in the 15-item Launching phase fdSI. Only one Foundational phase item was carried through all development phases, emphasizing the need to test for item comprehension and interpretation before psychometric analyses. Psychometric studies of item pools without encrypted meanings will facilitate progress toward a tool that accurately detects adherence to sDOR. Ability to measure sDOR will facilitate focus on feeding behaviors associated with reduced risk of childhood obesity.
Item response theory - A first approach

NASA Astrophysics Data System (ADS)

Nunes, Sandra; Oliveira, Teresa; Oliveira, Amílcar

2017-07-01

Item Response Theory (IRT) has become one of the most popular scoring frameworks for measurement data, frequently used in computerized adaptive testing, cognitively diagnostic assessment and test equating. According to Andrade et al. (2000), IRT can be defined as a set of mathematical models (Item Response Models - IRM) constructed to represent the probability of an individual giving the right answer to an item of a particular test. The number of Item Response Models available for measurement analysis has increased considerably in the last fifteen years due to increasing computer power and due to a demand for accuracy and more meaningful inferences grounded in complex data. The developments in modeling with Item Response Theory were related to developments in estimation theory, most remarkably Bayesian estimation with Markov chain Monte Carlo algorithms (Patz & Junker, 1999). The popularity of Item Response Theory has also implied numerous overviews in books and journals, and many connections between IRT and other statistical estimation procedures, such as factor analysis and structural equation modeling, have been made repeatedly (van der Linden & Hambleton, 1997). As stated before, Item Response Theory covers a variety of measurement models, ranging from basic one-dimensional models for dichotomously and polytomously scored items and their multidimensional analogues to models that incorporate information about cognitive sub-processes which influence the overall item response process. The aim of this work is to introduce the main concepts associated with one-dimensional models of Item Response Theory, to specify the logistic models with one, two and three parameters, to discuss some properties of these models and to present the main estimation procedures.
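The one-, two- and three-parameter logistic models named above can be illustrated with a single function, since the simpler models are special cases of the three-parameter one. This is a sketch in a common textbook form; the function and parameter names are hypothetical, and the optional scaling constant (often written D = 1.7) is omitted as a simplifying assumption.

```python
import math


def irt_probability(theta, difficulty, discrimination=1.0, guessing=0.0):
    """Probability of a correct answer under the three-parameter logistic model.

    With guessing = 0 this reduces to the two-parameter model; with
    discrimination = 1 as well, it reduces to the one-parameter model.
    """
    logistic = 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))
    # The guessing parameter sets a lower asymptote on the probability.
    return guessing + (1.0 - guessing) * logistic
```

When ability equals item difficulty, the one- and two-parameter models give a probability of 0.5, while a nonzero guessing parameter raises it to halfway between the guessing level and 1.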
On the Equivalence of Constructed-Response and Multiple-Choice Tests.

ERIC Educational Resources Information Center

Traub, Ross E.; Fisher, Charles W.

Two sets of mathematical reasoning and two sets of verbal comprehension items were cast into each of three formats--constructed response, standard multiple-choice, and Coombs multiple-choice--in order to assess whether tests with identical content but different formats measure the same attribute, except for possible differences in error variance…
Slower is not always better: Response-time evidence clarifies the limited role of miserly information processing in the Cognitive Reflection Test

PubMed Central

Pitchford, Melanie; Ball, Linden J.; Hunt, Thomas E.; Steel, Richard

2017-01-01

We report a study examining the role of ‘cognitive miserliness’ as a determinant of poor performance on the standard three-item Cognitive Reflection Test (CRT). The cognitive miserliness hypothesis proposes that people often respond incorrectly on CRT items because of an unwillingness to go beyond default, heuristic processing and invest time and effort in analytic, reflective processing. Our analysis (N = 391) focused on people’s response times to CRT items to determine whether predicted associations are evident between miserly thinking and the generation of incorrect, intuitive answers. Evidence indicated only a weak correlation between CRT response times and accuracy. Item-level analyses also failed to demonstrate predicted response-time differences between correct analytic and incorrect intuitive answers for two of the three CRT items. We question whether participants who give incorrect intuitive answers on the CRT can legitimately be termed cognitive misers and whether the three CRT items measure the same general construct. PMID:29099840
Development of a simple 12-item theory-based instrument to assess the impact of continuing professional development on clinical behavioral intentions.

PubMed

Légaré, France; Borduas, Francine; Freitas, Adriana; Jacques, André; Godin, Gaston; Luconi, Francesca; Grimshaw, Jeremy

2014-01-01

Decision-makers in organizations providing continuing professional development (CPD) have identified the need for routine assessment of its impact on practice. We sought to develop a theory-based instrument for evaluating the impact of CPD activities on health professionals' clinical behavioral intentions. Our multipronged study had four phases. 1) We systematically reviewed the literature for instruments that used socio-cognitive theories to assess healthcare professionals' clinically-oriented behavioral intentions and/or behaviors; we extracted items relating to the theoretical constructs of an integrated model of healthcare professionals' behaviors and removed duplicates. 2) A committee of researchers and CPD decision-makers selected a pool of items relevant to CPD. 3) An international group of experts (n = 70) reached consensus on the most relevant items using electronic Delphi surveys. 4) We created a preliminary instrument with the items found most relevant and assessed its factorial validity, internal consistency and reliability (weighted kappa) over a two-week period among 138 physicians attending a CPD activity. Out of 72 potentially relevant instruments, 47 were analyzed. Of the 1218 items extracted from these, 16% were discarded as improperly phrased and 70% discarded as duplicates. Mapping the remaining items onto the constructs of the integrated model of healthcare professionals' behaviors yielded a minimum of 18 and a maximum of 275 items per construct. The partnership committee retained 61 items covering all seven constructs. Two iterations of the Delphi process produced consensus on a provisional 40-item questionnaire. Exploratory factorial analysis following test-retest resulted in a 12-item questionnaire. Cronbach's coefficients for the constructs varied from 0.77 to 0.85. A 12-item theory-based instrument for assessing the impact of CPD activities on health professionals' clinical behavioral intentions showed adequate validity and reliability. 
Further studies could assess its responsiveness to behavior change following CPD activities and its capacity to predict health professionals' clinical performance.
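The internal-consistency figures reported here are Cronbach's coefficients. As a reminder of what that statistic computes, the sketch below implements the standard alpha formula from a respondents-by-items score table. The function name and data layout are hypothetical, and nothing here reproduces the study's data.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a table of item scores.

    scores is a list of respondents, each a list with one score per item.
    Uses sample variances (n - 1 in the denominator).
    """
    n_items = len(scores[0])
    if n_items < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item across respondents.
    item_variances = [variance([row[i] for row in scores])
                      for i in range(n_items)]
    # Variance of the total (sum) score across respondents.
    total_variance = variance([sum(row) for row in scores])
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)
```

Alpha approaches 1 when items rise and fall together across respondents and drops toward 0 when item scores are unrelated.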
Development of a Simple 12-Item Theory-Based Instrument to Assess the Impact of Continuing Professional Development on Clinical Behavioral Intentions

PubMed Central

Légaré, France; Borduas, Francine; Freitas, Adriana; Jacques, André; Godin, Gaston; Luconi, Francesca; Grimshaw, Jeremy

2014-01-01

Background: Decision-makers in organizations providing continuing professional development (CPD) have identified the need for routine assessment of its impact on practice. We sought to develop a theory-based instrument for evaluating the impact of CPD activities on health professionals' clinical behavioral intentions. Methods and Findings: Our multipronged study had four phases. 1) We systematically reviewed the literature for instruments that used socio-cognitive theories to assess healthcare professionals' clinically-oriented behavioral intentions and/or behaviors; we extracted items relating to the theoretical constructs of an integrated model of healthcare professionals' behaviors and removed duplicates. 2) A committee of researchers and CPD decision-makers selected a pool of items relevant to CPD. 3) An international group of experts (n = 70) reached consensus on the most relevant items using electronic Delphi surveys. 4) We created a preliminary instrument with the items found most relevant and assessed its factorial validity, internal consistency and reliability (weighted kappa) over a two-week period among 138 physicians attending a CPD activity. Out of 72 potentially relevant instruments, 47 were analyzed. Of the 1218 items extracted from these, 16% were discarded as improperly phrased and 70% discarded as duplicates. Mapping the remaining items onto the constructs of the integrated model of healthcare professionals' behaviors yielded a minimum of 18 and a maximum of 275 items per construct. The partnership committee retained 61 items covering all seven constructs. Two iterations of the Delphi process produced consensus on a provisional 40-item questionnaire. Exploratory factorial analysis following test-retest resulted in a 12-item questionnaire. Cronbach's coefficients for the constructs varied from 0.77 to 0.85.
Conclusion: A 12-item theory-based instrument for assessing the impact of CPD activities on health professionals' clinical behavioral intentions showed adequate validity and reliability. Further studies could assess its responsiveness to behavior change following CPD activities and its capacity to predict health professionals' clinical performance. PMID:24643173
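The internal-consistency figures reported in this record (Cronbach's coefficients of 0.77 to 0.85) follow the standard alpha formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The following is a minimal Python sketch of that formula, assuming a complete respondents-by-items score matrix. The function name `cronbach_alpha` and the sample scores are hypothetical and are not taken from the study.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item (column) across respondents.
    item_variances = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical data: 5 respondents answering 3 items on a 1-7 scale.
example_scores = [
    [6, 5, 6],
    [3, 4, 3],
    [7, 6, 7],
    [2, 3, 2],
    [5, 5, 4],
]
alpha = cronbach_alpha(example_scores)
print(round(alpha, 3))
```

Values near 1 indicate that the items vary together. The sketch does not handle missing responses, which a real analysis would need to address.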
Evaluating Statistical Targets for Assembling Parallel Mixed-Format Test Forms

ERIC Educational Resources Information Center

Debeer, Dries; Ali, Usama S.; van Rijn, Peter W.

2017-01-01

Test assembly is the process of selecting items from an item pool to form one or more new test forms. Often new test forms are constructed to be parallel with an existing (or an ideal) test. Within the context of item response theory, the test information function (TIF) or the test characteristic curve (TCC) are commonly used as statistical…
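The two statistical targets named in this record can be illustrated with a short Python sketch. It assumes dichotomous items under a two-parameter logistic (2PL) model, which is a simplification: the study concerns mixed-format forms, and the truncated abstract does not say which models were used. The function names and item parameters below are hypothetical.

```python
import math


def p_correct(theta, a, b):
    """2PL probability of a correct response at ability theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def form_tcc(theta, items):
    """Test characteristic curve: expected number-correct score at theta."""
    return sum(p_correct(theta, a, b) for a, b in items)


def form_information(theta, items):
    """Test information function: sum of 2PL item informations a^2 * P * (1 - P)."""
    total = 0.0
    for a, b in items:
        p = p_correct(theta, a, b)
        total += a * a * p * (1.0 - p)
    return total


# Hypothetical (discrimination a, difficulty b) pairs for two forms.
reference_form = [(1.2, -0.5), (0.9, 0.0), (1.5, 0.7)]
candidate_form = [(1.1, -0.4), (1.0, 0.1), (1.4, 0.6)]

# Compare the forms at a few ability points; small gaps suggest near-parallel forms.
for theta in (-1.0, 0.0, 1.0):
    info_gap = abs(form_information(theta, reference_form)
                   - form_information(theta, candidate_form))
    tcc_gap = abs(form_tcc(theta, reference_form) - form_tcc(theta, candidate_form))
    print(theta, round(info_gap, 4), round(tcc_gap, 4))
```

An assembly procedure would typically minimize such gaps over a grid of ability values; the loop above only reports them.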
Assessment of Computer and Information Literacy in ICILS 2013: Do Different Item Types Measure the Same Construct?

ERIC Educational Resources Information Center

Ihme, Jan Marten; Senkbeil, Martin; Goldhammer, Frank; Gerick, Julia

2017-01-01

The combination of different item formats is found quite often in large scale assessments, and analyses on the dimensionality often indicate multi-dimensionality of tests regarding the task format. In ICILS 2013, three different item types (information-based response tasks, simulation tasks, and authoring tasks) were used to measure computer and…
Comparing Science Achievement Constructs: Targeted and Achieved

ERIC Educational Resources Information Center

Ferrara, Steve; Duncan, Teresa

2011-01-01

This article illustrates how test specifications based solely on academic content standards, without attention to other cognitive skills and item response demands, can fall short of their targeted constructs. First, the authors inductively describe the science achievement construct represented by a statewide sixth-grade science proficiency test.…
Multivariate Generalizability Analysis of Automated Scoring for Short Answer Items of Social Studies in Large-Scale Assessment

ERIC Educational Resources Information Center

Sung, Kyung Hee; Noh, Eun Hee; Chon, Kyong Hee

2017-01-01

With increased use of constructed response items in large scale assessments, the cost of scoring has been a major consideration (Noh et al. in KICE Report RRE 2012-6, 2012; Wainer and Thissen in "Applied Measurement in Education" 6:103-118, 1993). In response to the scoring cost issues, various forms of automated system for scoring…
The Effects of Rater Severity and Rater Distribution on Examinees' Ability Estimation for Constructed-Response Items. Research Report. ETS RR-13-23

ERIC Educational Resources Information Center

Wang, Zhen; Yao, Lihua

2013-01-01

The current study used simulated data to investigate the properties of a newly proposed method (Yao's rater model) for modeling rater severity and its distribution under different conditions. Our study examined the effects of rater severity, distributions of rater severity, the difference between item response theory (IRT) models with rater effect…
Psychometric properties of the Epworth Sleepiness Scale: A factor analysis and item-response theory approach.

PubMed

Pilcher, June J; Switzer, Fred S; Munc, Alec; Donnelly, Janet; Jellen, Julia C; Lamm, Claus

2018-04-01

The purpose of this study is to examine the psychometric properties of the Epworth Sleepiness Scale (ESS) in two languages, German and English. Students from a university in Austria (N = 292; 55 males; mean age = 18.71 ± 1.71 years; 237 females; mean age = 18.24 ± 0.88 years) and a university in the US (N = 329; 128 males; mean age = 18.71 ± 0.88 years; 201 females; mean age = 21.59 ± 2.27 years) completed the ESS. An exploratory-factor analysis was completed to examine dimensionality of the ESS. Item response theory (IRT) analyses were used to provide information about the response rates on the items on the ESS and provide differential item functioning (DIF) analyses to examine whether the items were interpreted differently between the two languages. The factor analyses suggest that the ESS measures two distinct sleepiness constructs. These constructs indicate that the ESS is probing sleepiness in settings requiring active versus passive responding. The IRT analyses found that overall, the items on the ESS perform well as a measure of sleepiness. However, Item 8 and to a lesser extent Item 6 were being interpreted differently by respondents in comparison to the other items. In addition, the DIF analyses showed that the responses between German and English were very similar indicating that there are only minor measurement differences between the two language versions of the ESS. These findings suggest that the ESS provides a reliable measure of propensity to sleepiness; however, it does convey a two-factor approach to sleepiness. Researchers and clinicians can use the German and English versions of the ESS but may wish to exclude Item 8 when calculating a total sleepiness score.
Sexual Harassment DEOCS 4.1 Construct Validity Summary

DTIC Science & Technology

2017-08-01

These items were modified to provide additional clarity regarding chain of command actions and response in the final survey. ** These items were...modified to provide additional clarity regarding individuals from the respondent’s workplace in the final survey. 4 Conclusion The revised sexual
Item response theory analysis of the life orientation test-revised: age and gender differential item functioning analyses.

PubMed

Steca, Patrizia; Monzani, Dario; Greco, Andrea; Chiesi, Francesca; Primi, Caterina

2015-06-01

This study is aimed at testing the measurement properties of the Life Orientation Test-Revised (LOT-R) for the assessment of dispositional optimism by employing item response theory (IRT) analyses. The LOT-R was administered to a large sample of 2,862 Italian adults. First, confirmatory factor analyses demonstrated the theoretical conceptualization of the construct measured by the LOT-R as a single bipolar dimension. Subsequently, IRT analyses for polytomous, ordered response category data were applied to investigate the items' properties. The equivalence of the items across gender and age was assessed by analyzing differential item functioning. Discrimination and severity parameters indicated that all items were able to distinguish people with different levels of optimism and adequately covered the spectrum of the latent trait. Additionally, the LOT-R appears to be gender invariant and, with minor exceptions, age invariant. Results provided evidence that the LOT-R is a reliable and valid measure of dispositional optimism. © The Author(s) 2014.
Application of Item Response Theory to Tests of Substance-related Associative Memory

PubMed Central

Shono, Yusuke; Grenard, Jerry L.; Ames, Susan L.; Stacy, Alan W.

2015-01-01

A substance-related word association test (WAT) is one of the commonly used indirect tests of substance-related implicit associative memory and has been shown to predict substance use. This study applied an item response theory (IRT) modeling approach to evaluate psychometric properties of the alcohol- and marijuana-related WATs and their items among 775 ethnically diverse at-risk adolescents. After examining the IRT assumptions, item fit, and differential item functioning (DIF) across gender and age groups, the original 18 WAT items were reduced to 14 and 15 items in the alcohol- and marijuana-related WAT, respectively. Thereafter, unidimensional one- and two-parameter logistic models (1PL and 2PL models) were fitted to the revised WAT items. The results demonstrated that both alcohol- and marijuana-related WATs have good psychometric properties. These results were discussed in light of the framework of a unified concept of construct validity (Messick, 1975, 1989, 1995). PMID:25134051
Developing Multiple Choice Tests: Tips & Techniques

ERIC Educational Resources Information Center

McCowan, Richard J.

1999-01-01

Item writing is a major responsibility of trainers. Too often, qualified staff who prepare lessons carefully and teach conscientiously use inadequate tests that do not validly reflect the true level of trainee achievement. This monograph describes techniques for constructing multiple-choice items that measure student performance accurately. It…
Using Automated Essay Scores as an Anchor When Equating Constructed Response Writing Tests

ERIC Educational Resources Information Center

Almond, Russell G.

2014-01-01

Assessments consisting of only a few extended constructed response items (essays) are not typically equated using anchor test designs as there are typically too few essay prompts in each form to allow for meaningful equating. This article explores the idea that output from an automated scoring program designed to measure writing fluency (a common…
A Hierarchical Rater Model for Constructed Responses, with a Signal Detection Rater Model

ERIC Educational Resources Information Center

DeCarlo, Lawrence T.; Kim, YoungKoung; Johnson, Matthew S.

2011-01-01

The hierarchical rater model (HRM) recognizes the hierarchical structure of data that arises when raters score constructed response items. In this approach, raters' scores are not viewed as being direct indicators of examinee proficiency but rather as indicators of essay quality; the (latent categorical) quality of an examinee's essay in turn…

Automatic Short Essay Scoring Using Natural Language Processing to Extract Semantic Information in the Form of Propositions. CRESST Report 831

ERIC Educational Resources Information Center

Kerr, Deirdre; Mousavi, Hamid; Iseli, Markus R.

2013-01-01

The Common Core assessments emphasize short essay constructed-response items over multiple-choice items because they are more precise measures of understanding. However, such items are too costly and time consuming to be used in national assessments unless a way to score them automatically can be found. Current automatic essay-scoring techniques…
Automatic Generation of Rasch-Calibrated Items: Figural Matrices Test GEOM and Endless-Loops Test EC

ERIC Educational Resources Information Center

Arendasy, Martin

2005-01-01

The future of test construction for certain psychological ability domains that can be analyzed well in a structured manner may lie--at the very least for reasons of test security--in the field of automatic item generation. In this context, a question that has not been explicitly addressed is whether it is possible to embed an item response theory…
Item Banks for Measuring Emotional Distress From the Patient-Reported Outcomes Measurement Information System (PROMIS®): Depression, Anxiety, and Anger

PubMed Central

Pilkonis, Paul A.; Choi, Seung W.; Reise, Steven P.; Stover, Angela M.; Riley, William T.; Cella, David

2011-01-01

The authors report on the development and calibration of item banks for depression, anxiety, and anger as part of the Patient-Reported Outcomes Measurement Information System (PROMIS®). Comprehensive literature searches yielded an initial bank of 1,404 items from 305 instruments. After qualitative item analysis (including focus groups and cognitive interviewing), 168 items (56 for each construct) were written in a first person, past tense format with a 7-day time frame and five response options reflecting frequency. The calibration sample included nearly 15,000 respondents. Final banks of 28, 29, and 29 items were calibrated for depression, anxiety, and anger, respectively, using item response theory. Test information curves showed that the PROMIS item banks provided more information than conventional measures in a range of severity from approximately −1 to +3 standard deviations (with higher scores indicating greater distress). Short forms consisting of seven to eight items provided information comparable to legacy measures containing more items. PMID:21697139
Item banks for measuring emotional distress from the Patient-Reported Outcomes Measurement Information System (PROMIS®): depression, anxiety, and anger.

PubMed

Pilkonis, Paul A; Choi, Seung W; Reise, Steven P; Stover, Angela M; Riley, William T; Cella, David

2011-09-01

The authors report on the development and calibration of item banks for depression, anxiety, and anger as part of the Patient-Reported Outcomes Measurement Information System (PROMIS®). Comprehensive literature searches yielded an initial bank of 1,404 items from 305 instruments. After qualitative item analysis (including focus groups and cognitive interviewing), 168 items (56 for each construct) were written in a first person, past tense format with a 7-day time frame and five response options reflecting frequency. The calibration sample included nearly 15,000 respondents. Final banks of 28, 29, and 29 items were calibrated for depression, anxiety, and anger, respectively, using item response theory. Test information curves showed that the PROMIS item banks provided more information than conventional measures in a range of severity from approximately -1 to +3 standard deviations (with higher scores indicating greater distress). Short forms consisting of seven to eight items provided information comparable to legacy measures containing more items.
Harmonizing Measures of Cognitive Performance Across International Surveys of Aging Using Item Response Theory.

PubMed

Chan, Kitty S; Gross, Alden L; Pezzin, Liliana E; Brandt, Jason; Kasper, Judith D

2015-12-01

To harmonize measures of cognitive performance using item response theory (IRT) across two international aging studies. Data for persons ≥65 years from the Health and Retirement Study (HRS, N = 9,471) and the English Longitudinal Study of Aging (ELSA, N = 5,444). Cognitive performance measures varied (HRS fielded 25, ELSA 13); 9 were in common. Measurement precision was examined for IRT scores based on (a) common items, (b) common items adjusted for differential item functioning (DIF), and (c) DIF-adjusted all items. Three common items (day of date, immediate word recall, and delayed word recall) demonstrated DIF by survey. Adding survey-specific items improved precision but mainly for HRS respondents at lower cognitive levels. IRT offers a feasible strategy for harmonizing cognitive performance measures across other surveys and for other multi-item constructs of interest in studies of aging. Practical implications depend on sample distribution and the difficulty mix of in-common and survey-specific items. © The Author(s) 2015.
The multi-faceted assessment of independence in patients with rheumatoid arthritis: preliminary validation from the ATTAIN study.

PubMed

Hassett, Afton L; Li, Tracy; Buyske, Steven; Savage, Shantal V; Gignac, Monique A M

2008-05-01

To consider the feasibility of assessing multiple facets of independence in rheumatoid arthritis (RA) using a measure developed from existing items and examining its face validity, construct validity and responsiveness to change. The ATTAIN (Abatacept Trial in Treatment of Anti-tumor necrosis factor [TNF] Inadequate responders) database was used. Patients with RA were randomized 2:1, abatacept (n = 258) and placebo (n = 133). A multi-faceted scale to measure physical and psychosocial independence was constructed using items from the Health Assessment Questionnaire (HAQ) and Short Form 36 Health Survey (SF-36). Questions assessing activity limitations and need for outside caregiver help were also examined. Interviews with 20 RA patients assessed face validity. Item Response Theory analysis yielded two traits - 'Psychosocial Independence', derived from the number of days with activity limitations plus the Role Emotional, Social Functioning and Role Physical subscale items from the SF-36; and 'Physical Independence', derived from 15 HAQ items assessing need for help from another. The two traits showed no significant differential item functioning for age or gender and demonstrated good face validity. Changes over 169 days on Psychosocial Independence were greater (mean 0.46 units, 95% confidence interval [CI]: 0.17-0.75) for the abatacept group than for placebo (p = 0.002). Changes in Physical Independence were greater (mean 0.59 units, 95% CI: 0.35-0.82) for the abatacept group than for placebo (p < 0.001). The multi-faceted assessment of independence in RA based on items from commonly used instruments is feasible suggesting promise for evaluating independence in future clinical trials. This approach demonstrated good face and construct validity and responsiveness in RA patients who had previously failed anti-TNF therapy. 
However, we caution against an interpretation that these data suggest that abatacept improves independence because the component parts of this assessment came from instruments used in the ATTAIN trial where data had been previously analyzed.
The construct validity of the Major Depression Inventory: A Rasch analysis of a self-rating scale in primary care.

PubMed

Nielsen, Marie Germund; Ørnbøl, Eva; Vestergaard, Mogens; Bech, Per; Christensen, Kaj Sparle

2017-06-01

We aimed to assess the measurement properties of the ten-item Major Depression Inventory when used on clinical suspicion in general practice by performing a Rasch analysis. General practitioners asked consecutive persons to respond to the web-based Major Depression Inventory on clinical suspicion of depression. We included 22 practices and 245 persons. Rasch analysis was performed using RUMM2030 software. The Rasch model fit suggests that all items contribute to a single underlying trait (defined as internal construct validity). Mokken analysis was used to test dimensionality and scalability. Our Rasch analysis showed misfit concerning the sleep and appetite items (items 9 and 10). The response categories were disordered for eight items. After modifying the original six-point to a four-point scoring system for all items, we achieved ordered response categories for all ten items. The person separation reliability was acceptable (0.82) for the initial model. Dimensionality testing did not support combining the ten items to create a total score. The scale appeared to be well targeted to this clinical sample. No significant differential item functioning was observed for gender, age, work status and education. The Rasch and Mokken analyses revealed two dimensions, but the Major Depression Inventory showed fit to one scale if items 9 and 10 were excluded. Our study indicated scalability problems in the current version of the Major Depression Inventory. The conducted analysis revealed better statistical fit when items 9 and 10 were excluded. Copyright © 2017 Elsevier Inc. All rights reserved.
Reliability and validity evidence of the Assessment of Language Use in Social Contexts for Adults (ALUSCA).

PubMed

Valente, Ana Rita S; Hall, Andreia; Alvelos, Helena; Leahy, Margaret; Jesus, Luis M T

2018-04-12

The appropriate use of language in context depends on the speaker's pragmatic language competencies. A coding system was used to develop a specific and adult-focused self-administered questionnaire for adults who stutter and adults who do not stutter, The Assessment of Language Use in Social Contexts for Adults, with three categories: precursors, basic exchanges, and extended literal/non-literal discourse. This paper presents the content validity, item analysis, reliability coefficients and evidence of construct validity of the instrument. Content validity analysis was based on a two-stage process: first, 11 pragmatic questionnaires were assessed to identify items that probe each pragmatic competency and to create the first version of the instrument; second, items were assessed qualitatively by an expert panel composed of adults who stutter and controls, and quantitatively and qualitatively by an expert panel composed of clinicians. A pilot study was conducted with five adults who stutter and five controls to analyse items and calculate reliability. Construct validity evidence was obtained using the hypothesized relationships method and factor analysis with 28 adults who stutter and 28 controls. Concerning content validity, the questionnaires assessed up to 13 pragmatic competencies. Qualitative and quantitative analysis revealed ambiguities in item construction. Disagreement between experts was resolved through item modification. The pilot study showed that the instrument presented internal consistency and temporal stability. Significant differences between adults who stutter and controls and different response profiles revealed the instrument's underlying construct. The instrument is reliable and presented evidence of construct validity.
Development of the multiple sclerosis (MS) early mobility impairment questionnaire (EMIQ).

PubMed

Ziemssen, Tjalf; Phillips, Glenn; Shah, Ruchit; Mathias, Adam; Foley, Catherine; Coon, Cheryl; Sen, Rohini; Lee, Andrew; Agarwal, Sonalee

2016-10-01

The Early Mobility Impairment Questionnaire (EMIQ) was developed to facilitate early identification of mobility impairments in multiple sclerosis (MS) patients. We describe the initial development of the EMIQ with a focus on the psychometric evaluation of the questionnaire using classical and item response theory methods. The initial 20-item EMIQ was constructed by clinical specialists and qualitatively tested among people with MS and physicians via cognitive interviews. Data from an observational study was used to make additional updates to the instrument based on exploratory factor analysis (EFA) and item response theory (IRT) analysis, and psychometric analyses were performed to evaluate the reliability and validity of the final instrument's scores and screening properties (i.e., sensitivity and specificity). Based on qualitative interview analyses, a revised 15-item EMIQ was included in the observational study. EFA, IRT and item-to-item correlation analyses revealed redundant items which were removed leading to the final nine-item EMIQ. The nine-item EMIQ performed well with respect to: test-retest reliability (ICC = 0.858); internal consistency (α = 0.893); convergent validity; and known-groups methods for construct validity. A cut-point of 41 on the 0-to-100 scale resulted in sufficient sensitivity and specificity statistics for viably identifying patients with mobility impairment. The EMIQ is a content valid and psychometrically sound instrument for capturing MS patients' experience with mobility impairments in a clinical practice setting. Additional research is suggested to further confirm the EMIQ's screening properties over time.
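The screening properties mentioned in this record (sensitivity and specificity at a cut-point) can be computed as in the following Python sketch. It assumes that scores at or above the cut-point flag impairment, a direction the abstract does not state. The function name, scores, and impairment labels are hypothetical and do not come from the study; only the cut-point of 41 on a 0-to-100 scale is taken from the abstract.

```python
def screening_stats(scores, impaired, cut_point):
    """Return (sensitivity, specificity) when score >= cut_point flags impairment."""
    tp = fn = tn = fp = 0
    for score, is_impaired in zip(scores, impaired):
        flagged = score >= cut_point
        if is_impaired and flagged:
            tp += 1  # impaired and correctly flagged
        elif is_impaired:
            fn += 1  # impaired but missed
        elif flagged:
            fp += 1  # not impaired but flagged
        else:
            tn += 1  # not impaired and correctly not flagged
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one impaired and one unimpaired case")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical questionnaire scores with a reference-standard impairment label.
scores = [12, 35, 38, 41, 58, 77, 20, 45, 90]
impaired = [False, False, True, True, True, True, False, False, True]

sensitivity, specificity = screening_stats(scores, impaired, cut_point=41)
print(sensitivity, specificity)
```

Repeating the call over a range of cut-points shows the trade-off between the two statistics that a chosen threshold has to balance.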
Differential item functional analysis on pedagogic and content knowledge (PCK) questionnaire for Indonesian teachers using RASCH model

NASA Astrophysics Data System (ADS)

Rahmani, B. D.

2018-01-01

The purpose of this paper is to evaluate Indonesian senior high school teachers’ pedagogical content knowledge and their perceptions of curriculum change in West Java, Indonesia. The data used in this study were derived from a questionnaire survey conducted among teachers in Bandung, West Java. A total of 61 usable responses were collected. Differential item functioning (DIF) analysis was used to examine whether the items functioned differently across gender, educational background, and school location. The results showed no significant differences in item responses by gender or school location, but there were differences by educational background. In conclusion, teachers’ educational background influences their responses to the questionnaire. It is therefore suggested that future questionnaire items be constructed to account for differences among participants, particularly in educational background.
Cognitive Diagnostic Models for Tests with Multiple-Choice and Constructed-Response Items

ERIC Educational Resources Information Center

Kuo, Bor-Chen; Chen, Chun-Hua; Yang, Chih-Wei; Mok, Magdalena Mo Ching

2016-01-01

Traditionally, teachers evaluate students' abilities via their total test scores. Recently, cognitive diagnostic models (CDMs) have begun to provide information about the presence or absence of students' skills or misconceptions. Nevertheless, CDMs are typically applied to tests with multiple-choice (MC) items, which provide less diagnostic…
Improving measurement of injection drug risk behavior using item response theory.

PubMed

Janulis, Patrick

2014-03-01

Recent research highlights the multiple steps to preparing and injecting drugs and the resultant viral threats faced by drug users. This research suggests that more sensitive measurement of injection drug HIV risk behavior is required. In addition, growing evidence suggests there are gender differences in injection risk behavior. However, the potential for differential item functioning between genders has not been explored. To explore item response theory as an improved measurement modeling technique that provides empirically justified scaling of injection risk behavior and to examine for potential gender-based differential item functioning. Data is used from three studies in the National Institute on Drug Abuse's Criminal Justice Drug Abuse Treatment Studies. A two-parameter item response theory model was used to scale injection risk behavior and logistic regression was used to examine for differential item functioning. Item fit statistics suggest that item response theory can be used to scale injection risk behavior and these models can provide more sensitive estimates of risk behavior. Additionally, gender-based differential item functioning is present in the current data. Improved measurement of injection risk behavior using item response theory should be encouraged as these models provide increased congruence between construct measurement and the complexity of injection-related HIV risk. Suggestions are made to further improve injection risk behavior measurement. Furthermore, results suggest direct comparisons of composite scores between males and females may be misleading and future work should account for differential item functioning before comparing levels of injection risk behavior.
Vegetable parenting practices scale. Item response modeling analyses

PubMed Central

Chen, Tzu-An; O’Connor, Teresia; Hughes, Sheryl; Beltran, Alicia; Baranowski, Janice; Diep, Cassandra; Baranowski, Tom

2015-01-01

Objective: To evaluate the psychometric properties of a vegetable parenting practices scale using multidimensional polytomous item response modeling, which enables assessing item fit to latent variables and the distributional characteristics of the items in comparison to the respondents. We also tested for differences in the ways items function (called differential item functioning) across child’s gender, ethnicity, age, and household income groups. Method: Parents of 3–5 year old children completed a self-reported vegetable parenting practices scale online. Vegetable parenting practices consisted of 14 effective vegetable parenting practices and 12 ineffective vegetable parenting practices items, each with three subscales (responsiveness, structure, and control). Multidimensional polytomous item response modeling was conducted separately on effective vegetable parenting practices and ineffective vegetable parenting practices. Results: One effective vegetable parenting practice item did not fit the model well in the full sample or across demographic groups, and another was a misfit in differential item functioning analyses across child’s gender. Significant differential item functioning was detected across children’s age and ethnicity groups, and more among effective vegetable parenting practices than ineffective vegetable parenting practices items. Wright maps showed items only covered parts of the latent trait distribution. The harder- and easier-to-respond ends of the construct were not covered by items for effective vegetable parenting practices and ineffective vegetable parenting practices, respectively. Conclusions: Several effective vegetable parenting practices and ineffective vegetable parenting practices scale items functioned differently on the basis of child’s demographic characteristics; therefore, researchers should use these vegetable parenting practices scales with caution.
Item response modeling should be incorporated in analyses of parenting practice questionnaires to better assess differences across demographic characteristics. PMID:25895694
An item response theory analysis of the Executive Interview and development of the EXIT8: A Project FRONTIER Study.

PubMed

Jahn, Danielle R; Dressel, Jeffrey A; Gavett, Brandon E; O'Bryant, Sid E

2015-01-01

The Executive Interview (EXIT25) is an effective measure of executive dysfunction, but may be inefficient due to the time it takes to complete 25 interview-based items. The current study aimed to examine psychometric properties of the EXIT25, with a specific focus on determining whether a briefer version of the measure could comprehensively assess executive dysfunction. The current study applied a graded response model (a type of item response theory model for polytomous categorical data) to identify items that were most closely related to the underlying construct of executive functioning and best discriminated between varying levels of executive functioning. Participants were 660 adults ages 40 to 96 years living in West Texas, who were recruited through an ongoing epidemiological study of rural health and aging, called Project FRONTIER. The EXIT25 was the primary measure examined. Participants also completed the Trail Making Test and Controlled Oral Word Association Test, among other measures, to examine the convergent validity of a brief form of the EXIT25. Eight items were identified that provided the majority of the information about the underlying construct of executive functioning; total scores on these items were associated with total scores on other measures of executive functioning and were able to differentiate between cognitively healthy, mildly cognitively impaired, and demented participants. In addition, cutoff scores were recommended based on sensitivity and specificity of scores. A brief, eight-item version of the EXIT25 may be an effective and efficient screening for executive dysfunction among older adults.
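The graded response model named in this record assigns each ordered response category a probability equal to the difference between adjacent cumulative logistic curves. The following is a minimal Python sketch of that calculation for a single item. The function name, the three-category item, and its parameter values are hypothetical illustrations and are not estimates from the study.

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Category probabilities for one graded response model item.

    thresholds must be increasing; m thresholds give m + 1 ordered categories.
    """
    # Cumulative probabilities P(X >= k), bracketed by 1 (k = 0) and 0 (k = m + 1).
    cumulative = [1.0]
    cumulative += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cumulative.append(0.0)
    # Each category probability is the drop between adjacent cumulative curves.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]


# Hypothetical three-category item: discrimination 1.3, thresholds -0.5 and 1.0.
probs = grm_category_probs(theta=0.0, a=1.3, thresholds=[-0.5, 1.0])
print([round(p, 3) for p in probs])
```

Items with larger discrimination values separate respondents more sharply around their thresholds, which is the sense in which some items carry more information about the underlying construct than others.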
Using item response theory to address vulnerabilities in FFQ.

PubMed

Kazman, Josh B; Scott, Jonathan M; Deuster, Patricia A

2017-09-01

The limitations for self-reporting of dietary patterns are widely recognised as a major vulnerability of FFQ and the dietary screeners/scales derived from FFQ. Such instruments can yield inconsistent results to produce questionable interpretations. The present article discusses the value of psychometric approaches and standards in addressing these drawbacks for instruments used to estimate dietary habits and nutrient intake. We argue that a FFQ or screener that treats diet as a 'latent construct' can be optimised for both internal consistency and the value of the research results. Latent constructs, a foundation for item response theory (IRT)-based scales (e.g. Patient Reported Outcomes Measurement Information System) are typically introduced in the design stage of an instrument to elicit critical factors that cannot be observed or measured directly. We propose an iterative approach that uses such modelling to refine FFQ and similar instruments. To that end, we illustrate the benefits of psychometric modelling by using items and data from a sample of 12 370 Soldiers who completed the 2012 US Army Global Assessment Tool (GAT). We used factor analysis to build the scale incorporating five out of eleven survey items. An IRT-driven assessment of response category properties indicates likely problems in the ordering or wording of several response categories. Group comparisons, examined with differential item functioning (DIF), provided evidence of scale validity across each Army sub-population (sex, service component and officer status). Such an approach holds promise for future FFQ.
The Health Education Impact Questionnaire (heiQ): an outcomes and evaluation measure for patient education and self-management interventions for people with chronic conditions.

PubMed

Osborne, Richard H; Elsworth, Gerald R; Whitfield, Kathryn

2007-05-01

This paper describes the development and validation of the Health Education Impact Questionnaire (heiQ). The aim was to develop a user-friendly, relevant, and psychometrically sound instrument for the comprehensive evaluation of patient education programs, which can be applied across a broad range of chronic conditions. Item development for the heiQ was guided by a Program Logic Model, Concept Mapping, interviews with stakeholders and psychometric analyses. Construction (N=591) and confirmatory (N=598) samples were drawn from consumers of patient education programs and hospital outpatients. The properties of the heiQ were investigated using item response theory and structural equation modeling. Over 90 candidate items were generated, with 42 items selected for inclusion in the final scale. Eight independent dimensions were derived: Positive and Active Engagement in Life (five items, Cronbach's alpha (alpha)=0.86); Health Directed Behavior (four items, alpha=0.80); Skill and Technique Acquisition (five items, alpha=0.81); Constructive Attitudes and Approaches (five items, alpha=0.81); Self-Monitoring and Insight (seven items, alpha=0.70); Health Service Navigation (five items, alpha=0.82); Social Integration and Support (five items, alpha=0.86); and Emotional Wellbeing (six items, alpha=0.89). The heiQ has high construct validity and is a reliable measure of a broad range of patient education program benefits. The heiQ will provide valuable information to clinicians, researchers, policymakers and other stakeholders about the value of patient education programs in chronic disease management.
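The Cronbach's alpha values reported per dimension above follow a standard formula, sketched here under stated assumptions. The function name and the toy score matrix are hypothetical and are not heiQ data; the code uses sample variances.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondent rows of item scores."""
    k = len(scores[0])  # number of items in the scale
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of the respondents' total scores.
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1.0 - sum(item_vars) / total_var)

# Toy data (hypothetical): three respondents, three perfectly consistent items.
alpha = cronbach_alpha([[1, 1, 1], [2, 2, 2], [3, 3, 3]])
```

With perfectly consistent items, as in the toy data, alpha reaches its upper bound of 1; real scales fall below it.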
Analysis of the construct of dignity and content validity of the patient dignity inventory

PubMed Central

2011-01-01

Background Maintaining dignity, the quality of being worthy of esteem or respect, is considered as a goal of palliative care. The aim of this study was to analyse the construct of personal dignity and to assess the content validity of the Patient Dignity Inventory (PDI) in people with an advance directive in the Netherlands. Methods Data were collected within the framework of an advance directives cohort study. This cohort study is aiming to get a better insight into how decisions are made at the end of life with regard to advance directives in the Netherlands. One half of the cohort (n = 2404) received an open-ended question concerning factors relevant to dignity. Content labels were assigned to issues mentioned in the responses to the open-ended question. The other half of the cohort (n = 2537) received a written questionnaire including the PDI. The relevance and comprehensiveness of the PDI items were assessed with the COSMIN checklist ('COnsensus-based Standards for the selection of health status Measurement INstruments'). Results The majority of the PDI items were found to be relevant for the construct to be measured, the study population, and the purpose of the study but the items were not completely comprehensive. The responses to the open-ended question indicated that communication and care-related aspects were also important for dignity. Conclusions This study demonstrated that the PDI items were relevant for people with an advance directive in the Netherlands. The comprehensiveness of the items can be improved by including items concerning communication and care. PMID:21682924
A Psychometric Analysis of the Italian Version of the eHealth Literacy Scale Using Item Response and Classical Test Theory Methods

PubMed Central

Dima, Alexandra Lelia; Schulz, Peter Johannes

2017-01-01

Background The eHealth Literacy Scale (eHEALS) is a tool to assess consumers’ comfort and skills in using information technologies for health. Although evidence exists of reliability and construct validity of the scale, less agreement exists on structural validity. Objective The aim of this study was to validate the Italian version of the eHealth Literacy Scale (I-eHEALS) in a community sample with a focus on its structural validity, by applying psychometric techniques that account for item difficulty. Methods Two Web-based surveys were conducted among a total of 296 people living in the Italian-speaking region of Switzerland (Ticino). After examining the latent variables underlying the observed variables of the Italian scale via principal component analysis (PCA), fit indices for two alternative models were calculated using confirmatory factor analysis (CFA). The scale structure was examined via parametric and nonparametric item response theory (IRT) analyses accounting for differences between items regarding the proportion of answers indicating high ability. Convergent validity was assessed by correlations with theoretically related constructs. Results CFA showed a suboptimal model fit for both models. IRT analyses confirmed all items measure a single dimension as intended. Reliability and construct validity of the final scale were also confirmed. The contrasting results of factor analysis (FA) and IRT analyses highlight the importance of considering differences in item difficulty when examining health literacy scales. Conclusions The findings support the reliability and validity of the translated scale and its use for assessing Italian-speaking consumers’ eHealth literacy. PMID:28400356
Analysis of the construct of dignity and content validity of the patient dignity inventory.

PubMed

Albers, Gwenda; Pasman, H Roeline W; Rurup, Mette L; de Vet, Henrica C W; Onwuteaka-Philipsen, Bregje D

2011-06-19

Maintaining dignity, the quality of being worthy of esteem or respect, is considered as a goal of palliative care. The aim of this study was to analyse the construct of personal dignity and to assess the content validity of the Patient Dignity Inventory (PDI) in people with an advance directive in the Netherlands. Data were collected within the framework of an advance directives cohort study. This cohort study is aiming to get a better insight into how decisions are made at the end of life with regard to advance directives in the Netherlands. One half of the cohort (n = 2404) received an open-ended question concerning factors relevant to dignity. Content labels were assigned to issues mentioned in the responses to the open-ended question. The other half of the cohort (n = 2537) received a written questionnaire including the PDI. The relevance and comprehensiveness of the PDI items were assessed with the COSMIN checklist ('COnsensus-based Standards for the selection of health status Measurement INstruments'). The majority of the PDI items were found to be relevant for the construct to be measured, the study population, and the purpose of the study but the items were not completely comprehensive. The responses to the open-ended question indicated that communication and care-related aspects were also important for dignity. This study demonstrated that the PDI items were relevant for people with an advance directive in the Netherlands. The comprehensiveness of the items can be improved by including items concerning communication and care.
Emotional Intelligence and Nurse Recruitment: Rasch and confirmatory factor analysis of the trait emotional intelligence questionnaire short form.

PubMed

Snowden, Austyn; Watson, Roger; Stenhouse, Rosie; Hale, Claire

2015-12-01

To examine the construct validity of the Trait Emotional Intelligence Questionnaire Short form. Emotional intelligence involves the identification and regulation of our own emotions and the emotions of others. It is therefore a potentially useful construct in the investigation of recruitment and retention in nursing and many questionnaires have been constructed to measure it. Secondary analysis of existing dataset of responses to Trait Emotional Intelligence Questionnaire Short form using concurrent application of Rasch analysis and confirmatory factor analysis. First year undergraduate nursing and computing students completed Trait Emotional Intelligence Questionnaire-Short Form in September 2013. Responses were analysed by synthesising results of Rasch analysis and confirmatory factor analysis. Participants (N = 938) completed Trait Emotional Intelligence Questionnaire Short form. Rasch analysis showed the majority of the Trait Emotional Intelligence Questionnaire-Short Form items made a unique contribution to the latent trait of emotional intelligence. Five items did not fit the model and differential item functioning (gender) accounted for this misfit. Confirmatory factor analysis revealed a four-factor structure consisting of: self-confidence, empathy, uncertainty and social connection. All five misfitting items from the Rasch analysis belonged to the 'social connection' factor. The concurrent use of Rasch and factor analysis allowed for novel interpretation of Trait Emotional Intelligence Questionnaire Short form. Much of the response variation in Trait Emotional Intelligence Questionnaire Short form can be accounted for by the social connection factor. Implications for practice are discussed. © 2015 John Wiley & Sons Ltd.

Equal Area Logistic Estimation for Item Response Theory

NASA Astrophysics Data System (ADS)

Lo, Shih-Ching; Wang, Kuo-Chang; Chang, Hsin-Li

2009-08-01

Item response theory (IRT) models use logistic functions exclusively as item response functions (IRFs). Applications of IRT models require obtaining the set of values for logistic function parameters that best fit an empirical data set. However, success in obtaining such a set of values does not guarantee that the constructs they represent actually exist, for the adequacy of a model is not sustained by the possibility of estimating parameters. In this study, an equal area based two-parameter logistic model estimation algorithm is proposed. Two theorems are given to prove that the results of the algorithm are equivalent to the results of fitting data by logistic model. Numerical results are presented to show the stability and accuracy of the algorithm.
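For orientation, the two-parameter logistic item response function that the abstract above refers to can be sketched as follows. This shows only the standard 2PL curve and its item information; it is not the equal-area estimation algorithm, which the abstract does not spell out. The function names and parameter values are hypothetical.

```python
import math

def irf_2pl(theta, a, b):
    """Probability of a correct or endorsed response under the 2PL model.

    a: discrimination, b: difficulty (both assumed values here).
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information_2pl(theta, a, b):
    """Fisher information of a 2PL item at trait level theta."""
    p = irf_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

# At theta == b the response probability is 0.5 and information peaks.
p_at_b = irf_2pl(theta=0.3, a=1.2, b=0.3)
```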
Construct Validity of the Spanish Versions of the Memorial Symptom Assessment Scale Short Form and Condensed Form: Rasch Analysis of Responses in Oncology Outpatients.

PubMed

Llamas-Ramos, Inés; Llamas-Ramos, Rocío; Buz, José; Cortés-Rodríguez, María; Martín-Nogueras, Ana María

2018-06-01

The Memorial Symptom Assessment Scale (MSAS) is a self-rating instrument for the assessment of symptom distress in cancer patients. The Spanish version of the MSAS has recently been validated. However, we lack evidence of the internal construct validity of the shorter versions (short form [MSAS-SF] and condensed form [CMSAS]). In addition, rigorous testing of these scales with modern psychometric methods is needed. The aim of this study was to evaluate the internal construct validity and reliability of the Spanish versions of the MSAS-SF and CMSAS in oncology outpatients using Rasch analysis. Data from a convenience sample of oncology outpatients receiving chemotherapy (n = 306; mean age 60 years; 63% women) at a university hospital were analyzed. The Rasch unidimensional measurement model was used to examine response category functioning, item hierarchy, targeting, unidimensionality, reliability, and differential item functioning by age, gender, and marital status. The response category structure of the symptom distress items was improved by collapsing two categories. The scales were adequately targeted to the study patients, showed overall Rasch model fit (mean Infit MnSq ranged from 0.98 to 1.05), met criteria for unidimensionality, and the reliability of scores was good (person reliability > 0.80), except for the CMSAS prevalence scale. Only four items showed differential item functioning. The present study demonstrated that the Spanish versions of the MSAS-SF and CMSAS have adequate psychometric properties to evaluate symptom distress in oncology outpatients. Additional studies of the CMSAS are recommended. Copyright © 2018 American Academy of Hospice and Palliative Medicine. Published by Elsevier Inc. All rights reserved.
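The Infit mean-square statistic cited above can be sketched for the simplest case. This is a simplification under stated assumptions: it uses the dichotomous Rasch model, whereas the study's symptom items have several response categories, and the function names and inputs are hypothetical.

```python
import math

def rasch_prob(theta, b):
    """Probability of endorsing a dichotomous item under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def infit_mnsq(responses, thetas, b):
    """Information-weighted mean-square fit for one dichotomous item.

    responses: 0/1 answers; thetas: person measures; b: item difficulty.
    """
    squared_residuals = 0.0
    model_variance = 0.0
    for x, theta in zip(responses, thetas):
        p = rasch_prob(theta, b)
        squared_residuals += (x - p) ** 2   # observed minus expected, squared
        model_variance += p * (1.0 - p)     # variance the model predicts
    return squared_residuals / model_variance

# Hypothetical example: two persons located exactly at the item difficulty.
fit = infit_mnsq(responses=[1, 0], thetas=[0.0, 0.0], b=0.0)
```

Values near 1 indicate that the observed variation matches what the model predicts, which is why mean Infit values of 0.98 to 1.05 are read as good overall fit.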
Item bank development, calibration and validation for patient-reported outcomes in female urinary incontinence

PubMed Central

Sung, Vivian W.; Griffith, James W.; Rogers, Rebecca G.; Raker, Christina A.; Clark, Melissa A.

2016-01-01

Purpose Current patient-reported outcomes for female urinary incontinence (UI) are limited by their inability to be tailored. Our objective is to describe the development and field-testing of 7 item banks designed to measure domains identified as important UI in females (UIf). We also describe the calibration and validation properties of the UIf-item banks, which allow for more efficient computerized-adaptive testing (CAT) in the future. Methods The UIf-measures included 168 items covering 7 domains: Stress UI (SUI), Overactive Bladder (OAB), Urinary Frequency, Physical, Social and Emotional Health Impact, and Adaptation. Items underwent rigorous qualitative development and psychometric testing across 2 sites. Items were calibrated using item response theory and evaluated for internal consistency, construct validity and responsiveness. Results 750 women (249 SUI, 249 OAB, and 252 mixed UI) participated. Mean age was 55±14 years, 23% were Hispanic, 80% white. In addition to face and content validity, the measures demonstrated good internal consistency (coefficient alpha 0.92-0.98) and unidimensionality. There was evidence for construct validity with moderate to strong correlations with the UDI (r’s ≥ 0.6) and IIQ (r’s ≥ 0.6) scales. The measures were responsive to change for SUI treatment (paired t-test p < .001, ES range = 1.3 to 2.9; SRM range = 1.3 to 2.5) and OAB treatment (paired t-test p < .05 for all domains except Social Health Impact and Adaptation, ES range = 0.3 to 1.5, SRM range = 0.4 to 1.0). The measures were responsive based on concurrent changes with the UDI and IIQ (p < 0.05). CAT versions were developed and pilot tested. Conclusions The UIf-item banks demonstrate good psychometric characteristics and are a sufficiently valid set of customizable tools for measuring UI symptoms and life impact. PMID:26732514
Diagnostic Opportunities Using Rasch Measurement in the Context of a Misconceptions-Based Physical Science Assessment

ERIC Educational Resources Information Center

Wind, Stefanie A.; Gale, Jessica D.

2015-01-01

Multiple-choice (MC) items that are constructed such that distractors target known misconceptions for a particular domain provide useful diagnostic information about student misconceptions (Herrmann-Abell & DeBoer, 2011, 2014; Sadler, 1998). Item response theory models can be used to examine misconceptions distractor-driven multiple-choice…
Using Hospital Anxiety and Depression Scale (HADS) on patients with epilepsy: Confirmatory factor analysis and Rasch models.

PubMed

Lin, Chung-Ying; Pakpour, Amir H

2017-02-01

The problems of mood disorders are critical in people with epilepsy. Therefore, there is a need to validate a useful tool for the population. The Hospital Anxiety and Depression Scale (HADS) has been used on the population, and showed that it is a satisfactory screening tool. However, more evidence on its construct validity is needed. A total of 1041 people with epilepsy were recruited in this study, and each completed the HADS. Confirmatory factor analysis (CFA) and Rasch analysis were used to understand the construct validity of the HADS. In addition, internal consistency was tested using Cronbach's α, person separation reliability, and item separation reliability. Ordering of the response descriptors and the differential item functioning (DIF) were examined using the Rasch models. The HADS showed that 55.3% of our participants had anxiety; 56.0% had depression based on its cutoffs. CFA and Rasch analyses both showed the satisfactory construct validity of the HADS; the internal consistency was also acceptable (α=0.82 in anxiety and 0.79 in depression; person separation reliability=0.82 in anxiety and 0.73 in depression; item separation reliability=0.98 in anxiety and 0.91 in depression). The difficulties of the four-point Likert scale used in the HADS increased monotonically, which indicates no disordered response categories. No DIF items across male and female patients and across types of epilepsy were displayed in the HADS. The HADS has promising psychometric properties on construct validity in people with epilepsy. Moreover, the additive item score is supported for calculating the cutoff. Copyright © 2016 British Epilepsy Association. Published by Elsevier Ltd. All rights reserved.
The development of an instrument to assess chemistry perceptions

NASA Astrophysics Data System (ADS)

Wells, Raymond R.

The instrument, developed in this study, attempted to correct the deficiencies of previous instruments. Statements of belief and opinion can be validly included under the construct of chemistry perceptions. Further, statements that might be better characterized as science attitudes, math attitudes, or attitudes toward a specific course or program were not included. Eliminating statements of math anxiety and test anxiety ensured that responses to statements of anxiety were perceptions of anxiety solely related to chemistry. The results of the expert judges' responses to the Validation of Proposed Perception Statements forms were detailed to establish construct and content validity. The nature of Likert scale construction and calculation of internal consistency also supported the validity of the instrument. A pilot Chemistry Perception Questionnaire (CPQ) was then constructed based on agreement of the appropriate subscale and mean importance of the perception statements. The pilot CPQ results were subjected to an item analysis based on three sets of statistics: the frequency of each response and the percentage of respondents making each response for each perception statement, the mean and standard deviations for each item, and the item discrimination index which correlated the item scores with the subscale scores. With no zero or negative correlations to the subscale scores, it was not necessary to replace any of the perception statements contained in the pilot instrument. Therefore, the piloted Chemistry Perception Questionnaire became the final instrument. Factor analysis confirmed the multidimensionality of the instrument. The instrument was administered twice with a separation interval of approximately one month in order to perform a test-retest reliability analysis. One hundred and forty-one pairs were matched and results detailed. The correlation between forms, for the total instrument, was 0.9342. The mean coefficient alpha, for the total instrument, was 0.9495. With test-retest correlations and alphas exceeding 0.70 for all seven subscales and the total instrument, it was determined that the Chemistry Perception Questionnaire instrument achieved reasonably high reliability estimations.
Psychometric properties of the neck disability index amongst patients with chronic neck pain using item response theory.

PubMed

Saltychev, Mikhail; Mattie, Ryan; McCormick, Zachary; Laimi, Katri

2017-05-13

The Neck Disability Index (NDI) is commonly used for clinical and research assessment for chronic neck pain, yet the original version of this tool has not undergone significant validity testing, and in particular, there has been minimal assessment using Item Response Theory. The goal of the present study was to investigate the psychometric properties of the original version of the NDI in a large sample of individuals with chronic neck pain by defining its internal consistency, construct structure and validity, and its ability to discriminate between different degrees of functional limitation. This is a cross-sectional cohort study of 585 consecutive patients with chronic neck pain seen in a university hospital rehabilitation clinic. Internal consistency was evaluated using Cronbach's alpha, construct structure was evaluated by exploratory factor analysis, and discrimination ability was determined by Item Response Theory. The NDI demonstrated good internal consistency assessed by Cronbach's alpha (0.87). The exploratory factor analysis identified only one factor with eigenvalue considered significant (cutoff 1.0). When analyzed by Item Response Theory, eight out of 10 items demonstrated almost ideal difficulty parameter estimates. In addition, eight out of 10 items showed high to perfect estimates of discrimination ability (overall range 0.8 to 2.9). Amongst patients with chronic neck pain, the NDI was found to have good internal consistency, have unidimensional properties, and an excellent ability to distinguish patients with different levels of perceived disability. Implications for Rehabilitation The Neck Disability Index has good internal consistency, unidimensional properties, and an excellent ability to distinguish patients with different levels of perceived disability. The Neck Disability Index is recommended for use when selecting patients for rehabilitation, setting rehabilitation goals, and measuring the outcome of intervention.
The Trunk Impairment Scale - modified to ordinal scales in the Norwegian version.

PubMed

Gjelsvik, Bente; Breivik, Kyrre; Verheyden, Geert; Smedal, Tori; Hofstad, Håkon; Strand, Liv Inger

2012-01-01

To translate the Trunk Impairment Scale (TIS), a measure of trunk control in patients after stroke, into Norwegian (TIS-NV), and to explore its construct validity, internal consistency, intertester and test-retest reliability. TIS was translated according to international guidelines. The validity study was performed on data from 201 patients with acute stroke. Fifty patients with stroke and acquired brain injury were recruited to examine intertester and test-retest reliability. Construct validity was analyzed with exploratory and confirmatory factor analysis and item response theory, internal consistency with Cronbach's alpha test, and intertester and test-retest reliability with kappa and intraclass correlation coefficient tests. The back-translated version of TIS-NV was validated by the original developer. The subscale Static sitting balance was removed. By combining items from the subscales Dynamic sitting balance and Coordination, six ordinal superitems (testlets) were constructed. The TIS-NV was renamed the modified TIS-NV (TIS-modNV). After modifications the TIS-modNV fitted well to a locally dependent unidimensional item response theory model. It demonstrated good construct validity, excellent internal consistency, and high intertester and test-retest reliability for the total score. This study supports that the TIS-modNV is a valid and reliable scale for use in clinical practice and research.
The Mindful Attention Awareness Scale: Further Examination of Dimensionality, Reliability, and Concurrent Validity Estimates.

PubMed

Osman, Augustine; Lamis, Dorian A; Bagge, Courtney L; Freedenthal, Stacey; Barnes, Sean M

2016-01-01

We examined the factor structure and psychometric properties of the Mindful Attention Awareness Scale (MAAS) in a sample of 810 undergraduate students. Using common exploratory factor analysis (EFA), we obtained evidence for a 1-factor solution (41.84% common variance). To confirm unidimensionality of the 15-item MAAS, we conducted a 1-factor confirmatory factor analysis (CFA). Results of the EFA and CFA, respectively, provided support for a unidimensional model. Using differential item functioning analysis methods within item response theory modeling (IRT-based DIF), we found that individuals with high and low levels of nonattachment responded similarly to the MAAS items. Following a detailed item analysis, we proposed a 5-item short version of the instrument and present descriptive statistics and composite score reliability for the short and full versions of the MAAS. Finally, correlation analyses showed that scores on the full and short versions of the MAAS were associated with measures assessing related constructs. The 5-item MAAS is as useful as the original MAAS in enhancing our understanding of the mindfulness construct.
Applying Item Response Theory Methods to Examine the Impact of Different Response Formats

ERIC Educational Resources Information Center

Hohensinn, Christine; Kubinger, Klaus D.

2011-01-01

In aptitude and achievement tests, different response formats are usually used. A fundamental distinction must be made between the class of multiple-choice formats and the constructed response formats. Previous studies have examined the impact of different response formats applying traditional statistical approaches, but these influences can also…
Combination of classical test theory (CTT) and item response theory (IRT) analysis to study the psychometric properties of the French version of the Quality of Life Enjoyment and Satisfaction Questionnaire-Short Form (Q-LES-Q-SF).

PubMed

Bourion-Bédès, Stéphanie; Schwan, Raymund; Epstein, Jonathan; Laprevote, Vincent; Bédès, Alex; Bonnet, Jean-Louis; Baumann, Cédric

2015-02-01

The study aimed to examine the construct validity and reliability of the Quality of Life Enjoyment and Satisfaction Questionnaire-Short Form (Q-LES-Q-SF) according to both classical test and item response theories. The psychometric properties of the French version of this instrument were investigated in a cross-sectional, multicenter study. A total of 124 outpatients with a substance dependence diagnosis participated in the study. Psychometric evaluation included descriptive analysis, internal consistency, test-retest reliability, and validity. The dimensionality of the instrument was explored using a combination of the classical test, confirmatory factor analysis (CFA), and an item response theory analysis, the Person Separation Index (PSI), in a complementary manner. The results of the Q-LES-Q-SF revealed that the questionnaire was easy to administer and the acceptability was good. The internal consistency and the test-retest reliability were 0.9 and 0.88, respectively. All items were significantly correlated with the total score and the SF-12 used in the study. The CFA with one factor model was good, and for the unidimensional construct, the PSI was found to be 0.902. The French version of the Q-LES-Q-SF yielded valid and reliable clinical assessments of the quality of life for future research and clinical practice involving French substance abusers. In response to recent questioning regarding the unidimensionality or bidimensionality of the instrument and according to the underlying theoretical unidimensional construct used for its development, this study suggests the Q-LES-Q-SF as a one-dimension questionnaire in French QoL studies.
Negative Mood and Obsessive-Compulsive Related Clinical Constructs: An Examination of Underlying Factors

PubMed Central

Britton, Gary I.; Davey, Graham C. L.

2017-01-01

Emerging evidence suggests that many of the clinical constructs used to help understand and explain obsessive-compulsive (OC) symptoms, and negative mood, may be causally interrelated. One approach to understanding this interrelatedness is a motivational systems approach. This approach suggests that rather than considering clinical constructs and negative affect as separable entities, they are all features of an integrated threat management system, and as such are highly coordinated and interdependent. The aim of the present study was to examine if clinical constructs related to OC symptoms and negative mood are best treated as separable or, alternatively, if these clinical constructs and negative mood are best seen as indicators of an underlying superordinate variable, as would be predicted by a motivational systems approach. A sample of 370 student participants completed measures of mood and the clinical constructs of inflated responsibility, intolerance of uncertainty, not just right experiences, and checking stop rules. An exploratory factor analysis suggested two plausible factor structures, one where all construct items and negative mood items loaded onto one underlying superordinate variable, and a second structure comprising of five factors, where each item loaded onto a factor representative of what the item was originally intended to measure. A confirmatory factor analysis showed that the five factor model was preferential to the one factor model, suggesting the four constructs and negative mood are best conceptualized as separate variables. Given the predictions of a motivational systems approach were not supported in the current study, other possible explanations for the causal interrelatedness between clinical constructs and negative mood are discussed. PMID:28959224
Construction of a memory battery for computerized administration, using item response theory.

PubMed

Ferreira, Aristides I; Almeida, Leandro S; Prieto, Gerardo

2012-10-01

In accordance with Item Response Theory, a computer memory battery with six tests was constructed for use in the Portuguese adult population. A factor analysis was conducted to assess the internal structure of the tests (N = 547 undergraduate students). According to the literature, several confirmatory factor models were evaluated. Results showed better fit of a model with two independent latent variables corresponding to verbal and non-verbal factors, reproducing the initial battery organization. Internal consistency reliabilities for the six tests ranged from alpha = .72 to .89. IRT analyses (Rasch and partial credit models) yielded good Infit and Outfit measures and high precision for parameter estimation. The potential utility of these memory tasks for psychological research and practice will be discussed.
Test-retest reliability and construct validity of the ENERGY-parent questionnaire on parenting practices, energy balance-related behaviours and their potential behavioural determinants: the ENERGY-project.

PubMed

Singh, Amika S; Chinapaw, Mai J M; Uijtdewilligen, Léonie; Vik, Froydis N; van Lippevelde, Wendy; Fernández-Alvira, Juan M; Stomfai, Sarolta; Manios, Yannis; van der Sluijs, Maria; Terwee, Caroline; Brug, Johannes

2012-08-13

Insight into parental energy balance-related behaviours, their determinants and parenting practices is important to inform childhood obesity prevention. Therefore, reliable and valid tools to measure these variables in large-scale population research are needed. The objective of the current study was to examine the test-retest reliability and construct validity of the parent questionnaire used in the ENERGY-project, assessing parental energy balance-related behaviours, their determinants, and parenting practices among parents of 10-12 year old children. We collected data among parents (n = 316 in the test-retest reliability study; n = 109 in the construct validity study) of 10-12 year-old children in six European countries, i.e. Belgium, Greece, Hungary, the Netherlands, Norway, and Spain. Test-retest reliability was assessed using the intra-class correlation coefficient (ICC) and percentage agreement comparing scores from two measurements, administered one week apart. To assess construct validity, the agreement between questionnaire responses and a subsequent interview was assessed using ICC and percentage agreement. All but one item showed good to excellent test-retest reliability as indicated by ICCs > .60 or percentage agreement ≥ 75%. Construct validity appeared to be good to excellent for 92 out of 121 items, as indicated by ICCs > .60 or percentage agreement ≥ 75%. Of the other 29 items, construct validity was moderate for 24 and poor for 5 items. The reliability and construct validity of the items of the ENERGY-parent questionnaire on multiple energy balance-related behaviours, their potential determinants, and parenting practices appears to be good. Based on the results of the validity study, we strongly recommend adapting parts of the ENERGY-parent questionnaire if used in future research.
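The percentage agreement criterion used above (at least 75%) can be sketched as follows. The function name and the toy answers are hypothetical and are not ENERGY-project data; the ICC, the study's other criterion, is not shown.

```python
def percentage_agreement(first, second):
    """Percentage of respondents giving the same answer on both occasions."""
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty answer lists of equal length")
    # Count respondents whose two answers are identical.
    matches = sum(1 for x, y in zip(first, second) if x == y)
    return 100.0 * matches / len(first)

# Toy data (hypothetical): four parents answering one item twice.
agreement = percentage_agreement([1, 2, 3, 4], [1, 2, 3, 5])  # 75.0
```

Percentage agreement does not correct for agreement expected by chance, which is one reason studies such as this one report it alongside the ICC.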
A Psychometric Analysis of the Italian Version of the eHealth Literacy Scale Using Item Response and Classical Test Theory Methods.

PubMed

Diviani, Nicola; Dima, Alexandra Lelia; Schulz, Peter Johannes

2017-04-11

The eHealth Literacy Scale (eHEALS) is a tool to assess consumers' comfort and skills in using information technologies for health. Although evidence exists of reliability and construct validity of the scale, less agreement exists on structural validity. The aim of this study was to validate the Italian version of the eHealth Literacy Scale (I-eHEALS) in a community sample with a focus on its structural validity, by applying psychometric techniques that account for item difficulty. Two Web-based surveys were conducted among a total of 296 people living in the Italian-speaking region of Switzerland (Ticino). After examining the latent variables underlying the observed variables of the Italian scale via principal component analysis (PCA), fit indices for two alternative models were calculated using confirmatory factor analysis (CFA). The scale structure was examined via parametric and nonparametric item response theory (IRT) analyses accounting for differences between items regarding the proportion of answers indicating high ability. Convergent validity was assessed by correlations with theoretically related constructs. CFA showed a suboptimal model fit for both models. IRT analyses confirmed all items measure a single dimension as intended. Reliability and construct validity of the final scale were also confirmed. The contrasting results of factor analysis (FA) and IRT analyses highlight the importance of considering differences in item difficulty when examining health literacy scales. The findings support the reliability and validity of the translated scale and its use for assessing Italian-speaking consumers' eHealth literacy. ©Nicola Diviani, Alexandra Lelia Dima, Peter Johannes Schulz. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 11.04.2017.
Construct and Differential Item Functioning in the Assessment of Prescription Opioid Use Disorders among American Adolescents

ERIC Educational Resources Information Center

Wu, Li-Tzy; Ringwalt, Christopher L.; Yang, Chongming; Reeve, Bryce B.; Pan, Jeng-Jong; Blazer, Dan G.

2009-01-01

DSM-IV's hierarchical distinction between abuse of and dependence on prescription opioids is not supported since the symptoms of abuse in adolescents are not less severe than dependence. The finding is based on the examination of the DSM-IV criteria for opioid use disorders using item response theory.
The tinnitus functional index: development of a new clinical measure for chronic, intrusive tinnitus.

PubMed

Meikle, Mary B; Henry, James A; Griest, Susan E; Stewart, Barbara J; Abrams, Harvey B; McArdle, Rachel; Myers, Paula J; Newman, Craig W; Sandridge, Sharon; Turk, Dennis C; Folmer, Robert L; Frederick, Eric J; House, John W; Jacobson, Gary P; Kinney, Sam E; Martin, William H; Nagler, Stephen M; Reich, Gloria E; Searchfield, Grant; Sweetow, Robert; Vernon, Jack A

2012-01-01

Chronic subjective tinnitus is a prevalent condition that causes significant distress to millions of Americans. Effective tinnitus treatments are urgently needed, but evaluating them is hampered by the lack of standardized measures that are validated for both intake assessment and evaluation of treatment outcomes. This work was designed to develop a new self-report questionnaire, the Tinnitus Functional Index (TFI), that would have documented validity both for scaling the severity and negative impact of tinnitus for use in intake assessment and for measuring treatment-related changes in tinnitus (responsiveness) and that would provide comprehensive coverage of multiple tinnitus severity domains. To use preexisting knowledge concerning tinnitus-related problems, an Item Selection Panel (17 expert judges) surveyed the content (175 items) of nine widely used tinnitus questionnaires. From those items, the Panel identified 13 separate domains of tinnitus distress and selected 70 items most likely to be responsive to treatment effects. Eliminating redundant items while retaining good content validity and adding new items to achieve the recommended minimum of 3 to 4 items per domain yielded 43 items, which were then used for constructing TFI Prototype 1. Prototype 1 was tested at five clinics. The 326 participants included consecutive patients receiving tinnitus treatment who provided informed consent, constituting a convenience sample. Construct validity of Prototype 1 as an outcome measure was evaluated by measuring responsiveness of the overall scale and its individual items at 3 and 6 mo follow-up with 65 and 42 participants, respectively. Using a predetermined list of criteria, the 30 best-functioning items were selected for constructing TFI Prototype 2. Prototype 2 was tested at four clinics with 347 participants, including 155 and 86 who provided 3 and 6 mo follow-up data, respectively. Analyses were the same as for Prototype 1.
Results were used to select the 25 best-functioning items for the final TFI. Both prototypes and the final TFI displayed strong measurement properties, with few missing data, high validity for scaling of tinnitus severity, and good reliability. All TFI versions exhibited the same eight factors characterizing tinnitus severity and negative impact. Responsiveness, evaluated by computing effect sizes for responses at follow-up, was satisfactory in all TFI versions. In the final TFI, Cronbach's alpha was 0.97 and test-retest reliability 0.78. Convergent validity (r = 0.86 with Tinnitus Handicap Inventory [THI]; r = 0.75 with Visual Analog Scale [VAS]) and discriminant validity (r = 0.56 with Beck Depression Inventory-Primary Care [BDI-PC]) were good. The final TFI was successful at detecting improvement from the initial clinic visit to 3 mo with moderate to large effect sizes and from initial to 6 mo with large effect sizes. Effect sizes for the TFI were generally larger than those obtained for the VAS and THI. After careful evaluation, a 13-point reduction was considered a preliminary criterion for meaningful reduction in TFI outcome scores. The TFI should be useful in both clinical and research settings because of its responsiveness to treatment-related change, validity for scaling the overall severity of tinnitus, and comprehensive coverage of multiple domains of tinnitus severity.
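Cronbach's alpha, the internal-consistency statistic reported for the final TFI, can be sketched in a few lines of Python. The function name and the small score matrix are hypothetical illustrations and are not TFI data.

```python
from statistics import pvariance

def cronbach_alpha(scores):
    """Cronbach's alpha; scores has one row per respondent and one column per item."""
    k = len(scores[0])
    # Variance of each item across respondents.
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    # Variance of the summed scale score.
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical responses of four people to a two-item scale.
example = [[1, 1], [2, 3], [3, 2], [4, 4]]
print(round(cronbach_alpha(example), 3))
```

Alpha rises with the number of items as well as with their intercorrelation, which is one reason very high values on long scales are sometimes read as a sign of item redundancy.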
Development of the Sexual Minority Adolescent Stress Inventory

PubMed Central

Schrager, Sheree M.; Goldbach, Jeremy T.; Mamey, Mary Rose

2018-01-01

Although construct measurement is critical to explanatory research and intervention efforts, rigorous measure development remains a notable challenge. For example, though the primary theoretical model for understanding health disparities among sexual minority (e.g., lesbian, gay, bisexual) adolescents is minority stress theory, nearly all published studies of this population rely on minority stress measures with poor psychometric properties and development procedures. In response, we developed the Sexual Minority Adolescent Stress Inventory (SMASI) with N = 346 diverse adolescents ages 14–17, using a comprehensive approach to de novo measure development designed to produce a measure with desirable psychometric properties. After exploratory factor analysis on 102 candidate items informed by a modified Delphi process, we applied item response theory techniques to the remaining 72 items. Discrimination and difficulty parameters and item characteristic curves were estimated overall, within each of 12 initially derived factors, and across demographic subgroups. Two items were removed for excessive discrimination and three were removed following reliability analysis. The measure demonstrated configural and scalar invariance for gender and age; a three-item factor was excluded for demonstrating substantial differences by sexual identity and race/ethnicity. The final 64-item measure comprised 11 subscales and demonstrated excellent overall (α = 0.98), subscale (α range 0.75–0.96), and test–retest (scale r > 0.99; subscale r range 0.89–0.99) reliabilities. Subscales represented a mix of proximal and distal stressors, including domains of internalized homonegativity, identity management, intersectionality, and negative expectancies (proximal) and social marginalization, family rejection, homonegative climate, homonegative communication, negative disclosure experiences, religion, and work domains (distal). 
Thus, the SMASI development process illustrates a method to incorporate information from multiple sources, including item response theory models, to guide item selection in building a psychometrically sound measure. We posit that similar methods can be used to improve construct measurement across all areas of psychological research, particularly in areas where a strong theoretical framework exists but existing measures are limited. PMID:29599737
Exploratory Item Classification Via Spectral Graph Clustering

PubMed Central

Chen, Yunxiao; Li, Xiaoou; Liu, Jingchen; Xu, Gongjun; Ying, Zhiliang

2017-01-01

Large-scale assessments are supported by a large item pool. An important task in test development is to assign items into scales that measure different characteristics of individuals, and a popular approach is cluster analysis of items. Classical methods in cluster analysis, such as the hierarchical clustering, K-means method, and latent-class analysis, often induce a high computational overhead and have difficulty handling missing data, especially in the presence of high-dimensional responses. In this article, the authors propose a spectral clustering algorithm for exploratory item cluster analysis. The method is computationally efficient, effective for data with missing or incomplete responses, easy to implement, and often outperforms traditional clustering algorithms in the context of high dimensionality. The spectral clustering algorithm is based on graph theory, a branch of mathematics that studies the properties of graphs. The algorithm first constructs a graph of items, characterizing the similarity structure among items. It then extracts item clusters based on the graphical structure, grouping similar items together. The proposed method is evaluated through simulations and an application to the revised Eysenck Personality Questionnaire. PMID:29033476
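The two steps described in this abstract (build a similarity graph of items, then extract clusters from its structure) can be illustrated with a generic spectral bipartition. This is a sketch under stated assumptions and not the authors' algorithm: the abstract does not give their similarity measure, normalization, number of clusters, or handling of missing responses. The function name, the unnormalized Laplacian, the two-cluster restriction, and the toy similarity matrix are all hypothetical choices made for illustration.

```python
def spectral_bipartition(similarity, iterations=500):
    """Split items into two clusters using the sign of the Fiedler vector.

    similarity: symmetric matrix (list of lists) of non-negative item similarities.
    """
    n = len(similarity)
    # Graph of items: edge weights are the similarities, self-loops are ignored.
    w = [[similarity[i][j] if i != j else 0.0 for j in range(n)] for i in range(n)]
    degree = [sum(row) for row in w]
    shift = 2.0 * max(degree)  # upper bound on the largest eigenvalue of L = D - W

    # Power iteration on (shift*I - L), kept orthogonal to the constant vector,
    # converges to the eigenvector of the second-smallest eigenvalue of L.
    v = [i - (n - 1) / 2.0 for i in range(n)]  # deterministic, mean-zero start
    for _ in range(iterations):
        lv = [degree[i] * v[i] - sum(w[i][j] * v[j] for j in range(n)) for i in range(n)]
        v = [shift * v[i] - lv[i] for i in range(n)]
        m = sum(v) / n
        v = [x - m for x in v]
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]
    return [0 if x < 0 else 1 for x in v]

# Hypothetical similarities: items 0-2 resemble each other, as do items 3-5.
groups = [0, 0, 0, 1, 1, 1]
sim = [[0.9 if groups[i] == groups[j] else 0.1 for j in range(6)] for i in range(6)]
labels = spectral_bipartition(sim)
```

The fixed start vector keeps the sketch deterministic, but it would fail if it happened to be orthogonal to the Fiedler vector; a random start, or an eigen-solver from a numerical library followed by k-means for more than two clusters, would be the more usual choice.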
Exploring the Validity of the Affect Balance Scale With a Sample of Family Caregivers

PubMed Central

Perkinson, Margaret A.; Albert, Steven M.; Luborsky, Mark; Moss, Miriam; Glicksman, Allen

2014-01-01

Open-ended responses of caregiving daughters and daughters-in-law were generated by a modified random probe technique to investigate the construct validity of the two subscales of the Affect Balance Scale (ABS), i.e., the 5-item Positive Affect Scale (PAS) and the 5-item Negative Affect Scale (NAS). A set of criteria were developed to distinguish between responses that did and did not correspond to Bradburn’s assumptions concerning affect. While most responses met at least one of the criteria, very few met all. In exploring the nature of affect, we found that positive affect was based to a large extent on personal accomplishments and the recognition of others. The assessment of negative affect was a more interior, or self-focused process. For a significant subset of the sample, a negative response to a closed-ended PAS or NAS item implied disagreement or discontent with the wording or the implications of the item itself, rather than an absence of affect. Not all of the ABS items were equally valid measures of affect. PMID:8056955

Toward a Measure of Accountability in Nursing: A Three-Stage Validation Study.

PubMed

Drach-Zahavy, Anat; Leonenko, Marina; Srulovici, Einav

2018-06-04

To develop and psychometrically evaluate a three-dimensional questionnaire suitable for evaluating personal and organizational accountability in nurses. Accountability is defined as a three-dimensional value, directing professionals to take responsibility for their decisions and actions, to be willing to explain them (transparency) and to be judged according to society's accepted values (answerability). Despite the relatively clear definition, measurement of accountability lags well behind. Existing self-report questionnaires do not fully capture the complexity of the concept; nor do they capture the different sources of accountability (e.g., personal accountability, organizational accountability). A three-stage measure development. Data were collected during 2015-2016. In Phase 1, an initial database of items (N = 74) was developed, based on literature review and qualitative study, establishing face and content validity. In Phase 2, the face, content, construct and criterion-related validity of the initial questionnaires (19 items for personal and organizational accountability questionnaire) was established with a sample of 229 nurses. In Phase 3, the final questionnaires (19 items each) were validated with a new sample of 329 nurses and established construct validity. The final version of the instruments comprised 19 items, suitable for assessing personal and organizational accountability. The questionnaire referred to the dimensions of responsibility, transparency and answerability. The findings established the instrument's content, construct and criterion-related validity, as well as good internal reliability. The questionnaire portrays accountability in nursing, by capturing nurses' subjective perceptions of accountability dimensions (responsibility, transparency, answerability), as demonstrated by personal and organizational values. This article is protected by copyright. All rights reserved.
Item Selection, Evaluation, and Simple Structure in Personality Data

PubMed Central

Pettersson, Erik; Turkheimer, Eric

2010-01-01

We report an investigation of the genesis and interpretation of simple structure in personality data using two very different self-reported data sets. The first consists of a set of relatively unselected lexical descriptors, whereas the second is based on responses to a carefully constructed instrument. In both data sets, we explore the degree of simple structure by comparing factor solutions to solutions from simulated data constructed to have either strong or weak simple structure. The analysis demonstrates that there is little evidence of simple structure in the unselected items, and a moderate degree among the selected items. In both instruments, however, much of the simple structure that could be observed originated in a strong dimension of positive vs. negative evaluation. PMID:20694168
Two objective measures of self-esteem.

PubMed

Lorr, M; Wunderlich, R A

1986-01-01

Two scales were constructed to assess self-esteem, conceptualized as reflecting (a) feelings of competence and efficacy, and (b) perceived positive appraisal from significant others. To control for response bias a paired choice format was chosen for the items constructed. A buffer scale designed to measure social assertiveness was also included. Data were collected on three samples of high school boys. The item intercorrelations were subjected to principal component analyses followed by Varimax rotations. In each of the three analyses factors of Confidence, Popularity (Social Approval), and Social Assertiveness emerged. The revised self-esteem scales, each defined by 11 items, have been shown to have acceptable reliability and some concurrent validity based on correlations with the well-known Rosenberg Self-Esteem Scale.
The Utrecht questionnaire (U-CEP) measuring knowledge on clinical epidemiology proved to be valid.

PubMed

Kortekaas, Marlous F; Bartelink, Marie-Louise E L; de Groot, Esther; Korving, Helen; de Wit, Niek J; Grobbee, Diederick E; Hoes, Arno W

2017-02-01

Knowledge on clinical epidemiology is crucial to practice evidence-based medicine. We describe the development and validation of the Utrecht questionnaire on knowledge on Clinical epidemiology for Evidence-based Practice (U-CEP); an assessment tool to be used in the training of clinicians. The U-CEP was developed in two formats: two sets of 25 questions and a combined set of 50. The validation was performed among postgraduate general practice (GP) trainees, hospital trainees, GP supervisors, and experts. Internal consistency, internal reliability (item-total correlation), item discrimination index, item difficulty, content validity, construct validity, responsiveness, test-retest reliability, and feasibility were assessed. The questionnaire was externally validated. Internal consistency was good with a Cronbach alpha of 0.8. The median item-total correlation and mean item discrimination index were satisfactory. Both sets were perceived as relevant to clinical practice. Construct validity was good. Both sets were responsive but failed on test-retest reliability. One set took 24 minutes and the other 33 minutes to complete, on average. External GP trainees had comparable results. The U-CEP is a valid questionnaire to assess knowledge on clinical epidemiology, which is a prerequisite for practicing evidence-based medicine in daily clinical practice. Copyright © 2016 Elsevier Inc. All rights reserved.
Constructing Multiple-Choice Items to Measure Higher-Order Thinking

ERIC Educational Resources Information Center

Scully, Darina

2017-01-01

Across education, certification and licensure, there are repeated calls for the development of assessments that target "higher-order thinking," as opposed to mere recall of facts. A common assumption is that this necessitates the use of constructed response or essay-style test questions; however, empirical evidence suggests that this may…
Constructing the Exact Significance Level for a Person-Fit Statistic.

ERIC Educational Resources Information Center

Liou, Michelle; Chang, Chih-Hsin

1992-01-01

An extension is proposed for the network algorithm introduced by C.R. Mehta and N.R. Patel to construct exact tail probabilities for testing the general hypothesis that item responses are distributed according to the Rasch model. A simulation study indicates the efficiency of the algorithm. (SLD)
A Non-Parametric Item Response Theory Evaluation of the CAGE Instrument Among Older Adults.

PubMed

Abdin, Edimansyah; Sagayadevan, Vathsala; Vaingankar, Janhavi Ajit; Picco, Louisa; Chong, Siow Ann; Subramaniam, Mythily

2018-02-23

The validity of the CAGE using item response theory (IRT) has not yet been examined in an older adult population. This study aims to investigate the psychometric properties of the CAGE using both non-parametric and parametric IRT models, assess whether there is any differential item functioning (DIF) by age, gender and ethnicity and examine the measurement precision at the cut-off scores. We used data from the Well-being of the Singapore Elderly study to conduct Mokken scaling analysis (MSA), dichotomous Rasch and 2-parameter logistic IRT models. The measurement precision at the cut-off scores was evaluated using classification accuracy (CA) and classification consistency (CC). The MSA showed the overall scalability H index was 0.459, indicating a medium performing instrument. All items were found to be homogenous, measuring the same construct and able to discriminate well between respondents with high levels of the construct and the ones with lower levels. The item discrimination ranged from 1.07 to 6.73 while the item difficulty ranged from 0.33 to 2.80. Significant DIF was found for 2 items across ethnic groups. More than 90% (CC and CA ranged from 92.5% to 94.3%) of the respondents were consistently and accurately classified by the CAGE cut-off scores of 2 and 3. The current study provides new evidence on the validity of the CAGE from the IRT perspective. This study provides valuable information on each item in the assessment of the overall severity of alcohol problems and the precision of the cut-off scores in an older adult population.
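To make the discrimination and difficulty parameters in this abstract concrete, the 2-parameter logistic item response function can be written as a short Python sketch. The function name is hypothetical, and pairing the lowest reported difficulty (0.33) with the lowest and highest reported discriminations (1.07 and 6.73) is an assumption for illustration only; the abstract reports ranges, not per-item parameter pairs.

```python
import math

def p_endorse(theta, a, b):
    """2PL model: probability that a respondent with trait level theta endorses an item
    with discrimination a and difficulty b. Setting a = 1 gives the Rasch form."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical pairing of the reported extremes, evaluated half a unit above the difficulty.
theta = 0.83
shallow = p_endorse(theta, a=1.07, b=0.33)
steep = p_endorse(theta, a=6.73, b=0.33)
print(round(shallow, 3), round(steep, 3))
```

At theta equal to b the endorsement probability is 0.5 for any discrimination; a larger a makes the probability climb faster once theta passes b, which is what separating respondents with high and low levels of the construct means in this model.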
Measuring Graph Comprehension, Critique, and Construction in Science

NASA Astrophysics Data System (ADS)

Lai, Kevin; Cabrera, Julio; Vitale, Jonathan M.; Madhok, Jacquie; Tinker, Robert; Linn, Marcia C.

2016-08-01

Interpreting and creating graphs plays a critical role in scientific practice. The K-12 Next Generation Science Standards call for students to use graphs for scientific modeling, reasoning, and communication. To measure progress on this dimension, we need valid and reliable measures of graph understanding in science. In this research, we designed items to measure graph comprehension, critique, and construction and developed scoring rubrics based on the knowledge integration (KI) framework. We administered the items to over 460 middle school students. We found that the items formed a coherent scale and had good reliability using both item response theory and classical test theory. The KI scoring rubric showed that most students had difficulty linking graphs features to science concepts, especially when asked to critique or construct graphs. In addition, students with limited access to computers as well as those who speak a language other than English at home have less integrated understanding than others. These findings point to the need to increase the integration of graphing into science instruction. The results suggest directions for further research leading to comprehensive assessments of graph understanding.
Evaluating Job Demands and Control Measures for Use in Farm Worker Health Surveillance.

PubMed

Alterman, Toni; Gabbard, Susan; Grzywacz, Joseph G; Shen, Rui; Li, Jia; Nakamoto, Jorge; Carroll, Daniel J; Muntaner, Carles

2015-10-01

Workplace stress likely plays a role in health disparities; however, applying standard measures to studies of immigrants requires thoughtful consideration. The goal of this study was to determine the appropriateness of two measures of occupational stressors ('decision latitude' and 'job demands') for use with mostly immigrant Latino farm workers. Cross-sectional data from a pilot module containing a four-item measure of decision latitude and a two-item measure of job demands were obtained from a subsample (N = 409) of farm workers participating in the National Agricultural Workers Survey. Responses to items for both constructs were clustered toward the low end of the structured response-set. Percentages of responses of 'very often' and 'always' for each of the items were examined by educational attainment, birth country, dominant language spoken, task, and crop. When stratified by subgroups of workers, Cronbach's α values for the decision latitude items ranged from 0.65 to 0.90 but were less robust for the job demands items (0.25-0.72). The four-item decision latitude scale can be applied to occupational stress research with immigrant farm workers, and potentially other immigrant Latino worker groups. The short job demands scale requires further investigation and evaluation before suggesting widespread use.
Measuring Alexithymia via Trait Approach-I: A Alexithymia Scale Item Selection and Formation of Factor Structure

PubMed Central

TATAR, Arkun; SALTUKOĞLU, Gaye; ALİOĞLU, Seda; ÇİMEN, Sümeyye; GÜVEN, Hülya; AY, Çağla Ebru

2017-01-01

Introduction: It is not clear in the literature whether available instruments are sufficient to measure alexithymia because of its theoretical structure. Moreover, it has been reported that several measuring instruments are needed to measure this construct, and all the instruments have different error sources. The old and the new forms of Toronto Alexithymia Scale are the only instruments available in Turkish. Thus, the purpose of this study was to develop a new scale to measure alexithymia, selecting items and constructing the factor structure. Methods: A total of 1117 patients aged from 19 to 82 years (mean = 35.05 years) were included. A 100-item pool was prepared and applied to 628 women and 489 men. Data were analyzed using Exploratory Factor Analysis, Confirmatory Factor Analysis, and Item Response Theory and 28 items were selected. The new form of 28 items was applied to 415 university students, including 271 women and 144 men aged from 18 to 30 (mean = 21.44). Results: The results of Exploratory Factor Analysis revealed a five-factor construct of “Solving and Expressing Affective Experiences,” “External Locused Cognitive Style,” “Tendency to Somatize Affections,” “Imaginary Life and Visualization,” and “Acting Impulsively,” along with a two-factor construct representing the “Affective” and “Cognitive” components. All the components of the construct showed good model fit and high internal consistency. The new form was tested in terms of internal consistency, test-retest reliability, and concurrent validity using Toronto Alexithymia Scale as criteria and discriminative validity using Five-Factor Personality Inventory Short Form. Conclusion: The results showed that the new scale met the basic psychometric requirements. Results have been discussed in line with related studies. PMID:29033633
Rasch-built Overall Disability Scale for Multifocal motor neuropathy (MMN-RODS©).

PubMed

Vanhoutte, Els K; Faber, Catharina G; van Nes, Sonja I; Cats, Elisabeth A; Van der Pol, W-Ludo; Gorson, Kenneth C; van Doorn, Pieter A; Cornblath, David R; van den Berg, Leonard H; Merkies, Ingemar S J

2015-09-01

Clinical trials in multifocal motor neuropathy (MMN) have often used ordinal-based measures that may not accurately capture changes. We aimed to construct a disability interval outcome measure specifically for MMN using the Rasch model and to examine its clinimetric properties. A total of 146 preliminary activity and participation items were assessed twice (reliability studies) in 96 clinically stable MMN patients. These patients also assessed the ordinal-based overall disability sum score (construct, sample-dependent validity). The final Rasch-built overall disability scale for MMN (MMN-RODS©) was serially applied in 26 patients with newly diagnosed or relapsing MMN, treated with intravenous immunoglobulin (IVIg) (1-year follow-up; responsiveness study). The magnitude of change for each patient was calculated using the minimum clinically important difference technique related to the individually obtained standard errors. A total of 121 items not fulfilling Rasch requirements were removed. The final 25-item MMN-RODS© fulfilled all Rasch model expectations and showed acceptable reliability and validity including good discriminatory capacity. Most serially examined patients improved, but the magnitude of improvement was low, reflecting poor responsiveness. The constructed MMN-RODS© is a disease-specific, interval measure to detect activity limitations in patients with MMN and overcomes the shortcomings of ordinal scales. However, future clinimetric studies are needed to improve the responsiveness of the MMN-RODS© by longer observations and/or more rigorous treatment regimens. © 2015 Peripheral Nerve Society.
Measuring the Quality of Life of Visually Impaired Children: First Stage Psychometric Evaluation of the Novel VQoL_CYP Instrument.

PubMed

Tadić, Valerija; Cooper, Andrew; Cumberland, Phillippa; Lewando-Hundt, Gillian; Rahi, Jugnoo S

2016-01-01

To report piloting and initial validation of the VQoL_CYP, a novel age-appropriate vision-related quality of life (VQoL) instrument for self-reporting by children with visual impairment (VI). Participants were a random patient sample of children with VI aged 10-15 years. 69 patients, drawn from patient databases at Great Ormond Street Hospital and Moorfields Eye Hospital, United Kingdom, participated in piloting of the draft 47-item VQoL instrument, which enabled preliminary item reduction. Subsequent administration of the instrument, alongside functional vision (FV) and generic health-related quality of life (HRQoL) self-report measures, to 101 children with VI comprising a nationally representative sample enabled further item reduction and evaluation of psychometric properties using Rasch analysis. Construct validity was assessed through Pearson correlation coefficients. Item reduction through piloting (8 items removed for skewness and individual item response pattern) and validation (1 item removed for skewness and 3 for misfit in Rasch) produced a 35-item scale, with fit values within acceptable limits, no notable differential item functioning, good measurement precision, ordered response categories and acceptable targeting in Rasch. The VQoL_CYP showed good construct validity, correlating strongly with HRQoL scores, moderately with FV scores but not with acuity. Robust child-appropriate self-report VQoL measures for children with VI are necessary for understanding the broader impacts of living with a visual disability, distinguishing these from limited functioning per se. Future planned use in larger patient samples will allow further psychometric development of the VQoL_CYP as an adjunct to objective outcomes assessment.
A New Tool for Nutrition App Quality Evaluation (AQEL): Development, Validation, and Reliability Testing

PubMed Central

Huang, Wenhao; Chapman-Novakofski, Karen M

2017-01-01

Background: The extensive availability and increasing use of mobile apps for nutrition-based health interventions makes evaluation of the quality of these apps crucial for integration of apps into nutritional counseling. Objective: The goal of this research was the development, validation, and reliability testing of the app quality evaluation (AQEL) tool, an instrument for evaluating apps’ educational quality and technical functionality. Methods: Items for evaluating app quality were adapted from website evaluations, with additional items added to evaluate the specific characteristics of apps, resulting in 79 initial items. Expert panels of nutrition and technology professionals and app users reviewed items for face and content validation. After recommended revisions, nutrition experts completed a second AQEL review to ensure clarity. On the basis of 150 sets of responses using the revised AQEL, principal component analysis was completed, reducing AQEL into 5 factors that underwent reliability testing, including internal consistency, split-half reliability, test-retest reliability, and interrater reliability (IRR). Two additional modifiable constructs for evaluating apps based on the age and needs of the target audience as selected by the evaluator were also tested for construct reliability. IRR testing using intraclass correlations (ICC) with all 7 constructs was conducted, with 15 dietitians evaluating one app. Results: Development and validation resulted in the 51-item AQEL. These were reduced to 25 items in 5 factors after principal component analysis, plus 9 modifiable items in two constructs that were not included in principal component analysis. Internal consistency and split-half reliability of the following constructs derived from principal components analysis was good (Cronbach alpha >.80, Spearman-Brown coefficient >.80): behavior change potential, support of knowledge acquisition, app function, and skill development.
App purpose split-half reliability was .65. Test-retest reliability showed no significant change over time (P>.05) for all but skill development (P=.001). Construct reliability was good for items assessing age appropriateness of apps for children, teens, and a general audience. In addition, construct reliability was acceptable for assessing app appropriateness for various target audiences (Cronbach alpha >.70). For the 5 main factors, ICC (1,k) was >.80, with a P value of <.05. When 15 nutrition professionals evaluated one app, ICC (2,15) was .98, with a P value of <.001 for all 7 constructs when the modifiable items were specified for adults seeking weight loss support. Conclusions: Our preliminary effort shows that AQEL is a valid, reliable instrument for evaluating nutrition apps’ qualities for clinical interventions by nutrition clinicians, educators, and researchers. Further efforts in validating AQEL in various contexts are needed. PMID:29079554
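The split-half reliability with Spearman-Brown correction mentioned in this abstract can be sketched as follows. How the authors split the items is not stated, so the odd/even split is an assumption, and the function names and the four-respondent data set are hypothetical.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation between two equally long score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman_brown(r_half):
    """Step the half-test correlation up to the reliability of the full-length test."""
    return 2.0 * r_half / (1.0 + r_half)

def split_half_reliability(scores):
    """Correlate odd-position and even-position item sums, then apply Spearman-Brown."""
    odd = [sum(row[0::2]) for row in scores]
    even = [sum(row[1::2]) for row in scores]
    return spearman_brown(pearson(odd, even))

# Hypothetical ratings of four evaluators on a four-item construct.
ratings = [[1, 1, 2, 1], [2, 3, 2, 3], [3, 2, 4, 3], [4, 4, 4, 5]]
print(round(split_half_reliability(ratings), 3))
```

The correction is needed because each half contains only half the items, so the raw correlation between halves understates the reliability of the whole scale.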
Development of a Self-Determination Measure for College Students: Validity Evidence for the Basic Needs Satisfaction at College Scale

ERIC Educational Resources Information Center

Jenkins-Guarnieri, Michael A.; Vaughan, Angela L.; Wright, Stephen L.

2015-01-01

We adapted a work self-determination measure to create the Basic Needs Satisfaction at College Scale. Confirmatory factor analysis and item response theory analyses with data from 525 adults supported a 3-factor model with 13 items most sensitive for lower to middle range levels of the autonomy, competence, and relatedness constructs.
Automated Scoring for the "TOEFL Junior"® Comprehensive Writing and Speaking Test. Research Report. ETS RR-15-09

ERIC Educational Resources Information Center

Evanini, Keelan; Heilman, Michael; Wang, Xinhao; Blanchard, Daniel

2015-01-01

This report describes the initial automated scoring results that were obtained using the constructed responses from the Writing and Speaking sections of the pilot forms of the "TOEFL Junior"® Comprehensive test administered in late 2011. For all of the items except one (the edit item in the Writing section), existing automated scoring…
The Communicative Participation Item Bank (CPIB): Item bank calibration and development of a disorder-generic short form

PubMed Central

Baylor, Carolyn; Yorkston, Kathryn; Eadie, Tanya; Kim, Jiseon; Chung, Hyewon; Amtmann, Dagmar

2015-01-01

Purpose The purpose of this study was to calibrate the items for the Communicative Participation Item Bank (CPIB) using Item Response Theory (IRT). One overriding objective was to examine if the IRT item parameters would be consistent across different diagnostic groups, thereby allowing creation of a disorder-generic instrument. The intended outcomes were the final item bank and a short form ready for clinical and research applications. Methods Self-report data were collected from 701 individuals representing four diagnoses: multiple sclerosis, Parkinson’s disease, amyotrophic lateral sclerosis and head and neck cancer. Participants completed the CPIB and additional self-report questionnaires. CPIB data were analyzed using the IRT Graded Response Model (GRM). Results The initial set of 94 candidate CPIB items were reduced to an item bank of 46 items demonstrating unidimensionality, local independence, good item fit, and good measurement precision. Differential item function (DIF) analyses detected no meaningful differences across diagnostic groups. A 10-item, disorder-generic short form was generated. Conclusions The CPIB provides speech-language pathologists with a unidimensional, self-report outcomes measurement instrument dedicated to the construct of communicative participation. This instrument may be useful to clinicians and researchers wanting to implement measures of communicative participation in their work. PMID:23816661
Older adults' drug benefit beliefs: construct definition and measure development.

PubMed

Cline, Richard R; Gupta, Kiran; Singh, Reshmi L

2008-03-01

The Medicare Prescription Drug, Improvement and Modernization Act of 2003 provides coverage of outpatient prescription drugs for Medicare beneficiaries. Although much has been learned since the program's implementation, a context within which this information can be understood is lacking. The purpose of this study was to develop a reliable and valid multi-item instrument measuring beliefs about Medicare prescription drug benefits. Survey items were generated using focus group transcripts, other surveys on the Medicare Part "D" program, and past studies of choice and satisfaction in drug insurance programs. Using data from the survey pilot test, item and reliability analyses were used to reduce and refine an initial pool of items. Data then were collected from a cross-sectional, mail survey of older adults living in Minnesota. Data were analyzed using exploratory factor analysis. Summated rating scales then were constructed and assessed further using reliability analyses. Construct validity of summated scales was examined by comparing scale scores across response categories of survey items that collected information on general political attitudes, perceptions of the Medicare Part "D" program, health status, and health care utilization and demographics. The adjusted response rate for the main survey was 55.98% (744/1329). Iterative factor analysis produced 2 interpretable scales. The first, termed "access/equity" (13 items, Cronbach's alpha=0.89) measures beliefs that a Medicare drug benefit should both provide affordable prescription drugs for beneficiaries and do this in a manner that is equitable for all participants. The second, termed "comprehensibility" (6 items, Cronbach's alpha=0.80) assesses beliefs that regulations governing a Medicare drug benefit should be easily understood. Discriminant validity tests suggest that these measures behave in a manner consistent with related research in these areas. 
Measures of 2 facets of older adults' drug benefit beliefs were developed using a multiple step procedure. Future research could focus on developing a better understanding of other facets of these beliefs and sound methods of measurement.
Psychometric Evaluation of the Ford Insomnia Response to Stress Test (FIRST) in Early Pregnancy.

PubMed

Gelaye, Bizu; Zhong, Qiu-Yue; Barrios, Yasmin V; Redline, Susan; Drake, Christopher L; Williams, Michelle A

2016-04-15

To evaluate the construct validity and factor structure of the Spanish-language version of the Ford Insomnia Response to Stress Test questionnaire (FIRST-S) when used in early pregnancy. A cohort of 647 women were interviewed at ≤ 16 weeks of gestation to collect information regarding lifestyle, demographic, and sleep characteristics. The factorial structure of the FIRST-S was tested through exploratory and confirmatory factor analyses (EFA and CFA). Internal consistency and construct validity were also assessed by evaluating the association between the FIRST-S with symptoms of depression, anxiety, and sleep quality. Item response theory (IRT) analyses were conducted to complement classical test theory (CTT) analytic approaches. The mean score of the FIRST-S was 13.8 (range: 9-33). The results of the EFA showed that the FIRST-S contained a one-factor solution that accounted for 69.8% of the variance. The FIRST-S items showed good internal consistency (Cronbach α = 0.81). CFA results corroborated the one-factor structure finding from the EFA; and yielded measures indicating goodness of fit (comparative fit index of 0.902) and accuracy (root mean square error of approximation of 0.057). The FIRST-S had good construct validity as demonstrated by statistically significant associations of FIRST-S scores with sleep quality, antepartum depression and anxiety symptoms. Finally, results from IRT analyses suggested excellent item infit and outfit measures. The FIRST-S was found to have good construct validity and internal consistency for assessing vulnerability to insomnia during early pregnancy. © 2016 American Academy of Sleep Medicine.
National Reading Tests in Denmark, Norway, and Sweden: A Comparison of Construct Definitions, Cognitive Targets, and Response Formats

ERIC Educational Resources Information Center

Tengberg, Michael

2017-01-01

Reading comprehension tests are often assumed to measure the same, or at least similar, constructs. Yet, reading is not a single but a multidimensional form of processing, which means that variations in terms of reading material and item design may emphasize one aspect of the construct at the cost of another. The educational systems in Denmark,…
Incorporating Response Times in Item Response Theory Models of Reading Comprehension Fluency

ERIC Educational Resources Information Center

Su, Shiyang

2017-01-01

With the online assessment becoming mainstream and the recording of response times becoming straightforward, the importance of response times as a measure of psychological constructs has been recognized and the literature of modeling times has been growing during the last few decades. Previous studies have tried to formulate models and theories to…

Item-saving assessment of self-care performance in children with developmental disabilities: A prospective caregiver-report computerized adaptive test

PubMed Central

Chen, Cheng-Te; Chen, Yu-Lan; Lin, Yu-Ching; Hsieh, Ching-Lin; Tzeng, Jeng-Yi

2018-01-01

Objective The purpose of this study was to construct a computerized adaptive test (CAT) for measuring self-care performance (the CAT-SC) in children with developmental disabilities (DD) aged from 6 months to 12 years in a content-inclusive, precise, and efficient fashion. Methods The study was divided into 3 phases: (1) item bank development, (2) item testing, and (3) a simulation study to determine the stopping rules for the administration of the CAT-SC. A total of 215 caregivers of children with DD were interviewed with the 73-item CAT-SC item bank. An item response theory model was adopted for examining the construct validity to estimate item parameters after investigation of the unidimensionality, equality of slope parameters, item fitness, and differential item functioning (DIF). In the last phase, the reliability and concurrent validity of the CAT-SC were evaluated. Results The final CAT-SC item bank contained 56 items. The stopping rules suggested were (a) reliability coefficient greater than 0.9 or (b) 14 items administered. The results of simulation also showed that 85% of the estimated self-care performance scores would reach a reliability higher than 0.9 with a mean test length of 8.5 items, and the mean reliability for the rest was 0.86. Administering the CAT-SC could reduce the number of items administered by 75% to 84%. In addition, self-care performances estimated by the CAT-SC and the full item bank were very similar to each other (Pearson r = 0.98). Conclusion The newly developed CAT-SC can efficiently measure self-care performance in children with DD whose performances are comparable to those of TD children aged from 6 months to 12 years as precisely as the whole item bank. The item bank of the CAT-SC has good reliability and a unidimensional self-care construct, and the CAT can estimate self-care performance with less than 25% of the items in the item bank. 
Therefore, the CAT-SC could be useful for measuring self-care performance in children with DD in clinical and research settings. PMID:29561879
Item-saving assessment of self-care performance in children with developmental disabilities: A prospective caregiver-report computerized adaptive test.

PubMed

Chen, Cheng-Te; Chen, Yu-Lan; Lin, Yu-Ching; Hsieh, Ching-Lin; Tzeng, Jeng-Yi; Chen, Kuan-Lin

2018-01-01

The purpose of this study was to construct a computerized adaptive test (CAT) for measuring self-care performance (the CAT-SC) in children with developmental disabilities (DD) aged from 6 months to 12 years in a content-inclusive, precise, and efficient fashion. The study was divided into 3 phases: (1) item bank development, (2) item testing, and (3) a simulation study to determine the stopping rules for the administration of the CAT-SC. A total of 215 caregivers of children with DD were interviewed with the 73-item CAT-SC item bank. An item response theory model was adopted for examining the construct validity to estimate item parameters after investigation of the unidimensionality, equality of slope parameters, item fitness, and differential item functioning (DIF). In the last phase, the reliability and concurrent validity of the CAT-SC were evaluated. The final CAT-SC item bank contained 56 items. The stopping rules suggested were (a) reliability coefficient greater than 0.9 or (b) 14 items administered. The results of simulation also showed that 85% of the estimated self-care performance scores would reach a reliability higher than 0.9 with a mean test length of 8.5 items, and the mean reliability for the rest was 0.86. Administering the CAT-SC could reduce the number of items administered by 75% to 84%. In addition, self-care performances estimated by the CAT-SC and the full item bank were very similar to each other (Pearson r = 0.98). The newly developed CAT-SC can efficiently measure self-care performance in children with DD whose performances are comparable to those of TD children aged from 6 months to 12 years as precisely as the whole item bank. The item bank of the CAT-SC has good reliability and a unidimensional self-care construct, and the CAT can estimate self-care performance with less than 25% of the items in the item bank. Therefore, the CAT-SC could be useful for measuring self-care performance in children with DD in clinical and research settings.
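The two stopping rules reported for the CAT-SC (reliability greater than 0.9, or 14 items administered) can be sketched as a simple check. This is an illustrative sketch, not the authors' implementation: it assumes a Rasch-type item information function and the common convention reliability = 1 - SE^2 for a latent trait scaled to unit variance, neither of which is specified in the abstract. All function names are hypothetical.

```python
import math


def rasch_item_information(theta, difficulty):
    """Fisher information of a Rasch item at ability theta (assumed model)."""
    p = 1.0 / (1.0 + math.exp(-(theta - difficulty)))
    return p * (1.0 - p)


def should_stop(item_informations, max_items=14, target_reliability=0.9):
    """Return True when either stopping rule is met.

    item_informations: information contributed by each item administered so far.
    Assumption: reliability = 1 - SE^2, with SE^2 = 1 / total information.
    """
    if len(item_informations) >= max_items:
        return True  # rule (b): maximum test length reached
    total_info = sum(item_informations)
    if total_info <= 0:
        return False  # nothing administered yet
    reliability = 1.0 - 1.0 / total_info
    return reliability > target_reliability  # rule (a)


# Hypothetical usage: items well matched to the examinee (difficulty == theta)
# each contribute the Rasch maximum of 0.25 units of information.
infos = [rasch_item_information(0.0, 0.0) for _ in range(5)]
stop_now = should_stop(infos)
```

With these assumptions the reliability rule requires total information above 10, so the practical test length depends on how informative the selected items are for the child being assessed.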
Development and psychometric properties of the Suicidality of Adolescent Screening Scale (SASS) using Multidimensional Item Response Theory.

PubMed

Sukhawaha, Supattra; Arunpongpaisal, Suwanna; Hurst, Cameron

2016-09-30

Suicide prevention in adolescents by early detection using screening tools to identify high suicidal risk is a priority. Our objective was to build a multidimensional scale, namely the "Suicidality of Adolescent Screening Scale (SASS)", to identify adolescents at risk of suicide. An initial pool of items was developed by using in-depth interviews, focus groups and a literature review. Initially, 77 items were administered to 307 adolescents and analyzed using exploratory Multidimensional Item Response Theory (MIRT) to remove unnecessary items. A subsequent exploratory factor analysis revealed 35 items that collected into 4 factors: Stressors, Pessimism, Suicidality and Depression. To confirm this structure, a new sample of 450 adolescents was collected and confirmatory MIRT factor analysis was performed. The resulting scale was shown to be both construct valid and able to discriminate well between adolescents who had and had not previously attempted suicide. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
A modular approach for item response theory modeling with the R package flirt.

PubMed

Jeon, Minjeong; Rijmen, Frank

2016-06-01

The new R package flirt is introduced for flexible item response theory (IRT) modeling of psychological, educational, and behavior assessment data. flirt integrates a generalized linear and nonlinear mixed modeling framework with graphical model theory. The graphical model framework allows for efficient maximum likelihood estimation. The key feature of flirt is its modular approach to facilitate convenient and flexible model specifications. Researchers can construct customized IRT models by simply selecting various modeling modules, such as parametric forms, number of dimensions, item and person covariates, person groups, link functions, etc. In this paper, we describe major features of flirt and provide examples to illustrate how flirt works in practice.
Development and validation of a measure of workplace climate for healthy weight maintenance.

PubMed

Sliter, Katherine A

2013-07-01

Due to the obesity epidemic, an increasing amount of research is being conducted to better understand the antecedents and consequences of excess employee weight. One construct often of interest to researchers in this area is organizational climate. Unfortunately, a viable measure of climate, as related to employee weight, does not exist. The purpose of this study was to remedy this by developing and validating a concise, psychometrically sound measure of climate for healthy weight. An item pool was developed based on surveys of full-time employees, and a sorting task was used to eliminate ambiguous items. Items were pilot tested by a sample of 338 full-time employees, and the item pool was reduced through item response theory (IRT) and reliability analyses. Finally, the retained 14 items, comprising 3 subscales, were completed by a sample of 360 full-time employees, representing 26 different organizations from across the United States. Multilevel modeling indicated that sufficient variance was explained by group membership to support aggregation, and confirmatory factor analysis (CFA) supported the hypothesized model of 3 subscale factors and an overall climate factor. Nine hypotheses specific to construct validation were tested. Scores on the new scale correlated significantly with individual-level reports of psychological constructs (e.g., health motivation, general leadership support for health) and physiological phenomena (e.g., body mass index [BMI], physical health problems) to which they should theoretically relate, supporting construct validity. Implications for the use of this scale in both applied and research settings are discussed. PsycINFO Database Record (c) 2013 APA, all rights reserved.
Predicting sugar-sweetened behaviours with theory of planned behaviour constructs: Outcome and process results from the SIPsmartER behavioural intervention.

PubMed

Zoellner, Jamie M; Porter, Kathleen J; Chen, Yvonnes; Hedrick, Valisa E; You, Wen; Hickman, Maja; Estabrooks, Paul A

2017-05-01

Guided by the theory of planned behaviour (TPB) and health literacy concepts, SIPsmartER is a six-month multicomponent intervention effective at improving SSB behaviours. Using SIPsmartER data, this study explores prediction of SSB behavioural intention (BI) and behaviour from TPB constructs using: (1) cross-sectional and prospective models and (2) 11 single-item assessments from interactive voice response (IVR) technology. Quasi-experimental design, including pre- and post-outcome data and repeated-measures process data of 155 intervention participants. Validated multi-item TPB measures, single-item TPB measures, and self-reported SSB behaviours. Hypothesised relationships were investigated using correlation and multiple regression models. TPB constructs explained 32% of the variance cross sectionally and 20% prospectively in BI; and explained 13-20% of variance cross sectionally and 6% prospectively. Single-item scale models were significant, yet explained less variance. All IVR models predicting BI (average 21%, range 6-38%) and behaviour (average 30%, range 6-55%) were significant. Findings are interpreted in the context of other cross-sectional, prospective and experimental TPB health and dietary studies. Findings advance experimental application of the TPB, including understanding constructs at outcome and process time points and applying theory in all intervention development, implementation and evaluation phases.
Development and psychometric characteristics of the SCI-QOL Ability to Participate and Satisfaction with Social Roles and Activities item banks and short forms.

PubMed

Heinemann, Allen W; Kisala, Pamela A; Hahn, Elizabeth A; Tulsky, David S

2015-05-01

To develop a spinal cord injury (SCI)-focused version of PROMIS and Neuro-QOL social domain item banks; evaluate the psychometric properties of items developed for adults with SCI; and report information to facilitate clinical and research use. We used a mixed-methods design to develop and evaluate Ability to Participate in Social Roles and Activities and Satisfaction with Social Roles and Activities items. Focus groups helped define the constructs; cognitive interviews helped revise items; and confirmatory factor analysis and item response theory methods helped calibrate item banks and evaluate differential item functioning related to demographic and injury characteristics. Five SCI Model System sites and one Veterans Administration medical center. The calibration sample consisted of 641 individuals; a reliability sample consisted of 245 individuals residing in the community. A subset of 27 Ability to Participate and 35 Satisfaction items demonstrated good measurement properties and negligible differential item functioning related to demographic and injury characteristics. The SCI-specific measures correlate strongly with the PROMIS and Neuro-QOL versions. Ten item short forms correlate >0.96 with the full banks. Variable-length CATs with a minimum of 4 items, variable-length CATs with a minimum of 8 items, fixed-length CATs of 10 items, and the 10-item short forms demonstrate construct coverage and measurement error that is comparable to the full item bank. The Ability to Participate and Satisfaction with Social Roles and Activities CATs and short forms demonstrate excellent psychometric properties and are suitable for clinical and research applications.
Bees Algorithm for Construction of Multiple Test Forms in E-Testing

ERIC Educational Resources Information Center

Songmuang, Pokpong; Ueno, Maomi

2011-01-01

The purpose of this research is to automatically construct multiple equivalent test forms that have equivalent qualities indicated by test information functions based on item response theory. There has been a trade-off in previous studies between the computational costs and the equivalent qualities of test forms. To alleviate this problem, we…
TEST BOOKLET FOR HIGH SCHOOL BIOLOGY, EXPERIMENTAL MATERIALS FOR USE 1966-1968.

ERIC Educational Resources Information Center

Biological Sciences Curriculum Study, Boulder, CO.

SUPPLEMENTARY TEST QUESTIONS FOR USE BY SECONDARY BIOLOGICAL SCIENCES CURRICULUM STUDY GREEN VERSION BIOLOGY TEACHERS IN THE CONSTRUCTION OF EXAMINATIONS ARE CONTAINED IN THIS EXPERIMENTAL MANUAL. THE ITEMS WERE PREPARED BY THE BIOLOGICAL SCIENCES CURRICULUM STUDY TEST CONSTRUCTION COMMITTEE IN RESPONSE TO TEACHER REQUESTS FOR SHORT-RANGE TESTS.…
Psychometric properties of the SDM-Q-9 questionnaire for shared decision-making in multiple sclerosis: item response theory modelling and confirmatory factor analysis.

PubMed

Ballesteros, Javier; Moral, Ester; Brieva, Luis; Ruiz-Beato, Elena; Prefasi, Daniel; Maurino, Jorge

2017-04-22

Shared decision-making is a cornerstone of patient-centred care. The 9-item Shared Decision-Making Questionnaire (SDM-Q-9) is a brief self-assessment tool for measuring patients' perceived level of involvement in decision-making related to their own treatment and care. Information related to the psychometric properties of the SDM-Q-9 for multiple sclerosis (MS) patients is limited. The objective of this study was to assess the performance of the items composing the SDM-Q-9 and its dimensional structure in patients with relapsing-remitting MS. A non-interventional, cross-sectional study in adult patients with relapsing-remitting MS was conducted in 17 MS units throughout Spain. A nonparametric item response theory (IRT) analysis was used to assess the latent construct and dimensional structure underlying the observed responses. A parametric IRT model, General Partial Credit Model, was fitted to obtain estimates of the relationship between the latent construct and item characteristics. The unidimensionality of the SDM-Q-9 instrument was assessed by confirmatory factor analysis. A total of 221 patients were studied (mean age = 42.1 ± 9.9 years, 68.3% female). Median Expanded Disability Status Scale score was 2.5 ± 1.5. Most patients reported taking part in each step of the decision-making process. Internal reliability of the instrument was high (Cronbach's α = 0.91) and the overall scale scalability score was 0.57, indicative of a strong scale. All items except item 1 showed scalability indices higher than 0.30. Four items (items 6 through 9) conveyed more than half of the SDM-Q-9 overall information (67.3%). The SDM-Q-9 was a good fit for a unidimensional latent structure (comparative fit index = 0.98, root-mean-square error of approximation = 0.07). All freely estimated parameters were statistically significant (P < 0.001). 
All items presented standardized parameter estimates with salient loadings (>0.40) with the exception of item 1 which presented the lowest loading (0.26). Items 6 through to 8 were the most relevant items for shared decision-making. The SDM-Q-9 presents appropriate psychometric properties and is therefore useful for assessing different aspects of shared decision-making in patients with multiple sclerosis.
The Development of a Multiple-Item Annoyance Scale (MIAS) for Transportation Noise Annoyance

PubMed Central

Belke, Christin; Spilski, Jan

2018-01-01

In 2001, Team#6 of the International Commission on Biological Effects of Noise (ICBEN) recommended the use of two single international standardised questions and response scales. This recommendation has been widely accepted in the scientific community. Nevertheless, annoyance can be regarded as a multidimensional construct comprising the three elements: (1) experience of an often repeated noise-related disturbance and the behavioural response to cope with it, (2) an emotional/attitudinal response to the sound and its disturbing impact, and (3) the perceived control or coping capacity with regard to the noise situation. The psychometric properties of items reflecting these three elements have been explored for aircraft noise annoyance. Analyses were conducted using data of the NORAH-Study (Noise-Related Annoyance, Cognition, and Health), and a multi-item noise annoyance scale (MIAS) has been developed and tested post hoc by using a stepwise process (exploratory and confirmatory factor analyses). Preliminary results were presented to the 12th ICBEN Congress in 2017. In this study, the validation of MIAS is done for aircraft noise and extended to railway and road traffic noise. The results largely confirm the concept of MIAS as a second-order construct of annoyance for all of the investigated transportation noise sources; however, improvements can be made, in particular with regard to items addressing the perceived coping capacity. PMID:29757228
Development and validation of the Measure of Indigenous Racism Experiences (MIRE)

PubMed Central

Paradies, Yin C; Cunningham, Joan

2008-01-01

Background In recent decades there has been increasing evidence of a relationship between self-reported racism and health. Although a plethora of instruments to measure racism have been developed, very few have been described conceptually or psychometrically. Furthermore, this research field has been limited by a dearth of instruments that examine reactions/responses to racism and by a restricted focus on African American populations. Methods In response to these limitations, the 31-item Measure of Indigenous Racism Experiences (MIRE) was developed to assess self-reported racism for Indigenous Australians. This paper describes the development of the MIRE together with an opportunistic examination of its content, construct and convergent validity in a population health study involving 312 Indigenous Australians. Results Focus group research supported the content validity of the MIRE, and inter-item/scale correlations suggested good construct validity. A good fit with a priori conceptual dimensions was demonstrated in factor analysis, and convergence with a separate item on discrimination was satisfactory. Conclusion The MIRE has considerable utility as an instrument that can assess multiple facets of racism together with responses/reactions to racism among indigenous populations and, potentially, among other ethnic/racial groups. PMID:18426602
The reliability and validity of the SF-8 with a conflict-affected population in northern Uganda.

PubMed

Roberts, Bayard; Browne, John; Ocaka, Kaducu Felix; Oyok, Thomas; Sondorp, Egbert

2008-12-02

The SF-8 is a health-related quality of life instrument that could provide a useful means of assessing general physical and mental health amongst populations affected by conflict. The purpose of this study was to test the validity and reliability of the SF-8 with a conflict-affected population in northern Uganda. A cross-sectional multi-staged, random cluster survey was conducted with 1206 adults in camps for internally displaced persons in Gulu and Amuru districts of northern Uganda. Data quality was assessed by analysing the number of incomplete responses to SF-8 items. Response distribution was analysed using aggregate endorsement frequency. Test-retest reliability was assessed in a separate smaller survey using the intraclass correlation test. Construct validity was measured using principal component analysis, and the Pearson correlation test for item-summary score correlation and inter-instrument correlations. Known groups validity was assessed using a two-sample t-test to evaluate the ability of the SF-8 to discriminate between groups known to have, and not have, physical and mental health problems. The SF-8 showed excellent data quality. It showed acceptable item response distribution based upon analysis of aggregate endorsement frequencies. Test-retest showed a good intraclass correlation of 0.61 for PCS and 0.68 for MCS. The principal component analysis indicated strong construct validity and concurred with the results of the validity tests by the SF-8 developers. The SF-8 also showed strong construct validity between the 8 items and PCS and MCS summary score, moderate inter-instrument validity, and strong known groups validity. This study provides evidence on the reliability and validity of the SF-8 amongst IDPs in northern Uganda.
The reliability and validity of the SF-8 with a conflict-affected population in northern Uganda

PubMed Central

Roberts, Bayard; Browne, John; Ocaka, Kaducu Felix; Oyok, Thomas; Sondorp, Egbert

2008-01-01

Background The SF-8 is a health-related quality of life instrument that could provide a useful means of assessing general physical and mental health amongst populations affected by conflict. The purpose of this study was to test the validity and reliability of the SF-8 with a conflict-affected population in northern Uganda. Methods A cross-sectional multi-staged, random cluster survey was conducted with 1206 adults in camps for internally displaced persons in Gulu and Amuru districts of northern Uganda. Data quality was assessed by analysing the number of incomplete responses to SF-8 items. Response distribution was analysed using aggregate endorsement frequency. Test-retest reliability was assessed in a separate smaller survey using the intraclass correlation test. Construct validity was measured using principal component analysis, and the Pearson correlation test for item-summary score correlation and inter-instrument correlations. Known groups validity was assessed using a two-sample t-test to evaluate the ability of the SF-8 to discriminate between groups known to have, and not have, physical and mental health problems. Results The SF-8 showed excellent data quality. It showed acceptable item response distribution based upon analysis of aggregate endorsement frequencies. Test-retest showed a good intraclass correlation of 0.61 for PCS and 0.68 for MCS. The principal component analysis indicated strong construct validity and concurred with the results of the validity tests by the SF-8 developers. The SF-8 also showed strong construct validity between the 8 items and PCS and MCS summary score, moderate inter-instrument validity, and strong known groups validity. Conclusion This study provides evidence on the reliability and validity of the SF-8 amongst IDPs in northern Uganda. PMID:19055716
Evaluation of the Fecal Incontinence Quality of Life Scale (FIQL) using item response theory reveals limitations and suggests revisions.

PubMed

Peterson, Alexander C; Sutherland, Jason M; Liu, Guiping; Crump, R Trafford; Karimuddin, Ahmer A

2018-06-01

The Fecal Incontinence Quality of Life Scale (FIQL) is a commonly used patient-reported outcome measure for fecal incontinence, often used in clinical trials, yet has not been validated in English since its initial development. This study uses modern methods to thoroughly evaluate the psychometric characteristics of the FIQL and its potential for differential functioning by gender. This study analyzed prospectively collected patient-reported outcome data from a sample of patients prior to colorectal surgery. Patients were recruited from 14 general and colorectal surgeons in Vancouver Coastal Health hospitals in Vancouver, Canada. Confirmatory factor analysis was used to assess construct validity. Item response theory was used to evaluate test reliability, describe item-level characteristics, identify local item dependence, and test for differential functioning by gender. 236 patients were included for analysis, with mean age 58 and approximately half female. Factor analysis failed to identify the lifestyle, coping, depression, and embarrassment domains, suggesting lack of construct validity. Items demonstrated low difficulty, indicating that the test has the highest reliability among individuals who have low quality of life. Five items are suggested for removal or replacement. Differential test functioning was minimal. This study has identified specific improvements that can be made to each domain of the Fecal Incontinence Quality of Life Scale and to the instrument overall. Formatting, scoring, and instructions may be simplified, and items with higher difficulty developed. The lifestyle domain can be used as is. The embarrassment domain should be significantly revised before use.
A multi-agent safety response model in the construction industry.

PubMed

Meliá, José L

2015-01-01

The construction industry is one of the sectors with the highest accident rates and the most serious accidents. A multi-agent safety response approach allows a useful diagnostic tool in order to understand factors affecting risk and accidents. The special features of the construction sector can influence the relationships among safety responses along the model of safety influences. The purpose of this paper is to test a model explaining risk and work-related accidents in the construction industry as a result of the safety responses of the organization, the supervisors, the co-workers and the worker. 374 construction employees belonging to 64 small Spanish construction companies working for two main companies participated in the study. Safety responses were measured using a 45-item Likert-type questionnaire. The structure of the measure was analyzed using factor analysis and the model of effects was tested using a structural equation model. Factor analysis clearly identifies the multi-agent safety dimensions hypothesized. The proposed safety response model of work-related accidents, involving construction specific results, showed a good fit. The multi-agent safety response approach to safety climate is a useful framework for the assessment of organizational and behavioral risks in construction.
[Design and validation of a questionnaire for psychosocial nursing diagnosis in Primary Care].

PubMed

Brito-Brito, Pedro Ruymán; Rodríguez-Álvarez, Cristobalina; Sierra-López, Antonio; Rodríguez-Gómez, José Ángel; Aguirre-Jaime, Armando

2012-01-01

To develop a valid, reliable and easy-to-use questionnaire for a psychosocial nursing diagnosis. The study was performed in two phases: first phase, questionnaire design and construction; second phase, validity and reliability tests. A bank of items was constructed using the NANDA classification as a theoretical framework. Each item was assigned a Likert scale or dichotomous response. The combination of responses to the items constituted the diagnostic rules to assign up to 28 labels. A group of experts carried out the validity test for content. Other validated scales were used as reference standards for the criterion validity tests. Forty-five nurses provided the questionnaire to the patients on three separate occasions over a period of three weeks, and the other validated scales only once to 188 randomly selected patients in Primary Care centres in Tenerife (Spain). Validity tests for construct confirmed the six dimensions of the questionnaire with 91% of total variance explained. Validity tests for criterion showed a specificity of 66%-100%, and showed high correlations with the reference scales when the questionnaire was assigning nursing diagnoses. Reliability tests showed agreement of 56%-91% (P<.001), and a 93% internal consistency. The Questionnaire for Psychosocial Nursing Diagnosis was called CdePS, and included 61 items. The CdePS is a valid, reliable and easy-to-use tool in Primary Care centres to improve the assigning of a psychosocial nursing diagnosis. Copyright © 2011 Elsevier España, S.L. All rights reserved.
Pedagogy of Science Teaching Tests: Formative assessments of science teaching orientations

NASA Astrophysics Data System (ADS)

Cobern, William W.; Schuster, David; Adams, Betty; Skjold, Brandy Ann; Zeynep Muğaloğlu, Ebru; Bentz, Amy; Sparks, Kelly

2014-09-01

A critical aspect of teacher education is gaining pedagogical content knowledge of how to teach science for conceptual understanding. Given the time limitations of college methods courses, it is difficult to touch on more than a fraction of the science topics potentially taught across grades K-8, particularly in the context of relevant pedagogies. This research and development work centers on constructing a formative assessment resource to help expose pre-service teachers to a greater number of science topics within teaching episodes using various modes of instruction. To this end, 100 problem-based, science pedagogy assessment items were developed via expert group discussions and pilot testing. Each item contains a classroom vignette followed by response choices carefully crafted to include four basic pedagogies (didactic direct, active direct, guided inquiry, and open inquiry). The brief but numerous items allow a substantial increase in the number of science topics that pre-service students may consider. The intention is that students and teachers will be able to share and discuss particular responses to individual items, or else record their responses to collections of items and thereby create a snapshot profile of their teaching orientations. Subsets of items were piloted with students in pre-service science methods courses, and the quantitative results of student responses were spread sufficiently to suggest that the items can be effective for their intended purpose.
Item response theory, computerized adaptive testing, and PROMIS: assessment of physical function.

PubMed

Fries, James F; Witter, James; Rose, Matthias; Cella, David; Khanna, Dinesh; Morgan-DeWitt, Esi

2014-01-01

Patient-reported outcome (PRO) questionnaires record health information directly from research participants because observers may not accurately represent the patient perspective. Patient-reported Outcomes Measurement Information System (PROMIS) is a US National Institutes of Health cooperative group charged with bringing PRO to a new level of precision and standardization across diseases by item development and use of item response theory (IRT). With IRT methods, improved items are calibrated on an underlying concept to form an item bank for a "domain" such as physical function (PF). The most informative items can be combined to construct efficient "instruments" such as 10-item or 20-item PF static forms. Each item is calibrated on the basis of the probability that a given person will respond at a given level, and the ability of the item to discriminate people from one another. Tailored forms may cover any desired level of the domain being measured. Computerized adaptive testing (CAT) selects the best items to sharpen the estimate of a person's functional ability, based on prior responses to earlier questions. PROMIS item banks have been improved with experience from several thousand items, and are calibrated on over 21,000 respondents. In areas tested to date, PROMIS PF instruments are superior or equal to Health Assessment Questionnaire and Medical Outcome Study Short Form-36 Survey legacy instruments in clarity, translatability, patient importance, reliability, and sensitivity to change. Precise measures, such as PROMIS, efficiently incorporate patient self-report of health into research, potentially reducing research cost by lowering sample size requirements. The advent of routine IRT applications has the potential to transform PRO measurement.
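The calibration and adaptive-selection ideas in this abstract can be illustrated with a minimal sketch. It assumes a two-parameter logistic (2PL) model for dichotomous items, which is a simplification: the abstract does not name the model, and PROMIS items are polytomous. The item names, parameter values, and function names below are hypothetical.

```python
import math

def p_2pl(theta, a, b):
    """Probability of endorsing an item under the (assumed) 2PL model.

    theta: person level on the latent domain
    a: item discrimination, b: item difficulty (location)
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at theta: a^2 * P * (1 - P)."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

def select_next_item(theta, bank, administered):
    """Pick the not-yet-administered item that is most informative at theta."""
    candidates = [item for item in bank if item not in administered]
    return max(candidates, key=lambda item: item_information(theta, *bank[item]))

# Hypothetical item bank: item id -> (discrimination a, difficulty b)
bank = {
    "walk_block": (1.8, -1.0),
    "climb_stairs": (2.2, 0.0),
    "run_mile": (1.5, 1.5),
}

# Current ability estimate 0.1 after one item has already been administered.
next_item = select_next_item(0.1, bank, administered={"walk_block"})
```

A full adaptive test would re-estimate theta after each response and repeat the selection step until a precision target is met; that loop is omitted here.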
A Comparison of Three IRT Approaches to Examinee Ability Change Modeling in a Single-Group Anchor Test Design

ERIC Educational Resources Information Center

Paek, Insu; Park, Hyun-Jeong; Cai, Li; Chi, Eunlim

2014-01-01

Typically a longitudinal growth modeling based on item response theory (IRT) requires repeated measures data from a single group with the same test design. If operational or item exposure problems are present, the same test may not be employed to collect data for longitudinal analyses and tests at multiple time points are constructed with unique…

Studies of a Latent Class Signal Detection Model for Constructed Response Scoring II: Incomplete and Hierarchical Designs. Research Report. ETS RR-10-08

ERIC Educational Resources Information Center

DeCarlo, Lawrence T.

2010-01-01

A basic consideration in large-scale assessments that use constructed response (CR) items, such as essays, is how to allocate the essays to the raters that score them. Designs that are used in practice are incomplete, in that each essay is scored by only a subset of the raters, and also unbalanced, in that the number of essays scored by each rater…
Construct validity of the Heart Failure Screening Tool (Heart-FaST) to identify heart failure patients at risk of poor self-care: Rasch analysis.

PubMed

Reynolds, Nicholas A; Ski, Chantal F; McEvedy, Samantha M; Thompson, David R; Cameron, Jan

2018-02-14

The aim of this study was to psychometrically evaluate the Heart Failure Screening Tool (Heart-FaST) via: (1) examination of internal construct validity; (2) testing of scale function in accordance with design; and (3) recommendation for change/s, if items are not well adjusted, to improve psychometric credential. Self-care is vital to the management of heart failure. The Heart-FaST may provide a prospective assessment of risk, regarding the likelihood that patients with heart failure will engage in self-care. Psychometric validation of the Heart-FaST using Rasch analysis. The Heart-FaST was administered to 135 patients (median age = 68, IQR = 59-78 years; 105 males) enrolled in a multidisciplinary heart failure management program. The Heart-FaST is a nurse-administered tool for screening patients with HF at risk of poor self-care. A Rasch analysis of responses was conducted which tested data against Rasch model expectations, including whether items serve as unbiased, non-redundant indicators of risk and measure a single construct and that rating scales operate as intended. The results showed that data met Rasch model expectations after rescoring or deleting items due to poor discrimination, disordered thresholds, differential item functioning, or response dependence. There was no evidence of multidimensionality which supports the use of total scores from Heart-FaST as indicators of risk. Aggregate scores from this modified screening tool rank heart failure patients according to their "risk of poor self-care" demonstrating that the Heart-FaST items constitute a meaningful scale to identify heart failure patients at risk of poor engagement in heart failure self-care. © 2018 John Wiley & Sons Ltd.
Evaluating Job Demands and Control Measures for Use in Farm Worker Health Surveillance

PubMed Central

Alterman, Toni; Gabbard, Susan; Grzywacz, Joseph G.; Shen, Rui; Li, Jia; Nakamoto, Jorge; Carroll, Daniel J.; Muntaner, Carles

2015-01-01

Workplace stress likely plays a role in health disparities; however, applying standard measures to studies of immigrants requires thoughtful consideration. The goal of this study was to determine the appropriateness of two measures of occupational stressors (‘decision latitude’ and ‘job demands’) for use with mostly immigrant Latino farm workers. Cross-sectional data from a pilot module containing a four-item measure of decision latitude and a two-item measure of job demands were obtained from a subsample (N = 409) of farm workers participating in the National Agricultural Workers Survey. Responses to items for both constructs were clustered toward the low end of the structured response-set. Percentages of responses of ‘very often’ and ‘always’ for each of the items were examined by educational attainment, birth country, dominant language spoken, task, and crop. Cronbach’s α, when stratified by subgroups of workers, for the decision latitude items were (0.65–0.90), but were less robust for the job demands items (0.25–0.72). The four-item decision latitude scale can be applied to occupational stress research with immigrant farm workers, and potentially other immigrant Latino worker groups. The short job demands scale requires further investigation and evaluation before suggesting widespread use. PMID:25138138
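Cronbach's α, the internal-consistency statistic reported in this abstract, can be computed from raw item scores as sketched below. The formula is the standard one, α = k/(k-1) · (1 - Σ item variances / variance of total score); the function name and the example responses are hypothetical and are not data from the study.

```python
from statistics import pvariance

def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    Population variances are used for both the items and the total score;
    the ratio is unchanged if sample variances are used consistently instead.
    """
    k = len(responses[0])
    item_columns = list(zip(*responses))          # one tuple of scores per item
    sum_item_var = sum(pvariance(col) for col in item_columns)
    total_var = pvariance([sum(row) for row in responses])
    return (k / (k - 1)) * (1.0 - sum_item_var / total_var)

# Hypothetical responses to a four-item scale (rows = respondents).
example = [
    [1, 2, 1, 2],
    [2, 2, 3, 2],
    [4, 3, 4, 4],
    [3, 4, 3, 3],
    [5, 4, 5, 5],
]
alpha = cronbach_alpha(example)
```

With only two items, as in the job demands measure, α depends entirely on a single inter-item correlation, so it is usually less stable than for longer scales. That may be one reason for the wide range reported for that measure, although the abstract does not say so.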
The Dutch-Flemish PROMIS Physical Function item bank exhibited strong psychometric properties in patients with chronic pain.

PubMed

Crins, Martine H P; Terwee, Caroline B; Klausch, Thomas; Smits, Niels; de Vet, Henrica C W; Westhovens, Rene; Cella, David; Cook, Karon F; Revicki, Dennis A; van Leeuwen, Jaap; Boers, Maarten; Dekker, Joost; Roorda, Leo D

2017-07-01

The objective of this study was to assess the psychometric properties of the Dutch-Flemish Patient-Reported Outcomes Measurement Information System (PROMIS) Physical Function item bank in Dutch patients with chronic pain. A bank of 121 items was administered to 1,247 Dutch patients with chronic pain. Unidimensionality was assessed by fitting a one-factor confirmatory factor analysis and evaluating resulting fit statistics. Items were calibrated with the graded response model and its fit was evaluated. Cross-cultural validity was assessed by testing items for differential item functioning (DIF) based on language (Dutch vs. English). Construct validity was evaluated by calculating correlations between scores on the Dutch-Flemish PROMIS Physical Function measure and scores on generic and disease-specific measures. Results supported the Dutch-Flemish PROMIS Physical Function item bank's unidimensionality (Comparative Fit Index = 0.976, Tucker Lewis Index = 0.976) and model fit. Item thresholds targeted a wide range of the physical function construct (threshold-parameters range: -4.2 to 5.6). Cross-cultural validity was good as only four items showed DIF for language and their impact on item scores was minimal. Physical Function scores were strongly associated with scores on all other measures (all correlations ≤ -0.60 as expected). The Dutch-Flemish PROMIS Physical Function item bank exhibited good psychometric properties. Development of a computer adaptive test based on the large bank is warranted. Copyright © 2017 Elsevier Inc. All rights reserved.
Exploring the Full-Information Bifactor Model in Vertical Scaling with Construct Shift

ERIC Educational Resources Information Center

Li, Ying; Lissitz, Robert W.

2012-01-01

To address the lack of attention to construct shift in item response theory (IRT) vertical scaling, a multigroup, bifactor model was proposed to model the common dimension for all grades and the grade-specific dimensions. Bifactor model estimation accuracy was evaluated through a simulation study with manipulated factors of percentage of common…
Managing What We Can Measure: Quantifying the Susceptibility of Automated Scoring Systems to Gaming Behavior

ERIC Educational Resources Information Center

Higgins, Derrick; Heilman, Michael

2014-01-01

As methods for automated scoring of constructed-response items become more widely adopted in state assessments, and are used in more consequential operational configurations, it is critical that their susceptibility to gaming behavior be investigated and managed. This article provides a review of research relevant to how construct-irrelevant…
Development of a survey instrument to measure connectivity to evaluate national public health preparedness and response performance.

PubMed

Dorn, Barry C; Savoia, Elena; Testa, Marcia A; Stoto, Michael A; Marcus, Leonard J

2007-01-01

Survey instruments for evaluating public health preparedness have focused on measuring the structure and capacity of local, state, and federal agencies, rather than linkages among structure, process, and outcomes. To focus evaluation on the latter, we evaluated the linkages among individuals, organizations, and systems using the construct of "connectivity" and developed a measurement instrument. Results from focus groups of emergency preparedness first responders generated 62 items used in the development sample of 187 respondents. Item reduction and factor analyses were conducted to confirm the scale's components. The 62 items were reduced to 28. Five scales explained 70% of the total variance (number of items, percent variance explained, Cronbach's alpha) including connectivity with the system (8, 45%, 0.94), coworkers (7, 7%, 0.91), organization (7, 12%, 0.93), and perceptions (6, 6%, 0.90). Discriminant validity was found to be consistent with the factor structure. We developed a Connectivity Measurement Tool for the public health workforce consisting of a 34-item questionnaire found to be a reliable measure of connectivity with preliminary evidence of construct validity.
Are life satisfaction and self-esteem distinct constructs? A black South African perspective.

PubMed

Westaway, Margaret S; Maluka, Constance S

2005-10-01

As part of a longitudinal project on Quality of Life, a study was undertaken to extend the applicability of the 5-item Satisfaction With Life Scale, developed in the USA, in South Africa. Data on basic sociodemographic characteristics, the scale, and the 10-item Rosenberg Self-esteem scale were available for 360 Black South Africans (151 men and 209 women), ages 21 to 83 years (M = 38.6 yr., SD = 10.3). Factor analysis applied to scale scores gave two factors, accounting for 71% of the variance. Factor I was loaded by 10 Self-esteem items and Factor II by four of the five Life Satisfaction items. Coefficient alpha was .77 for the Satisfaction With Life Scale and .97 for the Rosenberg Self-esteem Scale. Life Satisfaction was related to Self-esteem (r = .17, p < .01). It was concluded that Life Satisfaction and Self-esteem appear to be distinct, unitary constructs, but responses to Item 5 on the Satisfaction With Life Scale require cautious interpretation and may contribute to the weak r, although so may the collectivist culture of Black South Africans.
A New Tool for Nutrition App Quality Evaluation (AQEL): Development, Validation, and Reliability Testing.

PubMed

DiFilippo, Kristen Nicole; Huang, Wenhao; Chapman-Novakofski, Karen M

2017-10-27

The extensive availability and increasing use of mobile apps for nutrition-based health interventions makes evaluation of the quality of these apps crucial for integration of apps into nutritional counseling. The goal of this research was the development, validation, and reliability testing of the app quality evaluation (AQEL) tool, an instrument for evaluating apps' educational quality and technical functionality. Items for evaluating app quality were adapted from website evaluations, with additional items added to evaluate the specific characteristics of apps, resulting in 79 initial items. Expert panels of nutrition and technology professionals and app users reviewed items for face and content validation. After recommended revisions, nutrition experts completed a second AQEL review to ensure clarity. On the basis of 150 sets of responses using the revised AQEL, principal component analysis was completed, reducing AQEL into 5 factors that underwent reliability testing, including internal consistency, split-half reliability, test-retest reliability, and interrater reliability (IRR). Two additional modifiable constructs for evaluating apps based on the age and needs of the target audience as selected by the evaluator were also tested for construct reliability. IRR testing using intraclass correlations (ICC) with all 7 constructs was conducted, with 15 dietitians evaluating one app. Development and validation resulted in the 51-item AQEL. These were reduced to 25 items in 5 factors after principal component analysis, plus 9 modifiable items in two constructs that were not included in principal component analysis. Internal consistency and split-half reliability of the following constructs derived from principal components analysis were good (Cronbach alpha >.80, Spearman-Brown coefficient >.80): behavior change potential, support of knowledge acquisition, app function, and skill development. App purpose split-half reliability was .65.
Test-retest reliability showed no significant change over time (P>.05) for all but skill development (P=.001). Construct reliability was good for items assessing age appropriateness of apps for children, teens, and a general audience. In addition, construct reliability was acceptable for assessing app appropriateness for various target audiences (Cronbach alpha >.70). For the 5 main factors, ICC (1,k) was >.80, with a P value of <.05. When 15 nutrition professionals evaluated one app, ICC (2,15) was .98, with a P value of <.001 for all 7 constructs when the modifiable items were specified for adults seeking weight loss support. Our preliminary effort shows that AQEL is a valid, reliable instrument for evaluating nutrition apps' qualities for clinical interventions by nutrition clinicians, educators, and researchers. Further efforts in validating AQEL in various contexts are needed. ©Kristen Nicole DiFilippo, Wenhao Huang, Karen M. Chapman-Novakofski. Originally published in JMIR Mhealth and Uhealth (http://mhealth.jmir.org), 27.10.2017.
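The split-half reliability with Spearman-Brown correction mentioned in this abstract can be sketched as follows. This is an illustration under stated assumptions: the odd/even item split, the function names, and the example responses are hypothetical, and the abstract does not say how the AQEL halves were formed.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_brown(r_half):
    """Step a half-test correlation up to full-test length: 2r / (1 + r)."""
    return 2.0 * r_half / (1.0 + r_half)

def split_half_reliability(responses):
    """Correlate odd-position and even-position half scores, then correct."""
    first_half = [sum(row[0::2]) for row in responses]
    second_half = [sum(row[1::2]) for row in responses]
    return spearman_brown(pearson(first_half, second_half))

# Hypothetical ratings on a four-item construct (rows = evaluators).
ratings = [
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [3, 3, 2, 3],
]
reliability = split_half_reliability(ratings)
```

The correction is needed because each half is only half as long as the full scale, so the raw correlation between halves understates the reliability of the whole.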
A natural language screening measure for motivation to change.

PubMed

Miller, William R; Johnson, Wendy R

2008-09-01

Client motivation for change, a topic of high interest to addiction clinicians, is multidimensional and complex, and many different approaches to measurement have been tried. The current effort drew on psycholinguistic research on natural language that is used by clients to describe their own motivation. Seven addiction treatment sites participated in the development of a simple scale to measure client motivation. Twelve items were drafted to represent six potential dimensions of motivation for change that occur in natural discourse. The maximum self-rating of motivation (10 on a 0-10 scale) was the median score on all items, and 43% of respondents rated 10 on all 12 items - a substantial ceiling effect. From 1035 responses, three factors emerged representing importance, ability, and commitment - constructs that are also reflected in several theoretical models of motivation. A 3-item version of the scale, with one marker item for each of these constructs, accounted for 81% of variance in the full scale. The three items are: 1. It is important for me to . . . 2. I could . . . and 3. I am trying to . . . This offers a quick (1-minute) assessment of clients' self-reported motivation for change.
Development and psychometric evaluation of the PROMIS Pediatric Life Satisfaction item banks, child-report, and parent-proxy editions.

PubMed

Forrest, Christopher B; Devine, Janine; Bevans, Katherine B; Becker, Brandon D; Carle, Adam C; Teneralli, Rachel E; Moon, JeanHee; Tucker, Carole A; Ravens-Sieberer, Ulrike

2018-01-01

To describe the psychometric evaluation and item response theory calibration of the PROMIS Pediatric Life Satisfaction item banks, child-report, and parent-proxy editions. A pool of 55 life satisfaction items was administered to 1992 children 8-17 years old and 964 parents of children 5-17 years old. Analyses included descriptive statistics, reliability, factor analysis, differential item functioning, and assessment of construct validity. Thirteen items were deleted because of poor psychometric performance. An 8-item short form was administered to a national sample of 996 children 8-17 years old, and 1294 parents of children 5-17 years old. The combined sample (2988 children and 2258 parents) was used in item response theory (IRT) calibration analyses. The final item banks were unidimensional, the items were locally independent, and the items were free from impactful differential item functioning. The 8-item and 4-item short form scales showed excellent reliability, convergent validity, and discriminant validity. Life satisfaction decreased with declining socio-economic status, presence of a special health care need, and increasing age for girls, but not boys. After IRT calibration, we found that 4- and 8-item short forms had a high degree of precision (reliability) across a wide range (>4 SD units) of the latent variable. The PROMIS Pediatric Life Satisfaction item banks and their short forms provide efficient, precise, and valid assessments of life satisfaction in children and youth.
Investigating Assessment Bias for Constructed Response Explanation Tasks: Implications for Evaluating Performance Expectations for Scientific Practice

NASA Astrophysics Data System (ADS)

Federer, Meghan Rector

Assessment is a key element in the process of science education teaching and research. Understanding sources of performance bias in science assessment is a major challenge for science education reforms. Prior research has documented several limitations of instrument types on the measurement of students' scientific knowledge (Liu et al., 2011; Messick, 1995; Popham, 2010). Furthermore, a large body of work has been devoted to reducing assessment biases that distort inferences about students' science understanding, particularly in multiple-choice [MC] instruments. Despite the above documented biases, much has yet to be determined for constructed response [CR] assessments in biology and their use for evaluating students' conceptual understanding of scientific practices (such as explanation). Understanding differences in science achievement provides important insights into whether science curricula and/or assessments are valid representations of student abilities. Using the integrative framework put forth by the National Research Council (2012), this dissertation aimed to explore whether assessment biases occur for assessment practices intended to measure students' conceptual understanding and proficiency in scientific practices. Using a large corpus of undergraduate biology students' explanations, three studies were conducted to examine whether known biases of MC instruments were also apparent in a CR instrument designed to assess students' explanatory practice and understanding of evolutionary change (ACORNS: Assessment of COntextual Reasoning about Natural Selection). The first study investigated the challenge of interpreting and scoring lexically ambiguous language in CR answers. The incorporation of 'multivalent' terms into scientific discourse practices often results in statements or explanations that are difficult to interpret and can produce faulty inferences about student knowledge. 
The results of this study indicate that many undergraduate biology majors frequently incorporate multivalent concepts into explanations of change, resulting in explanatory practices that were scientifically non-normative. However, use of follow-up question approaches was found to resolve this source of bias and thereby increase the validity of inferences about student understanding. The second study focused on issues of item and instrument structure, specifically item feature effects and item position effects, which have been shown to influence measures of student performance across assessment tasks. Results indicated that, along the instrument item sequence, items with similar surface features produced greater sequencing effects than sequences of items with dissimilar surface features. This bias could be addressed by use of a counterbalanced design (i.e., Latin Square) at the population level of analysis. Explanation scores were also highly correlated with student verbosity, despite verbosity being an intrinsically trivial aspect of explanation quality. Attempting to standardize student response length was one proposed solution to the verbosity bias. The third study explored gender differences in students' performance on constructed-response explanation tasks using impact (i.e., mean raw scores) and differential item function (i.e., item difficulties) patterns. While prior research in science education has suggested that females tend to perform better on constructed-response items, the results of this study revealed no overall differences in gender achievement. However, evaluation of specific item features patterns suggested that female respondents have a slight advantage on unfamiliar explanation tasks. That is, male students tended to incorporate fewer scientifically normative concepts (i.e., key concepts) than females for unfamiliar taxa. 
Conversely, females tended to incorporate more scientifically non-normative ideas (i.e., naive ideas) than males for familiar taxa. Together these results indicate that gender achievement differences for this CR instrument may be a result of differences in how males and females interpret and respond to combinations of item features. Overall, the results presented in the subsequent chapters suggest that as science education shifts toward the evaluation of fused scientific knowledge and practice (e.g., explanation), it is essential that educators and researchers investigate potential sources of bias inherent to specific assessment practices. This dissertation revealed significant sources of CR assessment bias, and provided solutions to address these problems.
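The counterbalanced (Latin square) design proposed in the second study as a remedy for item position effects can be sketched as below. The cyclic construction, the function names, and the item labels are assumptions made for illustration; the abstract does not describe how the item orders were actually generated.

```python
def latin_square_orders(items):
    """Build a cyclic Latin square of presentation orders.

    Row r is the item list rotated by r positions, so every item appears
    exactly once in every serial position across the set of orders.
    """
    n = len(items)
    return [[items[(row + col) % n] for col in range(n)] for row in range(n)]

def assign_order(respondent_index, orders):
    """Rotate respondents through the orders so each order is used equally often."""
    return orders[respondent_index % len(orders)]

# Hypothetical labels for four explanation items.
items = ["familiar_animal", "unfamiliar_animal", "familiar_plant", "unfamiliar_plant"]
orders = latin_square_orders(items)
order_for_sixth_respondent = assign_order(5, orders)
```

A cyclic square balances serial position only: each item is always followed by the same neighbour, so it does not balance carryover between adjacent items. A design that also balances first-order carryover would need a different construction.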
A Maximin Model for Test Design with Practical Constraints. Project Psychometric Aspects of Item Banking No. 25. Research Report 87-10.

ERIC Educational Resources Information Center

van der Linden, Wim J.; Boekkooi-Timminga, Ellen

A "maximin" model for item response theory based test design is proposed. In this model only the relative shape of the target test information function is specified. It serves as a constraint subject to which a linear programming algorithm maximizes the information in the test. In the practice of test construction there may be several…
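The maximin idea can be illustrated with a small sketch: choose items so that the smallest ratio of test information to the target's relative shape, taken over a set of ability points, is as large as possible. Several assumptions are made here. The abstract is truncated and gives no formulas, so this formalization is only one reading of it. The report describes a linear programming algorithm, whereas the sketch below swaps in brute-force enumeration, which is feasible only for tiny item pools. The 2PL information function, item parameters, and names are all hypothetical.

```python
import math
from itertools import combinations

def info_2pl(theta, a, b):
    """Fisher information of a 2PL item at theta (assumed item model)."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def maximin_test(bank, thetas, shape, length):
    """Exhaustively search for the fixed-length test that maximizes
    y = min over k of (test information at thetas[k]) / shape[k].

    bank: item id -> (a, b); shape: relative target heights, one per theta.
    Returns (best_y, best_items).
    """
    best_y, best_items = -1.0, None
    for subset in combinations(sorted(bank), length):
        y = min(
            sum(info_2pl(t, *bank[item]) for item in subset) / r
            for t, r in zip(thetas, shape)
        )
        if y > best_y:
            best_y, best_items = y, subset
    return best_y, best_items

# Hypothetical three-item pool; target is equally high at theta = -1 and theta = 1.
bank = {"easy": (1.0, -1.0), "mid": (1.0, 0.0), "hard": (1.0, 1.0)}
best_y, best_items = maximin_test(bank, thetas=[-1.0, 1.0], shape=[1.0, 1.0], length=2)
```

In this toy pool the search pairs the easy and hard items, because any pair containing the middle item leaves one of the two ability points with less information. Practical constraints such as content coverage would enter as additional restrictions on which subsets are admissible.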
Dimensionality of the Knee Numeric-Entity Evaluation Score (KNEES-ACL): a condition-specific questionnaire.

PubMed

Comins, J D; Krogsgaard, M R; Kreiner, S; Brodersen, J

2013-10-01

The benefit of anterior cruciate ligament (ACL) reconstruction has been questioned based on patient-reported outcome measures (PROMs). Valid interpretation of such results requires confirmation of the psychometric properties of the PROM. Rasch analysis is the gold standard for validation of PROMs, yet PROMs used for ACL reconstruction have not been validated using Rasch analysis. We used Rasch analysis to investigate the psychometric properties of the Knee Numeric-Entity Evaluation Score (KNEES-ACL), a newly developed PROM for patients treated for ACL deficiency. Two-hundred forty-two patients pre- and post-ACL reconstruction completed the pilot PROM. Rasch models were used to assess the psychometric properties (e.g., unidimensionality, local response dependency, and differential item functioning). Forty-one items distributed across seven unidimensional constructs measuring impairment, functional limitations, and psychosocial consequences were confirmed to fit Rasch models. Fourteen items were removed because of statistical lack of fit and inadequate face validity. Local response dependency and differential item functioning were identified and adjusted. The KNEES-ACL is the first Rasch-validated condition-specific PROM constructed for patients with ACL deficiency and patients with ACL reconstruction. Thus, this instrument can be used for within- and between-group comparisons. © 2013 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Which kind of psychometrics is adequate for patient satisfaction questionnaires?

PubMed

Konerding, Uwe

2016-01-01

The construction and psychometric analysis of patient satisfaction questionnaires are discussed. The discussion is based upon the classification of multi-item questionnaires into scales or indices. Scales consist of items that describe the effects of the latent psychological variable to be measured, and indices consist of items that describe the causes of this variable. Whether patient satisfaction questionnaires should be constructed and analyzed as scales or as indices depends upon the purpose for which these questionnaires are required. If the final aim is improving care with regard to patients' preferences, then these questionnaires should be constructed and analyzed as indices. This implies two requirements: 1) items for patient satisfaction questionnaires should be selected in such a way that the universe of possible causes of patient satisfaction is covered optimally and 2) Cronbach's alpha, principal component analysis, exploratory factor analysis, confirmatory factor analysis, and analyses with models from item response theory, such as the Rasch Model, should not be applied for psychometric analyses. Instead, multivariate regression analyses with a direct rating of patient satisfaction as the dependent variable and the individual questionnaire items as independent variables should be performed. The coefficients produced by such an analysis can be applied for selecting the best items and for weighting the selected items when a sum score is determined. The lower boundaries of the validity of the unweighted and the weighted sum scores can be estimated by their correlations with the direct satisfaction rating. While the first requirement is fulfilled in the majority of the previous patient satisfaction questionnaires, the second one deviates from previous practice. 
Hence, if patient satisfaction is actually measured with the final aim of improving care with regard to patients' preferences, then future practice should be changed so that the second requirement is also fulfilled.
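As a rough illustration of the index approach described in this abstract, the sketch below regresses a direct satisfaction rating on the individual item scores by ordinary least squares and then reuses the coefficients as item weights for a sum score. The data, the function names `ols_coefficients` and `weighted_sum_score`, and the small pure-Python solver are assumptions made for illustration only; none of them come from the article.

```python
def ols_coefficients(item_rows, ratings):
    """Least-squares coefficients [intercept, b1, ..., bk] via the normal equations.

    item_rows: one list of item scores per respondent (hypothetical data).
    ratings:   the direct satisfaction rating of each respondent.
    Assumes the items are not perfectly collinear.
    """
    rows = [[1.0] + [float(v) for v in r] for r in item_rows]
    p = len(rows[0])
    # Augmented system (X'X | X'y).
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * y for r, y in zip(rows, ratings))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda i: abs(aug[i][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        piv = aug[col][col]
        aug[col] = [v / piv for v in aug[col]]
        for i in range(p):
            if i != col:
                factor = aug[i][col]
                aug[i] = [vi - factor * vc for vi, vc in zip(aug[i], aug[col])]
    return [aug[i][p] for i in range(p)]


def weighted_sum_score(item_scores, coefficients):
    """Weighted sum score using the regression weights (intercept excluded)."""
    return sum(b * x for b, x in zip(coefficients[1:], item_scores))


# Hypothetical responses: five respondents, two items, one direct rating each.
items = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
direct_rating = [4.0, 5.0, 9.0, 10.0, 14.0]
coefs = ols_coefficients(items, direct_rating)
print("coefficients:", [round(c, 3) for c in coefs])
print("weighted score:", round(weighted_sum_score([3, 4], coefs), 3))
```

In this reading, items with larger coefficients contribute more to the weighted sum score, and items with coefficients near zero are candidates for removal.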
Using Rasch rating scale model to reassess the psychometric properties of the Persian version of the PedsQL™ 4.0 Generic Core Scales in school children.

PubMed

Jafari, Peyman; Bagheri, Zahra; Ayatollahi, Seyyed Mohamad Taghi; Soltani, Zahra

2012-03-13

Item response theory (IRT) is extensively used to develop adaptive instruments of health-related quality of life (HRQoL). However, each IRT model has its own function to estimate item and category parameters, and hence different results may be found using the same response categories with different IRT models. The present study used the Rasch rating scale model (RSM) to examine and reassess the psychometric properties of the Persian version of the PedsQL™ 4.0 Generic Core Scales. The PedsQL™ 4.0 Generic Core Scales was completed by 938 Iranian school children and their parents. Convergent, discriminant and construct validity of the instrument were assessed by classical test theory (CTT). The RSM was applied to investigate person and item reliability, item statistics and ordering of response categories. The CTT method showed that the scaling success rates for convergent and discriminant validity were 100% in all domains with the exception of physical health in the child self-report. Moreover, confirmatory factor analysis supported a four-factor model similar to its original version. The RSM showed that 22 out of 23 items had acceptable infit and outfit statistics (<1.4, >0.6), person reliabilities were low, item reliabilities were high, and item difficulty ranged from -1.01 to 0.71 and -0.68 to 0.43 for child self-report and parent proxy-report, respectively. The RSM also showed that successive response categories for all items were not located in the expected order. This study revealed that, in all domains, the five response categories did not perform adequately. It is not known whether this problem is a function of the meaning of the response choices in the Persian language or an artifact of a mostly healthy population that did not use the full range of the response categories. The response categories should be evaluated in further validation studies, especially in large samples of chronically ill patients.
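For readers unfamiliar with the rating scale model mentioned in this abstract, the following is a minimal sketch of how category probabilities are usually written under that model: the probability of category k is proportional to the exponential of the cumulative sum of (theta - delta - tau_j) over the first k thresholds. The function name `rsm_category_probs` and all parameter values are hypothetical and are not estimates from the study.

```python
import math


def rsm_category_probs(theta, delta, taus):
    """Category probabilities under the Rasch rating scale model.

    theta: person location; delta: item location;
    taus:  thresholds tau_1..tau_m shared by all items (illustrative values).
    Returns probabilities for categories 0..m.
    """
    # Cumulative sums of (theta - delta - tau_j); category 0 has an empty sum.
    logits = [0.0]
    for tau in taus:
        logits.append(logits[-1] + (theta - delta - tau))
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(logits)
    weights = [math.exp(v - top) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


# Five response categories imply four thresholds; values are made up.
probs = rsm_category_probs(theta=0.0, delta=-0.5, taus=[-1.5, -0.5, 0.5, 1.5])
print([round(p, 3) for p in probs])
```

When the estimated thresholds are not in increasing order, some categories are never the most probable response at any person location, which is one common way of describing categories that are not in the expected order.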
Psychological distress in cancer survivors: the further development of an item bank.

PubMed

Smith, Adam B; Armes, Jo; Richardson, Alison; Stark, Dan P

2013-02-01

Assessment of psychological distress by patient report is necessary to meet patients' needs throughout the cancer journey. We have previously developed an item bank to assess psychological distress but not evaluated it for cancer survivors. Our first aim in this study was to test whether we could extend our item bank to include cancer survivors. The second aim was to examine whether the item bank could assess positive affect as a single construct alongside negative psychological symptoms. Responses from 1315 cancer survivors to the Hospital Anxiety and Depression Scale (HADS) and the Positive and Negative Affect Scale (PANAS) were considered for inclusion in a pre-existing item bank created from a heterogeneous sample of 4914 cancer patients. Differential item functioning (DIF) was used to assess whether HADS responses drawn from the two samples were equivalent. Common-item equating was used to anchor the shared (HADS) items, whilst the PANAS items were added. Item fit was evaluated at each stage, and misfitting items were removed. Unidimensionality was assessed with a principal components factor analysis. The DIF analysis did not reveal any differences between the HADS item locations from the two samples. Three misfitting PANAS items were removed, resulting in a final unidimensional bank of 80 items with good internal reliability (α = 0.85). The new item bank is valid for use across the cancer journey, including cancer survivors, and modestly improves the assessment of all levels of psychological distress and positive psychological function. Copyright © 2011 John Wiley & Sons, Ltd.
Predicting sugar-sweetened behaviours with theory of planned behaviour constructs: Outcome and process results from the SIPsmartER behavioural intervention

PubMed Central

Zoellner, Jamie M.; Porter, Kathleen J.; Chen, Yvonnes; Hedrick, Valisa E.; You, Wen; Hickman, Maja; Estabrooks, Paul A.

2017-01-01

Objective Guided by the theory of planned behaviour (TPB) and health literacy concepts, SIPsmartER is a six-month multicomponent intervention effective at improving sugar-sweetened beverage (SSB) behaviours. Using SIPsmartER data, this study explores prediction of SSB behavioural intention (BI) and behaviour from TPB constructs using: (1) cross-sectional and prospective models and (2) 11 single-item assessments from interactive voice response (IVR) technology. Design Quasi-experimental design, including pre- and post-outcome data and repeated-measures process data of 155 intervention participants. Main Outcome Measures Validated multi-item TPB measures, single-item TPB measures, and self-reported SSB behaviours. Hypothesised relationships were investigated using correlation and multiple regression models. Results TPB constructs explained 32% of the variance cross sectionally and 20% prospectively in BI; and explained 13–20% of variance cross sectionally and 6% prospectively in behaviour. Single-item scale models were significant, yet explained less variance. All IVR models predicting BI (average 21%, range 6–38%) and behaviour (average 30%, range 6–55%) were significant. Conclusion Findings are interpreted in the context of other cross-sectional, prospective and experimental TPB health and dietary studies. Findings advance experimental application of the TPB, including understanding constructs at outcome and process time points and applying theory in all intervention development, implementation and evaluation phases. PMID:28165771
Eligibility of Indoor Plumbing Under Alaska Sanitation Infrastructure Grant Program

EPA Pesticide Factsheets

Memorandum response to questions that relate to whether indoor plumbing of homes, as part of a wastewater construction project, is an eligible cost item under the EPA Alaska Sanitation Infrastructure Grant Program.
The feeding practices and structure questionnaire: construction and initial validation in a sample of Australian first-time mothers and their 2-year olds.

PubMed

Jansen, Elena; Mallan, Kimberley M; Nicholson, Jan M; Daniels, Lynne A

2014-06-04

Early feeding practices lay the foundation for children's eating habits and weight gain. Questionnaires are available to assess parental feeding but overlapping and inconsistent items, subscales and terminology limit conceptual clarity and between study comparisons. Our aim was to consolidate a range of existing items into a parsimonious and conceptually robust questionnaire for assessing feeding practices with very young children (<3 years). Data were from 462 mothers and children (age 21-27 months) from the NOURISH trial. Items from five questionnaires and two study-specific items were submitted to a priori item selection, allocation and verification, before theoretically-derived factors were tested using Confirmatory Factor Analysis. Construct validity of the new factors was examined by correlating these with child eating behaviours and weight. Following expert review 10 factors were specified. Of these, 9 factors (40 items) showed acceptable model fit and internal reliability (Cronbach's α: 0.61-0.89). Four factors reflected non-responsive feeding practices: 'Distrust in Appetite', 'Reward for Behaviour', 'Reward for Eating', and 'Persuasive Feeding'. Five factors reflected structure of the meal environment and limits: 'Structured Meal Setting', 'Structured Meal Timing', 'Family Meal Setting', 'Overt Restriction' and 'Covert Restriction'. Feeding practices generally showed the expected pattern of associations with child eating behaviours but none with weight. The Feeding Practices and Structure Questionnaire (FPSQ) provides a new reliable and valid measure of parental feeding practices, specifically maternal responsiveness to children's hunger/satiety signals facilitated by routine and structure in feeding. Further validation in more diverse samples is required.

Evaluation of the Irritable Bowel Syndrome Quality of Life (IBS-QOL) questionnaire in diarrheal-predominant irritable bowel syndrome patients

PubMed Central

2013-01-01

Background Diarrhea-predominant irritable bowel syndrome (IBS-d) significantly diminishes the health-related quality of life (HRQOL) of patients. Psychological and social impacts are common with many IBS-d patients reporting comorbid depression, anxiety, decreased intimacy, and lost working days. The Irritable Bowel Syndrome Quality of Life (IBS-QOL) questionnaire is a 34-item instrument developed and validated for measurement of HRQOL in non-subtyped IBS patients. The current paper assesses this previously-validated instrument employing data collected from 754 patients who participated in a randomized clinical trial of a novel treatment, eluxadoline, for IBS-d. Methods Psychometric methods common to HRQOL research were employed to evaluate the IBS-QOL. Many of the historical analyses of the IBS-QOL validations were used. Other techniques that extended the original methods were applied where more appropriate for the current dataset. In IBS-d patients, we analyzed the items and substructure of the IBS-QOL via item reduction, factor structure, internal consistency, reproducibility, construct validity, and ability to detect change. Results This study supports the IBS-QOL as a psychometrically valid measure. Factor analyses suggested that IBS-specific QOL as measured by the IBS-QOL is a unidimensional construct. Construct validity was further buttressed by significant correlations between IBS-QOL total scores and related measures of IBS-d severity including the historically-relevant Irritable Bowel Syndrome Adequate Relief (IBS-AR) item and the FDA’s Clinical Responder definition. The IBS-QOL also showed a significant ability to detect change as evidenced by analysis of treatment effects. A minority of the items, unrelated to the IBS-d, performed less well by the standards set by the original authors. Conclusions We established that the IBS-QOL total score is a psychometrically valid measure of HRQOL in IBS-d patients enrolled in this study. 
Our analyses suggest that the IBS-QOL items demonstrate very good construct validity and ability to detect changes due to treatment effects. Furthermore, our analyses suggest that the IBS-QOL items measure a univariate construct and we believe further modeling of the IBS-QOL from an item response theory (IRT) approach under both non-treatment and treatment conditions would greatly further our understanding as item-based methods could be used to develop a short form. PMID:24330412
Overview of classical test theory and item response theory for the quantitative assessment of items in developing patient-reported outcomes measures.

PubMed

Cappelleri, Joseph C; Jason Lundy, J; Hays, Ron D

2014-05-01

The US Food and Drug Administration's guidance for industry document on patient-reported outcomes (PRO) defines content validity as "the extent to which the instrument measures the concept of interest" (FDA, 2009, p. 12). According to Strauss and Smith (2009), construct validity "is now generally viewed as a unifying form of validity for psychological measurements, subsuming both content and criterion validity" (p. 7). Hence, both qualitative and quantitative information are essential in evaluating the validity of measures. We review classical test theory and item response theory (IRT) approaches to evaluating PRO measures, including frequency of responses to each category of the items in a multi-item scale, the distribution of scale scores, floor and ceiling effects, the relationship between item response options and the total score, and the extent to which hypothesized "difficulty" (severity) order of items is represented by observed responses. If a researcher has few qualitative data and wants to get preliminary information about the content validity of the instrument, then descriptive assessments using classical test theory should be the first step. As the sample size grows during subsequent stages of instrument development, confidence in the numerical estimates from Rasch and other IRT models (as well as those of classical test theory) would also grow. Classical test theory and IRT can be useful in providing a quantitative assessment of items and scales during the content-validity phase of PRO-measure development. Depending on the particular type of measure and the specific circumstances, the classical test theory and/or the IRT should be considered to help maximize the content validity of PRO measures. Copyright © 2014 Elsevier HS Journals, Inc. All rights reserved.
Validation of a condition-specific measure for women having an abnormal screening mammography.

PubMed

Brodersen, John; Thorsen, Hanne; Kreiner, Svend

2007-01-01

The aim of this study is to assess the validity of a new condition-specific instrument measuring psychosocial consequences of abnormal screening mammography (PCQ-DK33). The draft version of the PCQ-DK33 was completed on two occasions by 184 women who had received an abnormal screening mammography and on one occasion by 240 women who had received a normal screening result. Item Response Theories and Classical Test Theories were used to analyze data. Construct validity, concurrent validity, known group validity, objectivity and reliability were established by item analysis examining the fit between item responses and Rasch models. Six dimensions covering anxiety, behavioral impact, sense of dejection, impact on sleep, breast examination, and sexuality were identified. One item belonging to the dejection dimension had uniform differential item functioning. Two items not fitting the Rasch models were retained because of high face validity. A sick leave item added useful information when measuring side effects and socioeconomic consequences of breast cancer screening. Five "poor items" were identified and should be deleted from the final instrument. Preliminary evidence for a valid and reliable condition-specific measure for women having an abnormal screening mammography was established. The measure includes 27 "good" items measuring different attributes of the same overall latent structure: the psychosocial consequences of abnormal screening mammography.
Person Heterogeneity of the BDI-II-C and Its Effects on Dimensionality and Construct Validity: Using Mixture Item Response Models

ERIC Educational Resources Information Center

Wu, Pei-Chen; Huang, Tsai-Wei

2010-01-01

This study applied the mixed Rasch model to investigate person heterogeneity of the Beck Depression Inventory-II-Chinese version (BDI-II-C) and its effects on dimensionality and construct validity. Person heterogeneity was reflected by two latent classes that differ qualitatively. Additionally, person heterogeneity adversely affected the…
Motivations for Older Adults' Participation in Distance Education: A Study at the National Open University of Taiwan

ERIC Educational Resources Information Center

Mulenga, Derek; Liang, Jr-Shiuan

2008-01-01

This study investigated the factor structure of motivational constructs as expressed by older adult learners and examined how these constructs correlated with selected socio-demographic characteristics at the National Open University of Taiwan (NOUT). Results were based on the responses of 371 elders to the 32-item Reasons for Participation Scale…
Investigating Cognitive Effort and Response Quality of Question Formats in Web Surveys Using Paradata

ERIC Educational Resources Information Center

Höhne, Jan Karem; Schlosser, Stephan; Krebs, Dagmar

2017-01-01

Measuring attitudes and opinions employing agree/disagree (A/D) questions is a common method in social research because it appears to be possible to measure different constructs with identical response scales. However, theoretical considerations suggest that A/D questions require a considerable cognitive processing. Item-specific (IS) questions,…
The Social Physique Anxiety Scale: an example of the potential consequence of negatively worded items in factorial validity studies.

PubMed

Motl, R W; Conroy, D E; Horan, P M

2000-01-01

Social physique anxiety (SPA) based on Hart, Leary, and Rejeski's (1989) Social Physique Anxiety Scale (SPAS) was originally conceptualized to be a unidimensional construct. Empirical evidence on the factorial validity of the SPAS has been contradictory, yielding both one- and two-factor models. The two-factor model, which consists of separate factors associated with positively and negatively worded items, has stimulated an ongoing debate about the dimensionality and content of the SPAS. The present study employed confirmatory factor analysis (CFA) to examine whether the two-factor solution to the 12-item SPAS was substantively meaningful or a methodological artifact. Results of the CFAs, which were performed on responses from four different samples (Eklund, Kelley, and Wilson, 1997; Eklund, Mack, and Hart, 1996), supported the existence of a single substantive SPA factor underlying responses to the 12-item SPAS. There were, in addition, method effects associated with the negatively worded items that could be modeled to achieve good fit. Therefore, it was concluded that a single substantive factor and a non-substantive method effect primarily related to the negatively worded items best represented the 12-item SPAS.
The prioritization of symptom beliefs over illness beliefs: The development and validation of the Pain Perception Questionnaire for Young People.

PubMed

Ghio, Daniela; Thomson, Wendy; Calam, Rachel; Ulph, Fiona; Baildam, Eileen M; Hyrich, Kimme; Cordingley, Lis

2018-02-01

To investigate the suitability of the revised Illness Perception Questionnaire (IPQ-R) for use with adolescents with a long-term pain condition and to validate a new questionnaire for use with this age group. A three-phase mixed-methods study. Phase 1 comprised in-depth qualitative analyses of audio-recorded cognitive interviews with 20 adolescents with juvenile idiopathic arthritis who were answering IPQ-R items. Transcripts were coded using framework analysis. A content analysis of their intended responses to individual items was also conducted. In Phase 2, a new questionnaire was developed and its linguistic and face validity were assessed with 18 adolescents without long-term conditions. In Phase 3, the construct validity of the new questionnaire was assessed with 240 adolescents with juvenile idiopathic arthritis. A subset of 43 adolescents completed the questionnaire a second time to assess test-retest reliability. All participants were aged 11-16 years. Participants described both conceptual and response format difficulties when answering IPQ-R items. In response, the Pain Perception Questionnaire for Young People (PPQ-YP) was designed which incorporated significant modifications to both wording and response formats when compared with the IPQ-R. A principal component analysis of the PPQ-YP identified ten constructs in the new questionnaire. Emotional representations were separated into two constructs, responsive and anticipatory emotions. The PPQ-YP showed high test-retest reliability. Symptom beliefs appear to be more salient to adolescents with a long-term pain condition than beliefs about the illness as a whole. A new questionnaire to assess pain beliefs of adolescents was designed. Further validation work may be needed to assess its suitability for use with other pain conditions. Statement of contribution What is already known on this subject? 
Versions of the adult Revised Illness Perception Questionnaire (IPQ-R) have been adapted for adolescents and children by changing item wording; however, research to assess the degree to which the underlying IPQ-R constructs are relevant to adolescents with a long-term condition had not been performed. What the present study adds? In adolescents, beliefs about symptoms of their condition are more salient than beliefs about the illness as a whole. Question response formats for children and young people need to take account of age-specific abilities. A new questionnaire has been designed for adolescents with pain. It is theoretically congruent with the CS-SRM. © 2017 The Authors. British Journal of Health Psychology published by John Wiley & Sons Ltd on behalf of British Psychological Society.
Tier One Performance Screen Initial Operational Test and Evaluation: 2012 Interim Report

DTIC Science & Technology

2013-12-01

are known to predict outcomes in work settings. Because the TAPAS uses item response theory (IRT) methods to construct and score items, it can be...Qualification Test (AFQT), to select new Soldiers. Although the AFQT is useful for selecting new Soldiers, other personal attributes are important to...to be and will continue to serve as a useful metric for selecting new Soldiers, other personal attributes, in particular non-cognitive attributes
Does Linking Mixed-Format Tests Using a Multiple-Choice Anchor Produce Comparable Results for Male and Female Subgroups? Research Report. ETS RR-11-44

ERIC Educational Resources Information Center

Kim, Sooyeon; Walker, Michael E.

2011-01-01

This study examines the use of subpopulation invariance indices to evaluate the appropriateness of using a multiple-choice (MC) item anchor in mixed-format tests, which include both MC and constructed-response (CR) items. Linking functions were derived in the nonequivalent groups with anchor test (NEAT) design using an MC-only anchor set for 4…
Psychometric properties of a revised version of the Assisting Hand Assessment (Kids-AHA 5.0).

PubMed

Holmefur, Marie M; Krumlinde-Sundholm, Lena

2016-06-01

The aim of this study was to scrutinize the Assisting Hand Assessment (AHA) version 4.4 for possible improvements and to evaluate the psychometric properties regarding internal scale validity and aspects of reliability of a revised version of the AHA. In collaboration with experts, scoring criteria were changed for four items, and one fully new item was constructed. Twenty-two original, one new, and four revised items were scored for 164 assessments of children with unilateral cerebral palsy aged 18 months to 12 years. Rasch measurement analysis was used to evaluate internal scale validity by exploring rating-scale functioning, item and person goodness-of-fit, and principal component analysis. Targeting and scale reliability were also evaluated. After removal of misfitting items, a 20-item scale showed satisfactory goodness-of-fit. Unidimensionality was confirmed by principal component analysis. The rating scale functioned well for the 20 items, and the item difficulty was well suited to the ability level of the sample. The person reliability coefficient was 0.98, indicating high separation ability of the scale. A conversion table of AHA scores between the previous version (4.4) and the new version (5.0) was constructed. The new, 20-item version of the Kids-AHA (version 5.0), demonstrated excellent internal scale validity, suggesting improved responsiveness to changes and shortened scoring time. For comparison of scores from version 4.4 to 5.0, a transformation table is presented. © 2015 Mac Keith Press.
Adjusting for cross-cultural differences in computer-adaptive tests of quality of life.

PubMed

Gibbons, C J; Skevington, S M

2018-04-01

Previous studies using the WHOQOL measures have demonstrated that the relationship between individual items and the underlying quality of life (QoL) construct may differ between cultures. If unaccounted for, these differing relationships can lead to measurement bias which, in turn, can undermine the reliability of results. We used item response theory (IRT) to assess differential item functioning (DIF) in WHOQOL data from diverse language versions collected in UK, Zimbabwe, Russia, and India (total N = 1332). Data were fitted to the partial credit 'Rasch' model. We used four item banks previously derived from the WHOQOL-100 measure, which provided excellent measurement for physical, psychological, social, and environmental quality of life domains (40 items overall). Cross-cultural differential item functioning was assessed using analysis of variance for item residuals and post hoc Tukey tests. Simulated computer-adaptive tests (CATs) were conducted to assess the efficiency and precision of the four item banks. Splitting item parameters by DIF results in four linked item banks without DIF or other breaches of IRT model assumptions. Simulated CATs were more precise and efficient than longer paper-based alternatives. Assessing differential item functioning using item response theory can identify a lack of measurement invariance between cultures which, if uncontrolled, may undermine accurate comparisons in computer-adaptive testing assessments of QoL. We demonstrate how compensating for DIF using item anchoring allowed data from all four countries to be compared on a common metric, thus facilitating assessments which were both sensitive to cultural nuance and comparable between countries.
The Impact of Non-attempted and Dually-Attempted Items on Person Abilities Using Item Response Theory

PubMed Central

Sideridis, Georgios D.; Tsaousis, Ioannis; Al Harbi, Khaleel

2016-01-01

The purpose of the present study was to relate response strategy with person ability estimates. Two behavioral strategies were examined: (a) the strategy to skip items in order to save time on timed tests, and (b) the strategy to select two responses on an item, with the hope that one of them may be considered correct. Participants were 4,422 individuals who were administered a standardized achievement measure related to math, biology, chemistry, and physics. In the present evaluation, only the physics subscale was employed. Two analyses were conducted: (a) a person-based one to identify differences between groups and potential correlates of those differences, and (b) a measure-based analysis in order to identify the parts of the measure that were responsible for potential group differentiation. For (a) person abilities the 2-PL model was employed and later the 3-PL and 4-PL models in order to estimate upper and lower asymptotes of person abilities. For (b) differential item functioning, differential test functioning, and differential distractor functioning were investigated. Results indicated that there were significant differences between groups with completers having the highest ability compared to both non-attempters and dual responders. There were no significant differences between non-attempters and dual responders. The present findings have implications for response strategy efficacy and measure evaluation, revision, and construction. PMID:27790174
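The 2-PL, 3-PL and 4-PL models named in this abstract can be viewed as one logistic family that differs only in whether the lower and upper asymptotes are fixed. The sketch below shows the standard item response function for that family; the function name `irf_4pl` and the parameter values are assumptions for illustration and do not reproduce the study's estimates.

```python
import math


def irf_4pl(theta, a, b, c=0.0, d=1.0):
    """Probability of a correct response under the 4-PL item response function.

    a: discrimination, b: difficulty,
    c: lower asymptote (fixed at 0 in the 2-PL),
    d: upper asymptote (fixed at 1 in the 2-PL and 3-PL).
    """
    return c + (d - c) / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical item evaluated at a few ability levels under each model.
for theta in (-2.0, 0.0, 2.0):
    p2 = irf_4pl(theta, a=1.2, b=0.0)
    p3 = irf_4pl(theta, a=1.2, b=0.0, c=0.2)
    p4 = irf_4pl(theta, a=1.2, b=0.0, c=0.2, d=0.95)
    print(theta, round(p2, 3), round(p3, 3), round(p4, 3))
```

With the defaults `c=0.0` and `d=1.0` the function reduces to the 2-PL, so the same code covers all three models.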
The Impact of Non-attempted and Dually-Attempted Items on Person Abilities Using Item Response Theory.

PubMed

Sideridis, Georgios D; Tsaousis, Ioannis; Al Harbi, Khaleel

2016-01-01

The purpose of the present study was to relate response strategy with person ability estimates. Two behavioral strategies were examined: (a) the strategy to skip items in order to save time on timed tests, and (b) the strategy to select two responses on an item, with the hope that one of them may be considered correct. Participants were 4,422 individuals who were administered a standardized achievement measure related to math, biology, chemistry, and physics. In the present evaluation, only the physics subscale was employed. Two analyses were conducted: (a) a person-based one to identify differences between groups and potential correlates of those differences, and (b) a measure-based analysis in order to identify the parts of the measure that were responsible for potential group differentiation. For (a) person abilities the 2-PL model was employed and later the 3-PL and 4-PL models in order to estimate upper and lower asymptotes of person abilities. For (b) differential item functioning, differential test functioning, and differential distractor functioning were investigated. Results indicated that there were significant differences between groups with completers having the highest ability compared to both non-attempters and dual responders. There were no significant differences between non-attempters and dual responders. The present findings have implications for response strategy efficacy and measure evaluation, revision, and construction.
Construct Validation of the Self-Efficacy Teaching and Knowledge Instrument for Science Teachers-Revised (SETAKIST-R): Lessons Learned

NASA Astrophysics Data System (ADS)

Pruski, Linda A.; Blanco, Sharon L.; Riggs, Rosemary A.; Grimes, Kandi K.; Fordtran, Chase W.; Barbola, Gina M.; Cornell, John E.; Lichtenstein, Michael J.

2013-11-01

Described herein are the academic lineage and independent validation of the Self-Efficacy Teaching and Knowledge Instrument for Science Teachers-Revised (SETAKIST-R). Data from 334 K-12 science teachers were analyzed using Partial Credit Rasch models. Principal components analysis on the person-item residuals suggests two latent dimensions: Knowledge and Teaching Self-Efficacies. Item-fit statistics were used to select items for each subscale. Person and item separation (reliability) indices were quite low, and we noted disordered response patterns on the person-item maps that revealed problems with item content and/or scaling for both subscales. These issues include the presence of verbal negatives, ambiguous modifiers, counter-intuitive scaling, and an "undecided/uncertain" option. The SETAKIST-R, in its current form, cannot be recommended as a measure of science teacher self-efficacy.
Development and validation of the coping with terror scale.

PubMed

Stein, Nathan R; Schorr, Yonit; Litz, Brett T; King, Lynda A; King, Daniel W; Solomon, Zahava; Horesh, Danny

2013-10-01

Terrorism creates lingering anxiety about future attacks. In prior terror research, the conceptualization and measurement of coping behaviors were constrained by the use of existing coping scales that index reactions to daily hassles and demands. The authors created and validated the Coping with Terror Scale to fill the measurement gap. The authors emphasized content validity, leveraging the knowledge of terror experts and groups of Israelis. A multistep approach involved construct definition and item generation, trimming and refining the measure, exploring the factor structure underlying item responses, and garnering evidence for reliability and validity. The final scale comprised six factors that were generally consistent with the authors' original construct specifications. Scores on items linked to these factors demonstrate good reliability and validity. Future studies using the Coping with Terror Scale with other populations facing terrorist threats are needed to test its ability to predict resilience, functional impairment, and psychological distress.
Development and validation of a socioculturally competent trust in physician scale for a developing country setting.

PubMed

Gopichandran, Vijayaprasad; Wouters, Edwin; Chetlapalli, Satish Kumar

2015-05-03

Trust in physicians is the unwritten covenant between the patient and the physician that the physician will do what is in the best interest of the patient. This forms the undercurrent of all healthcare relationships. Several scales exist for assessment of trust in physicians in developed healthcare settings, but to our knowledge none of these have been developed in a developing country context. To develop and validate a new trust in physician scale for a developing country setting. Dimensions of trust in physicians, which were identified in a previous qualitative study in the same setting, were used to develop a scale. This scale was administered among 616 adults selected from urban and rural areas of Tamil Nadu, south India, using a multistage sampling cross sectional survey method. The individual items were analysed using a classical test approach as well as item response theory. Cronbach's α was calculated and the item to total correlation of each item was assessed. After testing for unidimensionality and absence of local dependence, a 2 parameter logistic Samejima's graded response model was fit and item characteristics assessed. Competence, assurance of treatment, respect for the physician and loyalty to the physician were important dimensions of trust. A total of 31 items were developed using these dimensions. Of these, 22 were selected for final analysis. The Cronbach's α was 0.928. The item to total correlations were acceptable for all the 22 items. The item response analysis revealed good item characteristic curves and item information for all the items. Based on the item parameters and item information, a final 12 item scale was developed. The scale performs optimally in the low to moderate trust range. The final 12 item trust in physician scale has good construct validity and internal consistency. Published by the BMJ Publishing Group Limited. 
Development and validation of a socioculturally competent trust in physician scale for a developing country setting

PubMed Central

Gopichandran, Vijayaprasad; Wouters, Edwin; Chetlapalli, Satish Kumar

2015-01-01

Trust in physicians is the unwritten covenant between the patient and the physician that the physician will do what is in the best interest of the patient. This forms the undercurrent of all healthcare relationships. Several scales exist for assessment of trust in physicians in developed healthcare settings, but to our knowledge none of these have been developed in a developing country context. Objectives: To develop and validate a new trust in physician scale for a developing country setting. Methods: Dimensions of trust in physicians, which were identified in a previous qualitative study in the same setting, were used to develop a scale. This scale was administered among 616 adults selected from urban and rural areas of Tamil Nadu, south India, using a multistage sampling cross-sectional survey method. The individual items were analysed using a classical test approach as well as item response theory. Cronbach's α was calculated and the item-to-total correlation of each item was assessed. After testing for unidimensionality and absence of local dependence, a 2-parameter logistic Samejima's graded response model was fit and item characteristics assessed. Results: Competence, assurance of treatment, respect for the physician and loyalty to the physician were important dimensions of trust. A total of 31 items were developed using these dimensions. Of these, 22 were selected for final analysis. The Cronbach's α was 0.928. The item-to-total correlations were acceptable for all the 22 items. The item response analysis revealed good item characteristic curves and item information for all the items. Based on the item parameters and item information, a final 12-item scale was developed. The scale performs optimally in the low to moderate trust range. Conclusions: The final 12-item trust in physician scale has good construct validity and internal consistency. PMID:25941182
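The two classical test statistics named in this abstract, Cronbach's α and the item-to-total correlation, can be sketched in a few lines. This is only an illustration under stated assumptions: the function names and the tiny score matrix are hypothetical, the item-total correlation shown is the "corrected" variant (item against the sum of the remaining items), and nothing here reproduces the study's data or software.

```python
from statistics import mean, pvariance


def cronbach_alpha(rows):
    """Cronbach's alpha for a respondents-by-items score matrix (list of lists)."""
    k = len(rows[0])
    # Variance of each item column and of the total (sum) score.
    item_vars = [pvariance([r[j] for r in rows]) for j in range(k)]
    total_var = pvariance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def pearson(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def corrected_item_total(rows, j):
    """Correlation of item j with the sum of all other items."""
    item = [r[j] for r in rows]
    rest = [sum(r) - r[j] for r in rows]
    return pearson(item, rest)


# Hypothetical responses: 5 respondents, 3 Likert-type items (not study data).
scores = [[1, 2, 1], [2, 2, 3], [3, 4, 3], [4, 4, 5], [5, 5, 5]]
alpha = cronbach_alpha(scores)
r_item0 = corrected_item_total(scores, 0)
```

An item with a low corrected item-total correlation is a typical candidate for removal before the item response theory stage.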
Exploring the impact of disability on self-determination measurement.

PubMed

Mumbardó-Adam, Cristina; Guàrdia-Olmos, Joan; Giné, Climent

2018-07-01

Self-determination is a psychological construct that applies to both the general population and individuals with disabilities, who can be self-determined with adequate accommodations and opportunities. As the relevance of self-determination-related skills in life has recently been acknowledged, researchers have created a measure to assess self-determination in adolescents and young adults with and without disabilities. The Self-Determination Inventory: Student Report (Spanish interim version) is being empirically validated in Spanish. As this scale is the first assessment addressed to all youth, further exploration of its psychometric properties is required to ensure the reliability of the self-determination measurement and gain further insight into the construct when applied to youth with and without disabilities. More than 600 participants were asked to complete the scale. The impact of disability on the item response distributions across the dimensions of self-determination was explored. Differential item functioning (DIF) was found in only 5 of the scale's 45 items. Differences primarily favored youth without disabilities. The weak presence of DIF across the items supports the instrument's psychometric robustness when measuring self-determination in youth with and without disabilities and provides further understanding of the self-determination construct. Implications and future research directions are also discussed. Copyright © 2018 Elsevier Ltd. All rights reserved.
Development of an item bank for food parenting practices based on published instruments and reports from Canadian and US parents.

PubMed

O'Connor, Teresia M; Pham, Truc; Watts, Allison W; Tu, Andrew W; Hughes, Sheryl O; Beauchamp, Mark R; Baranowski, Tom; Mâsse, Louise C

2016-08-01

Research to understand how parents influence their children's dietary intake and eating behaviors has expanded in the past decades and a growing number of instruments are available to assess food parenting practices. Unfortunately, there is no consensus on how constructs should be defined or operationalized, making comparison of results across studies difficult. The aim of this study was to develop a food parenting practice item bank with items from published scales and supplement with parenting practices that parents report using. Items from published scales were identified from two published systematic reviews along with an additional systematic review conducted for this study. Parents (n = 135) with children 5-12 years old from the US and Canada, stratified to represent the demographic distribution of each country, were recruited to participate in an online semi-qualitative survey on food parenting. Published items and parent responses were coded using the same framework to reduce the number of items into representative concepts using a binning and winnowing process. The literature contributed 1392 items and parents contributed 1985 items, which were reduced to 262 different food parenting concepts (26% exclusive from literature, 12% exclusive from parents, and 62% represented in both). Food parenting practices related to 'Structure of Food Environment' and 'Behavioral and Educational' were emphasized more by parent responses, while practices related to 'Consistency of Feeding Environment' and 'Emotional Regulation' were more represented among published items. The resulting food parenting item bank should next be calibrated with item response modeling for scientists to use in the future. Copyright © 2016 Elsevier Ltd. All rights reserved.

Improving Measurement Efficiency of the Inner EAR Scale with Item Response Theory.

PubMed

Jessen, Annika; Ho, Andrew D; Corrales, C Eduardo; Yueh, Bevan; Shin, Jennifer J

2018-02-01

Objectives: (1) To assess the 11-item Inner Effectiveness of Auditory Rehabilitation (Inner EAR) instrument with item response theory (IRT). (2) To determine whether the underlying latent ability could also be accurately represented by a subset of the items for use in high-volume clinical scenarios. (3) To determine whether the Inner EAR instrument correlates with pure tone thresholds and word recognition scores. Design: IRT evaluation of prospective cohort data. Setting: Tertiary care academic ambulatory otolaryngology clinic. Subjects and Methods: Modern psychometric methods, including factor analysis and IRT, were used to assess unidimensionality and item properties. Regression methods were used to assess prediction of word recognition and pure tone audiometry scores. Results: The Inner EAR scale is unidimensional, and items varied in their location and information. Information parameter estimates ranged from 1.63 to 4.52, with higher values indicating more useful items. The IRT model provided a basis for identifying 2 sets of items with relatively lower information parameters. Item information functions showed which items added little value over and above other items; these were removed in stages, creating an 8-item and a 3-item Inner EAR scale for more efficient assessment. The 8-item version accurately reflected the underlying construct. All versions correlated moderately with word recognition scores and pure tone averages. Conclusion: The 11-, 8-, and 3-item versions of the Inner EAR scale have strong psychometric properties, and there is correlational validity evidence for the observed scores. Modern psychometric methods can help streamline care delivery by maximizing relevant information per item administered.
Psychometric properties of the PROMIS Physical Function item bank in patients receiving physical therapy.

PubMed

Crins, Martine H P; van der Wees, Philip J; Klausch, Thomas; van Dulmen, Simone A; Roorda, Leo D; Terwee, Caroline B

2018-01-01

The Patient-Reported Outcomes Measurement Information System (PROMIS) is a universally applicable set of instruments, including item banks, short forms and computer adaptive tests (CATs), measuring patient-reported health across different patient populations. PROMIS CATs are highly efficient and their use in practice is considered feasible with little administration time, offering standardized and routine patient monitoring. Before an item bank can be used as a CAT, the psychometric properties of the item bank have to be examined. Therefore, the objective was to assess the psychometric properties of the Dutch-Flemish PROMIS Physical Function item bank (DF-PROMIS-PF) in Dutch patients receiving physical therapy. In this cross-sectional study, 805 patients >18 years, who received any kind of physical therapy in primary care in the past year, completed the full DF-PROMIS-PF (121 items). Unidimensionality was examined by Confirmatory Factor Analysis, and local dependence and monotonicity were evaluated. A Graded Response Model was fitted. Construct validity was examined with correlations between DF-PROMIS-PF T-scores and scores on two legacy instruments (SF-36 Health Survey Physical Functioning scale [SF36-PF10] and the Health Assessment Questionnaire Disability-Index [HAQ-DI]). Reliability (standard errors of theta) was assessed. The results for unidimensionality were mixed (scaled CFI = 0.924, TLI = 0.923, RMSEA = 0.045, first factor explained 61.5% of variance). Some local dependence was found (8.2% of item pairs). The item bank showed a broad coverage of the physical function construct (threshold-parameters range: -4.28 to 2.33) and good construct validity (correlation with SF36-PF10 = 0.84 and HAQ-DI = -0.85). Furthermore, the DF-PROMIS-PF showed greater reliability over a broader score-range than the SF36-PF10 and HAQ-DI. The psychometric properties of the DF-PROMIS-PF item bank are sufficient. The DF-PROMIS-PF can now be used as short forms or a CAT to measure the level of physical function of physiotherapy patients.
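For readers unfamiliar with the Graded Response Model, the following sketch shows how category probabilities follow from one discrimination parameter and a set of ordered thresholds, and how a theta estimate maps to the usual T-score metric (mean 50, SD 10). The parameter values are hypothetical; only the outer thresholds echo the range reported above, and the function names are illustrative rather than taken from any IRT package.

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Category probabilities for one graded-response item.

    thresholds must be ascending (b_1 < ... < b_{m-1}); returns m probabilities.
    """
    # Cumulative probabilities P(X >= k), padded with 1 (k = 0) and 0 (k = m).
    cumulative = [1.0]
    cumulative += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cumulative.append(0.0)
    # Each category probability is the difference of adjacent cumulative curves.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(cumulative) - 1)]


def t_score(theta):
    """Linear transform of theta to a T-score with mean 50 and SD 10."""
    return 50.0 + 10.0 * theta


# Hypothetical item with five response categories (not an actual bank item).
probs = grm_category_probs(theta=0.0, a=2.0, thresholds=[-4.28, -1.0, 0.5, 2.33])
most_likely_category = probs.index(max(probs))
```

Widely spaced thresholds, as in this example, are what allow a bank to cover a broad range of the construct.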
Construction of an efficient evaluative instrument for myasthenia gravis: the MG composite.

PubMed

Burns, Ted M; Conaway, Mark R; Cutter, Gary R; Sanders, Donald B

2008-12-01

We assessed the performance of items from the Quantitative Myasthenia Gravis (QMG), MMT (Manual Muscle Test), and MG-ADL (Myasthenia Gravis - Activities of Daily Living) scales, using data from two recently completed treatment trials of generalized MG. Items were selected that were relevant to manifestations of MG, meaningful to both the physician and the patient, and responsive to clinical change. After the 10 items were chosen, they were weighted based on input from MG experts from around the world, considering factors such as quality of life, disease severity, risk, prognosis, validity, and reliability. The MG Composite is easy to administer, takes less than 5 minutes to complete, and requires no equipment. Weighting of the response options of the 10 items should result in ordinal scores that better represent MG status and are more responsive to meaningful clinical change. To better determine its suitability for clinical use and for treatment trials, the MG Composite will be tested prospectively at several academic medical centers and will be used as a secondary measure of efficacy in pending clinical trials of MG.
Development of the Oxford Participation and Activities Questionnaire: constructing an item pool

PubMed Central

Kelly, Laura; Jenkinson, Crispin; Dummett, Sarah; Dawson, Jill; Fitzpatrick, Ray; Morley, David

2015-01-01

Purpose: The Oxford Participation and Activities Questionnaire is a patient-reported outcome measure in development that is grounded on the World Health Organization International Classification of Functioning, Disability, and Health (ICF). The study reported here aimed to inform and generate an item pool for the new measure, which is specifically designed for the assessment of participation and activity in patients experiencing a range of health conditions. Methods: Items were informed through in-depth interviews conducted with 37 participants spanning a range of conditions. Interviews aimed to identify how their condition impacted their ability to participate in meaningful activities. Conditions included arthritis, cancer, chronic back pain, diabetes, motor neuron disease, multiple sclerosis, Parkinson’s disease, and spinal cord injury. Transcripts were analyzed using the framework method. Statements relating to ICF themes were recast as questionnaire items and shown for review to an expert panel. Cognitive debrief interviews (n=13) were used to assess items for face and content validity. Results: ICF themes relevant to activities and participation in everyday life were explored, and a total of 222 items formed the initial item pool. This item pool was refined by the research team and 28 generic items were mapped onto all nine chapters of the ICF construct, detailing activity and participation. Cognitive interviewing confirmed the questionnaire instructions, items, and response options were acceptable to participants. Conclusion: Using a clear conceptual basis to inform item generation, 28 items have been identified as suitable to undergo further psychometric testing. A large-scale postal survey will follow in order to refine the instrument further and to assess its psychometric properties. The final instrument is intended for use in clinical trials and interventions targeted at maintaining or improving activity and participation. PMID:26056503
Development of the Oxford Participation and Activities Questionnaire: constructing an item pool.

PubMed

Kelly, Laura; Jenkinson, Crispin; Dummett, Sarah; Dawson, Jill; Fitzpatrick, Ray; Morley, David

2015-01-01

The Oxford Participation and Activities Questionnaire is a patient-reported outcome measure in development that is grounded on the World Health Organization International Classification of Functioning, Disability, and Health (ICF). The study reported here aimed to inform and generate an item pool for the new measure, which is specifically designed for the assessment of participation and activity in patients experiencing a range of health conditions. Items were informed through in-depth interviews conducted with 37 participants spanning a range of conditions. Interviews aimed to identify how their condition impacted their ability to participate in meaningful activities. Conditions included arthritis, cancer, chronic back pain, diabetes, motor neuron disease, multiple sclerosis, Parkinson's disease, and spinal cord injury. Transcripts were analyzed using the framework method. Statements relating to ICF themes were recast as questionnaire items and shown for review to an expert panel. Cognitive debrief interviews (n=13) were used to assess items for face and content validity. ICF themes relevant to activities and participation in everyday life were explored, and a total of 222 items formed the initial item pool. This item pool was refined by the research team and 28 generic items were mapped onto all nine chapters of the ICF construct, detailing activity and participation. Cognitive interviewing confirmed the questionnaire instructions, items, and response options were acceptable to participants. Using a clear conceptual basis to inform item generation, 28 items have been identified as suitable to undergo further psychometric testing. A large-scale postal survey will follow in order to refine the instrument further and to assess its psychometric properties. The final instrument is intended for use in clinical trials and interventions targeted at maintaining or improving activity and participation.
An Evaluation of the Kernel Equating Method: A Special Study with Pseudotests Constructed from Real Test Data. Research Report. ETS RR-06-02

ERIC Educational Resources Information Center

von Davier, Alina A.; Holland, Paul W.; Livingston, Samuel A.; Casabianca, Jodi; Grant, Mary C.; Martin, Kathleen

2006-01-01

This study examines how closely the kernel equating (KE) method (von Davier, Holland, & Thayer, 2004a) approximates the results of other observed-score equating methods--equipercentile and linear equatings. The study used pseudotests constructed of item responses from a real test to simulate three equating designs: an equivalent groups (EG)…
Item-level psychometrics and predictors of performance for Spanish/English bilingual speakers on an object and action naming battery.

PubMed

Edmonds, Lisa A; Donovan, Neila J

2012-04-01

There is a pressing need for psychometrically sound naming materials for Spanish/English bilingual adults. To address this need, in this study the authors examined the psychometric properties of An Object and Action Naming Battery (An O&A Battery; Druks & Masterson, 2000) in bilingual speakers. Ninety-one Spanish/English bilinguals named O&A Battery items in English and Spanish. Responses underwent a Rasch analysis. Using correlation and regression analyses, the authors evaluated the effect of psycholinguistic (e.g., imageability) and participant (e.g., proficiency ratings) variables on accuracy. Rasch analysis determined unidimensionality across English and Spanish nouns and verbs and robust item-level psychometric properties, evidence for content validity. Few items did not fit the model, there were no ceiling or floor effects after uninformative and misfit items were removed, and items reflected a range of difficulty. Reliability coefficients were high, and the number of statistically different ability levels provided indices of sensitivity. Regression analyses revealed significant correlations between psycholinguistic variables and accuracy, providing preliminary construct validity. The participant variables that contributed most to accuracy were proficiency ratings and time of language use. Results suggest adequate content and construct validity of O&A items retained in the analysis for Spanish/English bilingual adults and support future efforts to evaluate naming in older bilinguals and persons with bilingual aphasia.
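As a brief illustration of the quantities mentioned in this abstract, the sketch below gives the Rasch success probability and the conventional way of turning a reliability coefficient into a separation index and a count of statistically distinct ability strata, H = (4G + 1) / 3 with G = sqrt(R / (1 - R)). That the authors used exactly these formulas is an assumption, and the example values are hypothetical.

```python
import math


def rasch_probability(theta, b):
    """Probability that a person of ability theta names an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def separation_index(reliability):
    """Separation G: ratio of true spread to error, derived from reliability R."""
    return math.sqrt(reliability / (1.0 - reliability))


def ability_strata(reliability):
    """Number of statistically distinct ability levels, H = (4G + 1) / 3."""
    return (4.0 * separation_index(reliability) + 1.0) / 3.0


# Hypothetical values, not taken from the study.
p_easy_item = rasch_probability(theta=1.0, b=-1.0)
p_hard_item = rasch_probability(theta=1.0, b=2.0)
strata = ability_strata(0.90)
```

Under this convention a reliability of 0.80 corresponds to about three distinguishable ability levels, and higher reliability yields more.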
Development of a scale for attitude toward condom use for migrant workers in India.

PubMed

Talukdar, Arunansu; Bal, Runa; Sanyal, Debasis; Roy, Krishnendu; Talukdar, Payel Sengupta

2008-02-01

Promotion of condom use remains one of the mainstays of prevention of human immunodeficiency virus (HIV) transmission. In spite of the proven efficacy of condoms, some moral, social and psychological obstacles are still prevalent, hindering their use. The study aimed to construct a short condom-attitude scale for use among migrant workers, a major bridge population in India. The study was conducted among male migrant workers who were 18-49 years old, sexually active, had heard about condoms and were engaged in nonformal jobs. We recruited 234 and 280 candidates for Phase 1 and Phase 2, respectively. Ten items from the original 40-item Brown's ATC (attitude towards condom) scale were selected in Phase 1. After analysis of Phase 1 results using principal component analysis, six items were found appropriate for measuring attitude towards condom use. These six items were then administered to another group in Phase 2. Utilizing Pearson's correlations, scale items were examined in terms of their mean response scores and the correlation matrix between items. Cronbach's alpha and construct validity were also assessed for the entire sample. Study subjects were categorized as condom users and nonusers. The scale structure was explored by analyzing response scores with respect to the items, using principal component analysis followed by varimax rotation. Principal component analysis revealed that the first factor accounted for 71% of the variance, with an eigenvalue greater than one. The eigenvalue of the second factor was less than one. Application of the scree test suggested that only one factor was dominant. The mean score of the six items among condom users was 20.45 and that among nonusers was 16.67, a statistically significant difference (P<0.01). Cronbach's alpha coefficient was 0.92. This tailor-made attitude-toward-condom-use scale, targeted at the most vulnerable people in India, can be included in any rapid survey for assessing the existing beliefs and attitudes toward condoms and also for evaluating the efficacy of an intervention program.
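The link between "71% of the variance" and "eigenvalue greater than one" can be made explicit. Assuming the principal component analysis was run on the correlation matrix of the six items, the eigenvalues sum to 6, so a first component explaining 71% corresponds to an eigenvalue of about 0.71 × 6 = 4.26. The sketch below recovers such an eigenvalue by power iteration on a hypothetical equicorrelated matrix; the matrix is invented for illustration and is not the study's data.

```python
def largest_eigenvalue(matrix, iterations=200):
    """Largest eigenvalue of a symmetric positive semi-definite matrix.

    Uses power iteration from an all-ones start vector, which is adequate
    when the items correlate positively with one another.
    """
    n = len(matrix)
    v = [1.0] * n
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    # Rayleigh quotient of the converged unit vector gives the eigenvalue.
    mv = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(a * b for a, b in zip(v, mv))


def equicorrelation_matrix(n_items, r):
    """Hypothetical correlation matrix with the same correlation r everywhere."""
    return [[1.0 if i == j else r for j in range(n_items)] for i in range(n_items)]


corr = equicorrelation_matrix(6, 0.652)      # illustrative inter-item correlation
first_eigenvalue = largest_eigenvalue(corr)  # equals 1 + 5 * 0.652 for this matrix
variance_explained = first_eigenvalue / 6    # share of total variance
```

In this idealized matrix every remaining eigenvalue equals 1 - r, below one, which mirrors the single-factor pattern the abstract describes.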
Scale Refinement and Initial Evaluation of a Behavioral Health Function Measurement Tool for Work Disability Evaluation

PubMed Central

Marfeo, Elizabeth E.; Ni, Pengsheng; Bogusz, Kara; Meterko, Mark; McDonough, Christine M.; Chan, Leighton; Rasch, Elizabeth K.; Brandt, Diane E.; Jette, Alan M.

2014-01-01

Objectives: To use item response theory (IRT) data simulations to construct and perform initial psychometric testing of a newly developed instrument, the Social Security Administration Behavioral Health Function (SSA-BH) instrument, that aims to assess behavioral health functioning relevant to the context of work. Design: Cross-sectional survey followed by IRT calibration data simulations. Setting: Community. Participants: A sample of individuals applying for SSA disability benefits (claimants; N=1015) and a normative comparative sample of US adults (N=1000). Interventions: None. Main Outcome Measure: SSA-BH measurement instrument. Results: Item response theory analyses supported the unidimensionality of four SSA-BH scales: Mood and Emotions (35 items), Self-Efficacy (23 items), Social Interactions (6 items), and Behavioral Control (15 items). All SSA-BH scales demonstrated strong psychometric properties including reliability, accuracy, and breadth of coverage. High correlations of the simulated 5- or 10-item computerized adaptive tests (CATs) with the full item bank indicated robust ability of the CAT approach to comprehensively characterize behavioral health function along four distinct dimensions. Conclusions: Initial testing and evaluation of the SSA-BH instrument demonstrated good accuracy, reliability, and content coverage along all four scales. Behavioral function profiles of SSA claimants were generated and compared to age- and sex-matched norms along four scales: Mood and Emotions, Behavioral Control, Social Interactions, and Self-Efficacy. Utilizing the CAT-based approach offers the ability to collect standardized, comprehensive functional information about claimants in an efficient way, which may prove useful in the context of the SSA’s work disability programs. PMID:23542404
Research applications for an Object and Action Naming Battery to assess naming skills in adult Spanish-English bilingual speakers.

PubMed

Edmonds, Lisa A; Donovan, Neila J

2014-06-01

Virtually no valid materials are available to evaluate confrontation naming in Spanish-English bilingual adults in the U.S. In a recent study, a large group of young Spanish-English bilingual adults were evaluated on An Object and Action Naming Battery (Edmonds & Donovan in Journal of Speech, Language, and Hearing Research 55:359-381, 2012). Rasch analyses of the responses resulted in evidence for the content and construct validity of the retained items. However, the scope of that study did not allow for extensive examination of individual item characteristics, group analyses of participants, or the provision of testing and scoring materials or raw data, thereby limiting the ability of researchers to administer the test to Spanish-English bilinguals and to score the items with confidence. In this study, we present the in-depth information described above on the basis of further analyses, including (1) online searchable spreadsheets with extensive empirical (e.g., accuracy and name agreeability) and psycholinguistic item statistics; (2) answer sheets and instructions for scoring and interpreting the responses to the Rasch items; (3) tables of alternative correct responses for English and Spanish; (4) ability strata determined for all naming conditions (English and Spanish nouns and verbs); and (5) comparisons of accuracy across proficiency groups (i.e., Spanish dominant, English dominant, and balanced). These data indicate that the Rasch items from An Object and Action Naming Battery are valid and sensitive for the evaluation of naming in young Spanish-English bilingual adults. Additional information based on participant responses for all of the items on the battery can provide researchers with valuable information to aid in stimulus development and response interpretation for experimental studies in this population.
Measurement versus prediction in the construction of patient-reported outcome questionnaires: can we have our cake and eat it?

PubMed

Smits, Niels; van der Ark, L Andries; Conijn, Judith M

2017-11-02

Two important goals when using questionnaires are (a) measurement: the questionnaire is constructed to assign numerical values that accurately represent the test taker's attribute, and (b) prediction: the questionnaire is constructed to give an accurate forecast of an external criterion. Construction methods aimed at measurement prescribe that items should be reliable. In practice, this leads to questionnaires with high inter-item correlations. By contrast, construction methods aimed at prediction typically prescribe that items have a high correlation with the criterion and low inter-item correlations. The latter approach has often been said to produce a paradox concerning the relation between reliability and validity [1-3], because it is often assumed that good measurement is a prerequisite of good prediction. The aim was to answer four questions: (1) Why are measurement-based methods suboptimal for questionnaires that are used for prediction? (2) How should one construct a questionnaire that is used for prediction? (3) Do questionnaire-construction methods that optimize measurement and prediction lead to the selection of different items in the questionnaire? (4) Is it possible to construct a questionnaire that can be used for both measurement and prediction? An empirical data set consisting of scores of 242 respondents on questionnaire items measuring mental health is used to select items by means of two methods: a method that optimizes the predictive value of the scale (i.e., forecasting a clinical diagnosis), and a method that optimizes the reliability of the scale. We show that for the two scales different sets of items are selected and that a scale constructed to meet the one goal does not show optimal performance with reference to the other goal. The answers are as follows: (1) Because measurement-based methods tend to maximize inter-item correlations, which reduces predictive validity. (2) By selecting items that correlate highly with the criterion and weakly with the remaining items. (3) Yes, these methods may lead to different item selections. (4) For a single questionnaire: Yes, but it is problematic because reliability cannot be estimated accurately. For a test battery: Yes, but it is very costly. Implications for the construction of patient-reported outcome questionnaires are discussed.
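The trade-off described in answers (1) and (2) can be illustrated with two textbook formulas for a sum score of k standardized items, under the simplifying assumption (not made in the abstract) that all items share the same inter-item correlation r_ii and the same item-criterion correlation r_ic. Standardized alpha is k·r_ii / (1 + (k − 1)·r_ii), and the correlation of the sum score with the criterion is k·r_ic / sqrt(k + k·(k − 1)·r_ii). The numbers below are hypothetical.

```python
def standardized_alpha(k, r_ii):
    """Reliability of a k-item sum score with common inter-item correlation r_ii."""
    return k * r_ii / (1 + (k - 1) * r_ii)


def composite_validity(k, r_ic, r_ii):
    """Correlation of the k-item sum score with an external criterion.

    Cov(sum, criterion) = k * r_ic and Var(sum) = k + k * (k - 1) * r_ii
    for standardized items.
    """
    return k * r_ic / (k + k * (k - 1) * r_ii) ** 0.5


# Same item-criterion correlation, two hypothetical levels of item redundancy.
k, r_ic = 10, 0.30
low_redundancy = (standardized_alpha(k, 0.10), composite_validity(k, r_ic, 0.10))
high_redundancy = (standardized_alpha(k, 0.50), composite_validity(k, r_ic, 0.50))
```

With the item-criterion correlation held fixed, raising the inter-item correlation increases alpha but lowers the sum score's correlation with the criterion, which is the apparent paradox the abstract refers to.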
The Generic Short Patient Experiences Questionnaire (GS-PEQ): identification of core items from a survey in Norway

PubMed Central

2011-01-01

Background: Questionnaires are commonly used to collect patient, or user, experiences with health care encounters; however, their adaption to specific target groups limits comparison between groups. We present the construction of a generic questionnaire (maximum of ten questions) for user evaluation across a range of health care services. Methods: Based on previous testing of six group-specific questionnaires, we first constructed a generic questionnaire with 23 items related to user experiences. All questions included a "not applicable" response option, as well as a follow-up question about the item's importance. Nine user groups from one health trust were surveyed. Seven groups received questionnaires by mail and two by personal distribution. Selection of core questions was based on three criteria: applicability (proportion "not applicable"), importance (mean scores on follow-up questions), and comprehensiveness (content coverage, maximum two items per dimension). Results: 1324 questionnaires were returned, providing subsample sizes ranging from 52 to 323. Ten questions were excluded because the proportion of "not applicable" responses exceeded 20% in at least one user group. The number of remaining items was reduced to ten by applying the two other criteria. The final short questionnaire included items on outcome (2), clinician services (2), user involvement (2), incorrect treatment (1), information (1), organisation (1), and accessibility (1). Conclusion: The Generic Short Patient Experiences Questionnaire (GS-PEQ) is a short, generic set of questions on user experiences with specialist health care that covers important topics for a range of groups. It can be used alone or with other instruments in quality assessment or in research. The psychometric properties and the relevance of the GS-PEQ in other health care settings and countries need further evaluation. PMID:21510871
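The applicability criterion described above (exclude a question if more than 20% of responses are "not applicable" in at least one user group) is easy to express as a screening rule. The sketch below is a hypothetical rendering of that rule; the item names, group names, and proportions are invented, and the importance and comprehensiveness criteria are not modelled.

```python
def exclude_by_not_applicable(na_proportions, threshold=0.20):
    """Return items whose 'not applicable' share exceeds the threshold in any group.

    na_proportions maps item -> {group: proportion of 'not applicable' answers}.
    """
    return [
        item
        for item, by_group in na_proportions.items()
        if any(share > threshold for share in by_group.values())
    ]


# Hypothetical proportions for three items across three user groups.
example = {
    "waiting_time": {"group_a": 0.05, "group_b": 0.08, "group_c": 0.03},
    "medication_info": {"group_a": 0.10, "group_b": 0.31, "group_c": 0.12},
    "discharge_planning": {"group_a": 0.20, "group_b": 0.15, "group_c": 0.18},
}
excluded = exclude_by_not_applicable(example)
retained = [item for item in example if item not in excluded]
```

Because the rule uses "exceeded 20%", an item sitting exactly at 20% in its worst group is retained in this sketch.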
The feeding practices and structure questionnaire: construction and initial validation in a sample of Australian first-time mothers and their 2-year olds

PubMed Central

2014-01-01

Background: Early feeding practices lay the foundation for children’s eating habits and weight gain. Questionnaires are available to assess parental feeding but overlapping and inconsistent items, subscales and terminology limit conceptual clarity and between-study comparisons. Our aim was to consolidate a range of existing items into a parsimonious and conceptually robust questionnaire for assessing feeding practices with very young children (<3 years). Methods: Data were from 462 mothers and children (age 21–27 months) from the NOURISH trial. Items from five questionnaires and two study-specific items were submitted to a priori item selection, allocation and verification, before theoretically-derived factors were tested using Confirmatory Factor Analysis. Construct validity of the new factors was examined by correlating these with child eating behaviours and weight. Results: Following expert review 10 factors were specified. Of these, 9 factors (40 items) showed acceptable model fit and internal reliability (Cronbach’s α: 0.61-0.89). Four factors reflected non-responsive feeding practices: ‘Distrust in Appetite’, ‘Reward for Behaviour’, ‘Reward for Eating’, and ‘Persuasive Feeding’. Five factors reflected structure of the meal environment and limits: ‘Structured Meal Setting’, ‘Structured Meal Timing’, ‘Family Meal Setting’, ‘Overt Restriction’ and ‘Covert Restriction’. Feeding practices generally showed the expected pattern of associations with child eating behaviours but none with weight. Conclusion: The Feeding Practices and Structure Questionnaire (FPSQ) provides a new reliable and valid measure of parental feeding practices, specifically maternal responsiveness to children’s hunger/satiety signals facilitated by routine and structure in feeding. Further validation in more diverse samples is required. PMID:24898364
Development of a Symptom-Focused Patient-Reported Outcome Measure for Functional Dyspepsia: The Functional Dyspepsia Symptom Diary (FDSD)

PubMed Central

Taylor, Fiona; Higgins, Sophie; Carson, Robyn T; Eremenco, Sonya; Foley, Catherine; Lacy, Brian E; Parkman, Henry P; Reasner, David S; Shields, Alan L; Tack, Jan; Talley, Nicholas J

2018-01-01

Objectives: The Functional Dyspepsia Symptom Diary (FDSD) was developed to address the lack of symptom-focused, patient-reported outcome (PRO) measures designed for use in functional dyspepsia (FD) patients and meeting Food and Drug Administration recommendations for PRO instrument development. Methods: Concept elicitation interviews were conducted with FD participants to identify symptoms important and relevant to FD patients. A preliminary version of the FDSD was constructed, then completed by FD participants on an electronic device in cognitive interviews to evaluate the readability, comprehensibility, relevance, and comprehensiveness of the FDSD, and to preliminarily evaluate its measurement properties. Results: During concept elicitation interviews, 45 participants spontaneously reported 19 symptom concepts. Of those, seven symptoms were selected for assessment by the eight-item FDSD. Cognitive interviews with 57 participants confirmed that participants were able to comprehend and provide meaningful responses to the FDSD, and that the handheld electronic FDSD format was suitable for use in the target population. Scores of the FDSD were well-distributed among response options, item discrimination indices suggested that the FDSD items differentiate among patients with varying degrees of FD severity, and inter-item correlations suggested that no items of the FDSD were capturing redundant information. Internal consistency estimates (0.87) and construct-related validity estimates using known-groups methods were within acceptable ranges. Conclusions: The FDSD is a content-valid PRO measure, with preliminary psychometric evidence providing support for the FDSD’s items and total score. Further psychometric evaluations are recommended to more fully test the FDSD’s score performance and other measurement properties in the target patient population. PMID:28925989
The Nursing Home Physical Performance Test: A Secondary Data Analysis of Women in Long-Term Care Using Item Response Theory.

PubMed

Perera, Subashan; Nace, David A; Resnick, Neil M; Greenspan, Susan L

2017-04-11

The Nursing Home Physical Performance Test (NHPPT) was developed to measure function among nursing home residents using sit-to-stand, scooping applesauce, face washing, dialing phone, putting on sweater, and ambulating tasks. Using item response theory, we explore its measurement characteristics at item level and opportunities for improvements. We used data from long-term care women. We fitted a graded response model, estimated parameters, and constructed probability and information curves. We identified items to be targeted toward lower and higher functioning persons to increase the range of abilities to which the instrument is applicable. We revised the scoring by making sit-to-stand and sweater items harder and dialing phone easier. We examined changes to concurrent validity with activities of daily living (ADL), frailty, and cognitive function. Participants were 86 years old, had more than three comorbidities, and a NHPPT of 19.4. All items had high discrimination and were targeted toward the lower middle range of performance continuum. After revision, sit-to-stand and sweater items demonstrated greater discrimination among the higher functioning and/or greater spread of thresholds for response categories. The overall test showed discrimination over a wider range of individuals. Concurrent validity correlation improved from 0.60 to 0.68 for instrumental ADL and explained variability (R²) from 22% to 36% for frailty. NHPPT has good measurement characteristics at the item level. NHPPT can be improved, implemented in computerized adaptive testing, and combined with self-report for greater utility, but a definitive study is needed. © The Author 2017. Published by Oxford University Press on behalf of The Gerontological Society of America. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
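As an illustration of the graded response model mentioned in the abstract above, the following minimal Python sketch computes category probabilities for a single polytomous item. The function name, the discrimination value, and the thresholds are hypothetical and are not taken from the study.

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Category probabilities under Samejima's graded response model.

    theta      -- latent trait level of one person
    a          -- item discrimination (hypothetical value in the example)
    thresholds -- increasing category thresholds b_1 < ... < b_{K-1}
    """
    # Cumulative curves P(X >= k); P(X >= 0) = 1 and P(X >= K) = 0 by definition.
    cumulative = [1.0]
    cumulative += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cumulative.append(0.0)
    # Each category probability is the difference of adjacent cumulative curves.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]


# Hypothetical item with four response categories.
probs = grm_category_probs(theta=0.0, a=2.0, thresholds=[-1.0, 0.0, 1.0])
print([round(p, 3) for p in probs])
```

In this parameterization, shifting the thresholds upward makes the higher categories less likely at a fixed theta, which is one way to read "making an item harder".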
Reliability and Validity of the Alcohol Short Index of Problems and a Newly Constructed Drug Short Index of Problems

PubMed Central

Alterman, Arthur I.; Cacciola, John S.; Ivey, Megan A.; Lynch, Kevin G.

2009-01-01

Objective: This study evaluated the psychometric properties of the 15-item alcohol Short Index of Problems (SIP) instrument and those of a newly constructed 15-item drug Short Index of Problems (SIP-D) instrument in 277 newly entered substance-abuse patients. Method: The SIP is derived from the longer, 50-item Drinker Inventory of Consequences (DrInC), which was designed to assess adverse consequences of alcohol use. The SIP-D was constructed by substituting the term “drug use” for the term “drinking” in each SIP item. A 3-month recall interval was employed. Results: Factor analyses of each of the instruments revealed similar solutions, with only one main factor accounting for the majority of variance. Nonparametric item response theory methods produced the same finding. Internal consistency reliability estimates for the SIP and SIP-D total scores were .98 and .97, respectively. Concurrent validity was demonstrated by examining the correlations of the total scores for each of the instruments with the recent summary indexes of the newly revised Addiction Severity Index (ASI-Version 6): alcohol, drug, medical, economic, legal, family/social, and psychiatric problems. Conclusions: This study is the first to confirm the psychometric validity of the SIP when used as an independent instrument unembedded within the DrInC. The study also supports the use of the SIP-D as a brief measure of adverse consequences of drug use. The findings strongly support the unidimensional structure of both measures. PMID:19261243
Reliability and validity of the alcohol short index of problems and a newly constructed drug short index of problems.

PubMed

Alterman, Arthur I; Cacciola, John S; Ivey, Megan A; Habing, Brian; Lynch, Kevin G

2009-03-01

This study evaluated the psychometric properties of the 15-item alcohol Short Index of Problems (SIP) instrument and those of a newly constructed 15-item drug Short Index of Problems (SIP-D) instrument in 277 newly entered substance-abuse patients. The SIP is derived from the longer, 50-item Drinker Inventory of Consequences (DrInC), which was designed to assess adverse consequences of alcohol use. The SIP-D was constructed by substituting the term "drug use" for the term "drinking" in each SIP item. A 3-month recall interval was employed. Factor analyses of each of the instruments revealed similar solutions, with only one main factor accounting for the majority of variance. Nonparametric item response theory methods produced the same finding. Internal consistency reliability estimates for the SIP and SIP-D total scores were .98 and .97, respectively. Concurrent validity was demonstrated by examining the correlations of the total scores for each of the instruments with the recent summary indexes of the newly revised Addiction Severity Index (ASI-Version 6): alcohol, drug, medical, economic, legal, family/social, and psychiatric problems. This study is the first to confirm the psychometric validity of the SIP when used as an independent instrument unembedded within the DrInC. The study also supports the use of the SIP-D as a brief measure of adverse consequences of drug use. The findings strongly support the unidimensional structure of both measures.
Measurement Invariance and the Five-Factor Model of Personality: Asian International and Euro American Cultural Groups.

PubMed

Rollock, David; Lui, P Priscilla

2016-10-01

This study examined measurement invariance of the NEO Five-Factor Inventory (NEO-FFI), assessing the five-factor model (FFM) of personality among Euro American (N = 290) and Asian international (N = 301) students (47.8% women, mean age = 19.69 years). The full 60-item NEO-FFI data fit the expected five-factor structure for both groups using exploratory structural equation modeling, and achieved configural invariance. Only 37 items significantly loaded onto the FFM-theorized factors for both groups and demonstrated metric invariance. Threshold invariance was not supported with this reduced item set. Groups differed the most in the item-factor relationships for Extraversion and Agreeableness, as well as in response styles. Asian internationals were more likely to use midpoint responses than Euro Americans. While the FFM can characterize broad nomothetic patterns of personality traits, metric invariance with only the subset of NEO-FFI items identified limits direct group comparisons of correlation coefficients among personality domains and with other constructs, and of mean differences on personality domains. © The Author(s) 2015.
What do Demand-Control and Effort-Reward work stress questionnaires really measure? A discriminant content validity study of relevance and representativeness of measures.

PubMed

Bell, Cheryl; Johnston, Derek; Allan, Julia; Pollard, Beth; Johnston, Marie

2017-05-01

The Demand-Control (DC) and Effort-Reward Imbalance (ERI) models predict health in a work context. Self-report measures of the four key constructs (demand, control, effort, and reward) have been developed and it is important that these measures have good content validity uncontaminated by content from other constructs. We assessed relevance (whether items reflect the constructs) and representativeness (whether all aspects of the construct are assessed, and all items contribute to that assessment) across the instruments and items. Two studies examined fourteen demand/control items from the Job Content Questionnaire and seventeen effort/reward items from the Effort-Reward Imbalance measure using discriminant content validation and a third study developed new methods to assess instrument representativeness. Both methods use judges' ratings and construct definitions to get transparent quantitative estimates of construct validity. Study 1 used dictionary definitions while studies 2 and 3 used published phrases to define constructs. Overall, 3/5 demand items, 4/9 control items, 1/6 effort items, and 7/11 reward items were uniquely classified to the appropriate theoretical construct and were therefore 'pure' items with discriminant content validity (DCV). All pure items measured a defining phrase. However, both the DC and ERI assessment instruments failed to assess all defining aspects. Finding good discriminant content validity for demand and reward measures means these measures are usable and our quantitative results can guide item selection. By contrast, effort and control measures had limitations (in relevance and representativeness) presenting a challenge to the implementation of the theories.
Statement of contribution. What is already known on this subject? While the reliability and construct validity of Demand-Control and Effort-Reward-Imbalance (DC and ERI) work stress measures are routinely reported, there has not been adequate investigation of their content validity. This paper investigates their content validity in terms of both relevance and representativeness and provides a model for the investigation of content validity of measures in health psychology more generally. What does this study add? A new application of an existing method, discriminant content validity, and a new method of assessing instrument representativeness. 'Pure' DC and ERI items are identified, as are constructs that are not fully represented by their assessment instruments. The findings are important for studies attempting to distinguish between the main DC and ERI work stress constructs. The quantitative results can be used to guide item selection for future studies. © 2017 The British Psychological Society.
Mokken scale analysis of mental health and well-being questionnaire item responses: a non-parametric IRT method in empirical research for applied health researchers

PubMed Central

2012-01-01

Background Mokken scaling techniques are a useful tool for researchers who wish to construct unidimensional tests or use questionnaires that comprise multiple binary or polytomous items. The stochastic cumulative scaling model offered by this approach is ideally suited when the intention is to score an underlying latent trait by simple addition of the item response values. In our experience, the Mokken model appears to be less well-known than for example the (related) Rasch model, but is seeing increasing use in contemporary clinical research and public health. Mokken's method is a generalisation of Guttman scaling that can assist in the determination of the dimensionality of tests or scales, and enables consideration of reliability, without reliance on Cronbach's alpha. This paper provides a practical guide to the application and interpretation of this non-parametric item response theory method in empirical research with health and well-being questionnaires. Methods Scalability of data from 1) a cross-sectional health survey (the Scottish Health Education Population Survey) and 2) a general population birth cohort study (the National Child Development Study) illustrate the method and modeling steps for dichotomous and polytomous items respectively. The questionnaire data analyzed comprise responses to the 12 item General Health Questionnaire, under the binary recoding recommended for screening applications, and the ordinal/polytomous responses to the Warwick-Edinburgh Mental Well-being Scale. Results and conclusions After an initial analysis example in which we select items by phrasing (six positive versus six negatively worded items) we show that all items from the 12-item General Health Questionnaire (GHQ-12) – when binary scored – were scalable according to the double monotonicity model, in two short scales comprising six items each (Bech’s “well-being” and “distress” clinical scales). 
An illustration of ordinal item analysis confirmed that all 14 positively worded items of the Warwick-Edinburgh Mental Well-being Scale (WEMWBS) met criteria for the monotone homogeneity model but four items violated double monotonicity with respect to a single underlying dimension. Software availability and commands used to specify unidimensionality and reliability analysis and graphical displays for diagnosing monotone homogeneity and double monotonicity are discussed, with an emphasis on current implementations in freeware. PMID:22686586
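Mokken scale analysis commonly summarizes scalability with Loevinger's H coefficient. A minimal sketch for binary items follows, assuming complete data and the usual definition of H as the ratio of summed inter-item covariances to their maxima given the item marginals. The function name and the toy data are hypothetical; the abstract refers to existing freeware implementations, which this sketch does not reproduce.

```python
from itertools import combinations


def loevinger_h(rows):
    """Scale-level Loevinger's H for binary (0/1) item scores.

    rows -- list of respondents, each a list of 0/1 item scores.
    Raises ZeroDivisionError if no item pair can covary (constant items).
    """
    n = len(rows)
    k = len(rows[0])
    # Proportion of positive responses per item.
    p = [sum(row[j] for row in rows) / n for j in range(k)]
    cov_sum = 0.0
    max_sum = 0.0
    for i, j in combinations(range(k), 2):
        p_both = sum(row[i] * row[j] for row in rows) / n
        cov_sum += p_both - p[i] * p[j]
        # The joint proportion cannot exceed the smaller marginal.
        max_sum += min(p[i], p[j]) - p[i] * p[j]
    return cov_sum / max_sum


# Hypothetical data: a perfect Guttman pattern, then the same with one error.
guttman = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]]
with_error = guttman + [[0, 1, 0]]
h_perfect = loevinger_h(guttman)
h_error = loevinger_h(with_error)
print(h_perfect, h_error)
```

A perfect Guttman pattern attains the maximum covariance for every item pair, so H equals 1; each response that contradicts the item ordering pulls H downward.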

Mokken scale analysis of mental health and well-being questionnaire item responses: a non-parametric IRT method in empirical research for applied health researchers.

PubMed

Stochl, Jan; Jones, Peter B; Croudace, Tim J

2012-06-11

Mokken scaling techniques are a useful tool for researchers who wish to construct unidimensional tests or use questionnaires that comprise multiple binary or polytomous items. The stochastic cumulative scaling model offered by this approach is ideally suited when the intention is to score an underlying latent trait by simple addition of the item response values. In our experience, the Mokken model appears to be less well-known than for example the (related) Rasch model, but is seeing increasing use in contemporary clinical research and public health. Mokken's method is a generalisation of Guttman scaling that can assist in the determination of the dimensionality of tests or scales, and enables consideration of reliability, without reliance on Cronbach's alpha. This paper provides a practical guide to the application and interpretation of this non-parametric item response theory method in empirical research with health and well-being questionnaires. Scalability of data from 1) a cross-sectional health survey (the Scottish Health Education Population Survey) and 2) a general population birth cohort study (the National Child Development Study) illustrate the method and modeling steps for dichotomous and polytomous items respectively. The questionnaire data analyzed comprise responses to the 12 item General Health Questionnaire, under the binary recoding recommended for screening applications, and the ordinal/polytomous responses to the Warwick-Edinburgh Mental Well-being Scale. After an initial analysis example in which we select items by phrasing (six positive versus six negatively worded items) we show that all items from the 12-item General Health Questionnaire (GHQ-12)--when binary scored--were scalable according to the double monotonicity model, in two short scales comprising six items each (Bech's "well-being" and "distress" clinical scales). 
An illustration of ordinal item analysis confirmed that all 14 positively worded items of the Warwick-Edinburgh Mental Well-being Scale (WEMWBS) met criteria for the monotone homogeneity model but four items violated double monotonicity with respect to a single underlying dimension. Software availability and commands used to specify unidimensionality and reliability analysis and graphical displays for diagnosing monotone homogeneity and double monotonicity are discussed, with an emphasis on current implementations in freeware.
A 67-Item Stress Resilience item bank showing high content validity was developed in a psychosomatic sample.

PubMed

Obbarius, Nina; Fischer, Felix; Obbarius, Alexander; Nolte, Sandra; Liegl, Gregor; Rose, Matthias

2018-04-10

To develop the first item bank to measure Stress Resilience (SR) in clinical populations. Qualitative item development resulted in an initial pool of 131 items covering a broad theoretical SR concept. These items were tested in n=521 patients at a psychosomatic outpatient clinic. Exploratory and Confirmatory Factor Analysis (CFA), as well as other state-of-the-art item analyses and IRT were used for item evaluation and calibration of the final item bank. Out of the initial item pool of 131 items, we excluded 64 items (54 factor loading <.5, 4 residual correlations >.3, 2 non-discriminative Item Response Curves, 4 Differential Item Functioning). The final set of 67 items indicated sufficient model fit in CFA and IRT analyses. Additionally, a 10-item short form with high measurement precision (SE≤.32 in a theta range between -1.8 and +1.5) was derived. Both the SR item bank and the SR short form were highly correlated with an existing static legacy tool (Connor-Davidson Resilience Scale). The final SR item bank and 10-item short form showed good psychometric properties. When further validated, they will be ready to be used within a framework of Computer-Adaptive Tests for a comprehensive assessment of the Stress-Construct. Copyright © 2018. Published by Elsevier Inc.
Using Rasch rating scale model to reassess the psychometric properties of the Persian version of the PedsQL™ 4.0 Generic Core Scales in school children

PubMed Central

2012-01-01

Background Item response theory (IRT) is extensively used to develop adaptive instruments of health-related quality of life (HRQoL). However, each IRT model has its own function to estimate item and category parameters, and hence different results may be found using the same response categories with different IRT models. The present study used the Rasch rating scale model (RSM) to examine and reassess the psychometric properties of the Persian version of the PedsQL™ 4.0 Generic Core Scales. Methods The PedsQL™ 4.0 Generic Core Scales was completed by 938 Iranian school children and their parents. Convergent, discriminant and construct validity of the instrument were assessed by classical test theory (CTT). The RSM was applied to investigate person and item reliability, item statistics and ordering of response categories. Results The CTT method showed that the scaling success rate for convergent and discriminant validity were 100% in all domains with the exception of physical health in the child self-report. Moreover, confirmatory factor analysis supported a four-factor model similar to its original version. The RSM showed that 22 out of 23 items had acceptable infit and outfit statistics (<1.4, >0.6), person reliabilities were low, item reliabilities were high, and item difficulty ranged from -1.01 to 0.71 and -0.68 to 0.43 for child self-report and parent proxy-report, respectively. Also the RSM showed that successive response categories for all items were not located in the expected order. Conclusions This study revealed that, in all domains, the five response categories did not perform adequately. It is not known whether this problem is a function of the meaning of the response choices in the Persian language or an artifact of a mostly healthy population that did not use the full range of the response categories. The response categories should be evaluated in further validation studies, especially in large samples of chronically ill patients. PMID:22414135
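To make the rating scale model concrete, here is a minimal sketch of its category probabilities for one item, assuming Andrich's formulation with an item location delta and category thresholds tau that are shared across items. All names and parameter values are hypothetical and are not estimates from the study.

```python
import math


def rsm_category_probs(theta, delta, taus):
    """Category probabilities under the Rasch rating scale model.

    theta -- person location
    delta -- item location (difficulty)
    taus  -- category thresholds tau_1 ... tau_m, shared by all items
    """
    # Cumulative sums of (theta - delta - tau_j); category 0 has the empty sum 0.
    logits = [0.0]
    for tau in taus:
        logits.append(logits[-1] + (theta - delta - tau))
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(logits)
    weights = [math.exp(x - top) for x in logits]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical three-category item with ordered and with disordered thresholds.
ordered = rsm_category_probs(theta=0.0, delta=0.0, taus=[-1.0, 1.0])
disordered = rsm_category_probs(theta=0.0, delta=0.0, taus=[1.0, -1.0])
print(ordered, disordered)
```

With the ordered thresholds the middle category is the most probable response near the item location; with the disordered pair used here it is never the most probable response at any theta, which is the kind of pattern a check on category ordering is meant to detect.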
Genes, Culture and Conservatism-A Psychometric-Genetic Approach.

PubMed

Schwabe, Inga; Jonker, Wilfried; van den Berg, Stéphanie M

2016-07-01

The Wilson-Patterson conservatism scale was psychometrically evaluated using homogeneity analysis and item response theory models. Results showed that this scale actually measures two different aspects in people: on the one hand people vary in their agreement with either conservative or liberal catch-phrases and on the other hand people vary in their use of the "?" response category of the scale. A 9-item subscale was constructed, consisting of items that seemed to measure liberalism, and this subscale was subsequently used in a biometric analysis including genotype-environment interaction, correcting for non-homogeneous measurement error. Biometric results showed significant genetic and shared environmental influences, and significant genotype-environment interaction effects, suggesting that individuals with a genetic predisposition for conservatism show more non-shared variance but less shared variance than individuals with a genetic predisposition for liberalism.
Partial validation of a French version of the ADHD-rating scale IV on a French population of children with ADHD and epilepsy. Factorial structure, reliability, and responsiveness.

PubMed

Mercier, Catherine; Roche, Sylvain; Gaillard, Ségolène; Kassai, Behrouz; Arzimanoglou, Alexis; Herbillon, Vania; Roy, Pascal; Rheims, Sylvain

2016-05-01

Attention deficit hyperactivity disorder (ADHD) is a well-known comorbidity in children with epilepsy. In English-speaking countries, the scores of the original ADHD-rating scale IV are currently used as main outcomes in various clinical trials in children with epilepsy. In French-speaking countries, several French versions are in use though none has been fully validated yet. We sought a partial validation of a French version of the ADHD-RS IV regarding construct validity, internal consistency (i.e., scale reliability), item reliability, and responsiveness in a group of French children with ADHD and epilepsy. The study involved 167 children aged 6-15 years in 10 French neuropediatric units. The factorial structure and item reliability were assessed with a confirmatory factorial analysis for ordered categorical variables. The dimensions' internal consistency was assessed with Guttman's lambda 6 coefficient. The responsiveness was assessed by the change in score under methylphenidate and in comparison with a control group. The results confirmed the original two-dimensional factorial structure (inattention, hyperactivity/impulsivity) and showed a satisfactory reliability of most items, a good dimension internal consistency, and a good responsiveness of the total score and the two subscores. The studied French version of the ADHD-RS IV is thus validated regarding construct validity, reliability, and responsiveness. It can now be used in French-speaking countries in clinical trials of treatments involving children with ADHD and epilepsy. The full validation requires further investigations. Copyright © 2016 Elsevier Inc. All rights reserved.
Examining Two Strategies to Link Mixed-Format Tests Using Multiple-Choice Anchors. Research Report. ETS RR-10-18

ERIC Educational Resources Information Center

Walker, Michael E.; Kim, Sooyeon

2010-01-01

This study examined the use of an all multiple-choice (MC) anchor for linking mixed format tests containing both MC and constructed-response (CR) items, in a nonequivalent groups design. An MC-only anchor could effectively link two such test forms if either (a) the MC and CR portions of the test measured the same construct, so that the MC anchor…
Construction of a web-based questionnaire for longitudinal investigation of work exposure, musculoskeletal pain and performance impairments in high-performance marine craft populations

PubMed Central

de Alwis, Manudul Pahansen; Äng, Björn Olov; Garme, Karl

2017-01-01

Objective High-performance marine craft personnel (HPMCP) are regularly exposed to vibration and repeated shock (VRS) levels exceeding maximum limitations stated by international legislation. Whereas such exposure reportedly is detrimental to health and performance, the epidemiological data necessary to link these adverse effects causally to VRS are not available in the scientific literature, and no suitable tools for acquiring such data exist. This study therefore constructed a questionnaire for longitudinal investigations in HPMCP. Methods A consensus panel defined content domains, identified relevant items and outlined a questionnaire. The relevance and simplicity of the questionnaire’s content were then systematically assessed by expert raters in three consecutive stages, each followed by revisions. An item-level content validity index (I-CVI) was computed as the proportion of experts rating an item as relevant and simple, and a scale-level content validity index (S-CVI/Ave) as the average I-CVI across items. The thresholds for acceptable content validity were 0.78 and 0.90, respectively. Finally, a dynamic web version of the questionnaire was constructed and pilot tested over a 1-month period during a marine exercise in a study population sample of eight subjects, while accelerometers simultaneously quantified VRS exposure. Results Content domains were defined as work exposure, musculoskeletal pain and human performance, and items were selected to reflect these constructs. Ratings from nine experts yielded S-CVI/Ave of 0.97 and 1.00 for relevance and simplicity, respectively, and the pilot test suggested that responses were sensitive to change in acceleration and that the questionnaire, following some adjustments, was feasible for its intended purpose. Conclusions A dynamic web-based questionnaire for longitudinal survey of key variables in HPMCP was constructed. 
Expert ratings supported that the questionnaire content is relevant, simple and sufficiently comprehensive, and the pilot test suggested that the questionnaire is feasible for longitudinal measurements in the study population. PMID:28729320
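The two content validity indices defined in the abstract above are simple proportions, which the following minimal sketch makes explicit. The thresholds 0.78 and 0.90 come from the abstract; the function name and the ratings are hypothetical.

```python
def content_validity(ratings):
    """Return (I-CVI per item, S-CVI/Ave) from expert judgements.

    ratings[i][j] is True when expert j rated item i as acceptable
    on the criterion of interest (for example relevance or simplicity).
    """
    # I-CVI: proportion of experts endorsing each item.
    i_cvi = [sum(item) / len(item) for item in ratings]
    # S-CVI/Ave: average of the item-level indices.
    s_cvi_ave = sum(i_cvi) / len(i_cvi)
    return i_cvi, s_cvi_ave


# Hypothetical ratings from nine experts on three items.
ratings = [
    [True] * 9,
    [True] * 8 + [False],
    [True] * 7 + [False] * 2,
]
i_cvi, s_cvi_ave = content_validity(ratings)
flagged = [i for i, value in enumerate(i_cvi) if value < 0.78]  # items to revise
scale_ok = s_cvi_ave >= 0.90
print(i_cvi, s_cvi_ave, flagged, scale_ok)
```

In this toy example the third item falls just below the item-level threshold, so it would be a candidate for revision in the next rating stage.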
Unidimensional IRT Item Parameter Estimates across Equivalent Test Forms with Confounding Specifications within Dimensions

ERIC Educational Resources Information Center

Matlock, Ki Lynn; Turner, Ronna

2016-01-01

When constructing multiple test forms, the number of items and the total test difficulty are often equivalent. Not all test developers match the number of items and/or average item difficulty within subcontent areas. In this simulation study, six test forms were constructed having an equal number of items and average item difficulty overall.…
Questionnaire to assess patient satisfaction with pharmaceutical care in Spanish language.

PubMed

Traverso, María Luz; Salamano, Mercedes; Botta, Carina; Colautti, Marisel; Palchik, Valeria; Pérez, Beatriz

2007-08-01

To develop and validate a questionnaire, in Spanish, for assessing patient satisfaction with pharmaceutical care received in community pharmacies. Selection and translation of questionnaire's items; definition of response scale and demographic questions. Evaluation of face and content validity, feasibility, factor structure, reliability and construct validity. Forty-one community pharmacies of the province of Santa Fe, Argentina. Questionnaire administered to patients receiving pharmaceutical care or traditional pharmacy services. Pilot test to assess feasibility. Factor analysis used principal components and varimax rotation. Reliability established using internal consistency with Cronbach's alpha. Construct validity determined with extreme group method. A self-administered questionnaire with 27 items, 5-point Likert response scale and demographic questions was designed considering multidimensional structure of patient satisfaction. Questionnaire evaluates cumulative experience of patients with comprehensive pharmaceutical care practice in community pharmacies. Two hundred and seventy-four complete questionnaires were obtained. Factor analysis resulted in three factors: Managing therapy, Interpersonal relationship and General satisfaction, with a cumulative variance of 62.51%. Cronbach's alpha for the whole questionnaire was 0.96, and 0.95, 0.88 and 0.76 for the three factors, respectively. Mann-Whitney test for construct validity did not show significant differences between pharmacies that provide pharmaceutical care and those that do not; however, 23 items showed significant differences between the two groups of pharmacies. The questionnaire developed can be a reliable and valid instrument to assess patient satisfaction with pharmaceutical care in community pharmacies in Spanish. Further research is needed to deepen the validation process.
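For readers unfamiliar with the internal consistency statistic reported above, a minimal sketch of Cronbach's alpha follows. It assumes complete data with one row per respondent; the function name and the toy responses are hypothetical and unrelated to the study data.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents' item scores."""
    k = len(rows[0])  # number of items
    # Variance of each item across respondents.
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    # Variance of the summed scale score.
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical 5-point Likert responses: five respondents, three items.
responses = [
    [4, 5, 4],
    [2, 2, 3],
    [5, 5, 5],
    [3, 2, 2],
    [1, 2, 1],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 2))
```

Alpha approaches 1 when the items rise and fall together across respondents, and it drops as the item variances account for a larger share of the total score variance.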
A test of the International Personality Item Pool representation of the Revised NEO Personality Inventory and development of a 120-item IPIP-based measure of the five-factor model.

PubMed

Maples, Jessica L; Guan, Li; Carter, Nathan T; Miller, Joshua D

2014-12-01

There has been a substantial increase in the use of personality assessment measures constructed using items from the International Personality Item Pool (IPIP) such as the 300-item IPIP-NEO (Goldberg, 1999), a representation of the Revised NEO Personality Inventory (NEO PI-R; Costa & McCrae, 1992). The IPIP-NEO is free to use and can be modified to accommodate its users' needs. Despite the substantial interest in this measure, there is still a dearth of data demonstrating its convergence with the NEO PI-R. The present study represents an investigation of the reliability and validity of scores on the IPIP-NEO. Additionally, we used item response theory (IRT) methodology to create a 120-item version of the IPIP-NEO. Using an undergraduate sample (n = 359), we examined the reliability, as well as the convergent and criterion validity, of scores from the 300-item IPIP-NEO, a previously constructed 120-item version of the IPIP-NEO (Johnson, 2011), and the newly created IRT-based IPIP-120 in comparison to the NEO PI-R across a range of outcomes. Scores from all 3 IPIP measures demonstrated strong reliability and convergence with the NEO PI-R and a high degree of similarity with regard to their correlational profiles across the criterion variables (rICC = .983, .972, and .976, respectively). The replicability of these findings was then tested in a community sample (n = 757), and the results closely mirrored the findings from Sample 1. These results provide support for the use of the IPIP-NEO and both 120-item IPIP-NEO measures as assessment tools for measurement of the five-factor model. (c) 2014 APA, all rights reserved.
Evaluation of the Hospital Anxiety and Depression Scale (HADS) in screening stroke patients for symptoms: Item Response Theory (IRT) analysis.

PubMed

Ayis, Salma A; Ayerbe, Luis; Ashworth, Mark; DA Wolfe, Charles

2018-03-01

Variations have been reported in the number of underlying constructs and choice of thresholds that determine caseness of anxiety and/or depression using the Hospital Anxiety and Depression scale (HADS). This study examined the properties of each item of HADS as perceived by stroke patients, and assessed the information these items convey about anxiety and depression from 3 months to 5 years after stroke. The study included 1443 stroke patients from the South London Stroke Register (SLSR). The dimensionality of HADS was examined using factor analysis methods, and items' properties up to 5 years after stroke were tested using Item Response Theory (IRT) methods, including graded response models (GRMs). The presence of two dimensions of HADS (anxiety and depression) for stroke patients was confirmed. Items that accurately inferred about the severity of anxiety and depression, and offered good discrimination of caseness were identified as "I can laugh and see the funny side of things" (Q4) and "I get sudden feelings of panic" (Q13), discrimination 2.44 (se = 0.26), and 3.34 (se = 0.35), respectively. Items that shared properties, hence replicate inference were: "I get a sort of frightened feeling as if something awful is about to happen" (Q3), "I get a sort of frightened feeling like butterflies in my stomach" (Q6), and "Worrying thoughts go through my mind" (Q9). Item properties were maintained over time. Approximately 20% of patients were lost to follow up. A more concise selection of items based on their properties, would provide a precise approach for screening patients and for an optimal allocation of patients into clinical trials. Copyright © 2017 Elsevier B.V. All rights reserved.
Development and validation of brief scales to measure emotional and behavioural problems among Chinese adolescents

PubMed Central

Shen, Minxue; Hu, Ming; Sun, Zhenqiu

2017-01-01

Objectives To develop and validate brief scales to measure common emotional and behavioural problems among adolescents in the examination-oriented education system and collectivistic culture of China. Setting Middle schools in Hunan province. Participants 5442 middle school students aged 11–19 years were sampled. 4727 valid questionnaires were collected and used for validation of the scales. The final sample included 2408 boys and 2319 girls. Primary and secondary outcome measures The tools were assessed by the item response theory, classical test theory (reliability and construct validity) and differential item functioning. Results Four scales to measure anxiety, depression, study problem and sociality problem were established. Exploratory factor analysis showed that each scale had two solutions. Confirmatory factor analysis showed acceptable to good model fit for each scale. Internal consistency and test–retest reliability of all scales were above 0.7. Item response theory showed that all items had acceptable discrimination parameters and most items had appropriate difficulty parameters. 10 items demonstrated differential item functioning with respect to gender. Conclusions Four brief scales were developed and validated among adolescents in middle schools of China. The scales have good psychometric properties with minor differential item functioning. They can be used in middle school settings, and will help school officials to assess the students’ emotional/behavioural problems. PMID:28062469
Pressure ulcers: development and psychometric evaluation of the attitude towards pressure ulcer prevention instrument (APuP).

PubMed

Beeckman, D; Defloor, T; Demarré, L; Van Hecke, A; Vanderwee, K

2010-11-01

Pressure ulcers continue to be a significant problem in hospitals, nursing homes and community care settings. Pressure ulcer incidence is widely accepted as an indicator for the quality of care. Negative attitudes towards pressure ulcer prevention may result in suboptimal preventive care. A reliable and valid instrument to assess attitudes towards pressure ulcer prevention is lacking. Development and psychometric evaluation of the Attitude towards Pressure ulcer Prevention instrument (APuP). Prospective psychometric instrument validation study. A literature review was performed to design the instrument. Content validity was evaluated by nine European pressure ulcer experts and five experts in psychometric instrument validation in a double Delphi procedure. A convenience sample of 258 nurses and 291 nursing students from Belgium and The Netherlands participated in order to evaluate construct validity and stability reliability of the instrument. The data were collected between February and May 2008. A factor analysis indicated the construct of a 13 item instrument in a five factor solution: (1) attitude towards personal competency to prevent pressure ulcers (three items); (2) attitude towards the priority of pressure ulcer prevention (three items); (3) attitude towards the impact of pressure ulcers (three items); (4) attitude towards personal responsibility in pressure ulcer prevention (two items); and (5) attitude towards confidence in the effectiveness of prevention (two items). This five factor solution accounted for 61.4% of the variance in responses related to attitudes towards pressure ulcer prevention. All items demonstrated factor loadings over 0.60. The instrument produced similar results during stability testing [ICC=0.88 (95% CI=0.84-0.91, P<0.001)]. For the total instrument, the internal consistency (Cronbach's alpha) was 0.79.
The APuP is a psychometrically sound instrument that can be used to effectively assess attitudes towards pressure ulcer prevention in patient care, education, and research. In further research, the association between attitude, knowledge and clinical performance should be explored. Copyright 2010 Elsevier Ltd. All rights reserved.
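Internal consistency figures such as the Cronbach's alpha quoted above are computed from item variances and the variance of the total score. The sketch below shows the standard formula on a small made-up response matrix; the data and the function name `cronbach_alpha` are illustrative assumptions and have no connection to the APuP sample.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])  # number of items
    # Sample variance of each item across respondents.
    item_vars = [variance([r[i] for r in rows]) for i in range(k)]
    # Sample variance of the summed (total) score.
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical 5-point responses from six respondents to four items.
toy_scores = [
    [4, 5, 4, 3],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [3, 3, 2, 3],
    [1, 2, 1, 2],
    [4, 4, 5, 4],
]
alpha = cronbach_alpha(toy_scores)
```

Alpha rises when items covary strongly relative to their individual variances, which is why it is read as an index of internal consistency rather than of dimensionality.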
The initial development of the WebMedQual scale: domain assessment of the construct of quality of health web sites.

PubMed

Provost, Mélanie; Koompalum, Dayin; Dong, Diane; Martin, Bradley C

2006-01-01

To develop a comprehensive instrument assessing quality of health-related web sites. Phase I consisted of a literature review to identify constructs thought to indicate web site quality and to identify items. During content analysis, duplicate items were eliminated and items that were not clear, meaningful, or measurable were reworded or removed. Some items were generated by the authors. Phase II: a panel consisting of six healthcare and MIS reviewers was convened to assess each item for its relevance and importance to the construct and to assess item clarity and measurement feasibility. Three hundred and eighty-four items were generated from 26 sources. The initial content analysis reduced the scale to 104 items. Four of the six expert reviewers responded; high concordance on the relevance, importance and measurement feasibility of each item was observed: 3 out of 4, or all raters agreed on 76-85% of items. Based on the panel ratings, 9 items were removed, 3 added, and 10 revised. The WebMedQual consists of 8 categories, 8 sub-categories, 95 items and 3 supplemental items to assess web site quality. The constructs are: content (19 items), authority of source (18 items), design (19 items), accessibility and availability (6 items), links (4 items), user support (9 items), confidentiality and privacy (17 items), e-commerce (6 items). The "WebMedQual" represents a first step toward a comprehensive and standard quality assessment of health web sites. This scale will allow relatively easy assessment of quality with possible numeric scoring.
The Patient Assessment Questionnaire: initial validation of a measure of treatment effectiveness for patients with schizophrenia and schizoaffective disorder.

PubMed

Mojtabai, Ramin; Corey-Lisle, Patricia K; Ip, Edward Hak-Sing; Kopeykina, Irina; Haeri, Sophia; Cohen, Lisa Janet; Shumaker, Sally

2012-12-30

Investigation of patients' subjective perspective regarding the effectiveness - as opposed to efficacy - of antipsychotic medication has been hampered by a relative shortage of self-report measures of global clinical outcome. This paper presents data supporting the feasibility, inter-item consistency, and construct validity of the Patient Assessment Questionnaire (PAQ)-a self-report measure of psychiatric symptoms, medication side effects and general wellbeing, ultimately intended to assess effectiveness of interventions for schizophrenia-spectrum patients. The original 53-item instrument was developed by a multidisciplinary team which utilized brainstorming sessions for item generation and content analysis, patient focus groups, and expert panel reviews. This instrument and additional validation measures were administered, via Audio Computer-Assisted Self-Interviewing (ACASI), to 300 stable, medicated outpatients diagnosed with schizophrenia or schizoaffective disorder. Item elimination was based on psychometric properties and Item-Response Theory information functions and characteristic curves. Exploratory factor analysis of the resulting 40-item scale yielded a five factor solution. The five subscales (General Distress, Side Effects, Psychotic Symptoms, Cognitive Symptoms, Sleep) showed robust convergent (β's=0.34-0.75, average β=0.49) and discriminant validity. The PAQ demonstrates feasibility, reliability, and construct validity as a self-report measure of multiple domains pertinent to effectiveness. Future research needs to establish the PAQ's sensitivity to change. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.
Measuring Access to Information and Technology: Environmental Factors Affecting Persons With Neurologic Disorders.

PubMed

Hahn, Elizabeth A; Garcia, Sofia F; Lai, Jin-Shei; Miskovic, Ana; Jerousek, Sara; Semik, Patrick; Wong, Alex; Heinemann, Allen W

2016-08-01

To develop and validate a patient-reported measure of access to information and technology (AIT) for persons with spinal cord injury, stroke, or traumatic brain injury. A mixed-methods approach was used to develop items, refine them through cognitive interviews, and evaluate their psychometric properties. Item responses were evaluated with the Rasch rating scale model. Correlational and analysis-of-variance methods were used to evaluate construct validity. Community-dwelling individuals participated in telephone interviews or traveled to the academic medical centers where this research took place. Individuals with a diagnosis of spinal cord injury, stroke, or traumatic brain injury (aged ≥18y, English speaking) participated in cognitive interviews (n=12 persons), field testing of the items (n=305 persons), and validation testing of the final set of items (n=604 persons). Not applicable. A set of items to measure AIT for people with disabilities. A user-friendly multimedia touchscreen was used for self-administration of the items. A 23-item AIT measure demonstrated good evidence of internal consistency reliability, and content and construct validity. This new AIT measure will enable researchers and clinicians to determine to what extent environmental factors influence health outcomes and social participation in people with disabilities. The AIT measure could also provide disability advocates with more specific and detailed information about environmental factors to lobby for elimination of barriers. Copyright © 2016 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
Evaluating Instrument Quality in Science Education: Rasch-based analyses of a Nature of Science test

NASA Astrophysics Data System (ADS)

Neumann, Irene; Neumann, Knut; Nehm, Ross

2011-07-01

Given the central importance of the Nature of Science (NOS) and Scientific Inquiry (SI) in national and international science standards and science learning, empirical support for the theoretical delineation of these constructs is of considerable significance. Furthermore, tests of the effects of varying magnitudes of NOS knowledge on domain-specific science understanding and belief require the application of instruments validated in accordance with AERA, APA, and NCME assessment standards. Our study explores three interrelated aspects of a recently developed NOS instrument: (1) validity and reliability; (2) instrument dimensionality; and (3) item scales, properties, and qualities within the context of Classical Test Theory and Item Response Theory (Rasch modeling). A construct analysis revealed that the instrument did not match published operationalizations of NOS concepts. Rasch analysis of the original instrument-as well as a reduced item set-indicated that a two-dimensional Rasch model fit significantly better than a one-dimensional model in both cases. Thus, our study revealed that NOS and SI are supported as two separate dimensions, corroborating theoretical distinctions in the literature. To identify items with unacceptable fit values, item quality analyses were used. A Wright Map revealed that few items sufficiently distinguished high performers in the sample and excessive numbers of items were present at the low end of the performance scale. Overall, our study outlines an approach for how Rasch modeling may be used to evaluate and improve Likert-type instruments in science education.
Questionnaire Construction Manual

DTIC Science & Technology

1976-07-01

(2) All questionnaire items should be grammatically correct. (3) All...kept in mind: a. All response alternatives should follow the stem both grammatically and logically, and if possible, be parallel in structure. b
A Music-Related Quality of Life Measure to Guide Music Rehabilitation for Adult Cochlear Implant Users.

PubMed

Dritsakis, Giorgos; van Besouw, Rachel M; Kitterick, Pádraig; Verschuur, Carl A

2017-09-18

A music-related quality of life (MuRQoL) questionnaire was developed for the evaluation of music rehabilitation for adult cochlear implant (CI) users. The present studies were aimed at refinement and validation. Twenty-four experts reviewed the MuRQoL items for face validity. A refined version was completed by 147 adult CI users, and psychometric techniques were used for item selection, assessment of reliability, and definition of the factor structure. The same participants completed the Short Form Health Survey for construct validation. MuRQoL responses from 68 CI users were compared with those of a matched group of adults with normal hearing. Eighteen items measuring music perception and engagement and 18 items measuring their importance were selected; they grouped together into 2 domains. The final questionnaire has high internal consistency and repeatability. Significant differences between CI users and adults with normal hearing and a correlation between music engagement and quality of life support construct validity. Scores of music perception and engagement and importance for the 18 items can be combined to assess the impact of music on the quality of life. The MuRQoL questionnaire is a reliable and valid measure of self-reported music perception, engagement, and their importance for adult CI users with potential to guide music aural rehabilitation.
Category-Specific Neural Oscillations Predict Recall Organization During Memory Search

PubMed Central

Morton, Neal W.; Kahana, Michael J.; Rosenberg, Emily A.; Baltuch, Gordon H.; Litt, Brian; Sharan, Ashwini D.; Sperling, Michael R.; Polyn, Sean M.

2013-01-01

Retrieved-context models of human memory propose that as material is studied, retrieval cues are constructed that allow one to target particular aspects of past experience. We examined the neural predictions of these models by using electrocorticographic/depth recordings and scalp electroencephalography (EEG) to characterize category-specific oscillatory activity, while participants studied and recalled items from distinct, neurally discriminable categories. During study, these category-specific patterns predict whether a studied item will be recalled. In the scalp EEG experiment, category-specific activity during study also predicts whether a given item will be recalled adjacent to other same-category items, consistent with the proposal that a category-specific retrieval cue is used to guide memory search. Retrieved-context models suggest that integrative neural circuitry is involved in the construction and maintenance of the retrieval cue. Consistent with this hypothesis, we observe category-specific patterns that rise in strength as multiple same-category items are studied sequentially, and find that individual differences in this category-specific neural integration during study predict the degree to which a participant will use category information to organize memory search. Finally, we track the deployment of this retrieval cue during memory search: Category-specific patterns are stronger when participants organize their responses according to the category of the studied material. PMID:22875859

Intelligent topical sentiment analysis for the classification of e-learners and their topics of interest.

PubMed

Ravichandran, M; Kulanthaivel, G; Chellatamilan, T

2015-01-01

Every day, huge numbers of instant tweets (messages) are published on Twitter, one of the major social media platforms for e-learner interaction. Learners and teachers discuss options regarding various interesting topics to be studied through the capture of ideal sources in Twitter. The common sentiment towards these topics is captured through the massive number of instant messages about them. In this paper, rather than using the opinion polarity of each message relevant to the topic, the authors focus on sentence-level opinion classification using an unsupervised algorithm named bigram item response theory (BIRT). It differs from traditional classification and document-level classification algorithms. The investigation is threefold: (1) lexicon-based sentiment polarity of tweet messages; (2) the bigram co-occurrence relationship using naïve Bayesian; (3) the bigram item response theory (BIRT) on various topics. A model using item response theory is constructed for topical classification inference. Performance improved remarkably with this bigram item response theory when compared with other supervised algorithms. The experiment was conducted on a real-life dataset containing different sets of tweets and topics.
Psychometric properties of responses by clinicians and older adults to a 6-item Hebrew version of the Hamilton Depression Rating Scale (HAM-D6)

PubMed Central

2013-01-01

Background The Hamilton Depression Rating Scale (HAM-D) is commonly used as a screening instrument, as a continuous measure of change in depressive symptoms over time, and as a means to compare the relative efficacy of treatments. Among several abridged versions, the 6-item HAM-D6 is used most widely, in large part because of its good psychometric properties. The current study compares both self-report and clinician-rated versions of the Hebrew version of this scale. Methods A total of 153 Israelis, 75 years of age on average, participated in this study. The HAM-D6 was examined using confirmatory factor analytic (CFA) models separately for both patient and clinician responses. Results Responses to the HAM-D6 suggest that this instrument measures a unidimensional construct with each of the scale's six items contributing significantly to the measurement. Comparisons between self-report and clinician versions indicate that responses do not significantly differ for 4 of the 6 items. Moreover, 100% sensitivity (and 91% specificity) was found between patient HAM-D6 responses and clinician diagnoses of depression. Conclusion These results indicate that the Hebrew HAM-D6 can be used to measure and screen for depressive symptoms among elderly patients. PMID:23281688
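Sensitivity and specificity, as reported in the record above, compare a screening result against a reference diagnosis. A minimal sketch of the calculation follows; the boolean vectors are invented for illustration, and the function name `sensitivity_specificity` is an assumption, not something used in the study.

```python
def sensitivity_specificity(screen_positive, diagnosed):
    """Return (sensitivity, specificity) from paired boolean lists."""
    pairs = list(zip(screen_positive, diagnosed))
    tp = sum(1 for s, d in pairs if s and d)          # screened positive, diagnosed
    fn = sum(1 for s, d in pairs if not s and d)      # missed cases
    tn = sum(1 for s, d in pairs if not s and not d)  # correctly screened negative
    fp = sum(1 for s, d in pairs if s and not d)      # false alarms
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical screening outcomes versus clinician diagnoses.
screen = [True, True, False, False, True]
diagnosis = [True, True, False, False, False]
sens, spec = sensitivity_specificity(screen, diagnosis)
```

In this toy case every diagnosed person screens positive, so sensitivity is 1.0, while one false alarm among three non-cases gives a specificity of 2/3.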
Internal consistency and validity of a new physical workload questionnaire

PubMed Central

Bot, S; Terwee, C; van der Windt, D A W M; Feleus, A; Bierma-Zeinstra, S; Knol, D; Bouter, L; Dekker, J

2004-01-01

Aims: To examine the dimensionality, internal consistency, and construct validity of a new physical workload questionnaire in employees with musculoskeletal complaints. Methods: Factor analysis was applied to the responses in three study populations with musculoskeletal disorders (n = 406, 300, and 557) on 26 items related to physical workload. The internal consistency of the resulting subscales was examined. It was hypothesised that physical workload would vary among different occupational groups. The occupations of all subjects were classified into four groups on the basis of expected workload (heavy physical load; long lasting postures and repetitive movements; both; no physical load). Construct validity of the subscales created was tested by comparing the subscale scores among these occupational groups. Results: The pattern of the factor loadings of items was almost identical for the three study populations. Two interpretable factors were found: items related to heavy physical workload loaded highly on the first factor, and items related to static postures or repetitive work loaded highly on the second factor. The first constructed subscale "heavy physical work" had a Cronbach's α of 0.92 to 0.93 and the second subscale "long lasting postures and repetitive movements", of 0.86 to 0.87. Six of eight hypotheses regarding the construct validity of the subscales were confirmed. Conclusions: The results support the internal structure, internal consistency, and validity of the new physical workload questionnaire. Testing this questionnaire in non-symptomatic employees and comparing its performance with objective assessments of physical workload are important next steps in the validation process. PMID:15550603
Rasch analysis of the Italian Lower Extremity Functional Scale: insights on dimensionality and suggestions for an improved 15-item version.

PubMed

Bravini, Elisabetta; Giordano, Andrea; Sartorio, Francesco; Ferriero, Giorgio; Vercelli, Stefano

2017-04-01

To investigate dimensionality and the measurement properties of the Italian Lower Extremity Functional Scale using both classical test theory and Rasch analysis methods, and to provide insights for an improved version of the questionnaire. Rasch analysis of individual patient data. Rehabilitation centre. A total of 135 patients with musculoskeletal diseases of the lower limb. Patients were assessed with the Lower Extremity Functional Scale before and after the rehabilitation. Rasch analysis showed some problems related to rating scale category functioning, item fit, and item redundancy. After an iterative process, which resulted in the reduction of rating scale categories from 5 to 4, and in the deletion of 5 items, the psychometric properties of the Italian Lower Extremity Functional Scale improved. The retained 15 items with a 4-level response format fitted the Rasch model (internal construct validity), and demonstrated unidimensionality and good reliability indices (person-separation reliability 0.92; Cronbach's alpha 0.94). Then, the analysis showed differential item functioning for six of the retained items. The sensitivity to change of the Italian 15-item Lower Extremity Functional Scale was nearly equal to that of the original version (effect size: 0.93 and 0.98; standardized response mean: 1.20 and 1.28, respectively for the 15-item and 20-item versions). The Italian Lower Extremity Functional Scale had unsatisfactory measurement properties. However, removing five items and simplifying the scoring from 5 to 4 levels resulted in a more valid measure with good reliability and sensitivity to change.
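The two sensitivity-to-change indices named in the record above are commonly defined as the mean change divided by the baseline standard deviation (effect size) and the mean change divided by the standard deviation of the change scores (standardized response mean). The sketch below uses those common definitions; whether the authors used exactly these variants is an assumption, and the scores are invented.

```python
from statistics import mean, stdev

def effect_size(before, after):
    """Mean change divided by the standard deviation of baseline scores."""
    change = [a - b for a, b in zip(after, before)]
    return mean(change) / stdev(before)

def standardized_response_mean(before, after):
    """Mean change divided by the standard deviation of the change scores."""
    change = [a - b for a, b in zip(after, before)]
    return mean(change) / stdev(change)

# Hypothetical pre- and post-rehabilitation scores for three patients.
pre = [10, 20, 30]
post = [20, 25, 45]
es = effect_size(pre, post)                   # mean change 10 / baseline SD 10
srm = standardized_response_mean(pre, post)   # mean change 10 / change SD 5
```

The two indices diverge when change is consistent across patients (small change SD) but baseline scores are spread out, which is why both are often reported together.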
Psychometric assessment of the IBS-D Daily Symptom Diary and Symptom Event Log.

PubMed

Rosa, Kathleen; Delgado-Herrera, Leticia; Zeiher, Bernie; Banderas, Benjamin; Arbuckle, Rob; Spears, Glen; Hudgens, Stacie

2016-12-01

Diarrhea-predominant irritable bowel syndrome (IBS-D) can considerably impact patients' lives. Patient-reported symptoms are crucial in understanding the diagnosis and progression of IBS-D. This study psychometrically evaluates the newly developed IBS-D Daily Symptom Diary and Symptom Event Log (hereafter, "Event Log") according to US regulatory recommendations. A US-based observational field study was conducted to understand cross-sectional psychometric properties of the IBS-D Daily Symptom Diary and Event Log. Analyses included item descriptive statistics, item-to-item correlations, reliability, and construct validity. The IBS-D Daily Symptom Diary and Event Log had no items with excessive missing data. With the exception of two items ("frequency of gas" and "accidents"), moderate to high inter-item correlations were observed among all items of the IBS-D Daily Symptom Diary and Event Log (day 1 range 0.67-0.90). Item scores demonstrated reliability, with the exception of the "frequency of gas" and "accidents" items of the Diary and "incomplete evacuation" item of the Event Log. The pattern of correlations of the IBS-D Daily Symptom Diary and Event Log item scores with generic and disease-specific measures was as expected, moderate for similar constructs and low for dissimilar constructs, supporting construct validity. Known-groups methods showed statistically significant differences and monotonic trends in each of the IBS-D Daily Symptom Diary item scores among groups defined by patients' IBS-D severity ratings ("none"/"mild," "moderate," or "severe"/"very severe"), supporting construct validity. Initial psychometric results support the reliability and validity of the items of the IBS-D Daily Symptom Diary and Event Log.
A multi-level differential item functioning analysis of trends in international mathematics and science study: Potential sources of gender and minority difference among U.S. eighth graders' science achievement

NASA Astrophysics Data System (ADS)

Qian, Xiaoyu

Science is an area where a large achievement gap has been observed between White and minority, and between male and female students. The science minority gap has continued as indicated by the National Assessment of Educational Progress and the Trends in International Mathematics and Science Studies (TIMSS). TIMSS also shows a gender gap favoring males emerging at the eighth grade. Both gaps continue to be wider in the number of doctoral degrees and full professorships awarded (NSF, 2008). The current study investigated both minority and gender achievement gaps in science utilizing a multi-level differential item functioning (DIF) methodology (Kamata, 2001) within a fully Bayesian framework. All dichotomously coded items from the TIMSS 2007 science assessment at eighth grade were analyzed. Both gender DIF and minority DIF were studied. Multi-level models were employed to identify DIF items and sources of DIF at both student and teacher levels. The study found that several student variables were potential sources of achievement gaps. It was also found that gender DIF favoring male students was more noticeable in the content areas of physics and earth science than biology and chemistry. In terms of item type, the majority of these gender DIF items were multiple choice rather than constructed response items. Female students also performed less well on items requiring visual-spatial ability. Minority students performed significantly worse on physics and earth science items as well. A higher percentage of minority DIF items in earth science and biology were constructed response than multiple choice items, indicating that literacy may be the cause of minority DIF. Three-level model results suggested that some teacher variables may be the cause of DIF variations from teacher to teacher.
It is essential for both middle school science teachers and science educators to find instructional methods that work more effectively to improve science achievement of both female and minority students. Physics and earth science are two areas to be improved for both groups. Curriculum and instruction need to enhance female students' learning interests and give them opportunities to improve their visual perception skills. Science instruction should address improving minority students' literacy skills while teaching science.
Validating Measurement of Knowledge Integration in Science Using Multiple-Choice and Explanation Items

ERIC Educational Resources Information Center

Lee, Hee-Sun; Liu, Ou Lydia; Linn, Marcia C.

2011-01-01

This study explores measurement of a construct called knowledge integration in science using multiple-choice and explanation items. We use construct and instructional validity evidence to examine the role multiple-choice and explanation items play in measuring students' knowledge integration ability. For construct validity, we analyze item…
The Multidimensional Assessment of Interoceptive Awareness (MAIA)

PubMed Central

Mehling, Wolf E.; Price, Cynthia; Daubenmier, Jennifer J.; Acree, Mike; Bartmess, Elizabeth; Stewart, Anita

2012-01-01

This paper describes the development of a multidimensional self-report measure of interoceptive body awareness. The systematic mixed-methods process involved reviewing the current literature, specifying a multidimensional conceptual framework, evaluating prior instruments, developing items, and analyzing focus group responses to scale items by instructors and patients of body awareness-enhancing therapies. Following refinement by cognitive testing, items were field-tested in students and instructors of mind-body approaches. Final item selection was achieved by submitting the field test data to an iterative process using multiple validation methods, including exploratory cluster and confirmatory factor analyses, comparison between known groups, and correlations with established measures of related constructs. The resulting 32-item multidimensional instrument assesses eight concepts. The psychometric properties of these final scales suggest that the Multidimensional Assessment of Interoceptive Awareness (MAIA) may serve as a starting point for research and further collaborative refinement. PMID:23133619
Calibrating well-being, quality of life and common mental disorder items: psychometric epidemiology in public mental health research.

PubMed

Böhnke, Jan R; Croudace, Tim J

2016-08-01

The assessment of 'general health and well-being' in public mental health research stimulates debates around relative merits of questionnaire instruments and their items. Little evidence regarding alignment or differential advantages of instruments or items has appeared to date. Population-based psychometric study of items employed in public mental health narratives. Multidimensional item response theory was applied to General Health Questionnaire (GHQ-12), Warwick-Edinburgh Mental Well-being Scale (WEMWBS) and EQ-5D items (Health Survey for England, 2010-2012; n = 19 290). A bifactor model provided the best account of the data and showed that the GHQ-12 and WEMWBS items assess mainly the same construct. Only one item of the EQ-5D showed relevant overlap with this dimension (anxiety/depression). Findings were corroborated by comparisons with alternative models and cross-validation analyses. The consequences of this lack of differentiation (GHQ-12 v. WEMWBS) for mental health and well-being narratives deserve discussion to enrich debates on priorities in public mental health and its assessment. © The Royal College of Psychiatrists 2015.
Comparing five depression measures in depressed Chinese patients using item response theory: an examination of item properties, measurement precision and score comparability.

PubMed

Zhao, Yue; Chan, Wai; Lo, Barbara Chuen Yee

2017-04-04

Item response theory (IRT) has been increasingly applied to patient-reported outcome (PRO) measures. The purpose of this study is to apply IRT to examine item properties (discrimination and severity of depressive symptoms), measurement precision and score comparability across five depression measures, which is the first study of its kind in the Chinese context. A clinical sample of 207 Hong Kong Chinese outpatients was recruited. Data analyses were performed including classical item analysis, IRT concurrent calibration and IRT true score equating. The IRT assumptions of unidimensionality and local independence were tested respectively using confirmatory factor analysis and chi-square statistics. The IRT linking assumptions of construct similarity, equity and subgroup invariance were also tested. The graded response model was applied to concurrently calibrate all five depression measures in a single IRT run, resulting in the item parameter estimates of these measures being placed onto a single common metric. IRT true score equating was implemented to perform the outcome score linking and construct score concordances so as to link scores from one measure to corresponding scores on another measure for direct comparability. Findings suggested that (a) symptoms on depressed mood, suicidality and feeling of worthlessness served as the strongest discriminating indicators, and symptoms concerning suicidality, changes in appetite, depressed mood, feeling of worthlessness and psychomotor agitation or retardation reflected high levels of severity in the clinical sample. (b) The five depression measures contributed to various degrees of measurement precision at varied levels of depression. (c) After outcome score linking was performed across the five measures, the cut-off scores led to either consistent or discrepant diagnoses for depression. 
The study provides additional evidence regarding the psychometric properties and clinical utility of the five depression measures, offers methodological contributions to the appropriate use of IRT in PRO measures, and helps elucidate cultural variation in depressive symptomatology. The approach of concurrently calibrating and linking multiple PRO measures can be applied to the assessment of PROs other than the depression context.
[KON-2006--Neurotic Personality Questionnaire].

PubMed

Aleksandrowicz, Jerzy W; Klasa, Katarzyna; Sobański, Jerzy A; Stolarska, Dorota

2007-01-01

Construction of a questionnaire describing personality traits connected to the occurrence and persistence of neurotic disorders. Responses of 794 patients (before treatment) and 520 persons from the control group on items of the constructed personality questionnaire and the symptom checklist "0". Analyses of subscales reliability and item-scale correlations, test-retest and split-half reliability. Factor analyses estimating internal reliability of the questionnaire. Cross-validation with the KO"0" symptom checklist. Psychometric properties of the KON-2006 questionnaire indicate that it is consistent and reliable enough. Validity analyses indicate a large probability that the X-KON coefficient informs on personality dysfunctions related to neurotic disorders. The Neurotic Personality Questionnaire KON-2006 may serve to estimate personality traits connected to the occurrence and persistence of neurotic disorders as well as changes resulting from psychotherapy.
Computer-adaptive test to measure community reintegration of Veterans.

PubMed

Resnik, Linda; Tian, Feng; Ni, Pengsheng; Jette, Alan

2012-01-01

The Community Reintegration of Injured Service Members (CRIS) measure consists of three scales measuring extent of, perceived limitations in, and satisfaction with community reintegration. Length of the CRIS may be a barrier to its widespread use. Using item response theory (IRT) and computer-adaptive test (CAT) methodologies, this study developed and evaluated a briefer community reintegration measure called the CRIS-CAT. Large item banks for each CRIS scale were constructed. A convenience sample of 517 Veterans responded to all items. Exploratory and confirmatory factor analyses (CFAs) were used to identify the dimensionality within each domain, and IRT methods were used to calibrate items. Accuracy and precision of CATs of different lengths were compared with the full-item bank, and data were examined for differential item functioning (DIF). CFAs supported unidimensionality of scales. Acceptable item fit statistics were found for final models. Accuracy of 10-, 15-, 20-, and variable-item CATs for all three scales was 0.88 or above. CAT precision increased with number of items administered and decreased at the upper ranges of each scale. Three items exhibited moderate DIF by sex. The CRIS-CAT demonstrated promising measurement properties and is recommended for use in community reintegration assessment.
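The link between CAT length and precision reported above can be sketched as follows. This is a simplified illustration, not the CRIS-CAT procedure: it assumes dichotomous two-parameter logistic items, a maximum-information selection rule, and an invented five-item bank, none of which is stated in the abstract.

```python
import math


def item_information(theta, discrimination, difficulty):
    """Fisher information of a 2PL item at latent level theta."""
    p = 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))
    return discrimination ** 2 * p * (1.0 - p)


def standard_error(theta, items):
    """Standard error of the theta estimate: 1 / sqrt(total information)."""
    total = sum(item_information(theta, a, b) for a, b in items)
    return 1.0 / math.sqrt(total)


def standard_errors_by_length(theta, bank):
    """Standard error after administering the 1, 2, ..., n most informative items."""
    ranked = sorted(bank, key=lambda item: item_information(theta, *item),
                    reverse=True)
    return [standard_error(theta, ranked[:n]) for n in range(1, len(ranked) + 1)]


# Hypothetical bank of (discrimination, difficulty) pairs.
hypothetical_bank = [(1.6, -1.0), (1.2, -0.5), (2.0, 0.0), (1.4, 0.4), (1.8, 1.0)]

errors = standard_errors_by_length(0.0, hypothetical_bank)
print([round(e, 3) for e in errors])
```

Each added item contributes non-negative information, so the standard error can only shrink as the test lengthens. When few items are located near the respondent's level, as at the upper range of this invented bank, total information is low and precision drops.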
Experimentally Manipulating Items Informs on the (Limited) Construct and Criterion Validity of the Humor Styles Questionnaire

PubMed Central

Ruch, Willibald; Heintz, Sonja

2017-01-01

How strongly does humor (i.e., the construct-relevant content) in the Humor Styles Questionnaire (HSQ; Martin et al., 2003) determine the responses to this measure (i.e., construct validity)? Also, how much does humor influence the relationships of the four HSQ scales, namely affiliative, self-enhancing, aggressive, and self-defeating, with personality traits and subjective well-being (i.e., criterion validity)? The present paper answers these two questions by experimentally manipulating the 32 items of the HSQ to only (or mostly) contain humor (i.e., construct-relevant content) or to substitute the humor content with non-humorous alternatives (i.e., only assessing construct-irrelevant context). Study 1 (N = 187) showed that the HSQ affiliative scale was mainly determined by humor, self-enhancing and aggressive were determined by both humor and non-humorous context, and self-defeating was primarily determined by the context. This suggests that humor is not the primary source of the variance in three of the HSQ scales, thereby limiting their construct validity. Study 2 (N = 261) showed that the relationships of the HSQ scales to the Big Five personality traits and subjective well-being (positive affect, negative affect, and life satisfaction) were consistently reduced (personality) or vanished (subjective well-being) when the non-humorous contexts in the HSQ items were controlled for. For the HSQ self-defeating scale, the pattern of relationships to personality was also altered, supporting a positive rather than a negative view of the humor in this humor style. The present findings thus call for a reevaluation of the role that humor plays in the HSQ (construct validity) and in the relationships to personality and well-being (criterion validity). PMID:28473794
Validation of the Middlesex Elderly Assessment of Mental State (MEAMS) as a cognitive screening test in patients with acquired brain injury in Turkey.

PubMed

Kutlay, Sehim; Kuçukdeveci, Ayse A; Elhan, Atilla H; Yavuzer, Gunes; Tennant, Alan

2007-02-28

Assessment of cognitive impairment with a valid cognitive screening tool is essential in neurorehabilitation. The aim of this study was to test the reliability and validity of the Turkish-adapted version of the Middlesex Elderly Assessment of Mental State (MEAMS) among acquired brain injury patients in Turkey. Some 155 patients with acquired brain injury admitted for rehabilitation were assessed by the adapted version of MEAMS at admission and discharge. Reliability was tested by internal consistency, intra-class correlation coefficient (ICC) and person separation index; internal construct validity by Rasch analysis; external construct validity by associations with physical and cognitive disability (FIM); and responsiveness by Effect Size. Reliability was found to be good with Cronbach's alpha of 0.82 at both admission and discharge; and likewise an ICC of 0.80. Person separation index was 0.813. Internal construct validity was good by fit of the data to the Rasch model (mean item fit -0.178; SD 1.019). Items were substantially free of differential item functioning. External construct validity was confirmed by expected associations with physical and cognitive disability. Effect size was 0.42 compared with 0.22 for cognitive FIM. The reliability and validity of the Turkish version of MEAMS as a cognitive impairment screening tool in acquired brain injury has been demonstrated.
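For readers unfamiliar with the internal consistency statistic reported above, the following minimal sketch computes Cronbach's alpha from a respondents-by-items score table. The function name and the score matrix are invented for illustration and are not MEAMS data.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for `scores`, a list of respondents' item-score lists."""
    n_items = len(scores[0])
    # Sample variance of each item across respondents.
    item_variances = [variance([row[i] for row in scores]) for i in range(n_items)]
    # Sample variance of the total (summed) score.
    total_variance = variance([sum(row) for row in scores])
    return n_items / (n_items - 1) * (1.0 - sum(item_variances) / total_variance)


# Hypothetical responses: five respondents, three dichotomous items.
hypothetical_scores = [
    [1, 1, 0],
    [1, 0, 1],
    [0, 0, 0],
    [1, 1, 1],
    [0, 1, 0],
]

alpha = cronbach_alpha(hypothetical_scores)
print(round(alpha, 3))
```

Alpha approaches 1 when the items rise and fall together across respondents, so that most of the total-score variance comes from the covariance between items.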
Evaluation of diagnostic criteria for panic attack using item response theory: findings from the National Comorbidity Survey in USA.

PubMed

Ietsugu, Tetsuji; Sukigara, Masune; Furukawa, Toshiaki A

2007-12-01

The dichotomous diagnostic systems such as the Diagnostic and Statistical Manual of Mental Disorders (DSM) and International Classification of Diseases (ICD) lose much important information concerning what each symptom can offer. This study explored the characteristics and performances of DSM-IV and ICD-10 diagnostic criteria items for panic attack using modern item response theory (IRT). The National Comorbidity Survey used the Composite International Diagnostic Interview to assess 14 DSM-IV and ICD-10 panic attack diagnostic criteria items in the general population in the USA. The dimensionality and measurement properties of these items were evaluated using dichotomous factor analysis and the two-parameter IRT model. A total of 1213 respondents reported at least one subsyndromal or syndromal panic attack in their lifetime. Factor analysis indicated that all items constitute a unidimensional construct. The two-parameter IRT model produced meaningful and interpretable results. Among items with high discrimination parameters, the difficulty parameter for "palpitation" was relatively low, while those for "choking," "fear of dying" and "paresthesia" were relatively high. Several items including "dry mouth" and "fear of losing control" had low discrimination parameters. The item characteristics of diagnostic criteria among help-seeking clinical populations may be different from those that we observed in the general population and deserve further examination. "Paresthesia," "choking" and "fear of dying" can be thought to be good indicators of severe panic attacks, while "palpitation" can discriminate well between cases and non-cases at low level of panic attack severity. Items such as "dry mouth" would contribute less to the discrimination.
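The roles of the discrimination and difficulty parameters described above can be sketched with the two-parameter logistic item response function. The parameter values below are hypothetical and only mimic the qualitative pattern in the abstract (a low-difficulty item, a high-difficulty item, and a weakly discriminating item); they are not the study's estimates.

```python
import math


def two_pl_probability(theta, discrimination, difficulty):
    """Probability of endorsing an item under the two-parameter logistic model."""
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))


# Hypothetical (discrimination, difficulty) pairs.
hypothetical_items = {
    "low-difficulty item": (1.8, -0.5),
    "high-difficulty item": (1.8, 1.5),
    "low-discrimination item": (0.5, 0.5),
}

for name, (a, b) in hypothetical_items.items():
    # Endorsement probability across a range of latent severity levels.
    curve = [round(two_pl_probability(theta, a, b), 2) for theta in (-2, -1, 0, 1, 2)]
    print(name, curve)
```

An item's difficulty is the severity level at which endorsement probability reaches 0.5, so a high-difficulty item is endorsed mostly by severe cases. Discrimination controls how steeply the probability rises around that point, which is why a weakly discriminating item separates respondents poorly at any severity level.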
The development and psychometric validation of the Ethical Awareness Scale.

PubMed

Milliken, Aimee; Ludlow, Larry; DeSanto-Madeya, Susan; Grace, Pamela

2018-04-19

To develop and psychometrically assess the Ethical Awareness Scale using Rasch measurement principles and a Rasch item response theory model. Critical care nurses must be equipped to provide good (ethical) patient care. This requires ethical awareness, which involves recognizing the ethical implications of all nursing actions. Ethical awareness is imperative in successfully addressing patient needs. Evidence suggests that the ethical import of everyday issues may often go unnoticed by nurses in practice. Assessing nurses' ethical awareness is a necessary first step in preparing nurses to identify and manage ethical issues in the highly dynamic critical care environment. A cross-sectional design was used in two phases of instrument development. Using Rasch principles, an item bank representing nursing actions was developed (33 items). Content validity testing was performed. Eighteen items were selected for face validity testing. Two rounds of operational testing were performed with critical care nurses in Boston between February-April 2017. A Rasch analysis suggests sufficient item invariance across samples and sufficient construct validity. The analysis further demonstrates a progression of items uniformly along a hierarchical continuum; items that match respondent ability levels; response categories that are sufficiently used; and adequate internal consistency. Mean ethical awareness scores were in the low/moderate range. The results suggest the Ethical Awareness Scale is a psychometrically sound, reliable and valid measure of ethical awareness in critical care nurses. © 2018 John Wiley & Sons Ltd.
A longitudinal evaluation of the Center for Epidemiologic Studies-Depression scale (CES-D) in a Rheumatoid Arthritis Population using Rasch Analysis

PubMed Central

Covic, Tanya; Pallant, Julie F; Conaghan, Philip G; Tennant, Alan

2007-01-01

Background: The aim of this study was to test the internal validity of the total Center for Epidemiologic Studies-Depression (CES-D) scale using Rasch analysis in a rheumatoid arthritis (RA) population. Methods: CES-D was administered to 157 patients with RA over three time points within a 12 month period. Rasch analysis was applied using RUMM2020 software to assess the overall fit of the model, the response scale used, individual item fit, differential item functioning (DIF) and person separation. Results: Pooled data across three time points was shown to fit the Rasch model with removal of seven items from the original 20-item CES-D scale. It was necessary to rescore the response format from four to three categories in order to improve the scale's fit. Two items demonstrated some DIF for age and gender but were retained within the 13-item CES-D scale. A new cut point for depression score of 9 was found to correspond to the original cut point score of 16 in the full CES-D scale. Conclusion: This Rasch analysis of the CES-D in a longstanding RA cohort resulted in the construction of a modified 13-item scale with good internal validity. Further validation of the modified scale is recommended particularly in relation to the new cut point for depression. PMID:17629902
Role of Cognitive Testing in the Development of the CAHPS® Hospital Survey

PubMed Central

Levine, Roger E; Fowler, Floyd J; Brown, Julie A

2005-01-01

Objective: To describe how cognitive testing results were used to inform the modification and selection of items for the Consumer Assessment of Health Providers and Systems (CAHPS®) Hospital Survey pilot test instrument. Data Sources: Cognitive interviews were conducted on 31 subjects in two rounds of testing: in December 2002–January 2003 and in February 2003. In both rounds, interviews were conducted in northern California, southern California, Massachusetts, and North Carolina. Study Design: A common protocol served as the basis for cognitive testing activities in each round. This protocol was modified to enable testing of the items as interviewer-administered and self-administered items and to allow members of each of three research teams to use their preferred cognitive research tools. Data Collection/Extraction Methods: Each research team independently summarized, documented, and reported their findings. Item-specific and general issues were noted. The results were reviewed and discussed by senior staff from each research team after each round of testing, to inform the acceptance, modification, or elimination of candidate items. Principal Findings: Many candidate items required modification because respondents lacked the information required to answer them, respondents failed to understand them consistently, the items were not measuring the constructs they were intended to measure, the items were based on erroneous assumptions about what respondents wanted or experienced during their hospitalization, or the items were asking respondents to make distinctions that were too fine for them to make. Cognitive interviewing enabled the detection of these problems; an understanding of the etiology of the problem informed item revisions. However, for some constructs, the revisions proved to be inadequate. 
Accordingly, items could not be developed to provide acceptable measures of certain constructs such as shared decision making, coordination of care, and delays in the admissions process. Conclusions: Cognitive testing is the most direct way of finding out whether respondents understand questions consistently, have the information needed to answer the questions, and can use the response alternatives provided to describe their experiences or their opinions accurately. Many of the candidate questions failed to meet these standards. Cognitive testing only evaluates the way in which respondents understand and answer questions. Although it does not directly assess the validity of the answers, it is a reasonable premise that cognitive problems will seriously compromise validity and reliability. PMID:16316437
An item response theory analysis of nicotine dependence symptoms in recent onset adolescent smokers.

PubMed

Rose, Jennifer S; Dierker, Lisa C

2010-07-01

Given absence of a "gold standard" for measuring self-reported nicotine dependence, particularly among less experienced smokers, there is a need to evaluate existing measures to determine how well symptoms measure the underlying nicotine dependence construct and whether symptoms function differently for less experienced smokers. Study aims were to determine (1) likelihood of endorsement of individual symptoms at different levels of a nicotine dependence construct and the ability of symptoms to discriminate between different levels of this construct and (2) whether these symptom properties varied between nondaily and daily smokers. We used multiple group item response theory analysis to evaluate nicotine dependence symptoms from the nicotine dependence syndrome scale based on a nationally representative sample of 8081 recent onset adolescent smokers from the national surveys on drug use and health. After controlling for age, gender, smoking quantity and length of smoking exposure, symptoms assessing tolerance were invariant across nondaily and daily smokers, and discriminated well between levels of the nicotine dependence construct. However, the majority of symptoms functioned differently for nondaily and daily smokers. These symptoms did not discriminate as well between levels of the nicotine dependence construct and were more likely to be endorsed at lower levels of this construct for daily smokers. A measure that encompasses a range of symptoms tapping different aspects of smoking may be ideally suited for nondaily adolescent smokers, while an ideal measure of nicotine dependence for daily smokers might also include more core diagnostic features of nicotine dependence such as withdrawal and tolerance. Copyright 2010 Elsevier Ireland Ltd. All rights reserved.
RESEARCH ON ROBUST METHODS FOR EXTRACTING AND RECOGNIZING PHOTOGRAPHY MANAGEMENT ITEMS FROM VARIOUS IMAGE DATA OF CONSTRUCTION

NASA Astrophysics Data System (ADS)

Kitagawa, Etsuji; Tanaka, Shigenori; Abiko, Satoshi; Wakabayashi, Katsuma; Jiang, Wenyuan

Recently, the Ministry of Land, Infrastructure, Transport and Tourism has carried out electronic delivery of various documents in the construction field. One such document type is the image data of construction photographs, which must be delivered together with photography management items such as the construction name and the type of work. However, transcribing the contents of these items from the printed and handwritten characters on the blackboard in each image is costly. In this research, we develop a system that extracts the contents of these items from construction photographs taken in various scenes by preprocessing the image, recognizing characters with OCR, and correcting errors with natural language processing. We confirm the effectiveness of the system through experiments on each function and on the system as a whole.

Rasch Analysis of the Adult Strabismus Quality of Life Questionnaire (AS-20) among Chinese Adult Patients with Strabismus.

PubMed

Wang, Zonghua; Zhou, Juan; Luo, Xingli; Xu, Yan; She, Xi; Chen, Ling; Yin, Honghua; Wang, Xianyuan

2015-01-01

The impact of strabismus on visual function, self-image, self-esteem, and social interactions decreases health-related quality of life (HRQoL). The purpose of this study was to evaluate and refine the adult strabismus quality of life questionnaire (AS-20) by using Rasch analysis among Chinese adult patients with strabismus. We evaluated the fit of the AS-20 to the Rasch model in a Chinese population by assessing unidimensionality, infit and outfit, person and item separation index and reliability, response ordering, targeting and differential item functioning (DIF). The overall AS-20 did not demonstrate unidimensionality; however, it was achieved separately in the two Rasch-revised subscales: the psychosocial subscale (11 items) and the function subscale (9 items). The features of good targeting, optimal item infit and outfit, and no notable local dependence were found for each of the subscales. The rating scale was appropriate for the psychosocial subscale but a reduction to four response categories was required for the function subscale. No significant DIF was revealed for any demographic or clinical factors (e.g., age, gender, and strabismus types). The AS-20 was demonstrated by Rasch analysis to be a rigorous instrument for measuring health-related quality of life in Chinese strabismus patients if some revisions were made regarding the subscale construct and response options.
Adaptive Testing without IRT.

ERIC Educational Resources Information Center

Yan, Duanli; Lewis, Charles; Stocking, Martha

It is unrealistic to suppose that standard item response theory (IRT) models will be appropriate for all new and currently considered computer-based tests. In addition to developing new models, researchers will need to give some attention to the possibility of constructing and analyzing new tests without the aid of strong models. Computerized…
Procrastination Revisited: The Constructive Use of Delayed Response.

ERIC Educational Resources Information Center

Subotnik, Rena F.; And Others

This study investigated patterns of procrastination in the domains of health, relationships, employment, and creative outlets in 19 former Westinghouse Science Talent Search winners, age 32 years. A model was synthesized from the available literature and an interview schedule of 14 open-ended items was developed to elicit self-assessments of…
Investigating Psychometric Isomorphism for Traditional and Performance-Based Assessment

ERIC Educational Resources Information Center

Fay, Derek M.; Levy, Roy; Mehta, Vandhana

2018-01-01

A common practice in educational assessment is to construct multiple forms of an assessment that consists of tasks with similar psychometric properties. This study utilizes a Bayesian multilevel item response model and descriptive graphical representations to evaluate the psychometric similarity of variations of the same task. These approaches for…
Civic Engagement in College Students: Connections between Involvement and Attitudes

ERIC Educational Resources Information Center

O'Leary, Lisa S.

2014-01-01

This chapter describes how canonical correlation was used in conjunction with an item response theory model to address the relationship between college students' civic engagement involvement and attitudes as undergraduates. The constructs of interest were students' participation in civic, political, and expressive activities, as well as…
Comparison of Automated Scoring Methods for a Computerized Performance Assessment of Clinical Judgment

ERIC Educational Resources Information Center

Harik, Polina; Baldwin, Peter; Clauser, Brian

2013-01-01

Growing reliance on complex constructed response items has generated considerable interest in automated scoring solutions. Many of these solutions are described in the literature; however, relatively few studies have been published that "compare" automated scoring strategies. Here, comparisons are made among five strategies for…
24 CFR 570.207 - Ineligible activities.

Code of Federal Regulations, 2014 CFR

2014-04-01

... to carry out the regular responsibilities of the unit of general local government are not eligible... construction equipment for use as part of a solid waste disposal facility is eligible under § 570.201(c). (ii... grant payments made to an individual or family for items such as food, clothing, housing (rent or...
24 CFR 570.207 - Ineligible activities.

Code of Federal Regulations, 2012 CFR

2012-04-01

... to carry out the regular responsibilities of the unit of general local government are not eligible... construction equipment for use as part of a solid waste disposal facility is eligible under § 570.201(c). (ii... grant payments made to an individual or family for items such as food, clothing, housing (rent or...
24 CFR 570.207 - Ineligible activities.

Code of Federal Regulations, 2013 CFR

2013-04-01

... to carry out the regular responsibilities of the unit of general local government are not eligible... construction equipment for use as part of a solid waste disposal facility is eligible under § 570.201(c). (ii... grant payments made to an individual or family for items such as food, clothing, housing (rent or...
Rewards of bridging the divide between measurement and clinical theory: demonstration of a bifactor model for the Brief Symptom Inventory.

PubMed

Thomas, Michael L

2012-03-01

There is growing evidence that psychiatric disorders maintain hierarchical associations where general and domain-specific factors play prominent roles (see D. Watson, 2005). Standard, unidimensional measurement models can fail to capture the meaningful nuances of such complex latent variable structures. The present study examined the ability of the multidimensional item response theory bifactor model (see R. D. Gibbons & D. R. Hedeker, 1992) to improve construct validity by serving as a bridge between measurement and clinical theories. Archival data consisting of 688 outpatients' psychiatric diagnoses and item-level responses to the Brief Symptom Inventory (BSI; L. R. Derogatis, 1993) were extracted from files at a university mental health clinic. The bifactor model demonstrated superior fit for the internal structure of the BSI and improved overall diagnostic accuracy in the sample (73%) compared with unidimensional (61%) and oblique simple structure (65%) models. Consistent with clinical theory, multiple sources of item variance were drawn from individual test items. Test developers and clinical researchers are encouraged to consider model-based measurement in the assessment of psychiatric distress.
An item response theory analysis of the Psychological Inventory of Criminal Thinking Styles: comparing male and female probationers and prisoners.

PubMed

Walters, Glenn D

2014-09-01

An item response theory (IRT) analysis of the Psychological Inventory of Criminal Thinking Styles (PICTS) was performed on 26,831 (19,067 male and 7,764 female) federal probationers and compared with results obtained on 3,266 (3,039 male and 227 female) prisoners from previous research. Despite the fact male and female federal probationers scored significantly lower on the PICTS thinking style scales than male and female prisoners, discrimination and location parameter estimates for the individual PICTS items were comparable across sex and setting. Consistent with the results of a previous IRT analysis conducted on the PICTS, the current results did not support sentimentality as a component of general criminal thinking. Findings from this study indicate that the discriminative power of the individual PICTS items is relatively stable across sex (male, female) and correctional setting (probation, prison) and that the PICTS may be measuring the same criminal thinking construct in male and female probationers and prisoners. PsycINFO Database Record (c) 2014 APA, all rights reserved.
What is the Ability Emotional Intelligence Test (MSCEIT) good for? An evaluation using item response theory.

PubMed

Fiori, Marina; Antonietti, Jean-Philippe; Mikolajczak, Moira; Luminet, Olivier; Hansenne, Michel; Rossier, Jérôme

2014-01-01

The ability approach has been indicated as promising for advancing research in emotional intelligence (EI). However, there is scarcity of tests measuring EI as a form of intelligence. The Mayer Salovey Caruso Emotional Intelligence Test, or MSCEIT, is among the few available and the most widespread measure of EI as an ability. This implies that conclusions about the value of EI as a meaningful construct and about its utility in predicting various outcomes mainly rely on the properties of this test. We tested whether individuals who have the highest probability of choosing the most correct response on any item of the test are also those who have the strongest EI ability. Results showed that this is not the case for most items: The answer indicated by experts as the most correct in several cases was not associated with the highest ability; furthermore, items appeared too easy to challenge individuals high in EI. Overall results suggest that the MSCEIT is best suited to discriminate persons at the low end of the trait. Results are discussed in light of applied and theoretical considerations.
Do Self Concept Tests Test Self Concept? An Evaluation of the Validity of Items on the Piers Harris and Coopersmith Measures.

ERIC Educational Resources Information Center

Lynch, Mervin D.; Chaves, John

Items from Piers-Harris and Coopersmith self-concept tests were evaluated against independent measures on three self-constructs, idealized, empathic, and worth. Construct measurements were obtained with the semantic differential and D statistic. Ratings were obtained from 381 children, grades 4-6. For each test, item ratings and construct measures…
Perceptions of team members working in cleft services in the United Kingdom: a pilot study.

PubMed

Scott, Julia K; Leary, Sam D; Ness, Andy R; Sandy, Jonathan R; Persson, Martin; Kilpatrick, Nicky; Waylen, Andrea E

2015-01-01

Cleft care provision in the United Kingdom has been centralized over the past 15 years to improve outcomes for children born with cleft lip and palate. However, to date, there have been no investigations to examine how well these multidisciplinary teams are performing. In this pilot study, a cross-sectional questionnaire surveyed members of all health care specialties working to provide cleft care in 11 services across the United Kingdom. Team members were asked to complete the Team Work Assessment (TWA) to investigate perceptions of team working in cleft services. The TWA comprises 55 items measuring seven constructs: team foundation, function, performance and skills, team climate and atmosphere, team leadership, and team identity; individual constructs were also aggregated to provide an overall TWA score. Items were measured using five-point Likert-type scales and were converted into percentage agreement for analysis. Responses were received from members of every cleft team. Ninety-nine of 138 cleft team questionnaires (71.7%) were returned and analyzed. The median (interquartile range) percentage of maximum possible score across teams was 75.5% (70.8, 88.2) for the sum of all items. Team performance and team identity were viewed most positively, with 82.0% (75.0, 88.2) and 88.4% (82.2, 91.4), respectively. Team foundation and leadership were viewed least positively with 79.0% (72.6, 84.6) and 76.6% (70.6, 85.4), respectively. Cleft team members perceive that their teams work well, but there are variations in response according to construct.
Simple construct evaluation with latent class analysis: An investigation of Facebook addiction and the development of a short form of the Facebook Addiction Test (F-AT).

PubMed

Dantlgraber, Michael; Wetzel, Eunike; Schützenberger, Petra; Stieger, Stefan; Reips, Ulf-Dietrich

2016-09-01

In psychological research, there is a growing interest in using latent class analysis (LCA) for the investigation of quantitative constructs. The aim of this study is to illustrate how LCA can be applied to gain insights on a construct and to select items during test development. We show the added benefits of LCA beyond factor-analytic methods, namely being able (1) to describe groups of participants that differ in their response patterns, (2) to determine appropriate cutoff values, (3) to evaluate items, and (4) to evaluate the relative importance of correlated factors. As an example, we investigated the construct of Facebook addiction using the Facebook Addiction Test (F-AT), an adapted version of the Internet Addiction Test (I-AT). Applying LCA facilitates the development of new tests and short forms of established tests. We present a short form of the F-AT based on the LCA results and validate the LCA approach and the short F-AT with several external criteria, such as chatting, reading newsfeeds, and posting status updates. Finally, we discuss the benefits of LCA for evaluating quantitative constructs in psychological research.
Rasch validation of the Arabic version of the lower extremity functional scale.

PubMed

Alnahdi, Ali H

2018-02-01

The purpose of this study was to examine the internal construct validity of the Arabic version of the Lower Extremity Functional Scale (20-item Arabic LEFS) using Rasch analysis. Patients (n = 170) with lower extremity musculoskeletal dysfunction were recruited. Rasch analysis of 20-item Arabic LEFS was performed. Once the initial Rasch analysis indicated that the 20-item Arabic LEFS did not fit the Rasch model, follow-up analyses were conducted to improve the fit of the scale to the Rasch measurement model. These modifications included removing misfitting individuals, changing item scoring structure, removing misfitting items, addressing bias caused by response dependency between items and differential item functioning (DIF). Initial analysis indicated deviation of the 20-item Arabic LEFS from the Rasch model. Disordered thresholds in eight items and response dependency between six items were detected, and the scale as a whole did not meet the requirement of unidimensionality. Refinements led to a 15-item Arabic LEFS that demonstrated excellent internal consistency (person separation index [PSI] = 0.92) and satisfied all the requirements of the Rasch model. Rasch analysis did not support the 20-item Arabic LEFS as a unidimensional measure of lower extremity function. The refined 15-item Arabic LEFS met all the requirements of the Rasch model and hence is a valid objective measure of lower extremity function. The Rasch-validated 15-item Arabic LEFS needs to be further tested in an independent sample to confirm its fit to the Rasch measurement model. Implications for Rehabilitation: The validity of the 20-item Arabic Lower Extremity Functional Scale to measure lower extremity function is not supported. The 15-item Arabic version of the LEFS is a valid measure of lower extremity function and can be used to quantify lower extremity function in patients with lower extremity musculoskeletal disorders.
The Academic Resilience Scale (ARS-30): A New Multidimensional Construct Measure.

PubMed

Cassidy, Simon

2016-01-01

Resilience is a psychological construct observed in some individuals that accounts for success despite adversity. Resilience reflects the ability to bounce back, to beat the odds and is considered an asset in human characteristic terms. Academic resilience contextualizes the resilience construct and reflects an increased likelihood of educational success despite adversity. The paper provides an account of the development of a new multidimensional construct measure of academic resilience. The 30 item Academic Resilience Scale (ARS-30) explores process (as opposed to outcome) aspects of resilience, providing a measure of academic resilience based on students' specific adaptive cognitive-affective and behavioral responses to academic adversity. Findings from the study involving a sample of undergraduate students (N = 532) demonstrate that the ARS-30 has good internal reliability and construct validity. It is suggested that a measure such as the ARS-30, which is based on adaptive responses, aligns more closely with the conceptualisation of resilience and provides a valid construct measure of academic resilience relevant for research and practice in university student populations.
Validity and Reliability of General Nutrition Knowledge Questionnaire for Adults in Uganda

PubMed Central

Bukenya, Richard; Ahmed, Abhiya; Andrade, Jeanette M.; Grigsby-Toussaint, Diana S.; Muyonga, John; Andrade, Juan E.

2017-01-01

This study sought to develop and validate a general nutrition knowledge questionnaire (GNKQ) for Ugandan adults. The initial draft consisted of 133 items on five constructs associated with nutrition knowledge: expert recommendations (16 items), food groups (70 items), selecting food (10 items), nutrition and disease relationship (23 items), and food fortification in Uganda (14 items). The questionnaire validity was evaluated in three studies. For the content validity (study 1), a panel of five nutrition content experts reviewed the GNKQ draft before and after face validity. For the face validity (study 2), head teachers and health workers (n = 27) completed the questionnaire before attending one of three focus groups to review the clarity of the items. For the construct validity and test-retest reliability (study 3), head teachers (n = 40) from private and public primary schools and nutrition (n = 52) and engineering (n = 49) students from Makerere University took the questionnaire twice (two weeks apart). Experts agreed (content validity index, CVI > 0.9; reliability, Gwet’s AC1 > 0.85) that all constructs were relevant to evaluate nutrition knowledge. After the focus groups, 29 items were identified as unclear, requiring major (n = 5) and minor (n = 24) reviews. The final questionnaire had acceptable internal consistency (Cronbach α > 0.95), test-retest reliability (r = 0.89), and differentiated (p < 0.001) nutrition knowledge scores between nutrition (67 ± 5) and engineering (39 ± 11) students. Only the construct on nutrition recommendations was unreliable (Cronbach α = 0.51, test-retest r = 0.55), which requires further optimization. The final questionnaire included topics on food groups (41 items), selecting food (2 items), nutrition and disease relationship (14 items), and food fortification in Uganda (22 items) and had good content validity, construct validity, and test-retest reliability to evaluate nutrition knowledge among Ugandan adults. PMID:28230779
More relevant, precise, and efficient items for assessment of physical function and disability: moving beyond the classic instruments

PubMed Central

Fries, J F; Bruce, B; Bjorner, J; Rose, M

2006-01-01

Objectives: Patient reported outcomes (PROs) have become standard study endpoints. However, little attention has been given to using item improvement to advance PRO performance which could improve precision, clarity, patient relevance, and information content of “physical function/disability” items and thus the performance of resulting instruments. Methods: The present study included 1860 physical function/disability items from 165 instruments. Item formulations were assessed by frequency of use, modified Delphi consensus, respondent judgement of clarity and importance, and item response theory (IRT). Data from 1100 rheumatoid arthritis, osteoarthritis, and normal ageing subjects, using qualitative item review, focus groups, cognitive interviews, and patient survey were used to achieve a unique item pool that was clear, reliable, sensitive to change, readily translatable, devoid of floor and ceiling limitations, contained unidimensional subdomains, and had maximal information content. Results: A “present tense” time frame was used most frequently, better understood, more readily translated, and more directly estimated the latent trait of disability. Items in the “past tense” had 80–90% false negatives (p<0.001). The best items were brief, clear, and contained a single construct. Responses with four to five options were preferred by both experts and respondents. The term physical function may be preferable to the term disability because of fewer floor effects. IRT analyses of “disability” suggest four independent subdomains (mobility, dexterity, axial, and compound) with factor loadings of 0.81–0.99. Conclusions: Major improvement in performance of items and instruments is possible, and may have the effect of substantially reducing sample size requirements for clinical trials. PMID:17038464
Missing data methods for dealing with missing items in quality of life questionnaires. A comparison by simulation of personal mean score, full information maximum likelihood, multiple imputation, and hot deck techniques applied to the SF-36 in the French 2003 decennial health survey.

PubMed

Peyre, Hugo; Leplège, Alain; Coste, Joël

2011-03-01

Missing items are common in quality of life (QoL) questionnaires and present a challenge for research in this field. It remains unclear which of the various methods proposed to deal with missing data performs best in this context. We compared personal mean score, full information maximum likelihood, multiple imputation, and hot deck techniques using various realistic simulation scenarios of item missingness in QoL questionnaires constructed within the framework of classical test theory. Samples of 300 and 1,000 subjects were randomly drawn from the 2003 INSEE Decennial Health Survey (of 23,018 subjects representative of the French population and having completed the SF-36) and various patterns of missing data were generated according to three different item non-response rates (3, 6, and 9%) and three types of missing data (Little and Rubin's "missing completely at random," "missing at random," and "missing not at random"). The missing data methods were evaluated in terms of accuracy and precision for the analysis of one descriptive and one association parameter for three different scales of the SF-36. For all item non-response rates and types of missing data, multiple imputation and full information maximum likelihood appeared superior to the personal mean score and especially to hot deck in terms of accuracy and precision; however, the use of personal mean score was associated with insignificant bias (relative bias <2%) in all studied situations. Whereas multiple imputation and full information maximum likelihood are confirmed as reference methods, the personal mean score appears nonetheless appropriate for dealing with items missing from completed SF-36 questionnaires in most situations of routine use. These results can reasonably be extended to other questionnaires constructed according to classical test theory.
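The personal mean score method compared in the abstract above can be illustrated with a minimal sketch: each missing item is filled with the mean of the items the same respondent did answer on that scale. The function name and the `min_answered` cut-off are hypothetical choices made for this illustration; the abstract does not say what minimum number of answered items the authors required.

```python
from statistics import mean
from typing import List, Optional, Sequence


def personal_mean_impute(
    responses: Sequence[Optional[float]],
    min_answered: int = 1,  # hypothetical cut-off, not taken from the abstract
) -> Optional[List[float]]:
    """Fill missing items (None) with the respondent's own mean on the scale.

    Returns None when fewer than `min_answered` items were answered,
    meaning the scale score is treated as missing for this respondent.
    """
    answered = [r for r in responses if r is not None]
    if len(answered) < min_answered:
        return None
    fill = float(mean(answered))
    return [fill if r is None else float(r) for r in responses]


# One respondent, four items on the same scale, second item missing.
completed = personal_mean_impute([3, None, 5, 4])
```

Because the fill value is the respondent's own mean, the scale mean of the completed record equals the mean of the answered items, which is one way to read the small bias the abstract reports for this method.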

Development, content validity, and piloting of an instrument designed to measure managers' attitude toward workplace breastfeeding support.

PubMed

Chow, Tan; Wolfe, Edward W; Olson, Beth H

2012-07-01

Manager attitude is influential in female employees' perceptions of workplace breastfeeding support. Currently, no instrument is available to assess manager attitude toward supporting women who wish to combine breastfeeding with work. We developed and piloted an instrument to measure manager attitudes toward workplace breastfeeding support entitled the "Managers' Attitude Toward Breastfeeding Support Questionnaire," an instrument that measures four constructs using 60 items that are rated agree/disagree on a 4-point Likert rating scale. We established the content validity of the Managers' Attitude Toward Breastfeeding Support Questionnaire measures through expert content review (n=22), expert assessment of item fit (n=11), and cognitive interviews (n=8). Data were collected from a purposive sample of 185 front-line managers who had experience supervising female employees, and responses were scaled using the Multidimensional Random Coefficients Multinomial Logit Model. Dimensionality analyses supported the proposed four-construct model. Reliability ranged from 0.75 to 0.86, and correlations between the constructs were moderately strong (0.47 to 0.71). Four items in two constructs exhibited model-to-data misfit and/or a low score-measure correlation. One item was revised and the other three items were retained in the Managers' Attitude Toward Breastfeeding Support Questionnaire. Findings of this study suggest that the Managers' Attitude Toward Breastfeeding Support Questionnaire measures are reliable and valid indicators of manager attitude toward workplace breastfeeding support, and future research should be conducted to establish external validity. The Managers' Attitude Toward Breastfeeding Support Questionnaire could be used to collect data in a standardized manner within and across companies to measure and compare manager attitudes toward supporting breastfeeding. 
Organizations can subsequently develop targeted strategies to improve support for breastfeeding employees through efforts influencing managerial attitude. Copyright © 2012 Academy of Nutrition and Dietetics. Published by Elsevier Inc. All rights reserved.
The construction of categorization judgments: using subjective confidence and response latency to test a distributed model.

PubMed

Koriat, Asher; Sorka, Hila

2015-01-01

The classification of objects to natural categories exhibits cross-person consensus and within-person consistency, but also some degree of between-person variability and within-person instability. What is more, the variability in categorization is also not entirely random but discloses systematic patterns. In this study, we applied the Self-Consistency Model (SCM, Koriat, 2012) to category membership decisions, examining the possibility that confidence judgments and decision latency track the stable and variable components of categorization responses. The model assumes that category membership decisions are constructed on the fly depending on a small set of clues that are sampled from a commonly shared population of pertinent clues. The decision and confidence are based on the balance of evidence in favor of a positive or a negative response. The results confirmed several predictions derived from SCM. For each participant, consensual responses to items were more confident than non-consensual responses, and for each item, participants who made the consensual response tended to be more confident than those who made the nonconsensual response. The difference in confidence between consensual and nonconsensual responses increased with the proportion of participants who made the majority response for the item. A similar pattern was observed for response speed. The pattern of results obtained for cross-person consensus was replicated by the results for response consistency when the responses were classified in terms of within-person agreement across repeated presentations. These results accord with the sampling assumption of SCM, that confidence and response speed should be higher when the decision is consistent with what follows from the entire population of clues than when it deviates from it. 
Results also suggested that the context for classification can bias the sample of clues underlying the decision, and that confidence judgments mirror the effects of context on categorization decisions. The model and results offer a principled account of the stable and variable contributions to categorization behavior within a decision-making framework. Copyright © 2014 Elsevier B.V. All rights reserved.
Examination of an eHealth literacy scale and a health literacy scale in a population with moderate to high cardiovascular risk: Rasch analyses.

PubMed

Richtering, Sarah S; Morris, Rebecca; Soh, Sze-Ee; Barker, Anna; Bampi, Fiona; Neubeck, Lis; Coorey, Genevieve; Mulley, John; Chalmers, John; Usherwood, Tim; Peiris, David; Chow, Clara K; Redfern, Julie

2017-01-01

Electronic health (eHealth) strategies are evolving making it important to have valid scales to assess eHealth and health literacy. Item response theory methods, such as the Rasch measurement model, are increasingly used for the psychometric evaluation of scales. This paper aims to examine the internal construct validity of an eHealth and health literacy scale using Rasch analysis in a population with moderate to high cardiovascular disease risk. The first 397 participants of the CONNECT study completed the electronic health Literacy Scale (eHEALS) and the Health Literacy Questionnaire (HLQ). Overall Rasch model fit as well as five key psychometric properties were analysed: unidimensionality, response thresholds, targeting, differential item functioning and internal consistency. The eHEALS had good overall model fit (χ2 = 54.8, p = 0.06), ordered response thresholds, reasonable targeting and good internal consistency (person separation index (PSI) 0.90). It did, however, appear to measure two constructs of eHealth literacy. The HLQ subscales (except subscale 5) did not fit the Rasch model (χ2: 18.18-60.60, p: 0.00-0.58) and had suboptimal targeting for most subscales. Subscales 6 to 9 displayed disordered thresholds indicating participants had difficulty distinguishing between response options. All subscales did, nonetheless, demonstrate moderate to good internal consistency (PSI: 0.62-0.82). Rasch analyses demonstrated that the eHEALS has good measures of internal construct validity although it appears to capture different aspects of eHealth literacy (e.g. using eHealth and understanding eHealth). Whilst further studies are required to confirm this finding, it may be necessary for these constructs of the eHEALS to be scored separately. The nine HLQ subscales were shown to measure a single construct of health literacy. 
However, participants' scores may not represent their actual level of ability, as distinction between response categories was unclear for the last four subscales. Reducing the response categories of these subscales may improve the ability of the HLQ to distinguish between different levels of health literacy.
Rasch analyses of the Activities-specific Balance Confidence Scale with individuals 50 years and older with lower limb amputations

PubMed Central

Sakakibara, Brodie M.; Miller, William C.; Backman, Catherine L.

2012-01-01

Objective: To explore shortened response formats for use with the Activities-specific Balance Confidence scale and then: 1) evaluate the unidimensionality of the scale; 2) evaluate the item difficulty; 3) evaluate the scale for redundancy and content gaps; and 4) evaluate the item standard error of measurement (SEM) and internal consistency reliability among aging individuals (≥50 years) with a lower-limb amputation living in the community. Design: Secondary analysis of cross-sectional survey and chart review data. Setting: Out-patient amputee clinics, Ontario, Canada. Participants: Four hundred forty eight community living adults, at least 50 years old (mean = 68 years), who have used a prosthesis for at least 6 months for a major unilateral lower limb amputation. Three hundred twenty five (72.5%) were men. Intervention: N/A. Main Outcome Measure(s): Activities-specific Balance Confidence Scale. Results: A 5-option response format outperformed 4- and 6-option formats. Factor analyses confirmed a unidimensional scale. The distance between response options is not the same for all items on the scale, as evidenced by the Partial Credit Model (PCM) having a better fit to the data than the Rating Scale Model. Two items, however, did not fit the PCM within statistical reason. Revising the wording of the two items may resolve the misfit, improve the construct validity, and lower the SEM. Overall, the difficulty of the scale’s items is appropriate for use with aging individuals with lower-limb amputation, and is most reliable (Cronbach α = 0.94) for use with individuals with moderately low balance confidence levels. Conclusions: The ABC-scale with a simplified 5-option response format is a valid and reliable measure of balance confidence for use with individuals aging with a lower limb amputation. PMID:21704978
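The contrast between the Partial Credit Model and the Rating Scale Model mentioned above can be sketched in a few lines. In the standard formulation, the PCM gives every item its own step thresholds, while the Rating Scale Model shares one set of step offsets across items and shifts it by an item location. The function names and all numeric values below are hypothetical and chosen only for illustration; they are not estimates from the study.

```python
import math
from typing import List, Sequence


def pcm_probabilities(theta: float, thresholds: Sequence[float]) -> List[float]:
    """Category probabilities under the Partial Credit Model.

    `thresholds` holds the item's step thresholds, so an item with
    m thresholds has m + 1 response categories (0..m).
    """
    # Cumulative sums of (theta - threshold); category 0 is the empty sum, 0.
    cumulative = [0.0]
    for delta in thresholds:
        cumulative.append(cumulative[-1] + (theta - delta))
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(cumulative)
    weights = [math.exp(c - peak) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]


def rsm_thresholds(item_location: float, shared_steps: Sequence[float]) -> List[float]:
    """Rating Scale Model thresholds: one shared step pattern shifted per item."""
    return [item_location + tau for tau in shared_steps]


# A 5-option item has 4 thresholds. Unequal spacing is allowed under the PCM.
pcm_item = pcm_probabilities(theta=0.0, thresholds=[-1.5, -0.2, 0.1, 2.0])
# Under the RSM every item reuses the same spacing between steps.
rsm_item = pcm_probabilities(theta=0.0, thresholds=rsm_thresholds(0.3, [-1.0, -0.3, 0.3, 1.0]))
```

A better fit for the PCM than for the Rating Scale Model, as reported in the abstract, is what one would expect when the spacing between thresholds differs from item to item.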
Results of a community-based survey of construction safety climate for Hispanic workers.

PubMed

Marin, Luz S; Cifuentes, Manuel; Roelofs, Cora

2015-01-01

Hispanic construction workers experience high rates of occupational injury, likely influenced by individual, organizational, and social factors. To characterize the safety climate of Hispanic construction workers using worker, contractor, and supervisor perceptions of the workplace. We developed a 40-item interviewer-assisted survey with six safety climate dimensions and administered it in Spanish and English to construction workers, contractors, and supervisors. A safety climate model, comparing responses and assessing contributing factors was created based on survey responses. While contractors and construction supervisors' (n = 128) scores were higher, all respondents shared a negative perception of safety climate. Construction workers had statistically significantly lower safety climate scores compared to supervisors and contractors (30.6 vs 46.5%, P<0.05). Safety climate scores were not associated with English language ability or years lived in the United States. We found that Hispanic construction workers in this study experienced a poor safety climate. The Hispanic construction safety climate model we propose can serve as a framework to guide organizational safety interventions and evaluate safety climate improvements.
Results of a community-based survey of construction safety climate for Hispanic workers

PubMed Central

Marin, Luz S; Cifuentes, Manuel; Roelofs, Cora

2015-01-01

Background: Hispanic construction workers experience high rates of occupational injury, likely influenced by individual, organizational, and social factors. Objectives: To characterize the safety climate of Hispanic construction workers using worker, contractor, and supervisor perceptions of the workplace. Methods: We developed a 40-item interviewer-assisted survey with six safety climate dimensions and administered it in Spanish and English to construction workers, contractors, and supervisors. A safety climate model, comparing responses and assessing contributing factors was created based on survey responses. Results: While contractors and construction supervisors’ (n = 128) scores were higher, all respondents shared a negative perception of safety climate. Construction workers had statistically significantly lower safety climate scores compared to supervisors and contractors (30.6 vs 46.5%, P<0.05). Safety climate scores were not associated with English language ability or years lived in the United States. Conclusions: We found that Hispanic construction workers in this study experienced a poor safety climate. The Hispanic construction safety climate model we propose can serve as a framework to guide organizational safety interventions and evaluate safety climate improvements. PMID:26145454
Use of non-parametric item response theory to develop a shortened version of the Positive and Negative Syndrome Scale (PANSS).

PubMed

Khan, Anzalee; Lewis, Charles; Lindenmayer, Jean-Pierre

2011-11-16

Nonparametric item response theory (IRT) was used to examine (a) the performance of the 30 Positive and Negative Syndrome Scale (PANSS) items and their options (levels of severity), (b) the effectiveness of various subscales to discriminate among differences in symptom severity, and (c) the development of an abbreviated PANSS (Mini-PANSS) based on IRT and a method to link scores to the original PANSS. Baseline PANSS scores from 7,187 patients with Schizophrenia or Schizoaffective disorder who were enrolled between 1995 and 2005 in psychopharmacology trials were obtained. Option characteristic curves (OCCs) and Item Characteristic Curves (ICCs) were constructed to examine the probability of rating each of seven options within each of 30 PANSS items as a function of subscale severity, and summed-score linking was applied to items selected for the Mini-PANSS. The majority of items forming the Positive and Negative subscales (i.e. 19 items) performed very well and discriminated better along symptom severity compared to the General Psychopathology subscale. Six of the seven Positive Symptom items, six of the seven Negative Symptom items, and seven out of the 16 General Psychopathology items were retained for inclusion in the Mini-PANSS. Summed score linking and linear interpolation were able to produce a translation table for comparing total subscale scores of the Mini-PANSS to total subscale scores on the original PANSS. Results show scores on the subscales of the Mini-PANSS can be linked to scores on the original PANSS subscales, with very little bias. The study demonstrated the utility of non-parametric IRT in examining the item properties of the PANSS and in selecting items for an abbreviated PANSS scale. The comparisons between the 30-item PANSS and the Mini-PANSS revealed that the shorter version is comparable to the 30-item PANSS, but when applying IRT, the Mini-PANSS is also a good indicator of illness severity.
Use of NON-PARAMETRIC Item Response Theory to develop a shortened version of the Positive and Negative Syndrome Scale (PANSS)

PubMed Central

2011-01-01

Background: Nonparametric item response theory (IRT) was used to examine (a) the performance of the 30 Positive and Negative Syndrome Scale (PANSS) items and their options (levels of severity), (b) the effectiveness of various subscales to discriminate among differences in symptom severity, and (c) the development of an abbreviated PANSS (Mini-PANSS) based on IRT and a method to link scores to the original PANSS. Methods: Baseline PANSS scores from 7,187 patients with Schizophrenia or Schizoaffective disorder who were enrolled between 1995 and 2005 in psychopharmacology trials were obtained. Option characteristic curves (OCCs) and Item Characteristic Curves (ICCs) were constructed to examine the probability of rating each of seven options within each of 30 PANSS items as a function of subscale severity, and summed-score linking was applied to items selected for the Mini-PANSS. Results: The majority of items forming the Positive and Negative subscales (i.e. 19 items) performed very well and discriminated better along symptom severity compared to the General Psychopathology subscale. Six of the seven Positive Symptom items, six of the seven Negative Symptom items, and seven out of the 16 General Psychopathology items were retained for inclusion in the Mini-PANSS. Summed score linking and linear interpolation were able to produce a translation table for comparing total subscale scores of the Mini-PANSS to total subscale scores on the original PANSS. Results show scores on the subscales of the Mini-PANSS can be linked to scores on the original PANSS subscales, with very little bias. Conclusions: The study demonstrated the utility of non-parametric IRT in examining the item properties of the PANSS and in selecting items for an abbreviated PANSS scale. The comparisons between the 30-item PANSS and the Mini-PANSS revealed that the shorter version is comparable to the 30-item PANSS, but when applying IRT, the Mini-PANSS is also a good indicator of illness severity.
PMID:22087503
Development of the Comprehensive General Parenting Questionnaire for caregivers of 5-13 year olds.

PubMed

Sleddens, Ester F C; O'Connor, Teresia M; Watson, Kathleen B; Hughes, Sheryl O; Power, Thomas G; Thijs, Carel; De Vries, Nanne K; Kremers, Stef P J

2014-02-10

Despite the large number of parenting questionnaires, considerable disagreement exists about how to best assess parenting. Most of the instruments only assess limited aspects of parenting. To overcome this shortcoming, the "Comprehensive General Parenting Questionnaire" (CGPQ) was systematically developed. Such a measure is frequently requested in the area of childhood overweight. First, an item bank of existing parenting measures was created assessing five key parenting constructs that have been identified across multiple theoretical approaches to parenting (Nurturance, Overprotection, Coercive control, Behavioral control, and Structure). Caregivers of 5- to 13-year-olds were asked to complete the online survey in the Netherlands (N = 821), Belgium (N = 435) and the United States (N = 241). In addition, a questionnaire regarding personality characteristics ("Big Five") of the caregiver was administered and parents were asked to report about their child's height and weight. Factor analyses and Item-Response Modeling (IRM) techniques were used to assess the underlying parenting constructs and for item reduction. Correlation analyses were performed to assess the relations between general parenting and personality of the caregivers, adjusting for socio-economic status (SES) indicators, to establish criterion validity. Multivariate linear regressions were performed to examine the associations of SES indicators and parenting with child BMI z-scores. Additionally, we assessed whether scores on the parenting constructs and child BMI z-scores differed depending on SES indicators. The reduced questionnaire (62 items) revealed acceptable fit of our parenting model and acceptable IRM item fit statistics. Caregiver personality was related as hypothesized with the CGPQ parenting constructs. While correcting for SES, overprotection was positively related to child BMI. The negative relationship between structure and BMI was borderline significant.
Parents with a high level of education were less likely to use overly controlling forms of parenting (i.e., coercive control and overprotection) and more likely to have children with lower BMI. Based on several author review meetings and cognitive interviews the questionnaire was further modified to an 85-item questionnaire. The CGPQ may facilitate research exploring how parenting influences children's weight-related behaviors. The contextual influence of general parenting is likely to be more profound than its direct relationship with weight status.
Construction of a web-based questionnaire for longitudinal investigation of work exposure, musculoskeletal pain and performance impairments in high-performance marine craft populations.

PubMed

Lo Martire, Riccardo; de Alwis, Manudul Pahansen; Äng, Björn Olov; Garme, Karl

2017-07-20

High-performance marine craft personnel (HPMCP) are regularly exposed to vibration and repeated shock (VRS) levels exceeding maximum limitations stated by international legislation. Whereas such exposure reportedly is detrimental to health and performance, the epidemiological data necessary to link these adverse effects causally to VRS are not available in the scientific literature, and no suitable tools for acquiring such data exist. This study therefore constructed a questionnaire for longitudinal investigations in HPMCP. A consensus panel defined content domains, identified relevant items and outlined a questionnaire. The relevance and simplicity of the questionnaire's content were then systematically assessed by expert raters in three consecutive stages, each followed by revisions. An item-level content validity index (I-CVI) was computed as the proportion of experts rating an item as relevant and simple, and a scale-level content validity index (S-CVI/Ave) as the average I-CVI across items. The thresholds for acceptable content validity were 0.78 and 0.90, respectively. Finally, a dynamic web version of the questionnaire was constructed and pilot tested over a 1-month period during a marine exercise in a study population sample of eight subjects, while accelerometers simultaneously quantified VRS exposure. Content domains were defined as work exposure, musculoskeletal pain and human performance, and items were selected to reflect these constructs. Ratings from nine experts yielded S-CVI/Ave of 0.97 and 1.00 for relevance and simplicity, respectively, and the pilot test suggested that responses were sensitive to change in acceleration and that the questionnaire, following some adjustments, was feasible for its intended purpose. A dynamic web-based questionnaire for longitudinal survey of key variables in HPMCP was constructed. 
Expert ratings supported that the questionnaire content is relevant, simple and sufficiently comprehensive, and the pilot test suggested that the questionnaire is feasible for longitudinal measurements in the study population.
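The two content validity indices defined in the abstract above are simple proportions, and a minimal sketch may make the arithmetic concrete. The I-CVI is the share of experts endorsing an item and the S-CVI/Ave is the mean I-CVI across items, with 0.78 and 0.90 as the acceptance thresholds stated in the abstract. The function names and the example ratings are hypothetical, and how each expert's rating was reduced to a yes/no endorsement is an assumption, since the abstract does not describe the rating scale.

```python
from typing import List, Sequence

I_CVI_THRESHOLD = 0.78      # item-level threshold stated in the abstract
S_CVI_AVE_THRESHOLD = 0.90  # scale-level threshold stated in the abstract


def item_cvi(endorsements: Sequence[bool]) -> float:
    """I-CVI: proportion of experts who endorsed the item."""
    return sum(endorsements) / len(endorsements)


def scale_cvi_ave(items: Sequence[Sequence[bool]]) -> float:
    """S-CVI/Ave: average of the item-level indices across all items."""
    indices = [item_cvi(item) for item in items]
    return sum(indices) / len(indices)


def items_below_threshold(items: Sequence[Sequence[bool]]) -> List[int]:
    """Positions of items whose I-CVI falls below the item-level threshold."""
    return [i for i, item in enumerate(items) if item_cvi(item) < I_CVI_THRESHOLD]


# Made-up ratings from nine experts on three items (True = endorsed).
example_items = [
    [True] * 9,                 # 9/9 endorse
    [True] * 8 + [False],       # 8/9 endorse
    [True] * 7 + [False] * 2,   # 7/9 endorse, just under 0.78
]
flagged = items_below_threshold(example_items)
overall = scale_cvi_ave(example_items)
```

With nine raters, an item needs at least eight endorsements to clear 0.78, because 7/9 is about 0.778.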
Development and psychometric evaluation of a health-related quality of life instrument for individuals with adult-onset hearing loss.

PubMed

Stika, Carren J; Hays, Ron D

2015-07-01

Self-reports of 'hearing handicap' are available, but a comprehensive measure of health-related quality of life (HRQOL) for individuals with adult-onset hearing loss (AOHL) does not exist. Our objective was to develop and evaluate a multidimensional HRQOL instrument for individuals with AOHL. The Impact of Hearing Loss Inventory Tool (IHEAR-IT) was developed using results of focus groups, a literature review, advisory expert panel input, and cognitive interviews. The 73-item field-test instrument was completed by 409 adults (22-91 years old) with varying degrees of AOHL and from different areas of the USA. Multitrait scaling analysis supported four multi-item scales and five individual items. Internal consistency reliabilities ranged from 0.93 to 0.96 for the scales. Construct validity was supported by correlations between the IHEAR-IT scales and scores on the 36-item Short Form Health Survey, version 2.0 (SF-36v2) mental composite summary (r = 0.32-0.64) and the Hearing Handicap Inventory for the Elderly/Adults (HHIE/HHIA) (r ≥ -0.70). The field test provides initial support for the reliability and construct validity of the IHEAR-IT for evaluating HRQOL of individuals with AOHL. Further research is needed to evaluate the responsiveness to change of the IHEAR-IT scales and identify items for a short-form.
Preliminary Study of the Autism Self-Efficacy Scale for Teachers (ASSET).

PubMed

Ruble, Lisa A; Toland, Michael D; Birdwhistell, Jessica L; McGrew, John H; Usher, Ellen L

2013-09-01

The purpose of the current study was to evaluate a new measure, the Autism Self-Efficacy Scale for Teachers (ASSET) for its dimensionality, internal consistency, and construct validity derived in a sample of special education teachers (N = 44) of students with autism. Results indicate that all items reflect one dominant factor, teachers' responses to items were internally consistent within the sample, and compared to a 100-point scale, a 6-point response scale is adequate. ASSET scores were found to be negatively correlated with scores on two subscale measures of teacher stress (i.e., self-doubt/need for support and disruption of the teaching process) but uncorrelated with teacher burnout scores. The ASSET is a promising tool that requires replication with larger samples.
[Design and validation of the CSR-Hospital-SP scale to measure corporate social responsibility].

PubMed

Mira, José Joaquín; Lorenzo, Susana; Navarro, Isabel; Pérez-Jover, Virtudes; Vitaller, Julián

2013-01-01

To design and validate a scale (CSR-Hospital-SP) to determine health professionals' views on the approach of management to corporate social responsibility (CSR) in their hospital. The literature was reviewed to identify the main CSR scales and select the dimensions to be evaluated. The initial version of the scale consisted of 25 items. A convenience sample of a minimum of 224 health professionals working in five public hospitals in five autonomous regions were invited to respond. Floor and ceiling effects, internal consistency, reliability, and construct validity were analyzed. A total of 233 health professionals responded. The CSR-Hospital-SP scale had 20 items grouped into four factors. The item-total correlation was higher than 0.30; all factor loadings were greater than 0.50; 59.57% of the variance was explained; Cronbach's alpha was 0.90; Spearman-Brown's coefficient was 0.82. The CSR-Hospital-SP scale is a tool designed for hospitals that implement accountability mechanisms and promote socially responsible management approaches. Copyright © 2012 SESPAS. Published by Elsevier Espana. All rights reserved.
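The two reliability coefficients reported in the abstract above follow standard formulas, sketched here on made-up data. Cronbach's alpha compares the sum of the item variances with the variance of the total score, and the Spearman-Brown formula steps a correlation between two half-tests up to full test length. The function names and example scores are hypothetical, and the abstract does not say how the scale was split into halves, so the split-half reading is an assumption.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(scores: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items table of scores."""
    k = len(scores[0])  # number of items
    item_variances = [variance([row[i] for row in scores]) for i in range(k)]
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def spearman_brown(half_correlation: float) -> float:
    """Full-length reliability predicted from a split-half correlation."""
    return 2 * half_correlation / (1 + half_correlation)


# Made-up responses: four respondents, three items on a 1-5 scale.
example_scores = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [1, 2, 2],
]
alpha = cronbach_alpha(example_scores)
```

Alpha approaches 1 when respondents are ranked the same way by every item, which is why it is read as internal consistency.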
Method for automatic measurement of second language speaking proficiency

NASA Astrophysics Data System (ADS)

Bernstein, Jared; Balogh, Jennifer

2005-04-01

Spoken language proficiency is intuitively related to effective and efficient communication in spoken interactions. However, it is difficult to derive a reliable estimate of spoken language proficiency by situated elicitation and evaluation of a person's communicative behavior. This paper describes the task structure and scoring logic of a group of fully automatic spoken language proficiency tests (for English, Spanish and Dutch) that are delivered via telephone or Internet. Test items are presented in spoken form and require a spoken response. Each test is automatically-scored and primarily based on short, decontextualized tasks that elicit integrated listening and speaking performances. The tests present several types of tasks to candidates, including sentence repetition, question answering, sentence construction, and story retelling. The spoken responses are scored according to the lexical content of the response and a set of acoustic base measures on segments, words and phrases, which are scaled with IRT methods or parametrically combined to optimize fit to human listener judgments. Most responses are isolated spoken phrases and sentences that are scored according to their linguistic content, their latency, and their fluency and pronunciation. The item development procedures and item norming are described.
Transformational, transactional and laissez-faire leadership among physician executives.

PubMed

Xirasagar, Sudha

2008-01-01

The purpose of this paper is to examine the empirical validity of transformational, transactional and laissez-faire leadership and their sub-scales among physician managers. A nation-wide, anonymous mail survey was carried out in the United States, requesting community health center executive directors to provide ratings of their medical director's leadership behaviors (34 items) and effectiveness (nine items), using the Multifactor Leadership Questionnaire 5X-Short, on a five-point Likert scale. The survey response rate was 40.9 percent, for a total of 269 responses. Exploratory factor analysis was done, using principal factor extraction, followed by promax rotation. The data yielded a three-factor structure, generally aligned with Bass and Avolio's constructs of transformational, transactional and laissez-faire leadership. Data do not support the factorial independence of their subscales (idealized influence, inspirational motivation, individualized consideration, and intellectual stimulation under transformational leadership; contingent reward, management-by-exception active, and management-by-exception passive under transactional leadership). Two contingent reward items loaded on transformational leadership, and all items of management-by-exception passive loaded on laissez-faire. A key limitation is that supervisors were surveyed for ratings of the medical directors' leadership style. Although past research in other fields has shown that supervisor ratings are strongly correlated with subordinate ratings, further research is needed to validate the findings by surveying physician and other clinical subordinates. Such research will also help to develop appropriate content of leadership training for clinical leaders. This study represents an important step towards establishing the empirical evidence for the full range of leadership constructs among physician leaders.
Construct validity of the Swedish version of the revised piper fatigue scale in an oncology sample--a Rasch analysis.

PubMed

Lundgren-Nilsson, Asa; Dencker, Anna; Jakobsson, Sofie; Taft, Charles; Tennant, Alan

2014-06-01

Fatigue is a common and distressing symptom in cancer patients due to both the disease and its treatments. The concept of fatigue is multidimensional and includes both physical and mental components. The 22-item Revised Piper Fatigue Scale (RPFS) is a multidimensional instrument developed to assess cancer-related fatigue. This study reports on the construct validity of the Swedish version of the RPFS from the perspective of Rasch measurement. The Swedish version of the RPFS was answered by 196 cancer patients fatigued after 4 to 5 weeks of curative radiation therapy. Data from the scale were fitted to the Rasch measurement model. This involved testing a series of assumptions, including the stochastic ordering of items, local response dependency, and unidimensionality. A series of fit statistics were computed, differential item functioning (DIF) was tested, and local response dependency was accommodated through testlets. The Behavioral, Affective and Sensory domains all satisfied the Rasch model expectations. No DIF was observed, and all domains were found to be unidimensional. The Mood/Cognitive scale failed to fit the model, and substantial multidimensionality was found. Splitting the scale between Mood and Cognitive items resolved fit to the Rasch model, and new domains were unidimensional without DIF. The current Rasch analyses add to the evidence of measurement properties of the scale and show that the RPFS has good psychometric properties and works well to measure fatigue. The original four-factor structure, however, was not supported. Copyright © 2014 International Society for Pharmacoeconomics and Outcomes Research (ISPOR). Published by Elsevier Inc. All rights reserved.
Construction and Validation of a Women's Autonomy Measurement Scale with Reference to Utilization of Maternal Health Care Services in Nepal.

PubMed

Bhandari, T R; Dangal, G; Sarma, P S; Kutty, V R

2014-01-01

Women's autonomy is one of the predictors of maternal health care service utilization. This study aimed to construct and validate a scale for measuring women's autonomy with relevance to developing countries. We constructed and validated the scale in Rupandehi district and further validated it in Kapilvastu district of Nepal. Initially, we administered a 24-item preliminary scale and finalized a 23-item scale using psychometric tests. After defining the construct of women's autonomy, we pooled 194 items and selected 24 items to develop a preliminary scale. The scale development process followed these steps: definition of the construct, generation of an item pool, pretesting, psychometric analysis, and further validation. The new scale was strongly supported by Cronbach's alpha (0.84), test-retest Pearson correlation (0.87), average content validity ratio (0.8), and overall agreement kappa value of the items (0.83); all values were satisfactory. From factor analysis, we selected 23 items for the final scale, which showed good convergent and discriminant validity. We removed one item from the preliminary draft; the remaining 23 items loaded on five factors. With absolute coefficients below 0.45 suppressed, every item loaded on a single factor, and the average coefficient for each factor exceeded 0.60. Similarly, the factors and loaded items had good convergent and discriminant validity, which further showed the strong measurement capacity of the scale. The new scale is a reliable tool for assessing women's autonomy in developing countries. We recommend further use and validation of the scale to confirm its measurement capacity.
Piers Harris and Coopersmith Measure of Self-Esteem: A Comparative Analysis

ERIC Educational Resources Information Center

Lynch, Mervin D.; Foley-Peres, Kathleen D.; Sullivan, Stefanie S.

2008-01-01

The purposes of this study were to see if the items from the Piers Harris Self Concept Scale and the Coopersmith Self Esteem Inventory had construct and predictive validity. Items used in this study were 50 items from the Coopersmith Self-Esteem Inventory and 80 items from the Piers Harris Self-Concept Scale. Construct measures were obtained using…
Development of Elderly Quality of Life Index – Eqoli: Item Reduction and Distribution into Dimensions

PubMed Central

Paschoal, Sérgio Márcio Pacheco; Filho, Wilson Jacob; Litvoc, Júlio

2008-01-01

OBJECTIVE To describe item reduction and its distribution into dimensions in the construction process of a quality of life evaluation instrument for the elderly. METHODS The sampling method was chosen by convenience through quotas, with selection of elderly subjects from four programs to achieve heterogeneity in the “health status”, “functional capacity”, “gender”, and “age” variables. The Clinical Impact Method was used, consisting of the spontaneous and elicited selection by the respondents of relevant items to the construct Quality of Life in Old Age from a previously elaborated item pool. The respondents rated each item’s importance using a 5-point Likert scale. The product of the proportion of elderly selecting the item as relevant (frequency) and the mean importance score they attributed to it (importance) represented the overall impact of that item in their quality of life (impact). The items were ordered according to their impact scores and the top 46 scoring items were grouped in dimensions by three experts. A review of the negative items was performed. RESULTS One hundred and ninety three people (122 women and 71 men) were interviewed. Experts distributed the 46 items into eight dimensions. Closely related items were grouped and dimensions not reaching the minimum expected number of items received additional items resulting in eight dimensions and 43 items. DISCUSSION The sample was heterogeneous and similar to what was expected. The dimensions and items demonstrated the multidimensionality of the construct. The Clinical Impact Method was appropriate to construct the instrument, which was named Elderly Quality of Life Index - EQoLI. An accuracy process will be examined in the future. PMID:18438571
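The impact score described in this abstract (proportion of respondents selecting an item, multiplied by the mean importance they gave it) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the data layout (one list of 1-5 importance ratings per item, containing only the respondents who selected that item), and the toy numbers are hypothetical and not taken from the study.

```python
from statistics import mean


def impact_score(ratings, n_respondents):
    """Impact = frequency (share of respondents selecting the item) x mean importance."""
    frequency = len(ratings) / n_respondents
    importance = mean(ratings)
    return frequency * importance


def top_items(ratings_by_item, n_respondents, n_top):
    """Rank items by impact, highest first, and keep the first n_top."""
    scored = {
        item: impact_score(ratings, n_respondents)
        for item, ratings in ratings_by_item.items()
        if ratings  # items nobody selected have no importance ratings to average
    }
    return sorted(scored, key=scored.get, reverse=True)[:n_top]


# Hypothetical toy data: 4 respondents, importance rated on a 5-point scale.
toy_ratings = {"A": [5, 5, 5, 5], "B": [4, 4], "C": [5]}
ranked = top_items(toy_ratings, n_respondents=4, n_top=2)
```

In the study the same ordering step kept the 46 highest-impact items, which the experts then grouped into dimensions.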
Item analysis of the Spanish version of the Boston Naming Test with a Spanish speaking adult population from Colombia.

PubMed

Kim, Stella H; Strutt, Adriana M; Olabarrieta-Landa, Laiene; Lequerica, Anthony H; Rivera, Diego; De Los Reyes Aragon, Carlos Jose; Utria, Oscar; Arango-Lasprilla, Juan Carlos

2018-02-23

The Boston Naming Test (BNT) is a widely used measure of confrontation naming ability that has been criticized for its questionable construct validity for non-English speakers. This study investigated item difficulty and construct validity of the Spanish version of the BNT to assess cultural and linguistic impact on performance. Subjects were 1298 healthy Spanish speaking adults from Colombia. They were administered the 60- and 15-item Spanish version of the BNT. A Rasch analysis was computed to assess dimensionality, item hierarchy, targeting, reliability, and item fit. Both versions of the BNT satisfied requirements for unidimensionality. Although internal consistency was excellent for the 60-item BNT, order of difficulty did not increase consistently with item number and there were a number of items that did not fit the Rasch model. For the 15-item BNT, a total of 5 items changed position on the item hierarchy with 7 poor fitting items. Internal consistency was acceptable. Construct validity of the BNT remains a concern when it is administered to non-English speaking populations. Similar to previous findings, the order of item presentation did not correspond with increasing item difficulty, and both versions were inadequate at assessing high naming ability.

Development of a questionnaire for assessing the childbirth experience (QACE).

PubMed

Carquillat, Pierre; Vendittelli, Françoise; Perneger, Thomas; Guittier, Marie-Julia

2017-08-30

Due to its potential impact on women's psychological health, assessing perceptions of their childbirth experience is important. The aim of this study was to develop a multidimensional self-reporting questionnaire to evaluate the childbirth experience. Factors influencing the childbirth experience were identified from a literature review and the results of a previous qualitative study. A total of 25 items were combined from existing instruments or were created de novo. A draft version was pilot tested for face validity with 30 women and submitted for evaluation of its construct validity to 477 primiparous women at one-month post-partum. The recruitment took place in two obstetric clinics from Swiss and French university hospitals. To evaluate the content validity, we compared item responses to general childbirth experience assessments on a numeric, 0 to 10 rating scale. We dichotomized two group assessment scores: "0 to 7" and "8 to 10". We performed an exploratory factor analysis to identify underlying dimensions. In total, 291 women completed the questionnaire (response rate = 61%). The responses to 22 items were statistically significant between the 0 to 7 and 8 to 10 groups for the general childbirth experience assessments. An exploratory factor analysis yielded four sub-scales, which were labelled "relationship with staff" (4 items), "emotional status" (3 items), "first moments with the new born," (3 items) and "feelings at one month postpartum" (3 items). All 4 scales had satisfactory internal consistency levels (alpha coefficients from 0.70 to 0.85). The full 25-item version can be used to analyse each item by itself, and the short 4-dimension version can be scored to summarize the general assessment of the childbirth experience. The Questionnaire for Assessing the Childbirth Experience (QACE) could be useful as a screening instrument to identify women with negative childbirth experiences. 
It can be used as both a research instrument in its short version and a questionnaire for use in clinical practice in its full version.
Use of a safety climate questionnaire in UK health care: factor structure, reliability and usability.

PubMed

Hutchinson, A; Cooper, K L; Dean, J E; McIntosh, A; Patterson, M; Stride, C B; Laurence, B E; Smith, C M

2006-10-01

To explore the factor structure, reliability, and potential usefulness of a patient safety climate questionnaire in UK health care. Four acute hospital trusts and nine primary care trusts in England. The questionnaire used was the 27 item Teamwork and Safety Climate Survey. Thirty three healthcare staff commented on the wording and relevance. The questionnaire was then sent to 3650 staff within the 13 NHS trusts, seeking to achieve at least 600 responses as the basis for the factor analysis. 1307 questionnaires were returned (36% response). Factor analyses and reliability analyses were carried out on 897 responses from staff involved in direct patient care, to explore how consistently the questions measured the underlying constructs of safety climate and teamwork. Some questionnaire items related to multiple factors or did not relate strongly to any factor. Five items were discarded. Two teamwork factors were derived from the remaining 11 teamwork items and three safety climate factors were derived from the remaining 11 safety items. Internal consistency reliabilities were satisfactory to good (Cronbach's alpha ≥ 0.69 for all five factors). This is one of the few studies to undertake a detailed evaluation of a patient safety climate questionnaire in UK health care and possibly the first to do so in primary as well as secondary care. The results indicate that a 22 item version of this safety climate questionnaire is useable as a research instrument in both settings, but also demonstrates a more general need for thorough validation of safety climate questionnaires before widespread usage.
The development of the 'Quality-of-life for Respiratory Illness Questionnaire (QOL-RIQ)': a disease-specific quality-of-life questionnaire for patients with mild to moderate chronic non-specific lung disease.

PubMed

Maillé, A R; Koning, C J; Zwinderman, A H; Willems, L N; Dijkman, J H; Kaptein, A A

1997-05-01

Chronic non-specific lung disease (CNSLD) encompasses asthma as well as chronic obstructive pulmonary disease (COPD). Recently in health care, there has been increasing awareness in the functional, psychological and social aspects of the health of patients; their quality of life (QOL). Quality-of-life research addressing CNSLD patients has been rather underdeveloped for a long period of time. Recently, however, the importance of QOL is being increasingly recognized, and several research groups have started to study QOL in CNSLD patients in more detail. This paper describes the construction of a disease-specific QOL instrument for patients with mild to moderately severe CNSLD. Items relating to several domains of QOL were listed, and 171 CNSLD patients in general practice were asked how much of a problem each item had been (assessed on a seven-point Likert scale). After applying an item-selection procedure, a uni-dimensional QOL questionnaire was constructed consisting of 55 items divided into seven domain subscales: breathing problems, physical problems, emotions, situations triggering or enhancing breathing problems, general activities, daily and domestic activities, and social activities, relationships and sexuality. Reliability estimates of the domain subscales of the constructed questionnaire varied from 0.68 to 0.89, and was 0.92 for the QOL for Respiratory Illness Questionnaire (QOL-RIQ) total scale. A first impression of the construct validity of the questionnaire was gained by investigation of the relationship between the QOL domain subscales and several indicators of illness severity, as well as the relative contribution of illness severity variables, background characteristics and symptoms to QOL, using regression analysis. Further research to validate the questionnaire to a greater extent (construct validity, test-retest reliability and responsiveness to change) is currently taking place.
Validity and Reliability of the US National Cancer Institute's Patient-Reported Outcomes Version of the Common Terminology Criteria for Adverse Events (PRO-CTCAE).

PubMed

Dueck, Amylou C; Mendoza, Tito R; Mitchell, Sandra A; Reeve, Bryce B; Castro, Kathleen M; Rogak, Lauren J; Atkinson, Thomas M; Bennett, Antonia V; Denicoff, Andrea M; O'Mara, Ann M; Li, Yuelin; Clauser, Steven B; Bryant, Donna M; Bearden, James D; Gillis, Theresa A; Harness, Jay K; Siegel, Robert D; Paul, Diane B; Cleeland, Charles S; Schrag, Deborah; Sloan, Jeff A; Abernethy, Amy P; Bruner, Deborah W; Minasian, Lori M; Basch, Ethan

2015-11-01

To integrate the patient perspective into adverse event reporting, the National Cancer Institute developed a patient-reported outcomes version of the Common Terminology Criteria for Adverse Events (PRO-CTCAE). To assess the construct validity, test-retest reliability, and responsiveness of PRO-CTCAE items. A total of 975 adults with cancer undergoing outpatient chemotherapy and/or radiation therapy enrolled in this questionnaire-based study between January 2011 and February 2012. Eligible participants could read English and had no clinically significant cognitive impairment. They completed PRO-CTCAE items on tablet computers in clinic waiting rooms at 9 US cancer centers and community oncology practices at 2 visits 1 to 6 weeks apart. A subset completed PRO-CTCAE items during an additional visit 1 business day after the first visit. Primary comparators were clinician-reported Eastern Cooperative Oncology Group Performance Status (ECOG PS) and the European Organisation for Research and Treatment of Cancer Core Quality of Life Questionnaire (QLQ-C30). A total of 940 of 975 (96.4%) and 852 of 940 (90.6%) participants completed PRO-CTCAE items at visits 1 and 2, respectively. At least 1 symptom was reported by 938 of 940 (99.8%) participants. Participants' median age was 59 years; 57.3% were female, 32.4% had a high school education or less, and 17.1% had an ECOG PS of 2 to 4. All PRO-CTCAE items had at least 1 correlation in the expected direction with a QLQ-C30 scale (111 of 124, P<.05 for all). Stronger correlations were seen between PRO-CTCAE items and conceptually related QLQ-C30 domains. Scores for 94 of 124 PRO-CTCAE items were higher in the ECOG PS 2 to 4 vs 0 to 1 group (58 of 124, P<.05 for all). Overall, 119 of 124 items met at least 1 construct validity criterion. Test-retest reliability was 0.7 or greater for 36 of 49 prespecified items (median [range] intraclass correlation coefficient, 0.76 [0.53-.96]). 
Correlations between PRO-CTCAE item changes and corresponding QLQ-C30 scale changes were statistically significant for 27 prespecified items (median [range] r=0.43 [0.10-.56]; all P≤.006). Evidence demonstrates favorable validity, reliability, and responsiveness of PRO-CTCAE in a large, heterogeneous US sample of patients undergoing cancer treatment. Studies evaluating other measurement properties of PRO-CTCAE are under way to inform further development of PRO-CTCAE and its inclusion in cancer trials.
Improving the evaluation of therapeutic interventions in multiple sclerosis: the role of new psychometric methods.

PubMed

Hobart, J; Cano, S

2009-02-01

In this monograph we examine the added value of new psychometric methods (Rasch measurement and Item Response Theory) over traditional psychometric approaches by comparing and contrasting their psychometric evaluations of existing sets of rating scale data. We have concentrated on Rasch measurement rather than Item Response Theory because we believe that it is the more advantageous method for health measurement from a conceptual, theoretical and practical perspective. Our intention is to provide an authoritative document that describes the principles of Rasch measurement and the practice of Rasch analysis in a clear, detailed, non-technical form that is accurate and accessible to clinicians and researchers in health measurement. A comparison was undertaken of traditional and new psychometric methods in five large sets of rating scale data: (1) evaluation of the Rivermead Mobility Index (RMI) in data from 666 participants in the Cannabis in Multiple Sclerosis (CAMS) study; (2) evaluation of the Multiple Sclerosis Impact Scale (MSIS-29) in data from 1725 people with multiple sclerosis; (3) evaluation of test-retest reliability of MSIS-29 in data from 150 people with multiple sclerosis; (4) examination of the use of Rasch analysis to equate scales purporting to measure the same health construct in 585 people with multiple sclerosis; and (5) comparison of relative responsiveness of the Barthel Index and Functional Independence Measure in data from 1400 people undergoing neurorehabilitation. Both Rasch measurement and Item Response Theory are conceptually and theoretically superior to traditional psychometric methods. Findings from each of the five studies show that Rasch analysis is empirically superior to traditional psychometric methods for evaluating rating scales, developing rating scales, analysing rating scale data, understanding and measuring stability and change, and understanding the health constructs we seek to quantify. 
There is considerable added value in using Rasch analysis rather than traditional psychometric methods in health measurement. Future research directions include the need to reproduce our findings in a range of clinical populations, detailed head-to-head comparisons of Rasch analysis and Item Response Theory, and the application of Rasch analysis to clinical practice.
The Instructional Effects of Matching or Mismatching Lesson and Posttest Screen Color

ERIC Educational Resources Information Center

Clariana, Roy B.

2004-01-01

This investigation considers the instructional effects of color as an over-arching context variable when learning from computer displays. The purpose of this investigation is to examine the posttest retrieval effects of color as a local, extra-item non-verbal lesson context variable for constructed-response versus multiple-choice posttest…
An Alternative Methodology for Creating Parallel Test Forms Using the IRT Information Function.

ERIC Educational Resources Information Center

Ackerman, Terry A.

The purpose of this paper is to report results on the development of a new computer-assisted methodology for creating parallel test forms using the item response theory (IRT) information function. Recently, several researchers have approached test construction from a mathematical programming perspective. However, these procedures require…
Using Rasch Analysis to Identify Uncharacteristic Responses to Undergraduate Assessments

ERIC Educational Resources Information Center

Edwards, Antony; Alcock, Lara

2010-01-01

Rasch Analysis is a statistical technique that is commonly used to analyse both test data and Likert survey data, to construct and evaluate question item banks, and to evaluate change in longitudinal studies. In this article, we introduce the dichotomous Rasch model, briefly discussing its assumptions. Then, using data collected in an…
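The dichotomous Rasch model named in this abstract gives the probability of a correct response as a logistic function of the difference between person ability and item difficulty. A minimal Python sketch follows; the function and variable names are hypothetical, and the example values are illustrative rather than drawn from the article's data.

```python
import math


def rasch_probability(ability, difficulty):
    """P(correct) under the dichotomous Rasch model, both arguments in logits."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def expected_score(ability, difficulties):
    """Expected number of correct responses across a set of items."""
    return sum(rasch_probability(ability, d) for d in difficulties)


# A person whose ability equals the item difficulty has an even chance of success.
even_chance = rasch_probability(0.0, 0.0)
# One logit above the item difficulty.
one_logit_above = rasch_probability(1.0, 0.0)
```

A response pattern that departs sharply from these model probabilities, for example an able student missing several easy items, is the kind of uncharacteristic response such an analysis can flag.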
Self-Rating and Respondent Anonymity

ERIC Educational Resources Information Center

Goh, Jonathan W. P.; Lee, Ong Kim; Salleh, Hairon

2010-01-01

Background: Most empirical investigations in survey research have been conducted using self-reported or self-evaluated item responses. Such measures are common because they are relatively easy to obtain and are often the only feasible way to assess constructs of interest. In order to improve on the validity of self-reports it has become a common…
46 CFR 298.21 - Limits.

Code of Federal Regulations, 2013 CFR

2013-10-01

... customarily be capitalized as Vessel or Shipyard Project construction costs such as designing, engineering...) Cost items include those items usually specified in Vessel or Shipyard Project construction contracts... fees and interest on the Obligations or other borrowings incurred during the construction period...
46 CFR 298.21 - Limits.

Code of Federal Regulations, 2012 CFR

2012-10-01

... customarily be capitalized as Vessel or Shipyard Project construction costs such as designing, engineering...) Cost items include those items usually specified in Vessel or Shipyard Project construction contracts... fees and interest on the Obligations or other borrowings incurred during the construction period...
46 CFR 298.21 - Limits.

Code of Federal Regulations, 2014 CFR

2014-10-01

... customarily be capitalized as Vessel or Shipyard Project construction costs such as designing, engineering...) Cost items include those items usually specified in Vessel or Shipyard Project construction contracts... fees and interest on the Obligations or other borrowings incurred during the construction period...
Development of the Assessment of Belief Conflict in Relationship-14 (ABCR-14).

PubMed

Kyougoku, Makoto; Teraoka, Mutsumi; Masuda, Noriko; Ooura, Mariko; Abe, Yasushi

2015-01-01

Nurses and other healthcare workers frequently experience belief conflict, one of the most important, new stress-related problems in both academic and clinical fields. In this study, using a sample of 1,683 nursing practitioners, we developed The Assessment of Belief Conflict in Relationship-14 (ABCR-14), a new scale that assesses belief conflict in the healthcare field. Standard psychometric procedures were used to develop and test the scale, including a qualitative framework concept and item-pool development, item reduction, and scale development. We analyzed the psychometric properties of ABCR-14 according to entropy, polyserial correlation coefficient, exploratory factor analysis, confirmatory factor analysis, average variance extracted, Cronbach's alpha, Pearson product-moment correlation coefficient, and multidimensional item response theory (MIRT). The results of the analysis supported a three-factor model consisting of 14 items. The validity and reliability of ABCR-14 was suggested by evidence from high construct validity, structural validity, hypothesis testing, internal consistency reliability, and concurrent validity. The result of the MIRT offered strong support for good item response of item slope parameters and difficulty parameters. However, the ABCR-14 Likert scale might need to be explored from the MIRT point of view. Yet, as mentioned above, there is sufficient evidence to support that ABCR-14 has high validity and reliability. The ABCR-14 demonstrates good psychometric properties for nursing belief conflict. Further studies are recommended to confirm its application in clinical practice.
The Australian Racism, Acceptance, and Cultural-Ethnocentrism Scale (RACES): item response theory findings.

PubMed

Grigg, Kaine; Manderson, Lenore

2016-03-17

Racism and associated discrimination are pervasive and persistent challenges with multiple cumulative deleterious effects contributing to inequities in various health outcomes. Globally, research over the past decade has shown consistent associations between racism and negative health concerns. Such research confirms that race endures as one of the strongest predictors of poor health. Due to the lack of validated Australian measures of racist attitudes, RACES (Racism, Acceptance, and Cultural-Ethnocentrism Scale) was developed. Here, we examine RACES' psychometric properties, including the latent structure, utilising Item Response Theory (IRT). Unidimensional and Multidimensional Rating Scale Model (RSM) Rasch analyses were utilised with 296 Victorian primary school students and 182 adolescents and 220 adults from the Australian community. RACES was demonstrated to be a robust 24-item three-dimensional scale of Accepting Attitudes (12 items), Racist Attitudes (8 items), and Ethnocentric Attitudes (4 items). RSM Rasch analyses provide strong support for the instrument as a robust measure of racist attitudes in the Australian context, and for the overall factorial and construct validity of RACES across primary school children, adolescents, and adults. RACES provides a reliable and valid measure that can be utilised across the lifespan to evaluate attitudes towards all racial, ethnic, cultural, and religious groups. A core function of RACES is to assess the effectiveness of interventions to reduce community levels of racism and in turn inequities in health outcomes within Australia.
Is the Berg Balance Scale an effective tool for the measurement of early postural control impairments in patients with Parkinson's disease? Evidence from Rasch analysis.

PubMed

La Porta, F; Giordano, A; Caselli, S; Foti, C; Franchignoni, F

2015-12-01

It is unclear whether the Berg Balance Scale (BBS) is an effective tool for the measurement of early postural control impairments in patients with Parkinson's disease (PD). The aim of this paper was to evaluate the content validity, internal construct validity, reliability and targeting of the BBS in patients with PD within the Rasch analysis framework. Observational, cross-sectional study. Outpatient Rehabilitation Unit. A sample of 285 outpatients with PD. The content validity of the BBS was assessed using standard linking techniques. The BBS was administered by trained physiotherapists. The data collected then underwent Rasch analysis. Content validity analysis showed a lack of items assessing postural responses to tripping and slips and stability during walking. On Rasch analysis, the BBS failed the requirements of monotonicity, local independence, unidimensionality and invariance. After rescoring 7 items, grouping of locally dependent items into testlets, and deletion of the static sitting balance item because it was mistargeted and underdiscriminating, the Rasch-modified BBS for PD (BBS-PD) showed adequate internal construct validity (χ²(24) = 39.693; P = 0.023), including absence of differential item functioning (DIF) across gender and age, and was, as a whole, sufficiently precise for individual person measurement (PSI = 0.894). However, the scale was not well targeted to the sample in view of the prevalence of higher scores. This study demonstrated the internal construct validity and reliability of the BBS-PD as a measurement tool for patients with PD within the Rasch analysis framework. However, the lack of items critical to the assessment of postural control impairments typical of PD negatively affected the targeting, so that a significant percentage of patients was located in the higher ability range of the measurement continuum, where precision of measurement is reduced.
These findings suggest that the BBS, even if modified, may not be an effective tool for the measurement of early postural control in patients with PD.
Stroke Self-efficacy Questionnaire: a Rasch-refined measure of confidence post stroke.

PubMed

Riazi, Afsane; Aspden, Trefor; Jones, Fiona

2014-05-01

Measuring self-efficacy during rehabilitation provides an important insight into understanding recovery post stroke. A Rasch analysis of the Stroke Self-efficacy Questionnaire (SSEQ) was undertaken to establish its use as a clinically meaningful and scientifically rigorous measure. One hundred and eighteen stroke patients completed the SSEQ with the help of an interviewer. Participants were recruited from local acute stroke units and community stroke rehabilitation teams. Data were analysed with confirmatory factor analysis conducted using AMOS and Rasch analysis conducted using RUMM2030 software. Confirmatory factor analysis and Rasch analyses demonstrated the presence of two separate scales that measure stroke survivors' self-efficacy with: i) self-management and ii) functional activities. Guided by Rasch analyses, the response categories of these two scales were collapsed from an 11-point to a 4-point scale. Modified scales met the expectations of the Rasch model. Items satisfied the Rasch requirements (overall and individual item fit, local response independence, differential item functioning, unidimensionality). Furthermore, the two subscales showed evidence of good construct validity. The new SSEQ has good psychometric properties and is a clinically useful assessment of self-efficacy after stroke. The scale measures stroke survivors' self-efficacy with self-management and activities as two unidimensional constructs. It is recommended for use in clinical and research interventions, and in evaluating stroke self-management interventions.
Efficient Algorithms for Segmentation of Item-Set Time Series

NASA Astrophysics Data System (ADS)

Chundi, Parvathi; Rosenkrantz, Daniel J.

We propose a special type of time series, which we call an item-set time series, to facilitate the temporal analysis of software version histories, email logs, stock market data, etc. In an item-set time series, each observed data value is a set of discrete items. We formalize the concept of an item-set time series and present efficient algorithms for segmenting a given item-set time series. Segmentation of a time series partitions the time series into a sequence of segments where each segment is constructed by combining consecutive time points of the time series. Each segment is associated with an item set that is computed from the item sets of the time points in that segment, using a function which we call a measure function. We then define a concept called the segment difference, which measures the difference between the item set of a segment and the item sets of the time points in that segment. The segment difference values are required to construct an optimal segmentation of the time series. We describe novel and efficient algorithms to compute segment difference values for each of the measure functions described in the paper. We outline a dynamic programming based scheme to construct an optimal segmentation of the given item-set time series. We use the item-set time series segmentation techniques to analyze the temporal content of three different data sets—Enron email, stock market data, and a synthetic data set. The experimental results show that an optimal segmentation of item-set time series data captures much more temporal content than a segmentation constructed based on the number of time points in each segment, without examining the item set data at the time points, and can be used to analyze different types of temporal data.
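The dynamic programming scheme outlined in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' algorithm: the measure function (union of the item sets in a segment) and the segment difference (total symmetric-difference size between the segment's item set and each time point's item set) are assumptions chosen for the example, and the names `union_measure`, `segment_difference`, and `optimal_segmentation` are hypothetical.

```python
def union_measure(item_sets):
    """Assumed measure function: a segment's item set is the union of its points."""
    return set().union(*item_sets)


def segment_difference(item_sets, measure=union_measure):
    """Assumed segment difference: summed symmetric-difference size between the
    segment's item set and the item set at each time point in the segment."""
    segment_set = measure(item_sets)
    return sum(len(segment_set ^ s) for s in item_sets)


def optimal_segmentation(series, k, measure=union_measure):
    """Split `series` (a list of sets) into k contiguous segments that minimize
    the total segment difference. Returns (total_cost, [(start, end), ...])
    with half-open index ranges."""
    n = len(series)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(series)")

    # cost[i][j] is the segment difference of the segment series[i:j].
    cost = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(i + 1, n + 1):
            cost[i][j] = segment_difference(series[i:j], measure)

    inf = float("inf")
    # best[p][j] is the minimum cost of covering series[:j] with p segments.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0
    for p in range(1, k + 1):
        for j in range(p, n + 1):
            for i in range(p - 1, j):
                candidate = best[p - 1][i] + cost[i][j]
                if candidate < best[p][j]:
                    best[p][j] = candidate
                    back[p][j] = i

    # Walk the back-pointers to recover the segment boundaries.
    bounds = []
    j = n
    for p in range(k, 0, -1):
        i = back[p][j]
        bounds.append((i, j))
        j = i
    bounds.reverse()
    return best[k][n], bounds
```

This naive version recomputes every segment difference from scratch; the abstract's emphasis on efficient algorithms for the segment difference values suggests the paper avoids exactly that cost.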
Do the Guideline Violations Influence Test Difficulty of High-Stake Test?: An Investigation on University Entrance Examination in Turkey

ERIC Educational Resources Information Center

Atalmis, Erkan Hasan

2016-01-01

Multiple-choice (MC) items are commonly used in high-stakes tests. Thus, each item of such tests should be meticulously constructed to increase the accuracy of decisions based on test results. Haladyna and his colleagues (2002) addressed the valid item-writing guidelines to construct high-quality MC items in order to increase test reliability and…
Construction cost forecast model : model documentation and technical notes.

DOT National Transportation Integrated Search

2013-05-01

Construction cost indices are generally estimated with Laspeyres, Paasche, or Fisher indices that allow changes in the quantities of construction bid items, as well as changes in price to change the cost indices of those items. These cost indices...
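For readers unfamiliar with the three index formulas named here, the sketch below computes them from their standard textbook definitions. The function names and the two-item example data are hypothetical and are not taken from the report.

```python
import math


def laspeyres(p0, p1, q0):
    """Price index weighted by base-period quantities q0."""
    return sum(a * q for a, q in zip(p1, q0)) / sum(b * q for b, q in zip(p0, q0))


def paasche(p0, p1, q1):
    """Price index weighted by current-period quantities q1."""
    return sum(a * q for a, q in zip(p1, q1)) / sum(b * q for b, q in zip(p0, q1))


def fisher(p0, p1, q0, q1):
    """Geometric mean of the Laspeyres and Paasche indices."""
    return math.sqrt(laspeyres(p0, p1, q0) * paasche(p0, p1, q1))


# Hypothetical unit prices and quantities for two bid items in two periods.
base_prices, current_prices = [10.0, 20.0], [12.0, 22.0]
base_qty, current_qty = [1.0, 2.0], [2.0, 1.0]

print(laspeyres(base_prices, current_prices, base_qty))
print(paasche(base_prices, current_prices, current_qty))
print(fisher(base_prices, current_prices, base_qty, current_qty))
```

The Laspeyres and Paasche indices differ only in which period's quantities serve as weights, and the Fisher index always lies between the two.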
Developing an Assessment Method of Active Aging: University of Jyvaskyla Active Aging Scale.

PubMed

Rantanen, Taina; Portegijs, Erja; Kokko, Katja; Rantakokko, Merja; Törmäkangas, Timo; Saajanaho, Milla

2018-01-01

To develop an assessment method of active aging for research on older people. A multiphase process that included drafting by an expert panel, a pilot study for item analysis and scale validity, a feedback study with focus groups and questionnaire respondents, and a test-retest study. Altogether 235 people aged 60 to 94 years provided responses and/or feedback. We developed a 17-item University of Jyvaskyla Active Aging Scale with four aspects in each item (goals, ability, opportunity, and activity; range 0-272). The psychometric and item properties are good and the scale assesses a unidimensional latent construct of active aging. Our scale assesses older people's striving for well-being through activities pertaining to their goals, abilities, and opportunities. The University of Jyvaskyla Active Aging Scale provides a quantifiable measure of active aging that may be used in postal questionnaires or interviews in research and practice.

Development and initial validation of a brief self-report measure of cognitive dysfunction in fibromyalgia.

PubMed

Kratz, Anna L; Schilling, Stephen G; Goesling, Jenna; Williams, David A

2015-06-01

Pain is often the focus of research and clinical care in fibromyalgia (FM); however, cognitive dysfunction is also a common, distressing, and disabling symptom in FM. Current efforts to address this problem are limited by the lack of a comprehensive, valid measure of subjective cognitive dysfunction in FM that is easily interpretable, accessible, and brief. The purpose of this study was to leverage cognitive functioning item banks that were developed as part of the Patient Reported Outcomes Measurement Information System (PROMIS) to devise a 10-item short form measure of cognitive functioning for use in FM. In study 1, a nationwide (U.S.) sample of 1,035 adults with FM (age range = 18-82, 95.2% female) completed 2 cognitive item pools. Factor analyses and item response theory analyses were used to identify dimensionality and optimally performing items. A recommended 10-item measure, called the Multidimensional Inventory of Subjective Cognitive Impairment (MISCI) was created. In study 2, 232 adults with FM completed the MISCI and a legacy measure of cognitive functioning that is used in FM clinical trials, the Multiple Ability Self-Report Questionnaire (MASQ). The MISCI showed excellent internal reliability, low ceiling/floor effects, and good convergent validity with the MASQ (r = -.82). This paper presents the MISCI, a 10-item measure of cognitive dysfunction in FM, developed through classical test theory and item response theory. This brief but comprehensive measure shows evidence of excellent construct validity through large correlations with a lengthy legacy measure of cognitive functioning. Copyright © 2015 American Pain Society. Published by Elsevier Inc. All rights reserved.
Cultural Resources Survey of Three Iberville Parish Levee Enlargement and Revetment Construction Items

DTIC Science & Technology

1993-09-22

SURVEY OF THREE IBERVILLE PARISH LEVEE ENLARGEMENT AND REVETMENT CONSTRUCTION ITEMS September 1993 FINAL REPORT R. Christopher Goodwin...LEVEE ENLARGEMENT AND REVETMENT CONSTRUCTION ITEMS 12. PERSONAL AUTHOR(S) R. Christopher Goodwin, Ph.D., Rebecca E. Bruce, Lawrence L. Hewitt, and E... block number) FIELD GROUP SUB-GROUP Acadian Coast Historic Archeology Rice Antebellum Iberville Parish Saw Mill Plantation Carville Leprosarium Ophelia
The measurement of threat orientations.

PubMed

Thompson, Suzanne C; Schlehofer, Michèle M; Bovin, Michelle J

2006-01-01

To develop measures of 3 threat orientations that affect responses to health behavior messages. In Study 1, college students (N = 47) completed items assessing threat orientations and health behaviors. In Study 2, college students and community adults (N = 110) completed the threat orientation items and measures of convergent and discriminant validity. In Study 1, the control-based, denial-based, and heightened-sensitivity-based threat orientation scales demonstrated good internal consistency and correlated with engagement in health behaviors. In Study 2, the convergent and discriminant validity of the 3 measures was established. The 3 scales have good internal reliability and construct validity.
The (mis)measurement of the Dark Triad Dirty Dozen: exploitation at the core of the scale

PubMed Central

Kajonius, Petri J.; Persson, Björn N.; Rosenberg, Patricia

2016-01-01

Background. The dark side of human character has been conceptualized in the Dark Triad Model: Machiavellianism, psychopathy, and narcissism. These three dark traits are often measured using single long instruments for each one of the traits. Nevertheless, there is a need for short and valid personality measures in psychological research. As an independent research group, we replicated the factor structure, convergent validity and item response for one of the most recent and widely used short measures to operationalize these malevolent traits, namely, Jonason’s Dark Triad Dirty Dozen. We aimed to expand the understanding of what the Dirty Dozen really captures because of the mixed results on construct validity in previous research. Method. We used the largest sample to date to respond to the Dirty Dozen (N = 3,698). We firstly investigated the factor structure using Confirmatory Factor Analysis and an exploratory distribution analysis of the items in the Dirty Dozen. Secondly, using a sub-sample (n = 500) and correlation analyses, we investigated the convergent validity of the Dirty Dozen dark traits with Machiavellianism measured by the Mach-IV, psychopathy measured by Eysenck’s Personality Questionnaire Revised, narcissism using the Narcissistic Personality Inventory, and both neuroticism and extraversion from Eysenck’s questionnaire. Finally, besides these Classical Test Theory analyses, we analyzed the responses for each Dirty Dozen item using Item Response Theory (IRT). Results. The results confirmed previous findings of a bi-factor model fit: one latent core dark trait and three dark traits. All three Dirty Dozen traits had a striking bi-modal distribution, which might indicate unconcealed social undesirability with the items. The three Dirty Dozen traits did converge too, although not strongly, with the contiguous single Dark Triad scales (r between .41 and .49).
The probabilities of filling out steps on the Dirty Dozen narcissism-items were much higher than on the Dirty Dozen items for Machiavellianism and psychopathy. Overall, the Dirty Dozen instrument delivered the most predictive value with persons with average and high Dark Triad traits (theta > −0.5). Moreover, the Dirty Dozen scale was better conceptualized as a combined Machiavellianism-psychopathy factor, not narcissism, and is well captured with item 4: ‘I tend to exploit others towards my own end.’ Conclusion. The Dirty Dozen showed a consistent factor structure and a relatively convergent validity similar to that found in earlier studies. Narcissism measured using the Dirty Dozen, however, did not contribute information to the core of the Dirty Dozen construct. More importantly, the results imply that the core of the Dirty Dozen scale, a manipulative and anti-social trait, can be measured by a Single Item Dirty Dark Dyad (SIDDD). PMID:26966673
The (mis)measurement of the Dark Triad Dirty Dozen: exploitation at the core of the scale.

PubMed

Kajonius, Petri J; Persson, Björn N; Rosenberg, Patricia; Garcia, Danilo

2016-01-01

Background. The dark side of human character has been conceptualized in the Dark Triad Model: Machiavellianism, psychopathy, and narcissism. These three dark traits are often measured using single long instruments for each one of the traits. Nevertheless, there is a need for short and valid personality measures in psychological research. As an independent research group, we replicated the factor structure, convergent validity and item response for one of the most recent and widely used short measures to operationalize these malevolent traits, namely, Jonason's Dark Triad Dirty Dozen. We aimed to expand the understanding of what the Dirty Dozen really captures because of the mixed results on construct validity in previous research. Method. We used the largest sample to date to respond to the Dirty Dozen (N = 3,698). We firstly investigated the factor structure using Confirmatory Factor Analysis and an exploratory distribution analysis of the items in the Dirty Dozen. Secondly, using a sub-sample (n = 500) and correlation analyses, we investigated the convergent validity of the Dirty Dozen dark traits with Machiavellianism measured by the Mach-IV, psychopathy measured by Eysenck's Personality Questionnaire Revised, narcissism using the Narcissistic Personality Inventory, and both neuroticism and extraversion from Eysenck's questionnaire. Finally, besides these Classical Test Theory analyses, we analyzed the responses for each Dirty Dozen item using Item Response Theory (IRT). Results. The results confirmed previous findings of a bi-factor model fit: one latent core dark trait and three dark traits. All three Dirty Dozen traits had a striking bi-modal distribution, which might indicate unconcealed social undesirability with the items. The three Dirty Dozen traits did converge too, although not strongly, with the contiguous single Dark Triad scales (r between .41 and .49).
The probabilities of filling out steps on the Dirty Dozen narcissism-items were much higher than on the Dirty Dozen items for Machiavellianism and psychopathy. Overall, the Dirty Dozen instrument delivered the most predictive value with persons with average and high Dark Triad traits (theta > -0.5). Moreover, the Dirty Dozen scale was better conceptualized as a combined Machiavellianism-psychopathy factor, not narcissism, and is well captured with item 4: 'I tend to exploit others towards my own end.' Conclusion. The Dirty Dozen showed a consistent factor structure and a relatively convergent validity similar to that found in earlier studies. Narcissism measured using the Dirty Dozen, however, did not contribute information to the core of the Dirty Dozen construct. More importantly, the results imply that the core of the Dirty Dozen scale, a manipulative and anti-social trait, can be measured by a Single Item Dirty Dark Dyad (SIDDD).
Rasch-built Overall Disability Scale (R-ODS) for immune-mediated peripheral neuropathies.

PubMed

van Nes, S I; Vanhoutte, E K; van Doorn, P A; Hermans, M; Bakkers, M; Kuitwaard, K; Faber, C G; Merkies, I S J

2011-01-25

To develop a patient-based, linearly weighted scale that captures activity and social participation limitations in patients with Guillain-Barré syndrome (GBS), chronic inflammatory demyelinating polyradiculoneuropathy (CIDP), and gammopathy-related polyneuropathy (MGUSP). A preliminary Rasch-built Overall Disability Scale (R-ODS) containing 146 activity and participation items was constructed, based on the WHO International Classification of Functioning, Disability and Health, literature search, and patient interviews. The preliminary R-ODS was assessed twice (interval: 2-4 weeks; test-retest reliability studies) in 294 patients who experienced GBS in the past (n = 174) or currently have stable CIDP (n = 80) or MGUSP (n = 40). Data were analyzed using the Rasch unidimensional measurement model (RUMM2020). The preliminary R-ODS did not meet the Rasch model expectations. Based on disordered thresholds, misfit statistics, item bias, and local dependency, items were systematically removed to improve the model fit, regularly controlling the class intervals and model statistics. Finally, we succeeded in constructing a 24-item scale that fulfilled all Rasch requirements. "Reading a newspaper/book" and "eating" were the 2 easiest items; "standing for hours" and "running" were the most difficult ones. Good validity and reliability were obtained. The R-ODS is a linearly weighted scale that specifically captures activity and social participation limitations in patients with GBS, CIDP, and MGUSP. Compared to the Overall Disability Sum Score, the R-ODS represents a wider range of item difficulties, thereby better targeting patients with different ability levels. If responsive, the R-ODS will be valuable for future clinical trials and follow-up studies in these conditions.
The PROMIS fatigue item bank has good measurement properties in patients with fibromyalgia and severe fatigue.

PubMed

Yost, Kathleen J; Waller, Niels G; Lee, Minji K; Vincent, Ann

2017-06-01

Efficient management of fibromyalgia (FM) requires precise measurement of FM-specific symptoms. Our objective was to assess the measurement properties of the Patient-Reported Outcome Measurement Information System (PROMIS) fatigue item bank (FIB) in people with FM. We applied classical psychometric and item response theory methods to cross-sectional PROMIS-FIB data from two samples. Data on the clinical FM sample were obtained at a tertiary medical center. Data for the U.S. general population sample were obtained from the PROMIS network. The full 95-item bank was administered to both samples. We investigated dimensionality of the item bank in both samples by separately fitting a bifactor model with two group factors: experience and impact. We assessed measurement invariance between samples, and we explored an alternate factor structure with the normative sample and subsequently confirmed that structure in the clinical sample. Finally, we assessed whether reporting FM subdomain scores added value over reporting a single total score. The item bank was dominated by a general fatigue factor. The fit of the initial bifactor model and evidence of measurement invariance indicated that the same constructs were measured across the samples. An alternative bifactor model with three group factors demonstrated slightly improved fit. Subdomain scores add value over a total score. We demonstrated that the PROMIS-FIB is appropriate for measuring fatigue in clinical samples of FM patients. The construct can be represented by a single score; however, subdomain scores for the three group factors identified in the alternative model may also be reported.
The Val30Met familial amyloid polyneuropathy specific Rasch-built overall disability scale (FAP-RODS(©)).

PubMed

Pruppers, Mariëlle H J; Merkies, Ingemar S J; Faber, Catharina G; Da Silva, Ana M; Costa, Vanessa; Coelho, Teresa

2015-09-01

Familial amyloid polyneuropathy (FAP) is a chronic debilitating multi-organic disorder, mainly assessed using ordinal-based impairment measures. To date, no outcome measure at the activity and participation level has been constructed in FAP. The current study aimed to design an interval activity/participation scale for FAP through Rasch methodology. A preliminary FAP Rasch-built overall disability scale (pre-FAP-RODS) containing 146 activity/participation items was assessed twice (interval: 2-4 weeks; test-retest reliability) in 248 patients with Val30Met FAP examined in Porto, Portugal, of whom 65.7% had received liver transplantation. An ordinal-based 24-item FAP-symptoms inventory questionnaire (FAP-SIQ) was also assessed (validity purposes). The pre-FAP-RODS and FAP-SIQ data were subjected to Rasch analyses. The pre-FAP-RODS did not meet the model's expectations. On the basis of requirements such as misfit statistics, differential item functioning, and local dependency, items were systematically removed until a final 34-item FAP-RODS(©) was constructed fulfilling all Rasch requirements. Acceptable reliability/validity scores were demonstrated. In conclusion, the 34-item FAP-RODS(©) is a disease-specific interval measure suitable for detecting activity and participation restrictions in patients with FAP. The use of the FAP-RODS(©) is recommended for future international clinical trials in patients with Val30Met FAP determining its responsiveness and its cross-cultural validation. Its expansion to other forms of FAP should also be a focus of future clinical studies. © 2015 Peripheral Nerve Society.
Development of the Primary Care Quality-Homeless (PCQ-H) Instrument: A Practical Survey of Patients' Experiences in Primary Care

PubMed Central

Kertesz, Stefan G.; Pollio, David E.; Jones, Richard N.; Steward, Jocelyn; Stringfellow, Erin J.; Gordon, Adam J.; Johnson, Nancy K.; Kim, Theresa A.; Granstaff, Unita; Austin, Erika L.; Young, Alexander S.; Golden, Joya; Davis, Lori L.; Roth, David L.; Holt, Cheryl L.

2015-01-01

Background: Homeless patients face unique challenges in obtaining primary care responsive to their needs and context. Patient experience questionnaires could permit assessment of patient-centered medical homes for this population, but standard instruments may not reflect homeless patients' priorities and concerns. Objectives: This report describes (a) the content and psychometric properties of a new primary care questionnaire for homeless patients and (b) the methods utilized in its development. Methods: Starting with quality-related constructs from the Institute of Medicine, we identified relevant themes by interviewing homeless patients and experts in their care. A multidisciplinary team drafted a preliminary set of 78 items. This was administered to homeless-experienced clients (n=563) across 3 VA facilities and 1 non-VA Health Care for the Homeless Program. Using Item Response Theory, we examined Test Information Function curves to eliminate less informative items and devise plausibly distinct subscales. Results: The resulting 33-item instrument (Primary Care Quality-Homeless, PCQ-H) has four subscales: Patient-Clinician Relationship (15 items), Cooperation among Clinicians (3 items), Access/Coordination (11 items) and Homeless-Specific Needs (4 items). Evidence for divergent and convergent validity is provided. Test Information Function (TIF) graphs showed adequate informational value to permit inferences about groups for 3 subscales (Relationship, Cooperation and Access/Coordination). The 3-item Cooperation subscale had lower informational value (TIF<5) but had good internal consistency (alpha=0.75) and patients frequently reported problems in this aspect of care. Conclusions: Systematic application of qualitative and quantitative methods supported the development of a brief patient-reported questionnaire focused on the primary care of homeless patients and offers guidance for future population-specific instrument development. PMID:25023918
Are Attitudes Toward Writing and Reading Separable Constructs? A Study With Primary Grade Children

PubMed Central

Graham, Steve; Berninger, Virginia; Abbott, Robert

2012-01-01

This study examined whether or not attitude towards writing is a unique and separable construct from attitude towards reading for young, beginning writers. Participants were 128 first-grade children (70 girls and 58 boys) and 113 third-grade students (57 girls and 56 boys). Each child was individually administered a 24-item attitude measure, which contained 12 items assessing attitude towards writing and 12 parallel items for reading. Students also wrote a narrative about a personal event in their life. A factor analysis of the 24-item attitude measure provided evidence that generally supports the contention that writing and reading attitudes are separable constructs for young beginning writers, as it yielded three factors: a writing attitude factor with 9 items, a reading attitude factor with 9 parallel items, and an attitude about literacy interactions with others factor containing 4 items (2 items in writing and 2 parallel items in reading). Further validation that attitude towards writing is a separable construct from attitude towards reading was obtained at the third-grade level, where writing attitude made a unique and significant contribution, beyond the other two attitude measures, to the prediction of three measures of writing: quality, length, and longest correct word sequence. At the first-grade level, none of the 3 attitude measures predicted students’ writing performance. Finally, girls had more positive attitudes concerning reading and writing than boys. PMID:22736933
Preliminary Study of the Autism Self-Efficacy Scale for Teachers (ASSET)

PubMed Central

Ruble, Lisa A.; Toland, Michael D.; Birdwhistell, Jessica L.; McGrew, John H.; Usher, Ellen L.

2013-01-01

The purpose of the current study was to evaluate a new measure, the Autism Self-Efficacy Scale for Teachers (ASSET) for its dimensionality, internal consistency, and construct validity derived in a sample of special education teachers (N = 44) of students with autism. Results indicate that all items reflect one dominant factor, teachers’ responses to items were internally consistent within the sample, and compared to a 100-point scale, a 6-point response scale is adequate. ASSET scores were found to be negatively correlated with scores on two subscale measures of teacher stress (i.e., self-doubt/need for support and disruption of the teaching process) but uncorrelated with teacher burnout scores. The ASSET is a promising tool that requires replication with larger samples. PMID:23976899
Measuring the quality of life in hypertension according to Item Response Theory.

PubMed

Borges, José Wicto Pereira; Moreira, Thereza Maria Magalhães; Schmitt, Jeovani; Andrade, Dalton Francisco de; Barbetta, Pedro Alberto; Souza, Ana Célia Caetano de; Lima, Daniele Braz da Silva; Carvalho, Irialda Saboia

2017-05-04

To analyze the Miniquestionário de Qualidade de Vida em Hipertensão Arterial (MINICHAL - Mini-questionnaire of Quality of Life in Hypertension) using the Item Response Theory. This is an analytical study conducted with 712 persons with hypertension treated in thirteen primary health care units of Fortaleza, State of Ceará, Brazil, in 2015. The steps of the analysis by the Item Response Theory were: evaluation of dimensionality, estimation of parameters of items, and construction of scale. The study of dimensionality was carried out on the polychoric correlation matrix and confirmatory factor analysis. To estimate the item parameters, we used Samejima's Graded Response Model. The analyses were conducted using the free software R with the aid of the psych and mirt packages. The analysis has allowed the visualization of item parameters and their individual contributions in the measurement of the latent trait, generating more information and allowing the construction of a scale with an interpretative model that demonstrates the evolution of the worsening of the quality of life in five levels. Regarding the item parameters, the items related to the somatic state have had a good performance, as they have presented better power to discriminate individuals with worse quality of life. The items related to mental state have been those which contributed the least psychometric information in the MINICHAL. We conclude that the instrument is suitable for the identification of the worsening of the quality of life in hypertension. The analysis of the MINICHAL using the Item Response Theory has allowed us to identify new facets of this instrument that have not yet been addressed in previous studies.
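As a rough illustration of Samejima's model named in this abstract, the sketch below computes category probabilities for a single polytomous item from the standard logistic form of the graded response model. The function name and the parameter values are hypothetical; the study itself fitted the model in R, and this is not its code.

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Category probabilities for one item under the graded response model.

    theta:      latent trait level of the respondent
    a:          item discrimination parameter
    thresholds: ordered boundary parameters b_1 < ... < b_m
    Returns m + 1 probabilities, one per response category.
    """
    if list(thresholds) != sorted(thresholds):
        raise ValueError("thresholds must be in increasing order")
    # Cumulative probabilities P(X >= k), padded with 1 and 0 at the ends.
    cumulative = [1.0]
    cumulative += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cumulative.append(0.0)
    # Each category probability is the gap between adjacent cumulative curves.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(cumulative) - 1)]


# Hypothetical item with four response categories.
probs = grm_category_probs(theta=0.3, a=1.5, thresholds=[-1.0, 0.0, 1.2])
print(probs, sum(probs))
```

A larger discrimination parameter makes the cumulative curves steeper, which is the sense in which an item separates respondents at nearby trait levels more sharply.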
Hope and General Self-efficacy: Two Measures of the Same Construct?

PubMed

Zhou, Mingming; Kam, Chester Chun Seng

2016-07-03

The aim of this study was to test the extent to which a hope measure is equivalent to a general self-efficacy (GSE) measure. Questionnaire data on these two constructs and other external variables were collected from 199 Chinese college students. The factor analytic results suggested that hope and self-efficacy items measured the same construct. The unidimensional model combining hope items and GSE items fit the data as well as the bidimensional model, indicating that their corresponding items measured the same underlying construct. Further analyses showed that hope and GSE did not correlate with external variables differently in a systematic manner. Most of these correlational differences were non-significant and negligible. These findings suggested that the literatures studying GSE and hope could be integrated and that researchers need to recognize and acknowledge the conceptual and operational similarities among these constructs in the literature.
Validity and measurement precision of the PROMIS physical function item bank and a content validity-driven 20-item short form in rheumatoid arthritis compared with traditional measures.

PubMed

Oude Voshaar, Martijn A H; Ten Klooster, Peter M; Glas, Cees A W; Vonkeman, Harald E; Taal, Erik; Krishnan, Eswar; Bernelot Moens, Hein J; Boers, Maarten; Terwee, Caroline B; van Riel, Piet L C M; van de Laar, Mart A F J

2015-12-01

To evaluate the content validity and measurement properties of the Patient-Reported Outcome Measurement Information System (PROMIS) physical function item bank and a 20-item short form in patients with RA in comparison with the HAQ disability index (HAQ-DI) and 36-item Short Form Health Survey (SF-36) physical functioning scale (PF-10). The content validity of the instruments was evaluated by linking their items to the International Classification of Functioning, Disability and Health (ICF) core set for RA. The measures were administered to 690 RA patients enrolled in the Dutch Rheumatoid Arthritis Monitoring registry. Measurement precision was evaluated using item response theory methods and construct validity was evaluated by correlating physical function scores with other clinical and patient-reported outcome measures. All 207 health concepts identified in the physical function measures referred to activities that are featured in the ICF. Twenty-three of 26 ICF RA core set domains are featured in the full PROMIS physical function item bank compared with 13 and 8 for the HAQ-DI and PF-10, respectively. As hypothesized, all three physical function instruments were highly intercorrelated (r 0.74-0.84), moderately correlated with disease activity measures (r 0.44-0.63) and weakly correlated with age (rs 0.07-0.14). Item response theory-based analysis revealed that a 20-item PROMIS physical function short form covered a wider range of physical function levels than the HAQ-DI or PF-10. The PROMIS physical function item bank demonstrated excellent measurement properties in RA. A content-driven 20-item short form may be a useful tool for assessing physical function in RA. © The Author 2015. Published by Oxford University Press on behalf of the British Society for Rheumatology. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Evaluation of a preliminary physical function item bank supported the expected advantages of the Patient-Reported Outcomes Measurement Information System (PROMIS).

PubMed

Rose, M; Bjorner, J B; Becker, J; Fries, J F; Ware, J E

2008-01-01

The Patient-Reported Outcomes Measurement Information System (PROMIS) was initiated to improve precision, reduce respondent burden, and enhance the comparability of health outcomes measures. We used item response theory (IRT) to construct and evaluate a preliminary item bank for physical function assuming four subdomains. Data from seven samples (N=17,726) using 136 items from nine questionnaires were evaluated. A generalized partial credit model was used to estimate item parameters, which were normed to a mean of 50 (SD=10) in the US population. Item bank properties were evaluated through Computerized Adaptive Test (CAT) simulations. IRT requirements were fulfilled by 70 items covering activities of daily living, lower extremity, and central body functions. The original item context partly affected parameter stability. Items on upper body function, and need for aid or devices did not fit the IRT model. In simulations, a 10-item CAT eliminated floor and decreased ceiling effects, achieving a small standard error (< 2.2) across scores from 20 to 50 (reliability >0.95 for a representative US sample). This precision was not achieved over a similar range by any comparable fixed length item sets. The methods of the PROMIS project are likely to substantially improve measures of physical function and to increase the efficiency of their administration using CAT.
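The link between the reported standard error and the reported reliability can be checked with a short sketch. It assumes the common approximation reliability = 1 - SE²/SD² on the T-score metric (mean 50, SD 10); the abstract does not say that the authors computed reliability this way, and the function names are illustrative only.

```python
def theta_to_t_score(theta, mean=50.0, sd=10.0):
    """Map a standardized score (mean 0, SD 1) onto the T-score metric.

    The mean of 50 and SD of 10 come from the abstract; the linear
    mapping itself is an assumption made for illustration.
    """
    return mean + sd * theta


def reliability_from_se(se, sd=10.0):
    """Approximate reliability as 1 - (error variance / score variance).

    This is a textbook approximation, not necessarily the authors' method.
    """
    return 1.0 - (se ** 2) / (sd ** 2)


if __name__ == "__main__":
    print(theta_to_t_score(-1.0))              # 40.0
    print(round(reliability_from_se(2.2), 4))  # 0.9516
```

Under these assumptions, a standard error of 2.2 on a scale with SD 10 corresponds to a reliability just above 0.95, which is consistent with the figures quoted in the abstract.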
Item Response Theory Modeling and Categorical Regression Analyses of the Five-Factor Model Rating Form: A Study on Italian Community-Dwelling Adolescent Participants and Adult Participants.

PubMed

Fossati, Andrea; Widiger, Thomas A; Borroni, Serena; Maffei, Cesare; Somma, Antonella

2017-06-01

To extend the evidence on the reliability and construct validity of the Five-Factor Model Rating Form (FFMRF) in its self-report version, two independent samples of Italian participants, which were composed of 510 adolescent high school students and 457 community-dwelling adults, respectively, were administered the FFMRF in its Italian translation. Adolescent participants were also administered the Italian translation of the Borderline Personality Features Scale for Children-11 (BPFSC-11), whereas adult participants were administered the Italian translation of the Triarchic Psychopathy Measure (TriPM). Cronbach α values were consistent with previous findings; in both samples, average interitem r values indicated acceptable internal consistency for all FFMRF scales. A multidimensional graded item response theory model indicated that the majority of FFMRF items had adequate discrimination parameters; information indices supported the reliability of the FFMRF scales. Both categorical (i.e., item-level) and scale-level regression analyses suggested that the FFMRF scores may predict a nonnegligible amount of variance in the BPFSC-11 total score in adolescent participants, and in the TriPM scale scores in adult participants.
Testing measurement invariance of the patient-reported outcomes measurement information system pain behaviors score between the US general population sample and a sample of individuals with chronic pain.

PubMed

Chung, Hyewon; Kim, Jiseon; Cook, Karon F; Askew, Robert L; Revicki, Dennis A; Amtmann, Dagmar

2014-02-01

In order to test the difference between group means, the construct measured must have the same meaning for all groups under investigation. This study examined the measurement invariance of responses to the patient-reported outcomes measurement information system (PROMIS) pain behavior (PB) item bank in two samples: the PROMIS calibration sample (Wave 1, N = 426) and a sample recruited from the American Chronic Pain Association (ACPA, N = 750). The ACPA data were collected to increase the number of participants with higher levels of pain. Multi-group confirmatory factor analysis (MG-CFA) and two item response theory (IRT)-based differential item functioning (DIF) approaches were employed to evaluate the existence of measurement invariance. MG-CFA results supported metric invariance of the PROMIS-PB, indicating that unstandardized factor loadings were equal across samples. DIF analyses revealed that the impact of 6 DIF items was negligible. Based on the results of both the MG-CFA and IRT-based DIF approaches, we recommend retaining the original parameter estimates obtained from the combined samples.
Examining Gender Differences in Written Assessment Tasks in Biology: A Case Study of Evolutionary Explanations

PubMed Central

Federer, Meghan Rector; Nehm, Ross H.; Pearl, Dennis K.

2016-01-01

Understanding sources of performance bias in science assessment provides important insights into whether science curricula and/or assessments are valid representations of student abilities. Research investigating assessment bias due to factors such as instrument structure, participant characteristics, and item types is well documented across a variety of disciplines. However, the relationships among these factors are unclear for tasks evaluating understanding through performance on scientific practices, such as explanation. Using item-response theory (Rasch analysis), we evaluated differences in performance by gender on a constructed-response (CR) assessment about natural selection (ACORNS). Three isomorphic item strands of the instrument were administered to a sample of undergraduate biology majors and nonmajors (Group 1: n = 662 [female = 51.6%]; G2: n = 184 [female = 55.9%]; G3: n = 642 [female = 55.1%]). Overall, our results identify relationships between item features and performance by gender; however, the effect is small in the majority of cases, suggesting that males and females tend to incorporate similar concepts into their CR explanations. These results highlight the importance of examining gender effects on performance in written assessment tasks in biology. PMID:26865642
Rasch Analysis of the Student Refractive Error and Eyeglass Questionnaire

PubMed Central

Crescioni, Mabel; Messer, Dawn H.; Warholak, Terri L.; Miller, Joseph M.; Twelker, J. Daniel; Harvey, Erin M.

2014-01-01

Purpose: To evaluate and refine a newly developed instrument, the Student Refractive Error and Eyeglasses Questionnaire (SREEQ), designed to measure the impact of uncorrected and corrected refractive error on vision-related quality of life (VRQoL) in school-aged children. Methods: A 38-statement instrument consisting of two parts was developed: Part A relates to perceptions regarding uncorrected vision and Part B relates to perceptions regarding corrected vision and includes other statements regarding VRQoL with spectacle correction. The SREEQ was administered to 200 Native American 6th through 12th grade students known to have previously worn and who currently require eyeglasses. Rasch analysis was conducted to evaluate the functioning of the SREEQ. Statements on Part A and Part B were analyzed to examine the dimensionality and constructs of the questionnaire, how well the items functioned, and the appropriateness of the response scale used. Results: Rasch analysis suggested that two items be eliminated and that the measurement scale for matching items be reduced from a 4-point response scale to a 3-point response scale. With these modifications, categorical data were converted to interval level data, to conduct an item and person analysis. A shortened version of the SREEQ was constructed with these modifications, the SREEQ-R, which included the statements that were able to capture changes in VRQoL associated with spectacle wear for those with significant refractive error in our study population. Conclusions: While the SREEQ Part B appears to have less than optimal reliability to assess the impact of spectacle correction on VRQoL in our student population, it is also able to detect statistically significant differences from pretest to posttest on both the group and individual levels, showing that the instrument can assess the impact that glasses have on VRQoL. Further modifications to the questionnaire, such as those included in the SREEQ-R, could enhance its functionality. PMID:24811844
Students' perceptions of a blended learning experience in dental education.

PubMed

Varthis, S; Anderson, O R

2018-02-01

"Flipped" instructional sequencing is a new instructional method where online instruction precedes the group meeting, allowing for more sophisticated learning through discussion and critical thinking during the in-person class session; a novel approach studied in this research. The purpose of this study was to document dental students' perceptions of flipped-based blended learning and to apply a new method of displaying their perceptions based on Likert-scale data analysis using a network diagramming method known as an item correlation network diagram (ICND). In addition, this article aimed to encourage institutions or course directors to consider self-regulated learning and social constructivism as a theoretical framework when blended learning is incorporated in dental curricula. Twenty (second year) dental students at a Northeastern Regional Dental School in the United States participated in this study. A Likert scale was administered before and after the learning experience to obtain evidence of their perceptions of its quality and educational merits. Item correlation network diagrams, based on the intercorrelations amongst the responses to the Likert-scale items, were constructed to display students' changes in perceptions before and after the learning experience. Students reported positive perceptions of the blended learning, and the ICND analysis of their responses before and after the learning experience provided insights into their social (group-based) cognition about the learning experience. The ICNDs are considered evidence of social or group-based cognition, because they are constructed from evidence obtained using intercorrelations of the total group responses to the Likert-scale items. The students positively received blended learning in dental education, and the ICND analyses demonstrated marked changes in their social cognition of the learning experience based on the pre- and post-Likert survey data. 
Self-regulated learning and social constructivism are encouraged as useful theoretical frameworks for a blended learning approach. © 2016 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.

Calibration of the Test of Relational Reasoning.

PubMed

Dumas, Denis; Alexander, Patricia A

2016-10-01

Relational reasoning, or the ability to discern meaningful patterns within a stream of information, is a critical cognitive ability associated with academic and professional success. Importantly, relational reasoning has been described as taking multiple forms, depending on the type of higher order relations being drawn between and among concepts. However, the reliable and valid measurement of such a multidimensional construct of relational reasoning has been elusive. The Test of Relational Reasoning (TORR) was designed to tap 4 forms of relational reasoning (i.e., analogy, anomaly, antinomy, and antithesis). In this investigation, the TORR was calibrated and scored using multidimensional item response theory in a large, representative undergraduate sample. The bifactor model was identified as the best-fitting model, and used to estimate item parameters and construct reliability. To improve the usefulness of the TORR to educators, scaled scores were also calculated and presented. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
A Comparison of Latent Growth Models for Constructs Measured by Multiple Items

ERIC Educational Resources Information Center

Leite, Walter L.

2007-01-01

Univariate latent growth modeling (LGM) of composites of multiple items (e.g., item means or sums) has been frequently used to analyze the growth of latent constructs. This study evaluated whether LGM of composites yields unbiased parameter estimates, standard errors, chi-square statistics, and adequate fit indexes. Furthermore, LGM was compared…
Guide to English Language Arts/Literacy Released Items: Understanding Scoring

ERIC Educational Resources Information Center

Partnership for Assessment of Readiness for College and Careers, 2016

2016-01-01

The Partnership for Assessment of Readiness for College and Careers (PARCC) is a group of states working together to develop a set of assessments that measure whether students are on track to be successful in college and careers. Administrations of the PARCC assessment included three Prose Constructed Responses (PCR), one per task for English…
Using the Graded Response Model to Control Spurious Interactions in Moderated Multiple Regression

ERIC Educational Resources Information Center

Morse, Brendan J.; Johanson, George A.; Griffeth, Rodger W.

2012-01-01

Recent simulation research has demonstrated that using a simple raw score to operationalize a latent construct can result in inflated Type I error rates for the interaction term of a moderated statistical model when the interaction (or lack thereof) is proposed at the latent variable level. Rescaling the scores using an appropriate item response…
Longitudinal Construct Validity of Brief Symptom Inventory Subscales in Schizophrenia

ERIC Educational Resources Information Center

Long, Jeffrey D.; Harring, Jeffrey R.; Brekke, John S.; Test, Mary Ann; Greenberg, Jan

2007-01-01

Longitudinal validity of Brief Symptom Inventory subscales was examined in a sample (N = 318) with schizophrenia-related illness measured at baseline and every 6 months for 3 years. Nonlinear factor analysis of items was used to test graded response models (GRMs) for subscales in isolation. The models varied in their within-time and between-times…
A Critical Analysis of the Body of Work Method for Setting Cut-Scores

ERIC Educational Resources Information Center

Radwan, Nizam; Rogers, W. Todd

2006-01-01

The recent increase in the use of constructed-response items in educational assessment and the dissatisfaction with the nature of the decision that the judges must make using traditional standard-setting methods created a need to develop new and effective standard-setting procedures for tests that include both multiple-choice and…
Validation Study of a Gatekeeping Attitude Index for Social Work Education

ERIC Educational Resources Information Center

Tam, Dora M. Y.; Coleman, Heather

2011-01-01

This article reports on a study designed to validate the Gatekeeping Attitude Index, a 14-item Likert scaling index. The authors collected data from a convenience sample of social work field instructors (N = 188) with a response rate of 74.0%. Construct validation by exploratory factor analysis identified a 2-factor solution on the index after…
State of Modern Measurement Approaches in Social Work Research Literature

ERIC Educational Resources Information Center

Unick, George J.; Stone, Susan

2010-01-01

The need to develop measures that tap into constructs of interest to social work, refine existing measures, and ensure that measures function adequately across diverse populations of interest is critical. Item response theory (IRT) is a modern measurement approach that is increasingly seen as an essential tool in a number of allied professions.…
Curve of Factors Model: A Latent Growth Modeling Approach for Educational Research

ERIC Educational Resources Information Center

Isiordia, Marilu; Ferrer, Emilio

2018-01-01

A first-order latent growth model assesses change in an unobserved construct from a single score and is commonly used across different domains of educational research. However, examining change using a set of multiple response scores (e.g., scale items) affords researchers several methodological benefits not possible when using a single score. A…
Investigating Criteria That Seventh Graders Use to Evaluate the Quality of Online Information

ERIC Educational Resources Information Center

Coiro, Julie; Coscarelli, Carla; Maykel, Cheryl; Forzani, Elena

2015-01-01

This article presents qualitative findings from a study that examined the types of criteria that middle school students use to evaluate the quality of online information and sources for a Web-based research assignment. Open-constructed responses from four critical evaluation items were compiled from diverse seventh graders in a representative,…
Development and Psychometric Evaluation of a Health-Related Quality of Life Instrument for Individuals with Adult-Onset Hearing Loss

PubMed Central

Stika, Carren J.; Hays, Ron D.

2016-01-01

Objective: Self-reports of “hearing handicap” are available, but a comprehensive measure of health-related quality of life (HRQOL) for individuals with adult-onset hearing loss (AOHL) does not exist. Our objective was to develop and evaluate a multidimensional HRQOL instrument for individuals with AOHL. Design: The Impact of Hearing Loss Inventory Tool (IHEAR-IT) was developed using results of focus groups, a literature review, Advisory Expert Panel input, and cognitive interviews. Study Sample: The 73-item field-test instrument was completed by 409 adults (22-91 years old) with varying degrees of AOHL and from different areas of the US. Results: Multitrait scaling analysis supported four multi-item scales and five individual items. Internal consistency reliabilities ranged from 0.93 to 0.96 for the scales. Construct validity was supported by correlations between the IHEAR-IT scales and scores on the 36-Item Short Form Health Survey, Version 2.0 (SF-36v2) Mental Composite Summary (r’s = 0.32 – 0.64) and the Hearing Handicap Inventory for the Elderly/Adults (HHIE/HHIA) (r’s > −0.70). Conclusions: The field test provides initial support for the reliability and construct validity of the IHEAR-IT for evaluating HRQOL of individuals with AOHL. Further research is needed to evaluate the responsiveness to change of the IHEAR-IT scales and identify items for a short-form. PMID:27104754
The Academic Resilience Scale (ARS-30): A New Multidimensional Construct Measure

PubMed Central

Cassidy, Simon

2016-01-01

Resilience is a psychological construct observed in some individuals that accounts for success despite adversity. Resilience reflects the ability to bounce back, to beat the odds and is considered an asset in human characteristic terms. Academic resilience contextualizes the resilience construct and reflects an increased likelihood of educational success despite adversity. The paper provides an account of the development of a new multidimensional construct measure of academic resilience. The 30 item Academic Resilience Scale (ARS-30) explores process—as opposed to outcome—aspects of resilience, providing a measure of academic resilience based on students’ specific adaptive cognitive-affective and behavioral responses to academic adversity. Findings from the study involving a sample of undergraduate students (N = 532) demonstrate that the ARS-30 has good internal reliability and construct validity. It is suggested that a measure such as the ARS-30, which is based on adaptive responses, aligns more closely with the conceptualisation of resilience and provides a valid construct measure of academic resilience relevant for research and practice in university student populations. PMID:27917137
Developing and investigating the use of single-item measures in organizational research.

PubMed

Fisher, Gwenith G; Matthews, Russell A; Gibbons, Alyssa Mitchell

2016-01-01

The validity of organizational research relies on strong research methods, which include effective measurement of psychological constructs. The general consensus is that multiple-item measures have better psychometric properties than single-item measures. However, due to practical constraints (e.g., survey length, respondent burden) there are situations in which certain single items may be useful for capturing information about constructs that might otherwise go unmeasured. We evaluated 37 items, including 18 newly developed items as well as 19 single items selected from existing multiple-item scales based on psychometric characteristics, to assess 18 constructs frequently measured in organizational and occupational health psychology research. We examined evidence of reliability; convergent, discriminant, and content validity assessments; and test-retest reliabilities at 1- and 3-month time lags for single-item measures using a multistage and multisource validation strategy across 3 studies, including data from N = 17 occupational health subject matter experts and N = 1,634 survey respondents across 2 samples. Items selected from existing scales generally demonstrated better internal consistency reliability and convergent validity, whereas these particular new items generally had higher levels of content validity. We offer recommendations regarding when use of single items may be more or less appropriate, as well as 11 items that seem acceptable, 14 items with mixed results that might be used with caution, and 12 items we do not recommend using as single-item measures. Although multiple-item measures are preferable from a psychometric standpoint, in some circumstances single-item measures can provide useful information. (c) 2016 APA, all rights reserved.
Validation and psychometric properties of the Somatic and Psychological HEalth REport (SPHERE) in a young Australian-based population sample using non-parametric item response theory.

PubMed

Couvy-Duchesne, Baptiste; Davenport, Tracey A; Martin, Nicholas G; Wright, Margaret J; Hickie, Ian B

2017-08-01

The Somatic and Psychological HEalth REport (SPHERE) is a 34-item self-report questionnaire that assesses symptoms of mental distress and persistent fatigue. As it was developed as a screening instrument for use mainly in primary care-based clinical settings, its validity and psychometric properties have not been studied extensively in population-based samples. We used non-parametric Item Response Theory to assess scale validity and item properties of the SPHERE-34 scales, collected through four waves of the Brisbane Longitudinal Twin Study (N = 1707, mean age = 12, 51% females; N = 1273, mean age = 14, 50% females; N = 1513, mean age = 16, 54% females; N = 1263, mean age = 18, 56% females). We estimated the heritability of the new scores, their genetic correlation, and their predictive ability in a sub-sample (N = 1993) who completed the Composite International Diagnostic Interview. After excluding items most responsible for noise, sex or wave bias, the SPHERE-34 questionnaire was reduced to 21 items (SPHERE-21), comprising a 14-item scale for anxiety-depression and a 10-item scale for chronic fatigue (3 items overlapping). These new scores showed high internal consistency (alpha > 0.78), moderate three-month reliability (ICC = 0.47-0.58) and item scalability (Hi > 0.23), and were positively correlated (phenotypic correlations r = 0.57-0.70; rG = 0.77-1.00). Heritability estimates ranged from 0.27 to 0.51. In addition, both scores were associated with later DSM-IV diagnoses of MDD, social anxiety and alcohol dependence (OR 1.23-1.47). Finally, a post-hoc comparison showed that several psychometric properties of the SPHERE-21 were similar to those of the Beck Depression Inventory. The scales of SPHERE-21 measure valid and comparable constructs across sex and age groups (from 9 to 28 years). SPHERE-21 scores are heritable, genetically correlated and show good predictive ability of mental health in an Australian-based population sample of young people.
A Mixed Effects Randomized Item Response Model

ERIC Educational Resources Information Center

Fox, J.-P.; Wyrick, Cheryl

2008-01-01

The randomized response technique ensures that individual item responses, denoted as true item responses, are randomized before observing them and so-called randomized item responses are observed. A relationship is specified between randomized item response data and true item response data. True item response data are modeled with a (non)linear…
Dutch-Flemish translation of nine pediatric item banks from the Patient-Reported Outcomes Measurement Information System (PROMIS)®.

PubMed

Haverman, Lotte; Grootenhuis, Martha A; Raat, Hein; van Rossum, Marion A J; van Dulmen-den Broeder, Eline; Hoppenbrouwers, Karel; Correia, Helena; Cella, David; Roorda, Leo D; Terwee, Caroline B

2016-03-01

The Patient-Reported Outcomes Measurement Information System (PROMIS®) is a new, state-of-the-art assessment system for measuring patient-reported health and well-being of adults and children. It has the potential to be more valid, reliable, and responsive than existing PROMs. The item banks are designed to be self-reported and completed by children aged 8-18 years. The PROMIS items can be administered in short forms or through computerized adaptive testing. This paper describes the translation and cultural adaptation of nine PROMIS item banks (151 items) for children in Dutch-Flemish. The translation was performed by FACITtrans using standardized PROMIS methodology and approved by the PROMIS Statistical Center. The translation included four forward translations, two back-translations, three independent reviews (at least two Dutch, one Flemish), and pretesting in 24 children from the Netherlands and Flanders. For some items, it was necessary to have separate translations for Dutch and Flemish: physical function-mobility (three items), anger (one item), pain interference (two items), and asthma impact (one item). Challenges faced in the translation process included scarcity or overabundance of possible translations, unclear item descriptions, constructs broader/smaller in the target language, difficulties in rank ordering items, differences in unit of measurement, irrelevant items, or differences in performance of activities. By addressing these challenges, acceptable translations were obtained for all items. The Dutch-Flemish PROMIS items are linguistically equivalent to the original USA version. Short forms are now available for use, and entire item banks are ready for cross-cultural validation in the Netherlands and Flanders.
[Case Study] CityCenter and Cosmopolitan Construction Projects, Las Vegas, Nevada: lessons learned from the use of multiple sources and mixed methods in a safety needs assessment.

PubMed

Gittleman, Janie L; Gardner, Paige C; Haile, Elizabeth; Sampson, Julie M; Cigularov, Konstantin P; Ermann, Erica D; Stafford, Pete; Chen, Peter Y

2010-06-01

The present study describes a response to eight tragic deaths over an eighteen-month time span on a fast-track construction project on the largest commercial development project in U.S. history. Four versions of a survey were distributed to workers, foremen, superintendents, and senior management. In addition to standard Likert-scale safety climate scale items, an open-ended item was included at the end of the survey. Safety climate perceptions differed by job level. Specifically, management perceived a more positive safety climate as compared to workers. Content analysis of the open-ended item was used to identify important safety and health concerns which might have been overlooked with the qualitative portion of the survey. The surveys were conducted to understand workforce issues of concern with the aim of improving site safety conditions. Such efforts can require minimal investment of resources and time and result in critical feedback for developing interventions affecting organizational structure, management processes, and communication. The most important lesson learned was that gauging differences in perception about site safety can provide critical feedback at all levels of a construction organization. Implementation of multi-level organizational perception surveys can identify major safety issues of concern. Feedback, if acted upon, can potentially result in fewer injuries and fatal events. (c) 2010 Elsevier Ltd. All rights reserved.
Validity of a questionnaire measuring the world health organization concept of health system responsiveness with respect to perinatal services in the Dutch obstetric care system.

PubMed

van der Kooy, Jacoba; Valentine, Nicole B; Birnie, Erwin; Vujkovic, Marijana; de Graaf, Johanna P; Denktaş, Semiha; Steegers, Eric A P; Bonsel, Gouke J

2014-12-03

The concept of responsiveness, introduced by the World Health Organization (WHO), addresses non-clinical aspects of health service quality that are relevant regardless of provider, country, health system or health condition. Responsiveness refers to "aspects related to the way individuals are treated and the environment in which they are treated" during health system interactions. This paper assesses the psychometric properties of a newly developed responsiveness questionnaire dedicated to evaluating maternal experiences of perinatal care services, called the Responsiveness in Perinatal and Obstetric Health Care Questionnaire (ReproQ), using the eight-domain WHO concept. The ReproQ was developed between October 2009 and February 2010 by adapting the WHO Responsiveness Questionnaire items to the perinatal care context. The psychometric properties of feasibility, construct validity, and discriminative validity were empirically assessed in a sample of Dutch women two weeks postpartum. A total of 171 women consented to participation. Feasibility: the interviews lasted between 20 and 40 minutes and the overall missing rate was 8%. Construct validity: mean Cronbach's alphas for the antenatal, birth and postpartum phases were 0.73 (range 0.57-0.82), 0.84 (range 0.66-0.92), and 0.87 (range 0.62-0.95), respectively. The item-own scale correlations within all phases were considerably higher than most of the item-other scale correlations. Within the antenatal, birth and postpartum phases, the eight factors explained 69%, 69%, and 76% of variance, respectively. Discriminative validity: overall responsiveness mean sum scores were higher for women whose children were not admitted. This confirmed the hypothesis that dissatisfaction with health outcomes is transferred to their judgement on responsiveness of the perinatal services. The ReproQ interview-based questionnaire demonstrated satisfactory psychometric properties to describe the quality of perinatal care in the Netherlands, with the potential to discriminate between different levels of quality of care. In view of the relatively small sample, further testing and research is recommended.
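For readers unfamiliar with the internal consistency statistic reported above, the following is a minimal sketch of how Cronbach's alpha is conventionally computed from an item-score matrix. The function name and the input layout (one row per respondent, one column per item) are assumptions for illustration; the abstract does not describe the authors' software or data format.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Compute Cronbach's alpha for a list of respondents.

    scores: list of rows, each row holding one respondent's k item scores
    (a hypothetical layout chosen for this sketch).
    """
    if len(scores) < 2:
        raise ValueError("need at least two respondents")
    k = len(scores[0])
    if k < 2:
        raise ValueError("need at least two items")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of the respondents' total scores.
    total_var = variance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)
```

Alpha approaches 1 when items rise and fall together across respondents and drops toward 0 when they vary independently.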
Measurement Properties of the Psoriasis Symptom Inventory Electronic Daily Diary in Patients with Moderate to Severe Plaque Psoriasis.

PubMed

Viswanathan, Hema N; Mutebi, Alex; Milmont, Cassandra E; Gordon, Kenneth; Wilson, Hilary; Zhang, Hao; Klekotka, Paul A; Revicki, Dennis A; Augustin, Matthias; Kricorian, Gregory; Nirula, Ajay; Strober, Bruce

2017-09-01

The Psoriasis Symptom Inventory (PSI) is a patient-reported outcome instrument that measures the severity of psoriasis signs and symptoms. This study evaluated measurement properties of the PSI in patients with moderate to severe plaque psoriasis. This secondary analysis used pooled data from a phase 3 brodalumab clinical trial (AMAGINE-1). Outcome measures included the PSI, Psoriasis Area and Severity Index (PASI), static Physician's Global Assessment (sPGA), psoriasis-affected body surface area, 36-item Short-Form Health Survey version 2, and the Dermatology Life Quality Index (DLQI). The PSI was evaluated for dimensionality, item performance, reliability (internal consistency and test-retest), construct validity, ability to detect change, and agreement between PSI response and response measures based on the PASI, sPGA, and DLQI. Results supported unidimensionality, good item fit, ordered responses, and PSI scoring. The PSI demonstrated reliability: baseline Cronbach's alpha ≥ 0.92 and intraclass correlation coefficients ≥ 0.95. Correlations between PSI total score and DLQI item 1 (r = 0.86), DLQI symptoms and feelings (r = 0.87), and 36-item Short-Form Health Survey version 2 bodily pain (r = -0.61) supported convergent validity. PSI scores differed significantly (P < 0.001) among severity groups based on the PASI (< 12/≥ 12), sPGA (0-1/2-3/4-5), body surface area (< 5%/5%-10%/> 10%), and DLQI (≤ 5/> 5) at weeks 8 and 12. At week 12, the PSI detected significant changes in severity based on PASI responses (< 50/50- < 75/≥ 75) and sPGA (0-1/≥ 2), and showed good agreement (k ≥ 0.66) between PSI response and PASI, sPGA, and DLQI responses. The PSI demonstrated excellent validity, reliability, and ability to detect change in the severity of psoriasis signs and symptoms. Copyright © 2017 International Society for Pharmacoeconomics and Outcomes Research (ISPOR). Published by Elsevier Inc. All rights reserved.
Evaluating construct validity of the second version of the Copenhagen Psychosocial Questionnaire through analysis of differential item functioning and differential item effect.

PubMed

Bjorner, Jakob Bue; Pejtersen, Jan Hyld

2010-02-01

The aim was to evaluate the construct validity of the Copenhagen Psychosocial Questionnaire II (COPSOQ II) by means of tests for differential item functioning (DIF) and differential item effect (DIE). We used a Danish general population postal survey (n = 4,732 with 3,517 wage earners) with a one-year register-based follow-up for long-term sickness absence. DIF was evaluated against age, gender, education, social class, public/private sector employment, and job type using ordinal logistic regression. DIE was evaluated against job satisfaction and self-rated health (using ordinal logistic regression), against depressive symptoms, burnout, and stress (using multiple linear regression), and against long-term sick leave (using a proportional hazards model). We used a cross-validation approach to counter the risk of significant results due to multiple testing. Out of 1,052 tests, we found 599 significant instances of DIF/DIE, 69 of which showed both practical and statistical significance across two independent samples. Most DIF occurred for job type (in 20 cases), while we found little DIF for age, gender, education, social class and sector. DIE seemed to pertain to particular items, which showed DIE in the same direction for several outcome variables. The results allowed a preliminary identification of items that have a positive impact on construct validity and items that have a negative impact on construct validity. These results can be used to develop better short-form measures and to improve the conceptual framework, items and scales of the COPSOQ II. We conclude that tests of DIF and DIE are useful for evaluating construct validity.

Measurement of multiple nicotine dependence domains among cigarette, non-cigarette and poly-tobacco users: Insights from item response theory.

PubMed

Strong, David R; Messer, Karen; Hartman, Sheri J; Conway, Kevin P; Hoffman, Allison C; Pharris-Ciurej, Nikolas; White, Martha; Green, Victoria R; Compton, Wilson M; Pierce, John

2015-07-01

Nicotine dependence (ND) is a key construct that organizes physiological and behavioral symptoms associated with persistent nicotine intake. Measurement of ND has focused primarily on cigarette smokers. Thus, validation of brief instruments that apply to a broad spectrum of tobacco product users is needed. We examined multiple domains of ND in a longitudinal national study of the United States population, the United States National Epidemiological Survey of Alcohol and Related Conditions (NESARC). We used methods based in item response theory to identify and validate increasingly brief measures of ND that included symptoms to assess ND similarly among cigarette, cigar, smokeless, and poly tobacco users. Confirmatory factor analytic models supported a single, primary dimension underlying symptoms of ND across tobacco use groups. Differential Item Functioning (DIF) analysis generated little support for systematic differences in response to symptoms of ND across tobacco use groups. We established significant concurrent and predictive validity of brief 3- and 5-symptom indices for measuring ND. Measuring ND across tobacco use groups with a common set of symptoms facilitates evaluation of tobacco use in an evolving marketplace of tobacco and nicotine products. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
What Does a Verbal Test Measure? A New Approach to Understanding Sources of Item Difficulty.

ERIC Educational Resources Information Center

Berk, Eric J. Vanden; Lohman, David F.; Cassata, Jennifer Coyne

Assessing the construct relevance of mental test results continues to present many challenges, and it has proven to be particularly difficult to assess the construct relevance of verbal items. This study was conducted to gain a better understanding of the conceptual sources of verbal item difficulty using a unique approach that integrates…
Item Pool Construction Using Mixed Integer Quadratic Programming (MIQP). GMAC® Research Report RR-14-01

ERIC Educational Resources Information Center

Han, Kyung T.; Rudner, Lawrence M.

2014-01-01

This study uses mixed integer quadratic programming (MIQP) to construct multiple highly equivalent item pools simultaneously, and compares the results from mixed integer programming (MIP). Three different MIP/MIQP models were implemented and evaluated using real CAT item pool data with 23 different content areas and a goal of equal information…
Construction and Analysis of Educational Tests Using Abductive Machine Learning

ERIC Educational Resources Information Center

El-Alfy, El-Sayed M.; Abdel-Aal, Radwan E.

2008-01-01

Recent advances in educational technologies and the wide-spread use of computers in schools have fueled innovations in test construction and analysis. As the measurement accuracy of a test depends on the quality of the items it includes, item selection procedures play a central role in this process. Mathematical programming and the item response…
Development of the Assessment of Belief Conflict in Relationship-14 (ABCR-14)

PubMed Central

Kyougoku, Makoto; Teraoka, Mutsumi; Masuda, Noriko; Ooura, Mariko; Abe, Yasushi

2015-01-01

Purpose Nurses and other healthcare workers frequently experience belief conflict, one of the most important new stress-related problems in both academic and clinical fields. Methods In this study, using a sample of 1,683 nursing practitioners, we developed the Assessment of Belief Conflict in Relationship-14 (ABCR-14), a new scale that assesses belief conflict in the healthcare field. Standard psychometric procedures were used to develop and test the scale, including a qualitative framework concept and item-pool development, item reduction, and scale development. We analyzed the psychometric properties of ABCR-14 according to entropy, polyserial correlation coefficient, exploratory factor analysis, confirmatory factor analysis, average variance extracted, Cronbach’s alpha, Pearson product-moment correlation coefficient, and multidimensional item response theory (MIRT). Results The results of the analysis supported a three-factor model consisting of 14 items. The validity and reliability of ABCR-14 were supported by evidence of high construct validity, structural validity, hypothesis testing, internal consistency reliability, and concurrent validity. The result of the MIRT offered strong support for good item response of item slope parameters and difficulty parameters. However, the ABCR-14 Likert scale might need to be explored from the MIRT point of view. Yet, as mentioned above, there is sufficient evidence to support that ABCR-14 has high validity and reliability. Conclusion The ABCR-14 demonstrates good psychometric properties for nursing belief conflict. Further studies are recommended to confirm its application in clinical practice. PMID:26247356
Consequences of screening in lung cancer: development and dimensionality of a questionnaire.

PubMed

Brodersen, John; Thorsen, Hanne; Kreiner, Svend

2010-08-01

The objective of this study was to extend the Consequences of Screening (COS) Questionnaire for use in a lung cancer screening by testing for comprehension, content coverage, dimensionality, and reliability. In interviews, the suitability, content coverage, and relevance of the COS were tested on participants in a lung cancer screening program. The results were thematically analyzed to identify the key consequences of abnormal and false-positive screening results. Item Response Theory and Classical Test Theory were used to analyze data. Dimensionality, objectivity, and reliability were established by item analysis, examining the fit between item responses and Rasch models. Eight themes specifically relevant for participants in lung cancer screening results were identified: "self-blame," "focus on symptoms," "stigmatization," "introvert," "harm of smoking," "impulsivity," "empathy," and "regretful of still smoking." Altogether, 26 new items for part I and 16 new items for part II were generated. These themes were confirmed to fit a partial-credit Rasch model measuring different constructs including several of the new items. In conclusion, the reliability and the dimensionality of a condition-specific measure with high content validity for persons having abnormal or false-positive lung cancer screening results have been demonstrated. This new questionnaire called Consequences of Screening in Lung Cancer (COS-LC) covers in two parts the psychosocial experience in lung cancer screening. Part I: "anxiety," "behavior," "dejection," "sleep," "self-blame," "focus on airway symptoms," "stigmatization," "introvert," and "harm of smoking." Part II: "calm/relax," "social network," "existential values," "impulsivity," "empathy," and "regretful of still smoking."
Validity aspects of the patient feedback questionnaire on consultation skills (PFC), a promising learning instrument in medical education.

PubMed

Reinders, Marcel E; Blankenstein, Annette H; Knol, Dirk L; de Vet, Henrica C W; van Marwijk, Harm W J

2009-08-01

A focus on the communicator competency is considered to be an important requirement to help physicians acquire consultation skills. A feedback questionnaire in which patients assess consultation skills might be a useful learning tool. An existing questionnaire on patient perception of patient-centeredness (PPPC) was adapted to cover the 'communicator' items in the competency profile. We assessed the face and content validity, the construct validity and the internal consistency of this new patient feedback on consultation skills (PFC) questionnaire. We assessed the face validity of the PFC by interviewing patients and general practice trainees (GPTs) during the developmental process. The content validity was determined by experts (n=10). First-year GPTs (23) collected 222 PFCs, from which the data were used to assess the construct validity (factor analysis), internal consistency, response rates and ceiling effects. The PFC adequately covers the corresponding 'communicator' competency (face and content validity). Factor analysis showed a one-dimensional construct. The internal consistency was high (Cronbach's alpha 0.89). For the single items, the response rate varied from 89.2% to 100%; the maximum score (ceiling effect) varied from 45.5% to 89.2%. The PFC appears to be a valid, internally consistent instrument. The PFC may be a valuable learning tool with which GPTs, other physicians and medical students can acquire feedback from patients regarding their consultation skills.
Development of the Comprehensive General Parenting Questionnaire for caregivers of 5-13 year olds

PubMed Central

2014-01-01

Background Despite the large number of parenting questionnaires, considerable disagreement exists about how to best assess parenting. Most of the instruments only assess limited aspects of parenting. To overcome this shortcoming, the “Comprehensive General Parenting Questionnaire” (CGPQ) was systematically developed. Such a measure is frequently requested in the area of childhood overweight. Methods First, an item bank of existing parenting measures was created assessing five key parenting constructs that have been identified across multiple theoretical approaches to parenting (Nurturance, Overprotection, Coercive control, Behavioral control, and Structure). Caregivers of 5- to 13-year-olds were asked to complete the online survey in the Netherlands (N = 821), Belgium (N = 435) and the United States (N = 241). In addition, a questionnaire regarding personality characteristics (“Big Five”) of the caregiver was administered and parents were asked to report about their child’s height and weight. Factor analyses and Item-Response Modeling (IRM) techniques were used to assess the underlying parenting constructs and for item reduction. Correlation analyses were performed to assess the relations between general parenting and personality of the caregivers, adjusting for socio-economic status (SES) indicators, to establish criterion validity. Multivariate linear regressions were performed to examine the associations of SES indicators and parenting with child BMI z-scores. Additionally, we assessed whether scores on the parenting constructs and child BMI z-scores differed depending on SES indicators. Results The reduced questionnaire (62 items) revealed acceptable fit of our parenting model and acceptable IRM item fit statistics. Caregiver personality was related as hypothesized with the CGPQ parenting constructs. While correcting for SES, overprotection was positively related to child BMI. The negative relationship between structure and BMI was borderline significant. Parents with a high level of education were less likely to use overly controlling forms of parenting (i.e., coercive control and overprotection) and more likely to have children with lower BMI. Based on several author review meetings and cognitive interviews the questionnaire was further modified to an 85-item questionnaire. Conclusions The CGPQ may facilitate research exploring how parenting influences children’s weight-related behaviors. The contextual influence of general parenting is likely to be more profound than its direct relationship with weight status. PMID:24512450
The Childhood Cancer Survivor Study-Neurocognitive Questionnaire (CCSS-NCQ) Revised: Item Response Analysis and Concurrent Validity

PubMed Central

Kenzik, Kelly M.; Huang, I-Chan; Brinkman, Tara M.; Baughman, Brandon; Ness, Kirsten K.; Shenkman, Elizabeth A.; Hudson, Melissa M.; Robison, Leslie L.; Krull, Kevin R.

2014-01-01

Objective Childhood cancer survivors are at risk for neurocognitive impairment related to cancer diagnosis or treatment. This study refined and further validated the Childhood Cancer Survivor Study Neurocognitive Questionnaire (CCSS-NCQ), a scale developed to screen for impairment in long-term survivors of childhood cancer. Method Items related to task efficiency, memory, organization and emotional regulation domains were examined using item response theory (IRT). Data were collected from 833 adult survivors of childhood cancer in the St. Jude Lifetime Cohort Study who completed self-report and direct neurocognitive testing. The revision process included: 1) content validity mapping of items to domains, 2) constructing a revised CCSS-NCQ, 3) selecting items within specific domains using IRT, and 4) evaluating concordance between the revised CCSS-NCQ and direct neurocognitive assessment. Results Using content and measurement properties, 32 items were retained (8 items in 4 domains). Items captured low to middle levels of neurocognitive concerns. The latent domain scores demonstrated poor convergent/divergent validity with the direct assessments. Adjusted effect sizes (Cohen's d) for agreement between self-reported memory and direct memory assessment were moderate for total recall (ES=0.66), long-term memory (ES=0.63), and short-term memory (ES=0.55). Effect sizes between self-rated task efficiency and direct assessment of attention were moderate for focused attention (ES=0.70) and attention span (ES=0.50), but small for sustained attention (ES=0.36). Cranial radiation therapy and female gender were associated with lower self-reported neurocognitive function. Conclusion The revised CCSS-NCQ demonstrates adequate measurement properties for assessing day-to-day neurocognitive concerns in childhood cancer survivors, and adds useful information to direct assessment. PMID:24933482
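The effect sizes in the abstract above are reported as Cohen's d. A minimal sketch of the unadjusted, pooled-standard-deviation form follows; the study reports adjusted effect sizes, which this sketch does not reproduce, and the function name and scores are hypothetical:

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Unadjusted Cohen's d: difference in means divided by the pooled SD."""
    n_a, n_b = len(group_a), len(group_b)
    # Pooled variance weights each group's sample variance by its degrees of freedom.
    pooled_variance = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_variance)


# Hypothetical scores for two groups of three participants each.
d = cohens_d([2, 4, 6], [1, 3, 5])
```

By the usual rule of thumb, values near 0.2, 0.5, and 0.8 are read as small, moderate, and large, which matches how the abstract labels its estimates.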
Validity and reliability of a Malay version of the brief illness perception questionnaire for patients with type 2 diabetes mellitus.

PubMed

Chew, Boon-How; Vos, Rimke C; Heijmans, Monique; Shariff-Ghazali, Sazlina; Fernandez, Aaron; Rutten, Guy E H M

2017-08-03

Illness perceptions involve the personal beliefs that patients have about their illness and may influence health behaviours considerably. Since an instrument to measure these perceptions for the Malay population in Malaysia is lacking, we translated and examined the psychometric properties of the Malay version of the Brief Illness Perception Questionnaire (MBIPQ) in adult patients with type 2 diabetes mellitus. The MBIPQ has nine items, all using a 0-10 response scale except the ninth item about causal factors, which is an open-ended item. A standard procedure was used to translate and adapt the English BIPQ into the Malay language. Construct validity was examined by comparing item scores with scores on the Diabetes Management Self-Efficacy Scale, the Morisky Medication Adherence Scale, the World Health Organization Quality of Life-brief, the 9-item Patient Health Questionnaire, the 17-item Diabetes Distress Scale, HbA1c and the presence of complications. In addition, 2-week and 4-week test-retest reliability were studied. A total of 312 patients completed the MBIPQ. Of these, 97 and 215 patients completed the 2-week or 4-week test-retest reliability questionnaire, respectively. Moderate inter-item correlations were observed between illness perception dimensions (r = -0.31 to 0.53). MBIPQ items showed the expected correlations with self-efficacy (r = 0.35), medication adherence (r = 0.29), quality of life (r = -0.17 to 0.31) and depressive symptoms (r = -0.18 to 0.21). People with severe diabetes-related distress were also more concerned (t-test = 4.01, p < 0.001) and experienced lower personal control (t-test = 2.07, p = 0.031). People with any diabetes-related complication perceived the consequences as more serious (t-test = 2.04, p = 0.044). The 2-week and 4-week test-retest reliabilities varied between ICC agreement 0.39 to 0.70 and 0.58 to 0.78, respectively. The psychometric properties of items in the MBIPQ are moderate. The MBIPQ showed good cross-cultural validity and moderate construct validity. Test-retest reliability was moderate. Despite the moderate psychometric properties, the MBIPQ may be useful in clinical practice as an instrument to elicit and communicate about patients' personal thoughts and feelings. Future research is needed to establish its responsiveness and predictive validity. ClinicalTrials.gov NCT02730754 registered on March 29, 2016; NCT02730078 registered on March 29, 2016.
Space construction system analysis. Part 2: Cost and programmatics

NASA Technical Reports Server (NTRS)

Vonflue, F. W.; Cooper, W.

1980-01-01

Cost and programmatic elements of the space construction systems analysis study are discussed. The programmatic aspects of the ETVP program define a comprehensive plan for the development of a space platform, the construction system, and the space shuttle operations/logistics requirements. The cost analysis identified significant items of cost on ETVP development, ground, and flight segments, and detailed the items of space construction equipment and operations.
Confirmatory factor analysis and measurement invariance of the Child Feeding Questionnaire in low-income Hispanic and African-American mothers with preschool-age children.

PubMed

Kong, Angela; Vijayasiri, Ganga; Fitzgibbon, Marian L; Schiffer, Linda A; Campbell, Richard T

2015-07-01

Validation work of the Child Feeding Questionnaire (CFQ) in low-income minority samples suggests a need for further conceptual refinement of this instrument. Using confirmatory factor analysis, this study evaluated 5- and 6-factor models on a large sample of African-American and Hispanic mothers with preschool-age children (n = 962). The 5-factor model included: 'perceived responsibility', 'concern about child's weight', 'restriction', 'pressure to eat', and 'monitoring' and the 6-factor model also tested 'food as a reward'. Multi-group analysis assessed measurement invariance by race/ethnicity. In the 5-factor model, two low-loading items from 'restriction' and one low-variance item from 'perceived responsibility' were dropped to achieve fit. Only removal of the low-variance item was needed to achieve fit in the 6-factor model. Invariance analyses demonstrated differences in factor loadings. This finding suggests African-American and Hispanic mothers may vary in their interpretation of some CFQ items and use of cognitive interviews could enhance item interpretation. Our results also demonstrated that 'food as a reward' is a plausible construct among a low-income minority sample and adds to the evidence that this factor resonates conceptually with parents of preschoolers; however, further testing is needed to determine the validity of this factor with older age groups. Copyright © 2015 Elsevier Ltd. All rights reserved.
Assessing Hopelessness in Terminally Ill Cancer Patients: Development of the Hopelessness Assessment in Illness Questionnaire

PubMed Central

Rosenfeld, Barry; Pessin, Hayley; Lewis, Charles; Abbey, Jennifer; Olden, Megan; Sachs, Emily; Amakawa, Lia; Kolva, Elissa; Brescia, Robert; Breitbart, William

2013-01-01

Hopelessness has become an increasingly important construct in palliative care research, yet concerns exist regarding the utility of existing measures when applied to patients with a terminal illness. This article describes a series of studies focused on the exploration, development, and analysis of a measure of hopelessness specifically intended for use with terminally ill cancer patients. The 1st stage of measure development involved interviews with 13 palliative care experts and 30 terminally ill patients. Qualitative analysis of the patient interviews culminated in the development of a set of potential questionnaire items. In the 2nd study phase, we evaluated these preliminary items with a sample of 314 participants, using item response theory and classical test theory to identify optimal items and response format. These analyses generated an 8-item measure that we tested in a final study phase, using a 3rd sample (n = 228) to assess reliability and concurrent validity. These analyses demonstrated strong support for the Hopelessness Assessment in Illness Questionnaire providing greater explanatory power than existing measures of hopelessness and found little evidence that this assessment was confounded by illness-related variables (e.g., prognosis). In summary, these 3 studies suggest that this brief measure of hopelessness is particularly useful for palliative care settings. Further research is needed to assess the applicability of the measure to other populations and contexts. PMID:21443366
An Item Response Theory Analysis of DSM-IV Cannabis Abuse and Dependence Criteria in Adolescents

PubMed Central

Hartman, Christie A.; Gelhorn, Heather; Crowley, Thomas J.; Sakai, Joseph T.; Stallings, Michael; Young, Susan E.; Rhee, Soo Hyun; Corley, Robin; Hewitt, John K.; Hopfer, Christian J.

2008-01-01

Objective To examine three aspects of adolescent cannabis problems: 1) do DSM-IV cannabis abuse and dependence criteria represent two different levels of severity of substance involvement, 2) to what degree do each of the 11 abuse and dependence criteria assess adolescent cannabis problems, and 3) do the DSM-IV items function similarly across different adolescent populations? Method We examined 5587 adolescents aged 11–19, including 615 youth in treatment for substance use disorders, 179 adjudicated youth, and 4793 youth from the community. All subjects were assessed with a structured diagnostic interview. Item response theory was utilized to analyze symptom endorsement patterns. Results Abuse and dependence criteria were not found to represent different levels of severity of problem cannabis use in any of the samples. Among the 11 abuse and dependence criteria, Problems cutting down and Legal problems were the least informative for distinguishing problem users. Two dependence criteria and three of the four abuse criteria indicated different severities of cannabis problems across samples. Conclusions We found little evidence to support the idea that abuse and dependence are separate constructs for adolescent cannabis problems. Furthermore, certain abuse criteria may indicate severe substance problems while specific dependence items may indicate less severe problems. The abuse items in particular need further study. These results have implications for the refinement of the current substance use disorder criteria for DSM-V. PMID:18176333
The reliability, validity and responsiveness of the Thai version of Systemic Lupus Erythematosus Quality of Life (SLEQOL-TH) instrument.

PubMed

Kasitanon, N; Wangkaew, S; Puntana, S; Sukitawut, W; Leong, K P; Louthrenoo, W

2013-03-01

The English version of the Systemic Lupus Erythematosus Quality of Life Questionnaire (SLEQOL) is a validated disease-specific quality of life instrument. The aim of this study was to evaluate the psychometric properties of the Thai version of the SLEQOL (SLEQOL-TH). Two independent translators translated the SLEQOL into Thai. The back translation of this version was performed by two other independent translators. The final version, SLEQOL-TH, was completed after resolving the discrepancies revealed by the back translation. One hundred and nine patients with SLE were enrolled to test the reliability, construct validity, floor and ceiling effects, and sensitivity to the changes of the SLEQOL-TH at six months. The differential item functioning (DIF) between the Thai and English versions was analyzed using the partial gamma. The internal consistency of the SLEQOL-TH was satisfactory with the overall Cronbach's alpha of 0.86. The test-retest reliability of the SLEQOL-TH was acceptable with the intra-class correlation coefficient of 0.86. Low correlations between the SLEQOL-TH and SLEDAI were observed. The total score of the SLEQOL-TH was moderately responsive to changes in quality of life, with a standardized response mean of 0.50. When comparing the SLEQOL-TH from Thai SLE patients with the original SLEQOL version obtained from Singapore SLE patients, 11 out of 40 items showed a moderate to large DIF. The SLEQOL-TH has acceptable psychometric properties and shows construct validity. In comparison with the English version of SLEQOL, there are some items that showed DIF. The applicability of the SLEQOL-TH in real-life clinical practice and clinical trials needs to be determined.
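The responsiveness figure above is a standardized response mean (SRM). As a minimal sketch of the conventional definition (mean change divided by the standard deviation of the change scores), with a hypothetical function name and made-up scores rather than the study's data:

```python
from statistics import mean, stdev


def standardized_response_mean(baseline, follow_up):
    """SRM: mean of the paired change scores divided by their sample SD."""
    changes = [after - before for before, after in zip(baseline, follow_up)]
    return mean(changes) / stdev(changes)


# Hypothetical paired scores for four patients at baseline and at follow-up.
srm = standardized_response_mean([10, 12, 14, 16], [11, 14, 15, 18])
```

Because the denominator is the spread of the change scores rather than of the baseline scores, the SRM grows when patients change by similar amounts, even if the average change is modest.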
[Measuring job satisfaction: development of a multidimensional scale].

PubMed

Faraci, Palmira; Valenti, Giusy

2016-01-01

Although numerous studies have been done on the topic of job satisfaction, Italian research still lacks specific psychometric instruments. The present paper aims to develop a scale to measure job satisfaction referring to our cultural context. Participants were 222 workers (36.5% males, 63.5% females) with an average age of 38.39 years (SD = 10.91). The formulated items were selected from a large item pool on the basis of the evaluation by a group of expert judges, and the item analysis procedure. In order to establish test validity, the following instruments were also administered: Occupational Stress Indicator, Satisfaction With Life Scale, Rosenberg Self-Esteem Scale, Multidimensional Scale of Perceived Social Support, and Beck Depression Inventory. Both exploratory and confirmatory factor analyses highlighted a 6-factor structure. Those factors were responsible for 51.30% of the total variance. Reliability analyses indicated satisfying internal consistency (ranging from alpha = .73 to alpha = .86). Construct validity was supported by results obtained calculating correlations with the theoretically associated variables. Our findings suggest promising psychometric properties for the presented measure. The instrument could be used in specific programs developed to promote well-being conditions in work settings.
Improving Measures via Examining the Behavior of Distractors in Multiple-Choice Tests

PubMed Central

Sideridis, Georgios; Tsaousis, Ioannis; Al Harbi, Khaleel

2017-01-01

The purpose of the present article was to illustrate, using an example from a national assessment, the value of analyzing the behavior of distractors in measures that engage the multiple-choice format. A secondary purpose of the present article was to illustrate four remedial actions that can potentially improve the measurement of the construct(s) under study. Participants were 2,248 individuals who took a national examination of chemistry. The behavior of the distractors was analyzed by modeling their behavior within the Rasch model. Potentially informative distractors were (a) further modeled using the partial credit model, (b) split into separate items and retested for model fit and parsimony, (c) combined to form a “super” item or testlet, and (d) reexamined after deleting low-ability individuals who likely guessed on those informative, albeit erroneous, distractors. Results indicated that all but the item split strategies were associated with better model fit compared with the original model. The best-fitting model, however, involved modeling and crediting informative distractors via the partial credit model or eliminating the responses of low-ability individuals who likely guessed on informative distractors. The implications, advantages, and disadvantages of modeling informative distractors for measurement purposes are discussed. PMID:29795904
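The Rasch and partial credit models named above can be illustrated with their standard response-probability formulas. The sketch below is a textbook form under assumed names (theta for person ability, b for item difficulty, thresholds for the partial credit step difficulties) and is not the estimation procedure used in the study:

```python
from math import exp


def rasch_probability(theta, b):
    """Rasch model: probability of a correct response given ability and difficulty."""
    return 1.0 / (1.0 + exp(-(theta - b)))


def partial_credit_probabilities(theta, thresholds):
    """Partial credit model: probability of each score category 0..m.

    Category x has weight exp(sum over j <= x of (theta - threshold_j)),
    with the empty sum for category 0 equal to zero.
    """
    cumulative = [0.0]
    for threshold in thresholds:
        cumulative.append(cumulative[-1] + (theta - threshold))
    weights = [exp(value) for value in cumulative]
    total = sum(weights)
    return [weight / total for weight in weights]


# Hypothetical item scored 0, 1 (informative distractor), or 2 (correct answer).
category_probs = partial_credit_probabilities(0.5, [-0.3, 0.8])
```

Giving partial credit to an informative distractor amounts to adding an intermediate score category, so a dichotomous item becomes a three-category item like the hypothetical one above. With a single threshold the partial credit model reduces to the Rasch model.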
The breastfeeding self-efficacy scale: psychometric assessment of the short form.

PubMed

Dennis, Cindy-Lee

2003-01-01

The purpose of this study was to reduce the number of items on the original Breastfeeding Self-Efficacy Scale (BSES) and psychometrically assess the revised BSES-Short Form (BSES-SF). As part of a longitudinal study, participants completed mailed questionnaires at 1, 4, and 8 weeks postpartum. Health region in British Columbia. A population-based sample of 491 breastfeeding mothers. BSES, Edinburgh Postnatal Depression Scale, Rosenberg Self-Esteem Scale, and Perceived Stress Scale. Internal consistency statistics with the original BSES suggested item redundancy. As such, 18 items were deleted, using explicit reduction criteria. Based on the encouraging reliability analysis of the new 14-item BSES-SF, construct validity was assessed using principal components factor analysis, comparison of contrasted groups, and correlations with measures of similar constructs. Support for predictive validity was demonstrated through significant mean differences between breastfeeding and bottle feeding mothers at 4 (p < .001) and 8 (p < .001) weeks postpartum. Demographic response patterns suggested the BSES-SF is a unique tool to identify mothers at risk of prematurely discontinuing breastfeeding. These psychometric results indicate the BSES-SF is an excellent measure of breastfeeding self-efficacy and considered ready for clinical use to (a) identify breastfeeding mothers at high risk, (b) assess breastfeeding behaviors and cognitions to individualize confidence-building strategies, and (c) evaluate the effectiveness of various interventions and guide program development.
Development of the Parent Responses to School Functioning Questionnaire.

PubMed

Barber Garcia, Brittany N; Gray, Laura S; Simons, Laura E; Logan, Deirdre E

2017-10-01

Parents play an important role in supporting school functioning in youth with chronic pain, but no validated tool exists to assess parental responses to child and adolescent pain behaviors in the school context. Such a tool would be useful in identifying targets of change to reduce pain-related school impairment. The goal of this study was to develop and preliminarily validate the Parent Responses to School Functioning Questionnaire (PRSF), a parent self-report measure of this construct. After initial expert review and pilot testing, the measure was administered to 418 parents of children (ages 6-17 years) seen for initial multidisciplinary chronic pain clinic evaluation. The final 16-item PRSF showed evidence of good internal consistency (α = .82) and 2-week test-retest reliability (intraclass correlation coefficient = .87). Criterion validity was demonstrated by significant correlations with school absence rates and overall school functioning, and construct validity was demonstrated by correlations with general parental responses to pain. Three subscales emerged capturing parents' personal distress, parents' level of distrust of the school, and parents' expectations and behaviors related to their child's management of challenging school situations. These results provide preliminary support for the PRSF as a psychometrically sound tool to assess parents' responses to child pain in the school setting. The 16-item PRSF measures parental responses to their child's chronic pain in the school context. The clinically useful measure can inform interventions aimed at reducing functional disability in children with chronic pain by enhancing parents' ability to respond adaptively to child pain behaviors. Copyright © 2017 American Pain Society. Published by Elsevier Inc. All rights reserved.
Development and validation of a scale for mouth handicap in systemic sclerosis: the Mouth Handicap in Systemic Sclerosis scale

PubMed Central

Mouthon, L; Rannou, F; Bérezné, A; Pagnoux, C; Arène, J‐P; Foïs, E; Cabane, J; Guillevin, L; Revel, M; Fermanian, J; Poiraudeau, S

2007-01-01

Objective: To develop and assess the reliability and construct validity of a scale assessing disability involving the mouth in systemic sclerosis (SSc). Methods: We generated a 34‐item provisional scale from mailed responses of patients (n = 74), expert consensus (n = 10) and literature analysis. A total of 71 other SSc patients were recruited. The test–retest reliability was assessed using the intraclass correlation coefficient and divergent validity using the Spearman correlation coefficient. Factor analysis followed by varimax rotation was performed to assess the factorial structure of the scale. Results: The item reduction process retained 12 items with 5 levels of answers (total score range 0–48). The mean total score of the scale was 20.3 (SD 9.7). The test–retest reliability was 0.96. Divergent validity was confirmed for global disability (Health Assessment Questionnaire (HAQ), r = 0.33), hand function (Cochin Hand Function Scale, r = 0.37), inter‐incisor distance (r = −0.34), handicap (McMaster‐Toronto Arthritis questionnaire (MACTAR), r = 0.24), depression (Hospital Anxiety and Depression (HAD); HADd, r = 0.26) and anxiety (HADa, r = 0.17). Factor analysis extracted 3 factors with eigenvalues of 4.26, 1.76 and 1.47, explaining 63% of the variance. These 3 factors could be clinically characterised. The first factor (5 items) represents handicap induced by the reduction in mouth opening, the second (5 items) handicap induced by sicca syndrome and the third (2 items) aesthetic concerns. Conclusion: We propose a new scale, the Mouth Handicap in Systemic Sclerosis (MHISS) scale, which has excellent reliability and good construct validity, and assesses specifically disability involving the mouth in patients with SSc. PMID:17502364

Development and validation of the simulation-based learning evaluation scale.

PubMed

Hung, Chang-Chiao; Liu, Hsiu-Chen; Lin, Chun-Chih; Lee, Bih-O

2016-05-01

The instruments that evaluate a student's perception of receiving simulated training are English versions and have not been tested for reliability or validity. The aim of this study was to develop and validate a Chinese version Simulation-Based Learning Evaluation Scale (SBLES). Four stages were conducted to develop and validate the SBLES. First, specific desired competencies were identified according to the National League for Nursing and Taiwan Nursing Accreditation Council core competencies. Next, the initial item pool was comprised of 50 items related to simulation that were drawn from the literature of core competencies. Content validity was established by use of an expert panel. Finally, exploratory factor analysis and confirmatory factor analysis were conducted for construct validity, and Cronbach's coefficient alpha determined the scale's internal consistency reliability. Two hundred and fifty students who had experienced simulation-based learning were invited to participate in this study. Two hundred and twenty-five students completed and returned questionnaires (response rate=90%). Six items were deleted from the initial item pool and one was added after an expert panel review. Exploratory factor analysis with varimax rotation revealed 37 items remaining in five factors which accounted for 67% of the variance. The construct validity of SBLES was substantiated in a confirmatory factor analysis that revealed a good fit of the hypothesized factor structure. The findings tally with the criterion of convergent and discriminant validity. The range of internal consistency for five subscales was .90 to .93. Items were rated on a 5-point scale from 1 (strongly disagree) to 5 (strongly agree). The results of this study indicate that the SBLES is valid and reliable. The authors recommend that the scale could be applied in the nursing school to evaluate the effectiveness of simulation-based learning curricula. Copyright © 2016 Elsevier Ltd. All rights reserved.
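Several records in this listing report Cronbach's coefficient alpha as the internal-consistency statistic. As a reading aid, the following is a minimal sketch of how alpha is computed from an item-by-respondent score matrix. It is not the authors' code; the function name `cronbach_alpha` and the 5-point ratings are hypothetical and chosen only for illustration.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Transpose so that each tuple holds one item's scores across respondents.
    item_columns = list(zip(*responses))
    item_variances = [variance(column) for column in item_columns]
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical ratings: 5 respondents x 3 items on a 1-5 agreement scale.
ratings = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [3, 3, 3],
    [1, 2, 2],
]
alpha = cronbach_alpha(ratings)
print(f"alpha = {alpha:.2f}")
```

Sample variances (n - 1 denominator) are used for both the items and the total; using population variances throughout gives the same ratio and therefore the same alpha.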
Validation of the CMT Pediatric Scale as an outcome measure of disability

PubMed Central

Burns, Joshua; Ouvrier, Robert; Estilow, Tim; Shy, Rosemary; Laurá, Matilde; Pallant, Julie F.; Lek, Monkol; Muntoni, Francesco; Reilly, Mary M.; Pareyson, Davide; Acsadi, Gyula; Shy, Michael E.; Finkel, Richard S.

2012-01-01

Objective: Charcot-Marie-Tooth disease (CMT) is a common heritable peripheral neuropathy. There is no treatment for any form of CMT although clinical trials are increasingly occurring. Patients usually develop symptoms during the first two decades of life but there are no established outcome measures of disease severity or response to treatment. We identified a set of items that represent a range of impairment levels and conducted a series of validation studies to build a patient-centered multi-item rating scale of disability for children with CMT. Methods: As part of the Inherited Neuropathies Consortium, patients aged 3–20 years with a variety of CMT types were recruited from the USA, UK, Italy and Australia. Initial development stages involved: definition of the construct, item pool generation, peer review and pilot testing. Based on data from 172 patients, a series of validation studies were conducted, including: item and factor analysis, reliability testing, Rasch modeling and sensitivity analysis. Results: Seven areas for measurement were identified (strength, dexterity, sensation, gait, balance, power, endurance), and a psychometrically robust 11-item scale constructed (Charcot-Marie-Tooth disease Pediatric Scale: CMTPedS). Rasch analysis supported the viability of the CMTPedS as a unidimensional measure of disability in children with CMT. It showed good overall model fit, no evidence of misfitting items, no person misfit and it was well targeted for children with CMT. Interpretation: The CMTPedS is a well-tolerated outcome measure that can be completed in 25 minutes. It is a reliable, valid and sensitive global measure of disability for children with CMT from the age of 3 years. PMID:22522479
A New Statistic for Evaluating Item Response Theory Models for Ordinal Data. CRESST Report 839

ERIC Educational Resources Information Center

Cai, Li; Monroe, Scott

2014-01-01

We propose a new limited-information goodness of fit test statistic C[subscript 2] for ordinal IRT models. The construction of the new statistic lies formally between the M[subscript 2] statistic of Maydeu-Olivares and Joe (2006), which utilizes first and second order marginal probabilities, and the M*[subscript 2] statistic of Cai and Hansen…
Australian Secondary Students' Views about Global Warming: Beliefs about Actions, and Willingness to Act

ERIC Educational Resources Information Center

Boyes, Edward; Skamp, Keith; Stanisstreet, Martin

2009-01-01

A 44-item questionnaire was constructed to determine secondary students' views about how useful various specific actions might be at reducing global warming, their willingness to undertake the various actions, and the extent to which these two might be linked. Responses (n = 500) were obtained from students in years 7 to 10 in three schools in…
The Reliability and Construct Validity of American College Students' Responses to the WHOQOL-BREF

ERIC Educational Resources Information Center

D'Abundo, Michelle; Orsini, M. M.; Milroy, J. J.; Sidman, C. L.

2011-01-01

The World Health Organization Quality of Life (WHOQOL-100) instrument was developed to assess quality of life from a multi-dimensional perspective. A shorter 26-item version of the instrument was created called the WHOQOL-BREF, which is the focus of this study. Based on previous research, it is unclear if the WHOQOL-BREF instrument is appropriate…
Asymmetry in Student Achievement on Multiple-Choice and Constructed-Response Items in Reversible Mathematics Processes

ERIC Educational Resources Information Center

Sangwin, Christopher J.; Jones, Ian

2017-01-01

In this paper we report the results of an experiment designed to test the hypothesis that when faced with a question involving the inverse direction of a reversible mathematical process, students solve a multiple-choice version by verifying the answers presented to them by the direct method, not by undertaking the actual inverse calculation.…
Development of a Measure of Drinking and Driving Expectancies for Youth

ERIC Educational Resources Information Center

McCarthy, Denis M.; Pedersen, Sarah L.; Thompsen, Dana M.; Leuty, Melanie E.

2006-01-01

The present study constructs and provides initial validation for a measure of positive expectancies for drinking and driving for use with adolescents and young adults (PEDD-Y). In Study 1, items were generated through open-ended responses from high school- and college-age youth. Data collected from a 2nd sample of college students (n = 404)…
[Validity and reliability of the Culture of Quality Health Services questionnaire in Mexico].

PubMed

Herrera-Kiengelher, L; Zepeda-Zaragoza, J; Austria-Corrales, F; Vázquez-Zarate, V M

2013-01-01

Patient safety is a major public health problem worldwide and is the responsibility of all those involved in health care. Establishing a safety culture has proved to be a factor that favors the integration of work teams, communication, and the construction of clear procedures in various organizations. Promoting a culture of safety depends on several factors, such as the organization, the work unit, and the staff. Objective assessment of these factors will help to identify areas for improvement and establish strategic lines of action. The objective was to adapt, validate, and calibrate the Culture of Quality in Health Services (CQHS) questionnaire in a Mexican population. A cross-sectional study was conducted with a stratified representative sample of 522 health workers. The questionnaire was translated and adapted from Singer's. Content was validated by experts, and internal consistency, confirmatory factorial validity, and item calibration with Samejima's Graded Response Model were assessed. Convergent and divergent construct validity was confirmed for the CQHS; item calibration showed that the questionnaire is able to discriminate between patients and represent different levels of the hypothesized dimensions with greater accuracy and lower standard error. The CQHS is a valid and reliable instrument to assess patient safety culture in hospitals in Mexico. Copyright © 2013 SECA. Published by Elsevier Espana. All rights reserved.
Development and validation of instrument for ergonomic evaluation of tablet arm chairs

PubMed Central

Tirloni, Adriana Seára; dos Reis, Diogo Cunha; Bornia, Antonio Cezar; de Andrade, Dalton Francisco; Borgatto, Adriano Ferreti; Moro, Antônio Renato Pereira

2016-01-01

The purpose of this study was to develop and validate an evaluation instrument for tablet arm chairs based on ergonomic requirements, focused on user perceptions and using Item Response Theory (IRT). This exploratory study involved 1,633 participants (university students and professors) in four steps: a pilot study (n=26), semantic validation (n=430), content validation (n=11) and construct validation (n=1,166). Samejima's graded response model was applied to validate the instrument. The results showed that all the steps (theoretical and practical) of the instrument's development and validation processes were successful and that the group of remaining items (n=45) had a high consistency (0.95). This instrument can be used in the furniture industry by engineers and product designers and in the purchasing process of tablet arm chairs for schools, universities and auditoriums. PMID:28337099
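Samejima's graded response model, named in this record and the preceding one, assigns each ordered response category a probability built from cumulative logistic curves. The sketch below shows the standard logistic form of that calculation. It is an illustration only, not the calibration procedure used in either study; the function name and all parameter values (`a`, the thresholds) are hypothetical.

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Category probabilities under the logistic graded response model.

    theta      : latent trait level of the respondent
    a          : item discrimination (assumed > 0)
    thresholds : ordered category boundaries b_1 <= ... <= b_{m-1}

    P(X >= k) = 1 / (1 + exp(-a * (theta - b_k))), and
    P(X = k)  = P(X >= k) - P(X >= k + 1).
    """
    if list(thresholds) != sorted(thresholds):
        raise ValueError("thresholds must be in ascending order")
    # Cumulative curves, padded with P(X >= lowest) = 1 and P(X > highest) = 0.
    cumulative = [1.0]
    cumulative += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cumulative.append(0.0)
    return [cumulative[k] - cumulative[k + 1] for k in range(len(cumulative) - 1)]


# Hypothetical 5-category item: four thresholds, moderate discrimination.
probs = grm_category_probs(theta=0.5, a=1.2, thresholds=[-1.5, -0.5, 0.5, 1.5])
print([round(p, 3) for p in probs])
```

Because the thresholds are ordered and the discrimination is positive, each difference of adjacent cumulative curves is non-negative and the category probabilities sum to one.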
Reconceptualizing Efficacy in Substance Use Prevention Research: Refusal Response Efficacy and Drug Resistance Self-Efficacy in Adolescent Substance Use

PubMed Central

Choi, Hye Jeong; Krieger, Janice L.; Hecht, Michael L.

2014-01-01

The purpose of this study is to utilize the Extended Parallel Process Model (EPPM) to expand the construct of efficacy in the adolescent substance use context. Using survey data collected from 2,129 seventh-grade students in 39 rural schools, we examined the construct of drug refusal efficacy and demonstrated relationships among response efficacy (RE), self-efficacy (SE), and adolescent drug use. Consistent with the hypotheses, confirmatory factor analyses of a 12-item scale yielded a three-factor solution: refusal RE, alcohol-resistance self-efficacy (ASE), and marijuana-resistance self-efficacy (MSE). Refusal RE and ASE/MSE were negatively related to alcohol use and marijuana use, whereas MSE was positively associated with alcohol use. These data demonstrate that efficacy is a broader construct than typically considered in drug prevention. Prevention programs should reinforce both refusal RE and substance-specific resistance SE. PMID:23330857
Development and Validation of the Consumer Health Activation Index.

PubMed

Wolf, Michael S; Smith, Samuel G; Pandit, Anjali U; Condon, David M; Curtis, Laura M; Griffith, James; O'Conor, Rachel; Rush, Steven; Bailey, Stacy C; Kaplan, Gordon; Haufle, Vincent; Martin, David

2018-04-01

Although there has been increasing interest in patient engagement, few measures are publicly available and suitable for patients with limited health literacy. We sought to develop a Consumer Health Activation Index (CHAI) for use among diverse patients. Expert opinion, a systematic literature review, focus groups, and cognitive interviews with patients were used to create and revise a potential set of items. Psychometric testing guided by item response theory was then conducted among 301 English-speaking, community-dwelling adults. This included differential item functioning analyses to evaluate item performance across participant health literacy levels. To determine construct validity, CHAI scores were compared to scales measuring similar personality constructs. Associations between the CHAI and physical and mental health established predictive validity. A second study among 9,478 adults was used to confirm CHAI associations with health outcomes. Exploratory factor analyses revealed a single-factor solution with a 10-item scale. The CHAI showed good internal consistency (alpha = 0.81) and moderate test-retest reliability (ICC = 0.53). Reading grade level was found to be at the 6th grade. Moderate to strong correlations were found with similar constructs (Multidimensional Health Locus of Control, r = 0.38, P < 0.001; Conscientiousness, r = 0.41, P < 0.001). Predictive validity was demonstrated through associations with functional health status measures (depression, r = -0.28, P < 0.001; anxiety, r = -0.22, P < 0.001; and physical functioning, r = 0.22, P < 0.001). In the validation sample, the CHAI was significantly associated with self-reported physical and mental health (r = 0.31 and 0.32, respectively; both P < 0.001). The CHAI appears to be a valid, reliable, and easily administered tool that can be used to assess health activation among adults, including those with limited health literacy. Future studies should test the tool in actual use and explore further applications.
A Study of the Homogeneity of Items Produced From Item Forms Across Different Taxonomic Levels.

ERIC Educational Resources Information Center

Weber, Margaret B.; Argo, Jana K.

This study determined whether item forms (rules for constructing items related to a domain or set of tasks) would enable naive item writers to generate multiple-choice items at three taxonomic levels--knowledge, comprehension, and application. Students wrote 120 multiple-choice items from 20 item forms, corresponding to educational objectives…
Overview and current management of computerized adaptive testing in licensing/certification examinations.

PubMed

Seo, Dong Gi

2017-01-01

Computerized adaptive testing (CAT) has been implemented in high-stakes examinations such as the National Council Licensure Examination-Registered Nurses in the United States since 1994. Subsequently, the National Registry of Emergency Medical Technicians in the United States adopted CAT for certifying emergency medical technicians in 2007. This was done with the goal of introducing the implementation of CAT for medical health licensing examinations. Most implementations of CAT are based on item response theory, which hypothesizes that both the examinee and items have their own characteristics that do not change. There are 5 steps for implementing CAT: first, determining whether the CAT approach is feasible for a given testing program; second, establishing an item bank; third, pretesting, calibrating, and linking item parameters via statistical analysis; fourth, determining the specification for the final CAT related to the 5 components of the CAT algorithm; and finally, deploying the final CAT after specifying all the necessary components. The 5 components of the CAT algorithm are as follows: item bank, starting item, item selection rule, scoring procedure, and termination criterion. CAT management includes content balancing, item analysis, item scoring, standard setting, practice analysis, and item bank updates. Remaining issues include the cost of constructing CAT platforms and deploying the computer technology required to build an item bank. In conclusion, in order to ensure more accurate estimations of examinees' ability, CAT may be a good option for national licensing examinations. Measurement theory can support its implementation for high-stakes examinations.
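The five components of the CAT algorithm listed in the abstract above (item bank, starting item, item selection rule, scoring procedure, termination criterion) can be sketched as a short loop. This is a toy illustration under stated assumptions, not the procedure of any licensing body: it assumes a two-parameter logistic IRT model, maximum-information item selection, EAP scoring on a fixed grid with a standard normal prior, and a standard-error stopping rule. All names and parameter values are hypothetical.

```python
import math

# Quadrature grid for the ability scale (assumption: -4.0 to 4.0 in 0.1 steps).
GRID = [g / 10.0 for g in range(-40, 41)]


def p_correct(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)


def eap_estimate(administered):
    """Scoring procedure: posterior mean and SD of theta on the grid."""
    weights = []
    for theta in GRID:
        w = math.exp(-0.5 * theta * theta)  # unnormalized N(0, 1) prior
        for (a, b), x in administered:
            p = p_correct(theta, a, b)
            w *= p if x == 1 else 1.0 - p
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, weights)) / total
    return mean, math.sqrt(var)


def run_cat(item_bank, answer, start_theta=0.0, se_target=0.4, max_items=10):
    """Administer items adaptively; `answer(item)` returns 1 or 0."""
    remaining = list(item_bank)          # component 1: item bank
    administered = []
    theta, se = start_theta, float("inf")
    # Component 5: stop on precision, test length, or an exhausted bank.
    while remaining and len(administered) < max_items and se > se_target:
        # Components 2 and 3: the most informative item at the current
        # estimate; on the first pass this yields the starting item.
        item = max(remaining, key=lambda ab: item_information(theta, *ab))
        remaining.remove(item)
        administered.append((item, answer(item)))
        theta, se = eap_estimate(administered)  # component 4: scoring
    return theta, se, administered


# Hypothetical bank of (a, b) pairs and a deterministic simulated examinee
# who answers correctly whenever the item difficulty is below 1.0.
bank = [(1.0, b / 2.0) for b in range(-4, 5)]
theta_hat, se_hat, log = run_cat(bank, answer=lambda item: 1 if item[1] < 1.0 else 0)
print(f"estimate = {theta_hat:.2f}, SE = {se_hat:.2f}, items used = {len(log)}")
```

Operational CAT programs add the management layers the abstract mentions, such as content balancing and item bank updates, which this sketch omits.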
Overview and current management of computerized adaptive testing in licensing/certification examinations

PubMed Central

2017-01-01

Computerized adaptive testing (CAT) has been implemented in high-stakes examinations such as the National Council Licensure Examination-Registered Nurses in the United States since 1994. Subsequently, the National Registry of Emergency Medical Technicians in the United States adopted CAT for certifying emergency medical technicians in 2007. This was done with the goal of introducing the implementation of CAT for medical health licensing examinations. Most implementations of CAT are based on item response theory, which hypothesizes that both the examinee and items have their own characteristics that do not change. There are 5 steps for implementing CAT: first, determining whether the CAT approach is feasible for a given testing program; second, establishing an item bank; third, pretesting, calibrating, and linking item parameters via statistical analysis; fourth, determining the specification for the final CAT related to the 5 components of the CAT algorithm; and finally, deploying the final CAT after specifying all the necessary components. The 5 components of the CAT algorithm are as follows: item bank, starting item, item selection rule, scoring procedure, and termination criterion. CAT management includes content balancing, item analysis, item scoring, standard setting, practice analysis, and item bank updates. Remaining issues include the cost of constructing CAT platforms and deploying the computer technology required to build an item bank. In conclusion, in order to ensure more accurate estimations of examinees’ ability, CAT may be a good option for national licensing examinations. Measurement theory can support its implementation for high-stakes examinations. PMID:28811394
Development and validation of the Myasthenia Gravis Impairment Index.

PubMed

Barnett, Carolina; Bril, Vera; Kapral, Moira; Kulkarni, Abhaya; Davis, Aileen M

2016-08-30

We aimed to develop a measure of myasthenia gravis impairment using a previously developed framework and to evaluate reliability and validity, specifically face, content, and construct validity. The first draft of the Myasthenia Gravis Impairment Index (MGII) included examination items from available measures enriched with newly developed, patient-reported items, modified after patient input. International neuromuscular specialists evaluated face and content validity via an e-mail survey. Test-retest reliability was assessed in stable patients at a 3-week interval and interrater reliability was evaluated in the same day. Construct validity was assessed through correlations between the MGII and other measures and by comparing scores in different patient groups. The first draft was assessed by 18 patients, and 72 specialists answered the survey. The second draft had 7 examination and 22 patient-reported items. Field testing included 200 patients, with 54 patients completing the reliability studies. Test-retest reliability of the total score was good (intraclass correlation coefficient 0.92; 95% confidence interval 0.79-0.94), as was interrater reliability of the examination component (intraclass correlation coefficient 0.81; 95% confidence interval 0.79-0.94). The MGII correlated well with comparison measures, with higher correlations with the MG-activities of daily living (r = 0.91) and MG-specific quality of life 15-item scale (r = 0.78). When assessing different patient groups, the scores followed expected patterns. The MGII was developed using a patient-centered framework of myasthenia-related impairments and incorporating patient input throughout the development process. It is reliable in an outpatient setting and has demonstrated construct validity. Responsiveness studies are under way. © 2016 American Academy of Neurology.
Development and validation of the Myasthenia Gravis Impairment Index

PubMed Central

Bril, Vera; Kapral, Moira; Kulkarni, Abhaya; Davis, Aileen M.

2016-01-01

Objective: We aimed to develop a measure of myasthenia gravis impairment using a previously developed framework and to evaluate reliability and validity, specifically face, content, and construct validity. Methods: The first draft of the Myasthenia Gravis Impairment Index (MGII) included examination items from available measures enriched with newly developed, patient-reported items, modified after patient input. International neuromuscular specialists evaluated face and content validity via an e-mail survey. Test–retest reliability was assessed in stable patients at a 3-week interval and interrater reliability was evaluated in the same day. Construct validity was assessed through correlations between the MGII and other measures and by comparing scores in different patient groups. Results: The first draft was assessed by 18 patients, and 72 specialists answered the survey. The second draft had 7 examination and 22 patient-reported items. Field testing included 200 patients, with 54 patients completing the reliability studies. Test–retest reliability of the total score was good (intraclass correlation coefficient 0.92; 95% confidence interval 0.79–0.94), as was interrater reliability of the examination component (intraclass correlation coefficient 0.81; 95% confidence interval 0.79–0.94). The MGII correlated well with comparison measures, with higher correlations with the MG–activities of daily living (r = 0.91) and MG-specific quality of life 15-item scale (r = 0.78). When assessing different patient groups, the scores followed expected patterns. Conclusions: The MGII was developed using a patient-centered framework of myasthenia-related impairments and incorporating patient input throughout the development process. It is reliable in an outpatient setting and has demonstrated construct validity. Responsiveness studies are under way. PMID:27402891
The development of the physical fitness construct across childhood.

PubMed

Utesch, T; Dreiskämper, D; Strauss, B; Naul, R

2018-01-01

The measurement of physical fitness (PF) is an important factor from many different perspectives. PF is a determinant of healthy child development as it is related to several health outcomes. However, existing taxonomies of the construct and frequently used fitness assessments vary concerning their theoretical assumptions and practical implications. From a theoretical perspective, the construct of physical fitness covers a variety of motor domains, such as cardiovascular endurance, strength, coordination, or flexibility (eg, Caspersen et al., 1985). However, most fitness assessments provide a single (composite) score including all items as test outcome. This implicitly relates to a one-dimensional structure of physical fitness, which has been shown for other motor performance assessments in early childhood (eg, Utesch et al., 2016). This study investigated this one-dimensional structure for 6- to 9-year-old children within the item response theory framework (Partial Credit Model). Seven fitness subtests covering a variety of motor dimensions (6-minute run, pushups, sit-ups, standing broad jump, 20 m sprint, jumping sideways, and balancing backwards) were administered to a total of 790 six-year-olds, 1371 seven-year-olds, 1331 eight-year-olds, and 925 nine-year-olds (48.2% females). Each item was transformed into five performance categories controlling for sex and age. This study indicates that a one-dimensional testing of PF is feasible across middle childhood. Furthermore, for 6- and 7-year-olds, all seven items including balancing backwards can be accumulated to one factor. From the age of about 8 and 9 years balancing backwards seems to become too easy. Altogether, analyses show no diversification of PF across childhood. © 2017 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
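The Partial Credit Model named in the abstract above gives the probability of each ordered performance category as a function of a person's ability and the item's step difficulties. The sketch below shows the standard formula for an item with five categories, matching the five performance categories the study describes. It is illustrative only; the function names and step values are hypothetical and are not estimates from the study.

```python
import math


def pcm_category_probs(theta, step_difficulties):
    """Partial Credit Model probabilities for categories 0..m.

    P(X = x) is proportional to exp(sum_{j=1..x} (theta - delta_j)),
    with the empty sum for x = 0 defined as 0.
    """
    cumulative = [0.0]
    for delta in step_difficulties:
        cumulative.append(cumulative[-1] + (theta - delta))
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(cumulative)
    numerators = [math.exp(c - peak) for c in cumulative]
    total = sum(numerators)
    return [n / total for n in numerators]


def expected_score(theta, step_difficulties):
    """Model-implied mean category score at ability theta."""
    probs = pcm_category_probs(theta, step_difficulties)
    return sum(x * p for x, p in enumerate(probs))


# Hypothetical item with five categories, hence four step difficulties.
steps = [-1.0, -0.3, 0.4, 1.2]
for ability in (-1.0, 0.0, 1.0):
    print(ability, round(expected_score(ability, steps), 2))
```

An item that has become "too easy" for an age group, as the abstract reports for balancing backwards, corresponds to step difficulties that sit well below most of that group's ability estimates, so nearly all responses fall in the top categories.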
Measuring participation in patients with chronic back pain-the 5-Item Pain Disability Index.

PubMed

McKillop, Ashley B; Carroll, Linda J; Dick, Bruce D; Battié, Michele C

2018-02-01

Of the three broad outcome domains of body functions and structures, activities, and participation (eg, engaging in valued social roles) outlined in the World Health Organization's (WHO) International Classification of Functioning, Disability and Health (ICF), it has been argued that participation is the most important to individuals, particularly those with chronic health problems. Yet, participation is not commonly measured in back pain research. The aim of this study was to investigate the construct validity of a modified 5-Item Pain Disability Index (PDI) score as a measure of participation in people with chronic back pain. A validation study was conducted using cross-sectional data. Participants with chronic back pain were recruited from a multidisciplinary pain center in Alberta, Canada. The outcome measure of interest is the 5-Item PDI. Each study participant was given a questionnaire package containing measures of participation, resilience, anxiety and depression, pain intensity, and pain-related disability, in addition to the PDI. The first five items of the PDI deal with social roles involving family responsibilities, recreation, social activities with friends, work, and sexual behavior, and comprised the 5-Item PDI seeking to measure participation. The last two items of the PDI deal with self-care and life support functions and were excluded. Construct validity of the 5-Item PDI as a measure of participation was examined using Pearson correlations or point-biserial correlations to test each hypothesized association. Participants were 70 people with chronic back pain and a mean age of 48.1 years. Forty-four (62.9%) were women. 
As hypothesized, the 5-Item PDI was associated with all measures of participation, including the Participation Assessment with Recombined Tools-Objective (r=-0.61), Late-Life Function and Disability Instrument: Disability Component (frequency: r=-0.66; limitation: r=-0.65), Work and Social Adjustment Scale (r=0.85), a global perceived participation scale (r=0.54), employment status (r=-0.30), and the Usual Activity domain of the 15D (r=0.50). The expected correlations observed indicating a moderate or strong association provided supporting evidence for the construct validity of the 5-Item PDI as a measure of participation. The Oswestry Disability Index and the 5-Item PDI were also strongly correlated (r=0.70). The 5-Item PDI was associated to a lesser degree with depressive symptoms and resilience, as measured by the Hospital Anxiety and Depression Scale (HADS) (r=0.25) and the Connor-Davidson Resilience Scale (r=-0.28), as would be expected. No statistically significant association was found between the 5-Item PDI and the HADS Anxiety score. It is important that outcome measures of participation are included in back pain research to gauge the effects of painful spinal conditions and interventions on maintaining valued social roles. A simple, concise measure would be very useful for this purpose in clinical and research settings. The results of this study support the construct validity of the 5-Item PDI as a brief measure of participation in people with chronic back pain. These findings are likely most applicable to those with chronic back pain attending pain clinics and other tertiary centers for care. Copyright © 2017 Elsevier Inc. All rights reserved.
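The abstract above tests each hypothesized association with a Pearson correlation, or with a point-biserial correlation when one variable is dichotomous (such as employment status). The following sketch shows both calculations and that the point-biserial coefficient is the Pearson coefficient applied to a 0/1 variable. The data are invented for illustration and are not taken from the study; the function names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def point_biserial_r(group, scores):
    """Point-biserial correlation via the group-mean formula.

    r_pb = (M1 - M0) / s_n * sqrt(p * q), where s_n is the population SD
    of all scores and p, q are the proportions coded 1 and 0.
    """
    ones = [s for g, s in zip(group, scores) if g == 1]
    zeros = [s for g, s in zip(group, scores) if g == 0]
    n = len(scores)
    mean_all = sum(scores) / n
    sd_all = math.sqrt(sum((s - mean_all) ** 2 for s in scores) / n)
    p = len(ones) / n
    mean_diff = sum(ones) / len(ones) - sum(zeros) / len(zeros)
    return mean_diff / sd_all * math.sqrt(p * (1 - p))


# Invented example: 1 = employed, 0 = not employed; higher score = more disability.
employed = [1, 1, 0, 0, 1, 0]
disability = [10, 14, 30, 26, 12, 35]
r_pb = point_biserial_r(employed, disability)
print(f"point-biserial r = {r_pb:.2f}")
```

The sign of a point-biserial coefficient depends on which group is coded 1, which is worth keeping in mind when reading a negative value such as the one reported for employment status.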
An Extended Validity Argument for Assessing Feedback Culture.

PubMed

Rougas, Steven; Clyne, Brian; Cianciolo, Anna T; Chan, Teresa M; Sherbino, Jonathan; Yarris, Lalena M

2015-01-01

NEGEA 2015 CONFERENCE ABSTRACT (EDITED): Measuring an Organization's Culture of Feedback: Can It Be Done? Steven Rougas and Brian Clyne. CONSTRUCT: This study sought to develop a construct for measuring formative feedback culture in an academic emergency medicine department. Four archetypes (Market, Adhocracy, Clan, Hierarchy) reflecting an organization's values with respect to focus (internal vs. external) and process (flexibility vs. stability and control) were used to characterize one department's receptiveness to formative feedback. The prevalence of residents' identification with certain archetypes served as an indicator of the department's organizational feedback culture. New regulations have forced academic institutions to implement wide-ranging changes to accommodate competency-based milestones and their assessment. These changes challenge residencies that use formative feedback from faculty as a major source of data for determining training advancement. Though various approaches have been taken to improve formative feedback to residents, there currently exists no tool to objectively measure the organizational culture that surrounds this process. Assessing organizational culture, commonly used in the business sector to represent organizational health, may help residency directors gauge their program's success in fostering formative feedback. The Organizational Culture Assessment Instrument (OCAI) is widely used, extensively validated, applicable to survey research, and theoretically based and may be modifiable to assess formative feedback culture in the emergency department. Using a modified Delphi technique and several iterations of focus groups amongst educators at one institution, four of the original six OCAI domains (which each contain 4 possible responses) were modified to create a 16-item Formative Feedback Culture Tool (FFCT) that was administered to 26 residents (response rate = 55%) at a single academic emergency medicine department. 
The mean score of each item on the FFCT (range = 0-100) was analyzed. Convergent and divergent properties of the four archetypes were assessed using a multitrait-multimethod matrix of Pearson's coefficients. Expecting that items in one archetype would diverge from the others, whereas items within an archetype should have strong convergent properties, convergent validity was assessed by comparing items across domains that all related to the same archetype. Similarly, divergent validity was assessed by comparing the correlation of items within an archetype to the correlations of those items within a hetero-domain block (i.e., to other items within the same domain). Three of the four domains of the FFCT (Overall Departmental Characteristics 35.4 ± 15.4, Departmental Foundation of Feedback 46.1 ± 16.7, and Departmental Emphasis of Feedback 30.3 ± 17.7) had the highest mean in the Market archetype (results/achievement oriented), whereas the final domain (Departmental Definition of Successful Feedback 34.8 ± 22.1) had the highest mean in the Clan archetype (personal growth/team achievement). Item responses in the Clan and Hierarchy archetypes had the strongest convergent and divergent validity, respectively. Item responses in the Adhocracy archetype had the weakest convergent and divergent validity. Although the sample size was small, this initial study demonstrates that a modified organizational culture assessment tool can feasibly be utilized to identify the primary formative feedback archetype of a cohort of residents. This may have future implications for measuring changes in culture after the implementation of strategic programs to address formative feedback. Future studies should examine the generalizability of the FFCT to other institutions, as well as address the weak validity evidence of the Adhocracy archetype in the FFCT.
Psychometric properties of the Multidimensional Assessment of Fatigue scale in traumatic brain injury: an NIDRR Traumatic Brain Injury Model Systems study.

PubMed

Lequerica, Anthony; Bushnik, Tamara; Wright, Jerry; Kolakowsky-Hayner, Stephanie A; Hammond, Flora M; Dijkers, Marcel P; Cantor, Joshua

2012-01-01

OBJECTIVE: To investigate the psychometric properties of the Multidimensional Assessment of Fatigue (MAF) scale in a traumatic brain injury (TBI) sample. DESIGN: Prospective survey study. SETTING: Community. PARTICIPANTS: One hundred sixty-seven individuals with TBI admitted for inpatient rehabilitation, enrolled into the TBI Model Systems national database, and followed up at either the first or second year postinjury. INTERVENTIONS: Not applicable. MAIN OUTCOME MEASURE: Multidimensional Assessment of Fatigue. RESULTS: The initial analysis, using items 1 to 14, which are based on a 10-point rating scale, found that only 1 item ("walking") misfit the overall construct of fatigue in this TBI population. However, this 10-point rating scale was found to have disordered thresholds. When ratings were collapsed into 4 response categories, all MAF items used to calculate the Global Fatigue Index formed a unidimensional scale. CONCLUSIONS: Findings generally support the unidimensionality of the MAF when used in a TBI population but call into question the use of a 10-point rating scale for items 1 to 14. Further study is needed to investigate the use of a 4-category rating scale across all items and the fit of the "walking" item for a measure of fatigue among individuals with TBI.

A New Look at the Psychometrics of the Parenting Scale through the Lens of Item Response Theory

PubMed Central

Lorber, Michael F.; Xu, Shu; Smith Slep, Amy M.; Bulling, Lisanne; O'Leary, Susan G.

2015-01-01

The psychometrics of the Parenting Scale's Overreactivity and Laxness subscales were evaluated using item response theory (IRT) techniques. The IRT analyses were based on two community samples of cohabiting parents of 3- to 8-year-old children, combined to yield an N of 852 families. The results supported the utility of the Overreactivity and Laxness subscales, particularly in discriminating among parents in the mid to upper reaches of each construct. The original versions of the Overreactivity and Laxness subscales were more reliable than alternative, shorter versions identified in replicated factor analyses from previously published research and in IRT analyses in the present research. Moreover, in several cases, the original versions of these subscales, in comparison with the shortened versions, exhibited greater six-month stabilities and correlations with child externalizing behavior and couple relationship satisfaction. Reliability was greater for the Laxness than for the Overreactivity subscale. Item performance on each subscale was highly variable. Together, the present findings are generally supportive of the psychometrics of the Parenting Scale, particularly for clinical research and practice. They also suggest areas for further development. PMID:24828855
A new look at the psychometrics of the parenting scale through the lens of item response theory.

PubMed

Lorber, Michael F; Xu, Shu; Slep, Amy M Smith; Bulling, Lisanne; O'Leary, Susan G

2014-01-01

The psychometrics of the Parenting Scale's Overreactivity and Laxness subscales were evaluated using item response theory (IRT) techniques. The IRT analyses were based on 2 community samples of cohabiting parents of 3- to 8-year-old children, combined to yield a total sample size of 852 families. The results supported the utility of the Overreactivity and Laxness subscales, particularly in discriminating among parents in the mid to upper reaches of each construct. The original versions of the Overreactivity and Laxness subscales were more reliable than alternative, shorter versions identified in replicated factor analyses from previously published research and in IRT analyses in the present research. Moreover, in several cases, the original versions of these subscales, in comparison with the shortened versions, exhibited greater 6-month stabilities and correlations with child externalizing behavior and couple relationship satisfaction. Reliability was greater for the Laxness than for the Overreactivity subscale. Item performance on each subscale was highly variable. Together, the present findings are generally supportive of the psychometrics of the Parenting Scale, particularly for clinical research and practice. They also suggest areas for further development.
Dutch-Flemish translation of 17 item banks from the patient-reported outcomes measurement information system (PROMIS).

PubMed

Terwee, C B; Roorda, L D; de Vet, H C W; Dekker, J; Westhovens, R; van Leeuwen, J; Cella, D; Correia, H; Arnold, B; Perez, B; Boers, M

2014-08-01

The Patient-Reported Outcomes Measurement Information System (PROMIS®) is a new, state-of-the-art assessment system for measuring patient-reported health and well-being of adults and children that has the potential to be more valid, reliable and responsive than existing PROMs. The PROMIS items can be administered in short forms or, more efficiently, through computerized adaptive testing. This paper describes the translation of 563 items from 17 PROMIS item banks (domains) for adults from the English source into Dutch-Flemish. The translation was performed by FACITtrans using standardized methodology and approved by the PROMIS Statistical Center. The translation included four forward translations, two back-translations, three to five independent reviews (at least two Dutch, one Flemish) and pre-testing in 70 adults (age range 20-77) from the Netherlands and Flanders. A small number of items required separate translations for Dutch and Flemish: physical function (five items), pain behaviour (two items), pain interference (one item), social isolation (one item) and global health (one item). Challenges faced in the translation process included: scarcity or overabundance of possible translations, unclear item descriptions, constructs broader/smaller in the target language, difficulties in rank ordering items, differences in unit of measurement, irrelevant items or differences in performance of activities. By addressing these challenges, acceptable translations were obtained for all items. The methodology used and experience gained in this study can be used as an example for researchers in other countries interested in translating PROMIS. The Dutch-Flemish PROMIS items are linguistically equivalent. Short forms will soon be available for use and entire item banks are ready for cross-cultural validation in the Netherlands and Flanders.
Differential Performance by English Language Learners on an Inquiry-Based Science Assessment

NASA Astrophysics Data System (ADS)

Turkan, Sultan; Liu, Ou Lydia

2012-10-01

The performance of English language learners (ELLs) has been a concern given the rapidly changing demographics in US K-12 education. This study aimed to examine whether students' English language status has an impact on their inquiry science performance. Differential item functioning (DIF) analysis was conducted with regard to ELL status on an inquiry-based science assessment, using a multifaceted Rasch DIF model. A total of 1,396 seventh- and eighth-grade students took the science test, including 313 ELL students. The results showed that, overall, non-ELLs significantly outperformed ELLs. Of the four items that showed DIF, three favored non-ELLs while one favored ELLs. The item that favored ELLs provided a graphic representation of a science concept within a family context. There is some evidence that constructed-response items may help ELLs articulate scientific reasoning using their own words. Assessment developers and teachers should pay attention to the possible interaction between linguistic challenges and science content when designing assessment for and providing instruction to ELLs.
Realizing a Rasch measurement through instructionally-sequenced domains of test items.

NASA Astrophysics Data System (ADS)

Schulz, E. Matthew

2016-11-01

This paper presents results from a project in which instructionally-sequenced domains were defined for purposes of constructing measures that conform to an ideal in Guttman scaling and Rasch measurement. A fundamental idea in these measurement systems is that every person higher on the measurement scale can do everything that lower-level persons can do, plus at least one more thing. This idea has had limited application in educational measurement due to the stochastic nature of item response data and the sheer number of items needed to obtain reliable measures. However, it has been shown by Schulz, Lee, and Mullen [1] that this ideal can be realized at a higher level of abstraction - when items within a content strand are aggregated into a small number of domains that are ordered in instructional timing and difficulty. The present paper shows how this was done, and the results, in an achievement level setting project for the 2007 Grade 12 NAEP Economics Assessment.
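As an illustration of the Guttman and Rasch ideas mentioned in this abstract (not code from the paper), the following minimal sketch computes Rasch success probabilities for domains ordered by difficulty and checks whether a response pattern is a perfect Guttman pattern. The function names, the ability value, and the domain difficulties are hypothetical.

```python
import math


def rasch_probability(theta, difficulty):
    """Rasch model: probability of success for ability `theta` on an item
    (or domain) with the given difficulty, both on the same logit scale."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def is_guttman_pattern(responses):
    """Return True if a 0/1 pattern, ordered from easiest to hardest,
    contains no success after the first failure."""
    seen_failure = False
    for r in responses:
        if r == 0:
            seen_failure = True
        elif seen_failure:
            return False
    return True


# Hypothetical domain difficulties, ordered from easiest to hardest.
domain_difficulties = [-1.5, 0.0, 1.5]

# Success probabilities for a hypothetical person with ability 0.5.
probabilities = [rasch_probability(0.5, b) for b in domain_difficulties]
```

Under the Rasch model the probabilities fall as difficulty rises, so patterns such as `[1, 1, 0]` are the most likely ones, while a pattern such as `[1, 0, 1]` violates the strict Guttman ordering.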
A Psychometric Evaluation of the DSM-IV Criteria for Antisocial Personality Disorder: Dimensionality, Local Reliability, and Differential Item Functioning Across Gender.

PubMed

Paap, Muirne C S; Braeken, Johan; Pedersen, Geir; Urnes, Øyvind; Karterud, Sigmund; Wilberg, Theresa; Hummelen, Benjamin

2017-12-01

This study aims at evaluating the psychometric properties of the antisocial personality disorder (ASPD) criteria in a large sample of patients, most of whom had one or more personality disorders (PD). PD diagnoses were assessed by experienced clinicians using the Structured Clinical Interview for Diagnostic and Statistical Manual of Mental Disorders, 4th edition, Axis II PDs. Analyses were performed within an item response theory framework. Results of the analyses indicated that ASPD is a unidimensional construct that can be measured reliably at the upper range of the latent trait scale. Differential item functioning across gender was restricted to two criteria and had little impact on the latent ASPD trait level. Patients fulfilling both the adult ASPD criteria and the conduct disorder criteria had similar latent trait distributions as patients fulfilling only the adult ASPD criteria. Overall, the ASPD items fit the purpose of a diagnostic instrument well, that is, distinguishing patients with moderate from those with high antisocial personality scores.
Proposing a Parkinson's disease-specific tremor scale from the MDS-UPDRS.

PubMed

Forjaz, Maria João; Ayala, Alba; Testa, Claudia M; Bain, Peter G; Elble, Rodger; Haubenberger, Dietrich; Rodriguez-Blazquez, Carmen; Deuschl, Günther; Martinez-Martin, Pablo

2015-07-01

This article proposes an International Parkinson and Movement Disorder Society (MDS)-UPDRS tremor-based scale and describes its measurement properties, with a view to developing an improved scale for assessing tremor in Parkinson's disease (PD). This was a cross-sectional, multicenter study of 435 PD patients. Rasch analysis was performed on the 11 MDS-UPDRS tremor items. Construct validity, precision, and test-retest reliability were also analyzed. After some modifications, which included removal of an item owing to redundancy, the obtained MDS-UPDRS tremor scale showed moderate reliability, unidimensionality, absence of differential item functioning, satisfactory convergent validity with medication, and better precision than the raw sum score. However, the scale displayed a floor effect and a need for more items measuring lower levels of tremor. The MDS-UPDRS tremor scale provides linear scores that can be used to assess tremor in PD in a valid, reliable way. The scale might benefit from modifications and studies that analyze its responsiveness. © 2015 International Parkinson and Movement Disorder Society.
Development and validation of PediaTrac™: A web-based tool to track developing infants.

PubMed

Lajiness-O'Neill, Renée; Brooks, Judith; Lukomski, Angela; Schilling, Stephen; Huth-Bocks, Alissa; Warschausky, Seth; Flores, Ana-Mercedes; Swick, Casey; Nyman, Tristin; Andersen, Tiffany; Morris, Natalie; Schmitt, Thomas A; Bell-Smith, Jennifer; Moir, Barbara; Hodges, Elise K; Lyddy, James E

2018-02-01

PediaTrac™, a 363-item web-based tool to track infant development, administered in modules of ~40 items per sampling period (newborn (NB), 2, 4, 6, 9 and 12 months), was validated. Caregivers answered demographic, medical, and environmental questions, and questions covering the sensorimotor, feeding/eating, sleep, speech/language, cognition, social-emotional, and attachment domains. Expert Panel Reviews and Cognitive Interviews (CI) were conducted to validate the item bank. Classical Test Theory (CTT) and Item Response Theory (IRT) methods were employed to examine the dimensionality and psychometric properties of PediaTrac with pooled longitudinal and cross-sectional cohorts (N = 132). Intraclass correlation coefficients (ICC) for the Expert Panel Review revealed moderate agreement at 6 months and good reliability at other sampling periods. ICC estimates for CI revealed moderate reliability regarding clarity of the items at NB and 4 months, good reliability at 2, 9 and 12 months and excellent reliability at 6 months. CTT revealed good coefficient alpha estimates (α ≥ 0.77 for five of the six ages) for the Social-Emotional/Communication, Attachment (α ≥ 0.89 for all ages), and Sensorimotor (α ≥ 0.75 at 6 months) domains, revealing the need for better targeting of sensorimotor items. IRT modeling revealed good reliability (r = 0.85-0.95) for three distinct domains (Feeding/Eating, Social-Emotional/Communication and Attachment) and four subdomains (Feeding Breast/Formula, Feeding Solid Food, Social-Emotional Information Processing, Communication/Cognition). Convergent and discriminant construct validity were demonstrated between our IRT-modeled domains and constructs derived from existing developmental, behavioral and caregiver measures.
Our Attachment domain was significantly correlated with existing measures at the NB and 2-month periods, while the Social-Emotional/Communication domain was highly correlated with similar constructs at the 6-, 9- and 12-month periods. PediaTrac has potential for producing novel and effective estimates of infant development via the Sensorimotor, Feeding/Eating, Social-Emotional/Communication and Attachment domains. Copyright © 2018 Elsevier Inc. All rights reserved.
Food parenting practices for 5 to 12 year old children: a concept map analysis of parenting and nutrition experts input.

PubMed

O'Connor, Teresia M; Mâsse, Louise C; Tu, Andrew W; Watts, Allison W; Hughes, Sheryl O; Beauchamp, Mark R; Baranowski, Tom; Pham, Truc; Berge, Jerica M; Fiese, Barbara; Golley, Rebecca; Hingle, Melanie; Kremers, Stef P J; Rhee, Kyung E; Skouteris, Helen; Vaughn, Amber

2017-09-11

Parents are an important influence on children's dietary intake and eating behaviors. However, the lack of a conceptual framework and inconsistent assessment of food parenting practices limits our understanding of which food parenting practices are most influential on children. The aim of this study was to develop a food parenting practice conceptual framework using systematic approaches of literature reviews and expert input. A previously completed systematic review of food parenting practice instruments and a qualitative study of parents informed the development of a food parenting practice item bank consisting of 3632 food parenting practice items. The original item bank was further reduced to 110 key food parenting concepts using binning and winnowing techniques. A panel of 32 experts in parenting and nutrition were invited to sort the food parenting practice concepts into categories that reflected their perceptions of a food parenting practice conceptual framework. Multi-dimensional scaling produced a point map of the sorted concepts and hierarchical cluster analysis identified potential solutions. Subjective modifications were used to identify two potential solutions, with additional feedback from the expert panel requested. The experts came from 8 countries and 25 participated in the sorting and 23 provided additional feedback. A parsimonious and a comprehensive concept map were developed based on the clustering of the food parenting practice constructs. The parsimonious concept map contained 7 constructs, while the comprehensive concept map contained 17 constructs and was informed by a previously published content map for food parenting practices. Most of the experts (52%) preferred the comprehensive concept map, while 35% preferred to present both solutions. The comprehensive food parenting practice conceptual map will provide the basis for developing a calibrated Item Response Modeling (IRM) item bank that can be used with computerized adaptive testing. 
Such an item bank will allow for more consistency in measuring food parenting practices across studies to better assess the impact of food parenting practices on child outcomes and the effect of interventions that target parents as agents of change.
Students' proficiency scores within multitrait item response theory

NASA Astrophysics Data System (ADS)

Scott, Terry F.; Schumayer, Daniel

2015-12-01

In this paper we present a series of item response models of data collected using the Force Concept Inventory. The Force Concept Inventory (FCI) was designed to poll the Newtonian conception of force viewed as a multidimensional concept, that is, as a complex of distinguishable conceptual dimensions. Several previous studies have developed single-trait item response models of FCI data; however, we feel that multidimensional models are also appropriate given the explicitly multidimensional design of the inventory. The models employed in the research reported here vary in both the number of fitting parameters and the number of underlying latent traits assumed. We calculate several model information statistics to ensure adequate model fit and to determine which of the models provides the optimal balance of information and parsimony. Our analysis indicates that all item response models tested, from the single-trait Rasch model through to a model with ten latent traits, satisfy the standard requirements of fit. However, analysis of model information criteria indicates that the five-trait model is optimal. We note that an earlier factor analysis of the same FCI data also led to a five-factor model. Furthermore the factors in our previous study and the traits identified in the current work match each other well. The optimal five-trait model assigns proficiency scores to all respondents for each of the five traits. We construct a correlation matrix between the proficiencies in each of these traits. This correlation matrix shows strong correlations between some proficiencies, and strong anticorrelations between others. We present an interpretation of this correlation matrix.
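The model-selection step described in this abstract, trading fit against parsimony with information criteria, can be sketched as follows. This is a minimal illustration, not the authors' analysis: the abstract does not say which criteria were used, so AIC and BIC are shown as two common choices, and all log-likelihoods, parameter counts, and the sample size are hypothetical.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k ln n - 2 ln L (lower is better)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


def best_model_by_bic(candidates, n_obs):
    """candidates maps a model name to (log_likelihood, n_params).
    Returns the name of the model with the lowest BIC."""
    return min(candidates, key=lambda name: bic(*candidates[name], n_obs))


# Hypothetical fitted models: (log-likelihood, number of free parameters).
candidates = {
    "1 trait": (-5200.0, 30),
    "5 traits": (-5000.0, 60),
    "10 traits": (-4990.0, 120),
}

# Hypothetical number of respondents.
selected = best_model_by_bic(candidates, n_obs=1000)
```

With these made-up numbers the ten-trait model fits only slightly better than the five-trait model but pays a much larger penalty for its extra parameters, so the intermediate model is selected.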
Social Loafing Construct Validity in Higher Education: How Well Do Three Measures of Social Loafing Stand up to Scrutiny?

ERIC Educational Resources Information Center

de l'Eau, Jacquelyn

2017-01-01

The purpose of this study was to examine the construct validity of social loafing using convergent and discriminant validity principles. Three instruments that purport to measure social loafing were factor analyzed: A ten-item instrument by George (1992), a 13-item instrument by Mulvey and Klein (1998), and a 22-item instrument by Jassawalla,…
Internet Gaming Disorder as a formative construct: Implications for conceptualization and measurement.

PubMed

van Rooij, Antonius J; Van Looy, Jan; Billieux, Joël

2017-07-01

Some people have serious problems controlling their Internet and video game use. The DSM-5 now includes a proposal for 'Internet Gaming Disorder' (IGD) as a condition in need of further study. Various studies aim to validate the proposed diagnostic criteria for IGD and multiple new scales have been introduced that cover the suggested criteria. Using a structured approach, we demonstrate that IGD might be better interpreted as a formative construct, as opposed to the current practice of conceptualizing it as a reflective construct. Incorrectly approaching a formative construct as a reflective one causes serious problems in scale development, including: (i) incorrect reliance on item-to-total scale correlation to exclude items and incorrectly relying on indices of inter-item reliability that do not fit the measurement model (e.g., Cronbach's α); (ii) incorrect interpretation of composite or mean scores that assume all items are equal in contributing value to a sum score; and (iii) biased estimation of model parameters in statistical models. We show that these issues are impacting current validation efforts through two recent examples. A reinterpretation of IGD as a formative construct has broad consequences for current validation efforts and provides opportunities to reanalyze existing data. We discuss three broad implications for current research: (i) composite latent constructs should be defined and used in models; (ii) item exclusion and selection should not rely on item-to-total scale correlations; and (iii) existing definitions of IGD should be enriched further. © 2016 The Authors. Psychiatry and Clinical Neurosciences © 2016 Japanese Society of Psychiatry and Neurology.
Measuring the environmental awareness of young farmers

NASA Astrophysics Data System (ADS)

Kountios, G.; Ragkos, A.; Padadavid, G.; Hadjimitsis, D.

2017-09-01

Young farmers in Europe, especially the beneficiaries of Common Agricultural Policy (CAP) funding schemes, are considered the ones who could ensure the sustainability of the European Model of Agriculture. Economic efficiency and competitiveness, prevention of the depopulation of rural areas, and environmental protection constitute some of the key objectives of the CAP, and young farmers are expected to play a role in all of them. This study proposes a way of measuring the potential of young farmers to contribute to the latter objectives of the CAP by estimating their environmental attitudes. Data from a questionnaire survey of 492 Greek young farmers were used to design a latent construct measuring their environmental attitudes. The latent construct was designed by means of an Exploratory Factor Analysis (EFA) using the responses to a set of 12 Likert-scale items. The EFA yielded a latent construct with three factors related to "Environmental pollution and policies (EPP)", "Environmental factors and food quality (EFF)" and "Farming practices and the environment". These results were validated through a confirmatory factor analysis (CFA), in which 8 items in total were categorized in the three factors (latent variables). The utilization of the latent construct for the effective implementation of CAP measures could ameliorate the relationships of agriculture and environment in general.
The SATISPSY-22: development and validation of a French hospitalized patients' satisfaction questionnaire in psychiatry.

PubMed

Zendjidjian, X Y; Auquier, P; Lançon, C; Loundou, A; Parola, N; Faugère, M; Boyer, L

2015-01-01

The aim of our study was to develop a specific French self-administered instrument for measuring hospitalized patients' satisfaction in psychiatry based exclusively on the patient's point of view: the SATISPSY-22. The development of the SATISPSY was undertaken in three steps: item generation, item reduction, and validation. The content of the SATISPSY was derived from 80 interviews with patients hospitalized in psychiatry. Using item response and classical test theories, item reduction was performed in 2 hospitals on 270 responders. The validation was based on construct validity, reliability, and some aspects of external validity. The SATISPSY contains 22 items describing 6 dimensions (staff, quality of care, personal experience, information, activity, and food). The six-factor structure accounted for 78.0% of the total variance. Each item achieved the 0.40 standard for item-internal consistency, and the Cronbach's alpha coefficients were > 0.70. Scores of dimensions were strongly positively correlated with Visual Analogue Scale scores. Significant associations with socioeconomic and clinical indicators showed good discriminant and external validity. INFIT statistics ranged from 0.71 to 1.25. The SATISPSY-22 presents satisfactory psychometric properties, enabling patient feedback to be incorporated in a continuous quality health care improvement strategy. Copyright © 2014 Elsevier Masson SAS. All rights reserved.
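For readers unfamiliar with the reliability coefficient reported in this abstract, the following minimal sketch shows how Cronbach's alpha is computed from a respondents-by-items score table. It is a generic illustration, not the study's code; the function name and the example data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    n_items = len(scores[0])
    if n_items < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    # Sample variance of each item across respondents.
    item_variances = [
        variance([row[j] for row in scores]) for j in range(n_items)
    ]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in scores])
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses of five people to a three-item dimension.
example_scores = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [1, 2, 2],
    [3, 3, 3],
]
alpha = cronbach_alpha(example_scores)
```

Values above the conventional 0.70 threshold cited in the abstract are usually read as acceptable internal consistency for a dimension.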
The Curiosity and Exploration Inventory-II: Development, Factor Structure, and Psychometrics

PubMed Central

Kashdan, Todd B.; Gallagher, Matthew W.; Silvia, Paul J.; Winterstein, Beate P.; Breen, William E.; Terhar, Daniel; Steger, Michael F.

2009-01-01

Given curiosity’s fundamental role in motivation, learning, and well-being, we sought to refine the measurement of trait curiosity with an improved version of the Curiosity and Exploration Inventory (CEI; Kashdan, Rose, & Fincham, 2004). A preliminary pool of 36 items was administered to 311 undergraduate students, who also completed measures of emotion, emotion regulation, personality, and well-being. Factor analyses indicated a two-factor model: motivation to seek out knowledge and new experiences (Stretching; 5 items) and a willingness to embrace the novel, uncertain, and unpredictable nature of everyday life (Embracing; 5 items). In two additional samples (ns = 150 and 119), we cross-validated this factor structure and provided initial evidence for construct validity. This includes positive correlations with personal growth, openness to experience, autonomy, purpose in life, self-acceptance, psychological flexibility, positive affect, and positive social relations, among others. Applying item response theory (IRT) to these samples (n = 578), we showed that the items have good discrimination and a desirable breadth of difficulty. The item information functions and test information function were centered near zero, indicating that the scale assesses the mid-range of the latent curiosity trait most reliably. The findings thus far provide good evidence for the psychometric properties of the 10-item CEI-II. PMID:20160913
[Cross-cultural adaptation and validation of the PROMIS Global Health scale in the Portuguese language].

PubMed

Zumpano, Camila Eugênia; Mendonça, Tânia Maria da Silva; Silva, Carlos Henrique Martins da; Correia, Helena; Arnold, Benjamin; Pinto, Rogério de Melo Costa

2017-01-23

This study aimed to perform the cross-cultural adaptation and validation of the Patient-Reported Outcomes Measurement Information System (PROMIS) Global Health scale in the Portuguese language. The ten Global Health items were cross-culturally adapted by the method proposed in the Functional Assessment of Chronic Illness Therapy (FACIT). The instrument's final version in Portuguese was self-administered by 1,010 participants in Brazil. The scale's precision was verified by floor and ceiling effects analysis, reliability of internal consistency, and test-retest reliability. Exploratory and confirmatory factor analyses were used to assess the construct's validity and instrument's dimensionality. Calibration of the items used the Graded Response Model proposed by Samejima. Four global items required adjustments after the pretest. Analysis of the psychometric properties showed that the Global Health scale has good reliability, with Cronbach's alpha of 0.83 and intra-class correlation of 0.89. Exploratory and confirmatory factor analyses showed good fit in the previously established two-dimensional model. The Global Physical Health and Global Mental Health scales showed good latent trait coverage according to the Graded Response Model. The PROMIS Global Health items showed equivalence in Portuguese compared to the original version and satisfactory psychometric properties for application in clinical practice and research in the Brazilian population.
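The item calibration in this abstract relies on Samejima's graded response model. As a minimal sketch of how that model assigns probabilities to ordered response categories (not the study's calibration code), the example below uses a hypothetical function name and made-up item parameters.

```python
import math


def grm_category_probabilities(theta, discrimination, thresholds):
    """Samejima's graded response model for one item.

    `thresholds` are the increasing category boundaries b_1 < ... < b_m.
    Returns the m + 1 probabilities of responding in each ordered category,
    computed as differences of adjacent cumulative logistic curves.
    """
    def logistic(x):
        return 1.0 / (1.0 + math.exp(-x))

    # P(response >= k) for k = 0..m+1; the two ends are fixed at 1 and 0.
    cumulative = (
        [1.0]
        + [logistic(discrimination * (theta - b)) for b in thresholds]
        + [0.0]
    )
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]


# Hypothetical five-category item (four thresholds) for a person at theta = 0.
category_probs = grm_category_probabilities(
    theta=0.0, discrimination=1.5, thresholds=[-2.0, -0.5, 0.5, 2.0]
)
```

Because the thresholds are increasing, the cumulative curves are ordered and every category probability is non-negative; the probabilities always sum to 1.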
Validation of psychosocial scales for physical activity in university students

PubMed Central

Tassitano, Rafael Miranda; de Farias, José Cazuza; Rech, Cassiano Ricardo; Tenório, Maria Cecília Marinho; Cabral, Poliana Coelho; da Silva, Giselia Alves Pontes

2015-01-01

OBJECTIVE Translate the Patient-centered Assessment and Counseling for Exercise questionnaire, adapt it cross-culturally and identify the psychometric properties of the psychosocial scales for physical activity in young university students. METHODS The Patient-centered Assessment and Counseling for Exercise questionnaire is made up of 39 items divided into constructs based on the social cognitive theory and the transtheoretical model. The analyzed constructs were as follows: behavior change strategy (15 items), decision-making process (10), self-efficacy (6), support from family (4), and support from friends (4). The validation procedures were conceptual, semantic, operational, and functional equivalences, in addition to the equivalence of the items and of measurements. The conceptual, item, and semantic equivalences were assessed by a specialized committee. During measurement equivalence, the instrument was applied to 717 university students. Exploratory factor analysis was used to verify the loading of each item, explained variance and internal consistency of the constructs. Reproducibility was measured by means of intraclass correlation coefficient. RESULTS The two translations were equivalent and back-translation was similar to the original version, with few adaptations. The layout and the presentation order of the constructs and items were kept the same as in the original instrument. The sample size was adequate and was evaluated by the Kaiser-Meyer-Olkin test, with values between 0.72 and 0.91. The correlation matrix of the items presented r < 0.8 (p < 0.05). The factor loadings of the items from all the constructs were satisfactory (> 0.40), varying between 0.43 and 0.80, which explained between 45.4% and 59.0% of the variance. Internal consistency was satisfactory (α ≥ 0.70), ranging from 0.70 for support from friends to 0.92 for self-efficacy. Most items (74.3%) presented values above 0.70 for the reproducibility test.
CONCLUSIONS The validation process steps were considered satisfactory and adequate for applying to the population. PMID:26270013
Validation of psychosocial scales for physical activity in university students.

PubMed

Tassitano, Rafael Miranda; de Farias Júnior, José Cazuza; Rech, Cassiano Ricardo; Tenório, Maria Cecília Marinho; Cabral, Poliana Coelho; da Silva, Giselia Alves Pontes

2015-01-01

OBJECTIVE Translate the Patient-centered Assessment and Counseling for Exercise questionnaire, adapt it cross-culturally and identify the psychometric properties of the psychosocial scales for physical activity in young university students. METHODS The Patient-centered Assessment and Counseling for Exercise questionnaire is made up of 39 items divided into constructs based on the social cognitive theory and the transtheoretical model. The analyzed constructs were as follows: behavior change strategy (15 items), decision-making process (10), self-efficacy (6), support from family (4), and support from friends (4). The validation procedures were conceptual, semantic, operational, and functional equivalences, in addition to the equivalence of the items and of measurements. The conceptual, item, and semantic equivalences were assessed by a specialized committee. During measurement equivalence, the instrument was applied to 717 university students. Exploratory factor analysis was used to verify the loading of each item, explained variance and internal consistency of the constructs. Reproducibility was measured by means of intraclass correlation coefficient. RESULTS The two translations were equivalent and back-translation was similar to the original version, with few adaptations. The layout and the presentation order of the constructs and items were kept the same as in the original instrument. The sample size was adequate and was evaluated by the Kaiser-Meyer-Olkin test, with values between 0.72 and 0.91. The correlation matrix of the items presented r < 0.8 (p < 0.05). The factor loadings of the items from all the constructs were satisfactory (> 0.40), varying between 0.43 and 0.80, which explained between 45.4% and 59.0% of the variance. Internal consistency was satisfactory (α ≥ 0.70), ranging from 0.70 for support from friends to 0.92 for self-efficacy. Most items (74.3%) presented values above 0.70 for the reproducibility test.
CONCLUSIONS The validation process steps were considered satisfactory and adequate for applying to the population.
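The internal consistency figures reported above are Cronbach's alpha coefficients. As a minimal sketch of how such a coefficient is computed (not the authors' code; the function name and the toy response matrix are hypothetical), alpha can be obtained from the item variances and the variance of the total score:

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items score matrix.

    scores: list of rows, one row per respondent, one column per item.
    """
    k = len(scores[0])                       # number of items
    columns = list(zip(*scores))             # transpose to item columns
    item_var_sum = sum(variance(col) for col in columns)
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical responses from five people to a three-item scale.
toy = [[2, 3, 3], [4, 4, 5], [1, 2, 2], [5, 5, 4], [3, 3, 4]]
alpha = cronbach_alpha(toy)
```

By the criterion quoted in the abstract, a construct would count as satisfactory when the value returned is at least 0.70.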
Using Automatic Item Generation to Meet the Increasing Item Demands of High-Stakes Educational and Occupational Assessment

ERIC Educational Resources Information Center

Arendasy, Martin E.; Sommer, Markus

2012-01-01

The use of new test administration technologies such as computerized adaptive testing in high-stakes educational and occupational assessments demands large item pools. Classic item construction processes and previous approaches to automatic item generation faced the problems of a considerable loss of items after the item calibration phase. In this…
Rapid and Accurate Behavioral Health Diagnostic Screening: Initial Validation Study of a Web-Based, Self-Report Tool (the SAGE-SR)

PubMed Central

Purcell, Susan E; Rhea, Karen; Maier, Philip; First, Michael; Zweede, Lisa; Sinisterra, Manuela; Nunn, M Brad; Austin, Marie-Paule; Brodey, Inger S

2018-01-01

Background The Structured Clinical Interview for DSM (SCID) is considered the gold standard assessment for accurate, reliable psychiatric diagnoses; however, because of its length, complexity, and training required, the SCID is rarely used outside of research. Objective This paper aims to describe the development and initial validation of a Web-based, self-report screening instrument (the Screening Assessment for Guiding Evaluation-Self-Report, SAGE-SR) based on the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5) and the SCID-5-Clinician Version (CV) intended to make accurate, broad-based behavioral health diagnostic screening more accessible within clinical care. Methods First, study staff drafted approximately 1200 self-report items representing individual granular symptoms in the diagnostic criteria for the 8 primary SCID-CV modules. An expert panel iteratively reviewed, critiqued, and revised items. The resulting items were iteratively administered and revised through 3 rounds of cognitive interviewing with community mental health center participants. In the first 2 rounds, the SCID was also administered to participants to directly compare their Likert self-report and SCID responses. A second expert panel evaluated the final pool of items from cognitive interviewing and criteria in the DSM-5 to construct the SAGE-SR, a computerized adaptive instrument that uses branching logic from a screener section to administer appropriate follow-up questions to refine the differential diagnoses. The SAGE-SR was administered to healthy controls and outpatient mental health clinic clients to assess test duration and test-retest reliability. Cutoff scores for screening into follow-up diagnostic sections and criteria for inclusion of diagnoses in the differential diagnosis were evaluated. 
Results The expert panel reduced the initial 1200 test items to 664 items that panel members agreed collectively represented the SCID items from the 8 targeted modules and DSM criteria for the covered diagnoses. These 664 items were iteratively submitted to 3 rounds of cognitive interviewing with 50 community mental health center participants; the expert panel reviewed session summaries and agreed on a final set of 661 clear and concise self-report items representing the desired criteria in the DSM-5. The SAGE-SR constructed from this item pool took an average of 14 min to complete in a nonclinical sample versus 24 min in a clinical sample. Responses to individual items can be combined to generate DSM criteria endorsements and differential diagnoses, as well as provide indices of individual symptom severity. Preliminary measures of test-retest reliability in a small, nonclinical sample were promising, with good to excellent reliability for screener items in 11 of 13 diagnostic screening modules (intraclass correlation coefficient [ICC] or kappa coefficients ranging from .60 to .90), with mania achieving fair test-retest reliability (ICC=.50) and other substance use endorsed too infrequently for analysis. Conclusions The SAGE-SR is a computerized adaptive self-report instrument designed to provide rigorous differential diagnostic information to clinicians. PMID:29572204

Development of the PROMIS health expectancies of smoking item banks.

PubMed

Edelen, Maria Orlando; Tucker, Joan S; Shadel, William G; Stucky, Brian D; Cerully, Jennifer; Li, Zhen; Hansen, Mark; Cai, Li

2014-09-01

Smokers' health-related outcome expectancies are associated with a number of important constructs in smoking research, yet there are no measures currently available that focus exclusively on this domain. This paper describes the development and evaluation of item banks for assessing the health expectancies of smoking. Using data from a sample of daily (N = 4,201) and nondaily (N = 1,183) smokers, we conducted a series of item factor analyses, item response theory analyses, and differential item functioning analyses (according to gender, age, and race/ethnicity) to arrive at a unidimensional set of health expectancies items for daily and nondaily smokers. We also evaluated the performance of short forms (SFs) and computer adaptive tests (CATs) to efficiently assess health expectancies. A total of 24 items were included in the Health Expectancies item banks; 13 items are common across daily and nondaily smokers, 6 are unique to daily, and 5 are unique to nondaily. For both daily and nondaily smokers, the Health Expectancies item banks are unidimensional, reliable (reliability = 0.95 and 0.96, respectively), and perform similarly across gender, age, and race/ethnicity groups. A SF common to daily and nondaily smokers consists of 6 items (reliability = 0.87). Results from simulated CATs showed that health expectancies can be assessed with good precision with an average of 5-6 items adaptively selected from the item banks. Health expectancies of smoking can be assessed on the basis of these item banks via SFs, CATs, or through a tailored set of items selected for a specific research purpose. © The Author 2014. Published by Oxford University Press on behalf of the Society for Research on Nicotine and Tobacco. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Development of the PROMIS nicotine dependence item banks.

PubMed

Shadel, William G; Edelen, Maria Orlando; Tucker, Joan S; Stucky, Brian D; Hansen, Mark; Cai, Li

2014-09-01

Nicotine dependence is a core construct important for understanding cigarette smoking and smoking cessation behavior. This article describes analyses conducted to develop and evaluate item banks for assessing nicotine dependence among daily and nondaily smokers. Using data from a sample of daily (N = 4,201) and nondaily (N = 1,183) smokers, we conducted a series of item factor analyses, item response theory analyses, and differential item functioning analyses (according to gender, age, and race/ethnicity) to arrive at a unidimensional set of nicotine dependence items for daily and nondaily smokers. We also evaluated performance of short forms (SFs) and computer adaptive tests (CATs) to efficiently assess dependence. A total of 32 items were included in the Nicotine Dependence item banks; 22 items are common across daily and nondaily smokers, 5 are unique to daily smokers, and 5 are unique to nondaily smokers. For both daily and nondaily smokers, the Nicotine Dependence item banks are strongly unidimensional, highly reliable (reliability = 0.97 and 0.97, respectively), and perform similarly across gender, age, and race/ethnicity groups. SFs common to daily and nondaily smokers consist of 8 and 4 items (reliability = 0.91 and 0.81, respectively). Results from simulated CATs showed that dependence can be assessed with very good precision for most respondents using fewer than 6 items adaptively selected from the item banks. Nicotine dependence on cigarettes can be assessed on the basis of these item banks via one of the SFs, by using CATs, or through a tailored set of items selected for a specific research purpose. © The Author 2014. Published by Oxford University Press on behalf of the Society for Research on Nicotine and Tobacco. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Development of a quality of life instrument for children with advanced cancer: the pediatric advanced care quality of life scale (PAC-QoL).

PubMed

Cataudella, Danielle; Morley, Tara Elise; Nesin, April; Fernandez, Conrad V; Johnston, Donna Lynn; Sung, Lillian; Zelcer, Shayna

2014-10-01

There are currently no published, validated measures available that comprehensively capture quality of life (QoL) symptoms for children with poor-prognosis malignancies. The pediatric advanced care-quality of life scale (PAC-QoL) has been developed to address this gap. The current paper describes the first two phases in the development of this measure. The first two phases included: (1) construct and item generation, and (2) preliminary content validation. Domains of QoL relevant to this population were identified from the literature and items generated to capture each; items were then adapted to create versions sensitive to age/developmental differences. Two types of experts reviewed the draft PAC-QoL and rated items for relevance, understandability, and sensitivity of wording: bereaved parents (n = 8) and health care professionals (HCP; n = 7). Content validity was calculated using the index of content validity (CVI [Lynn. Nurs Res 1986;35:382-385]). One hundred and forty-one candidate items congruent with the domains identified as relevant to children with advanced malignancies were generated, and four report versions with a 5-choice response scale created. Parent mean scores for importance, understandability, and sensitivity of wording ranged from 4.29 (SD = 0.52) to 4.66 (SD = 0.50). The CVI ranged from 95% to 100%. These steps resulted in reductions of the PAC-QoL to 57-65 items, as well as a modification of the response scale to a 4-choice option with new anchors. The next phase of this study will be to conduct cognitive probing with the intended population to further modify and reduce candidate items prior to psychometric evaluation. © 2014 Wiley Periodicals, Inc.
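The index of content validity cited above is, in its common item-level form, the proportion of experts who rate an item as relevant. The sketch below assumes a cutoff of 3 on the rating scale, which is the usual convention for a 4-point scale; the cutoff, the function names, and the ratings are illustrative assumptions rather than details taken from the study.

```python
def item_cvi(ratings, relevant_min=3):
    """Share of expert ratings at or above the relevance cutoff (assumed: 3)."""
    return sum(r >= relevant_min for r in ratings) / len(ratings)


def scale_cvi_average(items):
    """Average of the item-level CVIs across all items of a scale."""
    return sum(item_cvi(r) for r in items) / len(items)


# Hypothetical ratings from four experts for two candidate items.
ratings_by_item = [[4, 4, 3, 2], [4, 4, 4, 4]]
per_item = [item_cvi(r) for r in ratings_by_item]
overall = scale_cvi_average(ratings_by_item)
```

An item on which experts disagree pulls the scale-level average down, which is one way such an index can guide item reduction.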
Measuring cancer caregiver health literacy: Validation of the Health Literacy of Caregivers Scale-Cancer (HLCS-C) in an Australian population.

PubMed

Yuen, Eva; Knight, Tess; Dodson, Sarity; Chirgwin, Jacqueline; Busija, Lucy; Ricciardelli, Lina A; Burney, Susan; Parente, Phillip; Livingston, Patricia M

2018-05-01

Caregivers have been largely neglected in health literacy measurement. We assess the construct validity and internal consistency of the Health Literacy of Caregivers Scale-Cancer (HLCS-C), and present a revised, psychometrically robust scale. Using data from 297 cancer caregivers (12.4% response rate) recruited from Melbourne, Australia between January-July 2014, confirmatory factor analysis (CFA) was conducted to evaluate the HLCS-C's proposed factor structure. Items were evaluated for: item difficulty, unidimensionality and overall item fit within their domain. Item-threshold-ordering was examined through one-parameter Item Response Theory models. Internal consistency was assessed using Raykov's reliability coefficient. CFA results identified 42 poorly performing/redundant items which were subsequently removed. A 10-factor model was fitted to 46 acceptable items with no correlated residuals or factor cross-loadings accepted. Adequate fit was revealed (WLSMV χ² = 1463.807 [df = 944], p < .001, RMSEA = 0.043, CFI = 0.980, TLI = 0.978, WRMR = 1.00). Ten domains were identified: Proactivity and determination to seek information; Adequate information about cancer and cancer management; Supported by healthcare providers (HCP) to understand information; Social support; Cancer-related communication with the care recipient (CR); Understanding CR needs and preferences; Self-care; Understanding the healthcare system; Capacity to process health information; and Active engagement with HCP. Internal consistency was adequate across domains (0.78-0.92). The revised HLCS-C demonstrated good structural, convergent, and discriminant validity, and high internal consistency. The scale may be useful for the development and evaluation of caregiver interventions. © 2017 John Wiley & Sons Ltd.
Developing an item bank and short forms that assess the impact of asthma on quality of life.

PubMed

Stucky, Brian D; Edelen, Maria Orlando; Sherbourne, Cathy D; Eberhart, Nicole K; Lara, Marielena

2014-02-01

The present work describes the process of developing an item bank and short forms that measure the impact of asthma on quality of life (QoL) that avoids confounding QoL with asthma symptomatology and functional impairment. Using a diverse national sample of adults with asthma (N = 2032) we conducted exploratory and confirmatory factor analyses, and item response theory and differential item functioning analyses to develop a 65-item unidimensional item bank and separate short form assessments. A psychometric evaluation of the RAND Impact of Asthma on QoL item bank (RAND-IAQL) suggests that though the concept of asthma impact on QoL is multi-faceted, it may be measured as a single underlying construct. The performance of the bank was then evaluated with a real-data simulated computer adaptive test. From the RAND-IAQL item bank we then developed two short forms consisting of 4 and 12 items (reliability = 0.86 and 0.93, respectively). A real-data simulated computer adaptive test suggests that as few as 4-5 items from the bank are needed to obtain highly precise scores. Preliminary validity results indicate that the RAND-IAQL measures distinguish between levels of asthma control. To measure the impact of asthma on QoL, users of these items may choose from two highly reliable short forms, computer adaptive test administration, or content-specific subsets of items from the bank tailored to their specific needs. Copyright © 2013 Elsevier Ltd. All rights reserved.
Reliability and construct validity of the College Student Stress Scale.

PubMed

Feldt, Ronald C; Koch, Chris

2011-04-01

Reliability and construct validity of the 11-item College Student Stress Scale were investigated with exploratory (N = 273) and confirmatory factor analyses (N = 185) in undergraduate college students. Two factors were observed; however, reliability of the 3-item factor was too low and one item failed to load on either factor. A 7-item measure (Factor 1) had acceptable reliability (.81) and good convergence with the Perceived Stress Scale. This measure was significantly correlated with Neuroticism, Test Anxiety, and Self-efficacy for Learning, but not Social Desirability or age.
The use of cognitive ability measures as explanatory variables in regression analysis.

PubMed

Junker, Brian; Schofield, Lynne Steuerle; Taylor, Lowell J

2012-12-01

Cognitive ability measures are often taken as explanatory variables in regression analysis, e.g., as a factor affecting a market outcome such as an individual's wage, or a decision such as an individual's education acquisition. Cognitive ability is a latent construct; its true value is unobserved. Nonetheless, researchers often assume that a test score, constructed via standard psychometric practice from individuals' responses to test items, can be safely used in regression analysis. We examine problems that can arise, and suggest that an alternative approach, a "mixed effects structural equations" (MESE) model, may be more appropriate in many circumstances.
The validation of the Supervision of Thesis Questionnaire (STQ).

PubMed

Henricson, Maria; Fridlund, Bengt; Mårtensson, Jan; Hedberg, Berith

2018-06-01

The supervision process is characterized by differences between the supervisors' and the students' expectations before the start of writing a bachelor thesis as well as after its completion. A review of the literature did not reveal any scientifically tested questionnaire for evaluating nursing students' expectations of the supervision process when writing a bachelor thesis. The aim of the study was to determine the construct validity and internal consistency reliability of a questionnaire for measuring nursing students' expectations of the bachelor thesis supervision process. The study had a developmental and methodological design carried out in four steps including construct validity and internal consistency reliability statistical procedures: construction of the items, assessment of face validity, data collection and data analysis. This study was conducted at a university in southern Sweden, where students on the "Nursing student thesis, 15 ECTS" course were consecutively selected for participation. Of the 512 questionnaires distributed, 327 were returned, a response rate of 64%. Five factors with a total variance of 74% and good communalities, ≥0.64, were extracted from the 10-item STQ. The internal consistency of the 10 items was 0.68. The five factors were labelled: The nature of the supervision process; The supervisor's role as a coach; The students' progression to self-support; The interaction between students and supervisor; and Supervisor competence. A didactic, useful and secure questionnaire measuring nursing students' expectations of the bachelor thesis supervision process based on three main forms of supervision was created. Copyright © 2018 Elsevier Ltd. All rights reserved.
Development and Validation of a Multimedia-based Assessment of Scientific Inquiry Abilities

NASA Astrophysics Data System (ADS)

Kuo, Che-Yu; Wu, Hsin-Kai; Jen, Tsung-Hau; Hsu, Ying-Shao

2015-09-01

The potential of computer-based assessments for capturing complex learning outcomes has been discussed; however, relatively little is understood about how to leverage such potential for summative and accountability purposes. The aim of this study is to develop and validate a multimedia-based assessment of scientific inquiry abilities (MASIA) to cover a more comprehensive construct of inquiry abilities and target secondary school students in different grades while this potential is leveraged. We implemented five steps derived from the construct modeling approach to design MASIA. During the implementation, multiple sources of evidence were collected in the steps of pilot testing and Rasch modeling to support the validity of MASIA. Particularly, through the participation of 1,066 8th and 11th graders, MASIA showed satisfactory psychometric properties to discriminate students with different levels of inquiry abilities in 101 items in 29 tasks when Rasch models were applied. Additionally, the Wright map indicated that MASIA offered accurate information about students' inquiry abilities because of the comparability of the distributions of student abilities and item difficulties. The analysis results also suggested that MASIA offered precise measures of inquiry abilities when the components (questioning, experimenting, analyzing, and explaining) were regarded as a coherent construct. Finally, the increased mean difficulty thresholds of item responses along with three performance levels across all sub-abilities supported the alignment between our scoring rubrics and our inquiry framework. Together with other sources of validity in the pilot testing, the results offered evidence to support the validity of MASIA.
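For reference, the dichotomous Rasch model behind analyses of this kind can be written as below. This is the textbook form, stated here as an assumption: the study scores responses at several performance levels and may well have used a polytomous extension. The symbol \theta_p denotes the ability of student p and b_i the difficulty of item i. Both sit on the same logit scale, which is what allows a Wright map to plot the two distributions side by side.

```latex
P(X_{pi} = 1 \mid \theta_p, b_i) = \frac{\exp(\theta_p - b_i)}{1 + \exp(\theta_p - b_i)}
```

When \theta_p = b_i the probability of a correct response is 0.5, so an item is most informative for students whose ability lies near its difficulty.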
A PROMIS Measure of Neuropathic Pain Quality

PubMed Central

Askew, Robert L.; Cook, Karon F.; Keefe, Francis J.; Nowinski, Cindy J; Cella, David; Revicki, Dennis A.; DeWitt, Esi M. Morgan; Michaud, Kaleb; Trence, Dace L.; Amtmann, Dagmar

2016-01-01

Objectives Neuropathic pain is a consequence of many chronic conditions. This study aimed to develop a unidimensional neuropathic pain scale whose scores represent levels of neuropathic pain and distinguish between individuals with neuropathic and non-neuropathic pain conditions. Methods A candidate item pool of 42 pain quality descriptors was administered to participants with osteoarthritis, rheumatoid arthritis, diabetic neuropathy, and cancer chemotherapy-induced peripheral neuropathy. A subset of pain quality descriptors (items) that best distinguished between participants with and those without neuropathic pain conditions were identified. Dimensionality of pain descriptors was evaluated in a development sample and cross-validated in a hold-out sample. Item responses were calibrated using an item response theory model, and scores were generated on a T-score metric. Neuropathic pain scale scores were evaluated in terms of reliability, validity, and the ability to distinguish between participants with and without conditions typically associated with neuropathic pain. Results Of the 42 initial items, 5 were identified for the Patient Reported Outcome Measurement Information System (PROMIS) Neuropathic Pain Quality scale (PROMIS-PQ-Neuro). The IRT-generated T-scores exhibited good discriminatory ability based on receiver operator characteristic analysis. Score thresholds were identified that optimize sensitivity and specificity. Construct, criterion, and discriminant validity, and reliability of scale scores were supported. Conclusions The 5-item PROMIS PQ-Neuro is a short and practical measure that can be used to identify patients more likely to have neuropathic pain and to distinguish levels of neuropathic pain. The data collected will support future research that targets other unidimensional pain quality domains (e.g., nociceptive pain). PMID:27565279
Culture Kits for the Elementary Classroom.

ERIC Educational Resources Information Center

Hickey, M. Gail

1997-01-01

Outlines an instructional unit where students construct culture kits illustrating a specific culture. Culture kits are constructed out of realia and other material including maps, travel brochures, photographs, newspapers, souvenirs, and other items. Discusses collecting these items and possible multicultural applications. (MJP)
A Computer-Adaptive Disability Instrument for Lower Extremity Osteoarthritis Research Demonstrated Promising Breadth, Precision and Reliability

PubMed Central

Jette, Alan M.; McDonough, Christine M.; Haley, Stephen M.; Ni, Pengsheng; Olarsch, Sippy; Latham, Nancy; Hambleton, Ronald K.; Felson, David; Kim, Young-jo; Hunter, David

2012-01-01

Objective To develop and evaluate a prototype measure (OA-DISABILITY-CAT) for osteoarthritis research using Item Response Theory (IRT) and Computer Adaptive Test (CAT) methodologies. Study Design and Setting We constructed an item bank consisting of 33 activities commonly affected by lower extremity (LE) osteoarthritis. A sample of 323 adults with LE osteoarthritis reported their degree of limitation in performing everyday activities and completed the Health Assessment Questionnaire-II (HAQ-II). We used confirmatory factor analyses to assess scale unidimensionality and IRT methods to calibrate the items and examine the fit of the data. Using CAT simulation analyses, we examined the performance of OA-DISABILITY-CATs of different lengths compared to the full item bank and the HAQ-II. Results One distinct disability domain was identified. The 10-item OA-DISABILITY-CAT demonstrated a high degree of accuracy compared with the full item bank (r=0.99). The item bank and the HAQ-II scales covered a similar estimated scoring range. In terms of reliability, 95% of OA-DISABILITY reliability estimates were over 0.83 versus 0.60 for the HAQ-II. Except at the highest scores the 10-item OA-DISABILITY-CAT demonstrated superior precision to the HAQ-II. Conclusion The prototype OA-DISABILITY-CAT demonstrated promising measurement properties compared to the HAQ-II, and is recommended for use in LE osteoarthritis research. PMID:19216052
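The CAT simulations described in this and several neighbouring records rest on an item-selection rule. A common one, sketched here, is to administer the unused item with the highest Fisher information at the current ability estimate. The code assumes a dichotomous two-parameter logistic (2PL) model with hypothetical item parameters; the studies themselves use polytomous items and may apply other selection and stopping rules.

```python
import math


def p_2pl(theta, a, b):
    """Probability of endorsing an item under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)


def next_item(theta, bank, administered):
    """Index of the not-yet-administered item that is most informative at theta."""
    candidates = [i for i in range(len(bank)) if i not in administered]
    return max(candidates, key=lambda i: item_information(theta, *bank[i]))


# Hypothetical bank of (discrimination a, difficulty b) pairs.
bank = [(1.0, -2.0), (1.0, 0.0), (1.0, 2.0)]
first = next_item(0.0, bank, set())    # start at an average ability estimate
later = next_item(1.5, bank, {first})  # after the estimate has moved upward
```

In a full CAT loop the ability estimate would be updated after each response and the test would stop once a precision target or a maximum length (for example, 10 items) is reached.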
Construct Validation of Three Nutrition Questions Using Health and Diet Ratings in Older Canadian Males Living in the Community.

PubMed

Akhtar, Usman; Keller, Heather H; Tate, Robert B; Lengyel, Christina O

2015-12-01

Brief nutrition screening tools are desired for research and practice. Seniors in the Community: Risk Evaluation for Eating and Nutrition (SCREEN-II, 14 items) and the abbreviated version SCREEN-II-AB (8 items) are valid and reliable nutrition screening tools for older adults. This exploratory study used a retrospective cross-sectional design to determine the construct validity of a subset of 3 items (weight loss, appetite, and swallowing difficulty) currently on the SCREEN-II and SCREEN-II-AB tools. Secondary data on community-dwelling senior males (n = 522, mean ± SD age = 86.7 ± 3.0 years) in the Manitoba Follow-up Study (MFUS) were available for analysis. Participants completed the mailed MFUS Nutrition Survey that included SCREEN-II items and questions pertaining to self-rated health, diet healthiness, and rating of the importance of nutrition towards successful aging as the constructs for comparison. Self-perceived health status (F = 14.7, P < 0.001), diet healthiness (ρ = 0.17, P = 0.002) and the rating of nutrition's importance to aging (ρ = 0.10, P = 0.03) were correlated with the 3-item score. Inferences were consistent with associations between these construct variables and the full SCREEN-II. Three items from SCREEN-II and SCREEN-II-AB demonstrate initial construct validity with self-perceived health status and diet healthiness ratings by older males; further exploration for criterion and predictive validity in more diverse samples is needed.
Investigation of Item-Pair Presentation and Construct Validity of the Navy Computer Adaptive Personality Scales (NCAPS)

DTIC Science & Technology

2006-10-01

Investigation of Item-Pair Presentation and Construct Validity of the Navy Computer Adaptive Personality Scales (NCAPS). Christina M. Underhill, Ph.D... Reviewed and approved by Jacqueline A. Mottern... Program element numbers 0602236N and 0603236N...
Psychometric properties of the Communication Confidence Rating Scale for Aphasia (CCRSA): phase 1.

PubMed

Cherney, Leora R; Babbitt, Edna M; Semik, Patrick; Heinemann, Allen W

2011-01-01

Confidence is a construct that has not been explored previously in aphasia research. We developed the Communication Confidence Rating Scale for Aphasia (CCRSA) to assess confidence in communicating in a variety of activities and evaluated its psychometric properties using rating scale (Rasch) analysis. The CCRSA was administered to 21 individuals with aphasia before and after participation in a computer-based language therapy study. Person reliability of the 8-item CCRSA was .77. The 5-category rating scale demonstrated monotonic increases in average measures from low to high ratings. However, one item ("I follow news, sports, stories on TV/movies") misfit the construct defined by the other items (mean square infit = 1.69, item-measure correlation = .41). Deleting this item improved reliability to .79; the 7 remaining items demonstrated excellent fit to the underlying construct, although there was a modest ceiling effect in this sample. Pre- to posttreatment changes on the 7-item CCRSA measure were statistically significant using a paired samples t test. Findings support the reliability and sensitivity of the CCRSA in assessing participants' self-report of communication confidence. Further evaluation of communication confidence is required with larger and more diverse samples.
Mapping an HIV/STD prevention curriculum for Zambian in-school settings.

PubMed

Mpofu, Elias; Lawrence, Frank; Ngoma, Mary Shilalukey; Siziya, Seter; Malungo, Jacob R S

2008-04-01

HIV/AIDS poses a grave risk to human development in sub-Saharan Africa. Evidence-based interventions that are rooted in local culture could help efforts to prevent threats to human development from HIV/AIDS. We used concept mapping (Concept System, 2006) to construct the components and content of a locally developed HIV/AIDS curriculum for use by secondary schools in Lusaka, Zambia. Participants were school counsellors (n = 14), youth health program officers (n = 7), and regular education teachers (n = 3) from the education, health, and youth development agencies in Lusaka, Zambia (males = 11; females = 13; mean age 38; SD = 15 years). Concept mapping yielded six statement clusters defining preliminary components of a locally grounded in-school HIV/AIDS prevention curriculum and the content items that define these components: (1) life skills education (18 items), (2) sexuality and reproductive health (10 items), (3) treatment, care and support (13 items), (4) counselling (12 items), (5) basic facts about HIV/AIDS (11 items), and (6) dissemination of information about HIV/AIDS (11 items). The locally constructed Zambian constructs for an HIV/STD prevention curriculum overlap those promoted by public health programs in the country and internationally.
Estimating procedure for major highway construction bid item cost : final report.

DOT National Transportation Integrated Search

1978-06-01

The present procedure for estimating construction bid item cost makes use of the quarterly weighted average unit price report coupled with engineering judgement. The limitation to this method is that this report format provides only the lowest bid da...
Toward A Theory of Construct Definition.

ERIC Educational Resources Information Center

Stenner, A. Jackson; And Others

1983-01-01

In an attempt to restore the symmetry and balance between the study of person and item variation, this paper presents a novel methodology, construct specification equations, which allows one to ascertain from the lawful behavior of items what an instrument is measuring. (Author/PN)
Development and Validity Testing of an Arthritis Self-Management Assessment Tool.

PubMed

Oh, HyunSoo; Han, SunYoung; Kim, SooHyun; Seo, WhaSook

Because of the chronic, progressive nature of arthritis and the substantial effects it has on quality of life, patients may benefit from self-management. However, no valid, reliable self-management assessment tool has been devised for patients with arthritis. This study was conducted to develop a comprehensive self-management assessment tool for patients with arthritis, that is, the Arthritis Self-Management Assessment Tool (ASMAT). To develop a list of qualified items corresponding to the conceptual definitions and attributes of arthritis self-management, a measurement model was established on the basis of theoretical and empirical foundations. Content validity testing was conducted to evaluate whether listed items were suitable for assessing arthritis self-management. Construct validity and reliability of the ASMAT were tested. Construct validity was examined using confirmatory factor analysis and nomological validity. The 32-item ASMAT was developed with a sample composed of patients in a clinic in South Korea. Content validity testing validated the 32 items, which comprised medical (10 items), behavioral (13 items), and psychoemotional (9 items) management subscales. Construct validity testing of the ASMAT showed that the 32 items properly corresponded with conceptual constructs of arthritis self-management, and were suitable for assessing self-management ability in patients with arthritis. Reliability was also well supported. The ASMAT devised in the present study may aid the evaluation of patient self-management ability and the effectiveness of self-management interventions. The authors believe the developed tool may also aid the identification of problems associated with the adoption of self-management practice, and thus improve symptom management, independence, and quality of life of patients with arthritis.
Teacher Learning of Technology Enhanced Formative Assessment

NASA Astrophysics Data System (ADS)

Feldman, Allan; Capobianco, Brenda M.

2008-02-01

This study examined the integration of technology enhanced formative assessment (FA) into teachers' practice. Participants were high school physics teachers interested in improving their use of a classroom response system (CRS) to promote FA. Data were collected using interviews, direct classroom observations, and collaborative discussions. The physics teachers engaged in collaborative action research (AR) to learn how to use FA and CRS to promote student and teacher learning. Data were analyzed using open coding, cross-case analysis, and content analysis. Results from data analysis allowed researchers to construct a model of the knowledge and skills necessary for the integration of technology enhanced FA into teachers' practice. The model is a set of four technologies: hardware and software; methods for constructing FA items; pedagogical methods; and curriculum integration. The model is grounded in the idea that teachers must develop these respective technologies as they interact with the CRS (i.e., hardware and software, item construction) and their existing practice (i.e., pedagogical methods, curriculum). Implications are that for teachers to make FA an integral part of their practice using CRS, they must: 1) engage in the four technologies; 2) understand the nature of FA; and 3) collaborate with other interested teachers through AR.

Work ability as prognostic risk marker of disability pension: single-item work ability score versus multi-item work ability index.

PubMed

Roelen, Corné A M; van Rhenen, Willem; Groothoff, Johan W; van der Klink, Jac J L; Twisk, Jos W R; Heymans, Martijn W

2014-07-01

Work ability predicts future disability pension (DP). A single-item work ability score (WAS) is emerging as a measure for work ability. This study compared single-item WAS with the multi-item work ability index (WAI) in its ability to identify workers at risk of DP. This prospective cohort study comprised 11 537 male construction workers, who completed the WAI at baseline and reported DP after a mean 2.3 years of follow-up. WAS and WAI were calibrated for DP risk predictions with the Hosmer-Lemeshow (H-L) test and their ability to discriminate between high- and low-risk construction workers was investigated with the area under the receiver operating characteristic curve (AUC). At follow-up, 336 (3%) construction workers reported DP. Both WAS [odds ratio (OR) 0.72, 95% confidence interval (95% CI) 0.66-0.78] and WAI (OR 0.57, 95% CI 0.52-0.63) scores were associated with DP at follow-up. The WAS showed miscalibration (H-L model χ²=10.60; df=3; P=0.01) and poorly discriminated between high- and low-risk construction workers (AUC 0.67, 95% CI 0.64-0.70). In contrast, calibration (H-L model χ²=8.20; df=8; P=0.41) and discrimination (AUC 0.78, 95% CI 0.75-0.80) were both adequate for the WAI. Although associated with the risk of future DP, the single-item WAS poorly identified male construction workers at risk of DP. We recommend using the multi-item WAI to screen for risk of DP in occupational health practice.
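To make the discrimination statistic in this abstract concrete, the following is a minimal Python sketch of how an AUC can be computed as the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen non-case. The function name `auc` and the predicted risks are hypothetical illustrations, not data or code from the study.

```python
def auc(risk_cases, risk_noncases):
    """Area under the ROC curve via pairwise comparison.

    Counts the share of (case, non-case) pairs in which the case has the
    higher predicted risk; tied pairs count as half a win.
    """
    wins = 0.0
    for c in risk_cases:
        for n in risk_noncases:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(risk_cases) * len(risk_noncases))


# Hypothetical predicted risks (assumption: higher value = higher risk).
cases = [0.9, 0.8, 0.4]          # workers who later reported the outcome
noncases = [0.7, 0.3, 0.2, 0.1]  # workers who did not

print(auc(cases, noncases))
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation, which is the scale on which the reported values of 0.67 and 0.78 are read.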
Examining Player Anger in World of Warcraft

NASA Astrophysics Data System (ADS)

Barnett, Jane; Coulson, Mark; Foreman, Nigel

This questionnaire study of the sources of anger in World of Warcraft applies classical quantitative measurement scale construction to a new problem, generating a host of questionnaire items that could find use in future studies, and identifying four major categories of events that cause negative affect among players. First, 33 players provided examples of in-game scenarios that had made them angry, and their responses were culled to create a 93-item battery rated by hundreds of player respondents in terms of anger intensity and anger frequency. An iterative process of factor analysis and scale reliability assessment led to a 28-item instrument measuring four anger-provoking factors: Raids/Instances, Griefers, Perceived Time Wasting, and Anti-social Players. These anger-causing scenarios were then illustrated by concrete examples from player and researcher experiences in World of Warcraft. One striking finding is that players become angry at other players' negative behavior, regardless of whether that behavior was intended to harm.
Development and evaluation of the PI-G: a three-scale measure based on the German translation of the PROMIS® pain interference item bank.

PubMed

Farin, Erik; Nagl, Michaela; Gramm, Lukas; Heyduck, Katja; Glattacker, Manuela

2014-05-01

The aim of the study was to translate the PROMIS® pain interference (PI) item bank (41 items) into German, test its psychometric properties in patients with chronic low back pain and develop static subforms. We surveyed N = 262 patients undergoing rehabilitation who were asked to fill out questionnaires at the beginning and 2 weeks after the end of rehabilitation, applying the Oswestry Disability Index (ODI) and Pain Disability Index (PDI) in addition to the PROMIS® PI items. For psychometric testing, a 1-parameter item response theory (IRT) model was used. Exploratory and confirmatory factor analyses as well as reliability and construct validity analyses were conducted. The assumptions regarding IRT scaling of the translated PROMIS® PI item bank as a whole were not confirmed. However, we succeeded in devising three static subforms (PI-G scales: PI mental 13 items, PI functional 11 items, PI physical 4 items), revealing good psychometric properties. The PI-G scales in their static form can be recommended for use in German-speaking countries. Their strengths versus the ODI and PDI are that pain interference is assessed in a differentiated manner and that several psychometric values are somewhat better than those associated with the ODI and PDI (distribution properties, IRT model fit, reliability). To develop an IRT-scaled item bank of the German translations of the PROMIS® PI items, it would be useful to have additional studies (e.g., with larger sample sizes and using a 2-parameter IRT model).
[Construction and validation of a tool for the evaluation of environmental risks and limitations to the manual handling of loads: cross-sectional study].

PubMed

Galeoto, G; Sili, A; Tamburlani, M; Farina, M; Mannocci, A; Mollica, R; Servadio, A

2017-01-01

The manual handling of loads has a strong impact on many types of work. All health professionals, by the nature of their job, are at high risk of disease from the manual handling of loads. The purpose of our work was therefore the construction and validation of a specific tool for evaluating both environmental risks and individual limitations in the manual handling of loads/patients. The questionnaire we created is composed of two main sections: the first section records the operator's personal data, while the second section, consisting of eleven items, is further organized into two parts. The first part consists of four items about environmental risk factors, while the second part consists of seven items about generic limitations and the assessment of pain from manual handling of loads. A total of 704 nurses, including those with coordination responsibilities, were available in the facility, and the response rate to the questionnaire was 93.18%. The test-retest analysis showed a high intra-class correlation coefficient (0.843), indicating little measurement error between the two administrations. Internal consistency values for the two sections of the questionnaire were greater than 0.80, which also demonstrated the internal stability of the questionnaire. The tool described is therefore intended as a means of assessing environmental risks, restrictions on the movement of loads, and pain associated with the task.
Patient experiences questionnaire for interdisciplinary treatment for substance dependence (PEQ-ITSD): reliability and validity following a national survey in Norway.

PubMed

Haugum, Mona; Iversen, Hilde Hestad; Bjertnaes, Oyvind; Lindahl, Anne Karin

2017-02-20

Patient experiences are an important aspect of health care quality, but there is a lack of validated instruments for their measurement in the substance dependence literature. A new questionnaire to measure inpatients' experiences of interdisciplinary treatment for substance dependence has been developed in Norway. The aim of this study was to psychometrically test the new questionnaire, using data from a national survey in 2013. The questionnaire was developed based on a literature review, qualitative interviews with patients, expert group discussions and pretesting. Data were collected in a national survey covering all residential facilities with inpatients in treatment for substance dependence in 2013. Data quality and psychometric properties were assessed, including ceiling effects, item missing, exploratory factor analysis, and tests of internal consistency reliability, test-retest reliability and construct validity. The sample included 978 inpatients present at 98 residential institutions. After correcting for excluded patients (n = 175), the response rate was 91.4%. 28 out of 33 items had less than 20.5% of missing data or replies in the "not applicable" category. All but one item met the ceiling effect criterion of less than 50.0% of the responses in the most favorable category. Exploratory factor analysis resulted in three scales: "treatment and personnel", "milieu" and "outcome". All scales showed satisfactory internal consistency reliability (Cronbach's alpha ranged from 0.75-0.91) and test-retest reliability (ICC ranged from 0.82-0.85). 17 of 18 significant associations between single variables and the scales supported construct validity of the PEQ-ITSD. The content validity of the PEQ-ITSD was secured by a literature review, consultations with an expert group and qualitative interviews with patients. 
The PEQ-ITSD was used in a national survey in Norway in 2013 and psychometric testing showed that the instrument had satisfactory internal consistency reliability and construct validity.
Is the Parkinson Anxiety Scale comparable across raters?

PubMed

Forjaz, Maria João; Ayala, Alba; Martinez-Martin, Pablo; Dujardin, Kathy; Pontone, Gregory M; Starkstein, Sergio E; Weintraub, Daniel; Leentjens, Albert F G

2015-04-01

The Parkinson Anxiety Scale is a new scale developed to measure anxiety severity in Parkinson's disease specifically. It consists of three dimensions: persistent anxiety, episodic anxiety, and avoidance behavior. This study aimed to assess the measurement properties of the scale while controlling for the rater (self- vs. clinician-rated) effect. The Parkinson Anxiety Scale was administered to a cross-sectional multicenter international sample of 362 Parkinson's disease patients. Both patients and clinicians rated the patient's anxiety independently. A many-facet Rasch model design was applied to estimate and remove the rater effect. The following measurement properties were assessed: fit to the Rasch model, unidimensionality, reliability, differential item functioning, item local independency, interrater reliability (self or clinician), and scale targeting. In addition, test-retest stability, construct validity, precision, and diagnostic properties of the Parkinson Anxiety Scale were also analyzed. A good fit to the Rasch model was obtained for Parkinson Anxiety Scale dimensions A and B, after the removal of one item and rescoring of the response scale for certain items, whereas dimension C showed marginal fit. Self versus clinician rating differences were of small magnitude, with patients reporting higher anxiety levels than clinicians. The linear measure for Parkinson Anxiety Scale dimensions A and B showed good convergent construct validity with other anxiety measures and good diagnostic properties. Parkinson Anxiety Scale modified dimensions A and B provide valid and reliable measures of anxiety in Parkinson's disease that are comparable across raters. Further studies are needed with dimension C. © 2014 International Parkinson and Movement Disorder Society.
Psychometric Properties of the Cognitive Emotion Regulation Questionnaire (CERQ) in Patients with Fibromyalgia Syndrome.

PubMed

Feliu-Soler, Albert; Reche-Camba, Elvira; Borràs, Xavier; Pérez-Aranda, Adrián; Andrés-Rodríguez, Laura; Peñarrubia-María, María T; Navarro-Gil, Mayte; García-Campayo, Javier; Bellón, Juan A; Luciano, Juan V

2017-01-01

Given that Fibromyalgia Syndrome (FMS) is associated with problems in emotion regulation, the importance of assessing this construct is widely acknowledged by clinical psychologists and pain specialists. Although the Cognitive Emotion Regulation Questionnaire (CERQ) is a self-report measure used worldwide, there are no data on its psychometric properties in patients with FMS. This study analyzed the dimensionality, reliability, and validity of the CERQ in a sample of 231 patients with FMS. Given that "fibrofog" is one of the most disabling FMS symptoms, in the present study, items in the CERQ were grouped by dimension. This change in item presentation was conceived as an efficient way of facilitating responses as a result of a clear understanding of what the items related to each dimension are attempting to measure. The following battery of measures was administered: the CERQ, the Revised Fibromyalgia Impact Questionnaire, the Pain Catastrophizing Scale, the Center for Epidemiologic Studies Depression Scale, and the State-Trait Anxiety Inventory. Four models of the CERQ structure were examined and confirmatory factor analyses supported the original factor model, consisting of nine factors-Self-blame, Acceptance, Rumination, Positive refocusing, Refocus on planning, Positive reappraisal, Putting into perspective, Catastrophizing, and Other-blame. There was minimal overlap between CERQ subscales and their internal consistency was adequate. Correlational and regression analyses supported the construct validity of the CERQ. Our findings indicate that the CERQ (items-grouped version) is a sound instrument for assessing cognitive emotion regulation in patients with FMS.
Psychometric Properties of the Cognitive Emotion Regulation Questionnaire (CERQ) in Patients with Fibromyalgia Syndrome

PubMed Central

Feliu-Soler, Albert; Reche-Camba, Elvira; Borràs, Xavier; Pérez-Aranda, Adrián; Andrés-Rodríguez, Laura; Peñarrubia-María, María T.; Navarro-Gil, Mayte; García-Campayo, Javier; Bellón, Juan A.; Luciano, Juan V.

2017-01-01

Given that Fibromyalgia Syndrome (FMS) is associated with problems in emotion regulation, the importance of assessing this construct is widely acknowledged by clinical psychologists and pain specialists. Although the Cognitive Emotion Regulation Questionnaire (CERQ) is a self-report measure used worldwide, there are no data on its psychometric properties in patients with FMS. This study analyzed the dimensionality, reliability, and validity of the CERQ in a sample of 231 patients with FMS. Given that “fibrofog” is one of the most disabling FMS symptoms, in the present study, items in the CERQ were grouped by dimension. This change in item presentation was conceived as an efficient way of facilitating responses as a result of a clear understanding of what the items related to each dimension are attempting to measure. The following battery of measures was administered: the CERQ, the Revised Fibromyalgia Impact Questionnaire, the Pain Catastrophizing Scale, the Center for Epidemiologic Studies Depression Scale, and the State-Trait Anxiety Inventory. Four models of the CERQ structure were examined and confirmatory factor analyses supported the original factor model, consisting of nine factors—Self-blame, Acceptance, Rumination, Positive refocusing, Refocus on planning, Positive reappraisal, Putting into perspective, Catastrophizing, and Other-blame. There was minimal overlap between CERQ subscales and their internal consistency was adequate. Correlational and regression analyses supported the construct validity of the CERQ. Our findings indicate that the CERQ (items-grouped version) is a sound instrument for assessing cognitive emotion regulation in patients with FMS. PMID:29321750
Refinement and partial validation of the UNESP-Botucatu multidimensional composite pain scale for assessing postoperative pain in horses.

PubMed

Taffarel, Marilda Onghero; Luna, Stelio Pacca Loureiro; de Oliveira, Flavia Augusta; Cardoso, Guilherme Schiess; Alonso, Juliana de Moura; Pantoja, Jose Carlos; Brondani, Juliana Tabarelli; Love, Emma; Taylor, Polly; White, Kate; Murrell, Joanna C

2015-04-01

Quantification of pain plays a vital role in the diagnosis and management of pain in animals. In order to refine and validate an acute pain scale for horses, a prospective, randomized, blinded study was conducted. Twenty-four client-owned adult horses were recruited and allocated to one of the following four groups: anaesthesia only (GA); pre-emptive analgesia and anaesthesia (GAA); anaesthesia, castration and postoperative analgesia (GC); or pre-emptive analgesia, anaesthesia and castration (GCA). One investigator, unaware of the treatment group, assessed all horses at time-points before and after intervention and completed the pain scale. Videos were also obtained at these time-points and were evaluated by a further four blinded evaluators who also completed the scale. The data were used to investigate the relevance, specificity, criterion validity and inter- and intra-observer reliability of each item on the pain scale, and to evaluate construct validity and responsiveness of the scale. Construct validity was demonstrated by the observed differences in scores between the groups, four hours after anaesthetic recovery and before administration of systemic analgesia in the GC group. Inter- and intra-observer reliability for the items was only satisfactory. Subsequently the pain scale was refined, based on results for relevance, specificity and total item correlation. Scale refinement and exclusion of items that did not meet predefined requirements generated a selection of relevant pain behaviours in horses. After further validation for reliability, these may be used to evaluate pain under clinical and experimental conditions.
The emotion dysregulation inventory: Psychometric properties and item response theory calibration in an autism spectrum disorder sample.

PubMed

Mazefsky, Carla A; Yu, Lan; White, Susan W; Siegel, Matthew; Pilkonis, Paul A

2018-06-01

Individuals with autism spectrum disorder (ASD) often present with prominent emotion dysregulation that requires treatment but can be difficult to measure. The Emotion Dysregulation Inventory (EDI) was created using methods developed by the Patient-Reported Outcomes Measurement Information System (PROMIS ® ) to capture observable indicators of poor emotion regulation. Caregivers of 1,755 youth with ASD completed 66 candidate EDI items, and the final 30 items were selected based on classical test theory and item response theory (IRT) analyses. The analyses identified two factors: (a) Reactivity, characterized by intense, rapidly escalating, sustained, and poorly regulated negative emotional reactions, and (b) Dysphoria, characterized by anhedonia, sadness, and nervousness. The final items did not show differential item functioning (DIF) based on gender, age, intellectual ability, or verbal ability. Because the final items were calibrated using IRT, even a small number of items offers high precision, minimizing respondent burden. IRT co-calibration of the EDI with related measures demonstrated its superiority in assessing the severity of emotion dysregulation with as few as seven items. Validity of the EDI was supported by expert review, its association with related constructs (e.g., anxiety and depression symptoms, aggression), higher scores in psychiatric inpatients with ASD compared to a community ASD sample, and demonstration of test-retest stability and sensitivity to change. In sum, the EDI provides an efficient and sensitive method to measure emotion dysregulation for clinical assessment, monitoring, and research in youth with ASD of any level of cognitive or verbal ability. Autism Res 2018, 11: 928-941. © 2018 International Society for Autism Research, Wiley Periodicals, Inc. This paper describes a new measure of poor emotional control called the Emotion Dysregulation Inventory (EDI). 
Caregivers of 1,755 youth with ASD completed candidate items, and advanced statistical techniques were applied to identify the best final items. The EDI is unique because it captures common emotional problems in ASD and is appropriate for both nonverbal and verbal youth. It is an efficient and sensitive measure for use in clinical assessments, monitoring, and research with youth with ASD.
Intention to stay and intention to leave: are they two sides of the same coin? A cross-sectional structural equation modelling study among health and social care workers.

PubMed

Nancarrow, Susan; Bradbury, Joanne; Pit, Sabrina Winona; Ariss, Steven

2014-01-01

"Intention to leave" (ITL) has been used interchangeably with the more positive construct "intention to stay" (ITS). The implicit assumption appears to be that both constructs represent different sides of the same coin. This study challenges this assumption. The objectives were (i) to test whether these constructs were similar measures of the same construct, and (ii) to assess the strength of the relationships between ITL and ITS with work-related outcomes. The Workforce Dynamics Questionnaire (WDQ) was administered to 298 staff. The WDQ included two items on ITL and was supplemented with three items on ITS. Structural equation modelling (SEM) was used. The response rate was 43%. The correlation between the two constructs was negative and quite high (r=-0.84), indicating potential issues with discriminant validity. However, the constructs behaved differently in relation to job satisfaction (JS) and job integration (JI). ITS was a strong predictor (0.95, p<0.001), whereas ITL was not significantly related (0.34, p=0.195) to JS. The direct effect of JI was 0.30 on ITS and -0.42 on ITL. The indirect effects of JI, via job satisfaction, were more contrasting: 0.56 for ITS and -0.30 for ITL. This is the first study amongst British health and social care workers that has demonstrated that ITS and ITL are not measuring the same construct. While there is overlap, care should be taken when using these constructs interchangeably, particularly when measuring these concepts in organizations and when developing retention programs, policies, or activities to modify ITS and ITL.
Measuring organizational flexibility in community pharmacy: Building the capacity to implement cognitive pharmaceutical services.

PubMed

Feletto, Eleonora; Wilson, Laura Kate; Roberts, Alison Sarah; Benrimoj, Shalom Isaac

2011-03-01

Community pharmacy is undergoing transformation with increasing pressure to build its capacity to deliver cognitive pharmaceutical services ("services"). The theoretical framework of organizational flexibility (OF) may be used to assess the capacity of community pharmacy to implement change programs and guide capacity-building initiatives. The objective was to test the applicability of an existing scale measuring OF to the community pharmacy industry in Australia. A mail survey was used to test a preexisting scale measuring OF, amended from 28 items to 20 items and testing 3 underlying factors (operational, structural, and strategic flexibility) in the Australian community pharmacy context. The sample comprised 2006 community pharmacies selected by stratified random sampling. A confirmatory factor analysis was conducted to assess the validity and reliability of the 1-factor models for each underlying construct and the full measurement model. Responses were received from a total of 395 (19.7%) community pharmacies. The 1-factor models of operational, structural, and strategic flexibility fit the data with appropriate respecification. Overall, the favorable fit of the individual factor constructs suggested that the multiple-factor measurement model should be tested. However, this model did not yield an interpretable response. Operational flexibility covaried negatively with the other factors, whereas structural and strategic flexibility shared covariance. Despite this, the results highlighting the individual factor fit suggest the constructs have application to pharmacy. The individual OF constructs were useful in the development and initial testing of a scale adapted for community pharmacy. When further developed and validated, the scale could be used to identify groups of pharmacies that require individualized assistance to build capacity and integrate services and other new endeavors. Copyright © 2011 Elsevier Inc. All rights reserved.
[Instruments for evaluating oral health knowledge, attitudes and practice for parents /caregivers of small children].

PubMed

Martignon, Stefania; Bautista-Mendoza, Gloria; González-Carrera, María; Lafaurie-Villamil, Gloria; Morales, Veicy; Santamaría, Ruth

2008-01-01

The objectives were to design three instruments for evaluating oral health knowledge, attitudes and practice in parents/caregivers of 0-5 year-olds of low socio-economic status, and to evaluate the instruments' reliability in terms of internal consistency and analyse their items. Three instruments were constructed for evaluating the oral health knowledge, attitudes and practice of parents/caregivers of 0-5 year-olds of low socio-economic status in the municipality of Usaquén, Bogotá, Colombia. 47 parents/caregivers were given a test establishing the instruments' reliability in terms of internal consistency and the adults' level of knowledge, attitudes and practice. A sub-sample was qualitatively analysed (content verification and understanding). Reliability was evaluated using Cronbach's alpha coefficient. Items were analysed to improve the construction and comprehensibility of the questions, taking four criteria into account: corrected homogeneity index (CHI), response trend, correlation between items and qualitative analysis. Cronbach's alpha coefficient for knowledge, attitudes and practice was 0.82, 0.80 and 0.62, respectively. Participants' level of knowledge, attitudes and practice was acceptable (60%, 55% and 91%, respectively). This study found two out of the three evaluated instruments to be reliable (knowledge and attitudes); all three of them were then redesigned. The resulting instruments represent a valuable tool which can be used in future studies for describing and evaluating preventative programmes.
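As an illustration of the internal consistency statistic reported here, the sketch below computes Cronbach's alpha from a respondents-by-items score matrix using the standard formula alpha = k/(k-1) × (1 − sum of item variances / variance of total scores). The function name `cronbach_alpha` and the response data are hypothetical and are not taken from the study.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    Uses sample variances (n - 1 denominator) for both the individual items
    and the respondents' total scores.
    """
    k = len(rows[0])  # number of items
    item_variances = [variance([r[i] for r in rows]) for i in range(k)]
    total_variance = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical answers of five respondents to a three-item scale.
responses = [
    [3, 4, 3],
    [2, 2, 3],
    [4, 5, 5],
    [1, 2, 1],
    [3, 3, 4],
]

print(round(cronbach_alpha(responses), 3))
```

The sketch assumes complete data and at least two items; it does not handle missing responses or a total score with zero variance.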
Passion: Does one scale fit all? Construct validity of two-factor passion scale and psychometric invariance over different activities and languages.

PubMed

Marsh, Herbert W; Vallerand, Robert J; Lafrenière, Marc-André K; Parker, Philip; Morin, Alexandre J S; Carbonneau, Noémie; Jowett, Sophia; Bureau, Julien S; Fernet, Claude; Guay, Frédéric; Salah Abduljabbar, Adel; Paquet, Yvan

2013-09-01

The passion scale, based on the dualistic model of passion, measures 2 distinct types of passion: Harmonious and obsessive passions are predictive of adaptive and less adaptive outcomes, respectively. In a substantive-methodological synergy, we evaluate the construct validity (factor structure, reliability, convergent and discriminant validity) of Passion Scale responses (N = 3,571). The exploratory structural equation model fit to the data was substantially better than the confirmatory factor analysis solution, and resulted in better differentiated (less correlated) factors. Results from a 13-model taxonomy of measurement invariance supported complete invariance (factor loadings, factor correlations, item uniquenesses, item intercepts, and latent means) over language (French vs. English; the instrument was originally devised in French, then translated into English) and gender. Strong measurement partial invariance over 5 passion activity groups (leisure, sport, social, work, education) indicates that the same set of items is appropriate for assessing passion across a wide variety of activities--a previously untested, implicit assumption that greatly enhances practical utility. Support was found for the convergent and discriminant validity of the harmonious and obsessive passion scales, based on a set of validity correlates: life satisfaction, rumination, conflict, time investment, activity liking and valuation, and perceiving the activity as a passion.
PROMIS GH (Patient-Reported Outcomes Measurement Information System Global Health) Scale in Stroke: A Validation Study.

PubMed

Katzan, Irene L; Lapin, Brittany

2018-01-01

The International Consortium for Health Outcomes Measurement recently included the 10-item PROMIS GH (Patient-Reported Outcomes Measurement Information System Global Health) scale as part of their recommended Standard Set of Stroke Outcome Measures. Before collection of PROMIS GH is broadly implemented, it is necessary to assess its performance in the stroke population. The objective of this study was to evaluate the psychometric properties of PROMIS GH in patients with ischemic stroke and intracerebral hemorrhage. PROMIS GH and 6 PROMIS domain scales measuring same/similar constructs were electronically collected on 1102 patients with ischemic and hemorrhagic strokes at various stages of recovery from their stroke who were seen in a cerebrovascular clinic from October 12, 2015, through June 2, 2017. Confirmatory factor analysis was performed to evaluate the adequacy of 2-factor structure of component scores. Test-retest reliability and convergent validity of PROMIS GH items and component scores were assessed. Discriminant validity and responsiveness were compared between PROMIS GH and PROMIS domain scales measuring the same or related constructs. Analyses were repeated stratified by stroke subtype and modified Rankin Scale score <2 versus ≥2. There was moderate internal reliability (ordinal α, 0.82-0.88) and marginal model fit for the 2-factor solution for component scores (root mean square error of approximation, 0.11). Convergent validity was good with significant correlations between all PROMIS GH items and PROMIS domain scales (P<0.001 for all). There was excellent discrimination for all PROMIS GH items and component scores across modified Rankin Scale levels. Good responsiveness (effect size, >0.5) was demonstrated for 8 of the 10 PROMIS GH items. Reliability and validity remained consistent across stroke subtype and disability level (modified Rankin Scale, <2 versus ≥2). PROMIS GH exhibits acceptable performance in patients with stroke. Our findings support the International Consortium for Health Outcomes Measurement recommendation to use PROMIS GH as part of the standard set of outcome measures in stroke. © 2017 American Heart Association, Inc.
1999 Survey of Active Duty Personnel: Administration, Datasets, and Codebook. Appendix G: Frequency and Percentage Distributions for Variables in the Survey Analysis Files.

DTIC Science & Technology

2000-12-01

A SKIP FLAG INDICATING THE RESULT OF CHECKING THE RESPONSE ON THE PARENT (SCREENING) ITEM AGAINST THE RESPONSE(S) ON THE ITEMS WITHIN THE SKIP...RESPONSE ON THE PARENT (SCREENING) ITEM AGAINST THE RESPONSE(S) ON THE ITEMS WITHIN THE SKIP PATTERN. SEE TABLE D-5, NOTE 2, IN APPENDIX D. G-52...RESULT OF CHECKING THE RESPONSE ON THE PARENT (SCREENING) ITEM AGAINST THE RESPONSE(S) ON THE ITEMS WITHIN THE SKIP PATTERN. SEE TABLE D-5
The medial tibial stress syndrome score: a new patient-reported outcome measure.

PubMed

Winters, Marinus; Moen, Maarten H; Zimmermann, Wessel O; Lindeboom, Robert; Weir, Adam; Backx, Frank Jg; Bakker, Eric Wp

2016-10-01

At present, there is no validated patient-reported outcome measure (PROM) for patients with medial tibial stress syndrome (MTSS). Our aim was to select and validate previously generated items and create a valid, reliable and responsive PROM for patients with MTSS: the MTSS score. A prospective cohort study was performed in multiple sports medicine, physiotherapy and military facilities in the Netherlands. Participants with MTSS filled out the previously generated items for the MTSS score on 3 occasions. From previously generated items, we selected the best items. We assessed the MTSS score for its validity, reliability and responsiveness. The MTSS score was filled out by 133 participants with MTSS. Factor analysis showed the MTSS score to exhibit a single-factor structure with acceptable internal consistency (α=0.58) and good test-retest reliability (intraclass correlation coefficient=0.81). The MTSS score ranges from 0 to 10 points. The smallest detectable change in our sample was 0.69 at the group level and 4.80 at the individual level. Construct validity analysis showed significant moderate-to-large correlations (r=0.34-0.52, p<0.01). Responsiveness of the MTSS score was confirmed by a significant relation with the global perceived effect scale (β=-0.288, R²=0.21, p<0.001). The MTSS score is a valid, reliable and responsive PROM to measure the severity of MTSS. It is designed to evaluate treatment outcomes in clinical studies. Published by the BMJ Publishing Group Limited.
Validation of PROMIS® Physical Function computerized adaptive tests for orthopaedic foot and ankle outcome research.

PubMed

Hung, Man; Baumhauer, Judith F; Latt, L Daniel; Saltzman, Charles L; SooHoo, Nelson F; Hunt, Kenneth J

2013-11-01

In 2012, the American Orthopaedic Foot & Ankle Society® established a national network for collecting and sharing data on treatment outcomes and improving patient care. One of the network's initiatives is to explore the use of computerized adaptive tests (CATs) for patient-level outcome reporting. We determined whether the CAT from the NIH Patient Reported Outcome Measurement Information System® (PROMIS®) Physical Function (PF) item bank provides efficient, reliable, valid, precise, and adequately covered point estimates of patients' physical function. After informed consent, 288 patients with a mean age of 51 years (range, 18-81 years) undergoing surgery for common foot and ankle problems completed a web-based questionnaire. Efficiency was determined by time for test administration. Reliability was assessed with person and item reliability estimates. Validity evaluation included content validity from expert review and construct validity measured against the PROMIS® Pain CAT and patient responses based on tradeoff perceptions. Precision was assessed by standard error of measurement (SEM) across patients' physical function levels. Instrument coverage was based on a person-item map. Average time of test administration was 47 seconds. Reliability was 0.96 for person and 0.99 for item. Construct validity against the Pain CAT had an r value of -0.657 (p < 0.001). Precision had an SEM of less than 3.3 (equivalent to a Cronbach's alpha of ≥ 0.90) across a broad range of function. Concerning coverage, the ceiling effect was 0.32% and there was no floor effect. The PROMIS® PF CAT appears to be an excellent method for measuring outcomes for patients with foot and ankle surgery. Further validation of the PROMIS® item banks may ultimately provide a valid and reliable tool for measuring patient-reported outcomes after injuries and treatment.
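The abstract pairs an SEM threshold with a Cronbach's alpha value. One way to see how the two relate is the classical test theory identity reliability = 1 - (SEM / SD)^2. The sketch below assumes scores are on the PROMIS T-score metric with a standard deviation of 10, which the abstract does not state. The function names are illustrative and are not taken from the study.

```python
import math

# Assumption: scores are reported as T-scores (SD of 10 in the reference
# population). The abstract itself does not name the metric.
T_SCORE_SD = 10.0


def reliability_from_sem(sem, sd=T_SCORE_SD):
    """Classical test theory: reliability = 1 - (SEM / SD)^2."""
    return 1.0 - (sem / sd) ** 2


def sem_from_reliability(reliability, sd=T_SCORE_SD):
    """Inverse relation: SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)


# An SEM of 3.3 on this metric corresponds to a reliability of about 0.89.
print(round(reliability_from_sem(3.3), 3))   # 0.891
# A reliability of exactly 0.90 corresponds to an SEM of about 3.16.
print(round(sem_from_reliability(0.90), 2))  # 3.16
```

Under this assumption an SEM of 3.3 gives roughly 0.89, so the quoted equivalence to 0.90 looks like a rounded rule of thumb, not an exact identity.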
Method of locating related items in a geometric space for data mining

DOEpatents

Hendrickson, B.A.

1999-07-27

A method for locating related items in a geometric space transforms relationships among items to geometric locations. The method locates items in the geometric space so that the distance between items corresponds to the degree of relatedness. The method facilitates communication of the structure of the relationships among the items. The method is especially beneficial for communicating databases with many items, and with non-regular relationship patterns. Examples of such databases include databases containing items such as scientific papers or patents, related by citations or keywords. A computer system adapted for practice of the present invention can include a processor, a storage subsystem, a display device, and computer software to direct the location and display of the entities. The method comprises assigning numeric values as a measure of similarity between each pairing of items. A matrix is constructed, based on the numeric values. The eigenvectors and eigenvalues of the matrix are determined. Each item is located in the geometric space at coordinates determined from the eigenvectors and eigenvalues. Proper construction of the matrix and proper determination of coordinates from eigenvectors can ensure that distance between items in the geometric space is representative of the numeric value measure of the items' similarity. 12 figs.
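The steps named in the abstract (pairwise similarity values, a matrix built from them, its eigenvectors and eigenvalues, coordinates derived from both) can be sketched as follows. This is a minimal illustration and not the patented procedure. It assumes the matrix is a graph Laplacian of the similarity values and that each coordinate is an eigenvector entry divided by the square root of its eigenvalue. Both are common choices, and the abstract confirms neither. The helper names `jacobi_eigh` and `embed` and the toy data are hypothetical.

```python
import math


def jacobi_eigh(matrix, sweeps=100, tol=1e-18):
    """Eigen-decompose a small symmetric matrix with cyclic Jacobi rotations.

    Returns (eigenvalues, eigenvectors), sorted by ascending eigenvalue;
    eigenvectors[i] is the vector belonging to eigenvalues[i].
    """
    n = len(matrix)
    a = [[float(x) for x in row] for row in matrix]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle that zeroes a[p][q].
                diff = a[q][q] - a[p][p]
                theta = math.pi / 4 if diff == 0.0 else 0.5 * math.atan(2.0 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q of A and V
                    a[k][p], a[k][q] = c * a[k][p] - s * a[k][q], s * a[k][p] + c * a[k][q]
                    v[k][p], v[k][q] = c * v[k][p] - s * v[k][q], s * v[k][p] + c * v[k][q]
                for k in range(n):  # rotate rows p and q of A
                    a[p][k], a[q][k] = c * a[p][k] - s * a[q][k], s * a[p][k] + c * a[q][k]
    order = sorted(range(n), key=lambda i: a[i][i])
    return [a[i][i] for i in order], [[v[r][i] for r in range(n)] for i in order]


def embed(similarity, dims=2):
    """Place items in `dims` dimensions from a symmetric similarity matrix.

    Assumed construction: Laplacian L = D - W, skip the constant eigenvector
    (eigenvalue ~0), scale each remaining eigenvector by 1/sqrt(eigenvalue).
    Requires every item to be connected to the rest (second eigenvalue > 0).
    """
    n = len(similarity)
    lap = [[-similarity[i][j] for j in range(n)] for i in range(n)]
    for i in range(n):
        lap[i][i] = sum(similarity[i][j] for j in range(n) if j != i)
    vals, vecs = jacobi_eigh(lap)
    return [[vecs[d][i] / math.sqrt(vals[d]) for d in range(1, dims + 1)]
            for i in range(n)]


# Toy data: items 0-2 are closely related, items 3-5 are closely related,
# and the two groups are only weakly related to each other.
group = [0, 0, 0, 1, 1, 1]
similarity = [[0.0 if i == j else (1.0 if group[i] == group[j] else 0.05)
               for j in range(6)] for i in range(6)]
coords = embed(similarity)
for item, point in enumerate(coords):
    print(item, [round(x, 3) for x in point])
```

In this toy case the two groups land in separate regions, so items within a group sit closer together than items in different groups. Other matrix constructions, such as classical multidimensional scaling on distances, would fit the same outline.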
Method of locating related items in a geometric space for data mining

DOEpatents

Hendrickson, Bruce A.

1999-01-01

A method for locating related items in a geometric space transforms relationships among items to geometric locations. The method locates items in the geometric space so that the distance between items corresponds to the degree of relatedness. The method facilitates communication of the structure of the relationships among the items. The method is especially beneficial for communicating databases with many items, and with non-regular relationship patterns. Examples of such databases include databases containing items such as scientific papers or patents, related by citations or keywords. A computer system adapted for practice of the present invention can include a processor, a storage subsystem, a display device, and computer software to direct the location and display of the entities. The method comprises assigning numeric values as a measure of similarity between each pairing of items. A matrix is constructed, based on the numeric values. The eigenvectors and eigenvalues of the matrix are determined. Each item is located in the geometric space at coordinates determined from the eigenvectors and eigenvalues. Proper construction of the matrix and proper determination of coordinates from eigenvectors can ensure that distance between items in the geometric space is representative of the numeric value measure of the items' similarity.
