individual test items: Topics by Science.gov

Sample records for individual test items

Readability Level of Standardized Test Items and Student Performance: The Forgotten Validity Variable

ERIC Educational Resources Information Center

Hewitt, Margaret A.; Homan, Susan P.

2004-01-01

Test validity issues considered by test developers and school districts rarely include individual item readability levels. In this study, items from a major standardized test were examined for individual item readability level and item difficulty. The Homan-Hewitt Readability Formula was applied to items across three grade levels. Results of…
Technical Characteristics of the Peabody Individual Achievement Test as a Function of Item Arrangement and Basal and Ceiling Rules.

ERIC Educational Resources Information Center

Browning, Robert; And Others

1979-01-01

Effects that item order and basal and ceiling rules have on test means, variances, and internal consistency estimates for the Peabody Individual Achievement Test mathematics and reading recognition subtests were examined. Items on the math and reading recognition subtests were significantly easier or harder than test placements indicated. (Author)
Measuring change for a multidimensional test using a generalized explanatory longitudinal item response model.

PubMed

Cho, Sun-Joo; Athay, Michele; Preacher, Kristopher J

2013-05-01

Even though many educational and psychological tests are known to be multidimensional, little research has been done to address how to measure individual differences in change within an item response theory framework. In this paper, we suggest a generalized explanatory longitudinal item response model to measure individual differences in change. New longitudinal models for multidimensional tests and existing models for unidimensional tests are presented within this framework and implemented with software developed for generalized linear models. In addition to the measurement of change, the longitudinal models we present can also be used to explain individual differences in change scores for person groups (e.g., learning disabled students versus non-learning disabled students) and to model differences in item difficulties across item groups (e.g., number operation, measurement, and representation item groups in a mathematics test). An empirical example illustrates the use of the various models for measuring individual differences in change when there are person groups and multiple skill domains which lead to multidimensionality at a time point. © 2012 The British Psychological Society.
Item Parameter Invariance of the Kaufman Adolescent and Adult Intelligence Test across Male and Female Samples

ERIC Educational Resources Information Center

Immekus, Jason C.; Maller, Susan J.

2009-01-01

The Kaufman Adolescent and Adult Intelligence Test (KAIT™) is an individually administered test of intelligence for individuals ranging in age from 11 to 85+ years. The item response theory-likelihood ratio procedure, based on the two-parameter logistic model, was used to detect differential item functioning (DIF) in the KAIT across males and…
An Item Response Theory Model for Test Bias.

ERIC Educational Resources Information Center

Shealy, Robin; Stout, William

This paper presents a conceptualization of test bias for standardized ability tests which is based on multidimensional, non-parametric, item response theory. An explanation of how individually-biased items can combine through a test score to produce test bias is provided. It is contended that bias, although expressed at the item level, should be…
Advancing the efficiency and efficacy of patient reported outcomes with multivariate computer adaptive testing.

PubMed

Morris, Scott; Bass, Mike; Lee, Mirinae; Neapolitan, Richard E

2017-09-01

The Patient Reported Outcomes Measurement Information System (PROMIS) initiative developed an array of patient reported outcome (PRO) measures. To reduce the number of questions administered, PROMIS utilizes unidimensional item response theory and unidimensional computer adaptive testing (UCAT), which means a separate set of questions is administered for each measured trait. Multidimensional item response theory (MIRT) and multidimensional computer adaptive testing (MCAT) simultaneously assess correlated traits. The objective was to investigate the extent to which MCAT reduces patient burden relative to UCAT in the case of PROs. One MIRT and 3 unidimensional item response theory models were developed using the related traits anxiety, depression, and anger. Using these models, MCAT and UCAT performance was compared with simulated individuals. Surprisingly, the root mean squared error for both methods increased with the number of items. These results were driven by large errors for individuals with low trait levels. A second analysis focused on individuals aligned with item content. For these individuals, both MCAT and UCAT accuracies improved with additional items. Furthermore, MCAT reduced the test length by 50%. For the PROMIS Emotional Distress banks, neither UCAT nor MCAT provided accurate estimates for individuals at low trait levels. Because the items in these banks were designed to detect clinical levels of distress, there is little information for individuals with low trait values. However, trait estimates for individuals targeted by the banks were accurate and MCAT asked substantially fewer questions. By reducing the number of items administered, MCAT can allow clinicians and researchers to assess a wider range of PROs with less patient burden. © The Author 2017. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved.
Treatment of Not-Administered Items on Individually Administered Intelligence Tests

ERIC Educational Resources Information Center

He, Wei; Wolfe, Edward W.

2012-01-01

In administration of individually administered intelligence tests, items are commonly presented in a sequence of increasing difficulty, and test administration is terminated after a predetermined number of incorrect answers. This practice produces stochastically censored data, a form of nonignorable missing data. By manipulating four factors…
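The censoring mechanism described in this abstract can be illustrated with a small sketch. The function name `apply_discontinue_rule`, the use of consecutive incorrect answers as the stopping criterion, and the response vector are assumptions made for illustration; the abstract says only that testing stops after a predetermined number of incorrect answers.

```python
def apply_discontinue_rule(responses, max_consecutive_wrong):
    """Return the responses an examiner would actually observe.

    responses: list of 0/1 scores, ordered from easiest to hardest item
               (what the examinee WOULD score if every item were given).
    max_consecutive_wrong: hypothetical ceiling rule; testing stops once
               this many incorrect answers occur in a row.
    Items after the stopping point are recorded as None (not administered).
    """
    observed = []
    run_of_wrong = 0
    stopped = False
    for score in responses:
        if stopped:
            observed.append(None)  # never administered
            continue
        observed.append(score)
        run_of_wrong = run_of_wrong + 1 if score == 0 else 0
        if run_of_wrong >= max_consecutive_wrong:
            stopped = True
    return observed


# Made-up response pattern: the last two items would have been answered correctly.
full_pattern = [1, 1, 0, 1, 0, 0, 1, 1]
print(apply_discontinue_rule(full_pattern, max_consecutive_wrong=2))
```

Because whether an item is missing depends on the examinee's own earlier responses, the missingness cannot be treated as random, which is the sense in which the abstract calls it nonignorable.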
Methodology for the development and calibration of the SCI-QOL item banks

PubMed Central

Tulsky, David S.; Kisala, Pamela A.; Victorson, David; Choi, Seung W.; Gershon, Richard; Heinemann, Allen W.; Cella, David

2015-01-01

Objective: To develop a comprehensive, psychometrically sound, and conceptually grounded patient reported outcomes (PRO) measurement system for individuals with spinal cord injury (SCI). Methods: Individual interviews (n = 44) and focus groups (n = 65 individuals with SCI and n = 42 SCI clinicians) were used to select key domains for inclusion and to develop PRO items. Verbatim items from other cutting-edge measurement systems (i.e. PROMIS, Neuro-QOL) were included to facilitate linkage and cross-population comparison. Items were field tested in a large sample of individuals with traumatic SCI (n = 877). Dimensionality was assessed with confirmatory factor analysis. Local item dependence and differential item functioning were assessed, and items were calibrated using the item response theory (IRT) graded response model. Finally, computer adaptive tests (CATs) and short forms were administered in a new sample (n = 245) to assess test-retest reliability and stability. Participants and Procedures: A calibration sample of 877 individuals with traumatic SCI across five SCI Model Systems sites and one Department of Veterans Affairs medical center completed SCI-QOL items in interview format. Results: We developed 14 unidimensional calibrated item banks and 3 calibrated scales across physical, emotional, and social health domains. When combined with the five Spinal Cord Injury – Functional Index physical function banks, the final SCI-QOL system consists of 22 IRT-calibrated item banks/scales. Item banks may be administered as CATs or short forms. Scales may be administered in a fixed-length format only. Conclusions: The SCI-QOL measurement system provides SCI researchers and clinicians with a comprehensive, relevant and psychometrically robust system for measurement of physical-medical, physical-functional, emotional, and social outcomes. All SCI-QOL instruments are freely available on Assessment Center℠. PMID:26010963
Methodology for the development and calibration of the SCI-QOL item banks.

PubMed

Tulsky, David S; Kisala, Pamela A; Victorson, David; Choi, Seung W; Gershon, Richard; Heinemann, Allen W; Cella, David

2015-05-01

To develop a comprehensive, psychometrically sound, and conceptually grounded patient reported outcomes (PRO) measurement system for individuals with spinal cord injury (SCI). Individual interviews (n=44) and focus groups (n=65 individuals with SCI and n=42 SCI clinicians) were used to select key domains for inclusion and to develop PRO items. Verbatim items from other cutting-edge measurement systems (i.e. PROMIS, Neuro-QOL) were included to facilitate linkage and cross-population comparison. Items were field tested in a large sample of individuals with traumatic SCI (n=877). Dimensionality was assessed with confirmatory factor analysis. Local item dependence and differential item functioning were assessed, and items were calibrated using the item response theory (IRT) graded response model. Finally, computer adaptive tests (CATs) and short forms were administered in a new sample (n=245) to assess test-retest reliability and stability. A calibration sample of 877 individuals with traumatic SCI across five SCI Model Systems sites and one Department of Veterans Affairs medical center completed SCI-QOL items in interview format. We developed 14 unidimensional calibrated item banks and 3 calibrated scales across physical, emotional, and social health domains. When combined with the five Spinal Cord Injury--Functional Index physical function banks, the final SCI-QOL system consists of 22 IRT-calibrated item banks/scales. Item banks may be administered as CATs or short forms. Scales may be administered in a fixed-length format only. The SCI-QOL measurement system provides SCI researchers and clinicians with a comprehensive, relevant and psychometrically robust system for measurement of physical-medical, physical-functional, emotional, and social outcomes. All SCI-QOL instruments are freely available on Assessment CenterSM.
Enhanced accessibility of ignored neutral and negative items in nonclinical dissociative individuals.

PubMed

Chiu, Chui-De

2018-01-01

While clinical studies showed paradoxical memory phenomena, including the intrusion and amnesia of stressful experiences that are features of dissociation, the results of laboratory studies on dissociative individuals' forgetting of experimental stimuli through cognitive control varied. Some studies demonstrated ineffective inhibition, and others found that dissociative individuals could remember fewer trauma words in a divided-attention context. Dissociative individuals may utilize superior cognitive disengagement to forget the representations. This hypothesis was tested in nonclinical individuals with high, medium, and low dissociation proneness. In the study phase, the participants learned several lists of experimental words and kept updating working memory by remembering the last four items on a list (target) and ignoring those non-target items. A recognition test was then conducted. The high dissociation group performed better on updating working memory. However, the accessibility of the representations of neutral and negative non-target items was elevated. Dissociative individuals disengaged attention effectively from items they intended to ignore, and the representations of the ignored items were more accessible when cues were available. Copyright © 2017 Elsevier Inc. All rights reserved.
Do people with and without medical conditions respond similarly to the short health anxiety inventory? An assessment of differential item functioning using item response theory.

PubMed

LeBouthillier, Daniel M; Thibodeau, Michel A; Alberts, Nicole M; Hadjistavropoulos, Heather D; Asmundson, Gordon J G

2015-04-01

Individuals with medical conditions are likely to have elevated health anxiety; however, research has not demonstrated how medical status impacts response patterns on health anxiety measures. Measurement bias can undermine the validity of a questionnaire by overestimating or underestimating scores in groups of individuals. We investigated whether the Short Health Anxiety Inventory (SHAI), a widely-used measure of health anxiety, exhibits medical condition-based bias on item and subscale levels, and whether the SHAI subscales adequately assess the health anxiety continuum. Data were from 963 individuals with diabetes, breast cancer, or multiple sclerosis, and 372 healthy individuals. Mantel-Haenszel tests and item characteristic curves were used to classify the severity of item-level differential item functioning in all three medical groups compared to the healthy group. Test characteristic curves were used to assess scale-level differential item functioning and whether the SHAI subscales adequately assess the health anxiety continuum. Nine out of 14 items exhibited differential item functioning. Two items exhibited differential item functioning in all medical groups compared to the healthy group. In both Thought Intrusion and Fear of Illness subscales, differential item functioning was associated with mildly deflated scores in medical groups with very high levels of the latent traits. Fear of Illness items poorly discriminated between individuals with low and very low levels of the latent trait. While individuals with medical conditions may respond differentially to some items, clinicians and researchers can confidently use the SHAI with a variety of medical populations without concern of significant bias. Copyright © 2015 Elsevier Inc. All rights reserved.
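As a rough illustration of the Mantel-Haenszel approach named in this abstract, the sketch below computes the common odds ratio across score-matched strata for a single item. It assumes a dichotomized item (endorsed versus not endorsed), whereas SHAI items are scored on more than two levels, so this is a simplification and not the authors' procedure. The function names and counts are hypothetical; the -2.35 scaling is the conventional ETS delta transformation.

```python
import math


def mantel_haenszel_odds_ratio(strata):
    """Common odds ratio across matched strata.

    strata: iterable of (a, b, c, d) counts per matching stratum, where
      a = reference group, endorsed      b = reference group, not endorsed
      c = focal group, endorsed          d = focal group, not endorsed
    A value near 1.0 suggests no DIF for the item.
    Raises ZeroDivisionError if every stratum has b * c == 0.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # skip empty strata
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator


def mh_delta(odds_ratio):
    """ETS delta-scale transformation of the common odds ratio."""
    return -2.35 * math.log(odds_ratio)


# Made-up counts for two total-score strata (healthy = reference, medical = focal).
example_strata = [(30, 20, 25, 25), (40, 10, 35, 15)]
alpha = mantel_haenszel_odds_ratio(example_strata)
print(round(alpha, 3), round(mh_delta(alpha), 3))
```

Matching on total score before comparing groups is what separates DIF from a plain group difference: respondents are compared only with others at a similar overall level of the trait.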
Item response theory analysis of the mechanics baseline test

NASA Astrophysics Data System (ADS)

Cardamone, Caroline N.; Abbott, Jonathan E.; Rayyan, Saif; Seaton, Daniel T.; Pawl, Andrew; Pritchard, David E.

2012-02-01

Item response theory is useful in both the development and evaluation of assessments and in computing standardized measures of student performance. In item response theory, individual parameters (difficulty, discrimination) for each item or question are fit by item response models. These parameters provide a means for evaluating a test and offer a better measure of student skill than a raw test score, because each skill calculation considers not only the number of questions answered correctly, but the individual properties of all questions answered. Here, we present the results from an analysis of the Mechanics Baseline Test given at MIT during 2005-2010. Using the item parameters, we identify questions on the Mechanics Baseline Test that are not effective in discriminating between MIT students of different abilities. We show that a limited subset of the highest quality questions on the Mechanics Baseline Test returns accurate measures of student skill. We compare student skills as determined by item response theory to the more traditional measurement of the raw score and show that a comparable measure of learning gain can be computed.
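To make the role of the two item parameters concrete, here is a minimal sketch assuming a two-parameter logistic (2PL) form, which the abstract does not specify. The function names and the parameter values are invented for illustration and are not taken from the Mechanics Baseline Test analysis.

```python
import math


def p_correct(theta, a, b):
    """2PL probability that a student of skill theta answers the item correctly.

    a: discrimination (slope), b: difficulty (location on the skill scale).
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information the item contributes at skill theta under the 2PL."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)


# Two hypothetical items: one discriminates sharply, one barely at all.
items = {"sharp": (2.0, 0.0), "flat": (0.3, 0.0)}
for name, (a, b) in items.items():
    probabilities = [round(p_correct(t, a, b), 3) for t in (-2.0, 0.0, 2.0)]
    print(name, probabilities, round(item_information(0.0, a, b), 3))
```

An item with a small discrimination value gives nearly the same success probability to weak and strong students, which is the sense in which the abstract describes some questions as not effective in discriminating between students of different abilities.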
Estimating Total-test Scores from Partial Scores in a Matrix Sampling Design.

ERIC Educational Resources Information Center

Sachar, Jane; Suppes, Patrick

It is sometimes desirable to obtain an estimated total-test score for an individual who was administered only a subset of the items in a total test. The present study compared six methods, two of which utilize the content structure of items, to estimate total-test scores using 450 students in grades 3-5 and 60 items of the 110-item Stanford Mental…
Improving the Reliability of Student Scores from Speeded Assessments: An Illustration of Conditional Item Response Theory Using a Computer-Administered Measure of Vocabulary.

PubMed

Petscher, Yaacov; Mitchell, Alison M; Foorman, Barbara R

2015-01-01

A growing body of literature suggests that response latency, the amount of time it takes an individual to respond to an item, may be an important factor to consider when using assessment data to estimate the ability of an individual. Considering that tests of passage and list fluency are being adapted to a computer administration format, it is possible that accounting for individual differences in response times may be an increasingly feasible option to strengthen the precision of individual scores. The present research evaluated the differential reliability of scores when using classical test theory and item response theory as compared to a conditional item response model which includes response time as an item parameter. Results indicated that the precision of student ability scores increased by an average of 5% when using the conditional item response model, with greater improvements for those who were average or high ability. Implications for measurement models of speeded assessments are discussed.
Improving the Reliability of Student Scores from Speeded Assessments: An Illustration of Conditional Item Response Theory Using a Computer-Administered Measure of Vocabulary

PubMed Central

Petscher, Yaacov; Mitchell, Alison M.; Foorman, Barbara R.

2016-01-01

A growing body of literature suggests that response latency, the amount of time it takes an individual to respond to an item, may be an important factor to consider when using assessment data to estimate the ability of an individual. Considering that tests of passage and list fluency are being adapted to a computer administration format, it is possible that accounting for individual differences in response times may be an increasingly feasible option to strengthen the precision of individual scores. The present research evaluated the differential reliability of scores when using classical test theory and item response theory as compared to a conditional item response model which includes response time as an item parameter. Results indicated that the precision of student ability scores increased by an average of 5% when using the conditional item response model, with greater improvements for those who were average or high ability. Implications for measurement models of speeded assessments are discussed. PMID:27721568
Missouri Assessment Program (MAP), Spring 2000: Elementary Health/Physical Education, Released Items, Grade 5.

ERIC Educational Resources Information Center

Missouri State Dept. of Elementary and Secondary Education, Jefferson City.

This document presents 10 released items from the Health/Physical Education Missouri Assessment Program (MAP) test given in the spring of 2000 to fifth graders. Items from the test sessions include: selected-response (multiple choice), constructed-response, and a performance event. The selected-response items consist of individual questions…
Analysis of Individual "Test Of Astronomy STandards" (TOAST) Item Responses

ERIC Educational Resources Information Center

Slater, Stephanie J.; Schleigh, Sharon Price; Stork, Debra J.

2015-01-01

The development of valid and reliable strategies to efficiently determine the knowledge landscape of introductory astronomy college students is an effort of great interest to the astronomy education community. This study examines individual item response rates from a widely used conceptual understanding survey, the Test Of Astronomy Standards…
Measuring stigma after spinal cord injury: Development and psychometric characteristics of the SCI-QOL Stigma item bank and short form.

PubMed

Kisala, Pamela A; Tulsky, David S; Pace, Natalie; Victorson, David; Choi, Seung W; Heinemann, Allen W

2015-05-01

To develop a calibrated item bank and computer adaptive test (CAT) to assess the effects of stigma on health-related quality of life in individuals with spinal cord injury (SCI). Grounded-theory based qualitative item development methods, large-scale item calibration field testing, confirmatory factor analysis, and item response theory (IRT)-based psychometric analyses. Five SCI Model System centers and one Department of Veterans Affairs medical center in the United States. Adults with traumatic SCI. SCI-QOL Stigma Item Bank. A sample of 611 individuals with traumatic SCI completed 30 items assessing SCI-related stigma. After 7 items were iteratively removed, factor analyses confirmed a unidimensional pool of items. Graded Response Model IRT analyses were used to estimate slopes and thresholds for the final 23 items. The SCI-QOL Stigma item bank is unique not only in the assessment of SCI-related stigma but also in the inclusion of individuals with SCI in all phases of its development. Use of confirmatory factor analytic and IRT methods provides flexibility and precision of measurement. The item bank may be administered as a CAT or as a 10-item fixed-length short form and can be used for research and clinical applications.
Measuring stigma after spinal cord injury: Development and psychometric characteristics of the SCI-QOL Stigma item bank and short form

PubMed Central

Kisala, Pamela A.; Tulsky, David S.; Pace, Natalie; Victorson, David; Choi, Seung W.; Heinemann, Allen W.

2015-01-01

Objective: To develop a calibrated item bank and computer adaptive test (CAT) to assess the effects of stigma on health-related quality of life in individuals with spinal cord injury (SCI). Design: Grounded-theory based qualitative item development methods, large-scale item calibration field testing, confirmatory factor analysis, and item response theory (IRT)-based psychometric analyses. Setting: Five SCI Model System centers and one Department of Veterans Affairs medical center in the United States. Participants: Adults with traumatic SCI. Main Outcome Measures: SCI-QOL Stigma Item Bank. Results: A sample of 611 individuals with traumatic SCI completed 30 items assessing SCI-related stigma. After 7 items were iteratively removed, factor analyses confirmed a unidimensional pool of items. Graded Response Model IRT analyses were used to estimate slopes and thresholds for the final 23 items. Conclusions: The SCI-QOL Stigma item bank is unique not only in the assessment of SCI-related stigma but also in the inclusion of individuals with SCI in all phases of its development. Use of confirmatory factor analytic and IRT methods provides flexibility and precision of measurement. The item bank may be administered as a CAT or as a 10-item fixed-length short form and can be used for research and clinical applications. PMID:26010973
A Study on Detecting of Differential Item Functioning of PISA 2006 Science Literacy Items in Turkish and American Samples

ERIC Educational Resources Information Center

Çikirikçi Demirtasli, Nükhet; Ulutas, Seher

2015-01-01

Problem Statement: Item bias occurs when individuals from different groups (different gender, cultural background, etc.) have different probabilities of responding correctly to a test item despite having the same skill levels. It is important that tests or items do not have bias in order to ensure the accuracy of decisions taken according to test…

Uncertainty in BRCA1 cancer susceptibility testing.

PubMed

Baty, Bonnie J; Dudley, William N; Musters, Adrian; Kinney, Anita Y

2006-11-15

This study investigated uncertainty in individuals undergoing genetic counseling/testing for breast/ovarian cancer susceptibility. Sixty-three individuals from a single kindred with a known BRCA1 mutation rated uncertainty about 12 items on a five-point Likert scale before and 1 month after genetic counseling/testing. Factor analysis identified a five-item total uncertainty scale that was sensitive to changes before and after testing. The items in the scale were related to uncertainty about obtaining health care, positive changes after testing, and coping well with results. The majority of participants (76%) rated reducing uncertainty as an important reason for genetic testing. The importance of reducing uncertainty was stable across time and unrelated to anxiety or demographics. Yet, at baseline, total uncertainty was low and decreased after genetic counseling/testing (P = 0.004). Analysis of individual items showed that after genetic counseling/testing, there was less uncertainty about the participant detecting cancer early (P = 0.005) and coping well with their result (P < 0.001). Our findings support the importance to clients of genetic counseling/testing as a means of reducing uncertainty. Testing may help clients to reduce the uncertainty about items they can control, and it may be important to differentiate the sources of uncertainty that are more or less controllable. Genetic counselors can help clients by providing anticipatory guidance about the role of uncertainty in genetic testing. (c) 2006 Wiley-Liss, Inc.
Missouri Assessment Program (MAP), Spring 2000: High School Health/Physical Education, Released Items, Grade 9.

ERIC Educational Resources Information Center

Missouri State Dept. of Elementary and Secondary Education, Jefferson City.

This document presents 10 released items from the Health/Physical Education Missouri Assessment Program (MAP) test given in the spring of 2000 to ninth graders. Items from the test sessions include: selected-response (multiple choice), constructed-response, and a performance event. The selected-response items consist of individual questions…
Item Analyses of Memory Differences

PubMed Central

Salthouse, Timothy A.

2017-01-01

Objective: Although performance on memory and other cognitive tests is usually assessed with a score aggregated across multiple items, potentially valuable information is also available at the level of individual items. Method: The current study illustrates how analyses of variance with item as one of the factors, and memorability analyses in which item accuracy in one group is plotted as a function of item accuracy in another group, can provide a more detailed characterization of the nature of group differences in memory. Data are reported for two memory tasks, word recall and story memory, across age, ability, repetition, delay, and longitudinal contrasts. Results: The item-level analyses revealed evidence for largely uniform differences across items in the age, ability, and longitudinal contrasts, but differential patterns across items in the repetition contrast, and unsystematic item relations in the delay contrast. Conclusion: Analyses at the level of individual items have the potential to indicate the manner by which group differences in the aggregate test score are achieved. PMID:27618285
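The memorability analysis mentioned in this abstract pairs each item's accuracy in one group with its accuracy in another. A minimal sketch of that pairing follows; the helper name `item_accuracy` and the 0/1 response matrices are made up for illustration and do not come from the study.

```python
def item_accuracy(responses):
    """Proportion correct per item.

    responses: list of per-person lists of 0/1 scores, one entry per item;
               every person is assumed to have answered the same items.
    """
    n_people = len(responses)
    n_items = len(responses[0])
    return [sum(person[i] for person in responses) / n_people for i in range(n_items)]


# Hypothetical data: rows are people, columns are items.
group_one = [[1, 1, 0, 1], [1, 0, 0, 1], [1, 1, 1, 1]]
group_two = [[1, 0, 0, 1], [1, 0, 0, 0], [0, 1, 0, 1]]

# Each pair is one point on a memorability plot (group_one on x, group_two on y).
points = list(zip(item_accuracy(group_one), item_accuracy(group_two)))
for x, y in points:
    print(round(x, 2), round(y, 2))
```

If the points fall along a line parallel to the diagonal, the group difference is roughly uniform across items; scattered or crossing points indicate that the difference is concentrated in particular items.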
Measuring self-esteem after spinal cord injury: Development, validation and psychometric characteristics of the SCI-QOL Self-esteem item bank and short form

PubMed Central

Kalpakjian, Claire Z.; Tate, Denise G.; Kisala, Pamela A.; Tulsky, David S.

2015-01-01

Objective: To describe the development and psychometric properties of the Spinal Cord Injury-Quality of Life (SCI-QOL) Self-esteem item bank. Design: Using a mixed-methods design, we developed and tested a self-esteem item bank through the use of focus groups with individuals with SCI and clinicians with expertise in SCI, cognitive interviews, and item-response theory- (IRT) based analytic approaches, including tests of model fit, differential item functioning (DIF) and precision. Setting: We tested a pool of 30 items at several medical institutions across the United States, including the University of Michigan, Kessler Foundation, the Rehabilitation Institute of Chicago, the University of Washington, Craig Hospital, and the James J. Peters/Bronx Department of Veterans Affairs hospital. Participants: A total of 717 individuals with SCI completed the self-esteem items. Results: A unidimensional model was observed (CFI = 0.946; RMSEA = 0.087) and measurement precision was good (theta range between −2.7 and 0.7). Eleven items were flagged for DIF; however, effect sizes were negligible with little practical impact on score estimates. The final calibrated item bank resulted in 23 retained items. Conclusion: This study indicates that the SCI-QOL Self-esteem item bank represents a psychometrically robust measurement tool. Short form items are also suggested and computer adaptive tests are available. PMID:26010972
Measuring self-esteem after spinal cord injury: Development, validation and psychometric characteristics of the SCI-QOL Self-esteem item bank and short form.

PubMed

Kalpakjian, Claire Z; Tate, Denise G; Kisala, Pamela A; Tulsky, David S

2015-05-01

To describe the development and psychometric properties of the Spinal Cord Injury-Quality of Life (SCI-QOL) Self-esteem item bank. Using a mixed-methods design, we developed and tested a self-esteem item bank through the use of focus groups with individuals with SCI and clinicians with expertise in SCI, cognitive interviews, and item-response theory-(IRT) based analytic approaches, including tests of model fit, differential item functioning (DIF) and precision. We tested a pool of 30 items at several medical institutions across the United States, including the University of Michigan, Kessler Foundation, the Rehabilitation Institute of Chicago, the University of Washington, Craig Hospital, and the James J. Peters/Bronx Department of Veterans Affairs hospital. A total of 717 individuals with SCI completed the self-esteem items. A unidimensional model was observed (CFI=0.946; RMSEA=0.087) and measurement precision was good (theta range between -2.7 and 0.7). Eleven items were flagged for DIF; however, effect sizes were negligible with little practical impact on score estimates. The final calibrated item bank resulted in 23 retained items. This study indicates that the SCI-QOL Self-esteem item bank represents a psychometrically robust measurement tool. Short form items are also suggested and computer adaptive tests are available.
Measuring resilience after spinal cord injury: Development, validation and psychometric characteristics of the SCI-QOL Resilience item bank and short form.

PubMed

Victorson, David; Tulsky, David S; Kisala, Pamela A; Kalpakjian, Claire Z; Weiland, Brian; Choi, Seung W

2015-05-01

To describe the development and psychometric properties of the Spinal Cord Injury--Quality of Life (SCI-QOL) Resilience item bank and short form. Using a mixed-methods design, we developed and tested a resilience item bank through the use of focus groups with individuals with SCI and clinicians with expertise in SCI, cognitive interviews, and item-response theory based analytic approaches, including tests of model fit and differential item functioning (DIF). We tested a 32-item pool at several medical institutions across the United States, including the University of Michigan, Kessler Foundation, the Rehabilitation Institute of Chicago, the University of Washington, Craig Hospital and the James J. Peters/Bronx Department of Veterans Affairs medical center. A total of 717 individuals with SCI completed the Resilience items. A unidimensional model was observed (CFI=0.968; RMSEA=0.074) and measurement precision was good (theta range between -3.1 and 0.9). Ten items were flagged for DIF; however, after examination of effect sizes we found this to be negligible with little practical impact on score estimates. The final calibrated item bank resulted in 21 retained items. This study indicates that the SCI-QOL Resilience item bank represents a psychometrically robust measurement tool. Short form items are also suggested and computer adaptive tests are available.
Measuring resilience after spinal cord injury: Development, validation and psychometric characteristics of the SCI-QOL Resilience item bank and short form

PubMed Central

Victorson, David; Tulsky, David S.; Kisala, Pamela A.; Kalpakjian, Claire Z.; Weiland, Brian; Choi, Seung W.

2015-01-01

Objective: To describe the development and psychometric properties of the Spinal Cord Injury - Quality of Life (SCI-QOL) Resilience item bank and short form. Design: Using a mixed-methods design, we developed and tested a resilience item bank through the use of focus groups with individuals with SCI and clinicians with expertise in SCI, cognitive interviews, and item response theory-based analytic approaches, including tests of model fit and differential item functioning (DIF). Setting: We tested a 32-item pool at several medical institutions across the United States, including the University of Michigan, Kessler Foundation, the Rehabilitation Institute of Chicago, the University of Washington, Craig Hospital and the James J. Peters/Bronx Department of Veterans Affairs medical center. Participants: A total of 717 individuals with SCI completed the Resilience items. Results: A unidimensional model was observed (CFI = 0.968; RMSEA = 0.074) and measurement precision was good (theta range between −3.1 and 0.9). Ten items were flagged for DIF; however, after examining effect sizes, we found the DIF to be negligible, with little practical impact on score estimates. The final calibrated item bank resulted in 21 retained items. Conclusion: This study indicates that the SCI-QOL Resilience item bank represents a psychometrically robust measurement tool. Short form items are also suggested and computer adaptive tests are available. PMID:26010971
Capturing specific abilities as a window into human individuality: the example of face recognition.

PubMed

Wilmer, Jeremy B; Germine, Laura; Chabris, Christopher F; Chatterjee, Garga; Gerbasi, Margaret; Nakayama, Ken

2012-01-01

Proper characterization of each individual's unique pattern of strengths and weaknesses requires good measures of diverse abilities. Here, we advocate combining our growing understanding of neural and cognitive mechanisms with modern psychometric methods in a renewed effort to capture human individuality through a consideration of specific abilities. We articulate five criteria for the isolation and measurement of specific abilities, then apply these criteria to face recognition. We cleanly dissociate face recognition from more general visual and verbal recognition. This dissociation stretches across ability as well as disability, suggesting that specific developmental face recognition deficits are a special case of a broader specificity that spans the entire spectrum of human face recognition performance. Item-by-item results from 1,471 web-tested participants, included as supplementary information, fuel item analyses, validation, norming, and item response theory (IRT) analyses of our three tests: (a) the widely used Cambridge Face Memory Test (CFMT); (b) an Abstract Art Memory Test (AAMT), and (c) a Verbal Paired-Associates Memory Test (VPMT). The availability of this data set provides a solid foundation for interpreting future scores on these tests. We argue that the allied fields of experimental psychology, cognitive neuroscience, and vision science could fuel the discovery of additional specific abilities to add to face recognition, thereby providing new perspectives on human individuality.
Enhanced Automatic Question Creator--EAQC: Concept, Development and Evaluation of an Automatic Test Item Creation Tool to Foster Modern e-Education

ERIC Educational Resources Information Center

Gutl, Christian; Lankmayr, Klaus; Weinhofer, Joachim; Hofler, Margit

2011-01-01

Research on the automated creation of test items for assessment purposes has become increasingly important in recent years. Automatic question creation makes it possible to support personalized and self-directed learning activities by preparing appropriate and individualized test items quite easily with relatively little effort or even fully…
Analysis of Item Response Patterns: Consistency Indices and Their Application to Criterion-Referenced Tests.

ERIC Educational Resources Information Center

Harnisch, Delwyn L.

The major emphasis of this paper is on the examination of test item response patterns. Tatsuoka and Tatsuoka (1980) have developed two indices of response consistency: the norm-conformity index (NCI) and the individual consistency index (ICI). The NCI provides a measure of the degree of consistency between the response pattern of an individual and…
Test-Retest Reproducibility of Two Short-Form Balance Measures Used in Individuals with Stroke

ERIC Educational Resources Information Center

Liaw, Lih-Jiun; Hsieh, Ching-Lin; Hsu, Miao-Ju; Chen, Hui-Mei; Lin, Jau-Hong; Lo, Sing-Kai

2012-01-01

The aim of this study is to determine the test-retest reproducibility of the seven-item Short-Form Berg Balance Scale (SFBBS) and the five-item Short-Form Postural Assessment Scale for Stroke Patients (SFPASS) in individuals with chronic stroke. Fifty-two chronic stroke patients from two rehabilitation departments were included in the study. Both…
Modeling Booklet Effects for Nonequivalent Group Designs in Large-Scale Assessment

ERIC Educational Resources Information Center

Hecht, Martin; Weirich, Sebastian; Siegle, Thilo; Frey, Andreas

2015-01-01

Multiple matrix designs are commonly used in large-scale assessments to distribute test items to students. These designs comprise several booklets, each containing a subset of the complete item pool. Besides reducing the test burden of individual students, using various booklets allows aligning the difficulty of the presented items to the assumed…
7 CFR 2902.5 - Item designation.

Code of Federal Regulations, 2011 CFR

2011-01-01

..., USDA will use life cycle cost information only from tests using the BEES analytical method. (c... availability of such items and the economic and technological feasibility of using such items, including life cycle costs. USDA will gather information on individual products within an item and extrapolate that...
Measuring more than we know? An examination of the motivational and situational influences in science achievement

NASA Astrophysics Data System (ADS)

Haydel, Angela Michelle

The purpose of this dissertation was to advance theoretical understanding about fit between the personal resources of individuals and the characteristics of science achievement tasks. Testing continues to be pervasive in schools, yet we know little about how students perceive tests and what they think and feel while they are actually working on test items. This study focused on both the personal (cognitive and motivational) and situational factors that may contribute to individual differences in achievement-related outcomes. 387 eighth grade students first completed a survey including measures of science achievement goals, capability beliefs, efficacy related to multiple-choice items and performance assessments, validity beliefs about multiple-choice items and performance assessments, and other perceptions of these item formats. Students then completed science achievement tests including multiple-choice items and two performance assessments. A sample of students was asked to verbalize both thoughts and feelings as they worked through the test items. These think-alouds were transcribed and coded for evidence of cognitive, metacognitive and motivational engagement. Following each test, all students completed measures of effort, mood, energy level and strategy use during testing. Students reported that performance assessments were more challenging, authentic, interesting and valid than multiple-choice tests. They also believed that comparisons between students were easier using multiple-choice items. Overall, students tried harder, felt better, had higher levels of energy and used more strategies while working on performance assessments. Findings suggested that performance assessments might be more congruent with a mastery achievement goal orientation, while multiple-choice tests might be more congruent with a performance achievement goal orientation. 
A variable-centered analytic approach including regression analyses provided information about how students, on average, who differed in terms of their teachers' ratings of their science ability, achievement goals, capability beliefs and experiences with science achievement tasks perceived, engaged in, and performed on multiple-choice items and performance assessments. Person-centered analyses provided information about the perceptions, engagement and performance of subgroups of individuals who had different motivational characteristics. Generally, students' personal goals and capability beliefs related more strongly to test perceptions, but not performance, while teacher ratings of ability and test-specific beliefs related to performance.
A Procedure to Detect Item Bias Present Simultaneously in Several Items

DTIC Science & Technology

1991-04-25

exhibit a coherent and major biasing influence at the test level. In particular, this can be true even if each individual item displays only a minor...response functions (IRFs) without the use of item parameter estimation algorithms when the sample size is too small for their use. Thissen, Steinberg...convention). A random sample of examinees is drawn from each group, and a test of N items is administered to them. Typically it is suspected that a
[Transcultural adaptation of the Antifat Attitudes Test to Brazilian Portuguese].

PubMed

Obara, Angélica Almeida; Alvarenga, Marle Dos Santos

2018-05-01

Obese individuals are often blamed for their own condition and are the targets of discrimination and prejudice. The aim of this study is to describe the cross-cultural adaptation to Brazilian Portuguese and the validation of the Antifat Attitudes Test - specifically developed for evaluation of negative attitudes toward the obese individual. The scale has 34 statements distributed in three subscales - Social/Character Disparagement (15 items), Physical/Romantic Unattractiveness (10 items) and Weight Control/Blame (9 items). The method involved the translation of the scale; evaluation of the conceptual, operational and item equivalence; evaluation of the semantic equivalence using the paired t test, the Pearson correlation coefficient and the intraclass correlation coefficient (ICC); internal consistency evaluation (Cronbach's alpha) and test-retest reliability (ICC); and Confirmatory Factor Analysis - after application in 340 college students in the area of health. The results showed good global internal consistency and reliability (α = 0.85; ICC = 0.83), and factor analysis showed that the original subscales can be kept in the adaptation. The Brazilian Portuguese version of the scale is therefore valid and useful in studies exploring negative attitudes toward obese individuals.
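Internal consistency figures such as the Cronbach's alpha reported above follow from a simple variance ratio. The sketch below is illustrative only: the function name and the toy scores are hypothetical and are not taken from the study, and it assumes the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores).

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of items.

    item_scores: one list per item, each holding one score per respondent
    (hypothetical layout chosen for this sketch).
    """
    k = len(item_scores)
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    # Total score of each respondent across all items.
    totals = [sum(responses) for responses in zip(*item_scores)]
    # Sum of the sample variances of the individual items.
    item_variance_sum = sum(variance(item) for item in item_scores)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))


# Toy data (not from the study): three identical items, four respondents.
toy_items = [
    [1, 2, 3, 4],
    [1, 2, 3, 4],
    [1, 2, 3, 4],
]
alpha = cronbach_alpha(toy_items)
```

Identical items are perfectly consistent, so the toy data give an alpha of 1; real questionnaire data would fall below that.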
Evaluation of Item Candidates: The PROMIS Qualitative Item Review

PubMed Central

DeWalt, Darren A.; Rothrock, Nan; Yount, Susan; Stone, Arthur A.

2009-01-01

One of the PROMIS (Patient-Reported Outcome Measurement Information System) network's primary goals is the development of a comprehensive item bank for patient-reported outcomes of chronic diseases. For its first set of item banks, PROMIS chose to focus on pain, fatigue, emotional distress, physical function, and social function. An essential step for the development of an item pool is the identification, evaluation, and revision of extant questionnaire items for the core item pool. In this work, we also describe the systematic process wherein items are classified for subsequent statistical processing by the PROMIS investigators. Six phases of item development are documented: identification of extant items, item classification and selection, item review and revision, focus group input on domain coverage, cognitive interviews with individual items, and final revision before field testing. Identification of items refers to the systematic search for existing items in currently available scales. Expert item review and revision was conducted by trained professionals who reviewed the wording of each item and revised as appropriate for conventions adopted by the PROMIS network. Focus groups were used to confirm domain definitions and to identify new areas of item development for future PROMIS item banks. Cognitive interviews were used to examine individual items. Items successfully screened through this process were sent to field testing and will be subjected to innovative scale construction procedures. PMID:17443114
Clinical utility of a single-item test for DSM-5 alcohol use disorder among outpatients with anxiety and depressive disorders.

PubMed

Bartoli, Francesco; Crocamo, Cristina; Biagi, Enrico; Di Carlo, Francesco; Parma, Francesca; Madeddu, Fabio; Capuzzi, Enrico; Colmegna, Fabrizia; Clerici, Massimo; Carrà, Giuseppe

2016-08-01

There is a lack of studies testing accuracy of fast screening methods for alcohol use disorder in mental health settings. We aimed at estimating clinical utility of a standard single-item test for case finding and screening of DSM-5 alcohol use disorder among individuals suffering from anxiety and mood disorders. We recruited adults consecutively referred, in a 12-month period, to an outpatient clinic for anxiety and depressive disorders. We assessed the National Institute on Alcohol Abuse and Alcoholism (NIAAA) single-item test, using the Mini-International Neuropsychiatric Interview (MINI), plus an additional item of Composite International Diagnostic Interview (CIDI) for craving, as reference standard to diagnose a current DSM-5 alcohol use disorder. We estimated sensitivity and specificity of the single-item test, as well as positive and negative Clinical Utility Indexes (CUIs). 242 subjects with anxiety and mood disorders were included. The NIAAA single-item test showed high sensitivity (91.9%) and specificity (91.2%) for DSM-5 alcohol use disorder. The positive CUI was 0.601, whereas the negative one was 0.898, with excellent values also accounting for main individual characteristics (age, gender, diagnosis, psychological distress levels, smoking status). Testing for relevant indexes, we found an excellent clinical utility of the NIAAA single-item test for screening true negative cases. Our findings support a routine use of reliable methods for rapid screening in similar mental health settings. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
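The relationship between the accuracy figures in the abstract above can be sketched in a few lines. This assumes the usual definition of the Clinical Utility Index (positive CUI = sensitivity × positive predictive value; negative CUI = specificity × negative predictive value), which the abstract does not spell out. The 2×2 counts are hypothetical: they were chosen to sum to 242 and to be roughly consistent with the reported percentages, and are not taken from the paper.

```python
def clinical_utility(tp, fn, fp, tn):
    """Sensitivity, specificity and Clinical Utility Indexes from a 2x2 table.

    Assumes CUI+ = sensitivity * PPV and CUI- = specificity * NPV.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)  # positive predictive value
    npv = tn / (tn + fn)  # negative predictive value
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "cui_pos": sensitivity * ppv,
        "cui_neg": specificity * npv,
    }


# Hypothetical counts (assumption, not reported in the abstract).
result = clinical_utility(tp=34, fn=3, fp=18, tn=187)
```

With these illustrative counts, a high sensitivity still yields only a moderate positive CUI because the positive predictive value is limited by the number of false positives, while the negative CUI stays high.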
First Grade Children's Concept of Addition of Natural Numbers.

ERIC Educational Resources Information Center

Steffe, Leslie; Van Engen, Henry

Middle-class, first-grade students (100) were tested individually on 4 items of concept of addition and conservation of number. The test items were identical except for the number of objects involved. For each item, two piles of candy were placed before each child and then moved together. The study showed no major difference in the mean…
Covariates of the Rating Process in Hierarchical Models for Multiple Ratings of Test Items

ERIC Educational Resources Information Center

Mariano, Louis T.; Junker, Brian W.

2007-01-01

When constructed response test items are scored by more than one rater, the repeated ratings allow for the consideration of individual rater bias and variability in estimating student proficiency. Several hierarchical models based on item response theory have been introduced to model such effects. In this article, the authors demonstrate how these…

Capturing specific abilities as a window into human individuality: The example of face recognition

PubMed Central

Wilmer, Jeremy B.; Germine, Laura; Chabris, Christopher F.; Chatterjee, Garga; Gerbasi, Margaret; Nakayama, Ken

2013-01-01

Proper characterization of each individual's unique pattern of strengths and weaknesses requires good measures of diverse abilities. Here, we advocate combining our growing understanding of neural and cognitive mechanisms with modern psychometric methods in a renewed effort to capture human individuality through a consideration of specific abilities. We articulate five criteria for the isolation and measurement of specific abilities, then apply these criteria to face recognition. We cleanly dissociate face recognition from more general visual and verbal recognition. This dissociation stretches across ability as well as disability, suggesting that specific developmental face recognition deficits are a special case of a broader specificity that spans the entire spectrum of human face recognition performance. Item-by-item results from 1,471 web-tested participants, included as supplementary information, fuel item analyses, validation, norming, and item response theory (IRT) analyses of our three tests: (a) the widely used Cambridge Face Memory Test (CFMT); (b) an Abstract Art Memory Test (AAMT), and (c) a Verbal Paired-Associates Memory Test (VPMT). The availability of this data set provides a solid foundation for interpreting future scores on these tests. We argue that the allied fields of experimental psychology, cognitive neuroscience, and vision science could fuel the discovery of additional specific abilities to add to face recognition, thereby providing new perspectives on human individuality. PMID:23428079
The False-Friend Effect in Three Profoundly Deaf Learners of French: Disentangling Morphology, Phonology and Orthography

ERIC Educational Resources Information Center

Janke, Vikki; Kolokonte, Marina

2015-01-01

Three profoundly deaf individuals undertook a low-frequency backward lexical translation task (French/English), where morphological structure was manipulated and orthographic distance between test items was measured. Conditions included monomorphemic items (simplex), polymorphemic items (complex), items whose French morphological structure…
Development of The Science Processes Test.

ERIC Educational Resources Information Center

Ludeman, Robert R.

Presented is a description and copy of a test manual developed to include items in the test on the basis of children's performance; each item correlated highly with performance on an external criterion. The external criterion was the Individual Competency Measures of the elementary science program Science - A Process Approach (SAPA). The test…
Applying Item Response Theory to the Development of a Screening Adaptation of the Goldman-Fristoe Test of Articulation-Second Edition

ERIC Educational Resources Information Center

Brackenbury, Tim; Zickar, Michael J.; Munson, Benjamin; Storkel, Holly L.

2017-01-01

Purpose: Item response theory (IRT) is a psychometric approach to measurement that uses latent trait abilities (e.g., speech sound production skills) to model performance on individual items that vary by difficulty and discrimination. An IRT analysis was applied to preschoolers' productions of the words on the Goldman-Fristoe Test of…
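To make "difficulty" and "discrimination" concrete, the following sketch evaluates a two-parameter logistic (2PL) item response function. The abstract does not say which IRT model the authors fitted, so the 2PL form, the function name, and the parameter values here are assumptions for illustration only.

```python
import math


def two_pl_probability(theta, discrimination, difficulty):
    """Probability of a correct response under a 2PL model (assumed form).

    theta: latent ability of the person
    discrimination: slope of the item (how sharply it separates abilities)
    difficulty: ability level at which the probability equals 0.5
    """
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))


# Hypothetical items seen by a person of average ability (theta = 0).
easy_item = two_pl_probability(theta=0.0, discrimination=1.2, difficulty=-1.0)
hard_item = two_pl_probability(theta=0.0, discrimination=1.2, difficulty=1.0)
# When ability equals difficulty, the model gives a 50% chance of success.
at_difficulty = two_pl_probability(theta=0.5, discrimination=1.2, difficulty=0.5)
```

Under this form, a screening version of a test would favour items whose difficulty sits near the ability range of interest and whose discrimination is high.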
Effect of individual thinking styles on item selection during study time allocation.

PubMed

Jia, Xiaoyu; Li, Weijian; Cao, Liren; Li, Ping; Shi, Meiling; Wang, Jingjing; Cao, Wei; Li, Xinyu

2018-04-01

The influence of individual differences on learners' study time allocation has been emphasised in recent studies; however, little is known about the role of individual thinking styles (analytical versus intuitive). In the present study, we explored the influence of individual thinking styles on learners' application of agenda-based and habitual processes when selecting the first item during a study-time allocation task. A 3-item cognitive reflection test (CRT) was used to determine individuals' degree of cognitive reliance on intuitive versus analytical cognitive processing. Significant correlations between CRT scores and the choices of first item selection were observed in both Experiment 1a (study time was 5 seconds per triplet) and Experiment 1b (study time was 20 seconds per triplet). Furthermore, analytical decision makers constructed a value-based agenda (prioritised high-reward items), whereas intuitive decision makers relied more upon habitual responding (selected items from the leftmost of the array). The findings of Experiment 1a were replicated in Experiment 2 even after ruling out possible effects of individual intelligence and working memory capacity. Overall, individual thinking style plays an important role in learners' study time allocation, and the CRT reliably predicts learners' item selection strategy. © 2016 International Union of Psychological Science.
Developing self-concept instrument for pre-service mathematics teachers

NASA Astrophysics Data System (ADS)

Afgani, M. W.; Suryadi, D.; Dahlan, J. A.

2018-01-01

This study aimed to develop a self-concept instrument for undergraduate students of mathematics education in Palembang, Indonesia. The study was development research on a non-test instrument in questionnaire form. Construct validity was tested using the Pearson product-moment correlation and factor analysis, and reliability was tested using Cronbach’s alpha. The instrument was tested on 65 undergraduate students of mathematics education at one of the universities in Palembang, Indonesia. The instrument consisted of 43 items covering 7 aspects of self-concept: individual concern, social identity, individual personality, view of the future, the influence of others who become role models, the influence of the environment inside or outside the classroom, and view of mathematics. The validity test showed one invalid item, whose Pearson’s r of 0.107 was below the critical value (0.244; α = 0.05). The item belonged to the social identity aspect. After the invalid item was removed, the factor analysis generated only one factor. The Kaiser-Meyer-Olkin (KMO) coefficient was 0.846 and the reliability coefficient was 0.91. From these results, we concluded that the self-concept instrument for undergraduate students of mathematics education in Palembang, Indonesia was valid and reliable with 42 items.
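The critical value quoted in the abstract above can be checked with a short calculation. The sketch assumes a two-tailed test at α = 0.05 with n − 2 = 63 degrees of freedom and uses the conversion r_crit = t / sqrt(t² + df); the abstract does not state these details, and the tabled t value of about 1.998 is entered by hand rather than computed.

```python
import math


def critical_r(t_critical, n):
    """Critical Pearson r for sample size n, given the critical t value.

    Uses r = t / sqrt(t^2 + df) with df = n - 2.
    """
    df = n - 2
    return t_critical / math.sqrt(t_critical ** 2 + df)


# Assumption: tabled two-tailed t for alpha = 0.05 and df = 63 is about 1.998.
T_CRIT_DF63 = 1.998
r_crit = critical_r(T_CRIT_DF63, n=65)

# The item reported in the abstract had r = 0.107.
item_is_valid = 0.107 >= r_crit
```

Under these assumptions the computed threshold rounds to the 0.244 given in the abstract, and an item correlation of 0.107 falls below it.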
A Case Study on an Item Writing Process: Use of Test Specifications, Nature of Group Dynamics, and Individual Item Writers' Characteristics

ERIC Educational Resources Information Center

Kim, Jiyoung; Chi, Youngshin; Huensch, Amanda; Jun, Heesung; Li, Hongli; Roullion, Vanessa

2010-01-01

This article discusses a case study on an item writing process that reflects on our practical experience in an item development project. The purpose of the article is to share our lessons from the experience aiming to demystify item writing process. The study investigated three issues that naturally emerged during the project: how item writers use…
A Combined IRT and SEM Approach for Individual-Level Assessment in Test-Retest Studies

ERIC Educational Resources Information Center

Ferrando, Pere J.

2015-01-01

The standard two-wave multiple-indicator model (2WMIM) commonly used to analyze test-retest data provides information at both the group and item level. Furthermore, when applied to binary and graded item responses, it is related to well-known item response theory (IRT) models. In this article the IRT-2WMIM relations are used to obtain additional…
Appropriateness Measurement with Polychotomous Item Response Models and Standardized Indices. Measurement Series, 84-1.

ERIC Educational Resources Information Center

Drasgow, Fritz; And Others

The test scores of some examinees on a multiple-choice test may not provide adequate measures of their abilities. The goal of appropriateness measurement is to identify such individuals. Earlier theoretical and experimental work considered examinees answering all, or almost all, test items. This article reports research that extends…
Fairness in Computerized Testing: Detecting Item Bias Using CATSIB with Impact Present

ERIC Educational Resources Information Center

Chu, Man-Wai; Lai, Hollis

2013-01-01

In educational assessment, there is an increasing demand for tailoring assessments to individual examinees through computer adaptive tests (CAT). As such, it is particularly important to investigate the fairness of these adaptive testing processes, which require the investigation of differential item function (DIF) to yield information about item…
Air Force Officer Qualifying Test Form O: Development and Standardization.

ERIC Educational Resources Information Center

Rogers, Deborah L.; And Others

This report presents the rationale, development, and standardization of the Air Force Officer Qualifying Test (AFOQT) Form O. The test is used to select individuals for officer commissioning programs, and candidates for pilot and navigator training. Form O contains 380 items organized in 16 subtests. All items are administered in a single test…
Detection of Item Preknowledge Using Likelihood Ratio Test and Score Test

ERIC Educational Resources Information Center

Sinharay, Sandip

2017-01-01

An increasing concern of producers of educational assessments is fraudulent behavior during the assessment (van der Linden, 2009). Benefiting from item preknowledge (e.g., Eckerly, 2017; McLeod, Lewis, & Thissen, 2003) is one type of fraudulent behavior. This article suggests two new test statistics for detecting individuals who may have…
What is the Ability Emotional Intelligence Test (MSCEIT) good for? An evaluation using item response theory.

PubMed

Fiori, Marina; Antonietti, Jean-Philippe; Mikolajczak, Moira; Luminet, Olivier; Hansenne, Michel; Rossier, Jérôme

2014-01-01

The ability approach has been indicated as promising for advancing research in emotional intelligence (EI). However, there is a scarcity of tests measuring EI as a form of intelligence. The Mayer-Salovey-Caruso Emotional Intelligence Test, or MSCEIT, is among the few available and the most widespread measure of EI as an ability. This implies that conclusions about the value of EI as a meaningful construct and about its utility in predicting various outcomes mainly rely on the properties of this test. We tested whether individuals who have the highest probability of choosing the most correct response on any item of the test are also those who have the strongest EI ability. Results showed that this is not the case for most items: The answer indicated by experts as the most correct in several cases was not associated with the highest ability; furthermore, items appeared too easy to challenge individuals high in EI. Overall results suggest that the MSCEIT is best suited to discriminate persons at the low end of the trait. Results are discussed in light of applied and theoretical considerations.
Assessing the capacity of ministries of health to use research in decision-making: conceptual framework and tool.

PubMed

Rodríguez, Daniela C; Hoe, Connie; Dale, Elina M; Rahman, M Hafizur; Akhter, Sadika; Hafeez, Assad; Irava, Wayne; Rajbangshi, Preety; Roman, Tamlyn; Ţîrdea, Marcela; Yamout, Rouham; Peters, David H

2017-08-01

The capacity to demand and use research is critical for governments if they are to develop policies that are informed by evidence. Existing tools designed to assess how government officials use evidence in decision-making have significant limitations for low- and middle-income countries (LMICs); they are rarely tested in LMICs and focus only on individual capacity. This paper introduces an instrument that was developed to assess Ministry of Health (MoH) capacity to demand and use research evidence for decision-making, which was tested for reliability and validity in eight LMICs (Bangladesh, Fiji, India, Lebanon, Moldova, Pakistan, South Africa, Zambia). Instrument development was based on a new conceptual framework that addresses individual, organisational and systems capacities, and items were drawn from existing instruments and a literature review. After initial item development and pre-testing to address face validity and item phrasing, the instrument was reduced to 54 items for further validation and item reduction. In-country study teams interviewed a systematic sample of 203 MoH officials. Exploratory factor analysis was used in addition to standard reliability and validity measures to further assess the items. Thirty items divided between two factors representing organisational and individual capacity constructs were identified. South Africa and Zambia demonstrated the highest level of organisational capacity to use research, whereas Pakistan and Bangladesh were the lowest two. In contrast, individual capacity was highest in Pakistan, followed by South Africa, whereas Bangladesh and Lebanon were the lowest. The framework and related instrument represent a new opportunity for MoHs to identify ways to understand and improve capacities to incorporate research evidence in decision-making, as well as to provide a basis for tracking change.
Measuring anxiety after spinal cord injury: Development and psychometric characteristics of the SCI-QOL Anxiety item bank and linkage with GAD-7.

PubMed

Kisala, Pamela A; Tulsky, David S; Kalpakjian, Claire Z; Heinemann, Allen W; Pohlig, Ryan T; Carle, Adam; Choi, Seung W

2015-05-01

To develop a calibrated item bank and computer adaptive test to assess anxiety symptoms in individuals with spinal cord injury (SCI), transform scores to the Patient Reported Outcomes Measurement Information System (PROMIS) metric, and create a statistical linkage with the Generalized Anxiety Disorder (GAD)-7, a widely used anxiety measure. Grounded-theory based qualitative item development methods; large-scale item calibration field testing; confirmatory factor analysis; graded response model item response theory analyses; statistical linking techniques to transform scores to a PROMIS metric; and linkage with the GAD-7. Setting: Five SCI Model System centers and one Department of Veterans Affairs medical center in the United States. Participants: Adults with traumatic SCI. Spinal Cord Injury-Quality of Life (SCI-QOL) Anxiety Item Bank. Seven hundred sixteen individuals with traumatic SCI completed 38 items assessing anxiety, 17 of which were PROMIS items. After 13 items (including 2 PROMIS items) were removed, factor analyses confirmed unidimensionality. Item response theory analyses were used to estimate slopes and thresholds for the final 25 items (15 from PROMIS). The observed Pearson correlation between the SCI-QOL Anxiety and GAD-7 scores was 0.67. The SCI-QOL Anxiety item bank demonstrates excellent psychometric properties and is available as a computer adaptive test or short form for research and clinical applications. SCI-QOL Anxiety scores have been transformed to the PROMIS metric and we provide a method to link SCI-QOL Anxiety scores with those of the GAD-7.
Qualitative Evaluation of Pediatric Pain-Behavior, -Quality and -Intensity Item Candidates and the PROMIS Pain Domain Framework in Children with Chronic Pain

PubMed Central

Jacobson, C. Jeffrey; Kashikar-Zuck, Susmita; Farrell, Jennifer; Barnett, Kimberly; Goldschneider, Ken; Dampier, Carlton; Cunningham, Natoshia; Crosby, Lori; DeWitt, Esi Morgan

2015-01-01

As initial steps in a broader effort to develop and test pediatric Pain Behavior and Pain Quality item banks for the Patient Reported Outcomes Measurement Information System (PROMIS®), we employed qualitative interview and item review methods to 1) evaluate the overall conceptual scope and content validity of the PROMIS pain domain framework among children with chronic/recurrent pain conditions, and 2) develop item candidates for further psychometric testing. To elicit the experiential and conceptual scope of pain outcomes across a variety of pediatric recurrent/chronic pain conditions, we conducted semi-structured individual (32) and focus-group interviews (2) with children and adolescents (8–17 years), and parents of children with pain (individual (32) and focus group (2)). Interviews with pain experts (10) explored the operational limits of pain measurement in children. For item bank development, we identified existing items from measures in the literature, grouped them by concept, removed redundancies, and modified remaining items to match PROMIS formatting. New items were written as needed and cognitive debriefing was completed with children and their parents, resulting in 98 Pain Behavior (47 self, 51 proxy), 54 Quality and 4 Intensity items for further testing. Qualitative content analyses suggest that reportable pain outcomes that matter to children with pain are captured within and consistent with the pain domain framework in PROMIS. PMID:26335990
Measuring depression after spinal cord injury: Development and psychometric characteristics of the SCI-QOL Depression item bank and linkage with PHQ-9.

PubMed

Tulsky, David S; Kisala, Pamela A; Kalpakjian, Claire Z; Bombardier, Charles H; Pohlig, Ryan T; Heinemann, Allen W; Carle, Adam; Choi, Seung W

2015-05-01

To develop a calibrated spinal cord injury-quality of life (SCI-QOL) item bank, computer adaptive test (CAT), and short form to assess depressive symptoms experienced by individuals with SCI, transform scores to the Patient Reported Outcomes Measurement Information System (PROMIS) metric, and create a crosswalk to the Patient Health Questionnaire (PHQ)-9. We used grounded-theory based qualitative item development methods, large-scale item calibration field testing, confirmatory factor analysis, item response theory (IRT) analyses, and statistical linking techniques to transform scores to a PROMIS metric and to provide a crosswalk with the PHQ-9. Five SCI Model System centers and one Department of Veterans Affairs medical center in the United States. Adults with traumatic SCI. Spinal Cord Injury--Quality of Life (SCI-QOL) Depression Item Bank Individuals with SCI were involved in all phases of SCI-QOL development. A sample of 716 individuals with traumatic SCI completed 35 items assessing depression, 18 of which were PROMIS items. After removing 7 non-PROMIS items, factor analyses confirmed a unidimensional pool of items. We used a graded response IRT model to estimate slopes and thresholds for the 28 retained items. The SCI-QOL Depression measure correlated 0.76 with the PHQ-9. The SCI-QOL Depression item bank provides a reliable and sensitive measure of depressive symptoms with scores reported in terms of general population norms. We provide a crosswalk to the PHQ-9 to facilitate comparisons between measures. The item bank may be administered as a CAT or as a short form and is suitable for research and clinical applications.
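As a reading aid for the graded response model mentioned in the abstract above, the following is a minimal sketch of how such a model turns one slope and a set of ordered thresholds into response-category probabilities. The function name and all parameter values are hypothetical illustrations; they are not taken from the SCI-QOL calibration.

```python
import math


def grm_category_probs(theta, slope, thresholds):
    """Category probabilities under a graded response model.

    theta      -- latent trait level of one respondent
    slope      -- item discrimination (hypothetical value in the demo)
    thresholds -- ascending category thresholds (hypothetical values)
    """
    # Cumulative probabilities P(X >= k). By definition the lowest
    # category is always reached (1.0) and nothing exceeds the highest (0.0).
    cumulative = [1.0]
    for b in thresholds:
        cumulative.append(1.0 / (1.0 + math.exp(-slope * (theta - b))))
    cumulative.append(0.0)
    # Probability of each category is the difference of adjacent cumulatives.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]


# Illustrative five-category item; the numbers are made up for the example.
probs = grm_category_probs(theta=0.0, slope=1.5, thresholds=[-1.0, 0.0, 1.0, 2.0])
print([round(p, 3) for p in probs])
```

With four thresholds the item has five response categories, and the probabilities sum to one for any trait level.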
Do impulsive individuals benefit more from food go/no-go training? Testing the role of inhibition capacity in the no-go devaluation effect.

PubMed

Chen, Zhang; Veling, Harm; Dijksterhuis, Ap; Holland, Rob W

2018-05-01

Not responding to food items in a go/no-go task can lead to devaluation of these food items, which may help people regulate their eating behavior. The Behavior Stimulus Interaction (BSI) theory explains this devaluation effect by assuming that inhibiting impulses triggered by appetitive foods elicits negative affect, which in turn devalues the food items. BSI theory further predicts that the devaluation effect will be stronger when food items are more appetitive and when individuals have low inhibition capacity. To test these hypotheses, we manipulated the appetitiveness of food items and measured individual inhibition capacity with the stop-signal task. Food items were consistently paired with either go or no-go cues, so that participants responded to go items and not to no-go items. Evaluations of these items were measured before and after go/no-go training. Across two preregistered experiments, we consistently found no-go foods were liked less after the training compared to both go foods and foods not used in the training. Unexpectedly, this devaluation effect occurred for both appetitive and less appetitive food items. Exploratory signal detection analyses suggest this latter finding might be explained by increased learning of stimulus-response contingencies for the less appetitive items when they are presented among appetitive items. Furthermore, the strength of devaluation did not consistently correlate with individual inhibition capacity, and Bayesian analyses combining data from both experiments provided moderate support for the null hypothesis. The current project demonstrated the devaluation effect induced by the go/no-go training, but failed to obtain further evidence for BSI theory. Since the devaluation effect was reliably obtained across experiments, the results do reinforce the notion that the go/no-go training is a promising tool to help people regulate their eating behavior. Copyright © 2017 Elsevier Ltd. All rights reserved.
Exercise barriers self-efficacy: development and validation of a subscale for individuals with cancer-related lymphedema.

PubMed

Buchan, Jena; Janda, Monika; Box, Robyn; Rogers, Laura; Hayes, Sandi

2015-03-18

No tool exists to measure self-efficacy for overcoming lymphedema-related exercise barriers in individuals with cancer-related lymphedema. However, an existing scale measures confidence to overcome general exercise barriers in cancer survivors. Therefore, the purpose of this study was to develop, validate and assess the reliability of a subscale, to be used in conjunction with the general barriers scale, for determining exercise barriers self-efficacy in individuals facing lymphedema-related exercise barriers. A lymphedema-specific exercise barriers self-efficacy subscale was developed and validated using a cohort of 106 cancer survivors with cancer-related lymphedema, from Brisbane, Australia. An initial ten-item lymphedema-specific barrier subscale was developed and tested, with participant feedback and principal components analysis results used to guide development of the final version. Validity and test-retest reliability analyses were conducted on the final subscale. The final lymphedema-specific subscale contained five items. Principal components analysis revealed these items loaded highly (>0.75) on a separate factor when tested with a well-established nine-item general barriers scale. The final five-item subscale demonstrated good construct and criterion validity, high internal consistency (Cronbach's alpha = 0.93) and test-retest reliability (ICC = 0.67, p < 0.01). A valid and reliable lymphedema-specific subscale has been developed to assess exercise barriers self-efficacy in individuals with cancer-related lymphedema. This scale can be used in conjunction with an existing general exercise barriers scale to enhance exercise adherence in this understudied patient group.
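The internal consistency figure reported above (Cronbach's alpha) can be illustrated with a short sketch. This assumes the standard formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores); the response data below are invented for the example and are not the study's data.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(scores[0])  # number of items
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of the summed scale score.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical responses: six respondents, five items on a 1-5 scale.
toy_scores = [
    [4, 5, 4, 4, 5],
    [2, 2, 3, 2, 2],
    [5, 5, 5, 4, 5],
    [3, 3, 2, 3, 3],
    [1, 2, 1, 1, 2],
    [4, 4, 4, 5, 4],
]
print(round(cronbach_alpha(toy_scores), 3))
```

Items that move together across respondents push the variance of the total score well above the sum of the item variances, which is what drives alpha toward 1.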
Item Response Theory Analyses of the Cambridge Face Memory Test (CFMT)

PubMed Central

Cho, Sun-Joo; Wilmer, Jeremy; Herzmann, Grit; McGugin, Rankin; Fiset, Daniel; Van Gulick, Ana E.; Ryan, Katie; Gauthier, Isabel

2014-01-01

We evaluated the psychometric properties of the Cambridge face memory test (CFMT; Duchaine & Nakayama, 2006). First, we assessed the dimensionality of the test with a bi-factor exploratory factor analysis (EFA). This EFA analysis revealed a general factor and three specific factors clustered by targets of CFMT. However, the three specific factors appeared to be minor factors that can be ignored. Second, we fit a unidimensional item response model. This item response model showed that the CFMT items could discriminate individuals at different ability levels and covered a wide range of the ability continuum. We found the CFMT to be particularly precise for a wide range of ability levels. Third, we implemented item response theory (IRT) differential item functioning (DIF) analyses for each gender group and two age groups (Age ≤ 20 versus Age > 21). This DIF analysis suggested little evidence of consequential differential functioning on the CFMT for these groups, supporting the use of the test to compare older to younger, or male to female, individuals. Fourth, we tested for a gender difference on the latent facial recognition ability with an explanatory item response model. We found a significant but small gender difference on the latent ability for face recognition, which was higher for women than men by 0.184, at age mean 23.2, controlling for linear and quadratic age effects. Finally, we discuss the practical considerations of the use of total scores versus IRT scale scores in applications of the CFMT. PMID:25642930

7 CFR 3201.5 - Item designation.

Code of Federal Regulations, 2012 CFR

2012-01-01

..., including life cycle costs. USDA will gather information on individual products within an item and... these factors, USDA will use life cycle cost information only from tests using the BEES analytical...
7 CFR 3201.5 - Item designation.

Code of Federal Regulations, 2014 CFR

2014-01-01

..., including life cycle costs. USDA will gather information on individual products within an item and... these factors, USDA will use life cycle cost information only from tests using the BEES analytical...
7 CFR 3201.5 - Item designation.

Code of Federal Regulations, 2013 CFR

2013-01-01

..., including life cycle costs. USDA will gather information on individual products within an item and... these factors, USDA will use life cycle cost information only from tests using the BEES analytical...
Comparison of Male and Female Performance on the ATP Physics Test.

ERIC Educational Resources Information Center

Wheeler, Patricia; Harris, Abigail

This exploratory study on the College Board's Admissions Testing Program (ATP) Physics Test can be divided into two main parts, each designed to address a specific set of questions: Part I, Are there any systematic differences in male/female performance on individual items or subgroups of items that can help in interpreting the differences between…
Development and psychometric characteristics of the SCI-QOL Bladder Management Difficulties and Bowel Management Difficulties item banks and short forms and the SCI-QOL Bladder Complications scale.

PubMed

Tulsky, David S; Kisala, Pamela A; Tate, Denise G; Spungen, Ann M; Kirshblum, Steven C

2015-05-01

To describe the development and psychometric properties of the Spinal Cord Injury--Quality of Life (SCI-QOL) Bladder Management Difficulties and Bowel Management Difficulties item banks and Bladder Complications scale. Using a mixed-methods design, a pool of items assessing bladder and bowel-related concerns were developed using focus groups with individuals with spinal cord injury (SCI) and SCI clinicians, cognitive interviews, and item response theory (IRT) analytic approaches, including tests of model fit and differential item functioning. Thirty-eight bladder items and 52 bowel items were tested at the University of Michigan, Kessler Foundation Research Center, the Rehabilitation Institute of Chicago, the University of Washington, Craig Hospital, and the James J. Peters VA Medical Center, Bronx, NY. Seven hundred fifty-seven adults with traumatic SCI. The final item banks demonstrated unidimensionality (Bladder Management Difficulties CFI=0.965; RMSEA=0.093; Bowel Management Difficulties CFI=0.955; RMSEA=0.078) and acceptable fit to a graded response IRT model. The final calibrated Bladder Management Difficulties bank includes 15 items, and the final Bowel Management Difficulties item bank consists of 26 items. Additionally, 5 items related to urinary tract infections (UTI) did not fit with the larger Bladder Management Difficulties item bank but performed relatively well independently (CFI=0.992, RMSEA=0.050) and were thus retained as a separate scale. The SCI-QOL Bladder Management Difficulties and Bowel Management Difficulties item banks are psychometrically robust and are available as computer adaptive tests or short forms. The SCI-QOL Bladder Complications scale is a brief, fixed-length outcomes instrument for individuals with a UTI.
Physical performance testing in mucopolysaccharidosis I: a pilot study.

PubMed

Dumas, Helene M; Fragala, Maria A; Haley, Stephen M; Skrinar, Alison M; Wraith, James E; Cox, Gerald F

2004-01-01

To develop and field-test a physical performance measure (MPS-PPM) for individuals with Mucopolysaccharidosis I (MPS I), a rare genetic disorder. Motor performance and endurance items were developed based on literature review, clinician feedback, feasibility, and equipment and training needs. A standardized testing protocol and scoring rules were created. The MPS-PPM includes: Arm Function (7 items), Leg Function (5 items), and Endurance (2 items). Pilot data were collected for 10 subjects (ages 5-29 years). We calculated Spearman's rho correlations between age, severity and summary z-scores on the MPS-PPM. Subjects had variable presentations, as correlations among the three sub-test scores were not significant. Increasing age was related to greater severity in physical performance (r = 0.72, p<0.05) and lower scores on the Leg Function (r = -0.67, p<0.05) and Endurance (r = -0.65, p<0.05) sub-tests. The MPS-PPM was sensitive to detecting physical performance deficits, as six subjects could not complete the full battery of Arm Function items and eight subjects were unable to complete all Leg Function items. Subjects walked more slowly and expended more energy than typically developing peers. Individuals with MPS I have difficulty with arm and leg function and reduced endurance. The MPS-PPM is a clinically feasible measure that detects limitations in physical performance and may have potential to quantify changes in function following intervention. Copyright 2004 Taylor and Francis Ltd.
Developing an item bank to measure economic quality of life for individuals with disabilities.

PubMed

Tulsky, David S; Kisala, Pamela A; Lai, Jin-Shei; Carlozzi, Noelle; Hammel, Joy; Heinemann, Allen W

2015-04-01

To develop and evaluate the psychometric properties of an item set measuring economic quality of life (QOL) for use by individuals with disabilities. Survey. Community settings. Individuals with disabilities completed individual interviews (n=64), participated in focus groups (n=172), and completed cognitive interviews (n=15). Inclusion criteria included the following: traumatic brain injury, spinal cord injury, or stroke; age ≥18 years; and ability to read and speak English. We calibrated the items with 305 former rehabilitation inpatients. None. Economic QOL. Confirmatory factor analysis showed acceptable fit indices (comparative fit index=.939, root mean square error of approximation=.089) for the 37 items. However, 3 items demonstrated local item dependence. Dropping 9 items improved fit and obviated local dependence. Rasch analysis of the remaining 28 items yielded a person reliability of .92, suggesting that these items discriminate about 4 economic QOL levels. We developed a 28-item bank that measures economic aspects of QOL. Preliminary confirmatory factor analysis and Rasch analysis results support the psychometric properties of this new measure. It fills a gap in health-related QOL measurement by describing the economic barriers and facilitators of community participation. Future development will make the item bank available as a computer adaptive test. Copyright © 2015 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
Measuring psychological trauma after spinal cord injury: Development and psychometric characteristics of the SCI-QOL Psychological Trauma item bank and short form

PubMed Central

Kisala, Pamela A.; Victorson, David; Pace, Natalie; Heinemann, Allen W.; Choi, Seung W.; Tulsky, David S.

2015-01-01

Objective To describe the development and psychometric properties of the SCI-QOL Psychological Trauma item bank and short form. Design Using a mixed-methods design, we developed and tested a Psychological Trauma item bank with patient and provider focus groups, cognitive interviews, and item response theory based analytic approaches, including tests of model fit, differential item functioning (DIF) and precision. Setting We tested a 31-item pool at several medical institutions across the United States, including the University of Michigan, Kessler Foundation, Rehabilitation Institute of Chicago, the University of Washington, Craig Hospital and the James J. Peters/Bronx Veterans Administration hospital. Participants A total of 716 individuals with SCI completed the trauma items. Results The 31 items fit a unidimensional model (CFI=0.952; RMSEA=0.061) and demonstrated good precision (theta range between 0.6 and 2.5). Nine items demonstrated negligible DIF with little impact on score estimates. The final calibrated item bank contains 19 items. Conclusion The SCI-QOL Psychological Trauma item bank is a psychometrically robust measurement tool from which a short form and a computer adaptive test (CAT) version are available. PMID:26010967
Web-based computer adaptive assessment of individual perceptions of job satisfaction for hospital workplace employees

PubMed Central

2011-01-01

Background To develop a web-based computer adaptive testing (CAT) application for efficiently collecting data regarding workers' perceptions of job satisfaction, we examined whether a 37-item Job Content Questionnaire (JCQ-37) could evaluate the job satisfaction of individual employees as a single construct. Methods The JCQ-37 makes data collection via CAT on the internet easy, viable and fast. A Rasch rating scale model was applied to analyze data from 300 randomly selected hospital employees who participated in job-satisfaction surveys in 2008 and 2009 via non-adaptive and computer-adaptive testing, respectively. Results Of the 37 items on the questionnaire, 24 items fit the model fairly well. Person-separation reliability for the 2008 surveys was 0.88. Measures from both years and item-8 job satisfaction for groups were successfully evaluated through item-by-item analyses by using t-test. Workers aged 26 - 35 felt that job satisfaction was significantly worse in 2009 than in 2008. Conclusions A Web-CAT developed in the present paper was shown to be more efficient than traditional computer-based or pen-and-paper assessments at collecting data regarding workers' perceptions of job content. PMID:21496311
Development and psychometric evaluation of a health-related quality of life instrument for individuals with adult-onset hearing loss.

PubMed

Stika, Carren J; Hays, Ron D

2015-07-01

Self-reports of 'hearing handicap' are available, but a comprehensive measure of health-related quality of life (HRQOL) for individuals with adult-onset hearing loss (AOHL) does not exist. Our objective was to develop and evaluate a multidimensional HRQOL instrument for individuals with AOHL. The Impact of Hearing Loss Inventory Tool (IHEAR-IT) was developed using results of focus groups, a literature review, advisory expert panel input, and cognitive interviews. The 73-item field-test instrument was completed by 409 adults (22-91 years old) with varying degrees of AOHL and from different areas of the USA. Multitrait scaling analysis supported four multi-item scales and five individual items. Internal consistency reliabilities ranged from 0.93 to 0.96 for the scales. Construct validity was supported by correlations between the IHEAR-IT scales and scores on the 36-item Short Form Health Survey, version 2.0 (SF-36v2) mental composite summary (r = 0.32-0.64) and the Hearing Handicap Inventory for the Elderly/Adults (HHIE/HHIA) (r ≥ -0.70). The field test provides initial support for the reliability and construct validity of the IHEAR-IT for evaluating HRQOL of individuals with AOHL. Further research is needed to evaluate the responsiveness to change of the IHEAR-IT scales and identify items for a short-form.
Development and psychometric characteristics of the SCI-QOL Pressure Ulcers scale and short form.

PubMed

Kisala, Pamela A; Tulsky, David S; Choi, Seung W; Kirshblum, Steven C

2015-05-01

To develop a self-reported measure of the subjective impact of pressure ulcers on health-related quality of life (HRQOL) in individuals with spinal cord injury (SCI) as part of the SCI quality of life (SCI-QOL) measurement system. Grounded-theory based qualitative item development methods, large-scale item calibration testing, confirmatory factor analysis (CFA), and item response theory-based psychometric analysis. Five SCI Model System centers and one Department of Veterans Affairs medical center in the United States. Adults with traumatic SCI. SCI-QOL Pressure Ulcers scale. 189 individuals with traumatic SCI who experienced a pressure ulcer within the past 7 days completed 30 items related to pressure ulcers. CFA confirmed a unidimensional pool of items. IRT analyses were conducted. A constrained Graded Response Model with a constant slope parameter was used to estimate item thresholds for the 12 retained items. The 12-item SCI-QOL Pressure Ulcers scale is unique in that it is specifically targeted to individuals with spinal cord injury and at every stage of development has included input from individuals with SCI. Furthermore, use of CFA and IRT methods provides flexibility and precision of measurement. The scale may be administered in its entirety or as a 7-item "short form" and is available for both research and clinical practice.
Scales for assessing self-efficacy of nurses and assistants for preventing falls

PubMed Central

Dykes, Patricia C.; Carroll, Diane; McColgan, Kerry; Hurley, Ann C.; Lipsitz, Stuart R.; Colombo, Lisa; Zuyev, Lyubov; Middleton, Blackford

2011-01-01

Aim This paper is a report of the development and testing of the Self-Efficacy for Preventing Falls Nurse and Assistant scales. Background Patient falls and fall-related injuries are traumatic ordeals for patients, family members and providers, and carry a toll for hospitals. Self-efficacy is an important factor in determining actions persons take and levels of performance they achieve. Performance of individual caregivers is linked to the overall performance of hospitals. Scales to assess nurses and certified nursing assistants’ self-efficacy to prevent patients from falling would allow for targeting resources to increase SE, resulting in improved individual performance and ultimately decreased numbers of patient falls. Method Four phases of instrument development were carried out to (1) generate individual items from eight focus groups (four each nurse and assistant conducted in October 2007), (2) develop prototype scales, (3) determine content validity during a second series of four nurse and assistant focus groups (January 2008) and (4) conduct item analysis, paired t-tests, Student’s t-tests and internal consistency reliability to refine and confirm the scales. Data were collected during February–December, 2008. Results The 11-item Self-Efficacy for Preventing Falls Nurse had an alpha of 0·89 with all items in the range criterion of 0·3–0·7 for item total correlation. The 8-item Self-Efficacy for Preventing Falls Assistant had an alpha of 0·74 and all items had item total correlations in the 0·3–0·7 range. Conclusions The Self-Efficacy for Preventing Falls Nurse and Self-Efficacy for Preventing Falls Assistant scales demonstrated psychometric adequacy and are recommended to measure bedside staff’s self-efficacy beliefs in preventing patient falls. PMID:21073506
Analyzing Longitudinal Item Response Data via the Pairwise Fitting Method

ERIC Educational Resources Information Center

Fu, Zhi-Hui; Tao, Jian; Shi, Ning-Zhong; Zhang, Ming; Lin, Nan

2011-01-01

Multidimensional item response theory (MIRT) models can be applied to longitudinal educational surveys where a group of individuals are administered different tests over time with some common items. However, computational problems typically arise as the dimension of the latent variables increases. This is especially true when the latent variable…
Acquisition of generic memory in amnesia.

PubMed

Verfaellie, M; Cermak, L S

1994-06-01

Amnesic patients' ability to acquire generic, semantic information was assessed relative to their own level of episodic memory. Patients studied a list of words in which some items were presented twice and others once. Upon each presentation, the words were tagged episodically by presenting them in a unique color. Recall of the colors in which words were presented suggested that individual presentations of repeated items were less likely to be recalled than presentations of nonrepeated items; however, actual recall of repeated items exceeded that of nonrepeated items. This outcome demonstrated that amnesics can recall some items generically without recalling either of their individual presentations. However, amnesics' recall of twice-presented items remained far below that of the control group, even when their recall of once-presented items was matched by testing the control group after a delay. This finding suggests that amnesic patients can acquire new generic knowledge but do so much less efficiently than do normal individuals. Furthermore, this deficit occurs independently of the amnesics' episodic memory impairments, reflecting instead a disruption in semantic learning per se.
A knowledge-based theory of rising scores on "culture-free" tests.

PubMed

Fox, Mark C; Mitchum, Ainsley L

2013-08-01

Secular gains in intelligence test scores have perplexed researchers since they were documented by Flynn (1984, 1987). Gains are most pronounced on abstract, so-called culture-free tests, prompting Flynn (2007) to attribute them to problem-solving skills availed by scientifically advanced cultures. We propose that recent-born individuals have adopted an approach to analogy that enables them to infer higher level relations requiring roles that are not intrinsic to the objects that constitute initial representations of items. This proposal is translated into item-specific predictions about differences between cohorts in pass rates and item-response patterns on the Raven's Matrices (Flynn, 1987), a seemingly culture-free test that registers the largest Flynn effect. Consistent with predictions, archival data reveal that individuals born around 1940 are less able to map objects at higher levels of relational abstraction than individuals born around 1990. Polytomous Rasch models verify predicted violations of measurement invariance, as raw scores are found to underestimate the number of analogical rules inferred by members of the earlier cohort relative to members of the later cohort who achieve the same overall score. The work provides a plausible cognitive account of the Flynn effect, furthers understanding of the cognition of matrix reasoning, and underscores the need to consider how test-takers select item responses. PsycINFO Database Record (c) 2013 APA, all rights reserved.
Item-Level Effects of the Read-Aloud Accommodation for Students with Reading Disabilities

ERIC Educational Resources Information Center

Bolt, Sara E.; Thurlow, Martha L.

2007-01-01

Research support for providing a read-aloud accommodation (i.e., having an individual read test items and directions aloud) to students with disabilities has been somewhat limited, particularly when merely examining effects of the accommodation on overall test scores for general groups of students with disabilities. We examined data on…
Using Person Response Functions to Investigate Areas of Person Misfit Related to Item Characteristics

ERIC Educational Resources Information Center

Walker, A. Adrienne; Jennings, Jeremy Kyle; Engelhard, George, Jr.

2018-01-01

Individual person fit analyses provide important information regarding the validity of test score inferences for an "individual" test taker. In this study, we use data from an undergraduate statistics test (N = 1135) to illustrate a two-step method that researchers and practitioners can use to examine individual person fit. First, person…
Awareness and the Effect of Rate Rehearsal on Free Recall

ERIC Educational Resources Information Center

Kestner, Jane; Walter, Donald A.

1977-01-01

The effect of time of awareness of a subsequent test of recall and the relationship of that awareness and rote rehearsal were studied by telling subjects which specific items to encode before the item's presentation (prior instructions) or after its rehearsal (postrehearsal instructions) and by varying rehearsal intervals for individual items.…
Detection of Q-Matrix Misspecification Using Two Criteria for Validation of Cognitive Structures under the Least Squares Distance Model

ERIC Educational Resources Information Center

Romero, Sonia J.; Ordoñez, Xavier G.; Ponsoda, Vincente; Revuelta, Javier

2014-01-01

Cognitive Diagnostic Models (CDMs) aim to provide information about the degree to which individuals have mastered specific attributes that underlie the success of these individuals on test items. The Q-matrix is a key element in the application of CDMs because it contains the item-attribute links representing the cognitive structure proposed to solve…
On the Role of Individual Items in Recognition Memory and Metacognition: Challenges for Signal Detection Theory

ERIC Educational Resources Information Center

Busey, Thomas A.; Arici, Anne

2009-01-01

The authors tested the role of individual items in recognition memory using a forced-choice paradigm with face stimuli. They constructed distractor stimuli using morphing procedures that were similar to two parent faces and then compared a studied morph against an unstudied morph that was similar to two studied parents. The similarity of the…

Diagnostic accuracy research in glaucoma is still incompletely reported: An application of Standards for Reporting of Diagnostic Accuracy Studies (STARD) 2015.

PubMed

Michelessi, Manuele; Lucenteforte, Ersilia; Miele, Alba; Oddone, Francesco; Crescioli, Giada; Fameli, Valeria; Korevaar, Daniël A; Virgili, Gianni

2017-01-01

Research has shown a modest adherence of diagnostic test accuracy (DTA) studies in glaucoma to the Standards for Reporting of Diagnostic Accuracy Studies (STARD). We have applied the updated 30-item STARD 2015 checklist to a set of studies included in a Cochrane DTA systematic review of imaging tools for diagnosing manifest glaucoma. Three pairs of reviewers, including one senior reviewer who assessed all studies, independently checked the adherence of each study to STARD 2015. Adherence was analyzed on an individual-item basis. Logistic regression was used to evaluate the effect of publication year and impact factor on adherence. We included 106 DTA studies, published between 2003 and 2014 in journals with a median impact factor of 2.6. Overall adherence was 54.1% for 3,286 individual ratings across 31 items, with a mean of 16.8 (SD: 3.1; range 8-23) items per study. Large variability in adherence to reporting standards was detected across individual STARD 2015 items, ranging from 0 to 100%. Nine items (1: identification as diagnostic accuracy study in title/abstract; 6: eligibility criteria; 10: index test (a) and reference standard (b) definition; 12: cut-off definitions for index test (a) and reference standard (b); 14: estimation of diagnostic accuracy measures; 21a: severity spectrum of diseased; 23: cross-tabulation of the index and reference standard results) were adequately reported in more than 90% of the studies. Conversely, 10 items (3: scientific and clinical background of the index test; 11: rationale for the reference standard; 13b: blinding of index test results; 17: analyses of variability; 18: sample size calculation; 19: study flow diagram; 20: baseline characteristics of participants; 28: registration number and registry; 29: availability of study protocol; 30: sources of funding) were adequately reported in less than 30% of the studies. Only four items showed a statistically significant improvement over time: missing data (16), baseline characteristics of participants (20), estimates of diagnostic accuracy (24) and sources of funding (30). Adherence to STARD 2015 among DTA studies in glaucoma research is incomplete, and only modestly increasing over time.
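The counts reported in the abstract above can be cross-checked with a few lines of arithmetic. This is only a consistency sketch using the figures as stated (106 studies, 31 rated items, a mean of 16.8 adequately reported items per study); the variable names are illustrative.

```python
n_studies = 106              # DTA studies included, as reported
n_rated_items = 31           # items rated per study, as reported
mean_items_per_study = 16.8  # mean adequately reported items per study, as reported

# Every study rated on every item gives the total number of individual ratings.
total_ratings = n_studies * n_rated_items

# Share of rated items adequately reported in an average study.
approx_adherence = mean_items_per_study / n_rated_items

print(total_ratings)                        # 3286
print(round(100 * approx_adherence, 1))     # about 54, close to the reported 54.1%
```

The product matches the 3,286 ratings in the abstract, and the ratio lands within a tenth of a percentage point of the reported overall adherence; the small gap is plausibly due to rounding of the published mean.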
Development and psychometric characteristics of the SCI-QOL Pressure Ulcers scale and short form

PubMed Central

Kisala, Pamela A.; Tulsky, David S.; Choi, Seung W.; Kirshblum, Steven C.

2015-01-01

Objective To develop a self-reported measure of the subjective impact of pressure ulcers on health-related quality of life (HRQOL) in individuals with spinal cord injury (SCI) as part of the SCI quality of life (SCI-QOL) measurement system. Design Grounded-theory based qualitative item development methods, large-scale item calibration testing, confirmatory factor analysis (CFA), and item response theory-based psychometric analysis. Setting Five SCI Model System centers and one Department of Veterans Affairs medical center in the United States. Participants Adults with traumatic SCI. Main Outcome Measures SCI-QOL Pressure Ulcers scale. Results 189 individuals with traumatic SCI who experienced a pressure ulcer within the past 7 days completed 30 items related to pressure ulcers. CFA confirmed a unidimensional pool of items. IRT analyses were conducted. A constrained Graded Response Model with a constant slope parameter was used to estimate item thresholds for the 12 retained items. Conclusions The 12-item SCI-QOL Pressure Ulcers scale is unique in that it is specifically targeted to individuals with spinal cord injury and at every stage of development has included input from individuals with SCI. Furthermore, use of CFA and IRT methods provides flexibility and precision of measurement. The scale may be administered in its entirety or as a 7-item “short form” and is available for both research and clinical practice. PMID:26010965
An NCME Instructional Module on Booklet Designs in Large-Scale Assessments of Student Achievement: Theory and Practice

ERIC Educational Resources Information Center

Frey, Andreas; Hartig, Johannes; Rupp, Andre A.

2009-01-01

In most large-scale assessments of student achievement, several broad content domains are tested. Because more items are needed to cover the content domains than can be presented in the limited testing time to each individual student, multiple test forms or booklets are utilized to distribute the items to the students. The construction of an…
Development and Psychometric Evaluation of a Health-Related Quality of Life Instrument for Individuals with Adult-Onset Hearing Loss

PubMed Central

Stika, Carren J.; Hays, Ron D.

2016-01-01

Objective: Self-reports of “hearing handicap” are available, but a comprehensive measure of health-related quality of life (HRQOL) for individuals with adult-onset hearing loss (AOHL) does not exist. Our objective was to develop and evaluate a multidimensional HRQOL instrument for individuals with AOHL. Design: The Impact of Hearing Loss Inventory Tool (IHEAR-IT) was developed using results of focus groups, a literature review, Advisory Expert Panel input, and cognitive interviews. Study Sample: The 73-item field-test instrument was completed by 409 adults (22-91 years old) with varying degrees of AOHL and from different areas of the US. Results: Multitrait scaling analysis supported four multi-item scales and five individual items. Internal consistency reliabilities ranged from 0.93 to 0.96 for the scales. Construct validity was supported by correlations between the IHEAR-IT scales and scores on the 36-Item Short Form Health Survey, Version 2.0 (SF-36v2) Mental Composite Summary (r’s = 0.32 – 0.64) and the Hearing Handicap Inventory for the Elderly/Adults (HHIE/HHIA) (r’s > −0.70). Conclusions: The field test provides initial support for the reliability and construct validity of the IHEAR-IT for evaluating HRQOL of individuals with AOHL. Further research is needed to evaluate the responsiveness to change of the IHEAR-IT scales and identify items for a short form. PMID:27104754
The Development, Implementation, and Evaluation of a Computer-Assisted Branched Test for a Program of Individually Prescribed Instruction.

ERIC Educational Resources Information Center

Ferguson, Richard L.

The focus of this study was upon the development and evaluation of a computer-assisted branched test to be used in making instructional decisions for individuals in the program of Individually Prescribed Instruction. A Branched Test is one in which the presentation of test items is contingent upon the previous responses of the examinee. The…
Remote memory as a function of age and sex.

PubMed

Storandt, M; Grant, E A; Gordon, B C

1978-10-01

Memory for events which occurred between 1910 and 1969 was examined in individuals ranging in age from 20 to 80 years. Two types of events were included: Those which represented happenings of historical significance and those which dealt with the entertainment world of the past. Men were found to recall historical items significantly better than women, while entertainment items were equally well recalled by the two sexes. Age of peak memory for past events from the entertainment world increased with the age of the item; individuals seemed to remember best those events which occurred in their youth or young adulthood. This pattern was not replicated with respect to the historical current events items; however, these items may be a biased test of remote memory in women.
Validation of the Erlangen Test of Activities of Daily Living in Persons with Mild Dementia or Mild Cognitive Impairment (ETAM).

PubMed

Luttenberger, Katharina; Reppermund, Simone; Schmiedeberg-Sohn, Anke; Book, Stephanie; Graessel, Elmar

2016-05-26

There are currently no valid, fast, and easy-to-administer performance tests that are designed to assess the capacities to perform activities of daily living in persons with mild dementia and mild cognitive impairment (MCI). However, such measures are urgently needed for determining individual support needs as well as the efficacy of interventions. The aim of the present study was therefore to validate the Erlangen Test of Activities of Daily Living in Persons with Mild Dementia and Mild Cognitive Impairment (ETAM), a performance test that is based on the International Classification of Functioning and Health (ICF), which assesses the relevant domains of living in older adults with MCI and mild dementia who live independently. The 10 ICF-based items on the research version of the ETAM were tested in a final sample of 81 persons with MCI or mild dementia. The items were selected for the final version in accordance with 6 criteria: 1) all domains must be represented and have equal weight, 2) all items must load on the same factor, 3) item difficulties and item discriminatory powers, 4) convergent validity (Bayer Activities of Daily Living Scale [B-ADL]) and discriminant validity (Mini Mental State Examination [MMSE], Geriatric Depression Scale 15 [GDS-15]), 5) inter-rater reliabilities of the individual items, 6) as little material as possible. Retest reliability was also examined. Cohen's ds were calculated to determine the magnitudes of the differences in ETAM scores between participants diagnosed with different grades of severity of cognitive impairment. The final version of the ETAM consists of 6 items that cover the five ICF domains communication, mobility, self-care, domestic life (assessed by two 3-point items), and major life areas (specifically, the economic life sub-category) and load on a single factor. The maximum achievable score is 30 points (6 points per domain). 
The average administration time was 35 min, 19 of which were needed for pure item performance. The internal consistency was α = .71. The three-week test-retest reliability was r = .78, and the inter-rater reliability was r = .97. The ETAM also provided satisfactory discrimination between healthy individuals and persons with MCI or mild dementia as well as between persons with mild and moderate dementia. The 6-item final version of the ETAM shows satisfactory psychometric characteristics and can be administered quickly. It is therefore suitable for use in both clinical practice and research.
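The abstract reports Cohen's d for differences in ETAM scores between severity groups. A minimal sketch of that effect size, using the pooled standard deviation, is shown here. The function name `cohens_d` and the example scores are hypothetical and are not data from the study.

```python
import math
import statistics


def cohens_d(group_a, group_b):
    """Cohen's d: difference in group means divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd


# Hypothetical scores on a 0-30 scale, invented for illustration only.
mild_impairment = [28, 26, 30, 27]
moderate_impairment = [22, 24, 21, 25]
print(cohens_d(mild_impairment, moderate_impairment))
```

By the usual convention, values near 0.2, 0.5, and 0.8 are read as small, medium, and large differences.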
Construction and validation of the fatigue impact and severity self-assessment for youth and young adults with cerebral palsy.

PubMed

Brunton, Laura K; Bartlett, Doreen J

2017-07-01

The Fatigue Impact and Severity Self-Assessment (FISSA) was created to assess the impact, severity, and self-management of fatigue for individuals with cerebral palsy (CP) aged 14-31 years. Items were generated from a review of measures and interviews with individuals with CP. Focus groups with health-care professionals were used for item reduction. A mailed survey was conducted (n=163/367) to assess the factor structure, known-groups validity, and test-retest reliability. The final measure contained 31 items in two factors and discriminated between individuals expected to have different levels of fatigue. Individuals with more functional abilities reported less fatigue (p < 0.002) and those with higher pain reported higher fatigue (p < 0.001). The FISSA was shown to have adequate test-retest reliability, intraclass correlation coefficient (ICC)(3,1)=0.74 (95% confidence interval [CI] 0.53-0.87). The FISSA is valid and reliable for individuals with CP. It allows for identification of the activities that may be compromised by fatigue to enhance collaborative goal setting and intervention planning.
Item-Level Effects of the Read-Aloud Accommodation for Students with Reading Disabilities. Synthesis Report 65

ERIC Educational Resources Information Center

Bolt, Sara E.; Thurlow, Martha L.

2006-01-01

Research support for providing a read-aloud accommodation (i.e., having an individual read test items and directions aloud) to students with disabilities has been somewhat limited, particularly when merely examining effects of the accommodation on overall test scores for general groups of students with disabilities. The authors examined data on…
An Experimental Analysis of Memory Processing

PubMed Central

Wright, Anthony A

2007-01-01

Rhesus monkeys were trained and tested in visual and auditory list-memory tasks with sequences of four travel pictures or four natural/environmental sounds followed by single test items. Acquisitions of the visual list-memory task are presented. Visual recency (last item) memory diminished with retention delay, and primacy (first item) memory strengthened. Capuchin monkeys, pigeons, and humans showed similar visual-memory changes. Rhesus learned an auditory memory task and showed octave generalization for some lists of notes—tonal, but not atonal, musical passages. In contrast with visual list memory, auditory primacy memory diminished with delay and auditory recency memory strengthened. Manipulations of interitem intervals, list length, and item presentation frequency revealed proactive and retroactive inhibition among items of individual auditory lists. Repeating visual items from prior lists produced interference (on nonmatching tests) revealing how far back memory extended. The possibility of using the interference function to separate familiarity vs. recollective memory processing is discussed. PMID:18047230
Influence of dominant- as compared with nondominant-side symptoms on Disabilities of the Arm, Shoulder and Hand and Western Ontario Rotator Cuff scores in patients with rotator cuff tendinopathy.

PubMed

Christiansen, David Høyrup; Michener, Lori; Roy, Jean-Sébastien

2018-02-13

The Disabilities of the Arm, Shoulder and Hand (DASH) questionnaire and the Western Ontario Rotator Cuff (WORC) index are 2 widely used patient-reported questionnaires in individuals with rotator cuff (RC) tendinopathy. In contrast to the WORC index, for which the items are specific to the affected shoulder, the items of the DASH questionnaire assess the ability to perform activities regardless of the arm used. The objective of this study is to determine whether scores on the DASH questionnaire and WORC index are affected if the symptoms are on the dominant or nondominant side in individuals with RC tendinopathy. Given the number of items that can be influenced by dominance, the hypothesis is that DASH scores will be impacted by the side of the symptoms. Individuals with RC tendinopathy (N = 149) completed questions on symptomatology and hand dominance, the DASH questionnaire, and the WORC index. Differences in total scores (independent t test) and single items (Wilcoxon rank sum test) were compared between groups of participants with dominant-side symptoms and those without dominant-side symptoms. No significant differences were observed for WORC or DASH total scores when comparing participants with and without symptoms on their dominant side. Single-item comparison revealed more items being affected by symptom side on the DASH questionnaire (6 of 30 items) than on the WORC index (2 of 21 items). The side of the symptoms does not influence the DASH and WORC total scores, as there are no systematic differences between individuals with and without symptoms in their dominant shoulder. However, the presence of dominant symptoms does influence item scores more on the DASH questionnaire than on the WORC index. Copyright © 2018 Journal of Shoulder and Elbow Surgery Board of Trustees. Published by Elsevier Inc. All rights reserved.
An Item Response Analysis of the Motor and Behavioral Subscales of the Unified Huntington's Disease Rating Scale in Huntington Disease Gene Expansion Carriers

PubMed Central

Vaccarino, Anthony L.; Anderson, Karen; Borowsky, Beth; Duff, Kevin; Giuliano, Joseph; Guttman, Mark; Ho, Aileen K.; Orth, Michael; Paulsen, Jane S.; Sills, Terrence; van Kammen, Daniel P.; Evans, Kenneth R.

2011-01-01

Although the Unified Huntington's Disease Rating Scale (UHDRS) is widely used in the assessment of Huntington disease (HD), the ability of individual items to discriminate individual differences in motor or behavioral manifestations has not been extensively studied in HD gene expansion carriers without a motor-defined clinical diagnosis (i.e., prodromal-HD or prHD). To elucidate the relationship between scores on individual motor and behavioral UHDRS items and total score for each subscale, a non-parametric item response analysis was performed on retrospective data from two multicentre, longitudinal studies. Motor and Behavioral assessments were supplied for 737 prHD individuals with data from 2114 visits (PREDICT-HD) and 686 HD individuals with data from 1482 visits (REGISTRY). Option characteristic curves were generated for UHDRS subscale items in relation to their subscale score. In prHD, overall severity of motor signs was low and participants had scores of 2 or above on very few items. In HD, motor items that assessed ocular pursuit, saccade initiation, finger tapping, tandem walking, and to a lesser extent saccade velocity, dysarthria, tongue protrusion, pronation/supination, Luria, bradykinesia, choreas, gait and balance on the retropulsion test were found to discriminate individual differences across a broad range of motor severity. In prHD, depressed mood, anxiety, and irritable behavior demonstrated good discriminative properties. In HD, depressed mood demonstrated a good relationship with the overall behavioral score. These data suggest that at least some UHDRS items appear to have utility across a broad range of severity, although many items demonstrate problematic features. PMID:21370269
An item response analysis of the motor and behavioral subscales of the unified Huntington's disease rating scale in huntington disease gene expansion carriers.

PubMed

Vaccarino, Anthony L; Anderson, Karen; Borowsky, Beth; Duff, Kevin; Giuliano, Joseph; Guttman, Mark; Ho, Aileen K; Orth, Michael; Paulsen, Jane S; Sills, Terrence; van Kammen, Daniel P; Evans, Kenneth R

2011-04-01

Although the Unified Huntington's Disease Rating Scale (UHDRS) is widely used in the assessment of Huntington disease (HD), the ability of individual items to discriminate individual differences in motor or behavioral manifestations has not been extensively studied in HD gene expansion carriers without a motor-defined clinical diagnosis (ie, prodromal-HD or prHD). To elucidate the relationship between scores on individual motor and behavioral UHDRS items and total score for each subscale, a nonparametric item response analysis was performed on retrospective data from 2 multicenter longitudinal studies. Motor and behavioral assessments were supplied for 737 prHD individuals with data from 2114 visits (PREDICT-HD) and 686 HD individuals with data from 1482 visits (REGISTRY). Option characteristic curves were generated for UHDRS subscale items in relation to their subscale score. In prHD, overall severity of motor signs was low, and participants had scores of 2 or above on very few items. In HD, motor items that assessed ocular pursuit, saccade initiation, finger tapping, tandem walking, and to a lesser extent, saccade velocity, dysarthria, tongue protrusion, pronation/supination, Luria, bradykinesia, choreas, gait, and balance on the retropulsion test were found to discriminate individual differences across a broad range of motor severity. In prHD, depressed mood, anxiety, and irritable behavior demonstrated good discriminative properties. In HD, depressed mood demonstrated a good relationship with the overall behavioral score. These data suggest that at least some UHDRS items appear to have utility across a broad range of severity, although many items demonstrate problematic features. Copyright © 2011 Movement Disorder Society.
Measurement properties of the Spinal Cord Injury-Functional Index (SCI-FI) short forms.

PubMed

Heinemann, Allen W; Dijkers, Marcel P; Ni, Pengsheng; Tulsky, David S; Jette, Alan

2014-07-01

Objective: To evaluate the psychometric properties of the Spinal Cord Injury-Functional Index (SCI-FI) short forms (basic mobility, self-care, fine motor, ambulation, manual wheelchair, and power wheelchair) based on internal consistency; correlations between short forms banks, full item bank forms, and a 10-item computer adaptive test version; magnitude of ceiling and floor effects; and test information functions. Design: Cross-sectional cohort study. Setting: Six rehabilitation hospitals in the United States. Participants: Individuals with traumatic spinal cord injury (N=855) recruited from 6 national Spinal Cord Injury Model Systems facilities. Interventions: Not applicable. Main Outcome Measures: SCI-FI full item bank, 10-item computer adaptive test, and parallel short form scores. Results: The SCI-FI short forms (with separate versions for individuals with paraplegia and tetraplegia) demonstrate very good internal consistency, group-level reliability, excellent correlations between short forms and scores based on the total item bank, and minimal ceiling and floor effects (except ceiling effects for persons with paraplegia on self-care, fine motor, and power wheelchair ability and floor effects for persons with tetraplegia on self-care, fine motor, and manual wheelchair ability). The test information functions are acceptable across the range of scores where most persons in the sample performed. Conclusions: Clinicians and researchers should consider the SCI-FI short forms when computer adaptive testing is not feasible. Copyright © 2014 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
An Analysis of the Individual Effects of Sex Bias.

ERIC Educational Resources Information Center

Smith, Richard M.

Most attempts to correct for the presence of biased test items in a measurement instrument have been either to remove the items or to adjust the scores to correct for the bias. Using the Rasch Dichotomous Response Model and the independent ability estimates derived from three sets of items, those which favor females, those which favor males, and…
Item analysis of three Spanish naming tests: a cross-cultural investigation.

PubMed

Marquez de la Plata, Carlos; Arango-Lasprilla, Juan Carlos; Alegret, Montse; Moreno, Alexander; Tárraga, Luis; Lara, Mar; Hewlitt, Margaret; Hynan, Linda; Cullum, C Munro

2009-01-01

Neuropsychological evaluations conducted in the United States and abroad commonly include the use of tests translated from English to Spanish. The use of translated naming tests for evaluating predominately Spanish-speakers has recently been challenged on the grounds that translating test items may compromise a test's construct validity. The Texas Spanish Naming Test (TNT) has been developed in Spanish specifically for use with Spanish-speakers; however, it is unlikely patients from diverse Spanish-speaking geographical regions will perform uniformly on a naming test. The present study evaluated and compared the internal consistency and patterns of item-difficulty and -discrimination for the TNT and two commonly used translated naming tests in three countries (i.e., United States, Colombia, Spain). Two hundred fifty-two subjects (136 demented, 116 nondemented) across three countries were administered the TNT, Modified Boston Naming Test-Spanish (MBNT-S), and the naming subtest from the CERAD. The TNT demonstrated superior internal consistency to its counterparts, a better item difficulty pattern than the CERAD naming test, and a better item discrimination pattern than the MBNT-S across countries. Overall, all three Spanish naming tests differentiated nondemented and moderately demented individuals, but the results suggest the items of the TNT are most appropriate to use with Spanish-speakers. Preliminary normative data for the three tests examined in each country are provided.
Qualitative Evaluation of Pediatric Pain Behavior, Quality, and Intensity Item Candidates and the PROMIS Pain Domain Framework in Children With Chronic Pain.

PubMed

Jacobson, C Jeffrey; Kashikar-Zuck, Susmita; Farrell, Jennifer; Barnett, Kimberly; Goldschneider, Ken; Dampier, Carlton; Cunningham, Natoshia; Crosby, Lori; DeWitt, Esi Morgan

2015-12-01

As initial steps in a broader effort to develop and test pediatric pain behavior and pain quality item banks for the Patient-Reported Outcomes Measurement Information System (PROMIS), we used qualitative interview and item review methods to 1) evaluate the overall conceptual scope and content validity of the PROMIS pain domain framework among children with chronic/recurrent pain conditions, and 2) develop item candidates for further psychometric testing. To elicit the experiential and conceptual scope of pain outcomes across a variety of pediatric recurrent/chronic pain conditions, we conducted 32 semi-structured individual and 2 focus-group interviews with children and adolescents (8-17 years), and 32 individual and 2 focus-group interviews with parents of children with pain. Interviews with pain experts (10) explored the operational limits of pain measurement in children. For item bank development, we identified existing items from measures in the literature, grouped them by concept, removed redundancies, and modified the remaining items to match PROMIS formatting. New items were written as needed and cognitive debriefing was completed with the children and their parents, resulting in 98 pain behavior (47 self, 51 proxy), 54 quality, and 4 intensity items for further testing. Qualitative content analyses suggest that reportable pain outcomes that matter to children with pain are captured within and consistent with the pain domain framework in PROMIS. PROMIS pediatric pain behavior, quality, and intensity items were developed based on a theoretical framework of pain that was evaluated by multiple stakeholders in the measurement of pediatric pain, including researchers, clinicians, and children with pain and their parents, and the appropriateness of the framework was verified. Copyright © 2015 American Pain Society. Published by Elsevier Inc. All rights reserved.
Development and preliminary evaluation of a music-based attention assessment for patients with traumatic brain injury.

PubMed

Jeong, Eunju; Lesiuk, Teresa L

2011-01-01

Impairments in attention are commonly seen in individuals with traumatic brain injury (TBI). While visual attention assessment measurements have been rigorously developed and frequently used in cognitive neurorehabilitation, there is a paucity of auditory attention assessment measurements for patients with TBI. The purpose of this study was to field test a researcher-developed Music-based Attention Assessment (MAA), a melodic contour identification test designed to assess three different types of attention (i.e., sustained attention, selective attention, and divided attention), for patients with TBI. Additionally, this study aimed to evaluate the readability and comprehensibility of the test items and to examine the preliminary psychometric properties of the scale and test items. Fifteen patients diagnosed with TBI completed 3 different series of tasks in which they were required to identify melodic contours. The resulting data showed that (a) test items in each of the 3 subtests were found to have an easy to moderate level of item difficulty and an acceptable to high level of item discrimination, (b) the musical characteristics (i.e., contour, congruence, and pitch interference) were found to be associated with the level of item difficulty, and (c) the internal consistency of the MAA as computed by Cronbach's alpha was .95. Subsequent studies using a larger sample of typical participants, along with individuals with TBI, are needed to confirm construct validity and internal consistency of the MAA. In addition, the authors recommend examination of criterion validity of the MAA as correlated with current neuropsychological attention assessment measurements.
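The three item statistics named in this abstract (item difficulty, item discrimination, and Cronbach's alpha) can be sketched in a few lines. This is a minimal classical-test-theory illustration, not the authors' procedure: the function names are hypothetical, the response matrix is invented, and discrimination is computed here as a corrected item-total correlation, which is one common choice and an assumption on our part.

```python
import math
import statistics


def item_difficulty(responses):
    """Proportion of examinees answering each item correctly (higher = easier)."""
    n = len(responses)
    return [sum(row[j] for row in responses) / n for j in range(len(responses[0]))]


def pearson(x, y):
    """Pearson correlation; assumes neither variable is constant."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def item_discrimination(responses):
    """Corrected item-total correlation: each item against the sum of the other items."""
    result = []
    for j in range(len(responses[0])):
        item = [row[j] for row in responses]
        rest = [sum(row) - row[j] for row in responses]
        result.append(pearson(item, rest))
    return result


def cronbach_alpha(responses):
    """Cronbach's alpha from item variances and the variance of total scores."""
    k = len(responses[0])
    item_vars = [statistics.variance([row[j] for row in responses]) for j in range(k)]
    total_var = statistics.variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Invented 0/1 responses: rows are examinees, columns are items.
data = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(item_difficulty(data))  # [0.75, 0.5, 0.25]
print(item_discrimination(data))
print(cronbach_alpha(data))
```

With only 15 participants, as in the study, such estimates are unstable, which is in line with the authors' call for larger samples.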
Development of a psychological test to measure ability-based emotional intelligence in the Indonesian workplace using an item response theory.

PubMed

Fajrianthi; Zein, Rizqy Amelia

2017-01-01

This study aimed to develop an emotional intelligence (EI) test that is suitable to the Indonesian workplace context. Airlangga Emotional Intelligence Test (Tes Kecerdasan Emosi Airlangga [TKEA]) was designed to measure three EI domains: 1) emotional appraisal, 2) emotional recognition, and 3) emotional regulation. TKEA consisted of 120 items with 40 items for each subset. TKEA was developed based on the Situational Judgment Test (SJT) approach. To ensure its psychometric qualities, categorical confirmatory factor analysis (CCFA) and item response theory (IRT) were applied to test its validity and reliability. The study was conducted on 752 participants, and the results showed that test information function (TIF) was 3.414 (ability level = 0) for subset 1, 12.183 for subset 2 (ability level = -2), and 2.398 for subset 3 (level of ability = -2). It is concluded that TKEA performs very well to measure individuals with a low level of EI ability. It is worth noting that TKEA is currently at the development stage; therefore, in this study, we investigated TKEA's item analysis and dimensionality test of each TKEA subset.
Investigating diagnostic bias in autism spectrum conditions: An item response theory analysis of sex bias in the AQ-10.

PubMed

Murray, Aja Louise; Allison, Carrie; Smith, Paula L; Baron-Cohen, Simon; Booth, Tom; Auyeung, Bonnie

2017-05-01

Diagnostic bias is a concern in autism spectrum conditions (ASC) where prevalence and presentation differ by sex. To ensure that females with ASC are not under-identified, it is important that ASC screening tools do not systematically underestimate autistic traits in females relative to males. We evaluated whether the AQ-10, a brief screen for ASC recommended by the National Institute of Clinical Excellence in cases of suspected ASC, exhibits such a bias. Using an item response theory approach, we evaluated differential item functioning and differential test functioning. We found that although individual items showed some sex bias, these biases at times favored males and at other times favored females. Thus, at the level of test scores the item-level biases cancelled out to give an unbiased overall score. Results support the continued use of the AQ-10 sum score in its current form; however, they suggest that caution should be exercised when interpreting responses to individual items. The nature of the item-level biases could serve as a guide for future research into how ASC affects males and females differently. Autism Res 2017, 10: 790-800. © 2016 International Society for Autism Research, Wiley Periodicals, Inc.

The effects of initial testing on false recall and false recognition in the social contagion of memory paradigm.

PubMed

Huff, Mark J; Davis, Sara D; Meade, Michelle L

2013-08-01

In three experiments, participants studied photographs of common household scenes. Following study, participants completed a category-cued recall test without feedback (Exps. 1 and 3), a category-cued recall test with feedback (Exp. 2), or a filler task (no-test condition). Participants then viewed recall tests from fictitious previous participants that contained erroneous items presented either one or four times, and then completed final recall and source recognition tests. The participants in all conditions reported incorrect items during final testing (a social contagion effect), and across experiments, initial testing had no impact on false recall of erroneous items. However, on the final source-monitoring recognition test, initial testing had a protective effect against false source recognition: Participants who were initially tested with and without feedback on category-cued initial tests attributed fewer incorrect items to the original event on the final source-monitoring recognition test than did participants who were not initially tested. These data demonstrate that initial testing may protect individuals' memories from erroneous suggestions.
Adjusting for cross-cultural differences in computer-adaptive tests of quality of life.

PubMed

Gibbons, C J; Skevington, S M

2018-04-01

Previous studies using the WHOQOL measures have demonstrated that the relationship between individual items and the underlying quality of life (QoL) construct may differ between cultures. If unaccounted for, these differing relationships can lead to measurement bias which, in turn, can undermine the reliability of results. We used item response theory (IRT) to assess differential item functioning (DIF) in WHOQOL data from diverse language versions collected in UK, Zimbabwe, Russia, and India (total N = 1332). Data were fitted to the partial credit 'Rasch' model. We used four item banks previously derived from the WHOQOL-100 measure, which provided excellent measurement for physical, psychological, social, and environmental quality of life domains (40 items overall). Cross-cultural differential item functioning was assessed using analysis of variance for item residuals and post hoc Tukey tests. Simulated computer-adaptive tests (CATs) were conducted to assess the efficiency and precision of the four item banks. Splitting item parameters by DIF results in four linked item banks without DIF or other breaches of IRT model assumptions. Simulated CATs were more precise and efficient than longer paper-based alternatives. Assessing differential item functioning using item response theory can identify measurement invariance between cultures which, if uncontrolled, may undermine accurate comparisons in computer-adaptive testing assessments of QoL. We demonstrate how compensating for DIF using item anchoring allowed data from all four countries to be compared on a common metric, thus facilitating assessments which were both sensitive to cultural nuance and comparable between countries.
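For readers unfamiliar with the partial credit 'Rasch' model mentioned in this abstract, the sketch below shows how it assigns probabilities to the response categories of one polytomous item. It is an illustration only: the function name `pcm_category_probs` is hypothetical, and the person location and step difficulties are invented rather than taken from the WHOQOL item banks.

```python
import math


def pcm_category_probs(theta, step_difficulties):
    """Category probabilities for one item under the partial credit model.

    theta: person location on the latent trait
    step_difficulties: one difficulty per step between adjacent categories
    """
    # Category 0 has an empty sum (exponent 0); category k accumulates
    # (theta - delta_j) over the first k steps.
    exponents = [0.0]
    for delta in step_difficulties:
        exponents.append(exponents[-1] + (theta - delta))
    weights = [math.exp(e) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


# Invented parameters for a five-category item (assumed, not from the study).
probs = pcm_category_probs(theta=0.5, step_difficulties=[-1.5, -0.5, 0.5, 1.5])
print([round(p, 3) for p in probs])
```

In this framing, DIF means the step difficulties for an item differ between groups at the same theta; splitting an item by group, as the abstract describes, amounts to estimating separate step difficulties for each group.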
Belief-bias reasoning in non-clinical delusion-prone individuals.

PubMed

Anandakumar, T; Connaughton, E; Coltheart, M; Langdon, R

2017-03-01

It has been proposed that people with delusions have difficulty inhibiting beliefs (i.e., "doxastic inhibition") so as to reason about them as if they might not be true. We used a continuity approach to test this proposal in non-clinical adults scoring high and low in psychometrically assessed delusion-proneness. High delusion-prone individuals were expected to show greater difficulty than low delusion-prone individuals on "conflict" items of a "belief-bias" reasoning task (i.e. when required to reason logically about statements that conflicted with reality), but not on "non-conflict" items. Twenty high delusion-prone and twenty low delusion-prone participants (according to the Peters et al. Delusions Inventory) completed a belief-bias reasoning task and tests of IQ, working memory and general inhibition (Excluded Letter Fluency, Stroop and Hayling Sentence Completion). High delusion-prone individuals showed greater difficulty than low delusion-prone individuals on the Stroop and Excluded Letter Fluency tests of inhibition, but no greater difficulty on the conflict versus non-conflict items of the belief-bias task. They did, however, make significantly more errors overall on the belief-bias task, despite controlling for IQ, working memory and general inhibitory control. The study had a relatively small sample size and used non-clinical participants to test a theory of cognitive processing in individuals with clinically diagnosed delusions. Results failed to support a role for doxastic inhibitory failure in non-clinical delusion-prone individuals. These individuals did, however, show difficulty with conditional reasoning about statements that may or may not conflict with reality, independent of any general cognitive or inhibitory deficits. Copyright © 2016 Elsevier Ltd. All rights reserved.
Belief-bias reasoning in non-clinical delusion-prone individuals.

PubMed

Anandakumar, T; Connaughton, E; Coltheart, M; Langdon, R

2017-09-01

It has been proposed that people with delusions have difficulty inhibiting beliefs (i.e., "doxastic inhibition") so as to reason about them as if they might not be true. We used a continuity approach to test this proposal in non-clinical adults scoring high and low in psychometrically assessed delusion-proneness. High delusion-prone individuals were expected to show greater difficulty than low delusion-prone individuals on "conflict" items of a "belief-bias" reasoning task (i.e. when required to reason logically about statements that conflicted with reality), but not on "non-conflict" items. Twenty high delusion-prone and twenty low delusion-prone participants (according to the Peters et al. Delusions Inventory) completed a belief-bias reasoning task and tests of IQ, working memory and general inhibition (Excluded Letter Fluency, Stroop and Hayling Sentence Completion). High delusion-prone individuals showed greater difficulty than low delusion-prone individuals on the Stroop and Excluded Letter Fluency tests of inhibition, but no greater difficulty on the conflict versus non-conflict items of the belief-bias task. They did, however, make significantly more errors overall on the belief-bias task, despite controlling for IQ, working memory and general inhibitory control. The study had a relatively small sample size and used non-clinical participants to test a theory of cognitive processing in individuals with clinically diagnosed delusions. Results failed to support a role for doxastic inhibitory failure in non-clinical delusion-prone individuals. These individuals did, however, show difficulty with conditional reasoning about statements that may or may not conflict with reality, independent of any general cognitive or inhibitory deficits. Copyright © 2016 Elsevier Ltd. All rights reserved.
Development and initial psychometric evaluation of an item bank created to measure upper extremity function in persons with stroke.

PubMed

Higgins, Johanne; Finch, Lois E; Kopec, Jacek; Mayo, Nancy E

2010-02-01

To create and illustrate the development of a method to parsimoniously and hierarchically assess upper extremity function in persons after stroke. Data were analyzed using Rasch analysis. Re-analysis of data from 8 studies involving persons after stroke. Over 4000 patients with stroke who participated in various studies in Montreal and elsewhere in Canada. Data comprised 17 tests or indices of upper extremity function and health-related quality of life, for a total of 99 items related to upper extremity function. Tests and indices included, among others, the Box and Block Test, the Nine-Hole Peg Test and the Stroke Impact Scale. Data were collected at various times post-stroke from 3 days to 1 year. Once the data fit the model, a bank of items measuring upper extremity function with persons and items organized hierarchically by difficulty and ability in log units was produced. This bank forms the basis for eventual computer adaptive testing. The calibration of the items should be tested further psychometrically, as should the interpretation of the metric arising from using the item calibration to measure the upper extremity of individuals.
Psychometric properties of the Chinese version of resilience scale specific to cancer: an item response theory analysis.

PubMed

Ye, Zeng Jie; Liang, Mu Zi; Zhang, Hao Wei; Li, Peng Fei; Ouyang, Xue Ren; Yu, Yuan Liang; Liu, Mei Ling; Qiu, Hong Zhong

2018-06-01

Classical test theory has been used to develop and validate the 25-item Resilience Scale Specific to Cancer (RS-SC) in Chinese patients with cancer. This study was designed to provide additional information about the discriminative value of the individual items tested with an item response theory analysis. A two-parameter graded response model was fitted to examine whether any of the items of the RS-SC exhibited problems with the ordering and steps of thresholds, as well as the ability of items to discriminate patients with different resilience levels using item characteristic curves. A sample of 214 Chinese patients with a cancer diagnosis was analyzed. The established three-dimension structure of the RS-SC was confirmed. Several items showed problematic thresholds or discrimination ability and require further revision. Some problematic items should be refined, and a short form of the RS-SC may be feasible in clinical settings in order to reduce burden on patients. However, the generalizability of these findings warrants further investigation.
Early blindness alters the spatial organization of verbal working memory.

PubMed

Bottini, Roberto; Mattioni, Stefania; Collignon, Olivier

2016-10-01

Several studies suggest that serial order in working memory (WM) is grounded in space. For a list of ordered items held in WM, items at the beginning of the list are associated with the left side of space and items at the end of the list with the right side. This suggests that maintaining items in verbal WM is performed in strong analogy to writing these items down on a physical whiteboard for later consultation (The Mental Whiteboard Hypothesis). What drives this spatial mapping of ordered series in WM remains poorly understood. In the present study we tested whether visual experience is instrumental in establishing the link between serial order in WM and spatial processing. We tested early blind (EB), late blind (LB) and sighted individuals in an auditory WM task. Replicating previous studies, left-key responses were faster for early items in the list whereas later items facilitated right-key responses in the sighted group. The same effect was observed in LB individuals. In contrast, EB participants did not show any association between space and serial position in WM. These results suggest that early visual experience plays a critical role in linking ordered items in WM and spatial representations. The analogical spatial structure of WM may depend in part on the actual experience of using spatially organized devices (e.g., notes, whiteboards) to offload WM. These practices are largely unavailable to EB individuals, who instead rely on mnemonic devices that are less spatially organized (e.g., recordings, vocal notes). The way we habitually organize information in the external world may bias the way we organize information in our WM. Copyright © 2016 Elsevier Ltd. All rights reserved.
Development and validation of the positive affect and well-being scale for the neurology quality of life (Neuro-QOL) measurement system.

PubMed

Salsman, John M; Victorson, David; Choi, Seung W; Peterman, Amy H; Heinemann, Allen W; Nowinski, Cindy; Cella, David

2013-11-01

To develop and validate an item-response theory-based patient-reported outcomes assessment tool of positive affect and well-being (PAW). This is part of a larger NINDS-funded study to develop a health-related quality of life measurement system across major neurological disorders, called Neuro-QOL. Informed by a literature review and qualitative input from clinicians and patients, item pools were created to assess PAW concepts. Items were administered to a general population sample (N = 513) and a group of individuals with a variety of neurologic conditions (N = 581) for calibration and validation purposes, respectively. A 23-item calibrated bank and a 9-item short form of PAW were developed, reflecting components of positive affect, life satisfaction, or an overall sense of purpose and meaning. The Neuro-QOL PAW measure demonstrated sufficient unidimensionality and displayed good internal consistency, test-retest reliability, model fit, convergent and discriminant validity, and responsiveness. The Neuro-QOL PAW measure was designed to help clinicians and researchers better evaluate and understand the potential role of positive health processes for individuals with chronic neurological conditions. Further psychometric testing within and between neurological conditions, as well as testing in non-neurologic chronic diseases, will help evaluate the generalizability of this new tool.
Development of Self-Report Measures of Social Attitudes that Act as Environmental Barriers and Facilitators for People with Disabilities

PubMed Central

Garcia, Sofia F.; Hahn, Elizabeth A.; Magasi, Susan; Lai, Jin-Shei; Semik, Patrick; Hammel, Joy; Heinemann, Allen W.

2014-01-01

Objective: To describe the development of new self-report measures of social attitudes that act as environmental facilitators or barriers to the participation of people with disabilities in society. Design: A mixed methods approach included a literature review; item classification, selection and writing; cognitive interviews and field testing with participants with spinal cord injury (SCI), traumatic brain injury (TBI) or stroke; and rating scale analysis to evaluate initial psychometric properties. Setting: General community. Participants: Nine individuals with SCI, TBI or stroke participated in cognitive interviews; 305 community residents with those same conditions participated in field testing. Interventions: None. Main Outcome Measure(s): Self-report item pool of social attitudes that act as facilitators or barriers to people with disabilities participating in society. Results: An interdisciplinary team of experts classified 710 existing social environment items into content areas and wrote 32 new items. Additional qualitative item review included item refinement and winnowing of the pool prior to cognitive interviews and field testing of 82 items. Field test data indicated that the pool satisfies a one-parameter item response theory measurement model and would be appropriate for development into a calibrated item bank. Conclusions: Our qualitative item review process supported a social environment conceptual framework that includes both social support and social attitudes. We developed a new social attitudes self-report item pool. Calibration testing of that pool is underway with a larger sample in order to develop a social attitudes item bank for persons with disabilities. PMID:25045803
Development of self-report measures of social attitudes that act as environmental barriers and facilitators for people with disabilities.

PubMed

Garcia, Sofia F; Hahn, Elizabeth A; Magasi, Susan; Lai, Jin-Shei; Semik, Patrick; Hammel, Joy; Heinemann, Allen W

2015-04-01

To describe the development of new self-report measures of social attitudes that act as environmental facilitators or barriers to the participation of people with disabilities in society. A mixed-methods approach included a literature review; item classification, selection, and writing; cognitive interviews and field testing of participants with spinal cord injury (SCI), traumatic brain injury (TBI), or stroke; and rating scale analysis to evaluate initial psychometric properties. General community. Individuals with SCI, TBI, or stroke participated in cognitive interviews (n=9); community residents with those same conditions participated in field testing (n=305). None. Self-report item pool of social attitudes that act as facilitators or barriers to people with disabilities participating in society. An interdisciplinary team of experts classified 710 existing social environment items into content areas and wrote 32 new items. Additional qualitative item review included item refinement and winnowing of the pool prior to cognitive interviews and field testing of 82 items. Field test data indicated that the pool satisfies a 1-parameter item response theory measurement model and would be appropriate for development into a calibrated item bank. Our qualitative item review process supported a social environment conceptual framework that includes both social support and social attitudes. We developed a new social attitudes self-report item pool. Calibration testing of that pool is underway with a larger sample to develop a social attitudes item bank for persons with disabilities. Copyright © 2015 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
A Classical Test Theory Analysis of the Light and Spectroscopy Concept Inventory National Study Data Set

ERIC Educational Resources Information Center

Schlingman, Wayne M.; Prather, Edward E.; Wallace, Colin S.; Brissenden, Gina; Rudolph, Alexander L.

2012-01-01

This paper is the first in a series of investigations into the data from the recent national study using the Light and Spectroscopy Concept Inventory (LSCI). In this paper, we use classical test theory to form a framework of results that will be used to evaluate individual item difficulties, item discriminations, and the overall reliability of the…
Analyzing force concept inventory with item response theory

NASA Astrophysics Data System (ADS)

Wang, Jing; Bao, Lei

2010-10-01

Item response theory is a popular assessment method used in education. It rests on the assumption of a probability framework that relates students' innate ability and their performance on test questions. Item response theory transforms students' raw test scores into a scaled proficiency score, which can be used to compare results obtained with different test questions. The scaled score also addresses the issues of ceiling effects and guessing, which commonly exist in quantitative assessment. We used item response theory to analyze the force concept inventory (FCI). Our results show that item response theory can be useful for analyzing physics concept surveys such as the FCI and produces results about the individual questions and student performance that are beyond the capability of classical statistics. The theory yields detailed measurement parameters regarding the difficulty, discrimination features, and probability of correct guess for each of the FCI questions.
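The three parameters named in this abstract (difficulty, discrimination, and probability of a correct guess) are the parameters of the standard three-parameter logistic (3PL) model. The following is a minimal Python sketch of that item response function, not the authors' analysis code; the function name `p_correct_3pl` and the item parameter values are hypothetical and chosen only for illustration.

```python
import math


def p_correct_3pl(theta, a, b, c):
    """Probability of a correct response under the 3PL model.

    theta: examinee proficiency
    a: item discrimination, b: item difficulty, c: guessing (lower asymptote)
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical item parameters (not taken from the FCI analysis).
item = {"a": 1.2, "b": 0.0, "c": 0.2}

# The curve rises from the guessing floor c toward 1 as proficiency increases.
for theta in (-2.0, 0.0, 2.0):
    print(theta, round(p_correct_3pl(theta, **item), 3))
```

When proficiency equals the item difficulty, the probability sits halfway between the guessing floor and 1, which is one way to read the difficulty parameter off an item characteristic curve.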
The quadratic relationship between difficulty of intelligence test items and their correlations with working memory.

PubMed

Smolen, Tomasz; Chuderski, Adam

2015-01-01

Fluid intelligence (Gf) is a crucial cognitive ability that involves abstract reasoning in order to solve novel problems. Recent research demonstrated that Gf strongly depends on the individual effectiveness of working memory (WM). We investigated a popular claim that if the storage capacity underlay the WM-Gf correlation, then such a correlation should increase with an increasing number of items or rules (load) in a Gf-test. As often no such link is observed, on that basis the storage-capacity account is rejected, and alternative accounts of Gf (e.g., related to executive control or processing speed) are proposed. Using both analytical inference and numerical simulations, we demonstrated that the load-dependent change in correlation is primarily a function of the amount of floor/ceiling effect for particular items. Thus, the item-wise WM correlation of a Gf-test depends on its overall difficulty, and the difficulty distribution across its items. When the early test items yield huge ceiling, but the late items do not approach floor, that correlation will increase throughout the test. If the early items locate themselves between ceiling and floor, but the late items approach floor, the respective correlation will decrease. For a hallmark Gf-test, the Raven-test, whose items span from ceiling to floor, the quadratic relationship is expected, and it was shown empirically using a large sample and two types of WMC tasks. In consequence, no changes in correlation due to varying WM/Gf load, or lack of them, can yield an argument for or against any theory of WM/Gf. Moreover, as the mathematical properties of the correlation formula make it relatively immune to ceiling/floor effects for overall moderate correlations, only minor changes (if any) in the WM-Gf correlation should be expected for many psychological tests.
The Hierarchical Rater Model for Rated Test Items and Its Application to Large-Scale Educational Assessment Data.

ERIC Educational Resources Information Center

Patz, Richard J.; Junker, Brian W.; Johnson, Matthew S.; Mariano, Louis T.

2002-01-01

Discusses the hierarchical rater model (HRM) of R. Patz (1996) and shows how it can be used to scale examinees and items, model aspects of consensus among raters, and model individual rater severity and consistency effects. Also shows how the HRM fits into the generalizability theory framework. Compares the HRM to the conventional item response…
ITEM ANALYSIS OF THREE SPANISH NAMING TESTS: A CROSS-CULTURAL INVESTIGATION

PubMed Central

de la Plata, Carlos Marquez; Arango-Lasprilla, Juan Carlos; Alegret, Montse; Moreno, Alexander; Tárraga, Luis; Lara, Mar; Hewlitt, Margaret; Hynan, Linda; Cullum, C. Munro

2009-01-01

Neuropsychological evaluations conducted in the United States and abroad commonly include the use of tests translated from English to Spanish. The use of translated naming tests for evaluating predominately Spanish-speakers has recently been challenged on the grounds that translating test items may compromise a test’s construct validity. The Texas Spanish Naming Test (TNT) has been developed in Spanish specifically for use with Spanish-speakers; however, it is unlikely patients from diverse Spanish-speaking geographical regions will perform uniformly on a naming test. The present study evaluated and compared the internal consistency and patterns of item-difficulty and -discrimination for the TNT and two commonly used translated naming tests in three countries (i.e., United States, Colombia, Spain). Two hundred fifty two subjects (126 demented, 116 nondemented) across three countries were administered the TNT, Modified Boston Naming Test-Spanish (MBNT-S), and the naming subtest from the CERAD. The TNT demonstrated better internal consistency than its counterparts, a better item difficulty pattern than the CERAD naming test, and a better item discrimination pattern than the MBNT-S across countries. Overall, all three Spanish naming tests differentiated nondemented and moderately demented individuals, but the results suggest the items of the TNT are most appropriate to use with Spanish-speakers. Preliminary normative data for the three tests examined in each country are provided. PMID:19208960
Improving the Reliability of Student Scores from Speeded Assessments: An Illustration of Conditional Item Response Theory Using a Computer-Administered Measure of Vocabulary

ERIC Educational Resources Information Center

Petscher, Yaacov; Mitchell, Alison M.; Foorman, Barbara R.

2015-01-01

A growing body of literature suggests that response latency, the amount of time it takes an individual to respond to an item, may be an important factor to consider when using assessment data to estimate the ability of an individual. Considering that tests of passage and list fluency are being adapted to a computer administration format, it is…
Psychometric Properties and Performance of the Patient Reported Outcomes Measurement Information System® (PROMIS®) Depression Short Forms in Ethnically Diverse Groups

PubMed Central

Teresi, Jeanne A.; Ocepek-Welikson, Katja; Kleinman, Marjorie; Ramirez, Mildred; Kim, Giyeon

2017-01-01

Short form measures from the Patient Reported Outcomes Measurement Information System® (PROMIS®) are used widely. The present study was among the first to examine differential item functioning (DIF) in the PROMIS Depression short form scales in a sample of over 5000 racially/ethnically diverse patients with cancer. DIF analyses were conducted across different racial/ethnic, educational, age, gender and language groups. Methods: DIF hypotheses, generated by content experts, informed the evaluation of the DIF analyses. The graded item response theory (IRT) model was used to evaluate the five-level ordinal items. The primary tests of DIF were Wald tests; sensitivity analyses were conducted using the IRT ordinal logistic regression procedure. Magnitude was evaluated using expected item score functions, and the non-compensatory differential item functioning (NCDIF) and T1 indexes, both based on group differences in the item curves. Aggregate impact was evaluated with expected scale score (test) response functions; individual impact was assessed through examination of differences in DIF adjusted and unadjusted depression estimates. Results: Many items evidenced DIF; however, only a few had slightly elevated magnitude. No items evidenced salient DIF with respect to NCDIF and the scale-level impact was minimal for all group comparisons. The following short form items might be targeted for further study because they were also hypothesized to evidence DIF. One item showed slightly higher magnitude of DIF for age: nothing to look forward to; conditional on depression, this item was more likely to be endorsed in the depressed direction by individuals in older groups as contrasted with the cohort aged 21 to 49. This item was also hypothesized to show age DIF. Only one item (failure) showed DIF of slightly higher magnitude (just above threshold) for Whites vs. Asians/Pacific Islanders in the direction of higher likelihood of endorsement for Asians/Pacific Islanders. This item was also hypothesized to show DIF for minority groups. The impact of DIF was negligible. Conditional on depression, the items worthless and hopeless were more likely to be endorsed in the depressed direction by respondents with less than high school education vs. those with a graduate degree; the magnitude of DIF was slightly above the T1 threshold, but not that of NCDIF. These items were also hypothesized to show DIF in the direction of more feelings of worthlessness by groups with lower education. While the magnitude and aggregate impact of DIF were small, in a few instances, individual impact was observed. Information provided was relatively high, particularly in the middle upper (depressed) tail of the distribution. Reliability estimates were high (> 0.90) across all studied groups, regardless of estimation method. Conclusions: This was the first study to evaluate measurement equivalence of the PROMIS Depression short forms across large samples of ethnically diverse groups. There were few items with DIF, and none of high magnitude, thus supporting the use of PROMIS Depression short form measures across such groups. These results could be informative for those using the short forms in minority populations or clinicians evaluating individuals with the depression short forms. PMID:28553573
Quality of life and swallowing questionnaire for individuals with Parkinson's disease: development and validation.

PubMed

Diniz, Juliana Garcia; da Silva, Alfredo Carlos; Nóbrega, Ana Caline

2018-05-21

Individuals with Parkinson's disease (PD) may exhibit some degree of change in swallowing dynamics during the course of the disease. These changes can affect their physical, functional and emotional quality of life. To develop a quality of life and swallowing questionnaire for individuals with PD. The first version of the questionnaire comprised 29 items taken from the accounts of 50 patients treated over a 2-month period at Sarah Hospital in Salvador, Bahia, Brazil. A committee of 10 experts in the field analyzed the content and reduced the questionnaire to 28 questions. The questionnaire was then administered to 140 PD patients and 47 healthy individuals. A factor analysis of the items guided the drafting of the final questionnaire, which consisted of 19 items grouped into four factors, encompassing physical, functional and emotional aspects. A test-retest assessment was conducted with 44 individuals with PD. The internal consistency, estimated by the mean of Cronbach's alpha coefficient, varied between 0.71 (domain 3) and 0.94 (domain 1) in the test and between 0.69 (domain 3) and 0.95 (domain 1) in the retest. The correlation coefficient in the test/retest comparison was high and significant, demonstrating that the measurement was stable. A significant difference was observed between the PD group and the comparison group. The questionnaire developed is a valid, statistically appropriate and clinically effective self-administered instrument for individuals with PD. © 2018 Royal College of Speech and Language Therapists.
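The internal consistency figures reported above are Cronbach's alpha coefficients. As a rough illustration of how that coefficient is computed from item-level scores, here is a minimal Python sketch; the function name `cronbach_alpha` and the response data are hypothetical and are not drawn from the questionnaire study.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(scores[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of the respondents' total scores.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1.0 - sum(item_vars) / total_var)


# Hypothetical responses: 4 respondents answering a 3-item domain.
responses = [
    [1, 2, 1],
    [2, 3, 2],
    [3, 3, 4],
    [4, 5, 4],
]
print(round(cronbach_alpha(responses), 3))
```

Alpha approaches 1 when the items rise and fall together across respondents, which is why it is read as an index of how consistently a domain's items measure the same thing.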
An evaluation of computerized adaptive testing for general psychological distress: combining GHQ-12 and Affectometer-2 in an item bank for public mental health research.

PubMed

Stochl, Jan; Böhnke, Jan R; Pickett, Kate E; Croudace, Tim J

2016-05-20

Recent developments in psychometric modeling and technology allow pooling well-validated items from existing instruments into larger item banks and their deployment through methods of computerized adaptive testing (CAT). Use of item response theory-based bifactor methods and integrative data analysis overcomes barriers in cross-instrument comparison. This paper presents the joint calibration of an item bank for researchers keen to investigate population variations in general psychological distress (GPD). Multidimensional item response theory was used on existing health survey data from the Scottish Health Education Population Survey (n = 766) to calibrate an item bank consisting of pooled items from the short common mental disorder screen (GHQ-12) and the Affectometer-2 (a measure of "general happiness"). Computer simulation was used to evaluate usefulness and efficacy of its adaptive administration. A bifactor model capturing variation across a continuum of population distress (while controlling for artefacts due to item wording) was supported. The numbers of items for different required reliabilities in adaptive administration demonstrated promising efficacy of the proposed item bank. Psychometric modeling of the common dimension captured by more than one instrument offers the potential of adaptive testing for GPD using individually sequenced combinations of existing survey items. The potential for linking other item sets with alternative candidate measures of positive mental health is discussed since an optimal item bank may require even more items than these.
Flexible Execution of Cognitive Procedures.

DTIC Science & Technology

1987-06-30

were drawn from three third-grade classrooms. The classrooms were pre-tested twice using a paper-and-pencil diagnostic test. We selected 33 students...tested individually in a small room adjacent to their classroom. Each student solved an individualized paper-and-pencil test whose items were designed...tablet, and students filled out the test with a special pen. Equipment malfunctions caused the data from 7 students to be lost. Tablet data from each of

Building an Evaluation Scale using Item Response Theory.

PubMed

Lalor, John P; Wu, Hao; Yu, Hong

2016-11-01

Evaluation of NLP methods requires testing against a previously vetted gold-standard test set and reporting standard metrics (accuracy/precision/recall/F1). The current assumption is that all items in a given test set are equal with regard to difficulty and discriminating power. We propose Item Response Theory (IRT) from psychometrics as an alternative means for gold-standard test-set generation and NLP system evaluation. IRT is able to describe characteristics of individual items - their difficulty and discriminating power - and can account for these characteristics in its estimation of human intelligence or ability for an NLP task. In this paper, we demonstrate IRT by generating a gold-standard test set for Recognizing Textual Entailment. By collecting a large number of human responses and fitting our IRT model, we show that our IRT model compares NLP systems with the performance in a human population and is able to provide more insight into system performance than standard evaluation metrics. We show that a high accuracy score does not always imply a high IRT score, which depends on the item characteristics and the response pattern.
Building an Evaluation Scale using Item Response Theory

PubMed Central

Lalor, John P.; Wu, Hao; Yu, Hong

2016-01-01

Evaluation of NLP methods requires testing against a previously vetted gold-standard test set and reporting standard metrics (accuracy/precision/recall/F1). The current assumption is that all items in a given test set are equal with regard to difficulty and discriminating power. We propose Item Response Theory (IRT) from psychometrics as an alternative means for gold-standard test-set generation and NLP system evaluation. IRT is able to describe characteristics of individual items - their difficulty and discriminating power - and can account for these characteristics in its estimation of human intelligence or ability for an NLP task. In this paper, we demonstrate IRT by generating a gold-standard test set for Recognizing Textual Entailment. By collecting a large number of human responses and fitting our IRT model, we show that our IRT model compares NLP systems with the performance in a human population and is able to provide more insight into system performance than standard evaluation metrics. We show that a high accuracy score does not always imply a high IRT score, which depends on the item characteristics and the response pattern. PMID:28004039
Development of a psychological test to measure ability-based emotional intelligence in the Indonesian workplace using an item response theory

PubMed Central

Fajrianthi; Zein, Rizqy Amelia

2017-01-01

This study aimed to develop an emotional intelligence (EI) test that is suitable for the Indonesian workplace context. The Airlangga Emotional Intelligence Test (Tes Kecerdasan Emosi Airlangga [TKEA]) was designed to measure three EI domains: 1) emotional appraisal, 2) emotional recognition, and 3) emotional regulation. TKEA consisted of 120 items with 40 items for each subset. TKEA was developed based on the Situational Judgment Test (SJT) approach. To ensure its psychometric qualities, categorical confirmatory factor analysis (CCFA) and item response theory (IRT) were applied to test its validity and reliability. The study was conducted on 752 participants, and the results showed that the test information function (TIF) was 3.414 for subset 1 (ability level = 0), 12.183 for subset 2 (ability level = −2), and 2.398 for subset 3 (ability level = −2). It is concluded that TKEA performs very well in measuring individuals with a low level of EI ability. It is worth noting that TKEA is currently at the development stage; therefore, in this study, we investigated TKEA’s item analysis and dimensionality test of each TKEA subset. PMID:29238234
History of United States Army physical fitness and physical readiness training.

PubMed

Knapik, Joseph J; East, Whitfield B

2014-01-01

This article traces the history of US Army physical fitness assessments from the first test developed for Cadets at the US Military Academy in 1858 through efforts to revise the current Army Physical Fitness Test (APFT). The first "Individual Efficiency Test" (1920) for all Soldiers consisted of a 100-yard run, running broad jump, wall climb, hand grenade throw, and obstacle course. The first scientific efforts involved testing of 400 Soldiers and a factor analysis of 25 individual test items. In 1944, this resulted in a 7-item test (pull-up, burpee, squat jump, push-up, man-carry, sit-up and 300-yard run) with a 100-point scoring system. In 1943, women were encouraged to take a "self-assessment" consisting of push-ups, bent knee sit-ups, wing lifts, squat thrusts, running, and a stork stand. In 1946, age-adjusted standards were introduced and in 1965 semiannual fitness assessments were mandated. The number of tests proliferated in the 1969-1973 period with 7 separate assessments. The current APFT consisting of push-ups, sit-ups, and a 2-mile run was introduced in 1980 and alternative tests for those with physical limitations in 1982. Current efforts to revise the assessment involve systematic literature reviews and validating the relationship between test items and common Soldiering tasks.
Rapid and Accurate Behavioral Health Diagnostic Screening: Initial Validation Study of a Web-Based, Self-Report Tool (the SAGE-SR)

PubMed Central

Purcell, Susan E; Rhea, Karen; Maier, Philip; First, Michael; Zweede, Lisa; Sinisterra, Manuela; Nunn, M Brad; Austin, Marie-Paule; Brodey, Inger S

2018-01-01

Background The Structured Clinical Interview for DSM (SCID) is considered the gold standard assessment for accurate, reliable psychiatric diagnoses; however, because of its length, complexity, and training required, the SCID is rarely used outside of research. Objective This paper aims to describe the development and initial validation of a Web-based, self-report screening instrument (the Screening Assessment for Guiding Evaluation-Self-Report, SAGE-SR) based on the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5) and the SCID-5-Clinician Version (CV) intended to make accurate, broad-based behavioral health diagnostic screening more accessible within clinical care. Methods First, study staff drafted approximately 1200 self-report items representing individual granular symptoms in the diagnostic criteria for the 8 primary SCID-CV modules. An expert panel iteratively reviewed, critiqued, and revised items. The resulting items were iteratively administered and revised through 3 rounds of cognitive interviewing with community mental health center participants. In the first 2 rounds, the SCID was also administered to participants to directly compare their Likert self-report and SCID responses. A second expert panel evaluated the final pool of items from cognitive interviewing and criteria in the DSM-5 to construct the SAGE-SR, a computerized adaptive instrument that uses branching logic from a screener section to administer appropriate follow-up questions to refine the differential diagnoses. The SAGE-SR was administered to healthy controls and outpatient mental health clinic clients to assess test duration and test-retest reliability. Cutoff scores for screening into follow-up diagnostic sections and criteria for inclusion of diagnoses in the differential diagnosis were evaluated. 
Results The expert panel reduced the initial 1200 test items to 664 items that panel members agreed collectively represented the SCID items from the 8 targeted modules and DSM criteria for the covered diagnoses. These 664 items were iteratively submitted to 3 rounds of cognitive interviewing with 50 community mental health center participants; the expert panel reviewed session summaries and agreed on a final set of 661 clear and concise self-report items representing the desired criteria in the DSM-5. The SAGE-SR constructed from this item pool took an average of 14 min to complete in a nonclinical sample versus 24 min in a clinical sample. Responses to individual items can be combined to generate DSM criteria endorsements and differential diagnoses, as well as provide indices of individual symptom severity. Preliminary measures of test-retest reliability in a small, nonclinical sample were promising, with good to excellent reliability for screener items in 11 of 13 diagnostic screening modules (intraclass correlation coefficient [ICC] or kappa coefficients ranging from .60 to .90), with mania achieving fair test-retest reliability (ICC=.50) and other substance use endorsed too infrequently for analysis. Conclusions The SAGE-SR is a computerized adaptive self-report instrument designed to provide rigorous differential diagnostic information to clinicians. PMID:29572204
Predicting Naming Latencies with an Analogical Model

ERIC Educational Resources Information Center

Chandler, Steve

2008-01-01

Skousen's (1989, Analogical modeling of language, Kluwer Academic Publishers, Dordrecht) Analogical Model (AM) predicts behavior such as spelling pronunciation by comparing the characteristics of a test item (a given input word) to those of individual exemplars in a data set of previously encountered items. While AM and other exemplar-based models…
Derivation and Applicability of Asymptotic Results for Multiple Subtests Person-Fit Statistics

PubMed Central

Albers, Casper J.; Meijer, Rob R.; Tendeiro, Jorge N.

2016-01-01

In high-stakes testing, it is important to check the validity of individual test scores. Although a test may, in general, result in valid test scores for most test takers, for some test takers, test scores may not provide a good description of a test taker’s proficiency level. Person-fit statistics have been proposed to check the validity of individual test scores. In this study, the theoretical asymptotic sampling distribution of two person-fit statistics that can be used for tests that consist of multiple subtests is first discussed. Second, a simulation study was conducted to investigate the applicability of this asymptotic theory for tests of finite length, in which the correlation between subtests and the number of items in the subtests were varied. The authors showed that these distributions provide reasonable approximations, even for tests consisting of subtests of only 10 items each. These results have practical value because researchers do not have to rely on extensive simulation studies to simulate sampling distributions. PMID:29881053
Creation and Delivery of New Superpixelized DIRBE Map Products

NASA Technical Reports Server (NTRS)

Weiland, J.

1998-01-01

Phase 1 called for the following tasks: (1) completion of code to generate intermediate files containing the individual DIRBE observations which would be used to make the superpixelized maps; (2) completion of code necessary to generate the maps themselves; and (3) quality control on test-case maps in the form of point-source extraction and photometry. Items 1 and 2 are well in hand and the tested code is nearly complete. A few test maps have been generated for the tests mentioned in item 3. Map generation is not in production mode yet.
Adaptation of the Practice Environment Scale for military nurses: a psychometric analysis.

PubMed

Swiger, Pauline A; Raju, Dheeraj; Breckenridge-Sproat, Sara; Patrician, Patricia A

2017-09-01

The aim of this study was to confirm the psychometric properties of the Practice Environment Scale of the Nursing Work Index in a military population. This study also demonstrates association rule analysis, a contemporary exploratory technique. One of the instruments most commonly used to evaluate the nursing practice environment is the Practice Environment Scale of the Nursing Work Index. Although the instrument has been widely used, the reliability, validity and individual item function are not commonly evaluated. Gaps exist with regard to confirmatory evaluation of the subscale factors, individual item analysis and evaluation in the outpatient setting and with non-registered nursing staff. This was a secondary data analysis of existing survey data. Multiple psychometric methods were applied to survey data collected in 2014. First, descriptive analyses were conducted, including exploration using association rules. Next, internal consistency was tested and confirmatory factor analysis was performed to test the factor structure. The specified factor structure did not hold; therefore, exploratory factor analysis was performed. Finally, item analysis was executed using item response theory. The differential item functioning technique allowed the comparison of responses by care setting and nurse type. The results of this study indicate that responses differ between groups and that several individual items could be removed without altering the psychometric properties of the instrument. The instrument functions moderately well in a military population; however, researchers may want to consider nurse type and care setting during analysis to identify any meaningful variation in responses. © 2017 John Wiley & Sons Ltd.
Item Banks for Substance Use from the Patient-Reported Outcomes Measurement Information System (PROMIS®): Severity of Use and Positive Appeal of Use*

PubMed Central

Pilkonis, Paul A.; Yu, Lan; Dodds, Nathan E.; Johnston, Kelly L.; Lawrence, Suzanne; Hilton, Thomas F.; Daley, Dennis C.; Patkar, Ashwin A.; McCarty, Dennis

2015-01-01

Background Two item banks for substance use were developed as part of the Patient-Reported Outcomes Measurement Information System (PROMIS®): severity of substance use and positive appeal of substance use. Methods Qualitative item analysis (including focus groups, cognitive interviewing, expert review, and item revision) reduced an initial pool of more than 5,300 items for substance use to 119 items included in field testing. Items were written in a first-person, past-tense format, with 5 response options reflecting frequency or severity. Both 30-day and 3-month time frames were tested. The calibration sample of 1,336 respondents included 875 individuals from the general population (ascertained through an internet panel) and 461 patients from addiction treatment centers participating in the National Drug Abuse Treatment Clinical Trials Network. Results Final banks of 37 and 18 items were calibrated for severity of substance use and positive appeal of substance use, respectively, using the two-parameter graded response model from item response theory (IRT). Initial calibrations were similar for the 30-day and 3-month time frames, and final calibrations used data combined across the time frames, making the items applicable with either interval. Seven-item static short forms were also developed from each item bank. Conclusions Test information curves showed that the PROMIS item banks provided substantial information in a broad range of severity, making them suitable for treatment, observational, and epidemiological research in both clinical and community settings. PMID:26423364
Cross-Cultural Adaptation, Validation, and Reliability Testing of the Modified Oswestry Disability Questionnaire in Persian Population with Low Back Pain.

PubMed

Baradaran, Aslan; Ebrahimzadeh, Mohammad H; Birjandinejad, Ali; Kachooei, Amir Reza

2016-04-01

Prospective study. We aimed to validate the Persian version of the modified Oswestry disability questionnaire (MODQ) in patients with low back pain. Modified Oswestry low back pain disability questionnaire is a well-known condition-specific outcome measure that helps quantify disability in patients with lumbar syndromes. To test the validity in a pilot study, the Persian MODQ was administered to 25 individuals with low back pain. We then enrolled 200 consecutive patients with low back pain to fill the Persian MODQ as well as the short form 36 (SF-36) questionnaire. Convergent validity of the MODQ was tested using the Spearman's correlation coefficient between the MODQ and SF-36 subscales. Intraclass correlation coefficient (ICC) and Cronbach's α coefficient were measured to test the reliability between test and retest and internal consistency of all items, respectively. ICC for individual items ranged from 0.43 to 0.80 showing good reliability and reproducibility of each individual item. Cronbach's α coefficient was 0.69 showing good internal consistency across all 10 items of the Persian MODQ. Total MODQ score showed moderate to strong correlation with the eight subscales and the two domains of the SF-36. The highest correlation was between the MODQ and the physical functioning subscale of the SF-36 (r=-0.54, p<0.001) and the physical component domain of the SF-36 (r=-0.55, p<0.001) showing that MODQ is measuring what it is supposed to measure in terms of disability and physical function. Persian version of the MODQ is a valid and reliable tool for the assessment of the disability following low back pain.
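For readers unfamiliar with the internal-consistency statistic reported above, the following is a minimal sketch of Cronbach's α computed from a respondents-by-items score matrix, using the standard formula α = k/(k−1) · (1 − Σ item variances / variance of total scores). The function name and the toy data are hypothetical and are not taken from the study.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    Uses sample variances throughout. Raises ZeroDivisionError if all
    respondents have the same total score.
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("need at least two items")
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Toy data (hypothetical): three respondents, two items.
alpha = cronbach_alpha([[1.0, 2.0], [2.0, 1.0], [3.0, 3.0]])
```

When every item ranks respondents identically, α reaches 1; weaker agreement among items lowers it.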
Transitive inference in adults with autism spectrum disorders

PubMed Central

Solomon, Marjorie; Frank, Michael J.; Smith, Anne C.; Ly, Stanford; Carter, Cameron S.

2012-01-01

Individuals with autism spectrum disorders (ASDs) exhibit intact rote learning with impaired generalization. A transitive inference paradigm, involving training on four sequentially presented stimulus pairs containing overlapping items, with subsequent testing on two novel pairs, was used to investigate this pattern of learning in 27 young adults with ASDs and 31 matched neurotypical individuals (TYPs). On the basis of findings about memory and neuropathology, we hypothesized that individuals with ASDs would use a relational flexibility/conjunctive strategy reliant on an intact hippocampus, versus an associative strength/value transfer strategy requiring intact interactions between the prefrontal cortex and the striatum. Hypotheses were largely confirmed. ASDs demonstrated reduced interference from intervening pairs in early training; only TYPs formed a serial position curve by test; and ASDs exhibited impairments on the novel test pair consisting of end items with intact performance on the inner test pair. However, comparable serial position curves formed for both groups by the end of the first block. PMID:21656344
Modeling individualized coefficient alpha to measure quality of test score data.

PubMed

Liu, Molei; Hu, Ming; Zhou, Xiao-Hua

2018-05-23

Individualized coefficient alpha is defined. It is item and subject specific and is used to measure the quality of test score data with heterogeneity among the subjects and items. A regression model is developed based on 3 sets of generalized estimating equations. The first set of generalized estimating equations models the expectation of the responses, the second set models the response's variance, and the third set is proposed to estimate the individualized coefficient alpha, defined and used to measure individualized internal consistency of the responses. We also use different techniques to extend our method to handle missing data. Asymptotic properties of the estimators are discussed, based on which inference on the coefficient alpha is derived. Performance of our method is evaluated through a simulation study and real data analysis. The real data application is from a health literacy study in Hunan province of China. Copyright © 2018 John Wiley & Sons, Ltd.
Analysis instrument test on mathematical power the material geometry of space flat side for grade 8

NASA Astrophysics Data System (ADS)

Kusmaryono, Imam; Suyitno, Hardi; Dwijanto; Karomah, Nur

2017-08-01

The aim of this research was to determine the quality of test items on flat-sided solid geometry used to assess students' mathematical power. The method used was quantitative descriptive. The subjects were 20 grade 8 students. The object of research was the quality of the test items in terms of mathematical power: validity, reliability, level of difficulty, and discriminating power. The mathematical power instruments tested included written tests and questionnaires about mathematical disposition. Data were obtained from the field in the form of test data on flat-sided solid geometry and questionnaire responses. The results indicate that the reliability of the test items is influenced by many factors: the number of items, the homogeneity of the test questions, the time required, the uniformity of conditions for the test takers, the homogeneity of the group, the variability of the problems, and the motivation of the individual taking the test. Overall, the evaluation concluded that the test instrument can be used as a tool to measure students' mathematical power.
Development of an item bank for computerized adaptive test (CAT) measurement of pain.

PubMed

Petersen, Morten Aa; Aaronson, Neil K; Chie, Wei-Chu; Conroy, Thierry; Costantini, Anna; Hammerlid, Eva; Hjermstad, Marianne J; Kaasa, Stein; Loge, Jon H; Velikova, Galina; Young, Teresa; Groenvold, Mogens

2016-01-01

Patient-reported outcomes should ideally be adapted to the individual patient while maintaining comparability of scores across patients. This is achievable using computerized adaptive testing (CAT). The aim here was to develop an item bank for CAT measurement of the pain domain as measured by the EORTC QLQ-C30 questionnaire. The development process consisted of four steps: (1) literature search, (2) formulation of new items and expert evaluations, (3) pretesting and (4) field-testing and psychometric analyses for the final selection of items. In step 1, we identified 337 pain items from the literature. Twenty-nine new items fitting the QLQ-C30 item style were formulated in step 2 and reduced to 26 items by expert evaluations. Based on interviews with 31 patients from Denmark, France and the UK, the list was further reduced to 21 items in step 3. In step 4, responses were obtained from 1103 cancer patients from five countries. Psychometric evaluations showed that 16 items could be retained in a unidimensional item bank. Evaluations indicated that use of the CAT measure may reduce sample size requirements by 15-25% compared to using the QLQ-C30 pain scale. We have established an item bank of 16 items suitable for CAT measurement of pain. While being backward compatible with the QLQ-C30, the new item bank will significantly improve measurement precision of pain. We recommend initiating CAT measurement by screening for pain using the two original QLQ-C30 pain items. The EORTC pain CAT is currently available for "experimental" purposes.
34 CFR 462.3 - What definitions apply?

Code of Federal Regulations, 2012 CFR

2012-07-01

... items across pre- and post-testing. Test administrator means an individual who is trained to administer... instructional time a student needs before post-testing. Violation of these protocols often invalidates the test... defined in the Act. Test means a standardized test, assessment, or instrument that has a formal protocol...
34 CFR 462.3 - What definitions apply?

Code of Federal Regulations, 2010 CFR

2010-07-01

... items across pre- and post-testing. Test administrator means an individual who is trained to administer... instructional time a student needs before post-testing. Violation of these protocols often invalidates the test... defined in the Act. Test means a standardized test, assessment, or instrument that has a formal protocol...
34 CFR 462.3 - What definitions apply?

Code of Federal Regulations, 2011 CFR

2011-07-01

... items across pre- and post-testing. Test administrator means an individual who is trained to administer... instructional time a student needs before post-testing. Violation of these protocols often invalidates the test... defined in the Act. Test means a standardized test, assessment, or instrument that has a formal protocol...
34 CFR 462.3 - What definitions apply?

Code of Federal Regulations, 2014 CFR

2014-07-01

... items across pre- and post-testing. Test administrator means an individual who is trained to administer... instructional time a student needs before post-testing. Violation of these protocols often invalidates the test... defined in the Act. Test means a standardized test, assessment, or instrument that has a formal protocol...
34 CFR 462.3 - What definitions apply?

Code of Federal Regulations, 2013 CFR

2013-07-01

... items across pre- and post-testing. Test administrator means an individual who is trained to administer... instructional time a student needs before post-testing. Violation of these protocols often invalidates the test... defined in the Act. Test means a standardized test, assessment, or instrument that has a formal protocol...

The methodological quality of diagnostic test accuracy studies for musculoskeletal conditions can be improved.

PubMed

Henschke, Nicholas; Keuerleber, Julia; Ferreira, Manuela; Maher, Christopher G; Verhagen, Arianne P

2014-04-01

To provide an overview of reporting and methodological quality in diagnostic test accuracy (DTA) studies in the musculoskeletal field and evaluate the use of the QUality Assessment of Diagnostic Accuracy Studies (QUADAS) checklist. A literature review identified all systematic reviews that evaluated the accuracy of clinical tests to diagnose musculoskeletal conditions and used the QUADAS checklist. Two authors screened all identified reviews and extracted data on the target condition, index tests, reference standard, included studies, and QUADAS items. A descriptive analysis of the QUADAS checklist was performed, along with Rasch analysis to examine the construct validity and internal reliability. A total of 19 systematic reviews were included, which provided data on individual items of the QUADAS checklist for 392 DTA studies. In the musculoskeletal field, uninterpretable or intermediate test results are commonly not reported, with 175 (45%) studies scoring "no" to this item. The proportion of studies fulfilling certain items varied from 22% (item 11) to 91% (item 3). The interrater reliability of the QUADAS checklist was good and Rasch analysis showed excellent construct validity and internal consistency. This overview identified areas where the reporting and performance of diagnostic studies within the musculoskeletal field can be improved. Copyright © 2014 Elsevier Inc. All rights reserved.
Evaluation of adding item-response theory analysis for evaluation of the European Board of Ophthalmology Diploma examination.

PubMed

Mathysen, Danny G P; Aclimandos, Wagih; Roelant, Ella; Wouters, Kristien; Creuzot-Garcher, Catherine; Ringens, Peter J; Hawlina, Marko; Tassignon, Marie-José

2013-11-01

To investigate whether introduction of item-response theory (IRT) analysis, in parallel to the 'traditional' statistical analysis methods available for performance evaluation of multiple T/F items as used in the European Board of Ophthalmology Diploma (EBOD) examination, has proved beneficial, and secondly, to study whether the overall assessment performance of the current written part of EBOD is sufficiently high (KR-20 ≥ 0.90) to be kept as examination format in future EBOD editions. 'Traditional' analysis methods for individual MCQ item performance comprise P-statistics, Rit-statistics and item discrimination, while overall reliability is evaluated through KR-20 for multiple T/F items. The additional set of statistical analysis methods for the evaluation of EBOD comprises mainly IRT analysis. These analysis techniques are used to monitor whether the introduction of negative marking for incorrect answers (since EBOD 2010) has a positive influence on the statistical performance of EBOD as a whole and its individual test items in particular. Item-response theory analysis demonstrated that item performance parameters should not be evaluated individually, but should be related to one another. Before the introduction of negative marking, the overall EBOD reliability (KR-20) was good though with room for improvement (EBOD 2008: 0.81; EBOD 2009: 0.78). After the introduction of negative marking, the overall reliability of EBOD improved significantly (EBOD 2010: 0.92; EBOD 2011: 0.91; EBOD 2012: 0.91). Although many statistical performance parameters are available to evaluate individual items, our study demonstrates that the overall reliability assessment remains the only crucial parameter to be evaluated allowing comparison. While individual item performance analysis is worthwhile to undertake as secondary analysis, drawing final conclusions seems to be more difficult. Performance parameters need to be related, as shown by IRT analysis.
Therefore, IRT analysis has proved beneficial for the statistical analysis of EBOD. Introduction of negative marking has led to a significant increase in the reliability (KR-20 > 0.90), indicating that the current examination format can be kept for future EBOD examinations. © 2013 Acta Ophthalmologica Scandinavica Foundation. Published by John Wiley & Sons Ltd.
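The KR-20 reliability coefficient cited in this abstract can be sketched as follows for plain dichotomous (0/1) scoring, using KR-20 = k/(k−1) · (1 − Σ pᵢ(1−pᵢ) / σ²), where pᵢ is the proportion answering item i correctly and σ² is the population variance of total scores. This is a simplified illustration under that assumption: it does not model the negative marking used in EBOD since 2010, and the function name and data are hypothetical.

```python
from statistics import pvariance

def kr20(responses):
    """Kuder-Richardson 20 for a list of examinees, each a list of k 0/1 scores.

    Uses population variance for the total scores, matching the p*(1-p)
    item variances. Raises ZeroDivisionError if all totals are equal.
    """
    k = len(responses[0])
    n = len(responses)
    if k < 2:
        raise ValueError("need at least two items")
    # Proportion of examinees answering each item correctly.
    p = [sum(row[i] for row in responses) / n for i in range(k)]
    sum_pq = sum(pi * (1 - pi) for pi in p)
    total_var = pvariance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum_pq / total_var)

# Toy data (hypothetical): four examinees, two items answered consistently.
consistent = kr20([[1, 1], [0, 0], [1, 1], [0, 0]])
```

For dichotomous items KR-20 coincides with Cronbach's α computed with population variances, which is why the two statistics are often reported interchangeably for true/false examinations.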
Measuring grief and loss after spinal cord injury: Development, validation and psychometric characteristics of the SCI-QOL Grief and Loss item bank and short form

PubMed Central

Kalpakjian, Claire Z.; Tulsky, David S.; Kisala, Pamela A.; Bombardier, Charles H.

2015-01-01

Objective To develop an item response theory (IRT) calibrated Grief and Loss item bank as part of the Spinal Cord Injury – Quality of Life (SCI-QOL) measurement system. Design A literature review guided framework development of grief/loss. New items were created from focus groups. Items were revised based on expert review and patient feedback and were then field tested. Analyses included confirmatory factor analysis (CFA), graded response IRT modeling and evaluation of differential item functioning (DIF). Setting We tested a 20-item pool at several rehabilitation centers across the United States, including the University of Michigan, Kessler Foundation, Rehabilitation Institute of Chicago, the University of Washington, Craig Hospital and the James J. Peters/Bronx Department of Veterans Affairs hospital. Participants A total of 717 individuals with SCI answered the grief and loss questions. Results The final calibrated item bank resulted in 17 retained items. A unidimensional model was observed (CFI = 0.976; RMSEA = 0.078) and measurement precision was good (theta range from −1.48 to 2.48). Ten items were flagged for DIF; however, examination of effect sizes found the DIF to be negligible, with little practical impact on score estimates. Conclusions This study indicates that the SCI-QOL Grief and Loss item bank represents a psychometrically robust measurement tool. Short form items are also suggested and computer adaptive tests are available. PMID:26010969
Social Withdrawal Among Individuals Receiving Psychiatric Care: Derivation of a Scale Using Routine Clinical Assessment Data to Support Screening and Outcome Measurement.

PubMed

Rios, Sebastian; Perlman, Christopher M

2017-04-24

Social withdrawal is a symptom experienced by individuals with an array of mental health conditions, particularly those with schizophrenia and mood disorders. Assessments of social withdrawal are often lengthy and may not be routinely integrated within the comprehensive clinical assessment of the individual. This study utilized item response and classical test theory methods to derive a Social Withdrawal Scale (SWS) using items embedded within a routine clinical assessment, the RAI-Mental Health (RAI-MH). Using data from 60,571 inpatients in Ontario, Canada, a common factor analysis identified seven items from the RAI-MH that measure social withdrawal. A graded response model found that six items had acceptable discrimination parameters: lack of motivation, reduced interaction, decreased energy, flat affect, anhedonia, and loss of interest. Summing these items, the SWS was found to have strong internal consistency (Cronbach's alpha = 0.82) and showed a medium to large effect size (d = 0.77) from admission to discharge. Fewer individuals with high SWS scores participated in social activity or reported having a confidant compared to those with lower scores. Since the RAI-MH is available across clinical subgroups in several jurisdictions, the SWS is a useful tool for screening, clinical decision support, and evaluation.
A large-scale, long-term study of scale drift: The micro view and the macro view

NASA Astrophysics Data System (ADS)

He, W.; Li, S.; Kingsbury, G. G.

2016-11-01

The development of measurement scales for use across years and grades in educational settings provides unique challenges, as instructional approaches, instructional materials, and content standards all change periodically. This study examined the measurement stability of a set of Rasch measurement scales that have been in place for almost 40 years. In order to investigate the stability of these scales, item responses were collected from a large set of students who took operational adaptive tests using items calibrated to the measurement scales. For the four scales that were examined, item samples ranged from 2183 to 7923 items. Each item was administered to at least 500 students in each grade level, resulting in approximately 3000 responses per item. Stability was examined at the micro level analysing change in item parameter estimates that have occurred since the items were first calibrated. It was also examined at the macro level, involving groups of items and overall test scores for students. Results indicated that individual items had changes in their parameter estimates, which require further analysis and possible recalibration. At the same time, the results at the total score level indicate substantial stability in the measurement scales over the span of their use.
Development and validation of a brief screening instrument for psychosocial risk associated with genetic testing: a pan-Canadian cohort study

PubMed Central

Esplen, Mary Jane; Cappelli, Mario; Wong, Jiahui; Bottorff, Joan L; Hunter, Jon; Carroll, June; Dorval, Michel; Wilson, Brenda; Allanson, Judith; Semotiuk, Kara; Aronson, Melyssa; Bordeleau, Louise; Charlemagne, Nicole; Meschino, Wendy

2013-01-01

Objectives To develop a brief, reliable and valid instrument to screen psychosocial risk among those who are undergoing genetic testing for Adult-Onset Hereditary Disease (AOHD). Design A prospective two-phase cohort study. Setting 5 genetic testing centres for AOHD, such as cancer, Huntington's disease or haemochromatosis, in ambulatory clinics of tertiary hospitals across Canada. Participants 141 individuals undergoing genetic testing were approached and consented to the instrument development phase of the study (Phase I). The Genetic Psychosocial Risk Instrument (GPRI) developed in Phase I was tested in Phase II for item refinement and validation. A separate cohort of 722 individuals consented to the study, 712 completed the baseline package and 463 completed all follow-up assessments. Most participants were female, at the mid-life stage. Individuals in advanced stages of the illness or with cognitive impairment or a language barrier were excluded. Interventions Phase I: GPRI items were generated from (1) a review of the literature, (2) input from genetic counsellors and (3) phase I participants. Phase II: further item refinement and validation were conducted with a second cohort of participants who completed the GPRI at baseline and were followed for psychological distress 1-month postgenetic testing results. Primary and secondary outcome measures GPRI, Hamilton Depression Rating Scale (HAM-D), Hamilton Anxiety Rating Scale (HAM-A), Brief Symptom Inventory (BSI) and Impact of Event Scale (IES). Results The final 20-item GPRI had a high reliability—Cronbach's α at 0.81. The construct validity was supported by high correlations between GPRI and BSI and IES. The predictive value was demonstrated by a receiver operating characteristic curve of 0.78 plotting GPRI against follow-up assessments using HAM-D and HAM-A. 
Conclusions With a cut-off score of 50, GPRI identified 84% of participants who displayed distress postgenetic testing results, supporting its potential usefulness in a clinical setting. PMID:23485718
Measuring Access to Information and Technology: Environmental Factors Affecting Persons With Neurologic Disorders.

PubMed

Hahn, Elizabeth A; Garcia, Sofia F; Lai, Jin-Shei; Miskovic, Ana; Jerousek, Sara; Semik, Patrick; Wong, Alex; Heinemann, Allen W

2016-08-01

To develop and validate a patient-reported measure of access to information and technology (AIT) for persons with spinal cord injury, stroke, or traumatic brain injury. A mixed-methods approach was used to develop items, refine them through cognitive interviews, and evaluate their psychometric properties. Item responses were evaluated with the Rasch rating scale model. Correlational and analysis-of-variance methods were used to evaluate construct validity. Community-dwelling individuals participated in telephone interviews or traveled to the academic medical centers where this research took place. Individuals with a diagnosis of spinal cord injury, stroke, or traumatic brain injury (aged ≥18y, English speaking) participated in cognitive interviews (n=12 persons), field testing of the items (n=305 persons), and validation testing of the final set of items (n=604 persons). Not applicable. A set of items to measure AIT for people with disabilities. A user-friendly multimedia touchscreen was used for self-administration of the items. A 23-item AIT measure demonstrated good evidence of internal consistency reliability, and content and construct validity. This new AIT measure will enable researchers and clinicians to determine to what extent environmental factors influence health outcomes and social participation in people with disabilities. The AIT measure could also provide disability advocates with more specific and detailed information about environmental factors to lobby for elimination of barriers. Copyright © 2016 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
Item response theory and the measurement of motor behavior.

PubMed

Safrit, M J; Cohen, A S; Costa, M G

1989-12-01

Item response theory (IRT) has been the focus of intense research and development activity in educational and psychological measurement during the past decade. Because this theory can provide more precise information about test items than other theories usually used in measuring motor behavior, the application of IRT in physical education and exercise science merits investigation. In IRT, the difficulty level of each item (e.g., trial or task) can be estimated and placed on the same scale as the ability of the examinee. Using this information, the test developer can determine the ability levels at which the test functions best. Equating the scores of individuals on two or more items or tests can be handled efficiently by applying IRT. The precision of the identification of performance standards in a mastery test context can be enhanced, as can adaptive testing procedures. In this tutorial, several potential benefits of applying IRT to the measurement of motor behavior were described. An example is provided using bowling data and applying the graded-response form of the Rasch IRT model. The data were calibrated and the goodness of fit was examined. This analysis is described in a step-by-step approach. Limitations to using an IRT model with a test consisting of repeated measures were noted.
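As a hedged illustration of the statement that item difficulty and examinee ability are placed on the same scale, the sketch below uses the simplest dichotomous Rasch model, not the graded-response form applied to the bowling data in the tutorial. The function names are illustrative assumptions and do not come from the paper.

```python
import math


def rasch_probability(ability: float, difficulty: float) -> float:
    """Probability of success under the dichotomous Rasch model.

    Ability and difficulty are on the same logit scale, so only their
    difference matters.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def item_information(ability: float, difficulty: float) -> float:
    """Fisher information of one Rasch item at a given ability level."""
    p = rasch_probability(ability, difficulty)
    return p * (1.0 - p)


if __name__ == "__main__":
    # An item is most informative where ability equals its difficulty.
    for theta in (-2.0, 0.0, 2.0):
        print(theta, rasch_probability(theta, 0.0), item_information(theta, 0.0))
```

Under these assumptions, information peaks where ability equals difficulty, which is one way to read the abstract's remark that a developer can determine the ability levels at which a test functions best.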
Development and Initial Validation of Military Deployment-Related TBI Quality-of-Life Item Banks.

PubMed

Toyinbo, Peter A; Vanderploeg, Rodney D; Donnell, Alison J; Mutolo, Sandra A; Cook, Karon F; Kisala, Pamela A; Tulsky, David S

2016-01-01

To investigate unique factors that affect health-related quality of life (QOL) in individuals with military deployment-related traumatic brain injury (MDR-TBI) and to develop appropriate assessment tools, consistent with the TBI-QOL/PROMIS/Neuro-QOL systems. Three focus groups from each of the 4 Veterans Administration (VA) Polytrauma Rehabilitation Centers, consisting of 20 veterans with mild to severe MDR-TBI, and 36 VA providers were involved in the early stage of new item bank development. The item banks were field tested in a sample (N = 485) of veterans enrolled in VA and diagnosed with an MDR-TBI. Focus groups and survey. Developed item banks and short forms for Guilt, Posttraumatic Stress Disorder/Trauma, and Military-Related Loss. Three new item banks representing unique domains of MDR-TBI health outcomes were created: 15 new Posttraumatic Stress Disorder items plus 16 SCI-QOL legacy Trauma items, 37 new Military-Related Loss items plus 18 TBI-QOL legacy Grief/Loss items, and 33 new Guilt items. Exploratory and confirmatory factor analyses plus bifactor analysis of the items supported sufficient unidimensionality of the new item pools. Convergent and discriminant analysis results, as well as known-group comparisons, provided initial support for the validity and clinical utility of the new item response theory-calibrated item banks and their short forms. This work provides a unique opportunity to identify issues specific to individuals with MDR-TBI and ensure that they are captured in QOL assessment, thus extending the existing TBI-QOL measurement system.
Development and psychometric testing of a barriers to HIV testing scale among individuals with HIV infection in Sweden; The Barriers to HIV testing scale-Karolinska version.

PubMed

Wiklander, Maria; Brännström, Johanna; Svedhem, Veronica; Eriksson, Lars E

2015-11-19

Barriers to HIV testing experienced by individuals at risk for HIV can result in treatment delay and further transmission of the disease. Instruments to systematically measure barriers are scarce, but could contribute to improved strategies for HIV testing. Aims of this study were to develop and test a barriers to HIV testing scale in a Swedish context. An 18-item scale was developed, based on an existing scale with addition of six new items related to fear of the disease or negative consequences of being diagnosed as HIV-infected. Items were phrased as statements about potential barriers with a three-point response format representing not important, somewhat important, and very important. The scale was evaluated regarding missing values, floor and ceiling effects, exploratory factor analysis, and internal consistencies. The questionnaire was completed by 292 adults recently diagnosed with HIV infection, of whom 7 were excluded (≥9 items missing) and 285 were included (≥12 items completed) in the analyses. The participants were 18-70 years old (mean 40.5, SD 11.5), 39 % were females and 77 % born outside Sweden. Routes of transmission were heterosexual transmission 63 %, male to male sex 20 %, intravenous drug use 5 %, blood product/transfusion 2 %, and unknown 9 %. All scale items had <3 % missing values. The data were suitable for factor analysis (KMO = 0.92) and a four-factor solution was chosen, based on level of explained common variance (58.64 %) and interpretability of factor structure. The factors were interpreted as personal consequences, structural barriers, social and economic security, and confidentiality. Ratings on the minimum level (suggested barrier not important) were common, resulting in substantial floor effects on the scales. The scales were internally consistent (Cronbach's α 0.78-0.91). This study gives preliminary evidence of the scale being feasible, reliable and valid to identify different types of barriers to HIV testing.
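The internal-consistency figures above are Cronbach's α values. A minimal sketch of the standard formula follows, assuming complete responses coded numerically (for example 1 to 3 for the three-point format); the function name and the toy data are hypothetical and are not taken from the study.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondent rows (one column per item).

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    n_items = len(responses[0])
    # Sample variance of each item across respondents.
    item_variances = [
        variance([row[i] for row in responses]) for i in range(n_items)
    ]
    # Sample variance of each respondent's summed score.
    total_variance = variance([sum(row) for row in responses])
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


if __name__ == "__main__":
    # Hypothetical answers from three respondents to two items.
    toy = [[1, 2], [2, 1], [3, 3]]
    print(cronbach_alpha(toy))
```

The sketch does not handle missing answers, which the study addressed by excluding questionnaires with too many missing items.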
Rapid and Accurate Behavioral Health Diagnostic Screening: Initial Validation Study of a Web-Based, Self-Report Tool (the SAGE-SR).

PubMed

Brodey, Benjamin; Purcell, Susan E; Rhea, Karen; Maier, Philip; First, Michael; Zweede, Lisa; Sinisterra, Manuela; Nunn, M Brad; Austin, Marie-Paule; Brodey, Inger S

2018-03-23

The Structured Clinical Interview for DSM (SCID) is considered the gold standard assessment for accurate, reliable psychiatric diagnoses; however, because of its length, complexity, and training required, the SCID is rarely used outside of research. This paper aims to describe the development and initial validation of a Web-based, self-report screening instrument (the Screening Assessment for Guiding Evaluation-Self-Report, SAGE-SR) based on the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5) and the SCID-5-Clinician Version (CV) intended to make accurate, broad-based behavioral health diagnostic screening more accessible within clinical care. First, study staff drafted approximately 1200 self-report items representing individual granular symptoms in the diagnostic criteria for the 8 primary SCID-CV modules. An expert panel iteratively reviewed, critiqued, and revised items. The resulting items were iteratively administered and revised through 3 rounds of cognitive interviewing with community mental health center participants. In the first 2 rounds, the SCID was also administered to participants to directly compare their Likert self-report and SCID responses. A second expert panel evaluated the final pool of items from cognitive interviewing and criteria in the DSM-5 to construct the SAGE-SR, a computerized adaptive instrument that uses branching logic from a screener section to administer appropriate follow-up questions to refine the differential diagnoses. The SAGE-SR was administered to healthy controls and outpatient mental health clinic clients to assess test duration and test-retest reliability. Cutoff scores for screening into follow-up diagnostic sections and criteria for inclusion of diagnoses in the differential diagnosis were evaluated. 
The expert panel reduced the initial 1200 test items to 664 items that panel members agreed collectively represented the SCID items from the 8 targeted modules and DSM criteria for the covered diagnoses. These 664 items were iteratively submitted to 3 rounds of cognitive interviewing with 50 community mental health center participants; the expert panel reviewed session summaries and agreed on a final set of 661 clear and concise self-report items representing the desired criteria in the DSM-5. The SAGE-SR constructed from this item pool took an average of 14 min to complete in a nonclinical sample versus 24 min in a clinical sample. Responses to individual items can be combined to generate DSM criteria endorsements and differential diagnoses, as well as provide indices of individual symptom severity. Preliminary measures of test-retest reliability in a small, nonclinical sample were promising, with good to excellent reliability for screener items in 11 of 13 diagnostic screening modules (intraclass correlation coefficient [ICC] or kappa coefficients ranging from .60 to .90), with mania achieving fair test-retest reliability (ICC=.50) and other substance use endorsed too infrequently for analysis. The SAGE-SR is a computerized adaptive self-report instrument designed to provide rigorous differential diagnostic information to clinicians. ©Benjamin Brodey, Susan E Purcell, Karen Rhea, Philip Maier, Michael First, Lisa Zweede, Manuela Sinisterra, M Brad Nunn, Marie-Paule Austin, Inger S Brodey. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 23.03.2018.
Content range and precision of a computer adaptive test of upper extremity function for children with cerebral palsy.

PubMed

Montpetit, Kathleen; Haley, Stephen; Bilodeau, Nathalie; Ni, Pengsheng; Tian, Feng; Gorton, George; Mulcahey, M J

2011-02-01

This article reports on the content range and measurement precision of an upper extremity (UE) computer adaptive testing (CAT) platform of physical function in children with cerebral palsy. Upper extremity items representing skills of all abilities were administered to 305 parents. These responses were compared with two traditional standardized measures: Pediatric Outcomes Data Collection Instrument and Functional Independence Measure for Children. The UE CAT correlated strongly with the upper extremity component of these measures and had greater precision when describing individual functional ability. The UE item bank has wider range with items populating the lower end of the ability spectrum. This new UE item bank and CAT have the capability to quickly assess children of all ages and abilities with good precision and, most importantly, with items that are meaningful and appropriate for their age and level of physical function.
Trading Up: Chimpanzees (Pan troglodytes) Show Self-Control Through Their Exchange Behavior

PubMed Central

Beran, Michael J.; Rossettie, Mattea S.; Parrish, Audrey E.

2015-01-01

Self-control is defined as the ability or capacity to obtain an objectively more valuable outcome rather than an objectively less valuable outcome through tolerating a longer delay or a greater effort requirement (or both) in obtaining that more valuable outcome. A number of tests have been devised to assess self-control in nonhuman animals, including exchange tasks. In this study, three chimpanzees (Pan troglodytes) participated in a delay of gratification task that required food exchange as the behavioral response that reflected self-control. The chimpanzees were offered opportunities to inhibit eating and instead exchange a currently possessed food item for a different (and sometimes better) item, often needing to exchange several food items before obtaining the highest-valued reward. We manipulated reward type, reward size, reward visibility, delay to exchange, and location of the highest-valued reward in the sequence of exchange events to compare performance within the same individuals. The chimpanzees successfully traded until obtaining the best item in most cases, although there were individual differences among participants in some variations of the test. These results support the idea that self-control is robust in chimpanzees even in contexts in which they perhaps anticipate future rewards and sustain delay of gratification until they can obtain the ultimately most-valuable item. PMID:26325355
Trading up: chimpanzees (Pan troglodytes) show self-control through their exchange behavior.

PubMed

Beran, Michael J; Rossettie, Mattea S; Parrish, Audrey E

2016-01-01

Self-control is defined as the ability or capacity to obtain an objectively more valuable outcome rather than an objectively less valuable outcome through tolerating a longer delay or a greater effort requirement (or both) in obtaining that more valuable outcome. A number of tests have been devised to assess self-control in non-human animals, including exchange tasks. In this study, three chimpanzees (Pan troglodytes) participated in a delay of gratification task that required food exchange as the behavioral response that reflected self-control. The chimpanzees were offered opportunities to inhibit eating and instead exchange a currently possessed food item for a different (and sometimes better) item, often needing to exchange several food items before obtaining the highest valued reward. We manipulated reward type, reward size, reward visibility, delay to exchange, and location of the highest valued reward in the sequence of exchange events to compare performance within the same individuals. The chimpanzees successfully traded until obtaining the best item in most cases, although there were individual differences among participants in some variations of the test. These results support the idea that self-control is robust in chimpanzees even in contexts in which they perhaps anticipate future rewards and sustain delay of gratification until they can obtain the ultimately most valuable item.
An Evaluation of a New Method of IRT Scaling

ERIC Educational Resources Information Center

Ragland, Shelley

2010-01-01

In order to be able to fairly compare scores derived from different forms of the same test within the Item Response Theory framework, all individual item parameters must be on the same scale. A new approach, the RPA method, which is based on transformations of predicted score distributions was evaluated here and was shown to produce results…
Computerized Adaptive Testing: An Overview and an Example.

ERIC Educational Resources Information Center

McBride, James R.

The advantages of computerized adaptive testing are discussed, and an example illustrates its use in sixth grade mathematics. These tests are administered at a computer terminal, and the test items to be administered are selected according to the difficulty level appropriate to the individual's ability. Tailoring increases the psychometric…
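The selection principle described above, choosing items whose difficulty suits the individual's ability, can be sketched as follows. This is a minimal illustration under a Rasch-style assumption (the most informative item is the one whose difficulty is nearest the current ability estimate); it is not the procedure from the report, and all names and values are hypothetical.

```python
def select_next_item(ability_estimate, item_difficulties, administered):
    """Return the id of the unused item whose difficulty is closest to the
    current ability estimate, or None when every item has been used.

    item_difficulties: dict mapping item id -> difficulty (same scale as ability)
    administered: set of item ids already presented to the examinee
    """
    candidates = {
        item_id: difficulty
        for item_id, difficulty in item_difficulties.items()
        if item_id not in administered
    }
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda item_id: abs(candidates[item_id] - ability_estimate),
    )


if __name__ == "__main__":
    # Hypothetical item bank with difficulties on a logit scale.
    bank = {"easy": -1.0, "medium": 0.2, "hard": 1.5}
    print(select_next_item(0.0, bank, administered=set()))
```

A full adaptive test would also re-estimate ability after each response and apply a stopping rule; both are omitted here.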
Optimising mobility outcome measures in Huntington's disease.

PubMed

Busse, Monica; Quinn, Lori; Khalil, Hanan; McEwan, Kirsten

2014-01-01

Many of the performance-based mobility measures that are currently used in Huntington's disease (HD) were developed for assessment in other neurological conditions such as stroke. We aimed to assess the individual item-response of commonly used performance-based mobility measures, with a view to optimizing the scales for specific application in Huntington's Disease (HD). Data from a larger multicentre, observational study were used. Seventy-five people with HD (11 pre-manifest & 64 manifest) were assessed on the Six-Minute Walk Test, 10-Meter Walk Test, Timed "Up & Go" Test (TUG), Berg Balance Scale (BBS), Physical Performance Test (PPT), Four Square Step Test, and Tinetti Mobility Test (TMT). The Unified Huntington's Disease Rating Scale (UHDRS) Total Motor Score, Functional Assessment Scale and Total Functional Capacity scores were recorded, alongside cognitive measures. Standard regression analysis was used to assess predictive validity. Individual item responses were investigated using a sequence of approaches to allow for gradual removal of items and the subsequent creation of shortened versions. Psychometric properties (reliability and discriminant ability) of the shortened scales were assessed. TUG (β 0.46, CI 0.20-3.47), BBS (β -0.35, CI -2.10-0.14), and TMT (β -0.45, CI -3.14-0.64) were good disease-specific mobility measures. PPT was the best measure of functional performance (β 0.42, CI 0.00-0.43 for TFC & β 0.57 CI 0.15-0.81 for FAS). Shortened versions of BBS and TMT were developed based on item analysis. The resultant BBS and TMT shortened scales were reliable for use in manifest HD. ROC analysis showed that shortened scales were able to discriminate between manifest and pre-manifest disease states. Our data suggests that the PPT is appropriate as a general measure of function in individuals with HD, and we have identified shortened versions of the BBS and TMT that measure the unique gait and balance impairments in HD. 
These scales, alongside the TUG, may therefore be important measures to consider in future clinical trials.
Performance of Certification and Recertification Examinees on Multiple Choice Test Items: Does Physician Age Have an Impact?

PubMed

Shen, Linjun; Juul, Dorthea; Faulkner, Larry R

2016-01-01

The development of recertification programs (now referred to as Maintenance of Certification or MOC) by the members of the American Board of Medical Specialties provides the opportunity to study knowledge base across the professional lifespan of physicians. Research results to date are mixed with some studies finding negative associations between age and various measures of competency and others finding no or minimal relationships. Four groups of multiple choice test items that were independently developed for certification and MOC examinations in psychiatry and neurology were administered to certification and MOC examinees within each specialty. Percent correct scores were calculated for each examinee. Differences between certification and MOC examinees were compared using unpaired t tests, and logistic regression was used to compare MOC and certification examinee performance on the common test items. Except for the neurology certification test items that addressed basic neurology concepts, the performance of the certification and MOC examinees was similar. The differences in performance on individual test items did not consistently favor one group or the other and could not be attributed to any distinguishable content or format characteristics of those items. The findings of this study are encouraging in that physicians who had recently completed residency training possessed clinical knowledge that was comparable to that of experienced physicians, and the experienced physicians' clinical knowledge was equivalent to that of recent residency graduates. The role testing can play in enhancing expertise is described.
Spatial transposition gradients in visual working memory.

PubMed

Rerko, Laura; Oberauer, Klaus; Lin, Hsuan-Yu

2014-01-01

In list memory, access to individual items reflects limits of temporal distinctiveness. This is reflected in the finding that neighbouring list items tend to be confused most often. This article investigates the analogous effect of spatial proximity in a visual working-memory task. Items were presented in different locations varying in spatial distance. A retro-cue indicated the location of the item relevant for the subsequent memory test. In two recognition experiments, probes matching spatially close neighbours of the relevant item led to more false alarms than probes matching distant neighbours or non-neighbouring memory items. In two probed-recall experiments, one with simultaneous, the other with sequential memory item presentation, items closer to the cued location were more frequently chosen for recall than more distant items. These results reflect a spatial transposition gradient analogous to the temporal transposition gradient in serial recall and challenge fixed-capacity models of visual working memory (WM).
Tests for Adult Basic Education Teachers. "28 Suggestions for Classroom Teachers".

ERIC Educational Resources Information Center

Vonderhaar, Kathleen; And Others

An updated and improved listing of test and measurement items useful in Adult Basic Education Classrooms is provided. Diagnostic, placement, achievement, and group and individual intelligence tests are reviewed. Information on test type and purpose, appropriate grade level, test time, number of forms, the manual, scoring, and format is included.…

The language of science and the high school student: The recognition of concept definitions: A comparison between Hindi speaking students in India and English speaking students in Australia

NASA Astrophysics Data System (ADS)

Lynch, P. P.; Chipman, H. H.; Pachaury, A. C.

Sixteen concept words (mass, length, area, volume, solid, liquid, gas, element, compound, mixture, electron, proton, neutron, atom, molecule, and ion) associated with the theme "the nature of matter" were described as simple textbook definitions after examination of classroom notes and school texts of the last three decades. Sixteen multiple-choice items, all of the same form, were constructed for each of the concept definitions. The English version of the sixteen-item test was given to 1635 high school students in Tasmania (where the language of instruction and the home language is English) and the Hindi version of the test was given to 826 students from the Bhopal/Barwani region of India, where the medium of instruction is Hindi. The English and Hindi speaking data are compared from the point of view of development, performance for individual items, and overall performance at grade 10. A number of linguistic hypotheses are examined and reported upon. Although the overall score at grade 10 was identical (10.8/16) for both groups, there are differences in development overall and for individual items which are of interest. Overall, the science specificity of the Hindi words does not appear to confer any clearly defined advantage or disadvantage, though again there are some interesting individual anomalies.
Construct validity of the items on the Stroke Specific Quality of Life (SS-QOL) questionnaire that evaluate the participation component of the International Classification of Functioning, Disability and Health.

PubMed

Silva, Soraia Micaela; Corrêa, Fernanda Ishida; Pereira, Gabriela Santos; Faria, Christina Danielli Coelho de Morais; Corrêa, João Carlos Ferrari

2018-01-01

Analyze the construct validity and internal consistency of the Stroke Specific Quality of Life (SS-QOL) items that address the participation component of the ICF as well as analyze the ceiling and floor effects. One hundred subjects were analyzed: 85 community-dwelling and 15 institutionalized individuals. The analysis of construct validity was performed using classic psychometrics: (1) the comparison of known groups (individuals without restriction to participation vs. those with restriction to participation) using the Mann-Whitney test and (2) convergent validity - correlation between the scores on the SS-QOL items that address participation and the subscale scores of measures used to evaluate the similar constructs and concepts [the Short-Form Health Survey (SF-36), Functional Independence Measure (FIM) and grip strength test]. Spearman's correlation coefficients were calculated for this analysis. Cronbach's α was used for the analysis of internal consistency and both the ceiling and floor effects were analyzed. The level of significance for all analyses was α = 0.05. The a priori hypotheses regarding construct validity were partially demonstrated, as only five of the eight domains exhibited positive moderate to strong correlations (r > 0.40) with measures that address constructs similar to those addressed on the SS-QOL questionnaire. The items demonstrated adequate internal consistency and are capable of differentiating individuals with and without restriction to participation. The ceiling and floor effects were considered adequate for the total SS-QOL score, but beyond acceptable standards for some domains. The 26 items of the SS-QOL questionnaire measure a multidimensional construct and therefore do not only address participation. However, the items demonstrated adequate internal consistency and are capable of differentiating individuals with and without restriction to participation. 
Implications for rehabilitation: The 26 items of the SS-QOL questionnaire demonstrated adequate internal consistency and are capable of differentiating individuals with and without restriction to participation. The present findings can guide healthcare professionals regarding the selection of an assessment tool for the evaluation of post-stroke participation. The findings can lead to consistent and standardized evaluations, which facilitates comparisons and discussion on functional health and social participation after stroke.
Are We There Yet? Exploring the Impact of Translating Cognitive Tests for Dementia Using Mobile Technology in an Aging Population.

PubMed

Ruggeri, Kai; Maguire, Áine; Andrews, Jack L; Martin, Eric; Menon, Shantanu

2016-01-01

This study examines implications of the expanded use of mobile platforms in testing cognitive function, and generates evidence on the impact of utilizing mobile platforms for dementia screening. The Saint Louis University Mental State examination (SLUMS) was ported onto a computerized mobile application named the Cambridge University Pen to Digital Equivalence assessment (CUPDE). CUPDE was piloted and compared to the traditional pen and paper version, with a common comparator test for both groups. Sixty healthy participants (aged 50-79) completed both measurements. Differences were tested between overall outcomes, individual items, and relationship with the comparator. Significant differences in the overall scores between the two testing versions as well as within individual items were observed. Even when groups were matched by cognitive function and age, scores on SLUMS original version (M = 19.75, SD = 3) were significantly higher than those on CUPDE (M = 15.88, SD = 3.5), t (15) = 3.02, p < 0.01. Mobile platforms require the development of new normative standards, even when items can be directly translated. Furthermore, these must fit aging populations with significant variance in familiarity with mobile technology. Greater understanding of the interplay and related mechanisms between auditory and visual systems, which are not well understood yet in the context of mobile technologies, is mandatory.
Preliminary development of an ultrabrief two-item bedside test for delirium.

PubMed

Fick, Donna M; Inouye, Sharon K; Guess, Jamey; Ngo, Long H; Jones, Richard N; Saczynski, Jane S; Marcantonio, Edward R

2015-10-01

Delirium is common, morbid, and costly, yet is greatly under-recognized among hospitalized older adults. To identify the best single and pair of mental status test items that predict the presence of delirium. Diagnostic test evaluation study that enrolled medicine inpatients aged 75 years or older at an academic medical center. Patients underwent a clinical reference standard assessment involving a patient interview, medical record review, and interviews with family members and nurses to determine the presence or absence of Diagnostic and Statistical Manual of Mental Disorders, 4th Edition defined delirium. Participants also underwent the three-dimensional Confusion Assessment Method (3D-CAM), a brief, validated assessment for delirium. Individual items and pairs of items from the 3D-CAM were evaluated to determine sensitivity and specificity relative to the reference standard delirium diagnosis. Of the 201 participants (mean age 84 years, 62% female), 42 (21%) had delirium based on the clinical reference standard. The single item with the best test characteristics was "months of the year backwards" with a sensitivity of 83% (95% confidence interval [CI]: 69%-93%) and specificity of 69% (95% CI: 61%-76%). The best 2-item screen was the combination of "months of the year backwards" and "what is the day of the week?" with a sensitivity of 93% (95% CI: 81%-99%) and specificity of 64% (95% CI: 56%-70%). We identified a single item with >80% and pair of items with >90% sensitivity for delirium. If validated prospectively, these items will serve as an initial innovative screening step for delirium identification in hospitalized older adults. © 2015 Society of Hospital Medicine.
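For readers unfamiliar with the two test characteristics reported above, the sketch below computes sensitivity and specificity from a 2x2 table. The abstract reports only percentages, 201 participants, and 42 delirium cases; the cell counts used here are hypothetical values chosen to be consistent with the 83% and 69% reported for the single best item, and are not data from the study.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from 2x2 diagnostic counts.

    tp: item positive, reference standard positive
    fn: item negative, reference standard positive
    tn: item negative, reference standard negative
    fp: item positive, reference standard negative
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


if __name__ == "__main__":
    # Hypothetical counts: 42 with delirium (35 + 7), 159 without (110 + 49).
    sens, spec = sensitivity_specificity(tp=35, fn=7, tn=110, fp=49)
    print(f"sensitivity={sens:.0%}, specificity={spec:.0%}")
```

Confidence intervals such as those quoted in the abstract would require an additional interval method, which is not shown.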
Influence of inter-item symmetry in visual search.

PubMed

Roggeveen, Alexa B; Kingstone, Alan; Enns, James T

2004-01-01

Does visual search involve a serial inspection of individual items (Feature Integration Theory) or are items grouped and segregated prior to their consideration as a possible target (Attentional Engagement Theory)? For search items defined by motion and shape there is strong support for prior grouping (Kingstone and Bischof, 1999). The present study tested for grouping based on inter-item shape symmetry. Results showed that target-distractor symmetry strongly influenced search whereas distractor-distractor symmetry influenced search more weakly. This indicates that static shapes are evaluated for similarity to one another prior to their explicit identification as 'target' or 'distractor'. Possible reasons for the unequal contributions of target-distractor and distractor-distractor relations are discussed.
The ugliness-in-averageness effect: Tempering the warm glow of familiarity.

PubMed

Carr, Evan W; Huber, David E; Pecher, Diane; Zeelenberg, Rene; Halberstadt, Jamin; Winkielman, Piotr

2017-06-01

Mere exposure (i.e., stimulus repetition) and blending (i.e., stimulus averaging) are classic ways to increase social preferences, including facial attractiveness. In both effects, increases in preference involve enhanced familiarity. Prominent memory theories assume that familiarity depends on a match between the target and similar items in memory. These theories predict that when individual items are weakly learned, their blends (morphs) should be relatively familiar, and thus liked: a beauty-in-averageness effect (BiA). However, when individual items are strongly learned, they are also more distinguishable. This "differentiation" hypothesis predicts that with strongly encoded items, familiarity (and thus, preference) for the blend will be relatively lower than individual items: an ugliness-in-averageness effect (UiA). We tested this novel theoretical prediction in 5 experiments. Experiment 1 showed that with weak learning, facial morphs were more attractive than contributing individuals (BiA effect). Experiments 2A and 2B demonstrated that when participants first strongly learned a subset of individual faces (either in a face-name memory task or perceptual-tracking task), morphs of trained individuals were less attractive than the trained individuals (UiA effect). Experiment 3 showed that changes in familiarity for the trained morph (rather than interstimulus conflict) drove the UiA effect. Using a within-subjects design, Experiment 4 mapped out the transition from BiA to UiA solely as a function of memory training. Finally, computational modeling using a well-known memory framework (REM) illustrated the familiarity transition observed in Experiment 4. Overall, these results highlight how memory processes illuminate classic and modern social preference phenomena. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Risk of predation makes foragers less choosy about their food.

PubMed

Charalabidis, Alice; Dechaume-Moncharmont, François-Xavier; Petit, Sandrine; Bohan, David A

2017-01-01

Animals foraging in the wild have to balance speed of decision making and accuracy of assessment of a food item's quality. If resource quality is important for maximizing fitness, then the duration of decision making may be in conflict with other crucial and time consuming tasks, such as anti-predator behaviours or competition monitoring. Individuals facing the risk of predation and/or competition should adjust the duration of decision making and, as a consequence, their level of choosiness for resources. When exposed to predation, the forager could either maintain its level of choosiness for food items but accept a reduction in the amount of food items consumed or it could reduce its level of choosiness and accept all prey items encountered. Under competition risk, individuals are expected to reduce their level of choosiness as slow decision making exposes individuals to a higher risk of opportunity costs. To test these predictions, the level of choosiness of a seed-eating carabid beetle, Harpalus affinis, was examined under 4 different experimental conditions of risk: i) predation risk; ii) intraspecific competition; iii) interspecific competition; and, iv) control. All the risks were simulated using chemical cues from individual conspecifics or beetles of different species that are predatory or granivorous. Our results show that when foraging under the risk of predation, H. affinis individuals significantly reduce their level of choosiness for seeds. Reductions in level of choosiness for food items might serve as a sensible strategy to reduce both the total duration of a foraging task and the cognitive load of the food quality assessment. No significant differences were observed when individuals were exposed to competition cues. Competition (i.e., opportunity cost) may not be perceived as a risk high enough to induce changes in the level of choosiness. 
Our results suggest that considering the amount of items consumed, alone, would be a misleading metric when assessing individual response to a risk of predation. Foraging studies should therefore also take into account the decision making process.
Individual Differences in Incorrect Responding and the Ability to Discriminate the Source of the Products of Retrieval

ERIC Educational Resources Information Center

Jonker, Tanya R.

2016-01-01

When memory is tested, researchers are often interested in the items that were correctly recalled or recognized, while ignoring or factoring out trials where one "recalls" or "recognizes" a nonstudied item. However, intrusions and false alarms are more than nuisance data and can provide key insights into the memory system. The…
Frequency of consumption of cariogenic food items by 4-month-old to 24-month-old children: comparison between two rural communities in KwaZulu-Natal, South Africa.

PubMed

MacKeown, Jennifer M; Faber, Mieke

2005-03-01

The objective of the study was to compare the frequency of consumption of cariogenic food items among 4-month-old to 24-month-old children in two neighbouring rural areas in KwaZulu-Natal Province, South Africa: Nyuswa/Embo (Area A) (n = 127) and Ndunakazi (Area B) (n = 105). Dietary intake was assessed using a food frequency questionnaire. Mothers or caregivers were interviewed by a team of Zulu-speaking fieldworkers. The percentage of children consuming the individual food items (consumers) and the weekly consumption for consumers were calculated for the two areas separately. The food items were ranked in descending order according to the combined group of children and reported for each area within five selected food groups (carbohydrates, sugars, fruit and vegetables, milk and milk products, and other foods and snacks). Food items were 'flagged' according to their cariogenic potential. Fisher's exact test on absolute numbers tested for significant differences in the frequency of intake between individual food items between the two groups. Significance was set at P < 0.05. The frequency of consumption of certain listed cariogenic food items showed significant differences between the two areas. A higher percentage of children in Area A than in Area B consumed most of the food items and also more frequently. Children mainly consumed foods with a cariogenic score of 2, solid foods with 8-20% sugars as well as foods high in starch with less than 10% sugars. This knowledge is essential to gain insight into the eating pattern among rural communities and will provide a baseline for developing and adapting dietary advice specifically for young rural South African children with particular emphasis on the prevention of dental caries.
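The comparison described above (Fisher's exact test on absolute numbers of consumers in each area) can be sketched as follows. This is a minimal illustration using only the Python standard library; the function name and the example counts are hypothetical and are not taken from the study.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows could be the two areas; columns could be consumers / non-consumers
    of one food item (an assumed layout, for illustration only).
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum the probabilities of every table no more likely than the observed one.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))
```

For the small table [[3, 1], [1, 3]] the function returns 34/70, about 0.486, which would not be significant at the P < 0.05 threshold used in the abstract.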
The reliability of a quality appraisal tool for studies of diagnostic reliability (QAREL).

PubMed

Lucas, Nicholas; Macaskill, Petra; Irwig, Les; Moran, Robert; Rickards, Luke; Turner, Robin; Bogduk, Nikolai

2013-09-09

The aim of this project was to investigate the reliability of a new 11-item quality appraisal tool for studies of diagnostic reliability (QAREL). The tool was tested on studies reporting the reliability of any physical examination procedure. The reliability of physical examination is a challenging area to study given the complex testing procedures, the range of tests, and lack of procedural standardisation. Three reviewers used QAREL to independently rate 29 articles, comprising 30 studies, published during 2007. The articles were identified from a search of relevant databases using the following string: "Reproducibility of results (MeSH) OR reliability (t.w.) AND Physical examination (MeSH) OR physical examination (t.w.)." A total of 415 articles were retrieved and screened for inclusion. The reviewers undertook an independent trial assessment prior to data collection, followed by a general discussion about how to score each item. At no time did the reviewers discuss individual papers. Reliability was assessed for each item using multi-rater kappa (κ). Multi-rater reliability estimates ranged from κ = 0.27 to 0.92 across all items. Six items were recorded with good reliability (κ > 0.60), three with moderate reliability (κ = 0.41 - 0.60), and two with fair reliability (κ = 0.21 - 0.40). Raters found it difficult to agree about the spectrum of patients included in a study (Item 1) and the correct application and interpretation of the test (Item 10). In this study, we found that QAREL was a reliable assessment tool for studies of diagnostic reliability when raters agreed upon criteria for the interpretation of each item. Nine out of 11 items had good or moderate reliability, and two items achieved fair reliability. The heterogeneity in the tests included in this study may have resulted in an underestimation of the reliability of these two items. We discuss these and other factors that could affect our results and make recommendations for the use of QAREL.
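The abstract reports a multi-rater kappa per item without naming the exact statistic. One common choice is Fleiss' kappa, sketched below as an assumption rather than as the authors' method. The input layout (one row per rated study, one column per rating category) is hypothetical.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table where counts[i][j] is the number of raters
    who assigned subject i to category j. Every row must sum to the same
    number of raters (e.g. 3 reviewers)."""
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    n_categories = len(counts[0])

    # Observed agreement: proportion of agreeing rater pairs per subject.
    per_subject = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(per_subject) / n_subjects

    # Chance agreement from the overall share of ratings in each category.
    shares = [
        sum(row[j] for row in counts) / (n_subjects * n_raters)
        for j in range(n_categories)
    ]
    p_expected = sum(s * s for s in shares)

    return (p_observed - p_expected) / (1 - p_expected)
```

With three raters who always agree, for example `fleiss_kappa([[3, 0], [0, 3]])`, the result is 1.0. The statistic is undefined when every rating falls in a single category, because chance agreement is then 1.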
Constructing a question bank based on script concordance approach as a novel assessment methodology in surgical education.

PubMed

Aldekhayel, Salah A; Alselaim, Nahar A; Magzoub, Mohi Eldin; Al-Qattan, Mohammad M; Al-Namlah, Abdullah M; Tamim, Hani; Al-Khayal, Abdullah; Al-Habdan, Sultan I; Zamakhshary, Mohammed F

2012-10-24

Script Concordance Test (SCT) is a new assessment tool that reliably assesses clinical reasoning skills. Previous descriptions of developing SCT-question banks were merely subjective. This study addresses two gaps in the literature: 1) conducting the first phase of a multistep validation process of SCT in Plastic Surgery, and 2) providing an objective methodology to construct a question bank based on SCT. After developing a test blueprint, 52 test items were written. Five validation questions were developed and a validation survey was established online. Seven reviewers were asked to answer this survey. They were recruited from two countries, Saudi Arabia and Canada, to improve the test's external validity. Their ratings were transformed into percentages. Analysis was performed to compare reviewers' ratings by looking at correlations, ranges, means, medians, and overall scores. Scores of reviewers' ratings were between 76% and 95% (mean 86% ± 5). We found poor correlations between reviewers (Pearson's: +0.38 to -0.22). Ratings of individual validation questions ranged between 0 and 4 (on a scale 1-5). Means and medians of these ranges were computed for each test item (mean: 0.8 to 2.4; median: 1 to 3). A subset of test items comprising 27 items was generated based on a set of inclusion and exclusion criteria. This study proposes an objective methodology for validation of SCT-question bank. Analysis of validation survey is done from all angles, i.e., reviewers, validation questions, and test items. Finally, a subset of test items is generated based on a set of criteria.
Item response theory - A first approach

NASA Astrophysics Data System (ADS)

Nunes, Sandra; Oliveira, Teresa; Oliveira, Amílcar

2017-07-01

The Item Response Theory (IRT) has become one of the most popular scoring frameworks for measurement data, frequently used in computerized adaptive testing, cognitively diagnostic assessment and test equating. According to Andrade et al. (2000), IRT can be defined as a set of mathematical models (Item Response Models - IRM) constructed to represent the probability of an individual giving the right answer to an item of a particular test. The number of Item Response Models available for measurement analysis has increased considerably in the last fifteen years due to increasing computer power and due to a demand for accuracy and more meaningful inferences grounded in complex data. The developments in modeling with Item Response Theory were related to developments in estimation theory, most remarkably Bayesian estimation with Markov chain Monte Carlo algorithms (Patz & Junker, 1999). The popularity of Item Response Theory has also implied numerous overviews in books and journals, and many connections between IRT and other statistical estimation procedures, such as factor analysis and structural equation modeling, have been made repeatedly (Van der Linden & Hambleton, 1997). As stated before, the Item Response Theory covers a variety of measurement models, ranging from basic one-dimensional models for dichotomously and polytomously scored items and their multidimensional analogues to models that incorporate information about cognitive sub-processes which influence the overall item response process. The aim of this work is to introduce the main concepts associated with one-dimensional models of Item Response Theory, to specify the logistic models with one, two and three parameters, to discuss some properties of these models and to present the main estimation procedures.
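The logistic models with one, two and three parameters mentioned above can be written as a single function. The sketch below uses the usual textbook parameterization (ability theta, difficulty b, discrimination a, pseudo-guessing c) and omits the optional scaling constant D = 1.7 used in some texts. The function name is illustrative and is not drawn from the paper.

```python
import math

def irt_probability(theta, b, a=1.0, c=0.0):
    """Probability of a correct response under the logistic IRT models.

    theta: person ability
    b:     item difficulty
    a:     item discrimination (fixed at 1 gives the one-parameter model)
    c:     pseudo-guessing lower asymptote (0 gives the 1PL/2PL models)
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# One-, two- and three-parameter cases for the same person and item.
p_1pl = irt_probability(theta=0.5, b=0.0)
p_2pl = irt_probability(theta=0.5, b=0.0, a=1.8)
p_3pl = irt_probability(theta=0.5, b=0.0, a=1.8, c=0.2)
```

When ability equals difficulty, the 1PL and 2PL models give a probability of 0.5, while the 3PL gives c + (1 - c)/2, so guessing raises the midpoint of the curve.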
Recovery definitions: Do they change?

PubMed Central

Kaskutas, Lee Ann; Witbrodt, Jane; Grella, Christine E.

2015-01-01

Background The term “recovery” is widely used in the substance abuse literature and clinical settings, but data have not been available to empirically validate how recovery is defined by individuals who are themselves in recovery. The “What Is Recovery?” project developed a 39-item definition of recovery based on a large nationwide online survey of individuals in recovery. The objective of this paper is to report on the stability of those definitions one to two years later. Methods To obtain a sample for studying recovery definitions that reflected the different pathways to recovery, the parent study involved intensive outreach. Follow-up interviews (n = 1237) were conducted online and by telephone among respondents who consented to participate in follow-up studies. Descriptive analyses considered endorsement of individual recovery items at both surveys, and t-tests of summary scores studied significant change in the sample overall and among key subgroups. To assess item reliability, Cronbach’s alpha was estimated. Results Rates of endorsement of individual items at both interviews were above 90% for a majority of the recovery elements, and there was about as much transition into endorsement as out of endorsement. Statistically significant t-test scores were of modest magnitude, and reliability statistics were high (ranging from .782 to .899). Conclusions Longitudinal analyses found little evidence of meaningful change in recovery definitions at follow-up. Results thus suggest that the recovery definitions developed in the parent “What Is Recovery?” survey represent stable definitions of recovery that can be used to guide service provision in Recovery-Oriented Systems of Care. PMID:26166666
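The reliability statistic named above, Cronbach's alpha, is computed from the item variances and the variance of the summed score. A minimal sketch follows, assuming a complete respondents-by-items score matrix with no missing data; the function name and data layout are illustrative.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for scores[i][j] = response of person i to item j.

    Assumes at least two items, at least two respondents, and some
    variation in the total scores.
    """
    k = len(scores[0])
    # Sample variance of each item across respondents.
    item_variances = [
        variance([float(row[j]) for row in scores]) for j in range(k)
    ]
    # Sample variance of each respondent's summed score.
    total_variance = variance([float(sum(row)) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)
```

When two items rank every respondent identically, as in `cronbach_alpha([[1, 1], [2, 2], [3, 3]])`, alpha is 1.0. Values such as the .782 to .899 reported in the abstract indicate items that covary strongly but not perfectly.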
ADHD and retrieval-induced forgetting: evidence for a deficit in the inhibitory control of memory.

PubMed

Storm, Benjamin C; White, Holly A

2010-04-01

Research on retrieval-induced forgetting has shown that the selective retrieval of some information can cause the forgetting of other information. Such forgetting is believed to result from inhibitory processes that function to resolve interference during retrieval. The current study examined whether individuals with ADHD demonstrate normal levels of retrieval-induced forgetting. A total of 40 adults with ADHD and 40 adults without ADHD participated in a standard retrieval-induced forgetting experiment. Critically, half of the items were tested using category cues and the other half of the items were tested using category-plus-one-letter-stem cues. Whereas both ADHD and non-ADHD participants demonstrated retrieval-induced forgetting on the final category-cued recall test, only non-ADHD participants demonstrated retrieval-induced forgetting on the final category-plus-stem-cued recall test. These results suggest that individuals with ADHD do have a deficit in the inhibitory control of memory, but that this deficit may only be apparent when output interference is adequately controlled on the final test.
Development and Validation of a Test for Bulimia.

ERIC Educational Resources Information Center

Smith, Marcia C.; Thelen, Mark H.

1984-01-01

Developed the Bulimia Test (BULIT) based on responses of clinically identified females (N=18) and normal female college students (N=119) to preliminary test items. Results showed that the BULIT provided an objective, reliable, and valid measure by which to identify individuals with symptoms of bulimia. (Instrument is appended.) (LLL)
The Subjective and Objective Interface of Bias Detection on Language Tests

ERIC Educational Resources Information Center

Ross, Steven J.; Okabe, Junko

2006-01-01

Test validity is predicated on there being a lack of bias in tasks, items, or test content. It is well-known that factors such as test candidates' mother tongue, life experiences, and socialization practices of the wider community may serve to inject subtle interactions between individuals' background and the test content. When the gender of the…
Individuals with knee impairments identify items in need of clarification in the Patient Reported Outcomes Measurement Information System (PROMIS®) pain interference and physical function item banks - a qualitative study.

PubMed

Lynch, Andrew D; Dodds, Nathan E; Yu, Lan; Pilkonis, Paul A; Irrgang, James J

2016-05-11

The content and wording of the Patient Reported Outcome Measurement Information System (PROMIS) Physical Function and Pain Interference item banks have not been qualitatively assessed by individuals with knee joint impairments. The purpose of this investigation was to identify items in the PROMIS Physical Function and Pain Interference Item Banks that are irrelevant, unclear, or otherwise difficult to respond to for individuals with impairment of the knee and to suggest modifications based on cognitive interviews. Twenty-nine individuals with knee joint impairments qualitatively assessed items in the Pain Interference and Physical Function Item Banks in a mixed-methods cognitive interview. Field notes were analyzed to identify themes and frequency counts were calculated to identify items not relevant to individuals with knee joint impairments. Issues with clarity were identified in 23 items in the Physical Function Item Bank, resulting in the creation of 43 new or modified items, typically changing words within the item to be clearer. Interpretation issues included whether or not the knee joint played a significant role in overall health and age/gender differences in items. One quarter of the original items (31 of 124) in the Physical Function Item Bank were identified as irrelevant to the knee joint. All 41 items in the Pain Interference Item Bank were identified as clear, although individuals without significant pain substituted other symptoms which interfered with their life. The Physical Function Item Bank would benefit from additional items that are relevant to individuals with knee joint impairments and, by extension, to other lower extremity impairments. Several issues in clarity were identified that are likely to be present in other patient cohorts as well.
Task demands determine comparison strategy in whole probe change detection.

PubMed

Udale, Rob; Farrell, Simon; Kent, Chris

2018-05-01

Detecting a change in our visual world requires a process that compares the external environment (test display) with the contents of memory (study display). We addressed the question of whether people strategically adapt the comparison process in response to different decision loads. Study displays of 3 colored items were presented, followed by 'whole-display' probes containing 3 colored shapes. Participants were asked to decide whether any probed items contained a new feature. In Experiments 1-4, irrelevant changes to the probed item's locations or feature bindings influenced memory performance, suggesting that participants employed a comparison process that relied on spatial locations. This finding occurred irrespective of whether participants were asked to decide about the whole display, or only a single cued item within the display. In Experiment 5, when the base-rate of changes in the nonprobed items increased (increasing the incentive to use the cue effectively), participants were not influenced by irrelevant changes in location or feature bindings. In addition, we observed individual differences in the use of spatial cues. These results suggest that participants can flexibly switch between spatial and nonspatial comparison strategies, depending on interactions between individual differences and task demand factors. These findings have implications for models of visual working memory that assume that the comparison between study and test obligatorily relies on accessing visual features via their binding to location. (PsycINFO Database Record (c) 2018 APA, all rights reserved).
Aerobic fitness and executive control of relational memory in preadolescent children.

PubMed

Chaddock, Laura; Hillman, Charles H; Buck, Sarah M; Cohen, Neal J

2011-02-01

The neurocognitive benefits of an active lifestyle in childhood have public health and educational implications, especially as children in today's technological society are becoming increasingly overweight, unhealthy, and unfit. Human and animal studies show that aerobic exercise affects both prefrontal executive control and hippocampal function. This investigation attempts to bridge these research threads by using a cognitive task to examine the relationship between aerobic fitness and executive control of relational memory in preadolescent 9- and 10-yr-old children. Higher-fit and lower-fit children studied faces and houses under individual item (i.e., nonrelational) and relational encoding conditions, and the children were subsequently tested with recognition memory trials consisting of previously studied pairs and pairs of completely new items. With each subject participating in both item and relational encoding conditions, and with recognition test trials amenable to the use of both item and relational memory cues, this task afforded a challenge to the flexible use of memory, specifically in the use of appropriate encoding and retrieval strategies. Hence, the task provided a test of both executive control and memory processes. Lower-fit children showed poorer recognition memory performance than higher-fit children, selectively in the relational encoding condition. No association between aerobic fitness and recognition performance was found for faces and houses studied as individual items (i.e., nonrelationally). The findings implicate childhood aerobic fitness as a factor in the ability to use effective encoding and retrieval executive control processes for relational memory material and, possibly, in the strategic engagement of prefrontal- and hippocampal-dependent systems.
Item usage in a multidimensional computerized adaptive test (MCAT) measuring health-related quality of life.

PubMed

Paap, Muirne C S; Kroeze, Karel A; Terwee, Caroline B; van der Palen, Job; Veldkamp, Bernard P

2017-11-01

Examining item usage is an important step in evaluating the performance of a computerized adaptive test (CAT). We study item usage for a newly developed multidimensional CAT which draws items from three PROMIS domains, as well as a disease-specific one. The multidimensional item bank used in the current study contained 194 items from four domains: the PROMIS domains fatigue, physical function, and ability to participate in social roles and activities, and a disease-specific domain (the COPD-SIB). The item bank was calibrated using the multidimensional graded response model and data of 795 patients with chronic obstructive pulmonary disease. To evaluate the item usage rates of all individual items in our item bank, CAT simulations were performed on responses generated based on a multivariate uniform distribution. The outcome variables included active bank size and item overuse (usage rate larger than the expected item usage rate). For average θ-values, the overall active bank size was 9-10%; this number quickly increased as θ-values became more extreme. For values of -2 and +2, the overall active bank size equaled 39-40%. There was 78% overlap between overused items and active bank size for average θ-values. For more extreme θ-values, the overused items made up a much smaller part of the active bank size: here the overlap was only 35%. Our results strengthen the claim that relatively short item banks may suffice when using polytomous items (and no content constraints/exposure control mechanisms), especially when using MCAT.

Measuring pain phenomena after spinal cord injury: Development and psychometric properties of the SCI-QOL Pain Interference and Pain Behavior assessment tools.

PubMed

Cohen, Matthew L; Kisala, Pamela A; Dyson-Hudson, Trevor A; Tulsky, David S

2018-05-01

To develop modern patient-reported outcome measures that assess pain interference and pain behavior after spinal cord injury (SCI). Grounded-theory based qualitative item development; large-scale item calibration field-testing; confirmatory factor analyses; graded response model item response theory analyses; statistical linking techniques to transform scores to the Patient Reported Outcome Measurement Information System (PROMIS) metric. Five SCI Model Systems centers and one Department of Veterans Affairs medical center in the United States. Adults with traumatic SCI. N/A. Spinal Cord Injury - Quality of Life (SCI-QOL) Pain Interference item bank, SCI-QOL Pain Interference short form, and SCI-QOL Pain Behavior scale. Seven hundred fifty-seven individuals with traumatic SCI completed 58 items addressing various aspects of pain. Items were then separated by whether they assessed pain interference or pain behavior, and poorly functioning items were removed. Confirmatory factor analyses confirmed that each set of items was unidimensional, and item response theory analyses were used to estimate slopes and thresholds for the items. Ultimately, 7 items (4 from PROMIS) comprised the Pain Behavior scale and 25 items (18 from PROMIS) comprised the Pain Interference item bank. Ten of these 25 items were selected to form the Pain Interference short form. The SCI-QOL Pain Interference item bank and the SCI-QOL Pain Behavior scale demonstrated robust psychometric properties. The Pain Interference item bank is available as a computer adaptive test or short form for research and clinical applications, and scores are transformed to the PROMIS metric.
Development and psychometric evaluation of a cardiovascular risk and disease management knowledge assessment tool.

PubMed

Rosneck, James S; Hughes, Joel; Gunstad, John; Josephson, Richard; Noe, Donald A; Waechter, Donna

2014-01-01

This article describes the systematic construction and psychometric analysis of a knowledge assessment instrument for phase II cardiac rehabilitation (CR) patients measuring risk modification disease management knowledge and behavioral outcomes derived from national standards relevant to secondary prevention and management of cardiovascular disease. First, using adult curriculum based on disease-specific learning outcomes and competencies, a systematic test item development process was completed by clinical staff. Second, a panel of educational and clinical experts used an iterative process to identify test content domain and arrive at consensus in selecting items meeting criteria. Third, the resulting 31-question instrument, the Cardiac Knowledge Assessment Tool (CKAT), was piloted in CR patients to ensure use of application. Validity and reliability analyses were performed on 3638 adults before test administrations with additional focused analyses on 1999 individuals completing both pretreatment and posttreatment administrations within 6 months. Evidence of CKAT content validity was substantiated, with 85% agreement among content experts. Evidence of construct validity was demonstrated via factor analysis identifying key underlying factors. Estimates of internal consistency, for example, Cronbach's α = .852 and Spearman-Brown split-half reliability = 0.817 on pretesting, support test reliability. Item analysis, using point biserial correlation, measured relationships between performance on single items and total score (P < .01). Analyses using item difficulty and item discrimination indices further verified item stability and validity of the CKAT. A knowledge instrument specifically designed for an adult CR population was systematically developed and tested in a large representative patient population, satisfying psychometric parameters, including validity and reliability.
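Two of the statistics named above have short closed forms: the Spearman-Brown correction that steps a half-test correlation up to full-test reliability, and the point-biserial correlation between a right/wrong item and the total score. The sketch below assumes 0/1 item scoring and uses hypothetical function names. It does not reproduce the study's data.

```python
from statistics import mean, pstdev

def spearman_brown(r_half):
    """Full-length reliability predicted from the correlation between two halves."""
    return 2 * r_half / (1 + r_half)

def point_biserial(item, total):
    """Correlation between a dichotomous item (0/1) and total test scores.

    Requires at least one respondent in each item category and
    non-constant total scores.
    """
    passed = [t for i, t in zip(item, total) if i == 1]
    failed = [t for i, t in zip(item, total) if i == 0]
    p = len(passed) / len(item)  # proportion answering the item correctly
    # Mean difference between groups, scaled by the population SD of totals.
    return (mean(passed) - mean(failed)) / pstdev(total) * (p * (1 - p)) ** 0.5
```

For example, `point_biserial([1, 1, 0, 0], [4, 3, 2, 1])` equals 2/sqrt(5), about 0.894, which matches the Pearson correlation computed on the same two columns.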
Integrating personalized medical test contents with XML and XSL-FO.

PubMed

Toddenroth, Dennis; Dugas, Martin; Frankewitsch, Thomas

2011-03-01

In 2004 the adoption of a modular curriculum at the medical faculty in Muenster led to the introduction of centralized examinations based on multiple-choice questions (MCQs). We report on how organizational challenges of realizing faculty-wide personalized tests were addressed by implementation of a specialized software module to automatically generate test sheets from individual test registrations and MCQ contents. Key steps of the presented method for preparing personalized test sheets are (1) the compilation of relevant item contents and graphical media from a relational database with database queries, (2) the creation of Extensible Markup Language (XML) intermediates, and (3) the transformation into paginated documents. The software module by use of an open source print formatter consistently produced high-quality test sheets, while the blending of vectorized textual contents and pixel graphics resulted in efficient output file sizes. Concomitantly the module permitted an individual randomization of item sequences to prevent illicit collusion. The automatic generation of personalized MCQ test sheets is feasible using freely available open source software libraries, and can be efficiently deployed on a faculty-wide scale.
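Steps (2) and the per-student randomization described above can be illustrated with a small sketch that builds an XML intermediate for one student. The element and attribute names (`testsheet`, `item`, `stem`, `option`) and the seeding scheme are assumptions made for this example, not the schema used in Muenster. The database query of step (1) and the XSL-FO transformation of step (3) are left out.

```python
import random
import xml.etree.ElementTree as ET

def build_test_sheet(student_id, items, seed):
    """Return an XML string holding one student's individually shuffled items.

    items: list of (item_id, stem_text, option_texts) tuples, assumed to have
    been fetched from the item database beforehand.
    """
    # Seed per student so each sheet is reproducible but differs between students.
    rng = random.Random(f"{seed}-{student_id}")
    order = list(items)
    rng.shuffle(order)

    sheet = ET.Element("testsheet", {"student": student_id})
    for position, (item_id, stem, options) in enumerate(order, start=1):
        node = ET.SubElement(sheet, "item", {"id": item_id, "position": str(position)})
        ET.SubElement(node, "stem").text = stem
        for option in options:
            ET.SubElement(node, "option").text = option
    return ET.tostring(sheet, encoding="unicode")

demo_items = [
    ("q1", "First question?", ["A", "B"]),
    ("q2", "Second question?", ["A", "B"]),
    ("q3", "Third question?", ["A", "B"]),
]
demo_xml = build_test_sheet("student-001", demo_items, seed=2011)
```

Keeping the position attribute in the intermediate means a later scoring step can map each student's answers back to item ids without re-running the shuffle.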
Development of and Field-Test Results for the CAHPS PCMH Survey

PubMed Central

Scholle, Sarah Hudson; Vuong, Oanh; Ding, Lin; Fry, Stephanie; Gallagher, Patricia; Brown, Julie A.; Hays, Ron D.; Cleary, Paul D.

2017-01-01

Objective To develop and evaluate survey questions that assess processes of care relevant to Patient-Centered Medical Homes (PCMHs). Research Design We convened expert panels, reviewed evidence on effective care practices and existing surveys, elicited broad public input, and conducted cognitive interviews and a field test to develop items relevant to PCMHs that could be added to the CAHPS® Clinician & Group (CG-CAHPS) 1.0 Survey. Surveys were tested using a two-contact mail protocol in 10 adult and 33 pediatric practices (both private and community health centers) in Massachusetts. A total of 4,875 completed surveys were received (overall response rate of 25%). Analyses We calculated the rate of valid responses for each item. We conducted exploratory factor analyses and estimated item-to-total correlations, individual and site level reliability, and correlations among proposed multi-item composites. Results Ten items in four new domains (Comprehensiveness, Information, Self-Management Support, and Shared Decision-Making) and four items in two existing domains (Access and Coordination of Care) were selected to be supplemental items to be used in conjunction with the adult CG-CAHPS 1.0 survey. For the child version, four items in each of two new domains (Information and Self-Management Support) and five items in existing domains (Access, Comprehensiveness-Prevention, Coordination of Care) were selected. Conclusions This study provides support for the reliability and validity of new items to supplement the CG-CAHPS 1.0 survey to assess aspects of primary care that are important attributes of Patient-Centered Medical Homes. PMID:23064272
Item and scale differential functioning of the Mini-Mental State Exam assessed using the Differential Item and Test Functioning (DFIT) Framework.

PubMed

Morales, Leo S; Flowers, Claudia; Gutierrez, Peter; Kleinman, Marjorie; Teresi, Jeanne A

2006-11-01

To illustrate the application of the Differential Item and Test Functioning (DFIT) method using English and Spanish versions of the Mini-Mental State Examination (MMSE). Study participants were 65 years of age or older and lived in North Manhattan, New York. Of the 1578 study participants who were administered the MMSE, 665 completed it in Spanish. The MMSE contains 20 items that measure the degree of cognitive impairment in the areas of orientation, attention and calculation, registration, recall and language, as well as the ability to follow verbal and written commands. After assessing the dimensionality of the MMSE scale, item response theory person and item parameters were estimated separately for the English and Spanish sample using Samejima's 2-parameter graded response model. Then the DFIT framework was used to assess differential item functioning (DIF) and differential test functioning (DTF). Nine items were found to show DIF; these were items that ask the respondent to name the correct season, day of the month, city, state, and 2 nearby streets, recall 3 objects, repeat the phrase "no ifs, no ands, no buts," follow the command, "close your eyes," and the command, "take the paper in your right hand, fold the paper in half with both hands, and put the paper down in your lap." At the scale level, however, the MMSE did not show differential functioning. Respondents to the English and Spanish versions of the MMSE are comparable on the basis of scale scores. However, assessments based on individual MMSE items may be misleading.
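Under the graded response model named above, each ordered response category has a probability obtained by differencing cumulative logistic curves. The sketch below shows that calculation and the expected item score that DIF comparisons are built on. Parameter values and function names are illustrative assumptions, not estimates from the MMSE analysis.

```python
import math

def grm_category_probs(theta, a, thresholds):
    """Category probabilities for one item under the graded response model.

    theta:      person trait level
    a:          item discrimination
    thresholds: ascending boundary locations b_1 < ... < b_m
    Returns probabilities for categories 0..m.
    """
    # P(X >= k) for k = 0..m+1; the two end values are fixed at 1 and 0.
    cumulative = [1.0]
    cumulative += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cumulative.append(0.0)
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]

def expected_item_score(theta, a, thresholds):
    """Expected score on the item at a given trait level."""
    probs = grm_category_probs(theta, a, thresholds)
    return sum(k * p for k, p in enumerate(probs))
```

Evaluating `expected_item_score` at the same theta with parameters estimated separately in two language groups gives the kind of item-level gap that a DIF analysis examines. Gaps of opposite sign on different items can offset each other, which is one way an instrument can show item-level DIF without scale-level differential functioning.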
Evaluation of a Lag Schedule of Reinforcement in a Group Contingency to Promote Varied Naming of Categories Items with Children

ERIC Educational Resources Information Center

Wiskow, Katie M.; Donaldson, Jeanne M.

2016-01-01

We compared the effects of Lag 0 and Lag 1 schedules of reinforcement on children's responses naming category items in a group context and subsequent responses emitted during individual testing in which the schedule of reinforcement remained Lag 0. Specifically, we measured response variability and novel responses to categories for 3 children who…
Validation of the instrument of health literacy competencies for Chinese-speaking health professionals.

PubMed

Chang, Li-Chun; Chen, Yu-Chi; Liao, Li-Ling; Wu, Fei Ling; Hsieh, Pei-Lin; Chen, Hsiao-Jung

2017-01-01

The study aimed to illustrate the constructs and test the psychometric properties of an instrument of health literacy competencies (IOHLC) for health professionals. A multi-phase questionnaire development method was used to develop the scale. The categorization of the knowledge and practice domains achieved consensus through a modified Delphi process. To reduce the number of items, the 92-item IOHLC was psychometrically evaluated through internal consistency, Rasch modeling, and two-stage factor analysis. In total, 736 practitioners, including nurses, nurse practitioners, health educators, case managers, and dieticians completed the 92-item IOHLC online from May 2012 to January 2013. The final version of the IOHLC covered 9 knowledge items and 40 skill items containing 9 dimensions, with good model fit, and explaining 72% of total variance. All domains had acceptable internal consistency and discriminant validity. The tool in this study is the first to verify health literacy competencies rigorously. Moreover, through psychometric testing, the 49-item IOHLC demonstrates adequate reliability and validity. The IOHLC may serve as a reference for the theoretical and in-service training of Chinese-speaking individuals' health literacy competencies.
Working memory capacity predicts the beneficial effect of selective memory retrieval.

PubMed

Schlichting, Andreas; Aslan, Alp; Holterman, Christoph; Bäuml, Karl-Heinz T

2015-01-01

Selective retrieval of some studied items can both impair and improve recall of the other items. This study examined the role of working memory capacity (WMC) for the two effects of memory retrieval. Participants studied an item list consisting of predefined target and nontarget items. After study of the list, half of the participants performed an imagination task supposed to induce a change in mental context, whereas the other half performed a counting task which does not induce such context change. Following presentation of a second list, memory for the original list's target items was tested, either with or without preceding retrieval of the list's nontarget items. Consistent with previous work, preceding nontarget retrieval impaired target recall in the absence of the context change, but improved target recall in its presence. In particular, there was a positive relationship between WMC and the beneficial, but not the detrimental effect of memory retrieval. On the basis of the view that the beneficial effect of memory retrieval reflects context-reactivation processes, the results indicate that individuals with higher WMC are better able to capitalise on retrieval-induced context reactivation than individuals with lower WMC.
Designing and Testing an Inventory for Measuring Social Media Competency of Certified Health Education Specialists

PubMed Central

Bernhardt, Jay M; Stellefson, Michael; Weiler, Robert M; Anderson-Lewis, Charkarra; Miller, M David; MacInnes, Jann

2015-01-01

Background: Social media can promote healthy behaviors by facilitating engagement and collaboration among health professionals and the public. Thus, social media is quickly becoming a vital tool for health promotion. While guidelines and trainings exist for public health professionals, there are currently no standardized measures to assess individual social media competency among Certified Health Education Specialists (CHES) and Master Certified Health Education Specialists (MCHES). Objective: The aim of this study was to design, develop, and test the Social Media Competency Inventory (SMCI) for CHES and MCHES. Methods: The SMCI was designed in three sequential phases: (1) Conceptualization and Domain Specifications, (2) Item Development, and (3) Inventory Testing and Finalization. Phase 1 consisted of a literature review, concept operationalization, and expert reviews. Phase 2 involved an expert panel (n=4) review, think-aloud sessions with a small representative sample of CHES/MCHES (n=10), a pilot test (n=36), and classical test theory analyses to develop the initial version of the SMCI. Phase 3 included a field test of the SMCI with a random sample of CHES and MCHES (n=353), factor and Rasch analyses, and development of SMCI administration and interpretation guidelines. Results: Six constructs adapted from the unified theory of acceptance and use of technology and the integrated behavioral model were identified for assessing social media competency: (1) Social Media Self-Efficacy, (2) Social Media Experience, (3) Effort Expectancy, (4) Performance Expectancy, (5) Facilitating Conditions, and (6) Social Influence. The initial item pool included 148 items. After the pilot test, 16 items were removed or revised because of low item discrimination (r<.30), high interitem correlations (ρ>.90), or based on feedback received from pilot participants. During the psychometric analysis of the field test data, 52 items were removed due to low discrimination, evidence of content redundancy, low R-squared value, or poor item infit or outfit. Psychometric analyses of the data revealed acceptable reliability evidence for the following scales: Social Media Self-Efficacy (alpha=.98, item reliability=.98, item separation=6.76), Social Media Experience (alpha=.98, item reliability=.98, item separation=6.24), Effort Expectancy (alpha=.74, item reliability=.95, item separation=4.15), Performance Expectancy (alpha=.81, item reliability=.99, item separation=10.09), Facilitating Conditions (alpha=.66, item reliability=.99, item separation=16.04), and Social Influence (alpha=.66, item reliability=.93, item separation=3.77). There was some evidence of local dependence among the scales, with several observed residual correlations above |.20|. Conclusions: Through the multistage instrument-development process, sufficient reliability and validity evidence was collected in support of the purpose and intended use of the SMCI. The SMCI can be used to assess the readiness of health education specialists to effectively use social media for health promotion research and practice. Future research should explore associations across constructs within the SMCI and evaluate the ability of SMCI scores to predict social media use and performance among CHES and MCHES. PMID:26399428
Designing and Testing an Inventory for Measuring Social Media Competency of Certified Health Education Specialists.

PubMed

Alber, Julia M; Bernhardt, Jay M; Stellefson, Michael; Weiler, Robert M; Anderson-Lewis, Charkarra; Miller, M David; MacInnes, Jann

2015-09-23

Social media can promote healthy behaviors by facilitating engagement and collaboration among health professionals and the public. Thus, social media is quickly becoming a vital tool for health promotion. While guidelines and trainings exist for public health professionals, there are currently no standardized measures to assess individual social media competency among Certified Health Education Specialists (CHES) and Master Certified Health Education Specialists (MCHES). The aim of this study was to design, develop, and test the Social Media Competency Inventory (SMCI) for CHES and MCHES. The SMCI was designed in three sequential phases: (1) Conceptualization and Domain Specifications, (2) Item Development, and (3) Inventory Testing and Finalization. Phase 1 consisted of a literature review, concept operationalization, and expert reviews. Phase 2 involved an expert panel (n=4) review, think-aloud sessions with a small representative sample of CHES/MCHES (n=10), a pilot test (n=36), and classical test theory analyses to develop the initial version of the SMCI. Phase 3 included a field test of the SMCI with a random sample of CHES and MCHES (n=353), factor and Rasch analyses, and development of SMCI administration and interpretation guidelines. Six constructs adapted from the unified theory of acceptance and use of technology and the integrated behavioral model were identified for assessing social media competency: (1) Social Media Self-Efficacy, (2) Social Media Experience, (3) Effort Expectancy, (4) Performance Expectancy, (5) Facilitating Conditions, and (6) Social Influence. The initial item pool included 148 items. After the pilot test, 16 items were removed or revised because of low item discrimination (r<.30), high interitem correlations (ρ>.90), or based on feedback received from pilot participants. During the psychometric analysis of the field test data, 52 items were removed due to low discrimination, evidence of content redundancy, low R-squared value, or poor item infit or outfit. Psychometric analyses of the data revealed acceptable reliability evidence for the following scales: Social Media Self-Efficacy (alpha=.98, item reliability=.98, item separation=6.76), Social Media Experience (alpha=.98, item reliability=.98, item separation=6.24), Effort Expectancy (alpha=.74, item reliability=.95, item separation=4.15), Performance Expectancy (alpha=.81, item reliability=.99, item separation=10.09), Facilitating Conditions (alpha=.66, item reliability=.99, item separation=16.04), and Social Influence (alpha=.66, item reliability=.93, item separation=3.77). There was some evidence of local dependence among the scales, with several observed residual correlations above |.20|. Through the multistage instrument-development process, sufficient reliability and validity evidence was collected in support of the purpose and intended use of the SMCI. The SMCI can be used to assess the readiness of health education specialists to effectively use social media for health promotion research and practice. Future research should explore associations across constructs within the SMCI and evaluate the ability of SMCI scores to predict social media use and performance among CHES and MCHES.
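The pilot-stage screening rule in the record above (items dropped for low discrimination, r<.30) can be illustrated with a minimal sketch. It assumes that "item discrimination" means the corrected item-total correlation, which the abstract does not state; the function names and the small score matrix are hypothetical and are not taken from the study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def corrected_item_total(scores, item):
    """Correlate one item with the sum of all *other* items (rows = respondents)."""
    item_col = [row[item] for row in scores]
    rest = [sum(row) - row[item] for row in scores]
    return pearson(item_col, rest)


def flag_low_discrimination(scores, cutoff=0.30):
    """Return indices of items whose corrected item-total correlation is below cutoff."""
    n_items = len(scores[0])
    return [j for j in range(n_items) if corrected_item_total(scores, j) < cutoff]


# Hypothetical responses: 5 respondents x 3 items (illustration only).
scores = [
    [1, 1, 3],
    [2, 2, 1],
    [3, 3, 2],
    [4, 4, 2],
    [5, 5, 2],
]
flagged = flag_low_discrimination(scores)  # the third item (index 2) is flagged
```

Correlating each item with the sum of the remaining items, rather than with the full total, avoids inflating the correlation by including the item in its own criterion.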
Using Rasch Analysis to Evaluate the Reliability and Validity of the Swallowing Quality of Life Questionnaire: An Item Response Theory Approach.

PubMed

Cordier, Reinie; Speyer, Renée; Schindler, Antonio; Michou, Emilia; Heijnen, Bas Joris; Baijens, Laura; Karaduman, Ayşe; Swan, Katina; Clavé, Pere; Joosten, Annette Veronica

2018-02-01

The Swallowing Quality of Life questionnaire (SWAL-QOL) is widely used clinically and in research to evaluate quality of life related to swallowing difficulties. It has been described as a valid and reliable tool, but was developed and tested using classical test theory. This study describes the reliability and validity of the SWAL-QOL using item response theory (IRT; Rasch analysis). SWAL-QOL data were gathered from 507 participants at risk of oropharyngeal dysphagia (OD) across four European countries. OD was confirmed in 75.7% of participants via videofluoroscopy and/or fiberoptic endoscopic evaluation, or a clinical diagnosis based on meeting selected criteria. Patients with esophageal dysphagia were excluded. Data were analysed using Rasch analysis. Item and person reliability was good for all the items combined. However, person reliability was poor for 8 subscales and item reliability was poor for one subscale. Eight subscales exhibited poor person separation and two exhibited poor item separation. Overall item and person fit statistics were acceptable. However, at the individual item level, fit results indicated unpredictable item responses for 28 items and item redundancy for 10 items. The item-person dimensionality map confirmed these findings. Results from the overall Rasch model fit and Principal Component Analysis were suggestive of a second dimension. For all the items combined, none of the item categories were 'category', 'threshold' or 'step' disordered; however, all subscales demonstrated category disordered functioning. Findings suggest an urgent need to further investigate the underlying structure of the SWAL-QOL and its psychometric characteristics using IRT.
Developmental changes in visual short-term memory in infancy: evidence from eye-tracking.

PubMed

Oakes, Lisa M; Baumgartner, Heidi A; Barrett, Frederick S; Messenger, Ian M; Luck, Steven J

2013-01-01

We assessed visual short-term memory (VSTM) for color in 6- and 8-month-old infants (n = 76) using a one-shot change detection task. In this task, a sample array of two colored squares was visible for 517 ms, followed by a 317-ms retention period and then a 3000-ms test array consisting of one unchanged item and one item in a new color. We tracked gaze at 60 Hz while infants looked at the changed and unchanged items during test. When the two sample items were different colors (Experiment 1), 8-month-old infants exhibited a preference for the changed item, indicating memory for the colors, but 6-month-olds exhibited no evidence of memory. When the two sample items were the same color and did not need to be encoded as separate objects (Experiment 2), 6-month-old infants demonstrated memory. These results show that infants can encode information in VSTM in a single, brief exposure that simulates the timing of a single fixation period in natural scene viewing, and they reveal rapid developmental changes between 6 and 8 months in the ability to store individuated items in VSTM.
The Spinal Cord Injury-Functional Index: Item Banks to Measure Physical Functioning of Individuals with Spinal Cord Injury

PubMed Central

Tulsky, David S.; Jette, Alan; Kisala, Pamela A.; Kalpakjian, Claire; Dijkers, Marcel P.; Whiteneck, Gale; Ni, Pengsheng; Kirshblum, Steven; Charlifue, Susan; Heinemann, Allen W.; Forchheimer, Martin; Slavin, Mary; Houlihan, Bethlyn; Tate, Denise; Dyson-Hudson, Trevor; Fyffe, Denise; Williams, Steve; Zanca, Jeanne

2012-01-01

Objective: To develop a comprehensive set of patient reported items to assess multiple aspects of physical functioning relevant to the lives of people with spinal cord injury (SCI) and to evaluate the underlying structure of physical functioning. Design: Cross-sectional. Setting: Inpatient and community. Participants: Item pools of physical functioning were developed, refined and field tested in a large sample of 855 individuals with traumatic spinal cord injury stratified by diagnosis, severity, and time since injury. Interventions: None. Main Outcome Measure: SCI-FI measurement system. Results: Confirmatory factor analysis (CFA) indicated that a 5-factor model, including basic mobility, ambulation, wheelchair mobility, self care, and fine motor, had the best model fit and was most closely aligned conceptually with feedback received from individuals with SCI and SCI clinicians. When just the items making up basic mobility were tested in CFA, the fit statistics indicated strong support for a unidimensional model. Similar results were demonstrated for each of the other four factors, indicating unidimensional models. Conclusions: Though unidimensional or 2-factor (mobility and upper extremity) models of physical functioning make up outcomes measures in the general population, the underlying structure of physical function in SCI is more complex. A 5-factor solution allows for comprehensive assessment of key domain areas of physical functioning. These results informed the structure and development of the SCI-FI measurement system of physical functioning. PMID:22609299
Conditional Standard Errors of Measurement for Composite Scores Using IRT

ERIC Educational Resources Information Center

Kolen, Michael J.; Wang, Tianyou; Lee, Won-Chan

2012-01-01

Composite scores are often formed from test scores on educational achievement test batteries to provide a single index of achievement over two or more content areas or two or more item types on that test. Composite scores are subject to measurement error, and as with scores on individual tests, the amount of error variability typically depends on…
Response Time as an Indicator of Test Taker Speed: Assumptions Meet Reality

ERIC Educational Resources Information Center

Wise, Steven L.

2015-01-01

The growing presence of computer-based testing has brought with it the capability to routinely capture the time that test takers spend on individual test items. This, in turn, has led to an increased interest in potential applications of response time in measuring intellectual ability and achievement. Goldhammer (this issue) provides a very useful…
First State Fitness Test. A Measurement of Functional Health.

ERIC Educational Resources Information Center

Brown, Timothy; And Others

This test is designed to measure the functional health of young people. Functional health refers to those factors relating to personal health that can be improved with regular exercise. This test is unique in comparison to other physical fitness tests because of the absence of motor skill items which have no relationship to an individual's…
Individual Differences in Working Memory Capacity and Episodic Retrieval: Examining the Dynamics of Delayed and Continuous Distractor Free Recall

ERIC Educational Resources Information Center

Unsworth, Nash

2007-01-01

Two experiments explored the possibility that individual differences in working memory capacity (WMC) partially reflect differences in the size of the search set from which items are retrieved. High- and low-WMC individuals were tested in delayed (Experiment 1) and continuous distractor (Experiment 2) free recall with varying list lengths. Across…
Cross-cultural adaptation and validation of the Quebec User Evaluation of Satisfaction with Assistive Technology (QUEST 2.0): the development of the Taiwanese version.

PubMed

Mao, Hui-Fen; Chen, Wan-Yin; Yao, Grace; Huang, Sheau-Ling; Lin, Chia-Chi; Huang, Wen-Ni Wennie

2010-05-01

To develop and validate a cross-cultural version of the Quebec User Evaluation of Satisfaction with Assistive Technology (QUEST 2.0) for users of assistive technology devices in Taiwan. A cross-sectional survey. The standard cultural adaptation procedure was used for questionnaire translation and cultural item design. A field test was then conducted for item selection and psychometric properties testing. One hundred and five volunteer assistive device users in community. A questionnaire comprising 12 items of the QUEST 2.0 and 16 culture-specific items. One culture-specific item, 'Cost', was selected based on eight criteria and added to the QUEST 2.0 (12 items) to formulate the Taiwanese version of QUEST 2.0 (T-QUEST). The T-QUEST consisted of 13 items which were classified into two domains: device (8 items) and service (5 items). The internal consistencies of the device, service and total T-QUEST scores were 0.87, 0.84 and 0.90, respectively. The device, service and total T-QUEST scores achieved good test-retest stability (intraclass correlation coefficient (ICC) 0.90, 0.97, 0.95). Exploratory factor analysis revealed that T-QUEST had a two-factor structure for device and service in the construct of user satisfaction (53.42% of the variance explained). Users of assistive devices in different cultures may have different concerns regarding satisfaction. T-QUEST is the first published version of QUEST with culture-specific items added to the original translated items of QUEST 2.0. T-QUEST was a valid and reliable tool for measuring user satisfaction among Mandarin-speaking individuals using various kinds of assistive devices.
An Examination of Income Effect on Consumers' Ethical Evaluation of Counterfeit Drugs Buying Behaviour: A Cross-Sectional Study in Qatar and Sudan.

PubMed

Alfadl, Abubakr Abdelraouf; Ibrahim, Mohamed Izham Mohamed; Maraghi, Fatima Abdulla; Mohammad, Khadijah Shhab

2016-09-01

There are limited studies on consumer behaviour toward counterfeit products and the determining factors that motivate willingness to purchase counterfeit items. This study aimed to fill this literature gap by studying differences in individual ethical evaluations of counterfeit drug purchase and whether that ethical evaluation is affected by differences in income. It is hypothesized that individuals with lower/higher income make a more/less permissive evaluation of ethical responsibility regarding counterfeit drug purchase. To empirically test the research assumption, a comparison was made between people who live in the low-income country Sudan and people who live in the high-income country Qatar. The study employed a face-to-face structured interview survey methodology to collect data from 1,170 subjects, and the Sudanese and Qatari samples were compared using an independent t-test at an alpha level of 0.05, employing SPSS version 22.0. Sudanese and Qatari individuals were significantly different on all items. Sudanese individuals scored below 3 for all Awareness of Societal Consequences (ASC) items, indicating that they make a more permissive evaluation of ethical responsibility regarding counterfeit drug purchase. Both groups shared a basic positive moral agreement regarding subjective norm, indicating that the influence of income is not evident. Findings indicate that low-income individuals make a more permissive evaluation of ethical responsibility regarding counterfeit drug purchase when highlighting awareness of societal consequences is used as a deterrent tool, while both low- and high-income individuals share a basic positive moral agreement when the subjective norm dimension is exploited to discourage unethical buying behaviour.
An Instrument to Assess Beliefs about Standardized Testing: Measuring the Influence of Epistemology on the Endorsement of Standardized Testing

ERIC Educational Resources Information Center

Magee, Robert G.; Jones, Brett D.

2012-01-01

This article describes the development of an instrument to assess beliefs about standardized testing in schools, a topic of much heated debate. The Beliefs About Standardized Testing scale was developed to measure the extent to which individuals support high-stakes standardized testing. The 9-item scale comprises three subscales which measure…

Monitoring the Performance of Human and Automated Scores for Spoken Responses

ERIC Educational Resources Information Center

Wang, Zhen; Zechner, Klaus; Sun, Yu

2018-01-01

As automated scoring systems for spoken responses are increasingly used in language assessments, testing organizations need to analyze their performance, as compared to human raters, across several dimensions, for example, on individual items or based on subgroups of test takers. In addition, there is a need in testing organizations to establish…
Mayo-Portland adaptability inventory: comparing psychometrics in cerebrovascular accident to traumatic brain injury.

PubMed

Malec, James F; Kean, Jacob; Altman, Irwin M; Swick, Shannon

2012-12-01

(1) To evaluate the measurement reliability and construct validity of the Mayo-Portland Adaptability Inventory, 4th revision (MPAI-4) in a sample consisting exclusively of patients with cerebrovascular accident (CVA) using single parameter (Rasch) item-response methods; (2) to examine the differential item functioning (DIF) by sex within the CVA population; and (3) to examine DIF and differential test functioning (DTF) across traumatic brain injury (TBI) and CVA samples. Retrospective psychometric analysis of rating scale data. Home- and community-based brain injury rehabilitation program. Individuals post-CVA (n=861) and individuals with TBI (n=603). Not applicable. MPAI-4. Item data on admission to community-based rehabilitation were submitted to Rasch, DIF, and DTF analyses. The final calibration in the CVA sample revealed satisfactory reliability/separation for persons (.91/3.16) and items (1.00/23.64). DIF showed that items for pain, anger, audition, and memory were associated with higher levels of disability for CVA than TBI patients, whereas self-care, mobility, and use of hands indicated greater overall disability for TBI patients. DTF analyses showed a high degree of association between the 2 sets of items (R=.92; R²=.85) and, at most, a 3.7 point difference in raw scores. The MPAI-4 demonstrates satisfactory psychometric properties for use with individuals with CVA applying for interdisciplinary posthospital rehabilitation. DIF reveals clinically meaningful differences between CVA and TBI groups that should be considered in results at the item and subscale level. Copyright © 2012 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
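The paired reliability/separation figures in the record above are linked by the conventional Rasch relation R = G² / (1 + G²), where G is the separation index. The sketch below assumes that convention applies to the reported values, which the abstract does not state; the function names are hypothetical.

```python
from math import sqrt


def reliability_from_separation(g):
    """Rasch reliability implied by a separation index G: R = G^2 / (1 + G^2)."""
    return g * g / (1.0 + g * g)


def separation_from_reliability(r):
    """Inverse relation: G = sqrt(R / (1 - R)); undefined for R = 1."""
    return sqrt(r / (1.0 - r))


# Separation values reported in the abstract above.
person_reliability = reliability_from_separation(3.16)   # about 0.909
item_reliability = reliability_from_separation(23.64)    # about 0.998
```

Rounded to two decimals, these reproduce the reported reliabilities of .91 for persons and 1.00 for items. Reliability saturates near 1, which is why separation is usually reported alongside it.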
Shortening of an existing generic online health-related quality of life instrument for dogs.

PubMed

Reid, J; Wiseman-Orr, L; Scott, M

2017-10-11

Development, initial validation and reliability testing of a shortened version of a web-based questionnaire instrument to measure generic health-related quality of life in companion dogs, to facilitate smartphone and online use. The original 46 items were reduced using expert judgment and factor analysis. Items were removed on the basis of item loadings and communalities on factors identified through factor analysis of responses from owners of healthy and unwell dogs, intrafactor item correlations, readability of items in the UK, USA and Australia and ability of individual items to discriminate between healthy and unwell dogs. Validity was assessed through factor analysis and a field trial using a "known groups" approach. Test-retest reliability was assessed using intraclass correlation coefficients. The new instrument comprises 22 items, each of which was rated by dog owners using a 7-point Likert scale. Factor analysis revealed a structure with four health-related quality of life domains (energetic/enthusiastic, happy/content, active/comfortable, and calm/relaxed) accounting for 72% of the variability in the data compared with 64% for the original instrument. The field test involving 153 healthy and unwell dogs demonstrated good discriminative properties and high intraclass correlation coefficients. The 22-item shortened form is superior to the original instrument and can be accessed via a mobile phone app. This is likely to increase the acceptability to dog owners as a routine wellness measure in health care packages and as a therapeutic monitoring tool. © 2017 British Small Animal Veterinary Association.
Diabetes knowledge in nursing homes and home-based care services: a validation study of the Michigan Diabetes Knowledge Test adapted for use among nursing personnel.

PubMed

Haugstvedt, Anne; Aarflot, Morten; Igland, Jannicke; Landbakk, Tilla; Graue, Marit

2016-01-01

Providing high-quality diabetes care in nursing homes and home-based care facilities requires suitable instruments to evaluate the level of diabetes knowledge among the health-care providers. Thus, the aim of this study was to examine the psychometric properties of the Michigan Diabetes Knowledge Test adapted for use among nursing personnel. The study included 127 nursing personnel (32 registered nurses, 69 nursing aides and 26 nursing assistants) at three nursing homes and one home-based care facility in Norway. We examined the reliability and content and construct validity of the Michigan Diabetes Knowledge Test. The items in both the general diabetes subscale and the insulin-use subscale were considered relevant and appropriate. The instrument showed satisfactory properties for distinguishing between groups. Item response theory-based measurements and item information curves indicate maximum information at average or lower knowledge scores. Internal consistency and the item-total correlations were quite weak, indicating that the Michigan Diabetes Knowledge Test measures a set of items related to various relevant knowledge topics but not necessarily related to each other. The Michigan Diabetes Knowledge Test measures a broad range of topics relevant to diabetes care. It is an appropriate instrument for identifying individual and distinct needs for diabetes education among nursing personnel. The knowledge gaps identified by the Michigan Diabetes Knowledge Test could also provide useful input for the content of educational activities. However, some revision of the test should be considered.
Impact of IRT item misfit on score estimates and severity classifications: an examination of PROMIS depression and pain interference item banks.

PubMed

Zhao, Yue

2017-03-01

In patient-reported outcome research that utilizes item response theory (IRT), using statistical significance tests to detect misfit is usually the focus of IRT model-data fit evaluations. However, such evaluations rarely address the impact/consequence of using misfitting items on the intended clinical applications. This study was designed to evaluate the impact of IRT item misfit on score estimates and severity classifications and to demonstrate a recommended process of model-fit evaluation. Using secondary data sources collected from the Patient-Reported Outcome Measurement Information System (PROMIS) wave 1 testing phase, analyses were conducted based on PROMIS depression (28 items; 782 cases) and pain interference (41 items; 845 cases) item banks. The identification of misfitting items was assessed using Orlando and Thissen's summed-score item-fit statistics and graphical displays. The impact of misfit was evaluated according to the agreement of both IRT-derived T-scores and severity classifications between inclusion and exclusion of misfitting items. The examination of the presence and impact of misfit suggested that item misfit had a negligible impact on the T-score estimates and severity classifications with the general population sample in the PROMIS depression and pain interference item banks, implying that the impact of item misfit was insignificant. Findings support the T-score estimates in the two item banks as robust against item misfit at both the group and individual levels and add confidence to the use of T-scores for severity diagnosis in the studied sample. Recommendations on approaches for identifying item misfit (statistical significance) and assessing the misfit impact (practical significance) are given.
Evaluating HIV Knowledge Questionnaires Among Men Who Have Sex with Men: A Multi-Study Item Response Theory Analysis.

PubMed

Janulis, Patrick; Newcomb, Michael E; Sullivan, Patrick; Mustanski, Brian

2018-01-01

Knowledge about the transmission, prevention, and treatment of HIV remains a critical element in psychosocial models of HIV risk behavior and is commonly used as an outcome in HIV prevention interventions. However, most HIV knowledge questions have not undergone rigorous psychometric testing such as using item response theory. The current study used data from six studies of men who have sex with men (MSM; n = 3565) to (1) examine the item properties of HIV knowledge questions, (2) test for differential item functioning on commonly studied characteristics (i.e., age, race/ethnicity, and HIV risk behavior), (3) select items with the optimal item characteristics, and (4) leverage this combined dataset to examine the potential moderating effect of age on the relationship between condomless anal sex (CAS) and HIV knowledge. Findings indicated that existing questions tend to poorly differentiate those with higher levels of HIV knowledge, but items were relatively robust across diverse individuals. Furthermore, age moderated the relationship between CAS and HIV knowledge with older MSM having the strongest association. These findings suggest that additional items are required in order to capture a more nuanced understanding of HIV knowledge and that the association between CAS and HIV knowledge may vary by age.
Attitudes Toward Transgender Men and Women: Development and Validation of a New Measure

PubMed Central

Billard, Thomas J

2018-01-01

A series of three studies were conducted to generate, develop, and validate the Attitudes toward Transgender Men and Women (ATTMW) scale. In Study 1, 120 American adults responded to an open-ended questionnaire probing various dimensions of their perceptions of transgender individuals and identity. Qualitative thematic analysis generated 200 items based on their responses. In Study 2, 238 American adults completed a questionnaire consisting of the generated items. Exploratory factor analysis (EFA) revealed two non-identical 12-item subscales (ATTM and ATTW) of the full 24-item scale. In Study 3, 150 undergraduate students completed a survey containing the ATTMW and a number of validity-testing variables. Confirmatory factor analysis (CFA) verified the single-factor structures of the ATTM and ATTW subscales, and the convergent, discriminant, predictive, and concurrent validities of the ATTMW were also established. Together, our results demonstrate that the ATTMW is a reliable and valid measure of attitudes toward transgender individuals. PMID:29666595
Development and reliability testing of a self-report instrument to measure the office layout as a correlate of occupational sitting.

PubMed

Duncan, Mitch J; Rashid, Mahbub; Vandelanotte, Corneel; Cutumisu, Nicoleta; Plotnikoff, Ronald C

2013-02-04

Spatial configurations of office environments assessed by Space Syntax methodologies are related to employee movement patterns. These methods require analysis of floor plans, which are not readily available in large population-based studies or are otherwise unavailable. Therefore a self-report instrument to assess spatial configurations of office environments using four scales was developed. The scales are: local connectivity (16 items), overall connectivity (11 items), visibility of co-workers (10 items), and proximity of co-workers (5 items). A panel cohort (N = 1154) completed an online survey; only data from individuals employed in office-based occupations (n = 307) were used to assess scale measurement properties. To assess test-retest reliability, a separate sample of 37 office-based workers completed the survey on two occasions 7.7 (±3.2) days apart. Redundant scale items were eliminated using factor analysis; Cronbach's α was used to evaluate internal consistency, and intraclass correlation coefficients (retest-ICC) were used to evaluate test-retest reliability. ANOVA was employed to examine differences between office types (Private, Shared, Open) as a measure of construct validity. Generalized Linear Models were used to examine relationships between spatial configuration scales and the duration of and frequency of breaks in occupational sitting. The number of items on all scales was reduced; Cronbach's α and ICCs indicated good scale internal consistency and test-retest reliability: local connectivity (5 items; α = 0.70; retest-ICC = 0.84), overall connectivity (6 items; α = 0.86; retest-ICC = 0.87), visibility of co-workers (4 items; α = 0.78; retest-ICC = 0.86), and proximity of co-workers (3 items; α = 0.85; retest-ICC = 0.70). Significant (p ≤ 0.001) differences, in theoretically expected directions, were observed for all scales between office types, except overall connectivity. Significant associations were observed between all scales and occupational sitting behaviour (p ≤ 0.05). All scales have good measurement properties, indicating the instrument may be a useful alternative to Space Syntax to examine environmental correlates of occupational sitting in population surveys.
Development and reliability testing of a self-report instrument to measure the office layout as a correlate of occupational sitting

PubMed Central

2013-01-01

Background: Spatial configurations of office environments assessed by Space Syntax methodologies are related to employee movement patterns. These methods require analysis of floor plans, which are not readily available in large population-based studies or are otherwise unavailable. Therefore a self-report instrument to assess spatial configurations of office environments using four scales was developed. Methods: The scales are: local connectivity (16 items), overall connectivity (11 items), visibility of co-workers (10 items), and proximity of co-workers (5 items). A panel cohort (N = 1154) completed an online survey; only data from individuals employed in office-based occupations (n = 307) were used to assess scale measurement properties. To assess test-retest reliability, a separate sample of 37 office-based workers completed the survey on two occasions 7.7 (±3.2) days apart. Redundant scale items were eliminated using factor analysis; Cronbach’s α was used to evaluate internal consistency and intraclass correlation coefficients (retest-ICC) to evaluate test-retest reliability. ANOVA was employed to examine differences between office types (Private, Shared, Open) as a measure of construct validity. Generalized Linear Models were used to examine relationships between spatial configuration scales and the duration of and frequency of breaks in occupational sitting. Results: The number of items on all scales was reduced; Cronbach’s α and ICCs indicated good scale internal consistency and test-retest reliability: local connectivity (5 items; α = 0.70; retest-ICC = 0.84), overall connectivity (6 items; α = 0.86; retest-ICC = 0.87), visibility of co-workers (4 items; α = 0.78; retest-ICC = 0.86), and proximity of co-workers (3 items; α = 0.85; retest-ICC = 0.70). Significant (p ≤ 0.001) differences, in theoretically expected directions, were observed for all scales between office types, except overall connectivity. Significant associations were observed between all scales and occupational sitting behaviour (p ≤ 0.05). Conclusion: All scales have good measurement properties, indicating the instrument may be a useful alternative to Space Syntax to examine environmental correlates of occupational sitting in population surveys. PMID:23379485
Measurement of overgeneral autobiographical memory: Psychometric properties of the autobiographical memory test in young and older populations

PubMed Central

Romero, Dulce; Ricarte, Jorge J.; Serrano, Juan P.; Nieto, Marta; Latorre, Jose M.

2018-01-01

The Autobiographical Memory Test (AMT) is the most widely used measure of overgeneral autobiographical memory (OGM). The AMT appears to have good psychometric properties, but more research is needed on the influence and applicability of individual cue words in different languages and populations. To date, no studies have evaluated its usefulness as a measure of OGM in Spanish or older populations. This work aims to analyze the applicability of the AMT in young and older Spanish samples. We administered a Spanish version of the AMT to samples of young (N = 520) and older adults (N = 155). We conducted confirmatory factor analysis (CFA), item response theory-based analysis (IRT) and differential item functioning (DIF). Results confirm the one-factor structure for the AMT. IRT analysis suggests that both groups find the AMT easy given that they generally perform well, and that it is more precise in individuals who score low on memory specificity. DIF analysis finds three items differ in their functioning depending on age group. This differential functioning of these items affects the overall AMT scores and, thus, they should be excluded from the AMT in studies comparing young and older samples. We discuss the possible implications of the samples and cue words used. PMID:29672583
Measurement of overgeneral autobiographical memory: Psychometric properties of the autobiographical memory test in young and older populations.

PubMed

Ros, Laura; Romero, Dulce; Ricarte, Jorge J; Serrano, Juan P; Nieto, Marta; Latorre, Jose M

2018-01-01

The Autobiographical Memory Test (AMT) is the most widely used measure of overgeneral autobiographical memory (OGM). The AMT appears to have good psychometric properties, but more research is needed on the influence and applicability of individual cue words in different languages and populations. To date, no studies have evaluated its usefulness as a measure of OGM in Spanish or older populations. This work aims to analyze the applicability of the AMT in young and older Spanish samples. We administered a Spanish version of the AMT to samples of young (N = 520) and older adults (N = 155). We conducted confirmatory factor analysis (CFA), item response theory-based analysis (IRT) and differential item functioning (DIF). Results confirm the one-factor structure for the AMT. IRT analysis suggests that both groups find the AMT easy given that they generally perform well, and that it is more precise in individuals who score low on memory specificity. DIF analysis finds three items differ in their functioning depending on age group. This differential functioning of these items affects the overall AMT scores and, thus, they should be excluded from the AMT in studies comparing young and older samples. We discuss the possible implications of the samples and cue words used.
Validation of the Cross-Cultural Alcoholism Screening Test (CCAST).

PubMed

Gorenc, K D; Peredo, S; Pacurucu, S; Llanos, R; Vincente, B; López, R; Abreu, L F; Paez, E

1999-01-01

When screening instruments are used in the assessment and diagnosis of alcoholism in individuals from different ethnicities, cultural variables based on norms and societal acceptance of drinking behavior can play an important role in determining the outcome. The accepted diagnostic criteria of current market testing are based on Western standards. In this study, the Munich Alcoholism Test (31 items) was the base instrument applied to subjects from several Hispanic-American countries (Bolivia, Chile, Ecuador, Mexico, and Peru). After the sample was submitted to several statistical procedures, these 31 items were reduced to a culture-free, 31-item test named the Cross-Cultural Alcohol Screening Test (CCAST). The results of this Hispanic-American sample (n = 2,107) empirically demonstrated that CCAST measures alcoholism with an adequate degree of accuracy when compared to other available cross-cultural tests. CCAST is useful in the diagnosis of alcoholism in Spanish-speaking immigrants living in countries where English is spoken. CCAST can be used in general hospitals, psychiatric wards, emergency services and police stations. The test can be useful for other professionals, such as psychological consultants, researchers, and those conducting expertise appraisal.
Development and reliability testing of the Worksite and Energy Balance Survey.

PubMed

Hoehner, Christine M; Budd, Elizabeth L; Marx, Christine M; Dodson, Elizabeth A; Brownson, Ross C

2013-01-01

Worksites represent important venues for health promotion. Development of psychometrically sound measures of worksite environments and policy supports for physical activity and healthy eating is needed for use in public health research and practice. Assess the test-retest reliability of the Worksite and Energy Balance Survey (WEBS), a self-report instrument for assessing perceptions of worksite supports for physical activity and healthy eating. The WEBS included items adapted from existing surveys or new items on the basis of a review of the literature and expert review. Cognitive interviews among 12 individuals were used to test the clarity of items and further refine the instrument. A targeted random-digit-dial telephone survey was administered on 2 occasions to assess test-retest reliability (mean days between time periods = 8; minimum = 5; maximum = 14). Five Missouri census tracts that varied by racial-ethnic composition and walkability. Respondents included 104 employed adults (67% white, 64% women, mean age = 48.6 years). Sixty-three percent were employed at worksites with fewer than 100 employees, approximately one-third supervised other people, and the majority worked a regular daytime shift (75%). Test-retest reliability was assessed using Spearman correlations for continuous variables, Cohen's κ statistics for nonordinal categorical variables, and 1-way random intraclass correlation coefficients for ordinal categorical variables. Test-retest coefficients ranged from 0.41 to 0.97, with 80% of items having reliability coefficients of more than 0.6. Items that assessed participation in or use of worksite programs/facilities tended to have lower reliability. Reliability of some items varied by gender, obesity status, and worksite size. Test-retest reliability and internal consistency for the 5 scales ranged from 0.84 to 0.94 and 0.63 to 0.84, respectively. The WEBS items and scales exhibited sound test-retest reliability and may be useful for research and surveillance. Further evaluation is needed to document the validity of the WEBS and associations with energy balance outcomes.
Changes in children's sleep domains between 2 and 3 years of age: the Ulm SPATZ Health Study.

PubMed

Braig, Stefanie; Urschitz, Michael S; Rothenbacher, Dietrich; Genuneit, Jon

2017-08-01

There is growing interest in the link between sleep habits and child health but reference values specific to toddlers as well as longitudinal data on sleep are scarce. We aimed to describe parent-reported child sleep habits and their intra-individual changes in two- to three-year-olds using data from a regional birth cohort study. In the Ulm SPATZ Health Study, a birth cohort study conducted at Ulm, Southern Germany, with baseline examination from April 2012 to May 2013, the German version of the Children's Sleep Habits Questionnaire (CSHQ-DE) was used longitudinally at follow-ups at two and three years (N = 615 children). Descriptive statistics including intra-individual differences between three- and two-year scores were reported, the latter using the sign test. The sample-averaged total and subscale CSHQ scores differed only slightly between two and three years (max. Cohen's d = 0.39). Intra-individual comparisons of the CSHQ subscales or single items revealed congruent but also opposing changes in items belonging to the same subscale. Whereas items on bedtime resistance generally improved, sleep duration shortened with older age. With regard to sleep anxiety, we found worsening in the item 'Afraid of sleeping in the dark' in about one-fifth of our children whereas other items on this CSHQ subscale showed an opposing trend with age. A similar opposing trend was detected within the subscale on night wakings. Our data provide initial descriptive information on sleep habits in toddlers. The high intra-individual changes, partly in opposing directions which may be masked by aggregation, indicate a need for age- and item-specific analyses. Copyright © 2017 Elsevier B.V. All rights reserved.
Differential age-related effects on conjunctive and relational visual short-term memory binding.

PubMed

Bastin, Christine

2017-12-28

An age-related associative deficit has been described in visual short-term binding memory tasks. However, separate studies have suggested that ageing disrupts relational binding (to associate distinct items or item and context) more than conjunctive binding (to integrate features within an object). The current study directly compared relational and conjunctive binding with a short-term memory task for object-colour associations in 30 young and 30 older adults. Participants studied a number of object-colour associations corresponding to their individual object span level in a relational task in which objects were associated to colour patches and a conjunctive task where colour was integrated into the object. Memory for individual items and for associations was tested with a recognition memory test. Evidence for an age-related associative deficit was observed in the relational binding task, but not in the conjunctive binding task. This differential impact of ageing on relational and conjunctive short-term binding is discussed by reference to two underlying age-related cognitive difficulties: diminished hippocampally dependent binding and attentional resources.
Binding of Visual and Spatial Short-Term Memory in Williams Syndrome and Moderate Learning Disability

ERIC Educational Resources Information Center

Jarrold, Christopher; Phillips, Caroline; Baddeley, Alan D

2007-01-01

A main aim of this study was to test the claim that individuals with Williams syndrome have selectively impaired memory for spatial as opposed to visual information. The performance of 16 individuals with Williams syndrome (six males, 10 females; mean age 18y 7mo [SD 7y 6mo], range 9y 1mo-30y 7mo) on tests of short-term memory for item and…
Overview of the Spinal Cord Injury--Quality of Life (SCI-QOL) measurement system.

PubMed

Tulsky, David S; Kisala, Pamela A; Victorson, David; Tate, Denise G; Heinemann, Allen W; Charlifue, Susan; Kirshblum, Steve C; Fyffe, Denise; Gershon, Richard; Spungen, Ann M; Bombardier, Charles H; Dyson-Hudson, Trevor A; Amtmann, Dagmar; Kalpakjian, Claire Z; Choi, Seung W; Jette, Alan M; Forchheimer, Martin; Cella, David

2015-05-01

The Spinal Cord Injury--Quality of Life (SCI-QOL) measurement system was developed to address the shortage of relevant and psychometrically sound patient reported outcome (PRO) measures available for clinical care and research in spinal cord injury (SCI) rehabilitation. Using a computer adaptive testing (CAT) approach, the SCI-QOL builds on the Patient Reported Outcomes Measurement Information System (PROMIS) and the Quality of Life in Neurological Disorders (Neuro-QOL) initiative. This initial manuscript introduces the background and development of the SCI-QOL measurement system. Greater detail is presented in the additional manuscripts of this special issue. Classical and contemporary test development methodologies were employed. Qualitative input was obtained from individuals with SCI and clinicians through interviews, focus groups, and cognitive debriefing. Item pools were field tested in a multi-site sample (n=877) and calibrated using item response theory methods. Initial reliability and validity testing was performed in a new sample of individuals with traumatic SCI (n=245). Five Model SCI System centers and one Department of Veterans Affairs Medical Center across the United States. Adults with traumatic SCI. n/a n/a The SCI-QOL consists of 19 item banks, including the SCI-Functional Index banks, and 3 fixed-length scales measuring physical, emotional, and social aspects of health-related QOL (HRQOL). The SCI-QOL measurement system consists of psychometrically sound measures for individuals with SCI. The manuscripts in this special issue provide evidence of the reliability and initial validity of this measurement system. The SCI-QOL also links to other measures designed for a general medical population.
Scale refinement and initial evaluation of a behavioral health function measurement tool for work disability evaluation.

PubMed

Marfeo, Elizabeth E; Ni, Pengsheng; Haley, Stephen M; Bogusz, Kara; Meterko, Mark; McDonough, Christine M; Chan, Leighton; Rasch, Elizabeth K; Brandt, Diane E; Jette, Alan M

2013-09-01

To use item response theory (IRT) data simulations to construct and perform initial psychometric testing of a newly developed instrument, the Social Security Administration Behavioral Health Function (SSA-BH) instrument, that aims to assess behavioral health functioning relevant to the context of work. Cross-sectional survey followed by IRT calibration data simulations. Community. Sample of individuals applying for Social Security Administration disability benefits: claimants (n=1015) and a normative comparative sample of U.S. adults (n=1000). None. SSA-BH measurement instrument. IRT analyses supported the unidimensionality of 4 SSA-BH scales: mood and emotions (35 items), self-efficacy (23 items), social interactions (6 items), and behavioral control (15 items). All SSA-BH scales demonstrated strong psychometric properties including reliability, accuracy, and breadth of coverage. High correlations of the simulated 5- or 10-item computer adaptive tests with the full item bank indicated robust ability of the computer adaptive testing approach to comprehensively characterize behavioral health function along 4 distinct dimensions. Initial testing and evaluation of the SSA-BH instrument demonstrated good accuracy, reliability, and content coverage along all 4 scales. Behavioral function profiles of Social Security Administration claimants were generated and compared with age- and sex-matched norms along 4 scales: mood and emotions, behavioral control, social interactions, and self-efficacy. Using the computer adaptive test-based approach offers the ability to collect standardized, comprehensive functional information about claimants in an efficient way, which may prove useful in the context of the Social Security Administration's work disability programs. Copyright © 2013 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
Effects of Presentation Mode on Veridical and False Memory in Individuals with Intellectual Disability

ERIC Educational Resources Information Center

Carlin, Michael; Toglia, Michael P.; Belmonte, Colleen; DiMeglio, Chiara

2012-01-01

In the present study the effects of visual, auditory, and audio-visual presentation formats on memory for thematically constructed lists were assessed in individuals with intellectual disability and mental age-matched children. The auditory recognition test included target items, unrelated foils, and two types of semantic lures: critical related…
The Nursing Home Physical Performance Test: A Secondary Data Analysis of Women in Long-Term Care Using Item Response Theory.

PubMed

Perera, Subashan; Nace, David A; Resnick, Neil M; Greenspan, Susan L

2017-04-11

The Nursing Home Physical Performance Test (NHPPT) was developed to measure function among nursing home residents using sit-to-stand, scooping applesauce, face washing, dialing phone, putting on sweater, and ambulating tasks. Using item response theory, we explore its measurement characteristics at item level and opportunities for improvements. We used data from long-term care women. We fitted a graded response model, estimated parameters, and constructed probability and information curves. We identified items to be targeted toward lower and higher functioning persons to increase the range of abilities to which the instrument is applicable. We revised the scoring by making sit-to-stand and sweater items harder and dialing phone easier. We examined changes to concurrent validity with activities of daily living (ADL), frailty, and cognitive function. Participants were 86 years old, had more than three comorbidities, and a NHPPT of 19.4. All items had high discrimination and were targeted toward the lower middle range of performance continuum. After revision, sit-to-stand and sweater items demonstrated greater discrimination among the higher functioning and/or greater spread of thresholds for response categories. The overall test showed discrimination over a wider range of individuals. Concurrent validity correlation improved from 0.60 to 0.68 for instrumental ADL and explained variability (R2) from 22% to 36% for frailty. NHPPT has good measurement characteristics at the item level. NHPPT can be improved, implemented in computerized adaptive testing, and combined with self-report for greater utility, but a definitive study is needed. © The Author 2017. Published by Oxford University Press on behalf of The Gerontological Society of America. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

Psychometric Properties of the Persian Version of the Simple Shoulder Test (SST) Questionnaire.

PubMed

Ebrahimzadeh, Mohammad H; Vahedi, Ehsan; Baradaran, Aslan; Birjandinejad, Ali; Seyyed-Hoseinian, Seyyed-Hadi; Bagheri, Farshid; Kachooei, Amir Reza

2016-10-01

To validate the Persian version of the Simple Shoulder Test (SST) in patients with shoulder joint problems. Following Beaton's guideline, translation and back translation were conducted, and we reached a consensus on the Persian version of the SST. To test face validity in a pilot study, the Persian SST was administered to 20 individuals with shoulder joint conditions. We enrolled 148 consecutive patients with shoulder problems to complete the Persian SST, a shoulder-specific measure (the Oxford Shoulder Score, OSS), and two general measures (DASH and SF-36). To measure test-retest reliability, 42 randomly selected patients were asked to complete the Persian SST a second time after one week. Cronbach's alpha coefficient was used to demonstrate internal consistency over the 12 items of the Persian SST. The ICC for the total questionnaire was 0.61, showing good and acceptable test-retest reliability. ICCs for individual items ranged from 0.32 to 0.79. The total Cronbach's alpha was 0.84, showing good internal consistency over the 12 items of the Persian SST. Validity testing showed strong correlations between the SST and both the OSS and DASH. The correlation with OSS was positive, while the correlation with DASH scores was negative. The correlation was also good to strong with all physical and most mental subscales of the SF-36. Correlation coefficients were higher with DASH and OSS than with the SF-36. The Persian version of the SST was found to be a valid and reliable instrument for shoulder joint pain and function assessment in the Iranian population.
Evaluation of the Fecal Incontinence Quality of Life Scale (FIQL) using item response theory reveals limitations and suggests revisions.

PubMed

Peterson, Alexander C; Sutherland, Jason M; Liu, Guiping; Crump, R Trafford; Karimuddin, Ahmer A

2018-06-01

The Fecal Incontinence Quality of Life Scale (FIQL) is a commonly used patient-reported outcome measure for fecal incontinence, often used in clinical trials, yet has not been validated in English since its initial development. This study uses modern methods to thoroughly evaluate the psychometric characteristics of the FIQL and its potential for differential functioning by gender. This study analyzed prospectively collected patient-reported outcome data from a sample of patients prior to colorectal surgery. Patients were recruited from 14 general and colorectal surgeons in Vancouver Coastal Health hospitals in Vancouver, Canada. Confirmatory factor analysis was used to assess construct validity. Item response theory was used to evaluate test reliability, describe item-level characteristics, identify local item dependence, and test for differential functioning by gender. 236 patients were included for analysis, with mean age 58 and approximately half female. Factor analysis failed to identify the lifestyle, coping, depression, and embarrassment domains, suggesting lack of construct validity. Items demonstrated low difficulty, indicating that the test has the highest reliability among individuals who have low quality of life. Five items are suggested for removal or replacement. Differential test functioning was minimal. This study has identified specific improvements that can be made to each domain of the Fecal Incontinence Quality of Life Scale and to the instrument overall. Formatting, scoring, and instructions may be simplified, and items with higher difficulty developed. The lifestyle domain can be used as is. The embarrassment domain should be significantly revised before use.
The Detection and Influence of Problematic Item Content in Ability Tests: An Examination of Sensitivity Review Practices for Personnel Selection Test Development

ERIC Educational Resources Information Center

Grand, James A.; Golubovich, Juliya; Ryan, Ann Marie; Schmitt, Neal

2013-01-01

In organizational and educational practices, sensitivity reviews are commonly advocated techniques for reducing test bias and enhancing fairness. In the present paper, results from two studies are reported which investigate how effective individuals are at detecting problematic test content and the influence such content has on important testing…
Screening for depression in clinical practice: reliability and validity of a five-item subset of the CES-Depression.

PubMed

Bohannon, Richard W; Maljanian, Rose; Goethe, John

2003-12-01

Individuals with chronic disease are not screened routinely for depression. Availability of an abbreviated test with demonstrated reliability and validity might encourage screening, so we explored the reliability and validity of a 5-item subset of the 20-item Center for Epidemiological Studies Depression Scale among inner-city outpatients with chronic asthma or diabetes. Most patients were female (73.1%) and Hispanic (61.8%). Acceptable reliability was shown by Cronbach alpha (.76) for the subset of 5 items. Validity was supported by the high correlation of .91 between patients' scores on the 5-item subset and the full 20 items. The 5 items reflected a single factor (eigenvalue = 2.66). Receiver operating characteristic curve analysis identified cut-points for the 5 items that were sensitive (> .84) and specific (≥ .80) in identifying patients classified as depressed by the full 20 items. The reduced patient and clinician burden of the subset of 5 items, as well as its desirable psychometric properties, support broader application of this subset as a screening tool for depression.
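For readers unfamiliar with the internal-consistency statistic reported in this record (Cronbach alpha = .76 for the 5-item subset), the following is a minimal Python sketch of the standard formula, alpha = k/(k − 1) × (1 − sum of item variances / variance of the total score). The function name `cronbach_alpha` and the toy scores are hypothetical illustrations; they do not reproduce the study's data or its software.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items score table.

    `scores` is a list of respondents, each a list of k item scores.
    Hypothetical helper for illustration only.
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item (column) across respondents.
    item_var_sum = sum(pvariance(col) for col in zip(*scores))
    # Variance of each respondent's total score.
    total_var = pvariance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Toy data (invented): three respondents, two items.
toy = [[1, 2], [2, 1], [3, 3]]
alpha = cronbach_alpha(toy)
```

Population variances are used for both the items and the total, so the n/(n − 1) correction cancels in the ratio; using sample variances throughout would give the same value.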
Inconsistency in the items included in tools used in general health research and physical therapy to evaluate the methodological quality of randomized controlled trials: a descriptive analysis

PubMed Central

2013-01-01

Background: Assessing the risk of bias of randomized controlled trials (RCTs) is crucial to understand how biases affect treatment effect estimates. A number of tools have been developed to evaluate risk of bias of RCTs; however, it is unknown how these tools compare to each other in the items included. The main objective of this study was to describe which individual items are included in RCT quality tools used in general health and physical therapy (PT) research, and how these items compare to those of the Cochrane Risk of Bias (RoB) tool. Methods: We used comprehensive literature searches and a systematic approach to identify tools that evaluated the methodological quality or risk of bias of RCTs in general health and PT research. We extracted individual items from all quality tools. We calculated the frequency of quality items used across tools and compared them to those in the RoB tool. Comparisons were made between general health and PT quality tools using Chi-squared tests. Results: In addition to the RoB tool, 26 quality tools were identified, with 19 being used in general health and seven in PT research. The total number of quality items included in general health research tools was 130, compared with 48 items across PT tools and seven items in the RoB tool. The most frequently included items in general health research tools (14/19, 74%) were inclusion and exclusion criteria, and appropriate statistical analysis. In contrast, the most frequent items included in PT tools (86%, 6/7) were: baseline comparability, blinding of investigator/assessor, and use of intention-to-treat analysis. Key items of the RoB tool (sequence generation and allocation concealment) were included in 71% (5/7) of PT tools, and 63% (12/19) and 37% (7/19) of general health research tools, respectively. Conclusions: There is extensive item variation across tools that evaluate the risk of bias of RCTs in health research. Results call for an in-depth analysis of items that should be used to assess risk of bias of RCTs. Further empirical evidence on the use of individual items and the psychometric properties of risk of bias tools is needed. PMID:24044807
Inconsistency in the items included in tools used in general health research and physical therapy to evaluate the methodological quality of randomized controlled trials: a descriptive analysis.

PubMed

Armijo-Olivo, Susan; Fuentes, Jorge; Ospina, Maria; Saltaji, Humam; Hartling, Lisa

2013-09-17

Assessing the risk of bias of randomized controlled trials (RCTs) is crucial to understand how biases affect treatment effect estimates. A number of tools have been developed to evaluate risk of bias of RCTs; however, it is unknown how these tools compare to each other in the items included. The main objective of this study was to describe which individual items are included in RCT quality tools used in general health and physical therapy (PT) research, and how these items compare to those of the Cochrane Risk of Bias (RoB) tool. We used comprehensive literature searches and a systematic approach to identify tools that evaluated the methodological quality or risk of bias of RCTs in general health and PT research. We extracted individual items from all quality tools. We calculated the frequency of quality items used across tools and compared them to those in the RoB tool. Comparisons were made between general health and PT quality tools using Chi-squared tests. In addition to the RoB tool, 26 quality tools were identified, with 19 being used in general health and seven in PT research. The total number of quality items included in general health research tools was 130, compared with 48 items across PT tools and seven items in the RoB tool. The most frequently included items in general health research tools (14/19, 74%) were inclusion and exclusion criteria, and appropriate statistical analysis. In contrast, the most frequent items included in PT tools (86%, 6/7) were: baseline comparability, blinding of investigator/assessor, and use of intention-to-treat analysis. Key items of the RoB tool (sequence generation and allocation concealment) were included in 71% (5/7) of PT tools, and 63% (12/19) and 37% (7/19) of general health research tools, respectively. There is extensive item variation across tools that evaluate the risk of bias of RCTs in health research. Results call for an in-depth analysis of items that should be used to assess risk of bias of RCTs. Further empirical evidence on the use of individual items and the psychometric properties of risk of bias tools is needed.
Explanatory Item Response Modeling of Children's Change on a Dynamic Test of Analogical Reasoning

ERIC Educational Resources Information Center

Stevenson, Claire E.; Hickendorff, Marian; Resing, Wilma C. M.; Heiser, Willem J.; de Boeck, Paul A. L.

2013-01-01

Dynamic testing is an assessment method in which training is incorporated into the procedure with the aim of gauging cognitive potential. Large individual differences are present in children's ability to profit from training in analogical reasoning. The aim of this experiment was to investigate sources of these differences on a dynamic test of…
Differential Item Functioning Analysis for Accommodated versus Nonaccommodated Students

ERIC Educational Resources Information Center

Finch, Holmes; Barton, Karen; Meyer, Patrick

2009-01-01

The No Child Left Behind act resulted in an increased reliance on large-scale standardized tests to assess the progress of individual students as well as schools. In addition, emphasis was placed on including all students in the testing programs as well as those with disabilities. As a result, the role of testing accommodations has become more…
The Disaggregation of Value-Added Test Scores to Assess Learning Outcomes in Economics Courses

ERIC Educational Resources Information Center

Walstad, William B.; Wagner, Jamie

2016-01-01

This study disaggregates posttest, pretest, and value-added or difference scores in economics into four types of economic learning: positive, retained, negative, and zero. The types are derived from patterns of student responses to individual items on a multiple-choice test. The micro and macro data from the "Test of Understanding in College…
Project DIVIDE Instrument Development. Technical Report # 0810

ERIC Educational Resources Information Center

Ketterlin-Geller, Leanne; Jung, Eunju; Geller, Josh; Yovanoff, Paul

2008-01-01

In this technical report, we describe the development of cognitive diagnostic test items that form the basis of the diagnostic system for Project DIVIDE (Dynamic Instruction Via Individually Designed Environments). The construct underlying the diagnostic test is division of fractions. We include a description of the process we used to identify the…
Development and psychometric testing of the Canine Owner-Reported Quality of Life questionnaire, an instrument designed to measure quality of life in dogs with cancer.

PubMed

Giuffrida, Michelle A; Brown, Dorothy Cimino; Ellenberg, Susan S; Farrar, John T

2018-05-01

OBJECTIVE To describe development and initial psychometric testing of an owner-reported questionnaire designed to standardize measurement of general quality of life (QOL) in dogs with cancer. DESIGN Key-informant interviews, questionnaire development, and field trial. SAMPLE Owners of 25 dogs with cancer for item development and pretesting and owners of 90 dogs with cancer for reliability and validity testing. PROCEDURES Standard methods for development and testing of questionnaire instruments intended to measure subjective states were used. Items were generated, selected, scaled, and pretested for content, meaning, and readability. Response items were evaluated with exploratory factor analysis and by assessing internal consistency (Cronbach α) and convergence with global QOL as determined with a visual analog scale. Preliminary tests of stability and responsiveness were performed. RESULTS The final questionnaire, which was named the Canine Owner-Reported Quality of Life (CORQ) questionnaire, contained 17 items related to observable behaviors commonly used by owners to evaluate QOL in their dogs. Several items pertaining to physical symptoms performed poorly and were omitted. The 17 items were assigned to 4 factors (vitality, companionship, pain, and mobility) on the basis of the items they contained. The CORQ questionnaire and its factors had high internal consistency (Cronbach α = 0.68 to 0.90) and moderate to strong correlations (r = 0.49 to 0.71) with global QOL as measured on a visual analog scale. Preliminary testing indicated good test-retest reliability and responsiveness to improvements in overall QOL. CONCLUSIONS AND CLINICAL RELEVANCE The CORQ questionnaire was a valid, reliable owner-reported questionnaire that measured general QOL in dogs with cancer and showed promise as a clinical trial outcome measure for quantifying changes in individual dog QOL occurring in response to cancer treatment and progression.
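Internal consistency in the record above is summarised with Cronbach α. For readers unfamiliar with the statistic, here is a minimal sketch of the standard formula, α = k/(k−1) × (1 − Σ item variances / variance of total scores). The function name and the response matrix are hypothetical and are not CORQ data.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item (column) across respondents
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of each respondent's summed score
    total_var = variance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical responses: 4 respondents x 3 items (illustrative only)
responses = [
    [1, 2, 2],
    [2, 3, 3],
    [3, 3, 4],
    [4, 5, 5],
]
alpha = cronbach_alpha(responses)
print(f"alpha = {alpha:.2f}")
```

Items that rise and fall together across respondents push α toward 1; unrelated items push it toward 0.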
Development of a brief, reliable and valid diet assessment tool for impaired glucose tolerance and diabetes: the UK Diabetes and Diet Questionnaire.

PubMed

England, Clare Y; Thompson, Janice L; Jago, Russ; Cooper, Ashley R; Andrews, Rob C

2017-02-01

Dietary advice is fundamental in the prevention and management of type 2 diabetes (T2DM). Advice is improved by individual assessment but existing methods are time-consuming and require expertise. We developed a twenty-five-item questionnaire, the UK Diabetes and Diet Questionnaire (UKDDQ), for quick assessment of an individual's diet. The present study examined the UKDDQ's repeatability and relative validity compared with 4 d food diaries. The UKDDQ was completed twice with a median 3 d gap (interquartile range=1-7 d) between tests. A 4 d food diary was completed after the second UKDDQ. Diaries were analysed and food groups were mapped on to the UKDDQ. Absolute agreement between total scores was examined using intra-class correlation (ICC). Agreement for individual items was tested with Cohen's weighted kappa (κw). South West of England. Adults (n 177, 50·3 % women) with, or at high risk for, T2DM; mean age 55·8 (sd 8·6) years, mean BMI 34·4 (sd 7·3) kg/m²; participants were 91 % White British. The UKDDQ showed excellent repeatability (ICC=0·90 (0·82, 0·94)). For individual items, κw ranged from 0·43 ('savoury pastries') to 0·87 ('vegetables'). Total scores from the UKDDQ and food diaries compared well (ICC=0·54 (0·27, 0·70)). Agreement for individual items varied and was good for 'alcohol' (κw=0·71) and 'breakfast cereals' (κw=0·70), with no agreement for 'vegetables' (κw=0·08) or 'savoury pastries' (κw=0·09). The UKDDQ is a new British dietary questionnaire with excellent repeatability. Comparisons with food diaries found agreements similar to those for international dietary questionnaires currently in use. It targets foods and habits important in diabetes prevention and management.
Measurement properties of the WOMAC LK 3.1 pain scale.

PubMed

Stratford, P W; Kennedy, D M; Woodhouse, L J; Spadoni, G F

2007-03-01

The Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC) is applied extensively to patients with osteoarthritis of the hip or knee. Previous work has challenged the validity of its physical function scale; however, an extensive evaluation of its pain scale has not been reported. Our purpose was to estimate internal consistency, factorial validity, test-retest reliability, and the standard error of measurement (SEM) of the WOMAC LK 3.1 pain scale. Four hundred and seventy-four patients with osteoarthritis of the hip or knee awaiting arthroplasty were administered the WOMAC. Estimates of internal consistency (coefficient alpha), factorial validity (confirmatory factor analysis), and the SEM based on internal consistency (SEM(IC)) were obtained. Test-retest reliability [Type 2,1 intraclass correlation coefficients (ICC)] and a corresponding SEM(TRT) were estimated on a subsample of 36 patients. Our estimates were: internal consistency alpha=0.84; SEM(IC)=1.48; Type 2,1 ICC=0.77; SEM(TRT)=1.69. Confirmatory factor analysis failed to support a single factor structure of the pain scale with uncorrelated error terms. Two comparable models provided excellent fit: (1) a model with correlated error terms between the walking and stairs items, and between night and sit items (χ²=0.18, P=0.98); (2) a two factor model with walking and stairs items loading on one factor, night and sit items loading on a second factor, and the standing item loading on both factors (χ²=0.18, P=0.98). Our examination of the factorial structure of the WOMAC pain scale failed to support a single factor and internal consistency analysis yielded a coefficient less than optimal for individual patient use. An alternate strategy to summing the five-item responses when considering individual patient application would be to interpret item responses separately or to sum only those items which display homogeneity.
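The record above reports an SEM derived from internal consistency. A minimal sketch of how such a value is conventionally obtained follows, assuming the textbook formula SEM = SD × √(1 − reliability). The abstract states neither the formula nor the score standard deviation, so the SD below is back-calculated from the reported alpha and SEM(IC) for illustration and is not a reported result. Function names are hypothetical.

```python
import math


def sem(sd, reliability):
    """Standard error of measurement, assuming SEM = SD * sqrt(1 - reliability)."""
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must lie in [0, 1]")
    return sd * math.sqrt(1 - reliability)


def implied_sd(sem_value, reliability):
    """Invert the same formula: the SD implied by a given SEM and reliability."""
    if not 0 <= reliability < 1:
        raise ValueError("reliability must lie in [0, 1)")
    return sem_value / math.sqrt(1 - reliability)


# Reported in the abstract: alpha = 0.84, SEM(IC) = 1.48
sd_ic = implied_sd(1.48, 0.84)   # back-calculated, not reported by the authors
print(f"implied SD = {sd_ic:.2f}")
print(f"round trip SEM = {sem(sd_ic, 0.84):.2f}")
```

The formula makes the abstract's concern concrete: as reliability falls, a larger share of the score spread is measurement error, which matters most when interpreting an individual patient's score.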
Development of the movement domain in the global body examination.

PubMed

Kvåle, Alice; Bunkan, Berit Heir; Opjordsmoen, Stein; Friis, Svein

2012-01-01

The purpose of this study was to develop a new Movement domain, based on 16 items from the Global Physiotherapy Examination-52 (GPE-52) and 18 items from the Comprehensive Body Examination (CBE). Furthermore, we examined how well the new domain and its scales would discriminate between healthy individuals and different groups of patients, compared to the original methods. Two physiotherapists, each using one method, independently examined 132 individuals (34 healthy, 32 with localized pain, 32 with generalized pain, and 34 with psychoses). The number of items was reduced by means of correlational and exploratory factor analysis. Internal consistency was examined with Cronbach's alpha. For examination of discriminative validity, Mann-Whitney U-test and Area under the Curve (AUC) were used. The initial 34 items were reduced to two subscales with 13 items: one for range of movement and balance and one for flexibility. Cronbach's alpha was 0.84 and 0.87 for the two subscales. The new subscales showed very good to excellent discriminating ability between healthy persons and the different patient groups (p < 0.001; AUC 0.82-0.95). Furthermore, patients with localized pain had significantly fewer movement aberrations than the other patient groups. The new Movement domain had fewer items than the GPE-52 and CBE, without losing discriminative validity.
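The Mann-Whitney U statistic and the AUC used above are closely related: AUC equals U divided by the number of patient-healthy pairs. The sketch below illustrates that relationship with hypothetical subscale scores, assuming (as a labelled assumption, not something the abstract states) that higher scores indicate more movement aberration. The function name is also hypothetical.

```python
def auc_from_scores(patients, healthy):
    """AUC as the share of patient/healthy pairs in which the patient scores higher.

    Ties count as half a pair. The running total is the Mann-Whitney U statistic,
    so the returned value is U / (n_patients * n_healthy).
    """
    if not patients or not healthy:
        raise ValueError("both groups need at least one score")
    u = 0.0
    for p in patients:
        for h in healthy:
            if p > h:
                u += 1.0
            elif p == h:
                u += 0.5
    return u / (len(patients) * len(healthy))


# Hypothetical subscale scores (illustrative only, not study data)
patient_scores = [6, 7, 9, 5, 8]
healthy_scores = [2, 4, 5, 3, 6]
print(f"AUC = {auc_from_scores(patient_scores, healthy_scores):.2f}")
```

An AUC of 0.5 means the scale separates the groups no better than chance, and 1.0 means every patient scores above every healthy participant.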
Development and analysis of an instrument to assess student understanding of GOB chemistry knowledge relevant to clinical nursing practice.

PubMed

Brown, Corina E; Hyslop, Richard M; Barbera, Jack

2015-01-01

The General, Organic, and Biological Chemistry Knowledge Assessment (GOB-CKA) is a multiple-choice instrument designed to assess students' understanding of the chemistry topics deemed important to clinical nursing practice. This manuscript describes the development process of the individual items along with a psychometric evaluation of the final version of the items and instrument. In developing items for the GOB-CKA, essential topics were identified through a series of expert interviews (with practicing nurses, nurse educators, and GOB chemistry instructors) and confirmed through a national survey. Individual items were tested in qualitative studies with students from the target population for clarity and wording. Data from pilot and beta studies were used to evaluate each item and narrow the total item count to 45. A psychometric analysis performed on data from the 45-item final version was used to provide evidence of validity and reliability. The final version of the instrument has a Cronbach's alpha value of 0.76. Feedback from an expert panel provided evidence of face and content validity. Convergent validity was estimated by comparing the results from the GOB-CKA with the General-Organic-Biochemistry Exam (Form 2007) of the American Chemical Society. Instructors who wish to use the GOB-CKA for teaching and research may contact the corresponding author for a copy of the instrument. © 2014 Wiley Periodicals, Inc.
Item response theory analyses of the Delis-Kaplan Executive Function System card sorting subtest.

PubMed

Spencer, Mercedes; Cho, Sun-Joo; Cutting, Laurie E

2018-02-02

In the current study, we examined the dimensionality of the 16-item Card Sorting subtest of the Delis-Kaplan Executive Functioning System assessment in a sample of 264 native English-speaking children between the ages of 9 and 15 years. We also tested for measurement invariance for these items across age and gender groups using item response theory (IRT). Results of the exploratory factor analysis indicated that a two-factor model that distinguished between verbal and perceptual items provided the best fit to the data. Although the items demonstrated measurement invariance across age groups, measurement invariance was violated for gender groups, with two items demonstrating differential item functioning for males and females. Multigroup analysis using all 16 items indicated that the items were more effective for individuals whose IRT scale scores were relatively high. A single-group explanatory IRT model using 14 non-differential item functioning items showed that for perceptual ability, females scored higher than males and that scores increased with age for both males and females; for verbal ability, the observed increase in scores across age differed for males and females. The implications of these findings are discussed.
Recalled Aspects of Original Encoding Strategies Influence Episodic Feeling of Knowing

PubMed Central

Hertzog, Christopher; Fulton, Erika K.; Sinclair, Starlette M.; Dunlosky, John

2013-01-01

We tested the hypothesis that feeling of knowing (FOK) after a failed recall attempt is influenced by recalling aspects of the original encoding strategy. Individuals were instructed to use interactive imagery to encode unrelated word pairs. We manipulated item concreteness (abstract versus concrete) and item repetition at study (1 versus 3). Participants orally described the mediator produced immediately after studying each item, if any. After a delay they were given cued recall, made FOK ratings, and attempted to recall their original mediator. Concreteness and item repetition enhanced strategy recall, which had a large effect on FOKs. Controlling on strategy recall reduced the predictive validity of FOKs for recognition memory, indicating that access to original aspects of encoding influenced FOK accuracy. Confidence judgments (CJs) for correctly recognized items covaried with FOKs, but FOKs did not fully track strategy recall associations with CJs, suggesting emergent effects of strategy cues elicited by recognition tests not accessed at the time of the FOK judgment. In summary, cue-generated access to aspects of the original encoding strategy strongly influenced episodic FOK, although other influences are also implicated. PMID:23835601
Recalled aspects of original encoding strategies influence episodic feelings of knowing.

PubMed

Hertzog, Christopher; Fulton, Erika K; Sinclair, Starlette M; Dunlosky, John

2014-01-01

We tested the hypothesis that the feeling of knowing (FOK) after a failed recall attempt is influenced by recalling aspects of the original encoding strategy. Individuals were instructed to use interactive imagery to encode unrelated word pairs. We manipulated item concreteness (abstract vs. concrete) and item repetitions at study (one vs. three). Participants orally described the mediator produced immediately after studying each item, if any. After a delay, they were given cued recall, made FOK ratings, and attempted to recall their original mediator. Concreteness and item repetition enhanced strategy recall, which had a large effect on FOKs. Controlling on strategy recall reduced the predictive validity of FOKs for recognition memory, indicating that access to the original aspects of encoding influenced FOK accuracy. Confidence judgments (CJs) for correctly recognized items covaried with FOKs, but FOKs did not fully track the strategy recall associations with CJs, suggesting emergent effects of strategy cues that were elicited by recognition tests but not accessed at the time of the FOK judgment. In summary, cue-generated access to aspects of the original encoding strategy strongly influenced episodic FOKs, although other influences were also implicated.
Dying to remember, remembering to survive: mortality salience and survival processing.

PubMed

Burns, Daniel J; Hart, Joshua; Kramer, Melanie E; Burns, Amy D

2014-01-01

Processing items for their relevance to survival improves recall for those items relative to numerous other deep processing encoding techniques. Perhaps related, placing individuals in a mortality salient state has also been shown to enhance retention of items encoded after the mortality salience manipulation (e.g., in a pleasantness rating task), a phenomenon we dubbed the "dying-to-remember" (DTR) effect. The experiments reported here further explored the effect and tested the possibility that the DTR effect is related to survival processing. Experiment 1 replicated the effect using different encoding tasks, demonstrating that the effect is not dependent on the pleasantness task. In Experiment 2 the DTR effect was associated with increases in item-specific processing, not relational processing, according to several indices. Experiment 3 replicated the main results of Experiment 2, and tested the effects of mortality salience and survival processing within the same experiment. The DTR effect and its associated difference in item-specific processing were completely eliminated when the encoding task required survival processing. These results are consistent with the interpretation that the mechanisms responsible for survival processing and DTR effects are overlapping.
Associative memory in aging: the effect of unitization on source memory.

PubMed

Bastin, Christine; Diana, Rachel A; Simon, Jessica; Collette, Fabienne; Yonelinas, Andrew P; Salmon, Eric

2013-03-01

In normal aging, memory for associations declines more than memory for individual items. Unitization is an encoding process defined by creation of a new single entity to represent a new arbitrary association. The current study tested the hypothesis that age-related differences in associative memory can be reduced by encoding instructions that promote unitization. In two experiments, groups of 20 young and 20 older participants learned new associations between a word and a background color under two conditions. In the item detail condition, they had to imagine that the item is the same color as the background, an instruction promoting unitization of the associations. In the context detail condition, which did not promote unitization, they had to imagine that the item interacted with another colored object. At test, they had to retrieve the color that was associated with each word (source memory). In both experiments, the results showed an age-related decrement in source memory performance in the context detail but not in the item detail condition. Moreover, Experiment 2 examined receiver operating characteristics in older participants and indicated that familiarity contributed more to source memory performance in the item detail than in the context detail condition. These findings suggest that unitization of new associations can overcome the associative memory deficit observed in aging, at least for item-color associations.

An Examination of Income Effect on Consumers’ Ethical Evaluation of Counterfeit Drugs Buying Behaviour: A Cross-Sectional Study in Qatar and Sudan

PubMed Central

Alfadl, Abubakr Abdelraouf; Maraghi, Fatima Abdulla; Mohammad, Khadijah Shhab

2016-01-01

Introduction There are limited studies on consumer behaviour toward counterfeit products and the determining factors that motivate willingness to purchase counterfeit items. Aim This study aimed to fill this literature gap by studying differences in individual ethical evaluations of counterfeit drug purchase and whether that ethical evaluation is affected by differences in income. It is hypothesized that individuals with lower/higher income make a more/less permissive evaluation of ethical responsibility regarding counterfeit drug purchase. Materials and Methods To empirically test the research assumption, a comparison was made between people who live in the low-income country Sudan and people who live in the high-income country Qatar. The study employed a face-to-face structured interview survey methodology to collect data from 1,170 subjects, and the Sudanese and Qatari samples were compared using an independent t-test at an alpha level of 0.05 employing SPSS version 22.0. Results Sudanese and Qatari individuals were significantly different on all items. Sudanese individuals scored below 3 for all Awareness of Societal Consequences (ASC) items, indicating that they make a more permissive evaluation of ethical responsibility regarding counterfeit drug purchase. Both groups shared a basic positive moral agreement regarding subjective norm, indicating that the influence of income is not evident. Conclusion Findings indicate that low-income individuals make a more permissive evaluation of ethical responsibility regarding counterfeit drug purchase when awareness of societal consequences is used as a deterrent tool, while both low- and high-income individuals share a basic positive moral agreement when the subjective norm dimension is exploited to discourage unethical buying behaviour. PMID:27790465
Development of a measure of asthma-specific quality of life among adults.

PubMed

Eberhart, Nicole K; Sherbourne, Cathy D; Edelen, Maria Orlando; Stucky, Brian D; Sin, Nancy L; Lara, Marielena

2014-04-01

A key goal in asthma treatment is improvement in quality of life (QoL), but existing measures often confound QoL with symptoms and functional impairment. The current study addresses these limitations and the need for valid patient-reported outcome measures by using state-of-the-art methods to develop an item bank assessing QoL in adults with asthma. This article describes the process for developing an initial item pool for field testing. Five focus group interviews were conducted with a total of 50 asthmatic adults. We used "pile sorting/binning" and "winnowing" methods to identify key QoL dimensions and develop a pool of items based on statements made in the focus group interviews. We then conducted a literature review and consulted with an expert panel to ensure that no key concepts were omitted. Finally, we conducted individual cognitive interviews to ensure that items were well understood and inform final item refinement. Six hundred and sixty-one QoL statements were identified from focus group interview transcripts and subsequently used to generate a pool of 112 items in 16 different content areas. Items covering a broad range of content were developed that can serve as a valid gauge of individuals' perceptions of the effects of asthma and its treatment on their lives. These items do not directly measure symptoms or functional impairment, yet they include a broader range of content than most existent measures of asthma-specific QoL.
Test-retest reliability at the item level and total score level of the Norwegian version of the Spinal Cord Injury Falls Concern Scale (SCI-FCS).

PubMed

Roaldsen, Kirsti Skavberg; Måøy, Åsa Blad; Jørgensen, Vivien; Stanghelle, Johan Kvalvik

2016-05-01

Translation of the Spinal Cord Injury Falls Concern Scale (SCI-FCS), and investigation of test-retest reliability on item-level and total-score-level. Translation, adaptation and test-retest study. A specialized rehabilitation setting in Norway. Fifty-four wheelchair users with a spinal cord injury. The median age of the cohort was 49 years, and the median number of years after injury was 13. Interventions/measurements: The SCI-FCS was translated and back-translated according to guidelines. Individuals answered the SCI-FCS twice over the course of one week. We investigated item-level test-retest reliability using Svensson's rank-based statistical method for disagreement analysis of paired ordinal data. For relative reliability, we analyzed the total-score-level test-retest reliability with intraclass correlation coefficients (ICC2.1), the standard error of measurement (SEM), and the smallest detectable change (SDC) for absolute reliability/measurement-error assessment and Cronbach's alpha for internal consistency. All items showed satisfactory percentage agreement (≥69%) between test and retest. There were small but non-negligible systematic disagreements among three items; we recovered an 11-13% higher chance for a lower second score. There was no disagreement due to random variance. The test-retest agreement (ICC2.1) was excellent (0.83). The SEM was 2.6 (12%), and the SDC was 7.1 (32%). The Cronbach's alpha was high (0.88). The Norwegian SCI-FCS is highly reliable for wheelchair users with chronic spinal cord injuries.
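The record above reports both an SEM and a smallest detectable change (SDC). As a hedged illustration of how the two usually relate, the sketch below assumes the commonly used formula SDC = 1.96 × √2 × SEM; the abstract does not state which formula the authors applied, and the function name is hypothetical.

```python
import math


def smallest_detectable_change(sem_value, z=1.96):
    """SDC at the individual level, assuming SDC = z * sqrt(2) * SEM."""
    if sem_value < 0:
        raise ValueError("SEM cannot be negative")
    # sqrt(2) reflects measurement error entering at both test and retest
    return z * math.sqrt(2) * sem_value


reported_sem = 2.6  # SEM from the abstract
sdc = smallest_detectable_change(reported_sem)
print(f"SDC = {sdc:.1f}")
```

Starting from the rounded SEM of 2.6, this gives a value close to the reported 7.1; the small gap is plausibly a rounding effect, though the abstract does not give enough detail to confirm that.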
Comparative validity and repeatability of a single question, a twenty-eight-item FFQ and estimated food records to assess takeaway meal intake.

PubMed

Cook, Amelia S; McCook, Rochelle; Petocz, Peter; O'Leary, Fiona; Allman-Farinelli, Margaret

2016-11-01

A single question (SQ) and a twenty-eight-item FFQ to measure takeaway meal intake were compared with two 7-d estimated food records (EFR; reference method). Test methods were completed after the reference period and repeated 6-8 d later for repeatability. The SQ asked about intake of high-SFA takeaway meals. FFQ items included low- and high-SFA meals. Test methods were compared with EFR for sensitivity, specificity, and positive and negative predictive values, using a goal of ≤1 high-SFA weekly takeaway meals. Bland-Altman analyses were used to check agreement between measurement approaches, the κ coefficient was used to summarise the observed level of agreement, and Spearman's correlation was used to assess the degree to which instruments ranked individuals. Young adults were recruited from two universities, and 109 participants (61 % female) completed the study. The mean age was 24·4 (sd 4·9) years, and the mean BMI was 23·5 (sd 3·7) kg/m². The SQ and the FFQ had a sensitivity of 97 and 83 % and a specificity of 46 and 92 %, respectively. Both methods exhibited moderate correlation for measuring total and high-SFA takeaway meal intakes (rs ranging from 0·64 to 0·80). Neither instrument could measure precise, absolute intake at the group or individual level. Test methods ranged from fair (κw=0·24) to moderate agreement (κw=0·59). The repeatability for all was acceptable. The FFQ identified excessive high-SFA takeaway meal intake and measured individuals' category for total and high-SFA takeaway intakes. Both methods are suitable for ranking individuals for total or high-SFA takeaway meal intakes.
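Sensitivity, specificity, and predictive values such as those reported above all come from a 2×2 table of test method versus reference method. The sketch below shows the arithmetic with hypothetical counts; it assumes that "positive" means exceeding the goal of ≤1 high-SFA takeaway meal per week according to the food records, which is an interpretation and not something the abstract spells out.

```python
def diagnostic_summary(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV and NPV from the four cells of a 2x2 table."""
    if min(tp + fn, tn + fp, tp + fp, tn + fn) == 0:
        raise ValueError("each margin of the table must be non-zero")
    return {
        "sensitivity": tp / (tp + fn),  # reference-positives the test catches
        "specificity": tn / (tn + fp),  # reference-negatives the test clears
        "ppv": tp / (tp + fp),          # test-positives that are truly positive
        "npv": tn / (tn + fn),          # test-negatives that are truly negative
    }


# Hypothetical counts (illustrative only, not the study's data)
counts = dict(tp=30, fp=10, fn=5, tn=55)
summary = diagnostic_summary(**counts)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

The reported pattern for the single question (high sensitivity, low specificity) corresponds to few false negatives but many false positives in such a table.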
Measurement Equivalence of the Patient Reported Outcomes Measurement Information System® (PROMIS®) Pain Interference Short Form Items: Application to Ethnically Diverse Cancer and Palliative Care Populations.

PubMed

Teresi, Jeanne A; Ocepek-Welikson, Katja; Cook, Karon F; Kleinman, Marjorie; Ramirez, Mildred; Reid, M Carrington; Siu, Albert

2016-01-01

Reducing the response burden of standardized pain measures is desirable, particularly for individuals who are frail or live with chronic illness, e.g., those suffering from cancer and those in palliative care. The Patient Reported Outcome Measurement Information System® (PROMIS®) project addressed this issue with the provision of computerized adaptive tests (CAT) and short form measures that can be used clinically and in research. Although there has been substantial evaluation of PROMIS item banks, little is known about the performance of PROMIS short forms, particularly in ethnically diverse groups. Reviewed in this article are findings related to the differential item functioning (DIF) and reliability of the PROMIS pain interference short forms across diverse sociodemographic groups. DIF hypotheses were generated for the PROMIS short form pain interference items. Initial analyses tested item response theory (IRT) model assumptions of unidimensionality and local independence. Dimensionality was evaluated using factor analytic methods; local dependence (LD) was tested using IRT-based LD indices. Wald tests were used to examine group differences in IRT parameters, and to test DIF hypotheses. A second DIF-detection method used in sensitivity analyses was based on ordinal logistic regression with a latent IRT-derived conditioning variable. Magnitude and impact of DIF were investigated, and reliability and item and scale information statistics were estimated. The reliability of the short form item set was excellent. However, there were a few items with high local dependency, which affected the estimation of the final discrimination parameters. As a result, the item, "How much did pain interfere with enjoyment of social activities?" was excluded in the DIF analyses for all subgroup comparisons. 
No items were hypothesized to show DIF for race and ethnicity; however, five items showed DIF after adjustment for multiple comparisons in both primary and sensitivity analyses: ability to concentrate, enjoyment of recreational activities, tasks away from home, participation in social activities, and socializing with others. The magnitude of DIF was small and the impact negligible. Three items were consistently identified with DIF for education: enjoyment of life, ability to concentrate, and enjoyment of recreational activities. No item showed DIF above the magnitude threshold and the impact of DIF on the overall measure was minimal. No item showed gender DIF after correction for multiple comparisons in the primary analyses. Four items showed consistent age DIF: enjoyment of life, ability to concentrate, day to day activities, and enjoyment of recreational activities, none with primary magnitude values above threshold. Conditional on the pain state, Spanish speakers were hypothesized to report less pain interference on one item, enjoyment of life. The DIF findings confirmed the hypothesis; however, the magnitude was small. Using an arbitrary cutoff point of theta (θ) ≥ 1.0 to classify respondents with acute pain interference, the highest number of changes were for the education groups analyses. There were 231 respondents (4% of the total sample) who changed from the designation of no acute pain interference to acute interference after the DIF adjustment. There was no change in the designations for race/ethnic subgroups, and a small number of changes for respondents aged 65 to 84. Although significant DIF was observed after correction for multiple comparisons, all DIF was of low magnitude and impact. However, some individual-level impact was observed for low education groups. Reliability estimates were high. 
Thus, the PROMIS short form pain items examined in this ethnically diverse sample performed relatively well; although one item was problematic and removed from the analyses. It is concluded that the majority of the PROMIS pain interference short form items can be recommended for use among ethnically diverse groups, including those in palliative care and with cancer and chronic illness.
Measurement Equivalence of the Patient Reported Outcomes Measurement Information System® (PROMIS®) Pain Interference Short Form Items: Application to Ethnically Diverse Cancer and Palliative Care Populations

PubMed Central

Teresi, Jeanne A.; Ocepek-Welikson, Katja; Cook, Karon F.; Kleinman, Marjorie; Ramirez, Mildred; Reid, M. Carrington; Siu, Albert

2017-01-01

Reducing the response burden of standardized pain measures is desirable, particularly for individuals who are frail or live with chronic illness, e.g., those suffering from cancer and those in palliative care. The Patient Reported Outcome Measurement Information System® (PROMIS®) project addressed this issue with the provision of computerized adaptive tests (CAT) and short form measures that can be used clinically and in research. Although there has been substantial evaluation of PROMIS item banks, little is known about the performance of PROMIS short forms, particularly in ethnically diverse groups. Reviewed in this article are findings related to the differential item functioning (DIF) and reliability of the PROMIS pain interference short forms across diverse sociodemographic groups. Methods DIF hypotheses were generated for the PROMIS short form pain interference items. Initial analyses tested item response theory (IRT) model assumptions of unidimensionality and local independence. Dimensionality was evaluated using factor analytic methods; local dependence (LD) was tested using IRT-based LD indices. Wald tests were used to examine group differences in IRT parameters, and to test DIF hypotheses. A second DIF-detection method used in sensitivity analyses was based on ordinal logistic regression with a latent IRT-derived conditioning variable. Magnitude and impact of DIF were investigated, and reliability and item and scale information statistics were estimated. Results The reliability of the short form item set was excellent. However, there were a few items with high local dependency, which affected the estimation of the final discrimination parameters. As a result, the item, “How much did pain interfere with enjoyment of social activities?” was excluded in the DIF analyses for all subgroup comparisons. 
No items were hypothesized to show DIF for race and ethnicity; however, five items showed DIF after adjustment for multiple comparisons in both primary and sensitivity analyses: ability to concentrate, enjoyment of recreational activities, tasks away from home, participation in social activities, and socializing with others. The magnitude of DIF was small and the impact negligible. Three items were consistently identified with DIF for education: enjoyment of life, ability to concentrate, and enjoyment of recreational activities. No item showed DIF above the magnitude threshold and the impact of DIF on the overall measure was minimal. No item showed gender DIF after correction for multiple comparisons in the primary analyses. Four items showed consistent age DIF: enjoyment of life, ability to concentrate, day to day activities, and enjoyment of recreational activities, none with primary magnitude values above threshold. Conditional on the pain state, Spanish speakers were hypothesized to report less pain interference on one item, enjoyment of life. The DIF findings confirmed the hypothesis; however, the magnitude was small. Using an arbitrary cutoff point of theta (θ) ≥ 1.0 to classify respondents with acute pain interference, the highest number of changes were for the education groups analyses. There were 231 respondents (4% of the total sample) who changed from the designation of no acute pain interference to acute interference after the DIF adjustment. There was no change in the designations for race/ethnic subgroups, and a small number of changes for respondents aged 65 to 84. Conclusions Although significant DIF was observed after correction for multiple comparisons, all DIF was of low magnitude and impact. However, some individual-level impact was observed for low education groups. Reliability estimates were high. 
Thus, the PROMIS short form pain items examined in this ethnically diverse sample performed relatively well, although one item was problematic and removed from the analyses. It is concluded that the majority of the PROMIS pain interference short form items can be recommended for use among ethnically diverse groups, including those in palliative care and with cancer and chronic illness. PMID:28983449
The EORTC CAT Core-The computer adaptive version of the EORTC QLQ-C30 questionnaire.

PubMed

Petersen, Morten Aa; Aaronson, Neil K; Arraras, Juan I; Chie, Wei-Chu; Conroy, Thierry; Costantini, Anna; Dirven, Linda; Fayers, Peter; Gamper, Eva-Maria; Giesinger, Johannes M; Habets, Esther J J; Hammerlid, Eva; Helbostad, Jorunn; Hjermstad, Marianne J; Holzner, Bernhard; Johnson, Colin; Kemmler, Georg; King, Madeleine T; Kaasa, Stein; Loge, Jon H; Reijneveld, Jaap C; Singer, Susanne; Taphoorn, Martin J B; Thamsborg, Lise H; Tomaszewski, Krzysztof A; Velikova, Galina; Verdonck-de Leeuw, Irma M; Young, Teresa; Groenvold, Mogens

2018-06-21

To optimise measurement precision, relevance to patients and flexibility, patient-reported outcome measures (PROMs) should ideally be adapted to the individual patient/study while retaining direct comparability of scores across patients/studies. This is achievable using item banks and computerised adaptive tests (CATs). The European Organisation for Research and Treatment of Cancer (EORTC) Quality of Life Questionnaire Core 30 (QLQ-C30) is one of the most widely used PROMs in cancer research and clinical practice. Here we provide an overview of the research program to develop CAT versions of the QLQ-C30's 14 functional and symptom domains. The EORTC Quality of Life Group's strategy for developing CAT item banks consists of: literature search to identify potential candidate items; formulation of new items compatible with the QLQ-C30 item style; expert evaluations and patient interviews; field-testing and psychometric analyses, including factor analysis, item response theory calibration and simulation of measurement properties. In addition, software for setting up, running and scoring CAT has been developed. Across eight rounds of data collections, 9782 patients were recruited from 12 countries for the field-testing. The four phases of development resulted in a total of 260 unique items across the 14 domains. Each item bank consists of 7-34 items. Psychometric evaluations indicated higher measurement precision and increased statistical power of the CAT measures compared to the QLQ-C30 scales. Using CAT, sample size requirements may be reduced by approximately 20-35% on average without loss of power. The EORTC CAT Core represents a more precise, powerful and flexible measurement system than the QLQ-C30. It is currently being validated in a large independent, international sample of cancer patients. Copyright © 2018 Elsevier Ltd. All rights reserved.
Exploratory Item Classification Via Spectral Graph Clustering

PubMed Central

Chen, Yunxiao; Li, Xiaoou; Liu, Jingchen; Xu, Gongjun; Ying, Zhiliang

2017-01-01

Large-scale assessments are supported by a large item pool. An important task in test development is to assign items into scales that measure different characteristics of individuals, and a popular approach is cluster analysis of items. Classical methods in cluster analysis, such as the hierarchical clustering, K-means method, and latent-class analysis, often induce a high computational overhead and have difficulty handling missing data, especially in the presence of high-dimensional responses. In this article, the authors propose a spectral clustering algorithm for exploratory item cluster analysis. The method is computationally efficient, effective for data with missing or incomplete responses, easy to implement, and often outperforms traditional clustering algorithms in the context of high dimensionality. The spectral clustering algorithm is based on graph theory, a branch of mathematics that studies the properties of graphs. The algorithm first constructs a graph of items, characterizing the similarity structure among items. It then extracts item clusters based on the graphical structure, grouping similar items together. The proposed method is evaluated through simulations and an application to the revised Eysenck Personality Questionnaire. PMID:29033476
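The two steps named in the abstract above (build a similarity graph of items, then cut it into clusters) can be illustrated with a minimal sketch. This is not the authors' algorithm: the agreement-based similarity, the two-cluster split on the sign of the Fiedler vector, the power iteration, and all function names and data below are assumptions chosen for illustration. Missing responses are encoded as `None` and are skipped pairwise.

```python
import math

def similarity_matrix(responses):
    """Item-by-item similarity: share of respondents who answered both
    items and gave the same response (an assumed similarity measure)."""
    n_items = len(responses[0])
    sim = [[0.0] * n_items for _ in range(n_items)]
    for j in range(n_items):
        for k in range(j + 1, n_items):
            pairs = [(r[j], r[k]) for r in responses
                     if r[j] is not None and r[k] is not None]
            if pairs:
                agree = sum(a == b for a, b in pairs) / len(pairs)
                sim[j][k] = sim[k][j] = agree
    return sim

def fiedler_vector(sim, iterations=500):
    """Approximate the eigenvector of the graph Laplacian L = D - W with the
    second-smallest eigenvalue, using power iteration on (shift*I - L) while
    projecting out the constant vector at every step."""
    n = len(sim)
    degree = [sum(row) for row in sim]
    shift = 2.0 * max(degree) + 1e-9   # upper bound on the largest eigenvalue of L
    v = [i - (n - 1) / 2 for i in range(n)]   # mean-zero starting vector
    for _ in range(iterations):
        w = [shift * v[i] - degree[i] * v[i]
             + sum(sim[i][j] * v[j] for j in range(n))
             for i in range(n)]
        mean = sum(w) / n
        w = [x - mean for x in w]              # stay orthogonal to the constant vector
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def two_way_split(responses):
    """Assign each item to one of two clusters by the sign of its Fiedler entry."""
    v = fiedler_vector(similarity_matrix(responses))
    return [0 if x < 0 else 1 for x in v]

# Hypothetical binary responses: rows are respondents, columns are items.
# Items 0-2 tend to move together, as do items 3-5; None marks a missing response.
responses = [
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0],
    [1, 1, None, 0, 0, 0],
    [0, None, 0, 1, 1, 1],
]
labels = two_way_split(responses)
print(labels)
```

A sign split only yields two clusters; recovering more scales would need several eigenvectors followed by a clustering step, which this sketch leaves out.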
Improving Healthcare Transition Planning and Health-Related Independence for Youth with ASD and their Families

DTIC Science & Technology

2015-10-01

volunteers) recruited Objective 5: Develop and test focus group & individual interview guide; train staff on protocol and procedure • Caregiver and young...and individual items will then be evaluated and revised based on finds from cognitive interviewing and full-scale pretesting. 15. SUBJECT TERMS...first modality assessed caregiver perspectives on health-related transitioning using focus groups. The second modality included individual interviews
Emotional vitality in caregivers: application of Rasch Measurement Theory with secondary data to development and test a new measure.

PubMed

Barbic, Skye P; Bartlett, Susan J; Mayo, Nancy E

2015-07-01

To describe the practical steps in identifying items and evaluating scoring strategies for a new measure of emotional vitality in informal caregivers of individuals who have experienced a significant health event. The psychometric properties of responses to selected items from validated health-related quality of life and other psychosocial questionnaires administered four times over a one-year period were evaluated using Rasch Measurement Theory. Community. A total of 409 individuals providing informal care at home to older adults who had experienced a recent stroke. Rasch Measurement Theory was used to test the ordering of response option thresholds, fit, spread of the item locations, residual correlations, person separation index, and stability across time. Based on a theoretical framework developed in earlier work, we identified 22 candidate items from a pool of relevant psychosocial measures available. Of these, additional evaluation resulted in 19 items that could be used to assess the five core domains. The overall model fit was reasonable (χ² = 202.26, DF = 117, p = 0.06), stable across time, with borderline evidence of multidimensionality (10%). Items and people covered a continuum ranging from -3.7 to +2.7 logits, reflecting coverage of the measurement continuum, with a person separation index of 0.85. Mean fit of caregivers was lower than expected (-1.31 ±1.10 logits). Established methods from the Rasch Measurement Theory were applied to develop a prototype measure of emotional vitality that is acceptable, reliable, and can be used to obtain an interval level score for use in future research and clinical settings. © The Author(s) 2014.
The role of relational binding in item memory: evidence from face recognition in a case of developmental amnesia.

PubMed

Olsen, Rosanna K; Lee, Yunjo; Kube, Jana; Rosenbaum, R Shayna; Grady, Cheryl L; Moscovitch, Morris; Ryan, Jennifer D

2015-04-01

Current theories state that the hippocampus is responsible for the formation of memory representations regarding relations, whereas extrahippocampal cortical regions support representations for single items. However, findings of impaired item memory in hippocampal amnesics suggest a more nuanced role for the hippocampus in item memory. The hippocampus may be necessary when the item elements need to be bound within and across episodes to form a lasting representation that can be used flexibly. The current investigation was designed to test this hypothesis in face recognition. H.C., an individual who developed with a compromised hippocampal system, and control participants incidentally studied individual faces that either varied in presentation viewpoint across study repetitions or remained in a fixed viewpoint across the study repetitions. Eye movements were recorded during encoding and participants then completed a surprise recognition memory test. H.C. demonstrated altered face viewing during encoding. Although the overall number of fixations made by H.C. was not significantly different from that of controls, the distribution of her viewing was primarily directed to the eye region. Critically, H.C. was significantly impaired in her ability to subsequently recognize faces studied from variable viewpoints, but demonstrated spared performance in recognizing faces she encoded from a fixed viewpoint, implicating a relationship between eye movement behavior in the service of a hippocampal binding function. These findings suggest that a compromised hippocampal system disrupts the ability to bind item features within and across study repetitions, ultimately disrupting recognition when it requires access to flexible relational representations. Copyright © 2015 the authors 0270-6474/15/355342-09$15.00/0.
Can Item Keyword Feedback Help Remediate Knowledge Gaps?

PubMed

Feinberg, Richard A; Clauser, Amanda L

2016-10-01

In graduate medical education, assessment results can effectively guide professional development when both assessment and feedback support a formative model. When individuals cannot directly access the test questions and responses, a way of using assessment results formatively is to provide item keyword feedback. The purpose of the following study was to investigate whether exposure to item keyword feedback aids in learner remediation. Participants included 319 trainees who completed a medical subspecialty in-training examination (ITE) in 2012 as first-year fellows, and then 1 year later in 2013 as second-year fellows. Performance on 2013 ITE items in which keywords were, or were not, exposed as part of the 2012 ITE score feedback was compared across groups based on the amount of time studying (preparation). For the same items common to both 2012 and 2013 ITEs, response patterns were analyzed to investigate changes in answer selection. Test takers who indicated greater amounts of preparation on the 2013 ITE did not perform better on the items in which keywords were exposed compared to those who were not exposed. The response pattern analysis substantiated overall growth in performance from the 2012 ITE. For items with incorrect responses on both attempts, examinees selected the same option 58% of the time. Results from the current study were unsuccessful in supporting the use of item keywords in aiding remediation. Unfortunately, the results did provide evidence of examinees retaining misinformation.
Building the BIKE: Development and Testing of the Biotechnology Instrument for Knowledge Elicitation (BIKE)

NASA Astrophysics Data System (ADS)

Witzig, Stephen B.; Rebello, Carina M.; Siegel, Marcelle A.; Freyermuth, Sharyn K.; Izci, Kemal; McClure, Bruce

2014-10-01

Identifying students' conceptual scientific understanding is difficult if the appropriate tools are not available for educators. Concept inventories have become a popular tool to assess student understanding; however, traditionally, they are multiple choice tests. International science education standard documents advocate that assessments should be reform based, contain diverse question types, and should align with instructional approaches. To date, no instrument of this type targeting student conceptions in biotechnology has been developed. We report here the development, testing, and validation of a 35-item Biotechnology Instrument for Knowledge Elicitation (BIKE) that includes a mix of question types. The BIKE was designed to elicit student thinking and a variety of conceptual understandings, as opposed to testing closed-ended responses. The design phase contained nine steps including a literature search for content, student interviews, a pilot test, as well as expert review. Data from 175 students over two semesters, including 16 student interviews and six expert reviewers (professors from six different institutions), were used to validate the instrument. Cronbach's alpha on the pre/posttest was 0.664 and 0.668, respectively, indicating the BIKE has internal consistency. Cohen's kappa for inter-rater reliability among the 6,525 total items was 0.684 indicating substantial agreement among scorers. Item analysis demonstrated that the items were challenging, there was discrimination among the individual items, and there was alignment with research-based design principles for construct validity. This study provides a reliable and valid conceptual understanding instrument in the understudied area of biotechnology.
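For readers unfamiliar with the internal-consistency statistic reported above, Cronbach's alpha for k items is k/(k-1) multiplied by (1 minus the sum of the item variances divided by the variance of the total score). A minimal sketch follows; the function name and the toy score matrices are hypothetical and are not the BIKE data.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a score matrix: rows are respondents, columns are items.
    Uses sample variances; assumes at least two items, at least two respondents,
    and a total score that actually varies."""
    k = len(scores[0])
    item_variances = [variance([row[j] for row in scores]) for j in range(k)]
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)

# Hypothetical data: three items that rank respondents identically.
consistent = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
# Hypothetical data: two items that are unrelated to each other.
unrelated = [[0, 0], [0, 1], [1, 0], [1, 1]]

print(cronbach_alpha(consistent))
print(cronbach_alpha(unrelated))
```

Perfectly consistent items give an alpha of 1 and unrelated items give an alpha near 0, which is the scale on which values such as 0.664 and 0.668 are read.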
Defining and validating a short form Montreal Cognitive Assessment (s-MoCA) for use in neurodegenerative disease

PubMed Central

Roalf, David R; Moore, Tyler M; Wolk, David A; Arnold, Steven E; Mechanic-Hamilton, Dawn; Rick, Jacqueline; Kabadi, Sushila; Ruparel, Kosha; Chen-Plotkin, Alice S; Chahine, Lama M; Dahodwala, Nabila A; Duda, John E; Weintraub, Daniel A; Moberg, Paul J

2016-01-01

Introduction Screening for cognitive deficits is essential in neurodegenerative disease. Screening tests, such as the Montreal Cognitive Assessment (MoCA), are easily administered, correlate with neuropsychological performance and demonstrate diagnostic utility. Yet, administration time is too long for many clinical settings. Methods Item response theory and computerised adaptive testing simulation were employed to establish an abbreviated MoCA in 1850 well-characterised community-dwelling individuals with and without neurodegenerative disease. Results 8 MoCA items with high item discrimination and appropriate difficulty were identified for use in a short form (s-MoCA). The s-MoCA was highly correlated with the original MoCA, showed robust diagnostic classification and cross-validation procedures substantiated these items. Discussion Early detection of cognitive impairment is an important clinical and public health concern, but administration of screening measures is limited by time constraints in demanding clinical settings. Here, we provide an s-MoCA that is valid across neurological disorders and can be administered in approximately 5 min. PMID:27071646
Differential Item Functioning in Primary Healthcare Evaluation Instruments by French/English Version, Educational Level and Urban/Rural Location

PubMed Central

Haggerty, Jeannie L.; Bouharaoui, Fatima; Santor, Darcy A.

2011-01-01

Evaluating the extent to which groups or subgroups of individuals differ with respect to primary healthcare experience depends on first ruling out the possibility of bias. Objective: To determine whether item or subscale performance differs systematically between French/English, high/low education subgroups and urban/rural residency. Method: A sample of 645 adult users balanced by French/English language (in Quebec and Nova Scotia, respectively), high/low education and urban/rural residency responded to six validated instruments: the Primary Care Assessment Survey (PCAS); the Primary Care Assessment Tool – Short Form (PCAT-S); the Components of Primary Care Index (CPCI); the first version of the EUROPEP (EUROPEP-I); the Interpersonal Processes of Care Survey, version II (IPC-II); and part of the Veterans Affairs National Outpatient Customer Satisfaction Survey (VANOCSS). We normalized subscale scores to a 0-to-10 scale and tested for between-group differences using ANOVA tests. We used a parametric item response model to test for differences between subgroups in item discriminability and item difficulty. We re-examined group differences after removing items with differential item functioning. Results: Experience of care was assessed more positively in the English-speaking (Nova Scotia) than in the French-speaking (Quebec) respondents. We found differential English/French item functioning in 48% of the 153 items: discriminability in 20% and differential difficulty in 28%. English items were more discriminating generally than the French. Removing problematic items did not change the differences in French/English assessments. Differential item functioning by high/low education status affected 27% of items, with items being generally more discriminating in high-education groups. Between-group comparisons were unchanged. In contrast, only 9% of items showed differential item functioning by geography, affecting principally the accessibility attribute. 
Removing problematic items reversed a previously non-significant finding, revealing poorer first-contact access in rural than in urban areas. Conclusion: Differential item functioning does not bias or invalidate French/English comparisons on subscales, but additional development is required to make French and English items equivalent. These instruments are relatively robust by educational status and geography, but results suggest potential differences in the underlying construct in low-education and rural respondents. PMID:23205035
Effects of Presentation Mode and Computer Familiarity on Summarization of Extended Texts

ERIC Educational Resources Information Center

Yu, Guoxing

2010-01-01

Comparability studies on computer- and paper-based reading tests have focused on short texts and selected-response items via almost exclusively statistical modeling of test performance. The psychological effects of presentation mode and computer familiarity on individual students are under-researched. In this study, 157 students read extended…
[Development of an Atypical Response Scale.]

ERIC Educational Resources Information Center

Mendelsohn, Mark; Linden, James

The development of an objective diagnostic scale to measure atypical behavior is discussed. The Atypical Response Scale (ARS) is a structured projective test consisting of 17 items, each weighted 1, 2, or 3, that were tested for convergence and reliability. ARS may be individually or group administered in 10-15 minutes; hand scoring requires 90…
Overview of the Spinal Cord Injury – Quality of Life (SCI-QOL) measurement system

PubMed Central

Tulsky, David S.; Kisala, Pamela A.; Victorson, David; Tate, Denise G.; Heinemann, Allen W.; Charlifue, Susan; Kirshblum, Steve C.; Fyffe, Denise; Gershon, Richard; Spungen, Ann M.; Bombardier, Charles H.; Dyson-Hudson, Trevor A.; Amtmann, Dagmar; Kalpakjian, Claire Z.; Choi, Seung W.; Jette, Alan M.; Forchheimer, Martin; Cella, David

2015-01-01

Context/Objective The Spinal Cord Injury – Quality of Life (SCI-QOL) measurement system was developed to address the shortage of relevant and psychometrically sound patient reported outcome (PRO) measures available for clinical care and research in spinal cord injury (SCI) rehabilitation. Using a computer adaptive testing (CAT) approach, the SCI-QOL builds on the Patient Reported Outcomes Measurement Information System (PROMIS) and the Quality of Life in Neurological Disorders (Neuro-QOL) initiative. This initial manuscript introduces the background and development of the SCI-QOL measurement system. Greater detail is presented in the additional manuscripts of this special issue. Design Classical and contemporary test development methodologies were employed. Qualitative input was obtained from individuals with SCI and clinicians through interviews, focus groups, and cognitive debriefing. Item pools were field tested in a multi-site sample (n = 877) and calibrated using item response theory methods. Initial reliability and validity testing was performed in a new sample of individuals with traumatic SCI (n = 245). Setting Five Model SCI System centers and one Department of Veterans Affairs Medical Center across the United States. Participants Adults with traumatic SCI. Interventions n/a Outcome Measures n/a Results The SCI-QOL consists of 19 item banks, including the SCI-Functional Index banks, and 3 fixed-length scales measuring physical, emotional, and social aspects of health-related QOL (HRQOL). Conclusion The SCI-QOL measurement system consists of psychometrically sound measures for individuals with SCI. The manuscripts in this special issue provide evidence of the reliability and initial validity of this measurement system. The SCI-QOL also links to other measures designed for a general medical population. PMID:26010962
Fitting Item Response Theory Models to Two Personality Inventories: Issues and Insights.

PubMed

Chernyshenko, O S; Stark, S; Chan, K Y; Drasgow, F; Williams, B

2001-10-01

The present study compared the fit of several IRT models to two personality assessment instruments. Data from 13,059 individuals responding to the US-English version of the Fifth Edition of the Sixteen Personality Factor Questionnaire (16PF) and 1,770 individuals responding to Goldberg's 50 item Big Five Personality measure were analyzed. Various issues pertaining to the fit of the IRT models to personality data were considered. We examined two of the most popular parametric models designed for dichotomously scored items (i.e., the two- and three-parameter logistic models) and a parametric model for polytomous items (Samejima's graded response model). Also examined were Levine's nonparametric maximum likelihood formula scoring models for dichotomous and polytomous data, which were previously found to provide good fits to several cognitive ability tests (Drasgow, Levine, Tsien, Williams, & Mead, 1995). The two- and three-parameter logistic models fit some scales reasonably well but not others; the graded response model generally did not fit well. The nonparametric formula scoring models provided the best fit of the models considered. Several implications of these findings for personality measurement and personnel selection were described.
Social contagion of correct and incorrect information in memory.

PubMed

Rush, Ryan A; Clark, Steven E

2014-01-01

The present study examines how discussion between individuals regarding a shared memory affects their subsequent individual memory reports. In three experiments pairs of participants recalled items from photographs of common household scenes, discussed their recall with each other, and then recalled the items again individually. Results showed that after the discussion, individuals recalled more correct items and more incorrect items, with very small non-significant increases, or no change, in recall accuracy. The information people were exposed to during the discussion was generally accurate, although not as accurate as individuals' initial recall. Individuals incorporated correct exposure items into their subsequent recall at a higher rate than incorrect exposure items. Participants who were initially more accurate became less accurate, and initially less-accurate participants became more accurate as a result of their discussion. Comparisons to no-discussion control groups suggest that the effects were not simply the product of repeated recall opportunities or self-cueing, but rather reflect the transmission of information between individuals.

Competition strength influences individual preferences in an auction game

PubMed Central

Toelch, Ulf; Jubera-Garcia, Esperanza; Kurth-Nelson, Zeb; Dolan, Raymond J.

2014-01-01

Competitive interactions between individuals are ubiquitous in human societies. Auctions represent an institutionalized context for these interactions, a context where individuals frequently make non-optimal decisions. In particular, competition in auctions can lead to overbidding, resulting in the so-called winner’s curse, often explained by invoking emotional arousal. In this study, we investigated an alternative possibility, namely that competitors’ bids are construed as a source of information about the good’s common value thereby influencing an individual’s private value estimate. We tested this hypothesis by asking participants to bid in a repeated all-pay auction game for five different real items. Crucially, participants had to rank the auction items for their preference before and after the experiment. We observed a clear relation between auction dynamics and preference change. We found that low competition reduced preference while high competition increased preference. Our findings support a view that competitors’ bids in auction games are perceived as a valid social signal for the common value of an item. We suggest that this influence of social information constitutes a major cause for the frequently observed deviations from optimality in auctions. PMID:25168161
The use of cognitive ability measures as explanatory variables in regression analysis.

PubMed

Junker, Brian; Schofield, Lynne Steuerle; Taylor, Lowell J

2012-12-01

Cognitive ability measures are often taken as explanatory variables in regression analysis, e.g., as a factor affecting a market outcome such as an individual's wage, or a decision such as an individual's education acquisition. Cognitive ability is a latent construct; its true value is unobserved. Nonetheless, researchers often assume that a test score, constructed via standard psychometric practice from individuals' responses to test items, can be safely used in regression analysis. We examine problems that can arise, and suggest that an alternative approach, a "mixed effects structural equations" (MESE) model, may be more appropriate in many circumstances.
Using Classical Test Theory and Item Response Theory to Evaluate the LSCI

NASA Astrophysics Data System (ADS)

Schlingman, Wayne M.; Prather, E. E.; Collaboration of Astronomy Teaching Scholars CATS

2011-01-01

Analyzing the data from the recent national study using the Light and Spectroscopy Concept Inventory (LSCI), this project uses both Classical Test Theory (CTT) and Item Response Theory (IRT) to investigate the LSCI itself in order to better understand what it is actually measuring. We use Classical Test Theory to form a framework of results that can be used to evaluate the effectiveness of individual questions at measuring differences in student understanding and provide further insight into the prior results presented from this data set. In the second phase of this research, we use Item Response Theory to form a theoretical model that generates parameters accounting for a student's ability, a question's difficulty, and the level of guessing. The combined results from our investigations using both CTT and IRT are used to better understand the learning that is taking place in classrooms across the country. The analysis will also allow us to evaluate the effectiveness of individual questions and determine whether the item difficulties are appropriately matched to the abilities of the students in our data set. These results may require that some questions be revised, motivating the need for further development of the LSCI. This material is based upon work supported by the National Science Foundation under Grant No. 0715517, a CCLI Phase III Grant for the Collaboration of Astronomy Teaching Scholars (CATS). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.
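An IRT model with ability, difficulty, and guessing parameters, as described in the abstract above, is commonly written as a logistic item response function with a lower asymptote. The sketch below assumes the standard three-parameter logistic form; the abstract does not state which model the authors fitted, and the function name, the discrimination parameter (fixed at 1.0 by default), and the example values are hypothetical.

```python
import math

def p_correct(ability, difficulty, guessing=0.0, discrimination=1.0):
    """Probability of a correct response under a three-parameter logistic
    model (assumed form): the guessing floor plus the remaining probability
    scaled by a logistic curve in (ability - difficulty)."""
    logistic = 1.0 / (1.0 + math.exp(-discrimination * (ability - difficulty)))
    return guessing + (1.0 - guessing) * logistic

# Hypothetical four-option multiple-choice item with a guessing floor of 0.25.
for theta in (-2.0, 0.0, 2.0):
    print(theta, round(p_correct(theta, difficulty=0.0, guessing=0.25), 3))
```

When ability equals difficulty the logistic term is 0.5, so the probability sits halfway between the guessing floor and 1; comparing estimated difficulties against the distribution of student abilities is the kind of matching check the abstract mentions.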
What–where–when memory and encoding strategies in healthy aging

PubMed Central

2016-01-01

Older adults exhibit disproportionate impairments in memory for item-associations. These impairments may stem from an inability to self-initiate deep encoding strategies. The present study investigates this using the “treasure-hunt task”; a what–where–when style episodic memory test that requires individuals to “hide” items around complex scenes. This task separately assesses memory for item, location, and temporal order, as well as bound what–where–when information. The results suggest that older adults are able to ameliorate integration memory deficits by using self-initiated encoding strategies when these are externally located and therefore place reduced demands on working memory and attentional resources. PMID:26884230
Leveling the playing field: attention mitigates the effects of intelligence on memory.

PubMed

Markant, Julie; Amso, Dima

2014-05-01

Effective attention and memory skills are fundamental to typical development and essential for achievement during the formal education years. It is critical to identify the specific mechanisms linking efficiency of attentional selection of an item and the quality of its memory retention. The present study capitalized on the spatial cueing paradigm to examine the role of selection via suppression in modulating children and adolescents' memory encoding. By varying a single parameter, the spatial cueing task can elicit either a simple orienting mechanism (i.e., facilitation) or one that involves both target selection and simultaneous suppression of competing information (i.e., IOR). We modified this paradigm to include images of common items in target locations. Participants were not instructed to learn the items and were not told they would be completing a memory test later. Following the cueing task, we imposed a 7-min delay and then asked participants to complete a recognition memory test. Results indicated that selection via suppression promoted recognition memory among 7- to 17-year-olds. Moreover, individual differences in the extent of suppression during encoding predicted recognition memory accuracy. When basic cueing facilitated orienting to target items during encoding, IQ was the best predictor of recognition memory performance for the attended items. In contrast, engaging suppression (i.e., IOR) during encoding counteracted individual differences in intelligence, effectively improving recognition memory performance among children with lower IQs. This work demonstrates that engaging selection via suppression during learning and encoding improves memory retention and has broad implications for developing effective educational techniques. Copyright © 2014 Elsevier B.V. All rights reserved.
Leveling the playing field: Attention mitigates the effects of intelligence on memory

PubMed Central

Markant, Julie; Amso, Dima

2014-01-01

Effective attention and memory skills are fundamental to typical development and essential for achievement during the formal education years. It is critical to identify the specific mechanisms linking efficiency of attentional selection of an item and the quality of its memory retention. The present study capitalized on the spatial cueing paradigm to examine the role of selection via suppression in modulating children and adolescents’ memory encoding. By varying a single parameter, the spatial cueing task can elicit either a simple orienting mechanism (i.e., facilitation) or one that involves both target selection and simultaneous suppression of competing information (i.e., IOR). We modified this paradigm to include images of common items in target locations. Participants were not instructed to learn the items and were not told they would be completing a memory test later. Following the cueing task, we imposed a seven-minute delay and then asked participants to complete a recognition memory test. Results indicated that selection via suppression promoted recognition memory among 7- to 17-year-olds. Moreover, individual differences in the extent of suppression during encoding predicted recognition memory accuracy. When basic cueing facilitated orienting to target items during encoding, IQ was the best predictor of recognition memory performance for the attended items. In contrast, engaging suppression (i.e., IOR) during encoding counteracted individual differences in intelligence, effectively improving recognition memory performance among children with lower IQs. This work demonstrates that engaging selection via suppression during learning and encoding improves memory retention and has broad implications for developing effective educational techniques. PMID:24549142
The effect of response modality on immediate serial recall in dementia of the Alzheimer type.

PubMed

Macé, Anne-Laure; Ergis, Anne-Marie; Caza, Nicole

2012-09-01

Contrary to traditional models of verbal short-term memory (STM), psycholinguistic accounts assume that temporary retention of verbal materials is an intrinsic property of word processing. Therefore, memory performance will depend on the nature of the STM tasks, which vary according to the linguistic representations they engage. The aim of this study was to explore the effect of response modality on verbal STM performance in individuals with dementia of the Alzheimer Type (DAT), and its relationship with the patients' word-processing deficits. Twenty individuals with mild DAT and 20 controls were tested on an immediate serial recall (ISR) task using the same items across two response modalities (oral and picture pointing) and completed a detailed language assessment. When scoring of ISR performance was based on item memory regardless of item order, a response modality effect was found for all participants, indicating that they recalled more items with picture pointing than with oral response. However, this effect was less marked in patients than in controls, resulting in an interaction. Interestingly, when recall of both item and order was considered, results indicated similar performance between response modalities in controls, whereas performance was worse for pointing than for oral response in patients. Picture-naming performance was also reduced in patients relative to controls. However, in the word-to-picture matching task, a similar pattern of responses was found between groups for incorrectly named pictures of the same items. The finding of a response modality effect in item memory for all participants is compatible with the assumption that semantic influences are greater in picture pointing than in oral response, as predicted by psycholinguistic models. Furthermore, patients' performance was modulated by their word-processing deficits, showing a reduced advantage relative to controls. 
Overall, the response modality effect observed in this study for item memory suggests that verbal STM performance is intrinsically linked with word processing capacities in both healthy controls and individuals with mild DAT, supporting psycholinguistic models of STM.
The Survey of Treatment Entry Pressures (STEP): identifying client's reasons for entering substance abuse treatment.

PubMed

Dugosh, Karen Leggett; Festinger, David S; Lynch, Kevin G; Marlowe, Douglas B

2014-10-01

Systematically identifying reasons that clients enter substance abuse treatment may allow clinicians to immediately focus on issues of greatest relevance to the individual and enhance treatment engagement. We developed the Survey of Treatment Entry Pressures (STEP) to identify the specific factors that precipitated an individual's treatment entry. The instrument contains 121 items from 6 psychosocial domains (i.e., family, financial, social, medical, psychiatric, legal). The current study examined the STEP's psychometric properties. A total of 761 participants from various treatment settings and modalities completed the STEP prior to treatment admission and 4-7 days later. Analyses were performed to examine the instrument's psychometric properties including item response rates, test-retest reliability, internal consistency, and factor structure. The items displayed adequate test-retest reliability and internal consistency within each psychosocial domain. Generally, results from exploratory and confirmatory factor analyses support a 2-factor structure reflecting type of reinforcement schedule. The study provides preliminary support for the psychometric properties of the STEP. The STEP may provide a reliable way for clinicians to characterize and capitalize on a client's treatment motivation early on which may serve to improve treatment retention and therapeutic outcomes. © 2014 Wiley Periodicals, Inc.
Recognizing and treating anal cancer: training medical students and physicians in Puerto Rico.

PubMed

Ortiz, Ana P; Guiot, Humberto M; Díaz-Miranda, Olga L; Román, Leticia; Palefsky, Joel; Colón-López, Vivian

2013-12-01

This training activity aimed at increasing the knowledge of anal cancer screening, diagnostic and treatment options in medical students and physicians, to determine the interest of these individuals in receiving training in the diagnosis and treatment of anal cancer, and to explore any previous training and/or experience with both anal cancer and clinical trials that these individuals might have. An educational activity (1.5 contact hours) was attended by a group of medical students, residents and several faculty members, all from the Medical Sciences Campus of the University of Puerto Rico (n = 50). A demographic survey and a 6-item pre- and post-test on anal cancer were given to assess knowledge change. Thirty-four participants (68%) answered the survey. Mean age was 29.6 +/- 6.6 years; 78.8% had not received training in anal cancer screening, 93.9% reported being interested in receiving anal cancer training, and 75.8% expressed an interest in leading or conducting a clinical trial. A significant increase in the test scores was observed after the educational activity (pre-test: 3.4 +/- 1.2; post-test: 4.7 +/- 0.71). Three of the items showed an increase in knowledge by the time the post-test was taken. The first of these items assessed the participants' knowledge regarding the existence of any guidelines for the screening/treatment of patients with human papillomavirus (HPV)-related anal disease. The second of these items attempted to determine whether the participants recognized that anal intraepithelial neoplasia (AIN) 2 is considered to be a high-grade neoplasia. The last of the 3 items was aimed at ascertaining whether or not the participants were aware that warty growths in the anus are not necessarily a manifestation of high-grade AIN. This educational activity increased the participants' knowledge of anal cancer and revealed, as well, that most of the participants were interested in future training and in collaborating in a clinical trial. 
Training physicians from Puerto Rico on anal cancer clinical trials is essential to encourage recruitment of Hispanic patients in these studies now that the guidelines in anal cancer screening and treatment are on their way to be defined.
Nondestructive testing techniques

NASA Astrophysics Data System (ADS)

Bray, Don E.; McBride, Don

A comprehensive reference covering a broad range of techniques in nondestructive testing is presented. Based on years of extensive research and application at NASA and other government research facilities, the book provides practical guidelines for selecting the appropriate testing methods and equipment. Topics discussed include visual inspection, penetrant and chemical testing, nuclear radiation, sonic and ultrasonic, thermal and microwave, magnetic and electromagnetic techniques, and training and human factors. (No individual items are abstracted in this volume)
Development and validation of a socioculturally competent trust in physician scale for a developing country setting.

PubMed

Gopichandran, Vijayaprasad; Wouters, Edwin; Chetlapalli, Satish Kumar

2015-05-03

Trust in physicians is the unwritten covenant between the patient and the physician that the physician will do what is in the best interest of the patient. This forms the undercurrent of all healthcare relationships. Several scales exist for assessment of trust in physicians in developed healthcare settings, but to our knowledge none of these have been developed in a developing country context. To develop and validate a new trust in physician scale for a developing country setting. Dimensions of trust in physicians, which were identified in a previous qualitative study in the same setting, were used to develop a scale. This scale was administered among 616 adults selected from urban and rural areas of Tamil Nadu, south India, using a multistage sampling cross sectional survey method. The individual items were analysed using a classical test approach as well as item response theory. Cronbach's α was calculated and the item to total correlation of each item was assessed. After testing for unidimensionality and absence of local dependence, a 2 parameter logistic Samejima's graded response model was fit and item characteristics assessed. Competence, assurance of treatment, respect for the physician and loyalty to the physician were important dimensions of trust. A total of 31 items were developed using these dimensions. Of these, 22 were selected for final analysis. The Cronbach's α was 0.928. The item to total correlations were acceptable for all the 22 items. The item response analysis revealed good item characteristic curves and item information for all the items. Based on the item parameters and item information, a final 12 item scale was developed. The scale performs optimally in the low to moderate trust range. The final 12 item trust in physician scale has a good construct validity and internal consistency. Published by the BMJ Publishing Group Limited.
Development and validation of a socioculturally competent trust in physician scale for a developing country setting

PubMed Central

Gopichandran, Vijayaprasad; Wouters, Edwin; Chetlapalli, Satish Kumar

2015-01-01

Trust in physicians is the unwritten covenant between the patient and the physician that the physician will do what is in the best interest of the patient. This forms the undercurrent of all healthcare relationships. Several scales exist for assessment of trust in physicians in developed healthcare settings, but to our knowledge none of these have been developed in a developing country context. Objectives To develop and validate a new trust in physician scale for a developing country setting. Methods Dimensions of trust in physicians, which were identified in a previous qualitative study in the same setting, were used to develop a scale. This scale was administered among 616 adults selected from urban and rural areas of Tamil Nadu, south India, using a multistage sampling cross sectional survey method. The individual items were analysed using a classical test approach as well as item response theory. Cronbach's α was calculated and the item to total correlation of each item was assessed. After testing for unidimensionality and absence of local dependence, a 2 parameter logistic Samejima's graded response model was fit and item characteristics assessed. Results Competence, assurance of treatment, respect for the physician and loyalty to the physician were important dimensions of trust. A total of 31 items were developed using these dimensions. Of these, 22 were selected for final analysis. The Cronbach's α was 0.928. The item to total correlations were acceptable for all the 22 items. The item response analysis revealed good item characteristic curves and item information for all the items. Based on the item parameters and item information, a final 12 item scale was developed. The scale performs optimally in the low to moderate trust range. Conclusions The final 12 item trust in physician scale has a good construct validity and internal consistency. PMID:25941182
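The abstract above reports Cronbach's α as its internal-consistency statistic. As a rough illustration of what that statistic measures, the following Python sketch computes α from a respondents-by-items score table using the standard textbook formula. The function name and the toy data are hypothetical and are not taken from the study.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a table of scores.

    scores: list of respondents, each a list of k item scores (k >= 2).
    Uses the standard formula
        alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score)).
    """
    k = len(scores[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of each respondent's summed score.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Toy data (hypothetical): three items that rank respondents identically.
toy_scores = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
alpha = cronbach_alpha(toy_scores)
```

When every item orders the respondents in exactly the same way, as in the toy data, α reaches its maximum of 1; weaker agreement among items lowers it.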
Mathematical skill in individuals with Williams Syndrome: Evidence from a standardized mathematics battery

PubMed Central

O’Hearn, Kirsten; Landau, Barbara

2007-01-01

Williams syndrome (WS) is a developmental disorder associated with relatively spared verbal skills and severe visuospatial deficits. It has also been reported that individuals with WS are impaired at mathematics. We examined mathematical skills in persons with WS using the second edition of the Test of Early Mathematical Ability (TEMA-2), which measures a wide range of skills. We administered the TEMA-2 to 14 individuals with WS and 14 children matched individually for mental age on the matrices subtest of the Kaufman Brief Intelligence Test. There were no differences between groups on the overall scores on the TEMA-2. However, an item-by-item analysis revealed group differences. Participants with WS performed more poorly than controls when reporting which of two numbers was closest to a target number, a task thought to utilize a mental number line subserved by the parietal lobe, consistent with previous evidence showing parietal abnormalities in people with WS. In contrast, people with WS performed better than the control group at reading numbers, suggesting that verbal math skills may be comparatively strong in WS. These findings add to evidence that components of mathematical knowledge may be differentially damaged in developmental disorders. PMID:17482333
Mathematical skill in individuals with Williams syndrome: evidence from a standardized mathematics battery.

PubMed

O'Hearn, Kirsten; Landau, Barbara

2007-08-01

Williams syndrome (WS) is a developmental disorder associated with relatively spared verbal skills and severe visuospatial deficits. It has also been reported that individuals with WS are impaired at mathematics. We examined mathematical skills in persons with WS using the second edition of the Test of Early Mathematical Ability (TEMA-2), which measures a wide range of skills. We administered the TEMA-2 to 14 individuals with WS and 14 children matched individually for mental-age on the matrices subtest of the Kaufman Brief Intelligence Test. There were no differences between groups on the overall scores on the TEMA-2. However, an item-by-item analysis revealed group differences. Participants with WS performed more poorly than controls when reporting which of two numbers was closest to a target number, a task thought to utilize a mental number line subserved by the parietal lobe, consistent with previous evidence showing parietal abnormalities in people with WS. In contrast, people with WS performed better than the control group at reading numbers, suggesting that verbal math skills may be comparatively strong in WS. These findings add to evidence that components of mathematical knowledge may be differentially damaged in developmental disorders.
How well do TTM measures work among a sample of individuals with unhealthy alcohol use that is characterized by low readiness to change?

PubMed

Baumann, Sophie; Gaertner, Beate; Schnuerer, Inga; Bischof, Gallus; John, Ulrich; Freyer-Adam, Jennis

2013-09-01

Little is known about the applicability of the transtheoretical model of intentional behavior change (TTM) to individuals with unhealthy alcohol use that is primarily characterized by low readiness to change. This study examined the psychometric properties of short measures by assessing three core constructs of the TTM: the 20-item Processes of Change (POC-20) scale, and short versions of the Alcohol Decisional Balance Scale (ADBS) and the Alcohol Abstinence Self-Efficacy (AASE) scale. A sample of 427 individuals with unhealthy alcohol use (mean age = 30 years, 65% men), identified at job agencies in northeastern Germany, completed all three scales. Item difficulty (d), selectivity (rit), and Cronbach's alpha were calculated. Confirmatory factor analyses were used to test for construct validity and latent mean differences across the stages. The psychometric properties of the 8-item AASE were adequate (d range: 0.59-0.78; rit range: 0.59-0.68; α range: 0.74-0.81), except for one subscale. Most items of the POC-20 and the 10-item ADBS were difficult (dPOC range: 0.08-0.40; dADBS range: 0.21-0.58); selectivity (ritPOC range: 0.26-0.62; ritADBS range: 0.34-0.68) and internal consistency (αPOC range: 0.41-0.76; αADBS range: 0.64-0.78) were low to moderate. Construct validity was acceptable (Comparative Fit Index range: 0.95-0.99). The association between stages and TTM constructs partially followed expected patterns. Suggestions for modifications of TTM measures are discussed for better applicability among proactively recruited samples of individuals with unhealthy alcohol use and with primarily low readiness to change. (PsycINFO Database Record (c) 2013 APA, all rights reserved).
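The abstract above reports item difficulty (d) and selectivity (rit) without defining them. The Python sketch below shows one common way such indices are computed, under two assumptions that are not confirmed by the abstract: that d is the mean item score divided by the maximum possible score (items scored from 0), and that rit is the corrected item-total correlation, meaning the correlation of an item with the sum of the remaining items. All names and data are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def item_difficulty(scores, item, max_score):
    """Assumed definition: mean item score as a fraction of max_score."""
    values = [row[item] for row in scores]
    return sum(values) / len(values) / max_score


def item_selectivity(scores, item):
    """Assumed definition: corrected item-total correlation.

    The item is correlated with the sum of all OTHER items, so the item
    does not inflate its own correlation with the total.
    """
    values = [row[item] for row in scores]
    rest = [sum(row) - row[item] for row in scores]
    return pearson(values, rest)


# Toy data (hypothetical): 4 respondents, 3 items scored 0-4.
toy_scores = [[0, 1, 1], [1, 2, 3], [2, 3, 2], [4, 4, 4]]
d0 = item_difficulty(toy_scores, item=0, max_score=4)
r0 = item_selectivity(toy_scores, item=0)
```

Under this convention a low d means few respondents endorse the item strongly, which matches the abstract's description of low-d items as "difficult".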
Exploring the Relevance of Items in the Communicative Participation Item Bank (CPIB) for Individuals With Hearing Loss

PubMed Central

Baylor, Carolyn R.; Birch, Kristen; Yorkston, Kathryn M.

2017-01-01

Purpose The Communicative Participation Item Bank (CPIB) was developed to evaluate participation restrictions in communication situations for individuals with speech and language disorders. This study evaluated the potential relevance of CPIB items for individuals with hearing loss. Method Cognitive interviews were conducted with 17 adults with a range of treated and untreated hearing loss, who responded to 46 items. Interviews were continued until saturation was reached and prevalent trends emerged. A focus group was also conducted with 3 experienced audiologists to seek their views on the CPIB. Analysis of data included qualitative and quantitative approaches. Results The majority of the items were applicable to individuals with hearing loss; however, 12 items were identified as potentially not relevant. This was largely attributed to the items' focus on speech production rather than hearing. The results from the focus group were in agreement for a majority of items. Conclusions The next step in validating the CPIB for individuals with hearing loss is a psychometric analysis on a large sample. Possible outcomes could be that the CPIB is considered valid in its entirety or the creation of a new questionnaire or a hearing loss–specific short form with a subset of items is necessary. PMID:28114665
Development and initial validation of the appropriate antibiotic use self-efficacy scale.

PubMed

Hill, Erin M; Watkins, Kaitlin

2018-06-04

While there are various medication self-efficacy scales that exist, none assess self-efficacy for appropriate antibiotic use. The Appropriate Antibiotic Use Self-Efficacy Scale (AAUSES) was developed, pilot tested, and its psychometric properties were examined. Following pilot testing of the scale, a 28-item questionnaire was examined using a sample (n = 289) recruited through the Amazon Mechanical Turk platform. Participants also completed other scales and items, which were used in assessing discriminant, convergent, and criterion-related validity. Test-retest reliability was also examined. After examining the scale and removing items that did not assess appropriate antibiotic use, an exploratory factor analysis was conducted on 13 items from the original scale. Three factors were retained that explained 65.51% of the variance. The scale and its subscales had adequate internal consistency. The scale had excellent test-retest reliability, as well as demonstrated convergent, discriminant, and criterion-related validity. The AAUSES is a valid and reliable scale that assesses three domains of appropriate antibiotic use self-efficacy. The AAUSES may have utility in clinical and research settings in understanding individuals' beliefs about appropriate antibiotic use and related behavioral correlates. Future research is needed to examine the scale's utility in these settings. Copyright © 2018 Elsevier B.V. All rights reserved.
Psychometric properties of WHOQOL-BREF in clinical and health Greek populations: incorporating new culture-relevant items.

PubMed

Ginieri-Coccossis, M; Triantafillou, E; Tomaras, V; Soldatos, C; Mavreas, V; Christodoulou, G

2012-01-01

The present study examines the main psychometric properties of the World Health Organisation (WHO) quality of life (QoL) instrument, the WHOQOL-BREF, with the inclusion of four national items. Participants were 425 native Greek-speaking adults, grouped into patients with physical disorders, patients with psychiatric disorders and healthy individuals. Participants were administered the WHOQOL-BREF and 23 national items, the General Health Questionnaire (GHQ-28) and the Life Satisfaction Index (LSI). Confirmatory factor analysis produced acceptable fit values for the original model of 26 items within the four WHOQOL domains: physical health, psychological health, social relationships and environment. When the fit of national items within this model was tested, four new items showed the most satisfactory fit indices and were thus included, forming a 30-item version. The national items refer to: (a) nutrition, (b) satisfaction with work (both loaded in the physical health domain), (c) home life and (d) social life (both loaded in the social relationships domain). Statistical tests were applied to the 26- and 30-item versions, producing satisfactory results, with the 30-item version showing slightly better values. Furthermore, results on the 30-item version included: (a) internal consistency, which was found satisfactory, with alpha values ranging from α=0.67-0.81, while the inclusion of new items produced higher alpha values in the physical health and social relationships domains, (b) construct validity with good item-domain correlations, as well as strong correlations between domain scores, (c) convergent validity, which was very satisfactory, showing good correlations with GHQ-28 and LSI, (d) discriminant validity, showing the instrument's ability to detect QoL differences between healthy and unhealthy participants, and between physically ill and psychiatric patients, and (e) test-retest reliability, with ICC scores in excess of 0.80 obtained for all domains. 
The WHOQOL-BREF Greek version was found to perform well with sick and healthy participants, demonstrating satisfactory psychometric properties. Use of the instrument may be recommended for clinical and general populations, for service or intervention evaluation, as well as for cross-cultural clinical trials.
The Impact of Cooperative Quizzes in a Large Introductory Astronomy Course for Non Science Majors

NASA Astrophysics Data System (ADS)

Zeilik, Michael; Morris, Vicky J.

In Astronomy 101 at the University of New Mexico, we carried out a repeated-items experiment on quizzes and tests to investigate the impact of cooperative testing. This trial was the only change in a reformed course format that had been refined over previous semesters. Our research questions were: Did cooperative quizzes result in gains for the class overall? Did these gains "stick" within the semester? In the spring and fall semesters of 2000, students took quizzes individually and in cooperative learning teams, and tests individually. Normalized gain, ⟨g⟩, on the quizzes averaged about 0.4, and effect size about 0.8 (approximately a 10% increase in class mean score). Repeating selected quiz items on a subsequent test demonstrated that the gain was sustained over a month in both semesters. In addition, we compared demographics of UNM students with those of the National Astronomy Diagnostic Test project. We found that UNM students are similar to the national sample, except in ethnicity (more Hispanic American, fewer White). Based on these results, we judge that our cooperative quiz strategy will likely succeed in other "Astro 101" classes.
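The abstract above reports a normalized gain of about 0.4 but does not state how it was computed. The Python sketch below assumes the definition commonly used in physics and astronomy education research: the achieved improvement divided by the maximum possible improvement. The function name and the example scores are hypothetical and are not data from the study.

```python
def normalized_gain(pre, post, max_score=100.0):
    """Normalized gain under the assumed definition
        g = (post - pre) / (max_score - pre),
    i.e. the fraction of the available headroom that was actually gained.
    """
    if pre >= max_score:
        raise ValueError("pre score leaves no room for improvement")
    return (post - pre) / (max_score - pre)


# Illustrative numbers only: a class mean rising from 50% to 70%
# closes 40% of the gap to a perfect score.
example_gain = normalized_gain(pre=50.0, post=70.0)
```

Dividing by the headroom rather than by the starting score keeps gains comparable between classes that begin at different levels.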
The effects of diazepam and oxprenolol on short term memory in individuals of high and low state anxiety.

PubMed Central

Desai, N; Taylor-Davies, A; Barnett, D B

1983-01-01

1 The effect of oral doses of diazepam (5 mg) and oxprenolol (80 mg) on short term memory of normal individuals stratified for 'state' anxiety levels has been investigated. 2 Normal student volunteers were stratified into high and low anxiety groups on the basis of responses to the Spielberger 'A-state' scale. Subjects were then randomly administered active drug or placebo and given a form of running memory test performed under a variety of conditions in which variable rate of item presentation and articulatory suppression were used. 3 Diazepam significantly reduced the errors of recall in the running memory test in the high anxiety group and produced a distinct separation of response from the low anxiety group under the test conditions of slow item presentation with articulatory suppression. Oxprenolol had no effect on the short term memory test in either high or low anxiety groups in any experimental test situation. 4 These results are compared to previous work in which generally a deleterious effect of diazepam on short term memory in normal volunteers has been reported. The implications of these findings are further discussed in relationship to possible models of memory function. PMID:6849754

Retrieval Practice Fails to Insulate Episodic Memories against Interference after Stroke.

PubMed

Pastötter, Bernhard; Eberle, Hanna; Aue, Ingo; Bäuml, Karl-Heinz T

2017-01-01

Recent work in cognitive psychology showed that retrieval practice of previously studied information can insulate this information against retroactive interference from subsequently studied other information in healthy individuals. The present study examined whether this beneficial effect of interference reduction is also present in patients with stroke. Twenty-two patients with stroke, 4.6 months post injury on average, and 22 healthy controls participated in the experiment. In each of two experimental sessions, participants first studied a list of items (list 1) and then underwent a practice phase in which the list 1 items were either restudied or retrieval practiced. Participants then either studied a second list of items (list 2) or fulfilled an unrelated distractor task. Recall of the two lists' items was assessed in a final criterion test. Results showed that, in healthy controls, additional study of list 2 items impaired final recall of list 1 items in the restudy condition but not in the retrieval practice condition. In contrast, in patients with stroke, list 2 learning impaired final list 1 recall in both conditions. The results indicate that retrieval practice insulated the tested information against retroactive interference in healthy controls, but failed to do so in patients with stroke. Possible implications of the findings for the understanding of long-term memory impairment after stroke are discussed.
An electrophysiological signature of summed similarity in visual working memory.

PubMed

van Vugt, Marieke K; Sekuler, Robert; Wilson, Hugh R; Kahana, Michael J

2013-05-01

Summed-similarity models of short-term item recognition posit that participants base their judgments of an item's prior occurrence on that item's summed similarity to the ensemble of items on the remembered list. We examined the neural predictions of these models in 3 short-term recognition memory experiments using electrocorticographic/depth electrode recordings and scalp electroencephalography. On each experimental trial, participants judged whether a test face had been among a small set of recently studied faces. Consistent with summed-similarity theory, participants' tendency to endorse a test item increased as a function of its summed similarity to the items on the just-studied list. To characterize this behavioral effect of summed similarity, we successfully fit a summed-similarity model to individual participant data from each experiment. Using the parameters determined from fitting the summed-similarity model to the behavioral data, we examined the relation between summed similarity and brain activity. We found that 4-9 Hz theta activity in the medial temporal lobe and 2-4 Hz delta activity recorded from frontal and parietal cortices increased with summed similarity. These findings demonstrate direct neural correlates of the similarity computations that form the foundation of several major cognitive theories of human recognition memory. PsycINFO Database Record (c) 2013 APA, all rights reserved.
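The decision rule described in the abstract above can be sketched in a few lines of Python. The abstract does not give the model's functional form, so this sketch makes several assumptions: stimuli are points in a feature space, similarity decays exponentially with Euclidean distance at a rate tau, and the participant answers "old" when the summed similarity exceeds a fixed criterion. All names and values are hypothetical.

```python
from math import dist, exp


def summed_similarity(probe, study_list, tau=1.0):
    """Sum of the probe's similarity to every studied item.

    Assumption: similarity = exp(-tau * Euclidean distance).
    """
    return sum(exp(-tau * dist(probe, item)) for item in study_list)


def respond_old(probe, study_list, criterion, tau=1.0):
    """Answer 'old' (True) when summed similarity exceeds the criterion."""
    return summed_similarity(probe, study_list, tau) > criterion


# Hypothetical two-dimensional stimuli.
studied = [(0.0, 0.0), (3.0, 4.0)]
near_probe = (0.0, 0.0)   # identical to one studied item
far_probe = (10.0, 10.0)  # unlike anything studied

near_sum = summed_similarity(near_probe, studied)
far_sum = summed_similarity(far_probe, studied)
```

In this sketch a probe can draw an "old" response without matching any single studied item, provided it is moderately similar to several of them, which is the behavioral signature the abstract describes.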
Development and evaluation of the Korean Health Literacy Instrument.

PubMed

Kang, Soo Jin; Lee, Tae Wha; Paasche-Orlow, Michael K; Kim, Gwang Suk; Won, Hee Kwan

2014-01-01

The purpose of this study is to develop and validate the Korean Health Literacy Instrument, which measures the capacity to understand and use health-related information and make informed health decisions in Korean adults. In Phase 1, 33 initial items were generated to measure functional, interactive, and critical health literacy with prose, document, and numeracy tasks. These items included content from health promotion, disease management, and health navigation contexts. Content validity assessment was conducted by an expert panel, and 11 items were excluded. In Phase 2, the 22 remaining items were administered to a convenience sample of 292 adults from community and clinical settings. Exploratory factor and item difficulty and discrimination analyses were conducted and four items with low discrimination were deleted. In Phase 3, the remaining 18 items were administered to a convenience sample of 315 adults 40-64 years of age from community and clinical settings. A confirmatory factor analysis was performed to test the construct validity of the instrument. The Korean Health Literacy Instrument has a range of 0 to 18. The mean score in our validation study was 11.98. The instrument exhibited an internal consistency reliability coefficient of 0.82, and a test-retest reliability of 0.89. The instrument is suitable for screening individuals who have limited health literacy skills. Future studies are needed to further define the psychometric properties and predictive validity of the Korean Health Literacy Instrument.
Pedagogy of Science Teaching Tests: Formative assessments of science teaching orientations

NASA Astrophysics Data System (ADS)

Cobern, William W.; Schuster, David; Adams, Betty; Skjold, Brandy Ann; Zeynep Muğaloğlu, Ebru; Bentz, Amy; Sparks, Kelly

2014-09-01

A critical aspect of teacher education is gaining pedagogical content knowledge of how to teach science for conceptual understanding. Given the time limitations of college methods courses, it is difficult to touch on more than a fraction of the science topics potentially taught across grades K-8, particularly in the context of relevant pedagogies. This research and development work centers on constructing a formative assessment resource to help expose pre-service teachers to a greater number of science topics within teaching episodes using various modes of instruction. To this end, 100 problem-based, science pedagogy assessment items were developed via expert group discussions and pilot testing. Each item contains a classroom vignette followed by response choices carefully crafted to include four basic pedagogies (didactic direct, active direct, guided inquiry, and open inquiry). The brief but numerous items allow a substantial increase in the number of science topics that pre-service students may consider. The intention is that students and teachers will be able to share and discuss particular responses to individual items, or else record their responses to collections of items and thereby create a snapshot profile of their teaching orientations. Subsets of items were piloted with students in pre-service science methods courses, and the quantitative results of student responses were spread sufficiently to suggest that the items can be effective for their intended purpose.
International epidemiology of child and adolescent psychopathology ii: integration and applications of dimensional findings from 44 societies.

PubMed

Rescorla, Leslie; Ivanova, Masha Y; Achenbach, Thomas M; Begovac, Ivan; Chahed, Myriam; Drugli, May Britt; Emerich, Deisy Ribas; Fung, Daniel S S; Haider, Mariam; Hansson, Kjell; Hewitt, Nohelia; Jaimes, Stefanny; Larsson, Bo; Maggiolini, Alfio; Marković, Jasminka; Mitrović, Dragan; Moreira, Paulo; Oliveira, João Tiago; Olsson, Martin; Ooi, Yoon Phaik; Petot, Djaouida; Pisa, Cecilia; Pomalima, Rolando; da Rocha, Marina Monzani; Rudan, Vlasta; Sekulić, Slobodan; Shahini, Mimoza; de Mattos Silvares, Edwiges Ferreira; Szirovicza, Lajos; Valverde, José; Vera, Luis Anderssen; Villa, Maria Clara; Viola, Laura; Woo, Bernardine S C; Zhang, Eugene Yuqing

2012-12-01

To build on Achenbach, Rescorla, and Ivanova (2012) by (a) reporting new international findings for parent, teacher, and self-ratings on the Child Behavior Checklist, Youth Self-Report, and Teacher's Report Form; (b) testing the fit of syndrome models to new data from 17 societies, including previously underrepresented regions; (c) testing effects of society, gender, and age in 44 societies by integrating new and previous data; (d) testing cross-society correlations between mean item ratings; (e) describing the construction of multisociety norms; (f) illustrating clinical applications. Confirmatory factor analyses (CFAs) of parent, teacher, and self-ratings, performed separately for each society; tests of societal, gender, and age effects on dimensional syndrome scales, DSM-oriented scales, Internalizing, Externalizing, and Total Problems scales; tests of agreement between low, medium, and high ratings of problem items across societies. CFAs supported the tested syndrome models in all societies according to the primary fit index (Root Mean Square Error of Approximation [RMSEA]), but less consistently according to other indices; effect sizes were small-to-medium for societal differences in scale scores, but very small for gender, age, and interactions with society; items received similarly low, medium, or high ratings in different societies; problem scores from 44 societies fit three sets of multisociety norms. Statistically derived syndrome models fit parent, teacher, and self-ratings when tested individually in all 44 societies according to RMSEAs (but less consistently according to other indices). Small to medium differences in scale scores among societies supported the use of low-, medium-, and high-scoring norms in clinical assessment of individual children. Copyright © 2012 American Academy of Child and Adolescent Psychiatry. Published by Elsevier Inc. All rights reserved.
Preference index supported by motivation tests in Nile tilapia

PubMed Central

2017-01-01

The identification of animal preferences is assumed to provide better rearing environments for the animals in question. Preference tests focus on the frequency of approaches or the time an animal spends in proximity to each item of the investigated resource during a multiple-choice trial. Recently, a preference index (PI) was proposed to differentiate animal preferences from momentary responses (Sci Rep, 2016, 6:28328, DOI: 10.1038/srep28328). This index also quantifies the degree of preference for each item. Each choice response is also weighted, with the most recent responses weighted more heavily, but the index includes the entire bank of tests, and thus represents a history-based approach. In this study, we compared this PI to motivation tests, which consider how much effort is expended to access a resource. We performed choice tests over 7 consecutive days for 34 Nile tilapia fish that were presented with different colored compartments in each test. We first detected the preferred and non-preferred colors of each fish using the PI and then tested their motivation to reach these compartments. We found that fish preferences varied individually, but the results were consistent with the motivation profiles, as individual fish were more motivated (measured as the number of touches made on transparent, hinged doors that prevented access to the resource) to access their preferred items. On average, most of the 34 fish avoided the color yellow and showed less motivation to reach yellow and red colors. The fish also exhibited greater motivation to access blue and green colors (the most preferred colors). These results corroborate the PI as a reliable tool for the identification of animal preferences. We recommend this index to animal keepers and researchers to identify an animal’s preferred conditions. PMID:28426689
Preference index supported by motivation tests in Nile tilapia.

PubMed

Maia, Caroline Marques; Volpato, Gilson Luiz

2017-01-01

The identification of animal preferences is assumed to provide better rearing environments for the animals in question. Preference tests focus on the frequency of approaches or the time an animal spends in proximity to each item of the investigated resource during a multiple-choice trial. Recently, a preference index (PI) was proposed to differentiate animal preferences from momentary responses (Sci Rep, 2016, 6:28328, DOI: 10.1038/srep28328). This index also quantifies the degree of preference for each item. Each choice response is also weighted, with the most recent responses weighted more heavily, but the index includes the entire bank of tests, and thus represents a history-based approach. In this study, we compared this PI to motivation tests, which consider how much effort is expended to access a resource. We performed choice tests over 7 consecutive days for 34 Nile tilapia fish that were presented with different colored compartments in each test. We first detected the preferred and non-preferred colors of each fish using the PI and then tested their motivation to reach these compartments. We found that fish preferences varied individually, but the results were consistent with the motivation profiles, as individual fish were more motivated (measured as the number of touches made on transparent, hinged doors that prevented access to the resource) to access their preferred items. On average, most of the 34 fish avoided the color yellow and showed less motivation to reach yellow and red colors. The fish also exhibited greater motivation to access blue and green colors (the most preferred colors). These results corroborate the PI as a reliable tool for the identification of animal preferences. We recommend this index to animal keepers and researchers to identify an animal's preferred conditions.
Road March Performance of Special Operations Soldiers Carrying Various Loads and Load Distributions

DTIC Science & Technology

1993-01-01

groups were used (Ramos and Knapik, 1979; Knapik et al., 1980; Hermansen et al., 1972). In the hand-grip test, the soldier, in a seated position...Inventory (Dishman et al., 1980). The POMS was a 65-item questionnaire which provided measures of six mood states. Soldiers scored each item on a five-point...estimates require individual calibration (Acheson et al., 1980) and heart rate can be influenced by a number of factors including training state (Saltin
A partner-related risk behavior index to identify people at elevated risk for sexually transmitted infections.

PubMed

Crosby, Richard; Shrier, Lydia A

2013-04-01

The purpose of this study was to develop and test a sexual-partner-related risk behavior index to identify high-risk individuals most likely to have a sexually transmitted infection (STI). Patients from five STI and adolescent medical clinics in three US cities were recruited (N = 928; M age = 29.2 years). Data were collected using audio-computer-assisted self-interviewing. Of seven sexual-partner-related variables, those that were significantly associated with the outcomes were combined into a partner-related risk behavior index. The dependent variables were laboratory-confirmed infection with Chlamydia trachomatis, Neisseria gonorrhoeae, and/or Trichomonas vaginalis. Nearly one-fifth of the sample (169/928; 18.4%) tested positive for an STI. Three of the seven items were significantly associated with having one or more STIs: sex with a newly released prisoner, sex with a person known or suspected of having an STI, and sexual concurrency. In combined form, this three-item index was significantly associated with STI prevalence (p < .001). In the presence of three covariates (gender, race, and age), those classified as being at-risk by the index were 1.8 times more likely than those not classified as such to test positive for an STI (p < .001). Among individuals at risk for STIs, a three-item index predicted testing positive for one or more of three STIs. This index could be used to prioritize and guide intensified clinic-based counseling for high-risk patients of STI and other clinics.
Filter Leaf. Operational Control Tests for Wastewater Treatment Facilities. Instructor's Manual [and] Student Workbook.

ERIC Educational Resources Information Center

Wooley, John F.

In the operation of vacuum filters and belt filters, it is desirable to evaluate the performance of different types of filter media and conditioning processes. The filter leaf test, which is used to evaluate these items, is described. Designed for individuals who have completed National Pollutant Discharge Elimination System (NPDES) level 1…
Short-Form Philadelphia Naming Test: Rationale and Empirical Evaluation

ERIC Educational Resources Information Center

Walker, Grant M.; Schwartz, Myrna F.

2012-01-01

Purpose: To create two matched short forms of the Philadelphia Naming Test (PNT; Roach, Schwartz, Martin, Grewal, & Brecher, 1996) that yield similar results to the PNT for measuring anomia. Method: In Study 1, archived naming data from 94 individuals with aphasia were used to identify which PNT items should be included in the short forms. The 2…
How to Compare Parametric and Nonparametric Person-Fit Statistics Using Real Data

ERIC Educational Resources Information Center

Sinharay, Sandip

2017-01-01

Person-fit assessment (PFA) is concerned with uncovering atypical test performance as reflected in the pattern of scores on individual items on a test. Existing person-fit statistics (PFSs) include both parametric and nonparametric statistics. Comparison of PFSs has been a popular research topic in PFA, but almost all comparisons have employed…
Cross-Cultural Comparisons of the Motivation of Young Children to Achieve in School.

ERIC Educational Resources Information Center

Adkins, Dorothy C.

Research on the differences in motivation to achieve in school among 10 groups of four-year-olds utilized a new, 75-item objective projective test called Gumpgookies. This test was individually administered to approximately 2000 children mainly from low economic backgrounds. The various ethnic and religious groups were compared with respect to…
The Reward-Based Eating Drive Scale: A Self-Report Index of Reward-Based Eating

PubMed Central

Mason, Ashley E.; Laraia, Barbara A.; Hartman, William; Ready, Karen; Acree, Michael; Adam, Tanja C.; St. Jeor, Sachiko; Kessler, David

2014-01-01

Why are some individuals more vulnerable to persistent weight gain and obesity than are others? Some obese individuals report factors that drive overeating, including lack of control, lack of satiation, and preoccupation with food, which may stem from reward-related neural circuitry. These are normative and common symptoms and not the sole focus of any existing measures. Many eating scales capture these common behaviors, but are confounded with aspects of dysregulated eating such as binge eating or emotional overeating. Across five studies, we developed items that capture this reward-based eating drive (RED). Study 1 developed the items in lean to obese individuals (n = 327) and examined changes in weight over eight years. In Study 2, the scale was further developed and expert raters evaluated the set of items. Study 3 tested psychometric properties of the final 9 items in 400 participants. Study 4 examined psychometric properties and race invariance (n = 80 women). Study 5 examined psychometric properties and age/gender invariance (n = 381). Results showed that RED scores correlated with BMI and predicted earlier onset of obesity, greater weight fluctuations, and greater overall weight gain over eight years. Expert ratings of RED scale items indicated that the items reflected characteristics of reward-based eating. The RED scale evidenced high internal consistency and invariance across demographic factors. The RED scale, designed to tap vulnerability to reward-based eating behavior, appears to be a useful brief tool for identifying those at higher risk of weight gain over time. Given the heterogeneity of obesity, unique brief profiling of the reward-based aspect of obesity using a self-report instrument such as the RED scale may be critical for customizing effective treatments in the general population. PMID:24979216
Can Item Keyword Feedback Help Remediate Knowledge Gaps?

PubMed Central

Feinberg, Richard A.; Clauser, Amanda L.

2016-01-01

Background: In graduate medical education, assessment results can effectively guide professional development when both assessment and feedback support a formative model. When individuals cannot directly access the test questions and responses, a way of using assessment results formatively is to provide item keyword feedback. Objective: The purpose of the following study was to investigate whether exposure to item keyword feedback aids in learner remediation. Methods: Participants included 319 trainees who completed a medical subspecialty in-training examination (ITE) in 2012 as first-year fellows, and then 1 year later in 2013 as second-year fellows. Performance on 2013 ITE items in which keywords were, or were not, exposed as part of the 2012 ITE score feedback was compared across groups based on the amount of time studying (preparation). For the same items common to both 2012 and 2013 ITEs, response patterns were analyzed to investigate changes in answer selection. Results: Test takers who indicated greater amounts of preparation on the 2013 ITE did not perform better on the items in which keywords were exposed compared to those who were not exposed. The response pattern analysis substantiated overall growth in performance from the 2012 ITE. For items with incorrect responses on both attempts, examinees selected the same option 58% of the time. Conclusions: Results from the current study were unsuccessful in supporting the use of item keywords in aiding remediation. Unfortunately, the results did provide evidence of examinees retaining misinformation. PMID:27777664
Development of a German reading span test with dual task design for application in cognitive hearing research.

PubMed

Carroll, Rebecca; Meis, Markus; Schulte, Michael; Vormann, Matthias; Kießling, Jürgen; Meister, Hartmut

2015-02-01

To report the development of a standardized German version of a reading span test (RST) with a dual task design. Special attention was paid to psycholinguistic control of the test items and time-sensitive scoring. We aim to establish our RST version to use for determining an individual's working memory in the framework of hearing research in German contexts. RST stimuli were controlled and pretested for psycholinguistic factors. The RST task was to read sentences, quickly determine their plausibility, and later recall certain words to determine a listener's individual reading span. RST results were correlated with outcomes of additional sentence-in-noise tests measured in an aided and an unaided listening condition, each at two reception thresholds. Item plausibility was pre-determined by 28 native German participants. An additional 62 listeners (45-86 years, M = 69.8) with mild-to-moderate hearing loss were tested for speech intelligibility and reading span in a multicenter study. The reading span test significantly correlated with speech intelligibility at both speech reception thresholds in the aided listening condition. Our German RST is standardized with respect to psycholinguistic construction principles of the stimuli, and is a cognitive correlate of intelligibility in a German matrix speech-in-noise test.
Measuring pain in the context of homelessness

PubMed Central

Matter, Rebecca; Kline, Susan; Cook, Karon F.; Amtmann, Dagmar

2009-01-01

Purpose: The primary objective of this study was to inform the development of measures of pain impact appropriate for all respondents, including homeless individuals, so that they can be used in clinical research and practice. The secondary objective was to increase understanding about the unique experience of homeless people with pain. Methods: Seventeen homeless individuals with chronic health conditions (often associated with pain) participated in cognitive interviews to test the functioning of 56 pain measurement items and provided information about their experience living with and accessing treatment for pain. Results: The most common problems identified with items were that they lacked clarity or were irrelevant in the context of homelessness. Items that were unclear, irrelevant and/or had other identified problems made it difficult for participants to respond. Participants also described multiple ways in which their pain was exacerbated by conditions of homelessness and identified barriers to accessing appropriate treatment. Conclusions: Results suggested that the majority of items were problematic for the homeless and require substantial modifications to make the pain impact bank relevant to this population. Additional recommendations include involving homeless in future item bank development, conducting research on the topic of pain and homelessness, and using cognitive interviewing in other types of health disparities research. PMID:19582592
Response Mixture Modeling: Accounting for Heterogeneity in Item Characteristics across Response Times.

PubMed

Molenaar, Dylan; de Boeck, Paul

2018-06-01

In item response theory modeling of responses and response times, it is commonly assumed that the item responses have the same characteristics across the response times. However, heterogeneity might arise in the data if subjects resort to different response processes when solving the test items. These differences may be within-subject effects, that is, a subject might use a certain process on some of the items and a different process with different item characteristics on the other items. If the probability of using one process over the other process depends on the subject's response time, within-subject heterogeneity of the item characteristics across the response times arises. In this paper, the method of response mixture modeling is presented to account for such heterogeneity. Contrary to traditional mixture modeling where the full response vectors are classified, response mixture modeling involves classification of the individual elements in the response vector. In a simulation study, the response mixture model is shown to be viable in terms of parameter recovery. In addition, the response mixture model is applied to a real dataset to illustrate its use in investigating within-subject heterogeneity in the item characteristics across response times.
Object representations in visual working memory change according to the task context.

PubMed

Balaban, Halely; Luria, Roy

2016-08-01

This study investigated whether an item's representation in visual working memory (VWM) can be updated according to changes in the global task context. We used a modified change detection paradigm, in which the items moved before the retention interval. In all of the experiments, we presented identical color-color conjunction items that were arranged to provide a common fate Gestalt grouping cue during their movement. Task context was manipulated by adding a condition highlighting either the integrated interpretation of the conjunction items or their individuated interpretation. We monitored the contralateral delay activity (CDA) as an online marker of VWM. Experiment 1 employed only a minimal global context; the conjunction items were integrated during their movement, but then were partially individuated, at a late stage of the retention interval. The same conjunction items were perfectly integrated in an integration context (Experiment 2). An individuation context successfully produced strong individuation, already during the movement, overriding Gestalt grouping cues (Experiment 3). In Experiment 4, a short priming of the individuation context managed to individuate the conjunction items immediately after the Gestalt cue was no longer available. Thus, the representations of identical items changed according to the task context, suggesting that VWM interprets incoming input according to global factors which can override perceptual cues. Copyright © 2016 Elsevier Ltd. All rights reserved.
The influence of strategic encoding on false memory in patients with mild cognitive impairment and Alzheimer's disease dementia.

PubMed

Tat, Michelle J; Soonsawat, Anothai; Nagle, Corinne B; Deason, Rebecca G; O'Connor, Maureen K; Budson, Andrew E

2016-11-01

Patients with Alzheimer's disease (AD) dementia exhibit high rates of memory distortions in addition to their impairments in episodic memory. Several investigations have demonstrated that when healthy individuals (young and old) engaged in an encoding strategy that emphasized the uniqueness of study items (an item-specific encoding strategy), they were able to improve their discrimination between old items and unstudied critical lure items in a false memory task. In the present study we examined if patients with AD could also improve their memory discrimination when engaging in an item-specific encoding strategy. Healthy older adult controls, patients with mild cognitive impairment (MCI) due to AD, and patients with mild AD dementia were asked to study lists of categorized words. In the Item-Specific condition, participants were asked to provide a unique detail or personal experience with each study item. In the Relational condition, they were asked to determine how each item in the list was related to the others. To assess the influence of both strategies, recall and recognition memory tests were administered. Overall, both patient groups exhibited poorer memory in both recall and recognition tests compared to controls. In terms of recognition, healthy older controls and patients with MCI due to AD exhibited improved memory discrimination in the Item-Specific condition compared to the Relational condition, whereas patients with AD dementia did not. We speculate that patients with MCI due to AD use intact frontal networks to effectively engage in this strategy. Published by Elsevier Inc.

Scale Refinement and Initial Evaluation of a Behavioral Health Function Measurement Tool for Work Disability Evaluation

PubMed Central

Marfeo, Elizabeth E.; Ni, Pengsheng; Bogusz, Kara; Meterko, Mark; McDonough, Christine M.; Chan, Leighton; Rasch, Elizabeth K.; Brandt, Diane E.; Jette, Alan M.

2014-01-01

Objectives: To use item response theory (IRT) data simulations to construct and perform initial psychometric testing of a newly developed instrument, the Social Security Administration Behavioral Health Function (SSA-BH) instrument, that aims to assess behavioral health functioning relevant to the context of work. Design: Cross-sectional survey followed by item response theory (IRT) calibration data simulations. Setting: Community. Participants: A sample of individuals applying for SSA disability benefits, claimants (N=1015), and a normative comparative sample of US adults (N=1000). Interventions: None. Main Outcome Measure: Social Security Administration Behavioral Health Function (SSA-BH) measurement instrument. Results: Item response theory analyses supported the unidimensionality of four SSA-BH scales: Mood and Emotions (35 items), Self-Efficacy (23 items), Social Interactions (6 items), and Behavioral Control (15 items). All SSA-BH scales demonstrated strong psychometric properties including reliability, accuracy, and breadth of coverage. High correlations of the simulated 5- or 10-item CATs with the full item bank indicated robust ability of the CAT approach to comprehensively characterize behavioral health function along four distinct dimensions. Conclusions: Initial testing and evaluation of the SSA-BH instrument demonstrated good accuracy, reliability, and content coverage along all four scales. Behavioral function profiles of SSA claimants were generated and compared to age and sex matched norms along four scales: Mood and Emotions, Behavioral Control, Social Interactions, and Self-Efficacy. Utilizing the CAT based approach offers the ability to collect standardized, comprehensive functional information about claimants in an efficient way, which may prove useful in the context of the SSA’s work disability programs. PMID:23542404
Assessment of nasalance and nasality in patients with a repaired cleft palate.

PubMed

Sinko, Klaus; Gruber, Maike; Jagsch, Reinhold; Roesner, Imme; Baumann, Arnulf; Wutzl, Arno; Denk-Linnert, Doris-Maria

2017-07-01

In patients with a repaired cleft palate, nasality is typically diagnosed by speech language pathologists. In addition, there are various instruments to objectively diagnose nasalance. To explore the potential of nasalance measurements after cleft palate repair by NasalView®, we correlated perceptual nasality and instrumentally measured nasalance of eight speech items and determined the relationship between sensitivity and specificity of the nasalance measures by receiver-operating characteristics (ROC) analyses and AUC (area under the curve) computation for each single test item and specific item groups. We recruited patients with a primarily repaired cleft palate receiving speech therapy during follow-up. During a single day visit, perceptive and instrumental assessments were obtained in 36 patients and analyzed. The individual perceptual nasality was assigned to one of four categories; the corresponding instrumental nasalance measures for the eight specific speech items were expressed on a metric scale (1-100). With reference to the perceptual diagnoses, we observed three nasal test items and one oral test item with high sensitivity. However, the specificity of the nasality indicating measures was rather low. The four best speech items with the highest sensitivity provided scores ranging from 96.43 to 100%, while the averaged sensitivity of all eight items was below 90%. We conclude that perceptive evaluation of nasality remains state of the art. For clinical follow-up, instrumental nasalance assessment can objectively document subtle changes by analysis of four speech items only. Further studies are warranted to determine the applicability of instrumental nasalance measures in the clinical routine, using discriminative items only.
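As a rough illustration of the ROC quantities named in the abstract above, the following Python sketch computes sensitivity, specificity, and AUC for a single speech item. It is not the authors' analysis: the scores and labels are invented toy values, the four perceptual categories are assumed to have been collapsed into a binary label (1 = nasality present, 0 = absent), and the function names are hypothetical.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Classify score >= threshold as positive; return (sensitivity, specificity)."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def auc(scores, labels):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical nasalance scores (1-100 scale) and binary perceptual labels.
toy_scores = [20, 35, 50, 65, 80, 90]
toy_labels = [0, 0, 1, 0, 1, 1]

sens, spec = sensitivity_specificity(toy_scores, toy_labels, threshold=50)
item_auc = auc(toy_scores, toy_labels)
print(sens, spec, item_auc)
```

With a low threshold the toy item catches every positive case but misclassifies one negative case, which mirrors the high-sensitivity, low-specificity pattern the abstract reports.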
The influence of strategic encoding on false memory in patients with mild cognitive impairment and Alzheimer’s disease dementia

PubMed Central

Tat, Michelle J.; Soonsawat, Anothai; Nagle, Corinne B.; Deason, Rebecca G.; O’Connor, Maureen K.; Budson, Andrew E.

2018-01-01

Patients with Alzheimer’s disease (AD) dementia exhibit high rates of memory distortions in addition to their impairments in episodic memory. Several investigations have demonstrated that when healthy individuals (young and old) engaged in an encoding strategy that emphasized the uniqueness of study items (an item-specific encoding strategy), they were able to improve their discrimination between old items and unstudied critical lure items in a false memory task. In the present study we examined if patients with AD could also improve their memory discrimination when engaging in an item-specific encoding strategy. Healthy older adult controls, patients with mild cognitive impairment (MCI) due to AD, and patients with mild AD dementia were asked to study lists of categorized words. In the Item-Specific condition, participants were asked to provide a unique detail or personal experience with each study item. In the Relational condition, they were asked to determine how each item in the list was related to the others. To assess the influence of both strategies, recall and recognition memory tests were administered. Overall, both patient groups exhibited poorer memory in both recall and recognition tests compared to controls. In terms of recognition, healthy older controls and patients with MCI due to AD exhibited improved memory discrimination in the Item-Specific condition compared to the Relational condition, whereas patients with AD dementia did not. We speculate that patients with MCI due to AD use intact frontal networks to effectively engage in this strategy. PMID:27643951
The development of automaticity in short-term memory search: Item-response learning and category learning.

PubMed

Cao, Rui; Nosofsky, Robert M; Shiffrin, Richard M

2017-05-01

In short-term-memory (STM)-search tasks, observers judge whether a test probe was present in a short list of study items. Here we investigated the long-term learning mechanisms that lead to the highly efficient STM-search performance observed under conditions of consistent-mapping (CM) training, in which targets and foils never switch roles across trials. In item-response learning, subjects learn long-term mappings between individual items and target versus foil responses. In category learning, subjects learn high-level codes corresponding to separate sets of items and learn to attach old versus new responses to these category codes. To distinguish between these 2 forms of learning, we tested subjects in categorized varied mapping (CV) conditions: There were 2 distinct categories of items, but the assignment of categories to target versus foil responses varied across trials. In cases involving arbitrary categories, CV performance closely resembled standard varied-mapping performance without categories and departed dramatically from CM performance, supporting the item-response-learning hypothesis. In cases involving prelearned categories, CV performance resembled CM performance, as long as there was sufficient practice or steps taken to reduce trial-to-trial category-switching costs. This pattern of results supports the category-coding hypothesis for sufficiently well-learned categories. Thus, item-response learning occurs rapidly and is used early in CM training; category learning is much slower but is eventually adopted and is used to increase the efficiency of search beyond that available from item-response learning. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Test-Retest Reliability of the Self-Reported Impairments in Persons With Late Effects of Polio (SIPP) Rating Scale.

PubMed

Brogårdh, Christina; Lexell, Jan

2016-05-01

A new 13-item rating scale, the Self-Reported Impairments in Persons with Late Effects of Polio (SIPP), has been developed. The SIPP has been analyzed using the Rasch method and has shown good construct validity and internal consistency. To establish its clinical utility, further evaluation of its psychometric properties is needed. To evaluate the test-retest reliability of the SIPP and to define limits for the smallest change that indicates a real change, both for a group of persons and a single individual. A postal survey. University Hospital. Fifty-one persons (31 men and 20 women; mean age, 72 years) with clinically verified late effects of polio. Not applicable. The participants completed the SIPP twice, 2 weeks apart. The response frequencies at test occasion 1 (T1) and test occasion 2 (T2) were calculated. Test-retest reliability was analyzed using the percentage agreement of each item, the intraclass correlation coefficient, and the mean difference between the test occasions (d̄), together with the 95% confidence intervals for d̄, the standard error of measurement, the smallest real difference, and a Bland-Altman plot. The percentage agreement (ie, the same scoring at both test occasions) was >70% for 10 of 13 items. The mean score (standard deviation) was 27.9 (5.7) points at T1 and 28.2 (6.0) points at T2, with no systematic difference between the test occasions. The intraclass correlation coefficient was 0.88, the standard error of measurement (the smallest change for a group of persons) was 2.0 points, and the smallest real difference (the smallest change for a single individual) was 5.6 points. The SIPP is a reliable rating scale in persons with late effects of polio and can be used to evaluate effects of rehabilitation interventions and changes of perceived impairments over time both for a group of persons and for a single individual. Copyright © 2016 American Academy of Physical Medicine and Rehabilitation. Published by Elsevier Inc. All rights reserved.
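The reported standard error of measurement (SEM) and smallest real difference (SRD) can be approximately reproduced from the other figures in the abstract. The Python sketch below assumes the conventional formulas SEM = SD × sqrt(1 − ICC) and SRD = 1.96 × sqrt(2) × SEM, and assumes the SD is the mean of the two reported occasion SDs. The abstract does not state how the authors computed these values, so this is only a plausibility check.

```python
import math


def sem_from_icc(sd: float, icc: float) -> float:
    """Standard error of measurement: SD * sqrt(1 - ICC)."""
    return sd * math.sqrt(1.0 - icc)


def smallest_real_difference(sem: float, z: float = 1.96) -> float:
    """Smallest real difference at the 95% level: z * sqrt(2) * SEM."""
    return z * math.sqrt(2.0) * sem


# Assumption: average the SDs reported at T1 (5.7) and T2 (6.0).
sd = (5.7 + 6.0) / 2
sem = sem_from_icc(sd, icc=0.88)
srd = smallest_real_difference(sem)
print(round(sem, 1), round(srd, 1))  # prints: 2.0 5.6
```

Under these assumptions the rounded results agree with the 2.0 and 5.6 points given in the abstract.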
Cost sharing and hereditary cancer risk: predictors of willingness-to-pay for genetic testing.

PubMed

Matro, Jennifer M; Ruth, Karen J; Wong, Yu-Ning; McCully, Katen C; Rybak, Christina M; Meropol, Neal J; Hall, Michael J

2014-12-01

Increasing use of predictive genetic testing to gauge hereditary cancer risk has been paralleled by rising cost-sharing practices. Little is known about how demographic and psychosocial factors may influence individuals' willingness-to-pay for genetic testing. The Gastrointestinal Tumor Risk Assessment Program Registry includes individuals presenting for genetic risk assessment based on personal/family cancer history. Participants complete a baseline survey assessing cancer history and psychosocial items. Willingness-to-pay items include intention for: genetic testing only if paid by insurance; testing with self-pay; and amount willing-to-pay ($25-$2,000). Multivariable models examined predictors of willingness-to-pay out-of-pocket (versus only if paid by insurance) and willingness-to-pay a smaller versus larger sum (≤$200 vs. ≥$500). All statistical tests are two-sided (α = 0.05). Of 385 evaluable participants, a minority (42%) had a personal cancer history, while 56% had ≥1 first-degree relative with colorectal cancer. Overall, 21.3% were willing to have testing only if paid by insurance, and 78.7% were willing-to-pay. Predictors of willingness-to-pay were: 1) concern for positive result; 2) confidence to control cancer risk; 3) fewer perceived barriers to colorectal cancer screening; 4) benefit of testing to guide screening (all p < 0.05). Subjects willing-to-pay a higher amount were male, more educated, had greater cancer worry, fewer relatives with colorectal cancer, and more positive attitudes toward genetic testing (all p < 0.05). Individuals seeking risk assessment are willing-to-pay out-of-pocket for genetic testing, and anticipate benefits to reducing cancer risk. Identifying factors associated with willingness-to-pay for genetic services is increasingly important as testing is integrated into routine cancer care.
The development of the "Cantonese receptive vocabulary test" for children aged 2-6 in Hong Kong.

PubMed

Cheung, P S; Lee, K Y; Lee, L W

1997-01-01

The study aims to develop a Cantonese receptive vocabulary test to assess 2-6-year-old children in Hong Kong. The test consists of 100 test items. Each target item is accompanied by a phonological distractor, a semantic distractor and an unrelated distractor. A sample of 609 normal children from four Maternal and Child Health Centres and nine kindergartens was selected. The results show that there is a significant effect of age on the correct score. ANOVA was performed to look at the age effect on each distractor individually. It was found that the scores of the three distractors decrease in their own patterns as age increases. With strong content validity, strong construct validity and high correlation coefficients in the split-half reliability, this test could be used as a reliable measurement for the Cantonese-speaking population in Hong Kong.
Psychometric Development of the Research and Knowledge Scale.

PubMed

Powell, Lauren R; Ojukwu, Elizabeth; Person, Sharina D; Allison, Jeroan; Rosal, Milagros C; Lemon, Stephenie C

2017-02-01

Many research participants are misinformed about research terms, procedures, and goals; however, no validated instruments exist to assess individuals' comprehension of health-related research information. We propose research literacy as a concept that incorporates understanding about the purpose and nature of research. We developed the Research and Knowledge Scale (RaKS) to measure research literacy in a culturally and literacy-sensitive manner. We describe its development and psychometric properties. Qualitative methods were used to assess perspectives of research participants and researchers. Literature and informed consent reviews were conducted to develop initial items. These data were used to develop initial domains and items of the RaKS, and expert panel reviews and cognitive pretesting were done to refine the scale. We conducted psychometric analyses to evaluate the scale. The cross-sectional survey was administered to a purposive community-based sample (n=430) using a Web-based data collection system and paper. We did classic theory testing on individual items and assessed test-retest reliability and Kuder-Richardson-20 for internal consistency. We conducted exploratory factor analysis and analysis of variance to assess differences in mean research literacy scores in sociodemographic subgroups. The RaKS comprises 16 items, with a Kuder-Richardson-20 estimate of 0.81 and test-retest reliability of 0.84. There were differences in mean scale scores by race/ethnicity, age, education, income, and health literacy (all P<0.01). This study provides preliminary evidence for the reliability and validity of the RaKS. This scale can be used to measure research participants' understanding about health-related research processes and identify areas to improve informed decision-making about research participation.
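For readers unfamiliar with the Kuder-Richardson-20 coefficient reported in this record, the following is a minimal sketch of the standard textbook formula for dichotomous (0/1) items. It is not the authors' code; the function name `kr20` and the toy data in the usage note are hypothetical.

```python
def kr20(responses):
    """Kuder-Richardson-20 for a table of 0/1 item scores.

    responses: list of rows, one row per respondent, one 0/1 entry per item.
    Illustrative helper (hypothetical name), not taken from the cited study.
    """
    n = len(responses)
    k = len(responses[0])
    # Proportion of respondents scoring 1 on each item.
    p = [sum(row[j] for row in responses) / n for j in range(k)]
    # Sum of item variances p * (1 - p).
    sum_pq = sum(pj * (1.0 - pj) for pj in p)
    # Population variance of total scores (same convention as p * (1 - p)).
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n
    var_total = sum((t - mean_total) ** 2 for t in totals) / n
    return (k / (k - 1)) * (1.0 - sum_pq / var_total)
```

With the toy table `[[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]` the item variances sum to 0.625 and the total-score variance is 1.25, so the function returns 0.75.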
[Quality of advanced practice nurse counseling in home care settings (APN-BQ): psychometric testing of the instrument].

PubMed

Petry, Heidi; Suter-Riederer, Susanne; Kerker-Specker, Carmen; Imhof, Lorenz

2014-12-01

Patient centred and individually-tailored counselling of older people with a chronic condition who live at home is a useful intervention to support their independence. The paper presents the development and psychometric testing of the APN-BQ instrument, which measures patient-centeredness. To measure the quality of an in-home counselling intervention, a 23-item questionnaire was developed and tested with 206 people 80 years and older. Principal component analysis with Varimax rotation was conducted (n = 206). Analysis revealed a four factor (fs = 0.91) model scoring in 19 items. All factors loaded > 0.45. Cronbach's alpha was 0.86. The utility and acceptance of the instrument were confirmed by the high response rate (100 %) and the fact that participants answered 98.8 % of all questions. The APN-BQ has been shown to be a reliable instrument with good content and construct validity. It is a tool for APNs to measure structure, process, and outcome quality of a patient-centred and individually-tailored counselling program, including the degree of patient participation and patient empowerment.
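Several records in this listing report Cronbach's alpha as the internal consistency estimate. The sketch below shows the usual formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), using only the Python standard library. The function name `cronbach_alpha` and the example data are hypothetical and are not drawn from any of the cited studies.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a table of item scores.

    scores: list of rows, one row per respondent, one numeric entry per item.
    Illustrative helper (hypothetical name); population variances are used
    for both items and totals, so the ratio is internally consistent.
    """
    k = len(scores[0])
    # Variance of each item across respondents.
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    # Variance of each respondent's total score.
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1.0 - sum(item_vars) / total_var)
```

For two perfectly correlated items, for example `[[1, 2], [2, 3], [3, 4]]`, the function returns 1.0. On dichotomous data it reduces to the Kuder-Richardson-20 coefficient.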
Design and development of food safety knowledge and attitude scales for consumer food safety education.

PubMed

Medeiros, Lydia C; Hillers, Virginia N; Chen, Gang; Bergmann, Verna; Kendall, Patricia; Schroeder, Mary

2004-11-01

The objective of this study was to design and develop food safety knowledge and attitude scales based on food-handling guidelines developed by a national panel of food safety experts. Knowledge (n=43) and attitude (n=49) questions were developed and pilot-tested with a variety of consumer groups. Final questions were selected based on item analysis and on validity and reliability statistical tests. Knowledge questions were tested in Washington State with participants in low-income nutrition education programs (pretest/posttest n=58, test/retest n=19) and college students (pretest/posttest n=34). Attitude questions were tested in Ohio with nutrition education program participants (n=30) and college students (non-nutrition majors n=138, nutrition majors n=57). Item analysis, paired sample t tests, Pearson's correlation coefficients, and Cronbach's alpha were used. Reliability and validity tests of individual items and the question sets were used to reduce the scales to 18 knowledge questions and 10 attitude questions. The knowledge and attitude scales covered topics ranked as important by a national panel of experts and met most validity and reliability standards. The 18-item knowledge questionnaire had instructional sensitivity (mean score increase of more than three points after instruction), internal reliability (Cronbach's alpha >.75), and produced similar results in test-retest without intervention (coefficient of stability=.81). Knowledge of correct procedures for hand washing and avoiding cross-contamination was widespread before instruction. Knowledge was limited regarding avoiding food preparation while ill, cooking hamburgers, high-risk foods, and whether cooked rice and potatoes could be stored at room temperature. The 10-item attitude scale had an appropriate range of responses (item difficulty) and produced similar results in test-retest ( P
The Effect of Mental Rotation on Surgical Pathological Diagnosis.

PubMed

Park, Heejung; Kim, Hyun Soo; Cha, Yoon Jin; Choi, Junjeong; Minn, Yangki; Kim, Kyung Sik; Kim, Se Hoon

2018-05-01

Pathological diagnosis involves very delicate and complex consequent processing that is conducted by a pathologist. The recognition of false patterns might be an important cause of misdiagnosis in the field of surgical pathology. In this study, we evaluated the influence of visual and cognitive bias in surgical pathologic diagnosis, focusing on the influence of "mental rotation." We designed three sets of the same images of uterine cervix biopsied specimens (original, left to right mirror images, and 180-degree rotated images), and recruited 32 pathologists to diagnose the 3 set items individually. First, the items were found to be adequate for analysis by classical test theory, generalizability theory, and item response theory. The results showed no statistically significant differences in difficulty, discrimination indices, and response duration time between the image sets. Mental rotation did not influence the pathologists' diagnosis in practice. Interestingly, outliers were more frequent in rotated image sets, suggesting that the mental rotation process may influence the pathological diagnoses of a few individual pathologists. © Copyright: Yonsei University College of Medicine 2018.
Rewards of bridging the divide between measurement and clinical theory: demonstration of a bifactor model for the Brief Symptom Inventory.

PubMed

Thomas, Michael L

2012-03-01

There is growing evidence that psychiatric disorders maintain hierarchical associations where general and domain-specific factors play prominent roles (see D. Watson, 2005). Standard, unidimensional measurement models can fail to capture the meaningful nuances of such complex latent variable structures. The present study examined the ability of the multidimensional item response theory bifactor model (see R. D. Gibbons & D. R. Hedeker, 1992) to improve construct validity by serving as a bridge between measurement and clinical theories. Archival data consisting of 688 outpatients' psychiatric diagnoses and item-level responses to the Brief Symptom Inventory (BSI; L. R. Derogatis, 1993) were extracted from files at a university mental health clinic. The bifactor model demonstrated superior fit for the internal structure of the BSI and improved overall diagnostic accuracy in the sample (73%) compared with unidimensional (61%) and oblique simple structure (65%) models. Consistent with clinical theory, multiple sources of item variance were drawn from individual test items. Test developers and clinical researchers are encouraged to consider model-based measurement in the assessment of psychiatric distress.
The effects of cumulative practice on mathematics problem solving.

PubMed

Mayfield, Kristin H; Chase, Philip N

2002-01-01

This study compared three different methods of teaching five basic algebra rules to college students. All methods used the same procedures to teach the rules and included four 50-question review sessions interspersed among the training of the individual rules. The differences among methods involved the kinds of practice provided during the four review sessions. Participants who received cumulative practice answered 50 questions covering a mix of the rules learned prior to each review session. Participants who received a simple review answered 50 questions on one previously trained rule. Participants who received extra practice answered 50 extra questions on the rule they had just learned. Tests administered after each review included new questions for applying each rule (application items) and problems that required novel combinations of the rules (problem-solving items). On the final test, the cumulative group outscored the other groups on application and problem-solving items. In addition, the cumulative group solved the problem-solving items significantly faster than the other groups. These results suggest that cumulative practice of component skills is an effective method of training problem solving.
The effects of cumulative practice on mathematics problem solving.

PubMed Central

Mayfield, Kristin H; Chase, Philip N

2002-01-01

This study compared three different methods of teaching five basic algebra rules to college students. All methods used the same procedures to teach the rules and included four 50-question review sessions interspersed among the training of the individual rules. The differences among methods involved the kinds of practice provided during the four review sessions. Participants who received cumulative practice answered 50 questions covering a mix of the rules learned prior to each review session. Participants who received a simple review answered 50 questions on one previously trained rule. Participants who received extra practice answered 50 extra questions on the rule they had just learned. Tests administered after each review included new questions for applying each rule (application items) and problems that required novel combinations of the rules (problem-solving items). On the final test, the cumulative group outscored the other groups on application and problem-solving items. In addition, the cumulative group solved the problem-solving items significantly faster than the other groups. These results suggest that cumulative practice of component skills is an effective method of training problem solving. PMID:12102132
The Usability of CAT System for Assessing the Depressive Level of Japanese-A Study on Psychometric Properties and Response Behavior.

PubMed

Iwata, Noboru; Kikuchi, Kenichi; Fujihara, Yuya

2016-08-01

An innovative measurement system using a computerized adaptive testing technique based on the item response theory (CAT) has been expanding to measure mental health status. However, little is known about the details of its measurement properties based on empirical data. Moreover, the response time (RT) data, which are not available from paper-and-pencil measurement but are available from computerized measurement, would be worth investigating for exploring the response behavior. We aimed at constructing the CAT to measure depressive symptomatology in a community population and exploring its measurement properties. Also, we examined the relationships between RTs, individual item responses, and depressive levels. For constructing the CAT system, responses of 2061 workers and university students to 24 depression scale items plus four negatively revised positive affect items were subjected to a polytomous IRT analysis. The stopping rule was set for standard error of estimation < 0.30 or the maximum 15 items displayed. The CAT and non-adaptive computer-based test (CBT) were administered to 209 undergraduates, and 168 of them completed the tests again after 1 week. On average, the CAT converged after 10.4 items. The θ values estimated by CAT and CBT were highly correlated (r = 0.94 and 0.95 for the 1st and 2nd measurements) and with the traditional scoring procedures (r's > 0.90). The test-retest reliability was at a satisfactory level (r = 0.86). RTs to some items significantly correlated with the θ estimates. The mean RT varied by the item contents and wording, i.e., the RT to positive affect items was 2 s or more longer than the RT to the other subscale items. The CAT would be a reliable and practical measurement tool for various purposes including stress checks in the workplace.
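The stopping rule described in this record (standard error of estimation below 0.30, or a maximum of 15 items) can be sketched as follows. This is a simplified illustration, not the authors' system: the study used a polytomous IRT model, whereas the sketch assumes a dichotomous two-parameter logistic (2PL) model so that item information has the simple closed form a² · P · (1 − P). All function names and item parameters are hypothetical.

```python
import math


def item_information(theta, a, b):
    """Fisher information of one 2PL item at ability theta (assumed model)."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)


def standard_error(theta, administered):
    """SE of the ability estimate: 1 / sqrt(total test information).

    administered: list of (a, b) pairs for the items shown so far.
    """
    info = sum(item_information(theta, a, b) for a, b in administered)
    return float("inf") if info == 0 else 1.0 / math.sqrt(info)


def should_stop(theta, administered, se_target=0.30, max_items=15):
    """Stop when the SE target is met or the item cap is reached."""
    if len(administered) >= max_items:
        return True
    return standard_error(theta, administered) < se_target
```

With twelve hypothetical items of discrimination 2.0 centred on the current estimate, the total information is 12 and the standard error is about 0.289, so the rule stops; with ten such items the standard error is about 0.316 and testing continues.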
Using the SRQ–20 Factor Structure to Examine Changes in Mental Distress Following Typhoon Exposure

PubMed Central

Stratton, Kelcey J.; Richardson, Lisa K.; Tran, Trinh Luong; Tam, Nguyen Thanh; Aggen, Steven H.; Berenz, Erin C.; Trung, Lam Tu; Tuan, Tran; Buoi, La Thi; Ha, Tran Thu; Thach, Tran Duc; Amstadter, Ananda B.

2014-01-01

Empirical research is limited regarding postdisaster assessment of distress in developing nations. This study aimed to evaluate the factor structure of the 20-item Self-Reporting Questionnaire (SRQ–20) before and after an acute trauma, Typhoon Xangsane, in order to examine changes in mental health symptoms in an epidemiologic sample of Vietnamese adults. The study examined a model estimating individual item factor loadings, thresholds, and a latent change factor for the SRQ–20's single “general distress” common factor. The covariates of sex, age, and severity of typhoon exposure were used to evaluate the disaster-induced changes in SRQ–20 scores while accounting for possible differences in the relationship between individual measurement scale items and the latent mental health construct. Evidence for measurement noninvariance was found. However, allowing sex and age effects on the pre-typhoon and post-typhoon factors accounted for much of the noninvariance in the SRQ–20 measurement structure. A test of no latent change failed, indicating that the SRQ–20 detected significant individual differences in distress between pre- and post-typhoon assessment. Conditioning on age and sex, several typhoon exposure variables differentially predicted levels of distress change, including evacuation, personal injury, and peri-event fear. On average, females and older individuals reported higher levels of distress than males and younger individuals, respectively. The SRQ–20 is a valid and reasonably stable instrument that may be used in postdisaster contexts to assess emotional distress and individual changes in mental health symptoms. PMID:24512425
Psychometric properties of the Arabic version of the 12-item diabetes fatalism scale

PubMed Central

Abi Kharma, Joelle

2018-01-01

Background There are widespread fatalistic beliefs in Arab countries, especially among individuals with diabetes. However, there is no tool to assess diabetes fatalism in this population. This study describes the processes used to create an Arabic version of the Diabetes Fatalism Scale (DFS) and examine its psychometric properties. Methods A descriptive correlational design was used with a convenience sample of Lebanese adults (N = 274) with type 2 diabetes recruited from a major hospital in Beirut, Lebanon and by snowball sampling. The 12- item Diabetes Fatalism Scale- Arabic (12-item DFS-Ar) was back-translated from the original version, pilot tested on 22 adults with type 2 diabetes and then administered to 274 patients to assess the validity and reliability of the scale. Confirmatory factor analysis (CFA) was used to test the hypothesized factor structure. Cronbach’s alpha was used to test for reliability. Results CFA supported the existence of the three factor hypothesis of the original DFS scale. The five items measuring “emotional distress” loaded under Factor 1, the four items measuring “spiritual coping” loaded under factor 2 and the last three items measuring “perceived self-efficacy” of the original scale loaded under Factor 3 (p <0.001 for all three subscales). Goodness of fit indices confirmed adequateness of the CFA model (CFI = 0.97, TLI = 0.96, RMSEA = 0.067 and pclose = 0.05). The 12-item DFS-Ar showed good reliability (Cronbach’s alpha of 0.86) and significantly predicted HbA1c (β = 0.20, p < 0.01). After adjusting for the demographic characteristics and the number of diabetes comorbid conditions, the 12-item DFS-Ar score was independently associated with HbA1c in a multivariable model (β = 0.16, p < 0.05). Conclusions The 12-item DFS-Ar demonstrated good psychometric properties that are comparable to the original scale. It is a valid and reliable measure of diabetes fatalism. 
Further testing with larger and non-Lebanese Arabic populations is needed. PMID:29324827
Translation and Validation of the Farage Quality of Life (FQoL™) Instrument for Consumer Products into Traditional Chinese

PubMed Central

Farage, Miranda A.; Rodenberg, Cindy; Chen, Jasmine

2013-01-01

The Farage Quality of Life™ questionnaire (FQoL™) was developed specifically to assess the impact of consumer products. The objective of this investigation was to achieve a Chinese language instrument. The FQoL™ underwent a forward and backward translation, with cognitive testing by 13 subjects. Slight modifications were made to the instrument, and an implementation study was conducted with 800 participants having a mean (±SD) age of 34.22 (±9.28) years. The subjects were randomly assigned to use 1 of 4 ultra absorbency pad products for the length of one menstrual cycle. Three pads (coded N, S and C) were products currently available on the retail market, a fourth (coded M) was an experimental product improvement on Product N. Subjects were asked to complete the FQoL™ once before (T1) and once after (T2) the start of their period, and the Least Square (LS) Means were determined. Within group comparisons for each item and FQoL™ subscale were conducted by comparing the LS Means for T1 vs. T2. Participants using Product N showed the highest number of significant (p<0.05) changes (11 items), demonstrating these subjects felt worse about items mainly in the subdomains for Emotions, Personal Pleasure, and Physical State. Participants using Product C showed significant changes in 7 items mainly in the subdomains for Emotion and Physical State. Participants using Product S and the experimental Product M showed significant changes in only 4 and 3 individual items, respectively. These were not associated with any particular domain or subdomain. Between group comparisons were conducted by comparing the LS Means for the T2 responses for each group. The group using Product N had LS Mean responses that were significantly worse than the group using Product M for the Emotion, Personal Pleasure and Physical State subdomains, the Energy/Vitality domain, and 2 individual items. The Product S group was worse than the Product M group for 2 individual items. 
The Product C group was worse than the Product M group for the Personal Pleasure and Physical State subdomains and 5 individual items. We found that the Chinese language FQoL™ detected changes in HRQoL during menstruation compared with before menstruation. Further, the measure was able to detect differences among groups of subjects using different menstrual protection products. PMID:23283031
42 CFR 1001.701 - Excessive claims or furnishing of unnecessary or substandard items and services.

Code of Federal Regulations, 2010 CFR

2010-10-01

... substandard items and services. (a) Circumstance for exclusion. The OIG may exclude an individual or entity... financial impact on program beneficiaries or other individuals; (iii) Whether the individual or entity has a... substandard items and services. 1001.701 Section 1001.701 Public Health OFFICE OF INSPECTOR GENERAL-HEALTH...
catcher: A Software Program to Detect Answer Copying in Multiple-Choice Tests Based on Nominal Response Model

ERIC Educational Resources Information Center

Kalender, Ilker

2012-01-01

catcher is a software program designed to compute the [omega] index, a common statistical index for the identification of collusions (cheating) among examinees taking an educational or psychological test. It requires (a) responses and (b) ability estimations of individuals, and (c) item parameters to make computations and outputs the results of…

Personality in general and clinical samples: Measurement invariance of the Multidimensional Personality Questionnaire.

PubMed

Eigenhuis, Annemarie; Kamphuis, Jan H; Noordhof, Arjen

2017-09-01

A growing body of research suggests that the same general dimensions can describe normal and pathological personality, but most of the supporting evidence is exploratory. We aim to determine in a confirmatory framework the extent to which responses on the Multidimensional Personality Questionnaire (MPQ) are identical across general and clinical samples. We tested the Dutch brief form of the MPQ (MPQ-BF-NL) for measurement invariance across a general population subsample (N = 365) and a clinical sample (N = 365), using Multiple Group Confirmatory Factor Analysis (MGCFA) and Multiple Group Exploratory Structural Equation Modeling (MGESEM). As an omnibus personality test, the MPQ-BF-NL revealed strict invariance, indicating absence of bias. Unidimensional per-scale tests for measurement invariance revealed that 10% of items appeared to contain bias across samples. Item bias only affected the scale interpretation of Achievement, with individuals from the clinical sample more readily admitting to putting high demands on themselves than individuals from the general sample, regardless of trait level. This formal test of equivalence provides strong evidence for the common structure of normal and pathological personality and lends further support to the clinical utility of the MPQ. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Collaborative Memory and Part-Set Cuing Impairments: The Role of Executive Depletion in Modulating Retrieval Disruption

PubMed Central

Barber, Sarah J.; Rajaram, Suparna

2011-01-01

When people are exposed to a subset of previously studied list items they recall fewer of the remaining items compared to a condition where none of the studied items are provided during recall. This occurs both when the subset of items is provided by the experimenter (i.e., the part-set cuing deficit in individual recall) and when they are provided during the course of a collaborative discussion (i.e., the collaborative inhibition effect in group recall). Previous research has identified retrieval disruption as a common mechanism underlying both effects; however, less is known about the factors that may make individuals susceptible to such retrieval disruption. In the current studies we tested one candidate factor, namely, executive control. Using an executive depletion paradigm we directly manipulated an individual’s level of executive control during retrieval. Results revealed no direct role of executive depletion in modulating retrieval disruption. In contrast, executive control abilities were indirectly related to retrieval disruption through their influence at encoding. Together, these results suggest that executive control does not directly affect retrieval disruption at the retrieval stage, and that the role of this putative mechanism may be limited to the encoding stage. PMID:21678155
The role of attention in item-item binding in visual working memory.

PubMed

Peterson, Dwight J; Naveh-Benjamin, Moshe

2017-09-01

An important yet unresolved question regarding visual working memory (VWM) relates to whether or not binding processes within VWM require additional attentional resources compared with processing solely the individual components comprising these bindings. Previous findings indicate that binding of surface features (e.g., colored shapes) within VWM is not demanding of resources beyond what is required for single features. However, it is possible that other types of binding, such as the binding of complex, distinct items (e.g., faces and scenes), in VWM may require additional resources. In 3 experiments, we examined VWM item-item binding performance under no load, articulatory suppression, and backward counting using a modified change detection task. Binding performance declined to a greater extent than single-item performance under higher compared with lower levels of concurrent load. The findings from each of these experiments indicate that processing item-item bindings within VWM requires a greater amount of attentional resources compared with single items. These findings also highlight an important distinction between the role of attention in item-item binding within VWM and previous studies of long-term memory (LTM) where declines in single-item and binding test performance are similar under divided attention. The current findings provide novel evidence that the specific type of binding is an important determining factor regarding whether or not VWM binding processes require attention. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
The Focus of Attention in Visual Working Memory: Protection of Focused Representations and Its Individual Variation.

PubMed

Heuer, Anna; Schubö, Anna

2016-01-01

Visual working memory can be modulated according to changes in the cued task relevance of maintained items. Here, we investigated the mechanisms underlying this modulation. In particular, we studied the consequences of attentional selection for selected and unselected items, and the role of individual differences in the efficiency with which attention is deployed. To this end, performance in a visual working memory task as well as the CDA/SPCN and the N2pc, ERP components associated with visual working memory and attentional processes, were analysed. Selection during the maintenance stage was manipulated by means of two successively presented retrocues providing spatial information as to which items were most likely to be tested. Results show that attentional selection serves to robustly protect relevant representations in the focus of attention while unselected representations which may become relevant again still remain available. Individuals with larger retrocueing benefits showed higher efficiency of attentional selection, as indicated by the N2pc, and showed stronger maintenance-associated activity (CDA/SPCN). The findings add to converging evidence that focused representations are protected, and highlight the flexibility of visual working memory, in which information can be weighted according to its relevance.
Improving Measures via Examining the Behavior of Distractors in Multiple-Choice Tests

PubMed Central

Sideridis, Georgios; Tsaousis, Ioannis; Al Harbi, Khaleel

2017-01-01

The purpose of the present article was to illustrate, using an example from a national assessment, the value of analyzing the behavior of distractors in measures that use the multiple-choice format. A secondary purpose of the present article was to illustrate four remedial actions that can potentially improve the measurement of the construct(s) under study. Participants were 2,248 individuals who took a national examination of chemistry. The behavior of the distractors was analyzed by modeling their behavior within the Rasch model. Potentially informative distractors were (a) further modeled using the partial credit model, (b) split into separate items and retested for model fit and parsimony, (c) combined to form a “super” item or testlet, and (d) reexamined after deleting low-ability individuals who likely guessed on those informative, albeit erroneous, distractors. Results indicated that all but the item-split strategy were associated with better model fit compared with the original model. The best-fitting model, however, involved modeling and crediting informative distractors via the partial credit model or eliminating the responses of low-ability individuals who likely guessed on informative distractors. The implications, advantages, and disadvantages of modeling informative distractors for measurement purposes are discussed. PMID:29795904
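As background for the Rasch modeling mentioned in this record, the dichotomous Rasch model gives the probability of a correct response as a logistic function of the difference between person ability and item difficulty. The sketch below shows only that basic response function, not the distractor or partial credit analyses the article describes. The name `rasch_probability` is hypothetical.

```python
import math


def rasch_probability(theta, b):
    """P(correct) under the dichotomous Rasch model.

    theta: person ability (logits); b: item difficulty (logits).
    Illustrative helper (hypothetical name).
    """
    return 1.0 / (1.0 + math.exp(-(theta - b)))
```

When ability equals difficulty the probability is exactly 0.5, and it rises toward 1 as ability exceeds difficulty.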
Psychometric properties of the Brisbane Burn Scar Impact Profile in adults with burn scars

PubMed Central

Kimble, Roy; McPhail, Steven; Plaza, Anita; Simons, Megan

2017-01-01

Objective The aim of the study was to determine the longitudinal validity, reproducibility, responsiveness and interpretability of the adult version of the Brisbane Burn Scar Impact Profile, a patient-report measure of health-related quality of life. Methods A prospective longitudinal cohort study of patients with or at risk of burn scarring was conducted at three assessment points (at baseline around the time of wound healing, one to two weeks post-baseline and 1-month post-baseline). Participants attending a major metropolitan adult burn centre at baseline were recruited. Participants completed the Brisbane Burn Scar Impact Profile and the 36-item Short Form Health Survey and Patient Observer Scar Assessment Scale. Intraclass Correlation Coefficients (ICCs), smallest detectable change, percentage of those who improved, stayed the same or worsened and Area under the Receiver Operating Characteristic Curve (AUC) were used to test the aim. Results Data were included for 118 participants at baseline, 68 participants at one to two weeks and 57 participants at 1-month post-baseline. All groups of items had acceptable reproducibility, except for the overall impact of burn scars (ICC = 0.69), the impact of sensations which was not expected to be stable (ICC = 0.63), mobility and daily activities (ICC = 0.63, 0.67 respectively). The responsiveness of six out of seven groups of items able to be tested against external criterion was supported (AUC = 0.72–0.75). Hypothesised correlations of changes in the Brisbane Burn Scar Impact Profile items with changes in criterion measures generally supported longitudinal validity (e.g., nine out of thirteen hypotheses using the SF-36 as an external criterion were supported). 
Internal consistency estimates, item-total and inter-item correlations indicated there was likely redundancy of some groups of items, particularly in the relationships and social interaction, appearance and emotional reactions items (Cronbach’s alpha range = 0.94–0.95). Conclusion Support was found for the reproducibility, longitudinal validity, responsiveness and interpretability of most groups of Brisbane Burn Scar Impact Profile items and some individual items in the test population. Potential redundancy of items should be investigated further. PMID:28902874
DOE Office of Scientific and Technical Information (OSTI.GOV)

Jaech, J.L.

The use of a pooling technique in leak testing Plutonium Recycle Test Reactor fuel elements to reduce the number of tests is discussed. Since the proportion of defectives in this case is small, application of the method would suggest that the group size be large. It was suggested that additional savings might be introduced by subgrouping the originally grouped items in the event of a positive result, rather than testing them individually. An investigation was made to determine optimum subgrouping sizes. (M.C.G.)
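The trade-off between group size and the proportion of defectives can be made concrete with the classic two-stage pooling scheme (test the pool once, then test each member individually if the pool is positive). This is only a sketch of that textbook scheme, not of the subgrouping variant investigated in the report, and the defect proportions used are hypothetical.

```python
def expected_tests_per_item(p: float, k: int) -> float:
    """Expected tests per item when pooling k items with defect proportion p.

    A pool of k >= 2 items costs one test, plus k individual tests whenever
    at least one member is defective (probability 1 - (1 - p)**k).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if k < 1:
        raise ValueError("k must be at least 1")
    if k == 1:
        return 1.0  # no pooling: every item is tested once
    return 1.0 / k + 1.0 - (1.0 - p) ** k


def best_group_size(p: float, max_k: int = 200) -> int:
    """Group size in 1..max_k that minimizes the expected tests per item."""
    return min(range(1, max_k + 1), key=lambda k: expected_tests_per_item(p, k))


if __name__ == "__main__":
    for p in (0.001, 0.01, 0.1):  # hypothetical defect proportions
        k = best_group_size(p)
        print(p, k, round(expected_tests_per_item(p, k), 4))
```

The sketch reproduces the qualitative point in the abstract: the smaller the proportion of defectives, the larger the group size that minimizes testing effort.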
The 12-item Self-Report World Health Organization Disability Assessment Schedule (WHODAS) 2.0 Administered Via the Internet to Individuals With Anxiety and Stress Disorders: A Psychometric Investigation Based on Data From Two Clinical Trials

PubMed Central

Lindsäter, Elin; Ljótsson, Brjánn; Andersson, Erik; Hedman-Lagerlöf, Erik

2017-01-01

Background The World Health Organization Disability Assessment Schedule 2.0 (WHODAS 2.0) is a widespread measure of disability and functional impairment, which is bundled with the Diagnostic and Statistical Manual of Mental Disorders (Fifth Edition) for use in psychiatry. Administering psychometric scales via the Internet is an effective way to reach respondents and allow for convenient handling of data. Objective The aim was to study the psychometric properties of the 12-item self-report WHODAS 2.0 when administered online to individuals with anxiety and stress disorders. The WHODAS 2.0 was hypothesized to exhibit high internal consistency and be unidimensional. We also expected the WHODAS 2.0 to show high 2-week test-retest reliability, convergent validity (correlations approximately .50 to .90 with other self-report measures of functional impairment), that it would differentiate between patients with and without exhaustion disorder, and that it would respond to change in primary symptom domain. Methods We administered the 12-item self-report WHODAS 2.0 online to patients with anxiety and stress disorders (N=160) enrolled in clinical trials of cognitive behavior therapy, and analyzed psychometric properties within a classical test theory framework. Scores were compared with well-established symptom and disability measures, and sensitivity to change was studied from pretreatment to posttreatment assessment. Results The 12-item self-report WHODAS 2.0 showed high internal consistency (Cronbach alpha=.83-.92), high 2-week test-retest reliability (intraclass correlation coefficient=.83), adequate construct validity, and was sensitive to change. We found preliminary evidence for a three-factorial structure, but one strong factor accounted for a clear majority of the variance. 
Conclusions We conclude that the 12-item self-report WHODAS 2.0 is a psychometrically sound instrument when administered online to individuals with anxiety and stress disorders, but that it is probably fruitful to also report the three subfactors to facilitate comparisons between studies. Trial Registration Clinicaltrials.gov NCT02540317; https://clinicaltrials.gov/ct2/show/NCT02540317 (Archived by WebCite at http://www.webcitation.org/6vQEdYAem); Clinicaltrials.gov NCT02314065; https://clinicaltrials.gov/ct2/show/NCT02314065 (Archived by WebCite at http://www.webcitation.org/6vQEjlUU8) PMID:29222080
Commercial portion-controlled foods in research studies: how accurate are label weights?

PubMed

Conway, Joan M; Rhodes, Donna G; Rumpler, William V

2004-09-01

The purpose of this study was to evaluate the reliability of label weights as surrogates for actual weights in commercial portion-controlled foods used in a research setting. Actual weights of replicate samples of 82 portion-controlled food items and 17 discrete units of food from larger packaging were determined over time. Comparison was made to the package label weights for the portion-controlled food items and the per-serving weights for the discrete units. The study was conducted at the US Department of Agriculture's Beltsville Human Nutrition Research Center's Human Study Facility, which houses a metabolic kitchen and human nutrition research facility. The primary outcome measures were the actual and label weights of 99 food items consumed by human volunteers during controlled feeding studies. Statistical analyses performed: The difference between label and actual weights was tested by the paired t test for those data that complied with the assumptions of normality. The Wilcoxon signed rank test was used for the remainder of the data. Compliance with federal guidelines for packaged weights was also assessed. There was no statistical difference between actual and label weights for only 37 food items. The actual weights of 15 portion-controlled food items were 1% or more less than label weights, making them potentially out of compliance with federal guidelines. With advance planning and continuous monitoring, well-controlled feeding studies could incorporate portion-controlled food items and discrete units, especially beverages and confectionery products. Dietetics professionals should encourage individuals with diabetes and others on strict dietary regimens to check actual weights of portion-controlled products carefully against package weights.
Head and neck cancer-specific quality of life: instrument validation.

PubMed

Terrell, J E; Nanavati, K A; Esclamado, R M; Bishop, J K; Bradford, C R; Wolf, G T

1997-10-01

The disfigurement and dysfunction associated with head and neck cancer affect emotional well-being and some of the most basic functions of life. Most cancer-specific quality-of-life assessments give a single composite score for head and neck cancer-related quality of life. The objective was to develop and evaluate an improved multidimensional instrument to assess head and neck cancer-related functional status and well-being. The item selection process included literature review, interviews with health care workers, and patient surveys. A survey with 37 disease-specific questions and the SF-12 survey were administered to 253 patients in 3 large medical centers. Factor analysis was performed to identify disease-specific domains. Domain scores were calculated as the standardized score of the component items. These domains were assessed for construct validity based on clinical hypotheses and test-retest reliability. Four relevant domains were identified: Eating (6 items), Communication (4 items), Pain (4 items), and Emotion (6 items). Each had an internal consistency (Cronbach alpha value) of greater than 0.80. Construct validity was demonstrated by moderate correlations with the SF-12 Physical and Mental component scores (r=0.43-0.60). Test-retest reliability for each domain demonstrated strong reliability between the 2 time points. Correlations were strong for each individual question, ranging from 0.53 to 0.93. Construct validity testing demonstrated that the direction of differences for each domain was as hypothesized. The Head and Neck Quality of Life questionnaire is a promising multidimensional tool with which to assess head and neck cancer-specific quality of life.
The Control Attitudes Scale-Revised: psychometric evaluation in three groups of patients with cardiac illness.

PubMed

Moser, Debra K; Riegel, Barbara; McKinley, Sharon; Doering, Lynn V; Meischke, Hendrika; Heo, Seongkum; Lennie, Terry A; Dracup, Kathleen

2009-01-01

Perceived control is a construct with important theoretical and clinical implications for healthcare providers, yet practical application of the construct in research and clinical practice awaits development of an easily administered instrument to measure perceived control with evidence of reliability and validity. To test the psychometric properties of the Control Attitudes Scale-Revised (CAS-R) using a sample of 3,396 individuals with coronary heart disease, 513 patients with acute myocardial infarction, and 146 patients with heart failure. Analyses were done separately in each patient group. Reliability was assessed using Cronbach's alpha to determine internal consistency, and item homogeneity was assessed using item-total and interitem correlations. Validity was examined using principal component analysis and testing hypotheses about known associations. Cronbach's alpha values for the CAS-R in patients with coronary heart disease, acute myocardial infarction, and heart failure were all greater than .70. Item-total and interitem correlation coefficients for all items were acceptable in the groups. In factor analyses, the same single factor was extracted in all groups, and all items were loaded moderately or strongly to the factor in each group. As hypothesized in the final construct validity test, in all groups, patients with higher levels of perceived control had less depression and less anxiety compared with those of patients who had lower levels of perceived control. This study provides evidence of the reliability and validity of the 8-item CAS-R as a measure of perceived control in patients with cardiac illness and provides important insight into a key patient construct.
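Several records in this listing report Cronbach's alpha as the internal consistency estimate. As a minimal sketch of the standard formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), the function below works on a small invented response matrix; none of the numbers come from the CAS-R study.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    Uses sample variances (n - 1 denominator) for items and for total scores.
    """
    if len(scores) < 2:
        raise ValueError("need at least two respondents")
    k = len(scores[0])
    if k < 2 or any(len(row) != k for row in scores):
        raise ValueError("every respondent needs the same k >= 2 items")
    item_columns = list(zip(*scores))
    item_variance_sum = sum(variance(col) for col in item_columns)
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1.0 - item_variance_sum / total_variance)


if __name__ == "__main__":
    # Hypothetical responses of four people to a two-item scale.
    demo = [[2, 3], [4, 4], [3, 5], [5, 5]]
    print(round(cronbach_alpha(demo), 3))
```

Values above .70, the threshold cited in the abstract, are conventionally read as acceptable internal consistency, although very high values can also signal redundant items.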
Characterization of good teleoperators - What aptitudes, interests, and experience correlate with measures of teleoperator performance

NASA Technical Reports Server (NTRS)

Yorchak, J. P.; Hartley, C. S.; Hinman, E.

1985-01-01

The use of aptitude tests and questionnaires to evaluate an individual's aptitude for teleoperation is studied. The Raven Progressive Matrices Test and Differential Aptitude Tests, and a 16-item questionnaire for assessing the subject's interests, academic background, and previous experience are described. The Proto-Flight Manipulator Arm, cameras, console, hand controller, and task board utilized by the 17 engineers are examined. The correlation between aptitude scores and questionnaire responses, and operator performance is investigated. Multiple regression data reveal that the eight predictor variables are not individually significant for evaluating operator performance; however, the complete test battery is applicable for predicting 49 percent of subject variance on the criterion task.
Development and psychometric properties of the Carer - Head Injury Neurobehavioral Assessment Scale (C-HINAS) and the Carer - Head Injury Participation Scale (C-HIPS): patient and family determined outcome scales.

PubMed

Deb, Shoumitro; Bryant, Eleanor; Morris, Paul G; Prior, Lindsay; Lewis, Glyn; Haque, Sayeed

2007-06-01

Develop and assess the psychometric properties of the Carer - Head Injury Participation Scale (C-HIPS) and its biggest factor the Carer - Head Injury Neurobehavioral Assessment Scale (C-HINAS). Furthermore, the aim was to examine the inter-informant reliability by comparing the self reports of individuals with traumatic brain injury (TBI) with the carer reports on the C-HIPS and the C-HINAS. Thirty-two TBI individuals and 27 carers took part in in-depth qualitative interviews exploring the consequences of the TBI. Interview transcripts were analysed and key themes and concepts were used to construct a 49-item and 58-item patient (Patient - Head Injury Participation Scale [P-HIPS]) and carer outcome measure (C-HIPS) respectively, of which 49 were parallel items and nine additional items were used to assess carer burden. Postal versions of the P-HIPS, C-HIPS, Mayo Portland Adaptability Inventory-3 (MPAI-3), and the Glasgow Outcome Scale-Extended (GOSE) were completed by a cohort of 113 TBI individuals and 80 carers. Data from a sub-group of 66 patient/carer pairs were used to compare inter-informant reliability between the P-HIPS and the C-HIPS, and the P-HINAS and the C-HINAS respectively. All individual 49 items of the C-HIPS and their total score showed good test-retest reliability (0.95) and internal consistency (0.95). Comparisons with the MPAI-3 and GOSE found a good correlation with the MPAI-3 (0.7) and a moderate negative correlation with the GOSE (-0.6). Factor analysis of these items extracted a 4-factor structure which represented the domains 'Emotion/Behavior' (C-HINAS), 'Independence/Community Living', 'Cognition', and 'Physical'. The C-HINAS showed good internal consistency (0.92), test-retest reliability (0.93), and concurrent validity with one MPAI subscale (0.7). Assessment of inter-informant reliability revealed good correspondence between the reports of the patients and the carers for both the C-HIPS (0.83) and the C-HINAS (0.82). 
Both the C-HINAS and the C-HIPS show strong psychometric properties. The qualitative methodology employed in the construction stage of the questionnaires provided good evidence of face and content validity. Comparisons between the P-HIPS and the C-HIPS, and the P-HINAS and the C-HINAS indicated high levels of agreement suggesting that in situations where the patient is unable to provide self-reports, information provided by the carer could be used.
Promising Areas for Psychometric Research.

ERIC Educational Resources Information Center

Angoff, William H.

1988-01-01

An overview of four papers on useful future directions for psychometric research is provided. The papers were drawn from American Psychological Association symposia; they cover the nature of general intelligence, item bias and selection, cut scores, equating problems, computer-adaptive testing, and individual and group achievement measurement.…
An empirical comparison of knowledge and skill in the context of traditional ecological knowledge.

PubMed

Kightley, Eric P; Reyes-García, Victoria; Demps, Kathryn; Magtanong, Ruth V; Ramenzoni, Victoria C; Thampy, Gayatri; Gueze, Maximilien; Stepp, John Richard

2013-10-16

We test whether traditional ecological knowledge (TEK) about how to make an item predicts a person's skill at making it among the Tsimane' (Bolivia). The rationale for this research is that the failure to distinguish between knowledge and skill might account for some of the conflicting results about the relationships between TEK, human health, and economic development. We test the association between a commonly-used measure of individual knowledge (cultural consensus analysis) about how to make an arrow or a bag and a measure of individual skill at making these items, using ordinary least-squares regression. The study consists of 43 participants from 3 villages. We find no association between our measures of knowledge and skill (core model, p > 0.5, R² = .132). While we cannot rule out the possibility of a real association between these phenomena, we interpret our findings as support for the claim that researchers should distinguish between methods to measure knowledge and skill when studying trends in TEK.
Differential Item Functioning of the Boston Naming Test in Cognitively Normal African American and Caucasian Older Adults

PubMed Central

Pedraza, Otto; Graff-Radford, Neill R.; Smith, Glenn E.; Ivnik, Robert J.; Willis, Floyd B.; Petersen, Ronald C.; Lucas, John A.

2010-01-01

Scores on the Boston Naming Test (BNT) are frequently lower for African American adults than for Caucasian adults. Although demographically-based norms can mitigate the impact of this discrepancy on the likelihood of erroneous diagnostic impressions, a growing consensus suggests that group norms do not sufficiently address or advance our understanding of the underlying psychometric and sociocultural factors that lead to between-group score discrepancies. Using item response theory and methods to detect differential item functioning (DIF), the current investigation moves beyond comparisons of the summed total score to examine whether the conditional probability of responding correctly to individual BNT items differs between African American and Caucasian adults. Participants included 670 adults age 52 and older who took part in Mayo's Older Americans and Older African Americans Normative Studies. Under a 2-parameter logistic IRT framework and after correction for the false discovery rate, 12 items were shown to demonstrate DIF. Six of these 12 items (“dominoes,” “escalator,” “muzzle,” “latch,” “tripod,” and “palette”) were also identified in additional analyses using hierarchical logistic regression models and represent the strongest evidence for race/ethnicity-based DIF. These findings afford a finer characterization of the psychometric properties of the BNT and expand our understanding of between-group performance. PMID:19570311
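Two ingredients named in this abstract can be sketched briefly: the 2-parameter logistic (2PL) item response function and a false discovery rate correction. The sketch assumes the Benjamini-Hochberg step-up procedure, which the abstract does not explicitly name, and all item parameters and p-values are invented for illustration.

```python
import math


def p_correct_2pl(theta: float, a: float, b: float) -> float:
    """2PL probability of a correct response: discrimination a, difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def benjamini_hochberg(p_values, q: float = 0.05):
    """Indices of hypotheses rejected by the Benjamini-Hochberg procedure.

    Finds the largest rank r with p_(r) <= r * q / m and rejects the r
    smallest p-values (step-up rule).
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            cutoff = rank
    return sorted(order[:cutoff])


if __name__ == "__main__":
    # DIF means that, at the same ability, groups differ in P(correct).
    theta = 0.0
    print(p_correct_2pl(theta, a=1.2, b=-0.5))  # hypothetical group 1 parameters
    print(p_correct_2pl(theta, a=1.2, b=0.3))   # hypothetical group 2 parameters
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))
```

In a DIF analysis, each item would contribute one p-value from a test of equal item parameters across groups; the correction then controls the expected share of falsely flagged items.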
When intensions do not map onto extensions: Individual differences in conceptualization.

PubMed

Hampton, James A; Passanisi, Alessia

2016-04-01

Concepts are represented in the mind through knowledge of their extensions (the class of items to which the concept applies) and intensions (features that distinguish that class of items). A common assumption among theories of concepts is that the 2 aspects are intimately related. Hence if there is systematic individual variation in concept representation, the variation should correlate between extensional and intensional measures. A pair of individuals with similar extensional beliefs about a given concept should also share similar intensional beliefs. To test this notion, exemplars (extensions) and features (intensions) of common categories were rated for typicality and importance respectively across 2 occasions. Within-subject consistency was greater than between-subjects consensus on each task, providing evidence for systematic individual variation. Furthermore, the similarity structure between individuals for each task was stable across occasions. However, across 5 samples, similarity between individuals for extensional judgments did not map onto similarity between individuals for intensional judgments. The results challenge the assumption common to many theories of conceptual representation that intensions determine extensions and support a hybrid view of concepts where there is a disconnection between the conceptual resources that are used for the 2 tasks. (c) 2016 APA, all rights reserved.
Developing and testing an instrument for identifying performance incentives in the Greek health care sector.

PubMed

Paleologou, Victoria; Kontodimopoulos, Nick; Stamouli, Aggeliki; Aletras, Vassilis; Niakas, Dimitris

2006-09-13

In the era of cost containment, managers are constantly pursuing increased organizational performance and productivity by aiming at the obvious target, i.e. the workforce. The health care sector, in which production processes are more complicated compared to other industries, is not an exception. In light of recent legislation in Greece in which efficiency improvement and achievement of specific performance targets are identified as undisputable health system goals, the purpose of this study was to develop a reliable and valid instrument for investigating the attitudes of Greek physicians, nurses and administrative personnel towards job-related aspects, and the extent to which these motivate them to improve performance and increase productivity. A methodological exploratory design was employed in three phases: a) content development and assessment, which resulted in a 28-item instrument, b) pilot testing (N = 74) and c) field testing (N = 353). Internal consistency reliability was tested via Cronbach's alpha coefficient and factor analysis was used to identify the underlying constructs. Tests of scaling assumptions, according to the Multitrait-Multimethod Matrix, were used to confirm the hypothesized component structure. Four components, referring to intrinsic individual needs and external job-related aspects, were revealed and explain 59.61% of the variability. They were subsequently labeled: job attributes, remuneration, co-workers and achievement. Nine items not meeting item-scale criteria were removed, resulting in a 19-item instrument. Scale reliability ranged from 0.782 to 0.901 and internal item consistency and discriminant validity criteria were satisfied. Overall, the instrument appears to be a promising tool for hospital administrations in their attempt to identify job-related factors, which motivate their employees. The psychometric properties were good and warrant administration to a larger sample of employees in the Greek healthcare system.
Developing and testing an instrument for identifying performance incentives in the Greek health care sector

PubMed Central

Paleologou, Victoria; Kontodimopoulos, Nick; Stamouli, Aggeliki; Aletras, Vassilis; Niakas, Dimitris

2006-01-01

Background In the era of cost containment, managers are constantly pursuing increased organizational performance and productivity by aiming at the obvious target, i.e. the workforce. The health care sector, in which production processes are more complicated compared to other industries, is not an exception. In light of recent legislation in Greece in which efficiency improvement and achievement of specific performance targets are identified as undisputable health system goals, the purpose of this study was to develop a reliable and valid instrument for investigating the attitudes of Greek physicians, nurses and administrative personnel towards job-related aspects, and the extent to which these motivate them to improve performance and increase productivity. Methods A methodological exploratory design was employed in three phases: a) content development and assessment, which resulted in a 28-item instrument, b) pilot testing (N = 74) and c) field testing (N = 353). Internal consistency reliability was tested via Cronbach's alpha coefficient and factor analysis was used to identify the underlying constructs. Tests of scaling assumptions, according to the Multitrait-Multimethod Matrix, were used to confirm the hypothesized component structure. Results Four components, referring to intrinsic individual needs and external job-related aspects, were revealed and explain 59.61% of the variability. They were subsequently labeled: job attributes, remuneration, co-workers and achievement. Nine items not meeting item-scale criteria were removed, resulting in a 19-item instrument. Scale reliability ranged from 0.782 to 0.901 and internal item consistency and discriminant validity criteria were satisfied. Conclusion Overall, the instrument appears to be a promising tool for hospital administrations in their attempt to identify job-related factors, which motivate their employees. 
The psychometric properties were good and warrant administration to a larger sample of employees in the Greek healthcare system. PMID:16970823
Statistical evaluation of synchronous spike patterns extracted by frequent item set mining

PubMed Central

Torre, Emiliano; Picado-Muiño, David; Denker, Michael; Borgelt, Christian; Grün, Sonja

2013-01-01

We recently proposed frequent itemset mining (FIM) as a method to perform an optimized search for patterns of synchronous spikes (item sets) in massively parallel spike trains. This search outputs the occurrence count (support) of individual patterns that are not trivially explained by the counts of any superset (closed frequent item sets). The number of patterns found by FIM makes direct statistical tests infeasible due to severe multiple testing. To overcome this issue, we proposed to test the significance not of individual patterns, but instead of their signatures, defined as the pairs of pattern size z and support c. Here, we derive in detail a statistical test for the significance of the signatures under the null hypothesis of full independence (pattern spectrum filtering, PSF) by means of surrogate data. As a result, injected spike patterns that mimic assembly activity are well detected, yielding a low false negative rate. However, this approach is prone to additionally classify patterns resulting from chance overlap of real assembly activity and background spiking as significant. These patterns represent false positives with respect to the null hypothesis of having one assembly of given signature embedded in otherwise independent spiking activity. We propose the additional method of pattern set reduction (PSR) to remove these false positives by conditional filtering. By employing stochastic simulations of parallel spike trains with correlated activity in form of injected spike synchrony in subsets of the neurons, we demonstrate for a range of parameter settings that the analysis scheme composed of FIM, PSF and PSR allows to reliably detect active assemblies in massively parallel spike trains. PMID:24167487

Measuring limitations in activities of daily living: a population-based validation of a short questionnaire.

PubMed

Elfering, Achim; Cronenberg, Sonja; Grebner, Simone; Tamcan, Oezguer; Müller, Urs

2017-12-01

A newly developed questionnaire assessing limitations in activities of daily living (LADL-Q), intended to improve assessment of LADL, is tested in a large population-based validation study. This survey was paper-based. Overall, 16,634 individuals who were representative of the working population in the German-speaking part of Switzerland participated in the study. Item analysis was used to reduce the final version of the LADL-Q to four items per subscale that correspond to potential problems in three body regions (back and neck, upper extremities, lower extremities). Analysis included tests for reliability, internal consistency, dimensionality and convergent validity. Test-retest reliability coefficients after 2 weeks ranged from 0.82 to 0.99 (Mdn = 0.87), with no item having a coefficient below 0.60. The median item-total coefficients ranged between moderate and good. Correlation coefficients between LADL-Q subscales and three validated clinical instruments (Western Ontario and McMaster Universities osteoarthritis index, shoulder pain disability index, Oswestry) ranged from 0.63 to 0.81. In structural equation modeling the three subscales were significantly related with two important outcomes in occupational rehabilitation: self-reported general health and daily task performance. The new LADL-Q is a brief, reliable and valid tool for assessment of LADL in studies on musculoskeletal health.
A New Approach to Response Sets in Analysis of a Test of Motivation to Achieve. A Section of the Final Report for 1969-70.

ERIC Educational Resources Information Center

Adkins, Dorothy C.; Ballif, Bonnie L.

Gumpgookies, an objective-projective test of school achievement motivation for children 3 1/2 to 8 years, was reduced from 100 to 75 items following extensive factor analyses. This revised test attempted to dissipate the effects of response sets of the subjects and was prepared in three versions--an individual form, a group form for non-readers,…
Body mass index and motor coordination: Non-linear relationships in children 6-10 years.

PubMed

Lopes, V P; Malina, R M; Maia, J A R; Rodrigues, L P

2018-05-01

Given the concern for health-related consequences of an elevated body mass index (BMI; obesity), the potential consequences of a low BMI in children are often overlooked. The purpose was to evaluate the relationship between the BMI across its entire spectrum and motor coordination (MC) in children 6-10 years. Height, weight, and MC (Körperkoordinationstest für Kinder, KTK test battery) were measured in 1,912 boys and 1,826 girls of 6-10 years of age. BMI (kg/m²) was calculated. KTK scores for each of the four tests were also converted to a motor quotient (MQ). One-way ANOVA was used to test differences in the BMI, individual test items, and MQ among boys and girls within age groups. Sex-specific quadratic regressions of individual KTK items and the MQ on the BMI were calculated. Girls and boys were also classified into four weight status groups using International Obesity Task Force criteria: thin, normal, overweight, and obese. Differences in specific test items and MQ between weight status groups were evaluated by age group in each sex. Thirty-one percent of the sample was overweight or obese, whereas 5% was thin. On average, normal weight children had the highest MQ in both sexes across the age range with few exceptions. Overweight/obese children had a lower MQ than normal weight and thin children. The quadratic regression lines generally presented an inverted parabolic relationship between the BMI and MC and suggested a decrease in MC with an increase in the BMI. In general, BMI shows a curvilinear, inverted parabolic relationship with MC in children 6-10 years. © 2018 John Wiley & Sons Ltd.
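The quadratic regression described in this abstract can be illustrated with a small least-squares fit of y = c0 + c1*u + c2*u^2, where u is BMI centered on its mean (centering is a choice made here for numerical stability, not something the abstract reports). An inverted parabola corresponds to c2 < 0, with its peak at u = -c1 / (2*c2). The data points are invented; no KTK or MQ values are given in the abstract.

```python
def fit_quadratic(xs, ys):
    """Least-squares coefficients [c0, c1, c2] of y = c0 + c1*x + c2*x**2."""
    # Normal equations: power sums of x on the left, moments of y on the right.
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    m = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting on the 3x4 system.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return [m[i][3] / m[i][i] for i in range(3)]


def peak_of_parabola(xs, ys):
    """BMI value at which the fitted curve peaks (requires c2 < 0)."""
    mean_x = sum(xs) / len(xs)
    c0, c1, c2 = fit_quadratic([x - mean_x for x in xs], ys)
    if c2 >= 0:
        raise ValueError("fitted curve is not an inverted parabola")
    return mean_x - c1 / (2.0 * c2)


if __name__ == "__main__":
    bmi = [14, 16, 18, 20, 22, 24]                 # hypothetical BMI values
    mq = [95.5, 99.5, 99.5, 95.5, 87.5, 75.5]      # hypothetical motor quotients
    print(peak_of_parabola(bmi, mq))
```

With real data the fit would be run separately by sex, as in the study, and the location of the peak indicates the BMI range associated with the best motor coordination.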
Reevaluation of the Amsterdam Inventory for Auditory Disability and Handicap Using Item Response Theory.

PubMed

Boeschen Hospers, J Mirjam; Smits, Niels; Smits, Cas; Stam, Mariska; Terwee, Caroline B; Kramer, Sophia E

2016-04-01

We reevaluated the psychometric properties of the Amsterdam Inventory for Auditory Disability and Handicap (AIADH; Kramer, Kapteyn, Festen, & Tobi, 1995) using item response theory. Item response theory describes item functioning along an ability continuum. Cross-sectional data from 2,352 adults with and without hearing impairment, ages 18-70 years, were analyzed. They completed the AIADH in the web-based prospective cohort study "Netherlands Longitudinal Study on Hearing." A graded response model was fitted to the AIADH data. Category response curves, item information curves, and the standard error as a function of self-reported hearing ability were plotted. The graded response model showed a good fit. Item information curves were most reliable for adults who reported having hearing disability and less reliable for adults with normal hearing. The standard error plot showed that self-reported hearing ability is most reliably measured for adults reporting mild up to moderate hearing disability. This is one of the few item response theory studies on audiological self-reports. All AIADH items could be hierarchically placed on the self-reported hearing ability continuum, meaning they measure the same construct. This provides a promising basis for developing a clinically useful computerized adaptive test, where item selection adapts to the hearing ability of individuals, resulting in efficient assessment of hearing disability.
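The graded response model mentioned in this abstract assigns each ordered response category a probability obtained by differencing cumulative logistic curves. The sketch below follows that standard formulation; the discrimination and threshold values describe a hypothetical four-category item and are not estimates from the AIADH analysis.

```python
import math


def grm_category_probs(theta: float, a: float, thresholds):
    """Category probabilities under a graded response model.

    thresholds must be increasing; P(X >= k) = logistic(a * (theta - b_k)),
    and each category probability is the difference of adjacent cumulatives.
    """
    b = list(thresholds)
    if any(lo >= hi for lo, hi in zip(b, b[1:])):
        raise ValueError("thresholds must be strictly increasing")
    cumulative = [1.0]
    cumulative += [1.0 / (1.0 + math.exp(-a * (theta - bk))) for bk in b]
    cumulative.append(0.0)
    return [cumulative[k] - cumulative[k + 1] for k in range(len(b) + 1)]


if __name__ == "__main__":
    # Hypothetical item: discrimination 1.5, three thresholds, four categories.
    for theta in (-2.0, 0.0, 2.0):
        probs = grm_category_probs(theta, a=1.5, thresholds=[-1.0, 0.0, 1.0])
        print(theta, [round(p, 3) for p in probs])
```

Plotting these probabilities against theta gives the category response curves referred to in the abstract; an adaptive test would pick the next item whose curves are most informative near the current ability estimate.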
A Multidimensional Tool Based on the eHealth Literacy Framework: Development and Initial Validity Testing of the eHealth Literacy Questionnaire (eHLQ)

PubMed Central

Karnoe, Astrid; Furstrand, Dorthe; Batterham, Roy; Christensen, Karl Bang; Elsworth, Gerald; Osborne, Richard H

2018-01-01

Background For people to be able to access, understand, and benefit from the increasing digitalization of health services, it is critical that services are provided in a way that meets the user’s needs, resources, and competence. Objective The objective of the study was to develop a questionnaire that captures the 7-dimensional eHealth Literacy Framework (eHLF). Methods Draft items were created in parallel in English and Danish. The items were generated from 450 statements collected during the conceptual development of eHLF. In all, 57 items (7 to 9 items per scale) were generated and adjusted after cognitive testing. Items were tested in 475 people recruited from settings in which the scale was intended to be used (community and health care settings) and including people with a range of chronic conditions. Measurement properties were assessed using approaches from item response theory (IRT) and classical test theory (CTT) such as confirmatory factor analysis (CFA) and reliability using composite scale reliability (CSR); potential bias due to age and sex was evaluated using differential item functioning (DIF). Results CFA confirmed the presence of the 7 a priori dimensions of eHLF. Following item analysis, a 35-item 7-scale questionnaire was constructed, covering (1) using technology to process health information (5 items, CSR=.84), (2) understanding of health concepts and language (5 items, CSR=.75), (3) ability to actively engage with digital services (5 items, CSR=.86), (4) feel safe and in control (5 items, CSR=.87), (5) motivated to engage with digital services (5 items, CSR=.84), (6) access to digital services that work (6 items, CSR=.77), and (7) digital services that suit individual needs (4 items, CSR=.85). A 7-factor CFA model, using small-variance priors for cross-loadings and residual correlations, had a satisfactory fit (posterior predictive P value: .27, 95% CI for the difference between the observed and replicated chi-square values: −63.7 to 133.8). 
The CFA showed that all items loaded strongly on their respective factors. The IRT analysis showed that no items were found to have disordered thresholds. For most scales, discriminant validity was acceptable; however, 2 pairs of dimensions were highly correlated; dimensions 1 and 5 (r=.95), and dimensions 6 and 7 (r=.96). All dimensions were retained because of strong content differentiation and potential causal relationships between these dimensions. There is no evidence of DIF. Conclusions The eHealth Literacy Questionnaire (eHLQ) is a multidimensional tool based on a well-defined a priori eHLF framework with robust properties. It has satisfactory evidence of construct validity and reliable measurement across a broad range of concepts (using both CTT and IRT traditions) in various groups. It is designed to be used to understand and evaluate people’s interaction with digital health services. PMID:29434011
Measuring organizational flexibility in community pharmacy: Building the capacity to implement cognitive pharmaceutical services.

PubMed

Feletto, Eleonora; Wilson, Laura Kate; Roberts, Alison Sarah; Benrimoj, Shalom Isaac

2011-03-01

Community pharmacy is undergoing transformation with increasing pressure to build its capacity to deliver cognitive pharmaceutical services ("services"). The theoretical framework of organizational flexibility (OF) may be used to assess the capacity of community pharmacy to implement change programs and guide capacity-building initiatives. The objective was to test the applicability of an existing scale measuring OF to the community pharmacy industry in Australia. A mail survey was used to test a preexisting scale measuring OF, amended from 28 items to 20 items, testing 3 underlying factors of operational, structural, and strategic flexibility in the Australian community pharmacy context. The sample comprised 2006 randomly stratified community pharmacies. A confirmatory factor analysis was conducted to assess the validity and reliability of the 1-factor models for each underlying construct and the full measurement model. Responses were received from a total of 395 (19.7%) community pharmacies. The 1-factor models of operational, structural, and strategic flexibility fit the data with appropriate respecification. Overall, the favorable fit of the individual factor constructs suggested that the multiple-factor measurement model should be tested. However, this model did not yield an interpretable response. Operational flexibility covaried negatively with the other factors, whereas structural and strategic flexibility shared covariance. Despite this, the results highlighting the individual factor fit suggest the constructs have application to pharmacy. The individual OF constructs were useful in the development and initial testing of a scale adapted for community pharmacy. When further developed and validated, the scale could be used to identify groups of pharmacies that require individualized assistance to build capacity and integrate services and other new endeavors. Copyright © 2011 Elsevier Inc. All rights reserved.
One process is not enough! A speed-accuracy tradeoff study of recognition memory.

PubMed

Boldini, Angela; Russo, Riccardo; Avons, S E

2004-04-01

Speed-accuracy tradeoff (SAT) methods have been used to contrast single- and dual-process accounts of recognition memory. In these procedures, subjects are presented with individual test items and are required to make recognition decisions under various time constraints. In this experiment, we presented word lists under incidental learning conditions, varying the modality of presentation and level of processing. At test, we manipulated the interval between each visually presented test item and a response signal, thus controlling the amount of time available to retrieve target information. Study-test modality match had a beneficial effect on recognition accuracy at short response-signal delays (≤300 msec). Conversely, recognition accuracy benefited more from deep than from shallow processing at study only at relatively long response-signal delays (≥300 msec). The results are congruent with views suggesting that both fast familiarity and slower recollection processes contribute to recognition memory.
Measurement equivalence of seven selected items of posttraumatic growth between black and white adult survivors of Hurricane Katrina.

PubMed

Rhodes, Alison M; Tran, Thanh V

2013-02-01

This study examined the equivalence or comparability of the measurement properties of seven selected items measuring posttraumatic growth among self-identified Black (n = 270) and White (n = 707) adult survivors of Hurricane Katrina, using data from the Baseline Survey of the Hurricane Katrina Community Advisory Group Study. Internal consistency reliability was equally good for both groups (Cronbach's alphas = .79), as were correlations between individual scale items and their respective overall scale. Confirmatory factor analysis of a congeneric measurement model of seven selected items of posttraumatic growth showed adequate measures of fit for both groups. The results showed only small variation in magnitude of factor loadings and measurement errors between the two samples. Tests of measurement invariance showed mixed results, but overall indicated that factor loading, error variance, and factor variance were similar between the two samples. These seven selected items can be useful for future large-scale surveys of posttraumatic growth.
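As an illustration of the internal consistency statistic reported above, the following is a minimal sketch of Cronbach's alpha computed from a respondents-by-items score matrix. The function name and the small score matrix are hypothetical and are not taken from the study; only the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores) is assumed.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(scores[0])
    # Transpose so that each tuple holds one item's scores across respondents.
    items = list(zip(*scores))
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical responses from five people to three items (illustration only).
example_scores = [
    [3, 4, 3],
    [2, 2, 3],
    [5, 4, 5],
    [1, 2, 1],
    [4, 5, 4],
]
alpha = cronbach_alpha(example_scores)
```

In a group comparison such as the one described, the same function would be applied separately to each group's score matrix.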
Selecting Items for Criterion-Referenced Tests.

ERIC Educational Resources Information Center

Mellenbergh, Gideon J.; van der Linden, Wim J.

1982-01-01

Three item selection methods for criterion-referenced tests are examined: the classical theory of item difficulty and item-test correlation; the latent trait theory of item characteristic curves; and a decision-theoretic approach for optimal item selection. Item contribution to the standardized expected utility of mastery testing is discussed. (CM)
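The first of the three methods, classical item analysis, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: item difficulty is taken as the proportion of examinees answering correctly, the item-test correlation is the Pearson correlation between the 0/1 item score and the total score, and the response matrix is hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def item_statistics(responses):
    """Return (difficulty, item-test correlation) for each dichotomous item."""
    totals = [sum(row) for row in responses]
    stats = []
    for item in zip(*responses):  # one tuple of 0/1 scores per item
        difficulty = sum(item) / len(item)  # proportion correct
        stats.append((difficulty, pearson(item, totals)))
    return stats


# Hypothetical 0/1 responses: five examinees by three items (illustration only).
example_responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
    [1, 1, 1],
]
example_stats = item_statistics(example_responses)
```

Items with zero variance (answered correctly by everyone or by no one) would make the correlation undefined and need to be handled separately.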
A conflict management scale for pharmacy.

PubMed

Austin, Zubin; Gregory, Paul A; Martin, Craig

2009-11-12

To develop and establish the validity and reliability of a conflict management scale specific to pharmacy practice and education. A multistage inventory-item development process was undertaken involving 93 pharmacists and using a previously described explanatory model for conflict in pharmacy practice. A 19-item inventory was developed, field tested, and validated. The conflict management scale (CMS) demonstrated an acceptable degree of reliability and validity for use in educational or practice settings to promote self-reflection and self-awareness regarding individuals' conflict management styles. The CMS provides a unique, pharmacy-specific method for individuals to determine and reflect upon their own conflict management styles. As part of an educational program to facilitate self-reflection and heighten self-awareness, the CMS may be a useful tool to promote discussions related to an important part of pharmacy practice.
Pseudo-Equivalent Groups and Linking

ERIC Educational Resources Information Center

Haberman, Shelby J.

2015-01-01

Adjustment by minimum discriminant information provides an approach to linking test forms in the case of a nonequivalent groups design with no satisfactory common items. This approach employs background information on individual examinees in each administration so that weighted samples of examinees form pseudo-equivalent groups in the sense that…
Eye-Movement Analysis Demonstrates Strategic Influences on Intelligence

ERIC Educational Resources Information Center

Vigneau, Francois; Caissie, Andre F.; Bors, Douglas A.

2006-01-01

Taking into account various models and findings pertaining to the nature of analogical reasoning, this study explored quantitative and qualitative individual differences in intelligence using latency and eye-movement data. Fifty-five university students were administered 14 selected items of the Raven's Advanced Progressive Matrices test. Results…
77 FR 34986 - Notice of Intent To Repatriate Cultural Items: U.S. Department of the Interior, National Park...

Federal Register 2010, 2011, 2012, 2013, 2014

2012-06-12

... appropriate Indian tribes, has determined that the cultural items meet the definition of sacred objects and... individuals who believe they are lineal descendants of the individual who owned these sacred objects and who... descendants of the individual who owned these sacred objects and who wish to claim the items should contact...
A quick aphasia battery for efficient, reliable, and multidimensional assessment of language function.

PubMed

Wilson, Stephen M; Eriksson, Dana K; Schneck, Sarah M; Lucanie, Jillian M

2018-01-01

This paper describes a quick aphasia battery (QAB) that aims to provide a reliable and multidimensional assessment of language function in about a quarter of an hour, bridging the gap between comprehensive batteries that are time-consuming to administer, and rapid screening instruments that provide limited detail regarding individual profiles of deficits. The QAB is made up of eight subtests, each comprising sets of items that probe different language domains, vary in difficulty, and are scored with a graded system to maximize the informativeness of each item. From the eight subtests, eight summary measures are derived, which constitute a multidimensional profile of language function, quantifying strengths and weaknesses across core language domains. The QAB was administered to 28 individuals with acute stroke and aphasia, 25 individuals with acute stroke but no aphasia, 16 individuals with chronic post-stroke aphasia, and 14 healthy controls. The patients with chronic post-stroke aphasia were tested 3 times each and scored independently by 2 raters to establish test-retest and inter-rater reliability. The Western Aphasia Battery (WAB) was also administered to these patients to assess concurrent validity. We found that all QAB summary measures were sensitive to aphasic deficits in the two groups with aphasia. All measures showed good or excellent test-retest reliability (overall summary measure: intraclass correlation coefficient (ICC) = 0.98), and excellent inter-rater reliability (overall summary measure: ICC = 0.99). Sensitivity and specificity for diagnosis of aphasia (relative to clinical impression) were 0.91 and 0.95 respectively. All QAB measures were highly correlated with corresponding WAB measures where available. Individual patients showed distinct profiles of spared and impaired function across different language domains. In sum, the QAB efficiently and reliably characterized individual profiles of language deficits.
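For readers unfamiliar with the diagnostic accuracy figures quoted above, the sketch below shows how sensitivity and specificity are derived from a 2x2 classification table. The abstract does not report the underlying counts, so the counts used here are purely hypothetical and the function name is an assumption made for illustration.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical counts, not taken from the study:
# 18 true positives, 2 false negatives, 27 true negatives, 3 false positives.
sens, spec = sensitivity_specificity(tp=18, fn=2, tn=27, fp=3)
```

Here the reference standard plays the role that clinical impression plays in the abstract: positives and negatives are defined by the reference, and the test result is compared against it.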
A quick aphasia battery for efficient, reliable, and multidimensional assessment of language function

PubMed Central

Eriksson, Dana K.; Schneck, Sarah M.; Lucanie, Jillian M.

2018-01-01

This paper describes a quick aphasia battery (QAB) that aims to provide a reliable and multidimensional assessment of language function in about a quarter of an hour, bridging the gap between comprehensive batteries that are time-consuming to administer, and rapid screening instruments that provide limited detail regarding individual profiles of deficits. The QAB is made up of eight subtests, each comprising sets of items that probe different language domains, vary in difficulty, and are scored with a graded system to maximize the informativeness of each item. From the eight subtests, eight summary measures are derived, which constitute a multidimensional profile of language function, quantifying strengths and weaknesses across core language domains. The QAB was administered to 28 individuals with acute stroke and aphasia, 25 individuals with acute stroke but no aphasia, 16 individuals with chronic post-stroke aphasia, and 14 healthy controls. The patients with chronic post-stroke aphasia were tested 3 times each and scored independently by 2 raters to establish test-retest and inter-rater reliability. The Western Aphasia Battery (WAB) was also administered to these patients to assess concurrent validity. We found that all QAB summary measures were sensitive to aphasic deficits in the two groups with aphasia. All measures showed good or excellent test-retest reliability (overall summary measure: intraclass correlation coefficient (ICC) = 0.98), and excellent inter-rater reliability (overall summary measure: ICC = 0.99). Sensitivity and specificity for diagnosis of aphasia (relative to clinical impression) were 0.91 and 0.95 respectively. All QAB measures were highly correlated with corresponding WAB measures where available. Individual patients showed distinct profiles of spared and impaired function across different language domains. In sum, the QAB efficiently and reliably characterized individual profiles of language deficits. PMID:29425241
The use of the bi-factor model to test the uni-dimensionality of a battery of reasoning tests.

PubMed

Primi, Ricardo; Rocha da Silva, Marjorie Cristina; Rodrigues, Priscila; Muniz, Monalisa; Almeida, Leandro S

2013-02-01

The Battery of Reasoning Tests 5 (BPR-5) aims to assess the reasoning ability of individuals, using sub-tests with different formats and contents that require basic processes of inductive and deductive reasoning for their resolution. The BPR has three sequential forms: BPR-5i (for children from first to fifth grade), BPR-5 - Form A (for children from sixth to eighth grade) and BPR-5 - form B (for high school and undergraduate students). The present study analysed 412 questionnaires concerning BPR-5i, 603 questionnaires concerning BPR-5 - Form A and 1748 questionnaires concerning BPR-5 - Form B. The main goal was to test the uni-dimensionality of the battery and its tests in relation to items using the bi-factor model. Results suggest that the g factor loadings (extracted by the uni-dimensional model) do not change when the data is adjusted for a more flexible multi-factor model (bi-factor model). A general reasoning factor underlying different contents items is supported.
76 FR 80391 - Notice of Intent to Repatriate Cultural Items: U.S. Department of the Interior, National Park...

Federal Register 2010, 2011, 2012, 2013, 2014

2011-12-23

... cultural items meet the definition of sacred objects and repatriation to the lineal descendant stated below... descendants of the individual who owned these sacred objects and who wish to claim the items should contact... descendants of the individual who owned these sacred objects and who wish to claim the items should contact...
76 FR 80390 - Notice of Intent To Repatriate Cultural Items: U.S. Department of the Interior, National Park...

Federal Register 2010, 2011, 2012, 2013, 2014

2011-12-23

... cultural items meet the definition of sacred objects and repatriation to the lineal descendant stated below... descendants of the individual who owned these sacred objects and who wish to claim the items should contact... descendants of the individual who owned these sacred objects who wish to claim the items should contact Little...
Sentimental value and its influence on hedonic adaptation.

PubMed

Yang, Yang; Galak, Jeff

2015-11-01

Sentimental value is a highly prevalent, yet largely understudied phenomenon. We introduce the construct of sentimental value and investigate how and why sentimental value influences hedonic adaptation. Across 7 studies, we examine the antecedents of sentimental value and demonstrate its effect on hedonic adaptation using both naturally occurring and experimentally manipulated items with sentimental value. We further test the underlying process linking sentimental value and hedonic adaptation by showing that whereas feature-related utility decreases for all items with time, sentimental value typically does not, and that sentimental value moderates the influence of the decrement in feature-related utility on hedonic adaptation. Moreover, this moderating effect of sentimental value is driven by a shift in focus from features of the item to the associations that item possesses. We conclude with a discussion of related phenomena and implications for individuals. (c) 2015 APA, all rights reserved.
Item response theory analysis of the Lichtenberg Financial Decision Screening Scale.

PubMed

Teresi, Jeanne A; Ocepek-Welikson, Katja; Lichtenberg, Peter A

2017-01-01

The focus of these analyses was to examine the psychometric properties of the Lichtenberg Financial Decision Screening Scale (LFDSS). The purpose of the screen was to evaluate the decisional abilities and vulnerability to exploitation of older adults. Adults aged 60 and over were interviewed by social, legal, financial, or health services professionals who underwent in-person training on the administration and scoring of the scale. Professionals provided a rating of the decision-making abilities of the older adult. The analytic sample included 213 individuals with an average age of 76.9 (SD = 10.1). The majority (57%) were female. Data were analyzed using item response theory (IRT) methodology. The results supported the unidimensionality of the item set. Several IRT models were tested. Ten ordinal and binary items evidenced a slightly higher reliability estimate (0.85) than other versions and better coverage in terms of the range of reliable measurement across the continuum of financial incapacity.

Measuring ability to assess claims about treatment effects: a latent trait analysis of items from the ‘Claim Evaluation Tools’ database using Rasch modelling

PubMed Central

Austvoll-Dahlgren, Astrid; Guttersrud, Øystein; Nsangi, Allen; Semakula, Daniel; Oxman, Andrew D

2017-01-01

Background The Claim Evaluation Tools database contains multiple-choice items for measuring people’s ability to apply the key concepts they need to know to be able to assess treatment claims. We assessed items from the database using Rasch analysis to develop an outcome measure to be used in two randomised trials in Uganda. Rasch analysis is a form of psychometric testing relying on Item Response Theory. It is a dynamic way of developing outcome measures that are valid and reliable. Objectives To assess the validity, reliability and responsiveness of 88 items addressing 22 key concepts using Rasch analysis. Participants We administered four sets of multiple-choice items in English to 1114 people in Uganda and Norway, of which 685 were children and 429 were adults (including 171 health professionals). We scored all items dichotomously. We explored summary and individual fit statistics using the RUMM2030 analysis package. We used SPSS to perform distractor analysis. Results Most items conformed well to the Rasch model, but some items needed revision. Overall, the four item sets had satisfactory reliability. We did not identify significant response dependence between any pairs of items and, overall, the magnitude of multidimensionality in the data was acceptable. The items had a high level of difficulty. Conclusion Most of the items conformed well to the Rasch model’s expectations. Following revision of some items, we concluded that most of the items were suitable for use in an outcome measure for evaluating the ability of children or adults to assess treatment claims. PMID:28550019
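Since the items were scored dichotomously, the model being fitted can be illustrated with the standard dichotomous Rasch formula, P(correct) = exp(theta - b) / (1 + exp(theta - b)). The sketch below is only an illustration of that formula; it is not the RUMM2030 estimation procedure, and the names theta (person ability) and b (item difficulty), both in logits, are conventional labels rather than anything taken from the abstract.

```python
from math import exp


def rasch_probability(theta, b):
    """Probability of a correct response under the dichotomous Rasch model.

    theta: person ability in logits; b: item difficulty in logits.
    """
    return 1.0 / (1.0 + exp(-(theta - b)))


# When ability equals difficulty, the success probability is one half.
p_matched = rasch_probability(theta=0.0, b=0.0)
# A harder item (larger b) lowers the probability for the same person.
p_hard_item = rasch_probability(theta=0.0, b=1.5)
```

A finding that "the items had a high level of difficulty" corresponds, in these terms, to item difficulties b that lie above most respondents' abilities theta.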
Psychometric properties of the Communication Confidence Rating Scale for Aphasia (CCRSA): phase 1.

PubMed

Cherney, Leora R; Babbitt, Edna M; Semik, Patrick; Heinemann, Allen W

2011-01-01

Confidence is a construct that has not been explored previously in aphasia research. We developed the Communication Confidence Rating Scale for Aphasia (CCRSA) to assess confidence in communicating in a variety of activities and evaluated its psychometric properties using rating scale (Rasch) analysis. The CCRSA was administered to 21 individuals with aphasia before and after participation in a computer-based language therapy study. Person reliability of the 8-item CCRSA was .77. The 5-category rating scale demonstrated monotonic increases in average measures from low to high ratings. However, one item ("I follow news, sports, stories on TV/movies") misfit the construct defined by the other items (mean square infit = 1.69, item-measure correlation = .41). Deleting this item improved reliability to .79; the 7 remaining items demonstrated excellent fit to the underlying construct, although there was a modest ceiling effect in this sample. Pre- to posttreatment changes on the 7-item CCRSA measure were statistically significant using a paired samples t test. Findings support the reliability and sensitivity of the CCRSA in assessing participants' self-report of communication confidence. Further evaluation of communication confidence is required with larger and more diverse samples.
Measuring the impact and distress of health problems from the individual's perspective: development of the Perceived Impact of Problem Profile (PIPP)

PubMed Central

Pallant, Julie F; Misajon, RoseAnne; Bennett, Elizabeth; Manderson, Lenore

2006-01-01

Background The aim of this study was to develop and conduct preliminary validation of the Perceived Impact of Problem Profile (PIPP). Based on the biopsychosocial model of health and functioning, the PIPP was intended as a generic research and clinical measurement tool to assess the impact and distress of health conditions from the individuals' perspective. The ICF classification system was used to guide the structure of the PIPP with subscales included to assess impact on self-care, mobility, participation, relationships and psychological well-being. While the ICF focuses on the classification of objective health and health related status, the PIPP broadens this focus to address the individuals' subjective experience of their health condition. Methods An item pool of 23 items assessing both impact and distress on five key domains was generated. These were administered to 169 adults with mobility impairment. Rasch analysis using RUMM2020 was conducted to assess the psychometric properties of each set of items. Preliminary construct validation of the PIPP was performed using the EQ5D. Results For both the Impact and Distress scales of the PIPP, the five subscales (Self-care, Mobility, Participation, Relationships, and Psychological Well-being) showed adequate psychometric properties, demonstrating fit to the Rasch model. All subscales showed adequate person separation reliability and no evidence of differential item functioning for sex, age, educational level or rural vs urban residence. Preliminary validity testing using the EQ5D items provided support for the subscales. Conclusion This preliminary study, using a sample of adults with mobility impairment, provides support for the psychometric properties of the PIPP as a potential clinical and research measurement tool. The PIPP provides a brief, but comprehensive means to assess the key ICF components, focusing on the individuals' perspective of the impact and distress caused by their health condition. Further validation of its use across different health conditions and varying cultural settings is required. PMID:16808842
Problematic internet usage in US college students: a pilot study

PubMed Central

2011-01-01

Background Internet addiction among US college students remains a concern, but robust estimates of its prevalence are lacking. Methods We conducted a pilot survey of 307 college students at two US universities. Participants completed the Internet Addiction Test (IAT) as well as the Patient Health Questionnaire. Both are validated measures of problematic Internet usage and depression, respectively. We assessed the association between problematic Internet usage and moderate to severe depression using a modified Poisson regression approach. In addition, we examined the associations between individual items in the IAT and depression. Results A total of 224 eligible respondents completed the survey (73% response rate). Overall, 4% of students scored in the occasionally problematic or addicted range on the IAT, and 12% had moderate to severe depression. Endorsement of individual problematic usage items ranged from 1% to 70%. In the regression analysis, depressive symptoms were significantly associated with several individual items. Relative risk could not be estimated for three of the twenty items because of small cell sizes. Of the remaining 17 items, depressive symptoms were significantly associated with 13 of them, and three others had P values less than 0.10. There was also a significant association between problematic Internet usage overall and moderate to severe depression (relative risk 24.07, 95% confidence interval 3.95 to 146.69; P = 0.001). Conclusion The prevalence of problematic Internet usage among US college students is a cause for concern, and potentially requires intervention and treatment amongst the most vulnerable groups. The prevalence reported in this study is lower than that which has been reported in other studies; however, the at-risk population is very high, and preventative measures are also recommended. PMID:21696582
Development and psychometric testing the Health of Body, Mind and Spirit Scale for assessing individuals who have drug abuse histories.

PubMed

Sun, Fan-Ko; Chiang, Chun-Ying; Lu, Chu-Yun; Yu, Pei-Jane; Liao, Tzu-Chiao; Lan, Chu-Mei

2018-03-01

To develop the Health of Body, Mind and Spirit Scale (HBMSS), which was designed to assess drug abusers' health condition. Helping drug abusers to become healthy is important to healthcare professionals. However, no instrument exists to assess drug abusers' state of health. A cross-sectional questionnaire survey was implemented to examine the validity of the HBMSS. Data were collected from 2015 to 2016 at one drug abuse prevention centre in Taiwan. Participants (N = 320) who had abused drugs were invited to complete a preliminary 64-item version of the HBMSS. An item analysis, criterion-related validity analysis (using the Relapse Prediction Scale [RPS] score), split-half reliability testing and confirmatory factor analysis (CFA) were conducted to examine the psychometric properties of the HBMSS. The final version of the HBMSS contained 15 items that were divided into three subscales: the health of the body, mind and spirit. Cronbach's α and split-half reliability coefficients were all above .85. The factor loading of each item was between .74 and .95. The HBMSS had satisfactory criterion-related validity with the RPS score (r = -.50, p < .001). A second-order CFA was conducted on the HBMSS. The fit indexes were good, χ² = 184.060, df = 94, χ²/df = 1.958 (p = .000). The entire HBMSS and the subscales had satisfactory reliability and validity. Healthcare professionals could use the HBMSS to evaluate the condition of the health of individuals with a drug abuse history. © 2017 John Wiley & Sons Ltd.
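The normed chi-square fit index quoted above is simply the model chi-square divided by its degrees of freedom. The short sketch below reproduces that arithmetic from the two values reported in the abstract; the variable names are chosen here for illustration only.

```python
# Values as reported in the abstract.
chi_square = 184.060
degrees_of_freedom = 94

# Normed chi-square: the model chi-square divided by its degrees of freedom.
normed_chi_square = chi_square / degrees_of_freedom
rounded_ratio = round(normed_chi_square, 3)  # agrees with the reported 1.958
```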
Microplastics and mesoplastics in fish from coastal and fresh waters of China.

PubMed

Jabeen, Khalida; Su, Lei; Li, Jiana; Yang, Dongqi; Tong, Chunfu; Mu, Jingli; Shi, Huahong

2017-02-01

Plastic pollution is a growing global concern. In the present study, we investigated plastic pollution in 21 species of sea fish and 6 species of freshwater fish from China. All of the species were found to ingest micro- or mesoplastics. The average abundance of microplastics varied from 1.1 to 7.2 items by individual and 0.2-17.2 items by gram. The average abundance of mesoplastics varied from 0.2 to 3.0 items by individual and 0.1-3.9 items by gram. Microplastics were abundant in 26 species, accounting for 55.9-92.3% of the total number of plastics items in each species. Thamnaconus septentrionalis contained the highest abundance of microplastics (7.2 items/individual). The average abundance of plastics in sea benthopelagic fishes was significantly higher than in freshwater benthopelagic fishes by items/individual. The plastics were dominated by fiber in shape, transparent in color and cellophane in composition. The proportion of plastics in the stomach to the intestines showed great variation in different species, ranging from 0.5 to 1.9 by items/individual. The stomach of Harpodon nehereus and intestines of Pampus cinereus contained the highest number of plastics, (3.3) and (2.7), respectively, by items/individual. Our results suggested that plastic pollution was widespread in the investigated fish species and showed higher abundance in comparison with worldwide studies. The ingestion of plastics in fish was closely related to the habitat and gastrointestinal tract structure. We highly recommend that the entire gastrointestinal tract and digestion process be used in future investigations of plastic pollution in fish. Copyright © 2016 Elsevier Ltd. All rights reserved.
[Instrument to measure adherence in hypertensive patients: contribution of Item Response Theory].

PubMed

Rodrigues, Malvina Thaís Pacheco; Moreira, Thereza Maria Magalhaes; Vasconcelos, Alexandre Meira de; Andrade, Dalton Francisco de; Silva, Daniele Braz da; Barbetta, Pedro Alberto

2013-06-01

To analyze, by means of "Item Response Theory", an instrument to measure adherence to treatment for hypertension. Analytical study with 406 hypertensive patients with associated complications seen in primary care in Fortaleza, CE, Northeastern Brazil, in 2011, using "Item Response Theory". The stages were: dimensionality test, calibrating the items, processing data and creating a scale, analyzed using the graded response model. A study of the dimensionality of the instrument was conducted by analyzing the polychoric correlation matrix and full-information factor analysis. Multilog software was used to calibrate items and estimate the scores. Items relating to drug therapy are the most directly related to adherence while those relating to drug-free therapy need to be reworked because they have less psychometric information and low discrimination. The independence of items, the small number of levels in the scale and low explained variance in the adjustment of the models show the main weaknesses of the instrument analyzed. The "Item Response Theory" proved to be a relevant analysis technique because it evaluated respondents for adherence to treatment for hypertension, the level of difficulty of the items and their ability to discriminate between individuals with different levels of adherence, which generates a greater amount of information. According to the "Item Response Theory" analysis of its items, the instrument analyzed is limited in measuring adherence to hypertension treatment and needs adjustment. The proper formulation of the items is important in order to accurately measure the desired latent trait.
Psychometrics of Shared Decision Making and Communication as Patient Centered Measures for Two Language Groups

PubMed Central

Alvarez, Kiara; Wang, Ye; Alegria, Margarita; Ault-Brutus, Andrea; Ramanayake, Natasha; Yeh, Yi-Hui; Jeffries, Julia R.; Shrout, Patrick E.

2017-01-01

Shared decision-making (SDM) and effective patient-provider communication are key and interrelated elements of patient-centered care that impact health and behavioral health outcomes. Measurement of SDM and communication from the patient’s perspective is necessary in order to ensure that health care systems and individual providers are responsive to patient views. However, there is a void of research addressing the psychometric properties of these measures with diverse patients, including non-English speakers, and in the context of behavioral health encounters. This study evaluated the psychometric properties of two patient-centered outcome measures, the Shared Decision Making Questionnaire-9 (SDM-Q) and the Kim Alliance Scale-Communication Subscale (KAS-CM), in a sample of 239 English and Spanish-speaking behavioral health patients. One dominant factor was found for each scale and this structure was used to examine whether there was measurement invariance across the two language groups. One SDM-Q item was inconsistent with the configural invariance comparison and was removed. The remaining SDM-Q items exhibited strong invariance, meaning that item loadings and item means were similar across the two groups. The KAS-CM items had limited variability, with most respondents indicating high communication levels, and the invariance analysis was done on binary versions of the items. These had metric invariance (loadings the same over groups) but several items violated the strong invariance test. In both groups, the SDM-Q had high internal consistency, whereas the KAS-CM was only adequate. These findings help interpret results for individual patients, taking into account cultural and linguistic differences in how patients perceive SDM and patient-provider communication. PMID:27537002
Psychometrics of shared decision making and communication as patient centered measures for two language groups.

PubMed

Alvarez, Kiara; Wang, Ye; Alegria, Margarita; Ault-Brutus, Andrea; Ramanayake, Natasha; Yeh, Yi-Hui; Jeffries, Julia R; Shrout, Patrick E

2016-09-01

Shared decision making (SDM) and effective patient-provider communication are key and interrelated elements of patient-centered care that impact health and behavioral health outcomes. Measurement of SDM and communication from the patient's perspective is necessary in order to ensure that health care systems and individual providers are responsive to patient views. However, there is a void of research addressing the psychometric properties of these measures with diverse patients, including non-English speakers, and in the context of behavioral health encounters. This study evaluated the psychometric properties of 2 patient-centered outcome measures, the Shared Decision-Making Questionnaire-9 (SDM-Q) and the Kim Alliance Scale-Communication subscale (KAS-CM), in a sample of 239 English and Spanish-speaking behavioral health patients. One dominant factor was found for each scale and this structure was used to examine whether there was measurement invariance across the 2 language groups. One SDM-Q item was inconsistent with the configural invariance comparison and was removed. The remaining SDM-Q items exhibited strong invariance, meaning that item loadings and item means were similar across the 2 groups. The KAS-CM items had limited variability, with most respondents indicating high communication levels, and the invariance analysis was done on binary versions of the items. These had metric invariance (loadings the same over groups) but several items violated the strong invariance test. In both groups, the SDM-Q had high internal consistency, whereas the KAS-CM was only adequate. These findings help interpret results for individual patients, taking into account cultural and linguistic differences in how patients perceive SDM and patient-provider communication. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
Spotting Incorrect Rules in Signed-Number Arithmetic by the Individual Consistency Index.

DTIC Science & Technology

1981-08-01

meaning of dimensionality of achievement data. It also shows the importance of construct validity, even in criterion referenced testing of the cognitive ... aspect of performance, and that the traditional means of item analysis that are based on taking the variances of binary scores and content analysis
The SQ3R Study Technique Enhances Comprehension of an Introductory Psychology Textbook.

ERIC Educational Resources Information Center

Chastain, Garvin; Thurber, Steven

1989-01-01

Examines the effectiveness of the SQ3R study technique in enhancing comprehension of material in an introductory psychology textbook. Finds significantly better performance on tests of recall or conceptual items for students using SQ3R than for those using individual study methods. (RS)
Free Recall Test Experience Potentiates Strategy-Driven Effects of Value on Memory

ERIC Educational Resources Information Center

Cohen, Michael S.; Rissman, Jesse; Hovhannisyan, Mariam; Castel, Alan D.; Knowlton, Barbara J.

2017-01-01

People tend to show better memory for information that is deemed valuable or important. By one mechanism, individuals selectively engage deeper, semantic encoding strategies for high value items (Cohen, Rissman, Suthana, Castel, & Knowlton, 2014). By another mechanism, information paired with value or reward is automatically strengthened in…
What-Where-When Memory and Encoding Strategies in Healthy Aging

ERIC Educational Resources Information Center

Cheke, Lucy G.

2016-01-01

Older adults exhibit disproportionate impairments in memory for item-associations. These impairments may stem from an inability to self-initiate deep encoding strategies. The present study investigates this using the "treasure-hunt task"; a what-where-when style episodic memory test that requires individuals to "hide" items…
Testing effects of free recall on organization in whole/part and part/whole transfer.

PubMed

Bacso, Sarah A; Marmurek, Harvey H C

2016-11-01

Testing of to-be-learned material facilitates subsequent learning of new material. We investigated this forward effect of testing in two experiments using the whole/part and part/whole transfer paradigms with categorized word lists. Learning was assessed for recall of individual words, higher order categories, and category clustering. In each experiment participants learned two lists in which the number of tests on the first list was varied. The first list contained either twice as many items as the second list (whole/part paradigm) or half as many items as the second list (part/whole paradigm). In the experimental condition, the part list contained half the items of the whole list. In the control condition, the two lists were unique. In the whole/part paradigm, learning of the part list was poorer in the experimental than in the control condition. Although testing during whole list learning facilitated learning of the part list, it did not moderate the negative transfer effect. In the part/whole paradigm, learning of the whole list was better in the experimental than in the control condition, and this positive transfer effect was strengthened by repeated testing of the part list. The findings are discussed in the context of discrimination and encoding explanations of the forward effect of testing. Copyright © 2016 Elsevier B.V. All rights reserved.
76 FR 80389 - Notice of Intent To Repatriate a Cultural Item: U.S. Department of the Interior, National Park...

Federal Register 2010, 2011, 2012, 2013, 2014

2011-12-23

... cultural item meets the definition of sacred object and repatriation to the lineal descendant stated below... descendants of the individual who owned the sacred object and who wish to claim the item should contact Little...
The Bangor Voice Matching Test: A standardized test for the assessment of voice perception ability.

PubMed

Mühl, Constanze; Sheil, Orla; Jarutytė, Lina; Bestelmeyer, Patricia E G

2017-11-09

Recognising the identity of conspecifics is an important yet highly variable skill. Approximately 2 % of the population suffers from a socially debilitating deficit in face recognition. More recently the existence of a similar deficit in voice perception has emerged (phonagnosia). Face perception tests have been readily available for years, advancing our understanding of underlying mechanisms in face perception. In contrast, voice perception has received less attention, and the construction of standardized voice perception tests has been neglected. Here we report the construction of the first standardized test for voice perception ability. Participants make a same/different identity decision after hearing two voice samples. Item Response Theory guided item selection to ensure the test discriminates between a range of abilities. The test provides a starting point for the systematic exploration of the cognitive and neural mechanisms underlying voice perception. With a high test-retest reliability (r=.86) and short assessment duration (~10 min) this test examines individual abilities reliably and quickly and therefore also has potential for use in developmental and neuropsychological populations.
Relationship of college student characteristics and inquiry-based geometrical optics instruction to knowledge of image formation with light-ray tracing

NASA Astrophysics Data System (ADS)

Isik, Hakan

This study is premised on the fact that student conceptions of optics appear to be unrelated to student characteristics of gender, age, years since high school graduation, or previous academic experiences. This study investigated the relationships between student characteristics and student performance on image formation test items and the changes in student conceptions of optics after an introductory inquiry-based physics course. Data were collected from 39 college students who were involved in an inquiry-based physics course teaching topics of geometrical optics. Student data concerning characteristics and previous experiences with optics and mathematics were collected. Student understanding of optics knowledge for pinholes, plane mirrors, refraction, and convex lenses was assessed with the Test of Image Formation with Light-Ray Tracing instrument. Total scale and subscale scores representing the optics instrument content were derived from student pretest and posttest responses. The types of knowledge needed to answer each optics item correctly were categorized as situational, conceptual, procedural, and strategic knowledge. These types of knowledge were associated with student correct and incorrect responses to each item to explain the existence of, and changes in, student scientific and naive conceptions. Correlation and stepwise multiple regression analyses were conducted to identify the student characteristics and academic experiences that significantly predicted scores on the subscales of the test. The results showed that student experience with calculus was a significant predictor of student performance on the total scale as well as on the refraction subscale of the Test of Image Formation with Light-Ray Tracing. A combination of student age and previous academic experience with precalculus was a significant predictor of student performance on the pretest pinhole subscale.
The student characteristic of years since high school graduation significantly predicted the gain in student scores on pinhole and plane-mirror items from the pretest to the posttest, with the most recent high school graduates doing better. Multivariate and univariate analyses of variance of the Test of Image Formation with Light-Ray Tracing pinhole scale and individual item changes from the pretest to the posttest resulted in statistically significant mean differences between total scores as well as between various individual pinhole items. There were no significant changes for individual plane-mirror items from pretest to posttest. Results revealed that there is a perceivable relationship between student optics-content knowledge and the types of knowledge required by items. At the pretest, the greatest number of wrong responses related to the items requiring situational knowledge, and the fewest wrong responses related to the items requiring procedural knowledge.
Student selection of wrong options for each item revealed the following naive optics conceptions: pinholes do not create reversed images (pretest); size and sharpness of pinhole images are related to the focus of a pinhole camera (pretest and posttest); propagation of light rays is interpreted as being radial rather than directional (pretest and posttest); no conception of image formation and observation for parallel mirrors (pretest and posttest); the place of an image depends on the position of the observer (pretest and posttest); a plane mirror reflects the images of the objects placed at one side of the mirror and the observers who were positioned at the other side of the mirror can see them (pretest and posttest); the law of reflection is applied to plane mirrors without considering the variations in angles of incidence and reflection (pretest and posttest); and image observation is confused with image formation in mirrors placed perpendicular to one another (pretest and posttest). Future research should focus on the acquisition, development, and identification of reliable measures of optics concepts, processes, types of knowledge, and specific optics understanding (i.e., pinhole, plane-mirror). Future research should focus on the identification of the more critical concepts such as changes in size and sharpness of pinhole images, image observation, image formation in general, and image formation and observation in parallel mirrors. Future research can be conducted with a larger set of participants so as to compare different instructional methods and address instructional deficiencies using more efficient statistical methods. Comparative studies can be conducted to investigate the relations of various instructional strategies on student conceptions of optics.
Defining surgical criteria for empty nose syndrome: Validation of the office-based cotton test and clinical interpretability of the validated Empty Nose Syndrome 6-Item Questionnaire.

PubMed

Thamboo, Andrew; Velasquez, Nathalia; Habib, Al-Rahim R; Zarabanda, David; Paknezhad, Hassan; Nayak, Jayakar V

2017-08-01

The validated Empty Nose Syndrome 6-Item Questionnaire (ENS6Q) identifies empty nose syndrome (ENS) patients. The unvalidated cotton test assesses improvement in ENS-related symptoms. By first validating the cotton test using the ENS6Q, we define the minimal clinically important difference (MCID) score for the ENS6Q. Individual case-control study. Fifteen patients diagnosed with ENS and 18 controls with non-ENS sinonasal conditions underwent office cotton placement. Both groups completed ENS6Q testing in three conditions (precotton, cotton in situ, and postcotton) to measure the reproducibility of ENS6Q scoring. Participants also completed a five-item transition scale ranging from "much better" to "much worse" to rate subjective changes in nasal breathing with and without cotton placement. Mean changes for each transition point, and the ENS6Q MCID, were then calculated. In the precotton condition, significant differences (P < .001) in all ENS6Q questions between ENS and controls were noted. With cotton in situ, nearly all prior ENS6Q differences normalized between ENS and control patients. For ENS patients, the changes in the mean differences between the precotton and cotton in situ conditions compared to postcotton versus cotton in situ conditions were insignificant among individuals. Including all 33 participants, the mean change in the ENS6Q between the parameters "a little better" and "about the same" was 4.25 (standard deviation [SD] = 5.79) and -2.00 (SD = 3.70), giving an MCID of 6.25. Cotton testing is a validated office test to assess for ENS patients. Cotton testing also helped to determine the MCID of the ENS6Q, which is a 7-point change from the baseline ENS6Q score. 3b. Laryngoscope, 127:1746-1752, 2017. © 2017 The American Laryngological, Rhinological and Otological Society, Inc.
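The MCID arithmetic in this abstract can be reproduced in a few lines. The sketch below is illustrative only: the variable names are hypothetical, and rounding up to a whole point is an assumption about how the 6.25 difference becomes the reported 7-point change, not something the abstract states.

```python
import math

# Mean ENS6Q change reported for each transition-scale anchor (from the abstract).
mean_change_little_better = 4.25
mean_change_about_same = -2.00

# MCID taken as the difference between the two anchor means.
mcid = mean_change_little_better - mean_change_about_same

# Assumption: ENS6Q totals are whole numbers, so the threshold is rounded up.
mcid_whole_points = math.ceil(mcid)

print(mcid, mcid_whole_points)
```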
An Item Gains and Losses Analysis of False Memories Suggests Critical Items Receive More Item-Specific Processing than List Items

ERIC Educational Resources Information Center

Burns, Daniel J.; Martens, Nicholas J.; Bertoni, Alicia A.; Sweeney, Emily J.; Lividini, Michelle D.

2006-01-01

In a repeated testing paradigm, list items receiving item-specific processing are more likely to be recovered across successive tests (item gains), whereas items receiving relational processing are likely to be forgotten progressively less on successive tests. Moreover, analysis of cumulative-recall curves has shown that item-specific processing…
Testing measurement invariance of the patient-reported outcomes measurement information system pain behaviors score between the US general population sample and a sample of individuals with chronic pain.

PubMed

Chung, Hyewon; Kim, Jiseon; Cook, Karon F; Askew, Robert L; Revicki, Dennis A; Amtmann, Dagmar

2014-02-01

In order to test the difference between group means, the construct measured must have the same meaning for all groups under investigation. This study examined the measurement invariance of responses to the patient-reported outcomes measurement information system (PROMIS) pain behavior (PB) item bank in two samples: the PROMIS calibration sample (Wave 1, N = 426) and a sample recruited from the American Chronic Pain Association (ACPA, N = 750). The ACPA data were collected to increase the number of participants with higher levels of pain. Multi-group confirmatory factor analysis (MG-CFA) and two item response theory (IRT)-based differential item functioning (DIF) approaches were employed to evaluate the existence of measurement invariance. MG-CFA results supported metric invariance of the PROMIS-PB, indicating that unstandardized factor loadings were equal across samples. DIF analyses revealed that the impact of 6 DIF items was negligible. Based on the results of both MG-CFA and IRT-based DIF approaches, we recommend retaining the original parameter estimates obtained from the combined samples based on the results of MG-CFA.

Creation of a computer self-efficacy measure: analysis of internal consistency, psychometric properties, and validity.

PubMed

Howard, Matt C

2014-10-01

Computer self-efficacy is an often studied construct that has been shown to be related to an array of important individual outcomes. Unfortunately, existing measures of computer self-efficacy suffer from several deficiencies, including criterion contamination, outdated wording, and/or inadequate psychometric properties. For this reason, the current article presents the creation of a new computer self-efficacy measure. In Study 1, an over-representative item list is created and subsequently reduced through exploratory factor analysis to create an initial measure, and the discriminant validity of this initial measure is tested. In Study 2, the unidimensional factor structure of the initial measure is supported through confirmatory factor analysis and further reduced into a final, 12-item measure. In Study 3, the convergent and criterion validity of the 12-item measure is tested. Overall, this three study process demonstrates that the new computer self-efficacy measure has superb psychometric properties and internal reliability, and demonstrates excellent evidence for several aspects of validity. It is hoped that the 12-item computer self-efficacy measure will be utilized in future research on computer self-efficacy, which is discussed in the current article.
Unidimensional IRT Item Parameter Estimates across Equivalent Test Forms with Confounding Specifications within Dimensions

ERIC Educational Resources Information Center

Matlock, Ki Lynn; Turner, Ronna

2016-01-01

When constructing multiple test forms, the number of items and the total test difficulty are often equivalent. Not all test developers match the number of items and/or average item difficulty within subcontent areas. In this simulation study, six test forms were constructed having an equal number of items and average item difficulty overall.…
Solar panel acceptance testing using a pulsed solar simulator

NASA Technical Reports Server (NTRS)

Hershey, T. L.

1977-01-01

Utilizing specific parameters such as the area of an individual cell, the number in series and parallel, and the established coefficients of current and voltage temperature dependence, a solar array irradiated with one solar constant at AM0 and at ambient temperature can be characterized by a current-voltage curve for different intensities, temperatures, and even different configurations. Calibration techniques include: uniformity in area, depth and time, absolute and transfer irradiance standards, dynamic and functional check out procedures. Typical data are given for an individual cell (2x2 cm) up to a complete flat solar array (5x5 feet) with 2660 cells and on cylindrical test items with up to 10,000 cells. The time and energy saving of such testing techniques are emphasized.
A comparison of home-based exercise programs with and without self-manual therapy in individuals with knee osteoarthritis in community.

PubMed

Cheawthamai, Kornkamon; Vongsirinavarat, Mantana; Hiengkaew, Vimonwan; Saengrueangrob, Sasithorn

2014-07-01

The present study aimed to compare the effectiveness of the treatment programs of home-based exercise with and without self-manual therapy in individuals with knee osteoarthritis (knee OA) in community. Forty-three participants with knee OA were randomly assigned in groups. All participants received the same home-based exercise program with or without self-manual therapy over 12 weeks. Outcome measures were pain intensity, range of motions, six-minute walk test distance, the knee injury and osteoarthritis outcome score (KOOS), short-form 36 (SF-36) and satisfaction. The results showed that the self-manual therapy program significantly decreased pain at 4 weeks, increased flexion and extension at 4 and 12 weeks, and improved the KOOS in pain item and SF-36 in physical function and mental health items. The home-based exercise group showed significant increase of the six-minute walk distance at 4 and 12 weeks, improvements in the KOOS in pain and symptom items and SF-36 in the physical function and role-emotional items. Overall, the results favored a combination of self-manual therapy and home-based exercise for patients with knee OA, which apparently showed superior benefits in decreasing pain and improving active knee range of motions.
When less is more: validating a brief scale to rate interprofessional team competencies.

PubMed

Lie, Désirée A; Richter-Lagha, Regina; Forest, Christopher P; Walsh, Anne; Lohenry, Kevin

2017-01-01

There is a need for validated and easy-to-apply behavior-based tools for assessing interprofessional team competencies in clinical settings. The seven-item observer-based Modified McMaster-Ottawa scale was developed for the Team Objective Structured Clinical Encounter (TOSCE) to assess individual and team performance in interprofessional patient encounters. We aimed to improve scale usability for clinical settings by reducing item numbers while maintaining generalizability; and to explore the minimum number of observed cases required to achieve modest generalizability for giving feedback. We administered a two-station TOSCE in April 2016 to 63 students split into 16 newly-formed teams, each consisting of four professions. The stations were of similar difficulty. We trained sixteen faculty to rate two teams each. We examined individual and team performance scores using generalizability (G) theory and principal component analysis (PCA). The seven-item scale shows modest generalizability (.75) with individual scores. PCA revealed multicollinearity and singularity among scale items and we identified three potential items for removal. Reducing items for individual scores from seven to four (measuring Collaboration, Roles, Patient/Family-centeredness, and Conflict Management) changed scale generalizability from .75 to .73. Performance assessment with two cases is associated with reasonable generalizability (.73). Students in newly-formed interprofessional teams show a learning curve after one patient encounter. Team scores from a two-station TOSCE demonstrate low generalizability whether the scale consisted of four (.53) or seven items (.55). The four-item Modified McMaster-Ottawa scale for assessing individual performance in interprofessional teams retains the generalizability and validity of the seven-item scale. Observation of students in teams interacting with two different patients provides reasonably reliable ratings for giving feedback. 
The four-item scale has potential for assessing individual student skills and the impact of IPE curricula in clinical practice settings. IPE: Interprofessional education; SP: Standardized patient; TOSCE: Team objective structured clinical encounter.
Trunk control test as an early predictor of stroke rehabilitation outcome.

PubMed

Franchignoni, F P; Tesio, L; Ricupero, C; Martino, M T

1997-07-01

The aim of this study was to investigate the construct and predictive validity of the Trunk Control Test (TCT) in postacute stroke patients by comparing TCT scores at admission and discharge with the Functional Independence Measure (FIM) scores. Forty-nine patients participated in the study. The TCT examines four movements: rolling from a supine position to the weak side (T1) and to the strong side (T2), sitting up from a lying-down position (T3), and sitting balance (T4). The FIM is an 18-item scale (13 motor [motFIM] and 5 cognitive [cognFIM]) used to determine the level of dependence of patients in daily life. Thirty-six patients (73%) increased their TCT overall score at discharge. The TCT item-total correlations were high, both at admission and discharge (P < .0001). The individual TCT items were intercorrelated. Furthermore, the homogeneity of the TCT was confirmed by a high Cronbach's index. High correlations were found between admission and discharge scores in the different tests (TCT, FIM, and motFIM; P < .0001) and between TCT at admission and FIM (P < .0001) and motFIM (P < .0001) at admission. TCT at admission alone explained 71% of the variance in motFIM at discharge. The TCT showed a good sensitivity to change in assessing recovery of stroke patients. The high item-total correlation and Cronbach's alpha value of the TCT suggest that there is one homogeneous construct underlying the item list. The TCT construct validity was confirmed by the correlation between this test and the FIM scores. TCT at admission predicted motFIM at discharge even better than motFIM at admission alone. Possibly, the TCT captures basic motor skills that foreshadow the recovery of more complex behavioral skills described by the FIM.
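As a rough illustration of the homogeneity statistic mentioned in this abstract, the sketch below computes Cronbach's alpha from an item-score matrix using the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The function name and the patient scores are hypothetical and are not taken from the study.

```python
from statistics import pvariance

def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(scores[0])
    # Variance of each item across respondents.
    item_variances = [pvariance([row[i] for row in scores]) for i in range(k)]
    # Variance of the respondents' total scores.
    total_variance = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)

# Made-up scores for five patients on four items (T1..T4); not study data.
hypothetical_tct = [
    [0, 12, 0, 12],
    [12, 12, 12, 25],
    [12, 25, 12, 25],
    [25, 25, 25, 25],
    [0, 0, 0, 12],
]
print(round(cronbach_alpha(hypothetical_tct), 2))
```

Population variance is used throughout; because the same divisor appears in the numerator and denominator, using sample variance instead would give the same alpha.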
Old and New Ideas for Data Screening and Assumption Testing for Exploratory and Confirmatory Factor Analysis

PubMed Central

Flora, David B.; LaBrish, Cathy; Chalmers, R. Philip

2011-01-01

We provide a basic review of the data screening and assumption testing issues relevant to exploratory and confirmatory factor analysis along with practical advice for conducting analyses that are sensitive to these concerns. Historically, factor analysis was developed for explaining the relationships among many continuous test scores, which led to the expression of the common factor model as a multivariate linear regression model with observed, continuous variables serving as dependent variables, and unobserved factors as the independent, explanatory variables. Thus, we begin our paper with a review of the assumptions for the common factor model and data screening issues as they pertain to the factor analysis of continuous observed variables. In particular, we describe how principles from regression diagnostics also apply to factor analysis. Next, because modern applications of factor analysis frequently involve the analysis of the individual items from a single test or questionnaire, an important focus of this paper is the factor analysis of items. Although the traditional linear factor model is well-suited to the analysis of continuously distributed variables, commonly used item types, including Likert-type items, almost always produce dichotomous or ordered categorical variables. We describe how relationships among such items are often not well described by product-moment correlations, which has clear ramifications for the traditional linear factor analysis. An alternative, non-linear factor analysis using polychoric correlations has become more readily available to applied researchers and thus more popular. Consequently, we also review the assumptions and data-screening issues involved in this method. Throughout the paper, we demonstrate these procedures using an historic data set of nine cognitive ability variables. PMID:22403561
Development and psychometric testing of the Dogs and WalkinG Survey (DAWGS).

PubMed

Richards, Elizabeth A; McDonough, Meghan H; Edwards, Nancy E; Lyle, Roseann M; Troped, Philip J

2013-12-01

Dog owners represent 40% of the population, a promising audience to increase population levels of physical activity. The purpose of this study was to develop and test the psychometric properties of a new instrument to assess social-cognitive theory constructs related to dog walking. Dog owners (N = 431) completed the Dogs and WalkinG Survey (DAWGS). Survey items assessed dog-walking behaviors and self-efficacy, social support, outcome expectations, and outcome expectancies for dog walking. Test-retest reliability was assessed among 252 (58%) survey respondents who completed the survey twice. Factorial validity and factorial invariance by age and walking level were tested using confirmatory factor analysis. DAWGS items demonstrated moderate test-retest reliability (ρ = .39-.79; κ = .41-.89). Acceptable model fit was found for all subscales. All subscales were invariant by age and walking level, except self-efficacy, which showed mixed evidence of invariance. The DAWGS is a psychometrically sound instrument for examining individual and interpersonal correlates of dog walking.
Thermal Simulation Facilities Handbook.

DTIC Science & Technology

1983-02-01

tower provide incident radiation angles of 90° or less. Since each heliostat is individually controlled, the size of a test item depends on application...designed such that it can be used for many other applications. (See also Section 3.) The solar furnace uses both a flat mirror (heliostat) that track...type solar thermal facility. It consists of four main components: (1) heliostat, (2) attenuator, (3) concentrator, and (4) test and control chamber
The inter-rater reliability test of the modified Morse Fall Scale among patients ≥ 55 years old in an acute care hospital in Singapore.

PubMed

Tang, Wing Sze; Chow, Yeow Leng; Koh, Serena Siew Lin

2014-02-01

A prospective, descriptive study was conducted in an acute care hospital in Singapore to determine the inter-rater reliability of the modified Morse Fall Scale by evaluating the degrees of agreement on the ratings of the individual items and overall score between the 'gold standard' assessor and the facility assessors. One hundred and forty-two subjects were recruited during the 1.5 month data collection period. The simple and weighted κ-values were all > 0.8 except for the item 'effects of medications' (κ and κw = 0.63), and the correlation coefficient (rs = 0.89) was significantly high at a significance level of < 0.001. The modified Morse Fall Scale was shown to be a reliable fall risk assessment tool having a relative high inter-rater reliability level for the overall score and individual items. This study provides evidence-based psychometric support for the clinical application of this tool. © 2013 Wiley Publishing Asia Pty Ltd.
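For readers unfamiliar with the agreement statistic reported here, the following is a minimal sketch of unweighted Cohen's kappa for two raters, kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and p_e is agreement expected by chance. The function name and ratings are hypothetical; the study's own data and its weighted-kappa computation are not reproduced.

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length lists of category labels."""
    n = len(rater_a)
    # Proportion of subjects on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal category frequencies.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    # Undefined (division by zero) if both raters use a single identical category.
    return (observed - expected) / (1 - expected)

# Made-up ratings of one item by a reference assessor and a ward assessor.
reference = ["yes", "yes", "no", "no", "yes", "no", "no", "no", "yes", "no"]
ward = ["yes", "yes", "no", "no", "no", "no", "no", "no", "yes", "no"]
print(round(cohen_kappa(reference, ward), 2))
```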
Stakeholder opinion of functional communication activities following traumatic brain injury.

PubMed

Larkins, B M; Worrall, L E; Hickson, L M

2004-07-01

To establish a process whereby assessment of functional communication reflects the authentic communication of the target population. The major functional communication assessments available from the USA may not be as relevant to those who reside elsewhere, nor assessments developed primarily for persons who have had a stroke as relevant for traumatic brain injury rehabilitation. The investigation used the Nominal Group Technique to elicit free opinion and support individuals who have compromised communication ability. A survey mailed out sampled a larger number of stakeholders to test out differences among groups. Five stakeholder groups generated items and the survey determined relative 'importance'. The stakeholder groups in both studies comprised individuals with traumatic brain injury and their families, health professionals, third-party payers, employers, and Maori, the indigenous population of New Zealand. There was no statistically significant difference found between groups for 19 of the 31 items. Only half of the items explicitly appear on a well-known USA functional communication assessment. The present study has implications for whether functional communication assessments are valid across cultures and the type of impairment.
The influences of partner accuracy and partner memory ability on social false memories.

PubMed

Numbers, Katya T; Meade, Michelle L; Perga, Vladimir A

2014-11-01

In this study, we examined whether increasing the proportion of false information suggested by a confederate would influence the magnitude of socially introduced false memories in the social contagion paradigm (Roediger, Meade, & Bergman, Psychonomic Bulletin & Review 8:365-371, 2001). One participant and one confederate collaboratively recalled items from previously studied household scenes. During collaboration, the confederate interjected 0 %, 33 %, 66 %, or 100 % false items. On subsequent individual-recall tests across three experiments, participants were just as likely to incorporate misleading suggestions from a partner who was mostly accurate (33 % incorrect) as they were from a partner who was not at all accurate (100 % incorrect). Even when participants witnessed firsthand that their partner had a very poor memory on a related memory task, they were still as likely to incorporate the confederate's entirely misleading suggestions on subsequent recall and recognition tests (Exp. 2). Only when participants witnessed firsthand that their partner had a very poor memory on a practice test of the experimental task itself were they able to reduce false memory, and this reduction occurred selectively on a subsequent individual recognition test (Exp. 3). These data demonstrate that participants do not always consider their partners' memory ability when working on collaborative memory tasks.
Has there been a change in the knowledge of GP registrars between 2011 and 2016 as measured by performance on common items in the Applied Knowledge Test?

PubMed

Neden, Catherine A; Parkin, Claire; Blow, Carol; Siriwardena, Aloysius Niroshan

2018-05-08

The aim of this study was to assess whether the absolute standard of candidates sitting the MRCGP Applied Knowledge Test (AKT) between 2011 and 2016 had changed. It is a descriptive study comparing the performance on marker questions of a reference group of UK graduates taking the AKT for the first time between 2011 and 2016. Using aggregated examination data, the performance of individual 'marker' questions was compared using Pearson's chi-squared tests and trend-line analysis. Binary logistic regression was used to analyse changes in performance over the study period. Changes in performance of individual marker questions using Pearson's chi-squared test showed statistically significant differences in 32 of the 49 questions included in the study. Trend line analysis showed a positive trend in 29 questions and a negative trend in the remaining 23. The magnitude of change was small. Logistic regression did not demonstrate any evidence for a change in the performance of the question set over the study period. However, candidates were more likely to get items on administration wrong compared with clinical medicine or research. There was no evidence of a change in performance of the question set as a whole.
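To make the per-question comparison above concrete, here is a minimal sketch of Pearson's chi-squared test on a 2x2 table (correct versus wrong answers in two sittings). The counts are invented for illustration and the function name is an assumption; no continuity correction is applied, and the study's own analysis may have differed.

```python
import math

def chi2_2x2(correct_1, wrong_1, correct_2, wrong_2):
    """Pearson chi-squared statistic and p-value (df = 1) for a 2x2 table."""
    table = [[correct_1, wrong_1], [correct_2, wrong_2]]
    row = [sum(r) for r in table]
    col = [table[0][j] + table[1][j] for j in range(2)]
    total = sum(row)
    if 0 in row or 0 in col:
        raise ValueError("every row and column total must be positive")
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / total
            stat += (table[i][j] - expected) ** 2 / expected
    # With one degree of freedom the upper-tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Hypothetical counts for one marker question in two different years.
stat, p = chi2_2x2(700, 300, 760, 240)
print(round(stat, 2), round(p, 4))
```

Running one such test per question, as the abstract describes, produces many comparisons, so a significant result on a single question says little about the question set as a whole.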
There’s more to food store choice than proximity: a questionnaire development study

PubMed Central

2013-01-01

Background: Proximity of food stores is associated with dietary intake and obesity; however, individuals frequently shop at stores that are not the most proximal. Little is known about other factors that influence food store choice. The current research describes the development of the Food Store Selection Questionnaire (FSSQ) and describes preliminary results of field testing the questionnaire. Methods: Development of the FSSQ involved a multidisciplinary literature review, qualitative analysis of focus group transcripts, and expert and community reviews. Field testing consisted of 100 primary household food shoppers (93% female, 64% African American), in rural and urban Arkansas communities, rating FSSQ items as to their importance in store choice and indicating their top two reasons. After eliminating 14 items due to low mean importance scores and high correlations with other items, the final FSSQ questionnaire consists of 49 items. Results: Items rated highest in importance were: meat freshness; store maintenance; store cleanliness; meat varieties; and store safety. Items most commonly rated as top reasons were: low prices; proximity to home; fruit/vegetable freshness; fruit/vegetable variety; and store cleanliness. Conclusions: The FSSQ is a comprehensive questionnaire for detailing key reasons in food store choice. Although proximity to home was a consideration for participants, there were clearly other key factors in their choice of a food store. Understanding the relative importance of these different dimensions driving food store choice in specific communities may be beneficial in informing policies and programs designed to support healthy dietary intake and obesity prevention. PMID:23773428
There's more to food store choice than proximity: a questionnaire development study.

PubMed

Krukowski, Rebecca A; Sparks, Carla; DiCarlo, Marisha; McSweeney, Jean; West, Delia Smith

2013-06-17

Proximity of food stores is associated with dietary intake and obesity; however, individuals frequently shop at stores that are not the most proximal. Little is known about other factors that influence food store choice. The current research describes the development of the Food Store Selection Questionnaire (FSSQ) and describes preliminary results of field testing the questionnaire. Development of the FSSQ involved a multidisciplinary literature review, qualitative analysis of focus group transcripts, and expert and community reviews. Field testing consisted of 100 primary household food shoppers (93% female, 64% African American), in rural and urban Arkansas communities, rating FSSQ items as to their importance in store choice and indicating their top two reasons. After eliminating 14 items due to low mean importance scores and high correlations with other items, the final FSSQ questionnaire consists of 49 items. Items rated highest in importance were: meat freshness; store maintenance; store cleanliness; meat varieties; and store safety. Items most commonly rated as top reasons were: low prices; proximity to home; fruit/vegetable freshness; fruit/vegetable variety; and store cleanliness. The FSSQ is a comprehensive questionnaire for detailing key reasons in food store choice. Although proximity to home was a consideration for participants, there were clearly other key factors in their choice of a food store. Understanding the relative importance of these different dimensions driving food store choice in specific communities may be beneficial in informing policies and programs designed to support healthy dietary intake and obesity prevention.
Evolution of a Test Item

ERIC Educational Resources Information Center

Spaan, Mary

2007-01-01

This article follows the development of test items (see "Language Assessment Quarterly", Volume 3 Issue 1, pp. 71-79 for the article "Test and Item Specifications Development"), beginning with a review of test and item specifications, then proceeding to writing and editing of items, pretesting and analysis, and finally selection of an item for a…
NEWS for Africa: adaptation and reliability of a built environment questionnaire for physical activity in seven African countries.

PubMed

Oyeyemi, Adewale L; Kasoma, Sandra S; Onywera, Vincent O; Assah, Felix; Adedoyin, Rufus A; Conway, Terry L; Moss, Sarah J; Ocansey, Reginald; Kolbe-Alexander, Tracy L; Akinroye, Kingsley K; Prista, Antonio; Larouche, Richard; Gavand, Kavita A; Cain, Kelli L; Lambert, Estelle V; Aryeetey, Richmond; Bartels, Clare; Tremblay, Mark S; Sallis, James F

2016-03-08

Built environment and policy interventions are effective strategies for controlling the growing worldwide deaths from physical inactivity-related non-communicable diseases. To improve built environment research and develop African specific evidence, it is important to first tailor built environment measures to African contexts and assess their psychometric properties across African countries. This study reports on the adaptation and test-retest reliability of the Neighborhood Environment Walkability Scale in seven sub-Saharan African countries (NEWS-Africa). The original NEWS comprising 8 subscales measuring reported physical and social attributes of neighborhood environments was systematically adapted for Africa through extensive input from physical activity and public health researchers, built environment professionals, and residents in seven African countries: Cameroon, Ghana, Kenya, Mozambique, Nigeria, South Africa and Uganda. Cognitive testing of NEWS-Africa was conducted among diverse residents (N = 109, 50 youth [12 - 17 years] and 59 adults [22 - 67 years], 69 % from low socioeconomic status [SES] neighborhoods). NEWS-Africa was translated into local languages and evaluated for 2-week test-retest reliability in adult participants (N = 301; female = 50.2 %; age = 32.3 ± 12.9 years) purposively recruited from neighborhoods varying in walkability (high and low walkable) and SES (high and low income) and from villages in six of seven participating countries. The original 67 NEWS items were expanded to 89 scores (76 individual NEWS items and 13 computed scales). Several modifications were made to individual items, and some new items were added to capture important attributes in the African environment. A new scale on personal safety was created, and the aesthetics scale was enlarged to reflect African specific characteristics.
Over 95 % of all NEWS-Africa scores (items plus computed scales) demonstrated evidence of "excellent" (ICCs > 0.75) or "good" (ICCs = 0.60 to 0.74) reliability. Seven (53.8 %) of the 13 computed NEWS scales demonstrated "excellent" agreement and the other six had "good" agreement. No items or scales demonstrated "poor" reliability (ICCs < 0.40). The systematic adaptation and initial psychometric evaluation of NEWS-Africa indicates the instrument is feasible and reliable for use with adults of diverse demographic characteristics in Africa. The measure is likely to be useful for research, surveillance of built environment conditions for planning purposes, and to evaluate physical activity and policy interventions in Africa.
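The reliability bands quoted above can be applied mechanically, as in the following minimal sketch. Two points are assumptions rather than statements from the abstract: the label "moderate" for the unnamed 0.40 to 0.59 band, and treating a value of exactly 0.75 as "excellent". The ICC values are hypothetical.

```python
def classify_icc(icc):
    """Map an intraclass correlation to the reliability bands quoted above."""
    if icc >= 0.75:
        return "excellent"
    if icc >= 0.60:
        return "good"
    if icc >= 0.40:
        return "moderate"  # assumed label; the abstract does not name this band
    return "poor"

# Hypothetical ICC values, not results from the study.
example_iccs = [0.82, 0.74, 0.61, 0.45, 0.91]
labels = [classify_icc(v) for v in example_iccs]
share_good_or_better = sum(l in ("excellent", "good") for l in labels) / len(labels)
print(labels, share_good_or_better)
```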
Assessment of Preference for Edible and Leisure Items in Individuals with Dementia

ERIC Educational Resources Information Center

Ortega, Javier Virues; Iwata, Brian A.; Nogales-Gonzalez, Celia; Frades, Belen

2012-01-01

We conducted 2 studies on reinforcer preference in patients with dementia. Results of preference assessments yielded differential selections by 14 participants. Unlike prior studies with individuals with intellectual disabilities, all participants showed a noticeable preference for leisure items over edible items. Results of a subsequent analysis…
A Multidimensional Tool Based on the eHealth Literacy Framework: Development and Initial Validity Testing of the eHealth Literacy Questionnaire (eHLQ).

PubMed

Kayser, Lars; Karnoe, Astrid; Furstrand, Dorthe; Batterham, Roy; Christensen, Karl Bang; Elsworth, Gerald; Osborne, Richard H

2018-02-12

For people to be able to access, understand, and benefit from the increasing digitalization of health services, it is critical that services are provided in a way that meets the user's needs, resources, and competence. The objective of the study was to develop a questionnaire that captures the 7-dimensional eHealth Literacy Framework (eHLF). Draft items were created in parallel in English and Danish. The items were generated from 450 statements collected during the conceptual development of eHLF. In all, 57 items (7 to 9 items per scale) were generated and adjusted after cognitive testing. Items were tested in 475 people recruited from settings in which the scale was intended to be used (community and health care settings) and including people with a range of chronic conditions. Measurement properties were assessed using approaches from item response theory (IRT) and classical test theory (CTT) such as confirmatory factor analysis (CFA) and reliability using composite scale reliability (CSR); potential bias due to age and sex was evaluated using differential item functioning (DIF). CFA confirmed the presence of the 7 a priori dimensions of eHLF. Following item analysis, a 35-item 7-scale questionnaire was constructed, covering (1) using technology to process health information (5 items, CSR=.84), (2) understanding of health concepts and language (5 items, CSR=.75), (3) ability to actively engage with digital services (5 items, CSR=.86), (4) feel safe and in control (5 items, CSR=.87), (5) motivated to engage with digital services (5 items, CSR=.84), (6) access to digital services that work (6 items, CSR=.77), and (7) digital services that suit individual needs (4 items, CSR=.85). A 7-factor CFA model, using small-variance priors for cross-loadings and residual correlations, had a satisfactory fit (posterior predictive P value: .27, 95% CI for the difference between the observed and replicated chi-square values: -63.7 to 133.8).
The CFA showed that all items loaded strongly on their respective factors. The IRT analysis showed that no items were found to have disordered thresholds. For most scales, discriminant validity was acceptable; however, 2 pairs of dimensions were highly correlated; dimensions 1 and 5 (r=.95), and dimensions 6 and 7 (r=.96). All dimensions were retained because of strong content differentiation and potential causal relationships between these dimensions. There is no evidence of DIF. The eHealth Literacy Questionnaire (eHLQ) is a multidimensional tool based on a well-defined a priori eHLF framework with robust properties. It has satisfactory evidence of construct validity and reliable measurement across a broad range of concepts (using both CTT and IRT traditions) in various groups. It is designed to be used to understand and evaluate people's interaction with digital health services. ©Lars Kayser, Astrid Karnoe, Dorthe Furstrand, Roy Batterham, Karl Bang Christensen, Gerald Elsworth, Richard H Osborne. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 12.02.2018.
A Mixed Effects Randomized Item Response Model

ERIC Educational Resources Information Center

Fox, J.-P.; Wyrick, Cheryl

2008-01-01

The randomized response technique ensures that individual item responses, denoted as true item responses, are randomized before observing them and so-called randomized item responses are observed. A relationship is specified between randomized item response data and true item response data. True item response data are modeled with a (non)linear…

Parallel interactive retrieval of item and associative information from event memory.

PubMed

Cox, Gregory E; Criss, Amy H

2017-09-01

Memory contains information about individual events (items) and combinations of events (associations). Despite the fundamental importance of this distinction, it remains unclear exactly how these two kinds of information are stored and whether different processes are used to retrieve them. We use both model-independent qualitative properties of response dynamics and quantitative modeling of individuals to address these issues. Item and associative information are not independent and they are retrieved concurrently via interacting processes. During retrieval, matching item and associative information mutually facilitate one another to yield an amplified holistic signal. Modeling of individuals suggests that this kind of facilitation between item and associative retrieval is a ubiquitous feature of human memory. Copyright © 2017 Elsevier Inc. All rights reserved.
A Conflict Management Scale for Pharmacy

PubMed Central

Gregory, Paul A.; Martin, Craig

2009-01-01

Objectives: To develop and establish the validity and reliability of a conflict management scale specific to pharmacy practice and education. Methods: A multistage inventory-item development process was undertaken involving 93 pharmacists and using a previously described explanatory model for conflict in pharmacy practice. A 19-item inventory was developed, field tested, and validated. Results: The conflict management scale (CMS) demonstrated an acceptable degree of reliability and validity for use in educational or practice settings to promote self-reflection and self-awareness regarding individuals' conflict management styles. Conclusions: The CMS provides a unique, pharmacy-specific method for individuals to determine and reflect upon their own conflict management styles. As part of an educational program to facilitate self-reflection and heighten self-awareness, the CMS may be a useful tool to promote discussions related to an important part of pharmacy practice. PMID:19960081
AN EVALUATION OF ANTECEDENT EXERCISE ON BEHAVIOR MAINTAINED BY AUTOMATIC REINFORCEMENT USING A THREE-COMPONENT MULTIPLE SCHEDULE

PubMed Central

Morrison, Heather; Roscoe, Eileen M; Atwell, Amy

2011-01-01

We evaluated antecedent exercise for treating the automatically reinforced problem behavior of 4 individuals with autism. We conducted preference assessments to identify leisure and exercise items that were associated with high levels of engagement and low levels of problem behavior. Next, we conducted three 3-component multiple-schedule sequences: an antecedent-exercise test sequence, a noncontingent leisure-item control sequence, and a social-interaction control sequence. Within each sequence, we used a 3-component multiple schedule to evaluate preintervention, intervention, and postintervention effects. Problem behavior decreased during the postintervention component relative to the preintervention component for 3 of the 4 participants during the exercise-item assessment; however, the effects could not be attributed solely to exercise for 1 of these participants. PMID:21941383
Psychometrics of the preschool behavioral and emotional rating scale with children from early childhood special education settings.

PubMed

Lambert, Matthew C; Cress, Cynthia J; Epstein, Michael H

2015-01-01

In a previous study with a nationally representative sample, researchers found that the items of the Preschool Behavioral and Emotional Rating Scale can best be described by a four-factor structure model (Emotional Regulation, School Readiness, Social Confidence, and Family Involvement). The findings of this investigation replicate and extend these previous results with a national sample of children (N = 1,075) with disabilities enrolled in early childhood special education programs. Data were analyzed using classical test theory, Rasch modeling, and confirmatory factor analysis. Results confirmed that, for the most part, individual items were internally consistent within a four-factor model and showed consistent item difficulty, discrimination, and fit relative to their respective subscale scores. © 2015 Michigan Association for Infant Mental Health.
The Effect of the Position of an Item within a Test on the Item Difficulty Value.

ERIC Educational Resources Information Center

Rubin, Lois S.; Mott, David E. W.

An investigation of the effect on the difficulty value of an item due to position placement within a test was made. Using a 60-item operational test comprised of 5 subtests, 60 items were placed as experimental items on a number of spiralled test forms in three different positions (first, middle, last) within the subtest composed of like items.…
Relevance of Item Analysis in Standardizing an Achievement Test in Teaching of Physical Science in B.Ed Syllabus

ERIC Educational Resources Information Center

Marie, S. Maria Josephine Arokia; Edannur, Sreekala

2015-01-01

This paper focused on the analysis of test items constructed in the paper of teaching Physical Science for B.Ed. class. It involved the analysis of difficulty level and discrimination power of each test item. Item analysis allows selecting or omitting items from the test, but more importantly item analysis is a tool to help the item writer improve…
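The two quantities named above, difficulty level and discrimination power, can be computed as in this minimal sketch of classical item analysis. It assumes dichotomous (0/1) scoring and the common upper/lower 27% group convention, neither of which is confirmed by the abstract; the score matrix and function name are hypothetical.

```python
def item_analysis(responses, item, group_fraction=0.27):
    """Return (difficulty, discrimination) for one dichotomously scored item.

    responses: one list of 0/1 item scores per examinee.
    difficulty: proportion of all examinees answering the item correctly.
    discrimination: proportion correct in the top-scoring group minus the
    proportion correct in the bottom-scoring group.
    """
    n = len(responses)
    difficulty = sum(r[item] for r in responses) / n
    # Rank examinees by total score, highest first (sort is stable for ties).
    ranked = sorted(responses, key=sum, reverse=True)
    k = max(1, round(n * group_fraction))
    upper, lower = ranked[:k], ranked[-k:]
    discrimination = (sum(r[item] for r in upper) - sum(r[item] for r in lower)) / k
    return difficulty, discrimination

# Hypothetical scores for 10 examinees on 4 items.
scores = [
    [1, 1, 1, 1], [1, 1, 1, 0], [1, 1, 0, 1], [1, 0, 1, 0], [1, 1, 0, 0],
    [0, 1, 1, 0], [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 0, 0],
]
difficulty_0, discrimination_0 = item_analysis(scores, item=0)
print(difficulty_0, discrimination_0)
```

An item with low or negative discrimination is a candidate for revision or removal, which is the selection decision the abstract refers to.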
Measuring ability to assess claims about treatment effects: a latent trait analysis of items from the 'Claim Evaluation Tools' database using Rasch modelling.

PubMed

Austvoll-Dahlgren, Astrid; Guttersrud, Øystein; Nsangi, Allen; Semakula, Daniel; Oxman, Andrew D

2017-05-25

The Claim Evaluation Tools database contains multiple-choice items for measuring people's ability to apply the key concepts they need to know to be able to assess treatment claims. We assessed items from the database using Rasch analysis to develop an outcome measure to be used in two randomised trials in Uganda. Rasch analysis is a form of psychometric testing relying on Item Response Theory. It is a dynamic way of developing outcome measures that are valid and reliable. To assess the validity, reliability and responsiveness of 88 items addressing 22 key concepts using Rasch analysis. We administered four sets of multiple-choice items in English to 1114 people in Uganda and Norway, of which 685 were children and 429 were adults (including 171 health professionals). We scored all items dichotomously. We explored summary and individual fit statistics using the RUMM2030 analysis package. We used SPSS to perform distractor analysis. Most items conformed well to the Rasch model, but some items needed revision. Overall, the four item sets had satisfactory reliability. We did not identify significant response dependence between any pairs of items and, overall, the magnitude of multidimensionality in the data was acceptable. The items had a high level of difficulty. Most of the items conformed well to the Rasch model's expectations. Following revision of some items, we concluded that most of the items were suitable for use in an outcome measure for evaluating the ability of children or adults to assess treatment claims. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
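For readers unfamiliar with the model, the dichotomous Rasch model referred to above is usually written as follows. The symbols are the standard textbook notation and are assumptions here, since the abstract does not define any: \theta_n is the ability of person n and b_i is the difficulty of item i.

```latex
P(X_{ni} = 1 \mid \theta_n, b_i) = \frac{\exp(\theta_n - b_i)}{1 + \exp(\theta_n - b_i)}
```

When \theta_n = b_i the probability of a correct response is 0.5, so a set of items with "a high level of difficulty" is one in which most b_i lie above the abilities of most respondents.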
Misattributing the Source of Self-Generated Representations Related to Dissociative and Psychotic Symptoms

PubMed Central

Chiu, Chui-De; Tseng, Mei-Chih Meg; Chien, Yi-Ling; Liao, Shih-Cheng; Liu, Chih-Min; Yeh, Yei-Yu; Hwu, Hai-Gwo

2016-01-01

Objective: An intertwined relationship has been found between dissociative and psychotic symptoms, as the two symptom clusters frequently co-occur, suggesting some shared risk factors. Using a source monitoring paradigm, previous studies have shown that patients with schizophrenia made more errors in source monitoring, suggesting that a weakened sense of individuality may be associated with psychotic symptoms. However, no studies have verified a relationship between sense of individuality and dissociation, and it is unclear whether an altered sense of individuality is a shared sociocognitive deficit underlying both dissociation and psychosis. Method: Data from 80 acute psychiatric patients with unspecified mental disorders were analyzed to test the hypothesis that an altered sense of individuality underlies dissociation and psychosis. Behavioral tasks, including tests of intelligence and source monitoring, as well as interview schedules and self-report measures of dissociative and psychotic symptoms, general psychopathology, and trauma history, were administered. Results: Significant correlations of medium effect sizes indicated an association between errors attributing the source of self-generated items and positive psychotic symptoms and the absorption and amnesia measures of dissociation. The associations with dissociative measures remained significant after the effects of intelligence, general psychopathology, and trauma history were excluded. Moreover, the relationships between source misattribution and dissociative measures remained marginally significant and significant after controlling for positive and negative psychotic symptoms, respectively. Limitations: Self-reported measures were collected from a small sample, and most of the participants were receiving medications when tested, which may have influenced their cognitive performance. Conclusions: A tendency to misidentify the source of self-generated items characterized both dissociation and psychosis. 
An altered sense of individuality embedded in self-referential representations appears to be a common sociocognitive deficit of dissociation and psychosis. PMID:27148147
Normative accuracy and response time data for the computerized Benton Facial Recognition Test (BFRT-c).

PubMed

Rossion, Bruno; Michel, Caroline

2018-03-16

We report normative data from a large (N = 307) sample of young adult participants tested with a computerized version of the long form of the classical Benton Facial Recognition Test (BFRT; Benton & Van Allen, 1968). The BFRT-c requires participants to match a target face photograph to either one or three of six face photographs presented simultaneously. We found that the percent accuracy on the BFRT-c (81%-83%) was below ceiling yet well above chance level, with little interindividual variance in this typical population sample, two important aspects of a sensitive clinical test. Although the split-half reliability on response accuracy was relatively low, due to the large variability in difficulty across items, the correct response times measured in this version (completed in 3 min, on average) provide a reliable and critical complementary measure of performance at individual unfamiliar-face matching. In line with previous observations from other measures, females outperformed male participants at the BFRT-c, especially for female faces. In general, performance was also lower following lighting changes than following head rotations, in line with previous studies that have emphasized participants' limited ability to match pictures of unfamiliar faces with important variations in illumination. Overall, this normative data set supports the validity of the BFRT-c as a key component of a battery of tests to identify clinical impairments in individual face recognition, such as observed in acquired prosopagnosia. However, this analysis strongly recommends that researchers consider the full test results: Beyond global indexes of performance based on accuracy rates only, they should consider the time taken to match individual faces as well as the variability in performance across items.
Psychometric properties of the Activities-specific Balance Confidence Scale among individuals with a lower-limb amputation.

PubMed

Miller, William C; Deathe, A Barry; Speechley, Mark

2003-05-01

To evaluate the internal consistency, test-retest reliability, and construct validity of the Activities-specific Balance Confidence (ABC) Scale among people who have a lower-limb amputation. Retest design. A university-affiliated outpatient amputee clinic in Ontario. Two samples of individuals who have unilateral transtibial and transfemoral amputation. Sample 1 (n=54) was a consecutive and sample 2 (n=329) a convenience sample of all members of the clinic population. Not applicable. Repeated application of the ABC Scale, a 16-item questionnaire that assesses confidence in performing various mobility-related tasks. Correlation to test hypothesized relationships between the ABC Scale and the 2-minute walk (2MWT) and the timed up-and-go (TUG) tests; and assessment of the ability of the ABC Scale to discriminate among groups based on amputation cause, amputation level, mobility device use, automatic stepping ability, wearing time, stair climbing ability, and walking distance. Test-retest reliability (intraclass correlation coefficient) of the ABC Scale was .91 (95% confidence interval [CI], .84-.95) with individual item test-retest coefficients ranging from .53 to .87. Internal consistency, measured by Cronbach alpha, was .95. Hypothesized associations with the 2MWT and TUG test were observed with correlations of .72 (95% CI, .56-.84) and -.70 (95% CI, -.82 to -.53), respectively. The ABC Scale discriminated between all groups except those based on amputation level. Balance confidence, as measured by the ABC Scale, is a construct that provides unique information potentially useful to clinicians who provide amputee rehabilitation. The ABC Scale is reliable, with strong support for validity. Study of the scale's responsiveness is recommended.
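Internal consistency as reported above is conventionally computed with Cronbach's alpha. The following is a minimal sketch using sample variances; the response matrix is hypothetical (5 respondents, 3 items) and is not data from the ABC Scale study.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items matrix of scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of totals)
    """
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two respondents")
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical item scores, one row per respondent.
ratings = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [5, 5, 4],
    [3, 3, 4],
]
alpha = cronbach_alpha(ratings)
print(round(alpha, 3))
```

Alpha rises both with stronger inter-item correlations and with the number of items, so a value such as .95 on a 16-item scale reflects the scale as a whole, not each item individually.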
Mixed-Format Test Score Equating: Effect of Item-Type Multidimensionality, Length and Composition of Common-Item Set, and Group Ability Difference

ERIC Educational Resources Information Center

Wang, Wei

2013-01-01

Mixed-format tests containing both multiple-choice (MC) items and constructed-response (CR) items are now widely used in many testing programs. Mixed-format tests often are considered to be superior to tests containing only MC items although the use of multiple item formats leads to measurement challenges in the context of equating conducted under…
Test item linguistic complexity and assessments for deaf students.

PubMed

Cawthon, Stephanie

2011-01-01

Linguistic complexity of test items is one test format element that has been studied in the context of struggling readers and their participation in paper-and-pencil tests. The present article presents findings from an exploratory study on the potential relationship between linguistic complexity and test performance for deaf readers. A total of 64 students completed 52 multiple-choice items, 32 in mathematics and 20 in reading. These items were coded for linguistic complexity components of vocabulary, syntax, and discourse. Mathematics items had higher linguistic complexity ratings than reading items, but there were no significant relationships between item linguistic complexity scores and student performance on the test items. The discussion addresses issues related to the subject area, student proficiency levels in the test content, factors to look for in determining a "linguistic complexity effect," and areas for further research in test item development and deaf students.
Development and psychometric properties of the Carer – Head Injury Neurobehavioral Assessment Scale (C-HINAS) and the Carer – Head Injury Participation Scale (C-HIPS): patient and family determined outcome scales

PubMed Central

Deb, Shoumitro; Bryant, Eleanor; Morris, Paul G; Prior, Lindsay; Lewis, Glyn; Haque, Sayeed

2007-01-01

Objective: Develop and assess the psychometric properties of the Carer – Head Injury Participation Scale (C-HIPS) and its biggest factor the Carer – Head Injury Neurobehavioral Assessment Scale (C-HINAS). Furthermore, the aim was to examine the inter-informant reliability by comparing the self reports of individuals with traumatic brain injury (TBI) with the carer reports on the C-HIPS and the C-HINAS. Method: Thirty-two TBI individuals and 27 carers took part in in-depth qualitative interviews exploring the consequences of the TBI. Interview transcripts were analysed and key themes and concepts were used to construct a 49-item and 58-item patient (Patient – Head Injury Participation Scale [P-HIPS]) and carer outcome measure (C-HIPS) respectively, of which 49 were parallel items and nine additional items were used to assess carer burden. Postal versions of the P-HIPS, C-HIPS, Mayo Portland Adaptability Inventory-3 (MPAI-3), and the Glasgow Outcome Scale-Extended (GOSE) were completed by a cohort of 113 TBI individuals and 80 carers. Data from a sub-group of 66 patient/carer pairs were used to compare inter-informant reliability between the P-HIPS and the C-HIPS, and the P-HINAS and the C-HINAS respectively. Results: All 49 individual items of the C-HIPS and their total score showed good test-retest reliability (0.95) and internal consistency (0.95). Comparisons with the MPAI-3 and GOSE found a good correlation with the MPAI-3 (0.7) and a moderate negative correlation with the GOSE (−0.6). Factor analysis of these items extracted a 4-factor structure which represented the domains ‘Emotion/Behavior’ (C-HINAS), ‘Independence/Community Living’, ‘Cognition’, and ‘Physical’. The C-HINAS showed good internal consistency (0.92), test-retest reliability (0.93), and concurrent validity with one MPAI subscale (0.7).
Assessment of inter-informant reliability revealed good correspondence between the reports of the patients and the carers for both the C-HIPS (0.83) and the C-HINAS (0.82). Conclusion: Both the C-HINAS and the C-HIPS show strong psychometric properties. The qualitative methodology employed in the construction stage of the questionnaires provided good evidence of face and content validity. Comparisons between the P-HIPS and the C-HIPS, and the P-HINAS and the C-HINAS indicated high levels of agreement suggesting that in situations where the patient is unable to provide self-reports, information provided by the carer could be used. PMID:19300569
The Impact of Non-attempted and Dually-Attempted Items on Person Abilities Using Item Response Theory

PubMed Central

Sideridis, Georgios D.; Tsaousis, Ioannis; Al Harbi, Khaleel

2016-01-01

The purpose of the present study was to relate response strategy to person ability estimates. Two behavioral strategies were examined: (a) the strategy to skip items in order to save time on timed tests, and (b) the strategy to select two responses on an item, with the hope that one of them may be considered correct. Participants were 4,422 individuals who were administered a standardized achievement measure related to math, biology, chemistry, and physics. In the present evaluation, only the physics subscale was employed. Two analyses were conducted: (a) a person-based one to identify differences between groups and potential correlates of those differences, and (b) a measure-based analysis in order to identify the parts of the measure that were responsible for potential group differentiation. For (a) person abilities the 2-PL model was employed and later the 3-PL and 4-PL models in order to estimate upper and lower asymptotes of person abilities. For (b) differential item functioning, differential test functioning, and differential distractor functioning were investigated. Results indicated that there were significant differences between groups with completers having the highest ability compared to both non-attempters and dual responders. There were no significant differences between non-attempters and dual responders. The present findings have implications for response strategy efficacy and measure evaluation, revision, and construction. PMID:27790174
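For orientation, the 2-PL, 3-PL and 4-PL models named in this abstract are usually written as nested logistic item response functions. The sketch below illustrates that standard textbook form only; it is not the authors' estimation code. The function name and all parameter values are hypothetical, and in the conventional formulation the lower and upper asymptotes (c and d) are item parameters.

```python
import math


def irt_probability(theta, a, b, c=0.0, d=1.0):
    """Probability of a correct response under the 4-PL logistic model.

    theta: person ability; a: discrimination; b: difficulty;
    c: lower asymptote; d: upper asymptote.
    With c=0 and d=1 this reduces to the 2-PL; with d=1 only, to the 3-PL.
    """
    return c + (d - c) / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical item (values chosen for illustration, not taken from the study),
# evaluated for a person whose ability equals the item difficulty.
p_2pl = irt_probability(0.0, a=1.2, b=0.0)
p_3pl = irt_probability(0.0, a=1.2, b=0.0, c=0.2)
p_4pl = irt_probability(0.0, a=1.2, b=0.0, c=0.2, d=0.95)
```

When ability equals difficulty, the probability sits halfway between the two asymptotes, which is why raising c or lowering d shifts the curve without changing its midpoint location.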
Analysis of the psychometric properties of the American Orthopaedic Foot and Ankle Society Score (AOFAS) in rheumatoid arthritis patients: application of the Rasch model.

PubMed

Conceição, Cristiano Sena da; Neto, Mansueto Gomes; Neto, Anolino Costa; Mendes, Selena M D; Baptista, Abrahão Fontes; Sá, Kátia Nunes

2016-01-01

To test the reliability and validity of the AOFAS in a sample of rheumatoid arthritis patients. The scale was applied to rheumatoid arthritis patients, twice by interviewer 1 and once by interviewer 2. The AOFAS was subjected to test-retest reliability analysis (with 20 rheumatoid arthritis subjects). The psychometric properties were investigated using Rasch analysis on 33 rheumatoid arthritis patients. Intra-Class Correlation Coefficients (ICC) were (0.90
The Impact of Non-attempted and Dually-Attempted Items on Person Abilities Using Item Response Theory.

PubMed

Sideridis, Georgios D; Tsaousis, Ioannis; Al Harbi, Khaleel

2016-01-01

The purpose of the present study was to relate response strategy to person ability estimates. Two behavioral strategies were examined: (a) the strategy to skip items in order to save time on timed tests, and (b) the strategy to select two responses on an item, with the hope that one of them may be considered correct. Participants were 4,422 individuals who were administered a standardized achievement measure related to math, biology, chemistry, and physics. In the present evaluation, only the physics subscale was employed. Two analyses were conducted: (a) a person-based one to identify differences between groups and potential correlates of those differences, and (b) a measure-based analysis in order to identify the parts of the measure that were responsible for potential group differentiation. For (a) person abilities the 2-PL model was employed and later the 3-PL and 4-PL models in order to estimate upper and lower asymptotes of person abilities. For (b) differential item functioning, differential test functioning, and differential distractor functioning were investigated. Results indicated that there were significant differences between groups with completers having the highest ability compared to both non-attempters and dual responders. There were no significant differences between non-attempters and dual responders. The present findings have implications for response strategy efficacy and measure evaluation, revision, and construction.
EORTC QLQ-COMU26: a questionnaire for the assessment of communication between patients and professionals. Phase III of the module development in ten countries.

PubMed

Arraras, Juan Ignacio; Wintner, Lisa M; Sztankay, Monika; Tomaszewski, Krzysztof A; Hofmeister, Dirk; Costantini, Anna; Bredart, Anne; Young, Teresa; Kuljanic, Karin; Tomaszewska, Iwona M; Kontogianni, Meropi; Chie, Wei-Chu; Kulis, Dagmara; Greimel, Eva

2017-05-01

Communication between patients and professionals is one major aspect of the support offered to cancer patients. The European Organisation for Research and Treatment of Cancer (EORTC) Quality of Life Group (QLG) has developed a cancer-specific instrument for the measurement of different issues related to the communication between cancer patients and their health care professionals. Questionnaire development followed the EORTC QLG Module Development Guidelines. A provisional questionnaire was pre-tested (phase III) in a multicenter study within ten countries from five cultural areas (Northern and South Europe, UK, Poland and Taiwan). Patients from seven subgroups (before, during and after treatment, for localized and advanced disease each, plus palliative patients) were recruited. Structured interviews were conducted. Qualitative and quantitative analyses have been performed. One hundred forty patients were interviewed. Nine items were deleted and one shortened. Patients' comments had a key role in item selection. No item was deleted due to just quantitative criteria. Consistency was observed in patients' answers across cultural areas. The revised version of the module EORTC QLQ-COMU26 has 26 items, organized in 6 scales and 4 individual items. The EORTC COMU26 questionnaire can be used in daily clinical practice and research, in various patient groups from different cultures. The next step will be an international field test with a large heterogeneous group of cancer patients.
To call a cloud 'cirrus': sound symbolism in names for categories or items.

PubMed

Ković, Vanja; Sučević, Jelena; Styles, Suzy J

2017-01-01

The aim of the present paper is to experimentally test whether sound symbolism has selective effects on labels with different ranges-of-reference within a simple noun-hierarchy. In two experiments, adult participants learned the make up of two categories of unfamiliar objects ('alien life forms'), and were passively exposed to either category-labels or item-labels, in a learning-by-guessing categorization task. Following category training, participants were tested on their visual discrimination of object pairs. For different groups of participants, the labels were either congruent or incongruent with the objects. In Experiment 1, when trained on items with individual labels, participants were worse (made more errors) at detecting visual object mismatches when trained labels were incongruent. In Experiment 2, when participants were trained on items in labelled categories, participants were faster at detecting a match if the trained labels were congruent, and faster at detecting a mismatch if the trained labels were incongruent. This pattern of results suggests that sound symbolism in category labels facilitates later similarity judgments when congruent, and discrimination when incongruent, whereas for item labels incongruence generates error in judgements of visual object differences. These findings reveal that sound symbolic congruence has a different outcome at different levels of labelling within a noun hierarchy. These effects emerged in the absence of the label itself, indicating subtle but pervasive effects on visual object processing.
Measuring assessment standards in undergraduate medical programs: Development and validation of AIM tool.

PubMed

Sajjad, Madiha; Khan, Rehan Ahmed; Yasmeen, Rahila

2018-01-01

To develop a tool to evaluate faculty perceptions of assessment quality in an undergraduate medical program. The Assessment Implementation Measure (AIM) tool was developed by a mixed method approach. A preliminary questionnaire developed through literature review was submitted to a panel of 10 medical education experts for a three-round 'Modified Delphi technique'. Panel agreement of > 75% was considered the criterion for inclusion of items in the questionnaire. Cognitive pre-testing with five faculty members was conducted. A pilot study was done with 30 randomly selected faculty members. Content validity index (CVI) was calculated for individual items (I-CVI) and the composite scale (S-CVI). Cronbach's alpha was calculated to determine the internal consistency reliability of the tool. The final AIM tool had 30 items after the Delphi process. S-CVI was 0.98 with the S-CVI/Avg method and 0.86 by the S-CVI/UA method, suggesting good content validity. A cut-off value of < 0.9 I-CVI was taken as the criterion for item deletion. Cognitive pre-testing revealed good item interpretation. Cronbach's alpha calculated for the AIM was 0.9, whereas Cronbach's alpha for the four domains ranged from 0.67 to 0.80. 'AIM' is a relevant and useful instrument with good content validity and reliability of results, and may be used to evaluate teachers' perceptions of assessment quality.
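The content validity indices mentioned in this abstract can be sketched as follows. This is a minimal illustration under common conventions, not the authors' procedure: it assumes experts rate each item on a 4-point relevance scale and that ratings of 3 or 4 count as "relevant" (the abstract does not state the scale). The function names and the rating matrix are hypothetical.

```python
def item_cvi(ratings, relevant_min=3):
    """I-CVI: share of experts who rated the item as relevant."""
    return sum(1 for r in ratings if r >= relevant_min) / len(ratings)


def scale_cvi(rating_matrix, relevant_min=3):
    """Return (I-CVIs, S-CVI/Avg, S-CVI/UA) for a matrix of items x experts.

    S-CVI/Avg is the mean of the I-CVIs; S-CVI/UA is the share of items
    on which every expert agreed the item was relevant (I-CVI == 1).
    """
    i_cvis = [item_cvi(row, relevant_min) for row in rating_matrix]
    s_cvi_avg = sum(i_cvis) / len(i_cvis)
    s_cvi_ua = sum(1 for v in i_cvis if v == 1.0) / len(i_cvis)
    return i_cvis, s_cvi_avg, s_cvi_ua


# Made-up ratings: 4 items, 5 experts, 4-point relevance scale.
ratings = [
    [4, 4, 3, 4, 4],
    [4, 3, 4, 2, 4],  # one expert rates this item as not relevant
    [3, 4, 4, 4, 3],
    [4, 4, 4, 4, 4],
]
i_cvis, s_cvi_avg, s_cvi_ua = scale_cvi(ratings)
```

S-CVI/UA is the stricter of the two scale-level indices because a single dissenting expert removes an item from the universal-agreement count, which is consistent with the lower S-CVI/UA value the abstract reports.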
Memory for Self-Performed Actions in Individuals with Asperger Syndrome

PubMed Central

Zalla, Tiziana; Daprati, Elena; Sav, Anca-Maria; Chaste, Pauline; Nico, Daniele; Leboyer, Marion

2010-01-01

Memory for action is enhanced if individuals are allowed to perform the corresponding movements, compared to when they simply listen to them (enactment effect). Previous studies have shown that individuals with Autism Spectrum Disorders (ASD) have difficulties with processes involving the self, such as autobiographical memories and self-performed actions. The present study aimed at assessing memory for action in Asperger Syndrome (AS). We investigated whether adults with AS would benefit from the enactment effect when recalling a list of previously performed items vs. items that were only visually and verbally experienced through three experimental tasks (Free Recall, Old/New Recognition and Source Memory). The results showed that while performance on Recognition and Source Memory tasks was preserved in individuals with AS, the enactment effect for self-performed actions was not consistently present, as revealed by the lower number of performed actions being recalled on the Free Recall test, as compared to adults with typical development. Subtle difficulties in encoding specific motor and proprioceptive signals during action execution in individuals with AS might affect retrieval of relevant personal episodic information. These disturbances might be associated with an impaired action monitoring system. PMID:20967277

Memory for self-performed actions in individuals with Asperger syndrome.

PubMed

Zalla, Tiziana; Daprati, Elena; Sav, Anca-Maria; Chaste, Pauline; Nico, Daniele; Leboyer, Marion

2010-10-12

Memory for action is enhanced if individuals are allowed to perform the corresponding movements, compared to when they simply listen to them (enactment effect). Previous studies have shown that individuals with Autism Spectrum Disorders (ASD) have difficulties with processes involving the self, such as autobiographical memories and self-performed actions. The present study aimed at assessing memory for action in Asperger Syndrome (AS). We investigated whether adults with AS would benefit from the enactment effect when recalling a list of previously performed items vs. items that were only visually and verbally experienced through three experimental tasks (Free Recall, Old/New Recognition and Source Memory). The results showed that while performance on Recognition and Source Memory tasks was preserved in individuals with AS, the enactment effect for self-performed actions was not consistently present, as revealed by the lower number of performed actions being recalled on the Free Recall test, as compared to adults with typical development. Subtle difficulties in encoding specific motor and proprioceptive signals during action execution in individuals with AS might affect retrieval of relevant personal episodic information. These disturbances might be associated with an impaired action monitoring system.
The development and initial validation of a questionnaire to measure help-seeking behaviour in patients with new onset rheumatoid arthritis.

PubMed

Stack, Rebecca J; Mallen, Christian D; Deighton, Chris; Kiely, Patrick; Shaw, Karen L; Booth, Alison; Kumar, Kanta; Thomas, Susan; Rowan, Ian; Horne, Rob; Nightingale, Peter; Herron-Marx, Sandy; Jinks, Clare; Raza, Karim

2015-12-01

Early treatment for rheumatoid arthritis (RA) is vital. However, people often delay in seeking help at symptom onset. An assessment of the reasons behind patient delay is necessary to develop interventions to promote rapid consultation. Using a mixed methods design, we aimed to develop and test a questionnaire to assess the barriers to help seeking at RA onset. Questionnaire items were extracted from previous qualitative studies. Fifteen people with a lived experience of arthritis participated in focus groups to enhance the questionnaire's face validity. The questionnaire was also reviewed by groups of multidisciplinary health-care professionals. A test-retest survey of 41 patients with newly presenting RA or unclassified arthritis assessed the questionnaire items' intraclass correlations. During focus groups, participants rephrased questions, added questions and deleted items not relevant to the questionnaire's aims. Participants organized items into themes: early symptom experience, initial reactions to symptoms, self-management behaviours, causal beliefs, involvement of significant others, pre-diagnosis knowledge about RA, direct barriers to seeking help and relationship with GP. The test-retest survey identified seven items (out of 79) with low intraclass correlations which were removed from the final questionnaire. The involvement of people with a lived experience of arthritis and multidisciplinary health-care professionals in the preliminary validation of the DELAY (delays in evaluating arthritis early) questionnaire has enriched its development. Preliminary assessment established its reliability. The DELAY questionnaire provides a tool for researchers to evaluate individual, cultural and health service barriers to help-seeking behaviour at RA onset. © 2014 John Wiley & Sons Ltd.
Cross-Culture Validation of the HIV/AIDS Stress Scale: The Development of a Revised Chinese Version.

PubMed

Niu, Lu; Qiu, Yangyang; Luo, Dan; Chen, Xi; Wang, Min; Pakenham, Kenneth I; Zhang, Xixing; Huang, Zhulin; Xiao, Shuiyuan

2016-01-01

Being HIV-infected is a stressful experience for many individuals. A measure with satisfactory psychometric properties for assessing HIV-related stress in the Chinese context has yet to be developed. This study aimed to examine the psychometric characteristics of a simplified Chinese version of the HIV/AIDS Stress Scale (SS-HIV) among people living with HIV/AIDS in central China. A total of 667 people living with HIV (92% were male) were recruited from March 1st 2014 to August 31st 2015 by consecutive sampling. A standard questionnaire package containing the Chinese HIV/AIDS Stress Scale (CSS-HIV), the Chinese Patient Health Questionnaire-9 (PHQ-9), and the Chinese Generalized Anxiety Disorder Scale (GAD-7) was administered to all participants, and 38 of the participants were selected randomly to be re-tested four weeks after the initial testing. Our data supported that a revised 17-item CSS-HIV had adequate psychometric properties. It consisted of 3 factors: emotional stress (6 items), social stress (6 items) and instrumental stress (5 items). The overall Cronbach's α was 0.906, and the test-retest reliability coefficient was 0.832. The revised CSS-HIV was significantly correlated with the number of HIV-related symptoms, as well as scores on the PHQ-9 and GAD-7, indicating acceptable concurrent validity. The 17-item Chinese version of the SS-HIV has potential research and clinical utility in identifying important stressors among the Chinese HIV-infected population and in understanding the effects of stress on adjustment to HIV.
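Cronbach's α, reported in this abstract as the internal consistency estimate, follows a short standard formula: α = k/(k − 1) × (1 − Σ item variances / variance of total scores). The sketch below is a minimal illustration of that formula with made-up scores; the function name and data are hypothetical and unrelated to the study's sample.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    Uses sample variances for both the items and the total score.
    """
    k = len(scores[0])
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Made-up data: three respondents answer three items identically,
# so the items are perfectly consistent with one another.
perfectly_consistent = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
alpha_max = cronbach_alpha(perfectly_consistent)

# Made-up data with some disagreement between items.
noisy = [[1, 2, 1], [2, 2, 3], [3, 2, 3], [4, 3, 4]]
alpha_noisy = cronbach_alpha(noisy)
```

Alpha rises toward 1 as items covary more strongly relative to their individual variances, so the noisy data set yields a lower value than the perfectly consistent one.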
Comparison of trait and ability measures of emotional intelligence in medical students.

PubMed

Brannick, Michael T; Wahi, Monika M; Arce, Melissa; Johnson, Hazel-Anne; Nazian, Stanley; Goldin, Steven B

2009-11-01

Emotional intelligence (EI), the ability to perceive emotions in the self and others, and to understand, regulate and use such information in productive ways, is believed to be important in health care delivery for both recipients and providers of health care. There are two types of EI measure: ability and trait. Ability and trait measures differ in terms of both the definition of constructs and the methods of assessment. Ability measures conceive of EI as a capacity that spans the border between reason and feeling. Items on such a measure include showing a person a picture of a face and asking what emotion the pictured person is feeling; such items are scored by comparing the test-taker's response to a keyed emotion. Trait measures include a very large array of non-cognitive abilities related to success, such as self-control. Items on such measures ask individuals to rate themselves on such statements as: 'I generally know what other people are feeling.' Items are scored by giving higher scores to greater self-assessments. We compared one of each type of test with the other for evidence of reliability, convergence and overlap with personality. Year 1 and 2 medical students completed the Meyer-Salovey-Caruso Emotional Intelligence Test (MSCEIT, an ability measure), the Wong and Law Emotional Intelligence Scale (WLEIS, a trait measure) and an industry standard personality test (the Neuroticism-Extroversion-Openness [NEO] test). The MSCEIT showed problems with reliability. The MSCEIT and the WLEIS did not correlate highly with one another (overall scores correlated at 0.18). The WLEIS was more highly correlated with personality scales than the MSCEIT. Different tests that are supposed to measure EI do not measure the same thing. The ability measure was not correlated with personality, but the trait measure was correlated with personality.
Measurement Equivalence of the Patient Reported Outcomes Measurement Information System® (PROMIS®) Anxiety Short Forms in Ethnically Diverse Groups

PubMed Central

Teresi, Jeanne A.; Ocepek-Welikson, Katja; Kleinman, Marjorie; Ramirez, Mildred; Kim, Giyeon

2017-01-01

This is the first study of the measurement equivalence of the Patient Reported Outcomes Measurement Information System® (PROMIS®) Anxiety short forms in a large ethnically diverse sample. The psychometric properties and differential item functioning (DIF) were examined across different racial/ethnic, educational, age, gender and language groups. Methods: These data are from individuals selected from cancer registries in the United States. For the analyses of race/ethnicity the reference group was non-Hispanic Whites (n = 2,263), the studied groups were non-Hispanic Blacks (n = 1,117), Hispanics (n = 1,043) and Asians/Pacific Islanders (n = 907). Within the Hispanic subsample, there were 335 interviews conducted in Spanish and 703 in English. The 11 anxiety items were from the PROMIS emotional disturbance item bank. DIF hypotheses were generated by content experts who rated whether or not they expected DIF to be present, and the direction of the DIF with respect to several comparison groups. The primary method used for DIF detection was the Wald test for examination of group differences in item response theory (IRT) item parameters accompanied by magnitude measures. Expected item scores were examined as measures of magnitude. The method used for quantification of the difference in the average expected item scores was the non-compensatory DIF (NCDIF) index. DIF impact was examined using expected scale score functions. Additionally, precision and reliabilities were examined using several methods. Results: Although not hypothesized to show DIF for Asians/Pacific Islanders, every item evidenced DIF by at least one method. Two items showed DIF of higher magnitude for Asians/Pacific Islanders vs. Whites: “Many situations made me worry” and “I felt anxious”. However, the magnitude of DIF was small and the NCDIF statistics were not above threshold. The impact of DIF was negligible. For education, six items were identified with consistent DIF across methods: fearful, anxious, worried, hard to focus, uneasy and tense. However, the NCDIF was not above threshold and the impact of DIF on the scale was trivial. No items showed high magnitude DIF for gender. Two items showed slightly higher magnitude for age (although not above the cutoff): worried and fearful. The scale level impact was trivial. Only one item showed DIF with the Wald test after the Bonferroni correction for the language comparisons: “I felt fearful”. Two additional items were flagged in sensitivity analyses after Bonferroni correction: anxious and many situations made me worry. The latter item also showed DIF of higher magnitude, with an NCDIF value (0.144) above threshold. Individual impact was relatively small. Conclusions: Although many items from the PROMIS short form anxiety measures were flagged with DIF, item level magnitude was low and scale level DIF impact was minimal; however, three items (anxious, worried and many situations made me worry) might be singled out for further study. It is concluded that the PROMIS Anxiety short form evidenced good psychometric properties, was relatively invariant across the groups studied, and performed well among ethnically diverse subgroups of Blacks, Hispanic, White non-Hispanic and Asians/Pacific Islanders. In general, more research with the Asians/Pacific Islanders group is needed. Further study of subgroups within these broad categories is recommended. PMID:28649483
The Selection of Test Items for Decision Making with a Computer Adaptive Test.

ERIC Educational Resources Information Center

Spray, Judith A.; Reckase, Mark D.

The issue of test-item selection in support of decision making in adaptive testing is considered. The number of items needed to make a decision is compared for two approaches: selecting items from an item pool that are most informative at the decision point or selecting items that are most informative at the examinee's ability level. The first…
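The two selection rules contrasted in this abstract differ only in where item information is evaluated: at the decision point or at the examinee's current ability estimate. The sketch below illustrates that idea under assumptions of my own: a 2-PL model with item information I(θ) = a²P(1 − P), a hypothetical three-item pool, and hypothetical function names. The abstract does not specify the model or the pool.

```python
import math


def info_2pl(theta, a, b):
    """Fisher information of a 2-PL item (discrimination a, difficulty b) at theta."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)


def most_informative(pool, theta, used=()):
    """Index of the unused item in `pool` with maximum information at theta."""
    candidates = [i for i in range(len(pool)) if i not in used]
    return max(candidates, key=lambda i: info_2pl(theta, *pool[i]))


# Hypothetical pool of (a, b) pairs with equal discrimination.
pool = [(1.0, -1.0), (1.0, 0.0), (1.0, 1.0)]

decision_point = 0.0     # assumed cut score on the ability scale
ability_estimate = 1.0   # assumed current estimate for one examinee

pick_at_cut = most_informative(pool, decision_point)
pick_at_ability = most_informative(pool, ability_estimate)
```

With equal discriminations, the rule picks the item whose difficulty lies closest to the evaluation point, so the two approaches select different items whenever the examinee's estimate is far from the cut score.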
Modeling Collaborative Interaction Patterns in a Simulation-Based Task

ERIC Educational Resources Information Center

Andrews, Jessica J.; Kerr, Deirdre; Mislevy, Robert J.; von Davier, Alina; Hao, Jiangang; Liu, Lei

2017-01-01

Simulations and games offer interactive tasks that can elicit rich data, providing evidence of complex skills that are difficult to measure with more conventional items and tests. However, one notable challenge in using such technologies is making sense of the data generated in order to make claims about individuals or groups. This article…
Reduced Specificity of Hippocampal and Posterior Ventrolateral Prefrontal Activity during Relational Retrieval in Normal Aging

ERIC Educational Resources Information Center

Giovanello, Kelly S.; Schacter, Daniel L.

2012-01-01

Neuroimaging studies of episodic memory in young adults demonstrate greater functional neural activity in ventrolateral pFC and hippocampus during retrieval of relational information as compared with item information. We tested the hypothesis that healthy older adults--individuals who exhibit behavioral declines in relational memory--would show…
Technology and Assessment. In Brief: Fast Facts for Policy and Practice No. 5.

ERIC Educational Resources Information Center

Austin, James T.; Mahlman, Robert A.

The process of assessment in career and technical education (CTE) is changing significantly under the influence of forces such as emphasis on assessment for individual and program accountability; emphasis on the investigation of consequences of assessment; emergence of item response theory, which supports computer adaptive testing; and pressure…
Reliability and Validity Tests of Singelis's Self-Construal Scale (1994).

ERIC Educational Resources Information Center

Wang, Qi

Two studies focused on the reliability and validity of T.M. Singelis's 24-item Self-Construal Scale (SCS) (1994). In the first study, Cronbach alphas were calculated to assess the internal consistency of the reliability of the two subscales that were supposed to measure individuals' independent and interdependent self construals. The sample was…
Connotative Meaning of Disability Labels under Standard and Ambiguous Test Conditions.

ERIC Educational Resources Information Center

Semmel, Melvyn I.

At the George Peabody College for Teachers, Nashville, Tennessee, 50 male students responded to a questionnaire concerning their reactions to individuals having mental or physical disabilities, to persons of another race, and to gifted persons. The 20 questions (scale items) focused on association with 12 types of "disabled" persons (disability…
Development and psychometric evaluation of an information literacy self-efficacy survey and an information literacy knowledge test.

PubMed

Tepe, Rodger; Tepe, Chabha

2015-03-01

To develop and psychometrically evaluate an information literacy (IL) self-efficacy survey and an IL knowledge test. In this test-retest reliability study, a 25-item IL self-efficacy survey and a 50-item IL knowledge test were developed and administered to a convenience sample of 53 chiropractic students. Item analyses were performed on all questions. The IL self-efficacy survey demonstrated good reliability (test-retest correlation = 0.81) and good/very good internal consistency (mean κ = .56 and Cronbach's α = .92). A total of 25 questions with the best item analysis characteristics were chosen from the 50-item IL knowledge test, resulting in a 25-item IL knowledge test that demonstrated good reliability (test-retest correlation = 0.87), very good internal consistency (mean κ = .69, KR20 = 0.85), and good item discrimination (mean point-biserial = 0.48). This study resulted in the development of three instruments: a 25-item IL self-efficacy survey, a 50-item IL knowledge test, and a 25-item IL knowledge test. The information literacy self-efficacy survey and the 25-item version of the information literacy knowledge test have shown preliminary evidence of adequate reliability and validity to justify continuing study with these instruments.
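Two of the statistics reported for the knowledge test, KR-20 and the point-biserial item discrimination, can be computed directly from a matrix of right/wrong answers. The sketch below is a minimal illustration with made-up responses, not the authors' analysis. The function names are hypothetical, population variances are used throughout, and the point-biserial here is the uncorrected version (the item is included in the total score).

```python
from statistics import mean, pstdev, pvariance


def kr20(responses):
    """KR-20 reliability for dichotomous (0/1) responses, respondents x items."""
    n, k = len(responses), len(responses[0])
    totals = [sum(r) for r in responses]
    sum_pq = 0.0
    for j in range(k):
        p = sum(r[j] for r in responses) / n  # proportion answering item j correctly
        sum_pq += p * (1.0 - p)
    return (k / (k - 1)) * (1.0 - sum_pq / pvariance(totals))


def point_biserial(responses, j):
    """Correlation between item j (0/1) and the total score (item included)."""
    item = [r[j] for r in responses]
    totals = [sum(r) for r in responses]
    mi, mt = mean(item), mean(totals)
    cov = mean([(x - mi) * (t - mt) for x, t in zip(item, totals)])
    return cov / (pstdev(item) * pstdev(totals))


# Made-up responses: 4 respondents, 3 items of increasing difficulty.
responses = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
reliability = kr20(responses)
discrimination = point_biserial(responses, 1)
```

A corrected point-biserial would exclude the item from the total before correlating, which gives somewhat lower values on short tests.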
A New Item Selection Procedure for Mixed Item Type in Computerized Classification Testing.

ERIC Educational Resources Information Center

Lau, C. Allen; Wang, Tianyou

This paper proposes a new Information-Time index as the basis for item selection in computerized classification testing (CCT) and investigates how this new item selection algorithm can help improve test efficiency for item pools with mixed item types. It also investigates how practical constraints such as item exposure rate control, test…
An empirical comparison of knowledge and skill in the context of traditional ecological knowledge

PubMed Central

2013-01-01

Background: We test whether traditional ecological knowledge (TEK) about how to make an item predicts a person’s skill at making it among the Tsimane’ (Bolivia). The rationale for this research is that the failure to distinguish between knowledge and skill might account for some of the conflicting results about the relationships between TEK, human health, and economic development. Methods: We test the association between a commonly-used measure of individual knowledge (cultural consensus analysis) about how to make an arrow or a bag and a measure of individual skill at making these items, using ordinary least-squares regression. The study consists of 43 participants from 3 villages. Results: We find no association between our measures of knowledge and skill (core model, p > 0.5, R² = .132). Conclusions: While we cannot rule out the possibility of a real association between these phenomena, we interpret our findings as support for the claim that researchers should distinguish between methods to measure knowledge and skill when studying trends in TEK. PMID:24131733
Restricted Interests and Teacher Presentation of Items

ERIC Educational Resources Information Center

Stocco, Corey S.; Thompson, Rachel H.; Rodriguez, Nicole M.

2011-01-01

Restricted and repetitive behavior (RRB) is more pervasive, prevalent, frequent, and severe in individuals with autism spectrum disorders (ASDs) than in their typical peers. One subtype of RRB is restricted interests in items or activities, which is evident in the manner in which individuals engage with items (e.g., repetitious wheel spinning),…
Examining the Relationships among Item Recognition, Source Recognition, and Recall from an Individual Differences Perspective

ERIC Educational Resources Information Center

Unsworth, Nash; Brewer, Gene A.

2009-01-01

The authors of the current study examined the relationships among item-recognition, source-recognition, free recall, and other memory and cognitive ability tasks via an individual differences analysis. Two independent sources of variance contributed to item-recognition and source-recognition performance, and these two constructs related…
Ethical imperatives against item restriction in the Supplemental Nutrition Assistance Program.

PubMed

Chrisinger, Benjamin W

2017-07-01

The Supplemental Nutrition Assistance Program (SNAP, formerly known as food stamps) is the federal government's largest form of food assistance, and a frequent focus of political and scholarly debate. Previous discourse in the public health community and recent proposals in state legislatures have suggested limiting the use of SNAP benefits on unhealthy food items, such as sugar-sweetened beverages (SSBs). This paper identifies two possible underlying motivations for item restriction, health and morals, and analyzes the level of empirical support for claims about the current state of the program, as well as expectations about how item restriction would change participant outcomes. It also assesses how item restriction would reduce individual agency of low-income individuals, and identifies mechanisms by which this may adversely affect program participants. Finally, this paper offers alternative policies to promote healthier purchasing and eating among SNAP participants that can be pursued without reducing individual agency. Health advocates and officials must more fully weigh the attendant risks of implementing SNAP item restrictions, including the reduction of individual agency of a vulnerable population. Copyright © 2017 Elsevier Inc. All rights reserved.
A Process for Reviewing and Evaluating Generated Test Items

ERIC Educational Resources Information Center

Gierl, Mark J.; Lai, Hollis

2016-01-01

Testing organizations need large numbers of high-quality items due to the proliferation of alternative test administration methods and modern test designs. But the current demand for items far exceeds the supply. Test items, as they are currently written, evoke a process that is both time-consuming and expensive because each item is written,…
Bifactor and Item Response Theory Analyses of Interviewer Report Scales of Cognitive Impairment in Schizophrenia

PubMed Central

Reise, Steven P.; Ventura, Joseph; Keefe, Richard S. E.; Baade, Lyle E.; Gold, James M.; Green, Michael F.; Kern, Robert S.; Mesholam-Gately, Raquelle; Nuechterlein, Keith H.; Seidman, Larry J.; Bilder, Robert

2011-01-01

We conducted psychometric analyses of two interview-based measures of cognitive deficits: the 21-item Clinical Global Impression of Cognition in Schizophrenia (CGI-CogS; Ventura et al., 2008), and the 20-item Schizophrenia Cognition Rating Scale (SCoRS; Keefe et al., 2006), which were administered on two occasions to a sample of people with schizophrenia. Traditional psychometrics, bifactor analysis, and item response theory (IRT) methods were used to explore item functioning, dimensionality, and to compare instruments. Despite containing similar item content, responses to the CGI-CogS demonstrated superior psychometric properties (e.g., higher item-intercorrelations, better spread of ratings across response categories), relative to the SCoRS. We argue that these differences arise mainly from the differential use of prompts and how the items are phrased and scored. Bifactor analysis demonstrated that although both measures capture a broad range of cognitive functioning (e.g., working memory, social cognition), the common variance on each is overwhelmingly explained by a single general factor. IRT analyses of the combined pool of 41 items showed that measurement precision is peaked in the mild to moderate range of cognitive impairment. Finally, simulated adaptive testing revealed that only about 10 to 12 items are necessary to achieve latent trait level estimates with reasonably small standard errors for most individuals. This suggests that these interview-based measures of cognitive deficits could be shortened without loss of measurement precision. PMID:21381848
‘Forget me (not)?’ – Remembering Forget-Items Versus Un-Cued Items in Directed Forgetting

PubMed Central

Zwissler, Bastian; Schindler, Sebastian; Fischer, Helena; Plewnia, Christian; Kissler, Johanna M.

2015-01-01

Humans need to be able to selectively control their memories. This capability is often investigated in directed forgetting (DF) paradigms. In item-method DF, individual items are presented and each is followed by either a forget- or remember-instruction. On a surprise test of all items, memory is then worse for to-be-forgotten items (TBF) compared to to-be-remembered items (TBR). This is thought to result mainly from selective rehearsal of TBR, although inhibitory mechanisms also appear to be recruited by this paradigm. Here, we investigate whether the mnemonic consequences of a forget instruction differ from the ones of incidental encoding, where items are presented without a specific memory instruction. Four experiments were conducted where un-cued items (UI) were interspersed and recognition performance was compared between TBR, TBF, and UI stimuli. Accuracy was encouraged via a performance-dependent monetary bonus. Experiments varied the number of items and their presentation speed and used either letter-cues or symbolic cues. Across all experiments, including perceptually fully counterbalanced variants, memory accuracy for TBF was reduced compared to TBR, but better than for UI. Moreover, participants made consistently fewer false alarms and used a very conservative response criterion when responding to TBF stimuli. Thus, the F-cue results in active processing and reduces false alarm rate, but this does not impair recognition memory beyond an un-cued baseline condition, where only incidental encoding occurs. Theoretical implications of these findings are discussed. PMID:26635657
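False alarm rates and response criterion are usually separated from sensitivity with signal detection indices. The sketch below assumes the standard equal-variance Gaussian model (d' and criterion c); the abstract does not state which indices the authors computed, and the hit and false-alarm rates used here are hypothetical.

```python
from statistics import NormalDist

def dprime_and_criterion(hit_rate, false_alarm_rate):
    """Return (d', c) under the equal-variance Gaussian model.

    d' = z(H) - z(FA) measures sensitivity.
    c  = -0.5 * (z(H) + z(FA)); positive c means a conservative criterion.
    Rates must lie strictly between 0 and 1.
    """
    z = NormalDist().inv_cdf
    z_hit, z_fa = z(hit_rate), z(false_alarm_rate)
    return z_hit - z_fa, -0.5 * (z_hit + z_fa)

# Hypothetical rates: few false alarms push the criterion c above zero.
d_prime, criterion = dprime_and_criterion(0.60, 0.05)
print(round(d_prime, 3), round(criterion, 3))
```

A lower false-alarm rate at a fixed hit rate raises both d' and c, which is why the two indices are reported together when comparing conditions.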

Understanding Health-related Quality of Life in Caregivers of Civilians and Service Members/Veterans with Traumatic Brain Injury: Establishing the Reliability and Validity of PROMIS Mental Health Measures.

PubMed

Carlozzi, Noelle E; Hanks, Robin; Lange, Rael T; Brickell D Psych, Tracey A; Ianni, Phillip A; Miner, Jennifer A; French Psy D, Louis M; Kallen, Michael A; Sander, Angelle M

2018-06-19

OBJECTIVE: To provide important reliability and validity data to support the use of the PROMIS Mental Health measures in caregivers of civilians or service members/veterans with traumatic brain injury (TBI). DESIGN: Patient-reported outcomes surveys administered through an electronic data collection platform. SETTING: Three TBI Model Systems rehabilitation hospitals, an academic medical center, and a military medical treatment facility. PARTICIPANTS: 560 caregivers of individuals with a documented TBI (344 civilians and 216 military). INTERVENTION: Not applicable. MAIN OUTCOME MEASURES: PROMIS Anxiety, Depression, and Anger Item Banks. RESULTS: Internal consistency for all of the PROMIS Mental Health item banks was very good (all α > .86) and three-week test-retest reliability was good to adequate (ranged from .65 to .85). Convergent validity and discriminant validity of the PROMIS measures were also supported. Caregivers of individuals who were low functioning had worse emotional HRQOL (as measured by the three PROMIS measures) than caregivers of high-functioning individuals, supporting known-groups validity. Finally, levels of distress, as measured by the PROMIS measures, were elevated for those caring for low-functioning individuals in both samples (rates ranged from 26.2% to 43.6% for caregivers of low-functioning individuals). CONCLUSIONS: Results support the reliability and validity of the PROMIS Anxiety, Depression, and Anger item banks in caregivers of civilians and service members/veterans with TBI. Ultimately, these measures can be used to provide a standardized assessment of HRQOL as it relates to mental health in these caregivers. Copyright © 2018. Published by Elsevier Inc.
What's in a Topic? Exploring the Interaction between Test-Taker Age and Item Content in High-Stakes Testing

ERIC Educational Resources Information Center

Banerjee, Jayanti; Papageorgiou, Spiros

2016-01-01

The research reported in this article investigates differential item functioning (DIF) in a listening comprehension test. The study explores the relationship between test-taker age and the items' language domains across multiple test forms. The data comprise test-taker responses (N = 2,861) to a total of 133 unique items, 46 items of which were…
Free recall test experience potentiates strategy-driven effects of value on memory.

PubMed

Cohen, Michael S; Rissman, Jesse; Hovhannisyan, Mariam; Castel, Alan D; Knowlton, Barbara J

2017-10-01

People tend to show better memory for information that is deemed valuable or important. By one mechanism, individuals selectively engage deeper, semantic encoding strategies for high value items (Cohen, Rissman, Suthana, Castel, & Knowlton, 2014). By another mechanism, information paired with value or reward is automatically strengthened in memory via dopaminergic projections from midbrain to hippocampus (Shohamy & Adcock, 2010). We hypothesized that the latter mechanism would primarily enhance recollection-based memory, while the former mechanism would strengthen both recollection and familiarity. We also hypothesized that providing interspersed tests during study is a key to encouraging selective engagement of strategies. To test these hypotheses, we presented participants with sets of words, and each word was associated with a high or low point value. In some experiments, free recall tests were given after each list. In all experiments, a recognition test was administered 5 minutes after the final word list. Process dissociation was accomplished via remember/know judgments at recognition, a recall test probing both item memory and memory for a contextual detail (word plurality), and a task dissociation combining a recognition test for plurality (intended to probe recollection) with a speeded item recognition test (to probe familiarity). When recall tests were administered after study lists, high value strengthened both recollection and familiarity. When memory was not tested after each study list, but rather only at the end, value increased recollection but not familiarity. These dual process dissociations suggest that interspersed recall tests guide learners' use of metacognitive control to selectively apply effective encoding strategies. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Development and Validation of the Homeostasis Concept Inventory

PubMed Central

McFarland, Jenny L.; Price, Rebecca M.; Wenderoth, Mary Pat; Martinková, Patrícia; Cliff, William; Michael, Joel; Modell, Harold; Wright, Ann

2017-01-01

We present the Homeostasis Concept Inventory (HCI), a 20-item multiple-choice instrument that assesses how well undergraduates understand this critical physiological concept. We used an iterative process to develop a set of questions based on elements in the Homeostasis Concept Framework. This process involved faculty experts and undergraduate students from associate’s colleges, primarily undergraduate institutions, regional and research-intensive universities, and professional schools. Statistical results provided strong evidence for the validity and reliability of the HCI. We found that graduate students performed better than undergraduates, biology majors performed better than nonmajors, and students performed better after receiving instruction about homeostasis. We used differential item analysis to assess whether students from different genders, races/ethnicities, and English language status performed differently on individual items of the HCI. We found no evidence of differential item functioning, suggesting that the items do not incorporate cultural or gender biases that would impact students’ performance on the test. Instructors can use the HCI to guide their teaching and student learning of homeostasis, a core concept of physiology. PMID:28572177
Development of a rapid screening instrument for mild cognitive impairment and undiagnosed dementia.

PubMed

Steenland, N Kyle; Auman, Courtney M; Patel, Purvi M; Bartell, Scott M; Goldstein, Felicia C; Levey, Allan I; Lah, James J

2008-11-01

Mild cognitive impairment (MCI) often presages development of Alzheimer's disease (AD). We recently completed a cross-sectional study to test the hypothesis that a combination of a brief cognitive screening instrument (Mini-Cog) with a functional scale (Functional Activities Questionnaire; FAQ) would accurately identify individuals with MCI and undiagnosed dementia. The Mini-Cog consists of a clock drawing task and 3-item recall, and takes less than 5 minutes to administer. The FAQ is a 30-item questionnaire completed by an informant. In addition to the Mini-Cog and FAQ, a traditional cognitive test battery was administered, and two neurologists and a neuropsychologist determined a consensus diagnosis of Normal, MCI, or Dementia. A classification tree algorithm was used to pick optimal cutpoints, and, using these cutpoints, the combined Mini-Cog and FAQ (MC-FAQ) predicted the consensus diagnosis with an accuracy of 83% and a weighted kappa of 0.81. When the population was divided into Normal and Abnormal, the sensitivity, specificity and positive predictive value were 89%, 90%, and 95%, respectively. The MC-FAQ discriminates individuals with MCI from cognitively normal individuals and those with dementia, and its ease of administration makes it an attractive screening instrument to aid detection of cognitive impairment in the elderly.
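Sensitivity, specificity and positive predictive value all come from the same 2x2 table of screening result against consensus diagnosis. The sketch below shows the arithmetic. The counts are invented so that the output lands near the reported percentages; they are not the study's data, which the abstract does not give.

```python
def screening_metrics(tp, fp, fn, tn):
    """Return (sensitivity, specificity, positive predictive value).

    tp/fn: abnormal cases flagged / missed by the screen.
    tn/fp: normal cases passed / wrongly flagged by the screen.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)
    return sensitivity, specificity, ppv

# Hypothetical counts: 100 abnormal and 50 normal participants.
sens, spec, ppv = screening_metrics(tp=89, fp=5, fn=11, tn=45)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} PPV={ppv:.2f}")
```

Unlike sensitivity and specificity, the positive predictive value depends on how common the condition is in the sample, so it would differ in a population with a lower prevalence of impairment.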
Psychometric properties and feasibility of the Swedish version of the Philadelphia Geriatric Center Morale Scale.

PubMed

Niklasson, Johan; Conradsson, Mia; Hörnsten, Carl; Nyqvist, Fredrica; Padyab, Mojgan; Nygren, Björn; Olofsson, Birgitta; Lövheim, Hugo; Gustafson, Yngve

2015-11-01

Morale is related to psychological well-being and quality of life in older people. The Philadelphia Geriatric Center Morale Scale (PGCMS) is widely used to assess morale. The purpose of this study was to evaluate the psychometric properties and feasibility of the Swedish version of the 17-item PGCMS among very old people. The Umeå 85+/GERDA study included Swedish-speaking people aged 85, 90 and 95 years and older, from Sweden and Finland. Participants were interviewed in their own homes using a predefined set of questions. In the main sample, 493 individuals answered all 17 PGCMS items (aged 89.0 ± 4.3 years). Another 105 answered between 1 and 16 questions (aged 89.6 ± 4.4 years). A convenience sample was also collected, and 54 individuals answered all 17 PGCMS items twice (aged 84.7 ± 6.7 years). The same assessor restated the questions within 1 week. Cronbach's alpha was 0.74 among those who answered all 17 questions in the main sample. Confirmatory factor analysis was used to test the construct validity of the most widely used version of the PGCMS, with 17 items and three factors, and showed a generally good fit. Among those answering between 1 and 17 PGCMS questions, 92.6 % (554/598) answered 16 or 17. The convenience sample was used for intra-rater test-retesting, and the intraclass correlation coefficient (ICC) was 0.89. The least significant change between two assessments, with 95 % confidence interval, was 3.53 PGCMS points. The Swedish version of the PGCMS seems to have satisfactory psychometric properties and feasibility among very old people.
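Cronbach's alpha, the internal consistency figure quoted above, can be computed directly from an item-by-respondent score matrix. The following is a minimal sketch using only the Python standard library. The response matrix is hypothetical and far smaller than the 493 complete cases in the study.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(scores[0])
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical 0/1 responses: 4 respondents, 3 items.
responses = [
    [1, 0, 1],
    [0, 0, 0],
    [1, 1, 1],
    [1, 1, 0],
]
print(round(cronbach_alpha(responses), 3))
```

Only complete response vectors can be used this way, which matches the abstract's restriction of alpha to participants who answered all 17 questions.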
Item validity vs. item discrimination index: a redundancy?

NASA Astrophysics Data System (ADS)

Panjaitan, R. L.; Irawati, R.; Sujana, A.; Hanifah, N.; Djuanda, D.

2018-03-01

In the literature on evaluation and test analysis, it is common to find item validity and the item discrimination index (D) calculated separately, each with a different formula. Other sources state that the item discrimination index can be obtained by calculating the correlation between examinees' scores on a particular item and their scores on the overall test, which is the same concept as item validity. Some research reports, especially undergraduate theses, tend to include both item validity and the item discrimination index in the instrument analysis. These concepts may overlap, because both reflect how well the test measures examinees' ability. In this paper, example results of data processing on item validity and the item discrimination index are compared. The paper discusses whether one of the two can represent both, or whether it is better to present both calculations for simple test analysis, especially in undergraduate theses that include test analyses.
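The two quantities being compared can be put side by side in code. The sketch below assumes the common upper-lower definition of D (proportion correct in the top-scoring group minus the bottom-scoring group, with 27% groups) and takes item validity to be the item-total Pearson correlation. The abstract gives neither formula explicitly, so both definitions are assumptions, and the scores are hypothetical.

```python
from statistics import mean, pstdev

def discrimination_index(item, totals, fraction=0.27):
    """Upper-lower D: proportion correct in the top group minus the bottom group."""
    n = max(1, round(len(totals) * fraction))
    order = sorted(range(len(totals)), key=lambda i: totals[i])
    lower, upper = order[:n], order[-n:]
    return mean(item[i] for i in upper) - mean(item[i] for i in lower)

def item_total_correlation(item, totals):
    """Pearson (point-biserial) correlation between a 0/1 item and the total score."""
    mi, mt = mean(item), mean(totals)
    cov = mean((x - mi) * (y - mt) for x, y in zip(item, totals))
    return cov / (pstdev(item) * pstdev(totals))

# Hypothetical data: one item's 0/1 scores and total test scores for 10 examinees.
item = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]
totals = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

print(discrimination_index(item, totals))
print(round(item_total_correlation(item, totals), 3))
```

Both statistics rise when high scorers answer the item correctly more often than low scorers, but D uses only the extreme groups while the correlation uses every examinee, so they need not rank items identically.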
A Comparison of Three Types of Test Development Procedures Using Classical and Latent Trait Methods.

ERIC Educational Resources Information Center

Benson, Jeri; Wilson, Michael

Three methods of item selection were used to select sets of 38 items from a 50-item verbal analogies test and the resulting item sets were compared for internal consistency, standard errors of measurement, item difficulty, biserial item-test correlations, and relative efficiency. Three groups of 1,500 cases each were used for item selection. First…
Examining Differential Item Functions of Different Item Ordered Test Forms According to Item Difficulty Levels

ERIC Educational Resources Information Center

Çokluk, Ömay; Gül, Emrah; Dogan-Gül, Çilem

2016-01-01

The study aims to examine whether differential item function is displayed in three different test forms that have item orders of random and sequential versions (easy-to-hard and hard-to-easy), based on Classical Test Theory (CTT) and Item Response Theory (IRT) methods and bearing item difficulty levels in mind. In the correlational research, the…
Development of the functional vision questionnaire for children and young people with visual impairment: the FVQ_CYP.

PubMed

Tadić, Valerija; Cooper, Andrew; Cumberland, Phillippa; Lewando-Hundt, Gillian; Rahi, Jugnoo S

2013-12-01

To develop a novel age-appropriate measure of functional vision (FV) for self-reporting by visually impaired (VI) children and young people. Questionnaire development. A representative patient sample of VI children and young people aged 10 to 15 years, visual acuity of the logarithm of the minimum angle of resolution (logMAR) worse than 0.48, and a school-based (nonrandom) expert group sample of VI students aged 12 to 17 years. A total of 32 qualitative semistructured interviews supplemented by narrative feedback from 15 eligible VI children and young people were used to generate draft instrument items. Seventeen VI students were consulted individually on item relevance and comprehensibility, instrument instructions, format, and administration methods. The resulting draft instrument was piloted with 101 VI children and young people comprising a nationally representative sample, drawn from 21 hospitals in the United Kingdom. Initial item reduction was informed by presence of missing data and individual item response pattern. Exploratory factor analysis (FA) and parallel analysis (PA), and Rasch analysis (RA) were applied to test the instrument's psychometric properties. Psychometric indices and validity assessment of the Functional Vision Questionnaire for Children and Young People (FVQ_CYP). A total of 712 qualitative statements became a 56-item draft scale, capturing the level of difficulty in performing vision-dependent activities. After piloting, items were removed iteratively as follows: 11 for high percentage of missing data, 4 for skewness, and 1 for inadequate item infit and outfit values in RA, 3 having shown differential item functioning across age groups and 1 across gender in RA. The remaining 36 items showed item fit values within acceptable limits, good measurement precision and targeting, and ordered response categories. 
The reduced scale has a clear unidimensional structure, with all items having a high factor loading on the single factor in FA and PA. The summary scores correlated significantly with visual acuity. We have developed a novel, psychometrically robust self-report questionnaire for children and young people, the FVQ_CYP, that captures the functional impact of visual disability from their perspective. The 36-item, 4-point unidimensional scale has potential as a complementary adjunct to objective clinical assessments in routine pediatric ophthalmology practice and in research. Copyright © 2013 American Academy of Ophthalmology. Published by Elsevier Inc. All rights reserved.
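The item-reduction counts reported in this abstract can be checked with a short tally. The numbers below are taken from the abstract; only the dictionary labels are shorthand introduced here.

```python
draft_items = 56

# Items removed after piloting, by reason, as reported in the abstract.
removed = {
    "high percentage of missing data": 11,
    "skewness": 4,
    "inadequate infit/outfit in Rasch analysis": 1,
    "DIF across age groups": 3,
    "DIF across gender": 1,
}

remaining = draft_items - sum(removed.values())
print(remaining)  # 36, matching the final 36-item scale
```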
The Effects of Test Length and Sample Size on Item Parameters in Item Response Theory

ERIC Educational Resources Information Center

Sahin, Alper; Anil, Duygu

2017-01-01

This study investigates the effects of sample size and test length on item-parameter estimation in test development utilizing three unidimensional dichotomous models of item response theory (IRT). For this purpose, a real language test comprised of 50 items was administered to 6,288 students. Data from this test was used to obtain data sets of…
[Perceptions on item disclosure for the Korean medical licensing examination].

PubMed

Yang, Eunbae B

2015-09-01

This study analyzed the perceptions of medical students and faculty regarding disclosure of test items on the Korean medical licensing examination. I conducted a survey of medical students from medical colleges and professional medical schools nationwide. Responses were analyzed from 718 participants as well as 69 faculty members who participated in creating the medical licensing examination item sets. Data were analyzed using descriptive statistics and the chi-square test. It is important to maintain test quality and to keep the test items unavailable to the public. There are also concerns among students that disclosure of test items would prompt increasing difficulty of test items (48.3%). Further, few students found it desirable to disclose test items regardless of any considerations (28.5%). The professors, who had experience in designing the test items, also expressed their opposition to test item disclosure (60.9%). It is desirable not to disclose the test items of the Korean medical licensing examination to the public on the condition that students are provided with a sufficient amount of information regarding the examination. This is so that the exam can appropriately identify candidates with the required qualifications.
The Cambridge Face Memory Test: results for neurologically intact individuals and an investigation of its validity using inverted face stimuli and prosopagnosic participants.

PubMed

Duchaine, Brad; Nakayama, Ken

2006-01-01

The two standardized tests of face recognition that are widely used suffer from serious shortcomings [Duchaine, B. & Weidenfeld, A. (2003). An evaluation of two commonly used tests of unfamiliar face recognition. Neuropsychologia, 41, 713-720; Duchaine, B. & Nakayama, K. (2004). Developmental prosopagnosia and the Benton Facial Recognition Test. Neurology, 62, 1219-1220]. Images in the Warrington Recognition Memory for Faces test include substantial non-facial information, and the simultaneous presentation of faces in the Benton Facial Recognition Test allows feature matching. Here, we present results from a new test, the Cambridge Face Memory Test, which builds on the strengths of the previous tests. In the test, participants are introduced to six target faces, and then they are tested with forced choice items consisting of three faces, one of which is a target. For each target face, three test items contain views identical to those studied in the introduction, five present novel views, and four present novel views with noise. There are a total of 72 items, and 50 controls averaged 58. To determine whether the test requires the special mechanisms used to recognize upright faces, we conducted two experiments. We predicted that controls would perform much more poorly when the face images are inverted, and as predicted, inverted performance was much worse with a mean of 42. Next we assessed whether eight prosopagnosics would perform poorly on the upright version. The prosopagnosic mean was 37, and six prosopagnosics scored outside the normal range. In contrast, the Warrington test and the Benton test failed to classify a majority of the prosopagnosics as impaired. These results indicate that the new test effectively assesses face recognition across a wide range of abilities.
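The test's item count and the reported means are easier to compare when expressed against the chance level. The sketch below uses only figures from the abstract, plus the fact that a three-alternative forced choice has a guessing rate of one in three.

```python
targets = 6
items_per_target = 3 + 5 + 4          # identical views + novel views + novel views with noise
total_items = targets * items_per_target
chance_score = total_items / 3        # one target among three faces per item

# Mean scores reported in the abstract.
means = {"controls, upright": 58, "controls, inverted": 42, "prosopagnosics, upright": 37}

print(total_items, chance_score)      # 72 24.0
for group, score in means.items():
    print(f"{group}: {score}/{total_items} = {100 * score / total_items:.1f}% correct")
```

All three group means lie above the chance score of 24, so the inverted and prosopagnosic results reflect impaired rather than absent recognition.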
Predictive validity of the Work Ability Index and its individual items in the general population.

PubMed

Lundin, Andreas; Leijon, Ola; Vaez, Marjan; Hallgren, Mats; Torgén, Margareta

2017-06-01

This study assesses the predictive ability of the full Work Ability Index (WAI) as well as its individual items in the general population. The Work, Health and Retirement Study (WHRS) is a stratified random national sample of 25-75-year-olds living in Sweden in 2000 that received a postal questionnaire (n = 6637, response rate = 53%). Current and subsequent sickness absence was obtained from registers. The ability of the WAI to predict long-term sickness absence (LTSA; ⩾ 90 consecutive days) during a period of four years was analysed by logistic regression, from which the Area Under the Receiver Operating Characteristic curve (AUC) was computed. There were 313 incident LTSA cases among 1786 employed individuals. The full WAI had acceptable ability to predict LTSA during the 4-year follow-up (AUC = 0.79; 95% CI 0.76 to 0.82). Individual items were less stable in their predictive ability. However, three of the individual items (current work ability compared with lifetime best, estimated work impairment due to diseases, and number of diagnosed current diseases) had an AUC above 0.70. Excluding the WAI item on number of days on sickness absence did not result in an inferior predictive ability of the WAI. The full WAI has acceptable predictive validity, and is superior to its individual items. For public health surveys, three items may be suitable proxies of the full WAI: current work ability compared with lifetime best, estimated work impairment due to diseases, and number of current diseases diagnosed by a physician.
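The AUC reported here has a direct interpretation: the probability that a randomly chosen person with later long-term sickness absence received a higher predicted risk than a randomly chosen person without it. The sketch below computes it by pairwise comparison. The predicted risks are hypothetical, and the study derived its AUC from a logistic regression whose details the abstract does not give.

```python
def auc(scores_cases, scores_noncases):
    """AUC as the share of case/non-case pairs ranked correctly (ties count half)."""
    wins = 0.0
    for case in scores_cases:
        for noncase in scores_noncases:
            if case > noncase:
                wins += 1.0
            elif case == noncase:
                wins += 0.5
    return wins / (len(scores_cases) * len(scores_noncases))

# Hypothetical predicted risks of long-term sickness absence.
risk_cases = [0.9, 0.8, 0.4]
risk_noncases = [0.7, 0.3, 0.2, 0.4]
print(auc(risk_cases, risk_noncases))
```

An AUC of 0.5 corresponds to ranking at chance and 1.0 to perfect separation, which is the scale on which the 0.70 and 0.79 values above are read.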
Working memory and inhibitory control across the life span: Intrusion errors in the Reading Span Test.

PubMed

Robert, Christelle; Borella, Erika; Fagot, Delphine; Lecerf, Thierry; de Ribaupierre, Anik

2009-04-01

The aim of this study was to examine to what extent inhibitory control and working memory capacity are related across the life span. Intrusion errors committed by children and younger and older adults were investigated in two versions of the Reading Span Test. In Experiment 1, a mixed Reading Span Test with items of various list lengths was administered. Older adults and children recalled fewer correct words and produced more intrusions than did young adults. Also, age-related differences were found in the type of intrusions committed. In Experiment 2, an adaptive Reading Span Test was administered, in which the list length of items was adapted to each individual's working memory capacity. Age groups differed neither on correct recall nor on the rate of intrusions, but they differed on the type of intrusions. Altogether, these findings indicate that the availability of attentional resources influences the efficiency of inhibition across the life span.
Do people have insight into their face recognition abilities?

PubMed

Palermo, Romina; Rossion, Bruno; Rhodes, Gillian; Laguesse, Renaud; Tez, Tolga; Hall, Bronwyn; Albonico, Andrea; Malaspina, Manuela; Daini, Roberta; Irons, Jessica; Al-Janabi, Shahd; Taylor, Libby C; Rivolta, Davide; McKone, Elinor

2017-02-01

Diagnosis of developmental or congenital prosopagnosia (CP) involves self-report of everyday face recognition difficulties, which are corroborated with poor performance on behavioural tests. This approach requires accurate self-evaluation. We examine the extent to which typical adults have insight into their face recognition abilities across four experiments involving nearly 300 participants. The experiments used five tests of face recognition ability: two that tap into the ability to learn and recognize previously unfamiliar faces [the Cambridge Face Memory Test, CFMT; Duchaine, B., & Nakayama, K. (2006). The Cambridge Face Memory Test: Results for neurologically intact individuals and an investigation of its validity using inverted face stimuli and prosopagnosic participants. Neuropsychologia, 44(4), 576-585. doi:10.1016/j.neuropsychologia.2005.07.001; and a newly devised test based on the CFMT but where the study phases involve watching short movies rather than viewing static faces-the CFMT-Films] and three that tap face matching [Benton Facial Recognition Test, BFRT; Benton, A., Sivan, A., Hamsher, K., Varney, N., & Spreen, O. (1983). Contribution to neuropsychological assessment. New York: Oxford University Press; and two recently devised sequential face matching tests]. Self-reported ability was measured with the 15-item Kennerknecht et al. questionnaire [Kennerknecht, I., Ho, N. Y., & Wong, V. C. (2008). Prevalence of hereditary prosopagnosia (HPA) in Hong Kong Chinese population. American Journal of Medical Genetics Part A, 146A(22), 2863-2870. doi:10.1002/ajmg.a.32552]; two single-item questions assessing face recognition ability; and a new 77-item meta-cognition questionnaire. Overall, we find that adults with typical face recognition abilities have only modest insight into their ability to recognize faces on behavioural tests. 
In a fifth experiment, we assess self-reported face recognition ability in people with CP and find that some people who expect to perform poorly on behavioural tests of face recognition do indeed perform poorly. However, it is not yet clear whether individuals within this group of poor performers have greater levels of insight (i.e., into their degree of impairment) than those with more typical levels of performance.
Research applications for an Object and Action Naming Battery to assess naming skills in adult Spanish-English bilingual speakers.

PubMed

Edmonds, Lisa A; Donovan, Neila J

2014-06-01

Virtually no valid materials are available to evaluate confrontation naming in Spanish-English bilingual adults in the U.S. In a recent study, a large group of young Spanish-English bilingual adults were evaluated on An Object and Action Naming Battery (Edmonds & Donovan in Journal of Speech, Language, and Hearing Research 55:359-381, 2012). Rasch analyses of the responses resulted in evidence for the content and construct validity of the retained items. However, the scope of that study did not allow for extensive examination of individual item characteristics, group analyses of participants, or the provision of testing and scoring materials or raw data, thereby limiting the ability of researchers to administer the test to Spanish-English bilinguals and to score the items with confidence. In this study, we present the in-depth information described above on the basis of further analyses, including (1) online searchable spreadsheets with extensive empirical (e.g., accuracy and name agreeability) and psycholinguistic item statistics; (2) answer sheets and instructions for scoring and interpreting the responses to the Rasch items; (3) tables of alternative correct responses for English and Spanish; (4) ability strata determined for all naming conditions (English and Spanish nouns and verbs); and (5) comparisons of accuracy across proficiency groups (i.e., Spanish dominant, English dominant, and balanced). These data indicate that the Rasch items from An Object and Action Naming Battery are valid and sensitive for the evaluation of naming in young Spanish-English bilingual adults. Additional information based on participant responses for all of the items on the battery can provide researchers with valuable information to aid in stimulus development and response interpretation for experimental studies in this population.
A Review of Classical Methods of Item Analysis.

ERIC Educational Resources Information Center

French, Christine L.

Item analysis is a very important consideration in the test development process. It is a statistical procedure to analyze test items that combines methods used to evaluate the important characteristics of test items, such as difficulty, discrimination, and distractibility of the items in a test. This paper reviews some of the classical methods for…
Modeling Item-Position Effects within an IRT Framework

ERIC Educational Resources Information Center

Debeer, Dries; Janssen, Rianne

2013-01-01

Changing the order of items between alternate test forms to prevent copying and to enhance test security is a common practice in achievement testing. However, these changes in item order may affect item and test characteristics. Several procedures have been proposed for studying these item-order effects. The present study explores the use of…
ACER Chemistry Test Item Collection. ACER Chemtic Year 12.

ERIC Educational Resources Information Center

Australian Council for Educational Research, Hawthorn.

The chemistry test item bank contains 225 multiple-choice questions suitable for diagnostic and achievement testing; a three-page teacher's guide; answer key with item facilities; an answer sheet; and a 45-item sample achievement test. Although written for the new grade 12 chemistry course in Victoria, Australia, the items are widely applicable.…

Kindergarten Predictors of Math Learning Disability

PubMed Central

Mazzocco, Michèle M. M.; Thompson, Richard E.

2009-01-01

The aim of the present study was to address how to effectively predict mathematics learning disability (MLD). Specifically, we addressed whether cognitive data obtained during kindergarten can effectively predict which children will have MLD in third grade, whether an abbreviated test battery could be as effective as a standard psychoeducational assessment at predicting MLD, and whether the abbreviated battery corresponded to the literature on MLD characteristics. Participants were 226 children who enrolled in a 4-year prospective longitudinal study during kindergarten. We administered measures of mathematics achievement, formal and informal mathematics ability, visual-spatial reasoning, and rapid automatized naming and examined which test scores and test items from kindergarten best predicted MLD at grades 2 and 3. Statistical models using standardized scores from the entire test battery correctly classified ~80–83 percent of the participants as having, or not having, MLD. Regression models using scores from only individual test items were less predictive than models containing the standard scores, except for models using a specific subset of test items that dealt with reading numerals, number constancy, magnitude judgments of one-digit numbers, or mental addition of one-digit numbers. These models were as accurate in predicting MLD as was the model including the entire set of standard scores from the battery of tests examined. Our findings indicate that it is possible to effectively predict which kindergartners are at risk for MLD, and thus the findings have implications for early screening of MLD. PMID:20084182
Using the Patient Health Questionnaire-9 to measure depression among racially and ethnically diverse primary care patients.

PubMed

Huang, Frederick Y; Chung, Henry; Kroenke, Kurt; Delucchi, Kevin L; Spitzer, Robert L

2006-06-01

The Patient Health Questionnaire depression scale (PHQ-9) is a well-validated, Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (DSM-IV) criterion-based measure for diagnosing depression, assessing severity and monitoring treatment response. The performance of most depression scales including the PHQ-9, however, has not been rigorously evaluated in different racial/ethnic populations. Therefore, we compared the factor structure of the PHQ-9 between different racial/ethnic groups as well as the rates of endorsement and differential item functioning (DIF) of the 9 items of the PHQ-9. The presence of DIF would indicate that responses to an individual item differ significantly between groups, controlling for the level of depression. A combined dataset from 2 separate studies of 5,053 primary care patients including non-Hispanic white (n=2,520), African American (n=598), Chinese American (n=941), and Latino (n=974) patients was used for our analysis. Exploratory principal components factor analysis was used to derive the factor structure of the PHQ-9 in each of the 4 racial/ethnic groups. A generalized Mantel-Haenszel statistic was used to test for DIF. One main factor that included all PHQ-9 items was found in each racial/ethnic group with alpha coefficients ranging from 0.79 to 0.89. Although endorsement rates of individual items were generally similar among the 4 groups, evidence of DIF was found for some items. Our analyses indicate that in African American, Chinese American, Latino, and non-Hispanic white patient groups the PHQ-9 measures a common concept of depression and can be effective for the detection and monitoring of depression in these diverse populations.
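The idea of testing for DIF while "controlling for the level of depression" can be illustrated with a minimal sketch. It computes the ordinary Mantel-Haenszel common odds ratio for a single dichotomized item across matched score strata. This is the simple two-group form, not the generalized statistic the study used, and the function name and all counts are hypothetical.

```python
def mantel_haenszel_odds_ratio(strata):
    """Common odds ratio for one item across matched score strata.

    Each stratum is a tuple (a, b, c, d) of counts at one total-score level:
      a = reference group endorsing the item, b = reference group not endorsing,
      c = focal group endorsing the item,     d = focal group not endorsing.
    A value near 1.0 suggests no DIF for this item.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # skip empty score levels
        numerator += a * d / n
        denominator += b * c / n
    if denominator == 0:
        raise ValueError("odds ratio undefined: no discordant counts")
    return numerator / denominator


# Hypothetical counts: both groups endorse at the same rate within each stratum.
no_dif = [(10, 10, 10, 10), (20, 5, 20, 5)]
# Hypothetical counts: the reference group endorses far more often at equal severity.
with_dif = [(30, 10, 10, 30)]

print(mantel_haenszel_odds_ratio(no_dif))
print(mantel_haenszel_odds_ratio(with_dif))
```

Stratifying on the total score is what separates DIF from a plain difference in endorsement rates: groups are compared only at matched severity levels.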
Decision analysis for a data collection system of patient-controlled analgesia with a multi-attribute utility model.

PubMed

Lee, I-Jung; Huang, Shih-Yu; Tsou, Mei-Yung; Chan, Kwok-Hon; Chang, Kuang-Yi

2010-10-01

Data collection systems are very important for the practice of patient-controlled analgesia (PCA). This study aimed to evaluate 3 PCA data collection systems and to select the most favorable system with the aid of multiattribute utility (MAU) theory. We developed a questionnaire with 10 items to evaluate the PCA data collection system and 1 item for overall satisfaction based on MAU theory. Three systems were compared in the questionnaire, including a paper record, optic card reader and personal digital assistant (PDA). A pilot study demonstrated a good internal and test-retest reliability of the questionnaire. A weighted utility score combining the relative importance of individual items assigned by each participant and their responses to each question was calculated for each system. Sensitivity analyses with distinct weighting protocols were conducted to evaluate the stability of the final results. Thirty potential users of a PCA data collection system were recruited in the study. The item "easy to use" had the highest median rank and received the heaviest mean weight among all items. MAU analysis showed that the PDA system had a higher utility score than the other 2 systems. Sensitivity analyses revealed that both inverse and reciprocal weighting processes favored the PDA system. High correlations between overall satisfaction and MAU scores from miscellaneous weighting protocols suggested a good predictive validity of our MAU-based questionnaire. The PDA system was selected as the most favorable PCA data collection system by the MAU analysis. The item "easy to use" was the most important attribute of the PCA data collection system. MAU theory can evaluate alternatives by taking into account individual preferences of stakeholders and aid in better decision-making. Copyright © 2010 Elsevier. Published by Elsevier B.V. All rights reserved.
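A weighted utility score of the kind described can be sketched as a weight-normalized sum of item ratings. The abstract does not give the exact formula, so the normalization, the item names other than "easy to use", and all numbers below are assumptions for illustration only.

```python
def weighted_utility(weights, ratings):
    """Weight-normalized sum of item ratings for one participant and one system.

    weights: dict mapping item name -> importance weight from the participant.
    ratings: dict mapping item name -> the participant's rating of the system.
    Dividing by the total weight is an assumption, not taken from the study.
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(weights[item] * ratings[item] for item in weights) / total_weight


# Hypothetical participant who values ease of use three times as much as cost.
weights = {"easy to use": 3, "cost": 1}
pda_ratings = {"easy to use": 5, "cost": 3}
paper_ratings = {"easy to use": 2, "cost": 5}

print(weighted_utility(weights, pda_ratings))    # 4.5
print(weighted_utility(weights, paper_ratings))  # 2.75
```

Re-running the same comparison under different weighting rules is the essence of the sensitivity analysis mentioned in the abstract.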
Music lessons are associated with increased verbal memory in individuals with Williams syndrome.

PubMed

Dunning, Brittany A; Martens, Marilee A; Jungers, Melissa K

2014-11-16

Williams syndrome (WS) is a genetic disorder characterized by intellectual delay and an affinity for music. It has been previously shown that familiar music can enhance verbal memory in individuals with WS who have had music training. There is also evidence that unfamiliar, or novel, music may also improve cognitive recall. This study was designed to examine if a novel melody could also enhance verbal memory in individuals with WS, and to more fully characterize music training in this population. We presented spoken or sung sentences that described an animal and its group name to 44 individuals with WS, and then tested their immediate and delayed memory using both recall and multiple choice formats. Those with formal music training (average duration of training 4½ years) scored significantly higher on both the spoken and sung recall items, as well as on the spoken multiple choice items, than those with no music training. Music therapy, music enjoyment, age, and Verbal IQ did not impact performance on the memory tasks. These findings provide further evidence that formal music lessons may impact the neurological pathways associated with verbal memory in individuals with WS, consistent with findings in typically developing individuals. Copyright © 2014 Elsevier Ltd. All rights reserved.
How social interactions affect emotional memory accuracy: Evidence from collaborative retrieval and social contagion paradigms.

PubMed

Kensinger, Elizabeth A; Choi, Hae-Yoon; Murray, Brendan D; Rajaram, Suparna

2016-07-01

In daily life, emotional events are often discussed with others. The influence of these social interactions on the veracity of emotional memories has rarely been investigated. The authors (Choi, Kensinger, & Rajaram, Memory and Cognition, 41, 403-415, 2013) previously demonstrated that when the categorical relatedness of information is controlled, emotional items are more accurately remembered than neutral items. The present study examined whether emotion would continue to improve the accuracy of memory when individuals discussed the emotional and neutral events with others. Two different paradigms involving social influences were used to investigate this question and compare evidence. In both paradigms, participants studied stimuli that were grouped into conceptual categories of positive (e.g., celebration), negative (e.g., funeral), or neutral (e.g., astronomy) valence. After a 48-hour delay, recognition memory was tested for studied items and categorically related lures. In the first paradigm, recognition accuracy was compared when memory was tested individually or in a collaborative triad. In the second paradigm, recognition accuracy was compared when a prior retrieval session had occurred individually or with a confederate who supplied categorically related lures. In both of these paradigms, emotional stimuli were remembered more accurately than were neutral stimuli, and this pattern was preserved when social interaction occurred. In fact, in the first paradigm, there was a trend for collaboration to increase the beneficial effect of emotion on memory accuracy, and in the second paradigm, emotional lures were significantly less susceptible to the "social contagion" effect. Together, these results demonstrate that emotional memories can be more accurate than nonemotional ones even when events are discussed with others (Experiment 1) and even when that discussion introduces misinformation (Experiment 2).
Item response theory analysis of the Amyotrophic Lateral Sclerosis Functional Rating Scale-Revised in the Pooled Resource Open-Access ALS Clinical Trials Database.

PubMed

Bacci, Elizabeth D; Staniewska, Dorota; Coyne, Karin S; Boyer, Stacey; White, Leigh Ann; Zach, Neta; Cedarbaum, Jesse M

2016-01-01

Our objective was to examine dimensionality and item-level performance of the Amyotrophic Lateral Sclerosis Functional Rating Scale-Revised (ALSFRS-R) across time using classical and modern test theory approaches. Confirmatory factor analysis (CFA) and Item Response Theory (IRT) analyses were conducted using data from patients with amyotrophic lateral sclerosis (ALS) in the Pooled Resources Open-Access ALS Clinical Trials (PRO-ACT) database with complete ALSFRS-R data (n = 888) at three time-points (Time 0, Time 1 (6-months), Time 2 (1-year)). Results demonstrated that in this population of 888 patients, mean age was 54.6 years, 64.4% were male, and 93.7% were Caucasian. The CFA supported a 4 individual-domain structure (bulbar, gross motor, fine motor, and respiratory domains). IRT analysis within each domain revealed misfitting items and overlapping item response category thresholds at all time-points, particularly in the gross motor and respiratory domain items. Results indicate that many of the items of the ALSFRS-R may sub-optimally distinguish among varying levels of disability assessed by each domain, particularly in patients with less severe disability. Measure performance improved across time as patient disability severity increased. In conclusion, modifications to select ALSFRS-R items may improve the instrument's specificity to disability level and sensitivity to treatment effects.
Self-assessed efficacy of a clinical musculoskeletal anatomy workshop: A preliminary survey.

PubMed

Saavedra, Miguel Ángel; Navarro-Zarza, José E; Alvarez-Nemegyei, José; Canoso, Juan J; Kalish, Robert A; Villaseñor-Ovies, Pablo; Hernández-Díaz, Cristina

2015-01-01

To survey the efficacy of a practical workshop on clinical musculoskeletal anatomy held in five American countries. A self-assessment competence questionnaire was sent to participants 1-3 months after the workshop. Results were compared to the results of a practical, instructor-assessed, pre-workshop test. The response rate of participants was 76.4%. The overall, self-assessed competence score for anatomical items that had been included in the pre-test was 76.9 (scale 0-100) as compared to an overall score of 48.1 in the practical, pre-workshop test (p<0.001). For items that were addressed in the workshop, but not included in the pre-test, self-assessed competence was rated at 62.9. Differences in anatomical knowledge between individuals from different countries and professional groups noted in the practical pre-test were no longer present in the post-test self-assessment. From these preliminary data and supporting evidence from the literature we believe that our anatomy workshop provides an effective didactic tool for increasing competence in musculoskeletal anatomy. Copyright © 2014 Elsevier España, S.L.U. All rights reserved.
Translation Fidelity of Psychological Scales: An Item Response Theory Analysis of an Individualism-Collectivism Scale.

ERIC Educational Resources Information Center

Bontempo, Robert

1993-01-01

Describes a method for assessing the quality of translations based on item response theory (IRT). Results from the IRT technique with French and Chinese versions of a scale measuring individualism-collectivism for samples of 250 U.S., 357 French, and 290 Chinese undergraduates show how several biased items are detected. (SLD)
Assessment of Work Performance (AWP)--development of an instrument.

PubMed

Sandqvist, Jan L; Törnquist, Kristina B; Henriksson, Chris M

2006-01-01

Adequate work assessments are a matter of importance both for individuals and society [5,29,31,38,40,46,52]. However, there is a lack of adequate and reliable instruments for use in work rehabilitation [14,15,20,21,31,44]. The purpose of this study was to develop and evaluate an observation instrument for assessing work performance, the AWP (Assessment of Work Performance). The purpose of the 14-item instrument is to assess the individual's observable working skills in three different areas: motor skills, process skills, and communication and interaction skills. This article describes the development and results of preliminary testing of the AWP. The testing indicates a satisfactory face validity and utility for the AWP and supports further research and testing of the instrument.
Rasch analysis of the Italian Lower Extremity Functional Scale: insights on dimensionality and suggestions for an improved 15-item version.

PubMed

Bravini, Elisabetta; Giordano, Andrea; Sartorio, Francesco; Ferriero, Giorgio; Vercelli, Stefano

2017-04-01

To investigate dimensionality and the measurement properties of the Italian Lower Extremity Functional Scale using both classical test theory and Rasch analysis methods, and to provide insights for an improved version of the questionnaire. Rasch analysis of individual patient data. Rehabilitation centre. A total of 135 patients with musculoskeletal diseases of the lower limb. Patients were assessed with the Lower Extremity Functional Scale before and after the rehabilitation. Rasch analysis showed some problems related to rating scale category functioning, items fit, and items redundancy. After an iterative process, which resulted in the reduction of rating scale categories from 5 to 4, and in the deletion of 5 items, the psychometric properties of the Italian Lower Extremity Functional Scale improved. The retained 15 items with a 4-level response format fitted the Rasch model (internal construct validity), and demonstrated unidimensionality and good reliability indices (person-separation reliability 0.92; Cronbach's alpha 0.94). Then, the analysis showed differential item functioning for six of the retained items. The sensitivity to change of the Italian 15-item Lower Extremity Functional Scale was nearly equal to the one of the original version (effect size: 0.93 and 0.98; standardized response mean: 1.20 and 1.28, respectively for the 15-item and 20-item versions). The Italian Lower Extremity Functional Scale had unsatisfactory measurement properties. However, removing five items and simplifying the scoring from 5 to 4 levels resulted in a more valid measure with good reliability and sensitivity to change.
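The two sensitivity-to-change indices reported above can be sketched with their commonly used definitions: effect size as mean change divided by the baseline standard deviation, and standardized response mean as mean change divided by the standard deviation of the change scores. The abstract does not state which formulas the authors applied, so these definitions and the scores below are assumptions.

```python
from statistics import mean, stdev


def effect_size(pre, post):
    """Mean change divided by the SD of the baseline scores (assumed definition)."""
    change = [after - before for before, after in zip(pre, post)]
    return mean(change) / stdev(pre)


def standardized_response_mean(pre, post):
    """Mean change divided by the SD of the change scores (assumed definition)."""
    change = [after - before for before, after in zip(pre, post)]
    return mean(change) / stdev(change)


# Hypothetical scores for three patients before and after rehabilitation.
pre_scores = [10, 20, 30]
post_scores = [20, 30, 46]

print(effect_size(pre_scores, post_scores))
print(standardized_response_mean(pre_scores, post_scores))
```

Both indices divide the same mean change by a different spread, which is why the two values reported for a given version of a scale can differ.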
Science Library of Test Items. Volume Nineteen. A Collection of Multiple Choice Test Items Relating Mainly to Geology.

ERIC Educational Resources Information Center

New South Wales Dept. of Education, Sydney (Australia).

As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items are made available to teachers for the construction of unit tests or term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The test items meet syllabus…
Science Library of Test Items. Volume Seventeen. A Collection of Multiple Choice Test Items Relating Mainly to Biology.

ERIC Educational Resources Information Center

New South Wales Dept. of Education, Sydney (Australia).

As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items are made available to teachers for the construction of unit tests or term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The test items meet syllabus…
Science Library of Test Items. Volume Eighteen. A Collection of Multiple Choice Test Items Relating Mainly to Chemistry.

ERIC Educational Resources Information Center

New South Wales Dept. of Education, Sydney (Australia).

As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items are made available to teachers for the construction of unit tests or term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The test items meet syllabus…
When what we need influences what we see: choice of energetic replenishment is linked with perceived steepness.

PubMed

Taylor-Covill, Guy A H; Eves, Frank F

2014-06-01

The apparent steepness of the locomotor challenge presented by hills and staircases is overestimated in explicit awareness. Experimental evidence suggests the visual system may rescale our conscious experience of steepness in line with available energy resources. Skeptics of this "embodied" view argue that such findings reflect experimental demand. This article tested whether perceived steepness was related to resource choices in the built environment. Travelers in a station estimated the slant angle of a 6.45 m staircase (23.4°) either before (N = 302) or after (N = 109) choosing from a selection of consumable items containing differing levels of energetic resources. Participants unknowingly allocated themselves to a quasi-experimental group based on the energetic resources provided by the item they chose. Consistent with a resource based model, individuals that chose items with a greater energy density, or more rapidly available energy, estimated the staircase as steeper than those opting for items that provided less energetic resources. PsycINFO Database Record (c) 2014 APA, all rights reserved.
Age-Related Differences in Recognition Memory for Items and Associations: Contribution of Individual Differences in Working Memory and Metamemory

PubMed Central

Bender, Andrew R.; Raz, Naftali

2012-01-01

Ability to form new associations between unrelated items is particularly sensitive to aging, but the reasons for such differential vulnerability are unclear. In this study, we examined the role of objective and subjective factors (working memory and beliefs about memory strategies) on differential relations of age with recognition of items and associations. Healthy adults (N = 100, age 21 to 79) studied word pairs, completed item and association recognition tests, and rated the effectiveness of shallow (e.g., repetition) and deep (e.g., imagery or sentence generation) encoding strategies. Advanced age was associated with reduced working memory (WM) capacity and poorer associative recognition. In addition, reduced WM capacity, beliefs in the utility of ineffective encoding strategies, and lack of endorsement of effective ones were independently associated with impaired associative memory. Thus, maladaptive beliefs about memory in conjunction with reduced cognitive resources account in part for differences in associative memory commonly attributed to aging. PMID:22251381
Failure of self-consistency in the discrete resource model of visual working memory.

PubMed

Bays, Paul M

2018-06-03

The discrete resource model of working memory proposes that each individual has a fixed upper limit on the number of items they can store at one time, due to division of memory into a few independent "slots". According to this model, responses on short-term memory tasks consist of a mixture of noisy recall (when the tested item is in memory) and random guessing (when the item is not in memory). This provides two opportunities to estimate capacity for each observer: first, based on their frequency of random guesses, and second, based on the set size at which the variability of stored items reaches a plateau. The discrete resource model makes the simple prediction that these two estimates will coincide. Data from eight published visual working memory experiments provide strong evidence against such a correspondence. These results present a challenge for discrete models of working memory that impose a fixed capacity limit. Copyright © 2018 The Author. Published by Elsevier Inc. All rights reserved.
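The first of the two capacity estimates can be illustrated with the usual slot-model relation between guessing and capacity: if K of N items are stored, the tested item is in memory with probability min(1, K/N), so the guess rate is 1 - min(1, K/N). The abstract does not spell out the estimator, so the following is a sketch under that assumption, with hypothetical function names and values.

```python
def expected_guess_rate(capacity, set_size):
    """Guess rate predicted by a fixed-capacity slot model (assumed relation)."""
    if set_size <= 0:
        raise ValueError("set_size must be positive")
    return max(0.0, 1.0 - capacity / set_size)


def capacity_from_guess_rate(set_size, guess_rate):
    """Invert the relation above; only informative when guess_rate > 0."""
    if not 0.0 <= guess_rate <= 1.0:
        raise ValueError("guess_rate must lie in [0, 1]")
    return set_size * (1.0 - guess_rate)


# Hypothetical observer who guesses on half the trials with 8 items on display.
print(capacity_from_guess_rate(8, 0.5))  # 4.0
print(expected_guess_rate(4, 8))         # 0.5
print(expected_guess_rate(4, 2))         # 0.0, all items fit in memory
```

The second estimate in the abstract, the set size at which recall variability plateaus, requires fitting response distributions and is not sketched here.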
Development and testing of the KERNset: an instrument to assess the quality of telephone triage in out-of-hours primary care services.

PubMed

Smits, Marleen; Keizer, Ellen; Ram, Paul; Giesen, Paul

2017-12-02

Telephone triage is a core but vulnerable part of the care process at out-of-hours general practitioner (GP) cooperatives. In the Netherlands, different instruments have been used for assessing the quality of telephone triage. These instruments focussed mainly on communicational aspects, and less on the medical quality of triage decisions. Our aim was to develop and test a minimum set of items to assess the quality of telephone triage. A national survey among all GP cooperatives in the Netherlands was performed to examine the most important aspects of telephone triage. Next, corresponding items from existing instruments were searched on these topics. Subsequently, an expert panel judged these items on importance, completeness and formulation. The concept KERNset consisted of 24 items about the telephone conversation: 13 medical, ten communicational and one regarding both types. It was pilot tested on measurement characteristics, reliability, validity and variation between triagists. In this pilot study, 114 anonymous calls from four GP cooperatives spread across the Netherlands were judged by three out of eight raters, both internal and external raters. Cronbach's alpha was .94 for the medical items and .75 for the communicational items. Inter-rater reliability: complete agreement between the external raters was 45% and reasonable agreement 73% (difference of maximally one point on the five-point scale). Intra-rater reliability: complete agreement within raters was 55% and reasonable agreement 84%. There were hardly any differences between internal and external raters, but there were differences in strictness between individual raters. The construct validity was confirmed by the high correlation between the general impression of the call and the items of the KERNset. Of the differences within items 19% could be explained by differences between triage nurses, which means the KERNset is able to demonstrate differences between triage nurses. 
The KERNset can be used to assess the quality of telephone triage. The validity is good and differences between calls and between triage nurses can be measured. A more intensive training for raters could improve the reliability.
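The two agreement figures reported above, complete agreement and "reasonable" agreement within one point on the five-point scale, can be computed as in the following minimal sketch. The function name and the ratings are hypothetical and are not taken from the study.

```python
def agreement_rates(ratings_a, ratings_b, tolerance=1):
    """Return (complete, reasonable) agreement as proportions.

    complete:   both raters gave exactly the same score.
    reasonable: scores differ by at most `tolerance` points.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    pairs = list(zip(ratings_a, ratings_b))
    complete = sum(a == b for a, b in pairs) / len(pairs)
    reasonable = sum(abs(a - b) <= tolerance for a, b in pairs) / len(pairs)
    return complete, reasonable


# Hypothetical five-point ratings of the same five items by two raters.
rater_1 = [1, 2, 3, 4, 5]
rater_2 = [1, 3, 3, 2, 5]

print(agreement_rates(rater_1, rater_2))  # (0.6, 0.8)
```

Reasonable agreement is always at least as high as complete agreement, because every exact match also falls within the tolerance.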
The emotion dysregulation inventory: Psychometric properties and item response theory calibration in an autism spectrum disorder sample.

PubMed

Mazefsky, Carla A; Yu, Lan; White, Susan W; Siegel, Matthew; Pilkonis, Paul A

2018-06-01

Individuals with autism spectrum disorder (ASD) often present with prominent emotion dysregulation that requires treatment but can be difficult to measure. The Emotion Dysregulation Inventory (EDI) was created using methods developed by the Patient-Reported Outcomes Measurement Information System (PROMIS®) to capture observable indicators of poor emotion regulation. Caregivers of 1,755 youth with ASD completed 66 candidate EDI items, and the final 30 items were selected based on classical test theory and item response theory (IRT) analyses. The analyses identified two factors: (a) Reactivity, characterized by intense, rapidly escalating, sustained, and poorly regulated negative emotional reactions, and (b) Dysphoria, characterized by anhedonia, sadness, and nervousness. The final items did not show differential item functioning (DIF) based on gender, age, intellectual ability, or verbal ability. Because the final items were calibrated using IRT, even a small number of items offers high precision, minimizing respondent burden. IRT co-calibration of the EDI with related measures demonstrated its superiority in assessing the severity of emotion dysregulation with as few as seven items. Validity of the EDI was supported by expert review, its association with related constructs (e.g., anxiety and depression symptoms, aggression), higher scores in psychiatric inpatients with ASD compared to a community ASD sample, and demonstration of test-retest stability and sensitivity to change. In sum, the EDI provides an efficient and sensitive method to measure emotion dysregulation for clinical assessment, monitoring, and research in youth with ASD of any level of cognitive or verbal ability. Autism Res 2018, 11: 928-941. © 2018 International Society for Autism Research, Wiley Periodicals, Inc. This paper describes a new measure of poor emotional control called the Emotion Dysregulation Inventory (EDI).
Caregivers of 1,755 youth with ASD completed candidate items, and advanced statistical techniques were applied to identify the best final items. The EDI is unique because it captures common emotional problems in ASD and is appropriate for both nonverbal and verbal youth. It is an efficient and sensitive measure for use in clinical assessments, monitoring, and research with youth with ASD. © 2018 International Society for Autism Research, Wiley Periodicals, Inc.
A second step in development of a checklist for screening risk for violence in acute psychiatric patients: evaluation of interrater reliability of the Preliminary Scheme 33.

PubMed

Bjørkly, Stål; Moger, Tron A

2007-12-01

The Acute Project is a research project conducted on acute psychiatric admission wards in Norway. The objective is to develop and validate a structured, easy-to-use screening checklist for assessment of risk for violence in patients both during their stay in the ward and after discharge. The Preliminary Scheme 33 is a 33-item screening checklist with content domain inspired by the Historical-Clinical-Risk Management Scheme (HCR-20), the Brøset Violence Checklist, and eight risk factors extracted from the literature on risk assessment. The Preliminary Scheme 33 was designed and tested in two steps by a research group which includes the authors. The common aim of both steps was to develop this into a time economical, reliable, and valid checklist. In the first step in 2006 the predictive validity of the individual items was tested. The present work presents results from the second step, a study conducted to assess the interrater reliability of the 33 items. Eight clinicians working in an acute psychiatric unit volunteered to be raters and were trained to score the 33 items on a three-point scale in relation to 15 clinical vignettes, which contained information from 15 acute psychiatric patients' files. Analysis showed high interrater reliability for the total score with an intraclass correlation coefficient (ICC) of .86 (95% CI: 0.74-0.94). However, a substantial proportion of the items had medium to low ICCs. Consequences of this finding for further development of these items into a brief screen are discussed.
Development and validation of a measure of workplace climate for healthy weight maintenance.

PubMed

Sliter, Katherine A

2013-07-01

Due to the obesity epidemic, an increasing amount of research is being conducted to better understand the antecedents and consequences of excess employee weight. One construct often of interest to researchers in this area is organizational climate. Unfortunately, a viable measure of climate, as related to employee weight, does not exist. The purpose of this study was to remedy this by developing and validating a concise, psychometrically sound measure of climate for healthy weight. An item pool was developed based on surveys of full-time employees, and a sorting task was used to eliminate ambiguous items. Items were pilot tested by a sample of 338 full-time employees, and the item pool was reduced through item response theory (IRT) and reliability analyses. Finally, the retained 14 items, comprising 3 subscales, were completed by a sample of 360 full-time employees, representing 26 different organizations from across the United States. Multilevel modeling indicated that sufficient variance was explained by group membership to support aggregation, and confirmatory factor analysis (CFA) supported the hypothesized model of 3 subscale factors and an overall climate factor. Nine hypotheses specific to construct validation were tested. Scores on the new scale correlated significantly with individual-level reports of psychological constructs (e.g., health motivation, general leadership support for health) and physiological phenomena (e.g., body mass index [BMI], physical health problems) to which they should theoretically relate, supporting construct validity. Implications for the use of this scale in both applied and research settings are discussed. PsycINFO Database Record (c) 2013 APA, all rights reserved.

Assembling a Computerized Adaptive Testing Item Pool as a Set of Linear Tests

ERIC Educational Resources Information Center

van der Linden, Wim J.; Ariel, Adelaide; Veldkamp, Bernard P.

2006-01-01

Test-item writing efforts typically result in item pools with an undesirable correlational structure between the content attributes of the items and their statistical information. If such pools are used in computerized adaptive testing (CAT), the algorithm may be forced to select items with less than optimal information, that violate the content…
Evaluation of Northwest University, Kano Post-UTME Test Items Using Item Response Theory

ERIC Educational Resources Information Center

Bichi, Ado Abdu; Hafiz, Hadiza; Bello, Samira Abdullahi

2016-01-01

High-stakes testing is used for the purposes of providing results that have important consequences. Validity is the cornerstone upon which all measurement systems are built. This study applied the Item Response Theory principles to analyse Northwest University Kano Post-UTME Economics test items. The developed fifty (50) economics test items were…
Item Specifications, Science Grade 8. Blue Prints for Testing Minimum Performance Test.

ERIC Educational Resources Information Center

Arkansas State Dept. of Education, Little Rock.

These item specifications were developed as a part of the Arkansas "Minimum Performance Testing Program" (MPT). There is one item specification for each instructional objective included in the MPT. The purpose of an item specification is to provide an overview of the general content and format of test items used to measure an…
Item Specifications, Science Grade 6. Blue Prints for Testing Minimum Performance Test.

ERIC Educational Resources Information Center

Arkansas State Dept. of Education, Little Rock.

These item specifications were developed as a part of the Arkansas "Minimum Performance Testing Program" (MPT). There is one item specification for each instructional objective included in the MPT. The purpose of an item specification is to provide an overview of the general content and format of test items used to measure an…
Criterion-Referenced Test Items for Welding.

ERIC Educational Resources Information Center

Davis, Diane, Ed.

This test item bank on welding contains test questions based upon competencies found in the Missouri Welding Competency Profile. Some test items are keyed for multiple competencies. These criterion-referenced test items are designed to work with the Vocational Instructional Management System. Questions have been statistically sampled and validated…
77 FR 48533 - Notice of Intent To Repatriate Cultural Items: U.S. Department of the Interior, National Park...

Federal Register 2010, 2011, 2012, 2013, 2014

2012-08-14

... Indian tribes, has determined that the cultural items meet the definition of sacred objects and... individuals who believe they are lineal descendants of the individual who owned these sacred objects and who...
Speeded Old-New Recognition of Multidimensional Perceptual Stimuli: Modeling Performance at the Individual-Participant and Individual-Item Levels

ERIC Educational Resources Information Center

Nosofsky, Robert M.; Stanton, Roger D.

2006-01-01

Observers made speeded old-new recognition judgments of color stimuli embedded in a multidimensional similarity space. The paradigm used multiple lists but with the underlying similarity structures repeated across lists, to allow for quantitative modeling of the data at the individual-participant and individual-item levels. Correct rejection…
Decomposing the interaction between retention interval and study/test practice: The role of retrievability

PubMed Central

Jang, Yoonhee; Wixted, John T.; Pecher, Diane; Zeelenberg, René; Huber, David E.

2012-01-01

Even without feedback, test practice enhances delayed performance compared to study practice, but the size of the effect is variable across studies. We investigated the benefit of testing, separating initially retrievable items from initially non-retrievable items. In two experiments, an initial test determined item retrievability. Retrievable or non-retrievable items were subsequently presented for repeated study or test practice. Collapsing across items, in Experiment 1, we obtained the typical crossover interaction between retention interval and practice type. For retrievable items, however, the crossover interaction was quantitatively different, with a small study benefit for an immediate test and a larger testing benefit after a delay. For non-retrievable items, there was a large study benefit for an immediate test, but one week later there was no difference between the study and test practice conditions. In Experiment 2, initially non-retrievable items were given additional study followed by either an immediate test or even more additional study, and one week later performance did not differ between the two conditions. These results indicate that the effect size of study/test practice is due to the relative contribution of retrievable and non-retrievable items. PMID:22304454
Decomposing the interaction between retention interval and study/test practice: the role of retrievability.

PubMed

Jang, Yoonhee; Wixted, John T; Pecher, Diane; Zeelenberg, René; Huber, David E

2012-01-01

Even without feedback, test practice enhances delayed performance compared to study practice, but the size of the effect is variable across studies. We investigated the benefit of testing, separating initially retrievable items from initially nonretrievable items. In two experiments, an initial test determined item retrievability. Retrievable or nonretrievable items were subsequently presented for repeated study or test practice. Collapsing across items, in Experiment 1, we obtained the typical cross-over interaction between retention interval and practice type. For retrievable items, however, the cross-over interaction was quantitatively different, with a small study benefit for an immediate test and a larger testing benefit after a delay. For nonretrievable items, there was a large study benefit for an immediate test, but one week later there was no difference between the study and test practice conditions. In Experiment 2, initially nonretrievable items were given additional study followed by either an immediate test or even more additional study, and one week later performance did not differ between the two conditions. These results indicate that the effect size of study/test practice is due to the relative contribution of retrievable and nonretrievable items.
Optimal Test Design with Rule-Based Item Generation

ERIC Educational Resources Information Center

Geerlings, Hanneke; van der Linden, Wim J.; Glas, Cees A. W.

2013-01-01

Optimal test-design methods are applied to rule-based item generation. Three different cases of automated test design are presented: (a) test assembly from a pool of pregenerated, calibrated items; (b) test generation on the fly from a pool of calibrated item families; and (c) test generation on the fly directly from calibrated features defining…
Assessment of awareness of connectedness as a culturally-based protective factor for Alaska native youth.

PubMed

Mohatt, Nathaniel V; Fok, Carlotta Ching Ting; Burket, Rebekah; Henry, David; Allen, James

2011-10-01

Research with Native Americans has identified connectedness as a culturally based protective factor against substance abuse and suicide. Connectedness refers to the interrelated welfare of the individual, one's family, one's community, and the natural environment. We developed an 18-item quantitative assessment of awareness of connectedness and tested it with 284 Alaska Native youth. Evaluation with confirmatory factor analysis and item response theory identified a 12-item subset that functions satisfactorily in a second-order four-factor model. The proposed Awareness of Connectedness Scale (ACS) displays good convergent and discriminant validity, and correlates positively with hypothesized protective factors such as reasons for living and communal mastery. The measure has utility in the study of culture-specific protective factors and as an outcomes measure for behavioral health programs with Native American youth.
Assessment of Awareness of Connectedness as a Culturally-based Protective Factor for Alaska Native Youth

PubMed Central

Mohatt, Nathaniel V.; Fok, Carlotta Ching Ting; Burket, Rebekah; Henry, David; Allen, James

2011-01-01

Research with Native Americans has identified connectedness as a culturally-based protective factor against substance abuse and suicide. Connectedness refers to the interrelated welfare of the individual, one’s family, one’s community, and the natural environment. We developed an 18-item quantitative assessment of awareness of connectedness and tested it with 284 Alaska Native youth. Evaluation with confirmatory factor analysis and item response theory identified a 12-item subset that functions satisfactorily in a second-order, four-factor model. The proposed Awareness of Connectedness Scale displays good convergent and discriminant validity and correlates positively with hypothesized protective factors such as reasons for living and communal mastery. The measure has utility in the study of culture-specific protective factors and as an outcomes measure for behavioral health programs with Native American youth. PMID:21988583
Developing Items to Measure Theory of Planned Behavior Constructs for Opioid Administration for Children: Pilot Testing.

PubMed

Vincent, Catherine; Riley, Barth B; Wilkie, Diana J

2015-12-01

The Theory of Planned Behavior (TpB) is useful to direct nursing research aimed at behavior change. As proposed in the TpB, individuals' attitudes, perceived norms, and perceived behavior control predict their intentions to perform a behavior and subsequently predict their actual performance of the behavior. Our purpose was to apply Fishbein and Ajzen's guidelines to begin development of a valid and reliable instrument for pediatric nurses' attitudes, perceived norms, perceived behavior control, and intentions to administer PRN opioid analgesics when hospitalized children self-report moderate to severe pain. Following Fishbein and Ajzen's directions, we were able to define the behavior of interest and specify the research population, formulate items for direct measures, elicit salient beliefs shared by our target population and formulate items for indirect measures, and prepare and test our questionnaire. For the pilot testing of internal consistency of measurement items, Cronbach alphas were between 0.60 and 0.90 for all constructs. Test-retest reliability correlations ranged from 0.63 to 0.90. Following Fishbein and Ajzen's guidelines was a feasible and organized approach for instrument development. In these early stages, we demonstrated good reliability for most subscales, showing promise for the instrument and its use in pain management research. Better understanding of the TpB constructs will facilitate the development of interventions targeted toward nurses' attitudes, perceived norms, and/or perceived behavior control to ultimately improve their pain behaviors toward reducing pain for vulnerable children. Copyright © 2015 American Society for Pain Management Nursing. Published by Elsevier Inc. All rights reserved.
Variation in the Readability of Items Within Surveys

PubMed Central

Calderón, José L.; Morales, Leo S.; Liu, Honghu; Hays, Ron D.

2006-01-01

The objective of this study was to estimate the variation in the readability of survey items within 2 widely used health-related quality-of-life surveys: the National Eye Institute Visual Functioning Questionnaire–25 (VFQ-25) and the Short Form Health Survey, version 2 (SF-36v2). Flesch-Kincaid and Flesch Reading Ease formulas were used to estimate readability. Individual survey item scores and descriptive statistics for each survey were calculated. Variation of individual item scores from the mean survey score was graphically depicted for each survey. The mean reading grade level and reading ease estimates for the VFQ-25 and SF-36v2 were 7.8 (fairly easy) and 6.4 (easy), respectively. Both surveys had notable variation in item readability; individual item readability scores ranged from 3.7 to 12.0 (very easy to difficult) for the VFQ-25 and 2.2 to 12.0 (very easy to difficult) for the SF-36v2. Because survey respondents may not comprehend items with readability scores that exceed their reading ability, estimating the readability of each survey item is an important component of evaluating survey readability. Standards for measuring the readability of surveys are needed. PMID:16401705
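As a rough illustration of the per-item readability estimates described in the abstract above, the two standard formulas can be sketched in Python. The formula constants are the published Flesch Reading Ease and Flesch-Kincaid Grade Level coefficients; the syllable counter and the function names are assumptions made for this sketch (a crude vowel-run heuristic), not the procedure the study's authors used.

```python
import re


def count_syllables(word):
    # Crude heuristic (an assumption for this sketch): one syllable per run of vowels.
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def readability(text):
    """Return (Flesch Reading Ease, Flesch-Kincaid grade level) for one survey item."""
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        raise ValueError("text contains no words")
    # Count sentence terminators; treat unpunctuated text as a single sentence.
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / sentences
    syllables_per_word = syllables / len(words)
    reading_ease = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
    grade_level = 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
    return reading_ease, grade_level
```

Applying `readability` to each item separately, rather than to the survey as a whole, is what would expose the item-to-item variation the abstract reports. Results from a heuristic syllable counter will differ somewhat from those of dedicated readability software.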
SEQUenCE: a service user-centred quality of care instrument for mental health services.

PubMed

Hester, Lorraine; O'Doherty, Lorna Jane; Schnittger, Rebecca; Skelly, Niamh; O'Donnell, Muireann; Butterly, Lisa; Browne, Robert; Frorath, Charlotte; Morgan, Craig; McLoughlin, Declan M; Fearon, Paul

2015-08-01

The objective was to develop a quality of care instrument that is grounded in the service user perspective and to validate it in a mental health service. The instrument (SEQUenCE (SErvice user QUality of CarE)) was developed through analysis of focus group data and clinical practice guidelines, and refined through field-testing and psychometric analyses. All participants were attending an independent mental health service in Ireland. Participants had a diagnosis of bipolar affective disorder (BPAD) or a psychotic disorder. Twenty-nine service users participated in six focus group interviews. Seventy-one service users participated in field-testing: 10 judged the face validity of an initial 61-item instrument; 28 completed a revised 52-item instrument from which 12 items were removed following test-retest and convergent validity analyses; 33 completed the resulting 40-item instrument. The main outcome measures were the test-retest reliability, internal consistency and convergent validity of the instrument. The final instrument showed acceptable test-retest reliability at 5-7 days (r = 0.65; P < 0.001), good convergent validity with the Verona Service Satisfaction Scale (r = 0.84, P < 0.001) and good internal consistency (Cronbach's alpha = 0.87). SEQUenCE is a valid, reliable scale that is grounded in the service user perspective and suitable for routine use. It may serve as a useful tool in individual care planning, service evaluation and research. The instrument was developed and validated with service users with a diagnosis of either BPAD or a psychotic disorder; it does not yet have established external validity for other diagnostic groups. © The Author 2015. Published by Oxford University Press in association with the International Society for Quality in Health Care; all rights reserved.
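The internal consistency statistic reported above, Cronbach's alpha, can be computed from a respondents-by-items score matrix. The sketch below uses the standard formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores); the function name and the data layout are assumptions for illustration, and nothing here reproduces the study's data or software.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for `scores`, a list of respondents, each a list of k item scores."""
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two respondents")
    # Transpose so that each tuple holds one item's scores across respondents.
    items = list(zip(*scores))
    sum_item_variances = sum(variance(col) for col in items)
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum_item_variances / total_variance)
```

With hypothetical scores in which the items rank respondents identically, such as `[[1, 2], [2, 3], [3, 4]]`, the function returns 1.0; items that covary less push the value toward zero.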
Reliability of a computer and Internet survey (Computer User Profile) used by adults with and without traumatic brain injury (TBI).

PubMed

Kilov, Andrea M; Togher, Leanne; Power, Emma

2015-01-01

The aim was to determine the test-retest reliability of the 'Computer User Profile' (CUP) in people with and without TBI. The CUP was administered on two occasions to people with and without TBI. The CUP investigated the nature and frequency of participants' computer and Internet use. Intra-class correlation coefficients and kappa coefficients were calculated to measure the reliability of individual CUP items. Descriptive statistics were used to summarize the content of responses. Sixteen adults with TBI and 40 adults without TBI were included in the study. All participants were reliable in reporting demographic information, frequency of social communication and leisure activities, and computer/Internet habits and usage. Adults with TBI were reliable in 77% of their responses to survey items. Adults without TBI were reliable in 88% of their responses to survey items. The CUP was practical and valuable in capturing information about social, leisure, communication and computer/Internet habits of people with and without TBI. Overall, adults without TBI answered more survey items with satisfactory reliability. Future studies may include larger samples and could also explore how people with and without TBI use other digital communication technologies. This may provide further information on determining technology readiness for people with TBI in therapy programmes.
Science Library of Test Items. Volume Twenty. A Collection of Multiple Choice Test Items Relating Mainly to Physics, 1.

ERIC Educational Resources Information Center

New South Wales Dept. of Education, Sydney (Australia).

As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items are made available to teachers for the construction of unit tests or term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The test items meet syllabus…
Science Library of Test Items. Volume Twenty-One. A Collection of Multiple Choice Test Items Relating Mainly to Physics, 2.

ERIC Educational Resources Information Center

New South Wales Dept. of Education, Sydney (Australia).

As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items are made available to teachers for the construction of unit tests or term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The test items meet syllabus…
Science Library of Test Items. Volume Twenty-Two. A Collection of Multiple Choice Test Items Relating Mainly to Skills.

ERIC Educational Resources Information Center

New South Wales Dept. of Education, Sydney (Australia).

As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items are made available to teachers for the construction of unit tests or term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The test items meet syllabus…
Criterion-Referenced Test Items for Small Engines.

ERIC Educational Resources Information Center

Herd, Amon

This notebook contains criterion-referenced test items for testing students' knowledge of small engines. The test items are based upon competencies found in the Missouri Small Engine Competency Profile. The test item bank is organized in 18 sections that cover the following duties: shop procedures; tools and equipment; fasteners; servicing fuel…
