The Effects of Test Length and Sample Size on Item Parameters in Item Response Theory
ERIC Educational Resources Information Center
Sahin, Alper; Anil, Duygu
2017-01-01
This study investigates the effects of sample size and test length on item-parameter estimation in test development utilizing three unidimensional dichotomous models of item response theory (IRT). For this purpose, a real language test comprised of 50 items was administered to 6,288 students. Data from this test was used to obtain data sets of…
DeGeest, David Scott; Schmidt, Frank
2015-01-01
Our objective was to apply the rigorous test developed by Browne (1992) to determine whether the circumplex model fits Big Five personality data. This test has yet to be applied to personality data. Another objective was to determine whether blended items explained correlations among the Big Five traits. We used two working adult samples, the Eugene-Springfield Community Sample and the Professional Worker Career Experience Survey. Fit to the circumplex was tested via Browne's (1992) procedure. Circumplexes were graphed to identify items with loadings on multiple traits (blended items), and to determine whether removing these items changed five-factor model (FFM) trait intercorrelations. In both samples, the circumplex structure fit the FFM traits well. Each sample had items with dual-factor loadings (8 items in the first sample, 21 in the second). Removing blended items had little effect on construct-level intercorrelations among FFM traits. We conclude that rigorous tests show that the fit of personality data to the circumplex model is good. This finding means the circumplex model is competitive with the factor model in understanding the organization of personality traits. The circumplex structure also provides a theoretically and empirically sound rationale for evaluating intercorrelations among FFM traits. Even after eliminating blended items, FFM personality traits remained correlated.
ERIC Educational Resources Information Center
Sengul Avsar, Asiye; Tavsancil, Ezel
2017-01-01
This study analysed polytomous items' psychometric properties according to nonparametric item response theory (NIRT) models. Thus, simulated datasets--three different test lengths (10, 20 and 30 items), three sample distributions (normal, right and left skewed) and three sample sizes (100, 250 and 500)--were generated by conducting 20…
How Big Is Big Enough? Sample Size Requirements for CAST Item Parameter Estimation
ERIC Educational Resources Information Center
Chuah, Siang Chee; Drasgow, Fritz; Luecht, Richard
2006-01-01
Adaptive tests offer the advantages of reduced test length and increased accuracy in ability estimation. However, adaptive tests require large pools of precalibrated items. This study looks at the development of an item pool for 1 type of adaptive administration: the computer-adaptive sequential test. An important issue is the sample size required…
ERIC Educational Resources Information Center
Nitko, Anthony J.; Hsu, Tse-chi
Item analysis procedures appropriate for domain-referenced classroom testing are described. A conceptual framework within which item statistics can be considered and promising statistics in light of this framework are presented. The sampling fluctuations of the more promising item statistics for sample sizes comparable to the typical classroom…
The Accuracy of Estimated Total Test Statistics. Final Report.
ERIC Educational Resources Information Center
Kleinke, David J.
In a post-mortem study of item sampling, 1,050 examinees were divided into ten groups 50 times. Each time, their papers were scored on four different sets of item samples from a 150-item test of academic aptitude. These samples were selected using (a) unstratified random sampling and stratification on (b) content, (c) difficulty, and (d) both.…
Criterion-Referenced Test Items for Welding.
ERIC Educational Resources Information Center
Davis, Diane, Ed.
This test item bank on welding contains test questions based upon competencies found in the Missouri Welding Competency Profile. Some test items are keyed for multiple competencies. These criterion-referenced test items are designed to work with the Vocational Instructional Management System. Questions have been statistically sampled and validated…
ACER Chemistry Test Item Collection. ACER Chemtic Year 12.
ERIC Educational Resources Information Center
Australian Council for Educational Research, Hawthorn.
The chemistry test item bank contains 225 multiple-choice questions suitable for diagnostic and achievement testing; a three-page teacher's guide; answer key with item facilities; an answer sheet; and a 45-item sample achievement test. Although written for the new grade 12 chemistry course in Victoria, Australia, the items are widely applicable.…
ERIC Educational Resources Information Center
Sahin, Alper; Weiss, David J.
2015-01-01
This study aimed to investigate the effects of calibration sample size and item bank size on examinee ability estimation in computerized adaptive testing (CAT). For this purpose, a 500-item bank pre-calibrated using the three-parameter logistic model with 10,000 examinees was simulated. Calibration samples of varying sizes (150, 250, 350, 500,…
ERIC Educational Resources Information Center
O'Keeffe, Lisa
2016-01-01
Language is frequently discussed as a barrier to mathematics word problems. Hence this paper presents the initial findings of a linguistic analysis of numeracy skills test sample items. The theoretical perspective of multi-modal text analysis underpinned this study, in which data was extracted from the ten sample numeracy test items released by the…
Optimal Bayesian Adaptive Design for Test-Item Calibration.
van der Linden, Wim J; Ren, Hao
2015-06-01
An optimal adaptive design for test-item calibration based on Bayesian optimality criteria is presented. The design adapts the choice of field-test items to the examinees taking an operational adaptive test using both the information in the posterior distributions of their ability parameters and the current posterior distributions of the field-test parameters. Different criteria of optimality based on the two types of posterior distributions are possible. The design can be implemented using an MCMC scheme with alternating stages of sampling from the posterior distributions of the test takers' ability parameters and the parameters of the field-test items while reusing samples from earlier posterior distributions of the other parameters. Results from a simulation study demonstrated the feasibility of the proposed MCMC implementation for operational item calibration. A comparison of performances for different optimality criteria showed faster calibration of substantial numbers of items for the criterion of D-optimality relative to A-optimality, a special case of c-optimality, and random assignment of items to the test takers.
Interactions Between Item Content And Group Membership on Achievement Test Items.
ERIC Educational Resources Information Center
Linn, Robert L.; Harnisch, Delwyn L.
The purpose of this investigation was to examine the interaction of item content and group membership on achievement test items. Estimates of the parameters of the three parameter logistic model were obtained on the 46 item math test for the sample of eighth grade students (N = 2055) participating in the Illinois Inventory of Educational Progress,…
A Procedure to Detect Item Bias Present Simultaneously in Several Items
1991-04-25
exhibit a coherent and major biasing influence at the test level. In particular, this can be true even if each individual item displays only a minor ... response functions (IRFs) without the use of item parameter estimation algorithms when the sample size is too small for their use. Thissen, Steinberg ... convention). A random sample of examinees is drawn from each group, and a test of N items is administered to them. Typically it is suspected that a
A Stepwise Test Characteristic Curve Method to Detect Item Parameter Drift
ERIC Educational Resources Information Center
Guo, Rui; Zheng, Yi; Chang, Hua-Hua
2015-01-01
An important assumption of item response theory is item parameter invariance. Sometimes, however, item parameters are not invariant across different test administrations due to factors other than sampling error; this phenomenon is termed item parameter drift. Several methods have been developed to detect drifted items. However, most of the…
Differential Item Functioning: Its Consequences. Research Report. ETS RR-10-01
ERIC Educational Resources Information Center
Lee, Yi-Hsuan; Zhang, Jinming
2010-01-01
This report examines the consequences of differential item functioning (DIF) using simulated data. Its impact on total score, item response theory (IRT) ability estimate, and test reliability was evaluated in various testing scenarios created by manipulating the following four factors: test length, percentage of DIF items per form, sample sizes of…
ERIC Educational Resources Information Center
Montague, Margariete A.
This study investigated the feasibility of concurrently and randomly sampling examinees and items in order to estimate group achievement. Seven 32-item tests reflecting a 640-item universe of simple open sentences were used such that item selection (random, systematic) and assignment (random, systematic) of items (four, eight, sixteen) to forms…
ERIC Educational Resources Information Center
Meyers, Jason L.; Murphy, Stephen; Goodman, Joshua; Turhan, Ahmet
2012-01-01
Operational testing programs employing item response theory (IRT) applications benefit from the property of item parameter invariance whereby item parameter estimates obtained from one sample can be applied to other samples (when the underlying assumptions are satisfied). In theory, this feature allows for applications such as computer-adaptive…
Estimating Total-Test Scores from Partial Scores in a Matrix Sampling Design.
ERIC Educational Resources Information Center
Sachar, Jane; Suppes, Patrick
1980-01-01
The present study compared six methods, two of which utilize the content structure of items, to estimate total-test scores using 450 students and 60 items of the 110-item Stanford Mental Arithmetic Test. Three methods yielded fairly good estimates of the total-test score. (Author/RL)
The Effect of Including or Excluding Students with Testing Accommodations on IRT Calibrations.
ERIC Educational Resources Information Center
Karkee, Thakur; Lewis, Dan M.; Barton, Karen; Haug, Carolyn
This study aimed to determine the degree to which the inclusion of accommodated students with disabilities in the calibration sample affects the characteristics of item parameters and the test results. Investigated were effects on test reliability, item fit to the applicable item response theory (IRT) model, item parameter estimates, and students'…
Racial and Ethnic Bias in Test Construction. Final Report.
ERIC Educational Resources Information Center
Green, Donald Ross
To determine if tryout samples typically used for item selection contribute to test bias against minority groups, item analyses were made of the California Achievement Tests using seven subgroups of the standardization sample: Northern White Suburban, Northern Black Urban, Southern White Suburban, Southern Black Rural, Southern White Rural,…
Racial and Ethnic Bias in Test Construction.
ERIC Educational Resources Information Center
Green, Donald Ross
To determine if tryout samples typically used for item selection contribute to test bias against minority groups, item analyses were made of the California Achievement Tests using seven sub-groups of the standardization sample: Northern White Suburban, Northern Black Urban, Southern White Suburban, Southern Black Rural, Southern White Rural,…
NASA Astrophysics Data System (ADS)
Schmiemann, Philipp; Nehm, Ross H.; Tornabene, Robyn E.
2017-12-01
Understanding how situational features of assessment tasks impact reasoning is important for many educational pursuits, notably the selection of curricular examples to illustrate phenomena, the design of formative and summative assessment items, and determination of whether instruction has fostered the development of abstract schemas divorced from particular instances. The goal of our study was to employ an experimental research design to quantify the degree to which situational features impact inferences about participants' understanding of Mendelian genetics. Two participant samples from different educational levels and cultural backgrounds (high school, n = 480; university, n = 444; Germany and USA) were used to test for context effects. A multi-matrix test design was employed, and item packets differing in situational features (e.g., plant, animal, human, fictitious) were randomly distributed to participants in the two samples. Rasch analyses of participant scores from both samples produced good item fit, person reliability, and item reliability and indicated that the university sample displayed stronger performance on the items compared to the high school sample. We found, surprisingly, that in both samples, no significant differences in performance occurred among the animal, plant, and human item contexts, or between the fictitious and "real" item contexts. In the university sample, we were also able to test for differences in performance between genders, among ethnic groups, and by prior biology coursework. None of these factors had a meaningful impact upon performance or context effects. Thus some, but not all, types of genetics problem solving or item formats are impacted by situational features.
Computerized adaptive testing: the capitalization on chance problem.
Olea, Julio; Barrada, Juan Ramón; Abad, Francisco J; Ponsoda, Vicente; Cuevas, Lara
2012-03-01
This paper describes several simulation studies that examine the effects of capitalization on chance in the selection of items and the ability estimation in CAT, employing the 3-parameter logistic model. In order to generate different estimation errors for the item parameters, the calibration sample size was manipulated (N = 500, 1000 and 2000 subjects) as was the ratio of item bank size to test length (banks of 197 and 788 items, test lengths of 20 and 40 items), both in a CAT and in a random test. Results show that capitalization on chance is particularly serious in CAT, as revealed by the large positive bias found in the small sample calibration conditions. For broad ranges of theta, the overestimation of the precision (asymptotic Se) reaches levels of 40%, something that does not occur with the RMSE (theta). The problem is greater as the item bank size to test length ratio increases. Potential solutions were tested in a second study, where two exposure control methods were incorporated into the item selection algorithm. Some alternative solutions are discussed.
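The capitalization-on-chance mechanism described in this abstract can be illustrated with a minimal sketch of the 3-parameter logistic (3PL) response function and its item information. The function names and parameter values below are hypothetical, and the logistic metric is used without the D = 1.7 scaling constant, which is an assumption rather than something the abstract states. Because an adaptive test tends to pick the items with the highest estimated information, an item whose discrimination is overestimated in a small calibration sample looks more informative than it really is.

```python
import math


def p_3pl(theta, a, b, c):
    """Probability of a correct response under the 3PL model.

    a: discrimination, b: difficulty, c: lower asymptote (guessing).
    Assumption: logistic metric, no D = 1.7 scaling constant.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b, c):
    """Fisher information of a 3PL item at ability theta."""
    p = p_3pl(theta, a, b, c)
    return a ** 2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2


# Hypothetical item: true discrimination 1.2, but calibrated as 1.5
# from a small sample. The estimated information is inflated.
true_info = item_information(0.0, a=1.2, b=0.0, c=0.2)
estimated_info = item_information(0.0, a=1.5, b=0.0, c=0.2)
print(true_info, estimated_info)
```

Summing the inflated information over the selected items would overstate the precision of the ability estimate, which is consistent with the overestimated asymptotic standard error the abstract reports.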
Effect of Differential Item Functioning on Test Equating
ERIC Educational Resources Information Center
Kabasakal, Kübra Atalay; Kelecioglu, Hülya
2015-01-01
This study examines the effect of differential item functioning (DIF) items on test equating through multilevel item response models (MIRMs) and traditional IRMs. The performances of three different equating models were investigated under 24 different simulation conditions, and the variables whose effects were examined included sample size, test…
Estimating Total-test Scores from Partial Scores in a Matrix Sampling Design.
ERIC Educational Resources Information Center
Sachar, Jane; Suppes, Patrick
It is sometimes desirable to obtain an estimated total-test score for an individual who was administered only a subset of the items in a total test. The present study compared six methods, two of which utilize the content structure of items, to estimate total-test scores using 450 students in grades 3-5 and 60 items of the 110-item Stanford Mental…
Power and Sample Size Calculations for Logistic Regression Tests for Differential Item Functioning
ERIC Educational Resources Information Center
Li, Zhushan
2014-01-01
Logistic regression is a popular method for detecting uniform and nonuniform differential item functioning (DIF) effects. Theoretical formulas for the power and sample size calculations are derived for likelihood ratio tests and Wald tests based on the asymptotic distribution of the maximum likelihood estimators for the logistic regression model.…
Gillespie, Brigid M; Polit, Denise F; Hamlin, Lois; Chaboyer, Wendy
2012-01-01
This paper describes the development and validation of the Revised Perioperative Competence Scale (PPCS-R). There is a lack of psychometrically sound, tested self-assessment tools to measure nurses' perceived competence in the operating room. Content validity was established by a panel of international experts and the original 98-item scale was pilot tested with 345 nurses in Queensland, Australia. Following the removal of several items, a national sample that included all 3209 nurses who were members of the Australian College of Operating Room Nurses was surveyed using the 94-item version. Psychometric testing assessed content validity using exploratory factor analysis, internal consistency using Cronbach's alpha, and construct validity using the "known groups" technique. During item reduction, several preliminary factor analyses were performed on two random halves of the sample (n=550). Usable data for psychometric assessment were obtained from 1122 nurses. The original 94-item scale was reduced to 40 items. The final factor analysis using the entire sample resulted in a 40-item six-factor solution. Cronbach's alpha for the 40-item scale was .96. Construct validation demonstrated significant differences (p<.0001) in perceived competence scores relative to years of operating room experience and receipt of specialty education. On the basis of these results, the psychometric properties of the PPCS-R were considered encouraging. Further testing of the tool in different samples of operating room nurses is necessary to enable cross-cultural comparisons.
Deng, Nina; Anatchkova, Milena D; Waring, Molly E; Han, Kyung T; Ware, John E
2015-08-01
The Quality-of-life (QOL) Disease Impact Scale (QDIS®) standardizes the content and scoring of QOL impact attributed to different diseases using item response theory (IRT). This study examined the IRT invariance of the QDIS-standardized IRT parameters in an independent sample. The differential functioning of items and test (DFIT) of a static short-form (QDIS-7) was examined across two independent sources: patients hospitalized for acute coronary syndrome (ACS) in the TRACE-CORE study (N = 1,544) and chronically ill US adults in the QDIS standardization sample. "ACS-specific" IRT item parameters were calibrated and linearly transformed to compare to "standardized" IRT item parameters. Differences in IRT model-expected item, scale and theta scores were examined. The DFIT results were also compared in a standard logistic regression differential item functioning analysis. Item parameters estimated in the ACS sample showed lower discrimination parameters than the standardized discrimination parameters, but only small differences were found for threshold parameters. In DFIT, results on the non-compensatory differential item functioning index (range 0.005-0.074) were all below the threshold of 0.096. Item differences were further canceled out at the scale level. IRT-based theta scores for ACS patients using standardized and ACS-specific item parameters were highly correlated (r = 0.995, root-mean-square difference = 0.09). Using standardized item parameters, ACS patients scored one-half standard deviation higher (indicating greater QOL impact) compared to chronically ill adults in the standardization sample. The study showed sufficient IRT invariance to warrant the use of standardized IRT scoring of QDIS-7 for studies comparing the QOL impact attributed to acute coronary disease and other chronic conditions.
ERIC Educational Resources Information Center
Magno, Carlo
2009-01-01
The present report demonstrates the difference between the classical test theory (CTT) and item response theory (IRT) approaches using actual test data for junior high school chemistry students. The CTT and IRT were compared across two samples and two forms of test on their item difficulty, internal consistency, and measurement errors. The specific…
Home Economics. Sample Test Items. Levels I and II.
ERIC Educational Resources Information Center
New York State Education Dept., Albany. Bureau of Elementary and Secondary Educational Testing.
A sample of behavioral objectives and related test items that could be developed for content modules in Home Economics levels I and II, this book is intended to enable teachers to construct more valid and reliable test materials. Forty-eight one-page modules are presented, and opposite each module are listed two to seven specific behavioral…
Model Choice and Sample Size in Item Response Theory Analysis of Aphasia Tests
ERIC Educational Resources Information Center
Hula, William D.; Fergadiotis, Gerasimos; Martin, Nadine
2012-01-01
Purpose: The purpose of this study was to identify the most appropriate item response theory (IRT) measurement model for aphasia tests requiring 2-choice responses and to determine whether small samples are adequate for estimating such models. Method: Pyramids and Palm Trees (Howard & Patterson, 1992) test data that had been collected from…
NASA Astrophysics Data System (ADS)
Ilich, Maria O.
Psychometricians and test developers evaluate standardized tests for potential bias against groups of test-takers by using differential item functioning (DIF). English language learners (ELLs) are a diverse group of students whose native language is not English. While they are still learning the English language, they must take their standardized tests for their school subjects, including science, in English. In this study, linguistic complexity was examined as a possible source of DIF that may result in test scores that confound science knowledge with a lack of English proficiency among ELLs. Two years of fifth-grade state science tests were analyzed for evidence of DIF using two DIF methods, Simultaneous Item Bias Test (SIBTest) and logistic regression. The tests presented a unique challenge in that the test items were grouped together into testlets, that is, groups of items referring to a scientific scenario to measure knowledge of different science content or skills. Very large samples of 10,256 students in 2006 and 13,571 students in 2007 were examined. Half of each sample was composed of Spanish-speaking ELLs; the balance was comprised of native English speakers. The two DIF methods were in agreement about the items that favored non-ELLs and the items that favored ELLs. Logistic regression effect sizes were all negligible, while SIBTest flagged items with low to high DIF. A decrease in socioeconomic status and Spanish-speaking ELL diversity may have led to inconsistent SIBTest effect sizes for items used in both testing years. The DIF results for the testlets suggested that ELLs lacked sufficient opportunity to learn science content. The DIF results further suggest that those constructed response test items requiring the student to draw a conclusion about a scientific investigation or to plan a new investigation tended to favor ELLs.
ERIC Educational Resources Information Center
Çikirikçi Demirtasli, Nükhet; Ulutas, Seher
2015-01-01
Problem Statement: Item bias occurs when individuals from different groups (different gender, cultural background, etc.) have different probabilities of responding correctly to a test item despite having the same skill levels. It is important that tests or items do not have bias in order to ensure the accuracy of decisions taken according to test…
ERIC Educational Resources Information Center
Huitzing, Hiddo A.
2004-01-01
This article shows how set covering with item sampling (SCIS) methods can be used in the analysis and preanalysis of linear programming models for test assembly (LPTA). LPTA models can construct tests, fulfilling a set of constraints set by the test assembler. Sometimes, no solution to the LPTA model exists. The model is then said to be…
ERIC Educational Resources Information Center
Qian, Jiahe; Jiang, Yanming; von Davier, Alina A.
2013-01-01
Several factors could cause variability in item response theory (IRT) linking and equating procedures, such as the variability across examinee samples and/or test items, seasonality, regional differences, native language diversity, gender, and other demographic variables. Hence, the following question arises: Is it possible to select optimal…
Ethnic Group Bias in Intelligence Test Items.
ERIC Educational Resources Information Center
Scheuneman, Janice
In previous studies of ethnic group bias in intelligence test items, the question of bias has been confounded with ability differences between the ethnic group samples compared. The present study is based on a conditional probability model in which an unbiased item is defined as one where the probability of a correct response to an item is the…
Maintaining Equivalent Cut Scores for Small Sample Test Forms
ERIC Educational Resources Information Center
Dwyer, Andrew C.
2016-01-01
This study examines the effectiveness of three approaches for maintaining equivalent performance standards across test forms with small samples: (1) common-item equating, (2) resetting the standard, and (3) rescaling the standard. Rescaling the standard (i.e., applying common-item equating methodology to standard setting ratings to account for…
U.S. History: Grades 7-9. Revised Edition.
ERIC Educational Resources Information Center
Instructional Objectives Exchange, Los Angeles, CA.
Sixty-three behavioral objectives and related test items for United States history in grades seven through nine are presented. Each sample contains the objective, sample test items and directions, and criteria for judging the adequacy of student responses. Fourteen of the 15 categories are content oriented and presented chronologically: (1)…
ERIC Educational Resources Information Center
Sullins, Walter L.
Five-hundred dichotomously scored response patterns were generated with sequentially independent (SI) items and 500 with dependent (SD) items for each of thirty-six combinations of sampling parameters (i.e., three test lengths, three sample sizes, and four item difficulty distributions). KR-20, KR-21, and Split-Half (S-H) reliabilities were…
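As a rough illustration of two of the reliability coefficients named in this abstract, the sketch below computes KR-20 and KR-21 from a matrix of dichotomous (0/1) item scores. The function names and the toy data are hypothetical, and the use of the population variance of total scores is an assumption (some texts use the sample variance instead); nothing here reproduces the study's simulation design.

```python
from statistics import pvariance


def kr20(responses):
    """Kuder-Richardson formula 20.

    responses: list of examinee rows, each a list of 0/1 item scores.
    Assumption: population variance of the total scores.
    """
    n = len(responses)
    k = len(responses[0])
    totals = [sum(row) for row in responses]
    var_total = pvariance(totals)
    # Proportion correct per item, then the sum of p * (1 - p).
    p = [sum(row[j] for row in responses) / n for j in range(k)]
    sum_pq = sum(pj * (1.0 - pj) for pj in p)
    return (k / (k - 1)) * (1.0 - sum_pq / var_total)


def kr21(responses):
    """Kuder-Richardson formula 21, which treats all items as equally difficult."""
    k = len(responses[0])
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / len(totals)
    var_total = pvariance(totals)
    return (k / (k - 1)) * (1.0 - mean_total * (k - mean_total) / (k * var_total))


# Toy data: 5 examinees by 4 items, invented for illustration only.
data = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(kr20(data), kr21(data))
```

KR-21 needs only the mean and variance of total scores, so it never exceeds KR-20 and equals it only when all items have the same difficulty.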
Flens, Gerard; Smits, Niels; Terwee, Caroline B; Dekker, Joost; Huijbrechts, Irma; Spinhoven, Philip; de Beurs, Edwin
2017-12-01
We used the Dutch-Flemish version of the USA PROMIS adult V1.0 item bank for Anxiety as input for developing a computerized adaptive test (CAT) to measure the entire latent anxiety continuum. First, psychometric analysis of a combined clinical and general population sample (N = 2,010) showed that the 29-item bank has psychometric properties that are required for a CAT administration. Second, a post hoc CAT simulation showed efficient and highly precise measurement, with an average number of 8.64 items for the clinical sample, and 9.48 items for the general population sample. Furthermore, the accuracy of our CAT version was highly similar to that of the full item bank administration, both in final score estimates and in distinguishing clinical subjects from persons without a mental health disorder. We discuss the future directions and limitations of CAT development with the Dutch-Flemish version of the PROMIS Anxiety item bank.
ERIC Educational Resources Information Center
Michaelides, Michalis P.; Haertel, Edward H.
2014-01-01
The standard error of equating quantifies the variability in the estimation of an equating function. Because common items for deriving equated scores are treated as fixed, the only source of variability typically considered arises from the estimation of common-item parameters from responses of samples of examinees. Use of alternative, equally…
Carbonneau, Elise; Robitaille, Julie; Lamarche, Benoît; Corneau, Louise; Lemieux, Simone
2017-08-01
The present study aimed to develop and validate a questionnaire assessing perceived food environment in a French-Canadian population. A questionnaire, the Perceived Food Environment Questionnaire, was developed assessing perceived accessibility to healthy (nine items) and unhealthy foods (three items). A pre-test sample was recruited for a pilot testing of the questionnaire. For the validation study, another sample was recruited and completed the questionnaire twice. Exploratory factor analysis was performed on the items to assess the number of factors (subscales). Cronbach's α was used to measure internal consistency reliability. Test-retest reliability was assessed with Pearson correlations. Online survey. Men and women from the Québec City area (n 31 in the pre-test sample; n 150 in the validation study sample). The pilot testing did not lead to any change in the questionnaire. The exploratory factor analysis revealed a two-subscale structure. The first subscale is composed of six items assessing accessibility to healthy foods and the second includes three items related to accessibility to unhealthy foods. Three items were removed from the questionnaire due to low loading on the two subscales. The subscales demonstrated adequate internal consistency (Cronbach's α=0·77 for healthy foods and 0·62 for unhealthy foods) and test-retest reliability (r=0·59 and 0·60, respectively; both P<0·0001). The Perceived Food Environment Questionnaire was developed for a French-Canadian population and demonstrated good psychometric properties. Further validation is recommended if the questionnaire is to be used in other populations.
ERIC Educational Resources Information Center
Khaksefidi, Saman
2017-01-01
This study investigates the psychological effect of a wrong question with wrong items on answering the next question in a test of structure. Forty students selected through stratified random sampling are given 15 questions of a standardized test, namely a TOEFL structure test, in which questions number 7 and number 11 are wrong and their answers…
U.S. History: Grades 10-12. Revised Edition.
ERIC Educational Resources Information Center
Instructional Objectives Exchange, Los Angeles, CA.
Seventy-seven behavioral objectives and related test items for United States history in grades 10 through 12 are presented. Each sample contains the objective, sample test items, and criteria for judging the adequacy of student responses. Fourteen of the 15 categories are content-oriented, and presented in chronological groups: (1) discovery of…
ERIC Educational Resources Information Center
Seo, Dong Gi; Hao, Shiqi
2016-01-01
Differential item/test functioning (DIF/DTF) are routine procedures to detect item/test unfairness as an explanation for group performance difference. However, unequal sample sizes and small sample sizes have an impact on the statistical power of the DIF/DTF detection procedures. Furthermore, DIF/DTF cannot be used for two test forms without…
ERIC Educational Resources Information Center
Kohli, Nidhi; Koran, Jennifer; Henn, Lisa
2015-01-01
There are well-defined theoretical differences between the classical test theory (CTT) and item response theory (IRT) frameworks. It is understood that in the CTT framework, person and item statistics are test- and sample-dependent. This is not the perception with IRT. For this reason, the IRT framework is considered to be theoretically superior…
ERIC Educational Resources Information Center
Immekus, Jason C.; Maller, Susan J.
2009-01-01
The Kaufman Adolescent and Adult Intelligence Test (KAIT[TM]) is an individually administered test of intelligence for individuals ranging in age from 11 to 85+ years. The item response theory-likelihood ratio procedure, based on the two-parameter logistic model, was used to detect differential item functioning (DIF) in the KAIT across males and…
An investigation of the measurement properties of the Spot-the-Word test in a community sample.
Mackinnon, Andrew; Christensen, Helen
2007-12-01
Intellectual ability is assessed with the Spot-the-Word (STW) test (A. Baddeley, H. Emslie, & I. Nimmo Smith, 1993) by asking respondents to identify a word in a word-nonword item pair. Results in moderate-sized samples suggest this ability is resistant to decline due to dementia. The authors used a 3-parameter item response theory model to investigate the measurement properties of the STW in a large community-dwelling sample (n=2,480) 60 to 64 years of age. A number of poorly performing items were identified. Substantial guessing was present; however, the number of words correctly identified was found to be an accurate index of ability. Performance was moderately related to a number of tests of cognitive performance and was effectively unrelated to visual acuity and to physical or mental health status. The STW is a promising test of ability that, in the future, may be refined by the deletion or replacement of poorly functioning items.
ERIC Educational Resources Information Center
Geiser, Christian; Lehmann, Wolfgang; Eid, Michael
2006-01-01
Items of mental rotation tests can not only be solved by mental rotation but also by other solution strategies. A multigroup latent class analysis of 24 items of the Mental Rotations Test (MRT) was conducted in a sample of 1,695 German pupils and students to find out how many solution strategies can be identified for the items of this test. The…
Tepe, Rodger; Tepe, Chabha
2015-03-01
To develop and psychometrically evaluate an information literacy (IL) self-efficacy survey and an IL knowledge test. In this test-retest reliability study, a 25-item IL self-efficacy survey and a 50-item IL knowledge test were developed and administered to a convenience sample of 53 chiropractic students. Item analyses were performed on all questions. The IL self-efficacy survey demonstrated good reliability (test-retest correlation = 0.81) and good/very good internal consistency (mean κ = .56 and Cronbach's α = .92). A total of 25 questions with the best item analysis characteristics were chosen from the 50-item IL knowledge test, resulting in a 25-item IL knowledge test that demonstrated good reliability (test-retest correlation = 0.87), very good internal consistency (mean κ = .69, KR20 = 0.85), and good item discrimination (mean point-biserial = 0.48). This study resulted in the development of three instruments: a 25-item IL self-efficacy survey, a 50-item IL knowledge test, and a 25-item IL knowledge test. The information literacy self-efficacy survey and the 25-item version of the information literacy knowledge test have shown preliminary evidence of adequate reliability and validity to justify continuing study with these instruments.
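The internal-consistency and item-discrimination statistics named in the abstract above (KR20 and the point-biserial) can be sketched for a small scored response matrix. This is a minimal illustration under stated assumptions, not the authors' analysis: the function names and the toy data are hypothetical, and the abstract does not say whether the reported point-biserials were corrected for item overlap (the sketch uses the corrected form).

```python
import numpy as np

def kr20(responses):
    """Kuder-Richardson 20 for a persons x items matrix of 0/1 scores."""
    x = np.asarray(responses, dtype=float)
    k = x.shape[1]                          # number of items
    p = x.mean(axis=0)                      # proportion correct per item
    total_var = x.sum(axis=1).var(ddof=0)   # population variance, matching p * (1 - p)
    return (k / (k - 1)) * (1.0 - (p * (1.0 - p)).sum() / total_var)

def corrected_point_biserial(responses):
    """Correlation of each item with the total of the remaining items (assumed variant)."""
    x = np.asarray(responses, dtype=float)
    total = x.sum(axis=1)
    return np.array([np.corrcoef(x[:, j], total - x[:, j])[0, 1]
                     for j in range(x.shape[1])])

# Hypothetical toy data: 4 examinees, 3 items, perfect Guttman pattern.
toy = [[1, 1, 1],
       [1, 1, 0],
       [1, 0, 0],
       [0, 0, 0]]
print(kr20(toy))
print(corrected_point_biserial(toy))
```

Both functions fail on degenerate input (an item everyone answers the same way, or zero total-score variance), so a real item analysis would screen for those cases first.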
Development of a noise annoyance sensitivity scale
NASA Technical Reports Server (NTRS)
Bregman, H. L.; Pearson, R. G.
1972-01-01
Examining the problem of noise pollution from the psychological rather than the engineering view, a test of human sensitivity to noise was developed against the criterion of noise annoyance. Test development evolved from a previous study in which biographical, attitudinal, and personality data were collected on a sample of 166 subjects drawn from the adult community of Raleigh. Analysis revealed that only a small subset of the data collected was predictive of noise annoyance. Item analysis yielded 74 predictive items that composed the preliminary noise sensitivity test. This was administered to a sample of 80 adults who later rated the annoyance value of six sounds (equated in terms of peak sound pressure level) presented in a simulated home, living-room environment. A predictive model involving 20 test items was developed using multiple regression techniques, and an item weighting scheme was evaluated.
Methodology for the development and calibration of the SCI-QOL item banks
Tulsky, David S.; Kisala, Pamela A.; Victorson, David; Choi, Seung W.; Gershon, Richard; Heinemann, Allen W.; Cella, David
2015-01-01
Objective: To develop a comprehensive, psychometrically sound, and conceptually grounded patient reported outcomes (PRO) measurement system for individuals with spinal cord injury (SCI). Methods: Individual interviews (n = 44) and focus groups (n = 65 individuals with SCI and n = 42 SCI clinicians) were used to select key domains for inclusion and to develop PRO items. Verbatim items from other cutting-edge measurement systems (i.e. PROMIS, Neuro-QOL) were included to facilitate linkage and cross-population comparison. Items were field tested in a large sample of individuals with traumatic SCI (n = 877). Dimensionality was assessed with confirmatory factor analysis. Local item dependence and differential item functioning were assessed, and items were calibrated using the item response theory (IRT) graded response model. Finally, computer adaptive tests (CATs) and short forms were administered in a new sample (n = 245) to assess test-retest reliability and stability. Participants and Procedures: A calibration sample of 877 individuals with traumatic SCI across five SCI Model Systems sites and one Department of Veterans Affairs medical center completed SCI-QOL items in interview format. Results: We developed 14 unidimensional calibrated item banks and 3 calibrated scales across physical, emotional, and social health domains. When combined with the five Spinal Cord Injury – Functional Index physical function banks, the final SCI-QOL system consists of 22 IRT-calibrated item banks/scales. Item banks may be administered as CATs or short forms. Scales may be administered in a fixed-length format only. Conclusions: The SCI-QOL measurement system provides SCI researchers and clinicians with a comprehensive, relevant and psychometrically robust system for measurement of physical-medical, physical-functional, emotional, and social outcomes. All SCI-QOL instruments are freely available on Assessment Center℠. PMID:26010963
Methodology for the development and calibration of the SCI-QOL item banks.
Tulsky, David S; Kisala, Pamela A; Victorson, David; Choi, Seung W; Gershon, Richard; Heinemann, Allen W; Cella, David
2015-05-01
To develop a comprehensive, psychometrically sound, and conceptually grounded patient reported outcomes (PRO) measurement system for individuals with spinal cord injury (SCI). Individual interviews (n=44) and focus groups (n=65 individuals with SCI and n=42 SCI clinicians) were used to select key domains for inclusion and to develop PRO items. Verbatim items from other cutting-edge measurement systems (i.e. PROMIS, Neuro-QOL) were included to facilitate linkage and cross-population comparison. Items were field tested in a large sample of individuals with traumatic SCI (n=877). Dimensionality was assessed with confirmatory factor analysis. Local item dependence and differential item functioning were assessed, and items were calibrated using the item response theory (IRT) graded response model. Finally, computer adaptive tests (CATs) and short forms were administered in a new sample (n=245) to assess test-retest reliability and stability. A calibration sample of 877 individuals with traumatic SCI across five SCI Model Systems sites and one Department of Veterans Affairs medical center completed SCI-QOL items in interview format. We developed 14 unidimensional calibrated item banks and 3 calibrated scales across physical, emotional, and social health domains. When combined with the five Spinal Cord Injury – Functional Index physical function banks, the final SCI-QOL system consists of 22 IRT-calibrated item banks/scales. Item banks may be administered as CATs or short forms. Scales may be administered in a fixed-length format only. The SCI-QOL measurement system provides SCI researchers and clinicians with a comprehensive, relevant and psychometrically robust system for measurement of physical-medical, physical-functional, emotional, and social outcomes. All SCI-QOL instruments are freely available on Assessment Center℠.
Forkmann, Thomas; Kroehne, Ulf; Wirtz, Markus; Norra, Christine; Baumeister, Harald; Gauggel, Siegfried; Elhan, Atilla Halil; Tennant, Alan; Boecker, Maren
2013-11-01
This study conducted a simulation of computer-adaptive testing based on the Aachen Depression Item Bank (ADIB), which was developed for the assessment of depression in persons with somatic diseases. Prior to computer-adaptive test simulation, the ADIB was newly calibrated. Recalibration was performed in a sample of 161 patients treated for a depressive syndrome, 103 patients from cardiology, and 103 patients from otorhinolaryngology (mean age 44.1, SD=14.0; 44.7% female) and was cross-validated in a sample of 117 patients undergoing rehabilitation for cardiac diseases (mean age 58.4, SD=10.5; 24.8% women). Unidimensionality of the item bank was checked and a Rasch analysis was performed that evaluated local dependency (LD), differential item functioning (DIF), item fit and reliability. CAT simulation was conducted with the total sample and additional simulated data. Recalibration resulted in a strictly unidimensional item bank with 36 items, showing good Rasch model fit (item fit residuals<|2.5|) and no DIF or LD. CAT simulation revealed that 13 items on average were necessary to estimate depression in the range of -2 to +2 logits when terminating at SE≤0.32 and 4 items if using SE≤0.50. Receiver Operating Characteristics analysis showed that θ estimates based on the CAT algorithm have good criterion validity with regard to depression diagnoses (Area Under the Curve≥.78 for all cut-off criteria). The recalibration of the ADIB succeeded and the simulation studies conducted suggest that it has good screening performance in the samples investigated and that it may reasonably add to the improvement of depression assessment. © 2013.
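The stopping rule described above (administer items until the standard error of the θ estimate falls to a threshold such as 0.32 or 0.50) can be illustrated with a small simulation. This is only a sketch under strong assumptions: it uses hypothetical dichotomous Rasch items, a standard normal prior, and expected a posteriori (EAP) scoring on a grid, none of which is stated in the abstract. The item counts it produces are therefore not comparable to the 13 and 4 items reported for the ADIB.

```python
import numpy as np

def rasch_p(theta, b):
    """Rasch probability of endorsing/answering correctly an item of difficulty b."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

def simulate_cat(true_theta, difficulties, se_stop, rng):
    """Administer items until the posterior SD <= se_stop or the bank is exhausted."""
    grid = np.linspace(-4.0, 4.0, 161)
    log_post = -0.5 * grid ** 2              # standard normal prior (assumption)
    remaining = list(range(len(difficulties)))
    theta_hat, se, used = 0.0, np.inf, 0
    while remaining and se > se_stop:
        # Rasch item information p(1 - p) is largest when b is closest to theta_hat.
        j = min(remaining, key=lambda i: abs(difficulties[i] - theta_hat))
        remaining.remove(j)
        answered = rng.random() < rasch_p(true_theta, difficulties[j])
        p = rasch_p(grid, difficulties[j])
        log_post = log_post + np.log(p if answered else 1.0 - p)
        w = np.exp(log_post - log_post.max())
        w /= w.sum()
        theta_hat = float((w * grid).sum())                        # EAP estimate
        se = float(np.sqrt((w * (grid - theta_hat) ** 2).sum()))   # posterior SD
        used += 1
    return theta_hat, se, used

# Hypothetical 60-item bank; the same seed gives the same response sequence.
bank = np.linspace(-3.0, 3.0, 60)
strict = simulate_cat(0.5, bank, 0.32, np.random.default_rng(1))
loose = simulate_cat(0.5, bank, 0.50, np.random.default_rng(1))
print("items used at SE<=0.32:", strict[2], "| at SE<=0.50:", loose[2])
```

Because both runs share a seed, the looser threshold stops somewhere along the same item sequence, so it can never use more items than the stricter one.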
Gender-Based Differential Item Performance in Mathematics Achievement Items.
ERIC Educational Resources Information Center
Doolittle, Allen E.; Cleary, T. Anne
1987-01-01
Eight randomly equivalent samples of high school seniors were each given a unique form of the ACT Assessment Mathematics Usage Test (ACTM). Signed measures of differential item performance (DIP) were obtained for each item in the eight ACTM forms. DIP estimates were analyzed and a significant item category effect was found. (Author/LMO)
Clinton-McHarg, Tara; Carey, Mariko; Sanson-Fisher, Rob; D'Este, Catherine; Shakeshaft, Anthony
2012-01-30
Adolescent and young adult (AYA) cancer survivors may have unique physical, psychological and social needs due to their cancer occurring at a critical phase of development. The aim of this study was to develop a psychometrically rigorous measure of unmet need to capture the specific needs of this group. Items were developed following a comprehensive literature review, focus groups with AYAs, and feedback from health care providers, researchers and other professionals. The measure was pilot tested with 32 AYA cancer survivors recruited through a state-based cancer registry to establish face and content validity. A main sample of 139 AYA cancer patients and survivors was recruited through seven treatment centres and invited to complete the questionnaire. To establish test-retest reliability, a sub-sample of 34 participants completed the measure a second time. Exploratory factor analysis was performed and the measure was assessed for internal consistency, discriminative validity, potential responsiveness and acceptability. The Cancer Needs Questionnaire - Young People (CNQ-YP) has established face and content validity, and acceptability. The final measure has 70 items and six factors: Treatment Environment and Care (33 items); Feelings and Relationships (14 items); Daily Life (12 items); Information and Activities (5 items); Education (3 items); and Work (3 items). All domains achieved Cronbach's alpha values greater than 0.80. Item-to-item test-retest reliability was also high, with all but four items reaching weighted kappa values above 0.60. The CNQ-YP is the first multi-dimensional measure of unmet need which has been developed specifically for AYA cancer patients and survivors. The measure displays a strong factor structure, and excellent internal consistency and test-retest reliability. However, the small sample size has implications for the reliability of the statistical analyses undertaken, particularly the exploratory factor analysis.
Future studies with a larger sample are recommended to confirm the factor structure of the measure. Longitudinal studies to establish responsiveness and predictive validity should also be undertaken.
2012-01-01
Background: Adolescent and young adult (AYA) cancer survivors may have unique physical, psychological and social needs due to their cancer occurring at a critical phase of development. The aim of this study was to develop a psychometrically rigorous measure of unmet need to capture the specific needs of this group. Methods: Items were developed following a comprehensive literature review, focus groups with AYAs, and feedback from health care providers, researchers and other professionals. The measure was pilot tested with 32 AYA cancer survivors recruited through a state-based cancer registry to establish face and content validity. A main sample of 139 AYA cancer patients and survivors was recruited through seven treatment centres and invited to complete the questionnaire. To establish test-retest reliability, a sub-sample of 34 participants completed the measure a second time. Exploratory factor analysis was performed and the measure was assessed for internal consistency, discriminative validity, potential responsiveness and acceptability. Results: The Cancer Needs Questionnaire - Young People (CNQ-YP) has established face and content validity, and acceptability. The final measure has 70 items and six factors: Treatment Environment and Care (33 items); Feelings and Relationships (14 items); Daily Life (12 items); Information and Activities (5 items); Education (3 items); and Work (3 items). All domains achieved Cronbach's alpha values greater than 0.80. Item-to-item test-retest reliability was also high, with all but four items reaching weighted kappa values above 0.60. Conclusions: The CNQ-YP is the first multi-dimensional measure of unmet need which has been developed specifically for AYA cancer patients and survivors. The measure displays a strong factor structure, and excellent internal consistency and test-retest reliability.
However, the small sample size has implications for the reliability of the statistical analyses undertaken, particularly the exploratory factor analysis. Future studies with a larger sample are recommended to confirm the factor structure of the measure. Longitudinal studies to establish responsiveness and predictive validity should also be undertaken. PMID:22284545
ERIC Educational Resources Information Center
Puhan, Gautam; Moses, Tim P.; Yu, Lei; Dorans, Neil J.
2007-01-01
The purpose of the current study was to examine whether log-linear smoothing of observed score distributions in small samples results in more accurate differential item functioning (DIF) estimates under the simultaneous item bias test (SIBTEST) framework. Data from a teacher certification test were analyzed using White candidates in the reference…
ERIC Educational Resources Information Center
Sawaki, Yasuyo; Stricker, Lawrence; Oranje, Andreas
2008-01-01
The present study investigated the factor structure of a field trial sample of the Test of English as a Foreign Language™ Internet-based test (TOEFL® iBT). An item-level confirmatory factor analysis (CFA) was conducted for a polychoric correlation matrix of items on a test form completed by 2,720 participants in the 2003-2004 TOEFL iBT Field…
Examining the Invariance of Rater and Project Calibrations Using a Multi-facet Rasch Model.
ERIC Educational Resources Information Center
O'Neill, Thomas R.; Lunz, Mary E.
To generalize test results beyond the particular test administration, an examinee's ability estimate must be independent of the particular items attempted, and the item difficulty calibrations must be independent of the particular sample of people attempting the items. This stability is a key concept of the Rasch model, a latent trait model of…
Detecting a Gender-Related Differential Item Functioning Using Transformed Item Difficulty
ERIC Educational Resources Information Center
Abedalaziz, Nabeel; Leng, Chin Hai; Alahmadi, Ahlam
2014-01-01
The purpose of the study was to examine gender differences in performance on a multiple-choice mathematical ability test, administered within the context of a high school graduation test that was designed to match the eleventh grade curriculum. The transformed item difficulty (TID) method was used to detect gender-related DIF. A random sample of 1400 eleventh…
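The abstract above is truncated and does not describe its computations, so the following is only a sketch of how a transformed item difficulty (delta-plot) analysis is commonly set up, not the authors' procedure. It assumes the ETS delta transformation (13 minus 4 times the normal quantile of the proportion correct) and measures each item's perpendicular distance from the major axis of the two groups' deltas. All function names and the input proportions are hypothetical.

```python
from statistics import NormalDist
import numpy as np

def delta_scores(p_correct):
    """Delta = 13 - 4 * z, where z is the normal quantile of the proportion correct."""
    inv = NormalDist().inv_cdf
    return np.array([13.0 - 4.0 * inv(p) for p in p_correct])

def tid_distances(p_ref, p_focal):
    """Signed perpendicular distance of each item from the major axis of the delta plot.

    Positive values mean the item is relatively easier for the focal group.
    Requires a non-zero covariance between the two sets of deltas.
    """
    x, y = delta_scores(p_ref), delta_scores(p_focal)
    c = np.cov(x, y)                       # 2 x 2 sample covariance matrix
    sx2, sy2, sxy = c[0, 0], c[1, 1], c[0, 1]
    slope = (sy2 - sx2 + np.sqrt((sy2 - sx2) ** 2 + 4.0 * sxy ** 2)) / (2.0 * sxy)
    intercept = y.mean() - slope * x.mean()
    return (slope * x + intercept - y) / np.sqrt(slope ** 2 + 1.0)

# Hypothetical proportions correct for five items in two groups.
p_boys = [0.85, 0.70, 0.55, 0.40, 0.25]
p_girls = [0.83, 0.72, 0.35, 0.41, 0.27]
print(np.round(tid_distances(p_boys, p_girls), 2))
```

In applied work, items whose absolute distance exceeds a chosen cut-off (1.5 delta units is often cited) are reviewed as possible DIF items. That threshold is a convention, not something stated in the abstract.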
The development of the Pictorial Thai Quality of Life.
Phattharayuttawat, Sucheera; Ngamthipwatthana, Thienchai; Pitiyawaranun, Buncha
2005-11-01
"Quality of life" has become a main focus of interest in medicine. The Pictorial Thai Quality of Life (PTQL) was developed to measure quality of life among Thai people with mental illness, both in clinical settings and in the community. The purpose of this study was to develop the PTQL with adequate construct validity, discriminant power, concurrent validity, and reliability. Two sample groups were used: (1) a pilot sample of 30 participants and (2) a survey sample of 672 participants consisting of normal individuals and psychiatric patients. The test items were collected from a review of the literature, and all items were based on the WHO definition of quality of life. Expert judgment using the Delphi technique was applied in the first stage. A pilot study was then used to evaluate the test administration and the wording of the test items. In the final stage, data were collected from the survey sample. The final test was composed of 25 items. The construct validity analysis yielded six domains: Physical, Cognitive, Affective, Social Function, Economic, and Self-Esteem. All PTQL items had sufficient discriminant power, with statistically significant differences at the .001 level between people with mental disorders and normal people. Concurrent validity with the WHOQOL-BREF was high: the Pearson correlation coefficient and the area under the ROC curve were 0.92 and 0.97, respectively. The alpha reliability coefficient of the PTQL total test was 0.88, and the values for the six scales ranged from 0.81 to 0.91. The present study was directed at developing a pictorial quality of life questionnaire with effective psychometric properties.
The result will be a more direct and meaningful application of an instrument for detecting poor quality of life related to mental illness in Thai communities.
Tepe, Rodger; Tepe, Chabha
2015-01-01
Objective: To develop and psychometrically evaluate an information literacy (IL) self-efficacy survey and an IL knowledge test. Methods: In this test–retest reliability study, a 25-item IL self-efficacy survey and a 50-item IL knowledge test were developed and administered to a convenience sample of 53 chiropractic students. Item analyses were performed on all questions. Results: The IL self-efficacy survey demonstrated good reliability (test–retest correlation = 0.81) and good/very good internal consistency (mean κ = .56 and Cronbach's α = .92). A total of 25 questions with the best item analysis characteristics were chosen from the 50-item IL knowledge test, resulting in a 25-item IL knowledge test that demonstrated good reliability (test–retest correlation = 0.87), very good internal consistency (mean κ = .69, KR20 = 0.85), and good item discrimination (mean point-biserial = 0.48). Conclusions: This study resulted in the development of three instruments: a 25-item IL self-efficacy survey, a 50-item IL knowledge test, and a 25-item IL knowledge test. The information literacy self-efficacy survey and the 25-item version of the information literacy knowledge test have shown preliminary evidence of adequate reliability and validity to justify continuing study with these instruments. PMID:25517736
Shen, Linjun; Li, Feiming; Wattleworth, Roberta; Filipetto, Frank
2010-10-01
The Comprehensive Osteopathic Medical Licensing Examination conducted a trial of multimedia items in the 2008-2009 Level 3 testing cycle to determine (1) if multimedia items were able to test additional elements of medical knowledge and skills and (2) how to develop effective multimedia items. Forty-four content-matched multimedia and text multiple-choice items were randomly delivered to Level 3 candidates. Logistic regression and paired-samples t tests were used for pairwise and group-level comparisons, respectively. Nine pairs showed significant differences in difficulty and/or discrimination. Content analysis found that, if text narrations were less direct, multimedia materials could make items easier. When textbook terminologies were replaced by multimedia presentations, multimedia items could become more difficult. Moreover, a multimedia item was found not to be uniformly difficult for candidates at different ability levels, possibly because multimedia and text items tested different elements of the same concept. Multimedia items may be capable of measuring some constructs different from what text items can measure. Effective multimedia items with reasonable psychometric properties can be intentionally developed.
Sex differences and gender-invariance of mother-reported childhood problem behavior.
van der Sluis, Sophie; Polderman, Tinca J C; Neale, Michael C; Verhulst, Frank C; Posthuma, Danielle; Dieleman, Gwen C
2017-09-01
Prevalence and severity of childhood behavioral problems differ between boys and girls, and in psychiatry, testing for gender differences is common practice. Population-based studies show that many psychopathology scales are (partially) measurement invariant (MI) with respect to gender, i.e. are unbiased. It is, however, unclear whether these studies generalize towards clinical samples. In a psychiatric outpatient sample, we tested whether the Child Behavior Checklist 6-18 (CBCL) is unbiased with respect to gender. First, we compared mean scores across gender of all syndrome scales of the CBCL in 3271 patients (63.3% boys) aged 6-18. Second, we tested for MI on both the syndrome scale and the item-level using a stepwise modeling procedure. Six of the eight CBCL syndrome scales included one or more gender-biased items (12.6% of all items), resulting in slight over- or under-estimation of the absolute gender difference in mean scores. Two scales, Somatic Complaints and Rule-breaking Behavior, contained no biased items. The CBCL is a valid instrument to measure gender differences in problem behavior in children and adolescents from a clinical sample; while various gender-biased items were identified, the resulting bias was generally clinically irrelevant, and sufficient items per subscale remained after exclusion of biased items. Copyright © 2016 John Wiley & Sons, Ltd.
An Investigation of the Measurement Properties of the Spot-the-Word Test In a Community Sample
ERIC Educational Resources Information Center
Mackinnon, Andrew; Christensen, Helen
2007-01-01
Intellectual ability is assessed with the Spot-the-Word (STW) test (A. Baddeley, H. Emslie, & I. Nimmo Smith, 1993) by asking respondents to identify a word in a word-nonword item pair. Results in moderate-sized samples suggest this ability is resistant to decline due to dementia. The authors used a 3-parameter item response theory model to…
ERIC Educational Resources Information Center
Paek, Insu; Wilson, Mark
2011-01-01
This study elaborates the Rasch differential item functioning (DIF) model formulation under the marginal maximum likelihood estimation context. Also, the Rasch DIF model performance was examined and compared with the Mantel-Haenszel (MH) procedure in small sample and short test length conditions through simulations. The theoretically known…
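For readers unfamiliar with the Mantel-Haenszel (MH) procedure that serves as the comparison method above, a minimal sketch follows. It computes the MH common odds ratio across matching-score strata and the ETS delta-scale transformation, MH D-DIF = -2.35 ln(α). The function name, argument layout, and toy data are hypothetical, and the abstract does not describe the study's own implementation.

```python
import numpy as np

def mantel_haenszel_dif(item, group, matching_score):
    """MH common odds ratio and MH D-DIF for one dichotomous item.

    item: 0/1 responses; group: 0 = reference, 1 = focal;
    matching_score: stratifying variable (typically the total test score).
    """
    item, group, score = (np.asarray(v) for v in (item, group, matching_score))
    num = den = 0.0
    for s in np.unique(score):
        m = score == s
        a = np.sum(m & (group == 0) & (item == 1))  # reference, correct
        b = np.sum(m & (group == 0) & (item == 0))  # reference, incorrect
        c = np.sum(m & (group == 1) & (item == 1))  # focal, correct
        d = np.sum(m & (group == 1) & (item == 0))  # focal, incorrect
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    alpha = num / den              # undefined if den == 0 (sparse strata)
    return alpha, -2.35 * np.log(alpha)

# Hypothetical single-stratum example in which the item favors the reference group.
item = [1, 1, 1, 0, 1, 0, 0, 0]
group = [0, 0, 0, 0, 1, 1, 1, 1]
score = [5] * 8
print(mantel_haenszel_dif(item, group, score))
```

With very small samples, many strata contain empty cells and the denominator can approach zero. That is one reason small-sample, short-test conditions of the kind examined in the study are of interest.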
ERIC Educational Resources Information Center
Chen, Pei-Hua; Chang, Hua-Hua; Wu, Haiyan
2012-01-01
Two sampling-and-classification-based procedures were developed for automated test assembly: the Cell Only and the Cell and Cube methods. A simulation study based on a 540-item bank was conducted to compare the performance of the procedures with the performance of a mixed-integer programming (MIP) method for assembling multiple parallel test…
Developmental changes in visual short-term memory in infancy: evidence from eye-tracking.
Oakes, Lisa M; Baumgartner, Heidi A; Barrett, Frederick S; Messenger, Ian M; Luck, Steven J
2013-01-01
We assessed visual short-term memory (VSTM) for color in 6- and 8-month-old infants (n = 76) using a one-shot change detection task. In this task, a sample array of two colored squares was visible for 517 ms, followed by a 317-ms retention period and then a 3000-ms test array consisting of one unchanged item and one item in a new color. We tracked gaze at 60 Hz while infants looked at the changed and unchanged items during test. When the two sample items were different colors (Experiment 1), 8-month-old infants exhibited a preference for the changed item, indicating memory for the colors, but 6-month-olds exhibited no evidence of memory. When the two sample items were the same color and did not need to be encoded as separate objects (Experiment 2), 6-month-old infants demonstrated memory. These results show that infants can encode information in VSTM in a single, brief exposure that simulates the timing of a single fixation period in natural scene viewing, and they reveal rapid developmental changes between 6 and 8 months in the ability to store individuated items in VSTM.
Fischer, H Felix; Wahl, Inka; Nolte, Sandra; Liegl, Gregor; Brähler, Elmar; Löwe, Bernd; Rose, Matthias
2017-12-01
To investigate differential item functioning (DIF) of PROMIS Depression items between US and German samples we compared data from the US PROMIS calibration sample (n = 780), a German general population survey (n = 2,500) and a German clinical sample (n = 621). DIF was assessed in an ordinal logistic regression framework, with 0.02 as criterion for R² change and 0.096 for Raju's non-compensatory DIF. Item parameters were initially fixed to the PROMIS Depression metric; we used plausible values to account for uncertainty in depression estimates. Only four items showed DIF. Accounting for DIF led to negligible effects for the full item bank as well as a post hoc simulated computer-adaptive test (< 0.1 point on the PROMIS metric [mean = 50, standard deviation = 10]), while the effect on the short forms was small (< 1 point). The mean depression severity (43.6) in the German general population sample was considerably lower compared to the US reference value of 50. Overall, we found little evidence for language DIF between US and German samples, which could be addressed by either replacing the DIF items by items not showing DIF or by scoring the short form in German samples with the corrected item parameters reported. Copyright © 2016 John Wiley & Sons, Ltd.
ERIC Educational Resources Information Center
Ali, Usama S.; Walker, Michael E.
2014-01-01
Two methods are currently in use at Educational Testing Service (ETS) for equating observed item difficulty statistics. The first method involves the linear equating of item statistics in an observed sample to reference statistics on the same items. The second method, or the item response curve (IRC) method, involves the summation of conditional…
Small-Sample DIF Estimation Using SIBTEST, Cochran's Z, and Log-Linear Smoothing
ERIC Educational Resources Information Center
Lei, Pui-Wa; Li, Hongli
2013-01-01
Minimum sample sizes of about 200 to 250 per group are often recommended for differential item functioning (DIF) analyses. However, there are times when sample sizes for one or both groups of interest are smaller than 200 due to practical constraints. This study attempts to examine the performance of Simultaneous Item Bias Test (SIBTEST),…
Comparing Simulated and Theoretical Sampling Distributions of the U3 Person-Fit Statistic.
ERIC Educational Resources Information Center
Emons, Wilco H. M.; Meijer, Rob R.; Sijtsma, Klaas
2002-01-01
Studied whether the theoretical sampling distribution of the U3 person-fit statistic is in agreement with the simulated sampling distribution under different item response theory models and varying item and test characteristics. Simulation results suggest that the use of standard normal deviates for the standardized version of the U3 statistic may…
Chung, Hyewon; Kim, Jiseon; Cook, Karon F; Askew, Robert L; Revicki, Dennis A; Amtmann, Dagmar
2014-02-01
In order to test the difference between group means, the construct measured must have the same meaning for all groups under investigation. This study examined the measurement invariance of responses to the patient-reported outcomes measurement information system (PROMIS) pain behavior (PB) item bank in two samples: the PROMIS calibration sample (Wave 1, N = 426) and a sample recruited from the American Chronic Pain Association (ACPA, N = 750). The ACPA data were collected to increase the number of participants with higher levels of pain. Multi-group confirmatory factor analysis (MG-CFA) and two item response theory (IRT)-based differential item functioning (DIF) approaches were employed to evaluate the existence of measurement invariance. MG-CFA results supported metric invariance of the PROMIS-PB, indicating that unstandardized factor loadings were equal across samples. DIF analyses revealed that the impact of the 6 DIF items was negligible. Based on the results of both the MG-CFA and the IRT-based DIF approaches, we recommend retaining the original parameter estimates obtained from the combined samples.
Gender fairness within the Force Concept Inventory
NASA Astrophysics Data System (ADS)
Traxler, Adrienne; Henderson, Rachel; Stewart, John; Stewart, Gay; Papak, Alexis; Lindell, Rebecca
2018-01-01
Research on the test structure of the Force Concept Inventory (FCI) has largely ignored gender, and research on FCI gender effects (often reported as "gender gaps") has seldom interrogated the structure of the test. These rarely crossed streams of research leave open the possibility that the FCI may not be structurally valid across genders, particularly since many reported results come from calculus-based courses where 75% or more of the students are men. We examine the FCI considering both psychometrics and gender disaggregation (while acknowledging this as a binary simplification), and find several problematic questions whose removal decreases the apparent gender gap. We analyze three samples (total N_pre = 5391, N_post = 5769) looking for gender asymmetries using classical test theory, item response theory, and differential item functioning. The combination of these methods highlights six items that appear substantially unfair to women and two items biased in favor of women. No single physical concept or prior experience unifies these questions, but they are broadly consistent with problematic items identified in previous research. Removing all significantly gender-unfair items halves the gender gap in the main sample in this study. We recommend that instructors using the FCI report the reduced-instrument score as well as the 30-item score, and that credit or other benefits to students not be assigned using the biased items.
Rasch model based analysis of the Force Concept Inventory
NASA Astrophysics Data System (ADS)
Planinic, Maja; Ivanjek, Lana; Susac, Ana
2010-06-01
The Force Concept Inventory (FCI) is an important diagnostic instrument which is widely used in the field of physics education research. It is therefore very important to evaluate and monitor its functioning using different tools for statistical analysis. One such tool is the stochastic Rasch model, which enables construction of linear measures for persons and items from raw test scores and which can provide important insight into the structure and functioning of the test (how item difficulties are distributed within the test, how well the items fit the model, and how well the items work together to define the underlying construct). The data for the Rasch analysis come from the large-scale research conducted in 2006-07, which investigated Croatian high school students’ conceptual understanding of mechanics on a representative sample of 1676 students (age 17-18 years). The instrument used in the research was the FCI. The average FCI score for the whole sample was found to be (27.7±0.4)%, indicating that most of the students were still non-Newtonians at the end of high school, despite the fact that physics is a compulsory subject in Croatian schools. The large set of obtained data was analyzed with the Rasch measurement computer software WINSTEPS 3.66. Since the FCI is routinely used as pretest and post-test on two very different types of population (non-Newtonian and predominantly Newtonian), an additional predominantly Newtonian sample (N=141, average FCI score of 64.5%) of first-year students enrolled in an introductory physics course at the University of Zagreb was also analyzed. The Rasch model based analysis suggests that the FCI has succeeded in defining a sufficiently unidimensional construct for each population. The analysis of fit of data to the model found no grossly misfitting items which would degrade measurement. Some items with larger misfit and items with significantly different difficulties in the two samples of students do require further examination.
The analysis revealed some problems with item distribution in the FCI and suggested that the FCI may function differently in non-Newtonian and predominantly Newtonian populations. Some possible improvements of the test are suggested.
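As a rough illustration of the model named in the abstract above, the dichotomous Rasch model gives the probability of a correct response as a logistic function of the difference between person ability and item difficulty, both in logits. The sketch below is not the WINSTEPS procedure used in the study; the function name and all numeric values are hypothetical and chosen only to show the shape of the model.

```python
import math


def rasch_probability(theta: float, b: float) -> float:
    """Probability of a correct response under the dichotomous Rasch model.

    theta: person ability in logits (hypothetical input).
    b: item difficulty in logits (hypothetical input).
    """
    return 1.0 / (1.0 + math.exp(-(theta - b)))


# Hypothetical item difficulties, for illustration only (not from the study).
item_difficulties = {"easy_item": -1.0, "medium_item": 0.0, "hard_item": 1.5}

for ability in (-1.0, 0.0, 1.0):
    probs = {
        name: round(rasch_probability(ability, b), 3)
        for name, b in item_difficulties.items()
    }
    print(f"ability={ability:+.1f} logits -> {probs}")
```

When ability equals difficulty the probability is exactly 0.5, which is why an item set whose difficulties sit far from the abilities of a sample provides little information about that sample.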
ERIC Educational Resources Information Center
Shepard, Lorrie, And Others
1981-01-01
Sixteen approaches for detecting item bias were compared on samples of Black, White, and Chicano elementary school pupils using the Lorge-Thorndike and Raven's Coloured Progressive Matrices tests. Recommendations for practical use are made. (JKS)
Dirven, Linda; Groenvold, Mogens; Taphoorn, Martin J B; Conroy, Thierry; Tomaszewski, Krzysztof A; Young, Teresa; Petersen, Morten Aa
2017-11-01
The European Organisation of Research and Treatment of Cancer (EORTC) Quality of Life Group is developing computerized adaptive testing (CAT) versions of all EORTC Quality of Life Questionnaire (QLQ-C30) scales with the aim to enhance measurement precision. Here we present the results on the field-testing and psychometric evaluation of the item bank for cognitive functioning (CF). In previous phases (I-III), 44 candidate items were developed measuring CF in cancer patients. In phase IV, these items were psychometrically evaluated in a large sample of international cancer patients. This evaluation included an assessment of dimensionality, fit to the item response theory (IRT) model, differential item functioning (DIF), and measurement properties. A total of 1030 cancer patients completed the 44 candidate items on CF. Of these, 34 items could be included in a unidimensional IRT model, showing an acceptable fit. Although several items showed DIF, these had a negligible impact on CF estimation. Measurement precision of the item bank was much higher than the two original QLQ-C30 CF items alone, across the whole continuum. Moreover, CAT measurement may on average reduce study sample sizes with about 35-40% compared to the original QLQ-C30 CF scale, without loss of power. A CF item bank for CAT measurement consisting of 34 items was established, applicable to various cancer patients across countries. This CAT measurement system will facilitate precise and efficient assessment of HRQOL of cancer patients, without loss of comparability of results.
Online Calibration of Polytomous Items Under the Generalized Partial Credit Model
Zheng, Yi
2016-01-01
Online calibration is a technology-enhanced architecture for item calibration in computerized adaptive tests (CATs). Many CATs are administered continuously over a long term and rely on large item banks. To ensure test validity, these item banks need to be frequently replenished with new items, and these new items need to be pretested before being used operationally. Online calibration dynamically embeds pretest items in operational tests and calibrates their parameters as response data are gradually obtained through the continuous test administration. This study extends existing formulas, procedures, and algorithms for dichotomous item response theory models to the generalized partial credit model, a popular model for items scored in more than two categories. A simulation study was conducted to investigate the developed algorithms and procedures under a variety of conditions, including two estimation algorithms, three pretest item selection methods, three seeding locations, two numbers of score categories, and three calibration sample sizes. Results demonstrated acceptable estimation accuracy of the two estimation algorithms in some of the simulated conditions. A variety of findings were also revealed for the interacted effects of included factors, and recommendations were made respectively. PMID:29881063
Empirical Tests of Acceptance Sampling Plans
NASA Technical Reports Server (NTRS)
White, K. Preston, Jr.; Johnson, Kenneth L.
2012-01-01
Acceptance sampling is a quality control procedure applied as an alternative to 100% inspection. A random sample of items is drawn from a lot to determine the fraction of items which have a required quality characteristic. Both the number of items to be inspected and the criterion for determining conformance of the lot to the requirement are given by an appropriate sampling plan with specified risks of Type I and Type II sampling errors. In this paper, we present the results of empirical tests of the accuracy of selected sampling plans reported in the literature. These plans are for measurable quality characteristics which are known to have either binomial, exponential, normal, gamma, Weibull, inverse Gaussian, or Poisson distributions. In the main, results support the accepted wisdom that variables acceptance plans are superior to attributes (binomial) acceptance plans, in the sense that these provide comparable protection against risks at reduced sampling cost. For the Gaussian and Weibull plans, however, there are ranges of the shape parameters for which the required sample sizes are in fact larger than the corresponding attributes plans, dramatically so for instances of large skew. Tests further confirm that the published inverse-Gaussian (IG) plan is flawed, as reported by White and Johnson (2011).
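To make the attributes (binomial) case concrete, the sketch below computes the probability that a lot is accepted under a single-sampling plan: inspect n items and accept if at most c are nonconforming. This is a textbook formulation, not one of the specific plans tested in the paper; the plan parameters (n = 50, c = 2) and the nonconforming fractions are hypothetical.

```python
from math import comb


def acceptance_probability(n: int, c: int, p: float) -> float:
    """P(accept lot) for a single-sampling attributes plan.

    n: sample size; c: acceptance number (maximum nonconforming allowed);
    p: true fraction nonconforming, with items assumed independent (binomial).
    """
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c + 1))


# Hypothetical plan and lot qualities, for illustration only.
n, c = 50, 2
for p in (0.01, 0.05, 0.10):
    print(f"fraction nonconforming={p:.2f} -> P(accept)={acceptance_probability(n, c, p):.3f}")
```

Evaluating the function at an acceptable and at a rejectable quality level yields the two risks the abstract mentions: one minus the acceptance probability at the good level is the Type I (producer's) risk, and the acceptance probability at the bad level is the Type II (consumer's) risk.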
Prediction of true test scores from observed item scores and ancillary data.
Haberman, Shelby J; Yao, Lili; Sinharay, Sandip
2015-05-01
In many educational tests which involve constructed responses, a traditional test score is obtained by adding together item scores obtained through holistic scoring by trained human raters. For example, this practice was used until 2008 in the case of GRE® General Analytical Writing and until 2009 in the case of TOEFL® iBT Writing. With use of natural language processing, it is possible to obtain additional information concerning item responses from computer programs such as e-rater®. In addition, available information relevant to examinee performance may include scores on related tests. We suggest application of standard results from classical test theory to the available data to obtain best linear predictors of true traditional test scores. In performing such analysis, we require estimation of variances and covariances of measurement errors, a task which can be quite difficult in the case of tests with limited numbers of items and with multiple measurements per item. As a consequence, a new estimation method is suggested based on samples of examinees who have taken an assessment more than once. Such samples are typically not random samples of the general population of examinees, so that we apply statistical adjustment methods to obtain the needed estimated variances and covariances of measurement errors. To examine practical implications of the suggested methods of analysis, applications are made to GRE General Analytical Writing and TOEFL iBT Writing. Results obtained indicate that substantial improvements are possible both in terms of reliability of scoring and in terms of assessment reliability. © 2015 The British Psychological Society.
Lúcio, Patrícia Silva; Cogo-Moreira, Hugo; Puglisi, Marina; Polanczyk, Guilherme Vanoni; Little, Todd D
2017-11-01
The present study investigated the psychometric properties of the Raven's Colored Progressive Matrices (CPM) test in a sample of preschoolers from Brazil (n = 582; age: mean = 57 months, SD = 7 months; 46% female). We investigated the plausibility of unidimensionality of the items (confirmatory factor analysis) and differential item functioning (DIF) for sex and age (multiple indicators multiple causes method). We tested four unidimensional models and the one with the best-fit index was a reduced form of the Raven's CPM. The DIF analysis was carried out with the reduced form of the test. A few items presented DIF (two for sex and one for age), confirming that the Raven's CPM items are mostly measurement invariant. There was no effect of sex on the general factor, but increasing age was associated with higher values of the g factor. Future research should indicate if the reduced form is suitable for evaluating the general ability of preschoolers.
ERIC Educational Resources Information Center
Chalmers, R. Philip; Counsell, Alyssa; Flora, David B.
2016-01-01
Differential test functioning, or DTF, occurs when one or more items in a test demonstrate differential item functioning (DIF) and the aggregate of these effects is witnessed at the test level. In many applications, DTF can be more important than DIF when the overall effects of DIF at the test level can be quantified. However, optimal statistical…
Rapp-Santos, Kamala; Havas, Karyn; Vest, Kelly
2015-01-01
The Destination Monitoring Program, operated by the US Army Public Health Command (APHC), is one component that supports the APHC Veterinary Service's mission to ensure safety and quality of food procured for the Department of Defense (DoD). This program relies on retail product testing to ensure compliance of production facilities and distributors that supply food to the DoD. This program was assessed to determine the validity and timeliness by specifically evaluating whether sample size of items collected was adequate, if food samples collected were representative of risk, and whether the program returns results in a timely manner. Data was collected from the US Army Veterinary Services Lotus Notes database, including all food samples collected and submitted from APHC Region-North for the purposes of destination monitoring from January 1, 2013 to December 31, 2013. For most food items, only one sample was submitted for testing. The ability to correctly identify a contaminated food lot may be limited by reliance on test results from only one sample, as the level of confidence in a negative test result is low. The food groups most frequently sampled by APHC correlated with the commodities that were implicated in foodborne illness in the United States. Food items to be submitted were equally distributed among districts and branches, but sections within large branches submitted relatively few food samples compared to sections within smaller branches and districts. Finally, laboratory results were not available for about half the food items prior to their respective expiration dates.
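The abstract's point about low confidence in a single negative result can be illustrated with a simple calculation. The sketch below assumes, purely for illustration, that sampled units are independent, that a fixed fraction of units in a contaminated lot carries the contaminant, and that the laboratory test is perfectly sensitive; none of these assumptions or the numbers used come from the study.

```python
def detection_probability(prevalence: float, n_samples: int) -> float:
    """Chance that at least one of n independent samples tests positive.

    prevalence: assumed fraction of units in the lot that are contaminated.
    n_samples: number of units sampled from the lot.
    Assumes a perfectly sensitive test (an idealization).
    """
    return 1.0 - (1.0 - prevalence) ** n_samples


# Hypothetical within-lot contamination rate of 10%.
for n in (1, 5, 10, 30):
    print(f"{n:>2} sample(s): P(detect) = {detection_probability(0.10, n):.2f}")
```

Under these assumptions a single sample detects a lot with 10% contaminated units only about one time in ten, so a lone negative result says little about the lot as a whole.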
ERIC Educational Resources Information Center
Zwick, Rebecca
2012-01-01
Differential item functioning (DIF) analysis is a key component in the evaluation of the fairness and validity of educational tests. The goal of this project was to review the status of ETS DIF analysis procedures, focusing on three aspects: (a) the nature and stringency of the statistical rules used to flag items, (b) the minimum sample size…
ERIC Educational Resources Information Center
Klein, Thomas W.
Steps involved in the item analysis and scaling of the 1990 edition of Forms A and B of the Nevada High School Proficiency Examinations (NHSPEs) are described. Pilot tests of Forms A and B of the 47-item reading and 45-item mathematics tests were each administered to random samples of more than 600 eleventh-grade students. A computer program was…
Flens, Gerard; Smits, Niels; Terwee, Caroline B; Dekker, Joost; Huijbrechts, Irma; de Beurs, Edwin
2017-03-01
We developed a Dutch-Flemish version of the patient-reported outcomes measurement information system (PROMIS) adult V1.0 item bank for depression as input for computerized adaptive testing (CAT). As item bank, we used the Dutch-Flemish translation of the original PROMIS item bank (28 items) and additionally translated 28 U.S. depression items that failed to make the final U.S. item bank. Through psychometric analysis of a combined clinical and general population sample (N = 2,010), 8 added items were removed. With the final item bank, we performed several CAT simulations to assess the efficiency of the extended (48 items) and the original item bank (28 items), using various stopping rules. Both item banks resulted in highly efficient and precise measurement of depression and showed high similarity between the CAT simulation scores and the full item bank scores. We discuss the implications of using each item bank and stopping rule for further CAT development.
Development and Preliminary Validation of the Strategic Thinking Mindset Test (STMT)
2017-06-01
reliability. The test’s three subscales (intellectual flexibility, inclusiveness, and humility) each correlated significantly with alternative measures of...
Al-Modallal, Hanan
2010-08-01
This study examined the psychometric qualities of the Center for Epidemiologic Studies-Depression scale (CES-D) in Jordanian women. Cronbach's alpha for the 20-item CES-D was .90. Factor analysis yielded three components. Four of the items had poor factor loadings and, therefore, were dropped. Cronbach's alpha for the remaining 16 items was .85. Validity testing using independent samples t-test provided evidence of discriminant validity for the 20-item and the 16-item CES-D. Attributes of the CES-D items indicated that depression status can be easily identified by clinicians. Comorbidity of depressive symptoms with physical and mental problems necessitates routine screening for depressed mood.
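Cronbach's alpha, the reliability coefficient reported above, can be computed from a respondents-by-items score matrix as k/(k-1) times one minus the ratio of summed item variances to the variance of the total score. The sketch below shows that standard formula on a small invented data set; the scores are hypothetical and unrelated to the CES-D data.

```python
from statistics import variance


def cronbach_alpha(scores: list[list[float]]) -> float:
    """Cronbach's alpha for a matrix of scores.

    scores: one inner list per respondent, one entry per item.
    Uses sample variances throughout (the ratio is the same with
    population variances, as long as one kind is used consistently).
    """
    k = len(scores[0])                      # number of items
    item_columns = list(zip(*scores))       # transpose to per-item columns
    item_variance_sum = sum(variance(col) for col in item_columns)
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical responses: 5 respondents, 4 items scored 0-3.
responses = [
    [3, 2, 3, 2],
    [1, 1, 0, 1],
    [2, 2, 3, 3],
    [0, 1, 1, 0],
    [3, 3, 2, 3],
]
print(f"alpha = {cronbach_alpha(responses):.2f}")
```

Dropping items, as the study did when moving from 20 to 16 items, changes both k and the variance terms, which is why alpha is recomputed for the shortened scale.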
ERIC Educational Resources Information Center
Instructional Objectives Exchange, Los Angeles, CA.
Ninety objectives and related test items for use in grades 7 through 12 are presented. Each sample contains an objective, test items, and criteria for judging the adequacy of the response. Objectives are organized into the following categories: (1) property of metals; (2) operations and functions; (3) cutting and shearing; (4) filing; (5) cutting…
Development and Validation of the Poverty Attributions Survey
ERIC Educational Resources Information Center
Bennett, Robert M.; Raiz, Lisa; Davis, Tamara S.
2016-01-01
This article describes the process of developing and testing the Poverty Attribution Survey (PAS), a measure of poverty attributions. The PAS is theory based and includes original items as well as items from previously tested poverty attribution instruments. The PAS was electronically administered to a sample of state-licensed professional social…
ERIC Educational Resources Information Center
Abed, Eman Rasmi; Al-Absi, Mohammad Mustafa; Abu shindi, Yousef Abdelqader
2016-01-01
The purpose of the present study is to develop a test to measure the numerical ability of students of education. The sample of the study consisted of 504 students from 8 universities in Jordan. The final draft of the test contains 45 items distributed among 5 dimensions. The results revealed that acceptable psychometric properties of the test;…
Nickel and cobalt release from jewellery and metal clothing items in Korea.
Cheong, Seung Hyun; Choi, You Won; Choi, Hae Young; Byun, Ji Yeon
2014-01-01
In Korea, the prevalence of nickel allergy has shown a sharply increasing trend. Cobalt contact allergy is often associated with concomitant reactions to nickel, and is more common in Korea than in western countries. The aim of the present study was to investigate the prevalence of items that release nickel and cobalt on the Korean market. A total of 471 items that included 193 branded jewellery, 202 non-branded jewellery and 76 metal clothing items were sampled and studied with a dimethylglyoxime (DMG) test and a cobalt spot test to detect nickel and cobalt release, respectively. Nickel release was detected in 47.8% of the tested items. The positive rates in the DMG test were 12.4% for the branded jewellery, 70.8% for the non-branded jewellery, and 76.3% for the metal clothing items. Cobalt release was found in 6.2% of items. Among the types of jewellery, belts and hair pins showed higher positive rates in both the DMG test and the cobalt spot test. Our study shows that the prevalence of items that release nickel or cobalt among jewellery and metal clothing items is high in Korea. © 2013 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
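The overall nickel-release rate can be cross-checked against the three group rates reported above. The sketch below back-calculates the number of positive items in each group by rounding (these counts are an inference, since the abstract gives only percentages) and pools them.

```python
# Group sizes and DMG-positive percentages as reported in the abstract.
groups = {
    "branded jewellery": (193, 12.4),
    "non-branded jewellery": (202, 70.8),
    "metal clothing items": (76, 76.3),
}

# Back-calculated positive counts (rounded; an assumption, not reported data).
positives = {name: round(n * pct / 100) for name, (n, pct) in groups.items()}

total_items = sum(n for n, _ in groups.values())
total_positive = sum(positives.values())
overall_pct = 100 * total_positive / total_items

print(positives)
print(f"{total_positive} of {total_items} items = {overall_pct:.1f}% positive")
```

The pooled figure agrees with the 47.8% stated in the abstract, so the group-level and overall rates are mutually consistent.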
ERIC Educational Resources Information Center
Crocker, Linda M.; Mehrens, William A.
Four new methods of item analysis were used to select subsets of items which would yield measures of attitude change. The sample consisted of 263 students at Michigan State University who were tested on the Inventory of Beliefs as freshmen and retested on the same instrument as juniors. Item change scores and total change scores were computed for…
Absorption in Sport: A Cross-Validation Study
Koehn, Stefan; Stavrou, Nektarios A. M.; Cogley, Jeremy; Morris, Tony; Mosek, Erez; Watt, Anthony P.
2017-01-01
Absorption has been identified as readiness for experiences of deep involvement in the task. Conceptually, absorption is a key psychological construct, incorporating experiential, cognitive, and motivational components. Although no operationalization of the construct has been provided to facilitate research in this area, the purpose of this research was the development and examination of the psychometric properties of a sport-specific measure of absorption that evolved from the use of the modified Tellegen Absorption Scale (MODTAS; Jamieson, 2005) in mainstream psychology. The study aimed to provide evidence of the psychometric properties, reliability, and validity of the Measure of Absorption in Sport Contexts (MASCs). The psychometric examination included a calibration sample from Scotland and a cross-validation sample from Australia using a cross-sectional design. The item pool was developed based on existing items from the modified Tellegen Absorption Scale (Jamieson, 2005). The MODTAS items were reworded and translated into a sport context. The Scottish sample consisted of 292 participants and the Australian sample of 314 participants. Congeneric model testing and confirmatory factor analysis for both samples and multi-group invariance testing across samples were used. In the cross-validation sample the MASC subscales showed acceptable internal consistency and construct reliability (≥0.70). Excellent fit indices were found for the final 18-item, six-factor measure in the cross-validation sample, χ²(120) = 197.486, p < 0.001; CFI = 0.957; TLI = 0.945; RMSEA = 0.045; SRMR = 0.044. Multi-group invariance testing revealed no differences in item meaning, except for two items. The MASC and the Dispositional Flow Scale-2 showed moderate-to-strong positive correlations in both samples, r = 0.38, p < 0.001 and r = 0.42, p < 0.001, supporting the external validity of the MASC.
This article provides initial evidence in support of the psychometric properties, reliability, and validity of the sport-specific measure of absorption. The MASC provides rich research opportunities in sport psychology that can enhance the theoretical understanding between absorption and related constructs and facilitate future intervention studies. PMID:28883802
Dion, Mélissa; Potvin, Olivier; Belleville, Sylvie; Ferland, Guylaine; Renaud, Mélanie; Bherer, Louis; Joubert, Sven; Vallet, Guillaume T; Simard, Martine; Rouleau, Isabelle; Lecomte, Sarah; Macoir, Joël; Hudon, Carol
2015-01-01
Performance on verbal memory tests is generally associated with socio-demographic variables such as age, sex, and education level. Performance also varies between different cultural groups. The present study aimed to establish normative data for the Rappel libre/Rappel indicé à 16 items (16-item Free and Cued Recall; RL/RI-16), a French adaptation of the Free and Cued Selective Reminding Test (Buschke, 1984; Grober, Buschke, Crystal, Bang, & Dresner, 1988). The sample consisted of 566 healthy French-speaking older adults (50-88 years old) from the province of Quebec, Canada. Normative data for the RL/RI-16 were derived from 80% of the total sample (normative sample) and cross-validated using the remaining participants (20%; validation sample). The effects of participants' age, sex, and education level were assessed on different indices of memory performance. Results indicated that these variables were independently associated with performance. Normative data are presented as regression equations with standard deviations (symmetric distributions) and percentiles (asymmetric distributions).
Invariance Properties for General Diagnostic Classification Models
ERIC Educational Resources Information Center
Bradshaw, Laine P.; Madison, Matthew J.
2016-01-01
In item response theory (IRT), the invariance property states that item parameter estimates are independent of the examinee sample, and examinee ability estimates are independent of the test items. While this property has long been established and understood by the measurement community for IRT models, the same cannot be said for diagnostic…
Developing and testing new smoking measures for the Health Plan Employer Data and Information Set.
Pbert, Lori; Vuckovic, Nancy; Ockene, Judith K; Hollis, Jack F; Riedlinger, Karen
2003-04-01
To develop and test items for the Health Plan Employer Data and Information Set (HEDIS) that assess delivery of the full range of provider-delivered tobacco interventions. The authors identified potential items via literature review; items were reviewed by national experts. Face validity of candidate items was tested in focus groups. The final survey was sent to a random sample of 1711 adult primary care patients; the re-test survey was sent to self-identified smokers. The process identified reliable items to capture provider assessment of motivation and provision of assistance and follow-up. One can reliably assess patient self-report of provider delivery of the full range of brief tobacco interventions. Such assessment and feedback to health plans and providers may increase use of evidence-based brief interventions.
Pilkonis, Paul A; Yu, Lan; Dodds, Nathan E; Johnston, Kelly L; Lawrence, Suzanne M; Hilton, Thomas F; Daley, Dennis C; Patkar, Ashwin A; McCarty, Dennis
2017-08-01
There is a need to monitor patients receiving prescription opioids to detect possible signs of abuse. To address this need, we developed and calibrated an item bank for severity of abuse of prescription pain medication as part of the Patient-Reported Outcomes Measurement Information System (PROMIS®). Comprehensive literature searches yielded an initial bank of 5,310 items relevant to substance use and abuse, including abuse of prescription pain medication, from over 80 unique instruments. After qualitative item analysis (i.e., focus groups, cognitive interviewing, expert review, and item revision), 25 items for abuse of prescribed pain medication were included in field testing. Items were written in a first-person, past-tense format, with a three-month time frame and five response options reflecting frequency or severity. The calibration sample included 448 respondents, 367 from the general population (ascertained through an internet panel) and 81 from community treatment programs participating in the National Drug Abuse Treatment Clinical Trials Network. A final bank of 22 items was calibrated using the two-parameter graded response model from item response theory. A seven-item static short form was also developed. The test information curve showed that the PROMIS® item bank for abuse of prescription pain medication provided substantial information in a broad range of severity. The initial psychometric characteristics of the item bank support its use as a computerized adaptive test or short form, with either version providing a brief, precise, and efficient measure relevant to both clinical and community samples. © 2016 American Academy of Pain Medicine. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com
The Music Attentiveness Screening Assessment, Revised (MASA-R): A Study of Technical Adequacy.
Waldon, Eric G; Lesser, Alexander; Weeden, Lydia; Messick, Emily
2016-01-01
Evidence suggests that attention is an important consideration when designing procedural support interventions for children undergoing distressing medical procedures. As such, the extent to which children can attend to musical stimuli used during music-based procedural support interventions would seem important. The Music Attentiveness Screening Assessment (MASA) was designed to assess a child's ability to attend to musical stimuli, but further revisions were deemed necessary to improve administration, test-retest reliability, and interobserver agreement for the measure's items. This study investigated the technical adequacy of the Music Attentiveness Screening Assessment, Revised (MASA-R), with a non-clinical sample of children aged 4 to 9 years by examining (a) Construct validity using comparator instruments measuring auditory attention; (b) Test-retest reliability following a two-week delay; and (c) Interobserver agreement when administered by two independent examiners. This non-clinical sample included 69 children who were administered both items from MASA-R and two comparator instruments: the Auditory Attention subtest from the NEPSY-II (NII-AA) for children aged 5 to 9 years (n = 47); and the Auditory Attention subtest from the Woodcock-Johnson Tests of Cognitive Abilities, 3rd ed. (WJIII-AA), for children aged 4 years (n = 22). A significant proportion of score variance was shared by both MASA-R items and the comparator measures: R² = .16, F(2, 66) = 6.30, p = .003. MASA-R score estimates with regard to test-retest reliability (Item I, intra-class correlation [ICC] = .88; Item II, ICC = .91) and interobserver agreement (Item I, ICC = .99; Item II, ICC = .98) also fell into acceptable ranges. Estimates of MASA-R score construct validity, test-retest reliability, and interobserver agreement appear improved over its predecessor, MASA.
While findings are promising, additional investigation of its use with a clinical sample is needed before it can be confidently used in pediatrics. © the American Music Therapy Association 2015. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Study of Bias in 2012-Placement Test through Rasch Model in Terms of Gender Variable
ERIC Educational Resources Information Center
Turkan, Azmi; Cetin, Bayram
2017-01-01
Validity and reliability are among the most crucial characteristics of a test. One of the steps to make sure that a test is valid and reliable is to examine the bias in test items. The purpose of this study was to examine the bias in 2012 Placement Test items in terms of gender variable using Rasch Model in Turkey. The sample of this study was…
Bernhardt, Jay M; Stellefson, Michael; Weiler, Robert M; Anderson-Lewis, Charkarra; Miller, M David; MacInnes, Jann
2015-01-01
Background Social media can promote healthy behaviors by facilitating engagement and collaboration among health professionals and the public. Thus, social media is quickly becoming a vital tool for health promotion. While guidelines and trainings exist for public health professionals, there are currently no standardized measures to assess individual social media competency among Certified Health Education Specialists (CHES) and Master Certified Health Education Specialists (MCHES). Objective The aim of this study was to design, develop, and test the Social Media Competency Inventory (SMCI) for CHES and MCHES. Methods The SMCI was designed in three sequential phases: (1) Conceptualization and Domain Specifications, (2) Item Development, and (3) Inventory Testing and Finalization. Phase 1 consisted of a literature review, concept operationalization, and expert reviews. Phase 2 involved an expert panel (n=4) review, think-aloud sessions with a small representative sample of CHES/MCHES (n=10), a pilot test (n=36), and classical test theory analyses to develop the initial version of the SMCI. Phase 3 included a field test of the SMCI with a random sample of CHES and MCHES (n=353), factor and Rasch analyses, and development of SMCI administration and interpretation guidelines. Results Six constructs adapted from the unified theory of acceptance and use of technology and the integrated behavioral model were identified for assessing social media competency: (1) Social Media Self-Efficacy, (2) Social Media Experience, (3) Effort Expectancy, (4) Performance Expectancy, (5) Facilitating Conditions, and (6) Social Influence. The initial item pool included 148 items. After the pilot test, 16 items were removed or revised because of low item discrimination (r<.30), high interitem correlations (Ρ>.90), or based on feedback received from pilot participants. 
During the psychometric analysis of the field test data, 52 items were removed due to low discrimination, evidence of content redundancy, low R-squared value, or poor item infit or outfit. Psychometric analyses of the data revealed acceptable reliability evidence for the following scales: Social Media Self-Efficacy (alpha=.98, item reliability=.98, item separation=6.76), Social Media Experience (alpha=.98, item reliability=.98, item separation=6.24), Effort Expectancy(alpha =.74, item reliability=.95, item separation=4.15), Performance Expectancy (alpha =.81, item reliability=.99, item separation=10.09), Facilitating Conditions (alpha =.66, item reliability=.99, item separation=16.04), and Social Influence (alpha =.66, item reliability=.93, item separation=3.77). There was some evidence of local dependence among the scales, with several observed residual correlations above |.20|. Conclusions Through the multistage instrument-development process, sufficient reliability and validity evidence was collected in support of the purpose and intended use of the SMCI. The SMCI can be used to assess the readiness of health education specialists to effectively use social media for health promotion research and practice. Future research should explore associations across constructs within the SMCI and evaluate the ability of SMCI scores to predict social media use and performance among CHES and MCHES. PMID:26399428
Alber, Julia M; Bernhardt, Jay M; Stellefson, Michael; Weiler, Robert M; Anderson-Lewis, Charkarra; Miller, M David; MacInnes, Jann
2015-09-23
Social media can promote healthy behaviors by facilitating engagement and collaboration among health professionals and the public. Thus, social media is quickly becoming a vital tool for health promotion. While guidelines and trainings exist for public health professionals, there are currently no standardized measures to assess individual social media competency among Certified Health Education Specialists (CHES) and Master Certified Health Education Specialists (MCHES). The aim of this study was to design, develop, and test the Social Media Competency Inventory (SMCI) for CHES and MCHES. The SMCI was designed in three sequential phases: (1) Conceptualization and Domain Specifications, (2) Item Development, and (3) Inventory Testing and Finalization. Phase 1 consisted of a literature review, concept operationalization, and expert reviews. Phase 2 involved an expert panel (n=4) review, think-aloud sessions with a small representative sample of CHES/MCHES (n=10), a pilot test (n=36), and classical test theory analyses to develop the initial version of the SMCI. Phase 3 included a field test of the SMCI with a random sample of CHES and MCHES (n=353), factor and Rasch analyses, and development of SMCI administration and interpretation guidelines. Six constructs adapted from the unified theory of acceptance and use of technology and the integrated behavioral model were identified for assessing social media competency: (1) Social Media Self-Efficacy, (2) Social Media Experience, (3) Effort Expectancy, (4) Performance Expectancy, (5) Facilitating Conditions, and (6) Social Influence. The initial item pool included 148 items. After the pilot test, 16 items were removed or revised because of low item discrimination (r<.30), high interitem correlations (ρ>.90), or feedback received from pilot participants.
During the psychometric analysis of the field test data, 52 items were removed due to low discrimination, evidence of content redundancy, low R-squared value, or poor item infit or outfit. Psychometric analyses of the data revealed acceptable reliability evidence for the following scales: Social Media Self-Efficacy (alpha=.98, item reliability=.98, item separation=6.76), Social Media Experience (alpha=.98, item reliability=.98, item separation=6.24), Effort Expectancy (alpha=.74, item reliability=.95, item separation=4.15), Performance Expectancy (alpha=.81, item reliability=.99, item separation=10.09), Facilitating Conditions (alpha=.66, item reliability=.99, item separation=16.04), and Social Influence (alpha=.66, item reliability=.93, item separation=3.77). There was some evidence of local dependence among the scales, with several observed residual correlations above |.20|. Through the multistage instrument-development process, sufficient reliability and validity evidence was collected in support of the purpose and intended use of the SMCI. The SMCI can be used to assess the readiness of health education specialists to effectively use social media for health promotion research and practice. Future research should explore associations across constructs within the SMCI and evaluate the ability of SMCI scores to predict social media use and performance among CHES and MCHES.
ERIC Educational Resources Information Center
French, Brian F.; Gotch, Chad M.
2013-01-01
The Brigance Comprehensive Inventory of Basic Skills-II (CIBS-II) is a diagnostic battery intended for children in grades 1st through 6th. The aim of this study was to test for item invariance, or differential item functioning (DIF), of the CIBS-II across sex in the standardization sample through the use of item response theory DIF detection…
Emotional Intelligence in Applicant Selection for Care-Related Academic Programs
ERIC Educational Resources Information Center
Zysberg, Leehu; Levy, Anat; Zisberg, Anna
2011-01-01
Two studies describe the development of the Audiovisual Test of Emotional Intelligence (AVEI), aimed at candidate selection in educational settings. Study I depicts the construction of the test and the preliminary examination of its psychometric properties in a sample of 92 college students. Item analysis allowed the modification of problem items,…
Comparison of IRT Likelihood Ratio Test and Logistic Regression DIF Detection Procedures
ERIC Educational Resources Information Center
Atar, Burcu; Kamata, Akihito
2011-01-01
The Type I error rates and the power of IRT likelihood ratio test and cumulative logit ordinal logistic regression procedures in detecting differential item functioning (DIF) for polytomously scored items were investigated in this Monte Carlo simulation study. For this purpose, 54 simulation conditions (combinations of 3 sample sizes, 2 sample…
Cross-Cultural Bias Analysis of Cattell Culture-Fair Intelligence Test.
ERIC Educational Resources Information Center
Nenty, H. Johnson
The Cattell Culture Fair Intelligence Test (CCFIT) was administered to a large sample of American, Nigerian, and Indian adolescents, and item data were examined for cultural bias. The CCFIT was designed to measure fluid intelligence, which is not influenced by cultural differences. Four different item analysis techniques were used to determine…
Sund, Terje; Iwarsson, Susanne; Anttila, Heidi; Helle, Tina; Brandt, Ase
2014-07-01
The purpose of this study was to investigate the test-retest reliability, agreement, internal consistency, and floor and ceiling effects of the Danish and Finnish versions of the Satisfaction with the Assistive Technology Services (SATS) instrument among adult users of powered wheelchairs (PWCs) or powered scooters (scooters). A test-retest design was used, with two telephone interviews conducted 7-18 days apart: 40 informants with a mean age of 67.5 (SD 13.09) years in the Danish sample and 54 informants with a mean age of 55.6 (SD 12.09) years in the Finnish sample. The intra-class correlation coefficient varied between 0.57 and 0.93 for items in the Danish sample and between 0.41 and 0.93 in the Finnish sample. The percentage agreement varied between 54.2 and 79.5 for items in the Danish sample and between 69.2 and 81.1 in the Finnish sample, while the Cronbach's alpha values varied between 0.87 and 0.96 in the two samples. A ceiling effect was found in all items of both samples. This study indicates that the SATS may be reliably administered in telephone interviews among adult PWC and scooter users, and that it gives information about aspects of the service delivery process for quality development improvement purposes. Further psychometric testing of the SATS is required.
Visual short-term memory guides infants' visual attention.
Mitsven, Samantha G; Cantrell, Lisa M; Luck, Steven J; Oakes, Lisa M
2018-08-01
Adults' visual attention is guided by the contents of visual short-term memory (VSTM). Here we asked whether 10-month-old infants' (N = 41) visual attention is also guided by the information stored in VSTM. In two experiments, we modified the one-shot change detection task (Oakes, Baumgartner, Barrett, Messenger, & Luck, 2013) to create a simplified cued visual search task to ask how information stored in VSTM influences where infants look. A single sample item (e.g., a colored circle) was presented at fixation for 500 ms, followed by a brief (300 ms) retention interval and then a test array consisting of two items, one on each side of fixation. One item in the test array matched the sample stimulus and the other did not. Infants were more likely to look at the non-matching item than at the matching item, demonstrating that the information stored rapidly in VSTM guided subsequent looking behavior. Copyright © 2018 Elsevier B.V. All rights reserved.
Factorial and Item-Level Invariance of a Principal Perspectives Survey: German and U.S. Principals.
Wang, Chuang; Hancock, Dawson R; Muller, Ulrich
This study examined the factorial and item-level invariance of a survey of principals' job satisfaction and perspectives about reasons and barriers to becoming a principal with a sample of US principals and another sample of German principals. Confirmatory factor analysis (CFA) and differential item functioning (DIF) analysis were employed at the test and item level, respectively. A single group CFA was conducted first, and the model was found to fit the data collected. The factorial invariance between the German and the US principals was tested through three steps: (a) configural invariance; (b) measurement invariance; and (c) structural invariance. The results suggest that the survey is a viable measure of principals' job satisfaction and perspectives about reasons and barriers to becoming a principal because principals from two different cultures shared a similar pattern on all three constructs. The DIF analysis further revealed that 22 out of the 28 items functioned similarly between German and US principals.
Cupani, Marcos; Zamparella, Tatiana Castro; Piumatti, Gisella; Vinculado, Grupo
The calibration of item banks provides the basis for computerized adaptive testing that ensures high diagnostic precision and minimizes participants' test burden. This study aims to develop a bank of items to measure the level of knowledge of biology using the Rasch model. The sample consisted of 1219 participants who studied in different faculties of the National University of Cordoba (mean age = 21.85 years, SD = 4.66; 66.9% women). The items were organized in different forms and into separate subtests, with some common items across subtests. The students were told they had to answer 60 biology knowledge questions. Evaluation of Rasch model fit (Zstd >|2.0|), differential item functioning, dimensionality, local independence, item and person separation (>2.0), and reliability (>.80) resulted in a bank of 180 items with good psychometric properties. The bank provides items with a wide range of content coverage and may serve as a sound basis for computerized adaptive testing applications. The contribution of this work is significant in the field of educational assessment in Argentina.
A psychometric comparison of three scales and a single-item measure to assess sexual satisfaction.
Mark, Kristen P; Herbenick, Debby; Fortenberry, J Dennis; Sanders, Stephanie; Reece, Michael
2014-01-01
This study was designed to systematically compare and contrast the psychometric properties of three scales developed to measure sexual satisfaction and a single-item measure of sexual satisfaction. The Index of Sexual Satisfaction (ISS), Global Measure of Sexual Satisfaction (GMSEX), and the New Sexual Satisfaction Scale-Short (NSSS-S) were compared to one another and to a single-item measure of sexual satisfaction. Conceptualization of the constructs, distribution of scores, internal consistency, convergent validity, test-retest reliability, and factor structure were compared between the measures. A total of 211 men and 214 women completed the scales and a measure of relationship satisfaction, with 33% (n = 139) of the sample reassessed two months later. All scales demonstrated appropriate distribution of scores and adequate internal consistency. The GMSEX, NSSS-S, and the single-item measure demonstrated convergent validity. Test-retest reliability was demonstrated by the ISS, GMSEX, and NSSS-S, but not the single-item measure. Taken together, the GMSEX received the strongest psychometric support in this sample for a unidimensional measure of sexual satisfaction and the NSSS-S received the strongest psychometric support in this sample for a bidimensional measure of sexual satisfaction.
Zhao, Yue
2017-03-01
In patient-reported outcome research that utilizes item response theory (IRT), using statistical significance tests to detect misfit is usually the focus of IRT model-data fit evaluations. However, such evaluations rarely address the impact/consequence of using misfitting items on the intended clinical applications. This study was designed to evaluate the impact of IRT item misfit on score estimates and severity classifications and to demonstrate a recommended process of model-fit evaluation. Using secondary data sources collected from the Patient-Reported Outcome Measurement Information System (PROMIS) wave 1 testing phase, analyses were conducted based on PROMIS depression (28 items; 782 cases) and pain interference (41 items; 845 cases) item banks. The identification of misfitting items was assessed using Orlando and Thissen's summed-score item-fit statistics and graphical displays. The impact of misfit was evaluated according to the agreement of both IRT-derived T-scores and severity classifications between inclusion and exclusion of misfitting items. The examination of the presence and impact of misfit suggested that item misfit had a negligible impact on the T-score estimates and severity classifications with the general population sample in the PROMIS depression and pain interference item banks, implying that the impact of item misfit was insignificant. Findings support the T-score estimates in the two item banks as robust against item misfit at both the group and individual levels and add confidence to the use of T-scores for severity diagnosis in the studied sample. Recommendations on approaches for identifying item misfit (statistical significance) and assessing the misfit impact (practical significance) are given.
Validation of the HIV/AIDS Stigma Instrument - PLWA (HASI-P).
Holzemer, William L; Uys, Leana R; Chirwa, Maureen L; Greeff, Minrie; Makoae, Lucia N; Kohi, Thecla W; Dlamini, Priscilla S; Stewart, Anita L; Mullan, Joseph; Phetlhu, René D; Wantland, Dean; Durrheim, Kevin
2007-09-01
This article describes the development and testing of a quantitative measure of HIV/AIDS stigma as experienced by people living with HIV/AIDS. This instrument is designed to measure perceived stigma, create a baseline from which to measure changes in stigma over time, and track potential progress towards reducing stigma. It was developed in three phases from 2003-2006: generating items based on results of focus group discussions; pilot testing and reducing the original list of items; and validating the instrument. Data for all phases were collected from five African countries: Lesotho, Malawi, South Africa, Swaziland and Tanzania. The instrument was validated with a sample of 1,477 persons living with HIV/AIDS from all of the five countries. The sample had a mean age of 36.1 years and 74.1% was female. The participants reported they knew they were HIV positive for an average of 3.4 years and 46% of the sample was taking antiretroviral medications. A six-factor solution with 33 items explained 60.72% of the variance. Scale alpha reliabilities were examined and items that did not contribute to scale reliability were dropped. The factors included: Verbal Abuse (8 items, alpha=0.886); Negative Self-Perception (5 items, alpha=0.906); Health Care Neglect (7 items, alpha=0.832); Social Isolation (5 items, alpha=0.890); Fear of Contagion (6 items, alpha=0.795); and Workplace Stigma (2 items, alpha=0.758). This article reports on the development and validation of a new measure of stigma, HIV/AIDS Stigma Instrument - PLWA (HASI-P), providing evidence that supports adequate content and construct validity, modest concurrent validity, and acceptable internal consistency reliability for each of the six subscales and total score. The scale is available in several African languages.
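The subscale alpha values above are presumably Cronbach's alpha coefficients, computed as alpha = k/(k-1) * (1 - sum of item variances / variance of total scores) for a scale of k items. The sketch below illustrates that formula only; the function name and the toy responses are hypothetical and are not data from the study.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of item-score lists.

    `items` holds one inner list per item; every inner list has one
    score per respondent, in the same respondent order.
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Total score per respondent: sum across items for each position.
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(scores) for scores in items)
    return (k / (k - 1)) * (1.0 - item_variance_sum / variance(totals))


# Hypothetical responses: 3 items answered by 5 respondents.
toy_items = [
    [1, 2, 3, 4, 5],
    [2, 2, 3, 5, 5],
    [1, 3, 3, 4, 4],
]
print(round(cronbach_alpha(toy_items), 3))
```

Sample variances are used for both the items and the totals; using population variances throughout gives the same alpha, because the correction factor cancels in the ratio.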
Maples, Jessica L; Guan, Li; Carter, Nathan T; Miller, Joshua D
2014-12-01
There has been a substantial increase in the use of personality assessment measures constructed using items from the International Personality Item Pool (IPIP) such as the 300-item IPIP-NEO (Goldberg, 1999), a representation of the Revised NEO Personality Inventory (NEO PI-R; Costa & McCrae, 1992). The IPIP-NEO is free to use and can be modified to accommodate its users' needs. Despite the substantial interest in this measure, there is still a dearth of data demonstrating its convergence with the NEO PI-R. The present study represents an investigation of the reliability and validity of scores on the IPIP-NEO. Additionally, we used item response theory (IRT) methodology to create a 120-item version of the IPIP-NEO. Using an undergraduate sample (n = 359), we examined the reliability, as well as the convergent and criterion validity, of scores from the 300-item IPIP-NEO, a previously constructed 120-item version of the IPIP-NEO (Johnson, 2011), and the newly created IRT-based IPIP-120 in comparison to the NEO PI-R across a range of outcomes. Scores from all 3 IPIP measures demonstrated strong reliability and convergence with the NEO PI-R and a high degree of similarity with regard to their correlational profiles across the criterion variables (rICC = .983, .972, and .976, respectively). The replicability of these findings was then tested in a community sample (n = 757), and the results closely mirrored the findings from Sample 1. These results provide support for the use of the IPIP-NEO and both 120-item IPIP-NEO measures as assessment tools for measurement of the five-factor model. (c) 2014 APA, all rights reserved.
Development and preliminary testing of a computerized adaptive assessment of chronic pain.
Anatchkova, Milena D; Saris-Baglama, Renee N; Kosinski, Mark; Bjorner, Jakob B
2009-09-01
The aim of this article is to report the development and preliminary testing of a prototype computerized adaptive test of chronic pain (CHRONIC PAIN-CAT) conducted in 2 stages: (1) evaluation of various item selection and stopping rules through real data-simulated administrations of CHRONIC PAIN-CAT; (2) a feasibility study of the actual prototype CHRONIC PAIN-CAT assessment system conducted in a pilot sample. Item calibrations developed from a US general population sample (N = 782) were used to program a pain severity and impact item bank (k = 45), and real data simulations were conducted to determine a CAT stopping rule. The CHRONIC PAIN-CAT was programmed on a tablet PC using QualityMetric's Dynamic Health Assessment (DYNHA) software and administered to a clinical sample of pain sufferers (n = 100). The CAT was completed in significantly less time than the static (full item bank) assessment (P < .001). On average, 5.6 items were dynamically administered by CAT to achieve a precise score. Scores estimated from the 2 assessments were highly correlated (r = .89), and both assessments discriminated across pain severity levels (P < .001, RV = .95). Patients' evaluations of the CHRONIC PAIN-CAT were favorable. This report demonstrates that the CHRONIC PAIN-CAT is feasible for administration in a clinic. The application has the potential to improve pain assessment and help clinicians manage chronic pain.
Using the Rasch Measurement Model in Psychometric Analysis of the Family Effectiveness Measure
McCreary, Linda L.; Conrad, Karen M.; Conrad, Kendon J.; Scott, Christy K; Funk, Rodney R.; Dennis, Michael L.
2013-01-01
Background: Valid assessment of family functioning can play a vital role in optimizing client outcomes. Because family functioning is influenced by family structure, socioeconomic context, and culture, existing measures of family functioning--primarily developed with nuclear, middle class European American families--may not be valid assessments of families in diverse populations. The Family Effectiveness Measure was developed to address this limitation. Objectives: To test the Family Effectiveness Measure with data from a primarily low-income African American convenience sample, using the Rasch measurement model. Method: A sample of 607 adult women completed the measure. Rasch analysis was used to assess unidimensionality, response category functioning, item fit, person reliability, differential item functioning by race and parental status, and item hierarchy. Criterion-related validity was tested using correlations with five other variables related to family functioning. Results: The Family Effectiveness Measure measures two separate constructs: The effective family functioning construct was a psychometrically sound measure of the target construct that was more efficient due to the deletion of 22 items. The ineffective family functioning construct consisted of 16 of those deleted items but was not as strong psychometrically. Items in both constructs evidenced no differential item functioning by race. Criterion-related validity was supported for both. Discussion: In contrast to the prevailing conceptualization that family functioning is a single construct, assessed by positively and negatively worded items, use of the Rasch analysis suggested the existence of two constructs. While the effective family functioning is a strong and efficient measure of family functioning, the ineffective family functioning will require additional item development and psychometric testing. PMID:23636342
Steca, Patrizia; Monzani, Dario; Greco, Andrea; Chiesi, Francesca; Primi, Caterina
2015-06-01
This study is aimed at testing the measurement properties of the Life Orientation Test-Revised (LOT-R) for the assessment of dispositional optimism by employing item response theory (IRT) analyses. The LOT-R was administered to a large sample of 2,862 Italian adults. First, confirmatory factor analyses demonstrated the theoretical conceptualization of the construct measured by the LOT-R as a single bipolar dimension. Subsequently, IRT analyses for polytomous, ordered response category data were applied to investigate the items' properties. The equivalence of the items across gender and age was assessed by analyzing differential item functioning. Discrimination and severity parameters indicated that all items were able to distinguish people with different levels of optimism and adequately covered the spectrum of the latent trait. Additionally, the LOT-R appears to be gender invariant and, with minor exceptions, age invariant. Results provided evidence that the LOT-R is a reliable and valid measure of dispositional optimism. © The Author(s) 2014.
A content validated questionnaire for assessment of self reported venous blood sampling practices
2012-01-01
Background: Venous blood sampling is a common procedure in health care. It is strictly regulated by national and international guidelines. Deviations from guidelines due to human mistakes can cause patient harm. Validated questionnaires for health care personnel can be used to assess preventable "near misses"--i.e. potential errors and nonconformities during venous blood sampling practices that could transform into adverse events. However, no validated questionnaire that assesses nonconformities in venous blood sampling has previously been presented. The aim was to test a recently developed questionnaire in self reported venous blood sampling practices for validity and reliability. Findings: We developed a questionnaire to assess deviations from best practices during venous blood sampling. The questionnaire contained questions about patient identification, test request management, test tube labeling, test tube handling, information search procedures and frequencies of error reporting. For content validity, the questionnaire was confirmed by experts on questionnaires and venous blood sampling. For reliability, test-retest statistics were used on the questionnaire answered twice. The final venous blood sampling questionnaire included 19 questions out of which 9 had in total 34 underlying items. It was found to have content validity. The test-retest analysis demonstrated that the items were generally stable. In total, 82% of the items fulfilled the reliability acceptance criteria. Conclusions: The questionnaire could be used for assessment of "near miss" practices that could jeopardize patient safety and gives several benefits instead of assessing rare adverse events only. The higher frequencies of "near miss" practices allow for quantitative analysis of the effect of corrective interventions and for benchmarking preanalytical quality not only at the laboratory/hospital level but also at the health care unit/hospital ward. PMID:22260505
A content validated questionnaire for assessment of self reported venous blood sampling practices.
Bölenius, Karin; Brulin, Christine; Grankvist, Kjell; Lindkvist, Marie; Söderberg, Johan
2012-01-19
Venous blood sampling is a common procedure in health care. It is strictly regulated by national and international guidelines. Deviations from guidelines due to human mistakes can cause patient harm. Validated questionnaires for health care personnel can be used to assess preventable "near misses"--i.e. potential errors and nonconformities during venous blood sampling practices that could transform into adverse events. However, no validated questionnaire that assesses nonconformities in venous blood sampling has previously been presented. The aim was to test a recently developed questionnaire in self reported venous blood sampling practices for validity and reliability. We developed a questionnaire to assess deviations from best practices during venous blood sampling. The questionnaire contained questions about patient identification, test request management, test tube labeling, test tube handling, information search procedures and frequencies of error reporting. For content validity, the questionnaire was confirmed by experts on questionnaires and venous blood sampling. For reliability, test-retest statistics were used on the questionnaire answered twice. The final venous blood sampling questionnaire included 19 questions out of which 9 had in total 34 underlying items. It was found to have content validity. The test-retest analysis demonstrated that the items were generally stable. In total, 82% of the items fulfilled the reliability acceptance criteria. The questionnaire could be used for assessment of "near miss" practices that could jeopardize patient safety and gives several benefits instead of assessing rare adverse events only. The higher frequencies of "near miss" practices allow for quantitative analysis of the effect of corrective interventions and for benchmarking preanalytical quality not only at the laboratory/hospital level but also at the health care unit/hospital ward.
Using Empirical Data to Set Cutoff Scores.
ERIC Educational Resources Information Center
Hills, John R.
Six experimental approaches to the problems of setting cutoff scores and choosing proper test length are briefly mentioned. Most of these methods share the premise that a test is a random sample of items, from a domain associated with a carefully specified objective. Each item is independent and is scored zero or one, with no provision for…
Linking Outcomes from Peabody Picture Vocabulary Test Forms Using Item Response Models
ERIC Educational Resources Information Center
Hoffman, Lesa; Templin, Jonathan; Rice, Mabel L.
2012-01-01
Purpose: The present work describes how vocabulary ability as assessed by 3 different forms of the Peabody Picture Vocabulary Test (PPVT; Dunn & Dunn, 1997) can be placed on a common latent metric through item response theory (IRT) modeling, by which valid comparisons of ability between samples or over time can then be made. Method: Responses…
Psychological distress in cancer survivors: the further development of an item bank.
Smith, Adam B; Armes, Jo; Richardson, Alison; Stark, Dan P
2013-02-01
Assessment of psychological distress by patient report is necessary to meet patients' needs throughout the cancer journey. We have previously developed an item bank to assess psychological distress but not evaluated it for cancer survivors. Our first aim in this study was to test whether we could extend our item bank to include cancer survivors. The second aim was to examine whether the item bank could assess positive affect as a single construct alongside negative psychological symptoms. Responses from 1315 cancer survivors to the Hospital Anxiety and Depression Scale (HADS) and the Positive and Negative Affect Scale (PANAS) were considered for inclusion in a pre-existing item bank created from a heterogeneous sample of 4914 cancer patients. Differential item functioning (DIF) was used to assess whether HADS responses drawn from the two samples were equivalent. Common-item equating was used to anchor the shared (HADS) items, whilst the PANAS items were added. Item fit was evaluated at each stage, and misfitting items were removed. Unidimensionality was assessed with a principal components factor analysis. The DIF analysis did not reveal any differences between the HADS item locations from the two samples. Three misfitting PANAS items were removed, resulting in a final unidimensional bank of 80 items with good internal reliability (α = 0.85). The new item bank is valid for use across the cancer journey, including cancer survivors, and modestly improves the assessment of all levels of psychological distress and positive psychological function. Copyright © 2011 John Wiley & Sons, Ltd.
Marfeo, Elizabeth E; Ni, Pengsheng; Haley, Stephen M; Bogusz, Kara; Meterko, Mark; McDonough, Christine M; Chan, Leighton; Rasch, Elizabeth K; Brandt, Diane E; Jette, Alan M
2013-09-01
Objective: To use item response theory (IRT) data simulations to construct and perform initial psychometric testing of a newly developed instrument, the Social Security Administration Behavioral Health Function (SSA-BH) instrument, that aims to assess behavioral health functioning relevant to the context of work. Design: Cross-sectional survey followed by IRT calibration data simulations. Setting: Community. Participants: Sample of individuals applying for Social Security Administration disability benefits: claimants (n=1015) and a normative comparative sample of U.S. adults (n=1000). Interventions: None. Main Outcome Measure: SSA-BH measurement instrument. Results: IRT analyses supported the unidimensionality of 4 SSA-BH scales: mood and emotions (35 items), self-efficacy (23 items), social interactions (6 items), and behavioral control (15 items). All SSA-BH scales demonstrated strong psychometric properties including reliability, accuracy, and breadth of coverage. High correlations of the simulated 5- or 10-item computer adaptive tests with the full item bank indicated robust ability of the computer adaptive testing approach to comprehensively characterize behavioral health function along 4 distinct dimensions. Conclusions: Initial testing and evaluation of the SSA-BH instrument demonstrated good accuracy, reliability, and content coverage along all 4 scales. Behavioral function profiles of Social Security Administration claimants were generated and compared with age- and sex-matched norms along 4 scales: mood and emotions, behavioral control, social interactions, and self-efficacy. Using the computer adaptive test-based approach offers the ability to collect standardized, comprehensive functional information about claimants in an efficient way, which may prove useful in the context of the Social Security Administration's work disability programs. Copyright © 2013 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
Examination of the PROMIS upper extremity item bank.
Hung, Man; Voss, Maren W; Bounsanga, Jerry; Crum, Anthony B; Tyser, Andrew R
Clinical measurement. The psychometric properties of the PROMIS v1.2 UE item bank were tested on various samples prior to its release, but have not been fully evaluated among the orthopaedic population. This study assesses the performance of the UE item bank within the UE orthopaedic patient population. The UE item bank was administered to 1197 adult patients presenting to a tertiary orthopaedic clinic specializing in hand and UE conditions and was examined using traditional statistics and Rasch analysis. The UE item bank fits a unidimensional model (outfit MNSQ range from 0.64 to 1.70) and has adequate reliabilities (person = 0.84; item = 0.82) and local independence (item residual correlations range from -0.37 to 0.34). Only one item exhibits gender differential item functioning. Most items target low levels of function. The UE item bank is a useful clinical assessment tool. Additional items covering higher functions are needed to enhance validity. Supplemental testing is recommended for patients at higher levels of function until more high function UE items are developed. 2c. Copyright © 2016 Hanley & Belfus. Published by Elsevier Inc. All rights reserved.
Waller, Niels G; Feuerstahler, Leah
2017-01-01
In this study, we explored item and person parameter recovery of the four-parameter model (4PM) in over 24,000 real, realistic, and idealized data sets. In the first analyses, we fit the 4PM and three alternative models to data from three Minnesota Multiphasic Personality Inventory-Adolescent form factor scales using Bayesian modal estimation (BME). Our results indicated that the 4PM fits these scales better than simpler item response theory (IRT) models. Next, using the parameter estimates from these real data analyses, we estimated 4PM item parameters in 6,000 realistic data sets to establish minimum sample size requirements for accurate item and person parameter recovery. Using a factorial design that crossed discrete levels of item parameters, sample size, and test length, we also fit the 4PM to an additional 18,000 idealized data sets to extend our parameter recovery findings. Our combined results demonstrated that 4PM item parameters and parameter functions (e.g., item response functions) can be accurately estimated using BME in moderate to large samples (N ⩾ 5,000) and person parameters can be accurately estimated in smaller samples (N ⩾ 1,000). In the supplemental files, we report annotated [Formula: see text] code that shows how to estimate 4PM item and person parameters in [Formula: see text] (Chalmers, 2012).
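For readers unfamiliar with the 4PM, its item response function adds a lower asymptote and an upper asymptote to the two-parameter logistic curve. The sketch below uses the standard textbook form with the conventional symbols a (discrimination), b (difficulty), c (lower asymptote), and d (upper asymptote). It is illustrative only and is not the authors' supplemental code.

```python
import math


def irf_4pm(theta, a, b, c, d):
    """Probability of a keyed response under the four-parameter logistic model.

    theta: person parameter
    a:     discrimination
    b:     difficulty (location)
    c:     lower asymptote
    d:     upper asymptote
    """
    return c + (d - c) / (1.0 + math.exp(-a * (theta - b)))
```

Setting c = 0 and d = 1 recovers the two-parameter model, and at theta = b the curve sits halfway between the two asymptotes.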
What Is Learned when Concept Learning Fails?--A Theory of Restricted-Domain Relational Learning
ERIC Educational Resources Information Center
Wright, Anthony A.; Lickteig, Mark T.
2010-01-01
Two matching-to-sample (MTS) and four same/different (S/D) experiments employed tests to distinguish between item-specific learning and relational learning. One MTS experiment showed item-specific learning when concept learning failed (i.e., no novel-stimulus transfer). Another MTS experiment showed item-specific learning when pigeons'…
Reading Achievements of Vietnamese Grade 5 Pupils
ERIC Educational Resources Information Center
Griffin, Patrick; Thanh, Mai Thi
2006-01-01
This article described a national study in Vietnam whereby a probability sample of students was chosen from each of the 61 provinces. A reading test consisting of 60 items was administered. The items were matched to the Vietnam reading and language curriculum for Year 5 students. Using a skills audit of the items, a variable of reading development…
International Space Station (ISS) 3D Printer Performance and Material Characterization Methodology
NASA Technical Reports Server (NTRS)
Bean, Q. A.; Cooper, K. G.; Edmunson, J. E.; Johnston, M. M.; Werkheiser, M. J.
2015-01-01
In order for human exploration of the Solar System to be sustainable, manufacturing of necessary items on-demand in space or on planetary surfaces will be a requirement. As a first step towards this goal, the 3D Printing In Zero-G (3D Print) technology demonstration made the first items fabricated in space on the International Space Station. From those items, and comparable prints made on the ground, information about the microgravity effects on the printing process can be determined. Lessons learned from this technology demonstration will be applicable to other in-space manufacturing technologies, and may affect the terrestrial manufacturing industry as well. The flight samples were received at the George C. Marshall Space Flight Center on 6 April 2015. These samples will undergo a series of tests designed to not only thoroughly characterize the samples, but to identify microgravity effects manifested during printing by comparing their results to those of samples printed on the ground. Samples will be visually inspected, photographed, scanned with structured light, and analyzed with scanning electron microscopy. Selected samples will be analyzed with computed tomography; some will be assessed using ASTM standard tests. These tests will provide the information required to determine the effects of microgravity on 3D printing in microgravity.
Item-nonspecific proactive interference in monkeys' auditory short-term memory.
Bigelow, James; Poremba, Amy
2015-09-01
Recent studies using the delayed matching-to-sample (DMS) paradigm indicate that monkeys' auditory short-term memory (STM) is susceptible to proactive interference (PI). During the task, subjects must indicate whether sample and test sounds separated by a retention interval are identical (match) or not (nonmatch). If a nonmatching test stimulus also occurred on a previous trial, monkeys are more likely to incorrectly make a "match" response (item-specific PI). However, it is not known whether PI may be caused by sounds presented on prior trials that are similar, but nonidentical to the current test stimulus (item-nonspecific PI). This possibility was investigated in two experiments. In Experiment 1, memoranda for each trial comprised tones with a wide range of frequencies, thus minimizing item-specific PI and producing a range of frequency differences among nonidentical tones. In Experiment 2, memoranda were drawn from a set of eight artificial sounds that differed from each other by one, two, or three acoustic dimensions (frequency, spectral bandwidth, and temporal dynamics). Results from both experiments indicate that subjects committed more errors when previously-presented sounds were acoustically similar (though not identical) to the test stimulus of the current trial. Significant effects were produced only by stimuli from the immediately previous trial, suggesting that item-nonspecific PI is less perseverant than item-specific PI, which can extend across noncontiguous trials. Our results contribute to existing human and animal STM literature reporting item-nonspecific PI caused by perceptual similarity among memoranda. Together, these observations underscore the significance of both temporal and discriminability factors in monkeys' STM. Copyright © 2015 Elsevier B.V. All rights reserved.
Validation of the Cross-Cultural Alcoholism Screening Test (CCAST).
Gorenc, K D; Peredo, S; Pacurucu, S; Llanos, R; Vincente, B; López, R; Abreu, L F; Paez, E
1999-01-01
When screening instruments are used in the assessment and diagnosis of alcoholism in individuals from different ethnicities, cultural variables based on norms and societal acceptance of drinking behavior can play an important role in determining the outcome. The accepted diagnostic criteria of current market testing are based on Western standards. In this study, the Munich Alcoholism Test (31 items) was the base instrument applied to subjects from several Hispanic-American countries (Bolivia, Chile, Ecuador, Mexico, and Peru). After the sample was submitted to several statistical procedures, these 31 items were reduced to a culture-free, 31-item test named the Cross-Cultural Alcohol Screening Test (CCAST). The results of this Hispanic-American sample (n = 2,107) empirically demonstrated that CCAST measures alcoholism with an adequate degree of accuracy when compared to other available cross-cultural tests. CCAST is useful in the diagnosis of alcoholism in Spanish-speaking immigrants living in countries where English is spoken. CCAST can be used in general hospitals, psychiatric wards, emergency services and police stations. The test can be useful for other professionals, such as psychological consultants, researchers, and those conducting expertise appraisal.
MICROBIAL CONTAMINATION OF STREET VENDED FOODS FROM A UNIVERSITY CAMPUS IN BANGLADESH.
Islam, Sufia; Nasrin, Nishat; Rizwan, Farhana; Nahar, Lutfun; Bhowmik, Adity; Esha, Sayma Afrin; Talukder, Kaisar Ali; Akter, Mahmuda; Roy, Ajoy; Ahmed, Muniruddin
2015-05-01
The microbiological quality of street vended food samples from Dhaka, Bangladesh was evaluated. The objective of the study was to identify the presence of common pathogens (Escherichia coli, Shigella spp., Salmonella and Vibrio spp.) and to describe the molecular characterization of E. coli, a commonly found pathogen in various street foods. Fifty food samples were collected from fixed and mobile vendors from two sampling locations (Mohakhali and Aftabnagar) in Dhaka city, Bangladesh. The tested samples included deep fried and fried snacks; quick lunch items; pickles; fruit chutney; baked items; spicy, sour and hot snacks, etc. Juices, tamarind water and plain drinking water were also tested. Sterile polythene bags were used for collecting 200 g of each category of samples. They were tested for the presence of microorganisms following conventional microbiological processes. Biochemical tests followed by serology were done for the confirmation of Shigella and Salmonella. Serological reaction was carried out for confirmation of Vibrio spp. DNA was isolated for the molecular characterization to detect the pathogenic E. coli by polymerase chain reaction (PCR). Out of 50 food samples, six (12%) were confirmed to contain different species of E. coli and Shigella. Molecular characterization of E. coli revealed that three samples were contaminated with enteroaggregative E. coli (EAEC) and one was contaminated with enterotoxigenic E. coli (ETEC). Shigella flexneri X variant was detected in one food item and Shigella flexneri 2a was found in drinking water. All these enteric pathogens could be the potential cause for foodborne illnesses.
Design Constructibility Reviews.
1987-01-01
specifications for base and sub-base courses, and wearing course. Item 21 - Has provision been made in the specifications for positive control of the temperature...of the bituminous material? Item 22 - Test results on samples of asphalt, aggregate, sand and mix should be obtained from the plant prior to placing...in the drawings. Item 3 - Make sure that stud types, sizes and spacings are spelled out in the plans and specifications. Item 4 - All welders that will
Garcia-Campayo, Javier; Navarro-Gil, Mayte; Andrés, Eva; Montero-Marin, Jesús; López-Artal, Lorena; Demarzo, Marcelo Marcos Piva
2014-01-10
Self-compassion is a key psychological construct for assessing clinical outcomes in mindfulness-based interventions. The aim of this study was to validate the Spanish versions of the long (26 item) and short (12 item) forms of the Self-Compassion Scale (SCS). The translated Spanish versions of both subscales were administered to two independent samples: Sample 1 was comprised of university students (n = 268) who were recruited to validate the long form, and Sample 2 was comprised of Aragon Health Service workers (n = 271) who were recruited to validate the short form. In addition to SCS, the Mindful Attention Awareness Scale (MAAS), the State-Trait Anxiety Inventory-Trait (STAI-T), the Beck Depression Inventory (BDI) and the Perceived Stress Questionnaire (PSQ) were administered. Construct validity, internal consistency, test-retest reliability and convergent validity were tested. The Confirmatory Factor Analysis (CFA) of the long and short forms of the SCS confirmed the original six-factor model in both scales, showing goodness of fit. Cronbach's α for the 26 item SCS was 0.87 (95% CI = 0.85-0.90) and ranged between 0.72 and 0.79 for the 6 subscales. Cronbach's α for the 12-item SCS was 0.85 (95% CI = 0.81-0.88) and ranged between 0.71 and 0.77 for the 6 subscales. The long (26-item) form of the SCS showed a test-retest coefficient of 0.92 (95% CI = 0.89-0.94). The Intraclass Correlation (ICC) for the 6 subscales ranged from 0.84 to 0.93. The short (12-item) form of the SCS showed a test-retest coefficient of 0.89 (95% CI: 0.87-0.93). The ICC for the 6 subscales ranged from 0.79 to 0.91. The long and short forms of the SCS exhibited a significant negative correlation with the BDI, the STAI and the PSQ, and a significant positive correlation with the MAAS. The correlation between the total score of the long and short SCS form was r = 0.92. 
The Spanish versions of the long (26-item) and short (12-item) forms of the SCS are valid and reliable instruments for the evaluation of self-compassion among the general population. These results substantiate the use of this scale in research and clinical practice.
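The internal-consistency figures reported above are Cronbach's α values. As a reminder of what the coefficient computes, here is a minimal sketch using the standard formula α = k/(k−1) · (1 − Σ item variances / variance of total scores). The data layout (one list of item scores per respondent) and the function name are assumptions for illustration, not the study's analysis code.

```python
def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items score matrix.

    scores: list of respondents, each a list of k item scores
            (hypothetical layout; at least 2 items and 2 respondents).
    """
    k = len(scores[0])

    def sample_variance(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

    # Variance of each item across respondents
    item_variances = [sample_variance([row[j] for row in scores]) for j in range(k)]
    # Variance of the summed scale score across respondents
    total_variance = sample_variance([sum(row) for row in scores])
    return k / (k - 1) * (1.0 - sum(item_variances) / total_variance)
```

When every item ranks respondents identically, the total-score variance is as large as it can be relative to the item variances and α reaches 1.0.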
Garcia, Sofia F.; Hahn, Elizabeth A.; Magasi, Susan; Lai, Jin-Shei; Semik, Patrick; Hammel, Joy; Heinemann, Allen W.
2014-01-01
Objective: To describe the development of new self-report measures of social attitudes that act as environmental facilitators or barriers to the participation of people with disabilities in society. Design: A mixed methods approach included a literature review; item classification, selection and writing; cognitive interviews and field testing with participants with spinal cord injury (SCI), traumatic brain injury (TBI) or stroke; and rating scale analysis to evaluate initial psychometric properties. Setting: General community. Participants: Nine individuals with SCI, TBI or stroke participated in cognitive interviews; 305 community residents with those same conditions participated in field testing. Interventions: None. Main Outcome Measure(s): Self-report item pool of social attitudes that act as facilitators or barriers to people with disabilities participating in society. Results: An interdisciplinary team of experts classified 710 existing social environment items into content areas and wrote 32 new items. Additional qualitative item review included item refinement and winnowing of the pool prior to cognitive interviews and field testing of 82 items. Field test data indicated that the pool satisfies a one-parameter item response theory measurement model and would be appropriate for development into a calibrated item bank. Conclusions: Our qualitative item review process supported a social environment conceptual framework that includes both social support and social attitudes. We developed a new social attitudes self-report item pool. Calibration testing of that pool is underway with a larger sample in order to develop a social attitudes item bank for persons with disabilities. PMID:25045803
Garcia, Sofia F; Hahn, Elizabeth A; Magasi, Susan; Lai, Jin-Shei; Semik, Patrick; Hammel, Joy; Heinemann, Allen W
2015-04-01
To describe the development of new self-report measures of social attitudes that act as environmental facilitators or barriers to the participation of people with disabilities in society. A mixed-methods approach included a literature review; item classification, selection, and writing; cognitive interviews and field testing of participants with spinal cord injury (SCI), traumatic brain injury (TBI), or stroke; and rating scale analysis to evaluate initial psychometric properties. General community. Individuals with SCI, TBI, or stroke participated in cognitive interviews (n=9); community residents with those same conditions participated in field testing (n=305). None. Self-report item pool of social attitudes that act as facilitators or barriers to people with disabilities participating in society. An interdisciplinary team of experts classified 710 existing social environment items into content areas and wrote 32 new items. Additional qualitative item review included item refinement and winnowing of the pool prior to cognitive interviews and field testing of 82 items. Field test data indicated that the pool satisfies a 1-parameter item response theory measurement model and would be appropriate for development into a calibrated item bank. Our qualitative item review process supported a social environment conceptual framework that includes both social support and social attitudes. We developed a new social attitudes self-report item pool. Calibration testing of that pool is underway with a larger sample to develop a social attitudes item bank for persons with disabilities. Copyright © 2015 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
Smolen, Tomasz; Chuderski, Adam
2015-01-01
Fluid intelligence (Gf) is a crucial cognitive ability that involves abstract reasoning in order to solve novel problems. Recent research demonstrated that Gf strongly depends on the individual effectiveness of working memory (WM). We investigated a popular claim that if the storage capacity underlay the WM-Gf correlation, then such a correlation should increase with an increasing number of items or rules (load) in a Gf-test. As often no such link is observed, on that basis the storage-capacity account is rejected, and alternative accounts of Gf (e.g., related to executive control or processing speed) are proposed. Using both analytical inference and numerical simulations, we demonstrated that the load-dependent change in correlation is primarily a function of the amount of floor/ceiling effect for particular items. Thus, the item-wise WM correlation of a Gf-test depends on its overall difficulty, and the difficulty distribution across its items. When the early test items yield huge ceiling, but the late items do not approach floor, that correlation will increase throughout the test. If the early items locate themselves between ceiling and floor, but the late items approach floor, the respective correlation will decrease. For a hallmark Gf-test, the Raven-test, whose items span from ceiling to floor, the quadratic relationship is expected, and it was shown empirically using a large sample and two types of WMC tasks. In consequence, no changes in correlation due to varying WM/Gf load, or lack of them, can yield an argument for or against any theory of WM/Gf. Moreover, as the mathematical properties of the correlation formula make it relatively immune to ceiling/floor effects for overall moderate correlations, only minor changes (if any) in the WM-Gf correlation should be expected for many psychological tests.
Wood, David L; Sawicki, Gregory S; Miller, M David; Smotherman, Carmen; Lukens-Bull, Katryne; Livingood, William C; Ferris, Maria; Kraemer, Dale F
2014-01-01
National consensus statements recommend that providers regularly assess the transition readiness skills of adolescents and young adults (AYA). In 2010 we developed a 29-item version of the Transition Readiness Assessment Questionnaire (TRAQ). We reevaluated item performance and factor structure, and reassessed the TRAQ's reliability and validity. We surveyed youth from 3 academic clinics in Jacksonville, Florida; Chapel Hill, North Carolina; and Boston, Massachusetts. Participants were AYA with special health care needs aged 14 to 21 years. From a convenience sample of 306 patients, we conducted item reduction strategies and exploratory factor analysis (EFA). On a second convenience sample of 221 patients, we conducted confirmatory factor analysis (CFA). Internal reliability was assessed by Cronbach's alpha and criterion validity. Analyses were conducted by the Wilcoxon rank sum test and mixed linear models. The item reduction and EFA resulted in a 20-item scale with 5 identified subscales. The CFA conducted on a second sample provided a good fit to the data. The overall scale has high reliability (Cronbach's alpha = .94) and good reliability for 4 of the 5 subscales (Cronbach's alpha ranging from .90 to .77 in the pooled sample). Each of the 5 subscale scores was significantly higher for adolescents aged 18 years and older versus those younger than 18 (P < .0001) in both univariate and multivariate analyses. The 20-item, 5-factor structure for the TRAQ is supported by EFA and CFA on independent samples and has good internal reliability and criterion validity. Additional work is needed to expand or revise the TRAQ subscales and test their predictive validity. Copyright © 2014 Academic Pediatric Association. Published by Elsevier Inc. All rights reserved.
Ye, Zeng Jie; Liang, Mu Zi; Zhang, Hao Wei; Li, Peng Fei; Ouyang, Xue Ren; Yu, Yuan Liang; Liu, Mei Ling; Qiu, Hong Zhong
2018-06-01
Classical test theory has been used to develop and validate the 25-item Resilience Scale Specific to Cancer (RS-SC) in Chinese patients with cancer. This study was designed to provide additional information about the discriminative value of the individual items tested with an item response theory analysis. A two-parameter graded response model was fitted to examine whether any of the items of the RS-SC exhibited problems with the ordering and steps of thresholds, as well as the ability of items to discriminate patients with different resilience levels using item characteristic curves. A sample of 214 Chinese patients with a cancer diagnosis was analyzed. The established three-dimension structure of the RS-SC was confirmed. Several items showed problematic thresholds or discrimination ability and require further revision. Some problematic items should be refined, and a short form of the RS-SC may be feasible in clinical settings in order to reduce the burden on patients. However, the generalizability of these findings warrants further investigation.
Validation of a clinical critical thinking skills test in nursing.
Shin, Sujin; Jung, Dukyoo; Kim, Sungeun
2015-01-27
The purpose of this study was to develop a revised version of the clinical critical thinking skills test (CCTS) and to subsequently validate its performance. This study is a secondary analysis of the CCTS. Data were obtained from a convenience sample of 284 college students in June 2011. Thirty items were analyzed using item response theory and test reliability was assessed. Test-retest reliability was measured using the results of 20 nursing college and graduate school students in July 2013. The content validity of the revised items was analyzed by calculating the degree of agreement between instrument developer intention in item development and the judgments of six experts. To analyze response process validity, qualitative data related to the response processes of nine nursing college students obtained through cognitive interviews were analyzed. Out of the initial 30 items, 11 items were excluded after the analysis of difficulty and discrimination parameters. When the 19 items of the revised version of the CCTS were analyzed, levels of item difficulty were found to be relatively low and levels of discrimination were found to be appropriate or high. The degree of agreement between item developer intention and expert judgments equaled or exceeded 50%. From the above results, evidence of the response process validity was demonstrated, indicating that subjects responded as intended by the test developer. The revised 19-item CCTS was found to have sufficient reliability and validity and will therefore represent a more convenient measurement of critical thinking ability.
Validation of a clinical critical thinking skills test in nursing
2015-01-01
Purpose: The purpose of this study was to develop a revised version of the clinical critical thinking skills test (CCTS) and to subsequently validate its performance. Methods: This study is a secondary analysis of the CCTS. Data were obtained from a convenience sample of 284 college students in June 2011. Thirty items were analyzed using item response theory and test reliability was assessed. Test-retest reliability was measured using the results of 20 nursing college and graduate school students in July 2013. The content validity of the revised items was analyzed by calculating the degree of agreement between instrument developer intention in item development and the judgments of six experts. To analyze response process validity, qualitative data related to the response processes of nine nursing college students obtained through cognitive interviews were analyzed. Results: Out of the initial 30 items, 11 items were excluded after the analysis of difficulty and discrimination parameters. When the 19 items of the revised version of the CCTS were analyzed, levels of item difficulty were found to be relatively low and levels of discrimination were found to be appropriate or high. The degree of agreement between item developer intention and expert judgments equaled or exceeded 50%. Conclusion: From the above results, evidence of the response process validity was demonstrated, indicating that subjects responded as intended by the test developer. The revised 19-item CCTS was found to have sufficient reliability and validity and will therefore represent a more convenient measurement of critical thinking ability. PMID:25622716
Visual search by chimpanzees (Pan): assessment of controlling relations.
Tomonaga, M
1995-01-01
Three experimentally sophisticated chimpanzees (Pan), Akira, Chloe, and Ai, were trained on visual search performance using a modified multiple-alternative matching-to-sample task in which a sample stimulus was followed by the search display containing one target identical to the sample and several uniform distractors (i.e., negative comparison stimuli were identical to each other). After they acquired this task, they were tested for transfer of visual search performance to trials in which the sample was not followed by the uniform search display (odd-item search). Akira showed positive transfer of visual search performance to odd-item search even when the display size (the number of stimulus items in the search display) was small, whereas Chloe and Ai showed a transfer only when the display size was large. Chloe and Ai used some nonrelational cues such as perceptual isolation of the target among uniform distractors (so-called pop-out). In addition to the odd-item search test, various types of probe trials were presented to clarify the controlling relations in multiple-alternative matching to sample. Akira showed a decrement of accuracy as a function of the display size when the search display was nonuniform (i.e., each "distractor" stimulus was not the same), whereas Chloe and Ai showed perfect performance. Furthermore, when the sample was identical to the uniform distractors in the search display, Chloe and Ai never selected an odd-item target, but Akira selected it when the display size was large. These results indicated that Akira's behavior was controlled mainly by relational cues of target-distractor oddity, whereas an identity relation between the sample and the target strongly controlled the performance of Chloe and Ai. PMID:7714449
ERIC Educational Resources Information Center
Missouri State Dept. of Elementary and Secondary Education, Jefferson City.
This booklet contains sample items from the Missouri social studies test for eighth graders. The first sample is based on a speech delivered by Elizabeth Cady Stanton in the mid-1880s, which proposed a new approach to raising girls. Students are directed to use their own knowledge and the speech excerpt to do three activities. The second sample…
Validity of Computer Adaptive Tests of Daily Routines for Youth with Spinal Cord Injury
Haley, Stephen M.
2013-01-01
Objective: To evaluate the accuracy of computer adaptive tests (CATs) of daily routines for child- and parent-reported outcomes following pediatric spinal cord injury (SCI) and to evaluate the validity of the scales. Methods: One hundred ninety-six daily routine items were administered to 381 youths and 322 parents. Pearson correlations, intraclass correlation coefficients (ICC), and 95% confidence intervals (CI) were calculated to evaluate the accuracy of simulated 5-item, 10-item, and 15-item CATs against the full-item banks and to evaluate concurrent validity. Independent samples t tests and analysis of variance were used to evaluate the ability of the daily routine scales to discriminate between children with tetraplegia and paraplegia and among 5 motor groups. Results: ICC and 95% CI demonstrated that simulated 5-, 10-, and 15-item CATs accurately represented the full-item banks for both child- and parent-report scales. The daily routine scales demonstrated discriminative validity, except between 2 motor groups of children with paraplegia. Concurrent validity of the daily routine scales was demonstrated through significant relationships with the FIM scores. Conclusion: Child- and parent-reported outcomes of daily routines can be obtained using CATs with the same relative precision of a full-item bank. Five-item, 10-item, and 15-item CATs have discriminative and concurrent validity. PMID:23671380
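The simulated fixed-length CATs described above can be pictured with a small sketch. The abstract does not state the IRT model, item selection rule, or scoring method, so everything here is an assumption chosen for brevity: dichotomous two-parameter logistic items, maximum-information item selection, and expected a posteriori (EAP) scoring under a standard normal prior. The bank, function names, and parameters are hypothetical.

```python
import math
import random


def p_2pl(theta, a, b):
    """Two-parameter logistic probability of a positive response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)


# Evenly spaced ability grid from -4 to 4 used for EAP scoring
GRID = [-4.0 + 0.1 * i for i in range(81)]


def eap_estimate(administered, responses):
    """EAP ability estimate under a standard normal prior."""
    weights = []
    for t in GRID:
        w = math.exp(-0.5 * t * t)  # unnormalized normal prior
        for (a, b), x in zip(administered, responses):
            p = p_2pl(t, a, b)
            w *= p if x == 1 else (1.0 - p)
        weights.append(w)
    return sum(t * w for t, w in zip(GRID, weights)) / sum(weights)


def simulate_cat(bank, true_theta, length, rng):
    """Administer `length` items adaptively; return (estimate, item indices used)."""
    remaining = list(range(len(bank)))
    used, administered, responses = [], [], []
    theta_hat = 0.0
    for _ in range(length):
        # Pick the unused item that is most informative at the current estimate
        best = max(remaining, key=lambda i: item_information(theta_hat, *bank[i]))
        remaining.remove(best)
        used.append(best)
        a, b = bank[best]
        # Simulate the response from the simulee's true ability
        x = 1 if rng.random() < p_2pl(true_theta, a, b) else 0
        administered.append((a, b))
        responses.append(x)
        theta_hat = eap_estimate(administered, responses)
    return theta_hat, used
```

In a study of this kind, such a routine would be run for each respondent at 5, 10, and 15 items, and the resulting estimates correlated with scores from the full bank.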
Obbarius, Nina; Fischer, Felix; Obbarius, Alexander; Nolte, Sandra; Liegl, Gregor; Rose, Matthias
2018-04-10
To develop the first item bank to measure Stress Resilience (SR) in clinical populations. Qualitative item development resulted in an initial pool of 131 items covering a broad theoretical SR concept. These items were tested in n=521 patients at a psychosomatic outpatient clinic. Exploratory and Confirmatory Factor Analysis (CFA), as well as other state-of-the-art item analyses and IRT were used for item evaluation and calibration of the final item bank. Out of the initial item pool of 131 items, we excluded 64 items (54 factor loading <.5, 4 residual correlations >.3, 2 non-discriminative Item Response Curves, 4 Differential Item Functioning). The final set of 67 items indicated sufficient model fit in CFA and IRT analyses. Additionally, a 10-item short form with high measurement precision (SE≤.32 in a theta range between -1.8 and +1.5) was derived. Both the SR item bank and the SR short form were highly correlated with an existing static legacy tool (Connor-Davidson Resilience Scale). The final SR item bank and 10-item short form showed good psychometric properties. When further validated, they will be ready to be used within a framework of Computer-Adaptive Tests for a comprehensive assessment of the Stress-Construct. Copyright © 2018. Published by Elsevier Inc.
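The precision figure quoted above (SE ≤ .32) can be translated into the more familiar reliability metric. A common convention, assumed here rather than stated in the abstract, is that theta is scaled to unit variance in the calibration sample, so that reliability ≈ 1 − SE². The helper below is a hypothetical illustration of that conversion.

```python
def reliability_from_se(se, theta_variance=1.0):
    """Approximate reliability implied by a standard error of measurement.

    Assumes the latent trait has variance `theta_variance` (1.0 by convention).
    """
    return 1.0 - (se ** 2) / theta_variance
```

Under that assumption, SE = 0.32 corresponds to a reliability of about 0.90, which is why this threshold is often used as a marker of high precision.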
ERIC Educational Resources Information Center
Tsutakawa, Robert K.
This paper presents a method for estimating certain characteristics of test items which are designed to measure ability, or knowledge, in a particular area. Under the assumption that ability parameters are sampled from a normal distribution, the EM algorithm is used to derive maximum likelihood estimates to item parameters of the two-parameter…
The EORTC CAT Core-The computer adaptive version of the EORTC QLQ-C30 questionnaire.
Petersen, Morten Aa; Aaronson, Neil K; Arraras, Juan I; Chie, Wei-Chu; Conroy, Thierry; Costantini, Anna; Dirven, Linda; Fayers, Peter; Gamper, Eva-Maria; Giesinger, Johannes M; Habets, Esther J J; Hammerlid, Eva; Helbostad, Jorunn; Hjermstad, Marianne J; Holzner, Bernhard; Johnson, Colin; Kemmler, Georg; King, Madeleine T; Kaasa, Stein; Loge, Jon H; Reijneveld, Jaap C; Singer, Susanne; Taphoorn, Martin J B; Thamsborg, Lise H; Tomaszewski, Krzysztof A; Velikova, Galina; Verdonck-de Leeuw, Irma M; Young, Teresa; Groenvold, Mogens
2018-06-21
To optimise measurement precision, relevance to patients and flexibility, patient-reported outcome measures (PROMs) should ideally be adapted to the individual patient/study while retaining direct comparability of scores across patients/studies. This is achievable using item banks and computerised adaptive tests (CATs). The European Organisation for Research and Treatment of Cancer (EORTC) Quality of Life Questionnaire Core 30 (QLQ-C30) is one of the most widely used PROMs in cancer research and clinical practice. Here we provide an overview of the research program to develop CAT versions of the QLQ-C30's 14 functional and symptom domains. The EORTC Quality of Life Group's strategy for developing CAT item banks consists of: literature search to identify potential candidate items; formulation of new items compatible with the QLQ-C30 item style; expert evaluations and patient interviews; field-testing and psychometric analyses, including factor analysis, item response theory calibration and simulation of measurement properties. In addition, software for setting up, running and scoring CAT has been developed. Across eight rounds of data collections, 9782 patients were recruited from 12 countries for the field-testing. The four phases of development resulted in a total of 260 unique items across the 14 domains. Each item bank consists of 7-34 items. Psychometric evaluations indicated higher measurement precision and increased statistical power of the CAT measures compared to the QLQ-C30 scales. Using CAT, sample size requirements may be reduced by approximately 20-35% on average without loss of power. The EORTC CAT Core represents a more precise, powerful and flexible measurement system than the QLQ-C30. It is currently being validated in a large independent, international sample of cancer patients. Copyright © 2018 Elsevier Ltd. All rights reserved.
ERIC Educational Resources Information Center
de la Torre, Jimmy; Lee, Young-Sun
2013-01-01
This article used the Wald test to evaluate the item-level fit of a saturated cognitive diagnosis model (CDM) relative to the fits of the reduced models it subsumes. A simulation study was carried out to examine the Type I error and power of the Wald test in the context of the G-DINA model. Results show that when the sample size is small and a…
McDonough, Christine M.; Jette, Alan M.; Ni, Pengsheng; Bogusz, Kara; Marfeo, Elizabeth E; Brandt, Diane E; Chan, Leighton; Meterko, Mark; Haley, Stephen M.; Rasch, Elizabeth K.
2014-01-01
Objectives: To build a comprehensive item pool representing work-relevant physical functioning and to test the factor structure of the item pool. These developmental steps represent initial outcomes of a broader project to develop instruments for the assessment of function within the context of Social Security Administration (SSA) disability programs. Design: Comprehensive literature review; gap analysis; item generation with expert panel input; stakeholder interviews; cognitive interviews; cross-sectional survey administration; and exploratory and confirmatory factor analyses to assess item pool structure. Setting: In-person and semi-structured interviews; internet and telephone surveys. Participants: A sample of 1,017 SSA claimants, and a normative sample of 999 adults from the US general population. Interventions: Not applicable. Main Outcome Measure: Model fit statistics. Results: The final item pool consisted of 139 items. Within the claimant sample 58.7% were white; 31.8% were black; 46.6% were female; and the mean age was 49.7 years. Initial factor analyses revealed a 4-factor solution which included more items and allowed separate characterization of: 1) Changing and Maintaining Body Position, 2) Whole Body Mobility, 3) Upper Body Function and 4) Upper Extremity Fine Motor. The final 4-factor model included 91 items. Confirmatory factor analyses for the 4-factor models for the claimant and the normative samples demonstrated very good fit. Fit statistics for claimant and normative samples respectively were: Comparative Fit Index = 0.93 and 0.98; Tucker-Lewis Index = 0.92 and 0.98; Root Mean Square Error Approximation = 0.05 and 0.04. Conclusions: The factor structure of the Physical Function item pool closely resembled the hypothesized content model. The four scales relevant to work activities offer promise for providing reliable information about claimant physical functioning relevant to work disability. PMID:23542402
McDonough, Christine M; Jette, Alan M; Ni, Pengsheng; Bogusz, Kara; Marfeo, Elizabeth E; Brandt, Diane E; Chan, Leighton; Meterko, Mark; Haley, Stephen M; Rasch, Elizabeth K
2013-09-01
To build a comprehensive item pool representing work-relevant physical functioning and to test the factor structure of the item pool. These developmental steps represent initial outcomes of a broader project to develop instruments for the assessment of function within the context of Social Security Administration (SSA) disability programs. Comprehensive literature review; gap analysis; item generation with expert panel input; stakeholder interviews; cognitive interviews; cross-sectional survey administration; and exploratory and confirmatory factor analyses to assess item pool structure. In-person and semistructured interviews and Internet and telephone surveys. Sample of SSA claimants (n=1017) and a normative sample of adults from the U.S. general population (n=999). Not applicable. Model fit statistics. The final item pool consisted of 139 items. Within the claimant sample, 58.7% were white; 31.8% were black; 46.6% were women; and the mean age was 49.7 years. Initial factor analyses revealed a 4-factor solution, which included more items and allowed separate characterization of: (1) changing and maintaining body position, (2) whole body mobility, (3) upper body function, and (4) upper extremity fine motor. The final 4-factor model included 91 items. Confirmatory factor analyses for the 4-factor models for the claimant and the normative samples demonstrated very good fit. Fit statistics for claimant and normative samples, respectively, were: Comparative Fit Index=.93 and .98; Tucker-Lewis Index=.92 and .98; and root mean square error approximation=.05 and .04. The factor structure of the physical function item pool closely resembled the hypothesized content model. The 4 scales relevant to work activities offer promise for providing reliable information about claimant physical functioning relevant to work disability. Copyright © 2013 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
Gabriel, Adel; Violato, Claudio
2009-01-01
Background: To develop and psychometrically assess a multiple choice question (MCQ) instrument to test knowledge of depression and its treatments in patients suffering from depression. Methods: A total of 63 depressed patients and twelve psychiatric experts participated. Based on empirical evidence from an extensive review, theoretical knowledge, and consultations with experts, a 27-item MCQ test of knowledge of depression and its treatment was constructed. Data collected from the psychiatry experts were used to assess evidence of content validity for the instrument. Results: Cronbach's alpha of the instrument was 0.68, and there was an overall 87.8% agreement (items are highly relevant) between experts about the relevance of the MCQs to test patient knowledge on depression and its treatments. Overall patient performance on the MCQs was satisfactory, with 78.7% correct answers. Results of an item analysis indicated that most items had adequate difficulties and discriminations. Conclusion: There was adequate reliability and evidence for content and convergent validity for the instrument. Future research should employ a larger and more heterogeneous sample from both psychiatrist and community samples than did the present study. Meanwhile, the present study has resulted in a psychometrically tested instrument for measuring depressed patients' knowledge of depression and its treatment. PMID:19754944
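The reliability coefficient reported above (Cronbach's alpha) can be illustrated with a minimal sketch. The function name `cronbach_alpha` and the toy data are hypothetical and are not taken from the study; the sketch assumes complete data (no missing responses) and uses sample variances.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items score matrix.

    scores: list of rows, one row per respondent, one column per item.
    Assumes complete data and at least two items and two respondents.
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of the total (sum) score.
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Toy data (illustrative only): three items that agree perfectly.
toy = [[0, 0, 0], [1, 1, 1], [1, 1, 1], [0, 0, 0]]
alpha = cronbach_alpha(toy)
```

With perfectly consistent items, as in the toy data, alpha reaches its maximum of 1; real instruments such as the one described here fall below that.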
Crows spontaneously exhibit analogical reasoning.
Smirnova, Anna; Zorina, Zoya; Obozova, Tanya; Wasserman, Edward
2015-01-19
Analogical reasoning is vital to advanced cognition and behavioral adaptation. Many theorists deem analogical thinking to be uniquely human and to be foundational to categorization, creative problem solving, and scientific discovery. Comparative psychologists have long been interested in the species generality of analogical reasoning, but they initially found it difficult to obtain empirical support for such thinking in nonhuman animals (for pioneering efforts, see [2, 3]). Researchers have since mustered considerable evidence and argument that relational matching-to-sample (RMTS) effectively captures the essence of analogy, in which the relevant logical arguments are presented visually. In RMTS, choice of test pair BB would be correct if the sample pair were AA, whereas choice of test pair EF would be correct if the sample pair were CD. Critically, no items in the correct test pair physically match items in the sample pair, thus demanding that only relational sameness or differentness is available to support accurate choice responding. Initial evidence suggested that only humans and apes can successfully learn RMTS with pairs of sample and test items; however, monkeys have subsequently done so. Here, we report that crows too exhibit relational matching behavior. Even more importantly, crows spontaneously display relational responding without ever having been trained on RMTS; they had only been trained on identity matching-to-sample (IMTS). Such robust and uninstructed relational matching behavior represents the most convincing evidence yet of analogical reasoning in a nonprimate species, as apes alone have spontaneously exhibited RMTS behavior after only IMTS training. Copyright © 2015 Elsevier Ltd. All rights reserved.
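The RMTS rule described above (choose the test pair whose internal relation matches that of the sample pair) can be sketched as follows. The function names and the letter stimuli are illustrative assumptions, not the authors' procedure or stimuli.

```python
def relation(pair):
    """Return 'same' if both items in the pair are identical, else 'different'."""
    first, second = pair
    return "same" if first == second else "different"


def correct_choice(sample_pair, test_pairs):
    """Pick the test pair that shares the sample pair's relation.

    Raises ValueError unless exactly one test pair matches, mirroring a
    trial with one 'same' option and one 'different' option.
    """
    target = relation(sample_pair)
    matches = [pair for pair in test_pairs if relation(pair) == target]
    if len(matches) != 1:
        raise ValueError("expected exactly one relationally matching test pair")
    return matches[0]


# Example trials from the abstract: AA -> BB, CD -> EF.
options = [("B", "B"), ("E", "F")]
same_trial = correct_choice(("A", "A"), options)
different_trial = correct_choice(("C", "D"), options)
```

Note that no item in either option appears in the sample pairs, so only the same/different relation can guide the choice.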
Looman, Wendy Sue; Farrag, Shewikar
2009-01-01
Social capital, defined as an investment in relationships that facilitates the exchange of resources, has been identified as a possible protective factor for child health in the context of risk factors such as poverty. Reliable and valid measures of social capital are needed for research and practice, particularly in non-English-speaking populations in developing countries. To evaluate the psychometric properties and cross-cultural equivalence of the Arabic translation of the Social Capital Scale (SCS). Descriptive, cross-sectional study for psychometric testing of a translated tool. Two metropolitan health clinics in Alexandria, Egypt. A convenience sample of 117 Egyptian parents of children with chronic conditions. To be eligible to participate, respondents had to be a parent of child with a chronic health condition between the ages of 1 and 18 years. The sample included primarily biological parents between the ages of 20 and 56 years. The 20-item Arabic SCS was administered as part of a written survey that included additional measures on demographic information and parent ratings of the child's overall health. Six items were ultimately removed based on item analysis, and exploratory factor analysis was conducted on the resulting 14-item scale. As a measure of construct validity, hypothesis testing was conducted using an independent samples t-test to determine whether a significant difference exists between mean total social capital scores for two groups of respondents based on the parental rating of the child's overall health. Item and factor analysis yielded preliminary support for a revised, 14-item Arabic SCS with four internally consistent factors. The standardized item alpha reliability coefficient for the total 14-item scale was .75. Respondents who reported that their child was in good health had significantly higher social capital scores than those who rated their child's health as poor. 
The 14-item Arabic SCS was found to be reliable and valid in this sample, with four internally consistent factors. While the tool may not be appropriate for comparing social capital between cultural groups, it will enable clinicians and researchers to address an important gap in knowledge characterized by a paucity of research on childhood chronic illness in low- and middle-income countries such as Egypt.
Persson, Lars-Olof; Erichsen, Magdalena; Wändell, Per; Gåfvels, Catharina
2013-10-01
The study examines internal item/scale structure and concurrent validity of a newly developed 48-item questionnaire [General Coping Questionnaire (GCQ)] that measures 10 aspects of coping with chronic illness (self-trust, problem-reducing actions, change of values, social trust, minimization, fatalism, resignation, protest, isolation and intrusion). The tests were performed in two independent samples of persons with diabetes mellitus. The first sample consisted of 119 subjects with type I diabetes and the second sample of 184 subjects with type II diabetes. Concurrent validity was examined by comparisons with measures of health-related quality of life (SF-36), a measure of metabolic control (HbA1c) and incidence of diabetic complications. The item/scale structure was found to be similar and very good in both samples. The 10 dimensions correlated as expected with the measure of mental health, although the 'negative' dimensions of the GCQ correlated higher compared with the 'positive' dimensions. Weaker relations with metabolic control were also found in one of the samples. These tests provide further evidence that GCQ is a well-structured, relevant and reliable instrument for assessing coping reactions in chronic somatic conditions. Copyright © 2012 John Wiley & Sons, Ltd.
Forkmann, Thomas; Boecker, Maren; Norra, Christine; Eberle, Nicole; Kircher, Tilo; Schauerte, Patrick; Mischke, Karl; Westhofen, Martin; Gauggel, Siegfried; Wirtz, Markus
2009-05-01
The calibration of item banks provides the basis for computerized adaptive testing that ensures high diagnostic precision and minimizes participants' test burden. The present study aimed at developing a new item bank that allows for assessing depression in persons with mental and persons with somatic diseases. The sample consisted of 161 participants treated for a depressive syndrome, and 206 participants with somatic illnesses (103 cardiologic, 103 otorhinolaryngologic; overall mean age = 44.1 years, SD =14.0; 44.7% women) to allow for validation of the item bank in both groups. Persons answered a pool of 182 depression items on a 5-point Likert scale. Evaluation of Rasch model fit (infit < 1.3), differential item functioning, dimensionality, local independence, item spread, item and person separation (>2.0), and reliability (>.80) resulted in a bank of 79 items with good psychometric properties. The bank provides items with a wide range of content coverage and may serve as a sound basis for computerized adaptive testing applications. It might also be useful for researchers who wish to develop new fixed-length scales for the assessment of depression in specific rehabilitation settings. (PsycINFO Database Record (c) 2009 APA, all rights reserved).
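As a rough illustration of the infit criterion mentioned above (infit < 1.3), the sketch below computes an information-weighted mean-square fit statistic for one item under the dichotomous Rasch model. This is a simplifying assumption: the study used 5-point Likert items, for which the expected score and variance would come from a polytomous Rasch model instead. All names and values are hypothetical.

```python
import math


def rasch_probability(theta, b):
    """Probability of endorsing an item with difficulty b at person level theta."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def infit_mean_square(responses, thetas, b):
    """Infit for one dichotomous item.

    Sum of squared residuals divided by the sum of model variances,
    taken over persons. Values near 1 indicate responses about as
    noisy as the model expects.
    """
    squared_residuals = 0.0
    model_variance = 0.0
    for x, theta in zip(responses, thetas):
        p = rasch_probability(theta, b)
        squared_residuals += (x - p) ** 2
        model_variance += p * (1.0 - p)
    return squared_residuals / model_variance


# Illustrative only: persons located exactly at the item difficulty.
well_fitting = infit_mean_square([1, 0, 1, 0], [0.0, 0.0, 0.0, 0.0], 0.0)
# Illustrative only: able persons who all fail the item (unexpected responses).
misfitting = infit_mean_square([0, 0, 0], [2.0, 2.0, 2.0], 0.0)
```

In the second toy case the statistic is far above 1.3, which is the kind of item a cut-off like the one in the abstract would flag.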
Assessment of the psychometrics of a PROMIS item bank: self-efficacy for managing daily activities
Hong, Ickpyo; Li, Chih-Ying; Romero, Sergio; Gruber-Baldini, Ann L.; Shulman, Lisa M.
2017-01-01
Purpose: The aim of this study is to investigate the psychometrics of the Patient-Reported Outcomes Measurement Information System self-efficacy for managing daily activities item bank. Methods: The item pool was field tested on a sample of 1087 participants via internet (n = 250) and in-clinic (n = 837) surveys. All participants reported having at least one chronic health condition. The 35-item pool was investigated for dimensionality (confirmatory factor analyses, CFA and exploratory factor analysis, EFA), item-total correlations, local independence, precision, and differential item functioning (DIF) across gender, race, ethnicity, age groups, data collection modes, and neurological chronic conditions (McFadden Pseudo R² less than 10%). Results: The item pool met two of the four CFA fit criteria (CFI = 0.952 and SRMR = 0.07). EFA analysis found a dominant first factor (eigenvalue = 24.34) and the ratio of first to second eigenvalue was 12.4. The item pool demonstrated good item-total correlations (0.59–0.85) and acceptable internal consistency (Cronbach’s alpha = 0.97). The item pool maintained its precision (reliability over 0.90) across a wide range of theta (3.70), and there was no significant DIF. Conclusion: The findings indicated the item pool has sound psychometric properties and the test items are eligible for development of computerized adaptive testing and short forms. PMID:27048495
Assessment of the psychometrics of a PROMIS item bank: self-efficacy for managing daily activities.
Hong, Ickpyo; Velozo, Craig A; Li, Chih-Ying; Romero, Sergio; Gruber-Baldini, Ann L; Shulman, Lisa M
2016-09-01
The aim of this study is to investigate the psychometrics of the Patient-Reported Outcomes Measurement Information System self-efficacy for managing daily activities item bank. The item pool was field tested on a sample of 1087 participants via internet (n = 250) and in-clinic (n = 837) surveys. All participants reported having at least one chronic health condition. The 35-item pool was investigated for dimensionality (confirmatory factor analyses, CFA and exploratory factor analysis, EFA), item-total correlations, local independence, precision, and differential item functioning (DIF) across gender, race, ethnicity, age groups, data collection modes, and neurological chronic conditions (McFadden Pseudo R² less than 10%). The item pool met two of the four CFA fit criteria (CFI = 0.952 and SRMR = 0.07). EFA analysis found a dominant first factor (eigenvalue = 24.34) and the ratio of first to second eigenvalue was 12.4. The item pool demonstrated good item-total correlations (0.59-0.85) and acceptable internal consistency (Cronbach's alpha = 0.97). The item pool maintained its precision (reliability over 0.90) across a wide range of theta (3.70), and there was no significant DIF. The findings indicated the item pool has sound psychometric properties and the test items are eligible for development of computerized adaptive testing and short forms.
Planning a Study for Testing the Rasch Model given Missing Values due to the use of Test-booklets.
Yanagida, Takuya; Kubinger, Klaus D; Rasch, Dieter
2015-01-01
Though calibration of an achievement test within a psychological and educational context is very often carried out by the Rasch model, data sampling is hardly ever designed according to statistical foundations. However, Kubinger, Rasch, and Yanagida (2009, 2011) suggested an approach for the determination of sample size according to a given Type-I and Type-II risk and a certain effect of model contradiction when testing the Rasch model. The approach uses a three-way analysis of variance design with mixed classification. So far, their simulation studies have dealt with complete data, meaning every examinee is administered all of the items of an item pool. The simulation study presented in this paper deals with the practically relevant case, in particular for large-scale assessments, in which items are presented using several test-booklets. As a consequence, there are missing values by design. The question to be considered is therefore whether this approach works in this case as well. Besides the fact that the data are not normally distributed but dichotomous (an examinee either solves an item or fails to solve it), only a single entry for each cell exists in the given three-way analysis of variance design, if at all, due to missing values. Hence, the obligatory test-statistic's distribution may not be retained, in contrast to the case of having no missing values. The result of our simulation study, despite applying only to a very special scenario, is that this approach does indeed work: whether test-booklets are used or every examinee is administered all of the items changes nothing with respect to the actual Type-I risk or to the power of the test, given almost the same amount of information of examinees per item. However, as the results are limited to a special scenario, we currently recommend that any interested researcher simulate the appropriate scenario in advance by him/herself.
Psychometric properties of the communication Confidence Rating Scale for Aphasia (CCRSA): phase 1.
Cherney, Leora R; Babbitt, Edna M; Semik, Patrick; Heinemann, Allen W
2011-01-01
Confidence is a construct that has not been explored previously in aphasia research. We developed the Communication Confidence Rating Scale for Aphasia (CCRSA) to assess confidence in communicating in a variety of activities and evaluated its psychometric properties using rating scale (Rasch) analysis. The CCRSA was administered to 21 individuals with aphasia before and after participation in a computer-based language therapy study. Person reliability of the 8-item CCRSA was .77. The 5-category rating scale demonstrated monotonic increases in average measures from low to high ratings. However, one item ("I follow news, sports, stories on TV/movies") misfit the construct defined by the other items (mean square infit = 1.69, item-measure correlation = .41). Deleting this item improved reliability to .79; the 7 remaining items demonstrated excellent fit to the underlying construct, although there was a modest ceiling effect in this sample. Pre- to posttreatment changes on the 7-item CCRSA measure were statistically significant using a paired samples t test. Findings support the reliability and sensitivity of the CCRSA in assessing participants' self-report of communication confidence. Further evaluation of communication confidence is required with larger and more diverse samples.
ERIC Educational Resources Information Center
Carvajal, Jorge; Skorupski, William P.
2010-01-01
This study is an evaluation of the behavior of the Liu-Agresti estimator of the cumulative common odds ratio when identifying differential item functioning (DIF) with polytomously scored test items using small samples. The Liu-Agresti estimator has been proposed by Penfield and Algina as a promising approach for the study of polytomous DIF but no…
Barnett, Lisa M; Ridgers, Nicola D; Zask, Avigdor; Salmon, Jo
2015-01-01
To determine reliability and face validity of an instrument to assess young children's perceived fundamental movement skill competence. Validation and reliability study. A pictorial instrument based on the Test of Gross Motor Development-2 assessed perceived locomotor (six skills) and object control (six skills) competence using the format and item structure from the physical competence subscale of the Pictorial Scale of Perceived Competence and Acceptance for Young Children. Sample 1 completed object control items in May (n=32) and locomotor items in October 2012 (n=23) at two time points seven days apart. At the end of the test-retest, children were asked about their understanding of what was happening in each picture to determine face validity. Sample 2 (n=58) completed 12 items in November 2012 on a single occasion to test internal reliability only. Sample 1 children were aged 5-7 years (M=6.0, SD=0.8) at object control assessment and 5-8 years at locomotor assessment (M=6.5, SD=0.9). Sample 2 children were aged 6-8 years (M=7.2, SD=0.73). Intra-class correlations assessed in Sample 1 children were excellent for object control (intra-class correlation=0.78), locomotor (intra-class correlation=0.82) and all 12 skills (intra-class correlation=0.83). Face validity was acceptable. Internal consistency was adequate in both samples for each subscale and all 12 skills (alpha range 0.60-0.81). This study has provided preliminary evidence for instrument reliability and face validity. This enables future alignment between the measurement of perceived and actual fundamental movement skill competence in young children. Crown Copyright © 2014. Published by Elsevier Ltd. All rights reserved.
Romero, Dulce; Ricarte, Jorge J.; Serrano, Juan P.; Nieto, Marta; Latorre, Jose M.
2018-01-01
The Autobiographical Memory Test (AMT) is the most widely used measure of overgeneral autobiographical memory (OGM). The AMT appears to have good psychometric properties, but more research is needed on the influence and applicability of individual cue words in different languages and populations. To date, no studies have evaluated its usefulness as a measure of OGM in Spanish or older populations. This work aims to analyze the applicability of the AMT in young and older Spanish samples. We administered a Spanish version of the AMT to samples of young (N = 520) and older adults (N = 155). We conducted confirmatory factor analysis (CFA), item response theory-based analysis (IRT) and differential item functioning (DIF). Results confirm the one-factor structure for the AMT. IRT analysis suggests that both groups find the AMT easy given that they generally perform well, and that it is more precise in individuals who score low on memory specificity. DIF analysis finds three items differ in their functioning depending on age group. This differential functioning of these items affects the overall AMT scores and, thus, they should be excluded from the AMT in studies comparing young and older samples. We discuss the possible implications of the samples and cue words used. PMID:29672583
Ros, Laura; Romero, Dulce; Ricarte, Jorge J; Serrano, Juan P; Nieto, Marta; Latorre, Jose M
2018-01-01
The Autobiographical Memory Test (AMT) is the most widely used measure of overgeneral autobiographical memory (OGM). The AMT appears to have good psychometric properties, but more research is needed on the influence and applicability of individual cue words in different languages and populations. To date, no studies have evaluated its usefulness as a measure of OGM in Spanish or older populations. This work aims to analyze the applicability of the AMT in young and older Spanish samples. We administered a Spanish version of the AMT to samples of young (N = 520) and older adults (N = 155). We conducted confirmatory factor analysis (CFA), item response theory-based analysis (IRT) and differential item functioning (DIF). Results confirm the one-factor structure for the AMT. IRT analysis suggests that both groups find the AMT easy given that they generally perform well, and that it is more precise in individuals who score low on memory specificity. DIF analysis finds three items differ in their functioning depending on age group. This differential functioning of these items affects the overall AMT scores and, thus, they should be excluded from the AMT in studies comparing young and older samples. We discuss the possible implications of the samples and cue words used.
Development of a Work Sample Criterion for General Vehicle Mechanic.
ERIC Educational Resources Information Center
Engel, John D.
A work sample criterion test was developed for General Vehicle Repairman, MOS 63C30 and 63C40. Test items covered three task categories: troubleshooting, corrective action, and preventive maintenance. Thirty-eight organizational mechanics were tested at Fort Knox, Kentucky. Data were also collected on the quality of performance, for example, use…
The Utility of IRT in Small-Sample Testing Applications.
ERIC Educational Resources Information Center
Sireci, Stephen G.
The utility of modified item response theory (IRT) models in small sample testing applications was studied. The modified IRT models were modifications of the one- and two-parameter logistic models. One-, two-, and three-parameter models were also studied. Test data were from 4 years of a national certification examination for persons desiring…
American College Student Values: Their Relationship to Selected Personal and Academic Variables.
ERIC Educational Resources Information Center
Ritter, Carolyn E.
A 20-item chi-square test of independence was administered to a selected sample of college students that was stratified 50% male and 50% female. Male and female responses showed a significant difference on 18 of the 20 items. The 2 items on which attitudes of both sexes were the same were the role of government in business and a solution to the…
Hobart, J; Thompson, A
2001-01-01
OBJECTIVES—Routine data collection is now considered mandatory. Therefore, staff rated clinical scales that consist of multiple items should have the minimum number of items necessary for rigorous measurement. This study explores the possibility of developing a short form Barthel index, suitable for use in clinical trials, epidemiological studies, and audit, that satisfies criteria for rigorous measurement and is psychometrically equivalent to the 10 item instrument. METHODS—Data were analysed from 844 consecutive admissions to a neurological rehabilitation unit in London. Random half samples were generated. Short forms were developed in one sample (n=419), by selecting items with the best measurement properties, and tested in the other (n=418). For each of the 10 items of the BI, item total correlations and effect sizes were computed and rank ordered. The best items were defined as those with the lowest cross product of these rank orderings. The acceptability, reliability, validity, and responsiveness of three short form BIs (five, four, and three item) were determined and compared with the 10 item BI. Agreement between scores generated by short forms and 10 item BI was determined using intraclass correlation coefficients and the method of Bland and Altman. RESULTS—The five best items in this sample were transfers, bathing, toilet use, stairs, and mobility. Of the three short forms examined, the five item BI had the best measurement properties and was psychometrically equivalent to the 10 item BI. Agreement between scores generated by the two measures for individual patients was excellent (ICC=0.90) but not identical (limits of agreement=1.84±3.84). CONCLUSIONS—The five item short form BI may be a suitable outcome measure for group comparison studies in comparable samples. Further evaluations are needed. 
Results demonstrate a fundamental difference between assessment and measurement and the importance of incorporating psychometric methods in the development and evaluation of health measures. PMID:11459898
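The Bland and Altman agreement analysis mentioned in the preceding abstract can be sketched as below. This assumes the conventional definition of limits of agreement (mean difference ± 1.96 standard deviations of the paired differences); the abstract does not state the multiplier, and the function name and toy scores are hypothetical.

```python
from statistics import mean, stdev


def limits_of_agreement(scores_a, scores_b, z=1.96):
    """Bland-Altman bias and limits of agreement for paired scores.

    Returns (bias, lower, upper), where bias is the mean paired
    difference and the limits are bias -/+ z sample SDs of the differences.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("paired scores must have equal length")
    differences = [a - b for a, b in zip(scores_a, scores_b)]
    bias = mean(differences)
    spread = z * stdev(differences)
    return bias, bias - spread, bias + spread


# Toy paired scores (illustrative only, not study data).
bias, lower, upper = limits_of_agreement([1, 3], [1, 1])
```

A result reported in the form "1.84±3.84" would correspond to the bias and the half-width of the interval, respectively, under this convention.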
Scale development for measuring and predicting adolescents' leisure time physical activity behavior.
Ries, Francis; Romero Granados, Santiago; Arribas Galarraga, Silvia
2009-01-01
The aim of this study was to develop a scale for assessing and predicting adolescents' physical activity behavior in Spain and Luxembourg using the Theory of Planned Behavior as a framework. The sample comprised 613 Spanish (boys = 309, girls = 304; M age = 15.28, SD = 1.127) and 752 Luxembourgish adolescents (boys = 343, girls = 409; M age = 14.92, SD = 1.198), selected from students of two secondary schools in both countries, with a similar socio-economic status. The initial 43 items were all scored on a 4-point response format using the structured alternative format and translated into Spanish, French and German. In order to ensure the accuracy of the translation, standardized parallel back-translation techniques were employed. Following two pilot tests and subsequent revisions, a second-order exploratory factor analysis with direct oblimin rotation was used for factor extraction. Internal consistency and test-retest reliabilities were also tested. The 4-week test-retest correlations confirmed the items' time stability. The same five factors were obtained, explaining 63.76% and 63.64% of the total variance in both samples. Internal consistency for the five factors ranged from α = 0.759 to α = 0.949 in the Spanish sample and from α = 0.735 to α = 0.952 in the Luxembourgish sample. For both samples, inter-factor correlations were all reported significant and positive, except for Factor 5 where they were significant but negative. The high internal consistency of the subscales, the reported item test-retest reliabilities and the identical factor structure confirm the adequacy of the elaborated questionnaire for assessing the TPB-based constructs when used with a population of adolescents in Spain and Luxembourg. The results give some indication that the questionnaire may have value in measuring the hypothesized TPB constructs for PA behavior in a cross-cultural context.
Key points: When using the structured alternative format, weak internal consistency was obtained. Rephrasing the items and scoring items on a Likert-type scale greatly enhanced the subscales' reliability. Identical factorial structure was extracted for both culturally different samples. The obtained factors, namely perceived physical competence, parents' physical activity, perceived resources support, attitude toward physical activity and perceived parental support, were hypothesized as for the original TPB constructs.
Validity of the Eating Attitude Test among Exercisers.
Lane, Helen J; Lane, Andrew M; Matheson, Hilary
2004-12-01
Theory testing and construct measurement are inextricably linked. To date, no published research has looked at the factorial validity of an existing eating attitude inventory for use with exercisers. The Eating Attitude Test (EAT) is a 26-item measure that yields a single index of disordered eating attitudes. The original factor analysis showed three interrelated factors: Dieting behavior (13-items), oral control (7-items), and bulimia nervosa-food preoccupation (6-items). The primary purpose of the study was to examine the factorial validity of the EAT among a sample of exercisers. The second purpose was to investigate relationships between eating attitudes scores and selected psychological constructs. In stage one, 598 regular exercisers completed the EAT. Confirmatory factor analysis (CFA) was used to test the single-factor, a three-factor model, and a four-factor model, which distinguished bulimia from food pre-occupation. CFA of the single-factor model (RCFI = 0.66, RMSEA = 0.10), the three-factor-model (RCFI = 0.74; RMSEA = 0.09) showed poor model fit. There was marginal fit for the 4-factor model (RCFI = 0.91, RMSEA = 0.06). Results indicated five-items showed poor factor loadings. After these 5-items were discarded, the three models were re-analyzed. CFA results indicated that the single-factor model (RCFI = 0.76, RMSEA = 0.10) and three-factor model (RCFI = 0.82, RMSEA = 0.08) showed poor fit. CFA results for the four-factor model showed acceptable fit indices (RCFI = 0.98, RMSEA = 0.06). Stage two explored relationships between EAT scores, mood, self-esteem, and motivational indices toward exercise in terms of self-determination, enjoyment and competence. Correlation results indicated that depressed mood scores positively correlated with bulimia and dieting scores. Further, dieting was inversely related with self-determination toward exercising. 
Collectively, findings suggest that a 21-item four-factor model shows promising validity coefficients among exercise participants, and that future research is needed to investigate eating attitudes among samples of exercisers. Key Points: Validity of psychometric measures should be thoroughly investigated. Researchers should not assume that a scale validation on one sample will show the same validity coefficients in a different population. The Eating Attitude Test is a commonly used scale. The present study shows a revised 21-item scale was suitable for exercisers. Researchers using the Eating Attitude Test should use subscales of Dieting, Oral control, Food pre-occupation, and Bulimia. Future research should involve qualitative techniques and interview exercise participants to explore the nature of eating attitudes.
N’Diaye, Khadim; Evans, D. Gareth; Harris, Hilary; Tibben, Aad; van Asperen, Christi; Schmidtke, Joerg; Nippert, Irmgard; Mancini, Julien; Julian-Reynier, Claire
2017-01-01
Objective To develop a generic scale for assessing attitudes towards genetic testing and to psychometrically assess these attitudes in the context of BRCA1/2 among a sample of French general practitioners, breast specialists and gyneco-obstetricians. Study design and setting Nested within the questionnaire developed for the European InCRisC (International Cancer Risk Communication Study) project were 14 items assessing expected benefits (8 items) and drawbacks (6 items) of the process of breast/ovarian genetic cancer testing (BRCA1/2). Another item assessed agreement with the statement that, overall, the expected health benefits of BRCA1/2 testing exceeded its drawbacks, thereby justifying its prescription. The questionnaire was mailed to a sample of 1,852 French doctors. Of these, 182 breast specialists, 275 general practitioners and 294 gyneco-obstetricians completed and returned the questionnaire to the research team. Principal Component Analysis, Cronbach’s α coefficient, and Pearson’s correlation coefficients were used in the statistical analyses of collected data. Results Three dimensions emerged from the respondents’ responses, and were classified under the headings: “Anxiety, Conflict and Discrimination”, “Risk Information”, and “Prevention and Surveillance”. Cronbach’s α coefficient for the 3 dimensions was 0.79, 0.76 and 0.62, respectively, and each dimension exhibited strong correlation with the overall indicator of agreement (criterion validity). Conclusions The validation process of the 15 items regarding BRCA1/2 testing revealed satisfactory psychometric properties for the creation of a new scale entitled the Attitudes Towards Genetic Testing for BRCA1/2 (ATGT-BRCA1/2) Scale. Further testing is required to confirm the validity of this tool which could be used generically in other genetic contexts. PMID:28570656
O'Connor, Teresia M; Cerin, Ester; Hughes, Sheryl O; Robles, Jessica; Thompson, Deborah I; Mendoza, Jason A; Baranowski, Tom; Lee, Rebecca E
2014-01-15
Latino preschoolers (3-5 year old children) have among the highest rates of obesity. Low levels of physical activity (PA) are a risk factor for obesity. Characterizing what Latino parents do to encourage or discourage their preschooler to be physically active can help inform interventions to increase their PA. The objective was therefore to develop and assess the psychometrics of a new instrument: the Preschooler Physical Activity Parenting Practices (PPAPP) among a Latino sample, to assess parenting practices used to encourage or discourage PA among preschool-aged children. Cross-sectional study of 240 Latino parents who reported the frequency of using PA parenting practices. 95% of respondents were mothers; 42% had more than a high school education. Child mean age was 4.5 (±0.9) years (52% male). Test-retest reliability was assessed in 20%, 2 weeks later. We assessed the fit of a priori models using Confirmatory factor analyses (CFA). In a separate sub-sample (35%), preschool-aged children wore accelerometers to assess associations with their PA and PPAPP subscales. The a-priori models showed poor fit to the data. A modified factor structure for encouraging PPAPP had one multiple-item scale: engagement (15 items), and two single-items (have outdoor toys; not enroll in sport-reverse coded). The final factor structure for discouraging PPAPP had 4 subscales: promote inactive transport (3 items), promote screen time (3 items), psychological control (4 items) and restricting for safety (4 items). Test-retest reliability (ICC) for the two scales ranged from 0.56-0.85. Cronbach's alphas ranged from 0.5-0.9. Several sub-factors correlated in the expected direction with children's objectively measured PA. The final models for encouraging and discouraging PPAPP had moderate to good fit, with moderate to excellent test-retest reliabilities. 
The PPAPP should be further evaluated to better assess its associations with children's PA and offers a new tool for measuring PPAPP among Latino families with preschool-aged children.
Pilkonis, Paul A.; Yu, Lan; Dodds, Nathan E.; Johnston, Kelly L.; Lawrence, Suzanne; Hilton, Thomas F.; Daley, Dennis C.; Patkar, Ashwin A.; McCarty, Dennis
2015-01-01
Background Two item banks for substance use were developed as part of the Patient-Reported Outcomes Measurement Information System (PROMIS®): severity of substance use and positive appeal of substance use. Methods Qualitative item analysis (including focus groups, cognitive interviewing, expert review, and item revision) reduced an initial pool of more than 5,300 items for substance use to 119 items included in field testing. Items were written in a first-person, past-tense format, with 5 response options reflecting frequency or severity. Both 30-day and 3-month time frames were tested. The calibration sample of 1,336 respondents included 875 individuals from the general population (ascertained through an internet panel) and 461 patients from addiction treatment centers participating in the National Drug Abuse Treatment Clinical Trials Network. Results Final banks of 37 and 18 items were calibrated for severity of substance use and positive appeal of substance use, respectively, using the two-parameter graded response model from item response theory (IRT). Initial calibrations were similar for the 30-day and 3-month time frames, and final calibrations used data combined across the time frames, making the items applicable with either interval. Seven-item static short forms were also developed from each item bank. Conclusions Test information curves showed that the PROMIS item banks provided substantial information in a broad range of severity, making them suitable for treatment, observational, and epidemiological research in both clinical and community settings. PMID:26423364
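As a rough illustration of the graded response model named in the abstract above (and not the PROMIS calibration itself), the following Python sketch computes the category probabilities for a single item with 5 ordered response options. The function name, the slope, and the threshold values are hypothetical and chosen only for the example.

```python
import math

def grm_category_probs(theta, a, thresholds):
    """Category probabilities under Samejima's graded response model.

    theta: latent trait level; a: item slope (discrimination);
    thresholds: increasing values b_1 < ... < b_{m-1} for m categories.
    """
    # Cumulative curves P(X >= k): 1 for the lowest category, 0 past the highest.
    cum = [1.0]
    cum += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cum += [0.0]
    # Each category probability is the difference of adjacent cumulative curves.
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]

# Hypothetical item with 5 response options (4 thresholds).
probs = grm_category_probs(theta=0.0, a=1.5, thresholds=[-2.0, -1.0, 1.0, 2.0])
```

With increasing thresholds the probabilities are non-negative and sum to 1; a respondent near the middle of the trait range is most likely to choose the middle category.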
Mechanical Drawing: Grades 7-12.
ERIC Educational Resources Information Center
Instructional Objectives Exchange, Los Angeles, CA.
Eighty-five behavioral objectives and related evaluation items for mechanical drawing in grades 7 through 12 are presented. Each sample contains the objective, test items, and means for judging the adequacy of the response. The following categories are included: (1) basic drafting skills; (2) beginning lettering; (3) drawing; (4) orthographic…
Maples-Keller, Jessica L; Williamson, Rachel L; Sleep, Chelsea E; Carter, Nathan T; Campbell, W Keith; Miller, Joshua D
2017-10-31
Given advantages of freely available and modifiable measures, an increase in the use of measures developed from the International Personality Item Pool (IPIP), including the 300-item representation of the Revised NEO Personality Inventory (NEO PI-R; Costa & McCrae, 1992a) has occurred. The focus of this study was to use item response theory to develop a 60-item, IPIP-based measure of the Five-Factor Model (FFM) that provides equal representation of the FFM facets and to test the reliability and convergent and criterion validity of this measure compared to the NEO Five Factor Inventory (NEO-FFI). In an undergraduate sample (n = 359), scores from the NEO-FFI and IPIP-NEO-60 demonstrated good reliability and convergent validity with the NEO PI-R and IPIP-NEO-300. Additionally, across criterion variables in the undergraduate sample as well as a community-based sample (n = 757), the NEO-FFI and IPIP-NEO-60 demonstrated similar nomological networks across a wide range of external variables (r_ICC = .96). Finally, as expected, in an MTurk sample the IPIP-NEO-60 demonstrated advantages over the Big Five Inventory-2 (Soto & John, 2017; n = 342) with regard to the Agreeableness domain content. The results suggest strong reliability and validity of the IPIP-NEO-60 scores.
Purcell, Susan E; Rhea, Karen; Maier, Philip; First, Michael; Zweede, Lisa; Sinisterra, Manuela; Nunn, M Brad; Austin, Marie-Paule; Brodey, Inger S
2018-01-01
Background The Structured Clinical Interview for DSM (SCID) is considered the gold standard assessment for accurate, reliable psychiatric diagnoses; however, because of its length, complexity, and training required, the SCID is rarely used outside of research. Objective This paper aims to describe the development and initial validation of a Web-based, self-report screening instrument (the Screening Assessment for Guiding Evaluation-Self-Report, SAGE-SR) based on the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5) and the SCID-5-Clinician Version (CV) intended to make accurate, broad-based behavioral health diagnostic screening more accessible within clinical care. Methods First, study staff drafted approximately 1200 self-report items representing individual granular symptoms in the diagnostic criteria for the 8 primary SCID-CV modules. An expert panel iteratively reviewed, critiqued, and revised items. The resulting items were iteratively administered and revised through 3 rounds of cognitive interviewing with community mental health center participants. In the first 2 rounds, the SCID was also administered to participants to directly compare their Likert self-report and SCID responses. A second expert panel evaluated the final pool of items from cognitive interviewing and criteria in the DSM-5 to construct the SAGE-SR, a computerized adaptive instrument that uses branching logic from a screener section to administer appropriate follow-up questions to refine the differential diagnoses. The SAGE-SR was administered to healthy controls and outpatient mental health clinic clients to assess test duration and test-retest reliability. Cutoff scores for screening into follow-up diagnostic sections and criteria for inclusion of diagnoses in the differential diagnosis were evaluated. 
Results The expert panel reduced the initial 1200 test items to 664 items that panel members agreed collectively represented the SCID items from the 8 targeted modules and DSM criteria for the covered diagnoses. These 664 items were iteratively submitted to 3 rounds of cognitive interviewing with 50 community mental health center participants; the expert panel reviewed session summaries and agreed on a final set of 661 clear and concise self-report items representing the desired criteria in the DSM-5. The SAGE-SR constructed from this item pool took an average of 14 min to complete in a nonclinical sample versus 24 min in a clinical sample. Responses to individual items can be combined to generate DSM criteria endorsements and differential diagnoses, as well as provide indices of individual symptom severity. Preliminary measures of test-retest reliability in a small, nonclinical sample were promising, with good to excellent reliability for screener items in 11 of 13 diagnostic screening modules (intraclass correlation coefficient [ICC] or kappa coefficients ranging from .60 to .90), with mania achieving fair test-retest reliability (ICC=.50) and other substance use endorsed too infrequently for analysis. Conclusions The SAGE-SR is a computerized adaptive self-report instrument designed to provide rigorous differential diagnostic information to clinicians. PMID:29572204
2014-01-01
Background Self-compassion is a key psychological construct for assessing clinical outcomes in mindfulness-based interventions. The aim of this study was to validate the Spanish versions of the long (26 item) and short (12 item) forms of the Self-Compassion Scale (SCS). Methods The translated Spanish versions of both subscales were administered to two independent samples: Sample 1 was comprised of university students (n = 268) who were recruited to validate the long form, and Sample 2 was comprised of Aragon Health Service workers (n = 271) who were recruited to validate the short form. In addition to SCS, the Mindful Attention Awareness Scale (MAAS), the State-Trait Anxiety Inventory–Trait (STAI-T), the Beck Depression Inventory (BDI) and the Perceived Stress Questionnaire (PSQ) were administered. Construct validity, internal consistency, test-retest reliability and convergent validity were tested. Results The Confirmatory Factor Analysis (CFA) of the long and short forms of the SCS confirmed the original six-factor model in both scales, showing goodness of fit. Cronbach’s α for the 26 item SCS was 0.87 (95% CI = 0.85-0.90) and ranged between 0.72 and 0.79 for the 6 subscales. Cronbach’s α for the 12-item SCS was 0.85 (95% CI = 0.81-0.88) and ranged between 0.71 and 0.77 for the 6 subscales. The long (26-item) form of the SCS showed a test-retest coefficient of 0.92 (95% CI = 0.89–0.94). The Intraclass Correlation (ICC) for the 6 subscales ranged from 0.84 to 0.93. The short (12-item) form of the SCS showed a test-retest coefficient of 0.89 (95% CI: 0.87-0.93). The ICC for the 6 subscales ranged from 0.79 to 0.91. The long and short forms of the SCS exhibited a significant negative correlation with the BDI, the STAI and the PSQ, and a significant positive correlation with the MAAS. The correlation between the total score of the long and short SCS form was r = 0.92. 
Conclusion The Spanish versions of the long (26-item) and short (12-item) forms of the SCS are valid and reliable instruments for the evaluation of self-compassion among the general population. These results substantiate the use of this scale in research and clinical practice. PMID:24410742
Firmin, Ruth L; Lysaker, Paul H; McGrew, John H; Minor, Kyle S; Luther, Lauren; Salyers, Michelle P
2017-12-01
Although associated with key recovery outcomes, stigma resistance remains under-studied largely due to limitations of existing measures. This study developed and validated a new measure of stigma resistance. Preliminary items, derived from qualitative interviews of people with lived experience, were pilot tested online with people self-reporting a mental illness diagnosis (n = 489). Best performing items were selected, and the refined measure was administered to an independent sample of people with mental illness at two state mental health consumer recovery conferences (n = 202). Confirmatory factor analyses (CFA) guided by theory were used to test item fit, correlations between the refined stigma resistance measure and theoretically relevant measures were examined for validity, and test-retest correlations of a subsample were examined for stability. CFA demonstrated strong fit for a 5-factor model. The final 20-item measure demonstrated good internal consistency for each of the 5 subscales, adequate test-retest reliability at 3 weeks, and strong construct validity (i.e., positive associations with quality of life, recovery, and self-efficacy, and negative associations with overall symptoms, defeatist beliefs, and self-stigma). The new measure offers a more reliable and nuanced assessment of stigma resistance. It may afford greater personalization of interventions targeting stigma resistance. Copyright © 2017 Elsevier B.V. All rights reserved.
Applications of computerized adaptive testing (CAT) to the assessment of headache impact.
Ware, John E; Kosinski, Mark; Bjorner, Jakob B; Bayliss, Martha S; Batenhorst, Alice; Dahlöf, Carl G H; Tepper, Stewart; Dowson, Andrew
2003-12-01
To evaluate the feasibility of computerized adaptive testing (CAT) and the reliability and validity of CAT-based estimates of headache impact scores in comparison with 'static' surveys. Responses to the 54-item Headache Impact Test (HIT) were re-analyzed for recent headache sufferers (n = 1016) who completed telephone interviews during the National Survey of Headache Impact (NSHI). Item response theory (IRT) calibrations and the computerized dynamic health assessment (DYNHA) software were used to simulate CAT assessments by selecting the most informative items for each person and estimating impact scores according to pre-set precision standards (CAT-HIT). Results were compared with IRT estimates based on all items (total-HIT), computerized 6-item dynamic estimates (CAT-HIT-6), and a developmental version of a 'static' 6-item form (HIT-6-D). Analyses focused on: respondent burden (survey length and administration time), score distributions ('ceiling' and 'floor' effects), reliability and standard errors, and clinical validity (diagnosis, level of severity). A random sample (n = 245) was re-assessed to test responsiveness. A second study (n = 1103) compared actual CAT surveys and an improved 'static' HIT-6 among current headache sufferers sampled on the Internet. Respondents completed measures from the first study and the generic SF-8 Health Survey; some (n = 540) were re-tested on the Internet after 2 weeks. In the first study, simulated CAT-HIT and total-HIT scores were highly correlated (r = 0.92) without 'ceiling' or 'floor' effects and with a substantial reduction (90.8%) in respondent burden. Six of the 54 items accounted for the great majority of item administrations (3603/5028, 77.6%). 
CAT-HIT reliability estimates were very high (0.975-0.992) in the range where 95% of respondents scored, and relative validity (RV) coefficients were high for diagnosis (RV = 0.87) and severity (RV = 0.89); patient-level classifications were accurate 91.3% for a diagnosis of migraine. For all three criteria of change, CAT-HIT scores were more responsive than all other measures. In the second study, estimates of respondent burden, item usage, reliability and clinical validity were replicated. The test-retest reliability of CAT-HIT was 0.79 and alternate forms coefficients ranged from 0.85 to 0.91. All correlations with the generic SF-8 were negative. CAT-based administrations of headache impact items achieved very large reductions in respondent burden without compromising validity for purposes of patient screening or monitoring changes in headache impact over time. IRT models and CAT-based dynamic health assessments warrant testing among patients with other conditions.
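The item-selection step described above ("selecting the most informative items for each person" until a pre-set precision standard is met) can be sketched in Python as follows. This is a simplified illustration, not the DYNHA algorithm: it assumes dichotomous items under a two-parameter logistic (2PL) model, whereas the HIT items are polytomous, and the item bank, item ids, and parameter values are all hypothetical.

```python
import math

def p_2pl(theta, a, b):
    """Probability of endorsing a 2PL item with slope a and location b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at trait level theta."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

def next_item(theta, bank, administered):
    """Return the id of the unused item with maximum information at theta."""
    candidates = {i: ab for i, ab in bank.items() if i not in administered}
    return max(candidates, key=lambda i: item_information(theta, *candidates[i]))

def standard_error(theta, items):
    """Approximate standard error of theta from the items given so far."""
    return 1.0 / math.sqrt(sum(item_information(theta, a, b) for a, b in items))

# Hypothetical three-item bank: id -> (slope, location).
bank = {"q1": (1.0, -1.0), "q2": (1.5, 0.0), "q3": (1.0, 1.0)}
first = next_item(0.0, bank, administered=set())
```

A CAT loop would alternate between `next_item`, re-estimating theta from the responses so far, and checking `standard_error` against the chosen precision standard. That a few highly informative items dominate selection is consistent with the abstract's finding that six of the 54 items accounted for most administrations.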
Development of an item bank for computerized adaptive test (CAT) measurement of pain.
Petersen, Morten Aa; Aaronson, Neil K; Chie, Wei-Chu; Conroy, Thierry; Costantini, Anna; Hammerlid, Eva; Hjermstad, Marianne J; Kaasa, Stein; Loge, Jon H; Velikova, Galina; Young, Teresa; Groenvold, Mogens
2016-01-01
Patient-reported outcomes should ideally be adapted to the individual patient while maintaining comparability of scores across patients. This is achievable using computerized adaptive testing (CAT). The aim here was to develop an item bank for CAT measurement of the pain domain as measured by the EORTC QLQ-C30 questionnaire. The development process consisted of four steps: (1) literature search, (2) formulation of new items and expert evaluations, (3) pretesting and (4) field-testing and psychometric analyses for the final selection of items. In step 1, we identified 337 pain items from the literature. Twenty-nine new items fitting the QLQ-C30 item style were formulated in step 2 and reduced to 26 items by expert evaluations. Based on interviews with 31 patients from Denmark, France and the UK, the list was further reduced to 21 items in step 3. In step 4, responses were obtained from 1103 cancer patients from five countries. Psychometric evaluations showed that 16 items could be retained in a unidimensional item bank. Evaluations indicated that use of the CAT measure may reduce sample size requirements by 15-25% compared to using the QLQ-C30 pain scale. We have established an item bank of 16 items suitable for CAT measurement of pain. While being backward compatible with the QLQ-C30, the new item bank will significantly improve measurement precision of pain. We recommend initiating CAT measurement by screening for pain using the two original QLQ-C30 pain items. The EORTC pain CAT is currently available for "experimental" purposes.
Massey, Kevin; Barnes, Marilyn J D; Villines, Dana; Goldstein, Julie D; Pierson, Anna Lee Hisey; Scherer, Cheryl; Vander Laan, Betty; Summerfelt, Wm Thomas
2015-01-01
Chaplains are increasingly seen as key members of interdisciplinary palliative care teams, yet the specific interventions and hoped for outcomes of their work are poorly understood. This project served to develop a standard terminology inventory for the chaplaincy field, to be called the chaplaincy taxonomy. The research team used a mixed methods approach to generate, evaluate and validate items for the taxonomy. We conducted a literature review, retrospective chart review, focus groups, self-observation, experience sampling, concept mapping, and reliability testing. Chaplaincy activities focused primarily on palliative care in an intensive care unit setting in order to capture a broad cross section of chaplaincy activities. Literature and chart review resulted in 438 taxonomy items for testing. Chaplain focus groups generated an additional 100 items and removed 421 items as duplications. Self-Observation, Experience Sampling and Concept Mapping provided validity that the taxonomy items were actual activities that chaplains perform in their spiritual care. Inter-rater reliability for chaplains to identify taxonomy items from vignettes was 0.903. The 100 item chaplaincy taxonomy provides a strong foundation for a normative inventory of chaplaincy activities and outcomes. A deliberative process is proposed to further expand and refine the taxonomy to create a standard terminological inventory for the field of chaplaincy. A standard terminology could improve the ways inter-disciplinary palliative care teams communicate about chaplaincy activities and outcomes.
Development and evaluation of the Korean Health Literacy Instrument.
Kang, Soo Jin; Lee, Tae Wha; Paasche-Orlow, Michael K; Kim, Gwang Suk; Won, Hee Kwan
2014-01-01
The purpose of this study is to develop and validate the Korean Health Literacy Instrument, which measures the capacity to understand and use health-related information and make informed health decisions in Korean adults. In Phase 1, 33 initial items were generated to measure functional, interactive, and critical health literacy with prose, document, and numeracy tasks. These items included content from health promotion, disease management, and health navigation contexts. Content validity assessment was conducted by an expert panel, and 11 items were excluded. In Phase 2, the 22 remaining items were administered to a convenience sample of 292 adults from community and clinical settings. Exploratory factor and item difficulty and discrimination analyses were conducted and four items with low discrimination were deleted. In Phase 3, the remaining 18 items were administered to a convenience sample of 315 adults 40-64 years of age from community and clinical settings. A confirmatory factor analysis was performed to test the construct validity of the instrument. The Korean Health Literacy Instrument has a range of 0 to 18. The mean score in our validation study was 11.98. The instrument exhibited an internal consistency reliability coefficient of 0.82, and a test-retest reliability of 0.89. The instrument is suitable for screening individuals who have limited health literacy skills. Future studies are needed to further define the psychometric properties and predictive validity of the Korean Health Literacy Instrument.
INTRODUCTION TO PATIENT-REPORTED OUTCOME ITEM BANKS: ISSUES IN MINORITY AGING RESEARCH
Templin, Thomas N; Hays, Ron D; Gershon, Richard C; Rothrock, Nan; Jones, Richard N; Teresi, Jeanne A; Stewart, Anita; Weech-Maldonado, Robert; Wallace, Steve
2014-01-01
In 2004 NIH awarded contracts to initiate the development of high quality psychological and neuropsychological outcome measures for improved assessment of health-related outcomes. The workshop introduced these measurement development initiatives, the measures created, and the NIH supported resource (Assessment Center) for internet or tablet-based test administration and scoring. Presentation covered: (a) item response theory (IRT) and assessment of test bias, (b) construction of item banks and computerized adaptive testing, and (c) the different ways in which qualitative analyses contribute to the definition of construct domains and the refinement of outcome constructs. The panel discussion included questions about representativeness of samples, and assessment of cultural bias. PMID:23570428
Statistical power as a function of Cronbach alpha of instrument questionnaire items.
Heo, Moonseong; Kim, Namhee; Faith, Myles S
2015-10-14
In countless clinical trials, measurements of outcomes rely on instrument questionnaire items, which often suffer from measurement error that in turn affects the statistical power of study designs. The Cronbach alpha or coefficient alpha, here denoted by C(α), can be used as a measure of internal consistency of parallel instrument items that are developed to measure a target unidimensional outcome construct. Scale score for the target construct is often represented by the sum of the item scores. However, power functions based on C(α) have been lacking for various study designs. We formulate a statistical model for parallel items to derive power functions as a function of C(α) under several study designs. To this end, we assume a fixed true score variance as opposed to the usual fixed total variance. That assumption is critical and practically relevant to show that smaller measurement errors are associated with higher inter-item correlations, and thus that greater C(α) is associated with greater statistical power. We compare the derived theoretical statistical power with empirical power obtained through Monte Carlo simulations for the following comparisons: one-sample comparison of pre- and post-treatment mean differences, two-sample comparison of pre-post mean differences between groups, and two-sample comparison of mean differences between groups. It is shown that C(α) is the same as a test-retest correlation of the scale scores of parallel items, which enables testing significance of C(α). Closed-form power functions and sample size determination formulas are derived in terms of C(α), for all of the aforementioned comparisons. Power functions are shown to be an increasing function of C(α), regardless of comparison of interest. The derived power functions are well validated by simulation studies that show that the magnitudes of theoretical power are virtually identical to those of the empirical power. 
Regardless of research designs or settings, in order to increase statistical power, development and use of instruments with greater C(α), or equivalently with greater inter-item correlations, is crucial for trials that intend to use questionnaire items for measuring research outcomes. Further development of the power functions for binary or ordinal item scores and under more general item correlation structures reflecting more real world situations would be a valuable future study.
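The equivalence between greater C(α) and greater inter-item correlation stated above can be illustrated with a short Python sketch. It does not reproduce the authors' power functions. It only computes coefficient alpha from an item-score matrix and, for parallel items with a common inter-item correlation, the standardized alpha from the Spearman-Brown form. Function names and example values are hypothetical.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Coefficient alpha; scores is a list of respondents, each a list of k item scores."""
    k = len(scores[0])
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

def standardized_alpha(k, mean_r):
    """Alpha for k parallel items sharing the inter-item correlation mean_r."""
    return k * mean_r / (1 + (k - 1) * mean_r)

# For a fixed number of items, alpha increases with the inter-item correlation.
low = standardized_alpha(10, 0.3)
high = standardized_alpha(10, 0.5)
```

Under the parallel-items assumption, raising the common inter-item correlation from 0.3 to 0.5 for a 10-item scale raises alpha from 3/3.7 to 5/5.5, which is the direction of effect the abstract ties to greater statistical power.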
W-026, transuranic waste restricted waste management (TRU RWM) glovebox operational test report
DOE Office of Scientific and Technical Information (OSTI.GOV)
Leist, K.J.
1998-02-18
The TRU Waste/Restricted Waste Management (LLW/PWNP) Glovebox 401 is designed to accept and process waste from the Transuranic Process Glovebox 302. Waste is transferred to the glovebox via the Drath and Schraeder Bagless Transfer Port (DO-07401) on a transfer stand. The stand is removed with a hoist and the operator inspects the waste (with the aid of the Sampling and Treatment Director) to determine a course of action for each item. The waste is separated into compliant and non-compliant. One Trip Port DO-07402A is designated as "Compliant" and One Trip Port DO-07402B is designated as "Non Compliant". As the processing (inspection, bar coding, sampling and treatment) of the transferred items takes place, residue is placed in the appropriate One Trip port. The status of the waste items is tracked by the Data Management System (DMS) via the Plant Control System (PCS) barcode interface. As an item is moved for sampling or storage, or its state is altered by treatment, the Operator will track the item's location using a portable barcode reader and enter any required data on the DMS console. The Operational Test Procedure (OTP) will perform evolutions (described here) using the Plant Operating Procedures (POP) in order to verify that they are sufficient and accurate for controlled glovebox operation.
Kong, Angela; Vijayasiri, Ganga; Fitzgibbon, Marian L; Schiffer, Linda A; Campbell, Richard T
2015-07-01
Validation work of the Child Feeding Questionnaire (CFQ) in low-income minority samples suggests a need for further conceptual refinement of this instrument. Using confirmatory factor analysis, this study evaluated 5- and 6-factor models on a large sample of African-American and Hispanic mothers with preschool-age children (n = 962). The 5-factor model included: 'perceived responsibility', 'concern about child's weight', 'restriction', 'pressure to eat', and 'monitoring' and the 6-factor model also tested 'food as a reward'. Multi-group analysis assessed measurement invariance by race/ethnicity. In the 5-factor model, two low-loading items from 'restriction' and one low-variance item from 'perceived responsibility' were dropped to achieve fit. Only removal of the low-variance item was needed to achieve fit in the 6-factor model. Invariance analyses demonstrated differences in factor loadings. This finding suggests African-American and Hispanic mothers may vary in their interpretation of some CFQ items and use of cognitive interviews could enhance item interpretation. Our results also demonstrated that 'food as a reward' is a plausible construct among a low-income minority sample and adds to the evidence that this factor resonates conceptually with parents of preschoolers; however, further testing is needed to determine the validity of this factor with older age groups. Copyright © 2015 Elsevier Ltd. All rights reserved.
[Mokken scaling of the Cognitive Screening Test].
Diesfeldt, H F A
2009-10-01
The Cognitive Screening Test (CST) is a twenty-item orientation questionnaire in Dutch, that is commonly used to evaluate cognitive impairment. This study applied Mokken Scale Analysis, a non-parametric set of techniques derived from item response theory (IRT), to CST-data of 466 consecutive participants in psychogeriatric day care. The full item set and the standard short version of fourteen items both met the assumptions of the monotone homogeneity model, with scalability coefficient H = 0.39, which is considered weak. In order to select items that would fulfil the assumption of invariant item ordering or the double monotonicity model, the subjects were randomly partitioned into a training set (50% of the sample) and a test set (the remaining half). By means of an automated item selection eleven items were found to measure one latent trait, with H = 0.67 and item H coefficients larger than 0.51. Cross-validation of the item analysis in the remaining half of the subjects gave comparable values (H = 0.66; item H coefficients larger than 0.56). The selected items involve year, place of residence, birth date, the monarch's and prime minister's names, and their predecessors. Applying optimal discriminant analysis (ODA) it was found that the full set of twenty CST items performed best in distinguishing two predefined groups of patients of lower or higher cognitive ability, as established by an independent criterion derived from the Amsterdam Dementia Screening Test. The chance corrected predictive value or prognostic utility was 47.5% for the full item set, 45.2% for the fourteen items of the standard short version of the CST, and 46.1% for the homogeneous, unidimensional set of selected eleven items. The results of the item analysis support the application of the CST in cognitive assessment, and revealed a more reliable 'short' version of the CST than the standard short version (CST14).
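The scalability coefficient H reported above can be sketched for dichotomous items as the ratio of the summed observed inter-item covariances to the summed maximum covariances attainable given the item marginals (Loevinger's H). The Python below is a minimal illustration under that definition, not the software used in the study, and the function name and example data are hypothetical. It assumes 0/1 item scores and that no item is constant across respondents.

```python
from itertools import combinations

def loevinger_h(data):
    """Scale-level H; data is a list of respondents, each a list of 0/1 item scores."""
    n = len(data)
    k = len(data[0])
    # Proportion of respondents scoring 1 on each item.
    p = [sum(row[j] for row in data) / n for j in range(k)]
    cov_sum = 0.0
    cov_max_sum = 0.0
    for i, j in combinations(range(k), 2):
        p_ij = sum(row[i] * row[j] for row in data) / n
        cov_sum += p_ij - p[i] * p[j]                  # observed covariance
        cov_max_sum += min(p[i], p[j]) - p[i] * p[j]   # maximum given the marginals
    return cov_sum / cov_max_sum

# A perfect Guttman pattern: every respondent who passes a harder item
# also passes all easier ones.
guttman = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]]
h = loevinger_h(guttman)
```

By the commonly cited Mokken rule of thumb, H between 0.3 and 0.4 is read as a weak scale, 0.4 to 0.5 as moderate, and 0.5 or above as strong, which matches the abstract's description of H = 0.39 as weak.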
ERIC Educational Resources Information Center
Instructional Objectives Exchange, Los Angeles, CA.
The woodworking collection is composed of 55 objectives and related evaluation items for use in grades 7 through 12. Each sample contains the objective, test items, and criteria for judging the adequacy of the response. Woodworking categories being measured include sharpening, adjusting, using and caring for tools; reading a working drawing; stock…
Kisala, Pamela A; Tulsky, David S; Pace, Natalie; Victorson, David; Choi, Seung W; Heinemann, Allen W
2015-05-01
Objective: To develop a calibrated item bank and computer adaptive test (CAT) to assess the effects of stigma on health-related quality of life in individuals with spinal cord injury (SCI). Design: Grounded-theory based qualitative item development methods, large-scale item calibration field testing, confirmatory factor analysis, and item response theory (IRT)-based psychometric analyses. Setting: Five SCI Model System centers and one Department of Veterans Affairs medical center in the United States. Participants: Adults with traumatic SCI. Main outcome measure: SCI-QOL Stigma Item Bank. Results: A sample of 611 individuals with traumatic SCI completed 30 items assessing SCI-related stigma. After 7 items were iteratively removed, factor analyses confirmed a unidimensional pool of items. Graded Response Model IRT analyses were used to estimate slopes and thresholds for the final 23 items. Conclusions: The SCI-QOL Stigma item bank is unique not only in the assessment of SCI-related stigma but also in the inclusion of individuals with SCI in all phases of its development. Use of confirmatory factor analytic and IRT methods provides flexibility and precision of measurement. The item bank may be administered as a CAT or as a 10-item fixed-length short form and can be used for research and clinical applications.
Kisala, Pamela A.; Tulsky, David S.; Pace, Natalie; Victorson, David; Choi, Seung W.; Heinemann, Allen W.
2015-01-01
Objective To develop a calibrated item bank and computer adaptive test (CAT) to assess the effects of stigma on health-related quality of life in individuals with spinal cord injury (SCI). Design Grounded-theory based qualitative item development methods, large-scale item calibration field testing, confirmatory factor analysis, and item response theory (IRT)-based psychometric analyses. Setting Five SCI Model System centers and one Department of Veterans Affairs medical center in the United States. Participants Adults with traumatic SCI. Main Outcome Measures SCI-QOL Stigma Item Bank Results A sample of 611 individuals with traumatic SCI completed 30 items assessing SCI-related stigma. After 7 items were iteratively removed, factor analyses confirmed a unidimensional pool of items. Graded Response Model IRT analyses were used to estimate slopes and thresholds for the final 23 items. Conclusions The SCI-QOL Stigma item bank is unique not only in the assessment of SCI-related stigma but also in the inclusion of individuals with SCI in all phases of its development. Use of confirmatory factor analytic and IRT methods provides flexibility and precision of measurement. The item bank may be administered as a CAT or as a 10-item fixed-length short form and can be used for research and clinical applications. PMID:26010973
Derivation and Applicability of Asymptotic Results for Multiple Subtests Person-Fit Statistics
Albers, Casper J.; Meijer, Rob R.; Tendeiro, Jorge N.
2016-01-01
In high-stakes testing, it is important to check the validity of individual test scores. Although a test may, in general, result in valid test scores for most test takers, for some test takers, test scores may not provide a good description of a test taker’s proficiency level. Person-fit statistics have been proposed to check the validity of individual test scores. In this study, the theoretical asymptotic sampling distribution of two person-fit statistics that can be used for tests that consist of multiple subtests is first discussed. Second, a simulation study was conducted to investigate the applicability of this asymptotic theory for tests of finite length, in which the correlation between subtests and the number of items in the subtests were varied. The authors showed that these distributions provide reasonable approximations, even for tests consisting of subtests of only 10 items each. These results have practical value because researchers do not have to rely on extensive simulation studies to simulate sampling distributions. PMID:29881053
Marfeo, Elizabeth E.; Ni, Pengsheng; Bogusz, Kara; Meterko, Mark; McDonough, Christine M.; Chan, Leighton; Rasch, Elizabeth K.; Brandt, Diane E.; Jette, Alan M.
2014-01-01
Objectives To use item response theory (IRT) data simulations to construct and perform initial psychometric testing of a newly developed instrument, the Social Security Administration Behavioral Health Function (SSA-BH) instrument, that aims to assess behavioral health functioning relevant to the context of work. Design Cross-sectional survey followed by item response theory (IRT) calibration data simulations Setting Community Participants A sample of individuals applying for SSA disability benefits, claimants (N=1015), and a normative comparative sample of US adults (N=1000) Interventions None. Main Outcome Measure Social Security Administration Behavioral Health Function (SSA-BH) measurement instrument Results Item response theory analyses supported the unidimensionality of four SSA-BH scales: Mood and Emotions (35 items), Self-Efficacy (23 items), Social Interactions (6 items), and Behavioral Control (15 items). All SSA-BH scales demonstrated strong psychometric properties including reliability, accuracy, and breadth of coverage. High correlations of the simulated 5- or 10- item CATs with the full item bank indicated robust ability of the CAT approach to comprehensively characterize behavioral health function along four distinct dimensions. Conclusions Initial testing and evaluation of the SSA-BH instrument demonstrated good accuracy, reliability, and content coverage along all four scales. Behavioral function profiles of SSA claimants were generated and compared to age and sex matched norms along four scales: Mood and Emotions, Behavioral Control, Social Interactions, and Self-Efficacy. Utilizing the CAT based approach offers the ability to collect standardized, comprehensive functional information about claimants in an efficient way, which may prove useful in the context of the SSA’s work disability programs. PMID:23542404
Bodenburg, Sebastian; Dopslaff, Nina
2008-01-01
The Dysexecutive Questionnaire (DEX; Behavioral Assessment of the Dysexecutive Syndrome, 1996) is a standardized instrument to measure possible behavioral changes as a result of the dysexecutive syndrome. Although initially intended only as a qualitative instrument, the DEX has also been used increasingly to address quantitative problems. Until now there have been no fundamental statistical analyses of the questionnaire's testing quality. The present study is based on an unselected sample of 191 patients with acquired brain injury and reports data on the quality of the items, the reliability and the factorial structure of the DEX. Item 3 displayed too great an item difficulty, whereas item 11 was not sufficiently discriminating. The DEX's reliability in self-rating is r = 0.85. In addition to presenting the statistical values of the tests, a clinical severity classification of the overall scores of the four factors found and of the questionnaire as a whole is carried out on the basis of quartile standards.
NASA Astrophysics Data System (ADS)
Marshall, Jill A.; Hagedorn, Eric A.; O'Connor, Jerry
2009-06-01
We report the results of an analysis of the Texas Assessment of Knowledge and Skills (TAKS) designed to determine whether the TAKS is a valid indicator of whether students know and can do physics at the level necessary for success in future coursework, STEM careers, and life in a technological society. We categorized science items from the 2003 and 2004 10th and 11th grade TAKS by content area(s) covered, knowledge and skills required to select the correct answer, and overall quality. We also analyzed a 5000 student sample of item-level results from the 2004 11th grade exam, performing full-information factor analysis, calculating classical test indices, and determining each item's response curve using item response theory. Triangulation of our results revealed strengths and weaknesses of the different methods of analysis. The TAKS was found to be only weakly indicative of physics preparation and we make recommendations for increasing the validity of standardized physics testing.
Xiao, Yuan-mei; Wang, Zhi-ming; Wang, Mian-zhen; Lan, Ya-jia
2005-06-01
To test the reliability and validity of two mental workload assessment scales, i.e. the subjective workload assessment technique (SWAT) and the NASA task load index (NASA-TLX). One thousand two hundred and sixty-eight mental workers were sampled from various occupations, such as scientific research, education, administration and medicine, using randomized cluster sampling. Re-test reliability, split-half reliability, Cronbach's alpha coefficient and correlation coefficients between item scores and total score were used to test reliability. The test of validity included structure validity. The re-test reliability coefficients of the two scales and their items ranged from 0.516 to 0.753 (P < 0.01), indicating that the two scales had good re-test reliability. The split-half reliability of SWAT was 0.645, its Cronbach's alpha coefficient was more than 0.80, and all the correlation coefficients between its item scores and total score were more than 0.70. For NASA-TLX, both the split-half reliability and Cronbach's alpha coefficient were more than 0.80, and the correlation coefficients between its item scores and total score were all more than 0.60 (P < 0.01) except for the performance item. Both scales had good internal consistency. The Pearson correlation coefficient between the two scales was 0.492 (P < 0.01), implying that the results of the two scales had good consistency. Factor analysis showed that the two scales had good structure validity. Both SWAT and NASA-TLX have good reliability and validity and may be used as valid tools to assess mental workload in China after being revised properly.
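For readers unfamiliar with the internal-consistency index cited in this abstract, the following minimal sketch computes Cronbach's alpha from a persons-by-items score matrix using the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The function name `cronbach_alpha` and the toy scores are hypothetical; the study's own data and software are not reproduced here.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of rows, one row of item scores per person.

    Assumes at least two items, at least two persons, and a total score
    that is not constant across persons (otherwise the variance is zero).
    """
    k = len(scores[0])
    # Sample variance of each item across persons.
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of each person's total score.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical toy data: two items that rank three persons identically.
identical_items = [[1, 1], [2, 2], [3, 3]]
print(cronbach_alpha(identical_items))
```

When items move together perfectly, alpha reaches 1; when items are uncorrelated, the total-score variance equals the sum of item variances and alpha drops to 0.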
Development and initial evaluation of the SCI-FI/AT
Jette, Alan M.; Slavin, Mary D.; Ni, Pengsheng; Kisala, Pamela A.; Tulsky, David S.; Heinemann, Allen W.; Charlifue, Susie; Tate, Denise G.; Fyffe, Denise; Morse, Leslie; Marino, Ralph; Smith, Ian; Williams, Steve
2015-01-01
Objectives To describe the domain structure and calibration of the Spinal Cord Injury Functional Index for samples using Assistive Technology (SCI-FI/AT) and report the initial psychometric properties of each domain. Design Cross sectional survey followed by computerized adaptive test (CAT) simulations. Setting Inpatient and community settings. Participants A sample of 460 adults with traumatic spinal cord injury (SCI) stratified by level of injury, completeness of injury, and time since injury. Interventions None Main outcome measure SCI-FI/AT Results Confirmatory factor analysis (CFA) and Item response theory (IRT) analyses identified 4 unidimensional SCI-FI/AT domains: Basic Mobility (41 items), Self-care (71 items), Fine Motor Function (35 items), and Ambulation (29 items). High correlations of full item banks with 10-item simulated CATs indicated high accuracy of each CAT in estimating a person's function, and there was high measurement reliability for the simulated CAT scales compared with the full item bank. SCI-FI/AT items in the domains of Self-care, Fine Motor Function, and Ambulation were less difficult than the same items in the original SCI-FI item banks. Conclusion With the development of the SCI-FI/AT, clinicians and investigators have available multidimensional assessment scales that evaluate function for users of AT to complement the scales available in the original SCI-FI. PMID:26010975
Development and initial evaluation of the SCI-FI/AT.
Jette, Alan M; Slavin, Mary D; Ni, Pengsheng; Kisala, Pamela A; Tulsky, David S; Heinemann, Allen W; Charlifue, Susie; Tate, Denise G; Fyffe, Denise; Morse, Leslie; Marino, Ralph; Smith, Ian; Williams, Steve
2015-05-01
Objectives: To describe the domain structure and calibration of the Spinal Cord Injury Functional Index for samples using Assistive Technology (SCI-FI/AT) and report the initial psychometric properties of each domain. Design: Cross sectional survey followed by computerized adaptive test (CAT) simulations. Setting: Inpatient and community settings. Participants: A sample of 460 adults with traumatic spinal cord injury (SCI) stratified by level of injury, completeness of injury, and time since injury. Interventions: None. Main outcome measure: SCI-FI/AT. Results: Confirmatory factor analysis (CFA) and Item response theory (IRT) analyses identified 4 unidimensional SCI-FI/AT domains: Basic Mobility (41 items), Self-care (71 items), Fine Motor Function (35 items), and Ambulation (29 items). High correlations of full item banks with 10-item simulated CATs indicated high accuracy of each CAT in estimating a person's function, and there was high measurement reliability for the simulated CAT scales compared with the full item bank. SCI-FI/AT items in the domains of Self-care, Fine Motor Function, and Ambulation were less difficult than the same items in the original SCI-FI item banks. Conclusion: With the development of the SCI-FI/AT, clinicians and investigators have available multidimensional assessment scales that evaluate function for users of AT to complement the scales available in the original SCI-FI.
Tarrant, Marie; Ware, James; Mohammed, Ahmed M
2009-07-07
Four- or five-option multiple choice questions (MCQs) are the standard in health-science disciplines, both on certification-level examinations and on in-house developed tests. Previous research has shown, however, that few MCQs have three or four functioning distractors. The purpose of this study was to investigate non-functioning distractors in teacher-developed tests in one nursing program in an English-language university in Hong Kong. Using item-analysis data, we assessed the proportion of non-functioning distractors on a sample of seven test papers administered to undergraduate nursing students. A total of 514 items were reviewed, including 2056 options (1542 distractors and 514 correct responses). Non-functioning options were defined as ones that were chosen by fewer than 5% of examinees and those with a positive option discrimination statistic. The proportion of items containing 0, 1, 2, and 3 functioning distractors was 12.3%, 34.8%, 39.1%, and 13.8% respectively. Overall, items contained an average of 1.54 (SD = 0.88) functioning distractors. Only 52.2% (n = 805) of all distractors were functioning effectively and 10.2% (n = 158) had a choice frequency of 0. Items with more functioning distractors were more difficult and more discriminating. The low frequency of items with three functioning distractors in the four-option items in this study suggests that teachers have difficulty developing plausible distractors for most MCQs. Test items should consist of as many options as is feasible given the item content and the number of plausible distractors; in most cases this would be three. Item analysis results can be used to identify and remove non-functioning distractors from MCQs that have been used in previous tests.
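The classification rule and summary figures in this abstract can be checked with a short sketch. It treats a distractor as functioning when at least 5% of examinees chose it and its option discrimination is not positive, then recomputes the mean number of functioning distractors from the reported distribution. The function names are hypothetical, and counting a discrimination of exactly zero as functioning is an assumption of this sketch, not something the abstract states.

```python
def is_functioning(choice_share, discrimination, min_share=0.05):
    """True if a distractor counts as functioning under the abstract's rule.

    choice_share: fraction of examinees who selected the distractor.
    discrimination: option discrimination statistic (positive means that
    stronger examinees tend to pick the distractor).
    Assumption: a discrimination of exactly 0 is treated as functioning.
    """
    return choice_share >= min_share and discrimination <= 0


def mean_functioning(distribution):
    """Mean functioning distractors per item from {count: share of items}."""
    return sum(count * share for count, share in distribution.items())


# Distribution reported in the abstract for 0, 1, 2 and 3 functioning distractors.
reported = {0: 0.123, 1: 0.348, 2: 0.391, 3: 0.138}
print(round(mean_functioning(reported), 2))          # 1.54
print(round(805 / 1542, 3), round(158 / 1542, 3))    # 0.522 0.102
```

The recomputed values agree with the abstract: a mean of 1.54 functioning distractors per item, 52.2% of the 1542 distractors functioning, and 10.2% never chosen.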
NAC Aftermarket Brake Components Project (Secondary Items)
2007-02-06
2. Appendix B. Test Plans and Sample Assignments for Disc Brake Pads and (Foundation) Drum Brake Shoes. 3. Appendix C. Test Plans and Sample...Assignments for Disc Brake Rotors and Drum Brake Drums. 4. Appendix D. Off-vehicle Inertia Dynamometer Test Procedures. 5. Appendix E. “Crack & Fatigue...apples” comparison testing processes, and require project outputs documents to be proofed by actual independent testing; HMMWV-ECV (disc) and HEMTT
Calibration of context-specific survey items to assess youth physical activity behaviour.
Saint-Maurice, Pedro F; Welk, Gregory J; Bartee, R Todd; Heelan, Kate
2017-05-01
This study tests calibration models to re-scale context-specific physical activity (PA) items to accelerometer-derived PA. A total of 195 children in grades 4-12 wore an Actigraph monitor and completed the Physical Activity Questionnaire (PAQ) one week later. The relative time spent in moderate-to-vigorous PA (MVPA%) obtained from the Actigraph at recess, PE, lunch, after-school, evening and weekend was matched with the respective item score obtained from the PAQ. Item scores from 145 participants were calibrated against objective MVPA% using multiple linear regression with age and sex as additional predictors. Predicted minutes of MVPA for school, out-of-school and total week were tested in the remaining sample (n = 50) using equivalence testing. The results showed that PAQ β-weights ranged from 0.06 (lunch) to 4.94 (PE) MVPA% (P < 0.05) and model root mean square errors ranged from 4.2% (evening) to 20.2% (recess). When applied to an independent sample, differences between PAQ and accelerometer MVPA at school and out-of-school ranged from -15.6 to +3.8 min and the PAQ was within 10-15% of accelerometer-measured activity. This study demonstrated that context-specific items can be calibrated to predict minutes of MVPA in groups of youth during in- and out-of-school periods.
Computer-adaptive test to measure community reintegration of Veterans.
Resnik, Linda; Tian, Feng; Ni, Pengsheng; Jette, Alan
2012-01-01
The Community Reintegration of Injured Service Members (CRIS) measure consists of three scales measuring extent of, perceived limitations in, and satisfaction with community reintegration. Length of the CRIS may be a barrier to its widespread use. Using item response theory (IRT) and computer-adaptive test (CAT) methodologies, this study developed and evaluated a briefer community reintegration measure called the CRIS-CAT. Large item banks for each CRIS scale were constructed. A convenience sample of 517 Veterans responded to all items. Exploratory and confirmatory factor analyses (CFAs) were used to identify the dimensionality within each domain, and IRT methods were used to calibrate items. Accuracy and precision of CATs of different lengths were compared with the full-item bank, and data were examined for differential item functioning (DIF). CFAs supported unidimensionality of scales. Acceptable item fit statistics were found for final models. Accuracy of 10-, 15-, 20-, and variable-item CATs for all three scales was 0.88 or above. CAT precision increased with number of items administered and decreased at the upper ranges of each scale. Three items exhibited moderate DIF by sex. The CRIS-CAT demonstrated promising measurement properties and is recommended for use in community reintegration assessment.
A large-scale, long-term study of scale drift: The micro view and the macro view
NASA Astrophysics Data System (ADS)
He, W.; Li, S.; Kingsbury, G. G.
2016-11-01
The development of measurement scales for use across years and grades in educational settings provides unique challenges, as instructional approaches, instructional materials, and content standards all change periodically. This study examined the measurement stability of a set of Rasch measurement scales that have been in place for almost 40 years. In order to investigate the stability of these scales, item responses were collected from a large set of students who took operational adaptive tests using items calibrated to the measurement scales. For the four scales that were examined, item samples ranged from 2183 to 7923 items. Each item was administered to at least 500 students in each grade level, resulting in approximately 3000 responses per item. Stability was examined at the micro level analysing change in item parameter estimates that have occurred since the items were first calibrated. It was also examined at the macro level, involving groups of items and overall test scores for students. Results indicated that individual items had changes in their parameter estimates, which require further analysis and possible recalibration. At the same time, the results at the total score level indicate substantial stability in the measurement scales over the span of their use.
Eigenhuis, Annemarie; Kamphuis, Jan H; Noordhof, Arjen
2017-09-01
A growing body of research suggests that the same general dimensions can describe normal and pathological personality, but most of the supporting evidence is exploratory. We aim to determine in a confirmatory framework the extent to which responses on the Multidimensional Personality Questionnaire (MPQ) are identical across general and clinical samples. We tested the Dutch brief form of the MPQ (MPQ-BF-NL) for measurement invariance across a general population subsample (N = 365) and a clinical sample (N = 365), using Multiple Group Confirmatory Factor Analysis (MGCFA) and Multiple Group Exploratory Structural Equation Modeling (MGESEM). As an omnibus personality test, the MPQ-BF-NL revealed strict invariance, indicating absence of bias. Unidimensional per-scale tests for measurement invariance revealed that 10% of items appeared to contain bias across samples. Item bias only affected the scale interpretation of Achievement, with individuals from the clinical sample more readily admitting to putting high demands on themselves than individuals from the general sample, regardless of trait level. This formal test of equivalence provides strong evidence for the common structure of normal and pathological personality and lends further support to the clinical utility of the MPQ. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
De La Rosa, Gabriel M; Webb-Murphy, Jennifer A; Johnston, Scott L
2016-03-01
Resilience helps determine how people respond to stress. The Response to Stressful Events Scale (RSES) is an existing 22-item measure of resilience. We investigate the psychometric properties of the RSES and develop a 4-item measure of resilience using the most discriminating items from the RSES. Among two samples of military personnel presenting to mental health clinics, we see that the abbreviated resilience measure displays comparable internal consistency and test-retest reliability (versus the existing RSES). Among a sample of deployed military personnel, the abbreviated scale relates to validated measures of psychological strain. The 4-item abbreviated RSES measure is a brief, reliable, and valid measure of resilience. Reprint & Copyright © 2016 Association of Military Surgeons of the U.S.
Factor Structure of the Internet Addiction Test in Online Gamers and Poker Players.
Khazaal, Yasser; Achab, Sophia; Billieux, Joel; Thorens, Gabriel; Zullino, Daniele; Dufour, Magali; Rothen, Stéphane
2015-01-01
The Internet Addiction Test (IAT) is the most widely used questionnaire to screen for problematic Internet use. Nevertheless, its factorial structure is still debated, which complicates comparisons among existing studies. Most previous studies were performed with students or community samples despite the probability of there being more problematic Internet use among users of specific applications, such as online gaming or gambling. This study aimed to assess the factorial structure of a modified version of the IAT that addresses specific applications, such as video games and online poker. Two adult samples, one of Internet gamers (n=920) and one of online poker players (n=214), were recruited and completed an online version of the modified IAT. Both samples were split into two subsamples. Two principal component analyses (PCAs) followed by two confirmatory factor analyses (CFAs) were run separately. The results of the principal component analyses indicated that a one-factor model fit the data well across both samples. In consideration of the weakness of some IAT items, a 17-item modified version of the IAT was proposed. This study assessed, for the first time, the factorial structure of a modified version of an Internet-administered IAT on a sample of Internet gamers and a sample of online poker players. The scale seems appropriate for the assessment of such online behaviors. Further studies on the modified 17-item IAT version are needed.
Maïano, Christophe; Bégarie, Jérôme; Morin, Alexandre J S; Garbarino, Jean-Marie; Ninot, Grégory
2010-01-01
The purpose of this study was to test the reliability (i.e. internal consistency and test-retest reliability) and construct validity (i.e. content validity, factor validity, measurement invariance, and latent mean invariance) of the Nutrition and Activity Knowledge Scale (NAKS) in a sample of French adolescents with mild to moderate Intellectual Disability (ID). A total sample of 260 adolescents (144 boys and 116 girls), aged between 12 and 18 years old, with mild to moderate ID was involved in two studies. In the first study, analysis of items' content reveals that many words from the original version were not understood or induced confusion. These items were reworded and simplified while retaining their original meaning. In the second study, results provided support for: (i) the factor validity and reliability of a 15-item French version of the NAKS; (ii) the measurement invariance of the resulting NAKS across genders and ID levels; (iii) the partial measurement invariance of the resulting NAKS across age groups and type of school placement. In addition, the latent means of the 15-item French version of the NAKS proved to be invariant across gender, age categories, and ID levels, but to vary across type of school placement (with adolescents schooled in self-contained classes from regular schools presenting higher levels of NAK than adolescents placed in specialized establishments). The present results thus provide preliminary evidence regarding the construct validity of a 15-item French version of the NAKS in a sample of adolescents with ID.
Psychometric properties of the Arabic version of the 12-item diabetes fatalism scale
Abi Kharma, Joelle
2018-01-01
Background There are widespread fatalistic beliefs in Arab countries, especially among individuals with diabetes. However, there is no tool to assess diabetes fatalism in this population. This study describes the processes used to create an Arabic version of the Diabetes Fatalism Scale (DFS) and examine its psychometric properties. Methods A descriptive correlational design was used with a convenience sample of Lebanese adults (N = 274) with type 2 diabetes recruited from a major hospital in Beirut, Lebanon and by snowball sampling. The 12-item Diabetes Fatalism Scale-Arabic (12-item DFS-Ar) was back-translated from the original version, pilot tested on 22 adults with type 2 diabetes and then administered to 274 patients to assess the validity and reliability of the scale. Confirmatory factor analysis (CFA) was used to test the hypothesized factor structure. Cronbach’s alpha was used to test for reliability. Results CFA supported the three-factor hypothesis of the original DFS scale. The five items measuring “emotional distress” loaded under Factor 1, the four items measuring “spiritual coping” loaded under Factor 2 and the last three items measuring “perceived self-efficacy” of the original scale loaded under Factor 3 (p < 0.001 for all three subscales). Goodness of fit indices confirmed the adequacy of the CFA model (CFI = 0.97, TLI = 0.96, RMSEA = 0.067 and pclose = 0.05). The 12-item DFS-Ar showed good reliability (Cronbach’s alpha of 0.86) and significantly predicted HbA1c (β = 0.20, p < 0.01). After adjusting for the demographic characteristics and the number of diabetes comorbid conditions, the 12-item DFS-Ar score was independently associated with HbA1c in a multivariable model (β = 0.16, p < 0.05). Conclusions The 12-item DFS-Ar demonstrated good psychometric properties that are comparable to the original scale. It is a valid and reliable measure of diabetes fatalism. Further testing with larger and non-Lebanese Arabic populations is needed. PMID:29324827
Item response theory, computerized adaptive testing, and PROMIS: assessment of physical function.
Fries, James F; Witter, James; Rose, Matthias; Cella, David; Khanna, Dinesh; Morgan-DeWitt, Esi
2014-01-01
Patient-reported outcome (PRO) questionnaires record health information directly from research participants because observers may not accurately represent the patient perspective. Patient-reported Outcomes Measurement Information System (PROMIS) is a US National Institutes of Health cooperative group charged with bringing PRO to a new level of precision and standardization across diseases by item development and use of item response theory (IRT). With IRT methods, improved items are calibrated on an underlying concept to form an item bank for a "domain" such as physical function (PF). The most informative items can be combined to construct efficient "instruments" such as 10-item or 20-item PF static forms. Each item is calibrated on the basis of the probability that a given person will respond at a given level, and the ability of the item to discriminate people from one another. Tailored forms may cover any desired level of the domain being measured. Computerized adaptive testing (CAT) selects the best items to sharpen the estimate of a person's functional ability, based on prior responses to earlier questions. PROMIS item banks have been improved with experience from several thousand items, and are calibrated on over 21,000 respondents. In areas tested to date, PROMIS PF instruments are superior or equal to Health Assessment Questionnaire and Medical Outcome Study Short Form-36 Survey legacy instruments in clarity, translatability, patient importance, reliability, and sensitivity to change. Precise measures, such as PROMIS, efficiently incorporate patient self-report of health into research, potentially reducing research cost by lowering sample size requirements. The advent of routine IRT applications has the potential to transform PRO measurement.
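The calibration and adaptive-selection ideas described in this abstract can be sketched in a few lines. A simplified dichotomous two-parameter logistic (2PL) model is assumed here for brevity; it is not claimed to be the model PROMIS uses. Each hypothetical item has a discrimination `a` and a location `b`, and the next item is the unadministered one with the greatest Fisher information at the current ability estimate. All names and parameter values below are illustrative.

```python
import math


def prob_2pl(theta, a, b):
    """Probability of a positive response under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta: a^2 * P * (1 - P)."""
    p = prob_2pl(theta, a, b)
    return a * a * p * (1.0 - p)


def next_item(theta, bank, administered):
    """Pick the unadministered item with maximum information at theta.

    bank: dict mapping item name -> (a, b); administered: set of item names.
    """
    candidates = [name for name in bank if name not in administered]
    return max(candidates, key=lambda name: item_information(theta, *bank[name]))


# Hypothetical three-item bank with equal discrimination and spread locations.
bank = {"easy": (1.0, -2.0), "medium": (1.0, 0.0), "hard": (1.0, 2.0)}
print(next_item(0.0, bank, set()))
print(next_item(1.5, bank, {"medium"}))
```

Because information peaks where the item location matches the person's level, a respondent near the middle of the scale is first given the "medium" item, and a higher provisional estimate shifts selection toward the "hard" item.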
Similarity, not complexity, determines visual working memory performance.
Jackson, Margaret C; Linden, David E J; Roberts, Mark V; Kriegeskorte, Nikolaus; Haenschel, Corinna
2015-11-01
A number of studies have shown that visual working memory (WM) is poorer for complex versus simple items, traditionally accounted for by higher information load placing greater demands on encoding and storage capacity limits. Other research suggests that it may not be complexity that determines WM performance per se, but rather increased perceptual similarity between complex items as a result of a large amount of overlapping information. Increased similarity is thought to lead to greater comparison errors between items encoded into WM and the test item(s) presented at retrieval. However, previous studies have used different object categories to manipulate complexity and similarity, raising questions as to whether these effects are simply due to cross-category differences. Here, for the first time, the relationship between complexity and similarity in WM is investigated using the same stimulus category (abstract polygons). The authors used a delayed discrimination task to measure WM for 1-4 complex versus simple simultaneously presented items and manipulated the similarity between the single test item at retrieval and the sample items at encoding. WM was poorer for complex than simple items only when the test item was similar to 1 of the encoding items, and not when it was dissimilar or identical. The results provide clear support for reinterpretation of the complexity effect in WM as a similarity effect and highlight the importance of the retrieval stage in governing WM performance. The authors discuss how these findings can be reconciled with current models of WM capacity limits. (c) 2015 APA, all rights reserved.
Validity and reliability of the Spanish version of the 10-item CD-RISC in patients with fibromyalgia
2014-01-01
Background: No resilience scale has been validated in Spanish patients with fibromyalgia. The aim of this study was to evaluate the validity and reliability of the 10-item CD-RISC in a sample of Spanish patients with fibromyalgia. Methods: Design: Observational prospective multicenter study. Sample: Patients with diagnoses of fibromyalgia recruited from primary care settings (N = 208). Instruments: In addition to sociodemographic data, the following questionnaires were administered: Pain Visual Analogue Scale (PVAS), the 10-item Connor-Davidson Resilience scale (10-item CD-RISC), the Fibromyalgia Impact Questionnaire (FIQ), the Hospital Anxiety and Depression Scale (HADS), the Pain Catastrophizing Scale (PCS), the Chronic Pain Acceptance Questionnaire (CPAQ), and the Mindful Attention Awareness Scale (MAAS). Results: Regarding construct validity, the factor solution in the Principal Component Analysis (PCA) was considered adequate: the KMO test had a value of 0.91, and Bartlett’s test of sphericity was significant (χ² = 852.8; df = 45; p < 0.001). Only one factor showed an eigenvalue greater than 1, and it explained 50.4% of the variance. PCA and Confirmatory Factor Analysis (CFA) results did not show significant differences between groups. The 10-item CD-RISC scale demonstrated good internal consistency (Cronbach’s alpha = 0.88) and test-retest reliability (r = 0.89 for a six-week interval). The 10-item CD-RISC score was significantly correlated with all of the other psychometric instruments in the expected direction, except for the PVAS (−0.115; p = 0.113). Conclusions: Our study confirms that the Spanish version of the 10-item CD-RISC shows, in patients with fibromyalgia, acceptable psychometric properties, with a high level of reliability and validity. PMID:24484847
Li, Jie; Stroebe, Magaret; Chan, Cecilia L W; Chow, Amy Y M
2017-06-01
The rationale, development, and validation of the Bereavement Guilt Scale (BGS) are described in this article. The BGS was based on a theoretically developed, multidimensional conceptualization of guilt. Part 1 describes the generation of the item pool, derived from in-depth interviews, and review of the scientific literature. Part 2 details statistical analyses for further item selection (Sample 1, N = 273). Part 3 covers the psychometric properties of the emergent-BGS (Sample 2, N = 600, and Sample 3, N = 479). Confirmatory factor analysis indicated that a five-factor model fit the data best. Correlations of BGS scores with depression, anxiety, self-esteem, self-forgiveness, and mode of death were consistent with theoretical predictions, supporting the construct validity of the measure. The internal consistency and test-retest reliability were also supported. Thus, initial testing or examination suggests that the BGS is a valid tool to assess multiple components of bereavement guilt. Further psychometric testing across cultures is recommended.
Development of the Attributed Dignity Scale.
Jacelon, Cynthia S; Dixon, Jane; Knafl, Kathleen A
2009-07-01
A sequential, multi-method approach to instrument development beginning with concept analysis, followed by (a) item generation from qualitative data, (b) review of items by expert and lay person panels, (c) cognitive appraisal interviews, (d) pilot testing, and (e) evaluating construct validity was used to develop a measure of attributed dignity in older adults. The resulting positively scored, 23-item scale has three dimensions: Self-Value, Behavioral Respect-Self, and Behavioral Respect-Others. Item-total correlations in the pilot study ranged from 0.39 to 0.85. Correlations between the Attributed Dignity Scale (ADS) and both Rosenberg's Self-Esteem Scale (0.17) and Crowne and Marlowe's Social Desirability Scale (0.36) were modest and in the expected direction, indicating attributed dignity is a related but independent concept. Next steps include testing the ADS with a larger sample to complete factor analysis, test-retest stability, and further study of the relationships between attributed dignity and other concepts.
Validation of Physics Standardized Test Items
NASA Astrophysics Data System (ADS)
Marshall, Jill
2008-10-01
The Texas Physics Assessment Team (TPAT) examined the Texas Assessment of Knowledge and Skills (TAKS) to determine whether it is a valid indicator of physics preparation for future course work and employment, and of the knowledge and skills needed to act as an informed citizen in a technological society. We categorized science items from the 2003 and 2004 10th and 11th grade TAKS by content area(s) covered, knowledge and skills required to select the correct answer, and overall quality. We also analyzed a 5000 student sample of item-level results from the 2004 11th grade exam using standard statistical methods employed by test developers (factor analysis and Item Response Theory). Triangulation of our results revealed strengths and weaknesses of the different methods of analysis. The TAKS was found to be only weakly indicative of physics preparation and we make recommendations for increasing the validity of standardized physics testing.
Hertzog, Christopher; Smith, R Marit; Ariel, Robert
2018-01-01
Background/Study Context: This study evaluated adult age differences in the original three-item Cognitive Reflection Test (CRT; Frederick, 2005, The Journal of Economic Perspectives, 19, 25-42) and an expanded seven-item version of that test (Toplak et al., 2013, Thinking and Reasoning, 20, 147-168). The CRT is a numerical problem-solving test thought to capture a disposition towards either rapid, intuition-based problem solving (Type I reasoning) or a more thoughtful, analytical problem-solving approach (Type II reasoning). Test items are designed to induce heuristically guided errors that can be avoided if using an appropriate numerical representation of the test problems. We evaluated differences between young adults and old adults in CRT performance and correlates of CRT performance. Older adults (ages 60 to 80) were paid volunteers who participated in experiments assessing age differences in self-regulated learning. Young adults (ages 17 to 35) were students participating for pay as part of a project assessing measures of critical thinking skills or as a young comparison group in the self-regulated learning study. There were age differences in the number of CRT correct responses in two independent samples. Results with the original three-item CRT found older adults to have a greater relative proportion of errors based on providing the intuitive lure. However, younger adults actually had a greater proportion of intuitive errors on the long version of the CRT, relative to older adults. Item analysis indicated a much lower internal consistency of CRT items for older adults. These outcomes do not offer full support for the argument that older adults are higher in the use of a "Type I" cognitive style. The evidence was also consistent with an alternative hypothesis that age differences were due to lower levels of numeracy in the older samples. 
Alternative process-oriented evaluations of how older adults solve CRT items will probably be needed to determine conditions under which older adults manifest an increase in the Type I dispositional tendency to opt for superficial, heuristically guided problem representations in numerical problem-solving tasks.
Hagell, Peter; Westergren, Albert
Sample size is a major factor in statistical null hypothesis testing, which is the basis for many approaches to testing Rasch model fit. Few sample size recommendations for testing fit to the Rasch model concern the Rasch Unidimensional Measurement Models (RUMM) software, which features chi-square and ANOVA/F-ratio based fit statistics, including Bonferroni and algebraic sample size adjustments. This paper explores the occurrence of Type I errors with RUMM fit statistics, and the effects of algebraic sample size adjustments. Simulated data fitting the Rasch model, for 25-item dichotomous scales and sample sizes ranging from N = 50 to N = 2500, were analysed with and without algebraically adjusted sample sizes. Results suggest the occurrence of Type I errors with N less than or equal to 500, and that Bonferroni correction as well as downward algebraic sample size adjustment are useful to avoid such errors, whereas upward adjustment of smaller samples falsely signals misfit. Our observations suggest that sample sizes around N = 250 to N = 500 may provide a good balance for the statistical interpretation of the RUMM fit statistics studied here with respect to Type I errors and under the assumption of Rasch model fit within the examined frame of reference (i.e., about 25 item parameters well targeted to the sample).
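The Bonferroni correction mentioned in the abstract above can be sketched in a few lines of Python. This is a generic illustration, not the RUMM implementation: the function names are hypothetical, and the algebraic sample size adjustment is deliberately left out because the abstract does not give its formula. With 25 item-fit tests and a nominal alpha of 0.05, each test is judged against 0.05 / 25 = 0.002.

```python
def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level after Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests

def flag_misfit(p_values, alpha=0.05):
    """Indices of items whose fit p-value falls below the corrected level.

    p_values holds one fit-statistic p-value per item (hypothetical input).
    """
    threshold = bonferroni_alpha(alpha, len(p_values))
    return [i for i, p in enumerate(p_values) if p < threshold]
```

For three items with p-values 0.04, 0.001 and 0.03, only the second is flagged after correction, whereas all three would be flagged at an uncorrected 0.05.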
Kim, Miyong; Han, Hae-Ra; Phillips, Linda
2003-01-01
Metric equivalence is a quantitative way to assess cross-cultural equivalences of translated instruments by examining the patterns of psychometric properties based on cross-cultural data derived from both versions of the instrument. Metric equivalence checks at item and instrument levels can be used as a valuable tool to refine cross-cultural instruments. Korean and English versions of the Center for Epidemiological Studies-Depression Scale (CES-D) were administered to 154 Korean Americans and 151 Anglo Americans to illustrate approaches to assessing their metric equivalence. Inter-item and item-total correlations, Cronbach's alpha coefficients, and factor analysis were used for metric equivalence checks. The alpha coefficient was 0.85 for the Korean American sample and 0.92 for the Anglo American sample. Although all items of the CES-D surpassed the desirable minimum item-total correlation of 0.30 in the Anglo American sample, four items did not meet the standard in the Korean American sample. Differences in average inter-item correlations were also noted between the two groups (0.25 for Korean Americans and 0.37 for Anglo Americans). Factor analysis identified two factors for both groups, and factor loadings showed similar patterns and congruence coefficients. Results of the item analysis procedures suggest the possibility of bias in certain items that may influence the sensitivity of the Korean version of the CES-D. These item biases also provide a possible explanation for the alpha differences. Although factor loadings showed similar patterns for the Korean and English versions of the CES-D, factorial similarity alone is not sufficient for testing the universality of the structure underlying an instrument.
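Two of the item-level checks named in the abstract above, Cronbach's alpha and item-total correlations, can be sketched as follows. This is a plain Python illustration with hypothetical function names; it uses the corrected item-total correlation (each item against the sum of the remaining items), which is an assumption, since the abstract does not say which variant was computed. Input is a list of respondents, each a list of item scores.

```python
from statistics import mean, variance

def cronbach_alpha(rows):
    """Cronbach's alpha from a respondents-by-items score table."""
    k = len(rows[0])
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def corrected_item_total(rows, j):
    """Correlation of item j with the sum of all other items."""
    item = [r[j] for r in rows]
    rest = [sum(r) - r[j] for r in rows]
    return pearson(item, rest)

def weak_items(rows, cutoff=0.30):
    """Indices of items whose corrected item-total correlation is below cutoff."""
    return [j for j in range(len(rows[0]))
            if corrected_item_total(rows, j) < cutoff]
```

Running `weak_items` separately on each language group's data gives the kind of side-by-side comparison the abstract describes, with 0.30 as the cutoff it cites.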
PSSA Released Reading Items, 2000-2001. The Pennsylvania System of School Assessment.
ERIC Educational Resources Information Center
Pennsylvania State Dept. of Education, Harrisburg. Bureau of Curriculum and Academic Services.
This document contains materials directly related to the actual reading test of the Pennsylvania System of School Assessment (PSSA), including the reading rubric, released passages, selected-response questions with answer keys, performance tasks, and scored samples of students' responses to the tasks. All of these items may be duplicated to…
A Comparison of Lord's Chi Square and Raju's Area Measures in Detection of DIF.
ERIC Educational Resources Information Center
Cohen, Allan S.; Kim, Seock-Ho
1993-01-01
The effectiveness of two statistical tests of the area between item response functions (exact signed area and exact unsigned area) estimated in different samples, a measure of differential item functioning (DIF), was compared with Lord's chi square. Lord's chi square was found the most effective in determining DIF. (SLD)
Leadership: validation of a self-report scale.
Dussault, Marc; Frenette, Eric; Fernet, Claude
2013-04-01
The aim of this paper was to propose and test the factor structure of a new self-report questionnaire on leadership. A sample of 373 school principals in the Province of Quebec, Canada completed the initial 46-item version of the questionnaire. In order to obtain a questionnaire of minimal length, a four-step procedure was adopted. First, item analysis was performed using Classical Test Theory. Second, Rasch analysis was used to identify non-fitting or overlapping items. Third, a confirmatory factor analysis (CFA) using structural equation modelling was performed on the 21 remaining items to verify the factor structure of the scale. Results show that the model with a single third-order dimension (leadership), two second-order dimensions (transactional and transformational leadership), and one first-order dimension (laissez-faire leadership) provides a good fit to the data. Finally, invariance of factor structure was assessed with a second sample of 222 vice-principals in the Province of Quebec, Canada. This model is in agreement with the theoretical model developed by Bass (1985), upon which the questionnaire is based.
An Investigation of Sample Size Splitting on ATFIND and DIMTEST
ERIC Educational Resources Information Center
Socha, Alan; DeMars, Christine E.
2013-01-01
Modeling multidimensional test data with a unidimensional model can result in serious statistical errors, such as bias in item parameter estimates. Many methods exist for assessing the dimensionality of a test. The current study focused on DIMTEST. Using simulated data, the effects of sample size splitting for use with the ATFIND procedure for…
Analogical reasoning in amazons.
Obozova, Tanya; Smirnova, Anna; Zorina, Zoya; Wasserman, Edward
2015-11-01
Two juvenile orange-winged amazons (Amazona amazonica) were initially trained to match visual stimuli by color, shape, and number of items, but not by size. After learning these three identity matching-to-sample tasks, the parrots transferred discriminative responding to new stimuli from the same categories that had been used in training (other colors, shapes, and numbers of items) as well as to stimuli from a different category (stimuli varying in size). In the critical testing phase, both parrots exhibited reliable relational matching-to-sample (RMTS) behavior, suggesting that they perceived and compared the relationship between objects in the sample stimulus pair to the relationship between objects in the comparison stimulus pairs, even though no physical matches were possible between items in the sample and comparison pairs. The parrots spontaneously exhibited this higher-order relational responding without having ever before been trained on RMTS tasks, therefore joining apes and crows in displaying this abstract cognitive behavior.
Brodey, Benjamin; Purcell, Susan E; Rhea, Karen; Maier, Philip; First, Michael; Zweede, Lisa; Sinisterra, Manuela; Nunn, M Brad; Austin, Marie-Paule; Brodey, Inger S
2018-03-23
The Structured Clinical Interview for DSM (SCID) is considered the gold standard assessment for accurate, reliable psychiatric diagnoses; however, because of its length, complexity, and training required, the SCID is rarely used outside of research. This paper aims to describe the development and initial validation of a Web-based, self-report screening instrument (the Screening Assessment for Guiding Evaluation-Self-Report, SAGE-SR) based on the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5) and the SCID-5-Clinician Version (CV) intended to make accurate, broad-based behavioral health diagnostic screening more accessible within clinical care. First, study staff drafted approximately 1200 self-report items representing individual granular symptoms in the diagnostic criteria for the 8 primary SCID-CV modules. An expert panel iteratively reviewed, critiqued, and revised items. The resulting items were iteratively administered and revised through 3 rounds of cognitive interviewing with community mental health center participants. In the first 2 rounds, the SCID was also administered to participants to directly compare their Likert self-report and SCID responses. A second expert panel evaluated the final pool of items from cognitive interviewing and criteria in the DSM-5 to construct the SAGE-SR, a computerized adaptive instrument that uses branching logic from a screener section to administer appropriate follow-up questions to refine the differential diagnoses. The SAGE-SR was administered to healthy controls and outpatient mental health clinic clients to assess test duration and test-retest reliability. Cutoff scores for screening into follow-up diagnostic sections and criteria for inclusion of diagnoses in the differential diagnosis were evaluated. 
The expert panel reduced the initial 1200 test items to 664 items that panel members agreed collectively represented the SCID items from the 8 targeted modules and DSM criteria for the covered diagnoses. These 664 items were iteratively submitted to 3 rounds of cognitive interviewing with 50 community mental health center participants; the expert panel reviewed session summaries and agreed on a final set of 661 clear and concise self-report items representing the desired criteria in the DSM-5. The SAGE-SR constructed from this item pool took an average of 14 min to complete in a nonclinical sample versus 24 min in a clinical sample. Responses to individual items can be combined to generate DSM criteria endorsements and differential diagnoses, as well as provide indices of individual symptom severity. Preliminary measures of test-retest reliability in a small, nonclinical sample were promising, with good to excellent reliability for screener items in 11 of 13 diagnostic screening modules (intraclass correlation coefficient [ICC] or kappa coefficients ranging from .60 to .90), with mania achieving fair test-retest reliability (ICC=.50) and other substance use endorsed too infrequently for analysis. The SAGE-SR is a computerized adaptive self-report instrument designed to provide rigorous differential diagnostic information to clinicians. ©Benjamin Brodey, Susan E Purcell, Karen Rhea, Philip Maier, Michael First, Lisa Zweede, Manuela Sinisterra, M Brad Nunn, Marie-Paule Austin, Inger S Brodey. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 23.03.2018.
Development and Validity Testing of an Arthritis Self-Management Assessment Tool.
Oh, HyunSoo; Han, SunYoung; Kim, SooHyun; Seo, WhaSook
Because of the chronic, progressive nature of arthritis and the substantial effects it has on quality of life, patients may benefit from self-management. However, no valid, reliable self-management assessment tool has been devised for patients with arthritis. This study was conducted to develop a comprehensive self-management assessment tool for patients with arthritis, that is, the Arthritis Self-Management Assessment Tool (ASMAT). To develop a list of qualified items corresponding to the conceptual definitions and attributes of arthritis self-management, a measurement model was established on the basis of theoretical and empirical foundations. Content validity testing was conducted to evaluate whether listed items were suitable for assessing arthritis self-management. Construct validity and reliability of the ASMAT were tested. Construct validity was examined using confirmatory factor analysis and nomological validity. The 32-item ASMAT was developed with a sample composed of patients in a clinic in South Korea. Content validity testing validated the 32 items, which comprised medical (10 items), behavioral (13 items), and psychoemotional (9 items) management subscales. Construct validity testing of the ASMAT showed that the 32 items properly corresponded with conceptual constructs of arthritis self-management, and were suitable for assessing self-management ability in patients with arthritis. Reliability was also well supported. The ASMAT devised in the present study may aid the evaluation of patient self-management ability and the effectiveness of self-management interventions. The authors believe the developed tool may also aid the identification of problems associated with the adoption of self-management practice, and thus improve symptom management, independence, and quality of life of patients with arthritis.
Developing an item bank and short forms that assess the impact of asthma on quality of life.
Stucky, Brian D; Edelen, Maria Orlando; Sherbourne, Cathy D; Eberhart, Nicole K; Lara, Marielena
2014-02-01
The present work describes the process of developing an item bank and short forms that measure the impact of asthma on quality of life (QoL) that avoids confounding QoL with asthma symptomatology and functional impairment. Using a diverse national sample of adults with asthma (N = 2032) we conducted exploratory and confirmatory factor analyses, and item response theory and differential item functioning analyses to develop a 65-item unidimensional item bank and separate short form assessments. A psychometric evaluation of the RAND Impact of Asthma on QoL item bank (RAND-IAQL) suggests that though the concept of asthma impact on QoL is multi-faceted, it may be measured as a single underlying construct. The performance of the bank was then evaluated with a real-data simulated computer adaptive test. From the RAND-IAQL item bank we then developed two short forms consisting of 4 and 12 items (reliability = 0.86 and 0.93, respectively). A real-data simulated computer adaptive test suggests that as few as 4-5 items from the bank are needed to obtain highly precise scores. Preliminary validity results indicate that the RAND-IAQL measures distinguish between levels of asthma control. To measure the impact of asthma on QoL, users of these items may choose from two highly reliable short forms, computer adaptive test administration, or content-specific subsets of items from the bank tailored to their specific needs. Copyright © 2013 Elsevier Ltd. All rights reserved.
Development of the PROMIS coping expectancies of smoking item banks.
Shadel, William G; Edelen, Maria Orlando; Tucker, Joan S; Stucky, Brian D; Hansen, Mark; Cai, Li
2014-09-01
Smoking is a coping strategy for many smokers who then have difficulty finding new ways to cope with negative affect when they quit. This paper describes analyses conducted to develop and evaluate item banks for assessing the coping expectancies of smoking for daily and nondaily smokers. Using data from a large sample of daily (N = 4,201) and nondaily (N = 1,183) smokers, we conducted a series of item factor analyses, item response theory analyses, and differential item functioning (DIF) analyses (according to gender, age, and ethnicity) to arrive at a unidimensional set of items for daily and nondaily smokers. We also evaluated performance of short forms (SFs) and computer adaptive tests (CATs) for assessing coping expectancies of smoking. For both daily and nondaily smokers, the unidimensional Coping Expectancies item banks (21 items) are relatively DIF free and are highly reliable (0.96 and 0.97, respectively). A common 4-item SF for daily and nondaily smokers also showed good reliability (0.85). Adaptive tests required an average of 4.3 and 3.7 items for simulated daily and nondaily respondents, respectively, and achieved reliabilities of 0.91 for both when the maximum test length was 10 items. This research provides a new set of items that can be used to reliably assess coping expectancies of smoking, through a SF, CAT, or a tailored set selected for a specific research purpose. © The Author 2014. Published by Oxford University Press on behalf of the Society for Research on Nicotine and Tobacco. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Measurement invariance across Genders on the Childhood Illness Attitude Scales (CIAS).
Thorisdottir, Audur S; Villadsen, Anna; LeBouthillier, Daniel M; Rask, Charlotte Ulrikka; Wright, Kristi D; Walker, John R; Feldgaier, Steven; Asmundson, Gordon J G
2017-07-01
The Childhood Illness Attitude Scales (CIAS) were created as a developmentally appropriate measure for symptoms of health anxiety (HA) in school-aged children. Despite overall sound psychometric properties reported in previous studies, more comprehensive examination of the latent structure and potential response bias in the CIAS is needed. The purpose of the present study was to cross-validate the latent structure of the CIAS across genders and to examine gender-specific variations in CIAS scores. The sample comprised data from 602 Canadian and Danish school-aged children (mean age = 10.54, SD = 0.99; 52.5% girls). Confirmatory factor analyses were conducted to test 3-, modified 3-, and 4-factor models in both samples. Multigroup confirmatory factor analysis was performed to test factor structure invariance across boys and girls in a combined sample. Differential Item Functioning (DIF) was assessed using test characteristic curves. A modified 3-factor solution (i.e., fears = 11 items, help-seeking = 6 items, and symptom effects = 4 items) provided the best fit to the data (χ²(364, N = 602) = 681.7, p < 0.001; χ²/df = 1.803; RMSEA = 0.037; CFI = 0.926). The factor structure was stable, well-fitting, and indicated measurement invariance across groups. DIF analyses revealed no gender-based response bias at the scale level. Results support a revised 3-factor version of the CIAS that can be used with confidence to assess symptoms of HA in school-aged boys and girls. Copyright © 2017 Elsevier Inc. All rights reserved.
Maples, Jessica L; Carter, Nathan T; Few, Lauren R; Crego, Cristina; Gore, Whitney L; Samuel, Douglas B; Williamson, Rachel L; Lynam, Donald R; Widiger, Thomas A; Markon, Kristian E; Krueger, Robert F; Miller, Joshua D
2015-12-01
The fifth edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM-5) includes an alternative model of personality disorders (PDs) in Section III, consisting in part of a pathological personality trait model. To date, the 220-item Personality Inventory for DSM-5 (PID-5; Krueger, Derringer, Markon, Watson, & Skodol, 2012) is the only extant self-report instrument explicitly developed to measure this pathological trait model. The present study used item response theory-based analyses in a large sample (n = 1,417) to investigate whether a reduced set of 100 items could be identified from the PID-5 that could measure the 25 traits and 5 domains. This reduced set of PID-5 items was then tested in a community sample of adults currently receiving psychological treatment (n = 109). Across a wide range of criterion variables including NEO PI-R domains and facets, DSM-5 Section II PD scores, and externalizing and internalizing outcomes, the correlational profiles of the original and reduced versions of the PID-5 were nearly identical (rICC = .995). These results provide strong support for the hypothesis that an abbreviated set of PID-5 items can be used to reliably, validly, and efficiently assess these personality disorder traits. The ability to assess the DSM-5 Section III traits using only 100 items has important implications in that it suggests these traits could still be measured in settings in which assessment-related resources (e.g., time, compensation) are limited. (c) 2015 APA, all rights reserved.
2017-01-01
Background: The Center for Epidemiologic Studies Depression Scale (CES-D) is a measure of depressive symptomatology which is widely used internationally. Though previous attempts were made to shorten the CES-D scale, few have attempted to develop a Computerized Adaptive Test (CAT) version for the CES-D. Objective: The aim of this study was to provide evidence on the efficiency and accuracy of the CES-D when administered as a CAT in an American sample. Methods: We obtained a sample of 2060 responses to the CES-D from US participants using the myPersonality application. The average age of participants was 26 years (range 19-77). We randomly split the sample into two groups to evaluate and validate the psychometric models. We used evaluation group data (n=1018) to assess dimensionality with both confirmatory factor and Mokken analysis. We conducted further psychometric assessments using item response theory (IRT), including assessments of item and scale fit to Samejima’s graded response model (GRM), local dependency and differential item functioning. We subsequently conducted two CAT simulations to evaluate the CES-D CAT using the validation group (n=1042). Results: Initial CFA results indicated a poor fit to the model and Mokken analysis revealed 3 items which did not conform to the same dimension as the rest of the items. We removed the 3 items and fit the remaining 17 items to the GRM. We found no evidence of differential item functioning (DIF) between age and gender groups. Estimates of the level of CES-D trait score provided by the simulated CAT algorithm and the original CES-D trait score derived from the original scale were highly correlated. The second CAT simulation conducted using real participant data demonstrated higher precision at the higher levels of the depression spectrum. Conclusions: Depression assessments using the CES-D CAT can be more accurate and efficient than those made using the fixed-length assessment. PMID:28931496
2015-01-01
Purpose: The situational judgment test (SJT) shows promise for assessing the non-cognitive skills of medical school applicants, but has only been used in Europe. Since the admissions processes and education levels of applicants to medical school are different in the United States and in Europe, it is necessary to obtain validity evidence of the SJT based on a sample of United States applicants. Methods: Ninety SJT items were developed and Kane’s validity framework was used to create a test blueprint. A total of 489 applicants selected for assessment/interview day at the University of Utah School of Medicine during the 2014-2015 admissions cycle completed one of five SJTs, which assessed professionalism, coping with pressure, communication, patient focus, and teamwork. Item difficulty, each item’s discrimination index, internal consistency, and the categorization of items by two experts were used to create the test blueprint. Results: The majority of item scores were within an acceptable range of difficulty, as measured by the difficulty index (0.50-0.85) and had fair to good discrimination. However, internal consistency was low for each domain, and 63% of items appeared to assess multiple domains. The concordance of categorization between the two educational experts ranged from 24% to 76% across the five domains. Conclusion: The results of this study will help medical school admissions departments determine how to begin constructing an SJT. Further testing with a more representative sample is needed to determine if the SJT is a useful assessment tool for measuring the non-cognitive skills of medical school applicants. PMID:26582629
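The two classical item statistics used in the abstract above can be sketched in Python as follows. The sketch assumes items scored 0 or 1, which may not match how the SJT items were actually scored, and it uses the upper-lower group method for discrimination with a 27% split; both choices and all function names are illustrative assumptions. The difficulty index is the proportion of examinees answering correctly, and the discrimination index is the difference in that proportion between high and low scorers on the whole test.

```python
def difficulty_index(rows, j):
    """Proportion of examinees who answered item j correctly (scores are 0/1)."""
    return sum(r[j] for r in rows) / len(rows)

def discrimination_index(rows, j, fraction=0.27):
    """Upper-lower discrimination: p(correct) in top group minus bottom group.

    Groups are the top and bottom `fraction` of examinees by total score.
    """
    ranked = sorted(rows, key=sum)                  # ascending total score
    n_group = max(1, round(fraction * len(rows)))   # at least one per group
    lower, upper = ranked[:n_group], ranked[-n_group:]
    p_upper = sum(r[j] for r in upper) / n_group
    p_lower = sum(r[j] for r in lower) / n_group
    return p_upper - p_lower

def in_acceptable_range(rows, j, low=0.50, high=0.85):
    """True if item j's difficulty lies in the 0.50-0.85 band cited above."""
    return low <= difficulty_index(rows, j) <= high
```

Each row is one examinee's list of item scores, so `difficulty_index(rows, 0)` reports on the first item.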
Criterion-Referenced Testing: A Critical Analysis of Selected Models
1978-08-01
...the probability that a master will be misclassified when the cutoff score is set at 2 correct equals...used the 45-item spiral-omnibus intelligence test for screening applicants to the Australian Army or Royal Australian Navy. Samples of 608 recruit...applicants to the Citizen Military Force (CMF) and 874 recruit applicants to the Royal Australian Navy were studied. Twelve items were deleted for zero
Park, Myung Sook; Kang, Kyung Ja; Jang, Sun Joo; Lee, Joo Yun; Chang, Sun Ju
2018-03-01
This study aimed to evaluate the components of test-retest reliability including time interval, sample size, and statistical methods used in patient-reported outcome measures in older people and to provide suggestions on the methodology for calculating test-retest reliability for patient-reported outcomes in older people. This was a systematic literature review. MEDLINE, Embase, CINAHL, and PsycINFO were searched from January 1, 2000 to August 10, 2017 by an information specialist. This systematic review was guided by both the Preferred Reporting Items for Systematic Reviews and Meta-Analyses checklist and the guideline for systematic review published by the National Evidence-based Healthcare Collaborating Agency in Korea. The methodological quality was assessed by the Consensus-based Standards for the selection of health Measurement Instruments checklist box B. Ninety-five out of 12,641 studies were selected for the analysis. The median time interval for test-retest reliability was 14 days, and the ratio of sample size for test-retest reliability to the number of items in each measure ranged from 1:1 to 1:4. The most frequently used statistical method for continuous scores was the intraclass correlation coefficient (ICC). Among the 63 studies that used ICCs, 21 studies presented models for ICC calculations and 30 studies reported 95% confidence intervals of the ICCs. Additional analyses using 17 studies that reported a strong ICC (>0.09) showed that the mean time interval was 12.88 days and the mean ratio of the number of items to sample size was 1:5.37. When researchers plan to assess the test-retest reliability of patient-reported outcome measures for older people, they need to consider an adequate time interval of approximately 13 days and the sample size of about 5 times the number of items. 
Particularly, statistical methods should not only be selected based on the types of scores of the patient-reported outcome measures, but should also be described clearly in the studies that report the results of test-retest reliability. Copyright © 2017 Elsevier Ltd. All rights reserved.
Cappelleri, Joseph C; Jason Lundy, J; Hays, Ron D
2014-05-01
The US Food and Drug Administration's guidance for industry document on patient-reported outcomes (PRO) defines content validity as "the extent to which the instrument measures the concept of interest" (FDA, 2009, p. 12). According to Strauss and Smith (2009), construct validity "is now generally viewed as a unifying form of validity for psychological measurements, subsuming both content and criterion validity" (p. 7). Hence, both qualitative and quantitative information are essential in evaluating the validity of measures. We review classical test theory and item response theory (IRT) approaches to evaluating PRO measures, including frequency of responses to each category of the items in a multi-item scale, the distribution of scale scores, floor and ceiling effects, the relationship between item response options and the total score, and the extent to which hypothesized "difficulty" (severity) order of items is represented by observed responses. If a researcher has few qualitative data and wants to get preliminary information about the content validity of the instrument, then descriptive assessments using classical test theory should be the first step. As the sample size grows during subsequent stages of instrument development, confidence in the numerical estimates from Rasch and other IRT models (as well as those of classical test theory) would also grow. Classical test theory and IRT can be useful in providing a quantitative assessment of items and scales during the content-validity phase of PRO-measure development. Depending on the particular type of measure and the specific circumstances, the classical test theory and/or the IRT should be considered to help maximize the content validity of PRO measures. Copyright © 2014 Elsevier HS Journals, Inc. All rights reserved.
ERIC Educational Resources Information Center
Casas, Ferran; Baltatescu, Sergiu; Bertran, Irma; Gonzalez, Monica; Hatos, Adrian
2013-01-01
This paper presents results from two samples of adolescents aged 13-16 from Romania and Spain (N = 930 + 1,945 = 2,875). The original 7-item version of the Personal Well-Being Index (PWI) was used, together with an item on overall life satisfaction (OLS) and a set of six items related to satisfaction with school. A confirmatory factor analysis of…
An Investigation of the Sampling Distributions of Equating Coefficients.
ERIC Educational Resources Information Center
Baker, Frank B.
1996-01-01
Using the characteristic curve method for dichotomously scored test items, the sampling distributions of equating coefficients were examined. Simulations indicate that for the equating conditions studied, the sampling distributions of the equating coefficients appear to have acceptable characteristics, suggesting confidence in the values obtained…
A signal detection-item response theory model for evaluating neuropsychological measures.
Thomas, Michael L; Brown, Gregory G; Gur, Ruben C; Moore, Tyler M; Patt, Virginie M; Risbrough, Victoria B; Baker, Dewleen G
2018-02-05
Models from signal detection theory are commonly used to score neuropsychological test data, especially tests of recognition memory. Here we show that certain item response theory models can be formulated as signal detection theory models, thus linking two complementary but distinct methodologies. We then use the approach to evaluate the validity (construct representation) of commonly used research measures, demonstrate the impact of conditional error on neuropsychological outcomes, and evaluate measurement bias. Signal detection-item response theory (SD-IRT) models were fitted to recognition memory data for words, faces, and objects. The sample consisted of U.S. Infantry Marines and Navy Corpsmen participating in the Marine Resiliency Study. Data comprised item responses to the Penn Face Memory Test (PFMT; N = 1,338), Penn Word Memory Test (PWMT; N = 1,331), and Visual Object Learning Test (VOLT; N = 1,249), and self-report of past head injury with loss of consciousness. SD-IRT models adequately fitted recognition memory item data across all modalities. Error varied systematically with ability estimates, and distributions of residuals from the regression of memory discrimination onto self-report of past head injury were positively skewed towards regions of larger measurement error. Analyses of differential item functioning revealed little evidence of systematic bias by level of education. SD-IRT models benefit from the measurement rigor of item response theory, which permits the modeling of item difficulty and examinee ability, and from signal detection theory, which provides an interpretive framework encompassing the experimentally validated constructs of memory discrimination and response bias. We used this approach to validate the construct representation of commonly used research measures and to demonstrate how nonoptimized item parameters can lead to erroneous conclusions when interpreting neuropsychological test data. 
Future work might include the development of computerized adaptive tests and integration with mixture and random-effects models.
Answering the call: a tool that measures functional breast cancer literacy.
Williams, Karen Patricia; Templin, Thomas N; Hines, Resche D
2013-01-01
There is a need for health care providers and health care educators to ensure that the messages they communicate are understood. The purpose of this research was to test the reliability and validity, in a culturally diverse sample of women, of a revised Breast Cancer Literacy Assessment Tool (Breast-CLAT) designed to measure functional understanding of breast cancer in English, Spanish, and Arabic. Community health workers verbally administered the 35-item Breast-CLAT to 543 Black, Latina, and Arab American women. A confirmatory factor analysis using a 2-parameter item response theory model was used to test the proposed 3-factor Breast-CLAT (awareness, screening and knowledge, and prevention and control). The confirmatory factor analysis using a 2-parameter item response theory model had a good fit (TLI = .91, RMSEA = .04) to the proposed 3-factor structure. The total scale reliability ranged from .80 for Black participants to .73 for total culturally diverse sample. The three subscales were differentially predictive of family history of cancer. The revised Breast-CLAT scales demonstrated internal consistency reliability and validity in this multiethnic, community-based sample.
Some Improved Diagnostics for Failure of The Rasch Model.
ERIC Educational Resources Information Center
Molenaar, Ivo W.
1983-01-01
Goodness of fit tests for the Rasch model are typically large-sample, global measures. This paper offers suggestions for small-sample exploratory techniques for examining the fit of item data to the Rasch model. (Author/JKS)
Rose, M; Bjorner, J B; Becker, J; Fries, J F; Ware, J E
2008-01-01
The Patient-Reported Outcomes Measurement Information System (PROMIS) was initiated to improve precision, reduce respondent burden, and enhance the comparability of health outcomes measures. We used item response theory (IRT) to construct and evaluate a preliminary item bank for physical function assuming four subdomains. Data from seven samples (N=17,726) using 136 items from nine questionnaires were evaluated. A generalized partial credit model was used to estimate item parameters, which were normed to a mean of 50 (SD=10) in the US population. Item bank properties were evaluated through Computerized Adaptive Test (CAT) simulations. IRT requirements were fulfilled by 70 items covering activities of daily living, lower extremity, and central body functions. The original item context partly affected parameter stability. Items on upper body function, and need for aid or devices did not fit the IRT model. In simulations, a 10-item CAT eliminated floor and decreased ceiling effects, achieving a small standard error (< 2.2) across scores from 20 to 50 (reliability >0.95 for a representative US sample). This precision was not achieved over a similar range by any comparable fixed length item sets. The methods of the PROMIS project are likely to substantially improve measures of physical function and to increase the efficiency of their administration using CAT.
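The pairing of a standard error below 2.2 with a reliability above 0.95 in the abstract above can be checked with a short worked equation. This is only a sketch: it assumes the common approximation that reliability equals one minus the ratio of error variance to score variance, and it takes the score standard deviation to be the normed value of 10 stated in the abstract. The symbols are introduced here for illustration and are not taken from the paper.

```latex
\[
  \text{reliability} \;\approx\; 1 - \frac{SE^{2}}{SD^{2}}
  \;=\; 1 - \frac{2.2^{2}}{10^{2}}
  \;=\; 1 - \frac{4.84}{100}
  \;=\; 0.9516 \;>\; 0.95
\]
```

Under that assumption, any standard error smaller than 2.2 on the T-score metric gives a reliability larger than about 0.95, which agrees with the reported figures.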
Development and Testing of the Church Environment Audit Tool.
Kaczynski, Andrew T; Jake-Schoffman, Danielle E; Peters, Nathan A; Dunn, Caroline G; Wilcox, Sara; Forthofer, Melinda
2018-05-01
In this paper, we describe development and reliability testing of a novel tool to evaluate the physical environment of faith-based settings pertaining to opportunities for physical activity (PA) and healthy eating (HE). Tool development was a multistage process including a review of similar tools, stakeholder review, expert feedback, and pilot testing. Final tool sections included indoor opportunities for PA, outdoor opportunities for PA, food preparation equipment, kitchen type, food for purchase, beverages for purchase, and media. Two independent audits were completed at 54 churches. Interrater reliability (IRR) was determined with Kappa and percent agreement. Of 218 items, 102 were assessed for IRR and 116 could not be assessed because they were not present at enough churches. Percent agreement for all 102 items was over 80%. For 42 items, the sample was too homogeneous to assess Kappa. Forty-six of the remaining items had Kappas greater than 0.60 (25 items 0.80-1.00; 21 items 0.60-0.79), indicating substantial to almost perfect agreement. The tool proved reliable and efficient for assessing church environments and identifying potential intervention points. Future work can focus on applications within faith-based partnerships to understand how church environments influence diverse health outcomes.
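The two agreement statistics named in the abstract above, percent agreement and Kappa, can be illustrated with a small sketch. It assumes Cohen's kappa for two raters (the abstract does not say which Kappa variant was used), and the function names and ratings are hypothetical. It also shows why Kappa cannot be assessed when an item is rated identically at every site: expected chance agreement is then 1 and the statistic is undefined.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Proportion of cases on which the two raters gave the same code."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters; None when chance agreement is 1."""
    n = len(rater_a)
    p_observed = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(counts_a) | set(counts_b)
    # Chance agreement from each rater's marginal category proportions.
    p_expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if p_expected == 1.0:
        return None  # fully homogeneous ratings: kappa is undefined
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical audit item: 1 = feature present, 0 = absent, at ten sites.
auditor_1 = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
auditor_2 = [1, 1, 0, 1, 1, 1, 1, 0, 1, 0]
print(percent_agreement(auditor_1, auditor_2))
print(cohens_kappa(auditor_1, auditor_2))

# A homogeneous item: both auditors code "present" everywhere.
print(cohens_kappa([1] * 10, [1] * 10))
```

High percent agreement with an undefined or low Kappa is therefore expected for features that are present (or absent) almost everywhere, which is one reason to report both statistics.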
Commutability of food microbiology proficiency testing samples.
Abdelmassih, M; Polet, M; Goffaux, M-J; Planchon, V; Dierick, K; Mahillon, J
2014-03-01
Food microbiology proficiency testing (PT) is a useful tool to assess the analytical performances among laboratories. PT items should be close to routine samples to accurately evaluate the acceptability of the methods. However, most PT providers distribute exclusively artificial samples such as reference materials or irradiated foods. This raises the issue of the suitability of these samples because the equivalence, or 'commutability', between results obtained on artificial vs. authentic food samples has not been demonstrated. In the clinical field, the use of noncommutable PT samples has led to erroneous evaluation of the performances when different analytical methods were used. This study aimed to provide a first assessment of the commutability of samples distributed in food microbiology PT. REQUASUD and IPH organized 13 food microbiology PTs including 10-28 participants. Three types of PT items were used: genuine food samples, sterile food samples and reference materials. The commutability of the artificial samples (reference material or sterile samples) was assessed by plotting the distribution of the results on natural and artificial PT samples. This comparison highlighted matrix-correlated issues when nonfood matrices, such as reference materials, were used. Artificially inoculated food samples, on the other hand, raised only isolated commutability issues. In the organization of a PT-scheme, authentic or artificially inoculated food samples are necessary to accurately evaluate the analytical performances. Reference materials, used as PT items because of their convenience, may present commutability issues leading to inaccurate penalizing conclusions for methods that would have provided accurate results on food samples. For the first time, the commutability of food microbiology PT samples was investigated. The nature of the samples provided by the organizer turned out to be an important factor because matrix effects can impact on the analytical results. 
© 2013 The Society for Applied Microbiology.
Boston, Raymond C.; Coyne, James C.; Farrar, John T.
2010-01-01
Objective: To develop and psychometrically test an owner self-administered questionnaire designed to assess severity and impact of chronic pain in dogs with osteoarthritis. Sample Population: 70 owners of dogs with osteoarthritis and 50 owners of clinically normal dogs. Procedures: Standard methods for the stepwise development and testing of instruments designed to assess subjective states were used. Items were generated through focus groups and an expert panel. Items were tested for readability and ambiguity, and poorly performing items were removed. The reduced set of items was subjected to factor analysis, reliability testing, and validity testing. Results: Severity of pain and interference with function were 2 factors identified and named on the basis of the items contained in them. Cronbach’s α was 0.93 and 0.89, respectively, suggesting that the items in each factor could be assessed as a group to compute factor scores (ie, severity score and interference score). The test-retest analysis revealed κ values of 0.75 for the severity score and 0.81 for the interference score. Scores correlated moderately well (r = 0.51 and 0.50, respectively) with the overall quality-of-life (QOL) question, such that as severity and interference scores increased, QOL decreased. Clinically normal dogs had significantly lower severity and interference scores than dogs with osteoarthritis. Conclusions and Clinical Relevance: A psychometrically sound instrument was developed. Responsiveness testing must be conducted to determine whether the questionnaire will be useful in reliably obtaining quantifiable assessments from owners regarding the severity and impact of chronic pain and its treatment on dogs with osteoarthritis. PMID:17542696
The role of difficulty and gender in numbers, algebra, geometry and mathematics achievement
NASA Astrophysics Data System (ADS)
Rabab'h, Belal Sadiq Hamed; Veloo, Arsaythamby; Perumal, Selvan
2015-05-01
This study aims to identify the role of difficulty and gender in numbers, algebra, geometry and mathematics achievement among secondary school students in Jordan. The respondents were 337 students from eight public secondary schools in Alkoura district, selected by stratified random sampling. The sample comprised 179 (53%) male and 158 (47%) female students. The mathematics test comprises 30 items: eight items for numbers, 14 items for algebra and eight items for geometry. Based on difficulties among male and female students, the findings showed that item 4 (fractions - 0.34) was most difficult for male students and item 6 (square roots - 0.39) for female students in numbers. In algebra, item 11 (inequality - 0.23) was most difficult for male students and item 6 (algebraic expressions - 0.35) for female students. In geometry, item 3 (reflection - 0.34) was most difficult for male students and item 8 (volume - 0.33) for female students. Based on gender differences, female students showed higher achievement in numbers and algebra compared to male students. On the other hand, there was no difference between male and female students' achievement in the geometry test. This study suggests that teachers need to give more attention to numbers and algebra when teaching mathematics.
2014-01-01
Background: Latino preschoolers (3-5 year old children) have among the highest rates of obesity. Low levels of physical activity (PA) are a risk factor for obesity. Characterizing what Latino parents do to encourage or discourage their preschooler to be physically active can help inform interventions to increase their PA. The objective was therefore to develop and assess the psychometrics of a new instrument: the Preschooler Physical Activity Parenting Practices (PPAPP) among a Latino sample, to assess parenting practices used to encourage or discourage PA among preschool-aged children. Methods: Cross-sectional study of 240 Latino parents who reported the frequency of using PA parenting practices. 95% of respondents were mothers; 42% had more than a high school education. Child mean age was 4.5 (±0.9) years (52% male). Test-retest reliability was assessed in 20%, 2 weeks later. We assessed the fit of a priori models using confirmatory factor analyses (CFA). In a separate sub-sample (35%), preschool-aged children wore accelerometers to assess associations with their PA and PPAPP subscales. Results: The a priori models showed poor fit to the data. A modified factor structure for encouraging PPAPP had one multiple-item scale: engagement (15 items), and two single-items (have outdoor toys; not enroll in sport-reverse coded). The final factor structure for discouraging PPAPP had 4 subscales: promote inactive transport (3 items), promote screen time (3 items), psychological control (4 items) and restricting for safety (4 items). Test-retest reliability (ICC) for the two scales ranged from 0.56-0.85. Cronbach’s alphas ranged from 0.5-0.9. Several sub-factors correlated in the expected direction with children’s objectively measured PA. Conclusion: The final models for encouraging and discouraging PPAPP had moderate to good fit, with moderate to excellent test-retest reliabilities. 
The PPAPP should be further evaluated to better assess its associations with children’s PA and offers a new tool for measuring PPAPP among Latino families with preschool-aged children. PMID:24428935
Berman, Rebecca L; Iris, Madelyn; Conrad, Kendon J; Robinson, Carrie
2018-01-01
Older adults taking multiple prescription and nonprescription drugs are at risk for medication use problems, yet there are few brief, self-administered screening tools designed specifically for them. The study objective was to develop and validate a patient-centered screener for community-dwelling older adults. In phase 1, a convenience sample of 57 stakeholders (older adults, pharmacists, nurses, and physicians) participated in concept mapping, using Concept System® Global MAX™, to identify items for a questionnaire. In phase 2, a 40-item questionnaire was tested with a convenience sample of 377 adults and a 24-item version was tested with 306 older adults, aged 55 and older, using Rasch methodology. In phase 3, stakeholder focus groups provided feedback on the format of questionnaire materials and recommended strategies for addressing problems. The concept map contained 72 statements organized into 6 conceptual clusters or domains. The 24-item screener was unidimensional. Cronbach's alpha was .87, person reliability was acceptable (.74), and item reliability was high (.96). The MedUseQ is a validated, patient-centered tool targeting older adults that can be used to assess a wide range of medication use problems in clinical and community settings and to identify areas for education, intervention, or further assessment.
Rose, Matthias; Bjorner, Jakob B; Gandek, Barbara; Bruce, Bonnie; Fries, James F; Ware, John E
2014-05-01
To document the development and psychometric evaluation of the Patient-Reported Outcomes Measurement Information System (PROMIS) Physical Function (PF) item bank and static instruments. The items were evaluated using qualitative and quantitative methods. A total of 16,065 adults answered item subsets (n>2,200/item) on the Internet, with oversampling of the chronically ill. Classical test and item response theory methods were used to evaluate 149 PROMIS PF items plus 10 Short Form-36 and 20 Health Assessment Questionnaire-Disability Index items. A graded response model was used to estimate item parameters, which were normed to a mean of 50 (standard deviation [SD]=10) in a US general population sample. The final bank consists of 124 PROMIS items covering upper, central, and lower extremity functions and instrumental activities of daily living. In simulations, a 10-item computerized adaptive test (CAT) eliminated floor and decreased ceiling effects, achieving higher measurement precision than any comparable length static tool across four SDs of the measurement range. Improved psychometric properties were transferred to the CAT's superior ability to identify differences between age and disease groups. The item bank provides a common metric and can improve the measurement of PF by facilitating the standardization of patient-reported outcome measures and implementation of CATs for more efficient PF assessments over a larger range. Copyright © 2014. Published by Elsevier Inc.
Bernstein, Ira H.; Rush, A. John; Carmody, Thomas J.; Woo, Ada; Trivedi, Madhukar H.
2007-01-01
Objectives: Recent work using classical test theory (CTT) and item response theory (IRT) has found that the self-report (QIDS-SR16) and clinician-rated (QIDS-C16) versions of the 16-item Quick Inventory of Depressive Symptomatology were generally comparable in outpatients with nonpsychotic major depressive disorder (MDD). This report extends this comparison to a less well-educated, more treatment-resistant sample that included more ethnic/racial minorities using IRT and selected classical test analyses. Methods: The QIDS-SR16 and QIDS-C16 were obtained in a sample of 441 outpatients with nonpsychotic MDD seen in the public sector in the Texas Medication Algorithm Project (TMAP). The Samejima graded response IRT model was used to compare the QIDS-SR16 and QIDS-C16. Results: The nine symptom domains in the QIDS-SR16 and QIDS-C16 related well to overall depression. The slopes of the item response functions, a, which index the strength of relationship between overall depression and each symptom, were extremely similar with the two measures. Likewise, the CTT and IRT indices of symptom frequency (item means and locations of the item response functions, b_i) were also similar with these two measures. For example, sad mood and difficulty with concentration/decision making were highly related to the overall depression severity with both the QIDS-C16 and QIDS-SR16. Likewise, sleeping difficulties were commonly reported, even though they were not as strongly related to overall magnitude of depression. Conclusion: In this less educated, socially disadvantaged sample, differences between the QIDS-C16 and QIDS-SR16 were minor. The QIDS-SR16 is a satisfactory substitute for the more time-consuming QIDS-C16 in a broad range of adult, nonpsychotic, depressed outpatients. PMID:16716351
Bernstein, Ira H; Rush, A John; Carmody, Thomas J; Woo, Ada; Trivedi, Madhukar H
2007-01-01
Recent work using classical test theory (CTT) and item response theory (IRT) has found that the self-report (QIDS-SR(16)) and clinician-rated (QIDS-C(16)) versions of the 16-item quick inventory of depressive symptomatology were generally comparable in outpatients with nonpsychotic major depressive disorder (MDD). This report extends this comparison to a less well-educated, more treatment-resistant sample that included more ethnic/racial minorities using IRT and selected classical test analyses. The QIDS-SR(16) and QIDS-C(16) were obtained in a sample of 441 outpatients with nonpsychotic MDD seen in the public sector in the Texas Medication Algorithm Project (TMAP). The Samejima graded response IRT model was used to compare the QIDS-SR(16) and QIDS-C(16). The nine symptom domains in the QIDS-SR(16) and QIDS-C(16) related well to overall depression. The slopes of the item response functions, a, which index the strength of relationship between overall depression and each symptom, were extremely similar with the two measures. Likewise, the CTT and IRT indices of symptom frequency (item means and locations of the item response functions, b(i)) were also similar with these two measures. For example, sad mood and difficulty with concentration/decision making were highly related to the overall depression severity with both the QIDS-C(16) and QIDS-SR(16). Likewise, sleeping difficulties were commonly reported, even though they were not as strongly related to overall magnitude of depression. In this less educated, socially disadvantaged sample, differences between the QIDS-C(16) and QIDS-SR(16) were minor. The QIDS-SR(16) is a satisfactory substitute for the more time-consuming QIDS-C(16) in a broad range of adult, nonpsychotic, depressed outpatients.
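The roles of the slope a and the locations b(i) in a graded response model, as described in the abstract above, can be sketched in a few lines. This is a generic logistic form of Samejima's model, not the authors' fitted model; the function name, the four-category item, and all parameter values are hypothetical and chosen only for illustration.

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Category probabilities for one item under a logistic graded response model.

    theta      -- latent trait level (e.g. overall depression severity)
    a          -- item slope (strength of relation between trait and item)
    thresholds -- increasing location parameters, one per category boundary
    """
    if list(thresholds) != sorted(thresholds):
        raise ValueError("thresholds must be in increasing order")
    # Cumulative curves: probability of responding in category k or higher.
    cumulative = [1.0]
    cumulative += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cumulative.append(0.0)
    # Each category probability is the difference of adjacent cumulative curves.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(cumulative) - 1)]


# Hypothetical four-category item (scored 0-3) with three thresholds.
probs = grm_category_probs(theta=0.0, a=1.0, thresholds=[-1.0, 0.0, 1.0])
print([round(p, 4) for p in probs])

# A steeper slope concentrates responses more sharply around the trait level.
steep = grm_category_probs(theta=0.0, a=3.0, thresholds=[-1.0, 0.0, 1.0])
print([round(p, 4) for p in steep])
```

In this form a larger a means the item separates respondents more sharply, while lower thresholds mean the symptom is endorsed at milder trait levels, which matches the abstract's reading of slopes as strength of relationship and locations as symptom frequency.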
ERIC Educational Resources Information Center
Randall, Jennifer; Engelhard, George, Jr.
2010-01-01
The psychometric properties and multigroup measurement invariance of scores across subgroups, items, and persons on the "Reading for Meaning" items from the Georgia Criterion Referenced Competency Test (CRCT) were assessed in a sample of 778 seventh-grade students. Specifically, we sought to determine the extent to which score-based…
Inclusion of Community in Self Scale: A Single-Item Pictorial Measure of Community Connectedness
ERIC Educational Resources Information Center
Mashek, Debra; Cannaday, Lisa W.; Tangney, June P.
2007-01-01
We developed a single-item pictorial measure of community connectedness, building on the theoretical and methodological traditions of the self-expansion model (Aron & Aron, 1986). The Inclusion of Community in the Self (ICS) Scale demonstrated excellent test-retest reliability, convergent validity, and discriminant validity in a sample of 190…
Validity Evidence for Eating Attitudes Test Scores in a Sample of Female College Athletes
ERIC Educational Resources Information Center
Doninger, Gretchen L.; Enders, Craig K.; Burnett, Kent F.
2005-01-01
The purpose of this study was to examine the psychometric properties of the 26-item Eating Attitudes Test (EAT-26; Garner, Olmsted, Bohr, & Garfinkel, 1982) using a sample of 207 female college athletes. Previous studies using nonathlete populations have supported a number of factor structures, but a series of confirmatory factor analyses…
Ruiz, Miguel A; González-Porras, José Ramón; Aranguren, José Luis; Franco, Eduardo; Villasante, Fernando; Tuñón, José; González-López, Tomás José; de Salas-Cansado, Marina; Soto, Javier
2017-03-01
To develop a new questionnaire with good psychometric properties to measure satisfaction with medical care in patients with non-valvular atrial fibrillation. The initial instrument was composed of 37 items, arranged in 6 dimensions: efficacy, ease and convenience, impact on daily activities, satisfaction with medical care, undesired effects of medication, and overall satisfaction. Items and dimensions were extracted from reviewing existing instruments, 3 focus groups with chronic patients, and a panel of 8 experts. Additionally, 3 visual analog scales measuring quality of life, effectiveness, and overall satisfaction were administered. A convenience sample of 119 patients was used for item reduction. Classic psychometric theory and item analysis techniques were used (exploratory factor and confirmatory factor analysis, test-retest, and correlation with visual scales). A validation sample of 230 patients was used to assess convergent validity, and an additional sample of 220 patients was used to discriminate between treatment and compliance groups. The questionnaire was reduced in length to 25 items, but the impact dimension split into treatment inconvenience and treatment control. Overall reliability was high (α = 0.861) with acceptable dimensional reliabilities (α = 0.764-0.908). Individual dimensions correlated to varying degrees. Test-retest correlations were high (r = 0.784-0.965), and correlations with visual and already validated scales were substantial. Differences were detected between antivitamin K and new-oral-anticoagulant treatments in several dimensions (p < 0.05). Treatment satisfaction was related to compliance. This new 25-item questionnaire has good psychometric properties for measuring satisfaction with medical care in patients with this condition. It is capable of detecting differences between treatments.
Thibodeau, Michel A; Leonard, Rachel C; Abramowitz, Jonathan S; Riemann, Bradley C
2015-12-01
The Dimensional Obsessive-Compulsive Scale (DOCS) is a promising measure of obsessive-compulsive disorder (OCD) symptoms but has received minimal psychometric attention. We evaluated the utility and reliability of DOCS scores. The study included 832 students and 300 patients with OCD. Confirmatory factor analysis supported the originally proposed four-factor structure. DOCS total and subscale scores exhibited good to excellent internal consistency in both samples (α = .82 to α = .96). Patient DOCS total scores reduced substantially during treatment (t = 16.01, d = 1.02). DOCS total scores discriminated between students and patients (sensitivity = 0.76, 1 - specificity = 0.23). The measure did not exhibit gender-based differential item functioning as tested by Mantel-Haenszel chi-square tests. Expected response options for each item were plotted as a function of item response theory and demonstrated that DOCS scores incrementally discriminate OCD symptoms ranging from low to extremely high severity. Incremental differences in DOCS scores appear to represent unbiased and reliable differences in true OCD symptom severity. © The Author(s) 2014.
Garcia-Martinez, Irma; Weiss, Theresa R; Yousaf, Muhammad N; Ali, Ather; Mehal, Wajahat Z
2018-01-01
Leukocyte activation (LA) testing identifies food items that induce a patient specific cellular response in the immune system, and has recently been shown in a randomized double blinded prospective study to reduce symptoms in patients with irritable bowel syndrome (IBS). We hypothesized that test reactivity to particular food items, and the systemic immune response initiated by these food items, is due to the release of cellular DNA from blood immune cells. We tested this by quantifying total DNA concentration in the cellular supernatant of immune cells exposed to positive and negative foods from 20 healthy volunteers. To establish if the DNA release by positive samples is a specific phenomenon, we quantified myeloperoxidase (MPO) in cellular supernatants. We further assessed if a particular immune cell population (neutrophils, eosinophils, and basophils) was activated by the positive food items by flow cytometry analysis. To identify the signaling pathways that are required for DNA release we tested if specific inhibitors of key signaling pathways could block DNA release. Foods with a positive LA test result gave a higher supernatant DNA content when compared to foods with a negative result. This was specific as MPO levels were not increased by foods with a positive LA test. Protein kinase C (PKC) inhibitors resulted in inhibition of positive food stimulated DNA release. Positive foods resulted in CD63 levels greater than negative foods in eosinophils in 76.5% of tests. LA test identifies food items that result in release of DNA and activation of peripheral blood innate immune cells in a PKC dependent manner, suggesting that this LA test identifies food items that result in release of inflammatory markers and activation of innate immune cells. This may be the basis for the improvement in symptoms in IBS patients who followed an LA test guided diet.
Development and Validation of a Computerized-Adaptive Test for PTSD (P-CAT).
Eisen, Susan V; Schultz, Mark R; Ni, Pengsheng; Haley, Stephen M; Smith, Eric G; Spiro, Avron; Osei-Bonsu, Princess E; Nordberg, Sam; Jette, Alan M
2016-10-01
The primary purpose was to develop, field test, and validate a computerized-adaptive test (CAT) for posttraumatic stress disorder (PTSD) to enhance PTSD assessment and decrease the burden of symptom monitoring. Data sources included self-report and interviewer-administered diagnostic interviews. The sample included 1,288 veterans. In phase 1, 89 items from a previously developed PTSD item pool were administered to a national sample of 1,085 veterans. A multidimensional graded-response item response theory model was used to calibrate items for incorporation into a CAT for PTSD (P-CAT). In phase 2, in a separate sample of 203 veterans, the P-CAT was validated against three other self-report measures (PTSD Checklist, Civilian Version; Mississippi Scale for Combat-Related PTSD; and Primary Care PTSD Screen) and the PTSD module of the Structured Clinical Interview for DSM-IV. A bifactor model with one general PTSD factor and four subfactors consistent with DSM-5 (reexperiencing, avoidance, negative mood-cognitions, and arousal), yielded good fit. The P-CAT discriminated veterans with PTSD from those with other mental health conditions and those with no mental health conditions (Cohen's d effect sizes >.90). The P-CAT also discriminated those with and without a PTSD diagnosis and those who screened positive versus negative for PTSD. Concurrent validity was supported by high correlations (r=.85-.89) with the validation measures. The P-CAT appears to be a promising tool for efficient and accurate assessment of PTSD symptomatology. Further testing is needed to evaluate its responsiveness to change. With increasing availability of computers and other technologies, CAT may be a viable and efficient assessment method.
Development and psychometric testing of the Nursing Workplace Relational Environment Scale (NWRES).
Duddle, Maree; Boughton, Maureen
2009-03-01
The aim of this study was to develop and test the psychometric properties of the Nursing Workplace Relational Environment Scale (NWRES). A positive relational environment in the workplace is characterised by a sense of connectedness and belonging, support and cooperation among colleagues, open communication and effectively managed conflict. A poor relational environment in the workplace may contribute to job dissatisfaction and early turnover of staff. Quantitative survey. A three-stage process was used to design and test the NWRES. In Stage 1, an extensive literature review was conducted on professional working relationships and the nursing work environment. Three key concepts (collegiality, workplace conflict and job satisfaction) were identified and defined. In Stage 2, a pool of items was developed from the dimensions of each concept and formulated into a 35-item scale which was piloted on a convenience sample of 31 nurses. In Stage 3, the newly refined 28-item scale was administered randomly to a convenience sample of 150 nurses. Psychometric testing was conducted to establish the construct validity and reliability of the scale. Exploratory factor analysis resulted in a 22-item scale. The factor analysis indicated a four-factor structure (collegial behaviours, relational atmosphere, outcomes of conflict and job satisfaction), which explained 68.12% of the total variance. Cronbach's alpha coefficient for the NWRES was 0.872 and the subscales ranged from 0.781-0.927. The results of the study confirm the reliability and validity of the NWRES. Replication of this study with a larger sample is indicated to determine relationships among the subscales. The results of this study have implications for health managers in terms of understanding the impact of the relational environment of the workplace on job satisfaction and retention.
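Cronbach's alpha, the internal-consistency coefficient reported above, compares the sum of the item variances with the variance of the total score. The sketch below shows the standard formula only; the function name and the response matrix are hypothetical and are not the NWRES data.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(scores[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical 5 respondents x 3 items (illustrative only).
responses = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [1, 2, 1],
    [3, 3, 3],
]
alpha = cronbach_alpha(responses)
```

Alpha approaches 1 when items rise and fall together across respondents, and drops toward 0 when they vary independently.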
Beitra, Danette; El-Behadli, Ana F; Faith, Melissa A
2018-01-01
The aim of this study is to conduct a multimethod psychometric reduction in the Parents' Beliefs about Children's Emotions (PBCE) questionnaire using an item response theory framework with a pediatric oncology sample. Participants were 216 pediatric oncology caregivers who completed the PBCE. The PBCE contains 105 items (11 subscales) rated on a 6-point Likert-type scale. We evaluated the PBCE subscale performance by applying a partial credit model in WINSTEPS. Sixty-six statistically weak items were removed, creating a 44-item PBCE questionnaire with 10 subscales and 3 response options per item. The refined scale displayed good psychometric properties and correlated .910 with the original PBCE. Additional analyses examined dimensionality, item-level (e.g. difficulty), and person-level (e.g. ethnicity) characteristics. The refined PBCE questionnaire provides better test information, improves instrument reliability, and reduces burden on families, providers, and researchers. With this improved measure, providers can more easily identify families who may benefit from psychosocial interventions targeting emotion socialization. The results of the multistep approach presented should be considered preliminary, given the limited sample size.
Rhodes, Alison M; Tran, Thanh V
2013-02-01
This study examined the equivalence or comparability of the measurement properties of seven selected items measuring posttraumatic growth among self-identified Black (n = 270) and White (n = 707) adult survivors of Hurricane Katrina, using data from the Baseline Survey of the Hurricane Katrina Community Advisory Group Study. Internal consistency reliability was equally good for both groups (Cronbach's alphas = .79), as were correlations between individual scale items and their respective overall scale. Confirmatory factor analysis of a congeneric measurement model of seven selected items of posttraumatic growth showed adequate measures of fit for both groups. The results showed only small variation in magnitude of factor loadings and measurement errors between the two samples. Tests of measurement invariance showed mixed results, but overall indicated that factor loading, error variance, and factor variance were similar between the two samples. These seven selected items can be useful for future large-scale surveys of posttraumatic growth.
Using a psychometric lens to examine gender differences on the FCI
NASA Astrophysics Data System (ADS)
Lindell, Rebecca; Papak, Alexis; Stewart, John; Traxler, Adrienne
2017-01-01
Multiple research studies show that there appears to be an inherent difference between male and female students' performance on the Force Concept Inventory (FCI). Unlike these studies, we chose to create two different samples, one with only female students and the other with only male students, to reduce the effects of the gender-imbalance inherent in a single sample of all physics students. Using a psychometric lens, we evaluate the differences between the male and female students' performance on the FCI. We utilized classical test theory to flag 13 items on the FCI that were poorly functioning for female students. Notably, most of these items were not flagged when the dataset was aggregated across genders. In the next stage of the research, we utilized Item Response Theory (IRT) to discover if the remaining 17 items on the FCI are also poorly functioning for female students. By eliminating the poorly functioning items on the FCI, we further examined the gender difference of the Force Concept Inventory.
Victoria Symptom Validity Test performance in children and adolescents with neurological disorders.
Brooks, Brian L
2012-12-01
It is becoming increasingly more important to study, use, and promote the utility of measures that are designed to detect non-compliance with testing (i.e., poor effort, symptom non-validity, response bias) as part of neuropsychological assessments with children and adolescents. Several measures have evidence for use in pediatrics, but there is a paucity of published support for the Victoria Symptom Validity Test (VSVT) in this population. The purpose of this study was to examine the performance on the VSVT in a sample of pediatric patients with known neurological disorders. The sample consisted of 100 consecutively referred children and adolescents between the ages of 6 and 19 years (mean = 14.0, SD = 3.1) with various neurological diagnoses. On the VSVT total items, 95% of the sample had performance in the "valid" range, with 5% being deemed "questionable" and 0% deemed "invalid". On easy items, 97% were "valid", 2% were "questionable", and 1% was "invalid." For difficult items, 84% were "valid," 16% were "questionable," and 0% was "invalid." For those patients given two effort measures (i.e., VSVT and Test of Memory Malingering; n = 65), none was identified as having poor test-taking compliance on both measures. VSVT scores were significantly correlated with age, intelligence, processing speed, and functional ratings of daily abilities (attention, executive functioning, and adaptive functioning), but not objective performance on the measure of sustained attention, verbal memory, or visual memory. The VSVT has potential to be used in neuropsychological assessments with pediatric patients.
Haberman, Shelby J; Sinharay, Sandip; Chon, Kyong Hee
2013-07-01
Residual analysis (e.g. Hambleton & Swaminathan, Item response theory: principles and applications, Kluwer Academic, Boston, 1985; Hambleton, Swaminathan, & Rogers, Fundamentals of item response theory, Sage, Newbury Park, 1991) is a popular method to assess fit of item response theory (IRT) models. We suggest a form of residual analysis that may be applied to assess item fit for unidimensional IRT models. The residual analysis consists of a comparison of the maximum-likelihood estimate of the item characteristic curve with an alternative ratio estimate of the item characteristic curve. The large sample distribution of the residual is proved to be standardized normal when the IRT model fits the data. We compare the performance of our suggested residual to the standardized residual of Hambleton et al. (Fundamentals of item response theory, Sage, Newbury Park, 1991) in a detailed simulation study. We then calculate our suggested residuals using data from an operational test. The residuals appear to be useful in assessing the item fit for unidimensional IRT models.
ERIC Educational Resources Information Center
Li, Yuan H.; Yang, Yu N.; Tompkins, Leroy J.; Modarresi, Shahpar
2005-01-01
The statistical technique, "Zero-One Linear Programming," that has successfully been used to create multiple tests with similar characteristics (e.g., item difficulties, test information and test specifications) in the area of educational measurement, was deemed to be a suitable method for creating multiple sets of matched samples to be…
Scale Development for Measuring and Predicting Adolescents’ Leisure Time Physical Activity Behavior
Ries, Francis; Romero Granados, Santiago; Arribas Galarraga, Silvia
2009-01-01
The aim of this study was to develop a scale for assessing and predicting adolescents’ physical activity behavior in Spain and Luxembourg using the Theory of Planned Behavior as a framework. The sample was comprised of 613 Spanish (boys = 309, girls = 304; M age = 15.28, SD = 1.127) and 752 Luxembourgish adolescents (boys = 343, girls = 409; M age = 14.92, SD = 1.198), selected from students of two secondary schools in both countries, with a similar socio-economic status. The initial 43 items were all scored on a 4-point response format using the structured alternative format and translated into Spanish, French and German. In order to ensure the accuracy of the translation, standardized parallel back-translation techniques were employed. Following two pilot tests and subsequent revisions, a second order exploratory factor analysis with oblimin direct rotation was used for factor extraction. Internal consistency and test-retest reliabilities were also tested. The 4-week test-retest correlations confirmed the items’ time stability. The same five factors were obtained, explaining 63.76% and 63.64% of the total variance in both samples. Internal consistency for the five factors ranged from α = 0.759 to α = 0.949 in the Spanish sample and from α = 0.735 to α = 0.952 in the Luxembourgish sample. For both samples, inter-factor correlations were all reported significant and positive, except for Factor 5 where they were significant but negative. The high internal consistency of the subscales, the reported item test-retest reliabilities and the identical factor structure confirm the adequacy of the elaborated questionnaire for assessing the TPB-based constructs when used with a population of adolescents in Spain and Luxembourg. The results give some indication that they may have value in measuring the hypothesized TPB constructs for PA behavior in a cross-cultural context.
Key points: When using the structured alternative format, weak internal consistency was obtained. Rephrasing the items and scoring items on a Likert-type scale greatly enhanced the subscales’ reliability. Identical factorial structure was extracted for both culturally different samples. The obtained factors, namely perceived physical competence, parents’ physical activity, perceived resources support, attitude toward physical activity and perceived parental support, were hypothesized as for the original TPB constructs. PMID:24149606
Mokkink, Lidwine Brigitta; Galindo-Garre, Francisca; Uitdehaag, Bernard Mj
2016-12-01
The Multiple Sclerosis Walking Scale-12 (MSWS-12) measures walking ability from the patients' perspective. We examined the quality of the MSWS-12 using an item response theory model, the graded response model (GRM). A total of 625 unique Dutch multiple sclerosis (MS) patients were included. After testing for unidimensionality, monotonicity, and absence of local dependence, a GRM was fit and item characteristics were assessed. Differential item functioning (DIF) for the variables gender, age, duration of MS, type of MS and severity of MS, reliability, total test information, and standard error of the trait level (θ) were investigated. Confirmatory factor analysis showed a unidimensional structure of the 12 items of the scale, explaining 88% of the variance. Item 2 did not fit into the GRM model. Reliability was 0.93. Items 8 and 9 (of the 11 and 12 item version respectively) showed DIF on the variable severity, based on the Expanded Disability Status Scale (EDSS). However, the EDSS is strongly related to the content of both items. Our results confirm the good quality of the MSWS-12. The trait level (θ) scores and item parameters of both the 12- and 11-item versions were highly comparable, although we do not suggest to change the content of the MSWS-12. © The Author(s), 2016.
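The graded response model mentioned above gives, for each item, the probability of each ordered response category as the difference between adjacent cumulative logistic curves. The following is a minimal sketch of that textbook form (Samejima's logistic GRM); the function name and all parameter values are hypothetical and are not estimates from the MSWS-12 analysis.

```python
from math import exp


def grm_category_probs(theta, a, thresholds):
    """Category probabilities for one item under the logistic graded response model.

    theta      -- trait level of the respondent
    a          -- item discrimination
    thresholds -- ordered category thresholds b_1 < ... < b_(m-1)
    Returns a list of m probabilities, one per response category.
    """
    # Cumulative probabilities P(X >= k), padded with 1 (lowest) and 0 (beyond highest).
    cum = [1.0] + [1.0 / (1.0 + exp(-a * (theta - b))) for b in thresholds] + [0.0]
    # Each category probability is the drop between adjacent cumulative curves.
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]


# Hypothetical item with four response categories (illustrative only).
probs = grm_category_probs(theta=0.0, a=1.5, thresholds=[-1.0, 0.0, 1.0])
```

With ordered thresholds the probabilities are non-negative and sum to 1. Differential item functioning amounts to asking whether a and the thresholds differ between groups at the same trait level.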
Job Specific Tests and an Overview of Research on Alternatives.
ERIC Educational Resources Information Center
MacLane, Charles N.; O'Leary, Brian S.
The development of job-specific tests (JSTs) for two occupations is discussed. A reading comprehension test and a mathematical reasoning test were developed for Customs Inspectors, and a reading comprehension test was developed for Social Security Claims workers. JST items incorporated reading samples or math problems from those found on the job.…
Ni, Pengsheng; McDonough, Christine M.; Jette, Alan M.; Bogusz, Kara; Marfeo, Elizabeth E.; Rasch, Elizabeth K.; Brandt, Diane E.; Meterko, Mark; Chan, Leighton
2014-01-01
Objectives: To develop and test an instrument to assess physical function (PF) for Social Security Administration (SSA) disability programs, the SSA-PF. Item Response Theory (IRT) analyses were used to 1) create a calibrated item bank for each of the factors identified in prior factor analyses, 2) assess the fit of the items within each scale, 3) develop separate Computer-Adaptive Test (CAT) instruments for each scale, and 4) conduct initial psychometric testing. Design: Cross-sectional data collection; IRT analyses; CAT simulation. Setting: Telephone and internet survey. Participants: Two samples: 1,017 SSA claimants, and 999 adults from the US general population. Interventions: None. Main Outcome Measure: Model fit statistics, correlation and reliability coefficients. Results: IRT analyses resulted in five unidimensional SSA-PF scales: Changing & Maintaining Body Position, Whole Body Mobility, Upper Body Function, Upper Extremity Fine Motor, and Wheelchair Mobility for a total of 102 items. High CAT accuracy was demonstrated by strong correlations between simulated CAT scores and those from the full item banks. Comparing the simulated CATs to the full item banks, very little loss of reliability or precision was noted, except at the lower and upper ranges of each scale. No difference in response patterns by age or sex was noted. The distributions of claimant scores were shifted to the lower end of each scale compared to those of a sample of US adults. Conclusions: The SSA-PF instrument contributes important new methodology for measuring the physical function of adults applying to the SSA disability programs. Initial evaluation revealed that the SSA-PF instrument achieved considerable breadth of coverage in each content domain and demonstrated noteworthy psychometric properties. PMID:23578594
Ni, Pengsheng; McDonough, Christine M; Jette, Alan M; Bogusz, Kara; Marfeo, Elizabeth E; Rasch, Elizabeth K; Brandt, Diane E; Meterko, Mark; Haley, Stephen M; Chan, Leighton
2013-09-01
To develop and test an instrument to assess physical function for Social Security Administration (SSA) disability programs, the SSA-Physical Function (SSA-PF) instrument. Item response theory (IRT) analyses were used to (1) create a calibrated item bank for each of the factors identified in prior factor analyses, (2) assess the fit of the items within each scale, (3) develop separate computer-adaptive testing (CAT) instruments for each scale, and (4) conduct initial psychometric testing. Cross-sectional data collection; IRT analyses; CAT simulation. Telephone and Internet survey. Two samples: SSA claimants (n=1017) and adults from the U.S. general population (n=999). None. Model fit statistics, correlation, and reliability coefficients. IRT analyses resulted in 5 unidimensional SSA-PF scales: Changing & Maintaining Body Position, Whole Body Mobility, Upper Body Function, Upper Extremity Fine Motor, and Wheelchair Mobility for a total of 102 items. High CAT accuracy was demonstrated by strong correlations between simulated CAT scores and those from the full item banks. On comparing the simulated CATs with the full item banks, very little loss of reliability or precision was noted, except at the lower and upper ranges of each scale. No difference in response patterns by age or sex was noted. The distributions of claimant scores were shifted to the lower end of each scale compared with those of a sample of U.S. adults. The SSA-PF instrument contributes important new methodology for measuring the physical function of adults applying to the SSA disability programs. Initial evaluation revealed that the SSA-PF instrument achieved considerable breadth of coverage in each content domain and demonstrated noteworthy psychometric properties. Copyright © 2013 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
[A test to measure the degree of knowledge on food and nutrition at the onset of elementary school].
Ivanovic Marincovich, D; Castro Gómez, C G; Ivanovic Marincovich, R
1997-06-01
The objective of this work was to design a test to measure the degree of knowledge on food and nutrition in school-age children from elementary first and second grades. A graphic instrument was designed according to children's psychological development and was based on the specific objectives pursued by the curriculum programs of the Ministry of Education. The test was developed around the following topics through 15 items: Area 1: Basic Concepts on Food and Nutrition (9 items) and Area 2: Food, Personal and Environmental Hygiene (9 items). The test was pilot tested on 103 school-age children of both grades (1:1), of both sexes (1:1), belonging to Peñalolén and Las Condes counties from Chile's Metropolitan Region and from high and low socioeconomic status (SES) (1:1), measured through Graffar's Modified Method. The final version of the test was applied in a representative sample of 1,482 school-age children from Chile's Metropolitan Region from elementary first and second grades during 1986-1987. Content validity was assured by a team of judges and by the curriculum programs. Reliability was assessed by the Spearman correlation with the Spearman-Brown correction. Item-test consistency was determined by the Pearson correlation coefficient. Data were processed by the statistical analysis system (SAS) package. Results showed that the reliability coefficient was 0.84 and item-test consistency was equal to or above 0.25 in all items. It can be concluded that this test can be useful to determine the degree of knowledge on food and nutrition at the onset of elementary school, both in Chile and in other countries.
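The Spearman-Brown correction named above steps a correlation between two test halves up to an estimate of full-test reliability. The sketch below shows only the general prophecy formula; the function and parameter names are hypothetical, and the abstract does not report the half-test correlation that was actually obtained.

```python
def spearman_brown(r_half, length_factor=2.0):
    """Spearman-Brown prophecy formula.

    r_half        -- correlation between the two half-tests (or reliability of the shorter test)
    length_factor -- how many times longer the full test is (2 for a split-half design)
    """
    return length_factor * r_half / (1.0 + (length_factor - 1.0) * r_half)


# Hypothetical half-test correlation (illustrative only).
full_test_reliability = spearman_brown(0.5)
```

Purely as arithmetic, and not as a figure from the study, a half-test correlation of about 0.72 would step up to roughly 0.84 under this formula.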
Toward a More Systematic Assessment of Smoking: Development of a Smoking Module for PROMIS®
Tucker, Joan S.; Shadel, William G.; Stucky, Brian D.; Cai, Li
2012-01-01
Introduction: The aim of the PROMIS® Smoking Initiative is to develop, evaluate, and standardize item banks to assess cigarette smoking behavior and biopsychosocial constructs associated with smoking for both daily and non-daily smokers. Methods: We used qualitative methods to develop the item pool (following the PROMIS® approach: e.g., literature search, “binning and winnowing” of items, and focus groups and cognitive interviews to finalize wording and format), and quantitative methods (e.g., factor analysis) to develop the item banks. Results: We considered a total of 1622 extant items, and 44 new items for inclusion in the smoking item banks. A final set of 277 items representing 11 conceptual domains was selected for field testing in a national sample of smokers. Using data from 3021 daily smokers in the field test, an iterative series of exploratory factor analyses and project team discussions resulted in six item banks: Positive Consequences of Smoking (40 items), Smoking Dependence/Craving (55 items), Health Consequences of Smoking (26 items), Psychosocial Consequences of Smoking (37 items), Coping Aspects of Smoking (30 items), and Social Factors of Smoking (23 items). Conclusions: Inclusion of a smoking domain in the PROMIS® framework will standardize measurement of key smoking constructs using state-of-the-art psychometric methods, and make them widely accessible to health care providers, smoking researchers and the large community of researchers using PROMIS® who might not otherwise include an assessment of smoking in their design. Next steps include reducing the number of items in each domain, conducting confirmatory analyses, and duplicating the process for non-daily smokers. PMID:22770824
Toward a more systematic assessment of smoking: development of a smoking module for PROMIS®.
Edelen, Maria O; Tucker, Joan S; Shadel, William G; Stucky, Brian D; Cai, Li
2012-11-01
The aim of the PROMIS® Smoking Initiative is to develop, evaluate, and standardize item banks to assess cigarette smoking behavior and biopsychosocial constructs associated with smoking for both daily and non-daily smokers. We used qualitative methods to develop the item pool (following the PROMIS® approach: e.g., literature search, "binning and winnowing" of items, and focus groups and cognitive interviews to finalize wording and format), and quantitative methods (e.g., factor analysis) to develop the item banks. We considered a total of 1622 extant items, and 44 new items for inclusion in the smoking item banks. A final set of 277 items representing 11 conceptual domains was selected for field testing in a national sample of smokers. Using data from 3021 daily smokers in the field test, an iterative series of exploratory factor analyses and project team discussions resulted in six item banks: Positive Consequences of Smoking (40 items), Smoking Dependence/Craving (55 items), Health Consequences of Smoking (26 items), Psychosocial Consequences of Smoking (37 items), Coping Aspects of Smoking (30 items), and Social Factors of Smoking (23 items). Inclusion of a smoking domain in the PROMIS® framework will standardize measurement of key smoking constructs using state-of-the-art psychometric methods, and make them widely accessible to health care providers, smoking researchers and the large community of researchers using PROMIS® who might not otherwise include an assessment of smoking in their design. Next steps include reducing the number of items in each domain, conducting confirmatory analyses, and duplicating the process for non-daily smokers. Copyright © 2012 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Haydel, Angela Michelle
The purpose of this dissertation was to advance theoretical understanding about fit between the personal resources of individuals and the characteristics of science achievement tasks. Testing continues to be pervasive in schools, yet we know little about how students perceive tests and what they think and feel while they are actually working on test items. This study focused on both the personal (cognitive and motivational) and situational factors that may contribute to individual differences in achievement-related outcomes. 387 eighth grade students first completed a survey including measures of science achievement goals, capability beliefs, efficacy related to multiple-choice items and performance assessments, validity beliefs about multiple-choice items and performance assessments, and other perceptions of these item formats. Students then completed science achievement tests including multiple-choice items and two performance assessments. A sample of students was asked to verbalize both thoughts and feelings as they worked through the test items. These think-alouds were transcribed and coded for evidence of cognitive, metacognitive and motivational engagement. Following each test, all students completed measures of effort, mood, energy level and strategy use during testing. Students reported that performance assessments were more challenging, authentic, interesting and valid than multiple-choice tests. They also believed that comparisons between students were easier using multiple-choice items. Overall, students tried harder, felt better, had higher levels of energy and used more strategies while working on performance assessments. Findings suggested that performance assessments might be more congruent with a mastery achievement goal orientation, while multiple-choice tests might be more congruent with a performance achievement goal orientation. 
A variable-centered analytic approach including regression analyses provided information about how students, on average, who differed in terms of their teachers' ratings of their science ability, achievement goals, capability beliefs and experiences with science achievement tasks perceived, engaged in, and performed on multiple-choice items and performance assessments. Person-centered analyses provided information about the perceptions, engagement and performance of subgroups of individuals who had different motivational characteristics. Generally, students' personal goals and capability beliefs related more strongly to test perceptions, but not performance, while teacher ratings of ability and test-specific beliefs related to performance.
Psychometric properties of the Spanish version of the Resilience Scale.
Heilemann, MarySue V; Lee, Kathryn; Kury, Felix Salvador
2003-01-01
The purpose of this study is to test the reliability and validity of a Spanish translation of the Resilience Scale (RS), which was originally created in English by Wagnild and Young (1993). A team of bilingual, bicultural translators participated in the translation process to enhance the linguistic accuracy and cultural appropriateness of the Spanish translation. As part of the convenience sample of 315 women of Mexican descent who participated in the larger study, data from 147 women who preferred to read and write in Spanish were used in this analysis. The English version of the RS consists of a 17-item "Personal Competence" subscale and an 8-item "Acceptance of Self and Life" subscale for a total of 25 items. However, two items had low item-total loadings and were removed to form a modified 23-item RS. The exploratory principal components factor analysis, varimax rotation, and subsequent goodness of fit indices were ambivalent on whether a one or two-factor solution was appropriate, but the chi-square difference test clearly demonstrated that the two-factor solution of the Spanish version was more useful in explaining variance than a one-factor solution. Internal consistency reliability was estimated with Cronbach's alpha (alpha = 0.93) which was acceptable for the 23-item RS as well as its subscales. Construct validity was demonstrated by a significant positive correlation between resilience and life satisfaction (r = 0.36; p < 0.001), and a significant negative correlation between resilience and depressive symptoms (r = -0.29; p < 0.01). This analysis ultimately supports the appropriateness of the modified 23-item Spanish translation of the RS and its subscales in a sample of urban, low-income women of Mexican descent in the U.S.
Nelson, Melissa C; Lytle, Leslie A
2009-04-01
Sweetened beverage and fast-food intake have been identified as important targets for obesity prevention. However, there are few brief dietary assessment tools available to evaluate these behaviors among adolescents. The objective of this research was to examine reliability and validity of a 22-item dietary screener assessing adolescent consumption of specific energy-containing and non-energy-containing beverages (nine items) and fast food (13 items). The screener was administered to adolescents (ages 11 to 18 years) recruited from the Minneapolis/St Paul, MN, metro region. One sample of adolescents completed test-retest reliability of the screener (n=33, primarily white adolescents). Another adolescent sample completed the screener along with three 24-hour dietary recalls to assess criterion validity (n=59 white adolescents). Test-retest assessments were completed approximately 7 to 14 days apart, and agreement between the two administrations of the screener was substantial, with most items yielding Spearman correlations and kappa statistics that were >0.60. When compared to the gold standard dietary recall data, findings indicate that the validity of the screener items assessing adolescents' intake of regular soda, sports drinks, milk, and water was fair. However, the differential assessment periods captured by the two methods (ie, 1 month for the screener vs 3 days for the recalls) posed challenges in analysis and made it impossible to assess the validity of some screener items. Overall while these screener items largely represent reliable measures with fair validity, our findings highlight the challenges inherent in the validation of brief dietary assessment tools.
Bjorner, Jakob Bue; Pejtersen, Jan Hyld
2010-02-01
To evaluate the construct validity of the Copenhagen Psychosocial Questionnaire II (COPSOQ II) by means of tests for differential item functioning (DIF) and differential item effect (DIE). We used a Danish general population postal survey (n = 4,732 with 3,517 wage earners) with a one-year register based follow up for long-term sickness absence. DIF was evaluated against age, gender, education, social class, public/private sector employment, and job type using ordinal logistic regression. DIE was evaluated against job satisfaction and self-rated health (using ordinal logistic regression), against depressive symptoms, burnout, and stress (using multiple linear regression), and against long-term sick leave (using a proportional hazards model). We used a cross-validation approach to counter the risk of significant results due to multiple testing. Out of 1,052 tests, we found 599 significant instances of DIF/DIE, 69 of which showed both practical and statistical significance across two independent samples. Most DIF occurred for job type (in 20 cases), while we found little DIF for age, gender, education, social class and sector. DIE seemed to pertain to particular items, which showed DIE in the same direction for several outcome variables. The results allowed a preliminary identification of items that have a positive impact on construct validity and items that have negative impact on construct validity. These results can be used to develop better shortform measures and to improve the conceptual framework, items and scales of the COPSOQ II. We conclude that tests of DIF and DIE are useful for evaluating construct validity.
Morales, Leo S; Flowers, Claudia; Gutierrez, Peter; Kleinman, Marjorie; Teresi, Jeanne A
2006-11-01
To illustrate the application of the Differential Item and Test Functioning (DFIT) method using English and Spanish versions of the Mini-Mental State Examination (MMSE). Study participants were 65 years of age or older and lived in North Manhattan, New York. Of the 1578 study participants who were administered the MMSE, 665 completed it in Spanish. The MMSE contains 20 items that measure the degree of cognitive impairment in the areas of orientation, attention and calculation, registration, recall and language, as well as the ability to follow verbal and written commands. After assessing the dimensionality of the MMSE scale, item response theory person and item parameters were estimated separately for the English and Spanish samples using Samejima's 2-parameter graded response model. Then the DFIT framework was used to assess differential item functioning (DIF) and differential test functioning (DTF). Nine items were found to show DIF; these were items that ask the respondent to name the correct season, day of the month, city, state, and 2 nearby streets, recall 3 objects, repeat the phrase "no ifs, no ands, no buts," follow the command, "close your eyes," and the command, "take the paper in your right hand, fold the paper in half with both hands, and put the paper down in your lap." At the scale level, however, the MMSE did not show differential functioning. Respondents to the English and Spanish versions of the MMSE are comparable on the basis of scale scores. However, assessments based on individual MMSE items may be misleading.
Factor Structure of the Internet Addiction Test in Online Gamers and Poker Players
Achab, Sophia; Billieux, Joel; Thorens, Gabriel; Zullino, Daniele; Dufour, Magali; Rothen, Stéphane
2015-01-01
Background The Internet Addiction Test (IAT) is the most widely used questionnaire to screen for problematic Internet use. Nevertheless, its factorial structure is still debated, which complicates comparisons among existing studies. Most previous studies were performed with students or community samples despite the probability of there being more problematic Internet use among users of specific applications, such as online gaming or gambling. Objective To assess the factorial structure of a modified version of the IAT that addresses specific applications, such as video games and online poker. Methods Two adult samples—one sample of Internet gamers (n=920) and one sample of online poker players (n=214)—were recruited and completed an online version of the modified IAT. Both samples were split into two subsamples. Two principal component analyses (PCAs) followed by two confirmatory factor analyses (CFAs) were run separately. Results The results of principal component analysis indicated that a one-factor model fit the data well across both samples. In consideration of the weakness of some IAT items, a 17-item modified version of the IAT was proposed. Conclusions This study assessed, for the first time, the factorial structure of a modified version of an Internet-administered IAT on a sample of Internet gamers and a sample of online poker players. The scale seems appropriate for the assessment of such online behaviors. Further studies on the modified 17-item IAT version are needed. PMID:26543917
Inchausti, Felix; Mole, Joe; Fonseca-Pedrero, Eduardo; Ortuño-Sierra, Javier
2015-01-01
The aim of this study was to analyse the psychometric properties of the Spanish NEO Five Factor Inventory–Revised (NEO-FFI-R) using Rasch analyses, in order to test its rating scale functioning, the reliability of scores, internal structure, and differential item functioning (DIF) by gender in a psychiatric sample. The NEO-FFI-R responses of 433 Spanish adults (154 males) with an anxiety disorder as primary diagnosis were analysed using the Rasch model for rating scales. Two intermediate categories of response (‘neutral’ and ‘agree’) malfunctioned in the Neuroticism and Conscientiousness scales. In addition, model reliabilities were lower than expected in Agreeableness and Neuroticism, and the item fit values indicated each scale had items that did not achieve moderate to high discrimination on its dimension, particularly in the Agreeableness scale. Concerning unidimensionality, the five NEO-FFI-R scales showed large first components of unexplained variance. Finally, DIF by gender was detected in many items. The results suggest that the scores of the Spanish NEO-FFI-R are unreliable in psychiatric samples and cannot be generalized between males and females, especially in the Openness, Conscientiousness, and Agreeableness scales. Future directions for testing and refinement should be developed before the NEO-FFI-R can be used reliably in clinical samples. PMID:25954224
Soviet and American ASTP crew sample candidate food items
NASA Technical Reports Server (NTRS)
1974-01-01
Candidate food items being considered for the joint U.S.-USSR Apollo Soyuz Test Project (ASTP) mission are sampled by three ASTP crewmen in bldg 4 at JSC. They are, left to right, Cosmonaut Valeriy N. Kubasov, engineer on the Soviet ASTP crew; Astronaut Vance D. Brand, command module pilot of the American ASTP crew; and Cosmonaut Aleksey A. Leonov, commander of the Soviet ASTP crew. Kubasov is marking a food rating chart on which the crewmen mark their choices, likes and dislikes of the food being sampled. Brand is drinking orange juice from an accordion-like dispenser. Leonov is eating butter cookies.
Bravini, Elisabetta; Franchignoni, Franco; Giordano, Andrea; Sartorio, Francesco; Ferriero, Giorgio; Vercelli, Stefano; Foti, Calogero
2015-01-01
To perform a comprehensive analysis of the psychometric properties and dimensionality of the Upper Limb Functional Index (ULFI) using both classical test theory and Rasch analysis (RA). Prospective, single-group observational design. Freestanding rehabilitation center. Convenience sample of Italian-speaking subjects with upper limb musculoskeletal disorders (N=174). Not applicable. The Italian version of the ULFI. Data were analyzed using parallel analysis, exploratory factor analysis, and RA for evaluating dimensionality, functioning of rating scale categories, item fit, hierarchy of item difficulties, and reliability indices. Parallel analysis revealed 2 factors explaining 32.5% and 10.7% of the response variance. RA confirmed the failure of the unidimensionality assumption, and 6 items out of the 25 misfitted the Rasch model. When the analysis was rerun excluding the misfitting items, the scale showed acceptable fit values, loading meaningfully to a single factor. Item separation reliability and person separation reliability were .98 and .89, respectively. Cronbach alpha was .92. RA revealed weakness of the scale concerning dimensionality and internal construct validity. However, a set of 19 ULFI items defined through the statistical process demonstrated a unidimensional structure, good psychometric properties, and clinical meaningfulness. These findings represent a useful starting point for further analyses of the tool (based on modern psychometric approaches and confirmatory factor analysis) in larger samples, including different patient populations and nationalities. Copyright © 2015 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
The Effect of Number of Ability Intervals on the Stability of Item Bias Detection.
ERIC Educational Resources Information Center
Loyd, Brenda
The chi-square procedure has been suggested as a viable index of test bias because it provides the best agreement with the three parameter item characteristic curve without the large sample requirement, computer complexity, and cost. This study examines the effect of using different numbers of ability intervals on the reliability of chi-square…
Comparison of Autism Screening in Younger and Older Toddlers
ERIC Educational Resources Information Center
Sturner, Raymond; Howard, Barbara; Bergmann, Paul; Stewart, Lydia; Afarian, Talin E.
2017-01-01
This study examined the effect of age at completion of an autism screening test on item failure rates contrasting older (>20 months) with younger (<20 months) toddlers in a community primary care sample of 73,564 children. Items related to social development were categorized into one of three age sets per criteria from Inada et al.…
Marsh, Herbert W; Martin, Andrew J; Jackson, Susan
2010-08-01
Based on the Physical Self Description Questionnaire (PSDQ) normative archive (n = 1,607 Australian adolescents), 40 of 70 items were selected to construct a new short form (PSDQ-S). The PSDQ-S was evaluated in a new cross-validation sample of 708 Australian adolescents and four additional samples: 349 Australian elite-athlete adolescents, 986 Spanish adolescents, 395 Israeli university students, 760 Australian older adults. Across these six groups, the 11 PSDQ-S factors had consistently high reliabilities and invariant factor structures. Study 1, using a missing-by-design variation of multigroup invariance tests, showed invariance across 40 PSDQ-S items and 70 PSDQ items. Study 2 demonstrated factorial invariance over a 1-year interval (test-retest correlations .57-.90; Mdn = .77), and good convergent and discriminant validity in relation to time. Study 3 showed good and nearly identical support for convergent and discriminant validity of PSDQ and PSDQ-S responses in relation to two other physical self-concept instruments.
Food Adulteration and Consumer Awareness in Dhaka City, 1995-2011
Ahmed, Tahmeed
2014-01-01
ABSTRACT We conducted this study to investigate the magnitude of food adulteration during 1995–2011 and consumer awareness in Dhaka city. We reviewed results of food sample testing by Public Health Food Laboratory of Dhaka City Corporation, Bangladesh Standards and Testing Institution, Consumers Association of Bangladesh publications, reports from lay press, including those on mobile magistrate court operations. We conducted a cross-sectional survey among 96 residents of Dhaka city, using a structured questionnaire in 2006. The overall proportion of food samples adulterated decreased during 2001-2005, and 40-54% of daily-consumed food was adulterated during 1995-2011. More than 35 food items were commonly adulterated. Consumers considered expiry date and quality or freshness as the best criteria while buying packaged and open food items respectively; only 11 (12%) respondents considered approval of regulatory authority for buying packaged food items. More than half of the food consumed in Dhaka city is adulterated, which warrants actions by the Government, the industry, and the consumers. PMID:25395908
Food adulteration and consumer awareness in Dhaka City, 1995-2011.
Nasreen, Sharifa; Ahmed, Tahmeed
2014-09-01
We conducted this study to investigate the magnitude of food adulteration during 1995-2011 and consumer awareness in Dhaka city. We reviewed results of food sample testing by Public Health Food Laboratory of Dhaka City Corporation, Bangladesh Standards and Testing Institution, Consumers Association of Bangladesh publications, reports from lay press, including those on mobile magistrate court operations. We conducted a cross-sectional survey among 96 residents of Dhaka city, using a structured questionnaire in 2006. The overall proportion of food samples adulterated decreased during 2001-2005, and 40-54% of daily-consumed food was adulterated during 1995-2011. More than 35 food items were commonly adulterated. Consumers considered expiry date and quality or freshness as the best criteria while buying packaged and open food items respectively; only 11 (12%) respondents considered approval of regulatory authority for buying packaged food items. More than half of the food consumed in Dhaka city is adulterated, which warrants actions by the Government, the industry, and the consumers.
Pilkonis, Paul A.; Choi, Seung W.; Reise, Steven P.; Stover, Angela M.; Riley, William T.; Cella, David
2011-01-01
The authors report on the development and calibration of item banks for depression, anxiety, and anger as part of the Patient-Reported Outcomes Measurement Information System (PROMIS®). Comprehensive literature searches yielded an initial bank of 1,404 items from 305 instruments. After qualitative item analysis (including focus groups and cognitive interviewing), 168 items (56 for each construct) were written in a first person, past tense format with a 7-day time frame and five response options reflecting frequency. The calibration sample included nearly 15,000 respondents. Final banks of 28, 29, and 29 items were calibrated for depression, anxiety, and anger, respectively, using item response theory. Test information curves showed that the PROMIS item banks provided more information than conventional measures in a range of severity from approximately −1 to +3 standard deviations (with higher scores indicating greater distress). Short forms consisting of seven to eight items provided information comparable to legacy measures containing more items. PMID:21697139
Pilkonis, Paul A; Choi, Seung W; Reise, Steven P; Stover, Angela M; Riley, William T; Cella, David
2011-09-01
The authors report on the development and calibration of item banks for depression, anxiety, and anger as part of the Patient-Reported Outcomes Measurement Information System (PROMIS®). Comprehensive literature searches yielded an initial bank of 1,404 items from 305 instruments. After qualitative item analysis (including focus groups and cognitive interviewing), 168 items (56 for each construct) were written in a first person, past tense format with a 7-day time frame and five response options reflecting frequency. The calibration sample included nearly 15,000 respondents. Final banks of 28, 29, and 29 items were calibrated for depression, anxiety, and anger, respectively, using item response theory. Test information curves showed that the PROMIS item banks provided more information than conventional measures in a range of severity from approximately -1 to +3 standard deviations (with higher scores indicating greater distress). Short forms consisting of seven to eight items provided information comparable to legacy measures containing more items.
High explosive spot test analyses of samples from Operable Unit (OU) 1111
DOE Office of Scientific and Technical Information (OSTI.GOV)
McRae, D.; Haywood, W.; Powell, J.
1995-01-01
A preliminary evaluation has been completed of environmental contaminants at selected sites within the Group DX-10 (formerly Group M-7) area. Soil samples taken from specific locations at this detonator facility were analyzed for harmful metals and screened for explosives. A sanitary outflow, a burn pit, a pentaerythritol tetranitrate (PETN) production outflow field, an active firing chamber, an inactive firing chamber, and a leach field were sampled. Energy dispersive x-ray fluorescence (EDXRF) was used to obtain semi-quantitative concentrations of metals in the soil. Two field spot-test kits for explosives were used to assess the presence of energetic materials in the soil and in items found at the areas tested. PETN is the major explosive in detonators manufactured and destroyed at Los Alamos. No measurable amounts of PETN or other explosives were detected in the soil, but items taken from the burn area and a high-energy explosive (HE)/chemical sump were contaminated. The concentrations of lead, mercury, and uranium are given.
Saltovic, Ema; Lajnert, Vlatka; Saltovic, Sabina; Kovacevic Pavicic, Daniela; Pavlic, Andrej; Spalj, Stjepan
2018-03-01
Orofacial esthetics raises psychosocial issues. The purpose was to create and validate a new short instrument for measuring the psychosocial impacts of altered smile esthetics. A team of an orthodontist, two prosthodontists, a psychologist, and a dental student generated items that could capture specific hypothetical psychosocial dimensions (69 items initially, 39 in the final analysis). The sample consisted of 261 Caucasian subjects attending local high schools and university (26% male), aged 14 to 28 years, who self-administered the designed questionnaire. Factor analysis, Cronbach's alpha, Pearson correlation, paired samples t-test and analysis of variance were used for analyses of internal consistency, construct validity, responsiveness, and test-retest. Three dimensions of psychosocial impacts of altered smile esthetics were identified: dental self-consciousness, dental self-confidence and social contacts, which are best captured by 12 items, 4 items in each dimension. Internal consistency was good (α in range 0.85-0.89). Good stability in test-retest was confirmed. In responsiveness testing, tooth whitening induced an increase in dental self-confidence (P = 0.002), but no significant changes in other dimensions. The new instrument, Smile Esthetics-Related Quality of Life (SERQoL), is short and has proven to be a good indicator of psychosocial dimensions related to perception of smile esthetics. The Smile Esthetics-Related Quality of Life questionnaire might have practical validity when applied in esthetic dental clinical procedures. © 2017 Wiley Periodicals, Inc.
Development and Validation of Scores from an Instrument Measuring Student Test-Taking Motivation
ERIC Educational Resources Information Center
Eklof, Hanna
2006-01-01
Using the expectancy-value model of achievement motivation as a basis, this study's purpose is to develop, apply, and validate scores from a self-report instrument measuring student test-taking motivation. Sampled evidence of construct validity for the present sample indicates that a number of the items in the instrument could be used as an…
Zimprich, Daniel; Allemand, Mathias; Lachman, Margie E.
2014-01-01
The present study addresses issues of measurement invariance and comparability of factor parameters of Big Five personality adjective items across age. Data from the Midlife in the United States (MIDUS) survey were used to investigate age-related developmental psychometrics of the MIDUS personality adjective items in two large cross-sectional samples (exploratory sample: N = 862; analysis sample: N = 3,000). After having established and replicated a comprehensive five-factor structure of the measure, increasing levels of measurement invariance were tested across ten age groups. Results indicate that the measure demonstrates strict measurement invariance in terms of number of factors and factor loadings. Also, we found that factor variances and covariances were equal across age groups. By contrast, a number of age-related factor mean differences emerged. The practical implications of these results are discussed and future research is suggested. PMID:21910548
Psychometric evaluation of the Dutch version of the Subjective Opiate Withdrawal Scale (SOWS).
Dijkstra, Boukje A G; Krabbe, Paul F M; Riezebos, Truus G M; van der Staak, Cees P F; De Jong, Cor A J
2007-01-01
To evaluate the psychometric properties of the Dutch version of the 16-item Subjective Opiate Withdrawal Scale (SOWS). The SOWS measures withdrawal symptoms at the time of assessment. The Dutch SOWS was repeatedly administered to a sample of 272 opioid-dependent inpatients of four addiction treatment centers during rapid detoxification with or without general anesthesia. Examination of the psychometric properties of the SOWS included exploratory factor analysis, internal consistency, test-retest reliability, and criterion validity. Exploratory factor analysis of the SOWS revealed a general pattern of four factors with three items not always clustered in the same factors at different points of measurement. After excluding these items from factor analysis four factors were identified during detoxification (temperature dysregulation, tractus locomotorius, tractus gastro-intestinalis and facial disinhibition). The 13-item SOWS shows high internal consistency and test-retest reliability and good validity at different stages of withdrawal. The 13-item SOWS is a reliable and valid instrument to assess opioid withdrawal during rapid detoxification. Three items were deleted because their content does not correspond directly with opioid withdrawal symptoms. Copyright (c) 2007 S. Karger AG, Basel.
Berman, Anne H; Liu, Bojing; Ullman, Sara; Jadbäck, Isabel; Engström, Karin
2016-01-01
The KIDSCREEN-27 is a measure of child and adolescent quality of life (QoL), with excellent psychometric properties, available in child-report and parent-rating versions in 38 languages. This study provides child-reported and parent-rated norms for the KIDSCREEN-27 among Swedish 11-16 year-olds, as well as child-parent agreement. Sociodemographic correlates of self-reported wellbeing and parent-rated wellbeing were also measured. A random population sample consisting of 600 children aged 11-16, 100 per age group and one of their parents (N = 1200), were approached for response to self-reported and parent-rated versions of the KIDSCREEN-27. Parents were also asked about their education, employment status and their own QoL based on the 26-item WHOQOL-Bref. Based on the final sampling pool of 1158 persons, a 34.8% response rate of 403 individuals was obtained, including 175 child-parent pairs, 27 child singleton responders and 26 parent singletons. Gender and age differences for parent ratings and child-reported data were analyzed using t-tests and the Mann-Whitney U-test. Post-hoc Dunn tests were conducted for pairwise comparisons when the p-value for specific subscales was 0.05 or lower. Child-parent agreement was tested item-by-item, using the Prevalence- and Bias-Adjusted Kappa (PABAK) coefficient for ordinal data (PABAK-OS); dimensional and total score agreement was evaluated based on dichotomous cut-offs for lower well-being, using the PABAK and total, continuous scores were evaluated using Bland-Altman plots. Compared to European norms, Swedish children in this sample scored lower on Physical wellbeing (48.8 SE/49.94 EU) but higher on the other KIDSCREEN-27 dimensions: Psychological wellbeing (53.4/49.77), Parent relations and autonomy (55.1/49.99), Social Support and peers (54.1/49.94) and School (55.8/50.01). Older children self-reported lower wellbeing than younger children. 
No significant self-reported gender differences occurred and parent ratings showed no gender or age differences. Item-by-item child-parent agreement was slight for 14 items (51.9%), fair for 12 items (44.4%), and less than chance for one item (3.7%), but agreement on all dimensions as well as the total score was substantial according to the PABAK-OS. Visual interpretation of the Bland-Altman plot suggested that when children's average wellbeing score was lower parents seemed to rate their children as having relatively higher total wellbeing, but as children's average wellbeing score increased, parents tended to rate their children as having relatively lower total wellbeing. Children living with both parents had higher wellbeing than those who lived with only one parent. Results agreed with European findings that adolescent wellbeing decreases with age but contrasted with some prior Swedish research identifying better wellbeing for boys on all dimensions but Social support and peers. The study suggests the importance of considering children's own reports and not only parental or other informant ratings. Future research should be conducted at regular intervals and encompass larger samples.
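The agreement statistic named in this abstract can be illustrated with a minimal sketch. It assumes the standard unweighted PABAK form for k categories, (k * p_o - 1) / (k - 1), which reduces to 2 * p_o - 1 for the dichotomous cut-offs mentioned above. The ordinal variant (PABAK-OS) applies weights that are not reproduced here, and the function name and example ratings are hypothetical, not the study's data.

```python
def pabak(rater_a, rater_b, n_categories=2):
    """Prevalence- and bias-adjusted kappa for two raters (unweighted form).

    Assumption: PABAK = (k * p_o - 1) / (k - 1), where p_o is the observed
    proportion of exact agreement and k is the number of rating categories.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("rating lists must be non-empty and of equal length")
    if n_categories < 2:
        raise ValueError("need at least two categories")
    # Observed agreement: share of pairs with identical ratings.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)
    return (n_categories * p_o - 1) / (n_categories - 1)


# Hypothetical dichotomised ratings (1 = lower wellbeing, 0 = not lower).
child_ratings = [1, 0, 1, 1]
parent_ratings = [1, 0, 0, 1]
example_pabak = pabak(child_ratings, parent_ratings)
```

With three of four pairs agreeing, p_o is 0.75 and the dichotomous PABAK is 0.5.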
Influence of item distribution pattern and abundance on efficiency of benthic core sampling
Behney, Adam C.; O'Shaughnessy, Ryan; Eichholz, Michael W.; Stafford, Joshua D.
2014-01-01
Core sampling is a commonly used method to estimate benthic item density, but little information exists about factors influencing the accuracy and time-efficiency of this method. We simulated core sampling in a Geographic Information System framework by generating points (benthic items) and polygons (core samplers) to assess how sample size (number of core samples), core sampler size (cm2), distribution of benthic items, and item density affected the bias and precision of estimates of density, the detection probability of items, and the time-costs. When items were distributed randomly versus clumped, bias decreased and precision increased with increasing sample size and increased slightly with increasing core sampler size. Bias and precision were only affected by benthic item density at very low values (500–1,000 items/m2). Detection probability (the probability of capturing ≥ 1 item in a core sample if it is available for sampling) was substantially greater when items were distributed randomly as opposed to clumped. Taking more small diameter core samples was always more time-efficient than taking fewer large diameter samples. We are unable to present a single, optimal sample size, but provide information for researchers and managers to derive optimal sample sizes dependent on their research goals and environmental conditions.
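The simulation idea described in this abstract (points as benthic items, polygons as core samplers) can be sketched in a few lines. This is a simplified illustration, not the authors' GIS workflow: it assumes a square plot, circular cores placed uniformly at random and fully inside the plot, and a purely random (non-clumped) item distribution. All function and parameter names are hypothetical.

```python
import math
import random


def simulate_core_sampling(n_items, side_m, core_radius_m, n_cores, rng):
    """Return (estimated items per m^2, detection probability) for one run."""
    # Scatter items uniformly at random over a side_m x side_m plot.
    items = [(rng.uniform(0, side_m), rng.uniform(0, side_m))
             for _ in range(n_items)]
    core_area = math.pi * core_radius_m ** 2
    counts = []
    for _ in range(n_cores):
        # Keep each core fully inside the plot to avoid edge effects.
        cx = rng.uniform(core_radius_m, side_m - core_radius_m)
        cy = rng.uniform(core_radius_m, side_m - core_radius_m)
        inside = sum(1 for x, y in items
                     if (x - cx) ** 2 + (y - cy) ** 2 <= core_radius_m ** 2)
        counts.append(inside)
    # Density estimate: total items captured divided by total area sampled.
    density = sum(counts) / (n_cores * core_area)
    # Detection probability: share of cores that captured at least one item.
    detection = sum(1 for c in counts if c > 0) / n_cores
    return density, detection


# Hypothetical run: 2,000 items on a 1 m^2 plot, 200 cores of 5 cm radius.
density_est, detection_est = simulate_core_sampling(
    n_items=2000, side_m=1.0, core_radius_m=0.05, n_cores=200,
    rng=random.Random(42))
```

Repeating such runs over different core counts and radii gives rough bias and precision curves. A clumped distribution would need a different item generator, which this sketch omits.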
Oude Voshaar, Martijn Ah; Ten Klooster, Peter M; Taal, Erik; Krishnan, Eswar; van de Laar, Mart Afj
2012-03-05
Patient-reported physical function is an established outcome domain in clinical studies in rheumatology. To overcome the limitations of the current generation of questionnaires, the Patient-Reported Outcomes Measurement Information System (PROMIS®) project in the USA has developed calibrated item banks for measuring several domains of health status in people with a wide range of chronic diseases. The aim of this study was to translate and cross-culturally adapt the PROMIS physical function item bank to the Dutch language and to pretest it in a sample of patients with arthritis. The items of the PROMIS physical function item bank were translated using rigorous forward-backward protocols and the translated version was subsequently cognitively pretested in a sample of Dutch patients with rheumatoid arthritis. Few issues were encountered in the forward-backward translation. Only 5 of the 124 items to be translated had to be rewritten because of culturally inappropriate content. Subsequent pretesting showed that overall, questions of the Dutch version were understood as they were intended, while only one item required rewriting. Results suggest that the translated version of the PROMIS physical function item bank is semantically and conceptually equivalent to the original. Future work will be directed at creating a Dutch-Flemish final version of the item bank to be used in research with Dutch speaking populations.
The development and psychometric validation of the Ethical Awareness Scale.
Milliken, Aimee; Ludlow, Larry; DeSanto-Madeya, Susan; Grace, Pamela
2018-04-19
To develop and psychometrically assess the Ethical Awareness Scale using Rasch measurement principles and a Rasch item response theory model. Critical care nurses must be equipped to provide good (ethical) patient care. This requires ethical awareness, which involves recognizing the ethical implications of all nursing actions. Ethical awareness is imperative in successfully addressing patient needs. Evidence suggests that the ethical import of everyday issues may often go unnoticed by nurses in practice. Assessing nurses' ethical awareness is a necessary first step in preparing nurses to identify and manage ethical issues in the highly dynamic critical care environment. A cross-sectional design was used in two phases of instrument development. Using Rasch principles, an item bank representing nursing actions was developed (33 items). Content validity testing was performed. Eighteen items were selected for face validity testing. Two rounds of operational testing were performed with critical care nurses in Boston between February-April 2017. A Rasch analysis suggests sufficient item invariance across samples and sufficient construct validity. The analysis further demonstrates a progression of items uniformly along a hierarchical continuum; items that match respondent ability levels; response categories that are sufficiently used; and adequate internal consistency. Mean ethical awareness scores were in the low/moderate range. The results suggest the Ethical Awareness Scale is a psychometrically sound, reliable and valid measure of ethical awareness in critical care nurses. © 2018 John Wiley & Sons Ltd.
Hierarchical screening for multiple mental disorders.
Batterham, Philip J; Calear, Alison L; Sunderland, Matthew; Carragher, Natacha; Christensen, Helen; Mackinnon, Andrew J
2013-10-01
There is a need for brief, accurate screening when assessing multiple mental disorders. Two-stage hierarchical screening, consisting of brief pre-screening followed by a battery of disorder-specific scales for those who meet diagnostic criteria, may increase the efficiency of screening without sacrificing precision. This study tested whether more efficient screening could be gained using two-stage hierarchical screening than by administering multiple separate tests. Two Australian adult samples (N=1990) with high rates of psychopathology were recruited using Facebook advertising to examine four methods of hierarchical screening for four mental disorders: major depressive disorder, generalised anxiety disorder, panic disorder and social phobia. Using K6 scores to determine whether full screening was required did not increase screening efficiency. However, pre-screening based on two decision tree approaches or item gating led to considerable reductions in the mean number of items presented per disorder screened, with estimated item reductions of up to 54%. The sensitivity of these hierarchical methods approached 100% relative to the full screening battery. Further testing of the hierarchical screening approach based on clinical criteria and in other samples is warranted. The results demonstrate that a two-phase hierarchical approach to screening multiple mental disorders leads to considerable efficiency gains without reducing accuracy. Screening programs should take advantage of prescreeners based on gating items or decision trees to reduce the burden on respondents. © 2013 Elsevier B.V. All rights reserved.
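The item-gating idea can be made concrete with a small sketch. It assumes each disorder scale has a single gate item that is always asked, and that the rest of the scale is presented only when the gate response reaches a threshold. The scale lengths, threshold, and names below are hypothetical and are not taken from the study.

```python
def items_presented(gate_responses, scale_lengths, gate_threshold=1):
    """Count items shown to one respondent under item gating.

    Assumption: the gate item is the first item of its own scale, so a
    positive gate adds the remaining (length - 1) items of that scale.
    """
    total = 0
    for disorder, gate_score in gate_responses.items():
        total += 1  # the gate item is always administered
        if gate_score >= gate_threshold:
            total += scale_lengths[disorder] - 1
    return total


# Hypothetical scale lengths and one respondent's gate-item scores.
scale_lengths = {"depression": 9, "gad": 7, "panic": 5, "social_phobia": 6}
gate_responses = {"depression": 2, "gad": 0, "panic": 0, "social_phobia": 1}

shown = items_presented(gate_responses, scale_lengths)
full_battery = sum(scale_lengths.values())
item_reduction = 1 - shown / full_battery
```

For this respondent, 17 of 27 items are shown, a reduction of about 37%. Averaging `item_reduction` over a sample gives the kind of efficiency figure the abstract reports.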
Duncan, Mitch J; Rashid, Mahbub; Vandelanotte, Corneel; Cutumisu, Nicoleta; Plotnikoff, Ronald C
2013-02-04
Spatial configurations of office environments assessed by Space Syntax methodologies are related to employee movement patterns. These methods require analysis of floor plans, which are not readily available in large population-based studies or are otherwise unavailable. Therefore a self-report instrument to assess spatial configurations of office environments using four scales was developed. The scales are: local connectivity (16 items), overall connectivity (11 items), visibility of co-workers (10 items), and proximity of co-workers (5 items). A panel cohort (N = 1154) completed an online survey; only data from individuals employed in office-based occupations (n = 307) were used to assess scale measurement properties. To assess test-retest reliability a separate sample of 37 office-based workers completed the survey on two occasions 7.7 (±3.2) days apart. Redundant scale items were eliminated using factor analysis; Cronbach's α was used to evaluate internal consistency, and the intraclass correlation coefficient (retest-ICC) was used to evaluate test-retest reliability. ANOVA was employed to examine differences between office types (Private, Shared, Open) as a measure of construct validity. Generalized Linear Models were used to examine relationships between spatial configuration scales and the duration of and frequency of breaks in occupational sitting. The number of items on all scales was reduced; Cronbach's α and ICCs indicated good scale internal consistency and test-retest reliability: local connectivity (5 items; α = 0.70; retest-ICC = 0.84), overall connectivity (6 items; α = 0.86; retest-ICC = 0.87), visibility of co-workers (4 items; α = 0.78; retest-ICC = 0.86), and proximity of co-workers (3 items; α = 0.85; retest-ICC = 0.70). Significant (p ≤ 0.001) differences, in theoretically expected directions, were observed for all scales between office types, except overall connectivity. Significant associations were observed between all scales and occupational sitting behaviour (p ≤ 0.05). 
All scales have good measurement properties indicating the instrument may be a useful alternative to Space Syntax to examine environmental correlates of occupational sitting in population surveys.
2013-01-01
Background Spatial configurations of office environments assessed by Space Syntax methodologies are related to employee movement patterns. These methods require analysis of floor plans, which are not readily available in large population-based studies or are otherwise unavailable. Therefore a self-report instrument to assess spatial configurations of office environments using four scales was developed. Methods The scales are: local connectivity (16 items), overall connectivity (11 items), visibility of co-workers (10 items), and proximity of co-workers (5 items). A panel cohort (N = 1154) completed an online survey; only data from individuals employed in office-based occupations (n = 307) were used to assess scale measurement properties. To assess test-retest reliability a separate sample of 37 office-based workers completed the survey on two occasions 7.7 (±3.2) days apart. Redundant scale items were eliminated using factor analysis; Cronbach’s α was used to evaluate internal consistency, and the intraclass correlation coefficient (retest-ICC) was used to evaluate test-retest reliability. ANOVA was employed to examine differences between office types (Private, Shared, Open) as a measure of construct validity. Generalized Linear Models were used to examine relationships between spatial configuration scales and the duration of and frequency of breaks in occupational sitting. Results The number of items on all scales was reduced; Cronbach’s α and ICCs indicated good scale internal consistency and test-retest reliability: local connectivity (5 items; α = 0.70; retest-ICC = 0.84), overall connectivity (6 items; α = 0.86; retest-ICC = 0.87), visibility of co-workers (4 items; α = 0.78; retest-ICC = 0.86), and proximity of co-workers (3 items; α = 0.85; retest-ICC = 0.70). Significant (p ≤ 0.001) differences, in theoretically expected directions, were observed for all scales between office types, except overall connectivity. Significant associations were observed between all scales and occupational sitting behaviour (p ≤ 0.05). 
Conclusion All scales have good measurement properties indicating the instrument may be a useful alternative to Space Syntax to examine environmental correlates of occupational sitting in population surveys. PMID:23379485
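The internal-consistency figures reported in this abstract are Cronbach's α values. As a rough illustration of how such a coefficient is computed from item-level responses, here is a minimal sketch using only the Python standard library. The response matrix is hypothetical and is not data from the study; the function name `cronbach_alpha` is likewise an assumption made for this example.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item across respondents
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    # Variance of the summed scale score across respondents
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical responses: 5 respondents x 3 items (illustration only)
responses = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [1, 2, 1],
    [3, 3, 3],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 3))
```

Population variances are used for both the items and the total; using sample variances throughout would give the same ratio and therefore the same α.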
Sandilos, Lia E.; Lewis, Kandia; Komaroff, Eugene; Hammer, Carol Scheffner; Scarpino, Shelley E.; Lopez, Lisa; Rodriguez, Barbara; Goldstein, Brian
2015-01-01
The purpose of this study was to investigate the way in which items on the Woodcock-Muñoz Language Survey Revised (WMLS-R) Spanish and English versions function for bilingual children from different ethnic subgroups who speak different dialects of Spanish. Using data from a sample of 324 bilingual Hispanic families and their children living on the United States mainland, differential item functioning (DIF) was conducted to determine if test items in English and Spanish functioned differently for Mexican, Cuban, and Puerto Rican bilingual children. Data on child and parent language characteristics and children’s scores on Picture Vocabulary and Story Recall subtests in English and Spanish were collected. DIF was not detected for items on the Spanish subtests. Results revealed that some items on English subtests displayed statistically and practically significant DIF. The findings indicate that there are differences in the difficulty level of WMLS-R English-form test items depending on the examinees’ ethnic subgroup membership. This outcome suggests that test developers need to be mindful of potential differences in performance based on ethnic subgroup and dialect when developing standardized language assessments that may be administered to bilingual students. PMID:26705400
The development of Metacognition test in genetics laboratory for undergraduate students
NASA Astrophysics Data System (ADS)
A-nongwech, Nattapong; Pruekpramool, Chaninan
2018-01-01
The purpose of this research was to develop a Metacognition test in a Genetics Laboratory for undergraduate students. The participants were 30 undergraduate students of a Rajabhat university in the Rattanakosin group in the second semester of the 2016 academic year, selected by purposive sampling. The research instruments consisted of 1) the Metacognition test and 2) a Metacognition test evaluation form for experts, focused on three main points: the accuracy of the content, the consistency between Metacognitive experiences and questions, and the appropriateness of the test. The quality of the test was analyzed using the Index of Consistency (IOC), discrimination, and reliability. The results were as follows. 1) The Metacognition test contained 56 open-ended questions. The test was composed of a) four scientific situations; b) fourteen open-ended questions in each scientific situation for evaluating the components of Metacognition, which consisted of Metacognitive knowledge (divided into person knowledge, task knowledge, and strategy knowledge) and Metacognitive experience (divided into planning, monitoring, and evaluating); and c) fourteen items of scoring criteria divided into four scales. 2) The item analysis found that the Index of Consistency between Metacognitive experiences and questions ranged from 0.75 to 1.00. The accuracy of content equaled 1.00. The appropriateness of the test equaled 1.00 in all situations and items. The discrimination of the test ranged from 0.00 to 0.73. Furthermore, the reliability of the test equaled 0.97.
Development of six PROMIS pediatrics proxy-report item banks
2012-01-01
Background Pediatric self-report should be considered the standard for measuring patient reported outcomes (PRO) among children. However, circumstances exist when the child is too young, cognitively impaired, or too ill to complete a PRO instrument and a proxy-report is needed. This paper describes the development process including the proxy cognitive interviews and large-field-test survey methods and sample characteristics employed to produce item parameters for the Patient Reported Outcomes Measurement Information System (PROMIS) pediatric proxy-report item banks. Methods The PROMIS pediatric self-report items were converted into proxy-report items before undergoing cognitive interviews. These items covered six domains (physical function, emotional distress, social peer relationships, fatigue, pain interference, and asthma impact). Caregivers (n = 25) of children between the ages of 5 and 17 years provided qualitative feedback on proxy-report items to assess any major issues with these items. From May 2008 to March 2009, the large-scale survey enrolled children ages 8-17 years to complete the self-report version and caregivers to complete the proxy-report version of the survey (n = 1548 dyads). Caregivers of children ages 5 to 7 years completed the proxy report survey (n = 432). In addition, caregivers completed other proxy instruments: PedsQL™ 4.0 Generic Core Scales Parent Proxy-Report version, PedsQL™ Asthma Module Parent Proxy-Report version, and KIDSCREEN Parent-Proxy-52. Results Item content was well understood by proxies and did not require item revisions, but some proxies clearly noted that determining an answer on behalf of their child was difficult for some items. Dyads and caregivers of children ages 5-17 years old were enrolled in the large-scale testing. The majority were female (85%), married (70%), Caucasian (64%) and had at least a high school education (94%). 
Approximately 50% had children with a chronic health condition, primarily asthma, which was diagnosed or treated within 6 months prior to the interview. The PROMIS proxy sample scored similar or better on the other proxy instruments compared to normative samples. Conclusions The initial calibration data was provided by a diverse set of caregivers of children with a variety of common chronic illnesses and racial/ethnic backgrounds. The PROMIS pediatric proxy-report item banks include physical function (mobility n = 23; upper extremity n = 29), emotional distress (anxiety n = 15; depressive symptoms n = 14; anger n = 5), social peer relationships (n = 15), fatigue (n = 34), pain interference (n = 13), and asthma impact (n = 17). PMID:22357192
The CAT: A Gender-Inclusive Measure of Controlling and Abusive Tactics.
Hamel, John; Jones, Daniel N; Dutton, Donald G; Graham-Kevan, Nicola
2015-01-01
Research has consistently found that partner violence, defined as physical abuse between married, cohabitating, or dating partners, is not the only type of abuse with long-term deleterious effects on victims. Male and female victims alike report that emotional abuse, along with controlling behaviors, are often as or more traumatic. Existing instruments used to measure emotional abuse and control have either been limited to male-perpetrated behaviors, as conceived in the well-known Duluth "Power and Control" wheel, or field tested on dating or general population samples. This study discusses the genesis and evolution of a gender-inclusive instrument, the Controlling and Abusive Tactics (CAT) Questionnaire, which was field tested on males and females with both a clinical and general population sample. For perpetration, a preliminary comparison across gender found no significant differences across gender for the great majority of items, with women reporting significantly higher rates on 9 items, and men reporting significantly higher rates on 6 items. Women reported higher rates of received abuse than men on 28 of 30 items in which gender differences were found to be significant, but both males and females reported higher victimization than perpetration rates on all items. Exploratory and confirmatory factor analyses resulted in the CAT-2, a valid and reliable instrument appropriate for clinical use by treatment providers as well as for research purposes.
Rosenfeld, Barry; Pessin, Hayley; Lewis, Charles; Abbey, Jennifer; Olden, Megan; Sachs, Emily; Amakawa, Lia; Kolva, Elissa; Brescia, Robert; Breitbart, William
2013-01-01
Hopelessness has become an increasingly important construct in palliative care research, yet concerns exist regarding the utility of existing measures when applied to patients with a terminal illness. This article describes a series of studies focused on the exploration, development, and analysis of a measure of hopelessness specifically intended for use with terminally ill cancer patients. The 1st stage of measure development involved interviews with 13 palliative care experts and 30 terminally ill patients. Qualitative analysis of the patient interviews culminated in the development of a set of potential questionnaire items. In the 2nd study phase, we evaluated these preliminary items with a sample of 314 participants, using item response theory and classical test theory to identify optimal items and response format. These analyses generated an 8-item measure that we tested in a final study phase, using a 3rd sample (n = 228) to assess reliability and concurrent validity. These analyses demonstrated strong support for the Hopelessness Assessment in Illness Questionnaire providing greater explanatory power than existing measures of hopelessness and found little evidence that this assessment was confounded by illness-related variables (e.g., prognosis). In summary, these 3 studies suggest that this brief measure of hopelessness is particularly useful for palliative care settings. Further research is needed to assess the applicability of the measure to other populations and contexts. PMID:21443366
Yount, Kathryn M; VanderEnde, Kristin; Zureick-Brown, Sarah; Minh, Tran Hung; Schuler, Sidney Ruth; Anh, Hoang Tu
2014-06-01
Attitudes about intimate partner violence (IPV) against women are widely surveyed, but attitudes about women's recourse after exposure to IPV are understudied, despite their importance for intervention. Designed through qualitative research and administered in a probability sample of 1,054 married men and women 18 to 50 years in My Hao District, Vietnam, the ATT-RECOURSE scale measures men's and women's attitudes about a wife's recourse after exposure to physical IPV. Data were initially collected for nine items. Exploratory factor analysis (EFA) with one random split-half sample (N₁ = 526) revealed a one-factor model with significant loadings (0.316-0.686) for six items capturing a wife's silence, informal recourse, and formal recourse. A confirmatory factor analysis (CFA) with the other random split-half sample (N₂ = 528) showed adequate fit for the six-item model and significant factor loadings of similar magnitude to the EFA results (0.412-0.669). For the six items retained, men consistently favored recourse more often than did women (52.4%-66.0% of men vs. 41.9%-55.2% of women). Tests for uniform differential item functioning (DIF) by gender revealed one item with significant uniform DIF, and adjusting for this revealed an even larger gap in men's and women's attitudes, with men favoring recourse, on average, more than women. The six-item ATT-RECOURSE scale is reliable across independent samples and exhibits little uniform DIF by gender, supporting its use in surveys of men and women. Further methodological research is discussed. Research is needed in Vietnam about why women report less favorable attitudes than men regarding women's recourse after physical IPV.
Crogan, Neva L; Evans, Bronwynne C
2006-11-01
Lack of nursing home resident satisfaction with meals often results in reduced food intake, leading to poor nutritional status, weight loss, functional decline, and depression. The purpose of this article is to describe the development and initial testing of the 28-item revised Food Expectations-Long-Term Care (FoodEx-LTC) questionnaire with a convenience sample of nursing home residents (N = 61). Because of possible respondent burden, the original 44-item, five-domain FoodEx-LTC was revised, resulting in the deletion of 16 redundant items and those with inter-item correlations less than .25. Coefficient alpha scores ranged from .65 to .82, and test-retest correlations ranged from .79 to .88, dependent on domain. This revised instrument has good initial validity and reliability, resulting in a shorter instrument that accurately assesses nursing home resident satisfaction with food and food service.
Gavett, Brandon E; Horwitz, Julie E
2012-03-01
The serial position effect shows that two interrelated cognitive processes underlie immediate recall of a supraspan word list. The current study used item response theory (IRT) methods to determine whether the serial position effect poses a threat to the construct validity of immediate list recall as a measure of verbal episodic memory. Archival data were obtained from a national sample of 4,212 volunteers aged 28-84 in the Midlife Development in the United States study. Telephone assessment yielded item-level data for a single immediate recall trial of the Rey Auditory Verbal Learning Test (RAVLT). Two parameter logistic IRT procedures were used to estimate item parameters and the Q(1) statistic was used to evaluate item fit. A two-dimensional model better fit the data than a unidimensional model, supporting the notion that list recall is influenced by two underlying cognitive processes. IRT analyses revealed that 4 of the 15 RAVLT items (1, 12, 14, and 15) were misfit (p < .05). Item characteristic curves for items 14 and 15 decreased monotonically, implying an inverse relationship between the ability level and the probability of recall. Elimination of the four misfit items provided better fit to the data and met necessary IRT assumptions. Performance on a supraspan list learning test is influenced by multiple cognitive abilities; failure to account for the serial position of words decreases the construct validity of the test as a measure of episodic memory and may provide misleading results. IRT methods can ameliorate these problems and improve construct validity.
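The two-parameter logistic (2PL) model named in this abstract gives the probability of a correct response (here, recalling a word) as a logistic function of ability θ, with a discrimination parameter a and a difficulty parameter b. The sketch below is only an illustration of that functional form; the parameter values are hypothetical and are not estimates from the study. It also shows one way a monotonically decreasing item characteristic curve can arise under this parameterisation, namely a negative discrimination; the abstract does not state the actual estimates for items 14 and 15.

```python
import math


def icc_2pl(theta, a, b):
    """2PL item characteristic curve: P(correct | theta) with discrimination a, difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


thetas = [-2, -1, 0, 1, 2]

# Hypothetical well-behaved item: probability rises with ability
typical = [icc_2pl(t, a=1.2, b=0.0) for t in thetas]

# Hypothetical item with negative discrimination: probability falls as ability rises
reversed_item = [icc_2pl(t, a=-0.8, b=0.0) for t in thetas]

for t, p_up, p_down in zip(thetas, typical, reversed_item):
    print(f"theta={t:+d}  typical={p_up:.3f}  reversed={p_down:.3f}")
```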
ERIC Educational Resources Information Center
Colorado State Dept. of Education, Denver.
This document contains released reading comprehension passages, test items, and writing prompts from the Colorado Student Assessment Program for 2001. The sample questions and prompts are included without answers or examples of student responses. Test materials are included for: (1) Grade 4 Reading and Writing; (2) Grade 4 Lectura y Escritura…
Herschbach, Peter; Berg, Petra; Dankert, Andrea; Duran, Gabriele; Engst-Hastreiter, Ursula; Waadt, Sabine; Keller, Monika; Ukat, Robert; Henrich, Gerhard
2005-06-01
The aim of this study was the development and psychometric testing of a new psychological questionnaire to measure the fear of progression (FoP) in chronically ill patients (cancer, diabetes mellitus and rheumatic diseases). The Fear of Progression Questionnaire (FoP-Q) was developed in four phases: (1) generation of items (65 interviews); (2) reduction of items--the initial version of the questionnaire (87 items) was presented to 411 patients, to construct subscales and test the reliability; (3) testing the convergent and discriminative validity of the reduced test version (43 items) within a new sample (n=439); (4) translation--German to English. The scale comprised five factors (Cronbach's alpha >.70): affective reactions (13 items), partnership/family (7), occupation (7), loss of autonomy (7) and coping with anxiety (9). The test-retest reliability coefficients varied between .77 and .94. There was only a medium relationship to traditional anxiety scales. This is an indication of the independence of the FoP. Significant relationships between the FoP-Q and the patient's illness behaviour indicate discriminative validity. The FoP-Q is a new and unique questionnaire developed for the chronically ill. A major problem and source of stress for this patient group has been measuring both specifically and economically the FoP of an illness. The FoP-Q was designed to resolve this problem, fulfill this need and reduce this stress.
Measurement properties of the Spinal Cord Injury-Functional Index (SCI-FI) short forms.
Heinemann, Allen W; Dijkers, Marcel P; Ni, Pengsheng; Tulsky, David S; Jette, Alan
2014-07-01
To evaluate the psychometric properties of the Spinal Cord Injury-Functional Index (SCI-FI) short forms (basic mobility, self-care, fine motor, ambulation, manual wheelchair, and power wheelchair) based on internal consistency; correlations between short forms banks, full item bank forms, and a 10-item computer adaptive test version; magnitude of ceiling and floor effects; and test information functions. Cross-sectional cohort study. Six rehabilitation hospitals in the United States. Individuals with traumatic spinal cord injury (N=855) recruited from 6 national Spinal Cord Injury Model Systems facilities. Not applicable. SCI-FI full item bank, 10-item computer adaptive test, and parallel short form scores. The SCI-FI short forms (with separate versions for individuals with paraplegia and tetraplegia) demonstrate very good internal consistency, group-level reliability, excellent correlations between short forms and scores based on the total item bank, and minimal ceiling and floor effects (except ceiling effects for persons with paraplegia on self-care, fine motor, and power wheelchair ability and floor effects for persons with tetraplegia on self-care, fine motor, and manual wheelchair ability). The test information functions are acceptable across the range of scores where most persons in the sample performed. Clinicians and researchers should consider the SCI-FI short forms when computer adaptive testing is not feasible. Copyright © 2014 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
Validity and Reliability of the Turkish Chronic Pain Acceptance Questionnaire
Akmaz, Hazel Ekin; Uyar, Meltem; Kuzeyli Yıldırım, Yasemin; Akın Korhan, Esra
2018-05-29
Pain acceptance is the process of giving up the struggle with pain and learning to live a worthwhile life despite it. In assessing patients with chronic pain in Turkey, making a diagnosis and tracking the effectiveness of treatment is done with scales that have been translated into Turkish. However, there is as yet no valid and reliable scale in Turkish to assess the acceptance of pain. To validate a Turkish version of the Chronic Pain Acceptance Questionnaire developed by McCracken and colleagues. Methodological and cross sectional study. A simple randomized sampling method was used in selecting the study sample. The sample was composed of 201 patients, more than 10 times the number of items examined for validity and reliability in the study, which totaled 20. A patient identification form, the Chronic Pain Acceptance Questionnaire, and the Brief Pain Inventory were used to collect data. Data were collected by face-to-face interviews. In the validity testing, the content validity index was used to evaluate linguistic equivalence, content validity, construct validity, and expert views. In reliability testing of the scale, Cronbach’s α coefficient was calculated, and item analysis and split-test reliability methods were used. Principal component analysis and varimax rotation were used in factor analysis and to examine factor structure for construct concept validity. The item analysis established that the scale, all items, and item-total correlations were satisfactory. The mean total score of the scale was 21.78. The internal consistency coefficient was 0.94, and the correlation between the two halves of the scale was 0.89. The Chronic Pain Acceptance Questionnaire, which is intended to be used in Turkey upon confirmation of its validity and reliability, is an evaluation instrument with sufficient validity and reliability, and it can be reliably used to examine patients’ acceptance of chronic pain.
Validity and Reliability of the Turkish Chronic Pain Acceptance Questionnaire
Akmaz, Hazel Ekin; Uyar, Meltem; Kuzeyli Yıldırım, Yasemin; Akın Korhan, Esra
2018-01-01
Background: Pain acceptance is the process of giving up the struggle with pain and learning to live a worthwhile life despite it. In assessing patients with chronic pain in Turkey, making a diagnosis and tracking the effectiveness of treatment is done with scales that have been translated into Turkish. However, there is as yet no valid and reliable scale in Turkish to assess the acceptance of pain. Aims: To validate a Turkish version of the Chronic Pain Acceptance Questionnaire developed by McCracken and colleagues. Study Design: Methodological and cross sectional study. Methods: A simple randomized sampling method was used in selecting the study sample. The sample was composed of 201 patients, more than 10 times the number of items examined for validity and reliability in the study, which totaled 20. A patient identification form, the Chronic Pain Acceptance Questionnaire, and the Brief Pain Inventory were used to collect data. Data were collected by face-to-face interviews. In the validity testing, the content validity index was used to evaluate linguistic equivalence, content validity, construct validity, and expert views. In reliability testing of the scale, Cronbach’s α coefficient was calculated, and item analysis and split-test reliability methods were used. Principal component analysis and varimax rotation were used in factor analysis and to examine factor structure for construct concept validity. Results: The item analysis established that the scale, all items, and item-total correlations were satisfactory. The mean total score of the scale was 21.78. The internal consistency coefficient was 0.94, and the correlation between the two halves of the scale was 0.89. Conclusion: The Chronic Pain Acceptance Questionnaire, which is intended to be used in Turkey upon confirmation of its validity and reliability, is an evaluation instrument with sufficient validity and reliability, and it can be reliably used to examine patients’ acceptance of chronic pain. 
PMID:29843496
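The abstract reports a correlation of 0.89 between the two halves of the scale. A common way to project a half-test correlation to the reliability of the full-length scale is the Spearman-Brown formula, r_full = 2r / (1 + r). The abstract does not say whether the authors applied this correction, so the sketch below is only an illustration of the arithmetic under that assumption; the function name is hypothetical.

```python
def spearman_brown(r_half):
    """Project a correlation between two half-tests to full-length reliability."""
    return 2 * r_half / (1 + r_half)


# 0.89 is the half-to-half correlation quoted in the abstract
full_length = spearman_brown(0.89)
print(round(full_length, 3))
```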
ERIC Educational Resources Information Center
Cadet, Myriam Jean
2016-01-01
This study investigated the relationship between readiness to learn and self-efficacy among newly enrolled BSN students in an online program. A sample of 27 students completed the 45-item Test of Online Learning Success (ToOLS) and 10-item General Self Efficacy (GSE) scales via Survey Monkey. Knowles' (1980) adult learning theory and Bandura's…
ERIC Educational Resources Information Center
Aguado, Jaume; Campbell, Alistair; Ascaso, Carlos; Navarro, Purificacion; Garcia-Esteve, Lluisa; Luciano, Juan V.
2012-01-01
In this study, the authors tested alternative factor models of the 12-item General Health Questionnaire (GHQ-12) in a sample of Spanish postpartum women, using confirmatory factor analysis. The authors report the results of modeling three different methods for scoring the GHQ-12 using estimation methods recommended for categorical and binary data.…
Testing of the SEE and OEE post-hip fracture.
Resnick, Barbara; Orwig, Denise; Zimmerman, Sheryl; Hawkes, William; Golden, Justine; Werner-Bronzert, Michelle; Magaziner, Jay
2006-08-01
The purpose of this study was to test the reliability and validity of the Self-Efficacy for Exercise (SEE) and the Outcome Expectations for Exercise (OEE) scales in a sample of 166 older women post-hip fracture. There was some evidence of validity of the SEE and OEE based on confirmatory factor analysis and Rasch model testing, criterion-based and convergent validity, and evidence of internal consistency based on alpha coefficients and separation indices and reliability based on R² estimates. Rasch model testing demonstrated that some items had high variability. Based on these findings, suggestions are made for how items could be revised and the scales improved for future use.
Yau, David T W; Wong, May C M; Lam, K F; McGrath, Colman
2015-08-19
The four-factor structure of the two 8-item short forms of the Child Perceptions Questionnaire CPQ11-14 (RSF:8 and ISF:8) has been confirmed. However, the sum scores are typically reported in practice as a proxy of Oral health-related Quality of Life (OHRQoL), which implies a unidimensional structure. This study first assessed the unidimensionality of the 8-item short forms of CPQ11-14. Item response theory (IRT) was employed to offer an alternative and complementary approach of validation and to overcome the limitations of classical test theory assumptions. A random sample of 649 12-year-old school children in Hong Kong was analyzed. Unidimensionality of the scale was tested by confirmatory factor analysis (CFA), principal component analysis (PCA) and the local dependency (LD) statistic. A graded response model was fitted to the data. Contribution of each item to the scale was assessed by the item information function (IIF). Reliability of the scale was assessed by the test information function (TIF). Differential item functioning (DIF) across gender was identified by Wald test and expected score functions. Both CPQ11-14 RSF:8 and ISF:8 did not deviate much from the unidimensionality assumption. Results from CFA indicated acceptable fit of the one-factor model. PCA indicated that the first principal component explained >30% of the total variation with high factor loadings for both RSF:8 and ISF:8. Almost all LD statistics were <10, indicating the absence of local dependency. Flat and low IIFs were observed in the oral symptoms items, suggesting little contribution of information to the scale, and item removal caused little practical impact. Comparing the TIFs, RSF:8 showed slightly better information than ISF:8. In addition to oral symptoms items, the item "Concerned with what other people think" demonstrated a uniform DIF (p < 0.001). The expected score functions were not much different between boys and girls. 
Items related to oral symptoms were not informative to OHRQoL and deletion of these items is suggested. The impact of DIF across gender on the overall score was minimal. CPQ11-14 RSF:8 performed slightly better than ISF:8 in measurement precision. The 6-item short forms suggested by IRT validation should be further investigated to ensure their robustness, responsiveness and discriminative performance.
The Curiosity and Exploration Inventory-II: Development, Factor Structure, and Psychometrics
Kashdan, Todd B.; Gallagher, Matthew W.; Silvia, Paul J.; Winterstein, Beate P.; Breen, William E.; Terhar, Daniel; Steger, Michael F.
2009-01-01
Given curiosity’s fundamental role in motivation, learning, and well-being, we sought to refine the measurement of trait curiosity with an improved version of the Curiosity and Exploration Inventory (CEI; Kashdan, Rose, & Fincham, 2004). A preliminary pool of 36 items was administered to 311 undergraduate students, who also completed measures of emotion, emotion regulation, personality, and well-being. Factor analyses indicated a two factor model—motivation to seek out knowledge and new experiences (Stretching; 5 items) and a willingness to embrace the novel, uncertain, and unpredictable nature of everyday life (Embracing; 5 items). In two additional samples (ns = 150 and 119), we cross-validated this factor structure and provided initial evidence for construct validity. This includes positive correlations with personal growth, openness to experience, autonomy, purpose in life, self-acceptance, psychological flexibility, positive affect, and positive social relations, among others. Applying item response theory (IRT) to these samples (n = 578), we showed that the items have good discrimination and a desirable breadth of difficulty. The item information functions and test information function were centered near zero, indicating that the scale assesses the mid-range of the latent curiosity trait most reliably. The findings thus far provide good evidence for the psychometric properties of the 10-item CEI-II. PMID:20160913
Lambert, Matthew C; Cress, Cynthia J; Epstein, Michael H
2015-01-01
In a previous study with a nationally representative sample, researchers found that the items of the Preschool Behavioral and Emotional Rating Scale can best be described by a four-factor structure model (Emotional Regulation, School Readiness, Social Confidence, and Family Involvement). The findings of this investigation replicate and extend these previous results with a national sample of children (N = 1,075) with disabilities enrolled in early childhood special education programs. Data were analyzed using classical test theory, Rasch modeling, and confirmatory factor analysis. Results confirmed that for the most part, individual items were internally consistent within a four-factor model and showed consistent item difficulty, discrimination, and fit relative to their respective subscale scores. © 2015 Michigan Association for Infant Mental Health.
Hong, Ickpyo; Lee, Mi Jung; Kim, Moon Young; Park, Hae Yean
2017-10-01
The aim of this study is to investigate the psychometrics of the 12 items of an instrument assessing activities of daily living (ADL) using an item response theory model. A total of 648 adults with physical disabilities and having difficulties in ADLs were retrieved from the 2014 Korean National Survey on People with Disabilities. The psychometric testing included factor analysis, internal consistency, precision, and differential item functioning (DIF) across categories including sex, older age, marital status, and physical impairment area. The sample had a mean age of 69.7 years (SD = 13.7). The majority of the sample had lower extremity impairments (62.0%) and had at least 2.1 chronic conditions. The instrument demonstrated a unidimensional construct and good internal consistency (Cronbach's alpha = 0.95). The instrument precisely estimated person measures within a wide range of theta values (-2.22 logits < θ < 0.27 logits) with a reliability of 0.9. Only the changing position item demonstrated misfit (χ² = 36.6, df = 17, p = 0.0038), and the dressing item demonstrated DIF on the impairment type (upper extremity/others, McFadden's pseudo R² > 5.0%). Our findings indicate that the dressing item would need to be modified to improve its psychometrics. Overall, the ADL instrument demonstrates good psychometrics, and thus, it may be used as a standardized instrument for measuring disability in rehabilitation contexts. However, the findings are limited to adults with physical disabilities. Future studies should replicate psychometric testing for survey respondents with other disorders and for children.
2010-01-01
Background Patient-Reported Outcomes (PRO) are increasingly used in clinical and epidemiological research. Two main types of analytical strategies can be found for these data: classical test theory (CTT) based on the observed scores and models coming from Item Response Theory (IRT). However, whether IRT or CTT would be the most appropriate method to analyse PRO data remains unknown. The statistical properties of CTT and IRT, regarding power and corresponding effect sizes, were compared. Methods Two-group cross-sectional studies were simulated for the comparison of PRO data using IRT or CTT-based analysis. For IRT, different scenarios were investigated according to whether item or person parameters were assumed to be known, to a certain extent for item parameters, from good to poor precision, or unknown and therefore had to be estimated. The powers obtained with IRT or CTT were compared and parameters having the strongest impact on them were identified. Results When person parameters were assumed to be unknown and item parameters to be either known or not, the powers achieved using IRT or CTT were similar and always lower than the expected power using the well-known sample size formula for normally distributed endpoints. The number of items had a substantial impact on power for both methods. Conclusion Without any missing data, IRT and CTT seem to provide comparable power. The classical sample size formula for CTT seems to be adequate under some conditions but is not appropriate for IRT. In IRT, it seems important to take account of the number of items to obtain an accurate formula. PMID:20338031
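The "well-known sample size formula for normally distributed endpoints" is not written out in the abstract. A standard textbook form for a two-sided, two-group comparison of means is n per group = 2(z₁₋α/₂ + z₁₋β)² / d², where d is the standardized effect size. The sketch below assumes that form; whether it matches the exact formula used in the study is an assumption, and the function name and default values are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Per-group sample size for a two-sided two-sample comparison of means
    (normal approximation, equal group sizes, standardized effect size)."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = std_normal.inv_cdf(power)           # quantile for the target power
    return ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)


print(n_per_group(0.5))  # medium standardized effect
print(n_per_group(0.2))  # small standardized effect
```

The abstract's finding is that the power actually achieved with questionnaire data fell below what a formula of this kind predicts, and that the number of items mattered, which the formula above does not take into account.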
Glassmire, David M; Tarescavage, Anthony M; Burchett, Danielle; Martinez, Jennifer; Gomez, Anthony
2016-11-01
In this study, we examined whether the 5 Minnesota Multiphasic Personality Inventory-2-Restructured Form (MMPI-2-RF; Ben-Porath & Tellegen, 2008/2011) Suicidal/Death Ideation (SUI) items (93, 120, 164, 251, and 334) would provide incremental suicide-risk assessment information after accounting for information garnered from clinical interview questions. Among 229 forensic inpatients (146 men, 83 women) who were administered the MMPI-2-RF, 34.9% endorsed at least 1 SUI item. We found that patients who endorsed SUI items on the MMPI-2-RF concurrently denied conceptually related suicide-risk information during the clinical interview. For instance, 8% of the sample endorsed Item 93 (indicating recent suicidal ideation), yet denied current suicidal ideation upon interview. Conversely, only 2.2% of the sample endorsed current suicidal ideation during the interview, yet denied recent suicidal ideation on Item 93. The SUI scale, as well as the MMPI-2-RF Demoralization (RCd) and Low Positive Emotions (RC2) scales, correlated significantly and meaningfully with conceptually related suicide-risk information from the interview, including history of suicide attempts, history of suicidal ideation, current suicidal ideation, and months since last suicide attempt. We also found that the SUI scale added incremental variance (after accounting for information garnered from the interview and after accounting for scores on RCd and RC2) to predictions of future suicidal behavior within 1 year of testing. Relative risk ratios indicated that both SUI-item endorsement and the presence of interview-reported risk information significantly and meaningfully increased the risk of suicidal behavior in the year following testing, particularly when endorsement of suicidal ideation occurred for both methods of self-report. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
Peterson, Alexander C; Sutherland, Jason M; Liu, Guiping; Crump, R Trafford; Karimuddin, Ahmer A
2018-06-01
The Fecal Incontinence Quality of Life Scale (FIQL) is a commonly used patient-reported outcome measure for fecal incontinence, often used in clinical trials, yet has not been validated in English since its initial development. This study uses modern methods to thoroughly evaluate the psychometric characteristics of the FIQL and its potential for differential functioning by gender. This study analyzed prospectively collected patient-reported outcome data from a sample of patients prior to colorectal surgery. Patients were recruited from 14 general and colorectal surgeons in Vancouver Coastal Health hospitals in Vancouver, Canada. Confirmatory factor analysis was used to assess construct validity. Item response theory was used to evaluate test reliability, describe item-level characteristics, identify local item dependence, and test for differential functioning by gender. 236 patients were included for analysis, with mean age 58 and approximately half female. Factor analysis failed to identify the lifestyle, coping, depression, and embarrassment domains, suggesting lack of construct validity. Items demonstrated low difficulty, indicating that the test has the highest reliability among individuals who have low quality of life. Five items are suggested for removal or replacement. Differential test functioning was minimal. This study has identified specific improvements that can be made to each domain of the Fecal Incontinence Quality of Life Scale and to the instrument overall. Formatting, scoring, and instructions may be simplified, and items with higher difficulty developed. The lifestyle domain can be used as is. The embarrassment domain should be significantly revised before use.
Health and role functioning: the use of focus groups in the development of an item bank.
Anatchkova, Milena D; Bjorner, Jakob B
2010-02-01
Role functioning is an important part of health-related quality of life. However, assessment of role functioning is complicated by the wide definition of roles and by fluctuations in role participation across the life-span. The aim of this study is to explore variations in role functioning across the lifespan using qualitative approaches, to inform the development of a role functioning item bank and to pilot test sample items from the bank. Eight focus groups were conducted with a convenience sample of 38 English-speaking adults recruited in Rhode Island. Participants were stratified by gender and four age groups. Focus groups were taped, transcribed, and analyzed for thematic content. Participants of all ages identified family roles as the most important. There was age variation in the importance of social life roles, with younger and older adults rating them as more important. Occupational roles were identified as important by younger and middle-aged participants. The potential of health problems to affect role participation was recognized. Participants found the sample items easy to understand, response options identical in meaning and preferred five response choices. Participants identified key aspects of role functioning and provided insights on their perception of the impact of health on their role participation. These results will inform item bank generation.
Reliability and validity of the Dutch version of the Readiness to Change Questionnaire.
Defuentes-Merillas, L; Dejong, C A J; Schippers, G M
2002-01-01
The aim of the present study was to evaluate the psychometric properties of the Dutch version of the Readiness to Change Questionnaire (RCQ-D). The subjects were 246 excessive drinkers admitted to an addiction treatment centre and 54 offenders convicted of an alcohol-related crime in The Netherlands. The factor structure of the RCQ-D for the two samples combined was found to be consistent with the three-factor structure established for the original RCQ. The reliability of the items for each scale was found to be satisfactory. Allocated stage of change showed significant differences between the different subsamples. As expected, the scale scores for adjacent stages of change showed significantly higher inter-correlations than the scale scores for non-adjacent stages. Additionally, the negatively formulated items from the pre-contemplation scale were reformulated positively and their internal consistency tested among the offender sample. The positively formulated pre-contemplation items showed a higher alpha value than the negatively formulated items. We therefore suggest that the positively formulated items should replace the negatively formulated ones.
Shinya, Sugimoto; Masaru, Akimoto; Akira, Hayakawa; Eisaku, Hokazono; Susumu, Osawa
2012-01-18
Lifestyle-related diseases in Japan account for 30% of the entire medical expenditure of the country and cause 60% of all deaths. For the prevention of lifestyle-related diseases, medical examination by laboratory tests on metabolic syndrome is important. To undertake examination by collection of blood from a fingertip, we developed the "Well Kit". About 65 μl of blood collected from a fingertip was diluted with buffer solution, which contained two internal standard materials. The kit also separated the corpuscles from the diluted plasma with a special filter. The diluted plasma obtained was measured using the JCA-BM2250. This measurement system was evaluated for the quantitative analysis of 8 items. The uncertainties of the tested items in this measurement system were 1.7% to 6.4%. The coefficients of correlation between the values from this measurement system and the venous plasma sample values were 0.876-0.991 for all tested items, and that for hematocrit was 0.958. This system for testing blood collected from a fingertip is simple to use and can be applied in testing for metabolic syndrome. In addition, the testing system is useful for personal healthcare and for medical examinations of residents. Copyright © 2011 Elsevier B.V. All rights reserved.
Paige, Samantha R; Krieger, Janice L; Stellefson, Michael; Alber, Julia M
2017-02-01
Chronic disease patients are affected by low computer and health literacy, which negatively affects their ability to benefit from access to online health information. To estimate reliability and confirm model specifications for eHealth Literacy Scale (eHEALS) scores among chronic disease patients using Classical Test Theory (CTT) and Item Response Theory (IRT) techniques. A stratified sample of Black/African American (N=341) and Caucasian (N=343) adults with chronic disease completed an online survey including the eHEALS. Item discrimination was explored using bivariate correlations and Cronbach's alpha for internal consistency. A categorical confirmatory factor analysis tested a one-factor structure of eHEALS scores. Item characteristic curves, infit/outfit statistics, omega coefficient, and item reliability and separation estimates were computed. A 1-factor structure of eHEALS was confirmed by statistically significant standardized item loadings, acceptable model fit indices (CFI/TLI>0.90), and 70% variance explained by the model. Item response categories increased with higher theta levels, and there was evidence of acceptable reliability (ω=0.94; item reliability=89; item separation=8.54). eHEALS scores are a valid and reliable measure of self-reported eHealth literacy among Internet-using chronic disease patients. Providers can use eHEALS to help identify patients' eHealth literacy skills. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
NASA Technical Reports Server (NTRS)
Jones, L. D.
1979-01-01
The Space Environment Test Division Post-Test Data Reduction Program processes data from test history tapes generated on the Flexible Data System in the Space Environment Simulation Laboratory at the National Aeronautics and Space Administration/Lyndon B. Johnson Space Center. The program reads the tape's data base records to retrieve the item directory conversion file, the item capture file and the process link file to determine the active parameters. The desired parameter names are read in by lead cards after which the periodic data records are read to determine parameter data level changes. The data is considered to be compressed rather than full sample rate. Tabulations and/or a tape for generating plots may be output.
Better assessment of physical function: item improvement is neglected but essential
2009-01-01
Introduction Physical function is a key component of patient-reported outcome (PRO) assessment in rheumatology. Modern psychometric methods, such as Item Response Theory (IRT) and Computerized Adaptive Testing, can materially improve measurement precision at the item level. We present the qualitative and quantitative item-evaluation process for developing the Patient Reported Outcomes Measurement Information System (PROMIS) Physical Function item bank. Methods The process was stepwise: we searched extensively to identify extant Physical Function items and then classified and selectively reduced the item pool. We evaluated retained items for content, clarity, relevance and comprehension, reading level, and translation ease by experts and patient surveys, focus groups, and cognitive interviews. We then assessed items by using classic test theory and IRT, used confirmatory factor analyses to estimate item parameters, and graded response modeling for parameter estimation. We retained the 20 Legacy (original) Health Assessment Questionnaire Disability Index (HAQ-DI) and the 10 SF-36's PF-10 items for comparison. Subjects were from rheumatoid arthritis, osteoarthritis, and healthy aging cohorts (n = 1,100) and a national Internet sample of 21,133 subjects. Results We identified 1,860 items. After qualitative and quantitative evaluation, 124 newly developed PROMIS items composed the PROMIS item bank, which included revised Legacy items with good fit that met IRT model assumptions. Results showed that the clearest and best-understood items were simple, in the present tense, and straightforward. Basic tasks (like dressing) were more relevant and important versus complex ones (like dancing). Revised HAQ-DI and PF-10 items with five response options had higher item-information content than did comparable original Legacy items with fewer response options. 
IRT analyses showed that the Physical Function domain satisfied general criteria for unidimensionality with one-, two-, three-, and four-factor models having comparable model fits. Correlations between factors in the test data sets were > 0.90. Conclusions Item improvement must underlie attempts to improve outcome assessment. The clear, personally important and relevant, ability-framed items in the PROMIS Physical Function item bank perform well in PRO assessment. They will benefit from further study and application in a wider variety of rheumatic diseases in diverse clinical groups, including those at the extremes of physical functioning, and in different administration modes. PMID:20015354
Better assessment of physical function: item improvement is neglected but essential.
Bruce, Bonnie; Fries, James F; Ambrosini, Debbie; Lingala, Bharathi; Gandek, Barbara; Rose, Matthias; Ware, John E
2009-01-01
Physical function is a key component of patient-reported outcome (PRO) assessment in rheumatology. Modern psychometric methods, such as Item Response Theory (IRT) and Computerized Adaptive Testing, can materially improve measurement precision at the item level. We present the qualitative and quantitative item-evaluation process for developing the Patient Reported Outcomes Measurement Information System (PROMIS) Physical Function item bank. The process was stepwise: we searched extensively to identify extant Physical Function items and then classified and selectively reduced the item pool. We evaluated retained items for content, clarity, relevance and comprehension, reading level, and translation ease by experts and patient surveys, focus groups, and cognitive interviews. We then assessed items by using classic test theory and IRT, used confirmatory factor analyses to estimate item parameters, and graded response modeling for parameter estimation. We retained the 20 Legacy (original) Health Assessment Questionnaire Disability Index (HAQ-DI) and the 10 SF-36's PF-10 items for comparison. Subjects were from rheumatoid arthritis, osteoarthritis, and healthy aging cohorts (n = 1,100) and a national Internet sample of 21,133 subjects. We identified 1,860 items. After qualitative and quantitative evaluation, 124 newly developed PROMIS items composed the PROMIS item bank, which included revised Legacy items with good fit that met IRT model assumptions. Results showed that the clearest and best-understood items were simple, in the present tense, and straightforward. Basic tasks (like dressing) were more relevant and important versus complex ones (like dancing). Revised HAQ-DI and PF-10 items with five response options had higher item-information content than did comparable original Legacy items with fewer response options. 
IRT analyses showed that the Physical Function domain satisfied general criteria for unidimensionality with one-, two-, three-, and four-factor models having comparable model fits. Correlations between factors in the test data sets were > 0.90. Item improvement must underlie attempts to improve outcome assessment. The clear, personally important and relevant, ability-framed items in the PROMIS Physical Function item bank perform well in PRO assessment. They will benefit from further study and application in a wider variety of rheumatic diseases in diverse clinical groups, including those at the extremes of physical functioning, and in different administration modes.
2014-01-01
Background Nudging is an approach to environmental change that alters social and physical environments to shift behaviors in positive, self-interested directions. Evidence indicates that eating is largely an automatic behavior governed by environmental cues, suggesting that it might be possible to nudge healthier dietary behaviors. This study assessed the comparative and additive efficacy of two nudges and an economic incentive in supporting healthy food purchases by patrons at a recreational swimming pool. Methods An initial pre-intervention period was followed by three successive and additive interventions that promoted sales of healthy items through: signage, taste testing, and 30% price reductions; concluding with a return to baseline conditions. Each period was 8 days in length. The primary outcome was the change in the proportion of healthy items sold in the intervention periods relative to pre- and post-intervention in the full sample, and in a subsample of patrons whose purchases were directly observed. Secondary outcomes included change in the caloric value of purchases, change in revenues and gross profits, and qualitative process observations. Data were analyzed using analysis of covariance, chi-square tests and thematic content analysis. Results Healthy items represented 41% of sales and were significantly lower than sales of unhealthy items (p < 0.0001). In the full sample, sales of healthy items did not differ across periods, whereas in the subsample, sales of healthy items increased by 30% when a signage + taste testing intervention was implemented (p < 0.01). This increase was maintained when prices of healthy items were reduced by 30%, and when all interventions were removed. When adults were alone they purchased more healthy items compared to when children were present during food purchases (p < 0.001), however parental choices were not substantially better than choices made by children alone. 
Conclusions This study found mixed evidence for the efficacy of nudging in cueing healthier dietary behaviors. Moreover, price reductions appeared ineffectual in this setting. Our findings point to complex, context-specific patterns of effectiveness and suggest that nudging should not supplant the use of other strategies that have proven to promote healthier dietary behaviors. PMID:24450763
Olstad, Dana Lee; Goonewardene, Laksiri A; McCargar, Linda J; Raine, Kim D
2014-01-22
Nudging is an approach to environmental change that alters social and physical environments to shift behaviors in positive, self-interested directions. Evidence indicates that eating is largely an automatic behavior governed by environmental cues, suggesting that it might be possible to nudge healthier dietary behaviors. This study assessed the comparative and additive efficacy of two nudges and an economic incentive in supporting healthy food purchases by patrons at a recreational swimming pool. An initial pre-intervention period was followed by three successive and additive interventions that promoted sales of healthy items through: signage, taste testing, and 30% price reductions; concluding with a return to baseline conditions. Each period was 8 days in length. The primary outcome was the change in the proportion of healthy items sold in the intervention periods relative to pre- and post-intervention in the full sample, and in a subsample of patrons whose purchases were directly observed. Secondary outcomes included change in the caloric value of purchases, change in revenues and gross profits, and qualitative process observations. Data were analyzed using analysis of covariance, chi-square tests and thematic content analysis. Healthy items represented 41% of sales and were significantly lower than sales of unhealthy items (p < 0.0001). In the full sample, sales of healthy items did not differ across periods, whereas in the subsample, sales of healthy items increased by 30% when a signage + taste testing intervention was implemented (p < 0.01). This increase was maintained when prices of healthy items were reduced by 30%, and when all interventions were removed. When adults were alone they purchased more healthy items compared to when children were present during food purchases (p < 0.001), however parental choices were not substantially better than choices made by children alone. 
This study found mixed evidence for the efficacy of nudging in cueing healthier dietary behaviors. Moreover, price reductions appeared ineffectual in this setting. Our findings point to complex, context-specific patterns of effectiveness and suggest that nudging should not supplant the use of other strategies that have proven to promote healthier dietary behaviors.
Arensman, Remco M; Pisters, Martijn F; de Man-van Ginkel, Janneke M; Schuurmans, Marieke J; Jette, Alan M; de Bie, Rob A
2016-09-01
Adequate and user-friendly instruments for assessing physical function and disability in older adults are vital for estimating and predicting health care needs in clinical practice. The Late-Life Function and Disability Instrument Computer Adaptive Test (LLFDI-CAT) is a promising instrument for assessing physical function and disability in gerontology research and clinical practice. The aims of this study were: (1) to translate the LLFDI-CAT to the Dutch language and (2) to investigate its validity and reliability in a sample of older adults who spoke Dutch and dwelled in the community. For the assessment of validity of the LLFDI-CAT, a cross-sectional design was used. To assess reliability, measurement of the LLFDI-CAT was repeated in the same sample. The item bank of the LLFDI-CAT was translated with a forward-backward procedure. A sample of 54 older adults completed the LLFDI-CAT, World Health Organization Disability Assessment Schedule 2.0, RAND 36-Item Short-Form Health Survey physical functioning scale (10 items), and 10-Meter Walk Test. The LLFDI-CAT was repeated in 2 to 8 days (mean=4.5 days). Pearson's r and the intraclass correlation coefficient (ICC) (2,1) were calculated to assess validity, group-level reliability, and participant-level reliability. A correlation of .74 for the LLFDI-CAT function scale and the RAND 36-Item Short-Form Health Survey physical functioning scale (10 items) was found. The correlations of the LLFDI-CAT disability scale with the World Health Organization Disability Assessment Schedule 2.0 and the 10-Meter Walk Test were -.57 and -.53, respectively. The ICC (2,1) of the LLFDI-CAT function scale was .84, with a group-level reliability score of .85. The ICC (2,1) of the LLFDI-CAT disability scale was .76, with a group-level reliability score of .81. The high percentage of women in the study and the exclusion of older adults with recent joint replacement or hospitalization limit the generalizability of the results. 
The Dutch LLFDI-CAT showed strong validity and high reliability when used to assess physical function and disability in older adults dwelling in the community. © 2016 American Physical Therapy Association.
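The test-retest reliability above is reported as ICC (2,1), the two-way random-effects, absolute-agreement, single-measurement intraclass correlation. A minimal sketch of that coefficient computed from two-way ANOVA mean squares follows; the function name `icc_2_1` and the small ratings table are illustrative assumptions and not the study's data. In a test-retest design, each row would be a participant and each column an administration of the instrument.

```python
def icc_2_1(ratings):
    """ICC(2,1) for a subjects-by-occasions (or subjects-by-raters) table.

    ratings: list of n rows (subjects), each with k measurements.
    Hypothetical helper for illustration only.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)               # between-subjects mean square
    ms_cols = ss_cols / (k - 1)               # between-occasions mean square
    ms_error = ss_error / ((n - 1) * (k - 1)) # residual mean square

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Illustrative table: 6 subjects measured 4 times each.
table = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_2_1(table), 2))
```

Because ICC (2,1) penalizes systematic shifts between columns, the large column differences in this illustrative table pull the coefficient down to about 0.29 even though subjects keep a similar rank order across columns.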
Lai, Jin-Shei; Cella, David; Choi, Seung; Junghaenel, Doerte U; Christodoulou, Christopher; Gershon, Richard; Stone, Arthur
2011-10-01
To illustrate how measurement practices can be advanced by using as an example the fatigue item bank (FIB) and its applications (short forms and computerized adaptive testing [CAT]) that were developed through the National Institutes of Health Patient Reported Outcomes Measurement Information System (PROMIS) Cooperative Group. Psychometric analysis of data collected by an Internet survey company using item response theory-related techniques. A U.S. general population representative sample collected through the Internet. Respondents used for dimensionality evaluation of the PROMIS FIB (N=603) and item calibrations (N=14,931). Not applicable. Fatigue items (112) developed by the PROMIS fatigue domain working group, 13-item Functional Assessment of Chronic Illness Therapy-Fatigue, and 4-item Medical Outcomes Study 36-Item Short Form Health Survey Vitality scale. The PROMIS FIB version 1, which consists of 95 items, showed acceptable psychometric properties. CAT showed consistently better precision than short forms. However, all 3 short forms showed good precision for most participants in that more than 95% of the sample could be measured precisely with reliability greater than 0.9. Measurement practice can be advanced by using a psychometrically sound measurement tool and its applications. This example shows that CAT and short forms derived from the PROMIS FIB can reliably estimate fatigue reported by the U.S. general population. Evaluation in clinical populations is warranted before the item bank can be used for clinical trials. Copyright © 2011 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
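The abstract above describes precision in terms of reliability greater than 0.9 along the trait range. As a rough sketch of how item information translates into such a figure, the code below uses a dichotomous two-parameter logistic (2PL) model, which is a simplification swapped in for illustration; the abstract does not state which IRT model was used for the item bank. It also assumes the common convention that theta is scaled to unit variance, so that reliability ≈ 1 − SE² = 1 − 1/information. All function names and item parameters are hypothetical.

```python
from math import exp


def p_2pl(theta, a, b):
    """Probability of endorsing an item under the 2PL model."""
    return 1.0 / (1.0 + exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information of one 2PL item at a given theta."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)


def total_information(theta, items):
    """Sum of item informations; items is a list of (a, b) pairs."""
    return sum(item_information(theta, a, b) for a, b in items)


def reliability(theta, items):
    """Approximate reliability at theta, assuming theta has unit variance."""
    return 1.0 - 1.0 / total_information(theta, items)


# Hypothetical short form: ten items, discrimination 2.0, difficulty 0.0.
short_form = [(2.0, 0.0)] * 10

print(reliability(0.0, short_form))              # at the item difficulty
print(round(total_information(3.0, short_form), 3))  # far from the difficulty
```

At theta = 0 each hypothetical item contributes information 1.0, so ten items reach a reliability of 0.9; three units away the information collapses, which is why a fixed short form is precise only where its items are located while an adaptive test can pick items near each respondent's level.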
Shen, Minxue; Hu, Ming; Sun, Zhenqiu
2017-01-01
Objectives To develop and validate brief scales to measure common emotional and behavioural problems among adolescents in the examination-oriented education system and collectivistic culture of China. Setting Middle schools in Hunan province. Participants 5442 middle school students aged 11–19 years were sampled. 4727 valid questionnaires were collected and used for validation of the scales. The final sample included 2408 boys and 2319 girls. Primary and secondary outcome measures The tools were assessed by the item response theory, classical test theory (reliability and construct validity) and differential item functioning. Results Four scales to measure anxiety, depression, study problem and sociality problem were established. Exploratory factor analysis showed that each scale had two solutions. Confirmatory factor analysis showed acceptable to good model fit for each scale. Internal consistency and test–retest reliability of all scales were above 0.7. Item response theory showed that all items had acceptable discrimination parameters and most items had appropriate difficulty parameters. 10 items demonstrated differential item functioning with respect to gender. Conclusions Four brief scales were developed and validated among adolescents in middle schools of China. The scales have good psychometric properties with minor differential item functioning. They can be used in middle school settings, and will help school officials to assess the students’ emotional/behavioural problems. PMID:28062469
Mazefsky, Carla A; Yu, Lan; White, Susan W; Siegel, Matthew; Pilkonis, Paul A
2018-06-01
Individuals with autism spectrum disorder (ASD) often present with prominent emotion dysregulation that requires treatment but can be difficult to measure. The Emotion Dysregulation Inventory (EDI) was created using methods developed by the Patient-Reported Outcomes Measurement Information System (PROMIS®) to capture observable indicators of poor emotion regulation. Caregivers of 1,755 youth with ASD completed 66 candidate EDI items, and the final 30 items were selected based on classical test theory and item response theory (IRT) analyses. The analyses identified two factors: (a) Reactivity, characterized by intense, rapidly escalating, sustained, and poorly regulated negative emotional reactions, and (b) Dysphoria, characterized by anhedonia, sadness, and nervousness. The final items did not show differential item functioning (DIF) based on gender, age, intellectual ability, or verbal ability. Because the final items were calibrated using IRT, even a small number of items offers high precision, minimizing respondent burden. IRT co-calibration of the EDI with related measures demonstrated its superiority in assessing the severity of emotion dysregulation with as few as seven items. Validity of the EDI was supported by expert review, its association with related constructs (e.g., anxiety and depression symptoms, aggression), higher scores in psychiatric inpatients with ASD compared to a community ASD sample, and demonstration of test-retest stability and sensitivity to change. In sum, the EDI provides an efficient and sensitive method to measure emotion dysregulation for clinical assessment, monitoring, and research in youth with ASD of any level of cognitive or verbal ability. Autism Res 2018, 11: 928-941. © 2018 International Society for Autism Research, Wiley Periodicals, Inc. This paper describes a new measure of poor emotional control called the Emotion Dysregulation Inventory (EDI). 
Caregivers of 1,755 youth with ASD completed candidate items, and advanced statistical techniques were applied to identify the best final items. The EDI is unique because it captures common emotional problems in ASD and is appropriate for both nonverbal and verbal youth. It is an efficient and sensitive measure for use in clinical assessments, monitoring, and research with youth with ASD. © 2018 International Society for Autism Research, Wiley Periodicals, Inc.
Giuffrida, Michelle A; Brown, Dorothy Cimino; Ellenberg, Susan S; Farrar, John T
2018-05-01
OBJECTIVE To describe development and initial psychometric testing of an owner-reported questionnaire designed to standardize measurement of general quality of life (QOL) in dogs with cancer. DESIGN Key-informant interviews, questionnaire development, and field trial. SAMPLE Owners of 25 dogs with cancer for item development and pretesting and owners of 90 dogs with cancer for reliability and validity testing. PROCEDURES Standard methods for development and testing of questionnaire instruments intended to measure subjective states were used. Items were generated, selected, scaled, and pretested for content, meaning, and readability. Response items were evaluated with exploratory factor analysis and by assessing internal consistency (Cronbach α) and convergence with global QOL as determined with a visual analog scale. Preliminary tests of stability and responsiveness were performed. RESULTS The final questionnaire, which was named the Canine Owner-Reported Quality of Life (CORQ) questionnaire, contained 17 items related to observable behaviors commonly used by owners to evaluate QOL in their dogs. Several items pertaining to physical symptoms performed poorly and were omitted. The 17 items were assigned to 4 factors (vitality, companionship, pain, and mobility) on the basis of the items they contained. The CORQ questionnaire and its factors had high internal consistency (Cronbach α = 0.68 to 0.90) and moderate to strong correlations (r = 0.49 to 0.71) with global QOL as measured on a visual analog scale. Preliminary testing indicated good test-retest reliability and responsiveness to improvements in overall QOL. CONCLUSIONS AND CLINICAL RELEVANCE The CORQ questionnaire was a valid, reliable owner-reported questionnaire that measured general QOL in dogs with cancer and showed promise as a clinical trial outcome measure for quantifying changes in individual dog QOL occurring in response to cancer treatment and progression.
Construction of Valid and Reliable Test for Assessment of Students
ERIC Educational Resources Information Center
Osadebe, P. U.
2015-01-01
The study was carried out to construct a valid and reliable test in Economics for secondary school students. Two research questions were drawn to guide the establishment of validity and reliability for the Economics Achievement Test (EAT). It is a multiple choice objective test of five options with 100 items. A sample of 1000 students was randomly…
Construction of Economics Achievement Test for Assessment of Students
ERIC Educational Resources Information Center
Osadebe, P. U.
2014-01-01
The study was carried out to construct a valid and reliable test in Economics for secondary school students. Two research questions were drawn to guide the establishment of validity and reliability for the Economics Achievement Test (EAT). It is a multiple choice objective test of five options with 100 items. A sample of 1000 students was randomly…
The Childhood Asperger Syndrome Test (CAST): Test-Retest Reliability in a High Scoring Sample
ERIC Educational Resources Information Center
Allison, Carrie; Williams, Jo; Scott, Fiona; Stott, Carol; Bolton, Patrick; Baron-Cohen, Simon; Brayne, Carol
2007-01-01
The Childhood Asperger Syndrome Test (CAST) is a 37-item parental self-completion questionnaire designed to screen for high-functioning autism spectrum conditions in epidemiological research. The CAST has previously demonstrated good accuracy for use as a screening test, with high sensitivity in studies with primary school aged children in…
American Sign Language Comprehension Test: A Tool for Sign Language Researchers
ERIC Educational Resources Information Center
Hauser, Peter C.; Paludneviciene, Raylene; Riddle, Wanda; Kurz, Kim B.; Emmorey, Karen; Contreras, Jessica
2016-01-01
The American Sign Language Comprehension Test (ASL-CT) is a 30-item multiple-choice test that measures ASL receptive skills and is administered through a website. This article describes the development and psychometric properties of the test based on a sample of 80 college students including deaf native signers, hearing native signers, deaf…
Tool for Evaluating the Ways Nurses Assess Pain (TENAP): psychometric properties assessment.
Ng, Siok Qi; Brammer, Jillian; Creedy, Debra K; Klainin-Yobas, Piyanee
2014-12-01
Elderly people with cognitive impairment are at risk for under-treatment of pain due to their inability to communicate. Poor knowledge and attitudes of nurses toward pain in this population may result in inadequate pain assessment. This study used a descriptive correlational design to develop and validate a tool to assess nurses' knowledge, attitudes, and reported practice of pain assessment in cognitively impaired elderly patients in acute care settings. The Tool for Evaluating the ways Nurses Assess Pain (TENAP) has two sections: (1) nurses' knowledge and attitudes about pain assessment and management and (2) two vignettes to assess reported practice. Content validity was established by an expert panel of three geriatric-trained nurse clinicians, and pilot tested with a convenience sample of 10 nurses. The psychometric properties were tested with a sample of 263 Registered and Enrolled nurses working in medical wards of two public hospitals in Singapore. The final version of TENAP comprised 29 items. Content validity index ranged from 0.84 to 1.00. The scale took 10 to 15 minutes to complete and items were easily understood. Results from the factor analysis suggested that Section A demonstrated one factor (13 items) while Section B had two distinct factors (16 items), one for each vignette, supporting construct validity of the scale. Cronbach's alphas for all factors were acceptable. TENAP was feasible, valid, and reliable for assessing nurses' knowledge, attitudes, and reported practice of pain assessment in cognitively-impaired elderly patients. Further testing of the tool with a larger sample of nurses in other practice contexts is needed. Copyright © 2014 American Society for Pain Management Nursing. Published by Elsevier Inc. All rights reserved.
Jeong, Eunju; Lesiuk, Teresa L
2011-01-01
Impairments in attention are commonly seen in individuals with traumatic brain injury (TBI). While visual attention assessment measurements have been rigorously developed and frequently used in cognitive neurorehabilitation, there is a paucity of auditory attention assessment measurements for patients with TBI. The purpose of this study was to field test a researcher-developed Music-based Attention Assessment (MAA), a melodic contour identification test designed to assess three different types of attention (i.e., sustained attention, selective attention, and divided attention), for patients with TBI. Additionally, this study aimed to evaluate the readability and comprehensibility of the test items and to examine the preliminary psychometric properties of the scale and test items. Fifteen patients diagnosed with TBI completed 3 different series of tasks in which they were required to identify melodic contours. The resulting data showed that (a) test items in each of the 3 subtests were found to have an easy to moderate level of item difficulty and an acceptable to high level of item discrimination, (b) the musical characteristics (i.e., contour, congruence, and pitch interference) were found to be associated with the level of item difficulty, and (c) the internal consistency of the MAA as computed by Cronbach's alpha was .95. Subsequent studies using a larger sample of typical participants, along with individuals with TBI, are needed to confirm construct validity and internal consistency of the MAA. In addition, the authors recommend examination of criterion validity of the MAA as correlated with current neuropsychological attention assessment measurements.
ERIC Educational Resources Information Center
Kimonis, Eva R.; Branch, Jessica; Hagman, Brett; Graham, Nicole; Miller, Cailey
2013-01-01
In the present study, the psychometric properties and factor structure of the 24-item Inventory of Callous-Unemotional Traits (ICU) were tested in a sample of 687 college students. Results support a similar 3-factor structure to that identified in samples of youths, in whom this measure was previously validated. Correlations with external…
Development and validation of a measure of workplace climate for healthy weight maintenance.
Sliter, Katherine A
2013-07-01
Due to the obesity epidemic, an increasing amount of research is being conducted to better understand the antecedents and consequences of excess employee weight. One construct often of interest to researchers in this area is organizational climate. Unfortunately, a viable measure of climate, as related to employee weight, does not exist. The purpose of this study was to remedy this by developing and validating a concise, psychometrically sound measure of climate for healthy weight. An item pool was developed based on surveys of full-time employees, and a sorting task was used to eliminate ambiguous items. Items were pilot tested by a sample of 338 full-time employees, and the item pool was reduced through item response theory (IRT) and reliability analyses. Finally, the retained 14 items, comprising 3 subscales, were completed by a sample of 360 full-time employees, representing 26 different organizations from across the United States. Multilevel modeling indicated that sufficient variance was explained by group membership to support aggregation, and confirmatory factor analysis (CFA) supported the hypothesized model of 3 subscale factors and an overall climate factor. Nine hypotheses specific to construct validation were tested. Scores on the new scale correlated significantly with individual-level reports of psychological constructs (e.g., health motivation, general leadership support for health) and physiological phenomena (e.g., body mass index [BMI], physical health problems) to which they should theoretically relate, supporting construct validity. Implications for the use of this scale in both applied and research settings are discussed. PsycINFO Database Record (c) 2013 APA, all rights reserved.
Mathur, Vijay Prakash; Dhillon, Jatinder Kaur; Logani, Ajay; Agarwal, Ramesh
2014-01-01
The purpose of this study was to develop a reliable instrument [Oral Health related Early Childhood Quality of Life (OH-ECQOL) scale] for measuring oral health related quality of life (OHrQoL) in preschool children in a North Indian population. Four pediatric dentists evaluated a pool of 65 items from various QoL questionnaires to assess their relevance to the Indian population. These items were discussed with eight independent pediatric dentists and two community dentists who were not a part of this study to assess relevance of these items to preschool age children based on their comprehensiveness and clarity. Based on their responses and feedback a modified pool of items was developed and administered to a convenience sample of 20 parents who rated these items according to their relevance. The test-retest reliability was evaluated on another sample of 20 parents of 2-5 year old children. The final questionnaire comprised 16 items (12 child and 4 family). This was administered to 300 parents of 24-71 months old children divided on the basis of early childhood caries to assess its reliability and validity. OH-ECQOL scores were significantly associated with parental ratings of their child's general and oral health, and the presence of dental disease in the child. Cronbach's alpha was 0.862, and the ICC for test-retest reliability was 0.94. The OH-ECQOL proved a reliable and valid tool for assessing the impact of oral disorders on the quality of life of preschool children in Northern India.
Newman-Beinart, Naomi A; Norton, Sam; Dowling, Dominic; Gavriloff, Dimitri; Vari, Chiara; Weinman, John A; Godfrey, Emma L
2017-06-01
There is no gold standard for measuring adherence to prescribed home exercise. Self-report diaries are commonly used; however, lack of standardisation, inaccurate recall and self-presentation bias limit their validity. A valid and reliable tool to assess exercise adherence behaviour is required. Consequently, this article reports the development and psychometric evaluation of the Exercise Adherence Rating Scale (EARS). Development of a questionnaire. Secondary care in physiotherapy departments of three hospitals. A focus group consisting of 8 patients with chronic low back pain (CLBP) and 2 physiotherapists was conducted to generate qualitative data. Following on from this, a convenience sample of 224 people with CLBP completed the initial 16-item EARS for purposes of subsequent validity and reliability analyses. Construct validity was explored using exploratory factor analysis and item response theory. Test-retest reliability was assessed 3 weeks later in a sub-sample of patients. An item pool consisting of 6 items was found suitable for factor analysis. Examination of the scale structure of these 6 items revealed a one factor solution explaining a total of 71% of the variance in adherence to exercise. The six items formed a unidimensional scale that showed good measurement properties, including acceptable internal consistency and high test-retest reliability. The EARS enables the measurement of adherence to prescribed home exercise. This may facilitate the evaluation of interventions promoting self-management for both the prevention and treatment of chronic conditions. Copyright © 2017 Chartered Society of Physiotherapy. Published by Elsevier Ltd. All rights reserved.
Development and validation of the Child Oral Health Impact Profile - Preschool version.
Ruff, R R; Sischo, L; Chinn, C H; Broder, H L
2017-09-01
The Child Oral Health Impact Profile (COHIP) is a validated instrument created to measure the oral health-related quality of life of school-aged children. The purpose of this study was to develop and validate a preschool version of the COHIP (COHIP-PS) for children aged 2-5. The COHIP-PS was developed and validated using a multi-stage process consisting of item selection, face validity testing, item impact testing, reliability and validity testing, and factor analysis. A cross-sectional convenience sample of caregivers having children 2-5 years old from four groups completed item clarity and impact forms. Groups were recruited from pediatric health clinics or preschools/daycare centers, speech clinics, dental clinics, or cleft/craniofacial centers. Participants had a variety of oral health-related conditions, including caries, congenital orofacial anomalies, and speech/language deficiencies such as articulation and language disorders. COHIP-PS. The COHIP-PS was found to have acceptable internal validity (α = 0.71) and high test-retest reliability (0.87), though internal validity was below the accepted threshold for the community sample. While discriminant validity results indicated significant differences across study groups, the overall magnitude of differences was modest. Results from confirmatory factor analyses support the use of a four-factor model consisting of 11 items across oral health, functional well-being, social-emotional well-being, and self-image domains. Quality of life is an integral factor in understanding and assessing children's well-being. The COHIP-PS is a validated oral health-related quality of life measure for preschool children with cleft or other oral conditions. Copyright© 2017 Dennis Barber Ltd.
van Ballegooijen, Wouter; Riper, Heleen; Donker, Tara; Martin Abello, Katherina; Marks, Isaac; Cuijpers, Pim
2012-01-01
The advent of web-based treatments for anxiety disorders creates a need for quick and valid online screening instruments, suitable for a range of social groups. This study validates a single-item multimedia screening instrument for agoraphobia, part of the Visual Screener for Common Mental Disorders (VS-CMD), and compares it with the text-based agoraphobia items of the PDSS-SR. The study concerned 85 subjects in an RCT of the effects of web-based therapy for panic symptoms. The VS-CMD item and items 4 and 5 of the PDSS-SR were validated by comparing scores to the outcomes of the CIDI diagnostic interview. Screening for agoraphobia was found moderately valid for both the multimedia item (sensitivity .81, specificity .66, AUC .734) and the text-based items (AUC .607–.697). Single-item multimedia screening for anxiety disorders should be further developed and tested in the general population and in patient, illiterate and immigrant samples. PMID:22844391
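For readers unfamiliar with the screening statistics quoted above, sensitivity and specificity are proportions taken from a 2×2 table of screener result against diagnostic interview. The sketch below shows the arithmetic only; the counts are invented for illustration and are not the study's data, and the function name is hypothetical.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    tp/fn: cases with the diagnosis that the screener flagged / missed.
    tn/fp: cases without the diagnosis that the screener cleared / flagged.
    """
    sensitivity = tp / (tp + fn)  # share of true cases detected
    specificity = tn / (tn + fp)  # share of non-cases correctly cleared
    return sensitivity, specificity


# Invented counts, chosen only to make the arithmetic easy to follow.
sens, spec = sensitivity_specificity(tp=40, fn=10, tn=30, fp=20)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```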
Humor and Anxiety: Effects on Class Test Performance.
ERIC Educational Resources Information Center
Townsend, Michael A. R.; Mahoney, Peggy
The roles of humor and anxiety in test performance were investigated. Measures of trait anxiety, state anxiety and achievement were obtained on a sample of undergraduate students; the A-Trait and A-State scales of the State-Trait Anxiety Inventory were used. Half of the students received additional humorous items in the achievement test. The…
Bi-Factor MIRT Observed-Score Equating for Mixed-Format Tests
ERIC Educational Resources Information Center
Lee, Guemin; Lee, Won-Chan
2016-01-01
The main purposes of this study were to develop bi-factor multidimensional item response theory (BF-MIRT) observed-score equating procedures for mixed-format tests and to investigate relative appropriateness of the proposed procedures. Using data from a large-scale testing program, three types of pseudo data sets were formulated: matched samples,…
Semsick, Gretchen R.
2016-01-01
Objective. Identify behaviors that can compose a measure of organizational citizenship by pharmacy faculty. Methods. A four-round, modified Delphi procedure using open-ended questions (Round 1) was conducted with 13 panelists from pharmacy academia. The items generated were evaluated and refined for inclusion in subsequent rounds. A consensus was reached after completing four rounds. Results. The panel produced a set of 26 items indicative of extra-role behaviors by faculty colleagues considered to compose a measure of citizenship, which is an expressed manifestation of collegiality. Conclusions. The items generated require testing for validation and reliability in a large sample to create a measure of organizational citizenship. Even prior to doing so, the list of items can serve as a resource for mentorship of junior and senior faculty alike. PMID:28179717
Malec, James F; Kean, Jacob; Altman, Irwin M; Swick, Shannon
2012-12-01
(1) To evaluate the measurement reliability and construct validity of the Mayo-Portland Adaptability Inventory, 4th revision (MPAI-4) in a sample consisting exclusively of patients with cerebrovascular accident (CVA) using single parameter (Rasch) item-response methods; (2) to examine the differential item functioning (DIF) by sex within the CVA population; and (3) to examine DIF and differential test functioning (DTF) across traumatic brain injury (TBI) and CVA samples. Retrospective psychometric analysis of rating scale data. Home- and community-based brain injury rehabilitation program. Individuals post-CVA (n=861) and individuals with TBI (n=603). Not applicable. MPAI-4. Item data on admission to community-based rehabilitation were submitted to Rasch, DIF, and DTF analyses. The final calibration in the CVA sample revealed satisfactory reliability/separation for persons (.91/3.16) and items (1.00/23.64). DIF showed that items for pain, anger, audition, and memory were associated with higher levels of disability for CVA than TBI patients; whereas, self-care, mobility, and use of hands indicated greater overall disability for TBI patients. DTF analyses showed a high degree of association between the 2 sets of items (R=.92; R²=.85) and, at most, a 3.7 point difference in raw scores. The MPAI-4 demonstrates satisfactory psychometric properties for use with individuals with CVA applying for interdisciplinary posthospital rehabilitation. DIF reveals clinically meaningful differences between CVA and TBI groups that should be considered in results at the item and subscale level. Copyright © 2012 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
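The paired reliability/separation values in this abstract can be cross-checked against each other. The sketch below assumes the conventional Rasch relation between a separation index G and reliability R, namely R = G² / (1 + G²); the abstract itself does not state which formula was used, so this is an illustration and not the authors' procedure.

```python
def separation_to_reliability(separation):
    """Convert a Rasch separation index G to reliability R = G^2 / (1 + G^2)."""
    g_squared = separation ** 2
    return g_squared / (1 + g_squared)


# Separation values as reported in the abstract (persons, items).
print(round(separation_to_reliability(3.16), 2))
print(round(separation_to_reliability(23.64), 2))
```

Under that assumption the two separation values round to the reliabilities reported alongside them (.91 for persons, 1.00 for items).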
Teacher Burnout: A Comparison of Two Cultures Using Confirmatory Factor and Item Response Models
Denton, Ellen-ge; Chaplin, William F.; Wall, Melanie
2014-01-01
The present study addresses teacher burnout and in particular cultural differences and similarities in burnout. We used the Maslach Burnout Inventory Education Survey (MBI-ES) as the starting point for developing a latent model of burnout in two cultures: Jamaica W.I. teachers (N = 150) and New York City teachers (N = 150). We confirm a latent 3 factor structure, using a subset of the items from the MBI-ES that adequately fit both samples. We tested different degrees of measurement invariance (model fit statistics, scale reliabilities, residual variances, item thresholds, and total variance) to describe and compare cultural differences. Results indicate some differences between the samples at the structure and item levels. We found that factor variances were slightly higher in the New York City teacher sample. Emotional Exhaustion (EE) was a more informative construct for differentiating among teachers at moderate levels of burnout, as opposed to extreme high or low levels of burnout, in both cultures. In contrast, Depersonalization in the Workplace (DW) was more informative at the more extreme levels of burnout among both teacher samples. By studying the influence of culture on the experience of burnout we can further our understanding of burnout and potentially discover factors that might prevent burnout among primary and secondary school teachers. PMID:25729572
Testing for lead in toys at day care centers.
Sanders, Martha; Stolz, Julie; Chacon-Baker, Ashley
2013-01-01
Exposure to lead-based paint or material has been found to impact children's cognitive and behavioral development at blood lead levels far below current standards. The purpose of the project was to screen for lead in toy items in daycare centers in order to raise awareness of inside environmental lead exposures and minimize lead-based exposures for children. Occupational therapy students in a service learning class tested for lead in ten daycare or public centers using the XRF Thermo Scientific Niton XL3t, a method accepted by the Consumer Product Safety Commission (CPSC). A total of 460 items were tested over a two-month period for an average of 66 toys per setting. Fifty six (56) items tested > 100 ppm, which represented 12% of the entire sample. Items with high lead levels included selected toys constructed with lead-based paint, lead metals, plastics using lead as a color enhancer, and decorative objects. While the actual number of lead-based products is small, the cumulative exposure or habitual use may pose an unnecessary risk to children. Indoor exposures occurred for all day care centers regardless of socio-economic levels. Recommendations to minimize exposures are provided.
Standards in C.S.E. and G.C.E.: English and Mathematics. Working Paper No. 9.
ERIC Educational Resources Information Center
Schools Council, London (England).
Attainment tests in English and mathematics were administered to a total sample of 2,011 15-year-old students. The English test consisted of a composition and a test battery of objective items. Marking of the composition was made by the test designer on a rapid first-impression reading. The objective test battery consisted of a comprehension test,…
Development of the Attitudes to Domestic Violence Questionnaire for Children and Adolescents.
Fox, Claire L; Gadd, David; Sim, Julius
2015-09-01
To provide a more robust assessment of the effectiveness of a domestic abuse prevention education program, a questionnaire was developed to measure children's attitudes to domestic violence. The aim was to develop a short questionnaire that would be easy to use for practitioners but, at the same time, sensitive enough to pick up on subtle changes in young people's attitudes. We therefore chose to ask children about different situations in which they might be willing to condone domestic violence. In Study 1, we tested a set of 20 items, which we reduced by half to a set of 10 items. The factor structure of the scale was explored and its internal consistency was calculated. In Study 2, we tested the factor structure of the 10-item Attitudes to Domestic Violence (ADV) Scale in a separate calibration sample. Finally, in Study 3, we then assessed the test-retest reliability of the 10-item scale. The ADV Questionnaire is a promising tool to evaluate the effectiveness of domestic abuse education prevention programs. However, further development work is necessary. © The Author(s) 2014.
Gilkison, C R; Fenton, M V; Lester, J W
1992-05-01
This study was designed to establish the reliability of a health history questionnaire used as a screening tool for incoming university students. The authors used a test-retest design, with a test interval of 6 months, on a sample of medical and nursing students. The analysis focused on overall reliability of the questionnaire and reproducibility of specific items, based on question format. Questionnaire items of specific interest were those with dichotomous yes/no response options versus open-ended format questions, those using the words frequently or recently, or those that asked multiple questions. Demographic characteristics of the subjects were considered in the evaluation of reliability. Overall reliability of the questionnaire (93.6%) was above the anticipated level of 90%, and subject sex or program of study did not show any significant differences in reproducibility of responses. Although wording of questions did not affect item reliability, dichotomous format questions demonstrated a higher degree of reliability (96.4%) than the overall reliability of the questionnaire. Recommendations for enhancing the reliability of the questionnaire are based on item analysis and information gathered from interviews with subjects.
Development and validation of the Current Opioid Misuse Measure.
Butler, Stephen F; Budman, Simon H; Fernandez, Kathrine C; Houle, Brian; Benoit, Christine; Katz, Nathaniel; Jamison, Robert N
2007-07-01
Clinicians recognize the importance of monitoring aberrant medication-related behaviors of chronic pain patients while being prescribed opioid therapy. The purpose of this study was to develop and validate the Current Opioid Misuse Measure (COMM) for those pain patients already on long-term opioid therapy. An initial pool of 177 items was developed with input from 26 pain management and addiction specialists. Concept mapping identified six primary concepts underlying medication misuse, which were used to develop an initial item pool. Twenty-two pain and addiction specialists rated the items on importance and relevance, resulting in selection of a 40-item alpha COMM. Final item selection was based on empirical evaluation of items with patients taking opioids for chronic, noncancer pain (N=227). One-week test-retest reliability was examined with 55 participants. All participants were administered the alpha version of the COMM, the Prescription Drug Use Questionnaire (PDUQ) interview, and submitted a urine sample for toxicology screening. Physician ratings of patient aberrant behaviors were also obtained. Of the 40 items, 17 items appeared to adequately measure aberrant behavior, demonstrating excellent internal consistency and test-retest reliability. Cutoff scores were examined using ROC curve analysis and reasonable sensitivity and specificity were established. To evaluate the COMM's ability to capture change in patient status, it was tested on a subset of patients (N=86) that were followed and reassessed three months later. The COMM was found to have promise as a brief, self-report measure of current aberrant drug-related behavior. Further cross-validation and replication of these preliminary results is pending.
Evaluating Instrument Quality in Science Education: Rasch-based analyses of a Nature of Science test
NASA Astrophysics Data System (ADS)
Neumann, Irene; Neumann, Knut; Nehm, Ross
2011-07-01
Given the central importance of the Nature of Science (NOS) and Scientific Inquiry (SI) in national and international science standards and science learning, empirical support for the theoretical delineation of these constructs is of considerable significance. Furthermore, tests of the effects of varying magnitudes of NOS knowledge on domain-specific science understanding and belief require the application of instruments validated in accordance with AERA, APA, and NCME assessment standards. Our study explores three interrelated aspects of a recently developed NOS instrument: (1) validity and reliability; (2) instrument dimensionality; and (3) item scales, properties, and qualities within the context of Classical Test Theory and Item Response Theory (Rasch modeling). A construct analysis revealed that the instrument did not match published operationalizations of NOS concepts. Rasch analysis of the original instrument, as well as a reduced item set, indicated that a two-dimensional Rasch model fit significantly better than a one-dimensional model in both cases. Thus, our study revealed that NOS and SI are supported as two separate dimensions, corroborating theoretical distinctions in the literature. To identify items with unacceptable fit values, item quality analyses were used. A Wright Map revealed that few items sufficiently distinguished high performers in the sample and excessive numbers of items were present at the low end of the performance scale. Overall, our study outlines an approach for how Rasch modeling may be used to evaluate and improve Likert-type instruments in science education.
Subjective health literacy: Development of a brief instrument for school-aged children.
Paakkari, Olli; Torppa, Minna; Kannas, Lasse; Paakkari, Leena
2016-12-01
The present paper focuses on the measurement of health literacy (HL), which is an important determinant of health and health behaviours. HL starts to develop in childhood and adolescence; hence, there is a need for instruments to monitor HL among younger age groups. These instruments are still rare. The aim of the project reported here was, therefore, to develop a brief, multidimensional, theory-based instrument to measure subjective HL among school-aged children. The development of the instrument covered four phases: item generation based on a conceptual framework; a pilot study (n = 405); test-retest (n = 117); and construction of the instrument (n = 3853). All the samples were taken from Finnish 7th and 9th graders. Initially, 65 items were generated, of which 32 items were selected for the pilot study. After item reduction, the instrument contained 16 items. The test-retest phase produced estimates of stability. In the final phase a 10-item instrument was constructed, referred to as Health Literacy for School-Aged Children (HLSAC). The instrument exhibited a high Cronbach alpha (0.93), and included two items from each of the five predetermined theoretical components (theoretical knowledge, practical knowledge, critical thinking, self-awareness, citizenship). The iterative and validity-driven development process made it possible to construct a brief multidimensional HLSAC instrument. Such instruments are suitable for large-scale studies, and for use with children and adolescents. Validation will require further testing for use in other countries.
Kozinszky, Zoltan; Töreki, Annamária; Hompoth, Emőke A; Dudas, Robert B; Németh, Gábor
2017-04-01
We endeavoured to analyze the factor structure of the Edinburgh Postnatal Depression Scale (EPDS) during a screening programme in Hungary, using exploratory (EFA) and confirmatory factor analysis (CFA), testing both previously published models and newly developed theory-driven ones, after a critical analysis of the literature. Between April 2011 and January 2015, a sample of 2967 pregnant women (between 12th and 30th weeks of gestation) and 714 women 6 weeks after delivery completed the Hungarian version of the EPDS in South-East Hungary. EFAs suggested unidimensionality in both samples. 33 out of 42 previously published models showed good and 6 acceptable fit with our antepartum data in CFAs, whilst 10 of them showed good and 28 acceptable fit in our postpartum sample. Using multiple fit indices, our theory-driven anhedonia (items 1,2) - anxiety (items 4,5) - low mood (items 8,9) model provided the best fit in the antepartum sample. In the postpartum sample, our theory-driven models were again among the best performing models, including an anhedonia and an anxiety factor together with either a low mood or a suicidal risk factor (items 3,6,10). The EPDS showed moderate within- and between-culture invariability, although this would also need to be re-examined with a theory-driven approach. Copyright © 2017 Elsevier Ireland Ltd. All rights reserved.
Constructing three emotion knowledge tests from the invariant measurement approach
Prieto, Gerardo; Burin, Debora I.
2017-01-01
Background Psychological constructionist models like the Conceptual Act Theory (CAT) postulate that complex states such as emotions are composed of basic psychological ingredients that are more clearly respected by the brain than basic emotions. The objective of this study was the construction and initial validation of Emotion Knowledge (EK) measures from the CAT frame by means of an invariant measurement approach, the Rasch Model (RM). Psychological distance theory was used to inform item generation. Methods Three EK tests—emotion vocabulary (EV), close emotional situations (CES) and far emotional situations (FES)—were constructed and tested with the RM in a community sample of 100 females and 100 males (age range: 18–65), both separately and conjointly. Results It was corroborated that data-RM fit was sufficient. Then, the effect of type of test and emotion on Rasch-modelled item difficulty was tested. Significant effects of emotion on EK item difficulty were found, but the only statistically significant difference was that between “happiness” and the remaining emotions; neither type of test nor interaction effects on EK item difficulty were statistically significant. The testing of gender differences was carried out after corroborating that differential item functioning (DIF) would not be a plausible alternative hypothesis for the results. No statistically significant sex-related differences were found in EV, CES, FES, or total EK. However, the sign of d indicates that female participants were consistently better than male ones, a result that will be of interest for future meta-analyses. Discussion The three EK tests are ready to be used as components of a higher-level measurement process. PMID:28929013
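Several of the records above refer to Rasch-modelled item difficulty. As a brief reminder of what that quantity means, the dichotomous Rasch model gives the probability of a correct response as a logistic function of the gap between person ability θ and item difficulty b. The sketch below illustrates that formula only; the function name and the example values are hypothetical.

```python
import math


def rasch_probability(theta, b):
    """Probability of a correct response under the dichotomous Rasch model.

    theta: person ability in logits; b: item difficulty in logits.
    """
    return 1.0 / (1.0 + math.exp(-(theta - b)))


# A person whose ability equals the item difficulty has even odds of success.
print(rasch_probability(theta=0.0, b=0.0))
# The same person facing an easier item (lower b) is more likely to succeed.
print(rasch_probability(theta=0.0, b=-1.0) > 0.5)
```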
Effects of Linking Methods on Detection of DIF.
ERIC Educational Resources Information Center
Kim, Seock-Ho; Cohen, Allan S.
1992-01-01
Effects of the following methods for linking metrics on detection of differential item functioning (DIF) were compared: (1) test characteristic curve method (TCC); (2) weighted mean and sigma method; and (3) minimum chi-square method. With large samples, results were essentially the same. With small samples, TCC was most accurate. (SLD)
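To make the idea of linking metrics concrete, the sketch below shows the plain (unweighted) mean and sigma method, a simpler relative of the weighted variant compared in the study; it is not the authors' procedure. It assumes the same anchor items have difficulty estimates on two scales and finds the linear transformation θ* = Aθ + B that aligns them. All names and numbers are hypothetical.

```python
from statistics import mean, pstdev


def mean_sigma_constants(b_base, b_target):
    """Unweighted mean/sigma linking constants from anchor-item difficulties.

    b_base: difficulties of the anchor items on the scale to be transformed.
    b_target: difficulties of the same items on the target scale.
    """
    A = pstdev(b_target) / pstdev(b_base)   # slope: ratio of spreads
    B = mean(b_target) - A * mean(b_base)   # intercept: aligns the means
    return A, B


def rescale_item(a, b, A, B):
    """Place an item's discrimination a and difficulty b on the target scale."""
    return a / A, A * b + B


# Invented anchor difficulties; the target scale is a stretched, shifted copy.
A, B = mean_sigma_constants([-1.0, 0.0, 1.0], [-1.5, 0.5, 2.5])
print(A, B)
print(rescale_item(a=1.2, b=0.4, A=A, B=B))
```

Once items from both groups share a metric, parameter differences between groups can be examined for DIF; the weighted variant additionally weights each anchor item, typically by the precision of its estimate.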
HIV-Related Stigma Among Spanish-speaking Latinos in an Emerging Immigrant Receiving City.
Dolwick Grieb, Suzanne M; Shah, Harita; Flores-Miller, Alejandra; Zelaya, Carla; Page, Kathleen R
2017-08-01
HIV-related stigma has been associated with a reluctance to test for HIV among Latinos. This study assessed community HIV-related stigma within an emerging Latino immigrant receiving city. We conducted a brief survey among a convenience sample of 312 Spanish-speaking Latinos in Baltimore, Maryland. HIV-related stigma was assessed through six items. Associations between stigma items, socio-demographic characteristics, and HIV testing history were considered. Gender, education, and religiosity were significantly associated with stigmatizing HIV-related beliefs. For example, men were 3.4 times more likely to hold more than three stigmatizing beliefs than women, and were also twice as likely as women to report feeling hesitant to test for HIV for fear of people's reaction if the test is positive. These findings can help inform future stigma interventions in this community. In particular, we were able to distinguish between drivers of stigma such as fear and moralistic attitudes, highlighting specific actionable items.
Teresi, Jeanne A; Ocepek-Welikson, Katja; Cook, Karon F; Kleinman, Marjorie; Ramirez, Mildred; Reid, M Carrington; Siu, Albert
2016-01-01
Reducing the response burden of standardized pain measures is desirable, particularly for individuals who are frail or live with chronic illness, e.g., those suffering from cancer and those in palliative care. The Patient Reported Outcome Measurement Information System ® (PROMIS ® ) project addressed this issue with the provision of computerized adaptive tests (CAT) and short form measures that can be used clinically and in research. Although there has been substantial evaluation of PROMIS item banks, little is known about the performance of PROMIS short forms, particularly in ethnically diverse groups. Reviewed in this article are findings related to the differential item functioning (DIF) and reliability of the PROMIS pain interference short forms across diverse sociodemographic groups. DIF hypotheses were generated for the PROMIS short form pain interference items. Initial analyses tested item response theory (IRT) model assumptions of unidimensionality and local independence. Dimensionality was evaluated using factor analytic methods; local dependence (LD) was tested using IRT-based LD indices. Wald tests were used to examine group differences in IRT parameters, and to test DIF hypotheses. A second DIF-detection method used in sensitivity analyses was based on ordinal logistic regression with a latent IRT-derived conditioning variable. Magnitude and impact of DIF were investigated, and reliability and item and scale information statistics were estimated. The reliability of the short form item set was excellent. However, there were a few items with high local dependency, which affected the estimation of the final discrimination parameters. As a result, the item, "How much did pain interfere with enjoyment of social activities?" was excluded in the DIF analyses for all subgroup comparisons. 
No items were hypothesized to show DIF for race and ethnicity; however, five items showed DIF after adjustment for multiple comparisons in both primary and sensitivity analyses: ability to concentrate, enjoyment of recreational activities, tasks away from home, participation in social activities, and socializing with others. The magnitude of DIF was small and the impact negligible. Three items were consistently identified with DIF for education: enjoyment of life, ability to concentrate, and enjoyment of recreational activities. No item showed DIF above the magnitude threshold and the impact of DIF on the overall measure was minimal. No item showed gender DIF after correction for multiple comparisons in the primary analyses. Four items showed consistent age DIF: enjoyment of life, ability to concentrate, day to day activities, and enjoyment of recreational activities, none with primary magnitude values above threshold. Conditional on the pain state, Spanish speakers were hypothesized to report less pain interference on one item, enjoyment of life. The DIF findings confirmed the hypothesis; however, the magnitude was small. Using an arbitrary cutoff point of theta ( θ ) ≥ 1.0 to classify respondents with acute pain interference, the highest number of changes were for the education groups analyses. There were 231 respondents (4% of the total sample) who changed from the designation of no acute pain interference to acute interference after the DIF adjustment. There was no change in the designations for race/ethnic subgroups, and a small number of changes for respondents aged 65 to 84. Although significant DIF was observed after correction for multiple comparisons, all DIF was of low magnitude and impact. However, some individual-level impact was observed for low education groups. Reliability estimates were high. 
Thus, the PROMIS short form pain items examined in this ethnically diverse sample performed relatively well; although one item was problematic and removed from the analyses. It is concluded that the majority of the PROMIS pain interference short form items can be recommended for use among ethnically diverse groups, including those in palliative care and with cancer and chronic illness.
Teresi, Jeanne A.; Ocepek-Welikson, Katja; Cook, Karon F.; Kleinman, Marjorie; Ramirez, Mildred; Reid, M. Carrington; Siu, Albert
2017-01-01
Reducing the response burden of standardized pain measures is desirable, particularly for individuals who are frail or live with chronic illness, e.g., those suffering from cancer and those in palliative care. The Patient Reported Outcome Measurement Information System® (PROMIS®) project addressed this issue with the provision of computerized adaptive tests (CAT) and short form measures that can be used clinically and in research. Although there has been substantial evaluation of PROMIS item banks, little is known about the performance of PROMIS short forms, particularly in ethnically diverse groups. Reviewed in this article are findings related to the differential item functioning (DIF) and reliability of the PROMIS pain interference short forms across diverse sociodemographic groups. Methods DIF hypotheses were generated for the PROMIS short form pain interference items. Initial analyses tested item response theory (IRT) model assumptions of unidimensionality and local independence. Dimensionality was evaluated using factor analytic methods; local dependence (LD) was tested using IRT-based LD indices. Wald tests were used to examine group differences in IRT parameters, and to test DIF hypotheses. A second DIF-detection method used in sensitivity analyses was based on ordinal logistic regression with a latent IRT-derived conditioning variable. Magnitude and impact of DIF were investigated, and reliability and item and scale information statistics were estimated. Results The reliability of the short form item set was excellent. However, there were a few items with high local dependency, which affected the estimation of the final discrimination parameters. As a result, the item, “How much did pain interfere with enjoyment of social activities?” was excluded in the DIF analyses for all subgroup comparisons. 
No items were hypothesized to show DIF for race and ethnicity; however, five items showed DIF after adjustment for multiple comparisons in both primary and sensitivity analyses: ability to concentrate, enjoyment of recreational activities, tasks away from home, participation in social activities, and socializing with others. The magnitude of DIF was small and the impact negligible. Three items were consistently identified with DIF for education: enjoyment of life, ability to concentrate, and enjoyment of recreational activities. No item showed DIF above the magnitude threshold and the impact of DIF on the overall measure was minimal. No item showed gender DIF after correction for multiple comparisons in the primary analyses. Four items showed consistent age DIF: enjoyment of life, ability to concentrate, day to day activities, and enjoyment of recreational activities, none with primary magnitude values above threshold. Conditional on the pain state, Spanish speakers were hypothesized to report less pain interference on one item, enjoyment of life. The DIF findings confirmed the hypothesis; however, the magnitude was small. Using an arbitrary cutoff point of theta (θ) ≥ 1.0 to classify respondents with acute pain interference, the highest number of changes were for the education groups analyses. There were 231 respondents (4% of the total sample) who changed from the designation of no acute pain interference to acute interference after the DIF adjustment. There was no change in the designations for race/ethnic subgroups, and a small number of changes for respondents aged 65 to 84. Conclusions Although significant DIF was observed after correction for multiple comparisons, all DIF was of low magnitude and impact. However, some individual-level impact was observed for low education groups. Reliability estimates were high. 
Thus, the PROMIS short form pain items examined in this ethnically diverse sample performed relatively well; although one item was problematic and removed from the analyses. It is concluded that the majority of the PROMIS pain interference short form items can be recommended for use among ethnically diverse groups, including those in palliative care and with cancer and chronic illness. PMID:28983449
Alla, Arben; Czabanowska, Katarzyna; Kijowska, Violetta; Roshi, Enver; Burazeri, Genc
2012-01-01
Our aim was to validate an international instrument measuring self-perceived competency level of family physicians in Albania. A representative sample of 57 family physicians operating in primary health care services was interviewed twice in March-April 2012 in Tirana (26 men and 31 women; median age: 46 years, inter-quartile range: 38-56 years). A structured questionnaire was administered [and subsequently re-administered after two weeks (test-retest)] to all family physicians aiming to self-assess physicians' level of abilities, skills and competencies regarding different domains of quality of health care. The questionnaire included 37 items organized into 6 subscales/domains. Answers for each item of the tool ranged from 1 ("novice" physicians) to 5 ("expert" physicians). An overall summary score (range: 37-185) and a subscale summary score for each domain were calculated for the test and retest procedures. Cronbach's alpha was used to assess the internal consistency for both the test and the retest procedures, whereas Spearman's rho was employed to assess the stability over time (test-retest reliability) of the instrument. Cronbach's alpha was 0.87 for the test and 0.86 for the retest procedure. Overall, Spearman's rho was 0.84 (P<0.001). The overall summary score for the 37 items of the instrument was 96.3±10.0 for the test and 97.3±10.1 for the retest. All the subscale summary scores were very similar for the test and the retest procedure. This study provides evidence on cross-cultural adaptation of an international instrument tapping self-perceived level of competencies of family physicians in Albania. The questionnaire displayed a satisfactory internal consistency for both test and retest procedures in this sample of family physicians in Albania. 
Furthermore, the high test-retest reliability (stability over time) of the instrument suggests a good potential for wide scale application to nationally representative samples of family physicians in Albanian populations.
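The internal-consistency statistic reported above, Cronbach's alpha, can be computed directly from a respondents-by-items score matrix. The sketch below is a minimal illustration using the standard formula; the function name and the toy data are assumptions and do not reproduce the study's data.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(scores[0])  # number of items
    item_variances = [variance([row[i] for row in scores]) for i in range(k)]
    total_variance = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Toy data: three respondents, two items scored 1 to 5 (hypothetical values).
toy_scores = [[1, 1], [2, 2], [3, 3]]
alpha = cronbach_alpha(toy_scores)
```

With 37 items each scored from 1 to 5, the summed score is bounded by 37 × 1 = 37 and 37 × 5 = 185, which matches the overall range stated in the abstract.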
A new self-report inventory of dyslexia for students: criterion and construct validity.
Tamboer, Peter; Vorst, Harrie C M
2015-02-01
The validity of a Dutch self-report inventory of dyslexia was ascertained in two samples of students. Six biographical questions, 20 general language statements and 56 specific language statements were based on dyslexia as a multi-dimensional deficit. Dyslexia and non-dyslexia were assessed with two criteria: identification with test results (Sample 1) and classification using biographical information (both samples). Using discriminant analyses, these criteria were predicted with various groups of statements. Altogether, 11 discriminant functions were used to estimate classification accuracy of the inventory. In Sample 1, 15 statements predicted the test criterion with classification accuracy of 98%, and 18 statements predicted the biographical criterion with classification accuracy of 97%. In Sample 2, 16 statements predicted the biographical criterion with classification accuracy of 94%. Estimations of positive and negative predictive value were 89% and 99%. Items of various discriminant functions were factor analysed to find characteristic difficulties of students with dyslexia, resulting in a five-factor structure in Sample 1 and a four-factor structure in Sample 2. Answer bias was investigated with measures of internal consistency reliability. Fewer than 20 self-report items are sufficient to accurately classify students with and without dyslexia. This supports the usefulness of self-assessment of dyslexia as a valid alternative to diagnostic test batteries. Copyright © 2015 John Wiley & Sons, Ltd.
PACIC Instrument: disentangling dimensions using published validation models.
Iglesias, K; Burnand, B; Peytremann-Bridevaux, I
2014-06-01
To better understand the structure of the Patient Assessment of Chronic Illness Care (PACIC) instrument. More specifically, to test all published validation models, using one single data set and appropriate statistical tools. Validation study using data from a cross-sectional survey. A population-based sample of non-institutionalized adults with diabetes residing in Switzerland (canton of Vaud). French version of the 20-item PACIC instrument (5-point response scale). We conducted validation analyses using confirmatory factor analysis (CFA). The original five-dimension model and other published models were tested with three types of CFA: based on (i) a Pearson estimator of variance-covariance matrix, (ii) a polychoric correlation matrix and (iii) a likelihood estimation with a multinomial distribution for the manifest variables. All models were assessed using loadings and goodness-of-fit measures. The analytical sample included 406 patients. Mean age was 64.4 years and 59% were men. Median of item responses varied between 1 and 4 (range 1-5), and range of missing values was between 5.7 and 12.3%. Strong floor and ceiling effects were present. Even though loadings of the tested models were relatively high, the only model showing acceptable fit was the 11-item single-dimension model. PACIC was associated with the expected variables of the field. Our results showed that the model considering 11 items in a single dimension exhibited the best fit for our data. A single score, in complement to the consideration of single-item results, might be used instead of the five dimensions usually described. © The Author 2014. Published by Oxford University Press in association with the International Society for Quality in Health Care; all rights reserved.
Reliability of self-rated tinnitus distress and association with psychological symptom patterns.
Hiller, W; Goebel, G; Rief, W
1994-05-01
Psychological complaints were investigated in two samples of 60 and 138 in-patients suffering from chronic tinnitus. We administered the Tinnitus Questionnaire (TQ), a 52-item self-rating scale which differentiates between dimensions of emotional and cognitive distress, intrusiveness, auditory perceptual difficulties, sleep disturbances and somatic complaints. The test-retest reliability was .94 for the TQ global score and between .86 and .93 for subscales. Three independent analyses were conducted to estimate the split-half reliability (internal consistency), which was only slightly lower than the test-retest values for scales with a relatively small number of items. Reliability was also sufficient at the level of single items. Low correlations between the TQ and the Hopkins Symptom Checklist (SCL-90-R) indicate a distinct quality of tinnitus-related and general psychological disturbances.
Joiner, Kevin L; Sternberg, Rosa Maria; Kennedy, Christine; Chen, Jyu-Lin; Fukuoka, Yoshimi; Janson, Susan L
2016-12-01
Create a Spanish-language version of the Risk Perception Survey for Developing Diabetes (RPS-DD) and assess psychometric properties. The Spanish-language version was created through translation, harmonization, and presentation to the tool's original author. It was field tested in a foreign-born Latino sample and properties evaluated in principal components analysis. Personal Control, Optimistic Bias, and Worry multi-item Likert subscale responses did not cluster together. A clean solution was obtained after removing two Personal Control subscale items. Neither the Personal Disease Risk scale nor the Environmental Health Risk scale responses loaded onto single factors. Reliabilities ranged from .54 to .88. Test of knowledge performance varied by item. This study contributes to evidence of validation of a Spanish-language RPS-DD in foreign-born Latinos.
Item response theory analyses of the Delis-Kaplan Executive Function System card sorting subtest.
Spencer, Mercedes; Cho, Sun-Joo; Cutting, Laurie E
2018-02-02
In the current study, we examined the dimensionality of the 16-item Card Sorting subtest of the Delis-Kaplan Executive Functioning System assessment in a sample of 264 native English-speaking children between the ages of 9 and 15 years. We also tested for measurement invariance for these items across age and gender groups using item response theory (IRT). Results of the exploratory factor analysis indicated that a two-factor model that distinguished between verbal and perceptual items provided the best fit to the data. Although the items demonstrated measurement invariance across age groups, measurement invariance was violated for gender groups, with two items demonstrating differential item functioning for males and females. Multigroup analysis using all 16 items indicated that the items were more effective for individuals whose IRT scale scores were relatively high. A single-group explanatory IRT model using 14 non-differential item functioning items showed that for perceptual ability, females scored higher than males and that scores increased with age for both males and females; for verbal ability, the observed increase in scores across age differed for males and females. The implications of these findings are discussed.
Assessing psychological well-being: self-report instruments for the NIH Toolbox.
Salsman, John M; Lai, Jin-Shei; Hendrie, Hugh C; Butt, Zeeshan; Zill, Nicholas; Pilkonis, Paul A; Peterson, Christopher; Stoney, Catherine M; Brouwers, Pim; Cella, David
2014-02-01
Psychological well-being (PWB) has a significant relationship with physical and mental health. As a part of the NIH Toolbox for the Assessment of Neurological and Behavioral Function, we developed self-report item banks and short forms to assess PWB. Expert feedback and literature review informed the selection of PWB concepts and the development of item pools for positive affect, life satisfaction, and meaning and purpose. Items were tested with a community-dwelling US Internet panel sample of adults aged 18 and above (N = 552). Classical and item response theory (IRT) approaches were used to evaluate unidimensionality, fit of items to the overall measure, and calibrations of those items, including differential item functioning (DIF). IRT-calibrated item banks were produced for positive affect (34 items), life satisfaction (16 items), and meaning and purpose (18 items). Their psychometric properties were supported based on the results of factor analysis, fit statistics, and DIF evaluation. All banks measured the concepts precisely (reliability ≥0.90) for more than 98% of participants. These adult scales and item banks for PWB provide the flexibility, efficiency, and precision necessary to promote future epidemiological, observational, and intervention research on the relationship of PWB with physical and mental health.
NASA Astrophysics Data System (ADS)
Prasetya, A. T.; Ridlo, S.
2018-03-01
The purpose of this study is to test an instrument measuring science learning motivation and to compare the science learning motivation of chemistry and biology teacher candidates. The Kuesioner Motivasi Sains (KMS) is an Indonesian adaptation of the Science Motivation Questionnaire II (SMQ II), consisting of 25 items with a 5-point Likert scale. The number of respondents for the Exploratory Factor Analysis (EFA) test was 312. The Kaiser-Meyer-Olkin (KMO), determinant, Bartlett's Sphericity, and Measures of Sampling Adequacy (MSA) tests of the KMS, run with SPSS 20.0 and Lisrel 8.51, indicated that the data were eligible for factor analysis. However, the test of communalities showed that 4 items did not qualify, so these items were discarded. In the second test, all eligibility parameters were met, and the Root Mean Square Error of Approximation (RMSEA), the P-Value for the Test of Close Fit (RMSEA <0.05), and the Goodness of Fit Index (GFI) were good. The new KMS, with 21 valid items and a composite reliability of 0.9329, can be used to test the level of science learning motivation, which includes Intrinsic Motivation, Self-Efficacy, Self-Determination, Grade Motivation and Career Motivation, for students who master the Indonesian language. KMS trials with chemistry and biology teacher candidates found no significant difference in learning motivation between the two groups.
Factors Affecting the Transfer of Basic Combat Skills Training in the Air Force
2006-03-01
Kaiser-Meyer-Olkin Measure of Sampling Adequacy (KMO) and Bartlett's test of Sphericity. The items reported a KMO=.87 and χ2 = 5,158.57, p < .01...Results Factor Analysis Table E1 Kaiser-Meyer-Olkin (KMO) and Bartlett's Test of Sphericity for Perceived Training Transfer and Transfer Enhancing...Activities: Kaiser-Meyer-Olkin Measure of Sampling Adequacy = .87; Bartlett's Test of Sphericity χ2 = 5,158.57, df = 66, Sig. = .000
Development of the Serenity Scale.
Roberts, K T; Aspy, C B
1993-01-01
Serenity is a sustained inner peace. Nurses can use knowledge about serenity to help clients cope with harsh circumstances. The Serenity Scale is a 40-item self-report, summated scale that evaluates clients' serenity status. Critical attributes, identified by serenity experts, served as the theoretical framework. Sixty-five items were given to 542 male and female subjects age 20 to 95 (73% Caucasians and 27% minority) from varying income and educational levels yielding an alpha of .93. Forty items (SS.V2) were extracted for further analysis. The alpha coefficient was .92 with item-to-total correlations ranging from .25 to .67. Item means ranged from 2.6-3.7 (grand mean = 3.4). A principal components factor analysis with varimax rotation revealed nine factors explaining 58.2% of the variance. Limitations are that SS.V2 has not been tested with an independent sample and subjects with low educational levels had difficulty with some items.
Development and validation of an asthma first aid knowledge questionnaire.
Luckie, Kate; Pang, Tsz Chun; Kritikos, Vicky; Saini, Bandana; Moles, Rebekah Jane
2018-05-01
There is no gold standard outcome assessment for asthma first-aid knowledge. We therefore aimed to develop and validate an asthma first-aid knowledge questionnaire (AFAKQ) to be used before and after educational interventions. The AFAKQ was developed based on a content analysis of existing asthma knowledge questionnaires and current asthma management guidelines. Content and face validity were assessed by a review panel consisting of expert respiratory physicians, researchers and parents of school-aged children. A 21-item questionnaire was then pilot tested among a sample of caregivers, health professionals and pharmacy students. Exploratory factor analysis was performed to determine internal consistency. The initial 46-item version of the AFAKQ was reduced to 21 items after revision by the expert panel. This was then pilot tested amongst 161 participants and further reduced to 14 items. The exploratory factor analysis revealed a parsimonious one-factor solution with a Cronbach's alpha of 0.77 for the 14-item AFAKQ. The AFAKQ is a valid tool ready for application in evaluating the impact of educational interventions on asthma first-aid knowledge. Copyright © 2017 Elsevier Inc. All rights reserved.
Haggerty, Jeannie L.; Bouharaoui, Fatima; Santor, Darcy A.
2011-01-01
Evaluating the extent to which groups or subgroups of individuals differ with respect to primary healthcare experience depends on first ruling out the possibility of bias. Objective: To determine whether item or subscale performance differs systematically between French/English, high/low education subgroups and urban/rural residency. Method: A sample of 645 adult users balanced by French/English language (in Quebec and Nova Scotia, respectively), high/low education and urban/rural residency responded to six validated instruments: the Primary Care Assessment Survey (PCAS); the Primary Care Assessment Tool – Short Form (PCAT-S); the Components of Primary Care Index (CPCI); the first version of the EUROPEP (EUROPEP-I); the Interpersonal Processes of Care Survey, version II (IPC-II); and part of the Veterans Affairs National Outpatient Customer Satisfaction Survey (VANOCSS). We normalized subscale scores to a 0-to-10 scale and tested for between-group differences using ANOVA tests. We used a parametric item response model to test for differences between subgroups in item discriminability and item difficulty. We re-examined group differences after removing items with differential item functioning. Results: Experience of care was assessed more positively in the English-speaking (Nova Scotia) than in the French-speaking (Quebec) respondents. We found differential English/French item functioning in 48% of the 153 items: discriminability in 20% and differential difficulty in 28%. English items were more discriminating generally than the French. Removing problematic items did not change the differences in French/English assessments. Differential item functioning by high/low education status affected 27% of items, with items being generally more discriminating in high-education groups. Between-group comparisons were unchanged. In contrast, only 9% of items showed differential item functioning by geography, affecting principally the accessibility attribute. 
Removing problematic items reversed a previously non-significant finding, revealing poorer first-contact access in rural than in urban areas. Conclusion: Differential item functioning does not bias or invalidate French/English comparisons on subscales, but additional development is required to make French and English items equivalent. These instruments are relatively robust by educational status and geography, but results suggest potential differences in the underlying construct in low-education and rural respondents. PMID:23205035
A Litmus Test for Performance Assessment.
ERIC Educational Resources Information Center
Finson, Kevin D.; Beaver, John B.
1992-01-01
Presents 10 guidelines for developing performance-based assessment items. Presents a sample activity developed from the guidelines. The activity tests students' ability to observe, classify, and infer, using red and blue litmus paper, a pH-range finder, vinegar, ammonia, an unknown solution, distilled water, and paper towels. (PR)
Tulsky, David S; Kisala, Pamela A; Kalpakjian, Claire Z; Bombardier, Charles H; Pohlig, Ryan T; Heinemann, Allen W; Carle, Adam; Choi, Seung W
2015-05-01
To develop a calibrated spinal cord injury-quality of life (SCI-QOL) item bank, computer adaptive test (CAT), and short form to assess depressive symptoms experienced by individuals with SCI, transform scores to the Patient Reported Outcomes Measurement Information System (PROMIS) metric, and create a crosswalk to the Patient Health Questionnaire (PHQ)-9. We used grounded-theory based qualitative item development methods, large-scale item calibration field testing, confirmatory factor analysis, item response theory (IRT) analyses, and statistical linking techniques to transform scores to a PROMIS metric and to provide a crosswalk with the PHQ-9. Five SCI Model System centers and one Department of Veterans Affairs medical center in the United States. Adults with traumatic SCI. Spinal Cord Injury-Quality of Life (SCI-QOL) Depression Item Bank. Individuals with SCI were involved in all phases of SCI-QOL development. A sample of 716 individuals with traumatic SCI completed 35 items assessing depression, 18 of which were PROMIS items. After removing 7 non-PROMIS items, factor analyses confirmed a unidimensional pool of items. We used a graded response IRT model to estimate slopes and thresholds for the 28 retained items. The SCI-QOL Depression measure correlated 0.76 with the PHQ-9. The SCI-QOL Depression item bank provides a reliable and sensitive measure of depressive symptoms with scores reported in terms of general population norms. We provide a crosswalk to the PHQ-9 to facilitate comparisons between measures. The item bank may be administered as a CAT or as a short form and is suitable for research and clinical applications.
Toward a Principled Sampling Theory for Quasi-Orders
Ünlü, Ali; Schrepp, Martin
2016-01-01
Quasi-orders, that is, reflexive and transitive binary relations, have numerous applications. In educational theories, the dependencies of mastery among the problems of a test can be modeled by quasi-orders. Methods such as item tree or Boolean analysis that mine for quasi-orders in empirical data are sensitive to the underlying quasi-order structure. These data mining techniques have to be compared based on extensive simulation studies, with unbiased samples of randomly generated quasi-orders at their basis. In this paper, we develop techniques that can provide the required quasi-order samples. We introduce a discrete doubly inductive procedure for incrementally constructing the set of all quasi-orders on a finite item set. A randomization of this deterministic procedure allows us to generate representative samples of random quasi-orders. With an outer level inductive algorithm, we consider the uniform random extensions of the trace quasi-orders to higher dimension. This is combined with an inner level inductive algorithm to correct the extensions that violate the transitivity property. The inner level correction step entails sampling biases. We propose three algorithms for bias correction and investigate them in simulation. It is evident that, on even up to 50 items, the new algorithms create close to representative quasi-order samples within acceptable computing time. Hence, the principled approach is a significant improvement to existing methods that are used to draw quasi-orders uniformly at random but cannot cope with reasonably large item sets. PMID:27965601
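The definition used above (a quasi-order is a reflexive and transitive binary relation) can be checked mechanically on a small item set. The sketch below is an illustration of the definition only: taking the reflexive-transitive closure is one simple way to repair a relation that violates transitivity, and it is an assumption here, not the paper's doubly inductive procedure or its bias-correction algorithms. All names are hypothetical.

```python
from itertools import product


def is_quasi_order(items, relation):
    """True if `relation` (a set of pairs) is reflexive and transitive on `items`."""
    reflexive = all((a, a) in relation for a in items)
    transitive = all(
        (a, d) in relation
        for (a, b) in relation
        for (c, d) in relation
        if b == c
    )
    return reflexive and transitive


def reflexive_transitive_closure(items, relation):
    """Smallest quasi-order on `items` containing `relation` (Warshall's algorithm)."""
    closure = set(relation) | {(a, a) for a in items}
    # product varies its first component slowest, so k is the outer loop,
    # as Warshall's algorithm requires.
    for k, i, j in product(items, repeat=3):
        if (i, k) in closure and (k, j) in closure:
            closure.add((i, j))
    return closure


# Hypothetical mastery dependencies among three test problems.
problems = [1, 2, 3]
raw = {(1, 2), (2, 3)}
repaired = reflexive_transitive_closure(problems, raw)
```

Note that always repairing by closure pushes a random sample toward larger relations, which is in the spirit of the sampling bias the abstract says must be corrected.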
[French version of TASTE (test for the ability and evaluation)].
Masson, A M; Cadot, M; Pereira, A M; Depreeuw, E; Ansseau, M
2001-01-01
The ability to study and to be evaluated is only one example of performance among many, but more than 50 years of research and publications on this issue, especially in the context of test anxiety and need for achievement, have conferred on it a prototypical status. Research on motivation has also engaged many scientists and constitutes another foundation of this study (13). The level of performance depends on knowledge and motivation (33). Time devoted to study is essential to success, so motivation and procrastination are in competition. The importance of reinforcement (extrinsic motivation) and the desire to learn and know (intrinsic motivation) are decisive. Other elements must be emphasized: the guarantee of obtaining rewards, self-efficacy and causal attribution. These considerations point to the multidimensional and interactive aspects of test anxiety (7, 31). There is no consensus on the number of components, but experts agree on emotional, cognitive and behavioral dimensions (25). Until the sixties, anxiety was approached through its motivational properties, as a drive corresponding to a need like thirst or hunger (18); it was then conceptualized in a dynamic context broader than that of stress and coping (29, 30). Finally, it has been the object of theories highlighting cognitive interference (9, 23, 26) or deficient skills (8, 32). Many questionnaires were built without addressing these different aspects, for instance without linking the theoretical and therapeutic components of the problem. Drawing on the traditional fields of research (test anxiety and need for achievement), on Weiner's work on attribution theory (34) and on Bandura's work on self-efficacy (4, 5), E. Depreeuw (10) was particularly interested in Heckhausen's model (16, 17), trying to associate experimental conceptions with clinical reality.
On this basis, he developed the TASTE (10, 12, 20): test for ability to study and evaluation. Initially made up of 121 items, the self-report questionnaire was reduced after factor analysis to 78 items assessing 4 factors that represent the 8 components of Heckhausen's model (16). The first factor (30 items) concerns anxiety in its emotional and cognitive characteristics. Interesting data were observed by Depreeuw and confirmed in the Netherlands and in Greece (19), especially the fact that girls are more anxious than boys when confronted with an exam. The second factor (19 items) represents self-confidence: confidence in one's ability to succeed and to use adequate strategies. The third factor (14 items) corresponds to the value attributed to performance; it mainly reflects the intrinsic component of motivation. The fourth factor (15 items) corresponds to procrastination: study is postponed in favor of activities that actually hinder achievement. The self-report questionnaire (where answers run from total disagreement to total agreement, according to whether they correspond to the respondent's way of thinking or acting) takes account of the emotional, cognitive and behavioral dimensions of the model. The multidimensional character of this questionnaire, and above all its connections with theory and clinical practice, are convincing; we adopted it along with other tests in research on fear of failure. Validation of the French translation is the subject of this article. For this purpose, we chose the initial version of the TASTE (121 items) to assess a Belgian sample of French-speaking first-year students at the University of Liège (n = 617). They are differentiated by gender, faculty (Economics, Medicine, Philosophy, Psychology, Applied Sciences) and experience of failure, i.e. having repeated a school year. Statistical analyses were performed with SPSS and SAS. The comprehension test and the back translation were satisfactory.
Stability over time was also satisfactory: the one-week test-retest was carried out with a sample of 33 student nurses; comparing the factors two by two, intra-class correlations ranged from 0.5 to 0.95. Component analysis with Varimax rotation did not allow us to recover the four factors of the original construction. We obtained the same disappointing results with the 78-item version. According to the scree test, we adopted a five-factor solution, which confirms the original construction. A fifth, particularly strong factor (Cronbach's alpha 0.82) corresponds to devaluation. The internal reliability is very satisfactory. Retaining the items with strong loadings (above 0.30), the French version consists of 94 items. Retaining only the items that load specifically on each factor, the questionnaire can be reduced to 50 items. Internal reliability remains the same. Correlations for the data obtained with this brief version are satisfactory: comparing the factors in pairs, Pearson's coefficients range from 0.8 to 0.89. The study of E. Depreeuw was carried out ten years ago with a Belgian sample of Flemish-speaking students; is it the cultural context or an evolution in test anxiety that explains such a difference in the results? We are now assessing a sample in another country which used the TASTE in 1996, in order to obtain information for comparing samples. In any case, E. Depreeuw's questionnaire respects the multidimensional character of test anxiety. Our choice of 5 factors based on principal components confirms the original structure of the test but also identifies another factor, which could be predictive of psychopathology (panic, depression). This new dimension measures a negative perception of ability and achievement, especially in comparison with other students. One aim of such a study is to discern clusters of students, i.e. groups of students with characteristics such as those described by Covington (7), Depreeuw (20) and Zeidner (35).
This description is very important; the French version is only one part of a much broader study that includes other tests and could differentiate student profiles. The five-factor French version ultimately gives results in agreement with the original. The items (10 per factor) were selected for their highest specificity and after elimination of redundancy. Validity is preserved in the short version, making it more useful in clinical practice.
Niklasson, Johan; Conradsson, Mia; Hörnsten, Carl; Nyqvist, Fredrica; Padyab, Mojgan; Nygren, Björn; Olofsson, Birgitta; Lövheim, Hugo; Gustafson, Yngve
2015-11-01
Morale is related to psychological well-being and quality of life in older people. The Philadelphia Geriatric Center Morale Scale (PGCMS) is widely used to assess morale. The purpose of this study was to evaluate the psychometric properties and feasibility of the Swedish version of the 17-item PGCMS among very old people. The Umeå 85+/GERDA study included Swedish-speaking people aged 85, 90 and 95 years and older, from Sweden and Finland. Participants were interviewed in their own homes using a predefined set of questions. In the main sample, 493 individuals answered all 17 PGCMS items (aged 89.0 ± 4.3 years). Another 105 answered between 1 and 16 questions (aged 89.6 ± 4.4 years). A convenience sample was also collected, and 54 individuals answered all 17 PGCMS items twice (aged 84.7 ± 6.7 years). The same assessor restated the questions within 1 week. Cronbach's alpha was 0.74 among those who answered all 17 questions in the main sample. Confirmatory factor analysis was used to test the construct validity of the most widely used version of the PGCMS, with 17 items and three factors, and showed a generally good fit. Among those answering between 1 and 17 PGCMS questions, 92.6 % (554/598) answered 16 or 17. The convenience sample was used for intra-rater test-retesting, and the intraclass correlation coefficient (ICC) was 0.89. The least significant change between two assessments, with 95 % confidence interval, was 3.53 PGCMS points. The Swedish version of the PGCMS seems to have satisfactory psychometric properties and feasibility among very old people.
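For readers unfamiliar with the internal-consistency figure quoted above, the following is a minimal sketch of how Cronbach's alpha is conventionally computed from a respondents-by-items score table. The function name and the input layout are assumptions for illustration; the study's own software and data are not described in the abstract.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for complete responses.

    scores: list of respondents, each a list of k item scores
    (hypothetical layout; k >= 2 and at least two respondents).
    """
    k = len(scores[0])
    # Sample variance of each item across respondents
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of the total (sum) score
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```

Only respondents who answered every item can be passed in as written, which mirrors the abstract's restriction of alpha to those who answered all 17 questions.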
Nielsen, Marie Germund; Ørnbøl, Eva; Vestergaard, Mogens; Bech, Per; Christensen, Kaj Sparle
2017-06-01
We aimed to assess the measurement properties of the ten-item Major Depression Inventory when used on clinical suspicion in general practice by performing a Rasch analysis. General practitioners asked consecutive persons to respond to the web-based Major Depression Inventory on clinical suspicion of depression. We included 22 practices and 245 persons. Rasch analysis was performed using RUMM2030 software. The Rasch model fit suggests that all items contribute to a single underlying trait (defined as internal construct validity). Mokken analysis was used to test dimensionality and scalability. Our Rasch analysis showed misfit concerning the sleep and appetite items (items 9 and 10). The response categories were disordered for eight items. After modifying the original six-point to a four-point scoring system for all items, we achieved ordered response categories for all ten items. The person separation reliability was acceptable (0.82) for the initial model. Dimensionality testing did not support combining the ten items to create a total score. The scale appeared to be well targeted to this clinical sample. No significant differential item functioning was observed for gender, age, work status and education. The Rasch and Mokken analyses revealed two dimensions, but the Major Depression Inventory showed fit to one scale if items 9 and 10 were excluded. Our study indicated scalability problems in the current version of the Major Depression Inventory. The conducted analysis revealed better statistical fit when items 9 and 10 were excluded. Copyright © 2017 Elsevier Inc. All rights reserved.
Farin, Erik; Nagl, Michaela; Gramm, Lukas; Heyduck, Katja; Glattacker, Manuela
2014-05-01
Study aim was to translate the PROMIS(®) pain interference (PI) item bank (41 items) into German, test its psychometric properties in patients with chronic low back pain and develop static subforms. We surveyed N = 262 patients undergoing rehabilitation who were asked to fill out questionnaires at the beginning and 2 weeks after the end of rehabilitation, applying the Oswestry Disability Index (ODI) and Pain Disability Index (PDI) in addition to the PROMIS(®) PI items. For psychometric testing, a 1-parameter item response theory (IRT) model was used. Exploratory and confirmatory factor analyses as well as reliability and construct validity analyses were conducted. The assumptions regarding IRT scaling of the translated PROMIS(®) PI item bank as a whole were not confirmed. However, we succeeded in devising three static subforms (PI-G scales: PI mental 13 items, PI functional 11 items, PI physical 4 items), revealing good psychometric properties. The PI-G scales in their static form can be recommended for use in German-speaking countries. Their strengths versus the ODI and PDI are that pain interference is assessed in a differentiated manner and that several psychometric values are somewhat better than those associated with the ODI and PDI (distribution properties, IRT model fit, reliability). To develop an IRT-scaled item bank of the German translations of the PROMIS(®) PI items, it would be useful to have additional studies (e.g., with larger sample sizes and using a 2-parameter IRT model).
Validity and reliability of the Utrecht Work Engagement Scale-Student Version in Sri Lanka.
Wickramasinghe, Nuwan Darshana; Dissanayake, Devani Sakunthala; Abeywardena, Gihan Sajiwa
2018-05-04
The present study was aimed at assessing the validity and the reliability of the Sinhala version of the Utrecht Work Engagement Scale-Student Version (UWES-S) among collegiate cycle students in Sri Lanka. The 17-item UWES-S was translated to Sinhala and the judgmental validity was assessed by a multi-disciplinary panel of experts. Construct validity of the UWES-S was appraised by using multi-trait scaling analysis and exploratory factor analysis (EFA) on data obtained from a sample of 194 grade thirteen students in the Kurunegala district, Sri Lanka. Reliability of the UWES-S was assessed by using internal consistency and test-retest reliability. Except for item 13, all other items showed good psychometric properties in judgemental validity, item-convergent validity and item-discriminant validity. EFA using principal component analysis with Oblimin rotation, suggested a three-factor solution (including vigor, dedication and absorption subscales) explaining 65.4% of the total variance for the 16-item UWES-S (with item 13 deleted). All three subscales show high internal consistency with Cronbach's α coefficient values of 0.867, 0.819, and 0.903 and test-retest reliability was high (p < 0.001). Hence, the Sinhala version of the 16-item UWES-S is a valid and a reliable instrument to assess work engagement among collegiate cycle students in Sri Lanka.
ERIC Educational Resources Information Center
Hansen, Mark; Cai, Li; Monroe, Scott; Li, Zhen
2014-01-01
It is a well-known problem in testing the fit of models to multinomial data that the full underlying contingency table will inevitably be sparse for tests of reasonable length and for realistic sample sizes. Under such conditions, full-information test statistics such as Pearson's X² and the likelihood ratio statistic…
Simon, Daniela; Kriston, Levente; Loh, Andreas; Spies, Claudia; Scheibler, Fueloep; Wills, Celia; Härter, Martin
2010-09-01
Validation of the German version of the Autonomy-Preference-Index (API), a measure of patients' preferences for decision making and information seeking. Stepwise confirmatory factor analysis was conducted on a sample of patients (n = 1592) treated in primary care for depression (n = 186), surgical and internal medicine inpatients (n = 811) and patients with minor trauma treated in an emergency department (n = 595). An initial test of the model was done on calculation and validation halves of the sample. Both local and global indexes-of-fit suggested modifications to the scale. The scale was modified and re-tested in the calculation sample and confirmed in the validation sample. Subgroup analyses for age, gender and type of treatment setting were also performed. The confirmatory analysis led to a modified version of the API with better local and global indexes-of-fit for samples of German-speaking patients. Two items of the sub-scale, 'preference for decision-making', and one item of the sub-scale, 'preference for information seeking', showed very low reliability scores and were deleted. Thus, several global indexes-of-fit clearly improved significantly. The modified scale was confirmed on the validation sample with acceptable to good indices of fit. Results of subgroup analyses indicated that no adaptations were necessary. This first confirmatory analysis for a German-speaking population showed that the API was improved by the removal of several items. There were theoretically plausible explanations for this improvement suggesting that the modifications might also be appropriate in English and other language versions.
Johnson-Greene, Doug; McCaul, Mary E; Roger, Patricia
2009-09-01
Effective and valid screening methods are needed to identify hazardous drinking in elderly persons with new onset acute medical illness. The goal of the current study was to examine the effectiveness of the Michigan Alcohol Screening Test-Geriatric Version (MAST-G) in identifying hazardous drinking among elderly patients with acute cerebrovascular accidents (CVA) and to compare the effectiveness of 2 shorter versions of the MAST-G with the full instrument. The study sample included 100 men and women who averaged 12 days posthemorrhagic or ischemic CVA admitted to a rehabilitation unit and who were at least 50 years of age and free of substance use other than alcohol. This cross-sectional validation study compared the 24-item full MAST-G, the 10-item Short MAST-G (SMAST-G), and a 2-item regression analysis derived Mini MAST-G (MMAST-G) to the reference standard of hazardous drinking during the past 3 months. Alcohol use was collected using the Timeline Followback (TLFB). Recent and lifetime alcohol-related consequences were collected using the Short Inventory of Problems (SIP). Nearly one-third (28%) of the study sample met the World Health Organization (WHO) criteria for hazardous drinking. Moderately strong associations were found for the MAST-G, SMAST-G, and MMAST-G with alcohol quantity and frequency and recent and lifetime alcohol consequences. All 3 MAST-G versions could differentiate hazardous from nonhazardous drinkers and had nearly identical area under the curve characteristics. Comparable sensitivity was found across the 3 MAST-G measures. The optimal screening threshold for hazardous drinking was 5 for the MAST-G, 2 for the SMAST-G, and 1 for the MMAST-G. The 10-item SMAST-G and 2-item MMAST-G are brief screening tests that show comparable effectiveness in detecting hazardous drinking in elderly patients with acute CVA compared with the full 24-item MAST-G. Implications for research and clinical practice are discussed.
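The comparison of screening thresholds described above rests on sensitivity and specificity against a reference standard. A minimal sketch follows, assuming that a score at or above the threshold counts as a positive screen (the abstract does not state the cut-off convention) and using hypothetical argument names.

```python
def sensitivity_specificity(scores, is_hazardous, threshold):
    """Return (sensitivity, specificity) for a screening cut-off.

    scores: screening totals per patient (hypothetical input)
    is_hazardous: reference-standard labels, True for hazardous drinking
    threshold: a score >= threshold is treated as a positive screen
               (assumed convention)
    """
    tp = fn = tn = fp = 0
    for score, hazardous in zip(scores, is_hazardous):
        positive = score >= threshold
        if hazardous and positive:
            tp += 1
        elif hazardous:
            fn += 1
        elif positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)
```

Sweeping `threshold` over the possible scores and plotting sensitivity against 1 minus specificity gives the curve whose area the abstract compares across the three versions.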
Xiao, Yu-Ying; Li, Ting; Xiao, Lin; Wang, Su-Wei; Wang, Si-Qi; Wang, Han-Xiao; Wang, Bei-Bei; Gao, Yu-Lin
2017-02-01
Professional attitude is of great importance for nursing talents in the modern society. To develop an effective educational program for student nurses in China, an appropriate instrument is required for the assessment of their professional attitude. To assess the validity and reliability of the Instrument of Professional Attitude for Student Nurses (IPASN) in Chinese version. The original version of IPASN was translated through Brislin model (translation, back translation, culture adaption and pilot study) with the authorization from the developer. A total of 681 nursing students were chosen by stratified convenience sampling to assess construct validity using exploratory factor analysis (EFA). Besides, item analysis, Cronbach's alpha coefficients, test-retest reliability were conducted to test the psychometric properties in this part. A total of 204 nursing undergraduate trainees were selected by cluster convenience sampling to confirm the structure using confirmatory factor analysis (CFA) in another time. Corrected item-total correlations, alpha if item deleted were between 0.33 and 0.69, 0.906 and 0.913, respectively, indicating no item should be deleted. Cronbach alpha value was 0.91 for the total scale and Cronbach alpha coefficient for subscales ranged from 0.67 to 0.89. Test-retest reliability estimated from intraclass correlation coefficient (ICC) was 0.74 (P<0.05). Differences in item scores between the high-score group (the first 27%) and low-score group (the last 27%) were significant (P<0.001), indicating that the item discrimination ability was good. Seven subscales (contribution to increase of scientific information load, autonomy, community service, continuous education, to promote professional development, cooperation and theory guiding practice) were identified in EFA and confirmed in CFA, and explained 65.5% of the total variance. 
It indicated that the Chinese version of IPASN was valid and reliable for the evaluation of nursing students' professional attitude. Copyright © 2016 Elsevier Ltd. All rights reserved.
Lix, Lisa M; Wu, Xiuyun; Hopman, Wilma; Mayo, Nancy; Sajobi, Tolulope T; Liu, Juxin; Prior, Jerilynn C; Papaioannou, Alexandra; Josse, Robert G; Towheed, Tanveer E; Davison, K Shawn; Sawatzky, Richard
2016-01-01
Self-reported health status measures, like the Short Form 36-item Health Survey (SF-36), can provide rich information about the overall health of a population and its components, such as physical, mental, and social health. However, differential item functioning (DIF), which arises when population sub-groups with the same underlying (i.e., latent) level of health have different measured item response probabilities, may compromise the comparability of these measures. The purpose of this study was to test for DIF on the SF-36 physical functioning (PF) and mental health (MH) sub-scale items in a Canadian population-based sample. Study data were from the prospective Canadian Multicentre Osteoporosis Study (CaMos), which collected baseline data in 1996-1997. DIF was tested using a multiple indicators multiple causes (MIMIC) method. Confirmatory factor analysis defined the latent variable measurement model for the item responses and latent variable regression with demographic and health status covariates (i.e., sex, age group, body weight, self-perceived general health) produced estimates of the magnitude of DIF effects. The CaMos cohort consisted of 9423 respondents; 69.4% were female and 51.7% were less than 65 years. Eight of 10 items on the PF sub-scale and four of five items on the MH sub-scale exhibited DIF. Large DIF effects were observed on PF sub-scale items about vigorous and moderate activities, lifting and carrying groceries, walking one block, and bathing or dressing. On the MH sub-scale items, all DIF effects were small or moderate in size. SF-36 PF and MH sub-scale scores were not comparable across population sub-groups defined by demographic and health status variables due to the effects of DIF, although the magnitude of this bias was not large for most items. We recommend testing and adjusting for DIF to ensure comparability of the SF-36 in population-based investigations.
Salsman, John M; Victorson, David; Choi, Seung W; Peterman, Amy H; Heinemann, Allen W; Nowinski, Cindy; Cella, David
2013-11-01
To develop and validate an item-response theory-based patient-reported outcomes assessment tool of positive affect and well-being (PAW). This is part of a larger NINDS-funded study to develop a health-related quality of life measurement system across major neurological disorders, called Neuro-QOL. Informed by a literature review and qualitative input from clinicians and patients, item pools were created to assess PAW concepts. Items were administered to a general population sample (N = 513) and a group of individuals with a variety of neurologic conditions (N = 581) for calibration and validation purposes, respectively. A 23-item calibrated bank and a 9-item short form of PAW was developed, reflecting components of positive affect, life satisfaction, or an overall sense of purpose and meaning. The Neuro-QOL PAW measure demonstrated sufficient unidimensionality and displayed good internal consistency, test-retest reliability, model fit, convergent and discriminant validity, and responsiveness. The Neuro-QOL PAW measure was designed to aid clinicians and researchers to better evaluate and understand the potential role of positive health processes for individuals with chronic neurological conditions. Further psychometric testing within and between neurological conditions, as well as testing in non-neurologic chronic diseases, will help evaluate the generalizability of this new tool.
Dragesund, Tove; Strand, Liv Inger; Grotle, Margreth
2018-02-01
The Body Awareness Rating Questionnaire (BARQ) is a self-report questionnaire aimed at capturing how people with long-lasting musculoskeletal pain reflect on their own body awareness. Methods based on classical test theory were applied to the development of the instrument and resulted in 4 subscales. However, the scales were not correlated, and construct validity might be questioned. The primary purpose of this study was to explore the possibility of developing a unidimensional scale from items initially collected for the BARQ using Rasch analysis. A secondary purpose was to investigate the test-retest reliability of a revised version of the BARQ. This was a methodological study. Rasch and reliability analyses were performed for 3 samples of participants with long-lasting musculoskeletal pain. The first Rasch analysis was carried out on 66 items generated for the original BARQ and scored by 300 participants. The items supported by the first analysis were scored by a new group of 127 participants and analyzed in a second Rasch analysis. For the test-retest reliability analysis, 48 participants scored the revised BARQ items twice within 1 week. The 2-step Rasch analysis resulted in a unidimensional 12-item revised version of the BARQ with a 4-point response scale (scores from 0 to 36). It showed a good fit to the Rasch model, with acceptable internal consistency, satisfactory fit residuals, and no disordered thresholds. Test-retest reliability was high, with an intraclass correlation coefficient of .83 (95% CI = .71-.89) and a smallest detectable change of 6.3 points. The small sample size in the second Rasch analysis was a study limitation. The revised BARQ is a unidimensional and feasible measurement of body awareness, recommended for use in the context of body-mind physical therapy approaches for musculoskeletal conditions. © 2017 American Physical Therapy Association
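The smallest detectable change reported above is commonly derived from the test-retest ICC through the standard error of measurement. The sketch below uses the conventional formulas SEM = SD·sqrt(1 − ICC) and SDC = 1.96·sqrt(2)·SEM; whether the authors used exactly this variant is an assumption, and the score standard deviation is not given in the abstract, so the function takes it as a hypothetical input.

```python
import math

def standard_error_of_measurement(sd, icc):
    """SEM from the score standard deviation and a reliability coefficient."""
    return sd * math.sqrt(1.0 - icc)

def smallest_detectable_change(sd, icc, z=1.96):
    """SDC at the confidence level implied by z (1.96 for 95%).

    The sqrt(2) factor reflects that a change score combines the
    measurement error of two assessments.
    """
    return z * math.sqrt(2.0) * standard_error_of_measurement(sd, icc)
```

For a fixed standard deviation, a higher ICC gives a smaller SDC, so a more reliable scale can detect smaller individual changes.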
Lower-fat menu items in restaurants satisfy customers.
Fitzpatrick, M P; Chapman, G E; Barr, S I
1997-05-01
The objective was to evaluate a restaurant-based nutrition program by measuring customer satisfaction with lower-fat menu items and assessing patrons' reactions to the program. Questionnaires to assess satisfaction with menu items were administered to patrons in eight of the nine restaurants that volunteered to participate in the nutrition program. One patron from each participating restaurant was randomly selected for a semistructured interview about nutrition programming in restaurants. Participants were persons dining in the eight participating restaurants over a 1-week period (n = 686). Independent samples t tests were used to compare respondents' satisfaction with lower-fat and regular menu items. Two-way analysis of variance tests were completed using overall satisfaction as the dependent variable and menu-item classification (ie, lower fat or regular) and one of eight other menu item and respondent characteristics as independent variables. Qualitative methods were used to analyze interview transcripts. Of 1,127 menu items rated for satisfaction, 205 were lower fat, 878 were regular, and 44 were of unknown classification. Customers were significantly more satisfied with lower-fat than with regular menu items (P < .001). Overall satisfaction did not vary by any of the other independent variables. Interview results indicate the importance of restaurant dining as an indulgent experience. High satisfaction with lower-fat menu items suggests that customers will support restaurants providing such choices. Dietitians can use these findings to encourage restaurateurs to include lower-fat choices on their menus, and to assure clients that their expectations of being indulged are not incompatible with these choices.
Owens, Sherry; Kristjansson, Alfgeir L; Hunte, Haslyn E R
2015-11-05
We investigated whether individual items on the nine-item Williams' Perceived Everyday Discrimination Scale (EDS) functioned differently by age (<45 vs ≥ 45) within five racial groups in the United States: Asians (n=2,017); Hispanics (n=2,688); Black Caribbeans (n=1,377); African Americans (n=3,434); and Whites (n=854). We used data from the 2001-2003 National Survey of American Lives and the 2001-2003 National Latino and Asian Studies. Multiple-indicator, multiple-cause (MIMIC) models were used to examine differential item functioning (DIF) on the EDS by age within each racial/ethnic group. Overall, Asian and Hispanic respondents reported less discrimination than Whites; on the other hand, African Americans and Black Caribbeans reported more discrimination than Whites. Regardless of race/ethnicity, the younger respondents (aged <45 years) reported less discrimination than the older respondents (aged ≥ 45 years). In terms of age by race/ethnicity, the results were mixed for 19 out of 45 tests of DIF (40%). No differences in item function were observed among Black Caribbeans. "Being called names or insulted" and others acting as "if they are afraid" of the respondents were the only two items that did not exhibit differential item functioning by age across all racial/ethnic groups. Overall, our findings suggest that the EDS should be used with caution in multi-age, multi-racial/ethnic samples.
Tsubakita, Takashi; Shimazaki, Kazuyo; Ito, Hiroshi; Kawazoe, Nobuo
2017-10-30
The Utrecht Work Engagement Scale for Students has been used internationally to assess students' academic engagement, but it has not been analyzed via item response theory. The purpose of this study was to conduct an item response theory analysis of the Japanese version of the Utrecht Work Engagement Scale for Students, translated by the authors. Using a two-parameter model and Samejima's graded response model, difficulty and discrimination parameters were estimated after confirming the factor structure of the scale. The 14 items on the scale were analyzed with a sample of 3214 university and college students majoring in medical science, nursing, or natural science in Japan. The preliminary parameter estimation was conducted with the two-parameter model and indicated that three items should be removed because their parameters were outliers. Final parameter estimation was conducted using the remaining 11 items and indicated that all difficulty and discrimination parameters were acceptable. The test information curve suggested that the scale assesses higher engagement better than average engagement. The estimated parameters provide a basis for future comparative studies. The results also suggested that a 7-point Likert scale is too broad; thus, the scale should be modified to use fewer response categories.
Arias González, Víctor B; Crespo Sierra, María Teresa; Arias Martínez, Benito; Martínez-Molina, Agustín; Ponce, Fernando P
2015-09-23
The Connor-Davidson Resilience Scale (CD-RISC) is inarguably one of the best-known instruments in the field of resilience assessment. However, the criteria for the psychometric quality of the instrument were based only on classical test theory. This paper focuses on calibrating the CD-RISC with a nonclinical sample of 444 adults using the Rasch-Andrich Rating Scale Model, in order to clarify its structure and analyze its psychometric properties at the item level. Two items showed misfit to the model and were eliminated. The remaining 22 items form an essentially unidimensional scale. The CD-RISC has good psychometric properties. The fit of both the items and the persons to the Rasch model was good, and the response categories functioned properly. Two of the items showed differential item functioning. The CD-RISC has an obvious ceiling effect, which suggests that more difficult items should be included in future versions of the scale.
A Duplicate Construction Experiment.
ERIC Educational Resources Information Center
Bridgeman, Brent
This experiment was designed to assess the ability of item writers to construct truly parallel tests based on a "duplicate-construction experiment" in which Cronbach argues that if the universe description and sampling are ideally refined, the two independently constructed tests will be entirely equivalent, and that within the limits of item…
Omani Twelfth Grade Students' Most Common Misconceptions in Chemistry
ERIC Educational Resources Information Center
Al-Balushi, Sulaiman M.; Ambusaidi, Abdullah K.; Al-Shuaili, Ali H.; Taylor, Neil
2012-01-01
The current study, undertaken in the Sultanate of Oman, explored twelfth grade students' common misconceptions in seven chemistry conceptual areas. The sample included 786 twelfth grade students in Oman while the instrument was a two-tier test called Chemistry Misconceptions Diagnostic Test (CMDT), consisting of 25 items with 12 items…
Math: Figure and Object Characteristics. Measurement and Geometry. Grades K-9. Revised Edition.
ERIC Educational Resources Information Center
Instructional Objectives Exchange, Los Angeles, CA.
To help classroom teachers construct mathematics tests, thirty-seven general objectives, corresponding sub-objectives, sample test items, and answers are presented. In general, sub-objectives are arranged in increasing order of difficulty. The objectives were written to comprehensively cover two categories: measurement and geometry. Measurement…
1983-06-01
Approved for public release... 1. GPETE Initial Outfitting (GINO) ... 2. GPETE End Item Replacement (GEIR) ... D. GINO Requirements Determination ... E... interval of a sample of 305 GPETE items increased from 8.8 to 13.6 months. The estimated annual savings resulting from this increase was 18.000
Nyitray, Alan G; Harris, Robin B; Abalos, Andrew T; Nielson, Carrie M; Papenfuss, Mary; Giuliano, Anna R
2010-12-01
Accurate knowledge about human sexual behaviors is important for increasing our understanding of human sexuality; however, there have been few studies assessing the reliability of sexual behavior questionnaires designed for community samples of adult men. A test-retest reliability study was conducted on a questionnaire completed by 334 men who had been recruited in Tucson, Arizona. Reliability coefficients and refusal rates were calculated for 39 non-sexual and sexual behavior questionnaire items. Predictors of unreliable reporting for lifetime number of female sexual partners were also assessed. Refusal rates were generally low, with slightly higher refusal rates for questions related to immigration, income, the frequency of sexual intercourse with women, lifetime number of female sexual partners, and the lifetime number of male anal sex partners. Kappa and intraclass correlation coefficients were substantial or almost perfect for all non-sexual and sexual behavior items. Reliability dropped somewhat, but was still substantial, for items that asked about household income and the men's knowledge of their sexual partners' health, including abnormal Pap tests and prior sexually transmitted diseases (STD). Age and lifetime number of female sexual partners were independent predictors of unreliable reporting while years of education was inversely associated with unreliable reporting. These findings among a community sample of adult men are consistent with other test-retest reliability studies with populations of women and adolescents.
Sources of Interactional Problems in a Survey of Racial/Ethnic Discrimination
Johnson, Timothy P.; Shariff-Marco, Salma; Willis, Gordon; Cho, Young Ik; Breen, Nancy; Gee, Gilbert C.; Krieger, Nancy; Grant, David; Alegria, Margarita; Mays, Vickie M.; Williams, David R.; Landrine, Hope; Liu, Benmei; Reeve, Bryce B.; Takeuchi, David; Ponce, Ninez A.
2014-01-01
Cross-cultural variability in respondent processing of survey questions may bias results from multiethnic samples. We analyzed behavior codes, which identify difficulties in the interactions of respondents and interviewers, from a discrimination module contained within a field test of the 2007 California Health Interview Survey. In all, 553 (English) telephone interviews yielded 13,999 interactions involving 22 items. Multilevel logistic regression modeling revealed that respondent age and several item characteristics (response format, customized questions, length, and first item with new response format), but not race/ethnicity, were associated with interactional problems. These findings suggest that item function within a multi-cultural, albeit English language, survey may be largely influenced by question features, as opposed to respondent characteristics such as race/ethnicity. PMID:26166949
Ebesutani, Chad; Korathu-Larson, Priya; Nakamura, Brad J; Higa-McMillan, Charmaine; Chorpita, Bruce
2017-09-01
To help facilitate the dissemination and implementation of evidence-based assessment practices, we examined the psychometric properties of the shortened 25-item version of the Revised Child Anxiety and Depression Scale-parent report (RCADS-25-P), which was based on the same items as the previously published shortened 25-item child version. We used two independent samples of youth-a school sample ( N = 967, Grades 3-12) and clinical sample ( N = 433; 6-18 years)-to examine the factor structure, reliability, and validity of the RCADS-25-P scale scores. Results revealed that the two-factor structure (i.e., depression and broad anxiety factor) fit the data well in both the school and clinical sample. All reliability estimates, including test-retest indices, exceeded benchmark for good reliability. In the school sample, the RCADS-25-P scale scores converged significantly with related criterion measures and diverged with nonrelated criterion measures. In the clinical sample, the RCADS-25-P scale scores successfully discriminated between those with and without target problem diagnoses. In both samples, child-parent agreement indices were in the expected ranges. Normative data were also reported. The RCADS-25-P thus demonstrated robust psychometric properties across both a school and clinical sample as an effective brief screening instrument to assess for depression and anxiety in children and adolescents.
Kern, Margaret L.; Hampson, Sarah E.; Goldberg, Lewis R.; Friedman, Howard S.
2013-01-01
The present study used a collaborative framework to integrate two long-term prospective studies: the Terman Life Cycle Study and the Hawaii Personality and Health Longitudinal Study. Using a five-factor personality-trait framework, teacher assessments of child personality were rationally and empirically aligned to establish similar factor structures across samples. Comparable items related to adult self-rated health, education, and alcohol use were harmonized, and data were pooled on harmonized items. A structural model was estimated, allowing paths to differ by sample. Harmonized child personality factors were then used to examine markers of physiological dysfunction in the Hawaii sample and mortality risk in the Terman sample. Harmonized conscientiousness predicted less physiological dysfunction in the Hawaii sample and lower mortality risk in the Terman sample. These results illustrate how collaborative, integrative work with multiple samples offers the exciting possibility that samples from different cohorts and ages can be linked together to directly test lifespan theories of personality and health. PMID:23231689
The Quality of Working Life Questionnaire for Cancer Survivors (QWLQ-CS): a Pre-test Study.
de Jong, Merel; Tamminga, Sietske J; de Boer, Angela G E M; Frings-Dresen, Monique H W
2016-06-02
Returning to and continuing work is important to many cancer survivors, but also represents a challenge. We know little about subjective work outcomes and how cancer survivors perceive being returned to work. Therefore, we developed the Quality of Working Life Questionnaire for Cancer Survivors (QWLQ-CS). Our aim was to pre-test the items of the initial QWLQ-CS on acceptability and comprehensiveness. In addition, item retention was performed by pre-assessing the relevance scores and response distributions of the items in the QWLQ-CS. Semi-structured interviews were conducted after cancer survivors, who had returned to work, filled in the 102 items of the QWLQ-CS. To improve acceptability and comprehensiveness, the semi-structured interview inquired about items that were annoying, difficult, confusing, twofold or redundant. If cancer survivors had difficulty explaining their opinion or emotion about an item, the interviewer used verbal probing technique to investigate the cancer survivor's underlying thoughts. The cancer survivors' comments on the items were analysed, and items were revised accordingly. Decisions on item retention regarding the relevance of items and the response distributions were made by means of pre-set decision rules. The 19 cancer survivors (53 % male) had a mean age of 51 ± 11 years old. They were diagnosed between 2009 and 2013 with lymphoma, leukaemia, prostate cancer, breast cancer, or colon cancer. Acceptability of the QWLQ-CS was good - none of the items were annoying - but 73 items were considered difficult, confusing, twofold or redundant. To improve acceptability, for instance, the authors replaced the phrase 'disease' with 'health situation' in several items. Consequently, comprehensiveness was improved by the authors rephrasing and adjusting items by adding clarifying words, such as 'in the work situation'. 
The pre-assessment of the relevance scores resulted in a sufficient number of cancer survivors indicating the items as relevant to their quality of working life, and no evident indication for uneven response distributions. Therefore, all items were retained. The 104 items of the preliminary QWLQ-CS were found relevant, acceptable and comprehensible by cancer survivors who have returned to work. The QWLQ-CS is now suitable for larger sample sizes of cancer survivors, which is necessary to test the psychometric properties of this questionnaire.
Rodrigues-Bigaton, Delaine; de Castro, Ester M; Pires, Paulo F
Rasch analysis has been used in recent studies to test the psychometric properties of a questionnaire. The conditions for use of the Rasch model are one-dimensionality (assessed via prior factor analysis) and local independence (the probability of getting a particular item right or wrong should not be conditioned upon success or failure in another). To evaluate the dimensionality and the psychometric properties of the Fonseca anamnestic index (FAI), such as the fit of the data to the model, the degree of difficulty of the items, and the ability to respond in patients with myogenous temporomandibular disorder (TMD). The sample consisted of 94 women with myogenous TMD, diagnosed by the Research Diagnostic Criteria for Temporomandibular Disorders (RDC/TMD), who answered the FAI. For the factor analysis, we applied the Kaiser-Meyer-Olkin test, Bartlett's sphericity, Spearman's correlation, and the determinant of the correlation matrix. For extraction of the factors/dimensions, an eigenvalue >1.0 was used, followed by oblique oblimin rotation. The Rasch analysis was conducted on the dimension that showed the highest proportion of variance explained. Adequate sample "n" and FAI multidimensionality were observed. Dimension 1 (primary) consisted of items 1, 2, 3, 6, and 7. All items of dimension 1 showed adequate fit to the model, being observed according to the degree of difficulty (from most difficult to easiest), respectively, items 2, 1, 3, 6, and 7. The FAI presented multidimensionality with its main dimension consisting of five reliable items with adequate fit to the composition of its structure. Copyright © 2017 Associação Brasileira de Pesquisa e Pós-Graduação em Fisioterapia. Publicado por Elsevier Editora Ltda. All rights reserved.
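The two Rasch conditions named in the abstract above can be illustrated with a minimal sketch. It assumes the dichotomous form of the Rasch model (a single person parameter `theta` and one difficulty per item); the function names and the difficulty values are hypothetical and are not the FAI estimates, and the FAI itself may have been analyzed with a polytomous variant.

```python
import math


def rasch_probability(theta: float, difficulty: float) -> float:
    """Probability of a positive response under the dichotomous Rasch model.

    theta is the person's position on the single latent dimension and
    difficulty is the item's position on that same dimension.
    """
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def pattern_likelihood(theta: float, difficulties: list[float], responses: list[int]) -> float:
    """Likelihood of a 0/1 response pattern, assuming local independence.

    Local independence means that, once theta is fixed, item responses do not
    depend on each other, so the joint probability is a simple product.
    """
    likelihood = 1.0
    for b, x in zip(difficulties, responses):
        p = rasch_probability(theta, b)
        likelihood *= p if x == 1 else (1.0 - p)
    return likelihood


if __name__ == "__main__":
    # Hypothetical difficulties, ordered from most difficult to easiest.
    difficulties = [1.0, 0.5, 0.0, -0.5, -1.0]
    for theta in (-1.0, 0.0, 1.0):
        probs = [round(rasch_probability(theta, b), 3) for b in difficulties]
        print(theta, probs)
```

For any fixed `theta`, the response probability rises as the item difficulty falls, which is the sense in which items can be ordered from most difficult to easiest.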
Gopichandran, Vijayaprasad; Wouters, Edwin; Chetlapalli, Satish Kumar
2015-05-03
Trust in physicians is the unwritten covenant between the patient and the physician that the physician will do what is in the best interest of the patient. This forms the undercurrent of all healthcare relationships. Several scales exist for assessment of trust in physicians in developed healthcare settings, but to our knowledge none of these have been developed in a developing country context. To develop and validate a new trust in physician scale for a developing country setting. Dimensions of trust in physicians, which were identified in a previous qualitative study in the same setting, were used to develop a scale. This scale was administered among 616 adults selected from urban and rural areas of Tamil Nadu, south India, using a multistage sampling cross sectional survey method. The individual items were analysed using a classical test approach as well as item response theory. Cronbach's α was calculated and the item to total correlation of each item was assessed. After testing for unidimensionality and absence of local dependence, a 2 parameter logistic Samejima's graded response model was fit and item characteristics assessed. Competence, assurance of treatment, respect for the physician and loyalty to the physician were important dimensions of trust. A total of 31 items were developed using these dimensions. Of these, 22 were selected for final analysis. The Cronbach's α was 0.928. The item to total correlations were acceptable for all the 22 items. The item response analysis revealed good item characteristic curves and item information for all the items. Based on the item parameters and item information, a final 12 item scale was developed. The scale performs optimally in the low to moderate trust range. The final 12 item trust in physician scale has a good construct validity and internal consistency. Published by the BMJ Publishing Group Limited.
Gopichandran, Vijayaprasad; Wouters, Edwin; Chetlapalli, Satish Kumar
2015-01-01
Trust in physicians is the unwritten covenant between the patient and the physician that the physician will do what is in the best interest of the patient. This forms the undercurrent of all healthcare relationships. Several scales exist for assessment of trust in physicians in developed healthcare settings, but to our knowledge none of these have been developed in a developing country context. Objectives To develop and validate a new trust in physician scale for a developing country setting. Methods Dimensions of trust in physicians, which were identified in a previous qualitative study in the same setting, were used to develop a scale. This scale was administered among 616 adults selected from urban and rural areas of Tamil Nadu, south India, using a multistage sampling cross sectional survey method. The individual items were analysed using a classical test approach as well as item response theory. Cronbach's α was calculated and the item to total correlation of each item was assessed. After testing for unidimensionality and absence of local dependence, a 2 parameter logistic Samejima's graded response model was fit and item characteristics assessed. Results Competence, assurance of treatment, respect for the physician and loyalty to the physician were important dimensions of trust. A total of 31 items were developed using these dimensions. Of these, 22 were selected for final analysis. The Cronbach's α was 0.928. The item to total correlations were acceptable for all the 22 items. The item response analysis revealed good item characteristic curves and item information for all the items. Based on the item parameters and item information, a final 12 item scale was developed. The scale performs optimally in the low to moderate trust range. Conclusions The final 12 item trust in physician scale has a good construct validity and internal consistency. PMID:25941182
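The classical-test-theory step described in the abstract above (Cronbach's α plus the item to total correlation of each item) can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the tiny data set is invented, and the correlation shown is the "corrected" variant (each item against the total of the remaining items), which may or may not be the variant the authors used.

```python
from statistics import mean, variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a respondents-by-items table of numeric scores."""
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_columns = list(zip(*rows))
    item_variance_sum = sum(variance(col) for col in item_columns)
    total_variance = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def corrected_item_total(rows):
    """Correlation of each item with the sum of all other items."""
    result = []
    for j in range(len(rows[0])):
        item = [row[j] for row in rows]
        rest = [sum(row) - row[j] for row in rows]
        result.append(pearson(item, rest))
    return result


if __name__ == "__main__":
    # Invented scores: three respondents, two items.
    data = [[1, 2], [2, 1], [3, 3]]
    print(cronbach_alpha(data))
    print(corrected_item_total(data))
```

Items with a low corrected item to total correlation are the usual candidates for removal before an item response model is fitted.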
Cross-Cultural Validity of the Ruminative Responses Scale in Argentina and the United States.
Arana, Fernán G; Rice, Kenneth G
2017-09-01
Although frequently used in the United States, the Ruminative Response Scale (RRS) has not been extensively studied in cross-cultural samples. The present study evaluated the factor structure of Treynor et al.'s 10-item version of the RRS in samples from Argentina ( N = 308) and the United States ( N = 371). In addition to testing measurement invariance between the countries, we evaluated whether the maladaptive implications of rumination were weaker for the Argentinians than for the U.S. group. Self-critical perfectionism was the criterion in those tests. Partial scalar invariance supported an 8-item version of the RRS. There were no differences in factor means or factor correlations in RRS dimensions between countries. Brooding and Reflection were positively correlated with self-critical perfectionism in both countries, with no significant differences in the sizes of these relations between the two samples. Results are discussed in terms of psychometric and cross-cultural implications for rumination.
Impact of Design Effects in Large-Scale District and State Assessments
ERIC Educational Resources Information Center
Phillips, Gary W.
2015-01-01
This article proposes that sampling design effects have potentially huge unrecognized impacts on the results reported by large-scale district and state assessments in the United States. When design effects are unrecognized and unaccounted for they lead to underestimating the sampling error in item and test statistics. Underestimating the sampling…
The Children's Perceived Locus of Causality Scale for Physical Education
ERIC Educational Resources Information Center
Pannekoek, Linda; Piek, Jan P.; Hagger, Martin S.
2014-01-01
A mixed methods design was applied to evaluate the application of the Perceived Locus of Causality scale (PLOC) to preadolescent samples in physical education settings. Subsequent to minor item adaptations to accommodate the assessment of younger samples, qualitative pilot tests were performed (N = 15). Children's reports indicated the need…
Teresi, Jeanne A.; Ocepek-Welikson, Katja; Kleinman, Marjorie; Ramirez, Mildred; Kim, Giyeon
2017-01-01
Short form measures from the Patient Reported Outcomes Measurement Information System® (PROMIS®) are used widely. The present study was among the first to examine differential item functioning (DIF) in the PROMIS Depression short form scales in a sample of over 5000 racially/ethnically diverse patients with cancer. DIF analyses were conducted across different racial/ethnic, educational, age, gender and language groups. Methods DIF hypotheses, generated by content experts, informed the evaluation of the DIF analyses. The graded item response theory (IRT) model was used to evaluate the five-level ordinal items. The primary tests of DIF were Wald tests; sensitivity analyses were conducted using the IRT ordinal logistic regression procedure. Magnitude was evaluated using expected item score functions, and the non-compensatory differential item functioning (NCDIF) and T1 indexes, both based on group differences in the item curves. Aggregate impact was evaluated with expected scale score (test) response functions; individual impact was assessed through examination of differences in DIF adjusted and unadjusted depression estimates. Results Many items evidenced DIF; however, only a few had slightly elevated magnitude. No items evidenced salient DIF with respect to NCDIF and the scale-level impact was minimal for all group comparisons. The following short form items might be targeted for further study because they were also hypothesized to evidence DIF. One item showed slightly higher magnitude of DIF for age: nothing to look forward to; conditional on depression, this item was more likely to be endorsed in the depressed direction by individuals in older groups as contrasted with the cohort aged 21 to 49. This item was also hypothesized to show age DIF. Only one item (failure) showed DIF of slightly higher magnitude (just above threshold) for Whites vs. Asians/Pacific Islanders in the direction of higher likelihood of endorsement for Asians/Pacific Islanders. 
This item was also hypothesized to show DIF for minority groups. The impact of DIF was negligible. Conditional on depression, the items, worthless and hopeless were more likely to be endorsed in the depressed direction by respondents with less than high school education vs. those with a graduate degree; the magnitude of DIF was slightly above the T1 threshold, but not that of NCDIF. These items were also hypothesized to show DIF in the direction of more feelings of worthlessness by groups with lower education. While the magnitude and aggregate impact of DIF was small, in a few instances, individual impact was observed. Information provided was relatively high, particularly in the middle upper (depressed) tail of the distribution. Reliability estimates were high (> 0.90) across all studied groups, regardless of estimation method. Conclusions This was the first study to evaluate measurement equivalence of the PROMIS Depression short forms across large samples of ethnically diverse groups. There were few items with DIF, and none of high magnitude, thus supporting the use of PROMIS Depression short form measures across such groups. These results could be informative for those using the short forms in minority populations or clinicians evaluating individuals with the depression short forms. PMID:28553573
Gottvall, Maria; Vaez, Marjan
2017-01-01
A high proportion of refugees have been subjected to potentially traumatic experiences (PTEs), including torture. PTEs, and torture in particular, are powerful predictors of mental ill health. This paper reports the development and preliminary validation of a brief refugee trauma checklist applicable for survey studies. Methods: A pool of 232 items was generated based on pre-existing instruments. Conceptualization, item selection and item refinement were conducted based on existing literature and in collaboration with experts. Ten cognitive interviews using a Think Aloud Protocol (TAP) were performed in a clinical setting, and field testing of the proposed checklist was performed in a total sample of n = 137 asylum seekers from Syria. Results: The proposed refugee trauma history checklist (RTHC) consists of 2 × 8 items, concerning PTEs that occurred before and during the respondents’ flight, respectively. Results show low item non-response and adequate psychometric properties. Conclusions: RTHC is a usable tool for providing self-report data on refugee trauma history surveys of community samples. The core set of included events can be augmented and slight modifications can be applied to RTHC for use also in other refugee populations and settings. PMID:28976937
Challet-Bouju, Gaëlle; Perrot, Bastien; Romo, Lucia; Valleur, Marc; Magalon, David; Fatséas, Mélina; Chéreau-Boudet, Isabelle; Luquiens, Amandine; Grall-Bronnec, Marie; Hardouin, Jean-Benoit
2016-01-01
Background and aims The aim of this study was to test the screening properties of several combinations of items from gambling scales, in order to harmonize screening of gambling problems in epidemiological surveys. The objective was to propose two brief screening tools (three items or less) for a use in interviews and self-administered questionnaires. Methods We tested the screening properties of combinations of items from several gambling scales, in a sample of 425 gamblers (301 non-problem gamblers and 124 disordered gamblers). Items tested included interview-based items (Pathological Gambling section of the DSM-IV, lifetime history of problem gambling, monthly expenses in gambling, and abstinence of 1 month or more) and self-report items (South Oaks Gambling Screen, Gambling Attitudes, and Beliefs Survey). The gold standard used was the diagnosis of a gambling disorder according to the DSM-5. Results Two versions of the Rapid Screener for Problem Gambling (RSPG) were developed: the RSPG-Interview (RSPG-I), being composed of two interview items (increasing bets and loss of control), and the RSPG-Self-Assessment (RSPG-SA), being composed of three self-report items (chasing, guiltiness, and perceived inability to stop). Discussion and conclusions We recommend using the RSPG-SA/I for screening problem gambling in epidemiological surveys, with the version adapted for each purpose (RSPG-I for interview-based surveys and RSPG-SA for self-administered surveys). This first triage of potential problem gamblers must be supplemented by further assessment, as it may overestimate the proportion of problem gamblers. However, a first triage has the great advantage of saving time and energy in large-scale screening for problem gambling. PMID:27348558
Development and Validation of the Transgender Attitudes and Beliefs Scale.
Kanamori, Yasuko; Cornelius-White, Jeffrey H D; Pegors, Teresa K; Daniel, Todd; Hulgus, Joseph
2017-07-01
In recent years, issues surrounding transgender have garnered media and legal attention, contributing to rapidly shifting views on gender in the U.S. Yet, there is a paucity of data-driven studies on the public's views of transgender identity. This study reports the development and validation of the Transgender Attitudes and Beliefs Scale (TABS). After constructing an initial 96-item pool from consulting experts and existing scales, Phase 1 of the study was launched, involving an exploratory factor analysis of 48 items. The initial factor analysis with 295 participants revealed three factors across 33 items-16 items on interpersonal comfort, 11 on sex/gender beliefs, and 6 on human value. The internal consistency of each factor was high-α = .97 for Factor 1, α = .95 for Factor 2, and α = .94 for Factor 3. A confirmatory factor analysis was conducted in the second phase with an independent sample consisting of 238 participants. The Attitudes Toward Transgender Individual Scale and the Genderism and Transphobia Scale were also included to test for convergent validity, and the Rosenberg Self-Esteem Scale and the short form of the Marlowe-Crowne Social Desirability Scale were utilized to test discriminant validity. Both of the data collection phases employed MTurk, a form of online sampling with increased diversity compared to college student samples and more generalizability to the general U.S. TABS represents an addition to the literature in its ability to capture a more nuanced conceptualization of transgender attitude not found in previous scales.
Test blueprints for psychiatry residency in-training written examinations in Riyadh, Saudi Arabia
Gaffas, Eisha M; Sequeira, Reginald P; Namla, Riyadh A Al; Al-Harbi, Khalid S
2012-01-01
Background The postgraduate training program in psychiatry in Saudi Arabia, which was established in 1997, is a 4-year residency program. Written exams comprising multiple choice questions (MCQs) are used as a summative assessment of residents in order to determine their eligibility for promotion from one year to the next. Test blueprints are not used in preparing examinations. Objective To develop test blueprints for the written examinations used in the psychiatry residency program. Methods Based on the guidelines of four professional bodies, documentary analysis was used to develop global and detailed test blueprints for each year of the residency program. An expert panel participated during piloting and final modification of the test blueprints. Their opinion about the content, weightage for each content domain, and proportion of test items to be sampled in each cognitive category as defined by modified Bloom’s taxonomy were elicited. Results Eight global and detailed test blueprints, two for each year of the psychiatry residency program, were developed. The global test blueprints were reviewed by experts and piloted. Six experts participated in the final modification of test blueprints. Based on expert consensus, the content, total weightage for each content domain, and proportion of test items to be included in each cognitive category were determined for each global test blueprint. Experts also suggested progressively decreasing the weightage for recall test items and increasing problem solving test items in examinations, from year 1 to year 4 of the psychiatry residency program. Conclusion A systematic approach using a documentary and content analysis technique was used to develop test blueprints with additional input from an expert panel as appropriate. Test blueprinting is an important step to ensure the test validity in all residency programs. PMID:23762000
Test blueprints for psychiatry residency in-training written examinations in Riyadh, Saudi Arabia.
Gaffas, Eisha M; Sequeira, Reginald P; Namla, Riyadh A Al; Al-Harbi, Khalid S
2012-01-01
The postgraduate training program in psychiatry in Saudi Arabia, which was established in 1997, is a 4-year residency program. Written exams comprising multiple choice questions (MCQs) are used as a summative assessment of residents in order to determine their eligibility for promotion from one year to the next. Test blueprints are not used in preparing examinations. To develop test blueprints for the written examinations used in the psychiatry residency program. Based on the guidelines of four professional bodies, documentary analysis was used to develop global and detailed test blueprints for each year of the residency program. An expert panel participated during piloting and final modification of the test blueprints. Their opinion about the content, weightage for each content domain, and proportion of test items to be sampled in each cognitive category as defined by modified Bloom's taxonomy were elicited. Eight global and detailed test blueprints, two for each year of the psychiatry residency program, were developed. The global test blueprints were reviewed by experts and piloted. Six experts participated in the final modification of test blueprints. Based on expert consensus, the content, total weightage for each content domain, and proportion of test items to be included in each cognitive category were determined for each global test blueprint. Experts also suggested progressively decreasing the weightage for recall test items and increasing problem solving test items in examinations, from year 1 to year 4 of the psychiatry residency program. A systematic approach using a documentary and content analysis technique was used to develop test blueprints with additional input from an expert panel as appropriate. Test blueprinting is an important step to ensure the test validity in all residency programs.
Construct validity and reliability of the Music Attentiveness Screening Assessment (MASA).
Waldon, Eric G; Broadhurst, Emily
2014-01-01
Music as alternate engagement (MAE) can be used effectively to distract children during painful or anxiety-provoking medical procedures. For such interventions to be successful, it would seem important to assess the degree to which a child can attend to musical stimuli. The purposes of this study were as follows: (a) To establish construct validity by determining the extent to which the Music Attentiveness Screening Assessment (MASA) measures auditory attention; and (b) to gather evidence regarding MASA test-retest and inter-observer reliability. The Auditory Attention (AA) subtest from the NEPSY-II (NEPSY, Second Edition) and the two items from MASA were administered to a nonclinical sample of children (N = 50) aged 5 to 9 years. There was a statistically significant proportion of AA score variance shared with MASA (both items), R² = .21, F(2, 47) = 6.34, p = .004. Test-retest reliability on the first MASA item was moderately high (Pearson r = .84) while on the second item it was lower (r = .63). Similarly, interobserver agreement was high for Item I (intraclass correlation coefficient [ICC] = .95) and lower for Item II (ICC = .71). Evidence suggests that MASA measures, at least in part, auditory attention. Despite this finding, a large proportion of unexplained variance remains. Furthermore, reliability estimates (test-retest and interobserver agreement) differ between both items. These findings are discussed with particular attention paid to the ways in which MASA should be revised and further study conducted. © the American Music Therapy Association 2014. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
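The shared-variance statistic and the F test reported in the abstract above are linked by the standard overall F test for a regression with k predictors and n cases. The sketch below assumes an ordinary least-squares regression with the two MASA items as predictors (k = 2, n = 50), which matches the reported degrees of freedom; the function names are hypothetical.

```python
def f_from_r2(r2: float, k: int, n: int) -> float:
    """Overall regression F statistic from R squared, k predictors, n cases."""
    df2 = n - k - 1
    return (r2 / k) / ((1.0 - r2) / df2)


def r2_from_f(f: float, df1: int, df2: int) -> float:
    """R squared implied by an overall F statistic and its degrees of freedom."""
    return (f * df1) / (f * df1 + df2)


if __name__ == "__main__":
    # Values taken from the abstract: R squared = .21, F(2, 47) = 6.34, N = 50.
    print(round(f_from_r2(0.21, k=2, n=50), 2))
    print(round(r2_from_f(6.34, df1=2, df2=47), 3))
```

Under these assumptions, the rounded R² of .21 gives an F of about 6.25, and the reported F of 6.34 implies an R² of about .212, so the two reported figures agree up to rounding.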
ERIC Educational Resources Information Center
Papenberg, Martin; Musch, Jochen
2017-01-01
In multiple-choice tests, the quality of distractors may be more important than their number. We therefore examined the joint influence of distractor quality and quantity on test functioning by providing a sample of 5,793 participants with five parallel test sets consisting of items that differed in the number and quality of distractors.…
A Comparison of the Fit of Empirical Data to Two Latent Trait Models. Report No. 92.
ERIC Educational Resources Information Center
Hutten, Leah R.
Goodness of fit of raw test score data were compared, using two latent trait models: the Rasch model and the Birnbaum three-parameter logistic model. Data were taken from various achievement tests and the Scholastic Aptitude Test (Verbal). A minimum sample size of 1,000 was required, and the minimum test length was 40 items. Results indicated that…
Effects of Test Level Discrimination and Difficulty on Answer-Copying Indices
ERIC Educational Resources Information Center
Sunbul, Onder; Yormaz, Seha
2018-01-01
In this study Type I Error and the power rates of omega (ω) and GBT (generalized binomial test) indices were investigated for several nominal alpha levels and for 40- and 80-item test lengths with 10,000-examinee sample size under several test level restrictions. As a result, Type I error rates of both indices were found to be below the acceptable…
Carter, Amanda G; Creedy, Debra K; Sidebotham, Mary
2017-11-01
To develop and test a tool designed for use by academics to evaluate pre-registration midwifery students' critical thinking skills in reflective writing. A descriptive cohort design was used. The data were a random sample (n = 100) of archived student reflective writings based on a clinical event or experience during 2014 and 2015. A staged model for tool development was used to develop a fifteen-item scale involving item generation; mapping of draft items to critical thinking concepts and expert review to test content validity; inter-rater reliability testing; pilot testing of the tool on 100 reflective writings; and psychometric testing. Item scores were analysed for mean, range and standard deviation. Internal reliability, content and construct validity were assessed. Expert review of the tool revealed a high content validity index score of 0.98. Using two independent raters to establish inter-rater reliability, good absolute agreement of 72% was achieved with a Kappa coefficient K = 0.43 (p<0.0001). Construct validity via exploratory factor analysis revealed three factors: analyses context, reasoned inquiry, and self-evaluation. The mean total score for the tool was 50.48 (SD = 12.86). Total and subscale scores correlated significantly. The scale achieved good internal reliability with a Cronbach's alpha coefficient of .93. This study established the reliability and validity of the CACTiM (reflection) for use by academics to evaluate midwifery students' critical thinking in reflective writing. Validation with large diverse samples is warranted. Reflective practice is a key learning and teaching strategy in undergraduate Bachelor of Midwifery programmes and essential for safe, competent practice. There is the potential to enhance critical thinking development by assessing reflective writing with the CACTiM (reflection) tool to provide formative and summative feedback to students and inform teaching strategies.
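The abstract above reports both raw agreement (72%) and a Kappa coefficient (0.43); the gap between the two reflects the correction for chance agreement. A minimal sketch of Cohen's kappa for two raters follows. The function name `cohens_kappa` and the 2 x 2 table of counts are hypothetical and were chosen only to illustrate the formula; they are not the study's data.

```python
def cohens_kappa(table):
    """Cohen's kappa from a square agreement table.

    table[i][j] is the number of cases that rater A put in category i
    and rater B put in category j.
    """
    total = sum(sum(row) for row in table)
    k = len(table)
    # Observed agreement: proportion of cases on the diagonal.
    p_observed = sum(table[i][i] for i in range(k)) / total
    # Chance agreement: product of the raters' marginal proportions.
    row_marginals = [sum(table[i]) / total for i in range(k)]
    col_marginals = [sum(table[i][j] for i in range(k)) / total for j in range(k)]
    p_chance = sum(r * c for r, c in zip(row_marginals, col_marginals))
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical counts for 100 ratings (illustration only, not the study's data).
toy_table = [
    [40, 10],
    [18, 32],
]
print(round(cohens_kappa(toy_table), 2))
```

In this made-up table the raters agree on 72 of 100 cases, yet kappa is well below 0.72 because half of that agreement would be expected by chance alone.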
Faggion, Clovis Mariano; Giannakopoulos, Nikolaos Nikitas
2012-10-01
Most readers, reviewers, and editors rely on abstracts to decide whether to assess the full text of an article. A research abstract should, therefore, be as informative as possible. The standard of reporting in abstracts of randomized controlled trials (RCTs) in periodontology and implant dentistry has not yet been assessed. The objectives of this review are: 1) to assess the quality of reporting in abstracts of RCTs in periodontology and implant dentistry, and 2) to investigate changes in the quality of reporting by comparing samples from different periods. The authors searched the PubMed electronic database, independently and in duplicate, for abstracts of RCTs published in seven leading journals of periodontology and implant dentistry from 2005 to 2007 and from 2009 to 2011. The quality of reporting in selected abstracts with reference to the CONSORT (Consolidated Standards of Reporting Trials) for Abstracts checklist published in January 2008 was assessed independently and in duplicate. The Cohen κ statistic was used to determine the extent of agreement of the reviewers. The Pearson χ² test and/or Fisher exact test were used to assess differences in reporting in the two samples. Level of significance was set at P <0.05. Three hundred ninety-two abstracts are included in this review. Three items (intervention, objective, and conclusions) were almost fully reported in both samples. In contrast, other items (randomization, trial registration, and funding) were never reported. There were significant changes in reporting for only two items, trial design and title (items better reported in the pre- and post-CONSORT samples, respectively). Most topics, however, were similarly poorly reported in both samples of abstracts. The quality of reporting in abstracts of RCTs in periodontology and implant dentistry can be improved.
Authors should follow the CONSORT for Abstracts guidelines, and journal editors should promote clear rules to improve authors' adherence to these guidelines.
Sanjeevi, Namrata; Freeland-Graves, Jeanne; George, Goldy Chacko
2017-12-01
The Supplemental Nutrition Assistance Program (SNAP) plays a critical role in reducing food insecurity by distribution of benefits at a monthly interval to participants. Households that receive assistance from SNAP spend at least three-quarters of benefits within the first 2 weeks of receipt. Because this expenditure pattern may be associated with lower food intake toward the end of the month, it is important to develop a tool that can assess the weekly diets of SNAP participants. The goal of this study was to develop and assess the relative validity and reliability of a semiquantitative 1-week food frequency questionnaire (FFQ) tailored to a population of women participating in SNAP. The FFQ was derived from an existing 195-item FFQ that was based on a reference period of 1 month. This 195-item FFQ has been validated in a population of low-income postpartum women who were recruited from central Texas during 2004. Mean daily servings of each food item in the 195-item FFQ completed by women who took part in the 2004 validation study were calculated to determine the most frequently consumed food items. Emphasis on these items led to the creation of a shorter, 1-week FFQ of only 95 items. This new 1-week instrument was compared with 3-day diet records to evaluate relative validity in a sample of women participating in SNAP. For reliability, the FFQ was administered a second time, separated by a 1-month time interval. The validity study included 70 female SNAP participants who were recruited from the partner agencies of the Central Texas Food Bank from March to June 2015. A subsample of 40 women participated in the reliability study. Outcome measures were mean nutrient intake values obtained from the two tests of the 95-item FFQ and 3-day diet records. Deattenuated Pearson correlation coefficients examined relationships in nutrient intake between the 95-item FFQ and 3-day diet records, and a paired samples t test determined differences in mean nutrient intake. 
Weighted Cohen's κ indicated agreement in quartile classification of study participants by the 95-item FFQ and 3-day diet records, according to nutrient intake. Test-retest reliability was assessed by intraclass correlations and weighted Cohen's κ. Mean deattenuated Pearson correlation between the FFQ and 3-day diet records was 0.61, and the weighted Cohen's κ was 0.39. Finally, the average test-retest correlation and weighted Cohen's κ of the FFQ were 0.66 and 0.50, respectively. These results suggest that the 1-week, 95-item FFQ demonstrated acceptable relative validity and reliability in low-income women participating in SNAP in the southwestern United States.
Perez, Samara; Shapiro, Gilla K; Tatar, Ovidiu; Joyal-Desmarais, Keven; Rosberger, Zeev
2016-10-01
Parents' human papillomavirus (HPV) vaccination decision-making is strongly influenced by their attitudes and beliefs toward vaccination. To date, psychometrically evaluated HPV vaccination attitudes scales have been narrow in their range of measured beliefs and often limited to attitudes surrounding female HPV vaccination. The study aimed to develop a comprehensive, validated and reliable HPV vaccination attitudes and beliefs scale among parents of boys. Data were collected from Canadian parents of 9- to 16-year-old boys using an online questionnaire completed in 2 waves with a 7-month interval. Based on existing vaccination attitudes scales, a set of 61 attitude and belief items was developed. Exploratory and confirmatory factor analyses were conducted. Internal consistency was evaluated with Cronbach's α and stability over time with intraclass correlations. The HPV Attitudes and Beliefs Scale (HABS) was informed by 3117 responses at time 1 and 1427 at time 2. The HABS contains 46 items organized in 9 factors: Benefits (10 items), Threat (3 items), Influence (8 items), Harms (6 items), Risk (3 items), Affordability (3 items), Communication (5 items), Accessibility (4 items), and General Vaccination Attitudes (4 items). Model fit indices at time 2 were: χ²/df = 3.13, standardized root mean square residual = 0.056, root mean square error of approximation (confidence interval) = 0.039 (0.037-0.04), comparative fit index = 0.962 and Tucker-Lewis index = 0.957. Cronbach's αs were greater than 0.8 and intraclass correlations of factors were greater than 0.6. The HABS is the first psychometrically tested scale of HPV attitudes and beliefs among parents of boys available for use in English and French. Further testing among parents of girls and young adults and assessing predictive validity are warranted.
Exploring the dimensionality of digit span.
Bowden, Stephen C; Petrauskas, Vilija M; Bardenhagen, Fiona J; Meade, Catherine E; Simpson, Leonie C
2013-04-01
The Digit Span subtest from the Wechsler Scales is used to measure Freedom from Distractibility or Working Memory. Some published research suggests that Digit Span forward should be interpreted differently from Digit Span backward. The present study explored the dimensionality of the Wechsler Memory Scale-III Digit Span (forward and backward) items in a sample of heterogeneous neuroscience patients (n = 267) using confirmatory factor analysis (CFA) for dichotomous items. Results suggested that four correlated factors underlie Digit Span, reflecting easy and hard items in both forward and backward presentation orders. The model for Digit Span was then cross-validated in a seizure disorders sample (n = 223) by replication of the CFA and by examination of measurement invariance. Measurement invariance tests the precise numerical generalization of trait estimation across groups. Results supported measurement invariance and it was concluded that forward and backward digit span scores should be interpreted as measures of the same cognitive ability.
Confirmatory Factor Analysis of the Minnesota Nicotine Withdrawal Scale
Toll, Benjamin A.; O’Malley, Stephanie S.; McKee, Sherry A.; Salovey, Peter; Krishnan-Sarin, Suchitra
2008-01-01
The authors examined the factor structure of the Minnesota Nicotine Withdrawal Scale (MNWS) using confirmatory factor analysis in clinical research samples of smokers trying to quit (n = 723). Three confirmatory factor analytic models, based on previous research, were tested with each of the 3 study samples at multiple points in time. A unidimensional model including all 8 MNWS items was found to be the best explanation of the data. This model produced fair to good internal consistency estimates. Additionally, these data revealed that craving should be included in the total score of the MNWS. Factor scores derived from this single-factor, 8-item model showed that increases in withdrawal were associated with poor smoking outcome for 2 of the clinical studies. Confirmatory factor analyses of change scores showed that the MNWS symptoms cohere as a syndrome over time. Future investigators should report a total score using all of the items from the MNWS. PMID:17563141
Development and initial validation of the appropriate antibiotic use self-efficacy scale.
Hill, Erin M; Watkins, Kaitlin
2018-06-04
While various medication self-efficacy scales exist, none assess self-efficacy for appropriate antibiotic use. The Appropriate Antibiotic Use Self-Efficacy Scale (AAUSES) was developed and pilot tested, and its psychometric properties were examined. Following pilot testing of the scale, a 28-item questionnaire was examined using a sample (n = 289) recruited through the Amazon Mechanical Turk platform. Participants also completed other scales and items, which were used in assessing discriminant, convergent, and criterion-related validity. Test-retest reliability was also examined. After examining the scale and removing items that did not assess appropriate antibiotic use, an exploratory factor analysis was conducted on 13 items from the original scale. Three factors were retained that explained 65.51% of the variance. The scale and its subscales had adequate internal consistency. The scale had excellent test-retest reliability, and demonstrated convergent, discriminant, and criterion-related validity. The AAUSES is a valid and reliable scale that assesses three domains of appropriate antibiotic use self-efficacy. The AAUSES may have utility in clinical and research settings in understanding individuals' beliefs about appropriate antibiotic use and related behavioral correlates. Future research is needed to examine the scale's utility in these settings.
Gibbons, Laura E; Crane, Paul K; Mehta, Kala M; Pedraza, Otto; Tang, Yuxiao; Manly, Jennifer J; Narasimhalu, Kaavya; Teresi, Jeanne; Jones, Richard N; Mungas, Dan
2011-04-28
Differential item functioning (DIF) occurs when a test item has different statistical properties in subgroups, controlling for the underlying ability measured by the test. DIF assessment is necessary when evaluating measurement bias in tests used across different language groups. However, other factors such as educational attainment can differ across language groups, and DIF due to these other factors may also exist. How to conduct DIF analyses in the presence of multiple, correlated factors remains largely unexplored. This study assessed DIF related to Spanish versus English language in a 44-item object naming test. Data come from a community-based sample of 1,755 Spanish- and English-speaking older adults. We compared simultaneous accounting, a new strategy for handling differences in educational attainment across language groups, with existing methods. Compared to other methods, simultaneously accounting for language- and education-related DIF yielded salient differences in some object naming scores, particularly for Spanish speakers with at least 9 years of education. Accounting for factors that vary across language groups can be important when assessing language DIF. The use of simultaneous accounting will be relevant to other cross-cultural studies in cognition and in other fields, including health-related quality of life.
Gibbons, Laura E.; Crane, Paul K.; Mehta, Kala M.; Pedraza, Otto; Tang, Yuxiao; Manly, Jennifer J.; Narasimhalu, Kaavya; Teresi, Jeanne; Jones, Richard N.; Mungas, Dan
2012-01-01
Differential item functioning (DIF) occurs when a test item has different statistical properties in subgroups, controlling for the underlying ability measured by the test. DIF assessment is necessary when evaluating measurement bias in tests used across different language groups. However, other factors such as educational attainment can differ across language groups, and DIF due to these other factors may also exist. How to conduct DIF analyses in the presence of multiple, correlated factors remains largely unexplored. This study assessed DIF related to Spanish versus English language in a 44-item object naming test. Data come from a community-based sample of 1,755 Spanish- and English-speaking older adults. We compared simultaneous accounting, a new strategy for handling differences in educational attainment across language groups, with existing methods. Compared to other methods, simultaneously accounting for language- and education-related DIF yielded salient differences in some object naming scores, particularly for Spanish speakers with at least 9 years of education. Accounting for factors that vary across language groups can be important when assessing language DIF. The use of simultaneous accounting will be relevant to other cross-cultural studies in cognition and in other fields, including health-related quality of life. PMID:22900138
Gray, Kerryn; Crowle, Damian; Scott, Pam
2014-09-01
A significant number of evidence items submitted to Forensic Science Service Tasmania (FSST) are blood swabs or bloodstained items. Samples from these items routinely undergo phenol:chloroform:isoamyl alcohol organic extraction and quantitative Polymerase Chain Reaction (qPCR) testing prior to PowerPlex® 21 amplification. This multi-step process has significant cost and timeframe implications in a fiscal climate of tightening government budgets, pressure towards improved operating efficiencies, and an increasing emphasis on rapid techniques better supporting intelligence-led policing. Direct amplification of blood and buccal cells on cloth and Whatman FTA™ card with PowerPlex® 21 has already been successfully implemented for reference samples, eliminating the requirement for sample pre-treatment. Scope for expanding this method to include less pristine casework blood swabs and samples from bloodstained items was explored in an endeavour to eliminate lengthy DNA extraction, purification and qPCR steps for a wider subset of samples. Blood was deposited onto a range of substrates including those historically found to inhibit STR amplification. Samples were collected with micro-punch, micro-swab, or both. The potential for further fiscal savings via reduced volume amplifications was assessed by amplifying all samples at full and reduced volume (25 and 13 μL). Overall success rate data showed 80% of samples yielded a complete profile at reduced volume, compared to 78% at full volume. Particularly high success rates were observed for the blood on fabric/textile category with 100% of micro-punch samples yielding complete profiles at reduced volume and 85% at full volume. Following the success of this trial, direct amplification of suitable casework blood samples has been implemented at reduced volume.
Significant benefits have been experienced, most noticeably where results from crucial items have been provided to police investigators prior to interview of suspects, and a coronial identification has been successfully completed in a short timeframe to avoid delay in the release of human remains to family members.
Selecting Items for Criterion-Referenced Tests.
ERIC Educational Resources Information Center
Mellenbergh, Gideon J.; van der Linden, Wim J.
1982-01-01
Three item selection methods for criterion-referenced tests are examined: the classical theory of item difficulty and item-test correlation; the latent trait theory of item characteristic curves; and a decision-theoretic approach for optimal item selection. Item contribution to the standardized expected utility of mastery testing is discussed. (CM)
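The first of the three methods named in this abstract, classical item difficulty and item-test correlation, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' procedure: the function names are hypothetical, responses are assumed to be scored 0/1, the item is removed from the total before correlating (a "corrected" item-test correlation), and the toy response matrix is made up.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either variable is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def classical_item_stats(responses):
    """Return a (difficulty, item-rest correlation) pair for each item.

    responses is a list of examinee rows, each a list of 0/1 item scores.
    Difficulty is the proportion of examinees answering correctly.
    """
    totals = [sum(row) for row in responses]
    stats = []
    for j in range(len(responses[0])):
        item = [row[j] for row in responses]
        # Total score on the remaining items, so the item is not
        # correlated with itself.
        rest = [t - x for t, x in zip(totals, item)]
        stats.append((mean(item), pearson(item, rest)))
    return stats


# Hypothetical 0/1 responses: 4 examinees x 3 items (illustration only).
toy_responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
for j, (p, r) in enumerate(classical_item_stats(toy_responses), start=1):
    print(f"item {j}: difficulty={p:.2f}, item-rest r={r:.2f}")
```

A selection rule would then keep items whose difficulty and correlation fall in a chosen range; the cut-offs depend on the purpose of the test and are not specified in the abstract.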
Wu, Yang; Zuo, Bin; Wen, Fangfang; Yan, Lei
2017-01-01
Using confirmatory factor analyses, this study examined the method effects on a Chinese version of the Rosenberg Self-Esteem Scale (RSES; Rosenberg, 1965) in a sample of migrant and urban children in China. In all, 982 children completed the RSES, and 9 models and 9 corresponding variants were specified and tested. The results indicated that the method effects are associated with both positively and negatively worded items and that Item 8 should be treated as a positively worded item. Additionally, the method effects models were invariant across migrant and urban children in China.
Parameter Estimation with Small Sample Size: A Higher-Order IRT Model Approach
ERIC Educational Resources Information Center
de la Torre, Jimmy; Hong, Yuan
2010-01-01
Sample size ranks as one of the most important factors that affect the item calibration task. However, due to practical concerns (e.g., item exposure) items are typically calibrated with much smaller samples than what is desired. To address the need for a more flexible framework that can be used in small sample item calibration, this article…
Development and psychometric properties of the Inner Strength Scale.
Lundman, Berit; Viglund, Kerstin; Aléx, Lena; Jonsén, Elisabeth; Norberg, Astrid; Fischer, Regina Santamäki; Strandberg, Gunilla; Nygren, Björn
2011-10-01
Four dimensions of inner strength were previously identified in a meta-theoretical analysis: firmness, creativity, connectedness, and flexibility. The aim of this study was to develop an Inner Strength Scale (ISS) based on those four dimensions and to evaluate its psychometric properties. An initial version of the ISS was distributed for validation purposes together with the Rosenberg Self-Esteem Scale, the Resilience Scale, and the Sense of Coherence Scale. A convenience sample of 391 adults aged 19-90 years participated. Principal component analysis (PCA) and confirmatory factor analysis (CFA) were used in the process of exploring, evaluating, and reducing the 63-item ISS to the 20-item ISS. Cronbach's alpha and test-retest were used to measure reliability. CFA showed satisfactory goodness-of-fit for the 20-item ISS. The analysis supported a four-factor solution explaining 51% of the variance. Cronbach's alpha on the 20-item ISS was 0.86, and the test-retest showed stability over time (r=0.79). The ISS was found to be a valid and reliable instrument for capturing a multifaceted understanding of inner strength. Further tests of psychometric properties of the ISS will be performed in forthcoming studies.
Validity and reliability of a scale to measure genital body image.
Zielinski, Ruth E; Kane-Low, Lisa; Miller, Janis M; Sampselle, Carolyn
2012-01-01
Women's body image dissatisfaction extends to body parts usually hidden from view--their genitals. Ability to measure genital body image is limited by lack of valid and reliable questionnaires. We subjected a previously developed questionnaire, the Genital Self Image Scale (GSIS), to psychometric testing using a variety of methods. Five experts determined the content validity of the scale. Then, using four participant groups, factor analysis was performed to determine construct validity and to identify factors. Further construct validity was established using the contrasting groups approach. Internal consistency and test-retest reliability were determined. Twenty-one of 29 items were considered content valid. Two items were added based on expert suggestions. Factor analysis resulted in four factors, identified as Genital Confidence, Appeal, Function, and Comfort. The revised scale (GSIS-20) included 20 items explaining 59.4% of the variance. Women indicating an interest in genital cosmetic surgery exhibited significantly lower scores on the GSIS-20 than those who did not. The final 20-item scale exhibited internal reliability across all sample groups as well as test-retest reliability. The GSIS-20 provides a measure of genital body image demonstrating reliability and validity across several populations of women.
Tuliao, Antover P; Landoy, Bernice Vania N; McChargue, Dennis E
2016-01-01
The Alcohol Use Disorder Identification Test's factor structure varies depending on population and culture. Because of this inconsistency, this article examined the factor structure of the test and conducted a factorial invariance test between a U.S. and a Philippines college sample. Confirmatory factor analyses indicated that a three-factor solution outperforms the one- and two-factor solutions in both samples. Factorial invariance analyses further support the confirmatory findings by showing that factor loadings were generally invariant across groups; however, item intercepts show non-invariance. Country differences between factors show that Filipino consumption factor mean scores were significantly lower than their U.S. counterparts.
Development and validation of a new tool to measure Iranian pregnant women's empowerment.
Borghei, N S; Taghipour, A; Roudsari, R Latifnejad; Keramat, A
2016-03-15
Empowering pregnant women improves their health and reduces maternal mortality, but there is a lack of suitable tools to measure women's empowerment in some cultures. This study aimed to design and validate a questionnaire for measuring the dimensions of empowerment among Iranian pregnant women. After a literature review, and face and content validity testing, a 38-item questionnaire was developed and tested on a sample of 161 pregnant women. Factor analysis grouped the items into 3 subscales: educational empowerment (e.g. prenatal training), autonomy (e.g. financial independency and mental ability) and sociopolitical empowerment (e.g. involvement in social and political activities). Criterion validity testing showed a strong positive correlation of the total scale and subscales scores with the Kameda and the Spritzer empowerment scales. Cronbach alpha was 0.92 for total empowerment. A total of 32 items remained in the Self-Structured Pregnancy Empowerment Questionnaire, which is a valid new tool to measure the dimensions of pregnant women's empowerment.
Ashur, S T; Shamsuddin, K; Shah, S A; Bosseri, S; Morisky, D E
2015-12-13
No validation study has previously been made for the Arabic version of the 8-item Morisky Medication Adherence Scale (MMAS-8©) as a measure for medication adherence in diabetes. This study in 2013 tested the reliability and validity of the Arabic MMAS-8 for type 2 diabetes mellitus patients attending a referral centre in Tripoli, Libya. A convenience sample of 103 patients self-completed the questionnaire. Reliability was tested using Cronbach alpha, average inter-item correlation and Spearman-Brown coefficient. Known-group validity was tested by comparing MMAS-8 scores of patients grouped by glycaemic control. The Arabic version showed adequate internal consistency (α = 0.70) and moderate split-half reliability (r = 0.65). Known-group validity was supported as a significant association was found between medication adherence and glycaemic control, with a moderate effect size (ϕc = 0.34). The Arabic version displayed good psychometric properties and could support diabetes research and practice in Arab countries.
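Two of the reliability coefficients named in this abstract, Cronbach's alpha and the Spearman-Brown coefficient, follow directly from item variances and a half-test correlation. The sketch below uses hypothetical function names and a made-up score matrix; it does not reproduce the study's data, and because the abstract does not state whether the reported r = 0.65 is already Spearman-Brown corrected, no correction is applied to that value here.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents x items score matrix.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
    """
    k = len(scores[0])
    item_variances = [variance([row[j] for row in scores]) for j in range(k)]
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def spearman_brown(r_half):
    """Step a half-test correlation up to full-test length."""
    return 2 * r_half / (1 + r_half)


# Hypothetical scores: 5 respondents x 4 items (illustration only).
toy_scores = [
    [1, 1, 0, 1],
    [0, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
    [1, 0, 1, 1],
]
print(f"alpha = {cronbach_alpha(toy_scores):.2f}")
print(f"Spearman-Brown for r_half = 0.50: {spearman_brown(0.5):.2f}")
```

Alpha rises when items covary strongly relative to their individual variances, which is why dropping a discordant item can increase it.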
ERIC Educational Resources Information Center
Eakman, Aaron M.; Carlson, Mike E.; Clark, Florence A.
2010-01-01
The Meaningful Activity Participation Assessment (MAPA), a recently developed 28-item tool designed to measure the meaningfulness of activity, was tested in a sample of 154 older adults. The MAPA evidenced a sufficient level of internal consistency and test-retest reliability and correlated as theoretically predicted with the Life Satisfaction…
Validation of the Asthma Control Test questionnaire in a North African population.
El Hasnaoui, Abdelkader; Martin, Jennifer; Salhi, Hocine; Doble, Adam
2009-12-01
Patient-reported outcome measures are required to measure asthma control. The Asthma Control Test (ACT) is one such measure which was used in the AIRMAG study, a general population study of asthma in the Maghreb. Three dialectal Arabic versions of the ACT (Algerian, Moroccan and Tunisian) were developed. The objective was to perform a psychometric evaluation of the properties of dialectal Arabic versions of the ACT used in the AIRMAG study. The test data came from 624 adult subjects in a random general population sample in Algeria, Morocco and Tunisia. The internal consistency of the ACT was analysed using Cronbach's α coefficient. The factorial structure was explored by principal component analysis with varimax rotation. Test-retest reproducibility was assessed in a subgroup of 61 subjects. Face and discriminant validity were assessed. Cronbach's α coefficient ranged from 0.58 for the Algerian version to 0.67 for the Moroccan version. The 'use-of-rescue-treatment' item was identified as discordant, since its removal resulted in an increase in Cronbach's α coefficient. The discordance of this item was confirmed by principal component analysis, where the four remaining items were aligned along a single dimension, and the 'use-of-rescue-treatment' item offset along a second dimension. Test and retest scores were well correlated (r = 0.704). The ACT showed good face and discriminant validity. The ACT is a valid measure of asthma control in a North African context, although its internal consistency is compromised by the 'use-of-rescue-treatment' item, probably due to limited access to care and use of short-acting beta-agonists.
Piredda, Michela; Ghezzi, Valerio; Fenizia, Elisa; Marchetti, Anna; Petitti, Tommasangelo; De Marinis, Maria Grazia; Sili, Alessandro
2017-12-01
To develop and psychometrically test the Italian-language Nurse Caring Behaviours Scale, a short measure of nurse caring behaviour as perceived by inpatients. Patient perceptions of nurses' caring behaviours are a predictor of care quality. Caring behaviours are culture-specific, but no measure of patient perceptions has previously been developed in Italy. Moreover, existing tools show unclear psychometric properties, are burdensome for respondents, or are not widely applicable. Instrument development and psychometric testing. Item generation included identifying and adapting items from existing measures of caring behaviours as perceived by patients. A pool of 28 items was evaluated for face validity. Content validity indexes were calculated for the resulting 15-item scale; acceptability and clarity were pilot tested with 50 patients. To assess construct validity, a sample of 2,001 consecutive adult patients admitted to a hospital in 2014 completed the scale and was split into two groups. Reliability was evaluated using nonlinear structural equation modelling coefficients. Measurement invariance was tested across subsamples. Item 15 loaded poorly in the exploratory factor analysis (n = 983) and was excluded from the final solution, positing a single latent variable with 14 indicators. This model fitted the data moderately. The confirmatory factor analysis (n = 1018) returned similar results. Internal consistency was excellent in both subsamples. Full scalar invariance was reached, and no significant latent mean differences were detected across subsamples. The new instrument shows reasonable psychometric properties and is a promising short and widely applicable measure of inpatient perceptions of nurse caring behaviours.
2016-01-01
We aimed to validate the Inventory of Complicated Grief (ICG)-Korean version among 1,138 Korean adolescents, representing a response rate of 57% of 1,997 students. Participants completed a set of questionnaires including demographic variables (age, sex, years of education, experience of grief), the ICG, the Children's Depression Inventory (CDI) and the Lifetime Incidence of Traumatic Events-Child (LITE-C). Exploratory factor analysis was performed to determine whether the ICG items indicated complicated grief in Korean adolescents. The internal consistency of the ICG-Korean version was Cronbach's α=0.87. The test-retest reliability for a randomly selected sample of 314 participants in 2 weeks was r=0.75 (P<0.001). Concurrent validity was assessed using a correlation between the ICG total scores and the CDI total scores (r=0.75, P<0.001). The criterion-related validity based on the comparison of ICG total scores between adolescents without complicated grief (1.2±3.7) and adolescents with complicated grief (3.2±6.6) was relatively high (t=5.71, P<0.001). The data acquired from the 1,138 students were acceptable for a factor analysis (Kaiser-Meyer-Olkin Measure of Sampling Adequacy=0.911; Bartlett's Test of Sphericity, χ²=13,144.7, P<0.001). After omission of 3 items, the value of Cronbach's α increased from 0.87 for the 19-item ICG-Korean version to 0.93 for the 16-item ICG-Korean version. These results suggest that the ICG is a useful tool in assessing for complicated grief in Korean adolescents. However, the 16-item version of the ICG appeared to be more valid compared to the 19-item version of the ICG. We suggest that the 16-item version of the ICG be used to screen for complicated grief in Korean adolescents. PMID:26770046
Han, Doug Hyun; Lee, Jung Jae; Moon, Duk-Soo; Cha, Myoung-Jin; Kim, Min A; Min, Seonyeong; Yang, Ji Hoon; Lee, Eun Jeong; Yoo, Seo Koo; Chung, Un-Sun
Retest of a Principal Components Analysis of Two Household Environmental Risk Instruments.
Oneal, Gail A; Postma, Julie; Odom-Maryon, Tamara; Butterfield, Patricia
2016-08-01
Household Risk Perception (HRP) and Self-Efficacy in Environmental Risk Reduction (SEERR) instruments were developed for a public health nurse-delivered intervention designed to reduce home-based, environmental health risks among rural, low-income families. The purpose of this study was to test both instruments in a second low-income population that differed geographically and economically from the original sample. Participants (N = 199) were recruited from the Women, Infants, and Children (WIC) program. Paper and pencil surveys were collected at WIC sites by research-trained student nurses. Exploratory principal components analysis (PCA) was conducted, and comparisons were made to the original PCA for the purpose of data reduction. Instruments showed satisfactory Cronbach alpha values for all components. HRP components were reduced from five to four, which explained 70% of variance. The components were labeled sensed risks, unseen risks, severity of risks, and knowledge. In contrast to the original testing, environmental tobacco smoke (ETS) items did not form a separate component of the HRP. The SEERR analysis demonstrated four components explaining 71% of variance, with similar patterns of items as in the first study, including a component on ETS, but some differences in item location. Although low-income populations constituted both samples, differences in demographics and risk exposures may have played a role in component and item locations. Findings provided justification for changing or reducing items, and for tailoring the instruments to population-level risks and behaviors. Although analytic refinement will continue, both instruments advance the measurement of environmental health risk perception and self-efficacy. © 2016 Wiley Periodicals, Inc.
Ang, Rebecca P; Chong, Wan Har; Huan, Vivien S; Yeo, Lay See
2007-01-01
This article reports the development and initial validation of scores obtained from the Adolescent Concerns Measure (ACM), a scale which assesses concerns of Asian adolescent students. In Study 1, findings from exploratory factor analysis using 619 adolescents suggested a 24-item scale with four correlated factors--Family Concerns (9 items), Peer Concerns (5 items), Personal Concerns (6 items), and School Concerns (4 items). Initial estimates of convergent validity for ACM scores were also reported. The four-factor structure of ACM scores derived from Study 1 was confirmed via confirmatory factor analysis in Study 2 using a two-fold cross-validation procedure with a separate sample of 811 adolescents. Support was found for both the multidimensional and hierarchical models of adolescent concerns using the ACM. Internal consistency and test-retest reliability estimates were adequate for research purposes. ACM scores show promise as a reliable and potentially valid measure of Asian adolescents' concerns.
Basilio, Camille D.; Knight, George P.; O'Donnell, Megan; Roosa, Mark W.; Gonzales, Nancy A.; Umaña-Taylor, Adriana J.; Torres, Marisela
2014-01-01
Empirical research on biculturalism is limited, in part because of the lack of quality measures of biculturalism. The currently available measures have limitations due to scoring procedures and sampling of only a narrow range of behaviors and attitudes. We present a measure of biculturalism that captures a broader range of the bicultural experience and uses a scoring system that better represents the wide ranging levels of biculturalism that exist in the diverse population of Mexican American adolescents, mothers, and fathers born either in Mexico or the United States. The Mexican American Biculturalism Scale (MABS; 27 items) includes 3 subscales: bicultural comfort (9 items), bicultural facility (9 items), and bicultural advantages (9 items). We report on the reliability and construct validity of test scores, and confirmatory factor analyses findings for a diverse sample of 316 Mexican American families from a large southwestern metropolitan city. The MABS is available both in English and Spanish (see Appendix). The use of the scale has implications for future research studying how biculturalism is related to psychological outcomes for Mexicans/Mexican Americans. PMID:24548151
Self-Stigma of Mental Illness Scale – Short Form: Reliability and Validity
Corrigan, Patrick W.; Michaels, Patrick J.; Vega, Eduardo; Gause, Michael; Watson, Amy C.; Rüsch, Nicolas
2012-01-01
The internalization of public stigma by persons with serious mental illnesses may lead to self-stigma, which harms self-esteem, self-efficacy, and empowerment. Previous research has evaluated a hierarchical model that distinguishes among stereotype awareness, agreement, application to self, and harm to self with the 40-item Self-Stigma of Mental Illness Scale (SSMIS). This study addressed SSMIS critiques (too long, contains offensive items that discourage test completion) by strategically omitting half of the original scale’s items. Here we report reliability and validity of the 20-item short form (SSMIS-SF) based on data from three previous studies. Retained items were rated less offensive by a sample of consumers. Results indicated adequate internal consistencies for each subscale. Repeated measures ANOVAs showed subscale means progressively diminished from awareness to harm. In support of its validity, the harm subscale was found to be inversely and significantly related to self-esteem, self-efficacy, empowerment, and hope. After controlling for level of depression, these relationships remained significant with the exception of the relation between empowerment and the SSMIS-SF harm subscale. Future research with the SSMIS-SF should evaluate its sensitivity to change and its stability through test-retest reliability. PMID:22578819
NASA Astrophysics Data System (ADS)
Liu, Xiufeng; McKeough, Anne
2005-05-01
The aim of this study was to develop a model of students' energy concept development. Applying Case's (1985, 1992) structural theory of cognitive development, we hypothesized that students' concept of energy undergoes a series of transitions, corresponding to systematic increases in working memory capacity. The US national sample from the Third International Mathematics and Science Study (TIMSS) database was used to test our hypothesis. Items relevant to the energy concept in the TIMSS test booklets for three populations were identified. Item difficulty from Rasch modeling was used to test the hypothesized developmental sequence, and percentage of students' correct responses was used to test the correspondence between students' age/grade level and level of the energy concepts. The analysis supported our hypothesized sequence of energy concept development and suggested mixed effects of maturation and schooling on energy concept development. Further, the results suggest that curriculum and instruction design take into consideration the developmental progression of students' concept of energy.
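To make the role of Rasch item difficulty concrete, the following is a minimal sketch of the dichotomous Rasch model and of ordering items by difficulty, which is the kind of ordering the study compares against its hypothesized developmental sequence. The item names and difficulty values are hypothetical illustrations, not TIMSS estimates, and the function names are assumptions introduced here.

```python
import math


def rasch_probability(ability: float, difficulty: float) -> float:
    """Probability of a correct response under the dichotomous Rasch model.

    Both arguments are on the same logit scale; the probability is 0.5
    when ability equals difficulty.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def order_items_by_difficulty(difficulties: dict) -> list:
    """Return item names sorted from easiest to hardest."""
    return sorted(difficulties, key=difficulties.get)


# Hypothetical difficulties in logits (illustrative only).
item_difficulties = {"item_B": 0.0, "item_C": 1.5, "item_A": -1.0}
empirical_order = order_items_by_difficulty(item_difficulties)
```

If the empirical order matches the order predicted by theory, the hypothesized sequence is supported in the sense used in the abstract.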
The purpose of this SOP is to describe the steps to be taken when materials, such as field sampling materials, are shipped from Battelle. A transmittal form accompanies every shipment of such materials or other test articles, substances, paper data, or any other item directly re...
DOE Office of Scientific and Technical Information (OSTI.GOV)
Newcomb, M.D.
1989-10-01
Studies of reactions and attitudes toward nuclear war have progressed from the use of anecdotal evidence to multi-item psychological measures. Additional psychometric data and substantive results of the Nuclear Attitudes Questionnaire (NAQ; Newcomb, 1986) are reported here. Data from three independent samples of students from the United States collected in 1984, 1986, and 1987 were compared and contrasted. The 1986 data were obtained immediately following the Chernobyl nuclear power plant accident. Test-retest reliability of the NAQ items and subscales was quite high and comparable among samples and established the across-time stability of the measure. There were several secular trends across years on items and subscales, indicating some increased concern about nuclear power (particularly in 1986), but also a general increase in nuclear concerns, fears, and anxiety. Anticipated sex differences were found on many of the NAQ items and subscales. Correlations between the NAQ subscales and the nine SCL-90-R scales (Derogatis, 1977) were consistent for the 1986 and 1987 samples. In latent variable analyses, a general factor of Emotional Distress was significantly correlated with a general factor of Nuclear Anxiety, as well as specifically with nuclear concern and fear for the future.
Michelessi, Manuele; Lucenteforte, Ersilia; Miele, Alba; Oddone, Francesco; Crescioli, Giada; Fameli, Valeria; Korevaar, Daniël A; Virgili, Gianni
2017-01-01
Research has shown a modest adherence of diagnostic test accuracy (DTA) studies in glaucoma to the Standards for Reporting of Diagnostic Accuracy Studies (STARD). We have applied the updated 30-item STARD 2015 checklist to a set of studies included in a Cochrane DTA systematic review of imaging tools for diagnosing manifest glaucoma. Three pairs of reviewers, including one senior reviewer who assessed all studies, independently checked the adherence of each study to STARD 2015. Adherence was analyzed on an individual-item basis. Logistic regression was used to evaluate the effect of publication year and impact factor on adherence. We included 106 DTA studies, published between 2003 and 2014 in journals with a median impact factor of 2.6. Overall adherence was 54.1% for 3,286 individual ratings across 31 items, with a mean of 16.8 (SD: 3.1; range 8-23) items per study. Large variability in adherence to reporting standards was detected across individual STARD 2015 items, ranging from 0 to 100%. Nine items (1: identification as diagnostic accuracy study in title/abstract; 6: eligibility criteria; 10: index test (a) and reference standard (b) definition; 12: cut-off definitions for index test (a) and reference standard (b); 14: estimation of diagnostic accuracy measures; 21a: severity spectrum of diseased; 23: cross-tabulation of the index and reference standard results) were adequately reported in more than 90% of the studies. Conversely, 10 items (3: scientific and clinical background of the index test; 11: rationale for the reference standard; 13b: blinding of index test results; 17: analyses of variability; 18: sample size calculation; 19: study flow diagram; 20: baseline characteristics of participants; 28: registration number and registry; 29: availability of study protocol; 30: sources of funding) were adequately reported in less than 30% of the studies. 
Only four items showed a statistically significant improvement over time: missing data (16), baseline characteristics of participants (20), estimates of diagnostic accuracy (24) and sources of funding (30). Adherence to STARD 2015 among DTA studies in glaucoma research is incomplete, and only modestly increasing over time.
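The reported counts can be cross-checked with simple arithmetic. The sketch below uses only figures stated in the abstract (106 studies, 31 rated items, a mean of 16.8 adequately reported items per study); the variable names are introduced here for illustration. The implied proportion agrees with the reported 54.1% to within the rounding of the mean.

```python
# Figures taken from the abstract.
n_studies = 106
n_rated_items = 31
mean_items_reported = 16.8

# Each study is rated on every item, so the rating count is a product.
total_ratings = n_studies * n_rated_items

# Mean adequately reported items per study, as a share of all rated items.
implied_adherence = mean_items_reported / n_rated_items

print(total_ratings)
print(round(implied_adherence * 100, 1))
```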
Okochi, Jiro; Utsunomiya, Sakiko; Takahashi, Tai
2005-01-01
Background: The International Classification of Functioning, Disability and Health (ICF) was published by the World Health Organization (WHO) to standardize descriptions of health and disability. Little is known about the reliability and clinical relevance of measurements using the ICF and its qualifiers. This study examines the test-retest reliability of ICF codes and the rate of immeasurability in long-term care settings for the elderly, in order to evaluate the clinical applicability of the ICF, its qualifiers, and the ICF checklist. Methods: Reliability of 85 body function (BF) items and 152 activity and participation (AP) items of the ICF was studied using a test-retest procedure with a sample of 742 elderly persons from 59 institutional and at-home care service centers. Test-retest reliability was estimated using the weighted kappa statistic. The clinical relevance of the ICF was estimated by calculating the immeasurability rate. The effect of the measurement settings and evaluators' experience was analyzed by stratification of these variables. The properties of each item were evaluated using both the kappa statistic and the immeasurability rate to assess the clinical applicability of WHO's ICF checklist in the elderly care setting. Results: The median weighted kappa statistics for the 85 BF and 152 AP items were 0.46 and 0.55, respectively. The reproducibility statistics improved when the measurements were performed by experienced evaluators. Some chapters, such as genitourinary and reproductive functions in the BF domain and major life areas in the AP domain, contained more items with low test-retest reliability or rated as immeasurable than other chapters. Some items in the ICF checklist were rated as unreliable and immeasurable. Conclusion: The reliability of the ICF codes when measured with the current ICF qualifiers is relatively low. 
The increase in reliability with evaluators' experience suggests that proper training would improve reliability. The ICF checklist contains some items that are difficult to apply in geriatric care settings. Improvements should be achieved by selecting the most relevant items for each measurement and by developing appropriate qualifiers for each code according to the interest of the users. PMID:16050960
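As an illustration of the weighted kappa statistic used for test-retest reliability, here is a minimal sketch for two sets of ordinal ratings of the same subjects. It assumes linear disagreement weights and ratings coded as integers from 0 to n_categories - 1; the abstract does not state which weighting scheme the authors used, and the example ratings are hypothetical.

```python
def weighted_kappa(ratings_1, ratings_2, n_categories):
    """Cohen's weighted kappa with linear disagreement weights.

    ratings_1 and ratings_2 are equal-length sequences of integer
    categories in the range 0 .. n_categories - 1.
    """
    if len(ratings_1) != len(ratings_2) or not ratings_1:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_1)
    k = n_categories

    # Observed joint proportions of (first rating, second rating).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(ratings_1, ratings_2):
        observed[a][b] += 1.0 / n

    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    # Linear weights: disagreement grows with the distance between categories.
    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)
            observed_disagreement += weight * observed[i][j]
            expected_disagreement += weight * row[i] * col[j]

    if expected_disagreement == 0.0:
        raise ValueError("kappa is undefined when all ratings are identical")
    return 1.0 - observed_disagreement / expected_disagreement


# Hypothetical test and retest ratings on a five-level scale (0-4).
test_ratings = [0, 1, 2, 3, 4, 2, 1]
retest_ratings = [0, 1, 2, 3, 4, 3, 1]
kappa = weighted_kappa(test_ratings, retest_ratings, 5)
```

A value of 1 indicates perfect agreement and 0 indicates agreement no better than chance, which puts the reported medians of 0.46 and 0.55 in context.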
Components of a Measure to Describe Organizational Culture in Academic Pharmacy.
Desselle, Shane; Rosenthal, Meagen; Holmes, Erin R; Andrews, Brienna; Lui, Julia; Raja, Leela
2017-12-01
Objective. To develop a measure of organizational culture in academic pharmacy and identify characteristics of an academic pharmacy program that would be impactful for internal (eg, students, employees) and external (eg, preceptors, practitioners) clients of the program. Methods. A three-round Delphi procedure of 24 panelists from pharmacy schools in the U.S. and Canada generated items based on the Organizational Culture Profile (OCP), which were then evaluated and refined for inclusion in subsequent rounds. Items were assessed for appropriateness and impact. Results. The panel produced 35 items across six domains that measured organizational culture in academic pharmacy: competitiveness, performance orientation, social responsibility, innovation, emphasis on collegial support, and stability. Conclusion. The items generated require testing for validation and reliability in a large sample to finalize this measure of organizational culture.
Capacity and precision in an animal model of visual short-term memory.
Lara, Antonio H; Wallis, Jonathan D
2012-03-14
Temporary storage of information in visual short-term memory (VSTM) is a key component of many complex cognitive abilities. However, it is highly limited in capacity. Understanding the neurophysiological nature of this capacity limit will require a valid animal model of VSTM. We used a multiple-item color change detection task to measure macaque monkeys' VSTM capacity. Subjects' performance deteriorated and reaction times increased as a function of the number of items in memory. Additionally, we measured the precision of the memory representations by varying the distance between sample and test colors. In trials with similar sample and test colors, subjects made more errors compared to trials with highly discriminable colors. We modeled the error distribution as a Gaussian function and used this to estimate the precision of VSTM representations. We found that as the number of items in memory increases the precision of the representations decreases dramatically. Additionally, we found that focusing attention on one of the objects increases the precision with which that object is stored and degrades the precision of the remaining. These results are in line with recent findings in human psychophysics and provide a solid foundation for understanding the neurophysiological nature of the capacity limit of VSTM.
Caporali, Priscila Faissola; Caporali, Sueli Aparecida; Bucuvic, Érika Cristina; Vieira, Sheila de Souza; Santos, Zeila Maria; Chiari, Brasília Maria
2016-01-01
Translation and cross-cultural adaptation of the instrument Hearing Implant Sound Quality Index (HISQUI19), and characterization of the target population and auditory performance in Cochlear Implant (CI) users through the application of a synthesis version of this tool. Evaluations of conceptual, item, semantic and operational equivalences were performed. The synthesis version was applied as a pre-test to 33 individuals, whose final results characterized the final sample and performance of the questionnaire. The results were analyzed statistically. The final translation (FT) was back-translated and compared with the original version, revealing a minimum difference between items. The changes observed between the FT and the synthesis version were characterized by the application of simplified vocabulary used on a daily basis. For the pre-test, the average score of the interviewees was 90.2, and a high level of reliability was achieved (0.83). The translation and cross-cultural adaptation of the HISQUI19 questionnaire showed suitability for conceptual, item, semantic and operational equivalences. For the sample characterization, the sound quality was classified as good with better performance for the categories of location and distinction of sound/voices.
Validation to Portuguese of the Scale of Student Satisfaction and Self-Confidence in Learning.
Almeida, Rodrigo Guimarães dos Santos; Mazzo, Alessandra; Martins, José Carlos Amado; Baptista, Rui Carlos Negrão; Girão, Fernanda Berchelli; Mendes, Isabel Amélia Costa
2015-01-01
To translate into Portuguese and validate the Scale of Student Satisfaction and Self-Confidence in Learning. Methodological translation and validation study of a research tool. After all steps of the translation process had been followed, the event III Workshop Brazil - Portugal: Care Delivery to Critical Patients, promoted by one Brazilian and one Portuguese teaching institution, was created for the validation process. 103 nurses participated. As to the validity and reliability of the scale, the correlation pattern between the variables, the sampling adequacy test (Kaiser-Meyer-Olkin) and the sphericity test (Bartlett) showed good results. In the exploratory factor analysis (Varimax), item 9 behaved better in factor 1 (Satisfaction) than in factor 2 (Self-confidence in learning). The internal consistency (Cronbach's alpha) showed coefficients of 0.86 for factor 1 with six items and 0.77 for factor 2 with seven items. In Portuguese this tool was called: Escala de Satisfação de Estudantes e Autoconfiança na Aprendizagem. The results showed good psychometric properties and good potential for use. The sample size and specificity are limitations of this study, but future studies will help consolidate the validity of the scale and strengthen its potential use.
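For readers unfamiliar with the internal-consistency coefficient reported above, a minimal sketch of Cronbach's alpha follows. It uses the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores); the function name and the response matrix are hypothetical and are not data from the study.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    n_items = len(scores[0])
    if n_items < 2:
        raise ValueError("at least two items are required")
    # Sample variance of each item across respondents.
    item_variances = [
        variance([respondent[i] for respondent in scores])
        for i in range(n_items)
    ]
    # Sample variance of the respondents' total scores.
    total_variance = variance([sum(respondent) for respondent in scores])
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses: 5 respondents x 3 Likert-type items.
responses = [
    [4, 5, 4],
    [3, 3, 4],
    [5, 5, 5],
    [2, 3, 2],
    [4, 4, 5],
]
alpha = cronbach_alpha(responses)
```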
Validation of the Spanish Short Self-Regulation Questionnaire (SSSRQ) through Rasch Analysis.
Garzón Umerenkova, Angélica; de la Fuente Arias, Jesús; Martínez-Vicente, José Manuel; Zapata Sevillano, Lucía; Pichardo, Mari Carmen; García-Berbén, Ana Belén
2017-01-01
Background: The aim of the study was to psychometrically characterize the Spanish Short Self-Regulation Questionnaire (SSSRQ) through Rasch analysis. Materials and Methods: 831 Spanish university students (262 men), between 17 and 39 years of age and ranging from the first to the 5th year of studies, completed the SSSRQ questionnaire. Confirmatory factor analysis (CFA) was carried out in order to establish structural adequacy. Afterward, by means of the Rasch model, a study of each subscale was conducted to test for dimensionality, fit of the sample questions, functionality of the response categories, reliability and estimation of Differential Item Functioning by gender and course. Results: The four subscales comply with the unidimensionality criteria, the questions are in line with the model, the response categories operate properly and the reliability of the sample is acceptable. Nonetheless, the test could benefit from the inclusion of additional items of both high and low difficulty in order to increase construct validity, discrimination and reliability for the respondents. Several items with differences in gender and course were also identified. Discussion: The results support the need for, and adequacy of, this psychometric analysis strategy as a complement to CFA for enhancing the instrument.
Children's Sleep Comic: development of a new diagnostic tool for children with sleep disorders.
Schwerdtle, Barbara; Kanis, Julia; Kahl, Lena; Kübler, Andrea; Schlarb, Angelika A
2012-01-01
A solid diagnosis of sleep disorders in children should include both self-ratings and parent ratings. However, there are few standardized self-assessment instruments to meet this need. The Children's Sleep Comic is an adapted version of the unpublished German questionnaire "Freiburger Kinderschlafcomic" and provides pictures for items and responses. Because the drawings were outdated and allowed only for qualitative analysis, we revised the comic, tested its applicability in a target sample, and suggest a procedure for quantitative analysis. All items were updated and pictures were newly drawn. We used a sample of 201 children aged 5-10 years to test the applicability of the Children's Sleep Comic in young children and to run a preliminary analysis. The Children's Sleep Comic comprises 37 items covering relevant aspects of sleep disorders in children. Application took on average 30 minutes. The procedure was well accepted by the children, as reflected by the absence of any dropouts. First comparisons with established questionnaires indicated moderate correlations. The Children's Sleep Comic is appropriate for screening sleep behavior and sleep problems in children. The interactive procedure can foster a good relationship between the investigator and the child, and thus establish the basis for successful intervention if necessary.
Validation of the Spanish Short Self-Regulation Questionnaire (SSSRQ) through Rasch Analysis
Garzón Umerenkova, Angélica; de la Fuente Arias, Jesús; Martínez-Vicente, José Manuel; Zapata Sevillano, Lucía; Pichardo, Mari Carmen; García-Berbén, Ana Belén
2017-01-01
Background: The aim of the study was to psychometrically characterize the Spanish Short Self-Regulation Questionnaire (SSSRQ) through Rasch analysis. Materials and Methods: 831 Spanish university students (262 men), between 17 and 39 years of age and ranging from the first to the 5th year of studies, completed the SSSRQ questionnaire. Confirmatory factor analysis (CFA) was carried out in order to establish structural adequacy. Afterward, by means of the Rasch model, a study of each subscale was conducted to test for dimensionality, fit of the sample questions, functionality of the response categories, reliability and estimation of Differential Item Functioning by gender and course. Results: The four subscales comply with the unidimensionality criteria, the questions are in line with the model, the response categories operate properly and the reliability of the sample is acceptable. Nonetheless, the test could benefit from the inclusion of additional items of both high and low difficulty in order to increase construct validity, discrimination and reliability for the respondents. Several items with differences in gender and course were also identified. Discussion: The results support the need for, and adequacy of, this psychometric analysis strategy as a complement to CFA for enhancing the instrument. PMID:28298898
Medeiros, Lydia C; Hillers, Virginia N; Chen, Gang; Bergmann, Verna; Kendall, Patricia; Schroeder, Mary
2004-11-01
The objective of this study was to design and develop food safety knowledge and attitude scales based on food-handling guidelines developed by a national panel of food safety experts. Knowledge (n=43) and attitude (n=49) questions were developed and pilot-tested with a variety of consumer groups. Final questions were selected based on item analysis and on validity and reliability statistical tests. Knowledge questions were tested in Washington State with participants in low-income nutrition education programs (pretest/posttest n=58, test/retest n=19) and college students (pretest/posttest n=34). Attitude questions were tested in Ohio with nutrition education program participants (n=30) and college students (non-nutrition majors n=138, nutrition majors n=57). Item analysis, paired sample t tests, Pearson's correlation coefficients, and Cronbach's alpha were used. Reliability and validity tests of individual items and the question sets were used to reduce the scales to 18 knowledge questions and 10 attitude questions. The knowledge and attitude scales covered topics ranked as important by a national panel of experts and met most validity and reliability standards. The 18-item knowledge questionnaire had instructional sensitivity (mean score increase of more than three points after instruction), internal reliability (Cronbach's alpha >.75), and produced similar results in test-retest without intervention (coefficient of stability=.81). Knowledge of correct procedures for hand washing and avoiding cross-contamination was widespread before instruction. Knowledge was limited regarding avoiding food preparation while ill, cooking hamburgers, high-risk foods, and whether cooked rice and potatoes could be stored at room temperature. The 10-item attitude scale had an appropriate range of responses (item difficulty) and produced similar results in test-retest (P=.01). Internal consistency ranged from alpha=.63 to .89. 
Students anticipating a career where food safety is valued had higher attitude scale scores than participants of extension education programs. Uses for the knowledge questionnaire include assessment of subject matter knowledge before instruction and knowledge gain after instruction. The attitude scale assesses an outcome variable that may predict food safety behavior.
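The coefficient of stability mentioned above is commonly computed as the Pearson correlation between scores from two administrations of the same questionnaire. The following minimal sketch shows that calculation under this assumption; the abstract does not give the authors' exact procedure, and the scores below are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)


# Hypothetical knowledge scores (0-18) at test and retest for six respondents.
first_administration = [12, 15, 9, 17, 11, 14]
second_administration = [13, 14, 10, 17, 10, 15]
stability = pearson_r(first_administration, second_administration)
```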
Luttenberger, Katharina; Reppermund, Simone; Schmiedeberg-Sohn, Anke; Book, Stephanie; Graessel, Elmar
2016-05-26
There are currently no valid, fast, and easy-to-administer performance tests that are designed to assess the capacities to perform activities of daily living in persons with mild dementia and mild cognitive impairment (MCI). However, such measures are urgently needed for determining individual support needs as well as the efficacy of interventions. The aim of the present study was therefore to validate the Erlangen Test of Activities of Daily Living in Persons with Mild Dementia and Mild Cognitive Impairment (ETAM), a performance test that is based on the International Classification of Functioning and Health (ICF), which assesses the relevant domains of living in older adults with MCI and mild dementia who live independently. The 10 ICF-based items on the research version of the ETAM were tested in a final sample of 81 persons with MCI or mild dementia. The items were selected for the final version in accordance with 6 criteria: 1) all domains must be represented and have equal weight, 2) all items must load on the same factor, 3) item difficulties and item discriminatory powers, 4) convergent validity (Bayer Activities of Daily Living Scale [B-ADL]) and discriminant validity (Mini Mental State Examination [MMSE], Geriatric Depression Scale 15 [GDS-15]), 5) inter-rater reliabilities of the individual items, 6) as little material as possible. Retest reliability was also examined. Cohen's ds were calculated to determine the magnitudes of the differences in ETAM scores between participants diagnosed with different grades of severity of cognitive impairment. The final version of the ETAM consists of 6 items that cover the five ICF domains communication, mobility, self-care, domestic life (assessed by two 3-point items), and major life areas (specifically, the economic life sub-category) and load on a single factor. The maximum achievable score is 30 points (6 points per domain). 
The average administration time was 35 min, 19 of which were needed for pure item performance. The internal consistency was α = .71. The three-week test-retest reliability was r = .78, and the inter-rater reliability was r = .97. The ETAM also provided satisfactory discrimination between healthy individuals and persons with MCI or mild dementia as well as between persons with mild and moderate dementia. The 6-item final version of the ETAM shows satisfactory psychometric characteristics and can be administered quickly. It is therefore suitable for use in both clinical practice and research.
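The group comparisons above rely on Cohen's d. A minimal sketch of the pooled-standard-deviation form of this effect size follows; whether the authors used exactly this variant is an assumption, and the scores are hypothetical rather than ETAM data.

```python
import math
from statistics import mean, variance


def cohens_d(group_1, group_2):
    """Cohen's d using the pooled sample standard deviation of two groups."""
    n1, n2 = len(group_1), len(group_2)
    pooled_variance = (
        (n1 - 1) * variance(group_1) + (n2 - 1) * variance(group_2)
    ) / (n1 + n2 - 2)
    return (mean(group_1) - mean(group_2)) / math.sqrt(pooled_variance)


# Hypothetical total scores (0-30) for two diagnostic groups.
mci_scores = [26, 24, 27, 23, 25]
mild_dementia_scores = [20, 22, 18, 21, 19]
effect_size = cohens_d(mci_scores, mild_dementia_scores)
```

By common convention, values around 0.2, 0.5, and 0.8 are read as small, medium, and large differences.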
A replication of a factor analysis of motivations for trapping
Schroeder, Susan; Fulton, David C.
2015-01-01
Using a 2013 sample of Minnesota trappers, we employed confirmatory factor analysis to replicate an exploratory factor analysis of trapping motivations conducted by Daigle, Muth, Zwick, and Glass (1998). We employed the same 25 items used by Daigle et al. and tested the same five-factor structure. We also compared motivations in our sample to those reported by Daigle et al.
2011-01-01
Background: Knowledge in natural sciences generally predicts study performance in the first two years of the medical curriculum. In order to reduce delay and dropout in the preclinical years, Hamburg Medical School decided to develop a natural science test (HAM-Nat) for student selection. In the present study, two different approaches to scale construction are presented: a unidimensional scale and a scale composed of three subject specific dimensions. Their psychometric properties and relations to academic success are compared. Methods: 334 first year medical students of the 2006 cohort responded to 52 multiple choice items from biology, physics, and chemistry. For the construction of scales we generated two random subsamples, one for development and one for validation. In the development sample, unidimensional item sets were extracted from the item pool by means of weighted least squares (WLS) factor analysis, and subsequently fitted to the Rasch model. In the validation sample, the scales were subjected to confirmatory factor analysis and, again, Rasch modelling. The outcome measure was academic success after two years. Results: Although the correlational structure within the item set is weak, a unidimensional scale could be fitted to the Rasch model. However, psychometric properties of this scale deteriorated in the validation sample. A model with three highly correlated subject specific factors performed better. All summary scales predicted academic success with an odds ratio of about 2.0. Prediction was independent of high school grades and there was a slight tendency for prediction to be better in females than in males. Conclusions: A model separating biology, physics, and chemistry into different Rasch scales seems to be more suitable for item bank development than a unidimensional model, even when these scales are highly correlated and enter into a global score. 
When such a combination scale is used to select the upper quartile of applicants, the proportion of successful completion of the curriculum after two years is expected to rise substantially. PMID:21999767
Hissbach, Johanna C; Klusmann, Dietrich; Hampe, Wolfgang
2011-10-14
Knowledge in natural sciences generally predicts study performance in the first two years of the medical curriculum. In order to reduce delay and dropout in the preclinical years, Hamburg Medical School decided to develop a natural science test (HAM-Nat) for student selection. In the present study, two different approaches to scale construction are presented: a unidimensional scale and a scale composed of three subject specific dimensions. Their psychometric properties and relations to academic success are compared. 334 first year medical students of the 2006 cohort responded to 52 multiple choice items from biology, physics, and chemistry. For the construction of scales we generated two random subsamples, one for development and one for validation. In the development sample, unidimensional item sets were extracted from the item pool by means of weighted least squares (WLS) factor analysis, and subsequently fitted to the Rasch model. In the validation sample, the scales were subjected to confirmatory factor analysis and, again, Rasch modelling. The outcome measure was academic success after two years. Although the correlational structure within the item set is weak, a unidimensional scale could be fitted to the Rasch model. However, psychometric properties of this scale deteriorated in the validation sample. A model with three highly correlated subject specific factors performed better. All summary scales predicted academic success with an odds ratio of about 2.0. Prediction was independent of high school grades and there was a slight tendency for prediction to be better in females than in males. A model separating biology, physics, and chemistry into different Rasch scales seems to be more suitable for item bank development than a unidimensional model, even when these scales are highly correlated and enter into a global score. 
When such a combination scale is used to select the upper quartile of applicants, the proportion of successful completion of the curriculum after two years is expected to rise substantially.
Vleeschouwer, Marloes; Schubart, Chris D.; Henquet, Cecile; Myin-Germeys, Inez; van Gastel, Willemijn A.; Hillegers, Manon H. J.; van Os, Jim J.; Boks, Marco P. M.; Derks, Eske M.
2014-01-01
Background: The psychometric properties of an online test are not necessarily identical to its paper and pencil original. The aim of this study is to test whether the factor structure of the Community Assessment of Psychic Experiences (CAPE) is measurement invariant with respect to online vs. paper and pencil assessment. Method: The factor structure of CAPE items assessed by paper and pencil (N = 796) was compared with the factor structure of CAPE items assessed by the Internet (N = 21,590) using formal tests for Measurement Invariance (MI). The effect size was calculated by estimating the Signed Item Difference in the Sample (SIDS) index and the Signed Test Difference in the Sample (STDS) for a hypothetical subject who scores 2 standard deviations above average on the latent dimensions. Results: The more restricted Metric Invariance model showed a significantly worse fit compared to the less restricted Configural Invariance model (χ2(23) = 152.75, p < 0.001). However, the SIDS indices appear to be small, with an average of −0.11. A STDS of −4.80 indicates that Internet sample members who score 2 standard deviations above average would be expected to score 4.80 points lower on the CAPE total scale (ranging from 42 to 114 points) than would members of the Paper sample with the same latent trait score. Conclusions: Our findings did not support measurement invariance with respect to assessment method. Because of the small effect sizes, the measurement differences between the online assessed CAPE and its paper and pencil original can be neglected without major consequences for research purposes. However, a person with a high vulnerability for psychotic symptoms would score 4.80 points lower on the total scale if the CAPE is assessed online compared to paper and pencil assessment. Therefore, for clinical purposes, one should be cautious with online assessment of the CAPE. PMID:24465389
Bakken, Suzanne; Cimino, James J.; Haskell, Robert; Kukafka, Rita; Matsumoto, Cindi; Chan, Garrett K.; Huff, Stanley M.
2000-01-01
Objective: The purpose of this study was to test the adequacy of the Clinical LOINC (Logical Observation Identifiers, Names, and Codes) semantic structure as a terminology model for standardized assessment measures. Methods: After extension of the definitions, 1,096 items from 35 standardized assessment instruments were dissected into the elements of the Clinical LOINC semantic structure. An additional coder dissected at least one randomly selected item from each instrument. When multiple scale types occurred in a single instrument, a second coder dissected one randomly selected item representative of each scale type. Results: The results support the adequacy of the Clinical LOINC semantic structure as a terminology model for standardized assessments. Using the revised definitions, the coders were able to dissect into the elements of Clinical LOINC all the standardized assessment items in the sample instruments. Percentage agreement for each element was as follows: component, 100 percent; property, 87.8 percent; timing, 82.9 percent; system/sample, 100 percent; scale, 92.6 percent; and method, 97.6 percent. Discussion: This evaluation was an initial step toward the representation of standardized assessment items in a manner that facilitates data sharing and re-use. Further clarification of the definitions, especially those related to time and property, is required to improve inter-rater reliability and to harmonize the representations with similar items already in LOINC. PMID:11062226
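The percentage-agreement figures reported above can be illustrated with a short sketch. This is only an assumed, simplest-case calculation (matching codes divided by items compared); the abstract does not state how agreement was computed, and the function name and sample codes below are hypothetical.

```python
def percent_agreement(coder_a, coder_b):
    """Percentage of items on which two coders assigned the same value."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must rate the same non-empty set of items")
    # Count the positions where the two coders chose the same code.
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return 100.0 * matches / len(coder_a)


# Hypothetical codes for a single element on four items (not data from the study).
first_coder = ["point", "interval", "point", "point"]
second_coder = ["point", "interval", "interval", "point"]
print(percent_agreement(first_coder, second_coder))
```

Simple percentage agreement does not correct for agreement expected by chance; a chance-corrected statistic such as Cohen's kappa would need a separate calculation.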
Marfeo, Elizabeth E; Ni, Pengsheng; McDonough, Christine; Peterik, Kara; Marino, Molly; Meterko, Mark; Rasch, Elizabeth K; Chan, Leighton; Brandt, Diane; Jette, Alan M
2018-03-01
Purpose: To improve the mental health component of the Work Disability Functional Assessment Battery (WD-FAB), developed for the US Social Security Administration's (SSA) disability determination process. Specifically, our goal was to expand the WD-FAB scales of mood & emotions, resilience, social interactions, and behavioral control to improve the depth and breadth of the current scales and expand the content coverage to include aspects of cognition & communication function. Methods: Data were collected from a random, stratified sample of 1695 claimants applying for the SSA work disability benefits, and a general population sample of 2025 working age adults. 169 new items were developed to replenish the WD-FAB scales and analyzed using factor analysis and item response theory (IRT) analysis to construct unidimensional scales. We conducted computer adaptive test (CAT) simulations to examine the psychometric properties of the WD-FAB. Results: Analyses supported the inclusion of four mental health subdomains: Cognition & Communication (68 items), Self-Regulation (34 items), Resilience & Sociability (29 items) and Mood & Emotions (34 items). All scales yielded acceptable psychometric properties. Conclusions: IRT methods were effective in expanding the WD-FAB to assess mental health function. The WD-FAB has the potential to enhance work disability assessment both within the context of the SSA disability programs as well as other clinical and vocational rehabilitation settings.
Tadić, Valerija; Cooper, Andrew; Cumberland, Phillippa; Lewando-Hundt, Gillian; Rahi, Jugnoo S
2013-12-01
To develop a novel age-appropriate measure of functional vision (FV) for self-reporting by visually impaired (VI) children and young people. Questionnaire development. A representative patient sample of VI children and young people aged 10 to 15 years, visual acuity of the logarithm of the minimum angle of resolution (logMAR) worse than 0.48, and a school-based (nonrandom) expert group sample of VI students aged 12 to 17 years. A total of 32 qualitative semistructured interviews supplemented by narrative feedback from 15 eligible VI children and young people were used to generate draft instrument items. Seventeen VI students were consulted individually on item relevance and comprehensibility, instrument instructions, format, and administration methods. The resulting draft instrument was piloted with 101 VI children and young people comprising a nationally representative sample, drawn from 21 hospitals in the United Kingdom. Initial item reduction was informed by presence of missing data and individual item response pattern. Exploratory factor analysis (FA) and parallel analysis (PA), and Rasch analysis (RA) were applied to test the instrument's psychometric properties. Psychometric indices and validity assessment of the Functional Vision Questionnaire for Children and Young People (FVQ_CYP). A total of 712 qualitative statements became a 56-item draft scale, capturing the level of difficulty in performing vision-dependent activities. After piloting, items were removed iteratively as follows: 11 for high percentage of missing data, 4 for skewness, and 1 for inadequate item infit and outfit values in RA, 3 having shown differential item functioning across age groups and 1 across gender in RA. The remaining 36 items showed item fit values within acceptable limits, good measurement precision and targeting, and ordered response categories. 
The reduced scale has a clear unidimensional structure, with all items having a high factor loading on the single factor in FA and PA. The summary scores correlated significantly with visual acuity. We have developed a novel, psychometrically robust self-report questionnaire for children and young people, the FVQ_CYP, that captures the functional impact of visual disability from their perspective. The 36-item, 4-point unidimensional scale has potential as a complementary adjunct to objective clinical assessments in routine pediatric ophthalmology practice and in research. Copyright © 2013 American Academy of Ophthalmology. Published by Elsevier Inc. All rights reserved.
Screening for Moral Injury: The Moral Injury Symptom Scale - Military Version Short Form.
Koenig, Harold G; Ames, Donna; Youssef, Nagy A; Oliver, John P; Volk, Fred; Teng, Ellen J; Haynes, Kerry; Erickson, Zachary D; Arnold, Irina; O'Garo, Keisha; Pearce, Michelle
2018-03-26
To develop a short form (SF) of the 45-item multidimensional Moral Injury Symptom Scale - Military Version (MISS-M) to use when screening for moral injury and monitoring treatment response in veterans and active duty military with PTSD. A total of 427 veterans and active duty military with PTSD symptoms were recruited from VA Medical Centers in Augusta, GA; Los Angeles, CA; Durham, NC; Houston, TX; and San Antonio, TX; and from Liberty University, Lynchburg, Virginia. The sample was randomly split in two. In the first half (n = 214), exploratory factor analysis identified the highest loading item on each of the 10 MISS scales (guilt, shame, moral concerns, loss of meaning, difficulty forgiving, loss of trust, self-condemnation, religious struggle, and loss of religious faith) to form the 10-item MISS-M-SF; confirmatory factor analysis was then performed to replicate results in the second half of the sample (n = 213). Internal reliability, test-retest reliability, and convergent, discriminant, and concurrent validity were examined in the overall sample. The study was approved by the institutional review boards and the Research & Development (R&D) Committees at Veterans Administration medical centers in Durham, Los Angeles, Augusta, Houston, and San Antonio, and the Liberty University and Duke University Medical Center institutional review boards. The 10-item MISS-M-SF had a median of 50 and a range of 12-91 (possible range 10-100). Over 70% scored a 9 or 10 (highest possible) on at least one item. Cronbach's alpha was 0.73 (95% CI 0.69-0.76), and test-retest reliability was 0.87 (95% CI 0.79-0.92). Convergent validity with the 45-item MISS-M was r = 0.92. Discriminant validity was demonstrated by relatively weak correlations with social, religious, and physical health constructs (r = 0.21-0.35), and concurrent validity was indicated by strong correlations with PTSD, depression, and anxiety symptoms (r = 0.54-0.58). 
The MISS-M-SF is a reliable and valid measure of MI symptoms that can be used to screen for MI and monitor response to treatment in veterans and active duty military with PTSD.
Three approaches to investigating the multidimensional nature of a science assessment
NASA Astrophysics Data System (ADS)
Gokiert, Rebecca Jayne
The purpose of this study was to investigate a multi-method approach for collecting validity evidence about the underlying knowledge and skills measured by a large-scale science assessment. The three approaches included analysis of dimensionality, differential item functioning (DIF), and think-aloud interviews. The specific research questions addressed were: (1) Does the 4-factor model previously found by Hamilton et al. (1995) for the grade 8 sample explain the data? (2) Do the performances of male and female students systematically differ? Are these performance differences captured in the dimensions? (3) Can think-aloud reports aid in the generation of hypotheses about the underlying knowledge and skills that are measured by this test? A confirmatory factor analysis of the 4-factor model revealed good model-data fit for both the AB and AC tests. Twenty-four of the 83 AB test items and 16 of the 77 AC test items displayed significant DIF; however, items were found, on average, to favour males and females equally. There were some systematic differences found across the four factors; items favouring males tended to be related to earth and space sciences, stereotypical male related activities, and numerical operations. Conversely, females were found to outperform males on items that required careful reading and attention to detail. Concurrent and retrospective verbal reports (Ericsson & Simon, 1993) were collected from 16 grade 8 students (9 male and 7 female) while they solved 12 DIF items. Four general cognitive processing themes were identified from the student protocols that could be used to explain male and female problem solving. The themes included comprehension (verbal and visual), visualization, background knowledge/experience (school or life), and strategy use.
There were systematic differences in cognitive processing between the students that answered the items correctly and the students who answered the items incorrectly; however, this did not always correspond with the statistical gender DIF results. Although the multifaceted approach produced interpretable and meaningful validity evidence about the knowledge and skills, these forms of validity evidence only begin to provide a basic understanding of the underlying construct(s) that are being measured.
Johnson, Catherine; Burke, Christine; Brinkman, Sally; Wade, Tracey
2017-03-01
Mindfulness-based interventions show consistent benefits in adults for a range of pathologies, but exploration of these approaches in youth is an emergent field, with limited measures of mindfulness for this population. This study aimed to investigate whether multifactor scales of mindfulness can be used in adolescents. A series of studies are presented assessing the performance of a recently developed adult measure, the Comprehensive Inventory of Mindfulness Experiences (CHIME) in 4 early adolescent samples. Study 1 was an investigation of how well the full adult measure (37 items) was understood by youth (N = 292). Study 2 piloted a revision of items in child friendly language with a small group (N = 48). The refined questionnaire for adolescents (CHIME-A) was then tested in Study 3 in a larger sample (N = 461) and subjected to exploratory factor analysis and a range of external validity measures. Study 4 was a confirmatory factor analysis in a new sample (N = 498) with additional external validity measures. Study 5 tested temporal stability (N = 120). Results supported an 8-factor 25-item measure of mindfulness in adolescents, with excellent model fit indices and sound internal consistency for the 8 subscales. Although the CFA supported an overarching factor, internal reliability of a combined total score was poor. The development of a multifactor measure represents a first step toward testing developmental models of mindfulness in young people. This in turn will aid construction of evidence based interventions that are not simply downward derivations of adult mindfulness programs. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Development of the Systems Thinking Scale for Adolescent Behavior Change.
Moore, Shirley M; Komton, Vilailert; Adegbite-Adeniyi, Clara; Dolansky, Mary A; Hardin, Heather K; Borawski, Elaine A
2018-03-01
This report describes the development and psychometric testing of the Systems Thinking Scale for Adolescent Behavior Change (STS-AB). Following item development, initial assessments of understandability and stability of the STS-AB were conducted in a sample of nine adolescents enrolled in a weight management program. Exploratory factor analysis of the 16-item STS-AB and internal consistency assessments were then done with 359 adolescents enrolled in a weight management program. Test-retest reliability of the STS-AB was .71, p = .03; internal consistency reliability was .87. Factor analysis of the 16-item STS-AB indicated a one-factor solution with good factor loadings, ranging from .40 to .67. Evidence of construct validity was supported by significant correlations with established measures of variables associated with health behavior change. We provide beginning evidence of the reliability and validity of the STS-AB to measure systems thinking for health behavior change in young adolescents.
Zuverza-Chavarria, Virginia; Tsanadis, John
2011-05-01
The goal of this study was to explore the psychometric properties of the CLOX Executive Clock Drawing Task (Royall, Cordes, & Polk, 1998) in persons who had sustained a stroke and were receiving inpatient rehabilitation. Rasch modeling was utilized to examine the psychometric properties of the CLOX. Separate analyses were conducted for the free draw (CLOX 1) and copy (CLOX 2) portions of the measure to investigate each presentation mode independently. The sample consisted of 66 inpatient adults who had sustained a stroke. CLOX 1 met most Rasch model expectations for item fit, unidimensionality, test reliability, and sample targeting. CLOX 2 was less psychometrically sound and contained two items with significant misfit. CLOX 2 demonstrated a significant ceiling effect that resulted in poor sample targeting. CLOX 1 is a psychometrically sound screening instrument for assessing persons with stroke receiving inpatient rehabilitation. In addition to the psychometric weaknesses of CLOX 2, its interpretive yield is minimal and clinicians may consider omitting it. Recommendations are made for using the Rasch item-person maps in clinical practice.
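As background for the Rasch analysis described above, the dichotomous form of the Rasch model expresses the probability of passing an item as a logistic function of the gap between person ability and item difficulty. The sketch below is illustrative only: the function name, the ability values, and the item difficulty of 0.0 are assumptions, not values from the study.

```python
import math


def rasch_probability(theta, b):
    """P(correct) under the dichotomous Rasch model.

    theta: person ability in logits; b: item difficulty in logits.
    """
    return 1.0 / (1.0 + math.exp(-(theta - b)))


# Hypothetical abilities against one item of difficulty 0.0 logits.
for theta in (-1.0, 0.0, 1.0, 3.0):
    print(theta, round(rasch_probability(theta, 0.0), 3))
```

When most respondents sit well above every item's difficulty, these probabilities approach 1 for all items, which is one way to picture the ceiling effect and poor sample targeting reported for CLOX 2.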
Reliability of the Ego-Grasping Scale.
Lester, David
2012-04-01
Research using Knoblauch and Falconer's Ego-Grasping Scale is reviewed. Using a sample of 695 undergraduate students, the scale had moderate reliability (Cronbach alpha, odd-even numbered items, and test-retest), but a principal-components analysis with a varimax rotation identified five components, indicating heterogeneity in the content of the items. Lower Ego-Grasping scores appear to be associated with better psychological health. The scale has been translated and used with Korean, Kuwaiti, and Turkish students, indicating that the scale can be useful in cross-cultural studies.
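For reference, Cronbach's alpha, one of the reliability estimates mentioned above, can be computed from item variances and the variance of the total score. The following is a minimal sketch using the standard formula; the function name and the response matrix are hypothetical and are not data from the Ego-Grasping Scale.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    n_items = len(responses[0])
    if n_items < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents.
    item_variances = [
        variance([row[j] for row in responses]) for j in range(n_items)
    ]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in responses])
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses: five respondents, three items.
sample = [[4, 5, 4], [2, 3, 2], [3, 3, 4], [5, 4, 5], [1, 2, 1]]
print(round(cronbach_alpha(sample), 2))
```

A high alpha does not by itself establish that a scale is unidimensional, which is consistent with the abstract reporting acceptable reliability alongside five components.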
Development of flame-resistant structures for use in the Apollo and Skylab programs
NASA Technical Reports Server (NTRS)
Coskren, R. J.
1973-01-01
Flame-resistant materials have been designed and fabricated to meet certain end-use criteria established by NASA with emphasis on meeting established flammability standards. The program had three general phases: (1) fabrication of candidate sample structures for evaluation by the Structures and Mechanics Division and/or NASA contractors; (2) physical testing of the structures developed; and (3) supply of required quantities of specific items for fabrication into prototype and/or flight items for the Apollo and Skylab programs.
ERIC Educational Resources Information Center
Burns, Daniel J.; Martens, Nicholas J.; Bertoni, Alicia A.; Sweeney, Emily J.; Lividini, Michelle D.
2006-01-01
In a repeated testing paradigm, list items receiving item-specific processing are more likely to be recovered across successive tests (item gains), whereas items receiving relational processing are likely to be forgotten progressively less on successive tests. Moreover, analysis of cumulative-recall curves has shown that item-specific processing…
ERIC Educational Resources Information Center
Frisby, Craig L.; Wang, Ze
2016-01-01
Data from the standardization sample of the Woodcock-Johnson Psychoeducational Battery--Third Edition (WJ III) Cognitive standard battery and Test Session Observation Checklist items were analyzed to understand the relationship between g (general mental ability) and test session behavior (TSB; n = 5,769). Latent variable modeling methods were used…
ERIC Educational Resources Information Center
O'Keeffe, Lisa; O'Halloran, Kay L.; Wignell, Peter; Tan, Sabine
2017-01-01
In 2015, the Australian Council for Educational Research (ACER) was tasked with developing literacy and numeracy skills testing for pre-service teachers. All undergraduate and postgraduate trainee teachers are now required to pass these literacy and numeracy tests at some stage on their journey to becoming a teacher; for commencing students from…
Preliminary psychometric testing of the Fox Simple Quality-of-Life Scale.
Fox, Sherry
2004-06-01
Although quality of life is extensively defined as subjective and multidimensional with both affective and cognitive components, few instruments capture important dimensions of the construct, and few are both conceptually congruent and user friendly for the clinical setting. The aim of this study was to develop and test a measure that would be easy to use clinically and capture both cognitive and affective components of quality of life. Initial item sources for the Fox Simple Quality-of-Life Scale (FSQOLS) were literature-based. Thirty items were compiled for content validity assessment by a panel of expert healthcare clinicians from various disciplines, predominantly nursing. Five items were removed as a result of the review because they reflected negatively worded or redundant items. The 25-item scale was mailed to 177 people with lung, colon, and ovarian cancer in various stages. Cancer types were selected theoretically, based on similarity in prognosis, degree of symptom burden, and possible meaning and experience. Of the 145 participants, all provided complete data on the FSQOLS. Psychometric evaluation of the FSQOLS included item-total correlations, principal components analysis with varimax rotation revealing two factors explaining 50% variance, reliability estimation using alpha estimates, and item-factor correlations. The FSQOLS exhibited significant convergent validity with four popular quality-of-life instruments: the Ferrans and Powers Quality of Life Index, the Functional Assessment of Cancer Therapy Scale, the Short-Form-36 Health Survey, and the General Well-Being Scale. Content validity of the scale was explored and supported using qualitative interviews of 14 participants with lung, colon and ovarian cancer, who were a subgroup of the sample for the initial instrument testing.
Rodríguez, Daniela C; Hoe, Connie; Dale, Elina M; Rahman, M Hafizur; Akhter, Sadika; Hafeez, Assad; Irava, Wayne; Rajbangshi, Preety; Roman, Tamlyn; Ţîrdea, Marcela; Yamout, Rouham; Peters, David H
2017-08-01
The capacity to demand and use research is critical for governments if they are to develop policies that are informed by evidence. Existing tools designed to assess how government officials use evidence in decision-making have significant limitations for low- and middle-income countries (LMICs); they are rarely tested in LMICs and focus only on individual capacity. This paper introduces an instrument that was developed to assess Ministry of Health (MoH) capacity to demand and use research evidence for decision-making, which was tested for reliability and validity in eight LMICs (Bangladesh, Fiji, India, Lebanon, Moldova, Pakistan, South Africa, Zambia). Instrument development was based on a new conceptual framework that addresses individual, organisational and systems capacities, and items were drawn from existing instruments and a literature review. After initial item development and pre-testing to address face validity and item phrasing, the instrument was reduced to 54 items for further validation and item reduction. In-country study teams interviewed a systematic sample of 203 MoH officials. Exploratory factor analysis was used in addition to standard reliability and validity measures to further assess the items. Thirty items divided between two factors representing organisational and individual capacity constructs were identified. South Africa and Zambia demonstrated the highest level of organisational capacity to use research, whereas Pakistan and Bangladesh were the lowest two. In contrast, individual capacity was highest in Pakistan, followed by South Africa, whereas Bangladesh and Lebanon were the lowest. The framework and related instrument represent a new opportunity for MoHs to identify ways to understand and improve capacities to incorporate research evidence in decision-making, as well as to provide a basis for tracking change.
Development and testing of the Multidimensional Trust in Health Care Systems Scale.
Egede, Leonard E; Ellis, Charles
2008-06-01
To describe the development and psychometric testing of the Multidimensional Trust in Health Care Systems Scale (MTHCSS). Scale development occurred in 2 phases. In phase 1, a pilot instrument with 70 items was generated from the review of the trust literature, focus groups, and expert opinion. The 70 items were pilot tested in a sample of 256 students. Exploratory factor analysis was used to derive an orthogonal set of correlated factors. In phase 2, the final scale was administered to 301 primary care patients to assess reliability and validity. Phase 2 participants also completed validated measures of patient-centered care, health locus of control, medication nonadherence, social support, and patient satisfaction. In phase 1, a 17-item scale (MTHCSS) was developed with 10 items measuring trust in health care providers, 4 items measuring trust in health care payers, and 3 items measuring trust in health care institutions. In phase 2, the 17-item MTHCSS had a mean score of 63.0 (SD 8.8); the provider subscale had a mean of 40.0 (SD 6.2); the payers subscale had a mean of 12.8 (SD 3.0); and the institutions subscale had a mean of 10.3 (SD 2.1). Cronbach's alpha was 0.89 for the MTHCSS and 0.92, 0.74, and 0.64 for the 3 subscales. The MTHCSS was significantly correlated with patient-centered care (r = .22 to .62), locus of control-chance (r = .42), medication nonadherence (r = -.22), social support (r = .25), and patient satisfaction (r = .67). The MTHCSS is a valid and reliable instrument for measuring the 3 objects of trust in health care and is correlated with patient-level health outcomes.
ERIC Educational Resources Information Center
Matlock, Ki Lynn; Turner, Ronna
2016-01-01
When constructing multiple test forms, the number of items and the total test difficulty are often equivalent. Not all test developers match the number of items and/or average item difficulty within subcontent areas. In this simulation study, six test forms were constructed having an equal number of items and average item difficulty overall.…
Oyeyemi, Adewale L; Sallis, James F; Oyeyemi, Adetoyeje Y; Amin, Mariam M; De Bourdeaudhuij, Ilse; Deforche, Benedicte
2013-11-01
This study adapted the Physical Activity Neighborhood Environment Scale (PANES) to the Nigerian context and assessed the test-retest reliability and construct validity of the Nigerian version (PANES-N). A multidisciplinary panel of experts adapted the original PANES to reflect the built and social environment of Nigeria. The adapted PANES was subjected to cognitive testing and test-retest reliability assessment in a diverse sample of Nigerian adults (N = 132) from different neighborhood types. Intraclass Correlation Coefficients (ICC) were used to assess test-retest reliability, and construct validity was investigated with Analysis of Covariance for differences in environmental attributes between neighborhoods. Four of the 17 items on the original PANES were significantly modified, 3 were removed and 2 new items were incorporated into the final version of the adapted PANES-N. Test-retest reliability was substantial to almost perfect (ICC = 0.62-1.00) for all items on the PANES-N, and residents of neighborhoods in the inner city reported higher residential density, land use mix and safety, but lower pedestrian facilities and aesthetics than did residents of government reserved area/new layout neighborhoods. The PANES-N appears promising for assessing environmental perceptions related to physical activity in Nigeria, but further testing is required to assess its applicability across Africa.
Menon, Chloe; Westervelt, Holly James; Jahn, Danielle R.; Dressel, Jeffrey A.; O’Bryant, Sid E.
2013-01-01
The Brief Smell Identification Test (BSIT) is a commonly used measure of olfactory functioning in elderly populations. Few studies have provided normative data for this measure, and minimal data are available regarding the impact of sociodemographic factors on test scores. This study presents normative data for the BSIT in a sample of English- and Spanish-speaking Hispanic and non-Hispanic Whites. A Rasch analysis was also conducted to identify the items that best discriminated between varying levels of olfactory functioning, as measured by the BSIT. The total sample included 302 older adults seen as part of an ongoing study of rural cognitive aging, Project FRONTIER. Hierarchical regression analyses revealed that BSIT scores require adjustment by age and gender, but years of education, ethnicity, and language did not significantly influence BSIT performance. Four items best discriminated between varying levels of smell identification, accounting for 59.44% of total information provided by the measure. However, items did not represent a continuum of difficulty on the BSIT. The results of this study indicate that the BSIT appears to be well-suited for assessing odor identification deficits in older adults of diverse backgrounds, but that fine-tuning of this instrument may be recommended in light of its items’ difficulty and discrimination parameters. Clinical and empirical implications are discussed. PMID:23634698
ERIC Educational Resources Information Center
Spaan, Mary
2007-01-01
This article follows the development of test items (see "Language Assessment Quarterly", Volume 3 Issue 1, pp. 71-79 for the article "Test and Item Specifications Development"), beginning with a review of test and item specifications, then proceeding to writing and editing of items, pretesting and analysis, and finally selection of an item for a…
ERIC Educational Resources Information Center
Hewitt, Margaret A.; Homan, Susan P.
2004-01-01
Test validity issues considered by test developers and school districts rarely include individual item readability levels. In this study, items from a major standardized test were examined for individual item readability level and item difficulty. The Homan-Hewitt Readability Formula was applied to items across three grade levels. Results of…
Cole, Jason C; Ito, Diane; Chen, Yaozhu J; Cheng, Rebecca; Bolognese, Jennifer; Li-McLeod, Josephine
2014-09-04
There is a lack of validated instruments to measure the level of burden of Alzheimer's disease (AD) on caregivers. The Impact of Alzheimer's Disease on Caregiver Questionnaire (IADCQ) is a 12-item instrument with a seven-day recall period that measures AD caregiver's burden across emotional, physical, social, financial, sleep, and time aspects. Primary objectives of this study were to evaluate psychometric properties of IADCQ administered on the Web and to determine the most appropriate scoring algorithm. A national sample of 200 unpaid AD caregivers participated in this study by completing the Web-based version of IADCQ and Short Form-12 Health Survey Version 2 (SF-12v2™). The SF-12v2 was used to measure convergent validity of IADCQ scores and to provide an understanding of the overall health-related quality of life of sampled AD caregivers. The IADCQ survey was also completed four weeks later by a randomly selected subgroup of 50 participants to assess test-retest reliability. Confirmatory factor analysis (CFA) was implemented to test the dimensionality of the IADCQ items. Classical item-level and scale-level psychometric analyses were conducted to estimate psychometric characteristics of the instrument. Test-retest reliability was performed to evaluate the instrument's stability and consistency over time. Virtually none (2%) of the respondents had either floor or ceiling effects, indicating the IADCQ covers an ideal range of burden. A single-factor model obtained appropriate goodness of fit and provided evidence that a simple sum score of the 12 items of IADCQ can be used to measure AD caregiver's burden. Scale-level reliability was supported with a coefficient alpha of 0.93 and an intra-class correlation coefficient (for test-retest reliability) of 0.68 (95% CI: 0.50-0.80). Low-moderate negative correlations were observed between the IADCQ and scales of the SF-12v2.
The study findings suggest the IADCQ has appropriate psychometric characteristics as a unidimensional, Web-based measure of AD caregiver burden and is supported by strong model fit statistics from CFA, high degree of item-level reliability, good internal consistency, moderate test-retest reliability, and moderate convergent validity. Additional validation of the IADCQ is warranted to ensure invariance between the paper-based and Web-based administration and to determine an appropriate responder definition.
Jamieson, Jennifer A; Gougeon, Laura
2017-12-01
We investigated the price difference between gluten-free (GF) and gluten-containing (GC) foods available in rural Maritime stores. GF foods and comparable GC items were sampled through random visits to 21 grocery stores in nonurban areas of Nova Scotia, New Brunswick, and Prince Edward Island, Canada. Wilcoxon rank tests were conducted on price per 100 g of product, and on the price relative to iron content; 2226 GF foods (27.2% staple items, defined as breads, cereals, flours, and pastas) and 1625 GC foods were sampled, with an average ± SD of 66 ± 2.7 GF items per store in rural areas and 331 ± 12 in towns. The median price of GF items ($1.76/100 g) was higher than that of GC counterparts ($1.05/100 g) and iron density was approximately 50% lower. GF staple foods were priced 5% higher in rural stores than in town stores. Although the variety of GF products available to consumers has improved, higher cost and lower nutrient density remain issues in nonurban Maritime regions. Dietitians working in nonurban areas should consider the relatively high price, difficult access, and low iron density of key GF items, and work together with clients to find alternatives and enhance their food literacy.
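As a rough worked check of the two median prices reported above (a simple ratio of medians, offered only as an illustration and not as the study's Wilcoxon test statistic):

```latex
\[
\frac{\$1.76 / 100\,\mathrm{g}}{\$1.05 / 100\,\mathrm{g}} \approx 1.68
\quad\Longrightarrow\quad
\text{median GF price} \approx 68\% \text{ above median GC price}
\]
```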
Brazilian Adaptation of the Woodcock-Johnson III Cognitive Tests
ERIC Educational Resources Information Center
Wechsler, Solange Muglia; Nunes, Carlos Sancineto; Schelini, Patricia Waltz; Pasian, Sonia Regina; Homsi, Silvia Vertoni; Moretti, Lucia; Anache, Alexandra Ayach
2010-01-01
An adaptation of the standard battery of Woodcock-Johnson III Tests of Cognitive Abilities (WJ-III) for Brazilian children and youth was investigated. The sample was composed of 1094 students (54 percent girls), ages 7-17, living in Sao Paulo state (91 percent). Items from Brazilian school books as well as from the WJ-III Spanish version…
ERIC Educational Resources Information Center
Instructional Objectives Exchange, Los Angeles, CA.
To help classroom teachers in grades K-9 construct mathematics tests, fifteen general objectives, corresponding sub-objectives, sample test items, and answers are presented. In general, sub-objectives are arranged in increasing order of difficulty. The objectives were written to comprehensively cover three categories. The first, graphs, covers the…
Thomas, Michael L
2012-03-01
There is growing evidence that psychiatric disorders maintain hierarchical associations where general and domain-specific factors play prominent roles (see D. Watson, 2005). Standard, unidimensional measurement models can fail to capture the meaningful nuances of such complex latent variable structures. The present study examined the ability of the multidimensional item response theory bifactor model (see R. D. Gibbons & D. R. Hedeker, 1992) to improve construct validity by serving as a bridge between measurement and clinical theories. Archival data consisting of 688 outpatients' psychiatric diagnoses and item-level responses to the Brief Symptom Inventory (BSI; L. R. Derogatis, 1993) were extracted from files at a university mental health clinic. The bifactor model demonstrated superior fit for the internal structure of the BSI and improved overall diagnostic accuracy in the sample (73%) compared with unidimensional (61%) and oblique simple structure (65%) models. Consistent with clinical theory, multiple sources of item variance were drawn from individual test items. Test developers and clinical researchers are encouraged to consider model-based measurement in the assessment of psychiatric distress.
A sampling and classification item selection approach with content balancing.
Chen, Pei-Hua
2015-03-01
Existing automated test assembly methods typically employ constrained combinatorial optimization. Constructing forms sequentially based on an optimization approach usually results in unparallel forms and requires heuristic modifications. Methods based on a random search approach have the major advantage of producing parallel forms sequentially without further adjustment. This study incorporated a flexible content-balancing element into the statistical perspective item selection method of the cell-only method (Chen et al. in Educational and Psychological Measurement, 72(6), 933-953, 2012). The new method was compared with a sequential interitem distance weighted deviation model (IID WDM) (Swanson & Stocking in Applied Psychological Measurement, 17(2), 151-166, 1993), a simultaneous IID WDM, and a big-shadow-test mixed integer programming (BST MIP) method to construct multiple parallel forms based on matching a reference form item-by-item. The results showed that the cell-only method with content balancing and the sequential and simultaneous versions of IID WDM yielded results comparable to those obtained using the BST MIP method. The cell-only method with content balancing is computationally less intensive than the sequential and simultaneous versions of IID WDM.
Psychometric properties of the Chinese-version Quality of Nursing Work Life Scale.
Lee, Ya-Wen; Dai, Yu-Tzu; McCreary, Linda L; Yao, Grace; Brooks, Beth A
2014-09-01
In this study, we developed and tested the psychometric properties of the Chinese-version Quality of Nursing Work Life Scale along seven subscales: supportive milieu with security and professional recognition, work arrangement and workload, work/home life balance, head nurse's/supervisor's management style, teamwork and communication, nursing staffing and patient care, and milieu of respect and autonomy. An instrument-development procedure with three phases was conducted in seven hospitals in 2010-2011. Phase I comprised translation and the cultural-adaptation process, phase II comprised a pilot study, and phase III comprised a field-testing process. Purposive sampling was used in the pilot study (n = 150) and the large field study (n = 1254). Five new items were added, and 85.7% of the original items were retained in the 41-item Chinese version. Principal component analysis revealed that a model accounted for 56.6% of the variance with acceptable internal consistency, concurrent validity, and discriminant validity. This study gave evidence of reliability and validity of the 41-item Chinese-version Quality of Nursing Work Life Scale. © 2014 Wiley Publishing Asia Pty Ltd.
[Development of a questionnaire to measure family stress among married working women].
Kim, Gwang Suk; Cho, Won Jung
2006-08-01
Even though a number of studies have suggested that appropriate measuring instruments of family stress for working women have to be developed, the validity and reliability of the instruments used have not been consistently examined. The purpose of the present study was to develop a sensitive instrument to measure family stress for married working women, and to test the validity and reliability of the instrument. The items generated for this instrument were drawn from a comprehensive literature review. Twenty-four items were developed through evaluation by 10 experts and twenty-one items were finally confirmed through item analysis. Psychometric testing was performed and confirmed with a convenience sample of 240 women employed in the industrial sector. Four factors emerged from factor analysis, which together explained 50.5% of the total variance. The first factor, 'Cooperation', explained 28.1%; the second, 'Satisfaction with relationships', 10.6%; the third, 'Democratic and comfortable environment', 6.3%; and the fourth, 'Disturbance of own living', 5.5%. Cronbach's alpha coefficient of this instrument was 0.86. The study supports the validity and reliability of the instrument.
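The internal-consistency coefficient reported above can be illustrated with a minimal sketch of the standard Cronbach's alpha formula. The function name and the toy data are hypothetical and are not taken from the study.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items score matrix.

    `scores` is a list of respondents, each a list of k item scores.
    (Hypothetical helper for illustration; not the authors' code.)
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of the respondents' total (sum) scores.
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Toy data: two items that rank respondents identically.
demo = [[1, 1], [2, 2], [3, 3]]
print(cronbach_alpha(demo))
```

With the toy data the two items are perfectly consistent, so the coefficient reaches its maximum of 1; real questionnaire data would give a lower value.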
Assessing learning in small sized physics courses
NASA Astrophysics Data System (ADS)
Ene, Emanuela; Ackerson, Bruce J.
2018-01-01
We describe the construction, validation, and testing of a concept inventory for an Introduction to Physics of Semiconductors course offered by the department of physics to undergraduate engineering students. By design, this inventory addresses both content knowledge and the ability to interpret content via different cognitive processes outlined in Bloom's revised taxonomy. The primary challenge comes from the low number of test takers. We describe the Rasch modeling analysis for this concept inventory, and the results of the calibration on a small sample size, with the intention of providing a useful blueprint to other instructors. Our study involved 101 students from Oklahoma State University and fourteen faculty teaching or doing research in the field of semiconductors at seven universities. The items were written in four-option multiple-choice format. It was possible to calibrate a 30-item unidimensional scale precisely enough to characterize the student population enrolled each semester and, therefore, to allow the tailoring of the learning activities of each class. We show that this scale can be employed as an item bank from which instructors could extract short testlets and where we can add new items fitting the existing calibration.
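For readers unfamiliar with Rasch modeling, the sketch below shows the dichotomous Rasch item response function, in which the probability of a correct answer depends only on the difference between person ability and item difficulty. It assumes items are scored right/wrong; the function name and example values are hypothetical and do not come from the study's calibration.

```python
import math


def rasch_probability(theta, b):
    """Probability of a correct response under the dichotomous Rasch model.

    theta: person ability in logits; b: item difficulty in logits.
    (Illustrative helper; the values used below are made up.)
    """
    return 1.0 / (1.0 + math.exp(-(theta - b)))


# A person whose ability equals the item difficulty has a 50% chance.
print(rasch_probability(0.0, 0.0))
# An easier item (lower b) raises the probability for the same person.
print(rasch_probability(0.0, -1.0) > rasch_probability(0.0, 1.0))
```

Because every item shares the same slope in this model, items calibrated on one scale can be compared directly by difficulty, which is what makes an item bank of the kind described above workable.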
Developing and investigating the use of single-item measures in organizational research.
Fisher, Gwenith G; Matthews, Russell A; Gibbons, Alyssa Mitchell
2016-01-01
The validity of organizational research relies on strong research methods, which include effective measurement of psychological constructs. The general consensus is that multiple-item measures have better psychometric properties than single-item measures. However, due to practical constraints (e.g., survey length, respondent burden) there are situations in which certain single items may be useful for capturing information about constructs that might otherwise go unmeasured. We evaluated 37 items, including 18 newly developed items as well as 19 single items selected from existing multiple-item scales based on psychometric characteristics, to assess 18 constructs frequently measured in organizational and occupational health psychology research. We examined evidence of reliability; convergent, discriminant, and content validity assessments; and test-retest reliabilities at 1- and 3-month time lags for single-item measures using a multistage and multisource validation strategy across 3 studies, including data from N = 17 occupational health subject matter experts and N = 1,634 survey respondents across 2 samples. Items selected from existing scales generally demonstrated better internal consistency reliability and convergent validity, whereas these particular new items generally had higher levels of content validity. We offer recommendations regarding when use of single items may be more or less appropriate, as well as 11 items that seem acceptable, 14 items with mixed results that might be used with caution, and 12 items we do not recommend using as single-item measures. Although multiple-item measures are preferable from a psychometric standpoint, in some circumstances single-item measures can provide useful information. (c) 2016 APA, all rights reserved.
Killingsworth, Erin; Kimble, Laura P; Sudia, Tanya
2015-01-01
The aim was to explore the decision-making process of BSN faculty when determining which best practices to use for classroom testing. A descriptive, correlational study was conducted with a national sample (N = 127) of full-time BSN faculty. Participants completed a web-based survey incorporating instruments that measured beliefs about evaluation, decision-making, and best practices for item analysis and constructing and revising classroom tests. Study participants represented 31 states and were primarily middle-aged white women. In multiple linear regression analyses, faculty beliefs, contextual factors for decision-making, and decision-making processes accounted for statistically significant amounts of the variance in item analysis and test construction and revision. A strong faculty belief that rules were important when evaluating students was a significant predictor of increased use of best practices. Results support that understanding faculty beliefs around classroom testing is important in promoting the use of best practices.
The development of the "Cantonese receptive vocabulary test" for children aged 2-6 in Hong Kong.
Cheung, P S; Lee, K Y; Lee, L W
1997-01-01
The study aims to develop a Cantonese receptive vocabulary test to assess 2-6-year-old children in Hong Kong. The test consists of 100 test items. Each target item is accompanied by a phonological distractor, a semantic distractor and an unrelated distractor. A sample of 609 normal children from four Maternal and Child Health Centres and nine kindergartens was selected. The results show that there is a significant effect of age on the correct score. ANOVA was performed to look at the age effect on each distractor individually. It was found that the scores of the three distractors decrease in their own patterns as age increases. With strong content validity, strong construct validity and high correlation coefficients in the split-half reliability, this test could be used as a reliable measurement for the Cantonese-speaking population in Hong Kong.
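The abstract does not say how the split-half coefficients were computed. One common approach, sketched here under the assumption of an odd/even item split with the Spearman-Brown correction, is shown below; the function names and toy responses are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a half-test has zero variance")
    return sxy / math.sqrt(sxx * syy)


def split_half_reliability(scores):
    """Odd/even split-half reliability with Spearman-Brown correction.

    `scores` is a list of respondents, each a list of 0/1 item scores.
    (Assumed procedure for illustration; not taken from the study.)
    """
    first = [sum(row[0::2]) for row in scores]   # items 1, 3, 5, ...
    second = [sum(row[1::2]) for row in scores]  # items 2, 4, 6, ...
    r = pearson_r(first, second)
    # Spearman-Brown: step the half-test correlation up to full length.
    return 2 * r / (1 + r)


# Toy responses from three children on four items.
demo = [[1, 1, 1, 1], [0, 0, 0, 0], [1, 1, 0, 0]]
print(split_half_reliability(demo))
```

The correction is needed because the raw correlation describes a test half as long as the real one, and shorter tests are less reliable.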
Rasch analysis of the Chedoke-McMaster Attitudes towards Children with Handicaps scale.
Armstrong, Megan; Morris, Christopher; Tarrant, Mark; Abraham, Charles; Horton, Mike C
2017-02-01
Aim: To assess whether the Chedoke-McMaster Attitudes towards Children with Handicaps (CATCH) 36-item total scale and subscales fit the unidimensional Rasch model. Method: The CATCH was administered to 1881 children, aged 7-16 years in a cross-sectional survey. Data were used from a random sample of 416 for the initial Rasch analysis. The analysis was performed on the 36-item scale and then separately for each subscale. The analysis explored fit to the Rasch model in terms of overall scale fit, individual item fit, item response categories, and unidimensionality. Item bias for gender and school level was also assessed. Revised scales were then tested on an independent second random sample of 415 children. Results: Analyses indicated that the 36-item overall scale was not unidimensional and did not fit the Rasch model. Two scales of affective attitudes and behavioural intention were retained after four items were removed from each due to misfit to the Rasch model. Additionally, the scaling was improved when the two most negative response categories were aggregated. There was no item bias by gender or school level on the revised scales. Items assessing cognitive attitudes did not fit the Rasch model and had low internal consistency as a scale. Conclusion: Affective attitudes and behavioural intention CATCH sub-scales should be treated separately. Caution should be exercised when using the cognitive subscale. Implications for Rehabilitation: The 36-item Chedoke-McMaster Attitudes towards Children with Handicaps (CATCH) scale as a whole did not fit the Rasch model, thus indicating a multi-dimensional scale. Researchers should use two revised eight-item subscales of affective attitudes and behavioural intentions when exploring interventions aiming to improve children's attitudes towards disabled people or factors associated with those attitudes. Researchers should use the cognitive subscale with caution, as it did not create a unidimensional and internally consistent scale.
Therefore, conclusions drawn from this scale may not accurately reflect children's attitudes.
Watt, Torquil; Barbesino, Giuseppe; Bjorner, Jakob Bue; Bonnema, Steen Joop; Bukvic, Branka; Drummond, Russell; Groenvold, Mogens; Hegedüs, Laszlo; Kantzer, Valeska; Lasch, Kathryn E; Marcocci, Claudio; Mishra, Anjali; Netea-Maier, Romana; Ekker, Merel; Paunovic, Ivan; Quinn, Terence J; Rasmussen, Åse Krogh; Russell, Audrey; Sabaretnam, Mayilvaganan; Smit, Johannes; Törring, Ove; Zivaljevic, Vladan; Feldt-Rasmussen, Ulla
2015-03-01
Thyroid diseases are common and often affect quality of life (QoL). No cross-culturally validated patient-reported outcome measuring thyroid-related QoL is available. The purpose of the present study was to test the cross-cultural validity of the newly developed thyroid-related patient-reported outcome ThyPRO, using tests for differential item functioning (DIF) according to language version. The ThyPRO consists of 85 items summarized in 13 multi-item scales and one single item. Scales cover physical and mental symptoms, well-being and function as well as social and daily function and cosmetic concerns. Translation applied standard forward-backward methodology with subsequent cognitive interviews and reviews. Responses (N = 1,810) to the ThyPRO were collected in seven countries: UK (n = 166), The Netherlands (n = 147), Serbia (n = 150), Italy (n = 110), India (n = 148), Denmark (n = 902) and Sweden (n = 187). Translated versions were compared pairwise to the English version by examining uniform and nonuniform DIF, i.e., whether patients from different countries respond differently to a particular item, although they have identical level of the concept measured by the item. Analyses were controlled for thyroid diagnosis. DIF was investigated by ordinal logistic regression, testing for both statistical significance and magnitude (ΔR² > 0.02). Scale level was estimated by the sum score, after purification. For twelve of the 84 tested items, DIF was identified in more than one language. Eight of these were small, but four were indicative of possible low translatability. Twenty-one instances of DIF in single languages were identified, indicating potential problems with the particular translation. However, only seven were of a magnitude which could affect scale scores, most of which could be explained by sample differences not controlled for.
The ThyPRO has good cross-cultural validity with only minor cross-cultural invariance and is recommended for use in international multicenter studies.
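The two-part DIF criterion described above (statistical significance plus a magnitude threshold on the change in R²) can be sketched as a simple decision rule. This is an illustration only. It assumes the pseudo-R² values and the p-value have already been obtained from nested ordinal logistic regression models fitted elsewhere, and the 0.05 significance level is an assumption, since the abstract states only the ΔR² > 0.02 threshold. The function name is hypothetical.

```python
def flag_dif(r2_without_group, r2_with_group, p_value,
             alpha=0.05, min_delta_r2=0.02):
    """Flag an item for DIF using significance and magnitude together.

    r2_without_group: pseudo-R^2 of the model with the scale score only.
    r2_with_group:    pseudo-R^2 after adding the language-group terms.
    p_value:          p-value of the test comparing the two models.
    alpha is an ASSUMED significance level; min_delta_r2 follows the
    0.02 threshold quoted in the abstract.
    """
    delta_r2 = r2_with_group - r2_without_group
    return p_value < alpha and delta_r2 > min_delta_r2


# Made-up numbers: a significant but tiny effect is not flagged.
print(flag_dif(0.40, 0.41, p_value=0.001))
# Made-up numbers: a significant and sizeable effect is flagged.
print(flag_dif(0.40, 0.43, p_value=0.001))
```

Requiring both conditions keeps large samples, such as the 902 Danish responses, from flagging items whose group effect is statistically detectable but too small to matter.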