test item types: Topics by Science.gov

Sample records for test item types

An Evaluation of "Intentional" Weighting of Extended-Response or Constructed-Response Items in Tests with Mixed Item Types.

ERIC Educational Resources Information Center

Ito, Kyoko; Sykes, Robert C.

This study investigated the practice of weighting a type of test item, such as constructed response, more than other types of items, such as selected response, to compute student scores for a mixed-item type of test. The study used data from statewide writing field tests in grades 3, 5, and 8 and considered two contexts, that in which a single…
A New Item Selection Procedure for Mixed Item Type in Computerized Classification Testing.

ERIC Educational Resources Information Center

Lau, C. Allen; Wang, Tianyou

This paper proposes a new Information-Time index as the basis for item selection in computerized classification testing (CCT) and investigates how this new item selection algorithm can help improve test efficiency for item pools with mixed item types. It also investigates how practical constraints such as item exposure rate control, test…
Effect of Multiple Testing Adjustment in Differential Item Functioning Detection

ERIC Educational Resources Information Center

Kim, Jihye; Oshima, T. C.

2013-01-01

In a typical differential item functioning (DIF) analysis, a significance test is conducted for each item. As a test consists of multiple items, such multiple testing may increase the possibility of making a Type I error at least once. The goal of this study was to investigate how to control a Type I error rate and power using adjustment…
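Here is a minimal Python sketch of two widely used adjustments, applied to per-item DIF p-values. Bonferroni controls the familywise error rate and Benjamini-Hochberg controls the false discovery rate. The abstract is truncated, so these are not necessarily the adjustments the study compared. The function names and p-values are hypothetical.

```python
# Hedged sketch: two standard multiple-testing adjustments applied to
# per-item DIF p-values. The p-values used below are hypothetical.


def bonferroni(p_values, alpha=0.05):
    """Flag item i when p_i <= alpha / m, where m is the number of items tested."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Step-up Benjamini-Hochberg procedure: find the largest rank k with
    p_(k) <= k * alpha / m and flag the k items with the smallest p-values."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # item indices, ascending p
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * alpha / m:
            k_max = rank
    flagged = [False] * m
    for i in order[:k_max]:
        flagged[i] = True
    return flagged


# Hypothetical DIF p-values for a 10-item test.
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
print("unadjusted:", sum(x <= 0.05 for x in p), "items flagged")
print("Bonferroni:", sum(bonferroni(p)), "items flagged")
print("BH:        ", sum(benjamini_hochberg(p)), "items flagged")
```

With these made-up p-values, five items fall below .05 without adjustment. Bonferroni keeps one of them and Benjamini-Hochberg keeps two. This is the usual trade-off: stricter Type I error control costs power.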
Usability of Interactive Item Types and Tools Introduced in the New GRE® Revised General Test. ETS GRE® Board Research Report. ETS GRE®-14-05. ETS Research Report. RR-14-28

ERIC Educational Resources Information Center

Swiggett, Wanda D.; Kotloff, Laurie; Ezzo, Chelsea; Adler, Rachel; Oliveri, Maria Elena

2014-01-01

The computer-based "Graduate Record Examinations"® ("GRE"®) revised General Test includes interactive item types and testing environment tools (e.g., test navigation, on-screen calculator, and help). How well do test takers understand these innovations? If test takers do not understand the new item types, these innovations may…
Weighted Maximum-a-Posteriori Estimation in Tests Composed of Dichotomous and Polytomous Items

ERIC Educational Resources Information Center

Sun, Shan-Shan; Tao, Jian; Chang, Hua-Hua; Shi, Ning-Zhong

2012-01-01

For mixed-type tests composed of dichotomous and polytomous items, polytomous items often yield more information than dichotomous items. To reflect the difference between the two types of items and to improve the precision of ability estimation, an adaptive weighted maximum-a-posteriori (WMAP) estimation is proposed. To evaluate the performance of…
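The standard item information functions can check the premise that polytomous items carry more information. The sketch below compares the two item types at one ability level. It assumes a 2PL model for dichotomous items and a generalized partial credit model (GPCM) for polytomous items, both without the 1.7 scaling constant. It illustrates the premise only and is not the WMAP estimator itself; the truncated abstract does not specify the weighting scheme. Function names are hypothetical.

```python
import math


def info_2pl(theta, a, b):
    """Fisher information of a dichotomous 2PL item: a^2 * P * (1 - P)."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)


def gpcm_probs(theta, a, steps):
    """Category probabilities (categories 0..len(steps)) of a GPCM item."""
    # Cumulative sums of a * (theta - step) are the category log-numerators.
    logits = [0.0]
    for s in steps:
        logits.append(logits[-1] + a * (theta - s))
    top = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def info_gpcm(theta, a, steps):
    """Fisher information of a GPCM item: a^2 times the variance of the
    category score at theta."""
    probs = gpcm_probs(theta, a, steps)
    mean = sum(k * p for k, p in enumerate(probs))
    second = sum(k * k * p for k, p in enumerate(probs))
    return a * a * (second - mean * mean)


print(info_2pl(0.0, 1.0, 0.0))            # dichotomous item
print(info_gpcm(0.0, 1.0, [-1.0, 1.0]))   # three-category item, same slope
```

At theta = 0, the 2PL item with a = 1 and b = 0 gives 0.25. The three-category GPCM item with the same slope and steps at -1 and 1 gives 2/(2 + e), about 0.42.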
Mixed-Format Test Score Equating: Effect of Item-Type Multidimensionality, Length and Composition of Common-Item Set, and Group Ability Difference

ERIC Educational Resources Information Center

Wang, Wei

2013-01-01

Mixed-format tests containing both multiple-choice (MC) items and constructed-response (CR) items are now widely used in many testing programs. Mixed-format tests often are considered to be superior to tests containing only MC items although the use of multiple item formats leads to measurement challenges in the context of equating conducted under…
A Comparison of Three Types of Test Development Procedures Using Classical and Latent Trait Methods.

ERIC Educational Resources Information Center

Benson, Jeri; Wilson, Michael

Three methods of item selection were used to select sets of 38 items from a 50-item verbal analogies test and the resulting item sets were compared for internal consistency, standard errors of measurement, item difficulty, biserial item-test correlations, and relative efficiency. Three groups of 1,500 cases each were used for item selection. First…
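A small Python sketch of three of the classical indices named above follows. It computes coefficient alpha for internal consistency, the classical standard error of measurement derived from alpha, and a corrected item-total correlation. The last is a point-biserial correlation, used here as a stand-in for the biserial coefficient the study reports; the two are not the same. The data and function names are hypothetical.

```python
import math
import statistics


def cronbach_alpha(scores):
    """Coefficient alpha; `scores` is a list of examinees, each a list of item scores."""
    k = len(scores[0])
    item_vars = [statistics.variance([row[j] for row in scores]) for j in range(k)]
    total_var = statistics.variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def standard_error_of_measurement(scores):
    """Classical SEM: SD of total scores times sqrt(1 - reliability), with alpha as reliability."""
    totals = [sum(row) for row in scores]
    # max() guards against tiny negative values from rounding when alpha is ~1
    return statistics.stdev(totals) * math.sqrt(max(0.0, 1 - cronbach_alpha(scores)))


def pearson(x, y):
    """Plain Pearson correlation."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def corrected_item_total(scores, j):
    """Point-biserial correlation of item j with the total of the other items."""
    item = [row[j] for row in scores]
    rest = [sum(row) - row[j] for row in scores]
    return pearson(item, rest)


# Hypothetical 0/1 scores for six examinees on three items.
data = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 0], [1, 1, 1], [0, 1, 0]]
print(cronbach_alpha(data), standard_error_of_measurement(data))
print([corrected_item_total(data, j) for j in range(3)])
```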
Which Statistic Should Be Used to Detect Item Preknowledge When the Set of Compromised Items Is Known?

PubMed

Sinharay, Sandip

2017-09-01

Benefiting from item preknowledge is a major type of fraudulent behavior during educational assessments. Belov suggested the posterior shift statistic for detection of item preknowledge and showed its performance to be better on average than that of seven other statistics for detection of item preknowledge for a known set of compromised items. Sinharay suggested a statistic based on the likelihood ratio test for detection of item preknowledge; the advantage of the statistic is that its null distribution is known. Results from simulated and real data and adaptive and nonadaptive tests are used to demonstrate that the Type I error rate and power of the statistic based on the likelihood ratio test are very similar to those of the posterior shift statistic. Thus, the statistic based on the likelihood ratio test appears promising in detecting item preknowledge when the set of compromised items is known.
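Below is a minimal sketch of the likelihood-ratio idea. Its assumptions are not taken from the article: a Rasch model with known item difficulties, a bounded grid search instead of exact maximum likelihood, and hypothetical function names. The statistic compares a model with separate abilities on the compromised and remaining items against a model with a single ability. It is then referred to a chi-square distribution with one degree of freedom. Sinharay's actual statistic may differ in model and details.

```python
import math

# Ability grid from -4 to 4 in steps of 0.01; a bounded grid keeps the
# estimates finite even for perfect or zero scores on a subset of items.
GRID = [i / 100.0 for i in range(-400, 401)]


def rasch_loglik(theta, responses, difficulties):
    """Log-likelihood of 0/1 responses under the Rasch model at ability theta."""
    ll = 0.0
    for x, b in zip(responses, difficulties):
        p = 1.0 / (1.0 + math.exp(-(theta - b)))
        ll += math.log(p) if x == 1 else math.log(1.0 - p)
    return ll


def max_loglik(responses, difficulties):
    """Maximum of the log-likelihood over the ability grid."""
    return max(rasch_loglik(t, responses, difficulties) for t in GRID)


def lr_preknowledge(x_comp, b_comp, x_rest, b_rest):
    """LR statistic: separate abilities on compromised vs. remaining items,
    against one common ability; referred to a chi-square with 1 df."""
    separate = max_loglik(x_comp, b_comp) + max_loglik(x_rest, b_rest)
    common = max_loglik(x_comp + x_rest, b_comp + b_rest)
    lr = max(0.0, 2.0 * (separate - common))
    p_value = math.erfc(math.sqrt(lr / 2.0))  # upper tail of chi-square(1)
    return lr, p_value


# Hypothetical examinees: 6 hard compromised items (b = 1.5), 12 other items (b = 0).
b_comp, b_rest = [1.5] * 6, [0.0] * 12
x_rest = [1] * 4 + [0] * 8
print(lr_preknowledge([1] * 6, b_comp, x_rest, b_rest))        # suspicious pattern
print(lr_preknowledge([1] + [0] * 5, b_comp, x_rest, b_rest))  # consistent pattern
```

In the first pattern, the examinee answers every hard compromised item correctly but only a third of the others. That pattern yields a statistic far above the 3.84 cutoff at the .05 level; the second pattern stays well below it. This version is two-sided, so a real screen would also check that the examinee did better, not worse, on the compromised items.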
Evaluating innovative items for the NCLEX, part I: usability and pilot testing.

PubMed

Wendt, Anne; Harmes, J Christine

2009-01-01

National Council of State Boards of Nursing (NCSBN) has recently conducted preliminary research on the feasibility of including various types of innovative test questions (items) on the NCLEX. This article focuses on the participants' reactions to and their strategies for interacting with various types of innovative items. Part 2 in the May/June issue will focus on the innovative item templates and evaluation of the statistical characteristics and the level of cognitive processing required to answer the examination items.
Innovative testing of spatial ability: interactive responding and the use of complex stimuli material.

PubMed

Jelínek, Martin; Květon, Petr; Vobořil, Dalibor

2015-02-01

Despite initial expectations, which have emerged with the advancement of computer technology over the last decade of the twentieth century, scientific literature does not contain many relevant references regarding the development and use of innovative items in psychological testing. Our study presents and evaluates two novel item types. One item type is derived from a standard schematic test item used for the assessment of the spatial perception aspect of spatial ability, enhanced by an interactive response module. The performance on this item type is correlated with the performance on its paper and pencil counterpart. The other innovative item type used complex stimuli in the form of a short video of a ride through a city presented in an on-route perspective, which is intended to measure navigation skills and the ability to keep oneself oriented in space. In this case, the scores were related to the capacity of visuo-spatial working memory and also to the overall score in the paper/pencil test of spatial ability. The second relationship was moderated by gender.
Developing a Strategy for Using Technology-Enhanced Items in Large-Scale Standardized Tests

ERIC Educational Resources Information Center

Bryant, William

2017-01-01

As large-scale standardized tests move from paper-based to computer-based delivery, opportunities arise for test developers to make use of items beyond traditional selected and constructed response types. Technology-enhanced items (TEIs) have the potential to provide advantages over conventional items, including broadening construct measurement,…
An Analysis of Factors Affecting the Difficulty of Dialogue Items in TOEFL Listening Comprehension. TOEFL Research Reports, 51.

ERIC Educational Resources Information Center

Nissan, Susan; And Others

One of the item types in the Listening Comprehension section of the Test of English as a Foreign Language (TOEFL) test is the dialogue. Because the dialogue item pool needs to have an appropriate balance of items at a range of difficulty levels, test developers have examined items at various difficulty levels in an attempt to identify their…
The Relationship of Expert-System Scored Constrained Free-Response Items to Multiple-Choice and Open-Ended Items.

ERIC Educational Resources Information Center

Bennett, Randy Elliot; And Others

1990-01-01

The relationship of an expert-system-scored constrained free-response item type to multiple-choice and free-response items was studied using data for 614 students on the College Board's Advanced Placement Computer Science (APCS) Examination. Implications for testing and the APCS test are discussed. (SLD)
How Big Is Big Enough? Sample Size Requirements for CAST Item Parameter Estimation

ERIC Educational Resources Information Center

Chuah, Siang Chee; Drasgow, Fritz; Luecht, Richard

2006-01-01

Adaptive tests offer the advantages of reduced test length and increased accuracy in ability estimation. However, adaptive tests require large pools of precalibrated items. This study looks at the development of an item pool for 1 type of adaptive administration: the computer-adaptive sequential test. An important issue is the sample size required…
Beneficial effects of semantic memory support on older adults' episodic memory: Differential patterns of support of item and associative information.

PubMed

Mohanty, Praggyan Pam; Naveh-Benjamin, Moshe; Ratneshwar, Srinivasan

2016-02-01

The effects of two types of semantic memory support (meaningfulness of an item and relatedness between items) in mitigating age-related deficits in item and associative memory are examined in a marketing context. In Experiment 1, participants studied less (vs. more) meaningful brand logo graphics (pictures) paired with meaningful brand names (words) and were later assessed by item (old/new) and associative (intact/recombined) memory recognition tests. Results showed that meaningfulness of items eliminated age deficits in item memory, while equivalently boosting associative memory for older and younger adults. Experiment 2, in which related and unrelated brand logo graphics and brand name pairs served as stimuli, revealed that relatedness between items eliminated age deficits in associative memory, while improving item memory to the same degree in older and younger adults. Experiment 2 also provided evidence for a probable boundary condition that could reconcile seemingly contradictory extant results. Overall, these experiments provided evidence that although the two types of semantic memory support can improve both item and associative memory in older and younger adults, older adults' memory deficits can be eliminated when the type of support provided is compatible with the type of information required to perform well on the test. (c) 2016 APA, all rights reserved.
A Semiparametric Model for Jointly Analyzing Response Times and Accuracy in Computerized Testing

ERIC Educational Resources Information Center

Wang, Chun; Fan, Zhewen; Chang, Hua-Hua; Douglas, Jeffrey A.

2013-01-01

The item response times (RTs) collected from computerized testing represent an underutilized type of information about items and examinees. In addition to knowing the examinees' responses to each item, we can investigate the amount of time examinees spend on each item. Current models for RTs mainly focus on parametric models, which have the…
Easy and Informative: Using Confidence-Weighted True-False Items for Knowledge Tests in Psychology Courses

ERIC Educational Resources Information Center

Dutke, Stephan; Barenberg, Jonathan

2015-01-01

We introduce a specific type of item for knowledge tests, confidence-weighted true-false (CTF) items, and review experiences of its application in psychology courses. A CTF item is a statement about the learning content to which students respond whether the statement is true or false, and they rate their confidence level. Previous studies using…
Considering the Use of General and Modified Assessment Items in Computerized Adaptive Testing

ERIC Educational Resources Information Center

Wyse, Adam E.; Albano, Anthony D.

2015-01-01

This article used several data sets from a large-scale state testing program to examine the feasibility of combining general and modified assessment items in computerized adaptive testing (CAT) for different groups of students. Results suggested that several of the assumptions made when employing this type of mixed-item CAT may not be met for…
Rasch analysis of the Pediatric Evaluation of Disability Inventory-computer adaptive test (PEDI-CAT) item bank for children and young adults with spinal muscular atrophy.

PubMed

Pasternak, Amy; Sideridis, Georgios; Fragala-Pinkham, Maria; Glanzman, Allan M; Montes, Jacqueline; Dunaway, Sally; Salazar, Rachel; Quigley, Janet; Pandya, Shree; O'Riley, Susan; Greenwood, Jonathan; Chiriboga, Claudia; Finkel, Richard; Tennekoon, Gihan; Martens, William B; McDermott, Michael P; Fournier, Heather Szelag; Madabusi, Lavanya; Harrington, Timothy; Cruz, Rosangel E; LaMarca, Nicole M; Videon, Nancy M; Vivo, Darryl C De; Darras, Basil T

2016-12-01

In this study we evaluated the suitability of a caregiver-reported functional measure, the Pediatric Evaluation of Disability Inventory-Computer Adaptive Test (PEDI-CAT), for children and young adults with spinal muscular atrophy (SMA). PEDI-CAT Mobility and Daily Activities domain item banks were administered to 58 caregivers of children and young adults with SMA. Rasch analysis was used to evaluate test properties across SMA types. Unidimensional content for each domain was confirmed. The PEDI-CAT was most informative for type III SMA, with ability levels distributed close to 0.0 logits in both domains. It was less informative for types I and II SMA, especially for mobility skills. Item and person abilities were not distributed evenly across all types. The PEDI-CAT may be used to measure functional performance in SMA, but additional items are needed to identify small changes in function and best represent the abilities of all types of SMA. Muscle Nerve 54: 1097-1107, 2016. © 2016 Wiley Periodicals, Inc.
Rasch-family models are more valuable than score-based approaches for analysing longitudinal patient-reported outcomes with missing data.

PubMed

de Bock, Élodie; Hardouin, Jean-Benoit; Blanchin, Myriam; Le Neel, Tanguy; Kubis, Gildas; Bonnaud-Antignac, Angélique; Dantan, Étienne; Sébille, Véronique

2016-10-01

The objective was to compare classical test theory and Rasch-family models derived from item response theory for the analysis of longitudinal patient-reported outcomes data with possibly informative intermittent missing items. A simulation study was performed to assess and compare the performance of classical test theory and the Rasch model in terms of bias, control of the Type I error, and power of the test of time effect. The Type I error was controlled for both classical test theory and the Rasch model, whether data were complete or some items were missing. Both methods were unbiased and displayed similar power with complete data. When items were missing, the Rasch model remained unbiased and displayed higher power than classical test theory. The Rasch model performed better than the classical test theory approach for the analysis of longitudinal patient-reported outcomes with possibly informative intermittent missing items, mainly in terms of power. This study highlights the value of Rasch-based models in clinical research and epidemiology for the analysis of incomplete patient-reported outcomes data. © The Author(s) 2013.
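The abstract's key point is that a Rasch-based analysis copes with missing items better than a sum score. This can be seen at the level of a single person. In the sketch below, the Rasch likelihood simply omits missing responses, whereas the classical raw score treats them as wrong. The sketch assumes known item difficulties and uses hypothetical function names.

```python
import math


def rasch_theta(responses, difficulties, max_iter=50, tol=1e-10):
    """Newton-Raphson ML ability estimate under the Rasch model with known
    item difficulties. Missing responses (None) are left out of the likelihood."""
    observed = [(x, b) for x, b in zip(responses, difficulties) if x is not None]
    score = sum(x for x, _ in observed)
    if score == 0 or score == len(observed):
        raise ValueError("the ML estimate is infinite for zero or perfect scores")
    theta = 0.0
    for _ in range(max_iter):
        probs = [1.0 / (1.0 + math.exp(-(theta - b))) for _, b in observed]
        gradient = score - sum(probs)                    # d logL / d theta
        information = sum(p * (1.0 - p) for p in probs)  # -d^2 logL / d theta^2
        theta += gradient / information
        if abs(gradient) < tol:
            break
    return theta


def raw_score(responses):
    """Classical sum score in which a missing item contributes 0."""
    return sum(x for x in responses if x is not None)


b = [-1.0, 0.0, 1.0, 0.5, -0.5, 0.0]
complete = [1, 0, 1, 1, 0, 1]
with_missing = [1, 0, 1, None, 0, None]
print(raw_score(complete), raw_score(with_missing))  # 4 2
print(rasch_theta(complete, b), rasch_theta(with_missing, b))
```

Dropping two items lowers the raw score from 4 to 2, although the observed responses are unchanged. The Rasch estimate is computed from the four observed items and stays on the same ability scale. This illustrates the mechanism only; it is not the longitudinal simulation design used in the study.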

Item Type and Gender Differences on the Mental Rotations Test

ERIC Educational Resources Information Center

Voyer, Daniel; Doyle, Randi A.

2010-01-01

This study investigated gender differences on the Mental Rotations Test (MRT) as a function of item and response types. Accordingly, 86 male and 109 female undergraduate students completed the MRT without time limits. Responses were coded as reflecting two correct (CC), one correct and one wrong (CW), two wrong (WW), one correct and one blank…
Five-Point Likert Items: t Test versus Mann-Whitney-Wilcoxon

ERIC Educational Resources Information Center

de Winter, Joost C. F.; Dodou, Dimitra

2010-01-01

Likert questionnaires are widely used in survey research, but it is unclear whether the item data should be investigated by means of parametric or nonparametric procedures. This study compared the Type I and II error rates of the "t" test versus the Mann-Whitney-Wilcoxon (MWW) for five-point Likert items. Fourteen population…
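To show what the two procedures actually compare, the sketch below computes a Mann-Whitney U statistic and a Welch t statistic. U uses midranks to handle the many ties that five-point data produce. The sketch stops at the statistics; p-values would come from the reference distributions or from a library such as SciPy. The data are hypothetical. Welch's t is a choice made here, since the abstract does not say which t variant was used.

```python
import math
import statistics


def midranks(values):
    """1-based ranks in which tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2.0 + 1.0  # mean of ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def mann_whitney_u(x, y):
    """U statistic for sample x: rank sum of x minus its minimum possible value."""
    ranks = midranks(list(x) + list(y))
    rank_sum_x = sum(ranks[: len(x)])
    return rank_sum_x - len(x) * (len(x) + 1) / 2.0


def welch_t(x, y):
    """Welch t statistic, treating the 1-5 codes as interval-scale numbers."""
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    return (statistics.mean(x) - statistics.mean(y)) / se


# Hypothetical five-point Likert responses from two groups.
group_a = [1, 2, 2, 3, 4, 5, 5]
group_b = [2, 3, 3, 4, 4, 4, 5]
print(mann_whitney_u(group_a, group_b), mann_whitney_u(group_b, group_a))
print(welch_t(group_a, group_b))
```

Tied values share their ranks, so U for one group plus U for the other always equals n1 * n2 (here 49). This gives a quick sanity check on any implementation.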
Decomposing the interaction between retention interval and study/test practice: The role of retrievability

PubMed Central

Jang, Yoonhee; Wixted, John T.; Pecher, Diane; Zeelenberg, René; Huber, David E.

2012-01-01

Even without feedback, test practice enhances delayed performance compared to study practice, but the size of the effect is variable across studies. We investigated the benefit of testing, separating initially retrievable items from initially non-retrievable items. In two experiments, an initial test determined item retrievability. Retrievable or non-retrievable items were subsequently presented for repeated study or test practice. Collapsing across items, in Experiment 1, we obtained the typical crossover interaction between retention interval and practice type. For retrievable items, however, the crossover interaction was quantitatively different, with a small study benefit for an immediate test and a larger testing benefit after a delay. For non-retrievable items, there was a large study benefit for an immediate test, but one week later there was no difference between the study and test practice conditions. In Experiment 2, initially non-retrievable items were given additional study followed by either an immediate test or even more additional study, and one week later performance did not differ between the two conditions. These results indicate that the effect size of study/test practice is due to the relative contribution of retrievable and non-retrievable items. PMID:22304454
Decomposing the interaction between retention interval and study/test practice: the role of retrievability.

PubMed

Jang, Yoonhee; Wixted, John T; Pecher, Diane; Zeelenberg, René; Huber, David E

2012-01-01

Even without feedback, test practice enhances delayed performance compared to study practice, but the size of the effect is variable across studies. We investigated the benefit of testing, separating initially retrievable items from initially nonretrievable items. In two experiments, an initial test determined item retrievability. Retrievable or nonretrievable items were subsequently presented for repeated study or test practice. Collapsing across items, in Experiment 1, we obtained the typical cross-over interaction between retention interval and practice type. For retrievable items, however, the cross-over interaction was quantitatively different, with a small study benefit for an immediate test and a larger testing benefit after a delay. For nonretrievable items, there was a large study benefit for an immediate test, but one week later there was no difference between the study and test practice conditions. In Experiment 2, initially nonretrievable items were given additional study followed by either an immediate test or even more additional study, and one week later performance did not differ between the two conditions. These results indicate that the effect size of study/test practice is due to the relative contribution of retrievable and nonretrievable items.
Elicited Speech from Graph Items on the Test of Spoken English[TM]. Research Reports. Report 74. RR-04-06

ERIC Educational Resources Information Center

Katz, Irvin R.; Xi, Xiaoming; Kim, Hyun-Joo; Cheng, Peter C. H.

2004-01-01

This research applied a cognitive model to identify item features that lead to irrelevant variance on the Test of Spoken English[TM] (TSE[R]). The TSE is an assessment of English oral proficiency and includes an item that elicits a description of a statistical graph. This item type sometimes appears to tap graph-reading skills--an irrelevant…
Optimal Bayesian Adaptive Design for Test-Item Calibration.

PubMed

van der Linden, Wim J; Ren, Hao

2015-06-01

An optimal adaptive design for test-item calibration based on Bayesian optimality criteria is presented. The design adapts the choice of field-test items to the examinees taking an operational adaptive test using both the information in the posterior distributions of their ability parameters and the current posterior distributions of the field-test parameters. Different criteria of optimality based on the two types of posterior distributions are possible. The design can be implemented using an MCMC scheme with alternating stages of sampling from the posterior distributions of the test takers' ability parameters and the parameters of the field-test items while reusing samples from earlier posterior distributions of the other parameters. Results from a simulation study demonstrated the feasibility of the proposed MCMC implementation for operational item calibration. A comparison of performances for different optimality criteria showed faster calibration of substantial numbers of items for the criterion of D-optimality relative to A-optimality, a special case of c-optimality, and random assignment of items to the test takers.
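A deliberately simplified point-estimate version can illustrate the difference between the optimality criteria. The sketch assumes a 2PL field-test item with provisional parameters (a, b) and treats candidate examinees' abilities as known. It adds each candidate's Fisher information matrix for (a, b) to the information already accumulated. D-optimality picks the candidate that maximizes the determinant, and A-optimality picks the one that minimizes the trace of the inverse. The published design works with posterior distributions and an MCMC scheme instead, so this is not that design. All names are hypothetical.

```python
import math


def fisher_info_2pl(theta, a, b):
    """2x2 Fisher information about (a, b) from one response at ability theta."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    w = p * (1.0 - p)
    ga, gb = theta - b, -a  # gradient of the logit a * (theta - b) w.r.t. (a, b)
    return [[w * ga * ga, w * ga * gb], [w * gb * ga, w * gb * gb]]


def mat_add(m, n):
    return [[m[i][j] + n[i][j] for j in range(2)] for i in range(2)]


def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def trace_inv2(m):
    # Trace of the inverse of a 2x2 matrix: (m00 + m11) / det(m).
    return (m[0][0] + m[1][1]) / det2(m)


def pick_examinee(candidates, current_info, a, b, criterion="D"):
    """Return the candidate ability that best improves the item's information.
    "D" maximizes the determinant; anything else applies the A criterion,
    minimizing the trace of the inverse (sum of approximate variances)."""
    def score(theta):
        total = mat_add(current_info, fisher_info_2pl(theta, a, b))
        return det2(total) if criterion == "D" else -trace_inv2(total)
    return max(candidates, key=score)


start = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical information already accumulated
candidates = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
print(pick_examinee(candidates, start, a=1.0, b=0.0, criterion="D"))
print(pick_examinee(candidates, start, a=1.0, b=0.0, criterion="A"))
```

The starting information here is an identity matrix. With it, both criteria favor examinees about two logits away from b = 0 over those at the item's difficulty, because a response at theta = b carries no information about the slope. The accumulated information must be full rank, or the trace of the inverse is undefined.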
Developing Computerized Tests for Classroom Teachers: A Pilot Study.

ERIC Educational Resources Information Center

Glowacki, Margaret L.; And Others

Two types of computerized testing have been defined: (1) computer-based testing, using a computer to administer conventional tests in which all examinees take the same set of items; and (2) adaptive tests, in which items are selected for administration by the computer, based on the examinee's previous responses. This paper discusses an option for…
A Review of the Effects on IRT Item Parameter Estimates with a Focus on Misbehaving Common Items in Test Equating

PubMed Central

Michaelides, Michalis P.

2010-01-01

Many studies have investigated the topic of change or drift in item parameter estimates in the context of item response theory (IRT). Content effects, such as instructional variation and curricular emphasis, as well as context effects, such as the wording, position, or exposure of an item have been found to impact item parameter estimates. The issue becomes more critical when items with estimates exhibiting differential behavior across test administrations are used as common for deriving equating transformations. This paper reviews the types of effects on IRT item parameter estimates and focuses on the impact of misbehaving or aberrant common items on equating transformations. Implications relating to test validity and the judgmental nature of the decision to keep or discard aberrant common items are discussed, with recommendations for future research into more informed and formal ways of dealing with misbehaving common items. PMID:21833230
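The mean-sigma linking method, chosen here as one simple example, shows how sensitive an equating transformation is to a single aberrant common item. The review covers equating more generally. The difficulty values and function name below are hypothetical.

```python
import statistics


def mean_sigma(b_new, b_old):
    """Mean-sigma linking constants (A, B) placing new-form difficulties on
    the old-form scale: b_old is approximated by A * b_new + B."""
    A = statistics.pstdev(b_old) / statistics.pstdev(b_new)
    B = statistics.mean(b_old) - A * statistics.mean(b_new)
    return A, B


# Hypothetical common-item difficulties: the old-form values are an exact
# linear transformation (A = 2, B = 1) of the new-form values.
b_new = [-1.0, 0.0, 0.5, 1.5]
b_old = [2 * b + 1 for b in b_new]  # [-1.0, 1.0, 2.0, 4.0]
print(mean_sigma(b_new, b_old))     # close to (2.0, 1.0)

# Let the last common item drift by one logit on the old form.
b_old_drift = b_old[:3] + [b_old[3] + 1.0]
print(mean_sigma(b_new, b_old_drift))  # A and B both shift

# Dropping the aberrant item restores the original transformation.
print(mean_sigma(b_new[:3], b_old_drift[:3]))
```

A one-logit drift in one of four common items moves the slope from 2 to about 2.4. Removing that item recovers the original constants, which is the keep-or-discard decision the review discusses.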
A Review of the Effects on IRT Item Parameter Estimates with a Focus on Misbehaving Common Items in Test Equating.

PubMed

Michaelides, Michalis P

2010-01-01

Many studies have investigated the topic of change or drift in item parameter estimates in the context of item response theory (IRT). Content effects, such as instructional variation and curricular emphasis, as well as context effects, such as the wording, position, or exposure of an item have been found to impact item parameter estimates. The issue becomes more critical when items with estimates exhibiting differential behavior across test administrations are used as common for deriving equating transformations. This paper reviews the types of effects on IRT item parameter estimates and focuses on the impact of misbehaving or aberrant common items on equating transformations. Implications relating to test validity and the judgmental nature of the decision to keep or discard aberrant common items are discussed, with recommendations for future research into more informed and formal ways of dealing with misbehaving common items.
Learning Factors Transfer Analysis: Using Learning Curve Analysis to Automatically Generate Domain Models

ERIC Educational Resources Information Center

Pavlik, Philip I. Jr.; Cen, Hao; Koedinger, Kenneth R.

2009-01-01

This paper describes a novel method to create a quantitative model of an educational content domain of related practice item-types using learning curves. By using a pairwise test to search for the relationships between learning curves for these item-types, we show how the test results in a set of pairwise transfer relationships that can be…
Assessment of Computer and Information Literacy in ICILS 2013: Do Different Item Types Measure the Same Construct?

ERIC Educational Resources Information Center

Ihme, Jan Marten; Senkbeil, Martin; Goldhammer, Frank; Gerick, Julia

2017-01-01

The combination of different item formats is found quite often in large scale assessments, and analyses on the dimensionality often indicate multi-dimensionality of tests regarding the task format. In ICILS 2013, three different item types (information-based response tasks, simulation tasks, and authoring tasks) were used to measure computer and…
Examining the Effectiveness of Test Accommodation Using DIF and a Mixture IRT Model

ERIC Educational Resources Information Center

Cho, Hyun-Jeong; Lee, Jaehoon; Kingston, Neal

2012-01-01

This study examined the validity of test accommodation for students in grades 3 through 8 using differential item functioning (DIF) and mixture IRT models. Two data sets were used for these analyses. With the first data set (N = 51,591) we examined whether item type (i.e., story, explanation, straightforward) or item features were associated with item…
Using Cochran's Z Statistic to Test the Kernel-Smoothed Item Response Function Differences between Focal and Reference Groups

ERIC Educational Resources Information Center

Zheng, Yinggan; Gierl, Mark J.; Cui, Ying

2010-01-01

This study combined the kernel smoothing procedure and a nonparametric differential item functioning statistic--Cochran's Z--to statistically test the difference between the kernel-smoothed item response functions for reference and focal groups. Simulation studies were conducted to investigate the Type I error and power of the proposed…
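The smoothing step can be sketched as a Nadaraya-Watson regression of item responses on an ability proxy, run separately for each group. The sketch assumes a Gaussian kernel, a fixed bandwidth, and standardized total scores as the proxy; the truncated abstract specifies none of these. The Cochran's Z test applied to the differences is not reproduced here. Function names and data are hypothetical.

```python
import math


def kernel_irf(grid, abilities, responses, bandwidth=0.5):
    """Nadaraya-Watson (Gaussian-kernel) estimate of P(correct | ability) at
    each grid point. Grid points far from every ability value would make all
    weights underflow to zero, so keep the grid within the data range."""
    curve = []
    for t in grid:
        weights = [math.exp(-0.5 * ((t - a) / bandwidth) ** 2) for a in abilities]
        curve.append(sum(w * x for w, x in zip(weights, responses)) / sum(weights))
    return curve


def irf_gap(grid, ref_abilities, ref_responses, focal_abilities, focal_responses,
            bandwidth=0.5):
    """Pointwise reference-minus-focal difference between the smoothed curves."""
    ref = kernel_irf(grid, ref_abilities, ref_responses, bandwidth)
    focal = kernel_irf(grid, focal_abilities, focal_responses, bandwidth)
    return [r - f for r, f in zip(ref, focal)]


grid = [-1.0, 0.0, 1.0]
ref_ab, ref_x = [-1.5, -0.5, 0.0, 0.5, 1.5], [0, 0, 1, 1, 1]
foc_ab, foc_x = [-1.5, -0.5, 0.0, 0.5, 1.5], [0, 0, 0, 1, 1]
print(kernel_irf(grid, ref_ab, ref_x))
print(irf_gap(grid, ref_ab, ref_x, foc_ab, foc_x))
```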
[Memory Checking Tests].

ERIC Educational Resources Information Center

Hart, Joseph T.

Two basic tests for checking memory skills are included in these appendices. The first, the General Information Test, uses the same 150 items for each of its two versions. One version is a completion-type test which measures recall by requiring the examinee to supply a specific response. The other version supplements each of the 150 items with…
Predictive Control of Speededness in Adaptive Testing

ERIC Educational Resources Information Center

van der Linden, Wim J.

2009-01-01

An adaptive testing method is presented that controls the speededness of a test using predictions of the test takers' response times on the candidate items in the pool. Two different types of predictions are investigated: posterior predictions given the actual response times on the items already administered and posterior predictions that use the…
Archeological Survey and Testing in the Holy Cross Historic District, New Orleans, Louisiana. Volume 2

DTIC Science & Technology

1992-02-01

[Table 4. Personal Items from Shovel Tests, 160R130: counts by shovel-test location; entries include bone buttons (Type B-5) and ceramic items.]
Do Images Influence Assessment in Anatomy? Exploring the Effect of Images on Item Difficulty and Item Discrimination

ERIC Educational Resources Information Center

Vorstenbosch, Marc A. T. M.; Klaassen, Tim P. F. M.; Kooloos, Jan G. M.; Bolhuis, Sanneke M.; Laan, Roland F. J. M.

2013-01-01

Anatomists often use images in assessments and examinations. This study aims to investigate the influence of different types of images on item difficulty and item discrimination in written assessments. A total of 210 of 460 students volunteered for an extra assessment in a gross anatomy course. This assessment contained 39 test items grouped in…
Development of Diagnostic Analytical and Mechanical Ability Tests through Facet Design and Analysis.

ERIC Educational Resources Information Center

Guttman, Louis; Schlesinger, I. M.

Methodology based on facet theory (modified set theory) was used in test construction and analysis to provide an efficient tool of evaluation for vocational guidance and vocational school use. The type of test development undertaken was limited to the use of nonverbal pictorial items. Items for testing ability to identify elements belonging to an…
Outlier Detection in High-Stakes Certification Testing.

ERIC Educational Resources Information Center

Meijer, Rob R.

2002-01-01

Used empirical data from a certification test to study methods from statistical process control that have been proposed to classify an item score pattern as fitting or misfitting the underlying item response theory model in computerized adaptive testing. Results for 1,392 examinees show that different types of misfit can be distinguished. (SLD)
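One statistical-process-control approach accumulates item residuals, in administration order, into a CUSUM. The sketch below assumes the residual for item i is (x_i - P_i) / n, where P_i is the model-implied probability at the examinee's ability estimate and n is the test length. This is one common formulation, not necessarily the exact one used in the study. The cutoffs needed to classify a pattern as misfitting, usually set by simulation, are omitted. Names and data are hypothetical.

```python
def cusum_person_fit(responses, probs):
    """Upper and lower CUSUM paths of the residuals (x - P) / n, in item order."""
    n = len(responses)
    upper, lower = 0.0, 0.0
    path_up, path_low = [], []
    for x, p in zip(responses, probs):
        t = (x - p) / n
        upper = max(0.0, upper + t)  # accumulates unexpected successes
        lower = min(0.0, lower + t)  # accumulates unexpected failures
        path_up.append(upper)
        path_low.append(lower)
    return path_up, path_low


# Hypothetical pattern: five easy items (P = 0.9) failed, five hard items (P = 0.3) passed.
probs = [0.9] * 5 + [0.3] * 5
responses = [0] * 5 + [1] * 5
up, low = cusum_person_fit(responses, probs)
print(min(low), max(up))
```

A run of failures on easy items drives the lower path down to -0.45. The later successes on harder items then push the upper path up. Large excursions in either direction are what such a procedure flags.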
Detection of Item Preknowledge Using Likelihood Ratio Test and Score Test

ERIC Educational Resources Information Center

Sinharay, Sandip

2017-01-01

An increasing concern of producers of educational assessments is fraudulent behavior during the assessment (van der Linden, 2009). Benefiting from item preknowledge (e.g., Eckerly, 2017; McLeod, Lewis, & Thissen, 2003) is one type of fraudulent behavior. This article suggests two new test statistics for detecting individuals who may have…

Assessment Guide for Educators: Introduction

ERIC Educational Resources Information Center

GED Testing Service, 2016

2016-01-01

This guide is designed to help adult educators and administrators better understand the content of the GED® test. This guide is tailored to each test subject and highlights the test's item types, assessment targets, and guidelines for how items will be scored. This 2016 edition has been updated to include the most recent information about the…
Does the Cognitive Reflection Test actually capture heuristic versus analytic reasoning styles in older adults?

PubMed

Hertzog, Christopher; Smith, R Marit; Ariel, Robert

2018-01-01

Background/Study Context: This study evaluated adult age differences in the original three-item Cognitive Reflection Test (CRT; Frederick, 2005, The Journal of Economic Perspectives, 19, 25-42) and an expanded seven-item version of that test (Toplak et al., 2013, Thinking and Reasoning, 20, 147-168). The CRT is a numerical problem-solving test thought to capture a disposition towards either rapid, intuition-based problem solving (Type I reasoning) or a more thoughtful, analytical problem-solving approach (Type II reasoning). Test items are designed to induce heuristically guided errors that can be avoided if using an appropriate numerical representation of the test problems. We evaluated differences between young adults and old adults in CRT performance and correlates of CRT performance. Older adults (ages 60 to 80) were paid volunteers who participated in experiments assessing age differences in self-regulated learning. Young adults (ages 17 to 35) were students participating for pay as part of a project assessing measures of critical thinking skills or as a young comparison group in the self-regulated learning study. There were age differences in the number of CRT correct responses in two independent samples. Results with the original three-item CRT found older adults to have a greater relative proportion of errors based on providing the intuitive lure. However, younger adults actually had a greater proportion of intuitive errors on the long version of the CRT, relative to older adults. Item analysis indicated a much lower internal consistency of CRT items for older adults. These outcomes do not offer full support for the argument that older adults are higher in the use of a "Type I" cognitive style. The evidence was also consistent with an alternative hypothesis that age differences were due to lower levels of numeracy in the older samples. 
Alternative process-oriented evaluations of how older adults solve CRT items will probably be needed to determine conditions under which older adults manifest an increase in the Type I dispositional tendency to opt for superficial, heuristically guided problem representations in numerical problem-solving tasks.
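For readers unfamiliar with the CRT, the best-known item from Frederick's (2005) original three-item version shows what an "intuitive lure" looks like: a bat and a ball cost $1.10 in total, and the bat costs $1.00 more than the ball; how much does the ball cost? The intuitive answer (10 cents) fails the stated constraints, whereas the algebraic representation ball + (ball + 100) = 110 gives 5 cents. The short Python sketch below is only an illustration and is not part of the study. The function names are hypothetical, and prices are kept in integer cents to avoid floating-point rounding.

```python
def satisfies_constraints(ball_cents: int, total_cents: int = 110, diff_cents: int = 100) -> bool:
    """Check whether a proposed ball price is consistent with both statements of the item."""
    bat_cents = ball_cents + diff_cents  # the bat costs $1.00 more than the ball
    return bat_cents + ball_cents == total_cents


def solve_ball_price(total_cents: int = 110, diff_cents: int = 100) -> int:
    """Solve ball + (ball + diff) = total, i.e. ball = (total - diff) / 2."""
    remainder = total_cents - diff_cents
    if remainder % 2 != 0:
        raise ValueError("no whole-cent solution")
    return remainder // 2


intuitive_lure = 110 - 100  # the heuristic answer: subtract the difference from the total
print(satisfies_constraints(intuitive_lure))  # False: the bat would cost 110 and the total 120
print(solve_ball_price())                     # 5
```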
A Basic Test Theory Generalizable to Tailored Testing. Technical Report No. 1.

ERIC Educational Resources Information Center

Cliff, Norman

Measures of consistency and completeness of order relations derived from test-type data are proposed. The measures are generalized to apply to incomplete data such as tailored testing. The measures are based on consideration of the items-plus-persons by items-plus-persons matrix as an adjacency matrix in which a 1 means that the row element…
The Impact of Escape Alternative Position Change in Multiple-Choice Test on the Psychometric Properties of a Test and Its Items Parameters

ERIC Educational Resources Information Center

Hamadneh, Iyad Mohammed

2015-01-01

This study aimed to investigate the impact of changing the position of the escape alternative in a multiple-choice test on the psychometric properties of the test, its item parameters (difficulty, discrimination and guessing), and the estimation of examinee ability. To achieve the study objectives, a 4-alternative multiple choice type achievement test…
Application of a Method of Estimating DIF for Polytomous Test Items.

ERIC Educational Resources Information Center

Camilli, Gregory; Congdon, Peter

1999-01-01

Demonstrates a method for studying differential item functioning (DIF) that can be used with dichotomous or polytomous items and that is valid for data that follow a partial credit Item Response Theory model. A simulation study shows that positively biased Type I error rates are in accord with results from previous studies. (SLD)
Item Parameter Changes and Equating: An Examination of the Effects of Lack of Item Parameter Invariance on Equating and Score Accuracy for Different Proficiency Levels

ERIC Educational Resources Information Center

Store, Davie

2013-01-01

Although some research has been carried out on certain types of context effects under the nonequivalent anchor test (NEAT) design, the impact of particular types of context effects on actual scores is less well understood. In addition, the issue of the impact of item context effects on scores has not been investigated extensively when item…
Building the BIKE: Development and Testing of the Biotechnology Instrument for Knowledge Elicitation (BIKE)

NASA Astrophysics Data System (ADS)

Witzig, Stephen B.; Rebello, Carina M.; Siegel, Marcelle A.; Freyermuth, Sharyn K.; Izci, Kemal; McClure, Bruce

2014-10-01

Identifying students' conceptual scientific understanding is difficult if the appropriate tools are not available for educators. Concept inventories have become a popular tool to assess student understanding; however, they have traditionally been multiple-choice tests. International science education standards documents advocate that assessments should be reform-based, contain diverse question types, and align with instructional approaches. To date, no instrument of this type targeting student conceptions in biotechnology has been developed. We report here the development, testing, and validation of a 35-item Biotechnology Instrument for Knowledge Elicitation (BIKE) that includes a mix of question types. The BIKE was designed to elicit student thinking and a variety of conceptual understandings, as opposed to testing closed-ended responses. The design phase contained nine steps, including a literature search for content, student interviews, a pilot test, and expert review. Data from 175 students over two semesters, including 16 student interviews and six expert reviewers (professors from six different institutions), were used to validate the instrument. Cronbach's alpha on the pre/posttest was 0.664 and 0.668, respectively, indicating the BIKE has internal consistency. Cohen's kappa for inter-rater reliability among the 6,525 total items was 0.684, indicating substantial agreement among scorers. Item analysis demonstrated that the items were challenging, there was discrimination among the individual items, and there was alignment with research-based design principles for construct validity. This study provides a reliable and valid conceptual understanding instrument in the understudied area of biotechnology.
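Cohen's kappa, used above for inter-rater reliability, corrects the raw proportion of agreement between two scorers for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The Python sketch below assumes two raters who each assign one categorical score per response; the function name and the example scores are hypothetical, and the abstract does not describe how the BIKE scorers' ratings were paired or pooled.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters who scored the same responses."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must score the same non-empty set of responses")
    n = len(rater_a)
    # Observed agreement: proportion of responses given identical scores.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over categories of the product of the raters' marginal proportions.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical scores (0, 1, 2) given by two scorers to the same eight responses.
scorer_1 = [2, 1, 0, 2, 1, 2, 0, 1]
scorer_2 = [2, 1, 1, 2, 1, 2, 0, 0]
print(round(cohens_kappa(scorer_1, scorer_2), 3))  # 0.619
```

Here the scorers agree on 6 of 8 responses (p_o = 0.75), but matching marginals alone would produce p_e = 0.34375, so kappa is noticeably lower than the raw agreement.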
Modulation of the electrophysiological correlates of retrieval cue processing by the specificity of task demands.

PubMed

Johnson, Jeffrey D; Rugg, Michael D

2006-02-03

Retrieval orientation refers to the differential processing of retrieval cues according to the type of information sought from memory (e.g., words vs. pictures). In the present study, event-related potentials (ERPs) were employed to investigate whether the neural correlates of differential retrieval orientations are sensitive to the specificity of the retrieval demands of the test task. In separate study-test phases, subjects encoded lists of intermixed words and pictures, and then undertook one of two retrieval tests, in both of which the retrieval cues were exclusively words. In the recognition test, subjects performed 'old/new' discriminations on the test items, and old items corresponded to only one class of studied material (words or pictures). In the exclusion test, old items corresponded to both classes of study material, and subjects were required to respond 'old' only to test items corresponding to a designated class of material. Thus, demands for retrieval specificity were greater in the exclusion test than during recognition. ERPs elicited by correctly classified new items in the two types of test were contrasted according to whether words or pictures were the sought-for material. Material-dependent ERP effects were evident in both tests, but the effects onset earlier and offset later in the exclusion test. The findings suggest that differential processing of retrieval cues, and hence the adoption of differential retrieval orientations, varies according to the specificity of the retrieval goal.
Item response theory scoring and the detection of curvilinear relationships.

PubMed

Carter, Nathan T; Dalal, Dev K; Guan, Li; LoPilato, Alexander C; Withrow, Scott A

2017-03-01

Psychologists are increasingly positing theories of behavior that suggest psychological constructs are curvilinearly related to outcomes. However, results from empirical tests for such curvilinear relations have been mixed. We propose that correctly identifying the response process underlying responses to measures is important for the accuracy of these tests. Indeed, past research has indicated that item responses to many self-report measures follow an ideal point response process, wherein respondents agree only to items that reflect their own standing on the measured variable, as opposed to a dominance process, wherein stronger agreement, regardless of item content, is always indicative of higher standing on the construct. We test whether item response theory (IRT) scoring appropriate for the underlying response process to self-report measures results in more accurate tests for curvilinearity. In 2 simulation studies, we show that, regardless of the underlying response process used to generate the data, using the traditional sum score generally results in high Type 1 error rates or low power for detecting curvilinearity, depending on the distribution of item locations. With few exceptions, appropriate power and Type 1 error rates are achieved when dominance-based and ideal point-based IRT scoring are correctly used to score dominance and ideal point response data, respectively. We conclude that (a) researchers should be theory-guided when hypothesizing and testing for curvilinear relations; (b) correctly identifying whether responses follow an ideal point versus dominance process, particularly when items are not extreme, is critical; and (c) IRT model-based scoring is crucial for accurate tests of curvilinearity. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
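The contrast between the two response processes can be made concrete with two item response functions. In the Python sketch below, the dominance function is a two-parameter logistic curve, and the ideal point function is a deliberately simplified, bell-shaped curve chosen for illustration; it is not the GGUM or any other specific ideal point model, and neither the function names nor the parameter values come from the study. Under the dominance function a respondent far above an item's location almost always agrees, while under the ideal point function that respondent is as unlikely to agree as one far below it, which suggests why a simple sum score may misrepresent ideal point data.

```python
import math


def dominance_prob(theta: float, location: float, slope: float = 1.5) -> float:
    """Dominance (2PL-style logistic) process: endorsement keeps rising as theta increases."""
    return 1.0 / (1.0 + math.exp(-slope * (theta - location)))


def ideal_point_prob(theta: float, location: float, spread: float = 1.0) -> float:
    """Simplified ideal point process: endorsement peaks where theta equals the item location
    and falls as the person-item distance grows. A Gaussian-shaped stand-in, not the GGUM."""
    return math.exp(-((theta - location) ** 2) / (2.0 * spread ** 2))


# A moderately located item (location 0) seen by respondents below, at and above it.
for theta in (-2.0, 0.0, 2.0):
    print(f"theta={theta:+.1f}  dominance={dominance_prob(theta, 0.0):.3f}  "
          f"ideal point={ideal_point_prob(theta, 0.0):.3f}")
```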
Planning a Study for Testing the Rasch Model given Missing Values due to the use of Test-booklets.

PubMed

Yanagida, Takuya; Kubinger, Klaus D; Rasch, Dieter

2015-01-01

Though calibration of an achievement test within a psychological and educational context is very often carried out with the Rasch model, data sampling is rarely designed according to statistical foundations. However, Kubinger, Rasch, and Yanagida (2009, 2011) suggested an approach for determining sample size according to a given Type-I and Type-II risk and a certain effect of model contradiction when testing the Rasch model. The approach uses a three-way analysis of variance design with mixed classification. So far, their simulation studies have dealt with complete data, meaning that every examinee is administered all of the items of an item pool. The simulation study presented in this paper deals with the practically relevant case, in particular for large-scale assessments, in which items are presented in several test booklets. As a consequence, there are missing values by design. The question, therefore, is whether this approach works in this case as well. Besides the fact that the data are not normally distributed but dichotomous (an examinee either solves an item or fails to solve it), missing values mean that each cell of the given three-way analysis of variance design contains at most a single entry. Hence, the required distribution of the test statistic may not be retained, in contrast to the case of having no missing values. The result of our simulation study, despite applying only to a very special scenario, is that this approach does indeed work: whether test booklets are used or every examinee is administered all of the items changes nothing with respect to the actual Type-I risk or the power of the test, given almost the same amount of information on examinees per item. However, as the results are limited to a special scenario, we currently recommend that any interested researcher simulate the appropriate scenario in advance.
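What "missing values by design" means can be sketched in a few lines. The Python example below, a sketch under stated assumptions rather than the authors' procedure, draws dichotomous responses from the Rasch model and administers each examinee only the items in one booklet. The booklet layout, the round-robin assignment rule, and all parameter values are hypothetical, and the code does not implement the ANOVA-based sample-size approach of Kubinger, Rasch, and Yanagida.

```python
import math
import random
from typing import List, Optional, Sequence


def rasch_prob(theta: float, beta: float) -> float:
    """Rasch model: probability that a person of ability theta solves an item of difficulty beta."""
    return 1.0 / (1.0 + math.exp(-(theta - beta)))


def simulate_booklet_design(thetas: Sequence[float], betas: Sequence[float],
                            booklets: Sequence[Sequence[int]],
                            seed: int = 0) -> List[List[Optional[int]]]:
    """Simulate 0/1 Rasch responses when each examinee works on only one booklet.

    Examinee i receives booklets[i % len(booklets)] (a hypothetical rotation rule).
    Items outside that booklet are recorded as None, i.e. missing by design.
    """
    rng = random.Random(seed)
    data: List[List[Optional[int]]] = []
    for i, theta in enumerate(thetas):
        administered = set(booklets[i % len(booklets)])
        row: List[Optional[int]] = []
        for j, beta in enumerate(betas):
            if j in administered:
                row.append(1 if rng.random() < rasch_prob(theta, beta) else 0)
            else:
                row.append(None)
        data.append(row)
    return data


# Two booklets linked by the common items 1 and 2; items 0 and 3 each appear in one booklet only.
betas = [-1.0, 0.0, 0.5, 1.5]
booklets = [[0, 1, 2], [1, 2, 3]]
thetas = [-0.5, 0.3, 1.2, 0.0]
for row in simulate_booklet_design(thetas, betas, booklets):
    print(row)
```

The common items are what allow all item difficulties to be placed on one scale even though no examinee sees every item.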
Solving the measurement invariance anchor item problem in item response theory.

PubMed

Meade, Adam W; Wright, Natalie A

2012-09-01

The efficacy of tests of differential item functioning (measurement invariance) has been well established. It is clear that when properly implemented, these tests can successfully identify differentially functioning (DF) items when they exist. However, an assumption of these analyses is that the metric for different groups is linked using anchor items that are invariant. In practice, however, it is impossible to be certain which items are DF and which are invariant. This problem of anchor items, or referent indicators, has long plagued invariance research, and a multitude of suggested approaches have been put forth. Unfortunately, the relative efficacy of these approaches has not been tested. This study compares 11 variations on 5 qualitatively different approaches from recent literature for selecting optimal anchor items. A large-scale simulation study indicates that for nearly all conditions, an easily implemented 2-stage procedure recently put forth by Lopez Rivas, Stark, and Chernyshenko (2009) provided optimal power while maintaining nominal Type I error. With this approach, appropriate anchor items can be easily and quickly located, resulting in more efficacious invariance tests. Recommendations for invariance testing are illustrated using a pedagogical example of employee responses to an organizational culture measure.
75 FR 82407 - Submission for OMB Review; Comment Request; Testing Successful Health Communications Surrounding...

Federal Register 2010, 2011, 2012, 2013, 2014

2010-12-30

... surrounding aging-related issues from the National Institute on Aging (NIA). Type of Information Collection... information technology. Direct Comments to OMB: Written comments and/or suggestions regarding the item(s...; Comment Request; Testing Successful Health Communications Surrounding Aging-Related Issues From the...
Nickel and cobalt release from jewellery and metal clothing items in Korea.

PubMed

Cheong, Seung Hyun; Choi, You Won; Choi, Hae Young; Byun, Ji Yeon

2014-01-01

In Korea, the prevalence of nickel allergy has shown a sharply increasing trend. Cobalt contact allergy is often associated with concomitant reactions to nickel, and is more common in Korea than in western countries. The aim of the present study was to investigate the prevalence of items that release nickel and cobalt on the Korean market. A total of 471 items that included 193 branded jewellery, 202 non-branded jewellery and 76 metal clothing items were sampled and studied with a dimethylglyoxime (DMG) test and a cobalt spot test to detect nickel and cobalt release, respectively. Nickel release was detected in 47.8% of the tested items. The positive rates in the DMG test were 12.4% for the branded jewellery, 70.8% for the non-branded jewellery, and 76.3% for the metal clothing items. Cobalt release was found in 6.2% of items. Among the types of jewellery, belts and hair pins showed higher positive rates in both the DMG test and the cobalt spot test. Our study shows that the prevalence of items that release nickel or cobalt among jewellery and metal clothing items is high in Korea. © 2013 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
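As a consistency check on the figures above, the overall nickel-release rate should equal the sample-size-weighted average of the three group rates. The short Python calculation below uses only the sample sizes and rounded percentages reported in the abstract.

```python
# Sample sizes and DMG-positive rates reported for each group of items.
groups = {
    "branded jewellery": (193, 0.124),
    "non-branded jewellery": (202, 0.708),
    "metal clothing items": (76, 0.763),
}

total_items = sum(n for n, _ in groups.values())
# Weighted average: each group's rate counts in proportion to its number of items.
overall_rate = sum(n * rate for n, rate in groups.values()) / total_items

print(total_items)            # 471
print(f"{overall_rate:.1%}")  # 47.8%
```

The weighted average agrees with the reported 47.8%, whereas an unweighted mean of the three group rates would give about 53.2%.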
Incidental retrieval-induced forgetting of location information.

PubMed

Gómez-Ariza, Carlos J; Fernandez, Angel; Bajo, M Teresa

2012-06-01

Retrieval-induced forgetting (RIF) has been studied with different types of tests and materials. However, RIF has always been tested on the items' central features, and there is no information on whether inhibition also extends to peripheral features of the events in which the items are embedded. In two experiments, we specifically tested the presence of RIF in a task in which recall of peripheral information was required. After a standard retrieval practice task oriented to item identity, participants were cued with colors (Exp. 1) or with the items themselves (Exp. 2) and asked to recall the screen locations where the items had been displayed during the study phase. RIF for locations was observed after retrieval practice, an effect that was not present when participants were asked to read the items instead of retrieving them. Our findings provide evidence that peripheral location information associated with an item during study can also be inhibited when the retrieval conditions promote the inhibition of more central, item identity information.
Evaluating construct validity of the second version of the Copenhagen Psychosocial Questionnaire through analysis of differential item functioning and differential item effect.

PubMed

Bjorner, Jakob Bue; Pejtersen, Jan Hyld

2010-02-01

To evaluate the construct validity of the Copenhagen Psychosocial Questionnaire II (COPSOQ II) by means of tests for differential item functioning (DIF) and differential item effect (DIE). We used a Danish general population postal survey (n = 4,732 with 3,517 wage earners) with a one-year register-based follow-up for long-term sickness absence. DIF was evaluated against age, gender, education, social class, public/private sector employment, and job type using ordinal logistic regression. DIE was evaluated against job satisfaction and self-rated health (using ordinal logistic regression), against depressive symptoms, burnout, and stress (using multiple linear regression), and against long-term sick leave (using a proportional hazards model). We used a cross-validation approach to counter the risk of significant results due to multiple testing. Out of 1,052 tests, we found 599 significant instances of DIF/DIE, 69 of which showed both practical and statistical significance across two independent samples. Most DIF occurred for job type (in 20 cases), while we found little DIF for age, gender, education, social class and sector. DIE seemed to pertain to particular items, which showed DIE in the same direction for several outcome variables. The results allowed a preliminary identification of items that have a positive impact on construct validity and items that have a negative impact on construct validity. These results can be used to develop better short-form measures and to improve the conceptual framework, items and scales of the COPSOQ II. We conclude that tests of DIF and DIE are useful for evaluating construct validity.
Comparison of IRT Likelihood Ratio Test and Logistic Regression DIF Detection Procedures

ERIC Educational Resources Information Center

Atar, Burcu; Kamata, Akihito

2011-01-01

The Type I error rates and the power of the IRT likelihood ratio test and the cumulative logit ordinal logistic regression procedure in detecting differential item functioning (DIF) for polytomously scored items were investigated in this Monte Carlo simulation study. For this purpose, 54 simulation conditions (combinations of 3 sample sizes, 2 sample…
Design Patterns for Digital Item Types in Higher Education

ERIC Educational Resources Information Center

Draaijer, S.; Hartog, R. J. M.

2007-01-01

A set of design patterns for digital item types has been developed in response to challenges identified in various projects by teachers in higher education. The goal of the projects in question was to design and develop formative and summative tests, and to develop interactive learning material in the form of quizzes. The subject domains involved…
Automatic Scoring of Paper-and-Pencil Figural Responses. Research Report.

ERIC Educational Resources Information Center

Martinez, Michael E.; And Others

Large-scale testing is dominated by the multiple-choice question format. Widespread use of the format is due, in part, to the ease with which multiple-choice items can be scored automatically. This paper examines automatic scoring procedures for an alternative item type: figural response. Figural response items call for the completion or…
A Comparison between Discrimination Indices and Item-Response Theory Using the Rasch Model in a Clinical Course Written Examination of a Medical School.

PubMed

Park, Jong Cook; Kim, Kwang Sig

2012-03-01

The reliability of a test is determined by the characteristics of its items. Item analysis can be carried out with classical test theory or with item response theory. The purpose of the study was to compare discrimination indices with item response theory using the Rasch model. Thirty-one 4th-year medical school students participated in the clinical course written examination, which included 22 A-type items and 3 R-type items. The point-biserial correlation coefficient (C(pbs)) was compared with the extreme-group method (D), the biserial correlation coefficient (C(bs)), the item-total correlation coefficient (C(it)), and the corrected item-total correlation coefficient (C(cit)). The Rasch model was applied to estimate item difficulty and examinee ability and to calculate item fit statistics using joint maximum likelihood. The explanatory power (r²) of C(pbs) with respect to the other indices decreased in the following order: C(cit) (1.00), C(it) (0.99), C(bs) (0.94), and D (0.45). Item difficulty ranged from -0.82 to 0.80 logits (standard error 0.37 to 0.76), and examinee ability ranged from -3.69 to 3.19 logits (standard error 0.45 to 1.03). Items 9 and 23 had outfit ≥ 1.3. Students 1, 5, 7, 18, 26, 30, and 32 had fit ≥ 1.3. C(pbs), C(cit), and C(it) are good discrimination parameters. The Rasch model can estimate item difficulty and examinee ability parameters with their standard errors. The fit statistics can identify bad items and unpredictable examinee responses.
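For a dichotomously scored item, the point-biserial correlation with the total score is the Pearson correlation between the 0/1 item score and the total, and the corrected item-total correlation removes the item from the total before correlating. The Python sketch below illustrates these two indices on hypothetical data. It is not the study's code, and the abstract does not state exactly how C(pbs) and C(it) were defined differently, so the sketch should not be read as reproducing the reported values.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)


def item_total_correlation(responses: List[List[int]], item: int, corrected: bool = False) -> float:
    """Correlation between a 0/1 item and the total score (point-biserial for dichotomous items).

    With corrected=True the item is removed from the total first, giving the corrected
    item-total correlation, so the item is not correlated with itself.
    """
    item_scores = [row[item] for row in responses]
    totals = [sum(row) - (row[item] if corrected else 0) for row in responses]
    return pearson(item_scores, totals)


# Hypothetical 0/1 responses: five examinees (rows) by three items (columns).
responses = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0], [0, 1, 0]]
print(round(item_total_correlation(responses, 0), 3))                  # 0.721
print(round(item_total_correlation(responses, 0, corrected=True), 3))  # 0.327
```

With so few items, including the item in its own total inflates the correlation considerably, which is why the corrected version is often preferred for short tests.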
Development and validation of the Chinese Attitudes to Starting Insulin Questionnaire (Ch-ASIQ) for primary care patients with type 2 diabetes.

PubMed

Fu, Sau Nga; Chin, Weng Yee; Wong, Carlos King Ho; Yeung, Vincent Tok Fai; Yiu, Ming Pong; Tsui, Hoi Yee; Chan, Ka Hung

2013-01-01

To develop and evaluate the psychometric properties of a Chinese questionnaire that assesses the barriers and enablers to commencing insulin in primary care patients with poorly controlled Type 2 diabetes. Questionnaire items were identified using a literature review. Content validation was performed and items were further refined using an expert panel. Following translation, back translation and cognitive debriefing, the translated Chinese questionnaire was piloted on target patients. Exploratory factor analysis and item-scale correlations were performed to test the construct validity of the subscales and items. Internal reliability was tested by Cronbach's alpha. Twenty-seven identified items underwent content validation, translation and cognitive debriefing. The translated questionnaire was piloted on 303 insulin-naïve (never taken insulin) Type 2 diabetes patients recruited from 10 government-funded primary care clinics across Hong Kong. Sufficient variability in the dataset for factor analysis was confirmed by Bartlett's Test of Sphericity (P<0.001). Using exploratory factor analysis with varimax rotation, 10 factors with eigenvalues > 1 were generated, onto which 26 items loaded with loading scores > 0.4. Total variance explained by the 10 factors was 66.22%. The Kaiser-Meyer-Olkin measure was 0.725. Cronbach's alpha coefficients for the first four factors were ≥0.6, identifying four sub-scales to which 13 items correlated. The remaining sub-scales and items with poor internal reliability were deleted. The final 13-item instrument had a four-scale structure addressing: 'Self-image and stigmatization'; 'Factors promoting self-efficacy'; 'Fear of pain or needles'; and 'Time and family support'. The Chinese Attitudes to Starting Insulin Questionnaire (Ch-ASIQ) appears to be a reliable and valid measure for assessing barriers to starting insulin. 
This short instrument is easy to administer and may be used by healthcare providers and researchers as an assessment tool for Chinese diabetic primary care patients, including the elderly, who are unwilling to start insulin.

Development and reliability testing of a self-report instrument to measure the office layout as a correlate of occupational sitting.

PubMed

Duncan, Mitch J; Rashid, Mahbub; Vandelanotte, Corneel; Cutumisu, Nicoleta; Plotnikoff, Ronald C

2013-02-04

Spatial configurations of office environments assessed by Space Syntax methodologies are related to employee movement patterns. These methods require analysis of floor plans, which are not readily available in large population-based studies or are otherwise unavailable. Therefore, a self-report instrument to assess spatial configurations of office environments using four scales was developed. The scales are: local connectivity (16 items), overall connectivity (11 items), visibility of co-workers (10 items), and proximity of co-workers (5 items). A panel cohort (N = 1154) completed an online survey; only data from individuals employed in office-based occupations (n = 307) were used to assess scale measurement properties. To assess test-retest reliability, a separate sample of 37 office-based workers completed the survey on two occasions 7.7 (±3.2) days apart. Redundant scale items were eliminated using factor analysis; Cronbach's α was used to evaluate internal consistency, and the intraclass correlation coefficient (retest-ICC) to evaluate test-retest reliability. ANOVA was employed to examine differences between office types (Private, Shared, Open) as a measure of construct validity. Generalized Linear Models were used to examine relationships between spatial configuration scales and the duration and frequency of breaks in occupational sitting. The number of items on all scales was reduced, and Cronbach's α and ICCs indicated good internal consistency and test-retest reliability: local connectivity (5 items; α = 0.70; retest-ICC = 0.84), overall connectivity (6 items; α = 0.86; retest-ICC = 0.87), visibility of co-workers (4 items; α = 0.78; retest-ICC = 0.86), and proximity of co-workers (3 items; α = 0.85; retest-ICC = 0.70). Significant (p ≤ 0.001) differences, in theoretically expected directions, were observed between office types for all scales except overall connectivity. Significant associations were observed between all scales and occupational sitting behaviour (p ≤ 0.05). 
All scales have good measurement properties indicating the instrument may be a useful alternative to Space Syntax to examine environmental correlates of occupational sitting in population surveys.
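Cronbach's α, the internal-consistency index reported for each scale, compares the sum of the item variances with the variance of the total score: α = k / (k - 1) × (1 - Σ item variances / total-score variance). The Python sketch below applies this standard formula to hypothetical ratings; it does not reproduce the study's data, and it uses population variances throughout (the ratio is unchanged if sample variances are used consistently instead).

```python
from statistics import pvariance
from typing import List


def cronbach_alpha(scores: List[List[float]]) -> float:
    """Cronbach's alpha for a persons-by-items score matrix.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("alpha is undefined when total scores do not vary")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical ratings: four respondents (rows) by three items of one scale (columns).
ratings = [[2, 3, 3], [4, 4, 5], [3, 5, 4], [5, 5, 5]]
print(round(cronbach_alpha(ratings), 3))  # 0.894
```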
Development and reliability testing of a self-report instrument to measure the office layout as a correlate of occupational sitting

PubMed Central

2013-01-01

Background Spatial configurations of office environments assessed by Space Syntax methodologies are related to employee movement patterns. These methods require analysis of floor plans, which are not readily available in large population-based studies or are otherwise unavailable. Therefore, a self-report instrument to assess spatial configurations of office environments using four scales was developed. Methods The scales are: local connectivity (16 items), overall connectivity (11 items), visibility of co-workers (10 items), and proximity of co-workers (5 items). A panel cohort (N = 1154) completed an online survey; only data from individuals employed in office-based occupations (n = 307) were used to assess scale measurement properties. To assess test-retest reliability, a separate sample of 37 office-based workers completed the survey on two occasions 7.7 (±3.2) days apart. Redundant scale items were eliminated using factor analysis; Cronbach’s α was used to evaluate internal consistency, and the intraclass correlation coefficient (retest-ICC) to evaluate test-retest reliability. ANOVA was employed to examine differences between office types (Private, Shared, Open) as a measure of construct validity. Generalized Linear Models were used to examine relationships between spatial configuration scales and the duration and frequency of breaks in occupational sitting. Results The number of items on all scales was reduced, and Cronbach’s α and ICCs indicated good internal consistency and test-retest reliability: local connectivity (5 items; α = 0.70; retest-ICC = 0.84), overall connectivity (6 items; α = 0.86; retest-ICC = 0.87), visibility of co-workers (4 items; α = 0.78; retest-ICC = 0.86), and proximity of co-workers (3 items; α = 0.85; retest-ICC = 0.70). Significant (p ≤ 0.001) differences, in theoretically expected directions, were observed between office types for all scales except overall connectivity. Significant associations were observed between all scales and occupational sitting behaviour (p ≤ 0.05). 
Conclusion All scales have good measurement properties indicating the instrument may be a useful alternative to Space Syntax to examine environmental correlates of occupational sitting in population surveys. PMID:23379485
Examining Power and Type 1 Error for Step and Item Level Tests of Invariance: Investigating the Effect of the Number of Item Score Levels

ERIC Educational Resources Information Center

Ayodele, Alicia Nicole

2017-01-01

Within polytomous items, differential item functioning (DIF) can take on various forms due to the number of response categories. The lack of invariance at this level is referred to as differential step functioning (DSF). The most common DSF methods in the literature are the adjacent category log odds ratio (AC-LOR) estimator and cumulative…
Differential Item Functioning Detection Using the Multiple Indicators, Multiple Causes Method with a Pure Short Anchor

ERIC Educational Resources Information Center

Shih, Ching-Lin; Wang, Wen-Chung

2009-01-01

The multiple indicators, multiple causes (MIMIC) method with a pure short anchor was proposed to detect differential item functioning (DIF). A simulation study showed that the MIMIC method with an anchor of 1, 2, 4, or 10 DIF-free items yielded a well-controlled Type I error rate even when such tests contained as many as 40% DIF items. In general,…
An Analysis of Variance Approach for the Estimation of Response Time Distributions in Tests

ERIC Educational Resources Information Center

Attali, Yigal

2010-01-01

Generalizability theory and analysis of variance methods are employed, together with the concept of objective time pressure, to estimate response time distributions and the degree of time pressure in timed tests. By estimating response time variance components due to person, item, and their interaction, and fixed effects due to item types and…
A novel multi-item joint replenishment problem considering multiple type discounts.

PubMed

Cui, Ligang; Zhang, Yajun; Deng, Jie; Xu, Maozeng

2018-01-01

In business replenishment, discount offers for multiple items may provide either different discount schedules with a single discount type or schedules with multiple discount types. The paper investigates the joint effects of multiple discount schemes on the decisions of multi-item joint replenishment. In this paper, a joint replenishment problem (JRP) model that considers three discount offers simultaneously (all-unit discount, incremental discount, and total volume discount) is constructed to determine the basic cycle time and the joint replenishment frequencies of the items. To solve the proposed problem, a heuristic algorithm is proposed to find the optimal solutions and the corresponding total cost of the JRP model. Numerical experiments are performed to test the algorithm, and the computational results for JRPs under different discount combinations show differing degrees of replenishment cost reduction.
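For orientation, the classic joint replenishment problem without discounts can be written in a few lines: a major ordering cost is incurred every basic cycle T, and item i, with its own minor ordering cost, is replenished every k_i cycles. The Python sketch below assumes constant demand, linear holding costs, and no shortages, and it replaces the paper's heuristic with brute-force enumeration of small integer multipliers. It deliberately omits the all-unit, incremental, and total volume discounts that the paper's model adds; all names and numbers are hypothetical.

```python
import itertools
import math
from typing import Sequence, Tuple


def jrp_cost(k: Sequence[int], major: float, minor: Sequence[float],
             demand: Sequence[float], holding: Sequence[float]) -> Tuple[float, float]:
    """Return (T*, total cost) of the basic JRP for fixed integer multipliers k.

    Item i is replenished every k[i] * T. For fixed k, the annual cost
    (major + sum(minor[i] / k[i])) / T + T / 2 * sum(k[i] * demand[i] * holding[i])
    is minimised at T* = sqrt(2 * ordering / carrying), where it equals sqrt(2 * ordering * carrying).
    """
    ordering = major + sum(s / ki for s, ki in zip(minor, k))
    carrying = sum(ki * d * h for ki, d, h in zip(k, demand, holding))
    return math.sqrt(2.0 * ordering / carrying), math.sqrt(2.0 * ordering * carrying)


def solve_jrp_by_enumeration(major: float, minor: Sequence[float], demand: Sequence[float],
                             holding: Sequence[float], k_max: int = 4):
    """Try every multiplier vector with entries 1..k_max and keep the cheapest (small instances only)."""
    best = None
    for k in itertools.product(range(1, k_max + 1), repeat=len(minor)):
        t_star, cost = jrp_cost(k, major, minor, demand, holding)
        if best is None or cost < best[2]:
            best = (k, t_star, cost)
    return best


# Hypothetical instance: one fast-moving and two slow-moving items.
k, t_star, cost = solve_jrp_by_enumeration(major=100.0, minor=[10.0, 20.0, 40.0],
                                           demand=[1000.0, 200.0, 50.0], holding=[2.0, 2.0, 2.0])
print(k, round(t_star, 3), round(cost, 2))
```

Enumeration grows as k_max raised to the number of items, which is one reason practical JRP methods, including the discount-aware model described above, rely on heuristics.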
Automated Item Generation with Recurrent Neural Networks.

PubMed

von Davier, Matthias

2018-03-12

Utilizing technology for automated item generation is not a new idea. However, test items used in commercial testing programs or in research are still predominantly written by humans, in most cases by content experts or professional item writers. Human experts are a limited resource, and testing agencies incur high costs in the continuous renewal of item banks needed to sustain testing programs. Using algorithms instead holds the promise of providing unlimited resources for this crucial part of assessment development. The approach presented here deviates in several ways from previous attempts to solve this problem. In the past, automatic item generation relied either on generating clones of narrowly defined item types such as those found in language-free intelligence tests (e.g., Raven's progressive matrices) or on an extensive analysis of task components and derivation of schemata to produce items with pre-specified variability that are hoped to have predictable levels of difficulty. It is somewhat unlikely that researchers utilizing these previous approaches would look at the proposed approach with favor; however, recent applications of machine learning show success in solving tasks that seemed impossible for machines not too long ago. The proposed approach uses deep learning to implement probabilistic language models, not unlike what Google Brain and Amazon Alexa use for language processing and generation.
Design Constructibility Reviews.

DTIC Science & Technology

1987-01-01

specifications for base and sub-base courses, and wearing course. Item 21 - Has provision been made in the specifications for positive control of the temperature...of the bituminous material? Item 22 - Test results on samples of asphalt, aggregate, sand and mix should be obtained from the plant prior to placing...in the drawings. Item 3 - Make sure that stud types, sizes and spacings are spelled out in the plans and specifications. Item 4 - All welders that will
The quadratic relationship between difficulty of intelligence test items and their correlations with working memory.

PubMed

Smolen, Tomasz; Chuderski, Adam

2015-01-01

Fluid intelligence (Gf) is a crucial cognitive ability that involves abstract reasoning in order to solve novel problems. Recent research demonstrated that Gf strongly depends on the individual effectiveness of working memory (WM). We investigated a popular claim that if storage capacity underlay the WM-Gf correlation, then this correlation should increase with an increasing number of items or rules (load) in a Gf test. Because often no such link is observed, the storage-capacity account is rejected on that basis, and alternative accounts of Gf (e.g., related to executive control or processing speed) are proposed. Using both analytical inference and numerical simulations, we demonstrated that the load-dependent change in correlation is primarily a function of the amount of floor/ceiling effect for particular items. Thus, the item-wise WM correlation of a Gf test depends on its overall difficulty and on the difficulty distribution across its items. When the early test items show a strong ceiling effect but the late items do not approach the floor, that correlation will increase throughout the test. If the early items lie between ceiling and floor but the late items approach the floor, the respective correlation will decrease. For a hallmark Gf test, the Raven test, whose items span from ceiling to floor, a quadratic relationship is expected, and this was shown empirically using a large sample and two types of WMC tasks. Consequently, neither changes in correlation due to varying WM/Gf load nor their absence can yield an argument for or against any theory of WM/Gf. Moreover, as the mathematical properties of the correlation formula make it relatively immune to ceiling/floor effects for overall moderate correlations, only minor changes (if any) in the WM-Gf correlation should be expected for many psychological tests.
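The abstract's central point, that an item's correlation with WM depends on how far the item sits from ceiling or floor, can be illustrated with a small simulation. The Python sketch below is not the authors' analytical derivation or simulation: it assumes normally distributed latent WM and Gf correlated at rho = 0.6, an arbitrary item-level noise, and invented difficulty thresholds. Under these assumptions the item-WM correlation is largest for items of moderate difficulty and shrinks for items near ceiling or floor, so a test whose items span from ceiling to floor yields an inverted-U (quadratic) profile even though the latent WM-Gf correlation never changes.

```python
import math
import random
from typing import Dict, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def item_wm_correlations(thresholds: Sequence[float], rho: float = 0.6,
                         noise_sd: float = 0.5, n: int = 20000, seed: int = 1) -> Dict[float, float]:
    """Correlate WM with pass/fail on items of different difficulty (higher threshold = harder).

    Latent WM and Gf are standard normal with correlation rho; an item is solved when
    Gf plus item-level noise exceeds its threshold.
    """
    rng = random.Random(seed)
    wm, gf = [], []
    for _ in range(n):
        w = rng.gauss(0.0, 1.0)
        wm.append(w)
        gf.append(rho * w + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0))
    result = {}
    for t in thresholds:
        solved = [1.0 if g + rng.gauss(0.0, noise_sd) > t else 0.0 for g in gf]
        result[t] = pearson(wm, solved)
    return result


# From near-ceiling (very easy) to near-floor (very hard) items.
for t, r in item_wm_correlations([-2.5, -1.25, 0.0, 1.25, 2.5]).items():
    print(f"threshold {t:+.2f}: r = {r:.3f}")
```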
Instructional Sensitivity Statistics Appropriate for Objectives-Based Test Items. CSE Report No. 91.

ERIC Educational Resources Information Center

Kosecoff, Jacqueline B.; Klein, Stephen P.

Two types of sensitivity indices were developed in this paper, one internal to the total test and the other external. To evaluate the success of these statistics, the three criteria suggested for a satisfactory index of item quality were considered. The Internal Sensitivity Index appears to meet these demands. Certainly it is easily computed. In…
Performance of Automated Speech Scoring on Different Low- to Medium-Entropy Item Types for Low-Proficiency English Learners. Research Report. ETS RR-17-12

ERIC Educational Resources Information Center

Loukina, Anastassia; Zechner, Klaus; Yoon, Su-Youn; Zhang, Mo; Tao, Jidong; Wang, Xinhao; Lee, Chong Min; Mulholland, Matthew

2017-01-01

This report presents an overview of the "SpeechRater"® automated scoring engine model building and evaluation process for several item types with a focus on a low-English-proficiency test-taker population. We discuss each stage of speech scoring, including automatic speech recognition, filtering models for nonscorable responses, and…
No retrieval-induced forgetting using item-specific independent cues: evidence against a general inhibitory account.

PubMed

Camp, Gino; Pecher, Diane; Schmidt, Henk G

2007-09-01

Retrieval practice with particular items from memory can impair the recall of related items on a later memory test. This retrieval-induced forgetting effect has been ascribed to inhibitory processes (M. C. Anderson & B. A. Spellman, 1995). A critical finding that distinguishes inhibitory from interference explanations is that forgetting is found with independent (or extralist) cues. In 4 experiments, the authors tested whether the forgetting effect is cue-independent. Forgetting was investigated for both studied and unstudied semantically related items. Retrieval-induced forgetting was not found using item-specific independent cues for either studied or unstudied items. However, forgetting was found for both item types when studied categories were used as cues. These results are not in line with a general inhibitory account, because this account predicts retrieval-induced forgetting with independent cues. Interference and context-specific inhibition are discussed as possible explanations for the data. 2007 APA
Constructing three emotion knowledge tests from the invariant measurement approach

PubMed Central

Prieto, Gerardo; Burin, Debora I.

2017-01-01

Background Psychological constructionist models like the Conceptual Act Theory (CAT) postulate that complex states such as emotions are composed of basic psychological ingredients that are more clearly respected by the brain than basic emotions. The objective of this study was the construction and initial validation of Emotion Knowledge (EK) measures from the CAT frame by means of an invariant measurement approach, the Rasch Model (RM). Psychological distance theory was used to inform item generation. Methods Three EK tests (emotion vocabulary [EV], close emotional situations [CES] and far emotional situations [FES]) were constructed and tested with the RM in a community sample of 100 females and 100 males (age range: 18–65), both separately and conjointly. Results The fit of the data to the RM was confirmed to be sufficient. Then, the effect of type of test and emotion on Rasch-modelled item difficulty was tested. Significant effects of emotion on EK item difficulty were found, but the only statistically significant difference was that between “happiness” and the remaining emotions; neither type of test nor interaction effects on EK item difficulty were statistically significant. Gender differences were tested after confirming that differential item functioning (DIF) was not a plausible alternative hypothesis for the results. No statistically significant sex-related differences were found in EV, CES, FES, or total EK. However, the sign of d indicates that female participants consistently performed better than male participants, a result that will be of interest for future meta-analyses. Discussion The three EK tests are ready to be used as components of a higher-level measurement process. PMID:28929013
The Development of Multiple-Choice Items Consistent with the AP Chemistry Curriculum Framework to More Accurately Assess Deeper Understanding

ERIC Educational Resources Information Center

Domyancich, John M.

2014-01-01

Multiple-choice questions are an important part of large-scale summative assessments, such as the advanced placement (AP) chemistry exam. However, past AP chemistry exam items often lacked the ability to test conceptual understanding and higher-order cognitive skills. The redesigned AP chemistry exam shows a distinctive shift in item types toward…
Evaluating the Wald Test for Item-Level Comparison of Saturated and Reduced Models in Cognitive Diagnosis

ERIC Educational Resources Information Center

de la Torre, Jimmy; Lee, Young-Sun

2013-01-01

This article used the Wald test to evaluate the item-level fit of a saturated cognitive diagnosis model (CDM) relative to the fits of the reduced models it subsumes. A simulation study was carried out to examine the Type I error and power of the Wald test in the context of the G-DINA model. Results show that when the sample size is small and a…
Bilingual health literacy assessment using the Talking Touchscreen/la Pantalla Parlanchina: Development and pilot testing.

PubMed

Yost, Kathleen J; Webster, Kimberly; Baker, David W; Choi, Seung W; Bode, Rita K; Hahn, Elizabeth A

2009-06-01

Current health literacy measures are too long or imprecise, or they have questionable equivalence between English and Spanish versions. The purpose of this paper is to describe the development and pilot testing of a new bilingual computer-based health literacy assessment tool. We analyzed literacy data from three large studies. Using a working definition of health literacy, we developed new prose, document and quantitative items in English and Spanish. Items were pilot tested on 97 English- and 134 Spanish-speaking participants to assess item difficulty. Items covered topics relevant to primary care patients and providers. English- and Spanish-speaking participants understood the tasks involved in answering each type of question. The English Talking Touchscreen was easy to use and the English and Spanish items provided good coverage of the difficulty continuum. Qualitative and quantitative results provided useful information on computer acceptability and initial item difficulty. After the items have been administered on the Talking Touchscreen (la Pantalla Parlanchina) to 600 English-speaking (and 600 Spanish-speaking) primary care patients, we will develop a computer adaptive test. This health literacy tool will enable clinicians and researchers to more precisely determine the level at which low health literacy adversely affects health and healthcare utilization.
Diagnostic Accuracy of History and Physical Examination of Superior Labrum Anterior-Posterior Lesions

PubMed Central

Michener, Lori A.; Doukas, William C.; Murphy, Kevin P.; Walsworth, Matthew K.

2011-01-01

Context: Type I superior labrum anterior-posterior (SLAP) lesions involve degenerative fraying and probably are not the cause of shoulder pain. Type II to IV SLAP lesions are tears of the labrum. Objective: To determine the diagnostic accuracy of patient history and the active compression, anterior slide, and crank tests for type I and type II to IV SLAP lesions. Design: Cohort study. Setting: Clinic. Patients or Other Participants: Fifty-five patients (47 men, 8 women; age = 40.6 ± 15.1 years) presenting with shoulder pain. Intervention(s): For each patient, an orthopaedic surgeon conducted a clinical examination of history of trauma; sudden onset of symptoms; history of popping, clicking, or catching; age; and active compression, crank, and anterior slide tests. The reference standard was the intraoperative diagnosis. The operating surgeon was blinded to the results of the clinical examination. Main Outcome Measure(s): Diagnostic utility was calculated using the receiver operating characteristic curve and area under the curve (AUC), sensitivity, specificity, positive likelihood ratio (+LR), and negative likelihood ratio (−LR). Forward stepwise binary regression was used to determine a combination of tests for diagnosis. Results: No history item or physical examination test had diagnostic accuracy for type I SLAP lesions (n = 13). The anterior slide test had utility (AUC = 0.70, +LR = 2.25, −LR = 0.44) to confirm and exclude type II to IV SLAP lesions (n = 10). The combination of a history of popping, clicking, or catching and the anterior slide test demonstrated diagnostic utility for confirming type II to IV SLAP lesions (+LR = 6.00). Conclusions: The anterior slide test had limited diagnostic utility for confirming and excluding type II to IV SLAP lesions; diagnostic values indicated only small shifts in probability. 
However, the combination of the anterior slide test with a history of popping, clicking, or catching had moderate diagnostic utility for confirming type II to IV SLAP lesions. No single item or combination of history items and physical examination tests had diagnostic utility for type I SLAP lesions. PMID:21944065
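The phrase "only small shifts in probability" can be made concrete with the standard odds form of Bayes' theorem: post-test odds = pre-test odds × LR. The sketch below is illustrative only; it assumes, as a hypothetical pre-test probability, the proportion of type II to IV lesions in this sample (10 of 55 patients) and applies the +LR and −LR values reported in the abstract. The function name is hypothetical.

```python
def post_test_probability(pre_test_prob: float, likelihood_ratio: float) -> float:
    """Convert a pre-test probability to a post-test probability using an LR."""
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


# Hypothetical pre-test probability: sample proportion of type II-IV lesions.
pre = 10 / 55

# Likelihood ratios as reported in the abstract.
for label, lr in [("anterior slide, +LR", 2.25),
                  ("anterior slide, -LR", 0.44),
                  ("history + anterior slide, +LR", 6.00)]:
    print(f"{label}: {pre:.2f} -> {post_test_probability(pre, lr):.2f}")
```

Under this assumed pre-test probability of about 0.18, a positive anterior slide test raises the probability to about 0.33 and a negative one lowers it to about 0.09, whereas the combined history-plus-test finding (+LR = 6.00) raises it to about 0.57.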
Performance on large-scale science tests: Item attributes that may impact achievement scores

NASA Astrophysics Data System (ADS)

Gordon, Janet Victoria

Significant differences in achievement among ethnic groups persist on the eighth-grade science Washington Assessment of Student Learning (WASL). The WASL measures academic performance in science using both scenario and stand-alone question types. Previous research suggests that presenting target items connected to an authentic context, like scenario question types, can increase science achievement scores, especially in underrepresented groups, and thus help to close the achievement gap. The purpose of this study was to identify significant differences in performance between gender and ethnic subgroups by question type on the 2005 eighth-grade science WASL. MANOVA and ANOVA were used to examine relationships between gender and ethnic subgroups as independent variables with achievement scores on scenario and stand-alone question types as dependent variables. MANOVA revealed no significant effects for gender, suggesting that the 2005 eighth-grade science WASL was gender neutral. However, there were significant effects for ethnicity. ANOVA revealed significant effects for ethnicity and ethnicity by gender interaction in both question types. Effect sizes were negligible for the ethnicity by gender interaction. Large effect sizes between ethnicities on scenario question types became moderate to small effect sizes on stand-alone question types. This indicates the score advantage the higher performing subgroups had over the lower performing subgroups was not as large on stand-alone question types compared to scenario question types. A further comparison examined performance on multiple-choice items only within both question types. Similar achievement patterns between ethnicities emerged; however, achievement patterns between genders changed in boys' favor. Scenario question types appeared to register differences between ethnic groups to a greater degree than stand-alone question types. 
These differences may be attributable to individual differences in cognition, characteristics of test items themselves and/or opportunities to learn. Suggestions for future research are made.
Formalizing Evidence Type Definitions for Drug-Drug Interaction Studies to Improve Evidence Base Curation.

PubMed

Utecht, Joseph; Brochhausen, Mathias; Judkins, John; Schneider, Jodi; Boyce, Richard D

2017-01-01

In this research we aim to demonstrate that an ontology-based system can categorize potential drug-drug interaction (PDDI) evidence items into complex types based on a small set of simple questions. Such a method could increase the transparency and reliability of PDDI evidence evaluation, while also reducing the variations in content and seriousness ratings present in PDDI knowledge bases. We extended the DIDEO ontology with 44 formal evidence type definitions. We then manually annotated the evidence types of 30 evidence items. We tested an RDF/OWL representation of answers to a small number of simple questions about each of these 30 evidence items and showed that automatic inference can determine the detailed evidence types based on this small number of simpler questions. These results show proof-of-concept for a decision support infrastructure that frees the evidence evaluator from mastering relatively complex written evidence type definitions.
Controlling Type I Error Rates in Assessing DIF for Logistic Regression Method Combined with SIBTEST Regression Correction Procedure and DIF-Free-Then-DIF Strategy

ERIC Educational Resources Information Center

Shih, Ching-Lin; Liu, Tien-Hsiang; Wang, Wen-Chung

2014-01-01

The simultaneous item bias test (SIBTEST) regression correction procedure and the differential item functioning (DIF)-free-then-DIF strategy are applied simultaneously to the logistic regression (LR) method in this study. These procedures are used to adjust the effects of matching true score on observed score and to better control the Type I error…

Validity and Reliability of the 8-Item Work Limitations Questionnaire.

PubMed

Walker, Timothy J; Tullar, Jessica M; Diamond, Pamela M; Kohl, Harold W; Amick, Benjamin C

2017-12-01

Purpose To evaluate factorial validity, scale reliability, test-retest reliability, convergent validity, and discriminant validity of the 8-item Work Limitations Questionnaire (WLQ) among employees from a public university system. Methods The research aims were tested through a secondary analysis of de-identified data from employees who completed an annual Health Assessment between 2009 and 2015. Confirmatory factor analysis (CFA) (n = 10,165) tested the latent structure of the 8-item WLQ. Scale reliability was determined using a CFA-based approach, while test-retest reliability was determined using the intraclass correlation coefficient. Convergent/discriminant validity was tested by evaluating relations between the 8-item WLQ and health/performance variables for convergent validity (health-related work performance, number of chronic conditions, and general health) and demographic variables for discriminant validity (gender and institution type). Results A 1-factor model with three correlated residuals demonstrated excellent model fit (CFI = 0.99, TLI = 0.99, RMSEA = 0.03, and SRMR = 0.01). The scale reliability was acceptable (0.69, 95% CI 0.68-0.70) and the test-retest reliability was very good (ICC = 0.78). Low-to-moderate associations were observed between the 8-item WLQ and the health/performance variables, while weak associations were observed between the 8-item WLQ and the demographic variables. Conclusions The 8-item WLQ demonstrated sufficient reliability and validity among employees from a public university system. Results suggest the 8-item WLQ is a usable alternative for studies in which the more comprehensive 25-item WLQ is not available.
Development and Validation of the Chinese Attitudes to Starting Insulin Questionnaire (Ch-ASIQ) for Primary Care Patients with Type 2 Diabetes

PubMed Central

Fu, Sau Nga; Chin, Weng Yee; Wong, Carlos King Ho; Yeung, Vincent Tok Fai; Yiu, Ming Pong; Tsui, Hoi Yee; Chan, Ka Hung

2013-01-01

Objectives To develop and evaluate the psychometric properties of a Chinese questionnaire that assesses the barriers and enablers to commencing insulin in primary care patients with poorly controlled Type 2 diabetes. Research Design and Method Questionnaire items were identified through a literature review. Content validation was performed and items were further refined using an expert panel. Following translation, back translation and cognitive debriefing, the translated Chinese questionnaire was piloted on target patients. Exploratory factor analysis and item-scale correlations were performed to test the construct validity of the subscales and items. Internal reliability was tested by Cronbach’s alpha. Results Twenty-seven identified items underwent content validation, translation and cognitive debriefing. The translated questionnaire was piloted on 303 insulin-naïve (never taken insulin) Type 2 diabetes patients recruited from 10 government-funded primary care clinics across Hong Kong. Sufficient variability in the dataset for factor analysis was confirmed by Bartlett’s Test of Sphericity (P<0.001). Using exploratory factor analysis with varimax rotation, 10 factors were generated onto which 26 items loaded with loading scores > 0.4 and Eigenvalues >1. Total variance explained by the 10 factors was 66.22%. The Kaiser-Meyer-Olkin measure was 0.725. Cronbach’s alpha coefficients for the first four factors were ≥0.6, identifying four sub-scales to which 13 items correlated. Remaining sub-scales and items with poor internal reliability were deleted. The final 13-item instrument had a four-scale structure addressing: ‘Self-image and stigmatization’; ‘Factors promoting self-efficacy’; ‘Fear of pain or needles’; and ‘Time and family support’. Conclusion The Chinese Attitudes to Starting Insulin Questionnaire (Ch-ASIQ) appears to be a reliable and valid measure for assessing barriers to starting insulin. 
This short instrument is easy to administer and may be used by healthcare providers and researchers as an assessment tool for Chinese diabetic primary care patients, including the elderly, who are unwilling to start insulin. PMID:24236071
Testing manifest monotonicity using order-constrained statistical inference.

PubMed

Tijmstra, Jesper; Hessen, David J; van der Heijden, Peter G M; Sijtsma, Klaas

2013-01-01

Most dichotomous item response models share the assumption of latent monotonicity, which states that the probability of a positive response to an item is a nondecreasing function of a latent variable intended to be measured. Latent monotonicity cannot be evaluated directly, but it implies manifest monotonicity across a variety of observed scores, such as the restscore, a single item score, and in some cases the total score. In this study, we show that manifest monotonicity can be tested by means of the order-constrained statistical inference framework. We propose a procedure that uses this framework to determine whether manifest monotonicity should be rejected for specific items. This approach provides a likelihood ratio test for which the p-value can be approximated through simulation. A simulation study is presented that evaluates the Type I error rate and power of the test, and the procedure is applied to empirical data.
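As a rough illustration of what manifest monotonicity means for the restscore, the sketch below groups examinees by restscore (total score minus the item's own score) and checks whether the observed proportion of positive responses to the item never decreases across groups. This is only a descriptive check on hypothetical toy data with hypothetical function names; it is not the order-constrained likelihood ratio test proposed in the study, which accounts for sampling error.

```python
from collections import defaultdict


def restscore_proportions(responses: list[list[int]], item: int) -> dict[int, float]:
    """Proportion of 1-responses to `item` within each restscore group."""
    counts = defaultdict(lambda: [0, 0])  # restscore -> [n_positive, n_total]
    for person in responses:
        rest = sum(person) - person[item]
        counts[rest][0] += person[item]
        counts[rest][1] += 1
    # Sort by restscore so the returned dict is ordered low to high.
    return {r: pos / total for r, (pos, total) in sorted(counts.items())}


def is_manifestly_monotone(proportions: dict[int, float]) -> bool:
    """True if the proportions never decrease as the restscore increases."""
    values = list(proportions.values())
    return all(a <= b for a, b in zip(values, values[1:]))


# Hypothetical dichotomous responses: rows are examinees, columns are items.
data = [
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 0],
    [1, 1, 1, 1],
]
props = restscore_proportions(data, item=0)
print(props, is_manifestly_monotone(props))
```

In practice, small restscore groups are usually merged before such a comparison, and sample decreases must be judged against sampling error, which is the gap the study's inferential procedure addresses.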
Ubiquitous testing using tablets: its impact on medical student perceptions of and engagement in learning.

PubMed

Kim, Kyong-Jee; Hwang, Jee-Young

2016-03-01

Ubiquitous testing has the potential to affect medical education by enhancing the authenticity of the assessment using multimedia items. This study explored medical students' experience with ubiquitous testing and its impact on student learning. A cohort (n=48) of third-year students at a medical school in South Korea participated in this study. The students were divided into two groups and were given different versions of 10 content-matched items: one in text version (the text group) and the other in multimedia version (the multimedia group). Multimedia items were delivered using tablets. Item response analyses were performed to compare item characteristics between the two versions. Additionally, focus group interviews were held to investigate the students' experiences of ubiquitous testing. The mean test score was significantly higher in the text group. Item difficulty and discrimination did not differ between text and multimedia items. The participants generally showed positive responses on ubiquitous testing. Still, they felt that the lectures that they had taken in preclinical years did not prepare them enough for this type of assessment and that clinical encounters during clerkships were more helpful. To be better prepared, the participants felt that they needed to engage more actively in learning in clinical clerkships and have more access to multimedia learning resources. Ubiquitous testing can positively affect student learning by reinforcing the importance of being able to understand and apply knowledge in clinical contexts, which drives students to engage more actively in learning in clinical settings.
Overview of Classical Test Theory and Item Response Theory for Quantitative Assessment of Items in Developing Patient-Reported Outcome Measures

PubMed Central

Cappelleri, Joseph C.; Lundy, J. Jason; Hays, Ron D.

2014-01-01

Introduction The U.S. Food and Drug Administration’s patient-reported outcome (PRO) guidance document defines content validity as “the extent to which the instrument measures the concept of interest” (FDA, 2009, p. 12). “Construct validity is now generally viewed as a unifying form of validity for psychological measurements, subsuming both content and criterion validity” (Strauss & Smith, 2009, p. 7). Hence both qualitative and quantitative information are essential in evaluating the validity of measures. Methods We review classical test theory and item response theory approaches to evaluating PRO measures including frequency of responses to each category of the items in a multi-item scale, the distribution of scale scores, floor and ceiling effects, the relationship between item response options and the total score, and the extent to which hypothesized “difficulty” (severity) order of items is represented by observed responses. Conclusion Classical test theory and item response theory can be useful in providing a quantitative assessment of items and scales during the content validity phase of patient-reported outcome measures. Depending on the particular type of measure and the specific circumstances, either one or both approaches should be considered to help maximize the content validity of PRO measures. PMID:24811753
The Disaggregation of Value-Added Test Scores to Assess Learning Outcomes in Economics Courses

ERIC Educational Resources Information Center

Walstad, William B.; Wagner, Jamie

2016-01-01

This study disaggregates posttest, pretest, and value-added or difference scores in economics into four types of economic learning: positive, retained, negative, and zero. The types are derived from patterns of student responses to individual items on a multiple-choice test. The micro and macro data from the "Test of Understanding in College…
Reliability of the Melbourne assessment of unilateral upper limb function.

PubMed

Randall, M; Carlin, J B; Chondros, P; Reddihough, D

2001-11-01

This study examines the reliability of the Melbourne Assessment of Unilateral Upper Limb Function: a quantitative test of quality of movement in children with neurological impairment. The assessment was administered to 20 children aged from 5 to 16 years (mean age 9 years 10 months, SD 2 years 10 months) who had various types and degrees of cerebral palsy (CP). The performances of the 20 children during assessment were videotaped for subsequent scoring by 15 occupational therapists. Scores were analyzed for internal consistency of test items, inter- and intrarater reliability of scorings of the same videotapes, and test-retest reliability using repeat videotaping. Results revealed very high internal consistency of test items (alpha=0.96), moderate to high agreement both within and between raters for all test items (intraclass correlations of at least 0.7) apart from item 16 (hand to mouth and down), and high interrater reliability (0.95) and intrarater reliability (0.97) for total test scores. Test-retest results revealed moderate to high intrarater reliability for item totals (mean of 0.83 and 0.79) for each rater and high reliability for test totals (0.98 and 0.97). These findings indicate that the Melbourne Assessment of Unilateral Upper Limb Function is a reliable tool for measuring the quality of unilateral upper-limb movement in children with CP.
Reliability and known-group validity of the Arabic version of the 8-item Morisky Medication Adherence Scale among type 2 diabetes mellitus patients.

PubMed

Ashur, S T; Shamsuddin, K; Shah, S A; Bosseri, S; Morisky, D E

2015-12-13

No validation study has previously been conducted for the Arabic version of the 8-item Morisky Medication Adherence Scale (MMAS-8©) as a measure of medication adherence in diabetes. This study, conducted in 2013, tested the reliability and validity of the Arabic MMAS-8 for type 2 diabetes mellitus patients attending a referral centre in Tripoli, Libya. A convenience sample of 103 patients self-completed the questionnaire. Reliability was tested using Cronbach's alpha, average inter-item correlation and the Spearman-Brown coefficient. Known-group validity was tested by comparing MMAS-8 scores of patients grouped by glycaemic control. The Arabic version showed adequate internal consistency (α = 0.70) and moderate split-half reliability (r = 0.65). Known-group validity was supported, as a significant association was found between medication adherence and glycaemic control, with a moderate effect size (ϕc = 0.34). The Arabic version displayed good psychometric properties and could support diabetes research and practice in Arab countries.
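The reliability statistics named in this abstract follow standard formulas: Cronbach's alpha, α = k/(k − 1) · (1 − Σσᵢ²/σ_X²), where σᵢ² are item variances and σ_X² is the total-score variance, and the Spearman-Brown formula, which projects a split-half correlation r to full-test reliability 2r/(1 + r). The minimal sketch below implements both on hypothetical toy data; it does not reproduce the study's analysis, and the abstract does not say whether its reported r = 0.65 is before or after the correction.

```python
from statistics import variance


def cronbach_alpha(scores: list[list[float]]) -> float:
    """Cronbach's alpha; rows are respondents, columns are items."""
    k = len(scores[0])
    # Sample variance of each item (column) and of the total scores.
    item_vars = [variance(col) for col in zip(*scores)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def spearman_brown(half_test_r: float, length_factor: float = 2.0) -> float:
    """Projected reliability when test length is multiplied by length_factor."""
    return length_factor * half_test_r / (1 + (length_factor - 1) * half_test_r)


# Hypothetical responses from five people on four Likert-type items.
responses = [
    [4, 5, 4, 4],
    [2, 3, 2, 3],
    [5, 5, 4, 5],
    [3, 3, 3, 2],
    [1, 2, 2, 1],
]
print(f"alpha = {cronbach_alpha(responses):.2f}")
print(f"split-half r = 0.50 -> {spearman_brown(0.50):.2f}")
```

With the default length factor of 2, `spearman_brown` gives the usual correction of a split-half correlation to the reliability of the full-length test.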
Combining computer adaptive testing technology with cognitively diagnostic assessment.

PubMed

McGlohen, Meghan; Chang, Hua-Hua

2008-08-01

A major advantage of computerized adaptive testing (CAT) is that it allows the test to home in on an examinee's ability level in an interactive manner. The aim of the new area of cognitive diagnosis is to provide information about specific content areas in which an examinee needs help. The goal of this study was to combine the benefit of specific feedback from cognitively diagnostic assessment with the advantages of CAT. In this study, three approaches to combining these were investigated: (1) item selection based on the traditional ability level estimate (theta), (2) item selection based on the attribute mastery feedback provided by cognitively diagnostic assessment (alpha), and (3) item selection based on both the traditional ability level estimate (theta) and the attribute mastery feedback provided by cognitively diagnostic assessment (alpha). The results from these three approaches were compared for theta estimation accuracy, attribute mastery estimation accuracy, and item exposure control. The theta- and alpha-based condition outperformed the alpha-based condition regarding theta estimation, attribute mastery pattern estimation, and item exposure control. Both the theta-based condition and the theta- and alpha-based condition performed similarly with regard to theta estimation, attribute mastery estimation, and item exposure control, but the theta- and alpha-based condition has an additional advantage in that it uses the shadow test method, which allows the administrator to incorporate additional constraints in the item selection process, such as content balancing, item type constraints, and so forth, and also to select items on the basis of both the current theta and alpha estimates, which can be built on top of existing 3PL testing programs.
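For the theta-based condition, a common way to select items in CAT is to choose the unused item with maximum Fisher information at the current ability estimate. The sketch below assumes the 3PL model in logistic metric without the 1.7 scaling constant and uses a hypothetical three-item pool and hypothetical function names; it illustrates generic theta-based selection only, not the shadow-test or alpha-based procedures compared in the study.

```python
import math


def p_3pl(theta: float, a: float, b: float, c: float) -> float:
    """3PL probability of a correct response (no 1.7 scaling constant)."""
    return c + (1 - c) / (1 + math.exp(-a * (theta - b)))


def info_3pl(theta: float, a: float, b: float, c: float) -> float:
    """Fisher information of a 3PL item at ability theta."""
    p = p_3pl(theta, a, b, c)
    return a ** 2 * ((1 - p) / p) * ((p - c) / (1 - c)) ** 2


def select_next_item(theta_hat: float,
                     pool: list[tuple[float, float, float]],
                     administered: set[int]) -> int:
    """Index of the unused item with maximum information at theta_hat."""
    candidates = [i for i in range(len(pool)) if i not in administered]
    return max(candidates, key=lambda i: info_3pl(theta_hat, *pool[i]))


# Hypothetical item pool: (a, b, c) parameters for each item.
pool = [(1.0, -2.0, 0.2), (1.2, 0.0, 0.2), (0.8, 2.0, 0.2)]
print(select_next_item(0.0, pool, administered=set()))
print(select_next_item(0.0, pool, administered={1}))
```

Pure maximum-information selection tends to overuse highly discriminating items, which is why operational CATs add exposure control and content or item-type constraints of the kind the abstract mentions.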
Effects of Test Level Discrimination and Difficulty on Answer-Copying Indices

ERIC Educational Resources Information Center

Sunbul, Onder; Yormaz, Seha

2018-01-01

In this study, the Type I error and power rates of the omega (ω) and GBT (generalized binomial test) indices were investigated for several nominal alpha levels and for 40- and 80-item test lengths with a sample size of 10,000 examinees under several test-level restrictions. As a result, Type I error rates of both indices were found to be below the acceptable…
The Effects of Repetition and Time of Post-Test Administration on EFL Learners' Form Recall of Single Words and Collocations

ERIC Educational Resources Information Center

Peters, Elke

2014-01-01

This article examines how form recall of target lexical items by learners of English as a foreign language (EFL) is affected (1) by repetition (1, 3, or 5 occurrences), (2) by the type of target item (single words versus collocations), and (3) by the time of post-test administration (immediately or one week after the learning session).…
Developing self-concept instrument for pre-service mathematics teachers

NASA Astrophysics Data System (ADS)

Afgani, M. W.; Suryadi, D.; Dahlan, J. A.

2018-01-01

This study aimed to develop a self-concept instrument for undergraduate students of mathematics education in Palembang, Indonesia. It was a developmental study of a non-test instrument in questionnaire form. Construct validity was tested using the Pearson product-moment correlation and factor analysis, and reliability was tested using Cronbach’s alpha. The instrument was tested on 65 undergraduate students of mathematics education at one of the universities in Palembang, Indonesia. The instrument consisted of 43 items covering 7 aspects of self-concept: individual concern, social identity, individual personality, view of the future, the influence of others who become role models, the influence of the environment inside or outside the classroom, and view of mathematics. The validity test showed that one item was invalid because its Pearson’s r (0.107) was less than the critical value (0.244; α = 0.05). This item belonged to the social identity aspect. After the invalid item was removed, the construct validity test using factor analysis yielded only one factor. The Kaiser-Meyer-Olkin (KMO) coefficient was 0.846 and the reliability coefficient was 0.91. From these results, we concluded that the 42-item self-concept instrument for undergraduate students of mathematics education in Palembang, Indonesia was valid and reliable.
TOP 08-2-111B Chemical, Biological, and Radiological (CBR) Contamination Survivability, Small Items of Equipment

DTIC Science & Technology

2016-03-16

e.g., mud, grease, and other). j. Pretest (baseline) and posttest (30 days after the first contamination and/or other defined long-term time...item surface condition (pretest and posttest ), materials of construction, paint type, and surface cleanliness (e.g., mud, grease, decontamination...penetrate. h. Pretest and posttest ME functional performance characteristics used as the measure of the test item’s mission performance before
Psychometric properties of the Arabic version of the 12-item diabetes fatalism scale

PubMed Central

Abi Kharma, Joelle

2018-01-01

Background There are widespread fatalistic beliefs in Arab countries, especially among individuals with diabetes. However, there is no tool to assess diabetes fatalism in this population. This study describes the processes used to create an Arabic version of the Diabetes Fatalism Scale (DFS) and to examine its psychometric properties. Methods A descriptive correlational design was used with a convenience sample of Lebanese adults (N = 274) with type 2 diabetes recruited from a major hospital in Beirut, Lebanon and by snowball sampling. The 12-item Diabetes Fatalism Scale-Arabic (12-item DFS-Ar) was back-translated from the original version, pilot tested on 22 adults with type 2 diabetes and then administered to 274 patients to assess the validity and reliability of the scale. Confirmatory factor analysis (CFA) was used to test the hypothesized factor structure. Cronbach’s alpha was used to test for reliability. Results CFA supported the three-factor structure hypothesized for the original DFS scale. The five items measuring “emotional distress” loaded on Factor 1, the four items measuring “spiritual coping” loaded on Factor 2 and the last three items measuring “perceived self-efficacy” of the original scale loaded on Factor 3 (p <0.001 for all three subscales). Goodness-of-fit indices confirmed the adequacy of the CFA model (CFI = 0.97, TLI = 0.96, RMSEA = 0.067 and pclose = 0.05). The 12-item DFS-Ar showed good reliability (Cronbach’s alpha of 0.86) and significantly predicted HbA1c (β = 0.20, p < 0.01). After adjusting for the demographic characteristics and the number of diabetes comorbid conditions, the 12-item DFS-Ar score was independently associated with HbA1c in a multivariable model (β = 0.16, p < 0.05). Conclusions The 12-item DFS-Ar demonstrated good psychometric properties that are comparable to those of the original scale. It is a valid and reliable measure of diabetes fatalism. 
Further testing with larger and non-Lebanese Arabic population is needed. PMID:29324827
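The abstract reports reliability as Cronbach's alpha (0.86 for the 12-item DFS-Ar). As a minimal sketch of how that coefficient is computed, assuming a complete respondents-by-items score matrix with no missing values, alpha compares the sum of the item variances with the variance of the total score. The function name `cronbach_alpha` and the toy data are illustrative only, not taken from the study.

```python
from statistics import variance


def cronbach_alpha(scores: list[list[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items matrix (no missing values).

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(scores[0])  # number of items
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item column
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of each respondent's total score
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Toy data: 3 respondents x 2 items (hypothetical, not study data)
print(cronbach_alpha([[1, 2], [2, 1], [3, 3]]))  # about 0.667
```

Because alpha is a ratio of variances, using sample or population variances consistently gives the same result.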
Ubiquitous testing using tablets: its impact on medical student perceptions of and engagement in learning

PubMed Central

Kim, Kyong-Jee; Hwang, Jee-Young

2016-01-01

Purpose: Ubiquitous testing has the potential to affect medical education by enhancing the authenticity of assessment through multimedia items. This study explored medical students’ experience with ubiquitous testing and its impact on student learning. Methods: A cohort (n=48) of third-year students at a medical school in South Korea participated in this study. The students were divided into two groups that received different versions of 10 content-matched items: a text version (the text group) or a multimedia version (the multimedia group). Multimedia items were delivered using tablets. Item response analyses were performed to compare item characteristics between the two versions. Additionally, focus group interviews were held to investigate the students’ experiences of ubiquitous testing. Results: The mean test score was significantly higher in the text group. Item difficulty and discrimination did not differ between text and multimedia items. The participants generally responded positively to ubiquitous testing. Still, they felt that the lectures they had taken in their preclinical years had not prepared them adequately for this type of assessment and that clinical encounters during clerkships were more helpful. To be better prepared, the participants felt that they needed to engage more actively in learning in clinical clerkships and to have more access to multimedia learning resources. Conclusion: Ubiquitous testing can positively affect student learning by reinforcing the importance of being able to understand and apply knowledge in clinical contexts, which drives students to engage more actively in learning in clinical settings. PMID:26838569
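The comparison of item difficulty and discrimination between the text and multimedia versions rests on standard item statistics. The abstract mentions item response analyses without details; as one classical illustration (not necessarily the study's method), difficulty can be taken as the proportion of examinees answering an item correctly, and discrimination as the upper-lower index, the difference in that proportion between the highest- and lowest-scoring groups (27% each is a conventional choice). Function names and data are hypothetical.

```python
def item_difficulty(scores: list[list[int]], item: int) -> float:
    """Proportion of examinees answering the item correctly (0/1 scoring)."""
    return sum(row[item] for row in scores) / len(scores)


def discrimination_index(scores: list[list[int]], item: int, frac: float = 0.27) -> float:
    """Upper-lower discrimination index D = p_upper - p_lower."""
    ranked = sorted(scores, key=sum, reverse=True)  # highest total score first
    g = max(1, round(frac * len(scores)))  # size of each extreme group
    upper, lower = ranked[:g], ranked[-g:]
    return (sum(row[item] for row in upper) - sum(row[item] for row in lower)) / g


# Hypothetical 0/1 responses of 4 examinees to 3 items
scores = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(item_difficulty(scores, 0), discrimination_index(scores, 0))  # 0.75 1.0
```

An IRT analysis would instead estimate difficulty and discrimination as model parameters, but the classical indices are a common first look.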
Validity and reliability of a Malay version of the brief illness perception questionnaire for patients with type 2 diabetes mellitus.

PubMed

Chew, Boon-How; Vos, Rimke C; Heijmans, Monique; Shariff-Ghazali, Sazlina; Fernandez, Aaron; Rutten, Guy E H M

2017-08-03

Illness perceptions involve the personal beliefs that patients have about their illness and may influence health behaviours considerably. Since an instrument to measure these perceptions for the Malay population in Malaysia is lacking, we translated and examined the psychometric properties of the Malay version of the Brief Illness Perception Questionnaire (MBIPQ) in adult patients with type 2 diabetes mellitus. The MBIPQ has nine items, all of which use a 0-10 response scale except the ninth item about causal factors, which is open-ended. A standard procedure was used to translate and adapt the English BIPQ into Malay. Construct validity was examined by comparing item scores with scores on the Diabetes Management Self-Efficacy Scale, the Morisky Medication Adherence Scale, the World Health Organization Quality of Life-brief, the 9-item Patient Health Questionnaire, the 17-item Diabetes Distress Scale, HbA1c and the presence of complications. In addition, 2-week and 4-week test-retest reliability were studied. A total of 312 patients completed the MBIPQ. Of these, 97 and 215 patients completed the 2-week and 4-week test-retest reliability questionnaires, respectively. Moderate inter-item correlations were observed between illness perception dimensions (r = -0.31 to 0.53). MBIPQ items showed the expected correlations with self-efficacy (r = 0.35), medication adherence (r = 0.29), quality of life (r = -0.17 to 0.31) and depressive symptoms (r = -0.18 to 0.21). People with severe diabetes-related distress were also more concerned (t-test = 4.01, p < 0.001) and experienced lower personal control (t-test = 2.07, p = 0.031). People with any diabetes-related complication perceived the consequences as more serious (t-test = 2.04, p = 0.044). The 2-week and 4-week test-retest reliabilities (ICC agreement) ranged from 0.39 to 0.70 and from 0.58 to 0.78, respectively. The psychometric properties of items in the MBIPQ are moderate. The MBIPQ showed good cross-cultural validity and moderate construct validity. Test-retest reliability was moderate. Despite the moderate psychometric properties, the MBIPQ may be useful in clinical practice as an instrument to elicit and discuss patients' personal thoughts and feelings. Future research is needed to establish its responsiveness and predictive validity. ClinicalTrials.gov NCT02730754 registered on March 29, 2016; NCT02730078 registered on March 29, 2016.
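The test-retest results are reported as ICC agreement values. The abstract does not state which ICC form was used; the sketch below assumes the common two-way, absolute-agreement, single-measure form, ICC(A,1) in McGraw and Wong's notation, computed from an ANOVA decomposition of a subjects-by-occasions matrix. Unlike a consistency ICC, it is penalized when scores shift systematically between occasions, which the toy example illustrates. The function name and data are hypothetical.

```python
def icc_agreement(data: list[list[float]]) -> float:
    """ICC(A,1): two-way, absolute agreement, single measure.

    data[i][j] = score of subject i on occasion (or rater) j; no missing values.
    """
    n, k = len(data), len(data[0])
    grand = sum(map(sum, data)) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(row[j] for row in data) / n for j in range(k)]
    # Sums of squares for subjects (rows), occasions (columns) and residual
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Hypothetical: every subject scores exactly 1 point higher at retest
print(icc_agreement([[1, 2], [2, 3], [3, 4]]))  # about 0.667
```

A consistency ICC would be 1.0 for the same data; the agreement form drops because the systematic shift counts as disagreement.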
Development of new selection tests for air traffic controllers.

DOT National Transportation Integrated Search

1977-12-01

This report describes the development of a new Multiplex Controller Aptitude Test for initial screening of FAA Air Traffic Controller applicants. Its content includes the traditional types of aptitude test items used for today's screening. In additio...
Force, velocity, and work: The effects of different contexts on students' understanding of vector concepts using isomorphic problems

NASA Astrophysics Data System (ADS)

Barniol, Pablo; Zavala, Genaro

2014-12-01

In this article we compare students' understanding of vector concepts in problems with no physical context and in problems with three mechanics contexts: force, velocity, and work. Based on our "Test of Understanding of Vectors," a multiple-choice test presented elsewhere, we designed two isomorphic shorter versions of 12 items each: a test with no physical context, and a test with mechanics contexts. For this study, we administered the items twice to students who were finishing an introductory mechanics course at a large private university in Mexico. The first time, we administered the two 12-item tests to 608 students. In the second administration, we tested only the items for which we had found differences in students' performance that were difficult to explain, and this time we asked students to show their reasoning in writing. In the first administration, we detected no significant difference between the medians obtained in the tests; however, we did identify significant differences in some of the items. For each item we analyze the type of difference found between the tests in the selection of the correct answer, the most common error on each of the tests, and the differences in the selection of incorrect answers. We also investigate the causes of the different context effects. Based on these analyses, we establish specific recommendations for the instruction of vector concepts in an introductory mechanics course. In the Supplemental Material we include both tests for other researchers studying vector learning, and for physics teachers who teach this material.
Evaluation of measurement equivalence of the Family Satisfaction with the End-of-Life Care in an ethnically diverse cohort: Tests of differential item functioning

PubMed Central

Teresi, Jeanne A; Ocepek-Welikson, Katja; Ramirez, Mildred; Kleinman, Marjorie; Ornstein, Katherine; Siu, Albert

2016-01-01

Background The Family Satisfaction with End-of-Life Care is an internationally used measure of satisfaction with cancer care. However, the Family Satisfaction with End-of-Life Care has not been studied for equivalence of item endorsement across different socio-demographic groups using differential item functioning. Aims The aims of this secondary data analysis were (1) to examine potential differential item functioning in the family satisfaction item set with respect to type of caregiver, race, and patient age, gender, and education and (2) to provide parameters and documentation of differential item functioning for an item bank. Design A mixed qualitative and quantitative analysis was conducted. A priori hypotheses regarding potential group differences in item response were established. Item response theory and Wald tests were used for the analyses of differential item functioning, accompanied by magnitude and impact measures. Results Very little significant differential item functioning was observed for patient's age and gender. For race, 13 items showed differential item functioning after multiple comparison adjustment, 10 with non-uniform differential item functioning. No items evidenced differential item functioning of high magnitude, and the impact was negligible. For education, 5 items evidenced uniform differential item functioning after adjustment, none of high magnitude. Differential item functioning impact was trivial. One item evidenced differential item functioning for the caregiver relationship variable. Conclusion Differential item functioning was observed primarily for race and education. No differential item functioning of high magnitude was observed for any item, and the overall impact of differential item functioning was negligible. One item, satisfaction with “the patient's pain relief,” might be singled out for further study, given that this item was both hypothesized and observed to show differential item functioning for race and education. 
PMID:25160692
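The DIF results are reported after multiple comparison adjustment, but the abstract does not say which adjustment was applied. As one illustration, assuming the Benjamini-Hochberg false discovery rate procedure (one commonly used option, not necessarily the one used here), per-item p-values can be adjusted and then compared with a nominal level. The function name and p-values are hypothetical.

```python
def benjamini_hochberg(pvals: list[float]) -> list[float]:
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Step up from the largest p-value, keeping adjusted values monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-item DIF p-values
p = [0.01, 0.04, 0.03]
adj = benjamini_hochberg(p)
flagged = [item for item, q in enumerate(adj) if q < 0.05]
print(adj, flagged)
```

Controlling the false discovery rate is less conservative than a Bonferroni correction, which matters when many items are tested for several grouping variables.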
The role of attention in item-item binding in visual working memory.

PubMed

Peterson, Dwight J; Naveh-Benjamin, Moshe

2017-09-01

An important yet unresolved question regarding visual working memory (VWM) relates to whether or not binding processes within VWM require additional attentional resources compared with processing solely the individual components comprising these bindings. Previous findings indicate that binding of surface features (e.g., colored shapes) within VWM is not demanding of resources beyond what is required for single features. However, it is possible that other types of binding, such as the binding of complex, distinct items (e.g., faces and scenes), in VWM may require additional resources. In 3 experiments, we examined VWM item-item binding performance under no load, articulatory suppression, and backward counting using a modified change detection task. Binding performance declined to a greater extent than single-item performance under higher compared with lower levels of concurrent load. The findings from each of these experiments indicate that processing item-item bindings within VWM requires a greater amount of attentional resources compared with single items. These findings also highlight an important distinction between the role of attention in item-item binding within VWM and previous studies of long-term memory (LTM) where declines in single-item and binding test performance are similar under divided attention. The current findings provide novel evidence that the specific type of binding is an important determining factor regarding whether or not VWM binding processes require attention. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

Repetition Blindness for Rotated Objects

ERIC Educational Resources Information Center

Hayward, William G.; Zhou, Guomei; Man, Wai-Fung; Harris, Irina M.

2010-01-01

Repetition blindness (RB) is the finding that observers often miss the repetition of an item within a rapid stream of words or objects. Recent studies have shown that RB for objects is largely unaffected by variations in viewpoint between the repeated items. In 5 experiments, we tested RB under different axes of rotation, with different types of…
Measuring Global Physical Health in Children with Cerebral Palsy: Illustration of a Multidimensional Bi-factor Model and Computerized Adaptive Testing

PubMed Central

Haley, Stephen M.; Ni, Pengsheng; Dumas, Helene M.; Fragala-Pinkham, Maria A.; Hambleton, Ronald K.; Montpetit, Kathleen; Bilodeau, Nathalie; Gorton, George E.; Watson, Kyle; Tucker, Carole A

2009-01-01

Purpose The purpose of this study was to apply a bi-factor model for the determination of test dimensionality and a multidimensional CAT using computer simulations of real data for the assessment of a new global physical health measure for children with cerebral palsy (CP). Methods Parent respondents of 306 children with cerebral palsy were recruited from four pediatric rehabilitation hospitals and outpatient clinics. We compared confirmatory factor analysis results across four models: (1) one-factor unidimensional; (2) two-factor multidimensional (MIRT); (3) bi-factor MIRT with fixed slopes; and (4) bi-factor MIRT with varied slopes. We tested whether the general and content (fatigue and pain) person score estimates could discriminate across severity and types of CP, and whether score estimates from a simulated CAT were similar to estimates based on the total item bank, and whether they correlated as expected with external measures. Results Confirmatory factor analysis suggested separate pain and fatigue sub-factors; all 37 items were retained in the analyses. From the bi-factor MIRT model with fixed slopes, the full item bank scores discriminated across levels of severity and types of CP, and compared favorably to external instruments. CAT scores based on 10- and 15-item versions accurately captured the global physical health scores. Conclusions The bi-factor MIRT CAT application, especially the 10- and 15-item version, yielded accurate global physical health scores that discriminated across known severity groups and types of CP, and correlated as expected with concurrent measures. The CATs have potential for collecting complex data on the physical health of children with CP in an efficient manner. PMID:19221892
The effectiveness of an intensive care quick reference checklist manual--a randomized simulation-based trial.

PubMed

Just, Katja S; Hubrich, Svenja; Schmidtke, Daniel; Scheifes, Andrea; Gerbershagen, Mark U; Wappler, Frank; Grensemann, Joern

2015-04-01

We aimed to test the effectiveness of checklists for emergency procedures on medical staff performance in intensive care crises. This is a prospective single-center randomized trial in a high-fidelity simulation center modeling an intensive care unit (ICU) in a tertiary care hospital in Germany. Teams consisted of 1 ICU resident and 2 ICU nurses (in total, n = 48). All completed 4 crisis scenarios, in which they were randomized to use checklists or to perform without any aid. In 2 of the scenarios, checklists could be used immediately (type 1 scenarios); in the remaining 2, further steps, for example confirming the diagnosis, were required first (type 2 scenarios). Outcome measures were the number of predefined items completed and the time to completion of more than 50% and more than 75% of steps. When using checklists, participants initiated items faster and more completely according to appropriate treatment guidelines (9 vs 7 items with and without checklists, P < .05). The benefit of checklists was greater in type 2 scenarios than in type 1 scenarios (2 vs 1 additional item, P < .05). In type 2 scenarios, 50% and 75% of items were completed faster with the use of checklists (P < .005). The use of checklists in ICU crises improves the completion of critical treatment steps. Within the type 2 scenarios, items were fulfilled faster with checklists. The implementation of checklists for intensive care crises is a promising approach that may improve patients' care. Copyright © 2014 Elsevier Inc. All rights reserved.
Improved Classification of Mammograms Following Idealized Training

PubMed Central

Hornsby, Adam N.; Love, Bradley C.

2014-01-01

People often make decisions by stochastically retrieving a small set of relevant memories. This limited retrieval implies that human performance can be improved by training on idealized category distributions (Giguère & Love, 2013). Here, we evaluate whether the benefits of idealized training extend to categorization of real-world stimuli, namely classifying mammograms as normal or tumorous. Participants in the idealized condition were trained exclusively on items that, according to a norming study, were relatively unambiguous. Participants in the actual condition were trained on a representative range of items. Despite being exclusively trained on easy items, idealized-condition participants were more accurate than those in the actual condition when tested on a range of item types. However, idealized participants experienced difficulties when test items were very dissimilar from training cases. The benefits of idealization, attributable to reducing noise arising from cognitive limitations in memory retrieval, suggest ways to improve real-world decision making. PMID:24955325
Improved Classification of Mammograms Following Idealized Training.

PubMed

Hornsby, Adam N; Love, Bradley C

2014-06-01

People often make decisions by stochastically retrieving a small set of relevant memories. This limited retrieval implies that human performance can be improved by training on idealized category distributions (Giguère & Love, 2013). Here, we evaluate whether the benefits of idealized training extend to categorization of real-world stimuli, namely classifying mammograms as normal or tumorous. Participants in the idealized condition were trained exclusively on items that, according to a norming study, were relatively unambiguous. Participants in the actual condition were trained on a representative range of items. Despite being exclusively trained on easy items, idealized-condition participants were more accurate than those in the actual condition when tested on a range of item types. However, idealized participants experienced difficulties when test items were very dissimilar from training cases. The benefits of idealization, attributable to reducing noise arising from cognitive limitations in memory retrieval, suggest ways to improve real-world decision making.
Release from output interference in recognition memory: A test of the attention hypothesis.

PubMed

Criss, Amy H; Salomão, Cristina; Malmberg, Kenneth J; Aue, William; Kılıç, Aslı; Claridge, MarkAvery

2018-05-01

Retrieval results in both costs and benefits to episodic memory. Output interference (OI) refers to the finding that episodic memory accuracy decreases with increasing test trials. Release from OI is the restoration of original accuracy at some point during the test. For example, a release from OI in recognition memory testing occurs when the semantic similarity between stimuli decreases midway through testing, suggesting that item representations stored on early trials cause interference on later test trials to the extent that the earlier items share features with the later items. In two recognition memory experiments, we demonstrate release from OI for words and faces. We also test whether release from OI is the result of interference or is due to a boost in attention caused by reorienting to a novel stimulus type. A test for the foils presented during the initial test list supports the interference account of OI. Implications for models of memory are discussed.
Overview of classical test theory and item response theory for the quantitative assessment of items in developing patient-reported outcomes measures.

PubMed

Cappelleri, Joseph C; Jason Lundy, J; Hays, Ron D

2014-05-01

The US Food and Drug Administration's guidance for industry document on patient-reported outcomes (PRO) defines content validity as "the extent to which the instrument measures the concept of interest" (FDA, 2009, p. 12). According to Strauss and Smith (2009), construct validity "is now generally viewed as a unifying form of validity for psychological measurements, subsuming both content and criterion validity" (p. 7). Hence, both qualitative and quantitative information are essential in evaluating the validity of measures. We review classical test theory and item response theory (IRT) approaches to evaluating PRO measures, including frequency of responses to each category of the items in a multi-item scale, the distribution of scale scores, floor and ceiling effects, the relationship between item response options and the total score, and the extent to which hypothesized "difficulty" (severity) order of items is represented by observed responses. If a researcher has little qualitative data and wants to get preliminary information about the content validity of the instrument, then descriptive assessments using classical test theory should be the first step. As the sample size grows during subsequent stages of instrument development, confidence in the numerical estimates from Rasch and other IRT models (as well as those of classical test theory) would also grow. Classical test theory and IRT can be useful in providing a quantitative assessment of items and scales during the content-validity phase of PRO-measure development. Depending on the particular type of measure and the specific circumstances, classical test theory and/or IRT should be considered to help maximize the content validity of PRO measures. Copyright © 2014 Elsevier HS Journals, Inc. All rights reserved.
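Two of the classical test theory checks listed above, floor and ceiling effects and the relationship between item responses and the total score, are simple to compute. The sketch below assumes a complete respondents-by-items matrix and uses the corrected item-total correlation (each item against the sum of the other items) as one common way to express that relationship; the function names and data are hypothetical.

```python
from statistics import mean


def floor_ceiling(totals: list[float], min_score: float, max_score: float) -> tuple[float, float]:
    """Percentage of respondents at the lowest and highest possible scale score."""
    n = len(totals)
    floor = 100 * sum(t == min_score for t in totals) / n
    ceiling = 100 * sum(t == max_score for t in totals) / n
    return floor, ceiling


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def corrected_item_total(scores: list[list[float]], item: int) -> float:
    """Correlation of one item with the total of the remaining items."""
    item_col = [row[item] for row in scores]
    rest = [sum(row) - row[item] for row in scores]  # exclude the item itself
    return pearson(item_col, rest)


# Hypothetical 0-10 scale totals
print(floor_ceiling([0, 0, 5, 10], 0, 10))  # (50.0, 25.0)
```

Excluding the item from the total avoids inflating the correlation with the item's own variance, which matters most for short scales.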
Development and Testing of the Church Environment Audit Tool.

PubMed

Kaczynski, Andrew T; Jake-Schoffman, Danielle E; Peters, Nathan A; Dunn, Caroline G; Wilcox, Sara; Forthofer, Melinda

2018-05-01

In this paper, we describe development and reliability testing of a novel tool to evaluate the physical environment of faith-based settings pertaining to opportunities for physical activity (PA) and healthy eating (HE). Tool development was a multistage process including a review of similar tools, stakeholder review, expert feedback, and pilot testing. Final tool sections included indoor opportunities for PA, outdoor opportunities for PA, food preparation equipment, kitchen type, food for purchase, beverages for purchase, and media. Two independent audits were completed at 54 churches. Interrater reliability (IRR) was determined with Kappa and percent agreement. Of 218 items, 102 were assessed for IRR and 116 could not be assessed because they were not present at enough churches. Percent agreement for all 102 items was over 80%. For 42 items, the sample was too homogeneous to assess Kappa. Forty-six of the remaining items had Kappas greater than 0.60 (25 items 0.80-1.00; 21 items 0.60-0.79), indicating substantial to almost perfect agreement. The tool proved reliable and efficient for assessing church environments and identifying potential intervention points. Future work can focus on applications within faith-based partnerships to understand how church environments influence diverse health outcomes.
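The interrater reliability analysis pairs percent agreement with Kappa and notes that Kappa could not be assessed for items where the sample was too homogeneous. A minimal sketch for two auditors, assuming Cohen's kappa on categorical ratings (the abstract says only "Kappa"), shows why: when both auditors record the same single category for every church, chance agreement equals 1 and the kappa formula would divide by zero. Function names and ratings are hypothetical.

```python
from collections import Counter
from typing import Optional


def percent_agreement(a: list[str], b: list[str]) -> float:
    """Percentage of units on which the two auditors recorded the same value."""
    return 100 * sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a: list[str], b: list[str]) -> Optional[float]:
    """Cohen's kappa for two raters; None when kappa is undefined."""
    n = len(a)
    p_obs = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement from each rater's marginal category proportions
    p_exp = sum(count_a[c] * count_b[c] for c in set(a) | set(b)) / n**2
    if p_exp == 1:
        return None  # both raters used one identical category: too homogeneous
    return (p_obs - p_exp) / (1 - p_exp)


audit_1 = ["yes", "yes", "no", "no"]
audit_2 = ["yes", "no", "no", "no"]
print(percent_agreement(audit_1, audit_2), cohens_kappa(audit_1, audit_2))  # 75.0 0.5
```

The example also shows why percent agreement alone can look reassuring: 75% raw agreement corresponds to a kappa of only 0.5 once chance agreement is removed.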
Storage and retrieval properties of dual codes for pictures and words in recognition memory.

PubMed

Snodgrass, J G; McClure, P

1975-09-01

Storage and retrieval properties of pictures and words were studied within a recognition memory paradigm. Storage was manipulated by instructing subjects either to image or to verbalize to both picture and word stimuli during the study sequence. Retrieval was manipulated by re-presenting a proportion of the old picture and word items in their opposite form during the recognition test (i.e., some old pictures were tested with their corresponding words and vice versa). Recognition performance for pictures was identical under the two instructional conditions, whereas recognition performance for words was markedly superior under the imagery instruction condition. It was suggested that subjects may engage in dual coding of simple pictures naturally, regardless of instructions, whereas dual coding of words may occur only under imagery instructions. The form of the test item had no effect on recognition performance for either type of stimulus and under either instructional condition. However, change of form of the test item markedly reduced item-by-item correlations between the two instructional conditions. It is tentatively proposed that retrieval is required in recognition, but that the effect of a form change is simply to make the retrieval process less consistent, not less efficient.
A Comparison of the Rasch Separate Calibration and Between-Fit Methods of Detecting Item Bias.

ERIC Educational Resources Information Center

Smith, Richard M.

1996-01-01

The separate calibration t-test approach of B. Wright and M. Stone (1979) and the common calibration between-fit approach of B. Wright, R. Mead, and R. Draba (1976) appeared to have similar Type I error rates and similar power to detect item bias within a Rasch framework. (SLD)
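The separate calibration approach compares an item's Rasch difficulty estimated separately in two groups. A minimal sketch, assuming both calibrations have already been placed on a common scale and that each estimate comes with its standard error, computes the usual standardized difference t = (d1 - d2) / sqrt(SE1^2 + SE2^2). The function name and numbers are hypothetical, and the critical value used to flag bias is left to the analyst.

```python
from math import sqrt


def separate_calibration_t(d_ref: float, se_ref: float,
                           d_focal: float, se_focal: float) -> float:
    """Standardized difference between two group-specific item difficulties."""
    return (d_ref - d_focal) / sqrt(se_ref**2 + se_focal**2)


# Hypothetical difficulties (logits) and standard errors for one item
t = separate_calibration_t(0.5, 0.3, -0.1, 0.4)
print(round(t, 2))  # 1.2
```

In practice the statistic is computed for every item, so the multiple-testing concerns raised for DIF detection apply here as well.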
The Complete Automation of the Minnesota Multiphasic Personality Inventory and a Study of its Response Latency.

ERIC Educational Resources Information Center

Dunn, Thomas G.; And Others

The feasibility of completely automating the Minnesota Multiphasic Personality Inventory (MMPI) was tested, and item response latencies were compared with other MMPI item characteristics. A total of 26 scales were successfully scored automatically for 165 subjects. The program also typed a Mayo Clinic interpretive report on a computer terminal,…
Modeling Skipped and Not-Reached Items Using IRTrees

ERIC Educational Resources Information Center

Debeer, Dries; Janssen, Rianne; De Boeck, Paul

2017-01-01

When dealing with missing responses, two types of omissions can be discerned: items can be skipped or not reached by the test taker. When the occurrence of these omissions is related to the proficiency process, the missingness is nonignorable. The purpose of this article is to present a tree-based IRT framework for modeling responses and omissions…
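The distinction between skipped and not-reached items can be made operational before any tree model is fit. A minimal sketch, assuming the common convention that omissions after the last answered item are not reached and earlier omissions are skipped, recodes each item into pseudo-items for a hypothetical three-node tree (reached, responded, correct). The node structure and function names are illustrative assumptions, not the authors' specification, which the truncated abstract does not give.

```python
from typing import Optional


def classify_omissions(responses: list[Optional[int]]) -> list[str]:
    """Label each item as answered, skipped, or not reached (None = missing)."""
    answered = [i for i, r in enumerate(responses) if r is not None]
    last = answered[-1] if answered else -1  # position of last answered item
    labels = []
    for i, r in enumerate(responses):
        if r is not None:
            labels.append("answered")
        elif i < last:
            labels.append("skipped")  # missing, but a later item was answered
        else:
            labels.append("not_reached")  # missing run at the end of the test
    return labels


def to_pseudo_items(responses: list[Optional[int]]) -> list[tuple]:
    """Hypothetical tree coding: (reached, responded, correct); None = node not visited."""
    coded = []
    for r, label in zip(responses, classify_omissions(responses)):
        if label == "not_reached":
            coded.append((0, None, None))
        elif label == "skipped":
            coded.append((1, 0, None))
        else:
            coded.append((1, 1, r))
    return coded


print(to_pseudo_items([1, None, 0, None, None]))
# [(1, 1, 1), (1, 0, None), (1, 1, 0), (0, None, None), (0, None, None)]
```

Each pseudo-item column can then be modeled with its own latent dimension, which is what lets omission propensities correlate with proficiency.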
41 CFR 101-27.204 - Types of shelf-life items.

Code of Federal Regulations, 2010 CFR

2010-07-01

... 41 Public Contracts and Property Management 2 2010-07-01 2010-07-01 true Types of shelf-life items...-Management of Shelf-Life Materials § 101-27.204 Types of shelf-life items. Shelf-life items are classified as nonextendable (Type I) and extendable (Type II). Type I items have a definite storage life after which the item...
41 CFR 101-27.204 - Types of shelf-life items.

Code of Federal Regulations, 2011 CFR

2011-07-01

... 41 Public Contracts and Property Management 2 2011-07-01 2007-07-01 true Types of shelf-life items...-Management of Shelf-Life Materials § 101-27.204 Types of shelf-life items. Shelf-life items are classified as nonextendable (Type I) and extendable (Type II). Type I items have a definite storage life after which the item...
41 CFR 101-27.204 - Types of shelf-life items.

Code of Federal Regulations, 2014 CFR

2014-07-01

... 41 Public Contracts and Property Management 2 2014-07-01 2012-07-01 true Types of shelf-life items...-Management of Shelf-Life Materials § 101-27.204 Types of shelf-life items. Shelf-life items are classified as nonextendable (Type I) and extendable (Type II). Type I items have a definite storage life after which the item...
41 CFR 101-27.204 - Types of shelf-life items.

Code of Federal Regulations, 2013 CFR

2013-07-01

... 41 Public Contracts and Property Management 2 2013-07-01 2012-07-01 true Types of shelf-life items...-Management of Shelf-Life Materials § 101-27.204 Types of shelf-life items. Shelf-life items are classified as nonextendable (Type I) and extendable (Type II). Type I items have a definite storage life after which the item...
Curriculum Type as a Differentiating Factor in Medical Licensing Examinations.

ERIC Educational Resources Information Center

Shen, Linjun

This study assessed the effects of the type of medical curriculum on differential item functioning (DIF) and group differences at the test level in Level 1 of the Comprehensive Osteopathic Medical Licensing Examinations (COMLEX). The study also explored the relationship of the DIF and group differences at the test level. There are generally two…
ERP Subsequent Memory Effects Differ between Inter-Item and Unitization Encoding Tasks

PubMed Central

Kamp, Siri-Maria; Bader, Regine; Mecklinger, Axel

2017-01-01

The “subsequent memory paradigm” is an analysis tool to identify brain activity elicited during episodic encoding that is associated with successful subsequent retrieval. Two commonly observed event-related potential “subsequent memory effects” (SMEs) are the parietal SME in the P300 time window and the frontal slow wave SME, but to date a clear characterization of the circumstances under which each SME is observed is missing. To test the hypothesis that the parietal SME occurs when aspects of an experience are unitized into a single item representation, while inter-item associative encoding is reflected in the frontal slow wave effect, participants were assigned to one of two conditions that emphasized one of the encoding types under otherwise matched study phases of a recognition memory experiment. Word pairs were presented either in the context of a definition that allowed the word pairs to be combined into a new concept (unitization or item encoding) or together with a sentence frame (inter-item encoding). Performance on the recognition test did not differ between the groups. The parietal SME was only found in the definition group, supporting the idea that this SME occurs when the components of an association are integrated in a unitized item representation. An early prefrontal negativity also exhibited an SME only in this group, suggesting that the formation of novel units occurs through interactions of multiple brain areas. The frontal slow wave SME was pronounced in both groups and may thus reflect processes generally involved in encoding of associations. Our results provide evidence for a partial dissociation of the eliciting conditions of the two types of SMEs and therefore provide a tool for future studies to characterize the different types of episodic encoding. PMID:28194105
ERP Subsequent Memory Effects Differ between Inter-Item and Unitization Encoding Tasks.

PubMed

Kamp, Siri-Maria; Bader, Regine; Mecklinger, Axel

2017-01-01

The "subsequent memory paradigm" is an analysis tool to identify brain activity elicited during episodic encoding that is associated with successful subsequent retrieval. Two commonly observed event-related potential "subsequent memory effects" (SMEs) are the parietal SME in the P300 time window and the frontal slow wave SME, but to date a clear characterization of the circumstances under which each SME is observed is missing. To test the hypothesis that the parietal SME occurs when aspects of an experience are unitized into a single item representation, while inter-item associative encoding is reflected in the frontal slow wave effect, participants were assigned to one of two conditions that emphasized one of the encoding types under otherwise matched study phases of a recognition memory experiment. Word pairs were presented either in the context of a definition that allowed the word pairs to be combined into a new concept (unitization or item encoding) or together with a sentence frame (inter-item encoding). Performance on the recognition test did not differ between the groups. The parietal SME was only found in the definition group, supporting the idea that this SME occurs when the components of an association are integrated in a unitized item representation. An early prefrontal negativity also exhibited an SME only in this group, suggesting that the formation of novel units occurs through interactions of multiple brain areas. The frontal slow wave SME was pronounced in both groups and may thus reflect processes generally involved in encoding of associations. Our results provide evidence for a partial dissociation of the eliciting conditions of the two types of SMEs and therefore provide a tool for future studies to characterize the different types of episodic encoding.
Mere Gifting: Liking a Gift More Because It Is Shared.

PubMed

Polman, Evan; Maglio, Sam J

2017-11-01

We investigated a type of mere similarity that describes owning the same item as someone else. Moreover, we examined this mere similarity in a gift-giving context, whereby givers gift something that they also buy for themselves (a behavior we call "companionizing"). Using a Heiderian account of balancing unit-sentiment relations, we tested whether gift recipients like gifts more when gifts are companionized. Akin to mere ownership, which describes people liking their possessions more merely because they own them, we tested a complementary prediction: whether people like their possessions more merely because others own them too. Thus, in a departure from previous work, we examined a type of similarity based on two people sharing the same material item. We find that this type of sharing causes gift recipients to like their gifts more, and feel closer to gift givers.

Tests for Adult Basic Education Teachers. "28 Suggestions for Classroom Teachers".

ERIC Educational Resources Information Center

Vonderhaar, Kathleen; And Others

An updated and improved listing of test and measurement items useful in Adult Basic Education Classrooms is provided. Diagnostic, placement, achievement, and group and individual intelligence tests are reviewed. Information on test type and purpose, appropriate grade level, test time, number of forms, the manual, scoring, and format is included.…
Evaluation of the Multiple Sclerosis Walking Scale-12 (MSWS-12) in a Dutch sample: Application of item response theory.

PubMed

Mokkink, Lidwine Brigitta; Galindo-Garre, Francisca; Uitdehaag, Bernard Mj

2016-12-01

The Multiple Sclerosis Walking Scale-12 (MSWS-12) measures walking ability from the patients' perspective. We examined the quality of the MSWS-12 using an item response theory model, the graded response model (GRM). A total of 625 unique Dutch multiple sclerosis (MS) patients were included. After testing for unidimensionality, monotonicity, and absence of local dependence, a GRM was fit and item characteristics were assessed. Differential item functioning (DIF) for the variables gender, age, duration of MS, type of MS, and severity of MS was investigated, as were reliability, total test information, and the standard error of the trait level (θ). Confirmatory factor analysis showed a unidimensional structure of the 12 items of the scale, explaining 88% of the variance. Item 2 did not fit the GRM. Reliability was 0.93. Items 8 and 9 (of the 11- and 12-item versions, respectively) showed DIF on the variable severity, based on the Expanded Disability Status Scale (EDSS). However, the EDSS is strongly related to the content of both items. Our results confirm the good quality of the MSWS-12. The trait level (θ) scores and item parameters of both the 12- and 11-item versions were highly comparable, although we do not suggest changing the content of the MSWS-12. © The Author(s), 2016.
PEMFC for aeronautic applications: A review on the durability aspects

NASA Astrophysics Data System (ADS)

Dyantyi, Noluntu; Parsons, Adrian; Sita, Cordellia; Pasupathi, Sivakumar

2017-11-01

Proton exchange membrane fuel cells (PEMFC) not only offer more efficient electrical energy conversion relative to on-ground/backup turbines, but also generate by-products useful in aircraft, such as heat for ice prevention, deoxygenated air for fire retardation, and drinkable water for use on board. Consequently, several projects (e.g. DLR-H2 Antares and RAPID2000) have successfully tested PEMFC-powered auxiliary power units (APUs) for manned/unmanned aircraft. Despite progress from flying PEMFC-powered small aircraft with a 20 kW power output at up to 1 000 m and 100 km/h to 33 kW at 2 558 m and 176 km/h [1, 2, 3], durability and reliability remain key challenges. This review reports on the inadequate understanding of PEMFC behaviour under aeronautic conditions and the lack of predictive methods suitable for aircraft that provide real-time information on the State of Health (SoH) of PEMFCs.
- To minimize performance loss due to high altitude and inclination by adjusting the cathode stoichiometric ratio.
- To improve the quality of oxygen-depleted air by controlling operating temperature and stoichiometric ratio.
- Need to devise real-time prediction methods suitable for determining PEMFC SoH in aircraft.
Why are Optimists Optimistic?

PubMed Central

Sohl, Stephanie J.; Moyer, Anne; Lukin, Konstantin; Knapp-Oliver, Sarah K.

2012-01-01

This study examined what is brought to mind when responding to the items comprising a measure of dispositional optimism. Participants (N = 113) completed the Life Orientation Test and the COPE, a measure of coping style, and described why they responded the way they did to the items assessing optimism. Participants’ explanations comprised eight types of reasoning: (1) faith in a higher power; (2) belief in fate or a just world; (3) personal fortune; (4) belief in the role of one’s own ability; (5) reliance on idioms; (6) beliefs about the usefulness of thinking optimistically; (7) matter-of-fact statements; and (8) a feeling, intuition, or hope. These types were also related to coping styles. Responses to positively-worded items were explained with respect to external forces and responses to negatively-worded items were explained with respect to internal forces. Understanding how people explain their optimism may be the first step in fostering this outlook. PMID:23239937
Monkey Visual Short-Term Memory Directly Compared to Humans

PubMed Central

Elmore, L. Caitlin; Wright, Anthony A.

2015-01-01

Two adult rhesus monkeys were trained to detect which item in an array of memory items had changed, using the same stimuli, viewing times, and delays as used with humans. Although the monkeys were extensively trained, they were less accurate than humans with the same array sizes (2, 4, & 6 items), with both stimulus types (colored squares, clip art), and showed calculated memory capacities of about one item (or less). Nevertheless, the memory results from both monkeys and humans for both stimulus types were well characterized by the inverse power-law of display size. This characterization provides a simple and straightforward summary of a fundamental process of visual short-term memory (how VSTM declines with memory load) that emphasizes species similarities based upon similar functional relationships. By more closely matching monkey testing parameters to those of humans, the similar functional relationships strengthen the evidence suggesting similar processes underlying monkey and human VSTM. PMID:25706544
Old and New Ideas for Data Screening and Assumption Testing for Exploratory and Confirmatory Factor Analysis

PubMed Central

Flora, David B.; LaBrish, Cathy; Chalmers, R. Philip

2011-01-01

We provide a basic review of the data screening and assumption testing issues relevant to exploratory and confirmatory factor analysis along with practical advice for conducting analyses that are sensitive to these concerns. Historically, factor analysis was developed for explaining the relationships among many continuous test scores, which led to the expression of the common factor model as a multivariate linear regression model with observed, continuous variables serving as dependent variables, and unobserved factors as the independent, explanatory variables. Thus, we begin our paper with a review of the assumptions for the common factor model and data screening issues as they pertain to the factor analysis of continuous observed variables. In particular, we describe how principles from regression diagnostics also apply to factor analysis. Next, because modern applications of factor analysis frequently involve the analysis of the individual items from a single test or questionnaire, an important focus of this paper is the factor analysis of items. Although the traditional linear factor model is well-suited to the analysis of continuously distributed variables, commonly used item types, including Likert-type items, almost always produce dichotomous or ordered categorical variables. We describe how relationships among such items are often not well described by product-moment correlations, which has clear ramifications for the traditional linear factor analysis. An alternative, non-linear factor analysis using polychoric correlations has become more readily available to applied researchers and thus more popular. Consequently, we also review the assumptions and data-screening issues involved in this method. Throughout the paper, we demonstrate these procedures using an historic data set of nine cognitive ability variables. PMID:22403561
The development of a computer assisted instruction and assessment system in pharmacology.

PubMed

Madsen, B W; Bell, R C

1977-01-01

We describe the construction of a computer-based system for instruction and assessment in pharmacology, utilizing a large bank of multiple-choice questions. Items were collected from many sources, edited, and coded for student suitability, topic, taxonomy, difficulty, and text references. Students reserve a time during the day and specify the type of test desired; questions are then presented randomly from the subset satisfying their criteria. Answers are scored after each question and a summary is given at the end of every test; details on item performance are recorded automatically. The biggest hurdle in implementation was the assembly, review, classification and editing of items, while the programming was relatively straightforward. A number of modifications had to be made to the initial plans, and changes will undoubtedly continue with further experience. When fully operational, the system will possess a number of advantages, including elimination of test preparation, editing and marking; facilitated item review opportunities; increased objectivity, feedback and flexibility; and decreased anxiety in students.
An Inexpensive System for Producing Examinations with Minimal Typing and Proofreading.

ERIC Educational Resources Information Center

Mershon, Donald H.

1982-01-01

Describes a method for increasing efficiency of examination production which uses file cards to store and organize test items. The process of reproducing tests directly from master copies made with file cards is discussed. (AM)
Evaluation of Two Types of Differential Item Functioning in Factor Mixture Models with Binary Outcomes

ERIC Educational Resources Information Center

Lee, HwaYoung; Beretvas, S. Natasha

2014-01-01

Conventional differential item functioning (DIF) detection methods (e.g., the Mantel-Haenszel test) can be used to detect DIF only across observed groups, such as gender or ethnicity. However, research has found that DIF is not typically fully explained by an observed variable. True sources of DIF may include unobserved, latent variables, such as…
Universal Documentation System

DTIC Science & Technology

2012-07-01

Follow preparation instructions in Section 5.2.1 for the entries ITEM NO. and REMARKS. FLOTATION DURATION: Enter flotation duration of the test unit...Teflon, carbon steel, copper and copper alloys, and stainless steel (martensitic, ferritic, austenitic). • Quantity: Enter the quantity of components...DESCRIPTION 1470 ITEM NO.: FLOTATION DURATION: ELECTRONIC AIDS • TYPE: • POWER OUT (WATTS): • FREQUENCY (MHz): C-22 • MODULATION
Relationship of college student characteristics and inquiry-based geometrical optics instruction to knowledge of image formation with light-ray tracing

NASA Astrophysics Data System (ADS)

Isik, Hakan

This study is premised on the fact that student conceptions of optics appear to be unrelated to the student characteristics of gender, age, years since high school graduation, or previous academic experiences. This study investigated the relationships between student characteristics and student performance on image formation test items, and the changes in student conceptions of optics after an introductory inquiry-based physics course. Data were collected from 39 college students enrolled in an inquiry-based physics course teaching topics of geometrical optics. Student data concerning characteristics and previous experiences with optics and mathematics were collected. Student understanding of optics for pinholes, plane mirrors, refraction, and convex lenses was assessed with the Test of Image Formation with Light-Ray Tracing instrument. Total scale and subscale scores representing the optics instrument content were derived from student pretest and posttest responses. The types of knowledge needed to answer each optics item correctly were categorized as situational, conceptual, procedural, and strategic knowledge. These types of knowledge were associated with students' correct and incorrect responses to each item to explain the existence of, and changes in, students' scientific and naive conceptions. Correlation and stepwise multiple regression analyses were conducted to identify the student characteristics and academic experiences that significantly predicted scores on the subscales of the test. The results showed that student experience with calculus was a significant predictor of student performance on the total scale as well as on the refraction subscale of the Test of Image Formation with Light-Ray Tracing. A combination of student age and previous academic experience with precalculus was a significant predictor of student performance on the pretest pinhole subscale.
The student characteristic of years since high school graduation significantly predicted the gain in student scores on pinhole and plane-mirror items from the pretest to the posttest, with the most recent high school graduates doing better. Multivariate and univariate analyses of variance of the Test of Image Formation with Light-Ray Tracing pinhole scale and individual item changes from the pretest to the posttest revealed statistically significant mean differences between total scores as well as between various individual pinhole items. There were no significant changes for individual plane-mirror items from pretest to posttest. Results revealed a perceivable relationship between student optics-content knowledge and the types of knowledge required by items. At the pretest, wrong responses were selected most often on items requiring situational knowledge and least often on items requiring procedural knowledge.
Student selection of wrong options for each item revealed the following naive optics conceptions: pinholes do not create reversed images (pretest); the size and sharpness of pinhole images are related to the focus of a pinhole camera (pretest and posttest); the propagation of light rays is interpreted as radial rather than directional (pretest and posttest); there is no conception of image formation and observation for parallel mirrors (pretest and posttest); the place of an image depends on the position of the observer (pretest and posttest); a plane mirror reflects the images of objects placed on one side of the mirror, and observers positioned on the other side of the mirror can see them (pretest and posttest); the law of reflection is applied to plane mirrors without considering variations in the angles of incidence and reflection (pretest and posttest); and image observation is confused with image formation in mirrors placed perpendicular to one another (pretest and posttest). Future research should focus on the acquisition, development, and identification of reliable measures of optics concepts, processes, types of knowledge, and specific optics understanding (i.e., pinhole, plane-mirror). It should also focus on identifying the more critical concepts, such as changes in the size and sharpness of pinhole images, image observation, image formation in general, and image formation and observation in parallel mirrors. Future research can be conducted with a larger set of participants to compare different instructional methods and address instructional deficiencies using more efficient statistical methods. Comparative studies can be conducted to investigate the effects of various instructional strategies on student conceptions of optics.
Study deviance-type scale in the development of Korean elder.

PubMed

Cho, Gun-Sang; Yi, Eun-Surk; Hwang, Hee-Jeong

2015-12-01

This research aims to develop a questionnaire on deviant behavior for Korean elderly people, which may contribute substantially to the examination of deviant behavior among the elderly and provide a methodological basis for such work. To accomplish the purpose of this study, there were three stages: (a) drafting preliminary question items, (b) refining the items of the scale through a pilot study, and (c) finalizing question items through a main survey. In the first stage, 43 question items were developed using an open-ended questionnaire and structural inquiry of succession from 137 elderly people over 65 years of age. In the second stage, pilot testing was performed through exploratory factor analysis and a reliability test on data collected from 200 elderly people, yielding a 27-item self-report questionnaire. In the main survey, conducted with 184 elderly people, 21 items consisting of four subfactors were finalized to measure deviant behaviors of Korean elderly people: social deviance (n=8), economic deviance (n=5), psychological deviance (n=5), and physical deviance (n=3).
Assessing Student Preparation through Placement Tests

NASA Astrophysics Data System (ADS)

McFate, Craig; Olmsted, John, III

1999-04-01

The chemistry department at California State University, Fullerton, uses a placement test of its own design to assess student readiness to enroll in General Chemistry. This test contains items designed to test cognitive skills more than factual knowledge. We have analyzed the ability of this test to predict student success (defined as passing the first-semester course with a C or better) using data for 845 students from four consecutive semesters. In common with other placement tests, we find a weak but statistically significant correlation between test performance and course grades. More meaningfully, there is a strong correlation (R² = 0.82) between test score and course success, sufficient to use for counseling purposes. An item analysis was conducted to determine what types of questions provide the best predictability. Six questions from the full set of 25 were identified as strong predictors, on the basis of discrimination indices and coefficients of determination that were more than one standard deviation above the mean values for test items. These questions had little in common except for requiring multistep mathematical operations and formal reasoning.
Development of short-form measures to assess four types of elder mistreatment: Findings from an evidence-based study of APS elder abuse substantiation decisions.

PubMed

Beach, Scott R; Liu, Pi-Ju; DeLiema, Marguerite; Iris, Madelyn; Howe, Melissa J K; Conrad, Kendon J

2017-01-01

Improving the standardization and efficiency of adult protective services (APS) investigations is a top priority in APS practice. Using data from the Elder Abuse Decision Support System (EADSS), we developed short-form measures of four types of elder abuse: financial, emotional/psychological, physical, and neglect. The EADSS data set contains 948 elder abuse cases (age 60+) with yes/no abuse substantiation decisions for each abuse type following a 30-day investigation. Item sensitivity/specificity analyses were conducted on long-form items with the substantiation decision for each abuse type as the criterion. Validity was further tested using receiver operating characteristic (ROC) curve analysis, correlation with long forms and internal consistency. The four resulting short-form measures, containing 36 of the 82 original items, have validity similar to the original long forms. These short forms can be used to standardize and increase efficiency of APS investigations, and may also offer researchers new options for brief elder abuse assessments.
Development of short-form measures to assess four types of elder mistreatment: Findings from an evidence-based study of APS elder abuse substantiation decisions

PubMed Central

Beach, Scott R.; Liu, Pi-Ju; DeLiema, Marguerite; Iris, Madelyn; Howe, Melissa J.K.; Conrad, Kendon J.

2018-01-01

Improving the standardization and efficiency of adult protective services (APS) investigations is a top priority in APS practice. Using data from the Elder Abuse Decision Support System (EADSS), we developed short-form measures of four types of elder abuse: financial, emotional/psychological, physical, and neglect. The EADSS data set contains 948 elder abuse cases (age 60+) with yes/no abuse substantiation decisions for each abuse type following a 30-day investigation. Item sensitivity/specificity analyses were conducted on long-form items with the substantiation decision for each abuse type as the criterion. Validity was further tested using receiver operating characteristic (ROC) curve analysis, correlation with long forms and internal consistency. The four resulting short-form measures, containing 36 of the 82 original items, have validity similar to the original long forms. These short forms can be used to standardize and increase efficiency of APS investigations, and may also offer researchers new options for brief elder abuse assessments. PMID:28590799
Measuring the effects of online health information: Scale validation for the e-Health Impact Questionnaire.

PubMed

Kelly, Laura; Ziebland, Sue; Jenkinson, Crispin

2015-11-01

Health-related websites have developed into much more than information sites: they are used to exchange experiences and find support as well as information and advice. This paper documents the development of a tool to compare the potential consequences and experiences a person may encounter when using health-related websites. Questionnaire items were developed following a review of relevant literature and a qualitative secondary analysis of interviews relating to experiences of health. Item reduction steps were performed on pilot survey data (n=167). Tests of validity and reliability were subsequently performed (n=170) to determine the psychometric properties of the questionnaire. Two independent item pools entered psychometric testing: (1) items relating to general views of using the internet in relation to health, and (2) items relating to the consequences of using a specific health-related website. Identified subscales were found to have high construct validity, internal consistency and test-retest reliability. Analyses confirmed good psychometric properties in the eHIQ-Part 1 (11 items) and the eHIQ-Part 2 (26 items). This tool will facilitate the measurement of the potential consequences of using websites containing different types of material (scientific facts and figures, blogs, experiences, images) across a range of health conditions. Copyright © 2015 The Authors. Published by Elsevier Ireland Ltd. All rights reserved.
Design of a Microcomputer-Based Adaptive Testing System.

ERIC Educational Resources Information Center

Vale, C. David

This paper explores the feasibility of developing a single-user microcomputer-based testing system. Testing literature was surveyed to discover types of test items that might be used in the system and to compile a list of strategies that such a system might use. Potential users were surveyed. Several were interviewed, and a questionnaire was…
An empirically derived short form of the Hypoglycaemia Fear Survey II.

PubMed

Grabman, J; Vajda Bailey, K; Schmidt, K; Cariou, B; Vaur, L; Madani, S; Cox, D; Gonder-Frederick, L

2017-04-01

To develop an empirically derived short version of the Hypoglycaemia Fear Survey II that still accurately measures fear of hypoglycaemia. Item response theory methods were used to generate an 11-item version of the Hypoglycaemia Fear Survey from a sample of 487 people with Type 1 or Type 2 diabetes mellitus. Subsequently, this scale was tested on a sample of 2718 people with Type 1 or insulin-treated Type 2 diabetes taking part in DIALOG, a large observational prospective study of hypoglycaemia in France. The short form of the Hypoglycaemia Fear Survey II matched the factor structure of the long form for respondents with both Type 1 and Type 2 diabetes, while maintaining adequate internal reliability on the total scale and all three subscales. The two forms were highly correlated on both the total scale and each subscale (Pearson's r > 0.89). The short form of the Hypoglycaemia Fear Survey II is an important first step in more efficiently measuring fear of hypoglycaemia. Future prospective studies are needed for further validity testing and exploring the survey's applicability to different populations. © 2016 Diabetes UK.
Comparison of trait and ability measures of emotional intelligence in medical students.

PubMed

Brannick, Michael T; Wahi, Monika M; Arce, Melissa; Johnson, Hazel-Anne; Nazian, Stanley; Goldin, Steven B

2009-11-01

Emotional intelligence (EI), the ability to perceive emotions in the self and others, and to understand, regulate and use such information in productive ways, is believed to be important in health care delivery for both recipients and providers of health care. There are two types of EI measure: ability and trait. Ability and trait measures differ in terms of both the definition of constructs and the methods of assessment. Ability measures conceive of EI as a capacity that spans the border between reason and feeling. Items on such a measure include showing a person a picture of a face and asking what emotion the pictured person is feeling; such items are scored by comparing the test-taker's response to a keyed emotion. Trait measures include a very large array of non-cognitive abilities related to success, such as self-control. Items on such measures ask individuals to rate themselves on such statements as: 'I generally know what other people are feeling.' Items are scored by giving higher scores to greater self-assessments. We compared one of each type of test with the other for evidence of reliability, convergence and overlap with personality. Year 1 and 2 medical students completed the Meyer-Salovey-Caruso Emotional Intelligence Test (MSCEIT, an ability measure), the Wong and Law Emotional Intelligence Scale (WLEIS, a trait measure) and an industry-standard personality test (the Neuroticism-Extroversion-Openness [NEO] test). The MSCEIT showed problems with reliability. The MSCEIT and the WLEIS did not correlate highly with one another (overall scores correlated at 0.18). The WLEIS was more highly correlated with personality scales than the MSCEIT. Different tests that are supposed to measure EI do not measure the same thing. The ability measure was not correlated with personality, but the trait measure was correlated with personality.
Remote memory as a function of age and sex.

PubMed

Storandt, M; Grant, E A; Gordon, B C

1978-10-01

Memory for events which occurred between 1910 and 1969 was examined in individuals ranging in age from 20 to 80 years. Two types of events were included: Those which represented happenings of historical significance and those which dealt with the entertainment world of the past. Men were found to recall historical items significantly better than women, while entertainment items were equally well recalled by the two sexes. Age of peak memory for past events from the entertainment world increased with the age of the item; individuals seemed to remember best those events which occurred in their youth or young adulthood. This pattern was not replicated with respect to the historical current events items; however, these items may be a biased test of remote memory in women.

Directed forgetting: Comparing pictures and words.

PubMed

Quinlan, Chelsea K; Taylor, Tracy L; Fawcett, Jonathan M

2010-03-01

The authors investigated directed forgetting as a function of the stimulus type (picture, word) presented at study and test. In an item-method directed forgetting task, study items were presented 1 at a time, each followed with equal probability by an instruction to remember or forget. Participants exhibited greater yes-no recognition of remember than forget items for each of the 4 study-test conditions (picture-picture, picture-word, word-word, word-picture). However, this difference was significantly smaller when pictures were studied than when words were studied. This finding demonstrates that the magnitude of the directed forgetting effect can be reduced by high item memorability, such as when the picture superiority effect is operating. This suggests caution in using pictures at study when the goal of an experiment is to examine potential group differences in the magnitude of the directed forgetting effect. © 2010 APA, all rights reserved.
ARRIVE has not ARRIVEd: Support for the ARRIVE (Animal Research: Reporting of in vivo Experiments) guidelines does not improve the reporting quality of papers in animal welfare, analgesia or anesthesia.

PubMed

Leung, Vivian; Rousseau-Blass, Frédérik; Beauchamp, Guy; Pang, Daniel S J

2018-01-01

Poor research reporting is a major contributor to low study reproducibility and to financial and animal waste. The ARRIVE (Animal Research: Reporting of In Vivo Experiments) guidelines were developed to improve reporting quality, and many journals support these guidelines. The influence of this support is unknown. We hypothesized that papers published in journals supporting the ARRIVE guidelines would show improved reporting compared with those in non-supporting journals. In a retrospective, observational cohort study, papers from 5 ARRIVE-supporting (SUPP) and 2 non-supporting (nonSUPP) journals, published before (2009) and 5 years after (2015) the ARRIVE guidelines, were selected. Adherence to the 20-item ARRIVE checklist was independently evaluated by two reviewers, and items were assessed as fully, partially or not reported. Mean percentages of items reported were compared between journal types and years with an unequal variance t-test. Individual items and sub-items were compared with a chi-square test. From an initial cohort of 956, 236 papers were included: 120 from 2009 (SUPP: n = 52; nonSUPP: n = 68) and 116 from 2015 (SUPP: n = 61; nonSUPP: n = 55). The percentage of fully reported items was similar between journal types in 2009 (SUPP: 55.3 ± 11.5% [SD]; nonSUPP: 51.8 ± 9.0%; p = 0.07, 95% CI of mean difference -0.3 to 7.3%) and 2015 (SUPP: 60.5 ± 11.2%; nonSUPP: 60.2 ± 10.0%; p = 0.89, 95% CI -3.6 to 4.2%). The small increase in fully reported items between years was similar for both journal types (p = 0.09, 95% CI -0.5 to 4.3%). No paper fully reported 100% of items on the ARRIVE checklist, and measures associated with bias were poorly reported. These results suggest that journal support for the ARRIVE guidelines has not resulted in a meaningful improvement in reporting quality, contributing to ongoing waste in animal research.
Subjective Learning Discounts Test Type: Evidence from an Associative Learning and Transfer Task

PubMed Central

Touron, Dayna R.; Hertzog, Christopher; Speagle, James Z.

2011-01-01

We evaluated the extent to which memory test format and test transfer influence the dynamics of metacognitive judgments. Participants completed 2 study-test phases for paired-associates, with or without transferring test type, in one of four conditions: (1) recognition then recall, (2) recall then recognition, (3) recognition throughout, or (4) recall throughout. Global judgments were made pre-study, post-study, and post-test for each phase; judgments of learning (JOLs) following item study were also collected. Results suggest that metacognitive judgment accuracy varies substantially by memory test type. Whereas underconfidence in JOLs and global predictions increases with recall practice (Koriat’s underconfidence-with-practice effect), underconfidence decreases with recognition practice. Moreover, performance changes when transferring test type were not fully anticipated by pre-test judgments. PMID:20178957
The emotional carryover effect in memory for words.

PubMed

Schmidt, Stephen R; Schmidt, Constance R

2016-08-01

Emotional material rarely occurs in isolation; rather, it is experienced in the spatial and temporal proximity of less emotional items. Some previous researchers have found that emotional stimuli impair memory for surrounding information, whereas others have reported evidence for memory facilitation. Researchers have not determined which types of emotional items or memory tests produce effects that carry over to surrounding items. Six experiments are reported that measured carryover from emotional words varying in arousal to temporally adjacent neutral words. Taboo, non-taboo emotional, and neutral words were compared using different stimulus onset asynchronies (SOAs), recognition and recall tests, and intentional and incidental memory instructions. Strong emotional memory effects were obtained in all six experiments. However, emotional items influenced memory for temporally adjacent words only under limited conditions. Words following taboo words were more poorly remembered than words following neutral words when relatively short SOAs were employed. Words preceding taboo words were affected only when recall tests and relatively short retention intervals were used. These results suggest that increased attention to the emotional items sometimes produces emotional carryover effects; however, retrieval processes also contribute to retrograde amnesia and may extend the conditions under which anterograde amnesia is observed.
Effects of Varied Enhancement Strategies (Chunking, Feedback, Gaming) in Complementing Animated Instruction in Facilitating Different Types of Learning Objectives

ERIC Educational Resources Information Center

Munyofu, Mine

2008-01-01

The purpose of this study was to examine the instructional effectiveness of different levels of chunking (simple visual/text and complex visual/text), different forms of feedback (item-by-item feedback, end-of-test feedback and no feedback), and use of instructional gaming (game and no game) in complementing animated programmed instruction on a…
The Quest for Item Types Based on Information Processing: An Analysis of Raven's Advanced Progressive Matrices, with a Consideration of Gender Differences

ERIC Educational Resources Information Center

Vigneau, Francois; Bors, Douglas A.

2008-01-01

Various taxonomies of Raven's Advanced Progressive Matrices (APM) items have been proposed in the literature to account for performance on the test. In the present article, three such taxonomies based on information processing, namely Carpenter, Just and Shell's [Carpenter, P.A., Just, M.A., & Shell, P., (1990). What one intelligence test…
Full-Information Item Bi-Factor Analysis. ONR Technical Report. [Biometric Lab Report No. 90-2.]

ERIC Educational Resources Information Center

Gibbons, Robert D.; And Others

A plausible $s$-factor solution for many types of psychological and educational tests is one in which there is one general factor and $s - 1$ group- or method-related factors. The bi-factor solution results from the constraint that each item has a non-zero loading on the primary dimension $\alpha_{j1}$ and at most…
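In the standard bi-factor model, each item loads on the general factor and on at most one group factor. The minimal sketch below only checks whether a loading matrix respects that pattern; it does not estimate the model. The column layout (column 0 = general factor) and the example loadings are assumptions for illustration.

```python
def is_bifactor_pattern(loadings, tol=0.0):
    """Check the bi-factor loading pattern.

    loadings: one row per item; column 0 is the general (primary) factor,
    columns 1..s-1 are the group factors.
    """
    for row in loadings:
        if abs(row[0]) <= tol:
            return False  # item does not load on the general dimension
        group_nonzero = sum(1 for a in row[1:] if abs(a) > tol)
        if group_nonzero > 1:
            return False  # item loads on more than one group factor
    return True


# Hypothetical 4-item solution with s = 3 (1 general + 2 group factors).
pattern = [
    [0.7, 0.4, 0.0],
    [0.6, 0.3, 0.0],
    [0.5, 0.0, 0.5],
    [0.8, 0.0, 0.2],
]
print(is_bifactor_pattern(pattern))
```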
The Effects of Different Types of Anchor Tests on Observed Score Equating. Research Report. ETS RR-09-41

ERIC Educational Resources Information Center

Liu, Jinghua; Sinharay, Sandip; Holland, Paul W.; Feigenbaum, Miriam; Curley, Edward

2009-01-01

This study explores the use of a different type of anchor, a "midi anchor", that has a smaller spread of item difficulties than the tests to be equated, and then contrasts its use with the use of a "mini anchor". The impact of different anchors on observed score equating was evaluated and compared with respect to systematic…
Empirical Tests of Acceptance Sampling Plans

NASA Technical Reports Server (NTRS)

White, K. Preston, Jr.; Johnson, Kenneth L.

2012-01-01

Acceptance sampling is a quality control procedure applied as an alternative to 100% inspection. A random sample of items is drawn from a lot to determine the fraction of items which have a required quality characteristic. Both the number of items to be inspected and the criterion for determining conformance of the lot to the requirement are given by an appropriate sampling plan with specified risks of Type I and Type II sampling errors. In this paper, we present the results of empirical tests of the accuracy of selected sampling plans reported in the literature. These plans are for measurable quality characteristics which are known to have either binomial, exponential, normal, gamma, Weibull, inverse Gaussian, or Poisson distributions. In the main, results support the accepted wisdom that variables acceptance plans are superior to attributes (binomial) acceptance plans, in the sense that these provide comparable protection against risks at reduced sampling cost. For the Gaussian and Weibull plans, however, there are ranges of the shape parameters for which the required sample sizes are in fact larger than the corresponding attributes plans, dramatically so for instances of large skew. Tests further confirm that the published inverse-Gaussian (IG) plan is flawed, as reported by White and Johnson (2011).
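For the simplest attributes (binomial) case, a single-sampling plan draws n items and accepts the lot when at most c of them are nonconforming. Its operating characteristic is then a binomial sum. The sketch below assumes a lot large enough for the binomial model to apply. The plan values are hypothetical, and the variables plans for other distributions that the paper tests are not shown.

```python
from math import comb


def prob_accept(n, c, p):
    """Probability of accepting a lot under a single-sampling attributes plan.

    n: sample size; c: acceptance number (accept if defectives <= c);
    p: true fraction nonconforming in the lot (binomial model assumed).
    """
    return sum(comb(n, d) * p**d * (1 - p) ** (n - d) for d in range(c + 1))


# Hypothetical plan n = 50, c = 1, evaluated at two lot quality levels.
for p in (0.01, 0.05):
    print(p, round(prob_accept(50, 1, p), 3))
```

Evaluated at an acceptable quality level, the function gives 1 minus the Type I (producer's) risk. Evaluated at a rejectable quality level, it gives the Type II (consumer's) risk.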
A validation study of the Keyboard Personal Computer Style instrument (K-PeCS) for use with children.

PubMed

Green, Dido; Meroz, Anat; Margalit, Adi Edit; Ratzon, Navah Z

2012-11-01

This study examines a potential instrument for measurement of typing postures of children. This paper describes inter-rater reliability, test-retest reliability and concurrent validity of the Keyboard Personal Computer Style instrument (K-PeCS), an observational measurement of postures and movements during keyboarding, for use with children. Two trained raters independently rated videos of 24 children (aged 7-10 years). Six children returned one week later to establish test-retest reliability. Concurrent validity was assessed by comparing ratings obtained using the K-PeCS to scores from a 3D motion analysis system. Inter-rater reliability was moderate to high for 12 out of 16 items (Kappa: 0.46 to 1.00; correlation coefficients: 0.77-0.95) and test-retest reliability varied across items (Kappa: 0.25 to 0.67; correlation coefficients: r = 0.20 to r = 0.95). Concurrent validity compared favourably across arm pathlength, wrist extension and ulnar deviation. In light of the limitations of other tools, the K-PeCS offers a fairly affordable, reliable and valid instrument to address the gap for measurement of typing styles of children, despite the shortcomings of some items. However, further research is required to refine the instrument for use in evaluating typing among children. Copyright © 2012 Elsevier Ltd and The Ergonomics Society. All rights reserved.
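The inter-rater Kappa values above compare observed agreement with the agreement expected by chance. A minimal sketch of unweighted Cohen's kappa follows. The category labels and ratings are hypothetical, and the study may have used a weighted variant for ordinal items.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters coding the same cases."""
    if not rater_a or len(rater_a) != len(rater_b):
        raise ValueError("ratings must be non-empty and paired")
    n = len(rater_a)
    # Observed agreement: proportion of cases coded identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal category proportions.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / n**2
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical codes for one posture item across 10 videos.
a = ["neutral"] * 6 + ["flexed"] * 4
b = ["neutral"] * 5 + ["flexed"] * 5
print(round(cohens_kappa(a, b), 2))
```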
Sensitivity and specificity of the 3-item memory test in the assessment of post traumatic amnesia.

PubMed

Andriessen, Teuntje M J C; de Jong, Ben; Jacobs, Bram; van der Werf, Sieberen P; Vos, Pieter E

2009-04-01

To investigate how the type of stimulus (pictures or words) and the method of reproduction (free recall or recognition after a short or a long delay) affect the sensitivity and specificity of a 3-item memory test in the assessment of post traumatic amnesia (PTA). Daily testing was performed in 64 consecutively admitted traumatic brain injured patients, 22 orthopedically injured patients and 26 healthy controls until criteria for resolution of PTA were reached. Subjects were randomly assigned to a test with visual or verbal stimuli. Short delay reproduction was tested after an interval of 3-5 minutes, long delay reproduction was tested after 24 hours. Sensitivity and specificity were calculated over the first 4 test days. The 3-word test showed higher sensitivity than the 3-picture test, while specificity of the two tests was equally high. Free recall was a more effortful task than recognition for both patients and controls. In patients, a longer delay between registration and recall resulted in a significant decrease in the number of items reproduced. Presence of PTA is best assessed with a memory test that incorporates the free recall of words after a long delay.
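Sensitivity and specificity reduce to two ratios from a 2x2 table. The minimal sketch below rests on two assumptions: a "positive" result means the patient failed the memory test, and a reference standard classifies each case as in or out of PTA. The counts are hypothetical, not the study's data.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from 2x2 confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # proportion of true PTA cases the test flags
    specificity = tn / (tn + fp)  # proportion of non-PTA cases the test clears
    return sensitivity, specificity


# Hypothetical counts, not the study's data.
sens, spec = sensitivity_specificity(tp=40, fn=10, tn=45, fp=5)
print(sens, spec)
```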
Evidence for proactive interference in the focus of attention of working memory.

PubMed

Carroll, Lauren M; Jalbert, Annie; Penney, Alexander M; Neath, Ian; Surprenant, Aimée M; Tehan, Gerald

2010-09-01

Proactive interference (PI) occurs when an earlier item interferes with memory for a newer item. Whereas some researchers (e.g., Surprenant & Neath, 2009a) argue that PI can be observed in all memory systems, some multiple systems theorists (e.g., Cowan, 1999) propose that items in the focus of attention of working memory are immune to PI. Two experiments tested whether PI occurs when the to-be-remembered items are assumed, by multiple-systems theorists, to be held in the focus of attention. In each experiment, subjects saw four trials in a row with the same type of to-be-remembered items, followed by four trials in a row with a different type of material. On each trial, only 3 stimuli were shown, which is below the capacity limit of the focus of attention, and subjects were asked if a probe item was one of those 3 items seen. In both experiments, response time increased from Trial 1 to Trial 4, suggesting that items from the earlier trials interfered with memory on the later trials. In addition, release from PI was shown in that response times decreased with a change of materials. The results replicate those first reported by Hanley and Scheirer (1975), and pose a problem for theorists who argue that parts of short-term memory are immune to PI. Copyright 2010 APA, all rights reserved.
Modeling Student Test-Taking Motivation in the Context of an Adaptive Achievement Test

ERIC Educational Resources Information Center

Wise, Steven L.; Kingsbury, G. Gage

2016-01-01

This study examined the utility of response time-based analyses in understanding the behavior of unmotivated test takers. For the data from an adaptive achievement test, patterns of observed rapid-guessing behavior and item response accuracy were compared to the behavior expected under several types of models that have been proposed to represent…
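The abstract does not specify how rapid guesses were identified. One common approach is threshold-based, as in Wise and Kong's response time effort index: a response faster than an item-level time threshold is treated as a rapid guess. The sketch below assumes that approach and uses a hypothetical flat threshold. It is not presented as the procedure used in this study.

```python
def response_time_effort(response_times, thresholds):
    """Proportion of items showing solution behavior (RT >= item threshold).

    Responses faster than an item's threshold are treated as rapid guesses.
    """
    if not response_times or len(response_times) != len(thresholds):
        raise ValueError("need one threshold per response time")
    solution = [rt >= t for rt, t in zip(response_times, thresholds)]
    return sum(solution) / len(solution)


# Hypothetical examinee: 5 items, a flat 3-second threshold per item.
rte = response_time_effort([12.0, 1.5, 20.3, 2.0, 15.8], [3.0] * 5)
print(rte)
```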
An Investigation of Integrative and Independent Listening Test Tasks in a Computerised Academic English Test

ERIC Educational Resources Information Center

Wei, Wei; Zheng, Ying

2017-01-01

This research provided a comprehensive evaluation and validation of the listening section of a newly introduced computerised test, Pearson Test of English Academic (PTE Academic). PTE Academic contains 11 item types assessing academic listening skills either alone or in combination with other skills. First, task analysis helped identify skills…
Judged Similarity of Aptitude and Achievement Tests in Mathematics.

ERIC Educational Resources Information Center

Donlon, Thomas F.

This study attempts to establish the ability of a panel of five judges with varied mathematics backgrounds to distinguish between two types of mathematical tests by separating their component items when they are presented in a mixed pool of aptitude and achievement tests. Typically, the two tests show high correlation. The judges showed about 70%…
Conditional Standard Errors of Measurement for Composite Scores Using IRT

ERIC Educational Resources Information Center

Kolen, Michael J.; Wang, Tianyou; Lee, Won-Chan

2012-01-01

Composite scores are often formed from test scores on educational achievement test batteries to provide a single index of achievement over two or more content areas or two or more item types on that test. Composite scores are subject to measurement error, and as with scores on individual tests, the amount of error variability typically depends on…
United States History. Annotated Bibliography of Tests.

ERIC Educational Resources Information Center

Educational Testing Service, Princeton, NJ. Test Collection.

The 33 tests in this bibliography cover United States History from the period of exploration of the continent through the Civil War to Post World War II. One test measures knowledge of African American history. Types of measures include credit by examination, item banks, and end-of-course tests. All ages are represented but the majority of tests…
Development of a scale to measure diabetes self-management behaviors among older Koreans with type 2 diabetes, based on the seven domains identified by the American Association of Diabetes Educators.

PubMed

Seo, Kyoungsan; Song, Misoon; Choi, Suyoung; Kim, Se-An; Chang, Sun Ju

2017-04-01

The purpose of this study was to develop the Diabetes Self-Management Behavior for Older Koreans (DSMB-O). This scale is based on the seven relevant domains that have been identified by the American Association of Diabetes Educators (AADE) and is adjusted for sociocultural and age-related characteristics. Four phases were used to develop the DSMB-O as a criterion-referenced measure. In phases 1 and 2, the DSMB-O adopted the AADE's seven domains and established a self-report questionnaire using a small number of items that are applicable to older Koreans. In phase 3, the DSMB-O was formulated with 16 preliminary items, including seven subitems. By assessing the content validity, 14 items (including five subitems) were selected. The final phase involved evaluating the DSMB-O's psychometric properties, including test-retest reliability, content validity, and criterion-related validity, using data from 150 older Koreans with type 2 diabetes. The coefficients of agreement and Cohen's Kappa for the test-retest reliability test ranged from 0.32 to 1.0 and -0.07 to 1.0, respectively. For the content validity, the values of both the item- and scale-level content validity indices were 1.0. The scores from the DSMB-O were positively correlated with the scores from the Korean version of the Summary of Diabetes Self-Care Activities Questionnaire. The DSMB-O is short and easy for older Koreans to use, as well as having acceptable levels of reliability and validity. Hence, the DSMB-O can be a useful tool to evaluate diabetes self-management behaviors in older Koreans with type 2 diabetes. © 2016 Japan Academy of Nursing Science.
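Item- and scale-level content validity indices are commonly computed from expert relevance ratings on a 4-point scale. In one widely used formulation, the I-CVI is the share of experts rating an item 3 or 4, and the S-CVI/Ave is the mean of the I-CVIs. The sketch below assumes that formulation. The panel and ratings are hypothetical, and the study's exact procedure is not stated in the abstract.

```python
def item_cvi(ratings, relevant=(3, 4)):
    """I-CVI: share of experts rating the item as relevant (3 or 4 of 4)."""
    return sum(r in relevant for r in ratings) / len(ratings)


def scale_cvi_ave(ratings_by_item):
    """S-CVI/Ave: mean of the item-level CVIs across all items."""
    cvis = [item_cvi(r) for r in ratings_by_item]
    return sum(cvis) / len(cvis)


# Hypothetical panel of 5 experts rating 3 items.
panel = [[4, 4, 3, 4, 3], [3, 4, 4, 4, 4], [4, 3, 2, 4, 4]]
print([item_cvi(r) for r in panel])
print(round(scale_cvi_ave(panel), 3))
```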
The TASS Method for Scoring Production Work

ERIC Educational Resources Information Center

Orsborn, Karen J.

1977-01-01

Describes and recommends to typing teachers the Time and Speed Scoring (TASS) method for testing typewriting production and figuring production rates. This method measures speed and quality as two separate items, because proficiency in production typewriting should reflect the student's skill in typing both accurately and quickly. (MF)
Development and preliminary evaluation of a music-based attention assessment for patients with traumatic brain injury.

PubMed

Jeong, Eunju; Lesiuk, Teresa L

2011-01-01

Impairments in attention are commonly seen in individuals with traumatic brain injury (TBI). While visual attention assessment measurements have been rigorously developed and frequently used in cognitive neurorehabilitation, there is a paucity of auditory attention assessment measurements for patients with TBI. The purpose of this study was to field test a researcher-developed Music-based Attention Assessment (MAA), a melodic contour identification test designed to assess three different types of attention (i.e., sustained attention, selective attention, and divided attention), for patients with TBI. Additionally, this study aimed to evaluate the readability and comprehensibility of the test items and to examine the preliminary psychometric properties of the scale and test items. Fifteen patients diagnosed with TBI completed 3 different series of tasks in which they were required to identify melodic contours. The resulting data showed that (a) test items in each of the 3 subtests were found to have an easy to moderate level of item difficulty and an acceptable to high level of item discrimination, (b) the musical characteristics (i.e., contour, congruence, and pitch interference) were found to be associated with the level of item difficulty, and (c) the internal consistency of the MAA as computed by Cronbach's alpha was .95. Subsequent studies using a larger sample of typical participants, along with individuals with TBI, are needed to confirm construct validity and internal consistency of the MAA. In addition, the authors recommend examination of criterion validity of the MAA as correlated with current neuropsychological attention assessment measurements.
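Cronbach's alpha, reported above as .95, compares the sum of the item variances with the variance of the total scores. A minimal sketch follows, assuming a respondents-by-items matrix of item scores. The 0/1 data are hypothetical.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha from a respondents x items score matrix.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    items = list(zip(*scores))  # transpose rows into per-item columns
    item_var_sum = sum(pvariance(col) for col in items)
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical 0/1 scores: 4 respondents x 3 contour-identification items.
data = [[1, 1, 1], [1, 1, 0], [0, 1, 0], [0, 0, 0]]
print(cronbach_alpha(data))
```

Population variances are used throughout. Using sample variances consistently would give the same ratio and therefore the same alpha.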

Adaptation of the Practice Environment Scale for military nurses: a psychometric analysis.

PubMed

Swiger, Pauline A; Raju, Dheeraj; Breckenridge-Sproat, Sara; Patrician, Patricia A

2017-09-01

The aim of this study was to confirm the psychometric properties of Practice Environment Scale of the Nursing Work Index in a military population. This study also demonstrates association rule analysis, a contemporary exploratory technique. One of the instruments most commonly used to evaluate the nursing practice environment is the Practice Environment Scale of the Nursing Work Index. Although the instrument has been widely used, the reliability, validity and individual item function are not commonly evaluated. Gaps exist with regard to confirmatory evaluation of the subscale factors, individual item analysis and evaluation in the outpatient setting and with non-registered nursing staff. This was a secondary data analysis of existing survey data. Multiple psychometric methods were used for this analysis using survey data collected in 2014. First, descriptive analyses were conducted, including exploration using association rules. Next, internal consistency was tested and confirmatory factor analysis was performed to test the factor structure. The specified factor structure did not hold; therefore, exploratory factor analysis was performed. Finally, item analysis was executed using item response theory. The differential item functioning technique allowed the comparison of responses by care setting and nurse type. The results of this study indicate that responses differ between groups and that several individual items could be removed without altering the psychometric properties of the instrument. The instrument functions moderately well in a military population; however, researchers may want to consider nurse type and care setting during analysis to identify any meaningful variation in responses. © 2017 John Wiley & Sons Ltd.
The Promise of NLP and Speech Processing Technologies in Language Assessment

ERIC Educational Resources Information Center

Chapelle, Carol A.; Chung, Yoo-Ree

2010-01-01

Advances in natural language processing (NLP) and automatic speech recognition and processing technologies offer new opportunities for language testing. Despite their potential uses on a range of language test item types, relatively little work has been done in this area, and it is therefore not well understood by test developers, researchers or…
Bi-Factor MIRT Observed-Score Equating for Mixed-Format Tests

ERIC Educational Resources Information Center

Lee, Guemin; Lee, Won-Chan

2016-01-01

The main purposes of this study were to develop bi-factor multidimensional item response theory (BF-MIRT) observed-score equating procedures for mixed-format tests and to investigate relative appropriateness of the proposed procedures. Using data from a large-scale testing program, three types of pseudo data sets were formulated: matched samples,…
Working memory and inhibitory control across the life span: Intrusion errors in the Reading Span Test.

PubMed

Robert, Christelle; Borella, Erika; Fagot, Delphine; Lecerf, Thierry; de Ribaupierre, Anik

2009-04-01

The aim of this study was to examine to what extent inhibitory control and working memory capacity are related across the life span. Intrusion errors committed by children and younger and older adults were investigated in two versions of the Reading Span Test. In Experiment 1, a mixed Reading Span Test with items of various list lengths was administered. Older adults and children recalled fewer correct words and produced more intrusions than did young adults. Also, age-related differences were found in the type of intrusions committed. In Experiment 2, an adaptive Reading Span Test was administered, in which the list length of items was adapted to each individual's working memory capacity. Age groups differed neither on correct recall nor on the rate of intrusions, but they differed on the type of intrusions. Altogether, these findings indicate that the availability of attentional resources influences the efficiency of inhibition across the life span.
The ‘Maltreatment and Abuse Chronology of Exposure’ (MACE) Scale for the Retrospective Assessment of Abuse and Neglect During Development

PubMed Central

Teicher, Martin H.; Parigger, Angelika

2015-01-01

There is increasing interest in childhood maltreatment as a potent stimulus that may alter trajectories of brain development, induce epigenetic modifications and enhance risk for medical and psychiatric disorders. Although a number of useful scales exist for retrospective assessment of abuse and neglect, they have significant limitations. Moreover, they fail to provide detailed information on timing of exposure, which is critical for delineation of sensitive periods. The Maltreatment and Abuse Chronology of Exposure (MACE) scale was developed in a sample of 1051 participants using item response theory to gauge severity of exposure to ten types of maltreatment (emotional neglect, non-verbal emotional abuse, parental physical maltreatment, parental verbal abuse, peer emotional abuse, peer physical bullying, physical neglect, sexual abuse, witnessing interparental violence and witnessing violence to siblings) during each year of childhood. Items included in the subscales had acceptable psychometric properties based on infit and outfit mean square statistics, and each subscale passed Andersen’s Likelihood ratio test. The MACE provides an overall severity score and multiplicity score (number of types of maltreatment experienced) with excellent test-retest reliability. Each type of maltreatment showed good reliability as did severity of exposure across each year of childhood. MACE Severity correlated 0.738 with Childhood Trauma Questionnaire (CTQ) score and MACE Multiplicity correlated 0.698 with the Adverse Childhood Experiences scale (ACE). However, MACE accounted for 2.00- and 2.07-fold more of the variance, on average, in psychiatric symptom ratings than CTQ or ACE, respectively, based on variance decomposition. Different types of maltreatment had distinct and often unique developmental patterns.
The 52-item MACE, a simpler Maltreatment Abuse and Exposure Scale (MAES) that only assesses overall exposure, and the original test instrument (MACE-X) with several additional items, plus spreadsheets and R code for scoring, are provided to facilitate use and to spur further development. PMID:25714856
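Infit and outfit mean squares summarize how far observed responses depart from a Rasch model's expectations. Values near 1.0 indicate good fit. The sketch below covers dichotomous items only and assumes person measures and item difficulty have already been estimated. It is an illustration of the statistics, not the MACE authors' scoring code, and the example data are hypothetical.

```python
import math


def rasch_p(theta, b):
    """Rasch probability of a positive (1) response for person theta, item b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def item_fit(responses, thetas, b):
    """Outfit and infit mean squares for one dichotomous item.

    responses: 0/1 answers, one per person; thetas: person measures (logits);
    b: item difficulty (logits).
    """
    if not responses or len(responses) != len(thetas):
        raise ValueError("need one person measure per response")
    sq_std_resid = []  # squared standardized residuals z^2
    sq_resid_sum = 0.0  # sum of squared raw residuals
    variance_sum = 0.0  # sum of model variances p(1 - p)
    for x, theta in zip(responses, thetas):
        p = rasch_p(theta, b)
        w = p * (1.0 - p)
        sq_std_resid.append((x - p) ** 2 / w)
        sq_resid_sum += (x - p) ** 2
        variance_sum += w
    outfit = sum(sq_std_resid) / len(sq_std_resid)  # unweighted mean square
    infit = sq_resid_sum / variance_sum  # information-weighted mean square
    return outfit, infit


# Hypothetical item (b = 0) answered by four persons.
print(item_fit([0, 0, 1, 1], [-1.5, -0.5, 0.5, 1.5], 0.0))
```

Outfit is sensitive to unexpected responses from persons far from the item's difficulty. Infit weights each residual by its model variance and so emphasizes persons near the item's difficulty.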
Gambling-Related Cognition Scale (GRCS): Are skills-based games at a disadvantage?

PubMed

Lévesque, David; Sévigny, Serge; Giroux, Isabelle; Jacques, Christian

2017-09-01

The Gambling-Related Cognition Scale (GRCS; Raylu & Oei, 2004) was developed to evaluate gambling-related cognitive distortions for all types of gamblers, regardless of their gambling activities (poker, slot machine, etc.). It is therefore imperative to ascertain the validity of its interpretation across different types of gamblers; however, some skills-related items endorsed by players could be interpreted as a cognitive distortion despite the fact that they play skills-related games. Using an intergroup (168 poker players and 73 video lottery terminal [VLT] players) differential item functioning (DIF) analysis, this study examined the possible manifestation of item biases associated with the GRCS. DIF was analyzed with ordinal logistic regressions (OLRs) and Ramsay's (1991) nonparametric kernel smoothing approach with TestGraf. Results show that half of the items display at least moderate DIF between groups and, depending on the type of analysis used, 3 to 7 items displayed large DIF. The 5 items with the most DIF were endorsed significantly more by poker players (uniform DIF) and were all related to skills, knowledge, learning, or probabilities. Poker players' interpretations of some skills-related items may lead to an overestimation of their cognitive distortions, because their total scores are inflated by a measurement artifact. Findings indicate that the current structure of the GRCS contains potential biases to be considered when poker players are surveyed. The present study conveys new and important information on bias issues to ponder carefully before using and interpreting the GRCS and other similar wide-range instruments with poker players. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Assessment of Genetics Understanding. Under What Conditions Do Situational Features Have an Impact on Measures?

NASA Astrophysics Data System (ADS)

Schmiemann, Philipp; Nehm, Ross H.; Tornabene, Robyn E.

2017-12-01

Understanding how situational features of assessment tasks impact reasoning is important for many educational pursuits, notably the selection of curricular examples to illustrate phenomena, the design of formative and summative assessment items, and determination of whether instruction has fostered the development of abstract schemas divorced from particular instances. The goal of our study was to employ an experimental research design to quantify the degree to which situational features impact inferences about participants' understanding of Mendelian genetics. Two participant samples from different educational levels and cultural backgrounds (high school, n = 480; university, n = 444; Germany and USA) were used to test for context effects. A multi-matrix test design was employed, and item packets differing in situational features (e.g., plant, animal, human, fictitious) were randomly distributed to participants in the two samples. Rasch analyses of participant scores from both samples produced good item fit, person reliability, and item reliability and indicated that the university sample displayed stronger performance on the items compared to the high school sample. We found, surprisingly, that in both samples, no significant differences in performance occurred among the animal, plant, and human item contexts, or between the fictitious and "real" item contexts. In the university sample, we were also able to test for differences in performance between genders, among ethnic groups, and by prior biology coursework. None of these factors had a meaningful impact upon performance or context effects. Thus some, but not all, types of genetics problem solving or item formats are impacted by situational features.
Evaluating Instrument Quality in Science Education: Rasch-based analyses of a Nature of Science test

NASA Astrophysics Data System (ADS)

Neumann, Irene; Neumann, Knut; Nehm, Ross

2011-07-01

Given the central importance of the Nature of Science (NOS) and Scientific Inquiry (SI) in national and international science standards and science learning, empirical support for the theoretical delineation of these constructs is of considerable significance. Furthermore, tests of the effects of varying magnitudes of NOS knowledge on domain-specific science understanding and belief require the application of instruments validated in accordance with AERA, APA, and NCME assessment standards. Our study explores three interrelated aspects of a recently developed NOS instrument: (1) validity and reliability; (2) instrument dimensionality; and (3) item scales, properties, and qualities within the context of Classical Test Theory and Item Response Theory (Rasch modeling). A construct analysis revealed that the instrument did not match published operationalizations of NOS concepts. Rasch analysis of the original instrument, as well as of a reduced item set, indicated that a two-dimensional Rasch model fit significantly better than a one-dimensional model in both cases. Thus, our study revealed that NOS and SI are supported as two separate dimensions, corroborating theoretical distinctions in the literature. To identify items with unacceptable fit values, item quality analyses were used. A Wright Map revealed that few items sufficiently distinguished high performers in the sample and excessive numbers of items were present at the low end of the performance scale. Overall, our study outlines an approach for how Rasch modeling may be used to evaluate and improve Likert-type instruments in science education.
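Claims that one nested model "fit significantly better" than another are typically backed by a likelihood-ratio (deviance difference) test. The sketch below assumes that test. It uses the closed-form chi-square tail probability, which holds only for even degrees of freedom. The log-likelihoods and the choice of 2 extra parameters for the two-dimensional model are hypothetical, not values from this study.

```python
import math


def lr_test_even_df(loglik_restricted, loglik_full, df):
    """Likelihood-ratio test for nested models, e.g. 1-D vs 2-D Rasch.

    Returns (LR statistic, p-value). The p-value uses the closed-form
    chi-square survival function, which is valid only for even df.
    """
    if df <= 0 or df % 2:
        raise ValueError("this sketch handles positive even df only")
    lr = max(-2.0 * (loglik_restricted - loglik_full), 0.0)
    half = lr / 2.0
    # P(chi2_df > lr) = exp(-lr/2) * sum_{i < df/2} (lr/2)^i / i!
    p = math.exp(-half) * sum(half**i / math.factorial(i) for i in range(df // 2))
    return lr, p


# Hypothetical log-likelihoods; the 2-D model is assumed to add 2 parameters.
lr, p = lr_test_even_df(-10250.0, -10240.0, df=2)
print(lr, p)
```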
The write way to spell: printing vs. typing effects on orthographic learning

PubMed Central

Ouellette, Gene; Tims, Talisa

2014-01-01

Prior research has shown superior orthographic learning resulting from spelling practice relative to repeated reading. One mechanism proposed to underlie this advantage of spelling in establishing detailed orthographic representations in memory is the motoric component of the manual movements evoked in printing or writing. This study investigated this contention directly by testing the effects of typing vs. printing on the orthographic learning achieved through spelling practice, and further evaluated whether practice modality interacts with pre-existing individual characteristics. Forty students in grade 2 (mean age 7 years 5 months) were introduced to 10 novel non-words. Some of the students practiced spelling the items by printing, while the others practiced spelling them on a keyboard. Participants were tested for recognition and spelling of these items 1 and 7 days later. Results revealed high rates of orthographic learning with no main effects of practice modality, testing time, or post-test modality. Hierarchical regression analyses revealed an interaction between typing proficiency and practice modality, such that pre-existing keyboarding skills constrained or facilitated learning within the typing-practice group. A similar interaction was not found between printing skills and learning within the printing group. Results are discussed with reference to both prominent reading theory and educational applications. PMID:24592247
Multiple balance tests improve the assessment of postural stability in subjects with Parkinson's disease

PubMed Central

Jacobs, J V; Horak, F B; Tran, V K; Nutt, J G

2006-01-01

Objectives: Clinicians often base the implementation of therapies on the presence of postural instability in subjects with Parkinson's disease (PD). These decisions are frequently based on the pull test from the Unified Parkinson's Disease Rating Scale (UPDRS). We sought to determine whether combining the pull test, the one‐leg stance test, the functional reach test, and UPDRS items 27–29 (arise from chair, posture, and gait) predicts balance confidence and falling better than any test alone. Methods: The study included 67 subjects with PD. Subjects performed the one‐leg stance test, the functional reach test, and the UPDRS motor exam. Subjects also responded to the Activities‐specific Balance Confidence (ABC) scale and reported how many times they fell during the previous year. Regression models determined the combination of tests that optimally predicted mean ABC scores or categorised fall frequency. Results: When all tests were included in a stepwise linear regression, only gait (UPDRS item 29), the pull test (UPDRS item 30), and the one‐leg stance test, in combination, represented significant predictor variables for mean ABC scores (r2 = 0.51). A multinomial logistic regression model including the one‐leg stance test and gait represented the model with the fewest significant predictor variables that correctly identified the most subjects as fallers or non‐fallers (85% of subjects were correctly identified). Conclusions: Multiple balance tests (including the one‐leg stance test, and the gait and pull test items of the UPDRS) that assess different types of postural stress provide an optimal assessment of postural stability in subjects with PD. PMID:16484639
Study deviance-type scale in the development of Korean elder

PubMed Central

Cho, Gun-Sang; Yi, Eun-Surk; Hwang, Hee-Jeong

2015-01-01

This research aims to develop a questionnaire of deviant behavior for Korean elderly people, which may make a substantial contribution to the examination of deviant behavior among the elderly and may play an important role in providing a methodological basis. To accomplish the purpose of this study, there were three stages: (a) drafting preliminary question items, (b) refining the items of the scale through a pilot study, and (c) finalizing the question items through a main survey. In the first stage, 43 question items were developed using an open-ended questionnaire and structural inquiry of succession from 137 elderly people over 65 yr of age. In the second stage, pilot testing was performed on data collected from 200 elderly people, using exploratory factor analysis and a reliability test. The scale is a 27-item self-report questionnaire. In the main survey of 184 elderly people, 21 items consisting of four subfactors were finalized to measure deviant behaviors of the Korean elderly: social deviance (n=8), economic deviance (n=5), psychological deviance (n=5), and physical deviance (n=3). PMID:26730382
Memory in pregnancy and post-partum: Item specific and relational encoding processes in recall and recognition.

PubMed

Spataro, Pietro; Saraulli, Daniele; Oriolo, Debora; Costanzi, Marco; Zanetti, Humberto; Cestari, Vincenzo; Rossi-Arnaud, Clelia

2016-08-01

It has been recently proposed that pregnant women would perform memory tasks by focusing more on item-specific processes and less on relational processing, compared to post-partum women (Mickes, Wixted, Shapiro & Scarff, ). The present cross-sectional study tested this hypothesis by directly manipulating the type of encoding employed in the study phase. Pregnant, post-partum and control women either rated the pleasantness of word meaning (which induced item-specific elaboration) or named the semantic category to which they belonged (which induced relational elaboration). Memory for the encoded words was later tested in free recall (which emphasizes relational processing) and in recognition (which emphasizes item-specific processing). In line with Mickes et al.'s () conclusions, pregnant women in the item-specific condition performed worse than post-partum women in the relational condition in free recall, but not in recognition. However, compared to the other two groups, pregnant women also exhibited lower recognition accuracy in the item-specific condition. Overall, these results confirm that pregnant women rely on relational encoding less than post-partum women, but additionally suggest that the former group might use item-specific processes less efficiently than post-partum and control women. © 2016 Scandinavian Psychological Associations and John Wiley & Sons Ltd.
Measurement properties of the WOMAC LK 3.1 pain scale.

PubMed

Stratford, P W; Kennedy, D M; Woodhouse, L J; Spadoni, G F

2007-03-01

The Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC) is applied extensively to patients with osteoarthritis of the hip or knee. Previous work has challenged the validity of its physical function scale; however, an extensive evaluation of its pain scale has not been reported. Our purpose was to estimate internal consistency, factorial validity, test-retest reliability, and the standard error of measurement (SEM) of the WOMAC LK 3.1 pain scale. Four hundred and seventy-four patients with osteoarthritis of the hip or knee awaiting arthroplasty were administered the WOMAC. Estimates of internal consistency (coefficient alpha), factorial validity (confirmatory factor analysis), and the SEM based on internal consistency (SEM(IC)) were obtained. Test-retest reliability [Type 2,1 intraclass correlation coefficients (ICC)] and a corresponding SEM(TRT) were estimated on a subsample of 36 patients. Our estimates were: internal consistency alpha=0.84; SEM(IC)=1.48; Type 2,1 ICC=0.77; SEM(TRT)=1.69. Confirmatory factor analysis failed to support a single-factor structure of the pain scale with uncorrelated error terms. Two comparable models provided excellent fit: (1) a model with correlated error terms between the walking and stairs items, and between the night and sit items (χ²=0.18, P=0.98); (2) a two-factor model with the walking and stairs items loading on one factor, the night and sit items loading on a second factor, and the standing item loading on both factors (χ²=0.18, P=0.98). Our examination of the factorial structure of the WOMAC pain scale failed to support a single factor, and internal consistency analysis yielded a coefficient lower than optimal for individual patient use. An alternative to summing the five item responses when considering individual patient application would be to interpret item responses separately or to sum only those items that display homogeneity.
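The SEM(IC) value reported above can be related to the reliability coefficient through the classical formula SEM = SD × √(1 − r). The sketch below assumes that formula (the abstract does not say exactly how SEM(IC) was computed) and uses it to back out the score SD implied by the reported alpha = 0.84 and SEM(IC) = 1.48. The function names `sem` and `implied_sd` are hypothetical.

```python
import math


def sem(sd: float, reliability: float) -> float:
    """Classical standard error of measurement: SD * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    return sd * math.sqrt(1.0 - reliability)


def implied_sd(sem_value: float, reliability: float) -> float:
    """Invert the SEM formula to recover the score SD it implies."""
    if not 0.0 <= reliability < 1.0:
        raise ValueError("reliability must lie in [0, 1)")
    return sem_value / math.sqrt(1.0 - reliability)


# Reported values from the abstract: alpha = 0.84, SEM(IC) = 1.48
sd_ic = implied_sd(1.48, 0.84)
print(round(sd_ic, 2))             # 3.7
print(round(sem(sd_ic, 0.84), 2))  # 1.48
```

The implied SD of about 3.7 scale points is only a consistency check under the stated assumption, not a figure reported by the authors.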
Student Achievement in Turkey, According to Question Types Used in PISA 2003-2012 Mathematic Literacy Tests

ERIC Educational Resources Information Center

Özkan, Yesim Özer; Özaslan, Nesrin

2018-01-01

The aim of this study is to determine the level of achievement of students participating in Programme for International Student Assessment (PISA) 2003 and PISA 2012 tests in Turkey according to questions in the mathematical literacy test. This study is a descriptive survey. Within the scope of the study, the mathematical literacy test items were…
Reading Ability and Print Exposure: Item Response Theory Analysis of the Author Recognition Test

PubMed Central

Moore, Mariah; Gordon, Peter C.

2015-01-01

In the Author Recognition Test (ART) participants are presented with a series of names and foils and are asked to indicate which ones they recognize as authors. The test is a strong predictor of reading skill, with this predictive ability generally explained as occurring because author knowledge is likely acquired through reading or other forms of print exposure. This large-scale study (1012 college student participants) used Item Response Theory (IRT) to analyze item (author) characteristics to facilitate identification of the determinants of item difficulty, provide a basis for further test development, and optimize scoring of the ART. Factor analysis suggests a potential two-factor structure of the ART differentiating between literary and popular authors. Effective and ineffective author names were identified so as to facilitate future revisions of the ART. Analyses showed that the ART is a highly significant predictor of time spent encoding words as measured using eye-tracking during reading. The relationship between the ART and time spent reading provided a basis for implementing a higher penalty for selecting foils, rather than the standard method of ART scoring (names selected minus foils selected). The findings provide novel support for the view that the ART is a valid indicator of reading volume. Further, they show that frequency data can be used to select items of appropriate difficulty and that frequency data from corpora based on particular time periods and types of text may allow test adaptation for different populations. PMID:25410405
Reading ability and print exposure: item response theory analysis of the author recognition test.

PubMed

Moore, Mariah; Gordon, Peter C

2015-12-01

In the author recognition test (ART), participants are presented with a series of names and foils and are asked to indicate which ones they recognize as authors. The test is a strong predictor of reading skill, and this predictive ability is generally explained as occurring because author knowledge is likely acquired through reading or other forms of print exposure. In this large-scale study (1,012 college student participants), we used item response theory (IRT) to analyze item (author) characteristics in order to facilitate identification of the determinants of item difficulty, provide a basis for further test development, and optimize scoring of the ART. Factor analysis suggested a potential two-factor structure of the ART, differentiating between literary and popular authors. Effective and ineffective author names were identified so as to facilitate future revisions of the ART. Analyses showed that the ART is a highly significant predictor of the time spent encoding words, as measured using eyetracking during reading. The relationship between the ART and time spent reading provided a basis for implementing a higher penalty for selecting foils, rather than the standard method of ART scoring (names selected minus foils selected). The findings provide novel support for the view that the ART is a valid indicator of reading volume. Furthermore, they show that frequency data can be used to select items of appropriate difficulty, and that frequency data from corpora based on particular time periods and types of texts may allow adaptations of the test for different populations.
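The two scoring rules mentioned above differ only in how heavily foil selections are penalized. A minimal sketch, assuming a checklist split into a set of real author names and a set of foils: with `foil_penalty = 1.0` it reproduces the standard score (names selected minus foils selected), while larger values illustrate a higher foil penalty. The abstract does not give the penalty the authors adopted, so `foil_penalty` and the example values are hypothetical.

```python
from typing import Iterable, Set


def art_score(selected: Iterable[str], authors: Set[str], foils: Set[str],
              foil_penalty: float = 1.0) -> float:
    """Score one ART response.

    foil_penalty = 1.0 gives the standard score (authors selected minus
    foils selected); values above 1.0 penalize foil selections more heavily.
    The penalty is a hypothetical parameter, not one taken from the study.
    """
    chosen = set(selected)
    unknown = chosen - authors - foils
    if unknown:
        raise ValueError(f"names not on the checklist: {sorted(unknown)}")
    hits = len(chosen & authors)          # real authors recognized
    false_alarms = len(chosen & foils)    # foils wrongly selected
    return hits - foil_penalty * false_alarms


# Hypothetical checklist: real author names plus invented foil names.
authors = {"Jane Austen", "Stephen King", "Toni Morrison"}
foils = {"Foil Name A", "Foil Name B"}
response = ["Jane Austen", "Stephen King", "Foil Name A"]
print(art_score(response, authors, foils))                    # 1.0
print(art_score(response, authors, foils, foil_penalty=2.0))  # 0.0
```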
[Development and validity of workplace bullying in nursing-type inventory (WPBN-TI)].

PubMed

Lee, Younju; Lee, Mihyoung

2014-04-01

The purpose of this study was to develop an instrument to assess bullying of nurses and to test the validity and reliability of the instrument. The initial thirty items of the WPBN-TI were identified through a review of the literature on types of bullying related to nursing and in-depth interviews with 14 nurses who experienced bullying at work. Sixteen items were developed through 2 content validity tests by 9 experts and 10 nurses. The final WPBN-TI instrument was evaluated by 458 nurses from five general hospitals in the Incheon metropolitan area. The SPSS 18.0 program was used to assess the instrument based on internal consistency reliability, construct validity, and criterion validity. The WPBN-TI consisted of 16 items with three distinct factors (verbal and nonverbal bullying, work-related bullying, and external threats), which explained 60.3% of the total variance. The convergent validity and determinant validity for the WPBN-TI were 100.0% and 89.7%, respectively. Known-groups validity of the WPBN-TI was demonstrated through mean differences between groups with different subjective perceptions of bullying. Criterion validity for the WPBN-TI was satisfactory, at more than .70. The reliability of the WPBN-TI was a Cronbach's α of .91. The WPBN-TI, with its high validity and reliability, is suitable for determining types of bullying in the nursing workplace.
Establishing Reliability and Validity of the Criterion Referenced Exam of GeoloGy Standards EGGS

NASA Astrophysics Data System (ADS)

Guffey, S. K.; Slater, S. J.; Slater, T. F.; Schleigh, S.; Burrows, A. C.

2016-12-01

Discipline-based geoscience education researchers have considerable need for a criterion-referenced, easy-to-administer and easy-to-score conceptual diagnostic survey for undergraduates taking introductory science survey courses, so that faculty can better monitor the learning impacts of various interactive teaching approaches. To support ongoing education research across the geosciences, we are continuing to rigorously and systematically work to firmly establish the reliability and validity of the recently released Exam of GeoloGy Standards, EGGS. In educational testing, reliability refers to the consistency or stability of test scores, whereas validity refers to the accuracy of the inferences or interpretations one makes from test scores. Several types of reliability measures are being applied to the iterative refinement of the EGGS survey, including test-retest, alternate form, split-half, internal consistency, and interrater reliability measures. EGGS rates strongly on most measures of reliability. For one, Cronbach's alpha provides a quantitative index of the extent to which students answer items consistently throughout the test and reflects inter-item correlations. Traditional item analysis methods, including item difficulty and item discrimination, further quantify the degree to which a particular item reliably assesses students. Validity, on the other hand, is perhaps best described by the word accuracy. For example, content validity is the extent to which a measurement reflects the specific intended domain of the content, stemming from the judgments of people who are either experts in testing that particular content area or content experts.
Perhaps more importantly, face validity is a judgment of how well an instrument reflects the science "at face value" and refers to the extent to which a test appears to measure the targeted scientific domain as viewed by laypersons, examinees, test users, the public, and other invested stakeholders.
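The classical item statistics named above, item difficulty and item discrimination, can be computed directly from a matrix of scored responses. A minimal sketch, assuming dichotomous (0/1) scoring, difficulty defined as the proportion of students answering correctly, and discrimination defined as the corrected item-total (point-biserial) correlation between an item and the total score on the remaining items. These are common conventions; the abstract does not say which variants were used for EGGS, and the function names and data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; NaN if either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def item_analysis(responses: List[List[int]]) -> List[Tuple[float, float]]:
    """responses[s][i] is 1 if student s answered item i correctly, else 0.

    Returns (difficulty, discrimination) for each item.
    """
    n_items = len(responses[0])
    results = []
    for i in range(n_items):
        item = [row[i] for row in responses]
        rest = [sum(row) - row[i] for row in responses]  # total excluding item i
        difficulty = sum(item) / len(item)               # proportion correct
        discrimination = pearson(item, rest)             # corrected item-total r
        results.append((difficulty, discrimination))
    return results


# Hypothetical responses from four students on three items.
data = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
for p, r in item_analysis(data):
    print(f"difficulty={p:.2f} discrimination={r:.2f}")
```

Correcting the total by excluding the item itself avoids inflating the discrimination index, which matters most for short tests.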
Glider Pilot Written Test Guide: Private and Commercial.

ERIC Educational Resources Information Center

Federal Aviation Administration (DOT), Washington, DC. Flight Standards Service.

The intent of this guide is to define the scope and narrow the field of study as far as possible to the aeronautical knowledge required for qualifying for the private or commercial pilot (glider) certificate. Briefly summarized are the types of test items used, hints for taking the test, and certificate requirements. The study outline is the basic…
Examination of Test and Item Statistics from Visual and Verbal Mathematics Questions

ERIC Educational Resources Information Center

Alpayar, Cagla; Gulleroglu, H. Deniz

2017-01-01

The aim of this research is to determine whether students' test performance and approaches to test questions change based on the type of mathematics questions (visual or verbal) administered to them. This research is based on a mixed-design model. The quantitative data are gathered from 297 seventh grade students, attending seven different middle…

Evaluation of the CATSIB DIF Procedure in a Pretest Setting

ERIC Educational Resources Information Center

Nandakumar, Ratna; Roussos, Louis

2004-01-01

A new procedure, CATSIB, for assessing differential item functioning (DIF) on computerized adaptive tests (CATs) is proposed. CATSIB, a modified SIBTEST procedure, matches test takers on estimated ability and controls for impact-induced Type 1 error inflation by employing a CAT version of the SIBTEST "regression correction." The…
40 CFR 51.353 - Network type and program evaluation.

Code of Federal Regulations, 2010 CFR

2010-07-01

... § 51.351 or 51.352 of this subpart. For decentralized programs other than those meeting the design... presumptively equivalent to a centralized, test-only system including comparable test elements. States may allow...-serve gasoline, pre-packaged oil, or other, non-automotive, convenience store items. At the State's...
40 CFR 51.353 - Network type and program evaluation.

Code of Federal Regulations, 2012 CFR

2012-07-01

... § 51.351 or 51.352 of this subpart. For decentralized programs other than those meeting the design... presumptively equivalent to a centralized, test-only system including comparable test elements. States may allow...-serve gasoline, pre-packaged oil, or other, non-automotive, convenience store items. At the State's...
40 CFR 51.353 - Network type and program evaluation.

Code of Federal Regulations, 2014 CFR

2014-07-01

... § 51.351 or 51.352 of this subpart. For decentralized programs other than those meeting the design... presumptively equivalent to a centralized, test-only system including comparable test elements. States may allow...-serve gasoline, pre-packaged oil, or other, non-automotive, convenience store items. At the State's...
Effects of bilateral eye movements on the retrieval of item, associative, and contextual information.

PubMed

Parker, Andrew; Relph, Sarah; Dagnall, Neil

2008-01-01

Two experiments are reported that investigate the effects of saccadic bilateral eye movements on the retrieval of item, associative, and contextual information. Experiment 1 compared the effects of bilateral versus vertical versus no eye movements on tests of item recognition, followed by remember-know responses and associative recognition. Supporting previous research, bilateral eye movements enhanced item recognition by increasing the hit rate and decreasing the false alarm rate. Analysis of remember-know responses indicated that eye movement effects were accompanied by increases in remember responses. The test of associative recognition found that bilateral eye movements increased correct responses to intact pairs and decreased false alarms to rearranged pairs. Experiment 2 assessed the effects of eye movements on the recall of intrinsic (color) and extrinsic (spatial location) context. Bilateral eye movements increased correct recall for both types of context. The results are discussed within the framework of dual-process models of memory and the possible neural underpinnings of these effects are considered.
Development and Testing of the Nurse Manager EBP Competency Scale.

PubMed

Shuman, Clayton J; Ploutz-Snyder, Robert J; Titler, Marita G

2018-02-01

The purpose of this study was to develop and evaluate the validity and reliability of an instrument to measure nurse manager competencies regarding evidence-based practice (EBP). The Nurse Manager EBP Competency Scale consists of 16 items for respondents to indicate their perceived level of competency on a 0 to 3 Likert-type scale. Content validity was demonstrated through expert panel review and pilot testing. Principal axis factoring and Cronbach's alpha evaluated construct validity and internal consistency reliability, respectively. Eighty-three nurse managers completed the scale. Exploratory factor analysis resulted in a 16-item scale with two subscales, EBP Knowledge (n = 6 items, α = .90) and EBP Activity (n = 10 items, α = .94). Cronbach's alpha for the entire scale was .95. The Nurse Manager EBP Competency Scale is a brief measure of nurse manager EBP competency with evidence of validity and reliability. The scale can enhance our understanding in future studies regarding how nurse manager EBP competency affects implementation.
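The subscale and whole-scale reliabilities above are Cronbach's alpha values, α = k/(k − 1) × (1 − Σσᵢ² / σ²_total), where k is the number of items, σᵢ² the variance of item i, and σ²_total the variance of the summed score. A minimal sketch of that formula, assuming a complete respondents-by-items matrix with no missing ratings; the function name and the toy data are hypothetical and not taken from the study.

```python
from statistics import variance
from typing import List


def cronbach_alpha(scores: List[List[float]]) -> float:
    """scores[r][i] is respondent r's rating on item i (e.g. 0-3 Likert).

    Uses sample variances throughout; the ddof choice cancels in the ratio.
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Two perfectly parallel items give alpha = 1; two unrelated items give 0.
print(round(cronbach_alpha([[0, 0], [1, 1], [2, 2], [3, 3]]), 6))
print(round(cronbach_alpha([[0, 1], [1, 0], [0, 0], [1, 1]]), 6))
```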
Identification of User Needs. EDIS Task I Report.

ERIC Educational Resources Information Center

Howard Research Co., Arlington, VA.

This report presents the identification of user needs in the Army research, development, test and evaluation (RDT&E) community. Two types of information are provided in this report. The first type includes discussions of the RDT&E cycle, the level of informational need, time response, item categories and other factors as they relate to the…
Evidence against global attention filters selective for absolute bar-orientation in human vision.

PubMed

Inverso, Matthew; Sun, Peng; Chubb, Charles; Wright, Charles E; Sperling, George

2016-01-01

The finding that an item of type A pops out from an array of distractors of type B typically is taken to support the inference that human vision contains a neural mechanism that is activated by items of type A but not by items of type B. Such a mechanism might be expected to yield a neural image in which items of type A produce high activation and items of type B low (or zero) activation. Access to such a neural image might further be expected to enable accurate estimation of the centroid of an ensemble of items of type A intermixed with to-be-ignored items of type B. Here, it is shown that as the number of items in stimulus displays is increased, performance in estimating the centroids of horizontal (vertical) items amid vertical (horizontal) distractors degrades much more quickly and dramatically than does performance in estimating the centroids of white (black) items among black (white) distractors. Together with previous findings, these results suggest that, although human vision does possess bottom-up neural mechanisms sensitive to abrupt local changes in bar-orientation, and although human vision does possess and utilize top-down global attention filters capable of selecting multiple items of one brightness or of one color from among others, it cannot use a top-down global attention filter capable of selecting multiple bars of a given absolute orientation and filtering bars of the opposite orientation in a centroid task.
Retrieval-induced forgetting without competition: testing the retrieval specificity assumption of the inhibition theory.

PubMed

Raaijmakers, Jeroen G W; Jakab, Emoke

2012-01-01

According to the inhibition theory of forgetting (Anderson, Journal of Memory and Language 49:415-445, 2003; Anderson, Bjork, & Bjork, Psychonomic Bulletin & Review 7:522-530, 2000), retrieval practice on a subset of target items leads to forgetting for the other, nontarget items, due to the fact that these other items interfere during the retrieval process and have to be inhibited in order to resolve the interference. In this account, retrieval-induced forgetting occurs only when competition takes place between target and nontarget items during target item practice, since only in such a case is inhibition of the nontarget items necessary. Strengthening of the target item without active retrieval should not lead to such an impairment. In two experiments, we investigated this assumption by using noncompetitive retrieval during the practice phase. We strengthened the cue-target item association during practice by recall of the category name instead of the target item, and thus eliminated competition between the different item types (as in Anderson et al., Psychonomic Bulletin & Review 7:522-530, 2000). In contrast to the expectations of the inhibition theory, retrieval-induced forgetting occurred even without competition, and thus the present study does not support the retrieval specificity assumption.
Is the Factor Observed in Investigations on the Item-Position Effect Actually the Difficulty Factor?

PubMed

Schweizer, Karl; Troche, Stefan

2018-02-01

In confirmatory factor analysis, quite similar measurement models serve to detect the difficulty factor and the factor due to the item-position effect. The item-position effect refers to the increasing dependency among the responses to successively presented items of a test, whereas the difficulty factor is ascribed to the wide range of item difficulties. The similarity of the measurement models hampers the dissociation of these factors. Since the item-position effect should theoretically be independent of the item difficulties, statistical ex post manipulation of the difficulties should enable discrimination between the two types of factors. This method was investigated in two studies. In the first study, Advanced Progressive Matrices (APM) data of 300 participants were investigated. As expected, the factor thought to be due to the item-position effect was observed. In the second study, using data simulated to show the major characteristics of the APM data, the range of item difficulties was set to zero to reduce the likelihood of detecting the difficulty factor. Despite this reduction, however, the factor, now identified as the item-position factor, was observed in virtually all simulated datasets.
Revised Hammersmith Scale for spinal muscular atrophy: A SMA specific clinical outcome assessment tool.

PubMed

Ramsey, Danielle; Scoto, Mariacristina; Mayhew, Anna; Main, Marion; Mazzone, Elena S; Montes, Jacqueline; de Sanctis, Roberto; Dunaway Young, Sally; Salazar, Rachel; Glanzman, Allan M; Pasternak, Amy; Quigley, Janet; Mirek, Elizabeth; Duong, Tina; Gee, Richard; Civitello, Matthew; Tennekoon, Gihan; Pane, Marika; Pera, Maria Carmela; Bushby, Kate; Day, John; Darras, Basil T; De Vivo, Darryl; Finkel, Richard; Mercuri, Eugenio; Muntoni, Francesco

2017-01-01

Recent translational research developments in Spinal Muscular Atrophy (SMA), outcome measure design and demands from regulatory authorities require that clinical outcome assessments are 'fit for purpose'. An international collaboration (SMA REACH UK, Italian SMA Network and PNCRN USA) undertook an iterative process to address discontinuity in the recorded performance of the Hammersmith Functional Motor Scale Expanded and developed a revised functional scale using Rasch analysis, traditional psychometric techniques and the application of clinical sensibility via expert panels. Specifically, we intended to develop a psychometrically and clinically robust functional clinician rated outcome measure to assess physical abilities in weak SMA type 2 through to strong ambulant SMA type 3 patients. The final scale, the Revised Hammersmith Scale (RHS) for SMA, consisting of 36 items and two timed tests, was piloted in 138 patients with type 2 and 3 SMA in an observational cross-sectional multi-centre study across the three national networks. Rasch analysis demonstrated very good fit of all 36 items to the construct of motor performance, good reliability with a high Person Separation Index (PSI) of 0.98, logical and hierarchical scoring in 27/36 items, and excellent targeting with minimal ceiling. The RHS differentiated between clinically different groups: SMA type, World Health Organisation (WHO) categories, ambulatory status, and SMA type combined with ambulatory status (all p < 0.001). Construct and concurrent validity were also confirmed by a strong, significant positive correlation with the WHO motor milestones (rs = 0.860, p < 0.001). We conclude that the RHS is a psychometrically sound and versatile clinical outcome assessment to test the broad range of physical abilities of patients with type 2 and 3 SMA. Further longitudinal testing of the scale with regard to change in scores over 6 and 12 months is required prior to its adoption in clinical trials.
An approach for estimating item sensitivity to within-person change over time: An illustration using the Alzheimer's Disease Assessment Scale-Cognitive subscale (ADAS-Cog).

PubMed

Dowling, N Maritza; Bolt, Daniel M; Deng, Sien

2016-12-01

When assessments are primarily used to measure change over time, it is important to evaluate items according to their sensitivity to change, specifically. Items that demonstrate good sensitivity to between-person differences at baseline may not show good sensitivity to change over time, and vice versa. In this study, we applied a longitudinal factor model of change to a widely used cognitive test designed to assess global cognitive status in dementia, and contrasted the relative sensitivity of items to change. Statistically nested models were estimated introducing distinct latent factors related to initial status differences between test-takers and within-person latent change across successive time points of measurement. Models were estimated using all available longitudinal item-level data from the Alzheimer's Disease Assessment Scale-Cognitive subscale, including participants representing the full spectrum of disease status who were enrolled in the multisite Alzheimer's Disease Neuroimaging Initiative. Five of the 13 Alzheimer's Disease Assessment Scale-Cognitive items demonstrated noticeably higher loadings with respect to sensitivity to change. Attending to performance change on only these 5 items yielded a clearer picture of cognitive decline, more consistent with theoretical expectations, than the full 13-item scale. Items that show good psychometric properties in cross-sectional studies are not necessarily the best items at measuring change over time, such as cognitive decline. Applications of the methodological approach described and illustrated in this study can advance our understanding regarding the types of items that best detect fine-grained early pathological changes in cognition. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
Development and Preliminary Testing of the Food Choice Priorities Survey (FCPS): Assessing the Importance of Multiple Factors on College Students' Food Choices.

PubMed

Vilaro, Melissa J; Zhou, Wenjun; Colby, Sarah E; Byrd-Bredbenner, Carol; Riggsbee, Kristin; Olfert, Melissa D; Barnett, Tracey E; Mathews, Anne E

2017-12-01

Understanding factors that influence food choice may help improve diet quality. Factors that commonly affect adults' food choices have been described, but measures that identify and assess food choice factors specific to college students are lacking. This study developed and tested the Food Choice Priorities Survey (FCPS) among college students. Thirty-seven undergraduates participated in two focus groups (n = 19; 11 in the male-only group, 8 in the female-only group) and interviews (n = 18) regarding typical influences on food choice. Qualitative data informed the development of survey items with a 5-point Likert-type scale (1 = not important, 5 = extremely important). An expert panel rated FCPS items for clarity, relevance, representativeness, and coverage using a content validity form. To establish test-retest reliability, 109 first-year college students completed the 14-item FCPS at two time points, 0-48 days apart (M = 13.99, SD = 7.44). Using Cohen's weighted κ for responses within 20 days, 11 items demonstrated moderate agreement and 3 items had substantial agreement. Factor analysis revealed a three-factor structure (9 items). The FCPS is designed for college students and provides a way to determine the factors of greatest importance regarding food choices among this population. From a public health perspective, practical applications include using the FCPS to tailor health communications and behavior change interventions to factors most salient for food choices of college students.
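Cohen's weighted κ, used above for test-retest agreement on the 5-point items, gives partial credit to near-misses: κ_w = 1 − Σ wᵢⱼ·oᵢⱼ / Σ wᵢⱼ·eᵢⱼ, where oᵢⱼ are observed cell proportions, eᵢⱼ the proportions expected from the marginals, and wᵢⱼ disagreement weights. The abstract does not say whether linear or quadratic weights were used, so this minimal sketch supports both; the function name and example ratings are hypothetical.

```python
from typing import Sequence


def weighted_kappa(r1: Sequence[int], r2: Sequence[int],
                   categories: Sequence[int], weights: str = "linear") -> float:
    """Cohen's weighted kappa for two ratings of the same respondents.

    weights: "linear" uses |i - j| / (k - 1); "quadratic" squares that.
    """
    if len(r1) != len(r2) or not r1:
        raise ValueError("ratings must be non-empty and of equal length")
    if weights not in ("linear", "quadratic"):
        raise ValueError("weights must be 'linear' or 'quadratic'")
    idx = {c: i for i, c in enumerate(categories)}
    k, n = len(categories), len(r1)
    obs = [[0.0] * k for _ in range(k)]          # observed cell proportions
    for a, b in zip(r1, r2):
        obs[idx[a]][idx[b]] += 1.0 / n
    row = [sum(obs[i]) for i in range(k)]        # marginals, time 1
    col = [sum(obs[i][j] for i in range(k)) for j in range(k)]  # time 2

    def w(i: int, j: int) -> float:              # disagreement weight
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d * d

    observed = sum(w(i, j) * obs[i][j] for i in range(k) for j in range(k))
    expected = sum(w(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    if expected == 0:
        raise ValueError("kappa undefined: no expected disagreement")
    return 1.0 - observed / expected


scale = [1, 2, 3, 4, 5]
print(weighted_kappa([1, 2, 3, 4, 5], [1, 2, 3, 4, 5], scale))
print(round(weighted_kappa([1, 2, 3, 4, 5], [5, 4, 3, 2, 1], scale), 6))
```

Perfect agreement yields κ_w = 1, while the fully reversed ratings in the second call yield a negative value (worse than chance).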
Device Comparability of Tablets and Computers for Assessment Purposes

ERIC Educational Resources Information Center

Davis, Laurie Laughlin; Kong, Xiaojing; McBride, Yuanyuan; Morrison, Kristin M.

2017-01-01

The definition of what it means to take a test online continues to evolve with the inclusion of a broader range of item types and a wide array of devices used by students to access test content. To assure the validity and reliability of test scores for all students, device comparability research should be conducted to evaluate the impact of…
integIRTy: a method to identify genes altered in cancer by accounting for multiple mechanisms of regulation using item response theory.

PubMed

Tong, Pan; Coombes, Kevin R

2012-11-15

Identifying genes altered in cancer plays a crucial role in both understanding the mechanism of carcinogenesis and developing novel therapeutics. It is known that there are various mechanisms of regulation that can lead to gene dysfunction, including copy number change, methylation, abnormal expression, mutation and so on. Nowadays, all these types of alterations can be simultaneously interrogated by different types of assays. Although many methods have been proposed to identify altered genes from a single assay, there is no method that systematically handles multiple assays while accounting for different alteration types. In this article, we propose a novel method, integration using item response theory (integIRTy), to identify altered genes by using item response theory that allows integrated analysis of multiple high-throughput assays. When applied to a single assay, the proposed method is more robust and reliable than conventional methods such as Student's t-test or the Wilcoxon rank-sum test. When used to integrate multiple assays, integIRTy can identify novel altered genes that cannot be found by looking at individual assays separately. We applied integIRTy to three public cancer datasets (ovarian carcinoma, breast cancer, glioblastoma) for cross-assay-type integration, all of which showed encouraging results. The R package integIRTy is available at the web site http://bioinformatics.mdanderson.org/main/OOMPA:Overview. Contact: kcoombes@mdanderson.org. Supplementary data are available at Bioinformatics online.
Selecting Items for Criterion-Referenced Tests.

ERIC Educational Resources Information Center

Mellenbergh, Gideon J.; van der Linden, Wim J.

1982-01-01

Three item selection methods for criterion-referenced tests are examined: the classical theory of item difficulty and item-test correlation; the latent trait theory of item characteristic curves; and a decision-theoretic approach for optimal item selection. Item contribution to the standardized expected utility of mastery testing is discussed. (CM)
Linear Logistic Test Modeling with R

ERIC Educational Resources Information Center

Baghaei, Purya; Kubinger, Klaus D.

2015-01-01

The present paper gives a general introduction to the linear logistic test model (Fischer, 1973), an extension of the Rasch model with linear constraints on item parameters, along with eRm (an R package to estimate different types of Rasch models; Mair, Hatzinger, & Mair, 2014) functions to estimate the model and interpret its parameters. The…
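In the linear logistic test model, each Rasch item difficulty is not a free parameter but a weighted sum of a smaller set of basic parameters: βᵢ = Σⱼ qᵢⱼ ηⱼ + c, where Q = (qᵢⱼ) records which cognitive operations item i requires. A minimal sketch in Python, assuming the difficulty parametrization P(X = 1 | θ) = 1 / (1 + exp(−(θ − β))). It is not eRm's API: eRm is an R package and uses its own sign and normalization conventions (e.g. easiness parameters), so the function names, Q matrix, and η values here are hypothetical.

```python
import math
from typing import List, Sequence


def lltm_difficulties(q: List[List[float]], eta: Sequence[float],
                      c: float = 0.0) -> List[float]:
    """Item difficulties as linear combinations: beta_i = sum_j q_ij * eta_j + c."""
    if any(len(row) != len(eta) for row in q):
        raise ValueError("each Q-matrix row needs one weight per basic parameter")
    return [sum(qij * ej for qij, ej in zip(row, eta)) + c for row in q]


def rasch_prob(theta: float, beta: float) -> float:
    """Rasch probability of a correct response for ability theta, difficulty beta."""
    return 1.0 / (1.0 + math.exp(-(theta - beta)))


# Hypothetical Q matrix: items 1 and 2 each need one operation, item 3 needs both.
q = [[1, 0], [0, 1], [1, 1]]
eta = [0.5, 1.0]
betas = lltm_difficulties(q, eta)
print(betas)                                          # [0.5, 1.0, 1.5]
print([round(rasch_prob(1.0, b), 3) for b in betas])  # [0.622, 0.5, 0.378]
```

The constraint is what makes the model testable: three item difficulties are explained by two basic parameters, and fit can be compared against an unconstrained Rasch model.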
Issues Related to Assessing Listening Ability.

ERIC Educational Resources Information Center

Mead, Nancy A.

The National Assessment of Educational Progress (NAEP) and the Speech Communication Association (SCA) initiated a pilot study to test the feasibility of assessing speaking and listening skills. A pool of 56 items was developed and then field tested at four sites which represented a variety of national regions, of size and type of cities, and of…
Construct validity of the Nutrition and Activity Knowledge Scale in a French sample of adolescents with mild to moderate intellectual disability.

PubMed

Maïano, Christophe; Bégarie, Jérôme; Morin, Alexandre J S; Garbarino, Jean-Marie; Ninot, Grégory

2010-01-01

The purpose of this study was to test the reliability (i.e., internal consistency and test-retest reliability) and construct validity (i.e., content validity, factor validity, measurement invariance, and latent mean invariance) of the Nutrition and Activity Knowledge Scale (NAKS) in a sample of French adolescents with mild to moderate Intellectual Disability (ID). A total sample of 260 adolescents (144 boys and 116 girls), aged 12 to 18 years, with mild to moderate ID was involved in two studies. In the first study, an analysis of item content revealed that many words from the original version were not understood or caused confusion. These items were reworded and simplified while retaining their original meaning. In the second study, results provided support for: (i) the factor validity and reliability of a 15-item French version of the NAKS; (ii) the measurement invariance of the resulting NAKS across genders and ID levels; and (iii) the partial measurement invariance of the resulting NAKS across age groups and type of school placement. In addition, the latent means of the 15-item French version of the NAKS proved to be invariant across gender, age categories, and ID levels, but to vary across type of school placement (with adolescents schooled in self-contained classes in regular schools presenting higher levels of NAK than adolescents placed in specialized establishments). The present results thus provide preliminary evidence regarding the construct validity of a 15-item French version of the NAKS in a sample of adolescents with ID.
Scaling of theory-of-mind tasks.

PubMed

Wellman, Henry M; Liu, David

2004-01-01

Two studies address the sequence of understandings evident in preschoolers' developing theory of mind. The first, preliminary study provides a meta-analysis of research comparing different types of mental state understandings (e.g., desires vs. beliefs, ignorance vs. false belief). The second, primary study tests a theory-of-mind scale for preschoolers. In this study 75 children (aged 2 years, 11 months to 6 years, 6 months) were tested on 7 tasks tapping different aspects of understanding persons' mental states. Responses formed a consistent developmental progression, where for most children if they passed a later item they passed all earlier items as well, as confirmed by Guttman and Rasch measurement model analyses.
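The Guttman pattern described above (a child who passes a later item also passes all earlier items) is often summarized by a coefficient of reproducibility. The Python sketch below uses one simple error-counting convention, assumed here rather than taken from Wellman and Liu: items are ordered from easiest to hardest, each child's ideal pattern passes exactly as many of the easiest items as the child's total score, and every mismatch between observed and ideal responses counts as one error. The function name is hypothetical.

```python
import numpy as np


def guttman_reproducibility(responses):
    """Coefficient of reproducibility for a persons x items 0/1 matrix.

    CR = 1 - errors / (persons * items), where errors are mismatches between
    each observed response pattern and the ideal Guttman pattern implied by
    that person's total score.
    """
    x = np.asarray(responses, dtype=int)
    n_persons, n_items = x.shape
    # Order items from easiest (most passes) to hardest; ties keep input order.
    order = np.argsort(-x.sum(axis=0), kind="stable")
    ordered = x[:, order]
    errors = 0
    for row in ordered:
        score = int(row.sum())
        ideal = np.zeros(n_items, dtype=int)
        ideal[:score] = 1  # pass exactly the `score` easiest items
        errors += int(np.sum(row != ideal))
    return 1.0 - errors / (n_persons * n_items)


# A perfect Guttman scale (hypothetical data): every pattern is reproducible.
perfect = [[1, 1, 1, 0], [1, 1, 0, 0], [1, 0, 0, 0], [1, 1, 1, 1], [0, 0, 0, 0]]
print(guttman_reproducibility(perfect))  # 1.0
```

Values close to 1 indicate that nearly all children's response patterns follow the scale order; Rasch analysis, also used in the study, offers a probabilistic alternative to this deterministic check.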

Food Choices of Minority and Low-Income Employees

PubMed Central

Levy, Douglas E.; Riis, Jason; Sonnenberg, Lillian M.; Barraclough, Susan J.; Thorndike, Anne N.

2012-01-01

Background: Effective strategies are needed to address obesity, particularly among minority and low-income individuals. Purpose: To test whether a two-phase point-of-purchase intervention improved food choices across racial and socioeconomic (job type) groups. Design: A 9-month longitudinal study from 2009 to 2010 assessing person-level changes in purchases of healthy and unhealthy foods following sequentially introduced interventions. Data were analyzed in 2011. Setting/participants: Participants were 4642 employees of a large hospital in Boston, MA, who were regular cafeteria patrons. Interventions: The first intervention was a traffic light–style color-coded labeling system encouraging patrons to purchase healthy items (labeled green) and avoid unhealthy items (labeled red). The second intervention manipulated “choice architecture” by physically rearranging certain cafeteria items, making green-labeled items more accessible and red-labeled items less accessible. Main outcome measures: Proportion of green- (or red-) labeled items purchased by an employee. Subanalyses tracked beverage purchases, including calories and price per beverage. Results: Employees self-identified as white (73%), black (10%), Latino (7%), and Asian (10%). Compared to white employees, Latino and black employees purchased a higher proportion of red items at baseline (18%, 28%, and 33%, respectively, p<0.001) and a lower proportion of green (48%, 38%, and 33%, p<0.001). Labeling decreased all employees’ red item purchases (−11.2% [95% CI= −13.6%, −8.9%]) and increased green purchases (6.6% [95% CI=5.2%, 7.9%]). Red beverage purchases decreased most (−23.8% [95% CI= −28.1%, −19.6%]). The choice architecture intervention further decreased red purchases after the labeling. Intervention effects were similar across all race/ethnicity and job types (p>0.05 for interaction between race or job type and intervention).
Mean calories per beverage decreased similarly over the study period for all racial groups and job types, with no increase in per-beverage spending. Conclusions: Despite baseline differences in healthy food purchases, a simple color-coded labeling and choice architecture intervention improved food and beverage choices among employees from all racial and socioeconomic backgrounds. PMID:22898116
Food choices of minority and low-income employees: a cafeteria intervention.

PubMed

Levy, Douglas E; Riis, Jason; Sonnenberg, Lillian M; Barraclough, Susan J; Thorndike, Anne N

2012-09-01

Effective strategies are needed to address obesity, particularly among minority and low-income individuals. The study tested whether a two-phase point-of-purchase intervention improved food choices across racial and socioeconomic (job type) groups. In a 9-month longitudinal study from 2009 to 2010, person-level changes in purchases of healthy and unhealthy foods were assessed following sequentially introduced interventions. Data were analyzed in 2011. Participants were 4642 employees of a large hospital in Boston, MA, who were regular cafeteria patrons. The first intervention was a traffic light-style color-coded labeling system encouraging patrons to purchase healthy items (labeled green) and avoid unhealthy items (labeled red). The second intervention manipulated "choice architecture" by physically rearranging certain cafeteria items, making green-labeled items more accessible and red-labeled items less accessible. The main outcome measure was the proportion of green- (or red-) labeled items purchased by an employee. Subanalyses tracked beverage purchases, including calories and price per beverage. Employees self-identified as white (73%), black (10%), Latino (7%), and Asian (10%). Compared to white employees, Latino and black employees purchased a higher percentage of red items at baseline (18%, 28%, and 33%, respectively, p<0.001) and a lower percentage of green (48%, 38%, and 33%, p<0.001). Labeling decreased all employees' red item purchases (-11.2%, 95% CI= -13.6%, -8.9%) and increased green purchases (6.6%, 95% CI=5.2%, 7.9%). Red beverage purchases decreased most (-23.8%, 95% CI= -28.1%, -19.6%). The choice architecture intervention further decreased red purchases after the labeling. Intervention effects were similar across all race/ethnicity and job types (p>0.05 for interaction between race or job type and intervention). Mean calories per beverage decreased similarly over the study period for all racial groups and job types, with no increase in per-beverage spending.
Despite baseline differences in healthy food purchases, a simple color-coded labeling and choice architecture intervention improved food and beverage choices among employees from all racial and socioeconomic backgrounds. Copyright © 2012 American Journal of Preventive Medicine. Published by Elsevier Inc. All rights reserved.
Validation of a general measure of treatment satisfaction, the Treatment Satisfaction Questionnaire for Medication (TSQM), using a national panel study of chronic disease

PubMed Central

Atkinson, Mark J; Sinha, Anusha; Hass, Steven L; Colman, Shoshana S; Kumar, Ritesh N; Brod, Meryl; Rowland, Clayton R

2004-01-01

Background: The objective of this study was to develop and psychometrically evaluate a general measure of patients' satisfaction with medication, the Treatment Satisfaction Questionnaire for Medication (TSQM). Methods: The content and format of 55 initial questions were based on a formal conceptual framework, an extensive literature review, and input from three patient focus groups. Patient interviews were used to select the most relevant questions for further evaluation (n = 31). The psychometric performance of the items and resulting TSQM scales was examined using eight diverse patient groups (arthritis, asthma, major depression, type I diabetes, high cholesterol, hypertension, migraine, and psoriasis) recruited from a national longitudinal panel study of chronic illness (n = 567). Participants were then randomized to complete the test items using one of two alternate scaling methods (Visual Analogue Scale [VAS] vs. Likert-type). Results: A factor analysis (principal component extraction with varimax rotation) of specific items revealed three factors (Eigenvalues > 1.7) explaining 75.6% of the total variance; namely Side effects (4 items, 28.4%, Cronbach's Alpha = .87), Effectiveness (3 items, 24.1%, Cronbach's Alpha = .85), and Convenience (3 items, 23.1%, Cronbach's Alpha = .87). A second factor analysis of more generally worded items yielded a Global Satisfaction scale (3 items, Eigenvalue = 2.3, 79.1%, Cronbach's Alpha = .85). The final four scales possessed good psychometric properties, with the Likert-type scaling method performing better than the VAS approach. Significant differences were found on the TSQM by the route of medication administration (oral, injectable, topical, inhalable), level of illness severity, and length of time on medication. Regression analyses using the TSQM scales accounted for 40–60% of variation in patients' ratings of their likelihood to persist with their current medication.
Conclusion: The TSQM is a psychometrically sound and valid measure of the major dimensions of patients' satisfaction with medication. Preliminary evidence suggests that the TSQM may also be a good predictor of patients' medication adherence across different types of medication and patient populations. PMID:14987333
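The internal-consistency figures reported for each TSQM scale are Cronbach's alpha values, alpha = k/(k-1) * (1 - sum of item variances / variance of the total score) for a scale of k items. A minimal Python sketch of that formula follows; the function name and toy data are hypothetical, and sample variances (ddof=1) are assumed throughout, which does not change the ratio as long as they are used consistently.

```python
import numpy as np


def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents x items matrix of item scores."""
    x = np.asarray(scores, dtype=float)
    k = x.shape[1]  # number of items in the scale
    item_var = x.var(axis=0, ddof=1).sum()   # sum of individual item variances
    total_var = x.sum(axis=1).var(ddof=1)    # variance of the summed scale score
    return k / (k - 1) * (1.0 - item_var / total_var)


# Hypothetical 3-item scale answered by 4 respondents who agree perfectly.
perfect = [[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4]]
print(cronbach_alpha(perfect))  # 1.0 (up to floating-point rounding)
```

The formula is undefined when every respondent has the same total score (zero total variance), so real scale data should be checked for that case.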
DOE Office of Scientific and Technical Information (OSTI.GOV)

B. Gardiner; L. Graton; J. Longo

Classified removable electronic media (CREM) are tracked in several different ways at the Laboratory. To ensure greater security for CREM, we are creating a single, Laboratory-wide system to track CREM. We are researching technology that can be used to electronically tag and detect CREM, designing a database to track the movement of CREM, and planning to test the system at several locations around the Laboratory. We focus on affixing "smart tags" to items we want to track and installing gates at pedestrian portals to detect the entry or exit of tagged items. By means of an enterprise database, the system will track the entry and exit of tagged items into and from CREM storage vaults, vault-type rooms, access corridors, or boundaries of secure areas, as well as the identity of the person carrying an item. We are considering several tracking options that offer greater security, but at greater expense.
Effects of script types of Japanese loan words on priming performance.

PubMed

Hayashi, Chiyoko

2005-04-01

Twenty-three female undergraduate students (mean age = 20 yr., 10 mo.; SD = 15 mo.) were given a word-fragment completion task with a study list and a nonstudy list. The study examined the effect of the orthographic familiarity (e.g., script type) of a test item on word-fragment completion. The script type of the word stimuli (Katakana or Hiragana) was manipulated between the study and test phases. The priming effect was greater when the script type was the same in the study and test phases than in the cross-script condition. Further, even when the script type of a word stimulus differed between the study and test phases, a significant priming effect was obtained when the test fragment was orthographically familiar. These results suggest that both the consistency of the stimulus word's perceptual features between the study and test phases and the orthographic familiarity of the stimulus word in the test phase facilitate priming in a word-fragment completion test.
Popularity and Novelty Dynamics in Evolving Networks.

PubMed

Abbas, Khushnood; Shang, Mingsheng; Abbasi, Alireza; Luo, Xin; Xu, Jian Jun; Zhang, Yu-Xia

2018-04-20

Network science plays a major role in representing real-world phenomena such as the user-item bipartite networks found on e-commerce and social media platforms. It provides researchers with tools and techniques to solve complex real-world problems. Identifying and predicting the future popularity and importance of items on e-commerce or social media platforms is a challenging task. Some items gain popularity repeatedly over time, while others become popular and novel only once. This work aims to identify two key factors: popularity and novelty. To do so, we consider two types of novelty prediction: items appearing in the popular ranking list for the first time; and items that were not in the popular list in the previous time window but might have been popular before that window. Identifying popular items requires careful macro-level analysis. In this work we propose a model that exploits item-level information over a span of time to rank the importance of each item. We consider an ageing (decay) effect along with the recent link gain of the items. We test our proposed model on four real-world datasets using four information-retrieval-based metrics.
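To make the idea of combining recent link gain with an ageing effect concrete, the Python sketch below scores items by exponentially decayed link counts. This is a generic time-decay heuristic chosen for illustration, not the model proposed by Abbas et al.; the function name, the half-life parameter, and the toy link data are all hypothetical.

```python
from collections import defaultdict


def decayed_popularity(links, now, half_life):
    """Rank items by time-decayed link counts (generic heuristic).

    links: iterable of (item_id, timestamp) pairs, one per user-item link.
    Each link contributes 0.5 ** ((now - timestamp) / half_life), so a link
    made at `now` counts fully and its weight halves every `half_life` units.
    Returns (item_id, score) pairs sorted from most to least popular.
    """
    scores = defaultdict(float)
    for item, t in links:
        scores[item] += 0.5 ** ((now - t) / half_life)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Item "a" gained its links long ago; item "b" gained them recently.
links = [("a", 0), ("a", 1), ("b", 9), ("b", 10)]
ranking = decayed_popularity(links, now=10, half_life=1)
print(ranking[0])  # ('b', 1.5)
```

Both items have the same raw link count here, so the decay term alone decides the ranking; a smaller half-life favors recently active items more strongly.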
Rasch model based analysis of the Force Concept Inventory

NASA Astrophysics Data System (ADS)

Planinic, Maja; Ivanjek, Lana; Susac, Ana

2010-06-01

The Force Concept Inventory (FCI) is an important diagnostic instrument that is widely used in physics education research. It is therefore very important to evaluate and monitor its functioning using different tools for statistical analysis. One such tool is the stochastic Rasch model, which enables the construction of linear measures for persons and items from raw test scores and can provide important insight into the structure and functioning of the test (how item difficulties are distributed within the test, how well the items fit the model, and how well the items work together to define the underlying construct). The data for the Rasch analysis come from large-scale research conducted in 2006-07, which investigated Croatian high school students’ conceptual understanding of mechanics on a representative sample of 1676 students (age 17-18 years). The instrument used in the research was the FCI. The average FCI score for the whole sample was (27.7 ± 0.4)%, indicating that most of the students were still non-Newtonians at the end of high school, despite the fact that physics is a compulsory subject in Croatian schools. The large data set was analyzed with the Rasch measurement software WINSTEPS 3.66. Since the FCI is routinely used as a pretest and post-test on two very different populations (non-Newtonian and predominantly Newtonian), an additional predominantly Newtonian sample (N = 141; average FCI score 64.5%) of first-year students enrolled in an introductory physics course at the University of Zagreb was also analyzed. The Rasch model based analysis suggests that the FCI has succeeded in defining a sufficiently unidimensional construct for each population. The analysis of the fit of the data to the model found no grossly misfitting items that would degrade measurement. Some items with larger misfit and items with significantly different difficulties in the two samples of students do require further examination.
The analysis revealed some problems with item distribution in the FCI and suggested that the FCI may function differently in non-Newtonian and predominantly Newtonian populations. Some possible improvements of the test are suggested.
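Under the dichotomous Rasch model, the probability of a correct answer is a logistic function of the difference between person ability and item difficulty (both in logits), and each raw score corresponds to a single ability estimate. The Python sketch below illustrates only that raw-score-to-measure step, assuming the item difficulties are already known; software such as WINSTEPS estimates persons and items together, so this is a simplification. `rasch_prob` and `person_measure` are hypothetical names.

```python
import math


def rasch_prob(theta, b):
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def person_measure(raw_score, difficulties, tol=1e-10, max_iter=100):
    """Maximum-likelihood ability (logits) for a raw score, given item difficulties.

    Solves sum_i P_i(theta) = raw_score by Newton-Raphson. Extreme scores
    (zero or all items correct) have no finite estimate and are rejected.
    """
    n = len(difficulties)
    if not 0 < raw_score < n:
        raise ValueError("raw score must be strictly between 0 and the number of items")
    theta = math.log(raw_score / (n - raw_score))  # logit of proportion correct as a start
    for _ in range(max_iter):
        probs = [rasch_prob(theta, b) for b in difficulties]
        expected = sum(probs)                      # expected raw score at theta
        info = sum(p * (1.0 - p) for p in probs)   # test information at theta
        step = (raw_score - expected) / info
        theta += step
        if abs(step) < tol:
            break
    return theta


# Hypothetical 4-item test with difficulties symmetric around 0.
difficulties = [-1.0, 0.0, 0.0, 1.0]
print(person_measure(2, difficulties))  # approximately 0.0 (half the items correct)
print(person_measure(3, difficulties))  # a higher ability than for a score of 2
```

Because the expected raw score increases strictly with theta, the raw score is a sufficient statistic for ability in the Rasch model, which is what allows linear measures to be constructed from raw scores.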
Development and Psychometric Evaluation of the Gay Male Sexual Difficulties Scale.

PubMed

McDonagh, Lorraine K; Stewart, Ian; Morrison, Melanie A; Morrison, Todd G

2016-08-01

Sexual difficulties (i.e., disturbances in normal sexual responding) have the potential to significantly and negatively affect men's social and psychological well-being. However, a review of published measurement tools indicates that most have limited applicability to gay men, and none offers a nuanced understanding of sexual difficulties as experienced by members of this population. To address this omission, the Gay Male Sexual Difficulties Scale (GMSDS) was developed using a sequential mixed-methods approach. The 25-item GMSDS uses a 6-point frequency Likert-type response format and examines: difficulties with receptive and insertive anal intercourse (5 items each); erectile difficulties (4 items); foreskin difficulties (4 items); body embarrassment (4 items); and seminal fluid concerns (3 items). The dimensionality of the scale scores, assessed using both exploratory and confirmatory factor analyses, as well as scale score reliability and validity (e.g., known-groups and convergent), were tested and deemed satisfactory. Limitations of the current series of studies and directions for future research are discussed.
Development of a scale of executive functioning for the RBANS.

PubMed

Spencer, Robert J; Kitchen Andren, Katherine A; Tolle, Kathryn A

2018-01-01

The Repeatable Battery for the Assessment of Neuropsychological Status (RBANS) is a cognitive battery that contains scales of several cognitive abilities, but no scale in the instrument is dedicated exclusively to executive functioning. Although the subtests allow for observation of executive-type errors, each error type has a fairly low base rate, and healthy and clinical normative data on the frequency of these errors are lacking, making their significance difficult to interpret in isolation. The aim of this project was to create an RBANS executive errors scale (RBANS EE) whose items consist of qualitatively dysexecutive errors committed throughout the test. Participants included Veterans referred for outpatient neuropsychological testing. Items were initially selected based on theoretical literature and were retained based on item-total correlations. The RBANS EE (a percentage calculated by dividing the number of dysexecutive errors by the total number of responses) was moderately related to each of seven established measures of executive functioning and was strongly predictive of dichotomous classification of executive impairment. Thus, the scale had solid concurrent validity, justifying its use as a supplementary scale. The RBANS EE requires no additional administration time and can provide a quantified measure of otherwise unmeasured aspects of executive functioning.
Development of the pediatric quality of life inventory neurofibromatosis type 1 module items for children, adolescents and young adults: qualitative methods.

PubMed

Nutakki, Kavitha; Varni, James W; Steinbrenner, Sheila; Draucker, Claire B; Swigonski, Nancy L

2017-03-01

Health-related quality of life (HRQOL) is arguably one of the most important measures in evaluating the effectiveness of clinical treatments. At present, there is no disease-specific outcome measure to assess the HRQOL of children, adolescents and young adults with Neurofibromatosis Type 1 (NF1). This study aimed to develop the items and support the content validity of the Pediatric Quality of Life Inventory™ (PedsQL™) NF1 Module for children, adolescents and young adults. The iterative process used multiphase qualitative methods: a literature review, a survey of expert opinions, semi-structured interviews, cognitive interviews and pilot testing. Fifteen domains were derived from the qualitative methods, with content saturation achieved, resulting in 115 items. The domains include skin, pain, pain impact, pain management, cognitive functioning, speech, fine motor, balance, vision, perceived physical appearance, communication, worry, treatment, medicines and gastrointestinal symptoms. The study is limited because all participants were recruited from a single site. Qualitative methods support the content validity of the PedsQL™ NF1 Module for children, adolescents and young adults. The PedsQL™ NF1 Module is now undergoing national multisite field testing for psychometric validation of the instrument.
Psychometric evaluation of a coping questionnaire in two independent samples of people with diabetes.

PubMed

Persson, Lars-Olof; Erichsen, Magdalena; Wändell, Per; Gåfvels, Catharina

2013-10-01

The study examines the internal item/scale structure and concurrent validity of a newly developed 48-item questionnaire, the General Coping Questionnaire (GCQ), which measures 10 aspects of coping with chronic illness (self-trust, problem-reducing actions, change of values, social trust, minimization, fatalism, resignation, protest, isolation and intrusion). The tests were performed in two independent samples of persons with diabetes mellitus: the first consisted of 119 subjects with type I diabetes and the second of 184 subjects with type II diabetes. Concurrent validity was examined by comparisons with a measure of health-related quality of life (SF-36), a measure of metabolic control (HbA1c) and the incidence of diabetic complications. The item/scale structure was found to be similar and very good in both samples. The 10 dimensions correlated as expected with the measure of mental health, although the 'negative' dimensions of the GCQ correlated more strongly than the 'positive' dimensions. Weaker relations with metabolic control were also found in one of the samples. These tests provide further evidence that the GCQ is a well-structured, relevant and reliable instrument for assessing coping reactions in chronic somatic conditions. Copyright © 2012 John Wiley & Sons, Ltd.
Measuring patient-provider communication skills in Rwanda: Selection, adaptation and assessment of psychometric properties of the Communication Assessment Tool.

PubMed

Cubaka, Vincent Kalumire; Schriver, Michael; Vedsted, Peter; Makoul, Gregory; Kallestrup, Per

2018-04-23

This study aimed to identify, adapt and validate a measure of providers' communication and interpersonal skills in Rwanda. After selection, translation and piloting of the measure, structural validity, test-retest reliability, and differential item functioning were assessed. Identification and adaptation: The 14-item Communication Assessment Tool (CAT) was selected and adapted. Content validation found all items highly relevant in the local context except two, which were retained once the reasoning applied by patients was understood. Eleven providers and 291 patients were involved in the field testing. Confirmatory factor analysis showed a good fit for the original one-factor model. Test-retest reliability assessment revealed a mean quadratic weighted kappa of 0.81 (range: 0.69-0.89, N = 57). The average proportion of excellent scores was 15.7% (SD: 24.7, range: 9.9-21.8%, N = 180). Differential item functioning was not observed, except across age groups for item 1, which focuses on greetings (p = 0.02, N = 180). The Kinyarwanda version of the CAT (K-CAT) is a reliable and valid patient-reported measure of providers' communication and interpersonal skills. K-CAT was validated on nurses, and its use with other types of providers may require further validation. K-CAT is expected to be a valuable feedback tool for providers in practice and in training. Copyright © 2018 Elsevier B.V. All rights reserved.
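The test-retest statistic reported above, the quadratic weighted kappa, compares the observed agreement table of two ratings with the table expected by chance and penalizes each disagreement by the squared distance between categories. A minimal Python sketch follows, assuming ratings are integer codes 0 to k-1 on the same ordinal scale; the function name is hypothetical, and the study does not state which software it used.

```python
import numpy as np


def quadratic_weighted_kappa(a, b, n_categories):
    """Quadratic weighted kappa between two ratings coded 0..n_categories-1."""
    observed = np.zeros((n_categories, n_categories))
    for i, j in zip(a, b):
        observed[i, j] += 1  # cross-tabulate first rating (rows) vs second (columns)
    n = observed.sum()
    # Chance-expected table from the product of the marginal distributions.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / n
    idx = np.arange(n_categories)
    # Squared category distance, scaled to [0, 1]; the scaling cancels in the ratio.
    weights = (idx[:, None] - idx[None, :]) ** 2 / (n_categories - 1) ** 2
    return 1.0 - (weights * observed).sum() / (weights * expected).sum()


# Hypothetical test and retest scores for four patients on a 3-point item.
test = [0, 1, 2, 2]
retest = [0, 1, 2, 2]
print(quadratic_weighted_kappa(test, retest, 3))  # 1.0 for perfect agreement
```

For production use, scikit-learn's `sklearn.metrics.cohen_kappa_score` with `weights="quadratic"` computes the same statistic; averaging the per-item values gives a mean kappa like the one reported for the K-CAT.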
Analysis of Skin Humidity Variation Between Sasang Types

PubMed Central

Jung, Soon-Oh; Park, Soo-Jin; Chae, Han; Park, Soo Hyun; Hwang, Minwoo; Kim, Sang-Hyuk

2009-01-01

The purpose of this study was to examine how perspiration-induced variation in skin humidity (SH) differs across Sasang types and to identify novel and effective Sasang classification factors. We also analyzed the responses of each Sasang type to sweating-related QSCC II items. The results revealed a significant difference in SH between genders and significant differences in SH before and after perspiration between Tae-Eum and So-Eum men. In addition, Tae-Eum women showed significant differences in SH compared with women of other Sasang types. Furthermore, evaluation of the sweating-related items in the QSCC II and their relationship to each constitution revealed a significant difference between Tae-Eum and other Sasang types. Overall, the results of this study indicate that there is a distinct SH difference following perspiration between Tae-Eum and other Sasang types. Such findings may aid Sasang typology diagnostic testing if supported by further, more sophisticated clinical studies. PMID:19745016
Evaluation of the Clinical LOINC (Logical Observation Identifiers, Names, and Codes) Semantic Structure as a Terminology Model for Standardized Assessment Measures

PubMed Central

Bakken, Suzanne; Cimino, James J.; Haskell, Robert; Kukafka, Rita; Matsumoto, Cindi; Chan, Garrett K.; Huff, Stanley M.

2000-01-01

Objective: The purpose of this study was to test the adequacy of the Clinical LOINC (Logical Observation Identifiers, Names, and Codes) semantic structure as a terminology model for standardized assessment measures. Methods: After extension of the definitions, 1,096 items from 35 standardized assessment instruments were dissected into the elements of the Clinical LOINC semantic structure. An additional coder dissected at least one randomly selected item from each instrument. When multiple scale types occurred in a single instrument, a second coder dissected one randomly selected item representative of each scale type. Results: The results support the adequacy of the Clinical LOINC semantic structure as a terminology model for standardized assessments. Using the revised definitions, the coders were able to dissect into the elements of Clinical LOINC all the standardized assessment items in the sample instruments. Percentage agreement for each element was as follows: component, 100 percent; property, 87.8 percent; timing, 82.9 percent; system/sample, 100 percent; scale, 92.6 percent; and method, 97.6 percent. Discussion: This evaluation was an initial step toward the representation of standardized assessment items in a manner that facilitates data sharing and re-use. Further clarification of the definitions, especially those related to time and property, is required to improve inter-rater reliability and to harmonize the representations with similar items already in LOINC. PMID:11062226
Inattentional blindness and the von Restorff effect.

PubMed

Schmidt, Stephen R; Schmidt, Constance R

2015-02-01

Sometimes we fail to notice distinctive or unusual items (inattentional blindness), while other times we remember distinctive items more than expected items (the von Restorff effect). A three-factor framework is presented and tested in two experiments in an attempt to reconcile these seemingly contradictory phenomena. Memory for different types of unexpected stimuli was tested after an easy or difficult Stroop color-naming task. Highly arousing taboo words were well remembered even when the difficult Stroop task limited attentional resources. However, a conceptual isolation effect was only observed when the nature of the category change was highlighted by the Stroop task, the Stroop task was easy, and/or the isolated targets enjoyed a retrieval advantage relative to comparison targets. As proposed in the three-factor framework, the arousing qualities of the stimuli, the attentional demands of the primary task, and the relevance of isolated features at encoding and retrieval combine to produce inattentional blindness and the von Restorff effect.
An Item Gains and Losses Analysis of False Memories Suggests Critical Items Receive More Item-Specific Processing than List Items

ERIC Educational Resources Information Center

Burns, Daniel J.; Martens, Nicholas J.; Bertoni, Alicia A.; Sweeney, Emily J.; Lividini, Michelle D.

2006-01-01

In a repeated testing paradigm, list items receiving item-specific processing are more likely to be recovered across successive tests (item gains), whereas items receiving relational processing are likely to be forgotten progressively less on successive tests. Moreover, analysis of cumulative-recall curves has shown that item-specific processing…
Preliminary psychometric testing of the Fox Simple Quality-of-Life Scale.

PubMed

Fox, Sherry

2004-06-01

Although quality of life is widely defined as subjective and multidimensional, with both affective and cognitive components, few instruments capture important dimensions of the construct, and few are both conceptually congruent and user friendly in the clinical setting. The aim of this study was to develop and test a measure that would be easy to use clinically and capture both cognitive and affective components of quality of life. Initial item sources for the Fox Simple Quality-of-Life Scale (FSQOLS) were literature-based. Thirty items were compiled for content validity assessment by a panel of expert healthcare clinicians from various disciplines, predominantly nursing. Five items were removed after the review because they were negatively worded or redundant. The 25-item scale was mailed to 177 people with lung, colon, and ovarian cancer in various stages. Cancer types were selected theoretically, based on similarity in prognosis, degree of symptom burden, and possible meaning and experience. All 145 participants provided complete data on the FSQOLS. Psychometric evaluation of the FSQOLS included item-total correlations, principal components analysis with varimax rotation (which revealed two factors explaining 50% of the variance), reliability estimation using alpha estimates, and item-factor correlations. The FSQOLS exhibited significant convergent validity with four popular quality-of-life instruments: the Ferrans and Powers Quality of Life Index, the Functional Assessment of Cancer Therapy Scale, the Short-Form-36 Health Survey, and the General Well-Being Scale. Content validity of the scale was explored and supported using qualitative interviews of 14 participants with lung, colon and ovarian cancer, who were a subgroup of the sample for the initial instrument testing.
Unidimensional IRT Item Parameter Estimates across Equivalent Test Forms with Confounding Specifications within Dimensions

ERIC Educational Resources Information Center

Matlock, Ki Lynn; Turner, Ronna

2016-01-01

When constructing multiple test forms, the number of items and the total test difficulty are often equivalent. Not all test developers match the number of items and/or average item difficulty within subcontent areas. In this simulation study, six test forms were constructed having an equal number of items and average item difficulty overall.…
Development and psychometric properties of a new social support scale for self-care in middle-aged patients with type II diabetes (S4-MAD)

PubMed Central

2012-01-01

Background: Social support has proved to be one of the most effective factors in the success of diabetic self-care. This study aimed to develop a scale for evaluating social support for self-care in middle-aged patients (30–60 years old) with type II diabetes. Methods: This was a two-phase qualitative and quantitative study, conducted from 2009 to 2011 in Tehran, Iran. In the qualitative part, a sample of diabetic patients participated in four focus group discussions in order to develop a preliminary item pool. Subsequently, content and face validity were assessed to produce a pre-final version of the questionnaire. Then, in a quantitative study, reliability (internal consistency and test-retest analysis), validity and factor analysis (both exploratory and confirmatory) were performed to assess the psychometric properties of the scale. Results: A 38-item questionnaire was developed through the qualitative phase. It was reduced to 33 items after content validity assessment. Exploratory factor analysis yielded 30 items loading on a five-factor solution (nutrition, physical activity, self-monitoring of blood glucose, foot care and smoking) that jointly accounted for 72.3% of observed variance. The confirmatory factor analysis indicated a good fit to the data. The Cronbach’s alpha coefficient showed excellent internal consistency (alpha=0.94), and test-retest assessment of the scale at a 2-week interval indicated appropriate stability (ICC=0.87). Conclusion: The findings showed that the designed questionnaire was a valid and reliable instrument for measuring social support for self-care in middle-aged patients with type II diabetes. It is an easy-to-use questionnaire and covers the most significant diabetes-related behaviors that need continuous support for self-care. PMID:23190685
Adaptation, test-retest reliability, and construct validity of the Physical Activity Neighborhood Environment Scale in Nigeria (PANES-N).

PubMed

Oyeyemi, Adewale L; Sallis, James F; Oyeyemi, Adetoyeje Y; Amin, Mariam M; De Bourdeaudhuij, Ilse; Deforche, Benedicte

2013-11-01

This study adapted the Physical Activity Neighborhood Environment Scale (PANES) to the Nigerian context and assessed the test-retest reliability and construct validity of the Nigerian version (PANES-N). A multidisciplinary panel of experts adapted the original PANES to reflect the built and social environment of Nigeria. The adapted PANES was subjected to cognitive testing and test-retest reliability assessment in a diverse sample of Nigerian adults (N = 132) from different neighborhood types. Intraclass correlation coefficients (ICCs) were used to assess test-retest reliability, and construct validity was investigated with analysis of covariance for differences in environmental attributes between neighborhoods. Four of the 17 items on the original PANES were significantly modified, 3 were removed, and 2 new items were incorporated into the final version of the PANES-N. Test-retest reliability was substantial to almost perfect (ICC = 0.62-1.00) for all items on the PANES-N. Residents of inner-city neighborhoods reported higher residential density, land use mix, and safety, but lower pedestrian facilities and aesthetics, than did residents of government reserved area/new layout neighborhoods. The PANES-N appears promising for assessing environmental perceptions related to physical activity in Nigeria, but further testing is required to assess its applicability across Africa.
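The abstract does not say which ICC form was used. The sketch below assumes ICC(2,1) in Shrout and Fleiss's notation (two-way random effects, absolute agreement, single measurement), computed from two-way ANOVA mean squares. Test and retest are treated as the k = 2 occasions. The function name and data are hypothetical.

```python
def icc_2_1(data):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    data: list of subjects, each a list of k repeated ratings (e.g. test, retest).
    """
    n, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]                       # per subject
    col_means = [sum(row[j] for row in data) / n for j in range(k)]  # per occasion
    # Two-way ANOVA sums of squares
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    # Mean squares
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical retest scores that are uniformly 1 point higher than the test scores
print(round(icc_2_1([[1, 2], [2, 3], [3, 4]]), 3))  # 0.667
```

Because this form measures absolute agreement, a uniform shift between test and retest lowers the ICC even when respondents keep the same rank order.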

Effective communication of molecular genetic test results to primary care providers.

PubMed

Scheuner, Maren T; Edelen, Maria Orlando; Hilborne, Lee H; Lubin, Ira M

2013-06-01

We evaluated a template for molecular genetic test reports that was developed as a strategy to reduce communication errors between the laboratory and ordering clinician. We surveyed 1,600 primary care physicians to assess satisfaction, ease of use, and effectiveness of genetic test reports developed using our template and reports developed by clinical laboratories. Mean score differences of responses between the reports were compared using t-tests. Two-way analysis of variance evaluated the effect of template versus standard reports and the influence of physician characteristics. There were 396 (24%) respondents. Template reports had higher scores than the standard reports for each survey item. The gender and specialty of the physician did not influence scores; however, younger physicians gave higher scores regardless of report type. There was significant interaction between report type and whether physicians ordered or reviewed any genetic tests (none versus at least one) in the past year, P = 0.005. For each survey item assessing satisfaction, ease of use, and effectiveness, physicians gave higher ratings to genetic test reports developed with the template than standard reports used by clinical laboratories. Physicians least familiar with genetic test reports, and possibly having the greatest need for better communication, were best served by the template reports.
Filter Leaf. Operational Control Tests for Wastewater Treatment Facilities. Instructor's Manual [and] Student Workbook.

ERIC Educational Resources Information Center

Wooley, John F.

In the operation of vacuum filters and belt filters, it is desirable to evaluate the performance of different types of filter media and conditioning processes. The filter leaf test, which is used to evaluate these items, is described. Designed for individuals who have completed National Pollutant Discharge Elimination System (NPDES) level 1…
49 CFR 175.10 - Exceptions for passengers, crewmembers, and air operators.

Code of Federal Regulations, 2014 CFR

2014-10-01

... mobility aid equipped with a lithium ion battery, when carried as checked baggage, provided— (i) The lithium ion battery must be of a type that successfully passed each test in the UN Manual of Tests and... the movement of baggage, mail, service items, or other cargo; (v) Where a lithium ion battery-powered...
49 CFR 175.10 - Exceptions for passengers, crewmembers, and air operators.

Code of Federal Regulations, 2013 CFR

2013-10-01

... mobility aid equipped with a lithium ion battery, when carried as checked baggage, provided— (i) The lithium ion battery must be of a type that successfully passed each test in the UN Manual of Tests and... the movement of baggage, mail, service items, or other cargo; (v) Where a lithium ion battery-powered...
Semiotic Structure and Meaning Making: The Performance of English Language Learners on Mathematics Tests

ERIC Educational Resources Information Center

Solano-Flores, Guillermo; Barnett-Clarke, Carne; Kachchaf, Rachel R.

2013-01-01

We examined the performance of English language learners (ELLs) and non-ELLs on Grade 4 and Grade 5 mathematics content knowledge (CK) and academic language (AL) tests. CK and AL items had different semiotic loads (numbers of different types of semiotic features) and different semiotic structures (relative frequencies of different semiotic…
Detecting Different Types of Reading Difficulties: A Comparison of Tests

ERIC Educational Resources Information Center

Moore, Danielle M.; Porter, Melanie A.; Kohnen, Saskia; Castles, Anne

2012-01-01

The focus of this paper is on the assessment of the two main processes that children must acquire at the single word reading level: word recognition (lexical) and decoding (nonlexical) skills. Guided by the framework of the dual route model, this study aimed to (1) investigate the impact of item characteristics on test performance, and (2)…
Distinguishing Fast and Slow Processes in Accuracy - Response Time Data.

PubMed

Coomans, Frederik; Hofman, Abe; Brinkhuis, Matthieu; van der Maas, Han L J; Maris, Gunter

2016-01-01

We investigate the relation between speed and accuracy within problem solving in its simplest non-trivial form. We consider tests with only two items and code the item responses in two binary variables: one indicating response accuracy and one indicating response speed. Despite being a very basic setup, it enables us to study item pairs stemming from a broad range of domains such as basic arithmetic, first language learning, intelligence-related problems, and chess, with large numbers of observations for every pair of problems under consideration. We carry out a survey over a large number of such item pairs and compare three types of psychometric accuracy-response time models present in the literature: two 'one-process' models, the first of which models accuracy and response time as conditionally independent and the second of which models accuracy and response time as conditionally dependent, and a 'two-process' model which models accuracy contingent on response time. We find that the data clearly violate the restrictions imposed by both one-process models and require additional complexity, which is parsimoniously provided by the two-process model. We supplement our survey with an analysis of the erroneous responses for an example item pair and demonstrate that there are very significant differences between the types of errors in fast and slow responses.
Evolution of a Test Item

ERIC Educational Resources Information Center

Spaan, Mary

2007-01-01

This article follows the development of test items (see "Language Assessment Quarterly", Volume 3 Issue 1, pp. 71-79 for the article "Test and Item Specifications Development"), beginning with a review of test and item specifications, then proceeding to writing and editing of items, pretesting and analysis, and finally selection of an item for a…
Readability Level of Standardized Test Items and Student Performance: The Forgotten Validity Variable

ERIC Educational Resources Information Center

Hewitt, Margaret A.; Homan, Susan P.

2004-01-01

Test validity issues considered by test developers and school districts rarely include individual item readability levels. In this study, items from a major standardized test were examined for individual item readability level and item difficulty. The Homan-Hewitt Readability Formula was applied to items across three grade levels. Results of…
Development and psychometric testing of a barriers to HIV testing scale among individuals with HIV infection in Sweden; The Barriers to HIV testing scale-Karolinska version.

PubMed

Wiklander, Maria; Brännström, Johanna; Svedhem, Veronica; Eriksson, Lars E

2015-11-19

Barriers to HIV testing experienced by individuals at risk for HIV can result in treatment delay and further transmission of the disease. Instruments to systematically measure barriers are scarce, but could contribute to improved strategies for HIV testing. The aims of this study were to develop and test a barriers to HIV testing scale in a Swedish context. An 18-item scale was developed, based on an existing scale with the addition of six new items related to fear of the disease or negative consequences of being diagnosed as HIV-infected. Items were phrased as statements about potential barriers with a three-point response format representing not important, somewhat important, and very important. The scale was evaluated regarding missing values, floor and ceiling effects, exploratory factor analysis, and internal consistency. The questionnaire was completed by 292 adults recently diagnosed with HIV infection, of whom 7 were excluded (≥9 items missing) and 285 were included (≥12 items completed) in the analyses. The participants were 18-70 years old (mean 40.5, SD 11.5); 39% were female and 77% were born outside Sweden. Routes of transmission were heterosexual transmission 63%, male-to-male sex 20%, intravenous drug use 5%, blood product/transfusion 2%, and unknown 9%. All scale items had <3% missing values. The data were suitable for factor analysis (KMO = 0.92), and a four-factor solution was chosen based on the level of explained common variance (58.64%) and the interpretability of the factor structure. The factors were interpreted as personal consequences, structural barriers, social and economic security, and confidentiality. Ratings at the minimum level (suggested barrier not important) were common, resulting in substantial floor effects on the scales. The scales were internally consistent (Cronbach's α 0.78-0.91). This study gives preliminary evidence that the scale is feasible, reliable, and valid for identifying different types of barriers to HIV testing.
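Floor and ceiling effects are usually reported as the share of respondents who score at the lowest or highest possible scale score. The sketch below assumes that scale scores have already been computed and that the minimum and maximum possible values are known. The function name is illustrative. The example also codes the three-point response format as 1-3, which is an assumption and not the published scoring.

```python
def floor_ceiling(scale_scores, min_score, max_score):
    """Return (floor, ceiling): the proportions of respondents at the minimum
    and maximum possible scale scores."""
    n = len(scale_scores)
    floor = sum(1 for s in scale_scores if s == min_score) / n
    ceiling = sum(1 for s in scale_scores if s == max_score) / n
    return floor, ceiling


# Hypothetical mean scale scores on a 1 (not important) to 3 (very important) coding
print(floor_ceiling([1, 1, 2, 3, 1.5], 1, 3))  # (0.4, 0.2)
```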
The optimal sequence and selection of screening test items to predict fall risk in older disabled women: the Women's Health and Aging Study.

PubMed

Lamb, Sarah E; McCabe, Chris; Becker, Clemens; Fried, Linda P; Guralnik, Jack M

2008-10-01

Falls are a major cause of disability, dependence, and death in older people. Brief screening algorithms may be helpful in identifying risk and leading to more detailed assessment. Our aim was to determine the most effective sequence of falls screening test items from a wide selection of recommended items, including self-report and performance tests, and to compare performance with other published guidelines. Data were from a prospective, age-stratified cohort study. Participants were 1002 community-dwelling women aged 65 years or older, experiencing at least some mild disability. Assessments of fall risk factors were conducted in participants' homes. Fall outcomes were collected at 6-monthly intervals. Algorithms were built for prediction of any fall over a 12-month period using tree classification with cross-set validation. Algorithms using performance tests provided the best prediction of fall events, and achieved moderate to strong performance when compared to commonly accepted benchmarks. The items selected by the best performing algorithm were the number of falls in the last year and, in selected subpopulations, frequency of difficulty balancing while walking, a 4 m walking speed test, body mass index, and a test of knee extensor strength. The algorithm performed better than that from the American Geriatrics Society/British Geriatrics Society/American Academy of Orthopaedic Surgeons and other guidance, although these findings should be treated with caution. Suggestions are made on the type, number, and sequence of tests that could be used to maximize estimation of the probability of falling in older disabled women.
Instrument Formatting with Computer Data Entry in Mind.

ERIC Educational Resources Information Center

Boser, Judith A.; And Others

Different formats for four types of research items were studied for ease of computer data entry. The types were: (1) numeric response items; (2) individual multiple choice items; (3) multiple choice items with the same response items; and (4) card column indicator placement. Each of the 13 experienced staff members of a major university's Data…
41 CFR 101-30.302 - Types of items excluded from cataloging.

Code of Federal Regulations, 2014 CFR

2014-07-01

... Catalog System except when an agency determines that Federal item identification data will be of value in...-FEDERAL CATALOG SYSTEM 30.3-Cataloging Items of Supply § 101-30.302 Types of items excluded from...) Items procured in foreign markets for use in overseas activities of Federal agencies. (e) Printed forms...
41 CFR 101-30.302 - Types of items excluded from cataloging.

Code of Federal Regulations, 2012 CFR

2012-07-01

... Catalog System except when an agency determines that Federal item identification data will be of value in...-FEDERAL CATALOG SYSTEM 30.3-Cataloging Items of Supply § 101-30.302 Types of items excluded from...) Items procured in foreign markets for use in overseas activities of Federal agencies. (e) Printed forms...
41 CFR 101-30.302 - Types of items excluded from cataloging.

Code of Federal Regulations, 2011 CFR

2011-07-01

... Catalog System except when an agency determines that Federal item identification data will be of value in...-FEDERAL CATALOG SYSTEM 30.3-Cataloging Items of Supply § 101-30.302 Types of items excluded from...) Items procured in foreign markets for use in overseas activities of Federal agencies. (e) Printed forms...
41 CFR 101-30.302 - Types of items excluded from cataloging.

Code of Federal Regulations, 2010 CFR

2010-07-01

... Catalog System except when an agency determines that Federal item identification data will be of value in...-FEDERAL CATALOG SYSTEM 30.3-Cataloging Items of Supply § 101-30.302 Types of items excluded from...) Items procured in foreign markets for use in overseas activities of Federal agencies. (e) Printed forms...
41 CFR 101-30.302 - Types of items excluded from cataloging.

Code of Federal Regulations, 2013 CFR

2013-07-01

... Catalog System except when an agency determines that Federal item identification data will be of value in...-FEDERAL CATALOG SYSTEM 30.3-Cataloging Items of Supply § 101-30.302 Types of items excluded from...) Items procured in foreign markets for use in overseas activities of Federal agencies. (e) Printed forms...
The Effect of the Position of an Item within a Test on the Item Difficulty Value.

ERIC Educational Resources Information Center

Rubin, Lois S.; Mott, David E. W.

This study investigated how an item's position within a test affects its difficulty value. Using a 60-item operational test composed of 5 subtests, 60 items were placed as experimental items on a number of spiralled test forms in three different positions (first, middle, last) within the subtest composed of like items.…
Relevance of Item Analysis in Standardizing an Achievement Test in Teaching of Physical Science in B.Ed Syllabus

ERIC Educational Resources Information Center

Marie, S. Maria Josephine Arokia; Edannur, Sreekala

2015-01-01

This paper focused on the analysis of test items constructed for the Teaching of Physical Science paper in the B.Ed. course. It involved analyzing the difficulty level and discrimination power of each test item. Item analysis allows items to be selected for or omitted from the test, but more importantly it is a tool to help the item writer improve…
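The abstract names the two classical indices but does not say how they were computed. The sketch below assumes a common textbook approach. Difficulty is the proportion of examinees who answer an item correctly. Discrimination is the difference in that proportion between the top and bottom groups (27% each by default) of examinees ranked by total score. It handles 0/1-scored items. Names and data are illustrative.

```python
def item_analysis(responses, group_frac=0.27):
    """Classical difficulty (p) and upper-lower discrimination (D) per item.

    responses: list of examinees, each a list of 0/1 item scores.
    Returns a list of (p, D) tuples, one per item.
    """
    n, k = len(responses), len(responses[0])
    # Rank examinees by total score, highest first (ties keep input order)
    ranked = sorted(responses, key=sum, reverse=True)
    g = max(1, min(n // 2, round(n * group_frac)))  # size of each extreme group
    upper, lower = ranked[:g], ranked[-g:]
    results = []
    for j in range(k):
        p = sum(r[j] for r in responses) / n                        # difficulty
        d = sum(r[j] for r in upper) / g - sum(r[j] for r in lower) / g
        results.append((p, d))
    return results


# Hypothetical 4 examinees x 2 items, using halves instead of 27% groups
print(item_analysis([[1, 1], [1, 0], [0, 1], [0, 0]], group_frac=0.5))
# [(0.5, 1.0), (0.5, 0.0)]
```

In this version, ties in total score at the group boundaries are broken by input order. Operational analyses may handle ties differently.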
Development of the outcome expectancy scale for self-care among periodontal disease patients.

PubMed

Kakudate, Naoki; Morita, Manabu; Fukuhara, Shunichi; Sugai, Makoto; Nagayama, Masato; Isogai, Emiko; Kawanami, Masamitsu; Chiba, Itsuo

2011-12-01

The theory of self-efficacy states that specific efficacy expectations affect behaviour. Two types of efficacy expectations are described within the theory. Self-efficacy expectations are the beliefs in the capacity to perform a specific behaviour. Outcome expectations are the beliefs that carrying out a specific behaviour will lead to a desired outcome. To develop and examine the reliability and validity of an outcome expectancy scale for self-care (OESS) among periodontal disease patients. A 34-item scale was tested on 101 patients at a dental clinic. Accuracy was improved by item analysis, and internal consistency and test-retest stability were investigated. Concurrent validity was tested by examining associations of the OESS score with the self-efficacy scale for self-care (SESS) score and plaque index score. Construct validity was examined by comparing OESS scores between periodontal patients at initial visit (group 1) and those continuing maintenance care (group 2). Item analysis identified 13 items for the OESS. Factor analysis extracted three factors: social-, oral- and self-evaluative outcome expectancy. Cronbach's alpha coefficient for the OESS was 0.90. A significant association was observed between test and retest scores, and between the OESS and SESS and plaque index scores. Further, group 2 had a significantly higher mean OESS score than group 1. We developed a 13-item OESS with high reliability and validity which may be used to assess outcome expectancy for self-care. A patient's psychological condition with regard to behaviour and affective status can be accurately evaluated using the OESS with SESS. © 2011 Blackwell Publishing Ltd.

Test item linguistic complexity and assessments for deaf students.

PubMed

Cawthon, Stephanie

2011-01-01

Linguistic complexity of test items is one test format element that has been studied in the context of struggling readers and their participation in paper-and-pencil tests. The present article presents findings from an exploratory study on the potential relationship between linguistic complexity and test performance for deaf readers. A total of 64 students completed 52 multiple-choice items, 32 in mathematics and 20 in reading. These items were coded for linguistic complexity components of vocabulary, syntax, and discourse. Mathematics items had higher linguistic complexity ratings than reading items, but there were no significant relationships between item linguistic complexity scores and student performance on the test items. The discussion addresses issues related to the subject area, student proficiency levels in the test content, factors to look for in determining a "linguistic complexity effect," and areas for further research in test item development and deaf students.
Mental health in primary care: an evaluation using the Item Response Theory.

PubMed

Rocha, Hugo André da; Santos, Alaneir de Fátima Dos; Reis, Ilka Afonso; Santos, Marcos Antônio da Cunha; Cherchiglia, Mariângela Leal

2018-01-01

OBJECTIVE To determine which items of the Brazilian National Program for Improving Access and Quality of Primary Care best evaluate the capacity to provide mental health care. METHODS This is a cross-sectional study using the Graded Response Model of Item Response Theory, applied to secondary data from the second cycle of the National Program for Improving Access and Quality of Primary Care, which evaluated 30,523 primary care teams in Brazil from 2013 to 2014. Internal consistency, inter-item correlation, and item-total correlation were assessed using Cronbach's alpha, Spearman's correlation, and point-biserial coefficients, respectively. The assumptions of unidimensionality and local independence of the items were tested. Word clouds were used as one way to present the results. RESULTS The items with the greatest ability to discriminate were scheduling of the agenda according to risk stratification, keeping records of the most serious cases of users in psychological distress, and provision of group care. The items with the highest location parameters, that is, those requiring a higher level of mental health care capacity, were the provision of any type of group care and the provision of educational and mental health promotion activities. The total Cronbach's alpha coefficient was 0.87. The items with the highest correlation with the total score were the recording of the most serious cases of users in psychological distress and scheduling of the agenda according to risk stratification. The final scores ranged from -2.07 (minimum) to 1.95 (maximum). CONCLUSIONS Important aspects in discriminating the capacity of primary health care teams to provide mental health care are risk stratification for care management, follow-up of the most serious cases, group care, and preventive and health promotion actions.
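The Graded Response Model named above is defined by its category probability function, shown in the sketch below in Samejima's formulation. The probability of responding in category k or higher is a logistic function of a(theta - b_k), where a is the item's discrimination and b_k its ordered location parameters. Each category's probability is the difference between adjacent cumulative curves. The study's parameter estimation is not shown, and the parameter values in the example are hypothetical.

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Category probabilities for one item under Samejima's graded response model.

    theta: latent trait value; a: discrimination;
    thresholds: strictly increasing location parameters b_1 < ... < b_{m-1}.
    Returns probabilities for categories 0..m-1.
    """
    if any(b2 <= b1 for b1, b2 in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    # Cumulative probability of responding in category k or higher;
    # by definition 1 for the lowest category and 0 beyond the highest.
    cum = [1.0] + [1 / (1 + math.exp(-a * (theta - b))) for b in thresholds] + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]


# Hypothetical three-category item evaluated at theta = 0
print([round(p, 3) for p in grm_category_probs(0.0, 1.5, [-1.0, 1.0])])
# [0.182, 0.635, 0.182]
```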
Concealed semantic and episodic autobiographical memory electrified.

PubMed

Ganis, Giorgio; Schendan, Haline E

2012-01-01

Electrophysiology-based concealed information tests (CIT) try to determine whether somebody possesses concealed information about a crime-related item (probe) by comparing event-related potentials (ERPs) between this item and comparison items (irrelevants). Although the broader field is sometimes referred to as "memory detection," little attention has been paid to the precise type of underlying memory involved. This study begins addressing this issue by examining the key distinction between semantic and episodic memory in the autobiographical domain within a CIT paradigm. This study also addresses the issue of whether multiple repetitions of the items over the course of the session habituate the brain responses. Participants were tested in a 3-stimulus CIT with semantic autobiographical probes (their own date of birth) and episodic autobiographical probes (a secret date learned just before the study). Results dissociated these two memory conditions on several ERP components. Semantic probes elicited a smaller frontal N2 than episodic probes, consistent with the idea that the frontal N2 decreases with greater pre-existing knowledge about the item. Likewise, semantic probes elicited a smaller central N400 than episodic probes. Semantic probes also elicited a larger P3b than episodic probes because of their richer meaning. In contrast, episodic probes elicited a larger late positive complex (LPC) than semantic probes, because of the recent episodic memory associated with them. All these ERPs showed a difference between probes and irrelevants in both memory conditions, except for the N400, which showed a difference only in the semantic condition. Finally, although repetition affected the ERPs, it did not reduce the difference between probes and irrelevants. These findings show that the type of memory associated with a probe has both theoretical and practical importance for CIT research.
Concealed semantic and episodic autobiographical memory electrified

PubMed Central

Ganis, Giorgio; Schendan, Haline E.

2013-01-01

Electrophysiology-based concealed information tests (CIT) try to determine whether somebody possesses concealed information about a crime-related item (probe) by comparing event-related potentials (ERPs) between this item and comparison items (irrelevants). Although the broader field is sometimes referred to as “memory detection,” little attention has been paid to the precise type of underlying memory involved. This study begins addressing this issue by examining the key distinction between semantic and episodic memory in the autobiographical domain within a CIT paradigm. This study also addresses the issue of whether multiple repetitions of the items over the course of the session habituate the brain responses. Participants were tested in a 3-stimulus CIT with semantic autobiographical probes (their own date of birth) and episodic autobiographical probes (a secret date learned just before the study). Results dissociated these two memory conditions on several ERP components. Semantic probes elicited a smaller frontal N2 than episodic probes, consistent with the idea that the frontal N2 decreases with greater pre-existing knowledge about the item. Likewise, semantic probes elicited a smaller central N400 than episodic probes. Semantic probes also elicited a larger P3b than episodic probes because of their richer meaning. In contrast, episodic probes elicited a larger late positive complex (LPC) than semantic probes, because of the recent episodic memory associated with them. All these ERPs showed a difference between probes and irrelevants in both memory conditions, except for the N400, which showed a difference only in the semantic condition. Finally, although repetition affected the ERPs, it did not reduce the difference between probes and irrelevants. These findings show that the type of memory associated with a probe has both theoretical and practical importance for CIT research. PMID:23355816
The Selection of Test Items for Decision Making with a Computer Adaptive Test.

ERIC Educational Resources Information Center

Spray, Judith A.; Reckase, Mark D.

The issue of test-item selection in support of decision making in adaptive testing is considered. The number of items needed to make a decision is compared for two approaches: selecting items from an item pool that are most informative at the decision point or selecting items that are most informative at the examinee's ability level. The first…
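The two selection rules compared above can be illustrated with item information. The sketch assumes a two-parameter logistic (2PL) model, under which an item's Fisher information at ability theta is a^2 * P(theta) * (1 - P(theta)). The abstract does not state which IRT model was used. Passing the cut score as the target gives selection at the decision point. Passing the current ability estimate gives selection at the examinee's ability level. Function names and the item pool are hypothetical.

```python
import math


def info_2pl(theta, a, b):
    """Fisher information of a 2PL item (discrimination a, difficulty b) at theta."""
    p = 1 / (1 + math.exp(-a * (theta - b)))
    return a * a * p * (1 - p)


def select_item(pool, administered, target_theta):
    """Index of the not-yet-administered item with maximum information at target_theta.

    pool: list of (a, b) tuples; administered: set of indices already used.
    """
    candidates = [i for i in range(len(pool)) if i not in administered]
    return max(candidates, key=lambda i: info_2pl(target_theta, *pool[i]))


# Hypothetical pool of (a, b) parameters
pool = [(1.0, -1.0), (1.0, 0.5), (0.8, 1.0)]
cut_score, theta_hat = 0.5, -1.0
print(select_item(pool, set(), cut_score), select_item(pool, set(), theta_hat))  # 1 0
```

The two rules choose different first items for the same examinee here. That divergence is what drives the difference in test length the study compares.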
Development and psychometric evaluation of an information literacy self-efficacy survey and an information literacy knowledge test.

PubMed

Tepe, Rodger; Tepe, Chabha

2015-03-01

To develop and psychometrically evaluate an information literacy (IL) self-efficacy survey and an IL knowledge test. In this test-retest reliability study, a 25-item IL self-efficacy survey and a 50-item IL knowledge test were developed and administered to a convenience sample of 53 chiropractic students. Item analyses were performed on all questions. The IL self-efficacy survey demonstrated good reliability (test-retest correlation = 0.81) and good/very good internal consistency (mean κ = .56 and Cronbach's α = .92). A total of 25 questions with the best item analysis characteristics were chosen from the 50-item IL knowledge test, resulting in a 25-item IL knowledge test that demonstrated good reliability (test-retest correlation = 0.87), very good internal consistency (mean κ = .69, KR20 = 0.85), and good item discrimination (mean point-biserial = 0.48). This study resulted in the development of three instruments: a 25-item IL self-efficacy survey, a 50-item IL knowledge test, and a 25-item IL knowledge test. The information literacy self-efficacy survey and the 25-item version of the information literacy knowledge test have shown preliminary evidence of adequate reliability and validity to justify continuing study with these instruments.
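KR-20 and the point-biserial coefficient reported above are both computed from 0/1 item scores. The sketch below assumes complete data and an uncorrected point-biserial, in which the item stays in the total score. The abstract does not specify whether the correlations were corrected. Names and data are illustrative.

```python
from statistics import mean, pstdev, pvariance


def kr20(responses):
    """Kuder-Richardson 20 for a complete examinees-by-items 0/1 matrix."""
    k = len(responses[0])
    totals = [sum(r) for r in responses]
    # Sum of item variances p * (1 - p) for dichotomous items
    pq = 0.0
    for j in range(k):
        p = mean(r[j] for r in responses)
        pq += p * (1 - p)
    return (k / (k - 1)) * (1 - pq / pvariance(totals))


def point_biserial(responses, j):
    """Uncorrected point-biserial correlation between item j and the total score.

    Requires at least one correct and one incorrect response to item j.
    """
    totals = [sum(r) for r in responses]
    item = [r[j] for r in responses]
    p = mean(item)
    m1 = mean(t for t, x in zip(totals, item) if x == 1)  # mean total, item correct
    m0 = mean(t for t, x in zip(totals, item) if x == 0)  # mean total, item wrong
    return (m1 - m0) / pstdev(totals) * (p * (1 - p)) ** 0.5


# Hypothetical 4 examinees x 3 items
resp = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(round(kr20(resp), 3), round(point_biserial(resp, 1), 3))  # 0.75 0.894
```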
Development of a refractive error quality of life scale for Thai adults (the REQ-Thai).

PubMed

Sukhawarn, Roongthip; Wiratchai, Nonglak; Tatsanavivat, Pyatat; Pitiyanuwat, Somwung; Kanato, Manop; Srivannaboon, Sabong; Guyatt, Gordon H

2011-08-01

To develop a scale for measuring refractive error quality of life (QOL) for Thai adults. The full survey comprised 424 respondents from 5 medical centers in Bangkok and from 3 medical centers in Chiangmai, Songkla and KhonKaen provinces. Participants were emmetropes and persons with refractive correction with visual acuity of 20/30 or better. An item reduction process was employed combining 3 methods: expert opinion, the impact method, and item-total correlation. Classical reliability testing and validity testing, including convergent, discriminative and construct validity, were performed. The developed questionnaire comprised 87 items in 6 dimensions: 1) quality of vision, 2) visual function, 3) social function, 4) psychological function, 5) symptoms and 6) refractive correction problems. Items are rated on a 5-level Likert scale. Cronbach's alpha coefficients of its dimensions ranged from 0.756 to 0.979. All validity tests supported the validity of the scale, and construct validity was confirmed by confirmatory factor analysis. A 48-item short version with good reliability and validity was also developed. This is the first validated instrument for measuring refractive error quality of life for Thai adults, developed with a rigorous research methodology and a large sample size.
A Process for Reviewing and Evaluating Generated Test Items

ERIC Educational Resources Information Center

Gierl, Mark J.; Lai, Hollis

2016-01-01

Testing organizations need large numbers of high-quality items due to the proliferation of alternative test administration methods and modern test designs. But the current demand for items far exceeds the supply. Test items, as they are currently written, evoke a process that is both time-consuming and expensive because each item is written,…
Group-Specific Effects of Matching Subtest Contamination on the Identification of Differential Item Functioning

ERIC Educational Resources Information Center

Keiffer, Elizabeth Ann

2011-01-01

A differential item functioning (DIF) simulation study was conducted to explore the type and level of impact that contamination had on type I error and power rates in DIF analyses when the suspect item favored the same or opposite group as the DIF items in the matching subtest. Type I error and power rates were displayed separately for the…
What's in a Topic? Exploring the Interaction between Test-Taker Age and Item Content in High-Stakes Testing

ERIC Educational Resources Information Center

Banerjee, Jayanti; Papageorgiou, Spiros

2016-01-01

The research reported in this article investigates differential item functioning (DIF) in a listening comprehension test. The study explores the relationship between test-taker age and the items' language domains across multiple test forms. The data comprise test-taker responses (N = 2,861) to a total of 133 unique items, 46 items of which were…
Electrophysiological distinctions between recognition memory with and without awareness

PubMed Central

Ko, Philip C.; Duda, Bryant; Hussey, Erin P.; Ally, Brandon A.

2013-01-01

The influence of implicit memory representations on explicit recognition may help to explain cases of accurate recognition decisions made with high uncertainty. During a recognition task, implicit memory may enhance the fluency of a test item, biasing decision processes to endorse it as “old”. This model may help explain recognition-without-identification, a remarkable phenomenon in which participants make highly accurate recognition decisions despite the inability to identify the test item. The current study investigated whether recognition-without-identification for pictures elicits a similar pattern of neural activity as other types of accurate recognition decisions made with uncertainty. Further, this study also examined whether recognition-without-identification for pictures could be attained by the use of perceptual and conceptual information from memory. To accomplish this, participants studied pictures and then performed a recognition task under difficult viewing conditions while event-related potentials (ERPs) were recorded. Behavioral results showed that recognition was highly accurate even when test items could not be identified, demonstrating recognition-without-identification. The behavioral performance also indicated that recognition-without-identification was mediated by both perceptual and conceptual information, independently of one another. The ERP results showed dramatically different memory-related activity during the early 300 to 500 ms epoch for identified items that were studied compared to unidentified items that were studied. Similar to previous work highlighting accurate recognition without retrieval awareness, test items that were not identified, but correctly endorsed as “old,” elicited a negative posterior old/new effect (i.e., N300). In contrast, test items that were identified and correctly endorsed as “old,” elicited the classic positive frontal old/new effect (i.e., FN400).
Importantly, both of these effects were elicited under conditions in which participants used perceptual information to make recognition decisions. Conceptual information elicited very different ERPs from those elicited by perceptual information, showing that the informational wealth of pictures can evoke multiple routes to recognition even without awareness of memory retrieval. These results are discussed within the context of current theories regarding the N300 and the FN400. PMID:23287567
Missing data methods for dealing with missing items in quality of life questionnaires. A comparison by simulation of personal mean score, full information maximum likelihood, multiple imputation, and hot deck techniques applied to the SF-36 in the French 2003 decennial health survey.

PubMed

Peyre, Hugo; Leplège, Alain; Coste, Joël

2011-03-01

Missing items are common in quality of life (QoL) questionnaires and present a challenge for research in this field. It remains unclear which of the various methods proposed to deal with missing data performs best in this context. We compared personal mean score, full information maximum likelihood, multiple imputation, and hot deck techniques using various realistic simulation scenarios of item missingness in QoL questionnaires constructed within the framework of classical test theory. Samples of 300 and 1,000 subjects were randomly drawn from the 2003 INSEE Decennial Health Survey (of 23,018 subjects representative of the French population and having completed the SF-36) and various patterns of missing data were generated according to three different item non-response rates (3, 6, and 9%) and three types of missing data (Little and Rubin's "missing completely at random," "missing at random," and "missing not at random"). The missing data methods were evaluated in terms of accuracy and precision for the analysis of one descriptive and one association parameter for three different scales of the SF-36. For all item non-response rates and types of missing data, multiple imputation and full information maximum likelihood appeared superior to the personal mean score and especially to hot deck in terms of accuracy and precision; however, the use of personal mean score was associated with insignificant bias (relative bias <2%) in all studied situations. Whereas multiple imputation and full information maximum likelihood are confirmed as reference methods, the personal mean score appears nonetheless appropriate for dealing with items missing from completed SF-36 questionnaires in most situations of routine use. These results can reasonably be extended to other questionnaires constructed according to classical test theory.
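The personal mean score approach compared above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the common "half rule" convention (a scale score is computed only if at least half of its items were answered), and the function name `personal_mean_score` is hypothetical.

```python
from typing import List, Optional, Sequence


def personal_mean_score(
    responses: Sequence[Optional[float]], min_fraction: float = 0.5
) -> Optional[List[float]]:
    """Fill missing items (None) with the mean of the respondent's answered
    items on the same scale.

    Assumption: the scale is scored only if at least `min_fraction` of its
    items were answered (the "half rule"); otherwise None is returned.
    """
    answered = [r for r in responses if r is not None]
    if not answered or len(answered) < min_fraction * len(responses):
        return None  # too many missing items: leave the scale score missing
    person_mean = sum(answered) / len(answered)
    return [person_mean if r is None else float(r) for r in responses]


# One missing item out of four: replaced by the mean of 3, 4 and 5.
print(personal_mean_score([3, None, 4, 5]))  # [3.0, 4.0, 4.0, 5.0]
# Three missing items out of four: fails the half rule.
print(personal_mean_score([3, None, None, None]))  # None
```

Because each missing item simply takes the respondent's own mean, the method is easy to apply in routine use; the study found its bias to be small (relative bias <2%) in the scenarios examined.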
Trading Up: Chimpanzees (Pan troglodytes) Show Self-Control Through Their Exchange Behavior

PubMed Central

Beran, Michael J.; Rossettie, Mattea S.; Parrish, Audrey E.

2015-01-01

Self-control is defined as the ability or capacity to obtain an objectively more valuable outcome rather than an objectively less valuable outcome through tolerating a longer delay or a greater effort requirement (or both) in obtaining that more valuable outcome. A number of tests have been devised to assess self-control in nonhuman animals, including exchange tasks. In this study, three chimpanzees (Pan troglodytes) participated in a delay of gratification task that required food exchange as the behavioral response that reflected self-control. The chimpanzees were offered opportunities to inhibit eating and instead exchange a currently possessed food item for a different (and sometimes better) item, often needing to exchange several food items before obtaining the highest-valued reward. We manipulated reward type, reward size, reward visibility, delay to exchange, and location of the highest-valued reward in the sequence of exchange events to compare performance within the same individuals. The chimpanzees successfully traded until obtaining the best item in most cases, although there were individual differences among participants in some variations of the test. These results support the idea that self-control is robust in chimpanzees even in contexts in which they perhaps anticipate future rewards and sustain delay of gratification until they can obtain the ultimately most valuable item. PMID:26325355
Trading up: chimpanzees (Pan troglodytes) show self-control through their exchange behavior.

PubMed

Beran, Michael J; Rossettie, Mattea S; Parrish, Audrey E

2016-01-01

Self-control is defined as the ability or capacity to obtain an objectively more valuable outcome rather than an objectively less valuable outcome through tolerating a longer delay or a greater effort requirement (or both) in obtaining that more valuable outcome. A number of tests have been devised to assess self-control in non-human animals, including exchange tasks. In this study, three chimpanzees (Pan troglodytes) participated in a delay of gratification task that required food exchange as the behavioral response that reflected self-control. The chimpanzees were offered opportunities to inhibit eating and instead exchange a currently possessed food item for a different (and sometimes better) item, often needing to exchange several food items before obtaining the highest-valued reward. We manipulated reward type, reward size, reward visibility, delay to exchange, and location of the highest-valued reward in the sequence of exchange events to compare performance within the same individuals. The chimpanzees successfully traded until obtaining the best item in most cases, although there were individual differences among participants in some variations of the test. These results support the idea that self-control is robust in chimpanzees even in contexts in which they perhaps anticipate future rewards and sustain delay of gratification until they can obtain the ultimately most valuable item.
Item validity vs. item discrimination index: a redundancy?

NASA Astrophysics Data System (ADS)

Panjaitan, R. L.; Irawati, R.; Sujana, A.; Hanifah, N.; Djuanda, D.

2018-03-01

In the literature on evaluation and test analysis, item validity and the item discrimination index (D) are commonly calculated separately, each with its own formula. Other sources, however, state that the item discrimination index can be obtained by calculating the correlation between testees’ scores on a particular item and their scores on the overall test, which is actually the same concept as item validity. Some research reports, especially undergraduate theses, tend to include both item validity and the item discrimination index in the instrument analysis. These concepts may overlap, since both reflect how well the test measures examinees’ ability. In this paper, results of data processing for item validity and the item discrimination index are compared. The paper discusses whether item validity and the item discrimination index can be represented by only one of them, or whether it is better to present both calculations for simple test analysis, especially in undergraduate theses that include test analyses.
Examining Differential Item Functions of Different Item Ordered Test Forms According to Item Difficulty Levels

ERIC Educational Resources Information Center

Çokluk, Ömay; Gül, Emrah; Dogan-Gül, Çilem

2016-01-01

The study aims to examine whether differential item functioning (DIF) is displayed in three test forms with different item orders (random, sequential easy-to-hard, and sequential hard-to-easy), based on Classical Test Theory (CTT) and Item Response Theory (IRT) methods and taking item difficulty levels into account. In the correlational research, the…
The Effect on Prospective Teachers of the Learning Environment Supported by Dynamic Statistics Software

ERIC Educational Resources Information Center

Koparan, Timur

2016-01-01

In this study, the effect of a learning environment supported by dynamic statistics software on the achievement and attitudes of prospective teachers is examined. To this end, an achievement test, an attitude scale for statistics, and interviews were used as data collection tools. The achievement test comprises 8 problems based on statistical data, and the attitude scale comprises 13 Likert-type items. The study…
Designing, Testing, and Validating an Attitudinal Survey on an Environmental Topic: A Groundwater Pollution Survey Instrument for Secondary School Students

ERIC Educational Resources Information Center

Lacosta-Gabari, Idoya; Fernandez-Manzanal, Rosario; Sanchez-Gonzalez, Dolores

2009-01-01

Research on the assessment of environmental attitudes has increased significantly in recent years. The development of attitude scales for specific environmental problems has often been proposed. This paper describes the Groundwater Pollution Test (GPT), a 19-item survey instrument using a Likert-type scale. The survey has been used with…
Survey Response-Related Biases in Contingent Valuation: Concepts, Remedies, and Empirical Application to Valuing Aquatic Plant Management

Treesearch

Mark L. Messonnier; John C. Bergstrom; Christopher M. Cornwell; R. Jeff Teasley; H. Ken Cordell

2000-01-01

Simple nonresponse and selection biases that may occur in survey research such as contingent valuation applications are discussed and tested. Correction mechanisms for these types of biases are demonstrated. Results indicate the importance of testing and correcting for unit and item nonresponse bias in contingent valuation survey data. When sample nonresponse and...
Sentence comprehension in specific language impairment: a task designed to distinguish between cognitive capacity and syntactic complexity.

PubMed

Leonard, Laurence B; Deevy, Patricia; Fey, Marc E; Bredin-Oja, Shelley L

2013-04-01

This study examined sentence comprehension in children with specific language impairment (SLI) in a manner designed to separate the contribution of cognitive capacity from the effects of syntactic structure. Nineteen children with SLI, 19 typically developing children matched for age (TD-A), and 19 younger typically developing children (TD-Y) matched according to sentence comprehension test scores responded to sentence comprehension items that varied in either length or their demands on cognitive capacity, based on the nature of the foils competing with the target picture. The TD-A children were accurate across all item types. The SLI and TD-Y groups were less accurate than the TD-A group on items with greater length and, especially, on items with the greatest demands on cognitive capacity. The types of errors were consistent with failure to retain details of the sentence apart from syntactic structure. The difficulty in the more demanding conditions seemed attributable to interference. Specifically, the children with SLI and the TD-Y children appeared to have difficulty retaining details of the target sentence when the information reflected in the foils closely resembled the information in the target sentence.

Food marketing targeting youth and families: what do we know about stores where moms actually shop?

PubMed

Grigsby-Toussaint, Diana S; Rooney, Mary R

2013-01-01

Although efforts are underway to examine marketing that targets youth and families in the retail food store environment, few studies have specifically focused on stores that families identify as their primary sites for food shopping. Between November 2011 and April 2012, we examined the frequency and types of marketing techniques used for 114 packaged and nonpackaged items in 24 food stores that mothers of young children in Champaign County, IL, said they commonly frequented. Chi-square tests were used to determine whether significant differences existed between items with regard to marketing by store type, store food-assistance-program acceptance (i.e., WIC), and claims. Overall, stores accepting WIC and convenience stores had higher frequencies of marketing compared to non-WIC and grocery stores. Fruits and vegetables had the lowest frequency of any marketing claim, while salty snacks and soda had the highest frequency of marketing claims. Nutrition claims were the most common across all items, followed by taste, suggested use, fun, and convenience. Television tie-ins and cartoons were observed more often than movie tie-ins and giveaways. Our results suggest an opportunity to promote healthful items more efficiently by focusing efforts on stores where mothers actually shop.
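As an illustration of the chi-square test of independence used here, the sketch below computes Pearson's statistic for a contingency table and, for the 2×2 case (1 degree of freedom), its p-value through the identity P(χ² > x) = erfc(√(x/2)). The counts are invented for illustration and are not the study's data.

```python
import math
from typing import Sequence, Tuple


def chi_square_independence(table: Sequence[Sequence[float]]) -> Tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # under independence
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


def p_value_1df(stat: float) -> float:
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))


# Hypothetical counts: rows = WIC / non-WIC stores,
# columns = item marketed / not marketed.
table = [[60, 40], [45, 55]]
stat, df = chi_square_independence(table)
print(round(stat, 2), df, round(p_value_1df(stat), 3))
```

For tables larger than 2×2 the statistic is the same, but the p-value needs the chi-square distribution with the corresponding degrees of freedom (for example from a statistics library).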
Quantitative analysis of organizational culture in occupational health research: a theory-based validation in 30 workplaces of the organizational culture profile instrument.

PubMed

Marchand, Alain; Haines, Victor Y; Dextras-Gauthier, Julie

2013-05-04

This study advances a measurement approach for the study of organizational culture in population-based occupational health research, and tests how different organizational culture types are associated with psychological distress, depression, emotional exhaustion, and well-being. Data were collected from a sample of 1,164 employees nested in 30 workplaces. Employees completed the 26-item Organizational Culture Profile (OCP) instrument. Psychological distress was measured with the 12-item General Health Questionnaire; depression with the 21-item Beck Depression Inventory; and emotional exhaustion with five items from the Maslach Burnout Inventory General Survey. Exploratory factor analysis evaluated the dimensionality of the OCP scale. Multilevel regression models estimated workplace-level variations and the contribution of organizational culture factors to mental health and well-being after controlling for gender, age, and living with a partner. Exploratory factor analysis of OCP items revealed four factors explaining about 75% of the variance and supported the structure of the Competing Values Framework. Factors were labeled Group, Hierarchical, Rational and Developmental. Cronbach's alphas were high (0.82-0.89). Multilevel regression analysis suggested that the four culture types varied significantly between workplaces and correlated with mental health and well-being outcomes. The Group culture type best distinguished between workplaces and had the strongest associations with the outcomes. This study provides strong support for the use of the OCP scale for measuring organizational culture in population-based occupational health research in a way that is consistent with the Competing Values Framework. The Group organizational culture needs to be considered as a relevant factor in occupational health studies.
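The reliability values reported (Cronbach's alpha, 0.82 to 0.89) come from the standard formula α = k/(k − 1) · (1 − Σσ²_i / σ²_T), where k is the number of items, σ²_i the variance of item i and σ²_T the variance of the total score. A minimal sketch with invented toy scores rather than OCP data:

```python
from statistics import pvariance
from typing import Sequence


def cronbach_alpha(items: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for k items, each given as a list of scores
    (one score per respondent, same respondent order in every list)."""
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    totals = [sum(scores) for scores in zip(*items)]  # total score per respondent
    sum_item_var = sum(pvariance(scores) for scores in items)
    # Population variances are used throughout; the n/(n-1) factor would
    # cancel in the ratio anyway.
    return k / (k - 1) * (1 - sum_item_var / pvariance(totals))


# Two items answered by four respondents (toy data).
print(cronbach_alpha([[1, 2, 3, 4], [2, 1, 4, 3]]))  # 0.75
```

With perfectly parallel items (identical score columns) the function returns 1.0; weaker inter-item covariance pulls alpha down.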
The Effects of Test Length and Sample Size on Item Parameters in Item Response Theory

ERIC Educational Resources Information Center

Sahin, Alper; Anil, Duygu

2017-01-01

This study investigates the effects of sample size and test length on item-parameter estimation in test development using three unidimensional dichotomous models of item response theory (IRT). For this purpose, a real language test composed of 50 items was administered to 6,288 students. Data from this test were used to obtain data sets of…
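The abstract does not name the three models; the usual unidimensional dichotomous IRT models are the one-, two- and three-parameter logistic models (1PL, 2PL, 3PL), and that is assumed here. Each gives the probability of a correct response as a function of ability θ. The sketch below writes the 3PL and obtains the others by fixing parameters; the scaling constant D (often 1 or 1.7) is left as a parameter.

```python
import math


def p_correct(theta: float, b: float, a: float = 1.0, c: float = 0.0,
              D: float = 1.0) -> float:
    """3PL item response function:
        P(X = 1 | theta) = c + (1 - c) / (1 + exp(-D * a * (theta - b)))
    b = difficulty, a = discrimination, c = lower asymptote (pseudo-guessing).
    With c = 0 this is the 2PL; with c = 0 and a common a for all items
    (e.g. a = 1) it is the 1PL."""
    return c + (1 - c) / (1 + math.exp(-D * a * (theta - b)))


print(p_correct(0.0, b=0.0))                 # 1PL at theta = b: 0.5
print(p_correct(1.0, b=0.0, a=2.0))          # 2PL: steeper curve
print(p_correct(-3.0, b=0.0, a=1.5, c=0.2))  # 3PL: stays above c = 0.2
```

Each added parameter (a, then c) must be estimated per item, which is why sample size and test length matter more for the richer models.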
[Perceptions on item disclosure for the Korean medical licensing examination].

PubMed

Yang, Eunbae B

2015-09-01

This study analyzed the perceptions of medical students and faculty regarding disclosure of test items on the Korean medical licensing examination. I conducted a survey of medical students from medical colleges and professional medical schools nationwide. Responses from 718 participants were analyzed, as well as those from 69 faculty members who participated in creating the medical licensing examination item sets. Data were analyzed using descriptive statistics and the chi-square test. It is important to maintain test quality and to keep the test items unavailable to the public. Students were also concerned that disclosing test items would lead to increasingly difficult items (48.3%). Further, few students found it desirable to disclose test items regardless of other considerations (28.5%). The professors, who had experience in designing the test items, also opposed test item disclosure (60.9%). It is desirable not to disclose the test items of the Korean medical licensing examination to the public, provided that students receive sufficient information about the examination, so that the exam can appropriately identify candidates with the required qualifications.
A new course and textbook on Physical Models of Living Systems, for science and engineering undergraduates

NASA Astrophysics Data System (ADS)

Nelson, Philip

2015-03-01

I'll describe an intermediate-level course on “Physical Models of Living Systems.” The only prerequisite is first-year university physics and calculus. The course is a response to rapidly growing interest among undergraduates in a broad range of science and engineering majors. Students acquire several research skills that are often not addressed in traditional courses: basic modeling skills; probabilistic modeling skills; data analysis methods; computer programming using a general-purpose platform like MATLAB or Python; and dynamical systems, particularly feedback control. These basic skills, which are relevant to nearly any field of science or engineering, are presented in the context of case studies from living systems, including virus dynamics; bacterial genetics and evolution of drug resistance; statistical inference; superresolution microscopy; synthetic biology; and naturally evolved cellular circuits. Work supported by NSF Grants EF-0928048 and DMR-0832802.
Method for automatic measurement of second language speaking proficiency

NASA Astrophysics Data System (ADS)

Bernstein, Jared; Balogh, Jennifer

2005-04-01

Spoken language proficiency is intuitively related to effective and efficient communication in spoken interactions. However, it is difficult to derive a reliable estimate of spoken language proficiency by situated elicitation and evaluation of a person's communicative behavior. This paper describes the task structure and scoring logic of a group of fully automatic spoken language proficiency tests (for English, Spanish and Dutch) that are delivered via telephone or Internet. Test items are presented in spoken form and require a spoken response. Each test is automatically scored and is primarily based on short, decontextualized tasks that elicit integrated listening and speaking performances. The tests present several types of tasks to candidates, including sentence repetition, question answering, sentence construction, and story retelling. The spoken responses are scored according to the lexical content of the response and a set of acoustic base measures on segments, words and phrases, which are scaled with IRT methods or parametrically combined to optimize fit to human listener judgments. Most responses are isolated spoken phrases and sentences that are scored according to their linguistic content, their latency, and their fluency and pronunciation. The item development procedures and item norming are described.
Sample Size and Statistical Conclusions from Tests of Fit to the Rasch Model According to the Rasch Unidimensional Measurement Model (RUMM) Program in Health Outcome Measurement.

PubMed

Hagell, Peter; Westergren, Albert

Sample size is a major factor in statistical null hypothesis testing, which is the basis for many approaches to testing Rasch model fit. Few sample size recommendations for testing fit to the Rasch model concern the Rasch Unidimensional Measurement Models (RUMM) software, which features chi-square and ANOVA/F-ratio based fit statistics, including Bonferroni and algebraic sample size adjustments. This paper explores the occurrence of Type I errors with RUMM fit statistics and the effects of algebraic sample size adjustments. Simulated data fitting the Rasch model for 25-item dichotomous scales, with sample sizes ranging from N = 50 to N = 2500, were analysed with and without algebraically adjusted sample sizes. Results suggest the occurrence of Type I errors with N less than or equal to 500, and that Bonferroni correction as well as downward algebraic sample size adjustment are useful to avoid such errors, whereas upward adjustment of smaller samples falsely signals misfit. Our observations suggest that sample sizes around N = 250 to N = 500 may provide a good balance for the statistical interpretation of the RUMM fit statistics studied here with respect to Type I errors and under the assumption of Rasch model fit within the examined frame of reference (i.e., about 25 item parameters well targeted to the sample).
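The Bonferroni correction mentioned above divides the significance level by the number of tests. With 25 item-fit tests at α = 0.05, the per-item threshold becomes 0.05/25 = 0.002; without correction, and if the tests were independent, the chance of at least one false misfit flag would be 1 − 0.95^25 ≈ 0.72. A minimal sketch of that arithmetic (generic code, not RUMM's own implementation; the p-values are invented):

```python
from typing import List, Sequence


def familywise_error(n_tests: int, alpha: float = 0.05) -> float:
    """Probability of at least one Type I error across n independent tests."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_flags(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Flag a test as significant only if p < alpha / (number of tests)."""
    threshold = alpha / len(p_values)
    return [p < threshold for p in p_values]


p_values = [0.001, 0.01, 0.04] + [0.5] * 22  # 25 hypothetical item-fit p-values
print(round(familywise_error(25), 3))  # 0.723
print(sum(bonferroni_flags(p_values)))  # 1: only p = 0.001 is below 0.002
```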
A Review of Classical Methods of Item Analysis.

ERIC Educational Resources Information Center

French, Christine L.

Item analysis is a very important consideration in the test development process. It is a statistical procedure to analyze test items that combines methods used to evaluate the important characteristics of test items, such as difficulty, discrimination, and distractibility of the items in a test. This paper reviews some of the classical methods for…
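Two of the classical indices named above can be sketched in a few lines: item difficulty as the proportion of examinees choosing the keyed option, and a distractor tally showing how often each wrong option is chosen. The 5% cut-off for a "non-functioning" distractor is a common rule of thumb assumed here, not taken from the paper, and the responses are invented.

```python
from collections import Counter
from typing import Dict, List, Sequence


def item_difficulty(responses: Sequence[str], key: str) -> float:
    """Classical difficulty (p-value): proportion of examinees answering correctly."""
    return sum(r == key for r in responses) / len(responses)


def option_counts(responses: Sequence[str], options: str) -> Dict[str, int]:
    """How many examinees chose each option, including options nobody chose."""
    counts = Counter(responses)
    return {opt: counts[opt] for opt in options}


def weak_distractors(responses: Sequence[str], key: str, options: str,
                     min_share: float = 0.05) -> List[str]:
    """Wrong options chosen by fewer than `min_share` of examinees (assumed rule)."""
    n = len(responses)
    counts = option_counts(responses, options)
    return [opt for opt in options if opt != key and counts[opt] / n < min_share]


responses = ["A", "B", "A", "C", "A", "A", "B", "A", "C", "A"]
print(item_difficulty(responses, "A"))           # 0.6
print(option_counts(responses, "ABCD"))          # {'A': 6, 'B': 2, 'C': 2, 'D': 0}
print(weak_distractors(responses, "A", "ABCD"))  # ['D']
```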
Modeling Item-Position Effects within an IRT Framework

ERIC Educational Resources Information Center

Debeer, Dries; Janssen, Rianne

2013-01-01

Changing the order of items between alternate test forms to prevent copying and to enhance test security is a common practice in achievement testing. However, these changes in item order may affect item and test characteristics. Several procedures have been proposed for studying these item-order effects. The present study explores the use of…
ACER Chemistry Test Item Collection. ACER Chemtic Year 12.

ERIC Educational Resources Information Center

Australian Council for Educational Research, Hawthorn.

The chemistry test item bank contains 225 multiple-choice questions suitable for diagnostic and achievement testing; a three-page teacher's guide; an answer key with item facilities; an answer sheet; and a 45-item sample achievement test. Although written for the new grade 12 chemistry course in Victoria, Australia, the items are widely applicable.…
Measuring emotion socialization in families affected by pediatric cancer: Refinement and reduction of the Parents' Beliefs about Children's Emotions questionnaire.

PubMed

Beitra, Danette; El-Behadli, Ana F; Faith, Melissa A

2018-01-01

The aim of this study is to conduct a multimethod psychometric reduction of the Parents' Beliefs about Children's Emotions (PBCE) questionnaire using an item response theory framework with a pediatric oncology sample. Participants were 216 pediatric oncology caregivers who completed the PBCE. The PBCE contains 105 items (11 subscales) rated on a 6-point Likert-type scale. We evaluated the PBCE subscale performance by applying a partial credit model in WINSTEPS. Sixty-six statistically weak items were removed, creating a 44-item PBCE questionnaire with 10 subscales and 3 response options per item. The refined scale displayed good psychometric properties and correlated .910 with the original PBCE. Additional analyses examined dimensionality, item-level (e.g., difficulty), and person-level (e.g., ethnicity) characteristics. The refined PBCE questionnaire provides better test information, improves instrument reliability, and reduces burden on families, providers, and researchers. With this improved measure, providers can more easily identify families who may benefit from psychosocial interventions targeting emotion socialization. The results of the multistep approach presented should be considered preliminary, given the limited sample size.
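The partial credit model used here gives, for a person with ability θ and an item with step (threshold) parameters δ_1, ..., δ_m, the probability of responding in category x as exp(Σ_{k=1..x}(θ − δ_k)) divided by the sum of the same quantity over all categories 0 to m, where the empty sum for x = 0 is 0. A minimal sketch of these category probabilities for a three-option item like those in the refined PBCE; the threshold values are invented, and this shows only the model, not the WINSTEPS estimation procedure.

```python
import math
from typing import List, Sequence


def pcm_probabilities(theta: float, thresholds: Sequence[float]) -> List[float]:
    """Category probabilities P(X = 0..m) under the partial credit model.

    `thresholds` holds the step parameters delta_1..delta_m, so an item with
    three response options has two thresholds.
    """
    # Cumulative sums of (theta - delta_k); category 0 has an empty sum of 0.
    cumulative = [0.0]
    for delta in thresholds:
        cumulative.append(cumulative[-1] + (theta - delta))
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(cumulative)
    weights = [math.exp(s - top) for s in cumulative]
    total = sum(weights)
    return [w / total for w in weights]


probs = pcm_probabilities(theta=0.5, thresholds=[-1.0, 1.0])
print([round(p, 3) for p in probs])
```

Where θ equals a step parameter δ_k, the adjacent categories k − 1 and k are equally likely, which is how the thresholds are usually read.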
Learning to Fail in Aphasia: An Investigation of Error Learning in Naming

PubMed Central

Middleton, Erica L.; Schwartz, Myrna F.

2013-01-01

Purpose To determine whether the naming impairment in aphasia is influenced by error learning and whether error learning is related to the type of retrieval strategy. Method Nine participants with aphasia and ten neurologically intact controls named familiar proper noun concepts. When experiencing tip-of-the-tongue naming failure (TOT) in an initial TOT-elicitation phase, participants were instructed to adopt phonological or semantic self-cued retrieval strategies. In the error learning manipulation, items evoking TOT states during TOT-elicitation were randomly assigned to a short or long time condition in which participants were encouraged to continue trying to retrieve the name for either 20 seconds (short interval) or 60 seconds (long). The incidence of TOT on the same items was measured on a posttest after 48 hours. Error learning was defined as a higher rate of recurrent TOTs (TOT at both TOT-elicitation and posttest) for items assigned to the long (versus short) time condition. Results In the phonological condition, participants with aphasia showed error learning, whereas controls showed a pattern opposite to error learning. There was no evidence for error learning in the semantic condition for either group. Conclusion Error learning is operative in aphasia, but dependent on the type of strategy employed during naming failure. PMID:23816662
Cue quality and criterion setting in recognition memory.

PubMed

Kent, Christopher; Lamberts, Koen; Patton, Richard

2018-02-02

Previous studies on how people set and modify decision criteria in old-new recognition tasks (in which they have to decide whether or not a stimulus was seen in a study phase) have almost exclusively focused on properties of the study items, such as presentation frequency or study list length. In contrast, in the three studies reported here, we manipulated the quality of the test cues in a scene-recognition task, either by degrading them through Gaussian blurring (Experiment 1) or by limiting presentation duration (Experiments 2 and 3). In Experiments 1 and 2, degradation of the test cue led to worse old-new discrimination. Most importantly, however, participants were more liberal in their responses to degraded cues (i.e., more likely to call the cue "old"), demonstrating strong within-list, item-by-item criterion shifts. This liberal response bias toward degraded stimuli came at the cost of an increased false alarm rate, while the hit rate remained constant. Experiment 3 replicated Experiment 2 with additional stimulus types (words and faces) but did not provide accuracy feedback to participants. The criterion shifts in Experiment 3 were smaller in magnitude than those in Experiments 1 and 2 and varied in consistency across stimulus types, suggesting, in line with previous studies, that feedback is important for participants to shift their criteria.
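Discrimination and bias effects like those described here are usually quantified with equal-variance signal detection measures, d′ = z(H) − z(F) and c = −[z(H) + z(F)]/2, where H is the hit rate, F the false-alarm rate and z the inverse standard normal CDF; a more negative c means a more liberal ("old") bias. The abstract does not state which indices the authors used, so treat this as an illustrative sketch with invented rates.

```python
from statistics import NormalDist
from typing import Tuple


def dprime_and_c(hit_rate: float, fa_rate: float) -> Tuple[float, float]:
    """Equal-variance SDT sensitivity d' and criterion c.

    Rates must lie strictly between 0 and 1; rates of exactly 0 or 1 need a
    correction (e.g. a log-linear adjustment) before calling this.
    """
    z = NormalDist().inv_cdf
    d_prime = z(hit_rate) - z(fa_rate)
    criterion = -(z(hit_rate) + z(fa_rate)) / 2
    return d_prime, criterion


# Same hit rate, more false alarms: lower d' and a more liberal (negative) c.
clear = dprime_and_c(hit_rate=0.80, fa_rate=0.20)
degraded = dprime_and_c(hit_rate=0.80, fa_rate=0.40)
print([round(v, 2) for v in clear])
print([round(v, 2) for v in degraded])  # [1.09, -0.29]
```

This mirrors the pattern reported for degraded cues: a constant hit rate with more false alarms shows up as both a lower d′ and a negative shift in c.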
Science Library of Test Items. Volume Nineteen. A Collection of Multiple Choice Test Items Relating Mainly to Geology.

ERIC Educational Resources Information Center

New South Wales Dept. of Education, Sydney (Australia).

As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items are made available to teachers for the construction of unit tests or term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The test items meet syllabus…
Science Library of Test Items. Volume Seventeen. A Collection of Multiple Choice Test Items Relating Mainly to Biology.

ERIC Educational Resources Information Center

New South Wales Dept. of Education, Sydney (Australia).

As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items are made available to teachers for the construction of unit tests or term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The test items meet syllabus…
Science Library of Test Items. Volume Eighteen. A Collection of Multiple Choice Test Items Relating Mainly to Chemistry.

ERIC Educational Resources Information Center

New South Wales Dept. of Education, Sydney (Australia).

As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items are made available to teachers for the construction of unit tests or term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The test items meet syllabus…
Assembling a Computerized Adaptive Testing Item Pool as a Set of Linear Tests

ERIC Educational Resources Information Center

van der Linden, Wim J.; Ariel, Adelaide; Veldkamp, Bernard P.

2006-01-01

Test-item writing efforts typically result in item pools with an undesirable correlational structure between the content attributes of the items and their statistical information. If such pools are used in computerized adaptive testing (CAT), the algorithm may be forced to select items with less than optimal information, that violate the content…
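The tension described here can be seen in the usual maximum-information rule for CAT. Under a two-parameter logistic (2PL) model, item information at ability θ is a²P(θ)(1 − P(θ)), and the algorithm picks the unused item with the most information; restricting the choice to a content area can force a less informative item. A minimal sketch with a hypothetical three-item pool (the dictionary keys `a`, `b` and `content` are illustrative); it is not the pool-assembly method proposed in the paper.

```python
import math
from typing import Dict, List, Optional, Set


def info_2pl(theta: float, a: float, b: float) -> float:
    """Fisher information of a 2PL item at ability theta: a^2 * P * (1 - P)."""
    p = 1 / (1 + math.exp(-a * (theta - b)))
    return a * a * p * (1 - p)


def select_item(theta: float, pool: List[Dict], administered: Set[int],
                allowed_content: Optional[Set[str]] = None) -> Optional[int]:
    """Index of the most informative unused item, optionally restricted to
    items whose content area is in `allowed_content`."""
    candidates = [
        i for i, item in enumerate(pool)
        if i not in administered
        and (allowed_content is None or item["content"] in allowed_content)
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda i: info_2pl(theta, pool[i]["a"], pool[i]["b"]))


pool = [  # hypothetical calibrated items
    {"a": 2.0, "b": 0.0, "content": "algebra"},
    {"a": 1.0, "b": 0.0, "content": "geometry"},
    {"a": 1.5, "b": 2.0, "content": "geometry"},
]
print(select_item(0.0, pool, administered=set()))  # 0
print(select_item(0.0, pool, administered=set(), allowed_content={"geometry"}))  # 1
```

When the content constraint applies, the selected item carries a quarter of the information of the unconstrained choice (0.25 vs 1.0 at θ = 0), which is the kind of loss the abstract describes.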
Evaluation of Northwest University, Kano Post-UTME Test Items Using Item Response Theory

ERIC Educational Resources Information Center

Bichi, Ado Abdu; Hafiz, Hadiza; Bello, Samira Abdullahi

2016-01-01

High-stakes testing is used to provide results that have important consequences. Validity is the cornerstone upon which all measurement systems are built. This study applied item response theory principles to analyse the Northwest University Kano Post-UTME Economics test items. The developed fifty (50) economics test items were…
Item Specifications, Science Grade 8. Blue Prints for Testing Minimum Performance Test.

ERIC Educational Resources Information Center

Arkansas State Dept. of Education, Little Rock.

These item specifications were developed as a part of the Arkansas "Minimum Performance Testing Program" (MPT). There is one item specification for each instructional objective included in the MPT. The purpose of an item specification is to provide an overview of the general content and format of test items used to measure an…
Item Specifications, Science Grade 6. Blue Prints for Testing Minimum Performance Test.

ERIC Educational Resources Information Center

Arkansas State Dept. of Education, Little Rock.

These item specifications were developed as a part of the Arkansas "Minimum Performance Testing Program" (MPT). There is one item specification for each instructional objective included in the MPT. The purpose of an item specification is to provide an overview of the general content and format of test items used to measure an…

Criterion-Referenced Test Items for Welding.

ERIC Educational Resources Information Center

Davis, Diane, Ed.

This test item bank on welding contains test questions based upon competencies found in the Missouri Welding Competency Profile. Some test items are keyed for multiple competencies. These criterion-referenced test items are designed to work with the Vocational Instructional Management System. Questions have been statistically sampled and validated…
Optimal Test Design with Rule-Based Item Generation

ERIC Educational Resources Information Center

Geerlings, Hanneke; van der Linden, Wim J.; Glas, Cees A. W.

2013-01-01

Optimal test-design methods are applied to rule-based item generation. Three different cases of automated test design are presented: (a) test assembly from a pool of pregenerated, calibrated items; (b) test generation on the fly from a pool of calibrated item families; and (c) test generation on the fly directly from calibrated features defining…
Distinguishing Fast and Slow Processes in Accuracy - Response Time Data

PubMed Central

Coomans, Frederik; Hofman, Abe; Brinkhuis, Matthieu; van der Maas, Han L. J.; Maris, Gunter

2016-01-01

We investigate the relation between speed and accuracy within problem solving in its simplest non-trivial form. We consider tests with only two items and code the item responses in two binary variables: one indicating the response accuracy, and one indicating the response speed. Despite being a very basic setup, it enables us to study item pairs stemming from a broad range of domains such as basic arithmetic, first language learning, intelligence-related problems, and chess, with large numbers of observations for every pair of problems under consideration. We carry out a survey over a large number of such item pairs and compare three types of psychometric accuracy-response time models present in the literature: two ‘one-process’ models, the first of which models accuracy and response time as conditionally independent and the second of which models accuracy and response time as conditionally dependent, and a ‘two-process’ model which models accuracy contingent on response time. We find that the data clearly violate the restrictions imposed by both one-process models and require additional complexity, which is parsimoniously provided by the two-process model. We supplement our survey with an analysis of the erroneous responses for an example item pair and demonstrate that there are very significant differences between the types of errors in fast and slow responses. PMID:27167518
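The coding scheme described above, with one binary variable for accuracy and one for speed, can be sketched as follows. The abstract does not say how responses were split into fast and slow; a per-item median split of response times is assumed here purely for illustration, and the data are invented.

```python
from statistics import median
from typing import Dict, List, Sequence, Tuple


def code_responses(correct: Sequence[bool], rts: Sequence[float]) -> List[Tuple[int, int]]:
    """Code each response as (accuracy, speed): accuracy 1 = correct,
    speed 1 = faster than the item's median response time (assumed split)."""
    cut = median(rts)
    return [(int(c), int(rt < cut)) for c, rt in zip(correct, rts)]


def error_rate_by_speed(coded: Sequence[Tuple[int, int]]) -> Dict[str, float]:
    """Proportion of incorrect responses among fast and among slow responses."""
    rates: Dict[str, float] = {}
    for label, speed in (("fast", 1), ("slow", 0)):
        group = [acc for acc, spd in coded if spd == speed]
        rates[label] = (1 - sum(group) / len(group)) if group else float("nan")
    return rates


correct = [True, True, False, True, False, False, True, False]
rts = [1.1, 1.4, 1.9, 2.0, 2.6, 3.0, 3.5, 4.2]  # seconds (hypothetical)
coded = code_responses(correct, rts)
print(coded)
print(error_rate_by_speed(coded))  # {'fast': 0.25, 'slow': 0.75}
```

Tabulating accuracy against speed in this way is the raw material that one-process and two-process models are then fitted to.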
Science Library of Test Items. Volume Twenty. A Collection of Multiple Choice Test Items Relating Mainly to Physics, 1.

ERIC Educational Resources Information Center

New South Wales Dept. of Education, Sydney (Australia).

As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items are made available to teachers for the construction of unit tests or term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The test items meet syllabus…
Science Library of Test Items. Volume Twenty-One. A Collection of Multiple Choice Test Items Relating Mainly to Physics, 2.

ERIC Educational Resources Information Center

New South Wales Dept. of Education, Sydney (Australia).

As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items are made available to teachers for the construction of unit tests or term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The test items meet syllabus…
Science Library of Test Items. Volume Twenty-Two. A Collection of Multiple Choice Test Items Relating Mainly to Skills.

ERIC Educational Resources Information Center

New South Wales Dept. of Education, Sydney (Australia).

As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items are made available to teachers for the construction of unit tests or term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The test items meet syllabus…
An item response theory analysis of the Executive Interview and development of the EXIT8: A Project FRONTIER Study.

PubMed

Jahn, Danielle R; Dressel, Jeffrey A; Gavett, Brandon E; O'Bryant, Sid E

2015-01-01

The Executive Interview (EXIT25) is an effective measure of executive dysfunction, but may be inefficient due to the time it takes to complete 25 interview-based items. The current study aimed to examine psychometric properties of the EXIT25, with a specific focus on determining whether a briefer version of the measure could comprehensively assess executive dysfunction. The current study applied a graded response model (a type of item response theory model for polytomous categorical data) to identify items that were most closely related to the underlying construct of executive functioning and best discriminated between varying levels of executive functioning. Participants were 660 adults aged 40 to 96 years living in West Texas, who were recruited through an ongoing epidemiological study of rural health and aging, called Project FRONTIER. The EXIT25 was the primary measure examined. Participants also completed the Trail Making Test and Controlled Oral Word Association Test, among other measures, to examine the convergent validity of a brief form of the EXIT25. Eight items were identified that provided the majority of the information about the underlying construct of executive functioning; total scores on these items were associated with total scores on other measures of executive functioning and were able to differentiate between cognitively healthy, mildly cognitively impaired, and demented participants. In addition, cutoff scores were recommended based on sensitivity and specificity of scores. A brief, eight-item version of the EXIT25 may be an effective and efficient screening tool for executive dysfunction among older adults.
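The graded response model named above assigns each ordered response category a probability built from cumulative logistic curves. A minimal sketch of Samejima's model for one item; the function name and the item parameters are illustrative assumptions, not values estimated for the EXIT25.

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Samejima graded response model: category probabilities for one item.

    a: item discrimination; thresholds: increasing b_1 < ... < b_K.
    Returns K + 1 probabilities for categories 0..K.
    """
    # Cumulative probabilities P(X >= k); by definition P(X >= 0) = 1
    # and P(X >= K + 1) = 0.
    cum = ([1.0]
           + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
           + [0.0])
    # P(X = k) is the difference between adjacent cumulative curves.
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]


# Hypothetical three-threshold (four-category) item, examinee at theta = 0.
probs = grm_category_probs(theta=0.0, a=1.5, thresholds=[-1.0, 0.0, 1.0])
print([round(p, 3) for p in probs])
```

Items with large `a` and well-spread thresholds yield sharply peaked category curves, which is what makes them informative across the range of executive functioning.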
Criterion-Referenced Test Items for Small Engines.

ERIC Educational Resources Information Center

Herd, Amon

This notebook contains criterion-referenced test items for testing students' knowledge of small engines. The test items are based upon competencies found in the Missouri Small Engine Competency Profile. The test item bank is organized in 18 sections that cover the following duties: shop procedures; tools and equipment; fasteners; servicing fuel…
An Investigation of the Impact of Guessing on Coefficient α and Reliability

PubMed Central

2014-01-01

Guessing is known to influence the test reliability of multiple-choice tests. Although there are many studies that have examined the impact of guessing, they used rather restrictive assumptions (e.g., parallel test assumptions, homogeneous inter-item correlations, homogeneous item difficulty, and homogeneous guessing levels across items) to evaluate the relation between guessing and test reliability. Based on the item response theory (IRT) framework, this study investigated the extent of the impact of guessing on reliability under more realistic conditions where item difficulty, item discrimination, and guessing levels actually vary across items with three different test lengths (TL). By accommodating multiple item characteristics simultaneously, this study also focused on examining interaction effects between guessing and other variables entered in the simulation to be more realistic. The simulation of the more realistic conditions and calculations of reliability and classical test theory (CTT) item statistics were facilitated by expressing CTT item statistics, coefficient α, and reliability in terms of IRT model parameters. In addition to the general negative impact of guessing on reliability, results showed interaction effects between TL and guessing and between guessing and test difficulty.
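The study rewrites coefficient α in terms of IRT parameters; the sketch below does not reproduce that expression and only shows the classical observed-score definition it starts from. The function name and the 0/1 data are hypothetical.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Coefficient alpha for a persons x items matrix given as a list of rows.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total score).
    Population variances are used throughout; the n vs n - 1 choice cancels
    in the ratio as long as it is applied consistently.
    """
    k = len(scores[0])
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical 0/1 responses: 5 examinees x 3 items.
scores = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0], [1, 1, 1]]
print(round(cronbach_alpha(scores), 3))
```

Random correct guesses add item variance that is not shared with the other items, which lowers the covariance share of the total-score variance and hence α.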
41 CFR 101-28.304-1 - Types of items.

Code of Federal Regulations, 2010 CFR

2010-07-01

... Regulations System FEDERAL PROPERTY MANAGEMENT REGULATIONS SUPPLY AND PROCUREMENT 28-STORAGE AND DISTRIBUTION 28.3-Customer Supply Centers § 101-28.304-1 Types of items. Items stocked in customer supply centers...
Evaluating the Psychometric Characteristics of Generated Multiple-Choice Test Items

ERIC Educational Resources Information Center

Gierl, Mark J.; Lai, Hollis; Pugh, Debra; Touchie, Claire; Boulais, André-Philippe; De Champlain, André

2016-01-01

Item development is a time- and resource-intensive process. Automatic item generation integrates cognitive modeling with computer technology to systematically generate test items. To date, however, items generated using cognitive modeling procedures have received limited use in operational testing situations. As a result, the psychometric…
Development and psychometric evaluation of an information literacy self-efficacy survey and an information literacy knowledge test*

PubMed Central

Tepe, Rodger; Tepe, Chabha

2015-01-01

Objective To develop and psychometrically evaluate an information literacy (IL) self-efficacy survey and an IL knowledge test. Methods In this test–retest reliability study, a 25-item IL self-efficacy survey and a 50-item IL knowledge test were developed and administered to a convenience sample of 53 chiropractic students. Item analyses were performed on all questions. Results The IL self-efficacy survey demonstrated good reliability (test–retest correlation = 0.81) and good/very good internal consistency (mean κ = .56 and Cronbach's α = .92). A total of 25 questions with the best item analysis characteristics were chosen from the 50-item IL knowledge test, resulting in a 25-item IL knowledge test that demonstrated good reliability (test–retest correlation = 0.87), very good internal consistency (mean κ = .69, KR20 = 0.85), and good item discrimination (mean point-biserial = 0.48). Conclusions This study resulted in the development of three instruments: a 25-item IL self-efficacy survey, a 50-item IL knowledge test, and a 25-item IL knowledge test. The information literacy self-efficacy survey and the 25-item version of the information literacy knowledge test have shown preliminary evidence of adequate reliability and validity to justify continuing study with these instruments. PMID:25517736
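KR20 and the point-biserial index reported above are standard item-analysis statistics for dichotomous items. A minimal sketch under the assumption that items are scored 0/1 and that the point-biserial is the uncorrected version (the item is included in the total); the function names and data are hypothetical, and the abstract does not say which variant the authors used.

```python
from statistics import mean, pstdev, pvariance


def kr20(scores):
    """Kuder-Richardson formula 20 for 0/1 item scores (rows = examinees)."""
    n, k = len(scores), len(scores[0])
    # Sum of p * q over items, where p is the proportion answering correctly.
    pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in scores) / n
        pq += p * (1 - p)
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - pq / total_var)


def point_biserial(scores, j):
    """Uncorrected point-biserial correlation of item j with the total score.

    The item is included in the total. Raises StatisticsError if every
    examinee answered item j the same way (one group would be empty).
    """
    totals = [sum(row) for row in scores]
    right = [t for row, t in zip(scores, totals) if row[j] == 1]
    wrong = [t for row, t in zip(scores, totals) if row[j] == 0]
    p = len(right) / len(totals)
    return (mean(right) - mean(wrong)) / pstdev(totals) * (p * (1 - p)) ** 0.5


# Hypothetical 0/1 responses: 5 examinees x 3 items.
scores = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0], [1, 1, 1]]
print(round(kr20(scores), 3), round(point_biserial(scores, 0), 3))
```

For 0/1 items, p * q equals the population item variance, so KR20 coincides with coefficient α computed the same way.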
Integrating Test-Form Formatting into Automated Test Assembly

ERIC Educational Resources Information Center

Diao, Qi; van der Linden, Wim J.

2013-01-01

Automated test assembly uses the methodology of mixed integer programming to select an optimal set of items from an item bank. Automated test-form generation uses the same methodology to optimally order the items and format the test form. From an optimization point of view, production of fully formatted test forms directly from the item pool using…
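The abstract frames test assembly as a 0-1 optimization. As an illustration only, the sketch below solves a tiny instance of that problem by exhaustive enumeration rather than mixed integer programming: choose a fixed number of items that maximizes 2PL Fisher information at a target ability, subject to a minimum count per content area. Enumeration is feasible only for very small pools; the pool, parameters, and helper names are hypothetical.

```python
import math
from itertools import combinations


def info_2pl(theta, a, b):
    """Fisher information of a 2PL item (no 1.7 scaling constant)."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)


def assemble(pool, length, theta, min_per_area):
    """Return (item indices, information) of the best feasible form.

    Exhaustively checks every subset of `length` items; a subset is feasible
    when each content area in `min_per_area` is represented often enough.
    """
    best, best_info = None, -1.0
    for subset in combinations(range(len(pool)), length):
        counts = {}
        for i in subset:
            area = pool[i]["area"]
            counts[area] = counts.get(area, 0) + 1
        if any(counts.get(area, 0) < m for area, m in min_per_area.items()):
            continue
        total = sum(info_2pl(theta, pool[i]["a"], pool[i]["b"]) for i in subset)
        if total > best_info:
            best, best_info = subset, total
    return best, best_info


pool = [
    {"a": 2.0, "b": 0.0, "area": "algebra"},
    {"a": 1.8, "b": 0.1, "area": "algebra"},
    {"a": 1.9, "b": -0.1, "area": "algebra"},
    {"a": 0.8, "b": 0.0, "area": "geometry"},
    {"a": 0.6, "b": 1.5, "area": "geometry"},
]
print(assemble(pool, 3, 0.0, {"geometry": 1}))
```

Without the content constraint the three most informative algebra items would be chosen; the constraint forces one geometry item into the form. An MIP solver handles the same objective and constraints at realistic pool sizes.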
Instructional Topics in Educational Measurement (ITEMS) Module: Using Automated Processes to Generate Test Items

ERIC Educational Resources Information Center

Gierl, Mark J.; Lai, Hollis

2013-01-01

Changes to the design and development of our educational assessments are resulting in an unprecedented demand for a large and continuous supply of content-specific test items. One way to address this growing demand is with automatic item generation (AIG). AIG is the process of using item models to generate test items with the aid of computer…
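An item model can be pictured as a stem template whose variable slots are filled from allowed values under constraints, with the key computed for each instance. A minimal sketch of that idea; the template, value ranges, constraint, and function name are hypothetical and are not taken from the module.

```python
from itertools import product

# Hypothetical item model: a stem with two variable slots.
STEM = ("A rectangle is {length} cm long and {width} cm wide. "
        "What is its area in square centimetres?")


def generate_items(lengths, widths):
    """Generate every item instance allowed by the model's constraint."""
    items = []
    for length, width in product(lengths, widths):
        if length <= width:  # constraint: length must exceed width
            continue
        items.append({"stem": STEM.format(length=length, width=width),
                      "key": length * width})
    return items


items = generate_items(range(3, 6), range(2, 5))
print(len(items), items[0]["key"])
```

Real item models typically also generate distractors from a cognitive model of likely errors, which this sketch omits.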
Definite Integral Automatic Analysis Mechanism Research and Development Using the "Find the Area by Integration" Unit as an Example

ERIC Educational Resources Information Center

Ting, Mu Yu

2017-01-01

Using the capabilities of expert knowledge structures, the researcher prepared test questions on the university calculus topic of "finding the area by integration." The quiz is divided into two types of multiple choice items (one out of four and one out of many). After the calculus course was taught and tested, the results revealed that…
Applications of Decision Theory to Test-Based Decision Making. Project Psychometric Aspects of Item Banking No. 23. Research Report 87-9.

ERIC Educational Resources Information Center

van der Linden, Wim J.

The use of Bayesian decision theory to solve problems in test-based decision making is discussed. Four basic decision problems are distinguished: (1) selection; (2) mastery; (3) placement; and (4) classification, the situation where each treatment has its own criterion. Each type of decision can be identified as a specific configuration of one or…
Sex Differences on the Mental Rotation Test: An Analysis of Item Types

ERIC Educational Resources Information Center

Bors, Douglas A.; Vigneau, Francois

2011-01-01

Replicating a finding now common in the literature, the present study revealed a significant difference between the performance of men (M = 19.66; SD = 5.34; SK = 0.52) and the performance of women (M = 14.85; SD = 6.06; SK = -0.38, Cohen's d = 0.90) on the Mental Rotation Test (Vandenberg & Kuse, 1978). In an attempt to identify determinants of…
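Cohen's d divides a mean difference by a standardizer. A minimal sketch, assuming equal group sizes and the pooled standard deviation (the abstract gives neither the group sizes nor the standardizer used); the function name is hypothetical.

```python
import math


def cohens_d_equal_n(m1, sd1, m2, sd2):
    """Cohen's d using the pooled SD of two equally sized groups."""
    pooled = math.sqrt((sd1 ** 2 + sd2 ** 2) / 2)
    return (m1 - m2) / pooled


# Means and SDs for men and women as reported in the abstract.
d = cohens_d_equal_n(19.66, 5.34, 14.85, 6.06)
print(round(d, 2))
```

Under these assumptions the result is about 0.84 rather than the reported 0.90, so the published value presumably rests on the actual group sizes or a different standardizer, neither of which the abstract states.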
Development and evaluation of a standardized registry for diabetes in pregnancy using data from the Northern, North West and East Anglia regional audits.

PubMed

Holman, N; Lewis-Barned, N; Bell, R; Stephens, H; Modder, J; Gardosi, J; Dornhorst, A; Hillson, R; Young, B; Murphy, H R

2011-07-01

To develop and evaluate a standardized data set for measuring pregnancy outcomes in women with Type 1 and Type 2 diabetes and to compare recent outcomes with those of the 2002-2003 Confidential Enquiry into Maternal and Child Health. Existing regional, national and international data sets were compared for content, consistency and validity to develop a standardized data set for diabetes in pregnancy of 46 key clinical items. The data set was tested retrospectively using data from 2007-2008 pregnancies included in three regional audits (Northern, North West and East Anglia). Obstetric and neonatal outcomes of pregnancies resulting in a stillbirth or live birth were compared with those from the same regions during 2002-2003. Details of 1381 pregnancies, 812 (58.9%) in women with Type 1 diabetes and 556 (40.3%) in women with Type 2 diabetes, were available to test the proposed standardized data set. Of the 46 data items proposed, only 16 (34.8%), predominantly the delivery and neonatal items, achieved ≥ 85% completeness. Ethnic group data were available for 746 (54.0%) pregnancies and BMI for 627 (46.5%) pregnancies. Glycaemic control data were most complete-available for 1217 pregnancies (88.1%), during the first trimester. Only 239 women (19.9%) had adequate pregnancy preparation, defined as pre-conception folic acid and first trimester HbA(1c) ≤ 7% (≤ 53 mmol/mol). Serious adverse outcome rates (major malformation and perinatal mortality) were 55/1000 and had not improved since 2002-2003. A standardized data set for diabetes in pregnancy may improve consistency of data collection and allow for more meaningful evaluation of pregnancy outcomes in women with pregestational diabetes. © 2011 The Authors. Diabetic Medicine © 2011 Diabetes UK.
Applying Item Response Theory methods to design a learning progression-based science assessment

NASA Astrophysics Data System (ADS)

Chen, Jing

Learning progressions are used to describe how students' understanding of a topic progresses over time and to classify the progress of students into steps or levels. This study applies Item Response Theory (IRT) based methods to investigate how to design learning progression-based science assessments. The research questions of this study are: (1) how to use items in different formats to classify students into levels on the learning progression, (2) how to design a test to give good information about students' progress through the learning progression of a particular construct and (3) what characteristics of test items support their use for assessing students' levels. Data used for this study were collected from 1500 elementary and secondary school students during 2009-2010. The written assessment was developed in several formats, such as Constructed Response (CR), Ordered Multiple Choice (OMC) and Multiple True or False (MTF) items. The following are the main findings from this study. The OMC, MTF and CR items might measure different components of the construct. A single construct explained most of the variance in students' performances. However, additional dimensions in terms of item format can explain a certain amount of the variance in student performance. Therefore, additional dimensions need to be considered when we want to capture the differences in students' performances on different types of items targeting the understanding of the same underlying progression. Items in each item format need to be improved in certain ways to classify students more accurately into the learning progression levels. This study establishes some general steps that can be followed to design other learning progression-based tests as well. For example, first, the boundaries between levels on the IRT scale can be defined by using the means of the item thresholds across a set of good items.
Second, items in multiple formats can be selected to achieve the information criterion at all the defined boundaries. This ensures the accuracy of the classification. Third, when item threshold parameters vary somewhat, the scoring rubrics and the items need to be reviewed to make the threshold parameters similar across items. This is because one important design criterion of the learning progression-based items is that ideally, a student should be at the same level across items, which means that the item threshold parameters (d1, d2, and d3) should be similar across items. To design a learning progression-based science assessment, we need to understand whether the assessment measures a single construct or several constructs and how items are associated with the constructs being measured. Results from dimension analyses indicate that items of different carbon transforming processes measure different aspects of the carbon cycle construct. However, items of different practices assess the same construct. In general, there are high correlations among different processes or practices. It is not clear whether the strong correlations are due to the inherent links among these process/practice dimensions or due to the fact that the student sample does not show much variation in these process/practice dimensions. Further data are needed to examine the dimensionalities in terms of process/practice in detail. Finally, based on item characteristics analysis, recommendations are made to write more discriminative CR items and better OMC, MTF options. Item writers can follow these recommendations to write better learning progression-based items.
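The first design step above (level boundaries as the means of item thresholds across good items) and the resulting classification can be sketched as follows. The threshold values are hypothetical, the screening of "good" items is assumed to have happened already, and the function names are illustrative.

```python
from bisect import bisect_right
from statistics import mean


def level_boundaries(item_thresholds):
    """Boundary k is the mean of the k-th threshold (d1, d2, d3, ...) across items."""
    return [mean(col) for col in zip(*item_thresholds)]


def classify(theta, boundaries):
    """Level 1 lies below the first boundary; each boundary crossed adds a level."""
    return bisect_right(boundaries, theta) + 1


# Hypothetical (d1, d2, d3) thresholds for three items judged to be good.
thresholds = [[-1.2, 0.1, 1.3], [-0.8, 0.3, 1.1], [-1.0, 0.2, 1.5]]
b = level_boundaries(thresholds)
print(b, classify(0.5, b))
```

When the thresholds of the retained items are similar, as the text recommends, the averaged boundaries represent every item well and a student's level does not depend much on which items were answered.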
A Procedure To Detect Test Bias Present Simultaneously in Several Items.

ERIC Educational Resources Information Center

Shealy, Robin; Stout, William

A statistical procedure is presented that is designed to test for unidirectional test bias existing simultaneously in several items of an ability test, based on the assumption that test bias is incipient within the two groups' ability differences. The proposed procedure--Simultaneous Item Bias (SIB)--is based on a multidimensional item response…

An Item Response Theory Model for Test Bias.

ERIC Educational Resources Information Center

Shealy, Robin; Stout, William

This paper presents a conceptualization of test bias for standardized ability tests which is based on multidimensional, non-parametric, item response theory. An explanation of how individually-biased items can combine through a test score to produce test bias is provided. It is contended that bias, although expressed at the item level, should be…
Semi Automated Ferrous Material Scouring System (SAFMSS)

DTIC Science & Technology

2016-03-14

represent real world conditions with various shrubs or grasses entangled with the debris preventing easy removal. Our second tests were performed at...would expect on a range. Soil types, compaction, shrubs, grasses and roots as well as ferrous content vs item weight all have an effect on actual
Scaling: An Items Module

ERIC Educational Resources Information Center

Tong, Ye; Kolen, Michael J.

2010-01-01

"Scaling" is the process of constructing a score scale that associates numbers or other ordered indicators with the performance of examinees. Scaling typically is conducted to aid users in interpreting test results. This module describes different types of raw scores and scale scores, illustrates how to incorporate various sources of…
Using Reliability and Item Analysis to Evaluate a Teacher-Developed Test in Educational Measurement and Evaluation

ERIC Educational Resources Information Center

Quaigrain, Kennedy; Arhin, Ato Kwamina

2017-01-01

Item analysis is essential in improving items which will be used again in later tests; it can also be used to eliminate misleading items in a test. The study focused on item and test quality and explored the relationship between difficulty index (p-value) and discrimination index (DI) with distractor efficiency (DE). The study was conducted among…
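The three indices named above can be computed from one item's responses. A minimal sketch under common conventions that the abstract does not confirm: the discrimination index contrasts the top and bottom 27% of examinees ranked by total score, and a distractor counts as functional when at least 5% of examinees choose it. The function name and data are hypothetical.

```python
def item_analysis(responses, totals, key, options, group_frac=0.27, nfd_cut=0.05):
    """Return (difficulty p, discrimination DI, distractor efficiency DE in %)."""
    n = len(responses)
    # Difficulty index: proportion of examinees choosing the key.
    p = sum(r == key for r in responses) / n
    # Discrimination index: proportion correct in upper minus lower group.
    order = sorted(range(n), key=lambda i: totals[i])
    g = max(1, round(group_frac * n))
    lower, upper = order[:g], order[-g:]
    di = (sum(responses[i] == key for i in upper)
          - sum(responses[i] == key for i in lower)) / g
    # Distractor efficiency: share of distractors chosen by >= nfd_cut of examinees.
    distractors = [o for o in options if o != key]
    functional = [o for o in distractors
                  if sum(r == o for r in responses) / n >= nfd_cut]
    de = 100 * len(functional) / len(distractors)
    return p, di, de


responses = ["A", "A", "B", "A", "C", "A", "A", "B", "A", "B"]
totals = [9, 8, 3, 7, 2, 6, 9, 4, 5, 1]
p, di, de = item_analysis(responses, totals, key="A", options="ABCD")
print(p, di, round(de, 1))
```

Here option D is never chosen, so it is a non-functional distractor and DE drops to two of three distractors.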
Audio Adapted Assessment Data: Does the Addition of Audio to Written Items Modify the Item Calibration?

ERIC Educational Resources Information Center

Snyder, James

2010-01-01

This dissertation research examined the changes in item RIT calibration that occurred when adding audio to a set of currently calibrated RIT items and then placing these new items as field test items in the modified assessments on the NWEA MAP test platform. The researcher used test results from over 600 students in the Poway School District in…
Controlling the judge variable in grading essay-type items: an application of Rasch analyses to the recruitment exam for Korean public school teachers.

PubMed

Chae, S

1998-01-01

The purpose of this paper is to show how the Rasch measurement model can be used to control the effects of the judge variable on the grading of essay-type items in the recruitment test for Korean teachers. Special attention is given to two aspects of judges' involvement in the grading. One is to identify a way to minimize the variation in grading due to judge severity. The other is to find a way to reduce the number of judges without threatening the objectivity of ability estimates. Results from the FACETS analyses show not only how much grading standards vary among judges and how to adjust for them, but also that comparably reliable ability estimates can be obtained with fewer judges.
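FACETS-style analyses add a judge-severity term to the Rasch model. A minimal sketch of the rating-scale many-facet Rasch model, assuming (the abstract does not give the model form) log-odds of adjacent categories equal to ability minus item difficulty minus judge severity minus a step threshold. All parameter values and function names are hypothetical.

```python
import math


def mfrm_category_probs(ability, difficulty, severity, steps):
    """Rating-scale many-facet Rasch model: P(category k) for k = 0..len(steps)."""
    eta = ability - difficulty - severity
    # Unnormalized log-probabilities: 0 for category 0, then cumulative sums
    # of (eta - step_k) for each higher category.
    logits, acc = [0.0], 0.0
    for f in steps:
        acc += eta - f
        logits.append(acc)
    denom = sum(math.exp(x) for x in logits)
    return [math.exp(x) / denom for x in logits]


def expected_score(probs):
    return sum(k * p for k, p in enumerate(probs))


steps = [-1.0, 0.0, 1.0]
# Same examinee and essay item, rated by a lenient and a severe judge.
lenient = expected_score(mfrm_category_probs(0.5, 0.0, -0.5, steps))
severe = expected_score(mfrm_category_probs(0.5, 0.0, 0.5, steps))
print(round(lenient, 2), round(severe, 2))
```

Because severity enters the model explicitly, an examinee's ability estimate can be adjusted for which judge graded the essay, which is how judge effects are controlled.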
41 CFR 101-27.204 - Types of shelf-life items.

Code of Federal Regulations, 2012 CFR

2012-07-01

... storage life after which the item or material is considered to be no longer usable for its primary function and should be discarded. Type II items are those for which successive reinspection dates can be...
Student science achievement and the integration of Indigenous knowledge on standardized tests

NASA Astrophysics Data System (ADS)

Dupuis, Juliann; Abrams, Eleanor

2017-09-01

In this article, we examine how American Indian students in Montana performed on standardized state science assessments when a small number of test items based upon traditional science knowledge from a cultural curriculum, "Indian Education for All", were included. Montana is the first state in the US to mandate the use of a culturally relevant curriculum in all schools and to incorporate this curriculum into a portion of the standardized assessment items. This study compares White and American Indian student test scores on these particular test items to determine how White and American Indian students perform on culturally relevant test items compared to traditional standard science test items. The connections between student achievement on adapted culturally relevant science test items versus traditional items bring valuable insights to the fields of science education, research on student assessments, and Indigenous studies.
Validation of the Italian version of the 15-item Myasthenia Gravis Quality-of-Life questionnaire.

PubMed

Raggi, Alberto; Leonardi, Matilde; Ayadi, Roberta; Antozzi, Carlo; Maggi, Lorenzo; Baggi, Fulvio; Mantegazza, Renato

2017-10-01

In this study we assess the Italian version of the 15-item Myasthenia Gravis Quality-of-Life questionnaire (MG-QOL15). The validation protocol included the MG-QOL15, the 36-item Short Form (SF-36), the Besta Neurological Institute Rating Scale for Myasthenia Gravis, and the MG-Composite. We used the Cronbach α to test reliability, the Spearman correlation to test short-term test-retest, the Kruskal-Wallis test to assess differences in MG-QOL15 between patients with different disease severity, and the Wilcoxon signed-rank test to assess sensitivity to change. Seventy-two patients were enrolled in the study. The mean MG-QOL15 score was 15.2 ± 12.2, with α = 0.93 and test-retest correlation = 0.93. Compared with the SF-36, the MG-QOL15 was superior in differentiating patients with different MG types (P = 0.041) and severity (P = 0.004), showed higher sensitivity to change (P = 0.003 for improved and P = 0.024 for worsened patients), and had higher correlations with the MG-Composite (rho = 0.367 vs. -0.213 and -0.154). The Italian version of the MG-QOL15 is valid, reliable, stable, and sensitive to changes. Muscle Nerve 56: 716-720, 2017. © 2016 Wiley Periodicals, Inc.
Computerized Adaptive Test (CAT) Applications and Item Response Theory Models for Polytomous Items

ERIC Educational Resources Information Center

Aybek, Eren Can; Demirtasli, R. Nukhet

2017-01-01

This article aims to provide a theoretical framework for computerized adaptive tests (CAT) and item response theory models for polytomous items. Besides that, it aims to introduce the simulation and live CAT software to the related researchers. Computerized adaptive test algorithm, assumptions of item response theory models, nominal response…
An Effect Size Measure for Raju's Differential Functioning for Items and Tests

ERIC Educational Resources Information Center

Wright, Keith D.; Oshima, T. C.

2015-01-01

This study established an effect size measure for the noncompensatory differential item functioning (NCDIF) index of the differential functioning of items and tests (DFIT) framework. The Mantel-Haenszel parameter served as the benchmark for developing NCDIF's effect size measure for reporting moderate and large differential item functioning in test items. The effect size of…
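NCDIF is the expected squared difference between an item's focal-group and reference-group response curves, averaged over the focal group's abilities. A minimal sketch assuming a 2PL item whose two parameter sets are already on a common (linked) scale; the function names, abilities, and parameters are hypothetical.

```python
import math


def p2pl(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def ncdif(focal_thetas, focal_params, ref_params):
    """Mean squared difference between focal and reference item curves,
    evaluated at the focal examinees' ability estimates."""
    a_f, b_f = focal_params
    a_r, b_r = ref_params
    diffs = [p2pl(t, a_f, b_f) - p2pl(t, a_r, b_r) for t in focal_thetas]
    return sum(d * d for d in diffs) / len(diffs)


# The item is 0.5 logits harder for the focal group, for an examinee at theta = 0.
print(round(ncdif([0.0], (1.0, 0.5), (1.0, 0.0)), 4))
```

Because the differences are squared, NCDIF is always non-negative and cannot show whether the item favours one group, which is one reason a separate effect-size benchmark is useful.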
Detecting a Gender-Related DIF Using Logistic Regression and Transformed Item Difficulty

ERIC Educational Resources Information Center

Abedlaziz, Nabeel; Ismail, Wail; Hussin, Zaharah

2011-01-01

Test items are designed to provide information about the examinees. Difficult items are designed to be more demanding and easy items are less so. However, sometimes, test items carry with their demands other than those intended by the test developer (Scheuneman & Gerritz, 1990). When personal attributes such as gender systematically affect…
Influence of Fallible Item Parameters on Test Information During Adaptive Testing.

ERIC Educational Resources Information Center

Wetzel, C. Douglas; McBride, James R.

Computer simulation was used to assess the effects of item parameter estimation errors on different item selection strategies used in adaptive and conventional testing. To determine whether these effects reduced the advantages of certain optimal item selection strategies, simulations were repeated in the presence and absence of item parameter…
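One common optimal selection strategy in adaptive testing picks the unused item with maximum Fisher information at the current ability estimate. The sketch below, assuming a 2PL pool, shows where fallible parameters enter: the estimated `a` and `b` are plugged in as if they were true. The pool values and function names are hypothetical.

```python
import math


def info_2pl(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)


def select_next(theta_hat, pool, administered):
    """Index of the unadministered item with maximum information at theta_hat.

    pool: list of (a, b) parameter estimates, treated as if error-free.
    """
    candidates = [i for i in range(len(pool)) if i not in administered]
    return max(candidates, key=lambda i: info_2pl(theta_hat, *pool[i]))


pool = [(1.0, -1.0), (1.5, 0.2), (0.7, 0.0), (1.4, 1.0)]
print(select_next(0.0, pool, set()), select_next(0.0, pool, {1}))
```

Overestimated discriminations make an item look more informative than it is, so maximum-information selection tends to favour exactly the items whose parameters are most in error.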
A Guide to Item Banking in Education. (Third Edition).

ERIC Educational Resources Information Center

Naccarato, Richard W.

The current status of banks of test items existing across the United States was determined through a survey conducted between September and December 1987. Item "bank" in this context does not imply that the test items are available in computerized form, but simply that "deposited" test items can be withdrawn for use. Emphasis…
Development and validation of an energy-balance knowledge test for fourth- and fifth-grade students.

PubMed

Chen, Senlin; Zhu, Xihe; Kang, Minsoo

2017-05-01

A valid test measuring children's energy-balance (EB) knowledge is lacking in research. This study developed and validated the energy-balance knowledge test (EBKT) for fourth and fifth grade students. The original EBKT contained 25 items but was reduced to 23 items based on pilot results and intensive expert panel discussion. De-identified data were collected from 468 fourth and fifth grade students enrolled in four schools to examine the psychometric properties of the EBKT items. The Rasch model analysis was conducted using the Winsteps 3.65.0 software. Differential item functioning (DIF) analysis flagged 1 item (item #4) functioning differently between boys and girls, which was deleted. The final 22-item EBKT showed desirable model-data fit indices. The items had large variability ranging from -3.58 logit (item #10, the easiest) to 1.70 logit (item #3, the hardest). The average person ability on the test was 0.28 logit (SD = .78). Additional analyses supported known-group difference validity of the EBKT scores in capturing gender- and grade-based ability differences. The test was overall valid but could be further improved by expanding test items to discern various ability levels. For lack of a better test, researchers and practitioners may use the EBKT to assess fourth- and fifth-grade students' EB knowledge.
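The logit values reported above can be read through the dichotomous Rasch model, where the probability of success depends only on ability minus item difficulty. A small worked example using only numbers from the abstract; the function name is illustrative.

```python
import math


def rasch_p(theta, b):
    """Dichotomous Rasch model: P(correct) = 1 / (1 + exp(-(theta - b)))."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


# Average person ability 0.28 logit against the easiest and hardest items.
p_easy = rasch_p(0.28, -3.58)
p_hard = rasch_p(0.28, 1.70)
print(round(p_easy, 3), round(p_hard, 3))
```

An average student would thus answer the easiest item correctly about 98% of the time and the hardest item only about 19% of the time, consistent with the wide spread of item difficulties.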
Impact on DARCOM of Nonstandard MTOE.

DTIC Science & Technology

1981-03-01

DIFFERENCES BETWEEN TOE AND MTOE (table columns: TYPE ORGN; NR RECORDS IDENTIFIED; TOTAL QTY OF ITEMS AUTH BY TOE; TOTAL QTY OF ITEMS AUTH BY MTOE; TOTAL DIFF BETWEEN TOE AND MTOE) ... 01
Analysis test of understanding of vectors with the three-parameter logistic model of item response theory and item response curves technique

NASA Astrophysics Data System (ADS)

Rakkapao, Suttida; Prasitpong, Singha; Arayathanitkul, Kwan

2016-12-01

This study investigated the multiple-choice test of understanding of vectors (TUV) by applying item response theory (IRT). The difficulty, discrimination, and guessing parameters of the TUV items were fit with the three-parameter logistic model of IRT, using the PARSCALE program. The TUV ability is an ability parameter, here estimated assuming unidimensionality and local independence. Moreover, all distractors of the TUV were analyzed from item response curves (IRC) that represent simplified IRT. Data were gathered on 2392 science and engineering freshmen from three universities in Thailand. The results revealed IRT analysis to be useful in assessing the test, since its item parameters are independent of the ability parameters. The IRT framework reveals item-level information and indicates appropriate ability ranges for the test. Moreover, the IRC analysis can be used to assess the effectiveness of the test's distractors. Both IRT and IRC approaches reveal test characteristics beyond those revealed by the classical analysis methods of tests. Test developers can apply these methods to diagnose and evaluate the features of items at various ability levels of test takers.
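The two tools named above can be sketched side by side: the three-parameter logistic curve for the keyed answer, and item response curves that tabulate how often each option is chosen at each total score. The scaling constant `D` is exposed because some programs use 1.7; all data and function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def p3pl(theta, a, b, c, D=1.0):
    """3PL probability of a correct response; c is the lower (guessing) asymptote."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def item_response_curves(choices, totals):
    """For each total score, the proportion of examinees choosing each option."""
    by_score = defaultdict(list)
    for choice, total in zip(choices, totals):
        by_score[total].append(choice)
    curves = {}
    for total in sorted(by_score):
        counts = Counter(by_score[total])
        n = len(by_score[total])
        curves[total] = {opt: cnt / n for opt, cnt in counts.items()}
    return curves


curves = item_response_curves(["A", "B", "A", "C", "A", "A"], [1, 1, 1, 2, 2, 2])
print(p3pl(0.5, 1.2, 0.5, 0.2), curves)
```

At theta equal to b, the 3PL probability is halfway between c and 1; a distractor whose IRC stays flat and low at all score levels is doing little work.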
Modeling Local Item Dependence in Cloze and Reading Comprehension Test Items Using Testlet Response Theory

ERIC Educational Resources Information Center

Baghaei, Purya; Ravand, Hamdollah

2016-01-01

In this study the magnitudes of local dependence generated by cloze test items and reading comprehension items were compared and their impact on parameter estimates and test precision was investigated. An advanced English as a foreign language reading comprehension test containing three reading passages and a cloze test was analyzed with a…
Machine Shop. Criterion-Referenced Test (CRT) Item Bank.

ERIC Educational Resources Information Center

Davis, Diane, Ed.

This drafting criterion-referenced test item bank is keyed to the machine shop competency profile developed by industry and education professionals in Missouri. The 16 references used for drafting the test items are listed. Test items are arranged under these categories: orientation to machine shop; performing mathematical calculations; performing…
Rescuing Computerized Testing by Breaking Zipf's Law.

ERIC Educational Resources Information Center

Wainer, Howard

2000-01-01

Suggests that because of the nonlinear relationship between item usage and item security, the problems of test security posed by continuous administration of standardized tests cannot be resolved merely by increasing the size of the item pool. Offers alternative strategies to overcome these problems, distributing test items so as to avoid the…

A computer program (MACPUMP) for interactive aquifer-test analysis

USGS Publications Warehouse

Day-Lewis, F. D.; Person, M.A.; Konikow, Leonard F.

1995-01-01

This report introduces MACPUMP (Version 1.0), an aquifer-test-analysis package for use with Macintosh computers. The report outlines the input-data format, describes the solutions encoded in the program, explains the menu items, and offers a tutorial illustrating the use of the program. The package reads list-directed aquifer-test data from a file, plots the data to the screen, generates and plots type curves for several different test conditions, and allows mouse-controlled curve matching. MACPUMP features pull-down menus, a simple text viewer for displaying data files, and optional on-line help windows. This version includes the analytical solutions for nonleaky and leaky confined aquifers, using both type curves and straight-line methods, and for the analysis of single-well slug tests using type curves. An executable version of the code and sample input data sets are included on an accompanying floppy disk.
Quantitative analysis of organizational culture in occupational health research: a theory-based validation in 30 workplaces of the organizational culture profile instrument

PubMed Central

2013-01-01

Background This study advances a measurement approach for the study of organizational culture in population-based occupational health research, and tests how different organizational culture types are associated with psychological distress, depression, emotional exhaustion, and well-being. Methods Data were collected over a sample of 1,164 employees nested in 30 workplaces. Employees completed the 26-item OCP instrument. Psychological distress was measured with the General Health Questionnaire (12-item); depression with the Beck Depression Inventory (21-item); and emotional exhaustion with five items from the Maslach Burnout Inventory general survey. Exploratory factor analysis evaluated the dimensionality of the OCP scale. Multilevel regression models estimated workplace-level variations, and the contribution of organizational culture factors to mental health and well-being after controlling for gender, age, and living with a partner. Results Exploratory factor analysis of OCP items revealed four factors explaining about 75% of the variance, and supported the structure of the Competing Values Framework. Factors were labeled Group, Hierarchical, Rational and Developmental. Cronbach’s alphas were high (0.82-0.89). Multilevel regression analysis suggested that the four culture types varied significantly between workplaces, and correlated with mental health and well-being outcomes. The Group culture type best distinguished between workplaces and had the strongest associations with the outcomes. Conclusions This study provides strong support for the use of the OCP scale for measuring organizational culture in population-based occupational health research in a way that is consistent with the Competing Values Framework. The Group organizational culture needs to be considered as a relevant factor in occupational health studies. PMID:23642223
Improving the Quality of Innovative Item Types: Four Tasks for Design and Development

ERIC Educational Resources Information Center

Parshall, Cynthia G.; Harmes, J. Christine

2009-01-01

Many exam programs have begun to include innovative item types in their operational assessments. While innovative item types appear to have great promise for expanding measurement, there can also be genuine challenges to their successful implementation. In this paper we present a set of four activities that can be beneficially incorporated into…
Test-retest reliability of Brazilian version of Memorial Symptom Assessment Scale for assessing symptoms in cancer patients.

PubMed

Menezes, Josiane Roberta de; Luvisaro, Bianca Maria Oliveira; Rodrigues, Claudia Fernandes; Muzi, Camila Drumond; Guimarães, Raphael Mendonça

2017-01-01

To assess the test-retest reliability of the Memorial Symptom Assessment Scale translated and culturally adapted into Brazilian Portuguese. The scale was applied in interview format to 190 patients with various types of cancer hospitalized in the clinical and surgical wards of the Instituto Nacional de Câncer José de Alencar Gomes da Silva and reapplied to 58 patients. Test-retest data were entered twice, independently, into a Microsoft Excel spreadsheet and analyzed with the weighted Kappa. The test-retest reliability of the scale was satisfactory. The weighted Kappa values obtained for each scale item were adequate, the highest being 0.96 and the lowest 0.69. The Kappa of the subscales was also evaluated: 0.84 for high-frequency physical symptoms, 0.81 for low-frequency physical symptoms, 0.81 for psychological symptoms, and 0.78 for the Global Distress Index. The high estimated reliability suggests that the process of measuring the Memorial Symptom Assessment Scale items was adequate.
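The abstract does not state which weighting scheme its weighted Kappa used. The Python sketch below computes Cohen's weighted kappa for paired ordinal ratings (here, test and retest), kappa_w = 1 - (weighted observed disagreement) / (weighted chance-expected disagreement), assuming linear disagreement weights by default with quadratic weights as an option; the function name and parameters are illustrative.

```python
def weighted_kappa(first, second, categories, weights="linear"):
    """Cohen's weighted kappa for two paired sets of ordinal ratings.

    categories: the ordered rating scale, e.g. [0, 1, 2, 3, 4].
    weights: "linear" or "quadratic" disagreement weights (an assumption here).
    """
    k = len(categories)
    if k < 2:
        raise ValueError("need at least two categories")
    if len(first) != len(second) or not first:
        raise ValueError("ratings must be non-empty and paired")
    index = {c: i for i, c in enumerate(categories)}
    n = len(first)

    # Observed joint proportions: rows = first administration, columns = second
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(first, second):
        observed[index[a]][index[b]] += 1.0 / n

    # Marginal proportions, used for the chance-expected joint proportions
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def weight(i, j):
        d = abs(i - j) / (k - 1)  # 0 on the diagonal, 1 at maximal disagreement
        return d if weights == "linear" else d * d

    obs_dis = sum(weight(i, j) * observed[i][j] for i in range(k) for j in range(k))
    exp_dis = sum(weight(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    return 1.0 - obs_dis / exp_dis
```

With only two categories the linear and quadratic weights coincide and the result reduces to unweighted Cohen's kappa.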
Do the Guideline Violations Influence Test Difficulty of High-Stake Test?: An Investigation on University Entrance Examination in Turkey

ERIC Educational Resources Information Center

Atalmis, Erkan Hasan

2016-01-01

Multiple-choice (MC) items are commonly used in high-stakes tests. Thus, each item of such tests should be meticulously constructed to increase the accuracy of decisions based on test results. Haladyna and his colleagues (2002) reviewed item-writing guidelines for constructing high-quality MC items in order to increase test reliability and…
The CAT: A Gender-Inclusive Measure of Controlling and Abusive Tactics.

PubMed

Hamel, John; Jones, Daniel N; Dutton, Donald G; Graham-Kevan, Nicola

2015-01-01

Research has consistently found that partner violence, defined as physical abuse between married, cohabiting, or dating partners, is not the only type of abuse with long-term deleterious effects on victims. Male and female victims alike report that emotional abuse and controlling behaviors are often as traumatic as physical abuse, or more so. Existing instruments used to measure emotional abuse and control have either been limited to male-perpetrated behaviors, as conceived in the well-known Duluth "Power and Control" wheel, or been field tested on dating or general population samples. This study discusses the genesis and evolution of a gender-inclusive instrument, the Controlling and Abusive Tactics (CAT) Questionnaire, which was field tested on males and females with both a clinical and a general population sample. For perpetration, a preliminary comparison found no significant gender differences for the great majority of items; women reported significantly higher rates on 9 items and men on 6 items. Women reported higher rates of received abuse than men on 28 of the 30 items with significant gender differences, but both males and females reported higher victimization than perpetration rates on all items. Exploratory and confirmatory factor analyses resulted in the CAT-2, a valid and reliable instrument appropriate for clinical use by treatment providers as well as for research purposes.
Eye Movement Analysis of Information Processing under Different Testing Conditions.

ERIC Educational Resources Information Center

Dillon, Ronna F.

1985-01-01

Undergraduates were given complex figural analogies items, and eye movements were observed under three types of feedback: (1) elaborate feedback; (2) subjects verbalized their thinking and application of rules; and (3) no feedback. Both feedback conditions enhanced the rule-governed information processing during inductive reasoning. (Author/GDC)
Digit Symbol Performance in Mild Dementia and Depression.

ERIC Educational Resources Information Center

Hart, Robert P.; And Others

1987-01-01

Patients with mild dementia of the Alzheimer's type (DAT), patients with major depression, and normal control subjects completed the Wechsler Adult Intelligence Scale (WAIS) Digit Symbol test of incidental memory. Though mild DAT and depressed patients had equivalent deficits in psychomotor speed, DAT patients recalled fewer digit-symbol items.…
49 CFR 383.110 - General requirement.

Code of Federal Regulations, 2011 CFR

2011-10-01

... STANDARDS; REQUIREMENTS AND PENALTIES Required Knowledge and Skills § 383.110 General requirement. All drivers of CMVs must have the knowledge and skills necessary to operate a CMV safely as contained in this subpart. The specific types of items that a State must include in the knowledge and skills tests that it...
Item difficulty and item validity for the Children's Group Embedded Figures Test.

PubMed

Rusch, R R; Trigg, C L; Brogan, R; Petriquin, S

1994-02-01

The validity and reliability of the Children's Group Embedded Figures Test was reported for students in Grade 2 by Cromack and Stone in 1980; however, a search of the literature indicates no evidence for internal consistency or item analysis. Hence the purpose of this study was to examine the item difficulty and item validity of the test with children in Grades 1 and 2. Confusion in the literature over development and use of this test was seemingly resolved through analysis of these descriptions and through an interview with the test developer. One early-appearing item was unreasonably difficult. Two or three other items were quite difficult and made little contribution to the total score. Caution is recommended, however, in any reordering or elimination of items based on these findings, given the limited number of subjects (n = 84).
Weapon Performance Testing and Analysis: The MODI-PAC Round, the Number 4 Lead-Shot Round, and the Flying Baton

DTIC Science & Technology

1976-01-01

items. The items tested were the MODI-PAC, a proprietary item of Remington Arms Company, a standard 12-gauge round of No. 4 lead shot, and an...to refrain from testing this item. Therefore, the final selection of items for testing was (1) the MODI-PAC, (2) a standard 12-gauge shotgun round of...The first item evaluated was the MODI-PAC. The MODI-PAC, which stands for “modified impact,” is a 12-gauge shotgun shell loaded with approximately 320
Australian Chemistry Test Item Bank: Years 11 & 12. Volume 1.

ERIC Educational Resources Information Center

Commons, C., Ed.; Martin, P., Ed.

Volume 1 of the Australian Chemistry Test Item Bank, consisting of two volumes, contains nearly 2000 multiple-choice items related to the chemistry taught in Year 11 and Year 12 courses in Australia. Items which were written during 1979 and 1980 were initially published in the "ACER Chemistry Test Item Collection" and in the "ACER…
Australian Chemistry Test Item Bank: Years 11 and 12. Volume 2.

ERIC Educational Resources Information Center

Commons, C., Ed.; Martin, P., Ed.

The second volume of the Australian Chemistry Test Item Bank, consisting of two volumes, contains nearly 2000 multiple-choice items related to the chemistry taught in Year 11 and Year 12 courses in Australia. Items which were written during 1979 and 1980 were initially published in the "ACER Chemistry Test Item Collection" and in the…
Interactions Between Item Content And Group Membership on Achievement Test Items.

ERIC Educational Resources Information Center

Linn, Robert L.; Harnisch, Delwyn L.

The purpose of this investigation was to examine the interaction of item content and group membership on achievement test items. Estimates of the parameters of the three-parameter logistic model were obtained on the 46-item math test for the sample of eighth-grade students (N = 2055) participating in the Illinois Inventory of Educational Progress,…
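The three-parameter logistic (3PL) model named above gives the probability that an examinee with ability theta answers item j correctly as P_j(theta) = c_j + (1 - c_j) / (1 + exp(-D * a_j * (theta - b_j))), with discrimination a_j, difficulty b_j, and lower asymptote (pseudo-guessing) c_j. A minimal Python sketch of that item response function follows; the scaling constant D = 1.7 is a common convention assumed here, not a detail taken from the study, and the function name is illustrative.

```python
import math

def p_correct_3pl(theta, a, b, c, D=1.7):
    """3PL probability of a correct response at ability theta.

    a: discrimination, b: difficulty, c: lower asymptote (pseudo-guessing),
    D: scaling constant (1.7 approximates the normal-ogive metric; 1.0 gives the plain logistic).
    """
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))
```

At theta = b the curve sits halfway between c and 1, so an item with c = 0.2 gives P = 0.6 there; for very low ability the probability approaches c rather than 0, which is what distinguishes the 3PL from the 2PL.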
Effects of Item Exposure for Conventional Examinations in a Continuous Testing Environment.

ERIC Educational Resources Information Center

Hertz, Norman R.; Chinn, Roberta N.

This study explored the effect of item exposure on two conventional examinations administered as computer-based tests. A principal hypothesis was that item exposure would have little or no effect on average difficulty of the items over the course of an administrative cycle. This hypothesis was tested by exploring conventional item statistics and…
Preferred Reporting Items for a Systematic Review and Meta-analysis of Diagnostic Test Accuracy Studies: The PRISMA-DTA Statement.

PubMed

McInnes, Matthew D F; Moher, David; Thombs, Brett D; McGrath, Trevor A; Bossuyt, Patrick M; Clifford, Tammy; Cohen, Jérémie F; Deeks, Jonathan J; Gatsonis, Constantine; Hooft, Lotty; Hunt, Harriet A; Hyde, Christopher J; Korevaar, Daniël A; Leeflang, Mariska M G; Macaskill, Petra; Reitsma, Johannes B; Rodin, Rachel; Rutjes, Anne W S; Salameh, Jean-Paul; Stevens, Adrienne; Takwoingi, Yemisi; Tonelli, Marcello; Weeks, Laura; Whiting, Penny; Willis, Brian H

2018-01-23

Systematic reviews of diagnostic test accuracy synthesize data from primary diagnostic studies that have evaluated the accuracy of 1 or more index tests against a reference standard, provide estimates of test performance, allow comparisons of the accuracy of different tests, and facilitate the identification of sources of variability in test accuracy. To develop the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) diagnostic test accuracy guideline as a stand-alone extension of the PRISMA statement. Modifications to the PRISMA statement reflect the specific requirements for reporting of systematic reviews and meta-analyses of diagnostic test accuracy studies and the abstracts for these reviews. Established standards from the Enhancing the Quality and Transparency of Health Research (EQUATOR) Network were followed for the development of the guideline. The original PRISMA statement was used as a framework on which to modify and add items. A group of 24 multidisciplinary experts used a systematic review of articles on existing reporting guidelines and methods, a 3-round Delphi process, a consensus meeting, pilot testing, and iterative refinement to develop the PRISMA diagnostic test accuracy guideline. The final version of the PRISMA diagnostic test accuracy guideline checklist was approved by the group. The systematic review (produced 64 items) and the Delphi process (provided feedback on 7 proposed items; 1 item was later split into 2 items) identified 71 potentially relevant items for consideration. The Delphi process reduced these to 60 items that were discussed at the consensus meeting. Following the meeting, pilot testing and iterative feedback were used to generate the 27-item PRISMA diagnostic test accuracy checklist. To reflect specific or optimal contemporary systematic review methods for diagnostic test accuracy, 8 of the 27 original PRISMA items were left unchanged, 17 were modified, 2 were added, and 2 were omitted. 
The 27-item PRISMA diagnostic test accuracy checklist provides specific guidance for reporting of systematic reviews. The PRISMA diagnostic test accuracy guideline can facilitate the transparent reporting of reviews, and may assist in the evaluation of validity and applicability, enhance replicability of reviews, and make the results from systematic reviews of diagnostic test accuracy studies more useful.
An Efficiency Balanced Information Criterion for Item Selection in Computerized Adaptive Testing

ERIC Educational Resources Information Center

Han, Kyung T.

2012-01-01

Successful administration of computerized adaptive testing (CAT) programs in educational settings requires that test security and item exposure control issues be taken seriously. Developing an item selection algorithm that strikes the right balance between test precision and level of item pool utilization is the key to successful implementation…
Using Automatic Item Generation to Meet the Increasing Item Demands of High-Stakes Educational and Occupational Assessment

ERIC Educational Resources Information Center

Arendasy, Martin E.; Sommer, Markus

2012-01-01

The use of new test administration technologies such as computerized adaptive testing in high-stakes educational and occupational assessments demands large item pools. Classic item construction processes and previous approaches to automatic item generation faced the problem of a considerable loss of items after the item calibration phase. In this…
Item Purification Does Not Always Improve DIF Detection: A Counterexample with Angoff's Delta Plot

ERIC Educational Resources Information Center

Magis, David; Facon, Bruno

2013-01-01

Item purification is an iterative process that is often advocated as improving the identification of items affected by differential item functioning (DIF). With test-score-based DIF detection methods, item purification iteratively removes the items currently flagged as DIF from the test scores to get purified sets of items, unaffected by DIF. The…
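Angoff's delta plot, the method in Magis and Facon's counterexample, maps each item's proportion correct p in the reference and focal groups to the delta scale, Delta = 13 + 4 * z(1 - p) with z the inverse standard normal CDF, fits the major axis through the pairs of deltas, and flags items whose perpendicular distance from that axis exceeds a threshold. The Python sketch below pairs it with an iterative purification loop, taken here to mean refitting the axis without the currently flagged items until the flagged set stops changing. The fixed threshold of 1.5 and the iteration cap are assumptions for illustration, the helper names are hypothetical, and the sketch does not reproduce the authors' counterexample.

```python
from statistics import NormalDist, mean

def angoff_deltas(p_values):
    """Angoff delta transform of proportions correct: Delta = 13 + 4 * z(1 - p)."""
    z = NormalDist().inv_cdf  # inverse standard normal CDF
    return [13.0 + 4.0 * z(1.0 - p) for p in p_values]

def major_axis(x, y):
    """Intercept a and slope b of the major (principal) axis through the points (x, y)."""
    mx, my = mean(x), mean(y)
    # Raw sums of squares/products: the common n - 1 divisor cancels in the slope
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))  # assumed nonzero
    b = (syy - sxx + ((syy - sxx) ** 2 + 4.0 * sxy ** 2) ** 0.5) / (2.0 * sxy)
    return my - b * mx, b

def delta_plot_dif(p_ref, p_foc, threshold=1.5, purify=True, max_iter=20):
    """Sorted indices of items flagged as DIF by the delta plot, optionally purified."""
    x, y = angoff_deltas(p_ref), angoff_deltas(p_foc)
    flagged = set()
    for _ in range(max_iter):
        keep = [i for i in range(len(x)) if i not in flagged]
        if len(keep) < 3:
            break  # too few unflagged items left to refit the axis
        a, b = major_axis([x[i] for i in keep], [y[i] for i in keep])
        # Signed perpendicular distance of every item from the fitted axis
        dist = [(b * xi + a - yi) / (b * b + 1.0) ** 0.5 for xi, yi in zip(x, y)]
        new_flagged = {i for i, d in enumerate(dist) if abs(d) > threshold}
        if not purify or new_flagged == flagged:
            return sorted(new_flagged)
        flagged = new_flagged  # refit without the currently flagged items
    return sorted(flagged)
```

Setting purify=False returns the single-pass result, which makes it easy to compare flagged sets with and without purification on the same data.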
Food Marketing Targeting Youth and Families: What Do We Know about Stores Where Moms Actually Shop?

PubMed Central

Grigsby-Toussaint, Diana S.; Rooney, Mary R.

2013-01-01

Although efforts are underway to examine marketing that targets youth and families in the retail food store environment, few studies have specifically focused on stores that families identify as their primary sites for food shopping. Between November 2011 and April 2012, we examined the frequency and types of marketing techniques of 114 packaged and nonpackaged items in 24 food stores that mothers of young children in Champaign County, IL, said they commonly frequented. Chi-square tests were used to determine whether significant differences existed between items with regard to marketing by store type, store food-assistance-program acceptance (i.e., WIC), and claims. Overall, stores accepting WIC and convenience stores had higher frequencies of marketing compared to non-WIC and grocery stores. Fruits and vegetables had the lowest frequency of any marketing claim, while salty snacks and soda had the highest frequency of marketing claims. Nutrition claims were the most common across all items, followed by taste, suggested use, fun, and convenience. Television tie-ins and cartoons were observed more often than movie tie-ins and giveaways. Our results suggest an opportunity to promote healthful items more efficiently by focusing efforts on stores where mothers actually shop. PMID:24163701

Mental health in primary care: an evaluation using the Item Response Theory

PubMed Central

da Rocha, Hugo André; dos Santos, Alaneir de Fátima; Reis, Ilka Afonso; Santos, Marcos Antônio da Cunha; Cherchiglia, Mariângela Leal

2018-01-01

ABSTRACT OBJECTIVE To determine the items of the Brazilian National Program for Improving Access and Quality of Primary Care that best evaluate the capacity to provide mental health care. METHODS This is a cross-sectional study using the Graded Response Model of Item Response Theory on secondary data from the second cycle of the National Program for Improving Access and Quality of Primary Care, which evaluated 30,523 primary care teams in Brazil from 2013 to 2014. Internal consistency, correlation between items, and correlation between items and the total score were tested using Cronbach’s alpha, Spearman’s correlation, and point-biserial coefficients, respectively. The assumptions of unidimensionality and local independence of the items were tested. Word clouds were used as one way to present the results. RESULTS The items with the greatest ability to discriminate were scheduling of the agenda according to risk stratification, keeping records of the most serious cases of users in psychological distress, and provision of group care. The items with the highest location parameters, that is, those requiring a higher level of mental health care capacity, were the provision of any type of group care and the provision of educational and mental health promotion activities. The total Cronbach’s alpha coefficient was 0.87. The items with the highest correlation with the total score were the recording of the most serious cases of users in psychological distress and scheduling of the agenda according to risk stratification. The final scores ranged from -2.07 (minimum) to 1.95 (maximum). CONCLUSIONS Important aspects in discriminating the capacity of primary health care teams to provide mental health care are risk stratification for care management, follow-up of the most serious cases, group care, and preventive and health promotion actions. PMID:29489992
Thermal Simulation Facilities Handbook.

DTIC Science & Technology

1983-02-01

tower provide incident radiation angles of 90° or less. Since each heliostat is individually controlled, the size of a test item depends on application...designed such that it can be used for many other applications. (See also Section 3.) The solar furnace uses both a flat mirror (heliostat) that track...type solar thermal facility. It consists of four main components: (1) heliostat, (2) attenuator, (3) concentrator, and (4) test and control chamber
[Difference analysis among majors in medical parasitology exam papers by test item bank proposition].

PubMed

Jia, Lin-Zhi; Ya-Jun, Ma; Cao, Yi; Qian, Fen; Li, Xiang-Yu

2012-04-30

Quality indices of "Medical Parasitology" exam papers and measured data for students in three majors at the university in 2010 were compared and analyzed. The exam papers were assembled from the test item bank. The alpha reliability coefficients of all three exam papers were above 0.70, and the knowledge structure and capacity structure of the papers were basically balanced. However, the alpha reliability coefficient for the second major was the lowest, mainly because of the quality of the test items in that exam paper and the failure to revise the item bank's indices in time. This observation indicates that revising the test items and their indices in the item bank according to measured data can improve the quality of exam papers set from the item bank and reduce differences among exam papers.
The Role of Item Models in Automatic Item Generation

ERIC Educational Resources Information Center

Gierl, Mark J.; Lai, Hollis

2012-01-01

Automatic item generation represents a relatively new but rapidly evolving research area where cognitive and psychometric theories are used to produce tests that include items generated using computer technology. Automatic item generation requires two steps. First, test development specialists create item models, which are comparable to templates…
Can dual processing theory explain physics students' performance on the Force Concept Inventory?

NASA Astrophysics Data System (ADS)

Wood, Anna K.; Galloway, Ross K.; Hardy, Judy

2016-12-01

According to dual processing theory there are two types, or modes, of thinking: system 1, which involves intuitive and nonreflective thinking, and system 2, which is more deliberate and requires conscious effort and thought. The Cognitive Reflection Test (CRT) is a widely used and robust three-item instrument that measures the tendency to override system 1 thinking and to engage in reflective, system 2 thinking. Each item on the CRT has an intuitive (but wrong) answer that must be rejected in order to answer the item correctly. We therefore hypothesized that performance on the CRT may give useful insights into the cognitive processes involved in learning physics, where success involves rejecting the common, intuitive ideas about the world (often called misconceptions) and instead carefully applying physical concepts. This paper presents initial results from an ongoing study examining the relationship between students' CRT scores and their performance on the Force Concept Inventory (FCI), which tests students' understanding of Newtonian mechanics. We find that a higher CRT score predicts a higher FCI score for both precourse and postcourse tests. However, we also find that the FCI normalized gain is independent of CRT score. The implications of these results are discussed.
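The FCI normalized gain mentioned above is commonly computed with Hake's definition, g = (post - pre) / (100 - pre), using percentage scores: the fraction of the possible improvement that was actually achieved. The abstract does not state which variant the authors used (class-average gain or an average of per-student gains), so the Python sketch below simply assumes the class-average form; the function name is illustrative.

```python
def normalized_gain(pre_pct, post_pct):
    """Hake-style normalized gain: fraction of the possible improvement achieved.

    pre_pct, post_pct: class-average pre- and post-test scores in percent (0-100).
    """
    if not 0.0 <= pre_pct < 100.0:
        raise ValueError("pre-test score must be in [0, 100) for the gain to be defined")
    return (post_pct - pre_pct) / (100.0 - pre_pct)
```

A class moving from 40% to 70% has g = 30 / 60 = 0.5. Because the gain is scaled by the room left for improvement, it can be unrelated to CRT score even when raw pre- and post-test scores both track CRT score, which is the pattern the abstract reports.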
Evaluation of the methodological quality of studies of the performance of diagnostic tests for bovine tuberculosis using QUADAS.

PubMed

Downs, Sara H; More, Simon J; Goodchild, Anthony V; Whelan, Adam O; Abernethy, Darrell A; Broughan, Jennifer M; Cameron, Angus; Cook, Alasdair J; Ricardo de la Rua-Domenech, R; Greiner, Matthias; Gunn, Jane; Nuñez-Garcia, Javier; Rhodes, Shelley; Rolfe, Simon; Sharp, Michael; Upton, Paul; Watson, Eamon; Welsh, Michael; Woolliams, John A; Clifton-Hadley, Richard S; Parry, Jessica E

2018-05-01

There has been little assessment of the methodological quality of studies measuring the performance (sensitivity and/or specificity) of diagnostic tests for animal diseases. In a systematic review, 190 studies of tests for bovine tuberculosis (bTB) in cattle (published 1934-2009) were assessed by at least one of 18 reviewers using the QUADAS (Quality Assessment of Diagnostic Accuracy Studies) checklist adapted for animal disease tests. VETQUADAS (VQ) included items measuring clarity in reporting (n = 3), internal validity (n = 9) and external validity (n = 2). A similar pattern of compliance was observed in studies of different diagnostic test types. Compliance improved significantly with year of publication for all items measuring clarity in reporting and external validity, but for only four of the nine items measuring internal validity (p < 0.05). One hundred and seven references, 83 of which had performance data eligible for inclusion in a meta-analysis, were reviewed by two reviewers. In these references, agreement between reviewers' responses was 71% for compliance, 32% for unsure and 29% for non-compliance. Mean compliance was 2 items for clarity in reporting, 5.2 for internal validity and 1.5 for external validity. The index test result was described in sufficient detail in 80.1% of studies and was interpreted without knowledge of the reference standard test result in only 33.1%. Loss to follow-up was adequately explained in only 31.1% of studies. The prevalence of deficiencies observed may be due to inadequate reporting but may also reflect a lack of attention to methodological issues that could bias diagnostic test performance estimates. QUADAS was a useful tool for assessing and comparing the quality of studies measuring the performance of diagnostic tests but might be improved further by including explicit assessment of population sampling strategy. Crown Copyright © 2017. Published by Elsevier B.V. All rights reserved.
Item Discrimination and Type I Error in the Detection of Differential Item Functioning

ERIC Educational Resources Information Center

Li, Yanju; Brooks, Gordon P.; Johanson, George A.

2012-01-01

In 2009, DeMars stated that when impact exists there will be Type I error inflation, especially with larger sample sizes and larger discrimination parameters for items. One purpose of this study is to present the patterns of Type I error rates using Mantel-Haenszel (MH) and logistic regression (LR) procedures when the mean ability between the…
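The Mantel-Haenszel (MH) procedure named above compares reference and focal groups within strata matched on total score. For each stratum k with a 2 x 2 table of counts (A_k, B_k = reference group correct/incorrect, C_k, D_k = focal group correct/incorrect, N_k in total), the common odds ratio is alpha_MH = sum(A_k * D_k / N_k) / sum(B_k * C_k / N_k), often reported on the ETS delta scale as MH D-DIF = -2.35 * ln(alpha_MH). The Python sketch below computes only these two effect-size quantities, assuming the tables have already been built from matched score levels; it omits the MH chi-square test whose Type I error rate the study examines, and the function name is illustrative.

```python
import math

def mantel_haenszel_dif(tables):
    """Mantel-Haenszel common odds ratio and ETS MH D-DIF for one studied item.

    tables: one (A, B, C, D) tuple per matched total-score level, where
      A, B = reference group correct, incorrect
      C, D = focal group correct, incorrect
    """
    numerator = denominator = 0.0
    for A, B, C, D in tables:
        n = A + B + C + D
        if n == 0:
            continue  # an empty score level contributes nothing
        numerator += A * D / n
        denominator += B * C / n
    if numerator == 0 or denominator == 0:
        raise ValueError("odds ratio is undefined for these tables")
    alpha_mh = numerator / denominator
    # ETS delta metric: negative values indicate the item favors the reference group
    return alpha_mh, -2.35 * math.log(alpha_mh)
```

For a single score level with (A, B, C, D) = (30, 10, 20, 20), alpha_MH = 7.5 / 2.5 = 3 and MH D-DIF is about -2.58; when both groups have equal odds at every level, alpha_MH = 1 and MH D-DIF = 0.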
Item-level psychometrics of the ADL instrument of the Korean National Survey on persons with physical disabilities.

PubMed

Hong, Ickpyo; Lee, Mi Jung; Kim, Moon Young; Park, Hae Yean

2017-10-01

The aim of this study is to investigate the psychometrics of the 12 items of an instrument assessing activities of daily living (ADL) using an item response theory model. A total of 648 adults with physical disabilities and having difficulties in ADLs were retrieved from the 2014 Korean National Survey on People with Disabilities. The psychometric testing included factor analysis, internal consistency, precision, and differential item functioning (DIF) across categories including sex, older age, marital status, and physical impairment area. The sample had a mean age of 69.7 years (SD = 13.7). The majority of the sample had lower extremity impairments (62.0%) and had at least 2.1 chronic conditions. The instrument demonstrated unidimensional construct and good internal consistency (Cronbach's alpha = 0.95). The instrument precisely estimated person measures within a wide range of theta values (-2.22 logits < θ < 0.27 logits) with a reliability of 0.9. Only the changing position item demonstrated misfit (χ² = 36.6, df = 17, p = 0.0038), and the dressing item demonstrated DIF on the impairment type (upper extremity/others, McFadden's pseudo-R² > 5.0%). Our findings indicate that the dressing item would need to be modified to improve its psychometrics. Overall, the ADL instrument demonstrates good psychometrics, and thus, it may be used as a standardized instrument for measuring disability in rehabilitation contexts. However, the findings are limited to adults with physical disabilities. Future studies should replicate psychometric testing for survey respondents with other disorders and for children.
Methodological issues regarding power of classical test theory (CTT) and item response theory (IRT)-based approaches for the comparison of patient-reported outcomes in two groups of patients - a simulation study

PubMed Central

2010-01-01

Background Patient-Reported Outcomes (PRO) are increasingly used in clinical and epidemiological research. Two main types of analytical strategies can be found for these data: classical test theory (CTT), based on the observed scores, and models from Item Response Theory (IRT). However, whether IRT or CTT is the more appropriate method for analysing PRO data remains unknown. The statistical properties of CTT and IRT, regarding power and corresponding effect sizes, were compared. Methods Two-group cross-sectional studies were simulated for the comparison of PRO data using IRT- or CTT-based analysis. For IRT, different scenarios were investigated according to whether item or person parameters were assumed to be known (for item parameters, with precision ranging from good to poor) or unknown and therefore estimated. The powers obtained with IRT and CTT were compared, and the parameters with the strongest impact on them were identified. Results When person parameters were assumed to be unknown and item parameters either known or not, the power achieved using IRT or CTT was similar and always lower than the expected power from the well-known sample size formula for normally distributed endpoints. The number of items had a substantial impact on power for both methods. Conclusion Without any missing data, IRT and CTT seem to provide comparable power. The classical sample size formula for CTT seems to be adequate under some conditions but is not appropriate for IRT. In IRT, it seems important to take the number of items into account to obtain an accurate formula. PMID:20338031
Calibrating the Medical Council of Canada's Qualifying Examination Part I using an integrated item response theory framework: a comparison of models and designs.

PubMed

De Champlain, Andre F; Boulais, Andre-Philippe; Dallas, Andrew

2016-01-01

The aim of this research was to compare different methods of calibrating multiple-choice question (MCQ) and clinical decision making (CDM) components for the Medical Council of Canada's Qualifying Examination Part I (MCCQEI) based on item response theory. Our data consisted of test results from 8,213 first-time applicants to MCCQEI in spring and fall 2010 and 2011 test administrations. The data set contained several thousand multiple-choice items and several hundred CDM cases. Four dichotomous calibrations were run using BILOG-MG 3.0. All 3 mixed-item-format (dichotomous MCQ responses and polytomous CDM case scores) calibrations were conducted using PARSCALE 4. The 2-PL model had identical numbers of items with chi-square values at or below a Type I error rate of 0.01 (83/3,499 or 0.02). In all 3 polytomous models, whether the MCQs were either anchored or concurrently run with the CDM cases, results suggest very poor fit. All IRT abilities estimated from dichotomous calibration designs correlated very highly with each other. IRT-based pass-fail rates were extremely similar, not only across calibration designs and methods, but also with regard to the actual reported decision to candidates. The largest difference noted in pass rates was 4.78%, which occurred between the mixed-format concurrent 2-PL graded response model (pass rate = 80.43%) and the dichotomous anchored 1-PL calibrations (pass rate = 85.21%). Simpler calibration designs with dichotomized items should be implemented. The dichotomous calibrations provided better fit of the item response matrix than more complex, polytomous calibrations.
Development and testing of the KERNset: an instrument to assess the quality of telephone triage in out-of-hours primary care services.

PubMed

Smits, Marleen; Keizer, Ellen; Ram, Paul; Giesen, Paul

2017-12-02

Telephone triage is a core but vulnerable part of the care process at out-of-hours general practitioner (GP) cooperatives. In the Netherlands, different instruments have been used for assessing the quality of telephone triage. These instruments focused mainly on communicational aspects, and less on the medical quality of triage decisions. Our aim was to develop and test a minimum set of items to assess the quality of telephone triage. A national survey among all GP cooperatives in the Netherlands was performed to examine the most important aspects of telephone triage. Next, existing instruments were searched for items corresponding to these topics. Subsequently, an expert panel judged these items on importance, completeness and formulation. The draft KERNset consisted of 24 items about the telephone conversation: 13 medical, ten communicational and one regarding both types. It was pilot tested on measurement characteristics, reliability, validity and variation between triagists. In this pilot study, 114 anonymous calls from four GP cooperatives spread across the Netherlands were judged by three out of eight raters, both internal and external. Cronbach's alpha was .94 for the medical items and .75 for the communicational items. Inter-rater reliability: complete agreement between the external raters was 45% and reasonable agreement (a difference of at most one point on the five-point scale) was 73%. Intra-rater reliability: complete agreement within raters was 55% and reasonable agreement 84%. There were hardly any differences between internal and external raters, but individual raters differed in strictness. The construct validity was confirmed by the high correlation between the general impression of the call and the items of the KERNset. Of the variance within items, 19% could be explained by differences between triage nurses, which means the KERNset is able to demonstrate differences between triage nurses.
The KERNset can be used to assess the quality of telephone triage. The validity is good and differences between calls and between triage nurses can be measured. A more intensive training for raters could improve the reliability.
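The two reliability measures reported for the KERNset can be sketched in a few lines. This is a minimal illustration, not the authors' analysis code. "Complete" agreement counts identical scores, and "reasonable" agreement counts scores that differ by at most one point, as the abstract defines it. Cronbach's alpha uses the standard formula k/(k-1) * (1 - sum of item variances / variance of total scores). The function names and the data layout (one list of scores per item) are assumptions.

```python
from statistics import pvariance


def agreement_rates(rater_a, rater_b):
    """Return (complete, reasonable) agreement between two raters as proportions.

    Complete = identical scores; reasonable = scores differ by at most one
    point (the abstract's definition for its five-point scale).
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty score lists")
    pairs = list(zip(rater_a, rater_b))
    exact = sum(a == b for a, b in pairs)
    within_one = sum(abs(a - b) <= 1 for a, b in pairs)
    return exact / len(pairs), within_one / len(pairs)


def cronbach_alpha(item_scores):
    """Cronbach's alpha; item_scores holds one list per item (one score per call)."""
    k = len(item_scores)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Total score per call = sum over items (zip(*...) transposes items -> calls).
    totals = [sum(call) for call in zip(*item_scores)]
    total_var = pvariance(totals)
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    item_var = sum(pvariance(item) for item in item_scores)
    return k / (k - 1) * (1 - item_var / total_var)
```

For example, `agreement_rates([3, 4, 5, 2, 1], [3, 5, 3, 2, 2])` gives complete agreement of 0.4 and reasonable agreement of 0.8. Population variances are used throughout, but the n/(n-1) correction cancels in the alpha ratio, so sample variances would give the same value.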
Item Review and the Rearrangement Procedure: Its Process and Its Results

ERIC Educational Resources Information Center

Papanastasiou, Elena C.

2005-01-01

Permitting item review benefits examinees, who typically increase their test scores when allowed to review items. However, testing companies tend not to favor item review, since it does not follow the logic on which adaptive tests are based and since it is prone to cheating strategies. Consequently, item review is not permitted in many adaptive…
A Model-Based Method for Content Validation of Automatically Generated Test Items

ERIC Educational Resources Information Center

Zhang, Xinxin; Gierl, Mark

2016-01-01

The purpose of this study is to describe a methodology to recover the item model used to generate multiple-choice test items with a novel graph theory approach. Beginning with the generated test items and working backward to recover the original item model provides a model-based method for validating the content used to automatically generate test…
Comparison of student confidence and perceptions of biochemistry concepts using a team-based learning versus traditional lecture-based format.

PubMed

Gryka, Rebecca; Kiersma, Mary E; Frame, Tracy R; Cailor, Stephanie M; Chen, Aleda M H

To evaluate differences in student confidence and perceptions of biochemistry concepts using a team-based learning (TBL) format versus a traditional lecture-based format at two universities. Two pedagogies (TBL vs lecture-based) were used to deliver biochemistry concepts at two universities in a first-professional year, semester-long biochemistry course. A 21-item instrument was created and administered pre- and post-semester to assess changes in confidence in learning biochemistry concepts using Bandura's Social Cognitive Theory (eight items, 5-point, Likert-type) and changes in student perceptions of biochemistry using the theory of planned behavior (TPB) domains (13 items, 7-point, Likert-type). Wilcoxon signed-rank tests were used to evaluate pre-post changes, and Mann-Whitney U tests to evaluate differences between universities. All students (N=111) had more confidence in biochemistry concepts post-semester, but TBL students (N=53) were significantly more confident. Post-semester, TBL students also agreed more strongly that they are expected to actively engage in science courses, according to the perceptions of biochemistry subscale. No other differences between lecture and TBL were observed post-semester. Students in a TBL course had greater gains in confidence. Since students often engage in tasks where they feel confident, TBL can be a useful pedagogy to promote student learning. Copyright © 2017 Elsevier Inc. All rights reserved.
State Assessment Program Item Banks: Model Language for Request for Proposals (RFP) and Contracts

ERIC Educational Resources Information Center

Swanson, Leonard C.

2010-01-01

This document provides recommendations for request for proposal (RFP) and contract language that state education agencies can use to specify their requirements for access to test item banks. An item bank is a repository for test items and data about those items. Item banks are used by state agency staff to view items and associated data; to…
The effect of response modality on immediate serial recall in dementia of the Alzheimer type.

PubMed

Macé, Anne-Laure; Ergis, Anne-Marie; Caza, Nicole

2012-09-01

Contrary to traditional models of verbal short-term memory (STM), psycholinguistic accounts assume that temporary retention of verbal materials is an intrinsic property of word processing. Therefore, memory performance will depend on the nature of the STM tasks, which vary according to the linguistic representations they engage. The aim of this study was to explore the effect of response modality on verbal STM performance in individuals with dementia of the Alzheimer Type (DAT), and its relationship with the patients' word-processing deficits. Twenty individuals with mild DAT and 20 controls were tested on an immediate serial recall (ISR) task using the same items across two response modalities (oral and picture pointing) and completed a detailed language assessment. When scoring of ISR performance was based on item memory regardless of item order, a response modality effect was found for all participants, indicating that they recalled more items with picture pointing than with oral response. However, this effect was less marked in patients than in controls, resulting in an interaction. Interestingly, when recall of both item and order was considered, results indicated similar performance between response modalities in controls, whereas performance was worse for pointing than for oral response in patients. Picture-naming performance was also reduced in patients relative to controls. However, in the word-to-picture matching task, a similar pattern of responses was found between groups for incorrectly named pictures of the same items. The finding of a response modality effect in item memory for all participants is compatible with the assumption that semantic influences are greater in picture pointing than in oral response, as predicted by psycholinguistic models. Furthermore, patients' performance was modulated by their word-processing deficits, showing a reduced advantage relative to controls. 
Overall, the response modality effect observed in this study for item memory suggests that verbal STM performance is intrinsically linked with word processing capacities in both healthy controls and individuals with mild DAT, supporting psycholinguistic models of STM.
The Impact of Receiving the Same Items on Consecutive Computer Adaptive Test Administrations.

ERIC Educational Resources Information Center

O'Neill, Thomas; Lunz, Mary E.; Thiede, Keith

2000-01-01

Studied item exposure in a computerized adaptive test when the item selection algorithm presents examinees with questions they were asked in a previous test administration. Results with 178 repeat examinees on a medical technologists' test indicate that the combined use of an adaptive algorithm to select items and latent trait theory to estimate…
Helping Poor Readers Demonstrate Their Science Competence: Item Characteristics Supporting Text-Picture Integration

ERIC Educational Resources Information Center

Saß, Steffani; Schütte, Kerstin

2016-01-01

Solving test items might require abilities in test-takers other than the construct the test was designed to assess. Item and student characteristics such as item format or reading comprehension can impact the test result. This experiment is based on cognitive theories of text and picture comprehension. It examines whether integration aids, which…
Uncertainties in the Item Parameter Estimates and Robust Automated Test Assembly

ERIC Educational Resources Information Center

Veldkamp, Bernard P.; Matteucci, Mariagiulia; de Jong, Martijn G.

2013-01-01

Item response theory parameters have to be estimated, and because of the estimation process, they contain uncertainty. In most large-scale testing programs, the parameters are stored in item banks, and automated test assembly algorithms are applied to assemble operational test forms. These algorithms treat item parameters as fixed values,…
Identifying Differential Item Functioning in Multi-Stage Computer Adaptive Testing

ERIC Educational Resources Information Center

Gierl, Mark J.; Lai, Hollis; Li, Johnson

2013-01-01

The purpose of this study is to evaluate the performance of CATSIB (Computer Adaptive Testing-Simultaneous Item Bias Test) for detecting differential item functioning (DIF) when items in the matching and studied subtest are administered adaptively in the context of a realistic multi-stage adaptive test (MST). MST was simulated using a 4-item…

Reliability of a store observation tool in measuring availability of alcohol and selected foods.

PubMed

Cohen, Deborah A; Schoeff, Diane; Farley, Thomas A; Bluthenthal, Ricky; Scribner, Richard; Overton, Adrian

2007-11-01

Alcohol and food items can compromise or contribute to health, depending on the quantity and frequency with which they are consumed. How much people consume may be influenced by product availability and promotion in local retail stores. We developed and tested an observational tool to objectively measure in-store availability and promotion of alcoholic beverages and selected food items that have an impact on health. Trained observers visited 51 alcohol outlets in Los Angeles and southeastern Louisiana. Using a standardized instrument, two independent observations were conducted documenting the type of outlet, the availability and shelf space for alcoholic beverages and selected food items, the purchase price of standard brands, the placement of beer and malt liquor, and the amount of in-store alcohol advertising. Reliability of the instrument was excellent for measures of item availability, shelf space, and placement of malt liquor. Reliability was lower for alcohol advertising, beer placement, and items that measured the "least price" of apples and oranges. The average kappa was 0.87 for categorical items and the average intraclass correlation coefficient was 0.83 for continuous items. Overall, systematic observation of the availability and promotion of alcoholic beverages and food items was feasible, acceptable, and reliable. Measurement tools such as the one we evaluated should be useful in studies of the impact of availability of food and beverages on consumption and on health outcomes.
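The abstract reports an average kappa for categorical items. The abstract does not say which kappa variant was used, so this sketch assumes Cohen's kappa for two observers on a single categorical item. Kappa compares observed agreement with the agreement expected by chance from each observer's category frequencies. Averaging across items would then be a plain mean of the per-item values. The function name and category labels are illustrative assumptions.

```python
from collections import Counter


def cohens_kappa(obs1, obs2):
    """Cohen's kappa for two observers rating the same units on one categorical item."""
    if len(obs1) != len(obs2) or not obs1:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(obs1)
    # Observed agreement: share of units where both observers chose the same category.
    p_o = sum(a == b for a, b in zip(obs1, obs2)) / n
    # Chance agreement: product of each observer's marginal proportions per category.
    c1, c2 = Counter(obs1), Counter(obs2)
    p_e = sum(c1[cat] * c2[cat] for cat in set(c1) | set(c2)) / (n * n)
    if p_e == 1:
        return 1.0  # both observers used one identical category throughout
    return (p_o - p_e) / (1 - p_e)
```

For instance, with ratings `["yes", "yes", "no", "no"]` and `["yes", "no", "no", "no"]`, observed agreement is 0.75 and chance agreement is 0.5, so kappa is 0.5.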
A Stepwise Test Characteristic Curve Method to Detect Item Parameter Drift

ERIC Educational Resources Information Center

Guo, Rui; Zheng, Yi; Chang, Hua-Hua

2015-01-01

An important assumption of item response theory is item parameter invariance. Sometimes, however, item parameters are not invariant across different test administrations due to factors other than sampling error; this phenomenon is termed item parameter drift. Several methods have been developed to detect drifted items. However, most of the…
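A test characteristic curve (TCC) gives the expected number-correct score at each ability level θ. It is the sum of the item response probabilities. The sketch below is not the stepwise procedure of Guo, Zheng, and Chang, whose details are truncated here. It only illustrates the quantity that TCC-based drift checks compare. It assumes a two-parameter logistic (2PL) model without the 1.7 scaling constant. It uses a hypothetical `max_tcc_gap` helper that measures the largest difference between the TCCs implied by two sets of item estimates over a θ grid.

```python
import math


def p_2pl(theta, a, b):
    """2PL probability of a correct response (assumed model, no 1.7 constant)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def tcc(theta, items):
    """Expected number-correct score at theta; items is a list of (a, b) pairs."""
    return sum(p_2pl(theta, a, b) for a, b in items)


def max_tcc_gap(items_old, items_new, grid=None):
    """Hypothetical helper: largest absolute TCC difference over a theta grid."""
    if grid is None:
        grid = [x / 10 for x in range(-40, 41)]  # theta from -4.0 to 4.0
    return max(abs(tcc(t, items_old) - tcc(t, items_new)) for t in grid)
```

Two items with a = 1 and b = 0 give `tcc(0.0, ...) == 1.0`, because each contributes a probability of 0.5 at θ = 0. Shifting one item's b between administrations makes `max_tcc_gap` positive. That is the kind of discrepancy a drift-detection method would then judge against sampling error.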
A Study of Cavitation Erosion

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hiromu Isaka; Masatsugu Tsutsumi; Tadashi Shiraishi

2002-07-01

The authors performed an experimental study of cavitation erosion in a cylindrical orifice, motivated by a problem at the letdown orifice in a PWR (Pressurized Water Reactor). The study had two aims: 1. to obtain the critical cavitation parameter of the cylindrical orifice in order to establish design criteria for preventing cavitation erosion, and 2. to determine the erosion rate, should cavitation erosion occur, in an orifice made of precipitation-hardening stainless steel (17-4-Cu hardening type stainless steel), so as to confirm the appropriateness of the design criteria. Regarding item 1, we carried out cavitation tests to obtain the critical cavitation parameters inside and downstream of the orifice. The test results showed that the cavitation parameter at inception is independent of the length or the diameter of the orifice. Moreover, design criteria for cavitation erosion of cylindrical orifices have been established. Regarding item 2, we tested the erosion rate under high-pressure conditions. Cavitation erosion actually occurred in the cylindrical orifice during the tests and strongly resembled the erosion that occurred at the plant. Reproducing such plant-like cavitation erosion in a cylindrical orifice made of the hard material used at plants is seldom achieved. From the test results, we were able to establish criteria for preventing cavitation erosion. (authors)
The promise and challenge of including multimedia items in medical licensure examinations: some insights from an empirical trial.

PubMed

Shen, Linjun; Li, Feiming; Wattleworth, Roberta; Filipetto, Frank

2010-10-01

The Comprehensive Osteopathic Medical Licensing Examination conducted a trial of multimedia items in the 2008-2009 Level 3 testing cycle to determine (1) whether multimedia items were able to test additional elements of medical knowledge and skills and (2) how to develop effective multimedia items. Forty-four content-matched multimedia and text multiple-choice items were randomly delivered to Level 3 candidates. Logistic regression and paired-samples t tests were used for pairwise and group-level comparisons, respectively. Nine pairs showed significant differences in difficulty and/or discrimination. Content analysis found that, if text narrations were less direct, multimedia materials could make items easier. When textbook terminologies were replaced by multimedia presentations, multimedia items could become more difficult. Moreover, one multimedia item was found not to be uniformly difficult for candidates at different ability levels, possibly because the multimedia and text items tested different elements of the same concept. Multimedia items may be capable of measuring some constructs different from those that text items can measure. Effective multimedia items with reasonable psychometric properties can be intentionally developed.
A MATERIAL COST-MINIMIZATION ANALYSIS FOR HERNIA REPAIRS AND MINOR PROCEDURES DURING A SURGICAL MISSION IN THE DOMINICAN REPUBLIC

PubMed Central

Cavallo, Jaime A.; Ousley, Jenny; Barrett, Christopher D.; Baalman, Sara; Ward, Kyle; Borchardt, Malgorzata; Thomas, J. Ross; Perotti, Gary; Frisella, Margaret M.; Matthews, Brent D.

2013-01-01

INTRODUCTION Expenditures on material supplies and medications constitute the greatest per capita costs for surgical missions. We hypothesized that supply acquisition at nonprofit organization (NPO) costs would lead to significant cost-savings compared to supply acquisition at US academic institution costs from the provider perspective for hernia repairs and minor procedures during a surgical mission in the Dominican Republic (DR). METHODS Items acquired for a surgical mission were uniquely QR-coded for accurate consumption accounting. Both NPO and US academic institution unit costs were associated with each item in an electronic inventory system. Medication doses were recorded and QR-codes for consumed items were scanned into a record for each sampled procedure. Mean material costs and cost savings ± SDs were calculated in US dollars for each procedure type. Cost-minimization analyses between the NPO and the US academic institution platforms for each procedure type ensued using a two-tailed Wilcoxon matched-pairs test with α=0.05. Item utilization analyses generated lists of most frequently used materials by procedure type. RESULTS The mean cost savings of supply acquisition at NPO costs for each procedure type were as follows: $482.86 ± $683.79 for unilateral inguinal hernia repair (IHR, n=13); $332.46 ± $184.09 for bilateral inguinal hernia repair (BIHR, n=3); $127.26 ± $13.18 for hydrocelectomy (HC, n=9); $232.92 ± $56.49 for femoral hernia repair (FHR, n=3); $120.90 ± $30.51 for umbilical hernia repair (UHR, n=8); $36.59 ± $17.76 for minor procedures (MP, n=26); and $120.66 ± $14.61 for pediatric inguinal hernia repair (PIHR, n=7). CONCLUSION Supply acquisition at NPO costs leads to significant cost-savings compared to supply acquisition at US academic institution costs from the provider perspective for IHR, HC, UHR, MP, and PIHR during a surgical mission to DR. 
Item utilization analysis can generate minimum-necessary material lists for each procedure type to reproduce cost-savings for subsequent missions. PMID:24162140
Varying levels of difficulty index of skills-test items randomly selected by examinees on the Korean emergency medical technician licensing examination.

PubMed

Koh, Bongyeun; Hong, Sunggi; Kim, Soon-Sim; Hyun, Jin-Sook; Baek, Milye; Moon, Jundong; Kwon, Hayran; Kim, Gyoungyong; Min, Seonggi; Kang, Gu-Hyun

2016-01-01

The goal of this study was to characterize the difficulty index of the items in the skills test components of the class I and II Korean emergency medical technician licensing examination (KEMTLE), which requires examinees to select items randomly. The results of 1,309 class I KEMTLE examinations and 1,801 class II KEMTLE examinations in 2013 were analyzed. Items from the basic and advanced skills test sections of the KEMTLE were compared to determine whether some were significantly more difficult than others. In the class I KEMTLE, all 4 of the items on the basic skills test showed significant variation in difficulty index (P<0.01), as did 4 of the 5 items on the advanced skills test (P<0.05). In the class II KEMTLE, 4 of the 5 items on the basic skills test showed significantly different difficulty indices (P<0.01), as did all 3 of the advanced skills test items (P<0.01). In the skills test components of the class I and II KEMTLE, the procedure in which examinees randomly select questions should be revised to require examinees to respond to a set of fixed items in order to improve the reliability of the national licensing examination.
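In classical item analysis, an item's difficulty index is the proportion of examinees who pass it, so higher values mean easier items. The abstract does not name the significance test behind its P values. The sketch therefore assumes a Pearson chi-square test of homogeneity on a table of pass and fail counts per item. That is one common way to ask whether randomly assigned skills items differ in difficulty. The function names are illustrative. Obtaining a P value would still require the chi-square distribution with k - 1 degrees of freedom, for example from a statistics library.

```python
def difficulty_index(results):
    """Proportion of examinees who passed (1) rather than failed (0) an item."""
    if not results:
        raise ValueError("no results")
    return sum(results) / len(results)


def chi_square_homogeneity(pass_fail):
    """Pearson chi-square statistic for a k x 2 table of (passed, failed) counts.

    Compare the result with a chi-square distribution on k - 1 degrees of freedom.
    """
    total_pass = sum(p for p, _ in pass_fail)
    total_fail = sum(f for _, f in pass_fail)
    n = total_pass + total_fail
    stat = 0.0
    for passed, failed in pass_fail:
        row_total = passed + failed
        for observed, col_total in ((passed, total_pass), (failed, total_fail)):
            expected = row_total * col_total / n  # count expected if all items were equally hard
            stat += (observed - expected) ** 2 / expected
    return stat
```

Two items passed by 80 of 100 and 20 of 100 examinees give a statistic of 72.0 on 1 degree of freedom, which would indicate clearly unequal difficulty.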
Psychometric Properties of the Family Support Scale Adapted for African American Women with Type 2 Diabetes Mellitus.

PubMed

Littlewood, Kerry; Cummings, Doyle M; Lutes, Lesley; Solar, Chelsey

2015-01-01

The purpose of our study was two-fold: 1) adapt and test a social support measure specific to the experiences of African American women with type 2 diabetes mellitus (T2DM); 2) examine its relationship to psychosocial measures. Two hundred rural African American women with uncontrolled T2DM participating in a randomized controlled trial completed surveys at baseline on their social support, empowerment, self-care, self-efficacy, depression and diabetes distress. Exploratory factor analysis and correlation analysis were conducted to test the psychometric properties of the Dunst Family Support Scale adapted for AA women with T2DM (FSS-AA T2DM) and its relationship with other psychosocial measures. The 16 items of the FSS-AA T2DM loaded onto three distinct factors: parent and spouse/partner support, community and medical support, and extended family and friends support. Reliability for the entire scale was good (Cronbach's α = .90) and was acceptable to high across these three factors (Cronbach's α of .86, .83, and .83 respectively). All three factors were significantly correlated with self-reported empowerment, self-care, self-efficacy, depression and diabetes distress, although the pattern was different for each factor. The FSS-AA T2DM showed good concurrent validity when compared with similar items on the Diabetes Distress Scale. The FSS-AA T2DM, a 16-item scale measuring social support among rural African American women with T2DM, is internally consistent and reliable. Findings support the utility of this screening tool in this population, although additional testing is needed with other groups in additional settings.
Reliability and Validity of the TIMPSI for Infants With Spinal Muscular Atrophy Type I

PubMed Central

Krosschell, Kristin J.; Maczulski, Jo Anne; Scott, Charles; King, Wendy; Hartman, Jill T.; Case, Laura E.; Viazzo-Trussell, Donata; Wood, Janine; Roman, Carolyn A.; Hecker, Eva; Meffert, Marianne; Léveillé, Maude; Kienitz, Krista; Swoboda, Kathryn J.

2014-01-01

Purpose This study examined the reliability and validity of the Test of Infant Motor Performance Screening Items (TIMPSI) in infants with type I spinal muscular atrophy (SMA). Methods After training, 12 evaluators scored 4 videos of infants with type I SMA to assess interrater reliability. Intrarater and test-retest reliability were further assessed during an SMA type I clinical trial, in which 9 evaluators tested a total of 38 infants twice. The relationship of the TIMPSI score to the ability to reach and to ventilatory support was also examined. Results Excellent interrater video score reliability was noted (intraclass correlation coefficient, 0.97–0.98). Intrarater reliability was excellent (intraclass correlation coefficient, 0.91–0.98) and test-retest reliability ranged from r = 0.82 to r = 0.95. The TIMPSI score was related to the ability to reach (P ≤ .05). Conclusion The TIMPSI can reliably be used to assess motor function in infants with type I SMA. In addition, the TIMPSI scores are related to the ability to reach, an important functional skill in children with type I SMA. PMID:23542189
Item Analysis in Introductory Economics Testing.

ERIC Educational Resources Information Center

Tinari, Frank D.

1979-01-01

Computerized analysis of multiple choice test items is explained. Examples of item analysis applications in the introductory economics course are discussed with respect to three objectives: to evaluate learning; to improve test items; and to help improve classroom instruction. Problems, costs and benefits of the procedures are identified. (JMD)
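Computerized item analysis of multiple-choice tests typically reports, for each item, its difficulty (proportion correct) and a discrimination index. The sketch below uses the classical upper-lower groups version. It takes the proportion correct in the top-scoring group minus that in the bottom-scoring group, using 27% of examinees per group by convention. This is an assumed, generic illustration; the abstract does not specify which statistics Tinari's procedure computed. Ties in total score are broken arbitrarily by the sort.

```python
def item_statistics(item_correct, total_scores, fraction=0.27):
    """Classical item analysis for one multiple-choice item.

    item_correct: 0/1 per examinee; total_scores: test total per examinee.
    Returns (difficulty, discrimination), where discrimination is the
    upper-group minus lower-group proportion correct.
    """
    n = len(item_correct)
    if n == 0 or n != len(total_scores):
        raise ValueError("need equal-length, non-empty inputs")
    # Examinee indices ordered from lowest to highest total score.
    order = sorted(range(n), key=lambda i: total_scores[i])
    g = max(1, round(n * fraction))  # size of each extreme group
    lower, upper = order[:g], order[-g:]
    difficulty = sum(item_correct) / n
    p_upper = sum(item_correct[i] for i in upper) / g
    p_lower = sum(item_correct[i] for i in lower) / g
    return difficulty, p_upper - p_lower
```

An item answered correctly only by the two highest scorers of four examinees gives `item_statistics([0, 0, 1, 1], [1, 2, 3, 4], fraction=0.5) == (0.5, 1.0)`, the maximum possible discrimination. Values near zero or below zero flag items worth revising.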
Differential Item Functioning (DIF) among Spanish-Speaking English Language Learners (ELLs) in State Science Tests

NASA Astrophysics Data System (ADS)

Ilich, Maria O.

Psychometricians and test developers evaluate standardized tests for potential bias against groups of test-takers by using differential item functioning (DIF). English language learners (ELLs) are a diverse group of students whose native language is not English. While they are still learning the English language, they must take their standardized tests for their school subjects, including science, in English. In this study, linguistic complexity was examined as a possible source of DIF that may result in test scores that confound science knowledge with a lack of English proficiency among ELLs. Two years of fifth-grade state science tests were analyzed for evidence of DIF using two DIF methods, Simultaneous Item Bias Test (SIBTest) and logistic regression. The tests presented a unique challenge in that the test items were grouped into testlets: groups of items referring to a scientific scenario to measure knowledge of different science content or skills. Very large samples of 10,256 students in 2006 and 13,571 students in 2007 were examined. Half of each sample was composed of Spanish-speaking ELLs; the balance consisted of native English speakers. The two DIF methods were in agreement about the items that favored non-ELLs and the items that favored ELLs. Logistic regression effect sizes were all negligible, while SIBTest flagged items with low to high DIF. A decrease in socioeconomic status and Spanish-speaking ELL diversity may have led to inconsistent SIBTest effect sizes for items used in both testing years. The DIF results for the testlets suggested that ELLs lacked sufficient opportunity to learn science content. The DIF results further suggest that constructed-response test items requiring the student to draw a conclusion about a scientific investigation or to plan a new investigation tended to favor ELLs.
Optimizing data collection for public health decisions: a data mining approach

PubMed Central

2014-01-01

Background Collecting data can be cumbersome and expensive. Lack of relevant, accurate and timely data for research to inform policy may negatively impact public health. The aim of this study was to test if the careful removal of items from two community nutrition surveys guided by a data mining technique called feature selection, can (a) identify a reduced dataset, while (b) not damaging the signal inside that data. Methods The Nutrition Environment Measures Surveys for stores (NEMS-S) and restaurants (NEMS-R) were completed on 885 retail food outlets in two counties in West Virginia between May and November of 2011. A reduced dataset was identified for each outlet type using feature selection. Coefficients from linear regression modeling were used to weight items in the reduced datasets. Weighted item values were summed with the error term to compute reduced item survey scores. Scores produced by the full survey were compared to the reduced item scores using a Wilcoxon rank-sum test. Results Feature selection identified 9 store and 16 restaurant survey items as significant predictors of the score produced from the full survey. The linear regression models built from the reduced feature sets had R2 values of 92% and 94% for restaurant and grocery store data, respectively. Conclusions While there are many potentially important variables in any domain, the most useful set may only be a small subset. The use of feature selection in the initial phase of data collection to identify the most influential variables may be a useful tool to greatly reduce the amount of data needed thereby reducing cost. PMID:24919484
Optimizing data collection for public health decisions: a data mining approach.

PubMed

Partington, Susan N; Papakroni, Vasil; Menzies, Tim

2014-06-12

Collecting data can be cumbersome and expensive. Lack of relevant, accurate and timely data for research to inform policy may negatively impact public health. The aim of this study was to test if the careful removal of items from two community nutrition surveys guided by a data mining technique called feature selection, can (a) identify a reduced dataset, while (b) not damaging the signal inside that data. The Nutrition Environment Measures Surveys for stores (NEMS-S) and restaurants (NEMS-R) were completed on 885 retail food outlets in two counties in West Virginia between May and November of 2011. A reduced dataset was identified for each outlet type using feature selection. Coefficients from linear regression modeling were used to weight items in the reduced datasets. Weighted item values were summed with the error term to compute reduced item survey scores. Scores produced by the full survey were compared to the reduced item scores using a Wilcoxon rank-sum test. Feature selection identified 9 store and 16 restaurant survey items as significant predictors of the score produced from the full survey. The linear regression models built from the reduced feature sets had R2 values of 92% and 94% for restaurant and grocery store data, respectively. While there are many potentially important variables in any domain, the most useful set may only be a small subset. The use of feature selection in the initial phase of data collection to identify the most influential variables may be a useful tool to greatly reduce the amount of data needed thereby reducing cost.
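The scoring step of the reduced surveys multiplies each retained item by its regression coefficient and adds a constant. A minimal sketch follows. It assumes the "error term" in the abstract refers to the regression constant (intercept), and that the weights and intercept come from an already fitted linear model. Fitting that model and running the feature selection are outside this sketch. The item names and numbers in the example are hypothetical.

```python
def reduced_score(item_values, weights, intercept):
    """Estimate the full-survey score from a reduced item subset.

    item_values: observed item values for one outlet (extra items are ignored).
    weights: regression coefficient per retained item (hypothetical fitted values).
    intercept: regression constant (assumed meaning of the abstract's "error term").
    """
    missing = set(weights) - set(item_values)
    if missing:
        raise ValueError(f"missing retained items: {sorted(missing)}")
    return intercept + sum(weights[name] * item_values[name] for name in weights)
```

With hypothetical weights `{"item_a": 2.0, "item_b": 0.5}`, an intercept of 1.0, and observed values `{"item_a": 3, "item_b": 4, "item_c": 7}`, the estimated score is 1.0 + 6.0 + 2.0 = 9.0. `item_c` does not contribute because feature selection did not retain it.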
Development and evaluation of a thermochemistry concept inventory for college-level general chemistry

NASA Astrophysics Data System (ADS)

Wren, David A.

The research presented in this dissertation culminated in a 10-item Thermochemistry Concept Inventory (TCI). The development of the TCI can be divided into two main phases: qualitative studies and quantitative studies. Both phases focused on the primary stakeholders of the TCI, college-level general chemistry instructors and students. Each phase was designed to collect evidence for the validity of the interpretations and uses of TCI testing data. A central use of TCI testing data is to identify student conceptual misunderstandings, which are represented as incorrect options of multiple-choice TCI items. Therefore, quantitative and qualitative studies focused heavily on collecting evidence at the item level, where important interpretations may be made by TCI users. Qualitative studies included student interviews (N = 28) and online expert surveys (N = 30). Think-aloud student interviews (N = 12) were used to identify conceptual misunderstandings held by students. Novice response process validity interviews (N = 16) helped provide information on how students interpreted and answered TCI items and were the basis of item revisions. Practicing general chemistry instructors (N = 18), or experts, defined the boundaries of thermochemistry content included on the TCI. Once TCI items were in the later stages of development, an online version of the TCI was used in an expert response process validity survey (N = 12) to provide expert feedback on item content, format and consensus on the correct answer for each item. Quantitative studies included three phases: beta testing of TCI items (N = 280), pilot testing of a 12-item TCI (N = 485), and a large data collection using a 10-item TCI (N = 1,331). In addition to traditional classical test theory analysis, Rasch model analysis was also used to evaluate testing data at the test and item levels.
The TCI was administered in both formative assessment (beta and pilot testing) and summative assessment (large data collection), with items performing well in both. One item, item K, did not have acceptable psychometric properties when the TCI was used as a quiz (summative assessment), but was retained in the final version of the TCI based on the acceptable psychometric properties displayed in pilot testing (formative assessment).
Examining the Impact of Drifted Polytomous Anchor Items on Test Characteristic Curve (TCC) Linking and IRT True Score Equating. Research Report. ETS RR-12-09

ERIC Educational Resources Information Center

Li, Yanmei

2012-01-01

In a common-item (anchor) equating design, the common items should be evaluated for item parameter drift. Drifted items are often removed. For a test that contains mostly dichotomous items and only a small number of polytomous items, removing some drifted polytomous anchor items may result in anchor sets that no longer resemble mini-versions of…
A Bayesian Method for the Detection of Item Preknowledge in CAT. Law School Admission Council Computerized Testing Report. LSAC Research Report Series.

ERIC Educational Resources Information Center

McLeod, Lori D.; Lewis, Charles; Thissen, David.

With the increased use of computerized adaptive testing, which allows for continuous testing, new concerns about test security have evolved, one being the assurance that items in an item pool are safeguarded from theft. In this paper, the risk of score inflation and procedures to detect test takers using item preknowledge are explored. When test…
Using Item-Type Performance Covariance to Improve the Skill Model of an Existing Tutor

ERIC Educational Resources Information Center

Pavlik, Philip I., Jr.; Cen, Hao; Wu, Lili; Koedinger, Kenneth R.

2008-01-01

Using data from an existing pre-algebra computer-based tutor, we analyzed the covariance of item-types with the goal of describing a more effective way to assign skill labels to item-types. Analyzing covariance is important because it allows us to place the skills in a related network in which we can identify the role each skill plays in learning…
Communication preferences of chronically ill adolescents: development of an assessment instrument.

PubMed

Klosinski, Matthias G; Farin, Erik

2015-09-01

The purpose of this study was to develop and psychometrically test a patient-oriented, theory-based questionnaire to capture the communication preferences of chronically ill adolescents in provider-patient interaction. In a qualitative prestudy, patients were asked to express their preferences in focus groups. From those results and relying on previous research findings, we generated questionnaire items and, in a second pretest, examined them in 1-to-1 cognitive interviews for comprehensibility and acceptance. The resultant questionnaire was then psychometrically tested in the main study on 423 chronically ill inpatient adolescents aged 12 to 17 years in 14 rehabilitation clinics in Germany. Numerous preferences were extractable from the focus-group interviews and transferred into 106 items. Psychometric testing of the questionnaire resulted in 3 scales encompassing 27 items. These we describe as the emotional-affective communication component (EAC), instrumental communication component (IC), and adolescent-specific communication component (ASC). Confirmatory factor analysis showed the unidimensionality of the EAC and IC scales to be good to very good, and that of the ASC scale to be satisfactory. The participants gave the questionnaire high marks for comprehensibility, acceptance, and relevance. Cronbach's alpha for the 3 scales ranged from .78 to .92. A questionnaire with 27 items is now available for application as a psychometrically tested and simple-to-use measuring instrument. Research is still needed concerning the generalizability to other patient groups (e.g., the acutely ill or outpatients) and whether it can be tailored for use by different types of care providers or to accommodate the communication preferences of parents. (c) 2015 APA, all rights reserved.
Developing a fluid intelligence scale through a combination of Rasch modeling and cognitive psychology.

PubMed

Primi, Ricardo

2014-09-01

Ability testing has been criticized because understanding of the construct being assessed is incomplete and because testing has not yet been satisfactorily improved in accordance with new knowledge from cognitive psychology. This article contributes to solving this problem by applying item response theory and Susan Embretson's cognitive design system for test development to the construction of a fluid intelligence scale. This study is based on findings from cognitive psychology; instead of focusing on the development of a test, it focuses on the definition of a variable for the creation of a criterion-referenced measure for fluid intelligence. A geometric matrix item bank with 26 items was analyzed with data from 2,797 undergraduate students. The main result was a criterion-referenced scale based on information from item features linked to cognitive components, such as storage capacity, goal management, and abstraction; this information was used to create descriptions of selected levels of a fluid intelligence scale. According to the proposed scale, levels of fluid intelligence range from the ability to solve problems containing a limited number of bits of information with obvious relationships to the ability to solve problems that involve abstract relationships under conditions confounded with information overload and distraction by mixed noise. This scale can be employed in future research to interpret measurements in terms of the cognitive processes mastered and the types of difficulty experienced by examinees. PsycINFO Database Record (c) 2014 APA, all rights reserved.
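Because the Rasch model places item difficulty b and person ability theta on the same logit scale, item features linked to difficulty can be used to describe what examinees at a given level can do. A minimal sketch of the dichotomous Rasch response function, P(X = 1) = exp(theta - b) / (1 + exp(theta - b)); the difficulties and ability below are hypothetical, not estimates from the 26-item bank:

```python
import math


def rasch_probability(theta: float, b: float) -> float:
    """P(X = 1 | theta, b) under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def rasch_information(theta: float, b: float) -> float:
    """Fisher information of one Rasch item: P * (1 - P), maximal at theta = b."""
    p = rasch_probability(theta, b)
    return p * (1.0 - p)


# Hypothetical item difficulties (logits), e.g. matrices of increasing complexity.
difficulties = [-1.5, 0.0, 1.2]
theta = 0.5  # hypothetical examinee ability
for b in difficulties:
    print(f"b = {b:+.1f}: P(correct) = {rasch_probability(theta, b):.2f}")
```

An examinee located at theta is likely to succeed on items with b well below theta and to fail on items with b well above it, which is what lets item features describe scale levels.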
[A study of behavior patterns between smokers and nonsmokers].

PubMed

Kim, H S

1990-04-01

Clinical and epidemiologic studies of coronary heart disease (CHD) over the last three decades have repeatedly found associations between the prevalence of CHD and both behavioral attributes and cigarette smoking. The main purpose of this study was to help reduce a major risk factor for coronary heart disease through smoking prohibition and control of behavior pattern. The subjects were 120 smokers and 90 nonsmokers, all married men older than 30 years working in offices. The office workers were surveyed by questionnaire from September 26 through October 6, 1989. The instrument used in this study was a self-administered 59-item measurement tool adapted from the Jenkins Activity Survey (JAS). Data were analyzed with the SAS (Statistical Analysis System) program on a personal computer, using frequencies, chi-square tests, t-tests, ANOVA, and Pearson correlation coefficients. Fifteen items with factor loadings above 0.3 in the factor analysis were selected. In the first factor analysis, 19 factors were extracted, accounting for 86% of the total variance. When the number of factors was limited to 3 to reproduce the Jenkins classification, three factors were derived: Job-Involvement, Speed & Impatience, and Hard-Driving, comprising 21, 21, and 9 items, respectively. The results of this study were as follows: 1. Smokers and nonsmokers differed significantly in Job-Involvement (t = 5.7147, p < .0001), Speed & Impatience (t = 4.6756, p < .0001), Hard-Driving (t = 8.0822, p < .0001), and total type A behavior pattern (t = 8.1224, p < .0001). 2. Type A behavior pattern scores did not differ significantly by the number of cigarettes smoked daily. 3. Type A behavior pattern scores did not differ significantly by the duration of smoking.
It was concluded that type A behavior pattern scores differed significantly between smokers and nonsmokers, whereas the number of cigarettes smoked daily and the duration of smoking were not associated with significant differences. Therefore, appropriate nursing interventions addressing type A behavior pattern are needed to strengthen the educational effect of smoking-prohibition efforts.
Measuring Nurses' Value, Implementation, and Knowledge of Evidence-Based Practice: Further Psychometric Testing of the Quick-EBP-VIK Survey.

PubMed

Connor, Linda; Paul, Fiona; McCabe, Margaret; Ziniel, Sonja

2017-02-01

The Quick-EBP-VIK is a new instrument for measuring nurses' value, implementation, and knowledge of EBP. Psychometric testing was conducted in two parts. Part 1 describes the tool development and validity testing, which resulted in a 25-item survey after receiving Item-Level Content Validity Index values of ≥0.80 for both clarity and relevance. Part 2 describes the psychometric testing needed to assess additional types of validity and reliability. The purpose of this paper is to further describe the psychometric testing of the Quick-EBP-VIK survey instrument. This descriptive study was designed to assess test-retest reliability, internal consistency, and construct validity via a web-based survey. The survey instrument was e-mailed to all nurses at the study hospital. Nurses who responded to the first survey (Wave 1) received another e-mail invitation to complete the survey instrument again (Wave 2) so that the test-retest reliability of the instrument could be assessed. A total of 1,177 deliverable e-mails were sent to all nursing staff at one freestanding pediatric hospital with Magnet® designation in the northeast. A total of 382 nurses returned completed surveys, a 32.5% response rate for Wave 1. A total of 131 nurses responded to Wave 2, a response rate of 34.3%. The intraclass correlation coefficients for the items included in the final instrument ranged from 0.43 to 0.80 and were deemed sufficient. The Cronbach's alpha values for each of the three domains all exceeded 0.7, indicating that the items within each measurement dimension are internally consistent. However, the composite reliability of the third domain was slightly lower than 0.7 when using Raykov's rho. The Quick-EBP-VIK instrument has gone through rigorous, comprehensive testing and has demonstrated good psychometric properties. © 2016 Sigma Theta Tau International.
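The reported response rates can be checked directly from the counts in the abstract, and the I-CVI is commonly computed as the proportion of expert raters who judge an item relevant. A minimal Python sketch, assuming the widespread convention of a 4-point rating scale on which ratings of 3 or 4 count as endorsement (the abstract does not state the scale used); the expert ratings are hypothetical:

```python
def item_cvi(ratings: list[int]) -> float:
    """Item-level CVI: share of experts rating the item 3 or 4 on a 4-point scale."""
    return sum(1 for r in ratings if r >= 3) / len(ratings)


def response_rate(returned: int, sent: int) -> float:
    """Response rate in percent, rounded to one decimal place."""
    return round(100 * returned / sent, 1)


# Hypothetical panel of five experts rating one item for relevance.
print(item_cvi([4, 3, 4, 2, 4]))  # prints 0.8
# Counts from the abstract: Wave 1 relative to deliverable e-mails,
# Wave 2 relative to Wave 1 respondents.
print(response_rate(382, 1177), response_rate(131, 382))  # prints 32.5 34.3
```

Note that the Wave 2 rate is computed against the 382 Wave 1 respondents, not the 1,177 e-mails sent.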

Item Response Theory Models for Performance Decline during Testing

ERIC Educational Resources Information Center

Jin, Kuan-Yu; Wang, Wen-Chung

2014-01-01

Sometimes, test-takers may not be able to attempt all items to the best of their ability (with full effort) due to personal factors (e.g., low motivation) or testing conditions (e.g., time limit), resulting in poor performance on certain items, especially those located toward the end of a test. Standard item response theory (IRT) models fail to…
Differential item functioning analysis of the Vanderbilt Expertise Test for cars.

PubMed

Lee, Woo-Yeol; Cho, Sun-Joo; McGugin, Rankin W; Van Gulick, Ana Beth; Gauthier, Isabel

2015-01-01

The Vanderbilt Expertise Test for cars (VETcar) is a test of visual learning for contemporary car models. We used item response theory to assess the VETcar and in particular used differential item functioning (DIF) analysis to ask if the test functions the same way in laboratory versus online settings and for different groups based on age and gender. An exploratory factor analysis found evidence of multidimensionality in the VETcar, although a single dimension was deemed sufficient to capture the recognition ability measured by the test. We selected a unidimensional three-parameter logistic item response model to examine item characteristics and subject abilities. The VETcar had satisfactory internal consistency. A substantial number of items showed DIF at a medium effect size for test setting and for age group, whereas gender DIF was negligible. Because online subjects were on average older than those tested in the lab, we focused on the age groups to conduct a multigroup item response theory analysis. This revealed that most items on the test favored the younger group. DIF could be more the rule than the exception when measuring performance with familiar object categories, therefore posing a challenge for the measurement of either domain-general visual abilities or category-specific knowledge.
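The three-parameter logistic (3PL) model used for the VETcar adds a lower asymptote c (pseudo-guessing) to the discrimination a and difficulty b. DIF means that, for examinees of equal ability theta, an item's parameters, and hence its response probabilities, differ across groups such as test setting or age. A minimal sketch; all parameter values are hypothetical, and the scaling constant D (often set to 1 or 1.702) is an assumption, since the abstract does not state it:

```python
import math


def three_pl(theta: float, a: float, b: float, c: float, D: float = 1.0) -> float:
    """3PL IRF: c + (1 - c) / (1 + exp(-D * a * (theta - b)))."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


# Hypothetical uniform DIF: same a and c, but the item is harder (larger b)
# for the focal group, so equally able examinees get different probabilities.
theta = 0.0
p_ref = three_pl(theta, a=1.2, b=-0.2, c=0.2)
p_focal = three_pl(theta, a=1.2, b=0.4, c=0.2)
print(f"reference {p_ref:.3f} vs focal {p_focal:.3f}")
```

As theta decreases, the probability approaches c rather than 0, which is why the 3PL suits multiple-choice recognition items where guessing is possible.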
Samejima Items in Multiple-Choice Tests: Identification and Implications

ERIC Educational Resources Information Center

Rahman, Nazia

2013-01-01

Samejima hypothesized that non-monotonically increasing item response functions (IRFs) of ability might occur for multiple-choice items (referred to here as "Samejima items") if low ability test takers with some, though incomplete, knowledge or skill are drawn to a particularly attractive distractor, while very low ability test takers…
Computerized Numerical Control Test Item Bank.

ERIC Educational Resources Information Center

Reneau, Fred; And Others

This guide contains 285 test items for use in teaching a course in computerized numerical control. All test items were reviewed, revised, and validated by incumbent workers and subject matter instructors. Items are provided for assessing student achievement in such aspects of programming and planning, setting up, and operating machines with…
EPRI/SCE testing and evaluation of electric work vehicles: Jet 500, Volkswagen Type 2, DAUG Type GM2, and Battronic Minivan

DOE Office of Scientific and Technical Information (OSTI.GOV)

McCluskey, R.K.; Arias, J.L.

1979-12-01

During the first 11 months of the EPRI/SCE Electric Vehicle Project, four electric vehicles (EVs) were tested and evaluated: the Jet Industries Electra-Van Model 500, the Volkswagen (VW) Type 2 Electrotransporter, a VW Type GM2 Transporter with DAUG electric drive, and the Battronic Minivan. The project emphasized road-testing of these vehicles to acquire data on their useful driving range, performance, and reliability. Each vehicle was driven more than 1000 miles along SCE-selected test routes to determine the effects of different terrains (level, slight grades, and steep grades), traffic conditions (one, two, three, and four stops/mile and freeway), and payload. The vehicle component failures that occurred during testing are itemized and described briefly, and assessments of expected field reliability are made. Other vehicle characteristics and measurements of interest are presented. The data base on these test vehicles is intended to provide the reader an overview of the real world performance that can be expected from present-day state-of-the-art EVs.
Using a Linear Regression Method to Detect Outliers in IRT Common Item Equating

ERIC Educational Resources Information Center

He, Yong; Cui, Zhongmin; Fang, Yu; Chen, Hanwei

2013-01-01

Common test items play an important role in equating alternate test forms under the common item nonequivalent groups design. When the item response theory (IRT) method is applied in equating, inconsistent item parameter estimates among common items can lead to large bias in equated scores. It is prudent to evaluate inconsistency in parameter…
Robust Scale Transformation Methods in IRT True Score Equating under Common-Item Nonequivalent Groups Design

ERIC Educational Resources Information Center

He, Yong

2013-01-01

Common test items play an important role in equating multiple test forms under the common-item nonequivalent groups design. Inconsistent item parameter estimates among common items can lead to large bias in equated scores for IRT true score equating. Current methods extensively focus on detection and elimination of outlying common items, which…
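Under the common-item nonequivalent groups design, IRT scales are often linked with a linear transformation theta* = A * theta + B estimated from the common items' difficulty estimates; the mean-sigma method is a standard moment-based choice. Because means and standard deviations are sensitive to outlying items, the sketch also shows a resistant analogue built on medians and median absolute deviations (MAD). This is an illustration only, with hypothetical data; it is not claimed to be one of the robust methods evaluated in the study:

```python
from statistics import mean, median, pstdev


def mean_sigma_constants(b_source, b_target):
    """Linking constants (A, B) so that b_target is approx. A * b_source + B."""
    A = pstdev(b_target) / pstdev(b_source)
    B = mean(b_target) - A * mean(b_source)
    return A, B


def median_mad_constants(b_source, b_target):
    """Resistant analogue using medians and median absolute deviations."""
    def mad(xs):
        m = median(xs)
        return median(abs(x - m) for x in xs)

    A = mad(b_target) / mad(b_source)
    B = median(b_target) - A * median(b_source)
    return A, B


# Hypothetical common-item difficulties on two forms (target = 2 * source + 1).
src = [-1.0, 0.0, 0.5, 1.5]
tgt = [2 * b + 1 for b in src]
print(mean_sigma_constants(src, tgt), median_mad_constants(src, tgt))
```

Given (A, B), source-scale parameters convert as b* = A * b + B, a* = a / A, and theta* = A * theta + B.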
Using Differential Item Functioning Procedures to Explore Sources of Item Difficulty and Group Performance Characteristics.

ERIC Educational Resources Information Center

Scheuneman, Janice Dowd; Gerritz, Kalle

1990-01-01

Differential item functioning (DIF) methodology for revealing sources of item difficulty and performance characteristics of different groups was explored. A total of 150 Scholastic Aptitude Test items and 132 Graduate Record Examination general test items were analyzed. DIF was evaluated for males and females and Blacks and Whites. (SLD)
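The abstract does not say which DIF procedure was used. As an illustration only, the Mantel-Haenszel procedure, a widely used DIF method, compares reference and focal groups on an item within strata of examinees matched on total score, and ETS reports its common odds ratio on the delta scale as MH D-DIF = -2.35 ln(alpha_MH). A minimal sketch with hypothetical counts:

```python
import math


def mantel_haenszel_dif(strata):
    """Mantel-Haenszel common odds ratio and MH D-DIF for one item.

    strata: iterable of (ref_right, ref_wrong, focal_right, focal_wrong)
    counts, one tuple per matched total-score level.
    """
    num = den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # an empty score level contributes nothing
        num += a * d / n
        den += b * c / n
    alpha = num / den
    # Negative MH D-DIF means the item favors the reference group.
    return alpha, -2.35 * math.log(alpha)


# Hypothetical counts at three matched score levels.
strata = [(20, 10, 15, 15), (30, 5, 25, 10), (12, 2, 10, 4)]
alpha, d_dif = mantel_haenszel_dif(strata)
print(f"alpha_MH = {alpha:.2f}, MH D-DIF = {d_dif:.2f}")
```

An alpha_MH of 1 (MH D-DIF of 0) means that, within each matched score level, the two groups have the same odds of answering correctly.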
Item Structural Properties as Predictors of Item Difficulty and Item Association.

ERIC Educational Resources Information Center

Solano-Flores, Guillermo

1993-01-01

Studied the ability of logical test design (LTD) to predict student performance in reading Roman numerals for 211 sixth graders in Mexico City tested on Roman numeral items varying on LTD-related and non-LTD-related variables. The LTD-related variable item iterativity was found to be the best predictor of item difficulty. (SLD)
Investigating Item Exposure Control Methods in Computerized Adaptive Testing

ERIC Educational Resources Information Center

Ozturk, Nagihan Boztunc; Dogan, Nuri

2015-01-01

This study aims to investigate the effects of item exposure control methods on measurement precision and on test security under various item selection methods and item pool characteristics. In this study, the Randomesque (with item group sizes of 5 and 10), Sympson-Hetter, and Fade-Away methods were used as item exposure control methods. Moreover,…
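The Randomesque method named above limits exposure by choosing at random among the k most informative items at the current ability estimate, rather than always administering the single best item (the study used group sizes of 5 and 10). A minimal sketch, assuming a 2PL pool with D = 1 and maximum-information selection; the pool, ability value, and function names are hypothetical:

```python
import math
import random


def info_2pl(theta: float, a: float, b: float) -> float:
    """Fisher information of a 2PL item (D = 1): a^2 * P * (1 - P)."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)


def randomesque_select(theta, pool, administered, k=5, rng=None):
    """Pick at random among the k most informative unadministered items.

    pool: list of (a, b) parameter pairs; administered: set of item indices.
    """
    rng = rng or random.Random()
    available = [i for i in range(len(pool)) if i not in administered]
    available.sort(key=lambda i: info_2pl(theta, *pool[i]), reverse=True)
    return rng.choice(available[:k])


# Hypothetical five-item pool of (a, b) pairs.
pool = [(1.0, 0.0), (1.0, 2.0), (1.0, -2.0), (2.0, 0.0), (0.5, 0.0)]
rng = random.Random(42)
print(randomesque_select(0.0, pool, administered=set(), k=3, rng=rng))
```

A larger k spreads exposure more evenly across the pool at some cost in measurement precision, which is the trade-off between precision and test security that the study examines.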
Detecting Differential Item Discrimination (DID) and the Consequences of Ignoring DID in Multilevel Item Response Models

ERIC Educational Resources Information Center

Lee, Woo-yeol; Cho, Sun-Joo

2017-01-01

Cross-level invariance in a multilevel item response model can be investigated by testing whether the within-level item discriminations are equal to the between-level item discriminations. Testing the cross-level invariance assumption is important to understand constructs in multilevel data. However, in most multilevel item response model…
Item Pool Design for an Operational Variable-Length Computerized Adaptive Test

ERIC Educational Resources Information Center

He, Wei; Reckase, Mark D.

2014-01-01

For computerized adaptive tests (CATs) to work well, they must have an item pool with sufficient numbers of good quality items. Many researchers have pointed out that, in developing item pools for CATs, not only is the item pool size important but also the distribution of item parameters and practical considerations such as content distribution…
Animates are better remembered than inanimates: further evidence from word and picture stimuli.

PubMed

Bonin, Patrick; Gelin, Margaux; Bugaiska, Aurélia

2014-04-01

In three experiments, we showed that animate entities are remembered better than inanimate entities. Experiment 1 revealed better recall for words denoting animate than inanimate items. Experiment 2 replicated this finding with the use of pictures. In Experiment 3, we found better recognition for animate than for inanimate words. Importantly, we also found a higher recall rate of “remember” than of “know” responses for animates, whereas the recall rates were similar for the two types of responses for inanimate items. This finding suggests that animacy enhances not only the quantity but also the quality of memory traces, through the recall of contextual details of previous experiences (i.e., episodic memory). Finally, in Experiment 4, we tested whether the animacy effect was due to animate items being richer in terms of sensory features than inanimate items. The findings provide further evidence for the functionalist view of memory championed by Nairne and coworkers (Nairne, 2010; Nairne & Pandeirada, Cognitive Psychology, 61:1–22, 2010a, 2010b).
The Costs and Benefits of Testing and Guessing on Recognition Memory

ERIC Educational Resources Information Center

Huff, Mark J.; Balota, David A.; Hutchison, Keith A.

2016-01-01

We examined whether 2 types of interpolated tasks (i.e., retrieval-practice via free recall or guessing a missing critical item) improved final recognition for related and unrelated word lists relative to restudying or completing a filler task. Both retrieval-practice and guessing tasks improved correct recognition relative to restudy and filler…
Performance of the S-χ² Statistic for Full-Information Bifactor Models

ERIC Educational Resources Information Center

Li, Ying; Rupp, Andre A.

2011-01-01

This study investigated the Type I error rate and power of the multivariate extension of the S-χ² statistic using unidimensional and multidimensional item response theory (UIRT and MIRT, respectively) models as well as full-information bifactor (FI-bifactor) models through simulation. Manipulated factors included test length, sample…
Effects of Presentation Mode on Veridical and False Memory in Individuals with Intellectual Disability

ERIC Educational Resources Information Center

Carlin, Michael; Toglia, Michael P.; Belmonte, Colleen; DiMeglio, Chiara

2012-01-01

In the present study the effects of visual, auditory, and audio-visual presentation formats on memory for thematically constructed lists were assessed in individuals with intellectual disability and mental age-matched children. The auditory recognition test included target items, unrelated foils, and two types of semantic lures: critical related…
Connotative Meaning of Disability Labels under Standard and Ambiguous Test Conditions.

ERIC Educational Resources Information Center

Semmel, Melvyn I.

At the George Peabody College for Teachers, Nashville, Tennessee, 50 male students responded to a questionnaire concerning their reactions to individuals having mental or physical disabilities, to persons of another race, and to gifted persons. The 20 questions (scale items) focused on association with 12 types of "disabled" persons (disability…
Standardized Testing in Physics via the World Wide Web.

ERIC Educational Resources Information Center

MacIsaac, Dan; Cole, Rebecca Pollard; Cole, David M.; McCullough, Laura; Maxka, Jim

2002-01-01

Examines the differences in paper-based and web-based administrations of a commonly used assessment instrument, the Force Concept Inventory (FCI). Results demonstrated no appreciable difference on FCI scores or FCI items based on the type of administration. Concludes that the web-based administration of the FCI appears to be as efficacious as the…
A photographic method to measure food item intake. Validation in geriatric institutions.

PubMed

Pouyet, Virginie; Cuvelier, Gérard; Benattar, Linda; Giboreau, Agnès

2015-01-01

From both a clinical and research perspective, measuring food intake is an important issue in geriatric institutions. However, weighing food in this context can be complex, particularly when the items remaining on a plate (side dish, meat or fish and sauce) need to be weighed separately following consumption. A method based on photography that involves taking photographs after a meal to determine food intake consequently seems to be a good alternative. This method enables the storage of raw data so that unhurried analyses can be performed to distinguish the food items present in the images. Therefore, the aim of this paper was to validate a photographic method to measure food intake in terms of differentiating food item intake in the context of a geriatric institution. Sixty-six elderly residents took part in this study, which was performed in four French nursing homes. Four dishes of standardized portions were offered to the residents during 16 different lunchtimes. Three non-trained assessors then independently estimated both the total and specific food item intakes of the participants using images of their plates taken after the meal (photographic method) and a reference image of one plate taken before the meal. Total food intakes were also recorded by weighing the food. To test the reliability of the photographic method, agreements between different assessors and agreements among various estimates made by the same assessor were evaluated. To test the accuracy and specificity of this method, food intake estimates for the four dishes were compared with the food intakes determined using the weighed food method. To illustrate the added value of the photographic method, food consumption differences between the dishes were explained by investigating the intakes of specific food items. 
Although the assessors were not specifically trained for this purpose, the results demonstrated that their estimates agreed both between assessors and among repeated estimates made by the same assessor. The results also revealed that the accuracy of this method did not depend on the type of food studied; thus, the photographic method was not specific to a particular food type. Finally, the photographic method provided more detailed data because it allowed differentiation between food item intakes. These findings clearly suggest that the photographic method is a valid and useful method to measure food intake in geriatric institutions. Copyright © 2014 Elsevier Ltd. All rights reserved.
Analyzing Item Generation with Natural Language Processing Tools for the "TOEIC"® Listening Test. Research Report. ETS RR-17-52

ERIC Educational Resources Information Center

Yoon, Su-Youn; Lee, Chong Min; Houghton, Patrick; Lopez, Melissa; Sakano, Jennifer; Loukina, Anastasia; Krovetz, Bob; Lu, Chi; Madani, Nitin

2017-01-01

In this study, we developed assistive tools and resources to support TOEIC® Listening test item generation. There has recently been an increased need for a large pool of items for these tests. This need has, in turn, inspired efforts to increase the efficiency of item generation while maintaining the quality of the created items. We aimed to…

Item development process and analysis of 50 case-based items for implementation on the Korean Nursing Licensing Examination.

PubMed

Park, In Sook; Suh, Yeon Ok; Park, Hae Sook; Kang, So Young; Kim, Kwang Sung; Kim, Gyung Hee; Choi, Yeon-Hee; Kim, Hyun-Ju

2017-01-01

The purpose of this study was to improve the quality of items on the Korean Nursing Licensing Examination by developing and evaluating case-based items that reflect integrated nursing knowledge. We conducted a cross-sectional observational study to develop new case-based items. The methods for developing test items included expert workshops, brainstorming, and verification of content validity. After administering a mock examination with the newly developed case-based items to undergraduate nursing students, we evaluated the appropriateness of the items through classical test theory and item response theory. A total of 50 case-based items were developed for the mock examination, and their content validity was evaluated. The items integrated 34 discrete elements of integrated nursing knowledge. The mock examination was taken by 741 baccalaureate students in their fourth year of study at 13 universities. Their average score on the mock examination was 57.4, and the examination showed a reliability of 0.40. According to classical test theory, the average item difficulty was 57.4% (80%-100% for 12 items; 60%-80% for 13 items; and less than 60% for 25 items). The mean discrimination index was 0.19; it was above 0.30 for 11 items and 0.20 to 0.29 for 15 items. According to item response theory, the item discrimination parameter (in the logistic model) was none for 10 items (0.00), very low for 20 items (0.01 to 0.34), low for 12 items (0.35 to 0.64), moderate for 6 items (0.65 to 1.34), high for 1 item (1.35 to 1.69), and very high for 1 item (above 1.70). The item difficulty was very easy for 24 items (below -2.0), easy for 8 items (-2.0 to -0.5), medium for 6 items (-0.5 to 0.5), hard for 3 items (0.5 to 2.0), and very hard for 9 items (2.0 or above). The goodness-of-fit test for the 2-parameter item response model over the range of 2.0 to 0.5 revealed that 12 items had an ideal correct answer rate.
We surmised that the low reliability of the mock examination was influenced by the timing of the test for the examinees and the inappropriate difficulty of the items. Our study suggested a methodology for the development of future case-based items for the Korean Nursing Licensing Examination.
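In classical test theory, the difficulty index is the proportion of examinees answering an item correctly (so a higher value means an easier item, as in the 80%-100% band above), and a common discrimination index is the difference in proportion correct between high- and low-scoring groups. A minimal sketch, assuming dichotomously scored responses and the conventional upper and lower 27% groups; the abstract does not state how its groups were formed, and the data are hypothetical:

```python
def item_difficulty(responses: list[list[int]], item: int) -> float:
    """Proportion of examinees answering the item correctly (0/1 scoring)."""
    return sum(r[item] for r in responses) / len(responses)


def discrimination_index(responses: list[list[int]], item: int,
                         fraction: float = 0.27) -> float:
    """Upper-group minus lower-group proportion correct, groups by total score."""
    ranked = sorted(responses, key=sum, reverse=True)
    n = max(1, round(fraction * len(ranked)))
    upper, lower = ranked[:n], ranked[-n:]
    return (sum(r[item] for r in upper) - sum(r[item] for r in lower)) / n


# Hypothetical 0/1 responses: 4 examinees x 3 items.
responses = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(item_difficulty(responses, 0))            # 0.75
print(discrimination_index(responses, 1, 0.5))  # 1.0
```

Items with a discrimination index below about 0.20 separate strong and weak examinees poorly, which is consistent with the low mean index (0.19) and the low reliability reported above.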
The beneficial effect of testing: an event-related potential study

PubMed Central

Bai, Cheng-Hua; Bridger, Emma K.; Zimmer, Hubert D.; Mecklinger, Axel

2015-01-01

The enhanced memory performance for items that are tested as compared to being restudied (the testing effect) is a frequently reported memory phenomenon. According to the episodic context account of the testing effect, this beneficial effect of testing is related to a process which reinstates the previously learnt episodic information. Few studies have explored the neural correlates of this effect at the time point when testing takes place, however. In this study, we utilized the ERP correlates of successful memory encoding to address this issue, hypothesizing that if the benefit of testing is due to retrieval-related processes at test, then subsequent memory effects (SMEs) should resemble the ERP correlates of retrieval-based processing in their temporal and spatial characteristics. Participants were asked to learn Swahili-German word pairs before items were presented in either a testing or a restudy condition. Memory performance was assessed immediately and 1 day later with a cued recall task. Successfully recalling items at test increased the likelihood that items were remembered over time compared to items which were only restudied. An ERP subsequent memory contrast (later remembered vs. later forgotten tested items), which reflects the engagement of processes that ensure items are recallable the next day, was topographically comparable with the ERP correlate of immediate recollection (immediately remembered vs. immediately forgotten tested items). This result shows that the processes which allow items to be more memorable over time share qualitatively similar neural correlates with the processes that relate to successful retrieval at test. This finding supports the notion that testing benefits memory performance over time more than restudying does because it engages retrieval processes, such as the re-encoding of actively retrieved memory representations. PMID:26441577
The development of a science process assessment for fourth-grade students

NASA Astrophysics Data System (ADS)

Smith, Kathleen A.; Welliver, Paul W.

In this study, a multiple-choice test entitled the Science Process Assessment was developed to measure the science process skills of students in grade four. Based on the Recommended Science Competency Continuum for Grades K to 6 for Pennsylvania Schools, this instrument measured the skills of (1) observing, (2) classifying, (3) inferring, (4) predicting, (5) measuring, (6) communicating, (7) using space/time relations, (8) defining operationally, (9) formulating hypotheses, (10) experimenting, (11) recognizing variables, (12) interpreting data, and (13) formulating models. To prepare the instrument, classroom teachers and science educators were invited to participate in two science education workshops designed to develop an item bank of test questions for measuring process skill learning. Participants formed writing teams and generated 65 test items representing the 13 process skills. After a comprehensive group critique of each item, 61 items were identified for inclusion in the Science Process Assessment item bank. To establish content validity, the item bank was submitted to a select panel of science educators, who judged the acceptability of each item. This analysis yielded 55 acceptable test items and produced the Science Process Assessment, Pilot 1. Pilot 1 was administered to 184 fourth-grade students. Students were given a copy of the test booklet, and teachers read the test aloud to them. Item analysis of this first administration yielded a reliability coefficient of 0.73. Subsequently, 40 test items were identified for the Science Process Assessment, Pilot 2. Using the test-retest method, the Science Process Assessment, Pilot 2 (Test 1 and Test 2) was administered to 113 fourth-grade students. Reliability coefficients of 0.80 and 0.82, respectively, were obtained. The correlation between Test 1 and Test 2 was 0.77.
The results of this study indicate that (1) the Science Process Assessment, Pilot 2, is a valid and reliable instrument applicable to measuring the science process skills of students in grade four, (2) using educational workshops as a means of developing item banks of test questions is viable and productive in the test development process, and (3) involving classroom teachers and science educators in the test development process is educationally efficient and effective.
PACIC Instrument: disentangling dimensions using published validation models.

PubMed

Iglesias, K; Burnand, B; Peytremann-Bridevaux, I

2014-06-01

This study aimed to better understand the structure of the Patient Assessment of Chronic Illness Care (PACIC) instrument and, more specifically, to test all published validation models using a single data set and appropriate statistical tools. It was a validation study using data from a cross-sectional survey of a population-based sample of non-institutionalized adults with diabetes residing in Switzerland (canton of Vaud). The French version of the 20-item PACIC instrument (5-point response scale) was used. We conducted validation analyses using confirmatory factor analysis (CFA). The original five-dimension model and other published models were tested with three types of CFA, based on (i) a Pearson estimator of the variance-covariance matrix, (ii) a polychoric correlation matrix, and (iii) likelihood estimation with a multinomial distribution for the manifest variables. All models were assessed using loadings and goodness-of-fit measures. The analytical sample included 406 patients. Mean age was 64.4 years, and 59% were men. Median item responses varied between 1 and 4 (range 1-5), and the proportion of missing values ranged from 5.7% to 12.3%. Strong floor and ceiling effects were present. Even though loadings of the tested models were relatively high, the only model showing acceptable fit was the 11-item single-dimension model. PACIC was associated with the expected variables in the field. Our results showed that the model considering 11 items in a single dimension exhibited the best fit for our data. A single score, complemented by consideration of single-item results, might be used instead of the five dimensions usually described. © The Author 2014. Published by Oxford University Press in association with the International Society for Quality in Health Care; all rights reserved.
Developing Interest in Art Scale and Determining the Relation between Personality Type of Teacher Candidates and Their Interest in Art

ERIC Educational Resources Information Center

Taskesen, Orhan

2014-01-01

The goal of this study is to develop a scale that measures individuals' interest in art and to test whether there is a relation between this scale and personality types. To this end, in the first stage of the study, a scale that can measure university students' interest in art was developed. The draft scale, consisting of 25 items, was administered to 171…
On the Relationship Between Classical Test Theory and Item Response Theory: From One to the Other and Back.

PubMed

Raykov, Tenko; Marcoulides, George A

2016-04-01

The frequently neglected and often misunderstood relationship between classical test theory and item response theory is discussed for the unidimensional case with binary measures and no guessing. It is pointed out that popular item response models can be directly obtained from classical test theory-based models by accounting for the discrete nature of the observed items. Two distinct observational equivalence approaches are outlined that render the item response models from corresponding classical test theory-based models, and can each be used to obtain the former from the latter models. Similarly, classical test theory models can be furnished using the reverse application of either of those approaches from corresponding item response models.
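One textbook way to see how an item response model can arise from a classical, linear measurement model is to treat each binary item as a dichotomized continuous variable. The derivation below assumes normally distributed errors, a single threshold per item, and a nonzero loading; it illustrates the general idea and is not necessarily either of the two approaches the authors describe:

```latex
\begin{aligned}
Y_j^{*} &= \lambda_j \eta + \varepsilon_j, \qquad \varepsilon_j \sim N(0, \psi_j),\\
X_j &= \begin{cases} 1, & Y_j^{*} > \tau_j,\\ 0, & \text{otherwise,} \end{cases}\\
P(X_j = 1 \mid \eta) &= P(\varepsilon_j > \tau_j - \lambda_j \eta)
  = \Phi\!\left(\frac{\lambda_j \eta - \tau_j}{\sqrt{\psi_j}}\right)
  = \Phi\bigl(a_j(\eta - b_j)\bigr),
\qquad a_j = \frac{\lambda_j}{\sqrt{\psi_j}},\; b_j = \frac{\tau_j}{\lambda_j}.
\end{aligned}
```

The last line is the two-parameter normal ogive model; replacing the normal CDF with the logistic function (with scaling constant D of about 1.702 for close agreement) gives the familiar 2PL. Running the same steps backward recovers the loading and threshold from a and b, up to the scale of the error variance.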
41 CFR 101-30.300 - Scope of subpart.

Code of Federal Regulations, 2013 CFR

2013-07-01

....3-Cataloging Items of Supply § 101-30.300 Scope of subpart. This subpart prescribes the types of items to be cataloged, the types of items to be excluded from the Federal Catalog System, the responsibilities for catalog data preparation and transmission to the Defense Logistics Services Center (DLSC), and...
Objective and Subjective Cancer Knowledge Among Faith-Based Chinese Adults.

PubMed

Hou, Su-I; Liu, Ling Jie

2017-10-01

This study examined cancer knowledge between church-going younger versus older Chinese adults. Hou's 8-item validated cancer screening knowledge test (CSKT) and a new 14-item cancer warning signs test (CWST) were used to assess objective knowledge. Subjective knowledge was measured by one overall 5-point Likert scale item. A total of 372 Taiwanese and Chinese Americans from nine churches participated. Although there were no significant differences by age on either the CSKT scores (younger = 5.89 vs. older = 5.71; p = .297) or the CWST (younger = 6.27 vs. older = 5.86; p = .245), subjective knowledge was higher among older Chinese adults (younger = 2.44 vs. older = 3.05, p < .001). Older Chinese adults were also more likely to identify cancer warning signs correctly, while younger adults were more likely to identify false warning signs correctly. The results have implications for tailoring cancer knowledge type (subjective vs. objective) and content domain (screening vs. warning signs). Findings can help health educators better understand cancer education needs among Chinese adults.
Locally Dependent Linear Logistic Test Model with Person Covariates

ERIC Educational Resources Information Center

Ip, Edward H.; Smits, Dirk J. M.; De Boeck, Paul

2009-01-01

The article proposes a family of item-response models that allow the separate and independent specification of three orthogonal components: item attribute, person covariate, and local item dependence. Special interest lies in extending the linear logistic test model, which is commonly used to measure item attributes, to tests with embedded item…
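In the linear logistic test model, each item's difficulty is decomposed into a weighted sum of item-attribute effects, and a person-covariate component can model ability as a regression on person characteristics. The sketch below shows only those two components under a Rasch link; it omits the local item dependence component the article adds. The Q-matrix, attribute effects, and covariate effects are all hypothetical.

```python
import math

def lltm_difficulty(q_row, eta, c=0.0):
    """LLTM item difficulty: beta_i = sum_k q_ik * eta_k + c (c is an optional normalization constant)."""
    return sum(q * e for q, e in zip(q_row, eta)) + c

def person_ability(covariates, gamma, residual=0.0):
    """Latent-regression style ability: theta_p = sum_j z_pj * gamma_j + residual."""
    return sum(z * g for z, g in zip(covariates, gamma)) + residual

def p_correct(theta, beta):
    """Rasch probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - beta)))

# Hypothetical Q-matrix: 3 items x 2 attributes, with made-up attribute effects.
Q = [[1, 0], [0, 1], [1, 1]]
eta = [0.4, 1.1]
betas = [lltm_difficulty(row, eta) for row in Q]

# Hypothetical person with covariates (intercept, group indicator) and made-up effects.
theta = person_ability([1.0, 1.0], [0.2, 0.5])
print([round(b, 2) for b in betas], [round(p_correct(theta, b), 3) for b in betas])
```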
Applying Bayesian Item Selection Approaches to Adaptive Tests Using Polytomous Items

ERIC Educational Resources Information Center

Penfield, Randall D.

2006-01-01

This study applied the maximum expected information (MEI) and the maximum posterior-weighted information (MPI) approaches of computer adaptive testing item selection to the case of a test using polytomous items following the partial credit model. The MEI and MPI approaches are described. A simulation study compared the efficiency of ability…
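As a rough illustration of the MPI idea for polytomous items, the sketch below computes partial credit model category probabilities, uses the fact that PCM item information equals the conditional variance of the item score, and selects the unused item whose information, averaged over the current posterior on a theta grid, is largest. The N(0, 1) prior, the grid, and the item bank are assumptions for illustration; MEI is not implemented here.

```python
import math

def pcm_probs(theta, deltas):
    """Partial credit model category probabilities for scores 0..m (deltas = step difficulties)."""
    logits = [0.0]  # category 0 has an empty sum
    for d in deltas:
        logits.append(logits[-1] + (theta - d))
    top = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - top) for l in logits]
    total = sum(weights)
    return [w / total for w in weights]

def pcm_information(theta, deltas):
    """Item information under the PCM equals the variance of the item score at theta."""
    p = pcm_probs(theta, deltas)
    mean = sum(k * pk for k, pk in enumerate(p))
    return sum(k * k * pk for k, pk in enumerate(p)) - mean ** 2

def posterior_weights(grid, responses, bank):
    """Normalized posterior on a theta grid: N(0, 1) prior times PCM likelihood of responses so far."""
    weights = []
    for t in grid:
        w = math.exp(-0.5 * t * t)
        for item, score in responses:
            w *= pcm_probs(t, bank[item])[score]
        weights.append(w)
    z = sum(weights)
    return [w / z for w in weights]

def select_mpi(bank, responses, grid):
    """Return the unused item with maximum posterior-weighted information (MPI)."""
    post = posterior_weights(grid, responses, bank)
    used = {item for item, _ in responses}
    best_item, best_value = None, float("-inf")
    for item, deltas in bank.items():
        if item in used:
            continue
        value = sum(w * pcm_information(t, deltas) for t, w in zip(grid, post))
        if value > best_value:
            best_item, best_value = item, value
    return best_item

# Hypothetical three-category items (two step difficulties each) and a quadrature grid.
bank = {"i1": [-1.0, 0.0], "i2": [0.5, 1.5], "i3": [2.0, 3.0]}
grid = [-4.0 + 0.1 * i for i in range(81)]
first = select_mpi(bank, [], grid)
second = select_mpi(bank, [(first, 2)], grid)  # examinee earned the top score on the first item
print(first, second)
```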
Do Reading Experts Agree with MCAT Verbal Reasoning Item Classifications?

ERIC Educational Resources Information Center

Jackson, Evelyn W.; And Others

1994-01-01

Examined whether expert raters (n=5) could agree about the classification of Medical College Admission Test (MCAT) items and whether they agreed with the MCAT student manual in labeling the skill being measured by each test item. Results revealed difficulties in replicating the authors' labeling of skills for reading items on the practice test provided with the 1991 MCAT…
ACER Chemistry Test Item Collection (ACER CHEMTIC Year 12 Supplement).

ERIC Educational Resources Information Center

Australian Council for Educational Research, Hawthorn.

This publication contains 317 multiple-choice chemistry test items related to topics covered in the Victorian (Australia) Year 12 chemistry course. It allows teachers access to a range of items suitable for diagnostic and achievement purposes, supplementing the ACER Chemistry Test Item Collection--Year 12 (CHEMTIC). The topics covered are: organic…
Differential Item Functioning: Its Consequences. Research Report. ETS RR-10-01

ERIC Educational Resources Information Center

Lee, Yi-Hsuan; Zhang, Jinming

2010-01-01

This report examines the consequences of differential item functioning (DIF) using simulated data. Its impact on total score, item response theory (IRT) ability estimate, and test reliability was evaluated in various testing scenarios created by manipulating the following four factors: test length, percentage of DIF items per form, sample sizes of…
Electronics. Criterion-Referenced Test (CRT) Item Bank.

ERIC Educational Resources Information Center

Davis, Diane, Ed.

This document contains 519 criterion-referenced multiple choice and true or false test items for a course in electronics. The test item bank is designed to work with both the Vocational Instructional Management System (VIMS) and the Vocational Administrative Management System (VAMS) in Missouri. The items are grouped into 15 units covering the…
Auto Mechanics. Criterion-Referenced Test (CRT) Item Bank.

ERIC Educational Resources Information Center

Tannehill, Dana, Ed.

This document contains 546 criterion-referenced multiple choice and true or false test items for a course in auto mechanics. The test item bank is designed to work with both the Vocational Instructional Management System (VIMS) and Vocational Administrative Management System (VAMS) in Missouri. The items are grouped into 35 units covering the…
Varying levels of difficulty index of skills-test items randomly selected by examinees on the Korean emergency medical technician licensing examination

PubMed Central

2016-01-01

Purpose: The goal of this study was to characterize the difficulty index of the items in the skills test components of the class I and II Korean emergency medical technician licensing examination (KEMTLE), which requires examinees to select items randomly. Methods: The results of 1,309 class I KEMTLE examinations and 1,801 class II KEMTLE examinations in 2013 were subjected to analysis. Items from the basic and advanced skills test sections of the KEMTLE were compared to determine whether some were significantly more difficult than others. Results: In the class I KEMTLE, all 4 of the items on the basic skills test showed significant variation in difficulty index (P<0.01), as well as 4 of the 5 items on the advanced skills test (P<0.05). In the class II KEMTLE, 4 of the 5 items on the basic skills test showed significantly different difficulty index (P<0.01), as well as all 3 of the advanced skills test items (P<0.01). Conclusion: In the skills test components of the class I and II KEMTLE, the procedure in which examinees randomly select questions should be revised to require examinees to respond to a set of fixed items in order to improve the reliability of the national licensing examination. PMID:26883810
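The difficulty index used in this kind of analysis is the proportion of examinees who pass an item (higher values mean easier items). The abstract does not name the significance test used; the sketch below assumes a Pearson chi-square test of homogeneity on a 2 x k table of pass/fail counts as one plausible way to ask whether randomly assigned items differ in difficulty. All counts are hypothetical.

```python
def difficulty_index(n_correct, n_attempted):
    """Classical difficulty index: proportion of examinees who passed the item."""
    return n_correct / n_attempted

def chi_square_homogeneity(counts):
    """Pearson chi-square statistic for a 2 x k table of pass/fail counts across k items.

    counts: dict item -> (n_passed, n_attempted). Returns (statistic, degrees of freedom).
    """
    total_pass = sum(p for p, n in counts.values())
    total_n = sum(n for p, n in counts.values())
    pooled = total_pass / total_n  # pass rate expected if all items were equally difficult
    stat = 0.0
    for p, n in counts.values():
        for observed, expected in ((p, n * pooled), (n - p, n * (1.0 - pooled))):
            stat += (observed - expected) ** 2 / expected
    return stat, len(counts) - 1

# Hypothetical counts for three randomly assigned skills items.
counts = {"item_A": (270, 300), "item_B": (240, 310), "item_C": (210, 295)}
for item, (p, n) in counts.items():
    print(item, round(difficulty_index(p, n), 3))
print(chi_square_homogeneity(counts))
```

The statistic is compared against a chi-square distribution with k - 1 degrees of freedom.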
Reliability of the Client-Centeredness of Goal Setting (C-COGS) Scale in Acquired Brain Injury Rehabilitation.

PubMed

Doig, Emmah; Prescott, Sarah; Fleming, Jennifer; Cornwell, Petrea; Kuipers, Pim

2016-01-01

To examine the internal reliability and test-retest reliability of the Client-Centeredness of Goal Setting (C-COGS) scale. The C-COGS scale was administered to 42 participants with acquired brain injury after completion of multidisciplinary goal planning. Internal reliability of scale items was examined using item-partial total correlations and Cronbach's α coefficient. The scale was readministered within a 1-mo period to a subsample of 12 participants to examine test-retest reliability by calculating exact and close percentage agreement for each item. After examination of item-partial total correlations, test items were revised. The revised items demonstrated stronger internal consistency than the original items. Preliminary evaluation of test-retest reliability was fair, with an average exact percent agreement across all test items of 67%. Findings support the preliminary reliability of the C-COGS scale as a tool to evaluate and promote client-centered goal planning in brain injury rehabilitation. Copyright © 2016 by the American Occupational Therapy Association, Inc.
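The two statistics named here are straightforward to compute. The sketch below shows item-partial total correlations (each item correlated with the sum of the remaining items) and exact versus close percentage agreement between two administrations of one item across participants. Treating "close" agreement as a difference of at most one scale point is an assumption, as are the ratings.

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def item_partial_total(data):
    """Correlation of each item with the total of the remaining items (item-partial total)."""
    k = len(data[0])
    result = []
    for j in range(k):
        item = [row[j] for row in data]
        rest = [sum(row) - row[j] for row in data]
        result.append(pearson(item, rest))
    return result

def percent_agreement(first, second, tolerance=0):
    """Percentage of paired ratings that are identical (tolerance=0) or within `tolerance` points."""
    hits = sum(1 for a, b in zip(first, second) if abs(a - b) <= tolerance)
    return 100.0 * hits / len(first)

# Hypothetical scale data: 4 respondents x 3 items.
data = [[3, 4, 3], [2, 2, 1], [4, 5, 4], [1, 2, 2]]
print([round(r, 2) for r in item_partial_total(data)])

# Hypothetical ratings of one item by four participants at time 1 and time 2.
print(percent_agreement([1, 2, 3, 4], [1, 2, 4, 2]), percent_agreement([1, 2, 3, 4], [1, 2, 4, 2], tolerance=1))
```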
Item-Writing Guidelines for Physics

ERIC Educational Resources Information Center

Regan, Tom

2015-01-01

A teacher learning how to write test questions (test items) will almost certainly encounter item-writing guidelines--lists of item-writing do's and don'ts. Item-writing guidelines usually are presented as applicable across all assessment settings. Table I shows some guidelines that I believe to be generally applicable and two will be briefly…
Unidimensional Interpretations for Multidimensional Test Items

ERIC Educational Resources Information Center

Kahraman, Nilufer

2013-01-01

This article considers potential problems that can arise in estimating a unidimensional item response theory (IRT) model when some test items are multidimensional (i.e., show a complex factorial structure). More specifically, this study examines (1) the consequences of model misfit on IRT item parameter estimates due to unintended minor item-level…
Measuring psychological trauma after spinal cord injury: Development and psychometric characteristics of the SCI-QOL Psychological Trauma item bank and short form

PubMed Central

Kisala, Pamela A.; Victorson, David; Pace, Natalie; Heinemann, Allen W.; Choi, Seung W.; Tulsky, David S.

2015-01-01

Objective To describe the development and psychometric properties of the SCI-QOL Psychological Trauma item bank and short form. Design Using a mixed-methods design, we developed and tested a Psychological Trauma item bank with patient and provider focus groups, cognitive interviews, and item response theory based analytic approaches, including tests of model fit, differential item functioning (DIF) and precision. Setting We tested a 31-item pool at several medical institutions across the United States, including the University of Michigan, Kessler Foundation, Rehabilitation Institute of Chicago, the University of Washington, Craig Hospital and the James J. Peters/Bronx Veterans Administration hospital. Participants A total of 716 individuals with SCI completed the trauma items. Results The 31 items fit a unidimensional model (CFI=0.952; RMSEA=0.061) and demonstrated good precision (theta range between 0.6 and 2.5). Nine items demonstrated negligible DIF with little impact on score estimates. The final calibrated item bank contains 19 items. Conclusion The SCI-QOL Psychological Trauma item bank is a psychometrically robust measurement tool from which a short form and a computer adaptive test (CAT) version are available. PMID:26010967

Repeated retrieval practice and item difficulty: does criterion learning eliminate item difficulty effects?

PubMed

Vaughn, Kalif E; Rawson, Katherine A; Pyc, Mary A

2013-12-01

A wealth of previous research has established that retrieval practice promotes memory, particularly when retrieval is successful. Although successful retrieval promotes memory, it remains unclear whether successful retrieval promotes memory equally well for items of varying difficulty. Will easy items still outperform difficult items on a final test if all items have been correctly recalled equal numbers of times during practice? In two experiments, normatively difficult and easy Lithuanian-English word pairs were learned via test-restudy practice until each item had been correctly recalled a preassigned number of times (from 1 to 11 correct recalls). Despite equating the numbers of successful recalls during practice, performance on a delayed final cued-recall test was lower for difficult than for easy items. Experiment 2 was designed to diagnose whether the disadvantage for difficult items was due to deficits in cue memory, target memory, and/or associative memory. The results revealed a disadvantage for the difficult versus the easy items only on the associative recognition test, with no differences on cue recognition, and even an advantage on target recognition. Although successful retrieval enhanced memory for both difficult and easy items, equating retrieval success during practice did not eliminate normative item difficulty differences.
Factors that influence search termination decisions in free recall: an examination of response type and confidence.

PubMed

Unsworth, Nash; Brewer, Gene A; Spillers, Gregory J

2011-09-01

In three experiments search termination decisions were examined as a function of response type (correct vs. incorrect) and confidence. It was found that the time between the last retrieved item and the decision to terminate search (exit latency) was related to the type of response and confidence in the last item retrieved. Participants were willing to search longer when the last retrieved item was a correct item vs. an incorrect item and when the confidence was high in the last retrieved item. It was also found that the number of errors retrieved during the recall period was related to search termination decisions such that the more errors retrieved, the more likely participants were to terminate the search. Finally, it was found that knowledge of overall search set size influenced the time needed to search for items, but did not influence search termination decisions. Copyright © 2011 Elsevier B.V. All rights reserved.
Test Bias: An Objective Definition for Test Items.

ERIC Educational Resources Information Center

Durovic, Jerry J.

A test bias definition, applicable at the item-level of a test is presented. The definition conceptually equates test bias with measuring different things in different groups, and operationally equates test bias with a difference in item fit to the Rasch Model, greater than one, between groups. It is suggested that the proposed definition avoids…
Fixed or mixed: a comparison of three, four and mixed-option multiple-choice tests in a Fetal Surveillance Education Program

PubMed Central

2013-01-01

Background Despite the widespread use of multiple-choice assessments in medical education assessment, current practice and published advice concerning the number of response options remains equivocal. This article describes an empirical study contrasting the quality of three 60-item multiple-choice test forms within the Royal Australian and New Zealand College of Obstetricians and Gynaecologists (RANZCOG) Fetal Surveillance Education Program (FSEP). The three forms are described below. Methods The first form featured four response options per item. The second form featured three response options, having removed the least functioning option from each item in the four-option counterpart. The third test form was constructed by retaining the best performing version of each item from the first two test forms. It contained both three- and four-option items. Results Psychometric and educational factors were taken into account in formulating an approach to test construction for the FSEP. The four-option test performed better than the three-option test overall, but some items were improved by the removal of options. The mixed-option test demonstrated better measurement properties than the fixed-option tests, and has become the preferred test format in the FSEP program. The criteria used were reliability, errors of measurement and fit to the item response model. Conclusions The position taken is that decisions about the number of response options be made at the item level, with plausible options being added to complete each item on both psychometric and educational grounds rather than complying with a uniform policy. The point is to construct the better performing item in providing the best psychometric and educational information. PMID:23453056
Fixed or mixed: a comparison of three, four and mixed-option multiple-choice tests in a Fetal Surveillance Education Program.

PubMed

Zoanetti, Nathan; Beaves, Mark; Griffin, Patrick; Wallace, Euan M

2013-03-04

Despite the widespread use of multiple-choice assessments in medical education assessment, current practice and published advice concerning the number of response options remains equivocal. This article describes an empirical study contrasting the quality of three 60-item multiple-choice test forms within the Royal Australian and New Zealand College of Obstetricians and Gynaecologists (RANZCOG) Fetal Surveillance Education Program (FSEP). The three forms are described below. The first form featured four response options per item. The second form featured three response options, having removed the least functioning option from each item in the four-option counterpart. The third test form was constructed by retaining the best performing version of each item from the first two test forms. It contained both three- and four-option items. Psychometric and educational factors were taken into account in formulating an approach to test construction for the FSEP. The four-option test performed better than the three-option test overall, but some items were improved by the removal of options. The mixed-option test demonstrated better measurement properties than the fixed-option tests, and has become the preferred test format in the FSEP program. The criteria used were reliability, errors of measurement and fit to the item response model. The position taken is that decisions about the number of response options be made at the item level, with plausible options being added to complete each item on both psychometric and educational grounds rather than complying with a uniform policy. The point is to construct the better performing item in providing the best psychometric and educational information.
Potential Damage to Flight Hardware from MIL-STD-462 CS02 Setup

NASA Technical Reports Server (NTRS)

Harris, Patrick K.; Block, Nathan F.

2002-01-01

The MIL-STD-462 CS02 conducted susceptibility test setup, performed during electromagnetic compatibility (EMC) testing, consists of an audio transformer with the secondary used as an inductor and a large capacitor. Together, these two components form an L-type low-pass filter to minimize the injected test signal input into the power source. Some flight hardware power input configurations are not compatible with this setup and break into oscillation when powered up. This can damage flight hardware and has caused a catastrophic failure of an item tested in the Goddard Space Flight Center (GSFC) Large EMC Test Facility.
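An L-C low-pass section like the one described is usually characterized by its resonant (corner) frequency f0 = 1 / (2π√(LC)) and its characteristic impedance Z0 = √(L/C); a lightly damped section rings near f0. One common explanation for such oscillation, stated here as an assumption rather than as the report's finding, is that a switching-regulator input presents a negative incremental resistance that can interact with an underdamped source filter. The component values below are hypothetical, since the abstract gives none.

```python
import math

def lc_corner_frequency(l_henry, c_farad):
    """Resonant/corner frequency of an L-C low-pass section: f0 = 1 / (2*pi*sqrt(L*C))."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l_henry * c_farad))

def characteristic_impedance(l_henry, c_farad):
    """Characteristic impedance Z0 = sqrt(L / C) of the L-C section."""
    return math.sqrt(l_henry / c_farad)

# Hypothetical values only (assumed, not from the report).
L = 5e-3     # 5 mH, assumed inductance of the transformer secondary
C = 100e-6   # 100 uF, assumed capacitor
print(round(lc_corner_frequency(L, C), 1), "Hz")
print(round(characteristic_impedance(L, C), 2), "ohm")
```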
Identification of high school students' ability level of constructing free body diagrams to solve restricted and structured response items in force matter

NASA Astrophysics Data System (ADS)

Rahmaniar, Andinisa; Rusnayati, Heni; Sutiadi, Asep

2017-05-01

Solving physics problems, particularly those involving forces, requires the ability to construct free body diagrams, which help students analyse every force acting on an object, the length of its vector and the name of the force. A mixed method was used to explain the results without any special treatment of participants. The participants were 35 first-grade high school students. The purpose of this study is to identify students' ability level in constructing free body diagrams when solving restricted and structured response items. Using the two types of test, each student was classified into one of four ability levels for constructing free body diagrams, each level having different characteristics, and some students were interviewed while solving the test in order to learn how they solved the problems. The results showed that students' ability to construct free body diagrams on restricted response items was about 34.86% at the no-evidence level, 24.11% at the inadequate level, 29.14% at the needs-improvement level and 4.0% at the adequate level. On structured response items it was about 16.59% at the no-evidence level, 23.99% at the inadequate level, 36% at the needs-improvement level and 13.71% at the adequate level. The researchers found that students who constructed free body diagrams first, and constructed them correctly, were more successful in solving restricted and structured response items.
Detecting Gender Bias Through Test Item Analysis

NASA Astrophysics Data System (ADS)

González-Espada, Wilson J.

2009-03-01

Many physical science and physics instructors might not be trained in pedagogically appropriate test construction methods. This could lead to test items that do not measure what they are intended to measure. A subgroup of these items might show bias against some groups of students. This paper describes how the author became aware of potentially biased items against females in his examinations, which led to the exploration of fundamental issues related to item validity, gender bias, and differential item functioning, or DIF. A brief discussion of DIF in the context of university courses, as well as practical suggestions to detect possible gender-biased items, follows.
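The abstract does not say which DIF procedure it recommends. One widely used option for classroom-sized checks is the Mantel-Haenszel procedure: examinees are stratified by total score, a common odds ratio is pooled across strata, and it is reported on the ETS delta scale as MH D-DIF = -2.35 ln(alpha_MH). The sketch below assumes that procedure; the counts, with males as the reference group and females as the focal group, are hypothetical.

```python
import math

def mantel_haenszel_alpha(strata):
    """Common odds ratio across score strata.

    Each stratum is (A, B, C, D): reference group correct/incorrect, focal group correct/incorrect.
    """
    num = den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue
        num += a * d / n
        den += b * c / n
    return num / den

def mh_d_dif(alpha_mh):
    """ETS delta-scale effect size; negative values indicate the item favors the reference group."""
    return -2.35 * math.log(alpha_mh)

# Hypothetical item, examinees stratified into three total-score bands.
strata = [(30, 20, 22, 28), (45, 15, 35, 20), (50, 5, 40, 8)]
alpha = mantel_haenszel_alpha(strata)
print(round(alpha, 3), round(mh_d_dif(alpha), 3))
```

With small classes, strata are often coarsened into a few score bands, as here, so that each stratum has enough examinees in both groups.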
Performance of the likelihood ratio difference (G2 Diff) test for detecting unidimensionality in applications of the multidimensional Rasch model.

PubMed

Harrell-Williams, Leigh; Wolfe, Edward W

2014-01-01

Previous research has investigated the influence of sample size, model misspecification, test length, ability distribution offset, and generating model on the likelihood ratio difference test in applications of item response models. This study extended that research to the evaluation of dimensionality using the multidimensional random coefficients multinomial logit model (MRCMLM). Logistic regression analysis of simulated data reveals that sample size and test length have a large effect on the capacity of the LR difference test to correctly identify unidimensionality, with shorter tests and smaller sample sizes leading to smaller Type I error rates. Higher levels of simulated misfit resulted in fewer incorrect decisions than data with no or little misfit. However, Type I error rates indicate that the likelihood ratio difference test is not suitable under any of the simulated conditions for evaluating dimensionality in applications of the MRCMLM.
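The likelihood ratio difference statistic compares two nested models: G² = -2(ln L_restricted - ln L_full), referred to a chi-square distribution whose degrees of freedom equal the difference in the number of estimated parameters (for example, the extra variance and covariance parameters of a multidimensional model). The sketch below computes G² and its p-value with a self-contained chi-square tail function; the log-likelihood values and degrees of freedom are hypothetical.

```python
import math

def chi2_sf(x, df):
    """Upper-tail chi-square probability via the series for the regularized lower incomplete gamma."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)

def lr_difference_test(loglik_restricted, loglik_full, df_diff):
    """G2 = -2 * (lnL_restricted - lnL_full), with its chi-square p-value on df_diff degrees of freedom."""
    g2 = -2.0 * (loglik_restricted - loglik_full)
    return g2, chi2_sf(g2, df_diff)

# Hypothetical: unidimensional (restricted) vs two-dimensional (full) model, 2 extra parameters.
print(lr_difference_test(-1000.0, -996.0, 2))
```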
A comparison of Rasch item-fit and Cronbach's alpha item reduction analysis for the development of a Quality of Life scale for children and adolescents.

PubMed

Erhart, M; Hagquist, C; Auquier, P; Rajmil, L; Power, M; Ravens-Sieberer, U

2010-07-01

This study compares item reduction analysis based on classical test theory (maximizing Cronbach's alpha - approach A), with analysis based on the Rasch Partial Credit Model item-fit (approach B), as applied to children and adolescents' health-related quality of life (HRQoL) items. The reliability and structural, cross-cultural and known-group validity of the measures were examined. Within the European KIDSCREEN project, 3019 children and adolescents (8-18 years) from seven European countries answered 19 HRQoL items of the Physical Well-being dimension of a preliminary KIDSCREEN instrument. The Cronbach's alpha and corrected item total correlation (approach A) were compared with infit mean squares and the Q-index item-fit derived according to a partial credit model (approach B). Cross-cultural differential item functioning (DIF ordinal logistic regression approach), structural validity (confirmatory factor analysis and residual correlation) and relative validity (RV) for socio-demographic and health-related factors were calculated for approaches (A) and (B). Approach (A) led to the retention of 13 items, compared with 11 items with approach (B). The item overlap was 69% for (A) and 78% for (B). The correlation coefficient of the summated ratings was 0.93. The Cronbach's alpha was similar for both versions [0.86 (A); 0.85 (B)]. Both approaches selected some items that are not strictly unidimensional and items displaying DIF. RV ratios favoured (A) with regard to socio-demographic aspects. Approach (B) was superior in RV with regard to health-related aspects. Both types of item reduction analysis should be accompanied by additional analyses. Neither of the two approaches was universally superior with regard to cultural, structural and known-group validity. However, the results support the usability of the Rasch method for developing new HRQoL measures for children and adolescents.
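Approach (A) can be sketched as backward elimination on Cronbach's alpha, alpha = k/(k - 1) × (1 - Σ item variances / total-score variance): at each step, drop the item whose removal raises alpha the most, and stop when no removal improves it. The stopping rule, minimum number of items, and tolerance below are assumptions, and the data are made up; the study's exact procedure, which also used corrected item-total correlations, may differ.

```python
def variance(xs):
    """Population variance (the ratio in alpha is unaffected by the n vs n - 1 choice)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def cronbach_alpha(data, items):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of the total score)."""
    k = len(items)
    item_var = sum(variance([row[j] for row in data]) for j in items)
    total_var = variance([sum(row[j] for j in items) for row in data])
    return k / (k - 1) * (1.0 - item_var / total_var)

def reduce_items(data, min_items=2, tol=1e-9):
    """Drop the item whose removal raises alpha most, repeating while alpha still improves."""
    items = list(range(len(data[0])))
    current = cronbach_alpha(data, items)
    while len(items) > min_items:
        candidates = [(cronbach_alpha(data, [j for j in items if j != drop]), drop) for drop in items]
        best_alpha, best_drop = max(candidates)
        if best_alpha <= current + tol:
            break
        items.remove(best_drop)
        current = best_alpha
    return items, current

# Hypothetical responses: items 0-2 move together, item 3 does not.
data = [[1, 1, 1, 3], [2, 2, 2, 1], [3, 3, 3, 2], [4, 4, 4, 1]]
print(reduce_items(data))
```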
Estimating Total-test Scores from Partial Scores in a Matrix Sampling Design.

ERIC Educational Resources Information Center

Sachar, Jane; Suppes, Patrick

It is sometimes desirable to obtain an estimated total-test score for an individual who was administered only a subset of the items in a total test. The present study compared six methods, two of which utilize the content structure of items, to estimate total-test scores using 450 students in grades 3-5 and 60 items of the 110-item Stanford Mental…
Differential item functioning analysis of the Vanderbilt Expertise Test for cars

PubMed Central

Lee, Woo-Yeol; Cho, Sun-Joo; McGugin, Rankin W.; Van Gulick, Ana Beth; Gauthier, Isabel

2015-01-01

The Vanderbilt Expertise Test for cars (VETcar) is a test of visual learning for contemporary car models. We used item response theory to assess the VETcar and in particular used differential item functioning (DIF) analysis to ask if the test functions the same way in laboratory versus online settings and for different groups based on age and gender. An exploratory factor analysis found evidence of multidimensionality in the VETcar, although a single dimension was deemed sufficient to capture the recognition ability measured by the test. We selected a unidimensional three-parameter logistic item response model to examine item characteristics and subject abilities. The VETcar had satisfactory internal consistency. A substantial number of items showed DIF at a medium effect size for test setting and for age group, whereas gender DIF was negligible. Because online subjects were on average older than those tested in the lab, we focused on the age groups to conduct a multigroup item response theory analysis. This revealed that most items on the test favored the younger group. DIF could be more the rule than the exception when measuring performance with familiar object categories, therefore posing a challenge for the measurement of either domain-general visual abilities or category-specific knowledge. PMID:26418499
On the Performance of the Marginal Homogeneity Test to Detect Rater Drift.

PubMed

Sgammato, Adrienne; Donoghue, John R

2018-06-01

When constructed response items are administered repeatedly, "trend scoring" can be used to test for rater drift. In trend scoring, raters rescore responses from the previous administration. Two simulation studies evaluated the utility of Stuart's Q measure of marginal homogeneity as a way of evaluating rater drift when monitoring trend scoring. In the first study, data were generated based on trend scoring tables obtained from an operational assessment. The second study tightly controlled table margins to disentangle certain features present in the empirical data. In addition to Q, the paired t test was included as a comparison, because of its widespread use in monitoring trend scoring. Sample size, number of score categories, interrater agreement, and symmetry/asymmetry of the margins were manipulated. For identical margins, both statistics had good Type I error control. For a unidirectional shift in margins, both statistics had good power. As expected, when shifts in the margins were balanced across categories, the t test had little power. Q demonstrated good power for all conditions and identified almost all items identified by the t test. Q shows substantial promise for monitoring of trend scoring.
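Stuart's Q (the Stuart-Maxwell test) works on the k x k table of original scores (rows) against rescores (columns). With d_i = n_i. - n_.i for the first k - 1 categories and covariance entries V_ii = n_i. + n_.i - 2n_ii and V_ij = -(n_ij + n_ji), the statistic is Q = d'V⁻¹d, approximately chi-square with k - 1 degrees of freedom under marginal homogeneity. The sketch below implements that formula with a small linear solver; the example table is hypothetical. For k = 2 it reduces to McNemar's statistic.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def stuart_maxwell_q(table):
    """Stuart's Q for marginal homogeneity of a square table; returns (Q, degrees of freedom)."""
    k = len(table)
    row = [sum(table[i]) for i in range(k)]
    col = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Drop the last category: the k margin differences always sum to zero.
    d = [row[i] - col[i] for i in range(k - 1)]
    v = [[(row[i] + col[i] - 2 * table[i][i]) if i == j else -(table[i][j] + table[j][i])
          for j in range(k - 1)] for i in range(k - 1)]
    x = solve(v, d)
    return sum(di * xi for di, xi in zip(d, x)), k - 1

# Hypothetical trend-scoring table: rows = original scores 0-2, columns = rescores 0-2.
print(stuart_maxwell_q([[20, 5, 1], [2, 30, 6], [0, 3, 25]]))
```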
Modeling Item-Level and Step-Level Invariance Effects in Polytomous Items Using the Partial Credit Model

ERIC Educational Resources Information Center

Gattamorta, Karina A.; Penfield, Randall D.; Myers, Nicholas D.

2012-01-01

Measurement invariance is a common consideration in the evaluation of the validity and fairness of test scores when the tested population contains distinct groups of examinees, such as examinees receiving different forms of a translated test. Measurement invariance in polytomous items has traditionally been evaluated at the item-level,…
Science Library of Test Items. Volume Two.

ERIC Educational Resources Information Center

New South Wales Dept. of Education, Sydney (Australia).

The second volume of test items in the Science Library of Test Items is intended as a resource to assist teachers in implementing and evaluating science courses in the first 4 years of Australian secondary school. The items were selected from questions submitted to the School Certificate Development Unit by teachers in New South Wales. Only the…
Measuring the Instructional Sensitivity of ESL Reading Comprehension Items.

ERIC Educational Resources Information Center

Brutten, Sheila R.; And Others

A study attempted to estimate the instructional sensitivity of items in three reading comprehension tests in English as a second language (ESL). Instructional sensitivity is a test-item construct defined as the tendency for a test item to vary in difficulty as a function of instruction. Similar tasks were given to readers at different proficiency…
Reducing the Impact of Inappropriate Items on Reviewable Computerized Adaptive Testing

ERIC Educational Resources Information Center

Yen, Yung-Chin; Ho, Rong-Guey; Liao, Wen-Wei; Chen, Li-Ju

2012-01-01

In a test, an examinee's score would be closer to his or her actual ability if careless mistakes were corrected. In CAT, however, changing the answer to one item might make the subsequent items no longer appropriate for estimating the examinee's ability. These inappropriate items in a reviewable CAT might in turn introduce bias in ability…
Comparing and Combining Dichotomous and Polytomous Items with SPRT Procedure in Computerized Classification Testing.

ERIC Educational Resources Information Center

Lau, C. Allen; Wang, Tianyou

The purposes of this study were to: (1) extend the sequential probability ratio testing (SPRT) procedure to polytomous item response theory (IRT) models in computerized classification testing (CCT); (2) compare polytomous items with dichotomous items using the SPRT procedure for their accuracy and efficiency; (3) study a direct approach in…
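The SPRT in computerized classification testing accumulates a log-likelihood ratio between two ability points bracketing the cut score, θ0 = cut - δ and θ1 = cut + δ, and stops as soon as it crosses ln((1 - β)/α) (classify above) or ln(β/(1 - α)) (classify below). The sketch below shows the dichotomous Rasch case only; the study's polytomous extension would replace the item likelihoods with category probabilities from a polytomous model. The cut score, δ, error rates, and item difficulties are hypothetical.

```python
import math

def rasch_p(theta, b):
    """Rasch probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def sprt_classify(responses, difficulties, cut=0.0, delta=0.5, alpha=0.05, beta=0.05):
    """Return (decision, items_used); decision is 'above', 'below', or 'continue'."""
    theta0, theta1 = cut - delta, cut + delta
    upper = math.log((1.0 - beta) / alpha)
    lower = math.log(beta / (1.0 - alpha))
    llr = 0.0
    for used, (x, b) in enumerate(zip(responses, difficulties), start=1):
        p0, p1 = rasch_p(theta0, b), rasch_p(theta1, b)
        if x == 1:
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1.0 - p1) / (1.0 - p0))
        if llr >= upper:
            return "above", used
        if llr <= lower:
            return "below", used
    return "continue", len(responses)

# Hypothetical responses to items located at the cut score.
print(sprt_classify([1, 1, 0, 1, 1, 1, 1, 1, 1, 1], [0.0] * 10))
```

For items located at the cut, each response moves the ratio by exactly ±δ here, so the number of items needed depends directly on how wide the indifference region is.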
A Conditional Exposure Control Method for Multidimensional Adaptive Testing

ERIC Educational Resources Information Center

Finkelman, Matthew; Nering, Michael L.; Roussos, Louis A.

2009-01-01

In computerized adaptive testing (CAT), ensuring the security of test items is a crucial practical consideration. A common approach to reducing item theft is to define maximum item exposure rates, i.e., to limit the proportion of examinees to whom a given item can be administered. Numerous methods for controlling exposure rates have been proposed…
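The simplest form of exposure-rate limiting can be sketched as a hard cap: an item is eligible only while the share of examinees who have seen it stays below a maximum rate r_max, and the most informative eligible item is chosen. This is an illustrative assumption, not the conditional method proposed in the article and not a probabilistic scheme such as Sympson-Hetter. Information values, counts, and r_max are hypothetical.

```python
def select_with_exposure_cap(info, exposure_counts, n_examinees, r_max, administered):
    """Return the most informative item whose running exposure rate is below r_max.

    info: dict item -> information at the current ability estimate
    exposure_counts: dict item -> number of examinees who have already seen the item
    n_examinees: examinees tested so far
    administered: items already given to the current examinee
    """
    candidates = [i for i in info if i not in administered]
    eligible = [i for i in candidates
                if exposure_counts.get(i, 0) / max(n_examinees, 1) < r_max]
    if not eligible:
        eligible = candidates  # fall back so the test can continue if every item is capped
    return max(eligible, key=info.get)

# Hypothetical pool at the current ability estimate.
info = {"a": 2.0, "b": 1.5, "c": 1.0}
print(select_with_exposure_cap(info, {"a": 30, "b": 10}, 100, 0.25, set()))
```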
The Effects of Clinically Relevant Multiple-Choice Items on the Statistical Discrimination of Physician Clinical Competence.

ERIC Educational Resources Information Center

Downing, Steven M.; Maatsch, Jack L.

To test the effect of clinically relevant multiple-choice item content on the validity of statistical discriminations of physicians' clinical competence, data were collected from a field test of the Emergency Medicine Examination, test items for the certification of specialists in emergency medicine. Two 91-item multiple-choice subscales were…

The Effect of Including or Excluding Students with Testing Accommodations on IRT Calibrations.

ERIC Educational Resources Information Center

Karkee, Thakur; Lewis, Dan M.; Barton, Karen; Haug, Carolyn

This study aimed to determine the degree to which the inclusion of accommodated students with disabilities in the calibration sample affects the characteristics of item parameters and the test results. Investigated were effects on test reliability, item fit to the applicable item response theory (IRT) model, item parameter estimates, and students'…
Three controversies over item disclosure in medical licensure examinations.

PubMed

Park, Yoon Soo; Yang, Eunbae B

2015-01-01

In response to views on the public's right to know, there is growing attention to item disclosure - release of items, answer keys, and performance data to the public - in medical licensure examinations and its potential impact on the test's ability to measure competence and select qualified candidates. Recent debates on this issue have sparked legislative action internationally, including in South Korea, with prior discussions among North American countries dating back over three decades. The purpose of this study is to identify and analyze three issues associated with item disclosure in medical licensure examinations - 1) fairness and validity, 2) impact on passing levels, and 3) utility of item disclosure - by synthesizing existing literature in relation to standards in testing. Historically, the controversy over item disclosure has centered on fairness and validity. Proponents of item disclosure stress test takers' right to know, while opponents argue from a validity perspective. Item disclosure may bias item characteristics, such as difficulty and discrimination, and has consequences for setting passing levels. To date, there has been limited research on the utility of item disclosure for large-scale testing. These issues require ongoing and careful consideration.
Online Calibration of Polytomous Items Under the Generalized Partial Credit Model

PubMed Central

Zheng, Yi

2016-01-01

Online calibration is a technology-enhanced architecture for item calibration in computerized adaptive tests (CATs). Many CATs are administered continuously over a long term and rely on large item banks. To ensure test validity, these item banks need to be frequently replenished with new items, and these new items need to be pretested before being used operationally. Online calibration dynamically embeds pretest items in operational tests and calibrates their parameters as response data are gradually obtained through the continuous test administration. This study extends existing formulas, procedures, and algorithms for dichotomous item response theory models to the generalized partial credit model, a popular model for items scored in more than two categories. A simulation study was conducted to investigate the developed algorithms and procedures under a variety of conditions, including two estimation algorithms, three pretest item selection methods, three seeding locations, two numbers of score categories, and three calibration sample sizes. Results demonstrated acceptable estimation accuracy of the two estimation algorithms in some of the simulated conditions. A variety of findings were also revealed for the interaction effects of the included factors, and corresponding recommendations were made. PMID:29881063
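As a rough illustration of the model this study extends, the sketch below computes generalized partial credit model (GPCM) category probabilities for a single item. It assumes the common parameterization with one slope `a` and step parameters `steps` (no 1.7 scaling constant); the function names and item parameters are hypothetical, and no online calibration procedure is shown.

```python
import math


def gpcm_probs(theta, a, steps):
    """Category probabilities P(X = k | theta), k = 0..len(steps), under the GPCM.

    theta: examinee ability
    a:     item discrimination (slope)
    steps: step (threshold) parameters b_1..b_m (hypothetical values below)
    """
    # Cumulative sums of a * (theta - b_v); category 0 has an empty sum of 0.
    numerators = [0.0]
    for b in steps:
        numerators.append(numerators[-1] + a * (theta - b))
    # Subtract the largest value before exponentiating for numerical stability.
    top = max(numerators)
    weights = [math.exp(z - top) for z in numerators]
    total = sum(weights)
    return [w / total for w in weights]


def expected_score(theta, a, steps):
    """Expected item score: sum over k of k * P(X = k | theta)."""
    return sum(k * p for k, p in enumerate(gpcm_probs(theta, a, steps)))


# Hypothetical three-category item (scores 0, 1, 2).
probs = gpcm_probs(theta=0.0, a=1.2, steps=[-0.5, 0.8])
print([round(p, 3) for p in probs])
```

With a single step parameter the GPCM reduces to a dichotomous two-parameter logistic item, which is one reason it is a natural extension target for dichotomous online-calibration methods.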
Evaluating Statistical Targets for Assembling Parallel Mixed-Format Test Forms

ERIC Educational Resources Information Center

Debeer, Dries; Ali, Usama S.; van Rijn, Peter W.

2017-01-01

Test assembly is the process of selecting items from an item pool to form one or more new test forms. Often new test forms are constructed to be parallel with an existing (or an ideal) test. Within the context of item response theory, the test information function (TIF) or the test characteristic curve (TCC) are commonly used as statistical…
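The abstract is cut off before stating which targets were evaluated, so the following is only a sketch of the two quantities it names. It assumes dichotomous two-parameter logistic (2PL) items without the 1.7 scaling constant (a mixed-format form would also need polytomous items); the item parameters, the theta grid, and the maximum-gap criterion are hypothetical illustrations rather than the study's targets.

```python
import math


def p_2pl(theta, a, b):
    """Probability of a correct response under the 2PL model (no D scaling)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def tcc(theta, items):
    """Test characteristic curve: expected number-correct score at theta."""
    return sum(p_2pl(theta, a, b) for a, b in items)


def tif(theta, items):
    """Test information function: sum of 2PL item informations a^2 * P * (1 - P)."""
    total = 0.0
    for a, b in items:
        p = p_2pl(theta, a, b)
        total += a * a * p * (1.0 - p)
    return total


# Hypothetical (a, b) pairs for a reference form and a candidate parallel form.
reference = [(1.0, -1.0), (1.2, 0.0), (0.8, 1.0)]
candidate = [(1.1, -0.9), (1.0, 0.1), (0.9, 0.9)]
grid = [-2.0, -1.0, 0.0, 1.0, 2.0]

# One simple (assumed) closeness measure: the largest absolute TIF gap on the grid.
max_gap = max(abs(tif(t, reference) - tif(t, candidate)) for t in grid)
print(round(max_gap, 4))
```

The same loop with `tcc` in place of `tif` gives a TCC-based comparison; which of the two makes the better target is the question the study addresses.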
A Two-Decision Model for Responses to Likert-Type Items

ERIC Educational Resources Information Center

Thissen-Roe, Anne; Thissen, David

2013-01-01

Extreme response set, the tendency to prefer the lowest or highest response option when confronted with a Likert-type response scale, can lead to misfit of item response models such as the generalized partial credit model. Recently, a series of intrinsically multidimensional item response models have been hypothesized, wherein tendency toward…
Inductive Selectivity in Children's Cross-Classified Concepts

ERIC Educational Resources Information Center

Nguyen, Simone P.

2012-01-01

Cross-classified items pose an interesting challenge to children's induction as these items belong to many different categories, each of which may serve as a basis for a different type of inference. Inductive selectivity is the ability to appropriately make different types of inferences about a single cross-classifiable item based on its different…
A Proposed System of "Project Management" for Study Items.

ERIC Educational Resources Information Center

Worcester Public Schools, MA.

The purposes of the proposed system are to provide a standard operating procedure for a systematic and effective handling of project-type study items as differentiated from informational-type items; to assign definite singular responsibility for projects; to suggest specific sequential steps to be taken in the preparation of the project report;…
Estimating Ordinal Reliability for Likert-Type and Ordinal Item Response Data: A Conceptual, Empirical, and Practical Guide

ERIC Educational Resources Information Center

Gadermann, Anne M.; Guhn, Martin; Zumbo, Bruno D.

2012-01-01

This paper provides a conceptual, empirical, and practical guide for estimating ordinal reliability coefficients for ordinal item response data (also referred to as Likert, Likert-type, ordered categorical, or rating scale item responses). Conventionally, reliability coefficients, such as Cronbach's alpha, are calculated using a Pearson…
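To make the Pearson-versus-ordinal contrast concrete, a minimal sketch follows. `cronbach_alpha` is the conventional covariance-based coefficient; `alpha_from_corr` applies the standardized alpha formula to whatever correlation matrix it receives. The assumption is that an ordinal version of alpha can be obtained by passing in a polychoric correlation matrix estimated with other software; polychoric estimation itself is not shown, and the matrix below is hypothetical.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Conventional Cronbach's alpha from a persons x items score matrix.

    Assumes at least two items and a nonzero total-score variance.
    """
    k = len(scores[0])
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def alpha_from_corr(r):
    """Standardized alpha from a k x k correlation matrix.

    A Pearson matrix gives standardized alpha; a polychoric matrix
    (estimated elsewhere) gives an ordinal alpha.
    """
    k = len(r)
    total = sum(sum(row) for row in r)  # includes the k diagonal ones
    return (k / (k - 1)) * (1 - k / total)


# Hypothetical polychoric correlations for three Likert-type items.
poly = [[1.0, 0.6, 0.5],
        [0.6, 1.0, 0.55],
        [0.5, 0.55, 1.0]]
print(round(alpha_from_corr(poly), 3))
```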
Emotional content enhances true but not false memory for categorized stimuli.

PubMed

Choi, Hae-Yoon; Kensinger, Elizabeth A; Rajaram, Suparna

2013-04-01

Past research has shown that emotion enhances true memory, but that emotion can either increase or decrease false memory. Two theoretical possibilities, the distinctiveness of emotional stimuli and the conceptual relatedness of emotional content, have been implicated as being responsible for influencing both true and false memory for emotional content. In the present study, we sought to identify the mechanisms that underlie these mixed findings by equating the thematic relatedness of the study materials across each type of valence used (negative, positive, or neutral). In three experiments, categorically bound stimuli (e.g., funeral, pets, and office items) were used for this purpose. When the encoding task required the processing of thematic relatedness, a significant true-memory enhancement for emotional content emerged in recognition memory, but no emotional boost to false memory (exp. 1). This pattern persisted for true memory with a longer retention interval between study and test (24 h), and false recognition was reduced for emotional items (exp. 2). Finally, better recognition memory for emotional items once again emerged when the encoding task (arousal ratings) required the processing of the emotional aspect of the study items, with no emotional boost to false recognition (exp. 3). Together, these findings suggest that when emotional and neutral stimuli are equivalently high in thematic relatedness, emotion continues to improve true memory, but it does not override other types of grouping to increase false memory.
The Role of Item Feedback in Self-Adapted Testing.

ERIC Educational Resources Information Center

Roos, Linda L.; And Others

1997-01-01

The importance of item feedback in self-adapted testing was studied by comparing feedback and no feedback conditions for computerized adaptive tests and self-adapted tests taken by 363 college students. Results indicate that item feedback is not necessary to realize score differences between self-adapted and computerized adaptive testing. (SLD)
Criterion-Referenced Test Items for Auto Body.

ERIC Educational Resources Information Center

Tannehill, Dana, Ed.

This test item bank on auto body repair contains criterion-referenced test questions based upon competencies found in the Missouri Auto Body Competency Profile. Some test items are keyed for multiple competencies. The tests cover the following 26 competency areas in the auto body curriculum: auto body careers; measuring and mixing; tools and…
Automated Test-Form Generation

ERIC Educational Resources Information Center

van der Linden, Wim J.; Diao, Qi

2011-01-01

In automated test assembly (ATA), the methodology of mixed-integer programming is used to select test items from an item bank to meet the specifications for a desired test form and optimize its measurement accuracy. The same methodology can be used to automate the formatting of the set of selected items into the actual test form. Three different…
Geography, Years 7-10, Library of Test Items. Volume Eight. Junior Secondary Items To Be Used With 1976 to 1980 H.S.C. Geography Exam. Broadsheets.

ERIC Educational Resources Information Center

Kouimanos, John, Ed.

As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items of value from past tests are made available to teachers for the construction of unit tests, term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The…
The results of STEM education methods for enhancing critical thinking and problem solving skill in physics the 10th grade level

NASA Astrophysics Data System (ADS)

Soros, P.; Ponkham, K.; Ekkapim, S.

2018-01-01

This research aimed to: 1) compare critical thinking and problem-solving skills before and after learning with a STEM Education plan, 2) compare student achievement before and after learning about force and the laws of motion with a STEM Education plan, and 3) assess student satisfaction with learning through STEM Education. The sample consisted of 37 grade 10 students from Borabu School, Borabu District, Mahasarakham Province, in semester 2 of academic year 2016. The tools used in this study were: 1) a STEM Education plan on force and the laws of motion for grade 10 students, comprising 1 scheme with a total of 14 hours, 2) a 30-item multiple-choice test of critical thinking and problem-solving skills with 5-option and 2-option items, 3) a 30-item, 4-option multiple-choice achievement test on force and the laws of motion, and 4) a 20-item learning satisfaction questionnaire on a 5-point rating scale. Data were analyzed using percentage, mean, standard deviation, and the dependent (paired) t-test. The results showed that 1) students who learned with the STEM Education plan scored higher in critical thinking and problem-solving skills on the post-test than on the pre-test, significant at the .01 level; 2) students who learned with the STEM Education plan had higher achievement scores on the post-test than on the pre-test, significant at the .01 level; and 3) students' satisfaction with learning through the STEM Education plan was at a high level (X̄ = 4.51, SD = 0.56).
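The dependent t-test used here compares each student's post-test score with that same student's pre-test score. A minimal sketch of the test statistic follows, using hypothetical scores rather than the study's data. Turning the statistic into a p-value needs the t distribution, which the Python standard library lacks; SciPy's `scipy.stats.ttest_rel` is one existing option for that step.

```python
import math
from statistics import mean, stdev


def paired_t(pre, post):
    """Dependent (paired) t statistic and degrees of freedom for post - pre."""
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    # t = mean difference divided by its standard error (sample SD / sqrt(n)).
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Hypothetical pre-test and post-test scores for five students.
t, df = paired_t([14, 17, 12, 20, 15], [18, 21, 15, 22, 19])
print(round(t, 3), df)
```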
The proposed factor structure of temperament and personality in Japan: combining traits from TEMPS-A and MPT.

PubMed

Akiyama, Tsuyoshi; Tsuda, Hitoshi; Matsumoto, Satoko; Miyake, Yuko; Kawamura, Yoshiya; Noda, Toshie; Akiskal, Kareen K; Akiskal, Hagop S

2005-03-01

In Japan, Kraepelin's descriptions of four "fundamental states" of manic depressive illness, the concept of schizoid temperament by Kretschmer, and the concepts of obsessional and melancholic type temperament by Shimoda and Tellenbach have been widely accepted. This research investigates the construct validity of these temperaments through factor analysis. TEMPS-A measured depressive, cyclothymic, hyperthymic and irritable temperaments, and the MPT rigidity, esoteric and isolation subscales measured melancholic type and schizoid temperaments, respectively. Factor analysis was implemented with TEMPS-A data alone and with TEMPS-A and MPT combined data. In the TEMPS-A alone analysis, Factor 1 included 1 depressive, 11 cyclothymic and 12 irritable temperament items with a factor loading higher than 0.4; Factor 2 included 1 depressive and 10 hyperthymic temperament items; and Factor 3 included 2 depressive temperament items only. With TEMPS-A and MPT combined data, Factor 1 included 3 depressive, 11 cyclothymic and 5 irritable temperament items with a factor loading higher than 0.4 (interpreted as the central cyclothymic tendency for all affective temperaments along Kretschmerian lines and accounting for 11.7% of the variance); Factor 2 included 6 hyperthymic temperament items (6.22% of variance); Factor 3 included 1 cyclothymic, 7 irritable and 1 schizoid temperament items (interpreted as the irritable temperament and accounting for 3.24% of the variance); Factor 4 included 1 depressive temperament and 5 melancholic type items (interpreted as the latter, accounting for 2.66% of the variance); Factor 5 included 5 depressive temperament items, along interpersonal sensitivity and passivity lines, and accounted for 2.31% of the variance; and Factor 6 included 4 schizoid temperament items accounting for 2.07% of the variance. We did not use the Kasahara scale, which some believe better captures the Japanese melancholic type. The sample was 70% male.
These analyses confirm the factor validity of depressive, hyperthymic, cyclothymic and irritable temperaments (TEMPS-A), as well as the melancholic type and the schizoid temperament (MPT). Traits of the depressive and melancholic types emerge as rather distinct. Indeed, our results permit the delineation of an interpersonally sensitive type that "gives in to others" as the core feature of the depressive temperament; this is to be contrasted with the higher functioning, perfectionistic, work-oriented melancholic type. Mood dysregulation is represented by the largest number of traits in this population. Contrary to a widely held belief that the melancholic type, with its devotion to work and to others, is the signature temperament in Japan, cyclothymic traits account for the largest variance in this nonclinical population. Hyperthymic temperament, melancholic type and schizoid temperaments appear largely independent of mood dysregulation. In this Japanese population, TEMPS-A may identify temperament constructs more comprehensively when implemented with melancholic type and schizoid temperament question items added to it. The proposed new Japanese Temperament and Personality (JTP) Scale has self-rated items divided into six subscales.
Type-specific proactive interference in patients with semantic and phonological STM deficits.

PubMed

Harris, Lara; Olson, Andrew; Humphreys, Glyn

2014-01-01

Prior neuropsychological evidence suggests that semantic and phonological components of short-term memory (STM) are functionally and neurologically distinct. The current paper examines proactive interference (PI) from semantic and phonological information in two STM-impaired patients, DS (semantic STM deficit) and AK (phonological STM deficit). In Experiment 1, probe recognition tasks with open and closed sets of stimuli were used. Phonological PI was assessed using nonword items, and semantic and phonological PI were assessed using words. In Experiment 2, phonological and semantic PI were elicited by an item recognition probe test with stimuli that bore phonological and semantic relations to the probes. The data suggested heightened phonological PI for the semantic STM patient, and exaggerated effects of semantic PI in the phonological STM case. The findings are consistent with an account of extremely rapid decay of activated type-specific representations in cases of severely impaired phonological and semantic STM.
When Listening Is Better Than Reading: Performance Gains on Cardiac Auscultation Test Questions.

PubMed

Short, Kathleen; Bucak, S Deniz; Rosenthal, Francine; Raymond, Mark R

2018-05-01

In 2007, the United States Medical Licensing Examination embedded multimedia simulations of heart sounds into multiple-choice questions. This study investigated changes in item difficulty as determined by examinee performance over time. The data reflect outcomes obtained following initial use of multimedia items from 2007 through 2012, after which an interface change occurred. A total of 233,157 examinees responded to 1,306 cardiology test items over the six-year period; 138 items included multimedia simulations of heart sounds, while 1,168 text-based items without multimedia served as controls. The authors compared changes in difficulty of multimedia items over time with changes in difficulty of text-based cardiology items over time. Further, they compared changes in item difficulty for both groups of items between graduates of Liaison Committee on Medical Education (LCME)-accredited and non-LCME-accredited (i.e., international) medical schools. Examinee performance on cardiology test items with multimedia heart sounds improved by 12.4% over the six-year period, while performance on text-based cardiology items improved by approximately 1.4%. These results were similar for graduates of LCME-accredited and non-LCME-accredited medical schools. Examinees' ability to interpret auscultation findings in test items that include multimedia presentations increased from 2007 to 2012.
Revisiting the role of recollection in item versus forced-choice recognition memory.

PubMed

Cook, Gabriel I; Marsh, Richard L; Hicks, Jason L

2005-08-01

Many memory theorists have assumed that forced-choice recognition tests can rely more on familiarity, whereas item (yes-no) tests must rely more on recollection. In actuality, several studies have found no differences in the contributions of recollection and familiarity underlying the two different test formats. Using word frequency to manipulate stimulus characteristics, the present study demonstrated that the contributions of recollection to item versus forced-choice tests are variable. Low word frequency resulted in significantly more recollection in an item test than in a forced-choice procedure, but high word frequency produced the opposite result. These results clearly constrain any uniform claim about the degree to which recollection supports responding in item versus forced-choice tests.
A Comparison of Methods of Vertical Equating.

ERIC Educational Resources Information Center

Loyd, Brenda H.; Hoover, H. D.

Rasch model vertical equating procedures were applied to three mathematics computation tests for grades six, seven, and eight. Each level of the test was composed of 45 items in three sets of 15 items, arranged in such a way that tests for adjacent grades had two sets (30 items) in common, and the sixth and eighth grades had 15 items in common. In…
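One common way to use such shared items under the Rasch model is mean-mean linking: the difference between the mean difficulties of the common items in the two calibrations is added to every difficulty estimated for the new level. The sketch below assumes that approach; it is not necessarily one of the procedures the study compared, and all difficulty values are hypothetical.

```python
from statistics import mean


def rasch_link_constant(base_common, new_common):
    """Shift placing new-level Rasch difficulties on the base scale,
    estimated from the same common items calibrated on both levels."""
    if len(base_common) != len(new_common) or not base_common:
        raise ValueError("common-item lists must be non-empty and aligned")
    return mean(base_common) - mean(new_common)


def to_base_scale(new_difficulties, shift):
    """Apply the linking shift to every difficulty from the new calibration."""
    return [b + shift for b in new_difficulties]


# Hypothetical difficulties of shared items: grade 6 scale vs grade 7 calibration.
grade6_common = [-0.40, 0.10, 0.55, 1.20]
grade7_common = [-1.10, -0.55, -0.20, 0.45]
shift = rasch_link_constant(grade6_common, grade7_common)
grade7_on_grade6 = to_base_scale([-1.3, -0.2, 0.6, 1.4], shift)
print(round(shift, 3), [round(b, 3) for b in grade7_on_grade6])
```

Because the grade 6 and grade 8 tests also share 15 items, a grade 8 scale could be reached either by chaining two shifts through grade 7 or directly; comparing the two results is one simple consistency check.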
Ability or Access-Ability: Differential Item Functioning of Items on Alternate Performance-Based Assessment Tests for Students with Visual Impairments

ERIC Educational Resources Information Center

Zebehazy, Kim T.; Zigmond, Naomi; Zimmerman, George J.

2012-01-01

Introduction: This study investigated differential item functioning (DIF) of test items on Pennsylvania's Alternate System of Assessment (PASA) for students with visual impairments and severe cognitive disabilities and what the reasons for the differences may be. Methods: The Wilcoxon signed ranks test was used to analyze differences in the scores…

Objective and Item Banking Computer Software and Its Use in Comprehensive Achievement Monitoring.

ERIC Educational Resources Information Center

Schriber, Peter E.; Gorth, William P.

The current emphasis on objectives and test item banks for constructing more effective tests is being augmented by increasingly sophisticated computer software. Items can be catalogued in numerous ways for retrieval. The items as well as instructional objectives can be stored and test forms can be selected and printed by the computer. It is also…
Data Collection using the MetalMapper in Dynamic Data Acquisition and Cued Modes

DTIC Science & Technology

2017-07-01

land mines, pyrotechnics, bombs, and demolition materials. Surface sweeps identified MEC items throughout Units 11 and 12, including 37mm, 40mm ... munitions testing and as impact areas for 4.2-in mortars, large caliber projectiles (75mm–155mm), and numerous types of bombs. With the exception of some ... RSA-073 includes AN-M76 bombs, PT1 (incendiary mixture similar to “goop”)-filled; M47-type bombs, Isobutyl Methacrylate Incendiary Mix (IM-AE)- and
Investigating expectation effects using multiple physiological measures

PubMed Central

Siller, Alexander; Ambach, Wolfgang; Vaitl, Dieter

2015-01-01

The study aimed to investigate experimentally whether the human body can anticipate future events under improved methodological conditions. Previous studies have reported contradictory results for the phenomenon typically called presentiment. If the positive findings are accurate, they call into doubt our views about human perception; if they are inaccurate, a plausible conventional explanation might lie in the experimental design of the previous studies, in which expectation due to item sequences was misinterpreted as presentiment. To address these points, we opted to collect several physiological variables, to test different randomization types and to manipulate subjective significance individually. For the latter, we combined a mock crime scenario, in which participants had to steal specific items, with a concealed information test (CIT), in which the participants had to conceal their knowledge when interrogated about items they had stolen or not stolen. We measured electrodermal activity, respiration, finger pulse, heart rate (HR), and reaction times. The participants (n = 154) were assigned randomly to four different groups. Items presented in the CIT were either drawn with replacement (full) or without replacement (pseudo) and were either presented category-wise (cat) or regardless of categories (nocat). To understand how these item sequences influence expectation and modulate physiological reactions, we compared the groups with respect to effect sizes for stolen vs. not stolen items. Group pseudo_cat yielded the highest effect sizes, and pseudo_nocat yielded the lowest. We could not find any evidence of presentiment but did find evidence of physiological correlates of expectation. Because the design differs fundamentally from those of previous studies, these findings do not allow conclusions on whether the expectation bias is being confounded with presentiment. PMID:26500600
Validation of the CMT Pediatric Scale as an outcome measure of disability

PubMed Central

Burns, Joshua; Ouvrier, Robert; Estilow, Tim; Shy, Rosemary; Laurá, Matilde; Pallant, Julie F.; Lek, Monkol; Muntoni, Francesco; Reilly, Mary M.; Pareyson, Davide; Acsadi, Gyula; Shy, Michael E.; Finkel, Richard S.

2012-01-01

Objective Charcot-Marie-Tooth disease (CMT) is a common heritable peripheral neuropathy. There is no treatment for any form of CMT, although clinical trials are increasingly occurring. Patients usually develop symptoms during the first two decades of life, but there are no established outcome measures of disease severity or response to treatment. We identified a set of items that represent a range of impairment levels and conducted a series of validation studies to build a patient-centered multi-item rating scale of disability for children with CMT. Methods As part of the Inherited Neuropathies Consortium, patients aged 3–20 years with a variety of CMT types were recruited from the USA, UK, Italy and Australia. Initial development stages involved: definition of the construct, item pool generation, peer review and pilot testing. Based on data from 172 patients, a series of validation studies were conducted, including: item and factor analysis, reliability testing, Rasch modeling and sensitivity analysis. Results Seven areas for measurement were identified (strength, dexterity, sensation, gait, balance, power, endurance), and a psychometrically robust 11-item scale was constructed (Charcot-Marie-Tooth disease Pediatric Scale: CMTPedS). Rasch analysis supported the viability of the CMTPedS as a unidimensional measure of disability in children with CMT. It showed good overall model fit, no evidence of misfitting items, no person misfit, and it was well targeted for children with CMT. Interpretation The CMTPedS is a well-tolerated outcome measure that can be completed in 25 minutes. It is a reliable, valid and sensitive global measure of disability for children with CMT from the age of 3 years. PMID:22522479
Assessing Mathematics Self-Efficacy: How Many Categories Do We Really Need?

ERIC Educational Resources Information Center

Toland, Michael D.; Usher, Ellen L.

2016-01-01

The present study tested whether a reduced number of categories is optimal for assessing mathematics self-efficacy among middle school students using a 6-point Likert-type format or a 0- to 100-point format. Two independent samples of middle school adolescents (N = 1,913) were administered a 24-item Middle School Mathematics Self-Efficacy Scale…
Can Dual Processing Theory Explain Physics Students' Performance on the Force Concept Inventory?

ERIC Educational Resources Information Center

Wood, Anna K.; Galloway, Ross K.; Hardy, Judy

2016-01-01

According to dual processing theory, there are two types, or modes, of thinking: system 1, which involves intuitive and nonreflective thinking, and system 2, which is more deliberate and requires conscious effort and thought. The Cognitive Reflection Test (CRT) is a widely used and robust three-item instrument that measures the tendency to override…
Judgment: Analyzing Fallacies and Weaknesses in Arguments: Grades 7-12.

ERIC Educational Resources Information Center

Instructional Objectives Exchange, Los Angeles, CA.

Objectives, with sample test items and explanations of answers are presented for instruction in judgment and logic in analyzing fallacies and weaknesses in arguments. This type of material is not usually taught in pre-college curricula, but has been geared for the secondary grades. Each fallacy is explained after the stated objective, and answers…
Comparing the IRT Pre-equating and Section Pre-equating: A Simulation Study.

ERIC Educational Resources Information Center

Hwang, Chi-en; Cleary, T. Anne

The results obtained from two basic types of pre-equatings of tests were compared: the item response theory (IRT) pre-equating and section pre-equating (SPE). The simulated data were generated from a modified three-parameter logistic model with a constant guessing parameter. Responses of two replication samples of 3000 examinees on two 72-item…
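The abstract does not say how the three-parameter logistic model was modified, so this sketch shows only the stated feature: a guessing parameter `c` held constant across all items while 0/1 responses are simulated. The 1.7 scaling constant, the item parameters, and the random seed are assumptions for illustration, not the study's settings.

```python
import math
import random


def p_3pl(theta, a, b, c, d=1.7):
    """3PL probability of a correct response; d = 1.7 is the usual scaling constant."""
    return c + (1.0 - c) / (1.0 + math.exp(-d * a * (theta - b)))


def simulate_responses(thetas, items, c=0.2, seed=0):
    """Simulate a persons x items matrix of 0/1 responses.

    items: list of (a, b) pairs. The same guessing parameter c is used for
    every item, matching the 'constant guessing parameter' description.
    """
    rng = random.Random(seed)  # seeded so a replication can be reproduced
    return [[1 if rng.random() < p_3pl(t, a, b, c) else 0 for a, b in items]
            for t in thetas]


# Hypothetical abilities and item parameters.
gen = random.Random(42)
abilities = [gen.gauss(0.0, 1.0) for _ in range(5)]
responses = simulate_responses(abilities, [(1.0, -0.5), (0.8, 0.0), (1.3, 0.7)])
print(responses)
```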
Maximum Likelihood Item Easiness Models for Test Theory without an Answer Key

ERIC Educational Resources Information Center

France, Stephen L.; Batchelder, William H.

2015-01-01

Cultural consensus theory (CCT) is a data aggregation technique with many applications in the social and behavioral sciences. We describe the intuition and theory behind a set of CCT models for continuous type data using maximum likelihood inference methodology. We describe how bias parameters can be incorporated into these models. We introduce…
Development and Evaluation of a Questionnaire to Assess Physical Educators' Knowledge of Student Assessment

ERIC Educational Resources Information Center

Emmanouilidou, Kyriaki; Derri, Vassiliki; Aggelousis, Nicolaos; Vassiliadou, Olga

2012-01-01

The purpose of this pilot study was to develop and evaluate an instrument for measuring Greek elementary physical educators' knowledge of student assessment. A multiple-choice questionnaire composed of items about concepts, methods, tools, and types of student assessment in physical education was designed and tested. The initial 35-item…
Towards a Theory of Learning for Naming Rehabilitation: Retrieval Practice and Spacing Effects

PubMed Central

Schwartz, Myrna F.; Rawson, Katherine A.; Traut, Hilary; Verkuilen, Jay

2016-01-01

Purpose The purpose of this article was to examine how different types of learning experiences affect naming impairment in aphasia. Methods In 4 people with aphasia with naming impairment, we compared the benefits of naming treatment that emphasized retrieval practice (practice retrieving target names from long-term memory) with errorless learning (repetition training, which preempts retrieval practice) according to different schedules of learning. The design was within subjects. Items were administered for multiple training trials for retrieval practice or repetition in a spaced schedule (an item's trials were separated by multiple unrelated trials) or massed schedule (1 trial intervened between an item's trials). In the spaced condition, we studied 3 magnitudes of spacing to evaluate the impact of effortful retrieval during training on the ultimate benefits conferred by retrieval practice naming treatment. The primary outcome was performance on a retention test of naming after 1 day, with a follow-up test after 1 week. Results Group analyses revealed that retrieval practice outperformed errorless learning, and spaced learning outperformed massed learning at retention test and at follow-up. Increases in spacing in the retrieval practice condition yielded more robust learning of retrieved information. Conclusion This study delineates the importance of retrieval practice and spacing for treating naming impairment in aphasia. PMID:27716858
Study on a novel core module based on optical fiber bundles for urine dry-chemistry analysis

NASA Astrophysics Data System (ADS)

Liu, Gaiqin; Ma, Zengwei; Li, Rui; Hu, Nan; Chen, Ping; Wang, Fei; Zhang, Ruiying; Chen, Longcong

2017-09-01

A core module with a novel optical structure for analyzing urine by the dry-chemistry method is presented in this paper. It consists of a 32-bit microprocessor, optical fiber bundles, a high-precision color sensor and a temperature sensor. The optical fiber bundles control the path of the light and effectively reduce the influence of ambient light and of the distance between the strip and the sensor. The temperature sensor detects the ambient temperature so that the measurement results can be calibrated. Together, these features improve the module's test accuracy, reduce its volume and cost, and simplify its assembly. Additionally, some parameters, including the reflectivity calculation coefficient of each item, the semi-quantitative intervals, and the number of test items, can be modified by corresponding instructions to enhance its applicability. Meanwhile, its output can be selected from the original data, normalized color values, reflectivity, and the semi-quantitative level of each test item through the available instructions. Our results show that the module has a measurement accuracy of more than 95%, good stability, reliability, and consistency, and can be easily used in various types of urine analyzers.
The impact of environmental and demographic factors on nursing job satisfaction.

PubMed

Rahnavard, Farnaz; Sadati, Ahmad Kalateh; Hemmati, Sorror; Ebrahimzade, Najmeh; Sarikhani, Yaser; Heydari, Seyed Taghi; Lankarani, Kamran Bagheri

2018-04-01

This study aims to evaluate all aspects of job satisfaction in registered nurses working in different hospitals in Shiraz, Iran. This cross-sectional study was performed from February to August 2015 in Shiraz, Iran. It comprised 371 registered nurses working in government and private hospitals, selected using multi-stage cluster sampling. Job satisfaction was evaluated using the 5 items of the Job Descriptive Index (JDI), consisting of 63 questions, developed by Smith, Kendall, and Hulin (1969). Data were analyzed in SPSS version 15.0 using descriptive statistics, independent-samples t-tests, and one-way analysis of variance (ANOVA) to identify the relationships between job satisfaction and both demographic features and work environment. Our findings showed no relationship between demographic variables and job satisfaction. However, significant associations were observed between environmental aspects, such as work rotation (fixed versus rotating), nurses' status (staff vs. supervisors) and type of hospital (governmental vs. private), and the work (p<0.01), promotion (p<0.02) and pay (p<0.01) items, respectively, with the exception that type of hospital was not associated with promotion. Also, regarding the number of shifts per week, nurses working more than eight shifts had a significantly lower mean satisfaction score for pay (p=0.03). The results indicate that younger nurses have different types of satisfaction depending on several environmental factors. Nursing policy makers must pay more attention to nurses' satisfaction and focus on reducing the various inequalities.
Older and Wiser: Older Adults’ Episodic Word Memory Benefits from Sentence Study Contexts

PubMed Central

Matzen, Laura E.; Benjamin, Aaron S.

2013-01-01

A hallmark of adaptive cognition is the ability to modulate learning in response to the demands posed by different types of tests and different types of materials. Here we evaluate how older adults process words and sentences differently by examining patterns of memory errors. In two experiments, we explored younger and older adults’ sensitivity to lures on a recognition test following study of words in these two types of contexts. Among the studied words were compound words such as “blackmail” and “jailbird” that were related to conjunction lures (e.g. “blackbird”) and semantic lures (e.g. “criminal”). Participants engaged in a recognition test that included old items, conjunction lures, semantic lures, and unrelated new items. In both experiments, younger and older adults had the same general pattern of memory errors: more incorrect endorsements of semantic than conjunction lures following sentence study and more incorrect endorsements of conjunction than semantic lures following list study. The similar pattern reveals that older and younger adults responded to the constraints of the two different study contexts in similar ways. However, while younger and older adults showed similar levels of memory performance for the list study context, the sentence study context elicited superior memory performance in the older participants. It appears as though memory tasks that take advantage of greater expertise in older adults--in this case, greater experience with sentence processing--can reveal superior memory performance in the elderly. PMID:23834493
An Item-Driven Adaptive Design for Calibrating Pretest Items. Research Report. ETS RR-14-38

ERIC Educational Resources Information Center

Ali, Usama S.; Chang, Hua-Hua

2014-01-01

Adaptive testing is advantageous in that it provides more efficient ability estimates with fewer items than linear testing does. Item-driven adaptive pretesting may also offer similar advantages, and verification of such a hypothesis about item calibration was the main objective of this study. A suitability index (SI) was introduced to adaptively…
Fitting the Rasch Model to Account for Variation in Item Discrimination

ERIC Educational Resources Information Center

Weitzman, R. A.

2009-01-01

Building on the Kelley and Gulliksen versions of classical test theory, this article shows that a logistic model having only a single item parameter can account for varying item discrimination, as well as difficulty, by using item-test correlations to adjust incorrect-correct (0-1) item responses prior to an initial model fit. The fit occurs…
Examination of Polytomous Items' Psychometric Properties According to Nonparametric Item Response Theory Models in Different Test Conditions

ERIC Educational Resources Information Center

Sengul Avsar, Asiye; Tavsancil, Ezel

2017-01-01

This study analysed polytomous items' psychometric properties according to nonparametric item response theory (NIRT) models. To this end, simulated datasets--three different test lengths (10, 20 and 30 items), three sample distributions (normal, right and left skewed) and three sample sizes (100, 250 and 500)--were generated by conducting 20…
Rasch Measurement and Item Banking: Theory and Practice.

ERIC Educational Resources Information Center

Nakamura, Yuji

The Rasch model is a one-parameter item response theory model which states that the probability of a correct response to a test item is a function of the difficulty of the item and the ability of the candidate. Item banking is useful for language testing. The Rasch Model provides estimates of item difficulties that are meaningful,…
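As a minimal illustration (not taken from the report), the Rasch response probability can be written as P(X = 1 | θ, b) = exp(θ − b) / (1 + exp(θ − b)), where θ is the candidate's ability and b the item's difficulty. The Python sketch below assumes both are expressed on the same logit scale; the function name is hypothetical.

```python
import math


def rasch_probability(theta: float, b: float) -> float:
    """Probability of a correct response under the Rasch (one-parameter) model.

    theta: candidate ability on the logit scale (illustrative input)
    b: item difficulty on the same logit scale
    """
    return 1.0 / (1.0 + math.exp(-(theta - b)))


# When ability equals difficulty, the model predicts a 50% chance of success.
print(rasch_probability(0.0, 0.0))  # 0.5
# Each logit of ability above the item's difficulty multiplies the odds of success by e.
p = rasch_probability(1.0, 0.0)
print(p / (1.0 - p))
```

Because only the difference θ − b enters the formula, item difficulties and abilities share one scale, which is what makes the difficulty estimates usable across an item bank.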
Test Design Project: Studies in Test Bias. Annual Report.

ERIC Educational Resources Information Center

McArthur, David

Item bias in a multiple-choice test can be detected by appropriate analyses of the persons x items scoring matrix. This permits comparison of groups of examinees tested with the same instrument. The test may be biased if it is not measuring the same thing in comparable groups, if groups are responding to different aspects of the test items, or if…
The Impact of Settable Test Item Exposure Control Interface Format on Postsecondary Business Student Test Performance

ERIC Educational Resources Information Center

Truell, Allen D.; Zhao, Jensen J.; Alexander, Melody W.

2005-01-01

The purposes of this study were to determine if there is a significant difference in postsecondary business student scores and test completion time based on settable test item exposure control interface format, and to determine if there is a significant difference in student scores and test completion time based on settable test item exposure…

Identifying the Source of Misfit in Item Response Theory Models.

PubMed

Liu, Yang; Maydeu-Olivares, Alberto

2014-01-01

When an item response theory model fails to fit adequately, the items for which the model provides a good fit and those for which it does not must be determined. To this end, we compare the performance of several fit statistics for item pairs with known asymptotic distributions under maximum likelihood estimation of the item parameters: (a) a mean and variance adjustment to bivariate Pearson's X², (b) a bivariate subtable analog to Reiser's (1996) overall goodness-of-fit test, (c) a z statistic for the bivariate residual cross product, and (d) Maydeu-Olivares and Joe's (2006) M2 statistic applied to bivariate subtables. The unadjusted Pearson's X² with heuristically determined degrees of freedom is also included in the comparison. For binary and ordinal data, our simulation results suggest that the z statistic has the best Type I error and power behavior among all the statistics under investigation when the observed information matrix is used in its computation. However, if one has to use the cross-product information, the mean and variance adjusted X² is recommended. We illustrate the use of pairwise fit statistics in 2 real-data examples and discuss possible extensions of the current research in various directions.
Development and validation of the neck dissection impairment index: a quality of life measure.

PubMed

Taylor, Rodney J; Chepeha, Judith C; Teknos, Theodoros N; Bradford, Carol R; Sharma, Pramod K; Terrell, Jeffrey E; Hogikyan, Norman D; Wolf, Gregory T; Chepeha, Douglas B

2002-01-01

To validate a health-related quality-of-life (QOL) instrument for patients following neck dissection and to identify the factors that affect QOL following neck dissection. Cross-sectional validation study. The outpatient clinic of a tertiary care cancer center. Convenience sample of 54 patients previously treated for head and neck cancer who underwent a selective neck dissection or modified radical neck dissection (64 total neck dissections). Patients had a minimum postoperative convalescence of 11 months. Thirty-two underwent accessory nerve-sparing modified radical neck dissection, and 32 underwent selective neck dissection. A 10-item, self-report instrument, the Neck Dissection Impairment Index (NDII), was developed and validated. Reliability was evaluated with test-retest correlation and internal consistency using the Cronbach alpha coefficient. Convergent validity was assessed using the 36-Item Short-Form Health Survey (SF-36) and the Constant Shoulder Scale, a shoulder function test. Multiple variable regression was used to determine the variables that most affected QOL following neck dissection. The 10-item NDII test-retest correlation was 0.91 (P<.001), with an internal consistency Cronbach alpha coefficient of 0.95. The NDII correlated with the Constant Shoulder Scale (r = 0.85, P<.001) and with the SF-36 physical functioning (r = 0.50, P<.001) and role-physical functioning (r = 0.60, P<.001) domains. Using multiple variable regression, the variables that contributed most to QOL score were patient's age and weight, radiation treatment, and neck dissection type. The NDII is a valid, reliable instrument for assessing neck dissection impairment. Patient's age, weight, radiation treatment, and neck dissection type were important factors that affect QOL following neck dissection.
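The internal-consistency figure above is Cronbach's alpha, α = k/(k − 1) · (1 − Σ σᵢ² / σ²_X), where k is the number of items, σᵢ² the variance of item i, and σ²_X the variance of respondents' total scores. A minimal Python sketch, assuming a complete respondents-by-items score matrix with no missing data; the data layout, function name, and example scores are illustrative and not taken from the NDII study.

```python
from statistics import pvariance


def cronbach_alpha(scores: list[list[float]]) -> float:
    """Cronbach's alpha for a respondents x items score matrix.

    scores[r][i] is respondent r's score on item i (illustrative layout).
    Population variances are used; sample variances give the same alpha
    because the n / (n - 1) factors cancel in the ratio.
    """
    k = len(scores[0])  # number of items
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical scores: 3 respondents answering a 2-item scale.
print(cronbach_alpha([[1, 2], [2, 1], [3, 3]]))
```

Alpha rises toward 1 as items covary more strongly relative to their individual variances; a real 10-item instrument would simply have ten columns.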
Estimating Total-Test Scores from Partial Scores in a Matrix Sampling Design.

ERIC Educational Resources Information Center

Sachar, Jane; Suppes, Patrick

1980-01-01

The present study compared six methods, two of which utilize the content structure of items, to estimate total-test scores using data from 450 students and 60 items of the 110-item Stanford Mental Arithmetic Test. Three methods yielded fairly good estimates of the total-test score. (Author/RL)
The GRACE checklist for rating the quality of observational studies of comparative effectiveness: a tale of hope and caution.

PubMed

Dreyer, Nancy A; Velentgas, Priscilla; Westrich, Kimberly; Dubois, Robert

2014-03-01

While there is growing demand for information about comparative effectiveness (CE), there is substantial debate about whether and when observational studies have sufficient quality to support decision making. The aim was to develop and test a checklist that can be used to identify observational CE studies that are sufficiently rigorous in design and execution to contribute meaningfully to the evidence base for decision support. An 11-item checklist about data and methods (the GRACE checklist) was developed through literature review and consultation with experts from professional societies, payer groups, the private sector, and academia. Since no single gold standard exists for validation, checklist item responses were compared with 3 different types of external quality ratings (N=88 articles). The articles compared treatment effectiveness and/or safety of drugs, medical devices, and medical procedures. We validated checklist item responses in 3 ways against external quality ratings, using published articles of observational CE or safety studies: (a) Systematic Review-quality assessment from a published systematic review; (b) Single Expert Review-quality assessment made according to the solicited "expert opinion" of a senior researcher; and (c) Concordant Expert Review-quality assessments from 2 experts for which there was concordance. Volunteers (N=113) from 5 continents completed 280 article assessments using the checklist. Positive and negative predictive values (PPV, NPV, respectively) of individual items were estimated to compare testers' assessments with those of experts. Taken as a whole, the scale had better NPV than PPV, for both data and methods. The most consistent predictor of quality relates to the validity of the primary outcomes measurement for the study purpose. Other consistent markers of quality relate to using concurrent comparators, minimizing the effects of bias by prudent choice of covariates, and using sensitivity analysis to test robustness of results.
Concordance of expert opinion on the quality of the rated articles was 52%; most checklist items performed better. The 11-item GRACE checklist provides guidance to help determine which observational studies of CE have used strong scientific methods and good data that are fit for purpose and merit consideration for decision making. The checklist contains a parsimonious set of elements that can be objectively assessed in published studies, and user testing shows that it can be successfully applied to studies of drugs, medical devices, and clinical and surgical interventions. Although no scoring is provided, study reports that rate relatively well across checklist items merit in-depth examination to understand applicability, effect size, and likelihood of residual bias. The current testing and validation efforts did not achieve clear discrimination between studies fit for purpose and those not, but we have identified a critical, though remediable, limitation in our approach. Not specifying a granular decision for evaluation, or not identifying a single study objective in reports that included more than one, left reviewers with too broad an assessment challenge. We believe that future efforts will be more successful if reviewers are asked to focus on a specific objective or question. Despite the challenges encountered in this testing, an agreed-upon set of assessment elements, checklists, or score cards is critical for the maturation of this field. Substantial resources will be expended on studies of real-world effectiveness, and if the rigor of these observational assessments cannot be assessed, then the impact of the studies will be suboptimal. Similarly, agreement on key elements of quality will ensure that budgets are appropriately directed toward those elements.
Given the importance of this task and the lessons learned from these extensive efforts at validation and user testing, we are optimistic about the potential for improved assessments that can be used for diverse situations by people with a wide range of experience and training. Future testing would benefit from directing reviewers to address a single, granular research question (which would avoid the problems that arose when the checklist was used to evaluate multiple objectives), from using other types of validation test sets, and from employing further multivariate analysis to see whether any combination or sequence of item responses has particularly high predictive validity.
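The PPV and NPV comparisons described above reduce to a 2×2 table that cross-classifies each checklist judgement against the external (systematic review or expert) rating. A minimal Python sketch, assuming each assessment has already been dichotomized into "good quality" vs. "not good quality"; the class name, the positive/negative coding, and the counts are hypothetical and are not the GRACE validation data.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """2x2 agreement table between checklist ratings and a reference rating.

    'Positive' is an assumed coding meaning "rated good quality".
    """
    tp: int  # checklist positive, reference positive
    fp: int  # checklist positive, reference negative
    fn: int  # checklist negative, reference positive
    tn: int  # checklist negative, reference negative

    def ppv(self) -> float:
        # Share of checklist positives that the reference also rates positive.
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        # Share of checklist negatives that the reference also rates negative.
        return self.tn / (self.tn + self.fn)


# Illustrative counts only.
table = Confusion(tp=30, fp=20, fn=5, tn=45)
print(table.ppv(), table.npv())  # 0.6 0.9
```

With these made-up counts the table shows the same qualitative pattern the abstract reports (NPV above PPV): a negative checklist rating is more trustworthy than a positive one.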
Adults Living with Type 2 Diabetes: Kept Personal Health Information Items as Expressions of Need

ERIC Educational Resources Information Center

Whetstone, Melinda

2013-01-01

This study investigated personal information behavior and information needs that 21 adults managing life with Type 2 diabetes identify explicitly and implicitly during discussions of item acquisition and use of health information items that are kept in their homes. Research drew upon a naturalistic lens, in that semi-structured interviews were…
41 CFR 101-27.209 - Utilization and distribution of shelf-life items.

Code of Federal Regulations, 2014 CFR

2014-07-01

... distribution of shelf-life items. 101-27.209 Section 101-27.209 Public Contracts and Property Management... PROCUREMENT 27-INVENTORY MANAGEMENT 27.2-Management of Shelf-Life Materials § 101-27.209 Utilization and distribution of shelf-life items. Where it is determined that specified quantities of both Type I and Type II...
An examination of gender bias on the eighth-grade MEAP science test as it relates to the Hunter Gatherer Theory of Spatial Sex Differences

NASA Astrophysics Data System (ADS)

Armstrong-Hall, Judy Gail

The purpose of this study was to apply the Hunter-Gatherer Theory of sex spatial skills to responses to individual questions by eighth-grade students on the Science component of the Michigan Educational Assessment Program (MEAP) to determine whether sex bias was inherent in the test. The Hunter-Gatherer Theory on Spatial Sex Differences, an original theory, suggests a spatial dimorphism in which female spatial skill involves pattern recall of unconnected items and male spatial skills require mental movement. This is the first attempt to apply the Hunter-Gatherer Theory on Spatial Sex Differences to a standardized test. An overall hypothesis suggested that the Hunter-Gatherer Theory of Spatial Sex Differences could predict that males would perform better on problems involving mental movement and females would do better on problems involving the pattern recall of unconnected items. Responses to questions on the 1994-95 MEAP requiring the use of male spatial skills and female spatial skills were analyzed for 5,155 eighth-grade students. A panel composed of five educators and a theory developer determined which test items involved the use of male and female spatial skills. A MANOVA comparing male and female correct scores in a random sample of 20% of the 5,155 students was statistically significant, with males having higher scores on male spatial skills items and females having higher scores on female spatial skills items. Pearson product-moment correlation analyses produced a positive correlation for both male and female performance on both types of spatial skills. The Hunter-Gatherer Theory of Spatial Sex Differences appears to be able to predict that males could perform better on the problems involving mental movement and females could perform better on problems involving the pattern recall of unconnected items.
Recommendations for further research included: examination of male/female spatial skill differences at early elementary and high school levels to determine the impact of gender on difficulties in solving spatial problems; investigation of the relationship between dominant female spatial skills and a diagnosis of ADHD; and study of the effects of teaching male spatial skills to female students, starting in early elementary school, on standardized test performance.
A Generalized DIF Effect Variance Estimator for Measuring Unsigned Differential Test Functioning in Mixed Format Tests

ERIC Educational Resources Information Center

Penfield, Randall D.; Algina, James

2006-01-01

One approach to measuring unsigned differential test functioning is to estimate the variance of the differential item functioning (DIF) effect across the items of the test. This article proposes two estimators of the DIF effect variance for tests containing dichotomous and polytomous items. The proposed estimators are direct extensions of the…
A pilot study: The effects of repeat washing and fabric type on the detection of seminal fluid and spermatozoa.

PubMed

Nolan, A; Speers, Samuel J; Murakami, Julie; Chapman, Brendan

2018-05-19

In sexual assault cases and more specifically those involving childhood sexual abuse (CSA), victims may have had their potentially semen-stained clothing washed multiple times before a criminal investigation commences. Although it has been previously demonstrated that spermatozoa persist on cotton clothing following a single wash cycle, items of clothing washed multiple times are not routinely examined in these cases because of the assumption that the laundering process would have removed all seminal fluid and spermatozoa. The aim of this study was to examine the persistence of seminal fluid and spermatozoa on a range of fabric types including cotton, nylon, terry towel (100% cotton), polyester fleece, satin and lace which were laundered up to six times. Three techniques were used for the detection of seminal fluid and spermatozoa: an alternative light source, acid phosphatase test and microscopy. The study demonstrated that spermatozoa persisted on cotton and terry towel following six wash cycles. These data emphasise the need to recover and examine items of clothing and bedding of victims for semen, even if the item has been washed multiple times. Copyright © 2018 Elsevier B.V. All rights reserved.
Potential Damage to Flight Hardware from MIL-STD-462 CS02 Setup

NASA Technical Reports Server (NTRS)

Harris, Patrick K.; Block, Nathan F.

2003-01-01

The MIL-STD-462 CS02 conducted susceptibility test setup includes an audio transformer, with the secondary used as an inductor, and a large capacitor. Together, these two components form an L-type low-pass filter to minimize injection of the test signal into the power source. Some flight hardware power input configurations are not compatible with this setup and break into oscillation when powered up. This, in turn, can damage flight hardware. Such an oscillation resulted in the catastrophic failure of an item tested in the Goddard Space Flight Center (GSFC) Large electromagnetic compatibility (EMC) Test Facility.
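For context, an ideal series-L, shunt-C low-pass section resonates at f₀ = 1/(2π√(LC)) and has characteristic impedance Z₀ = √(L/C). As a general power-electronics rule of thumb (Middlebrook's criterion, not a finding of this report), a downstream switching regulator can oscillate when the filter's output impedance, which peaks near f₀, is not well below the magnitude of the regulator's negative incremental input resistance, roughly V²/P. A minimal Python sketch; the component values, bus voltage, and load power are hypothetical and are not the values specified by MIL-STD-462 CS02.

```python
import math


def lc_filter_characteristics(inductance_h: float, capacitance_f: float) -> tuple[float, float]:
    """Return (resonant frequency in Hz, characteristic impedance in ohms)
    of an ideal, lossless L-type (series L, shunt C) low-pass filter."""
    f0 = 1.0 / (2.0 * math.pi * math.sqrt(inductance_h * capacitance_f))
    z0 = math.sqrt(inductance_h / capacitance_f)
    return f0, z0


def regulator_negative_resistance(voltage_v: float, power_w: float) -> float:
    """Magnitude of the incremental input resistance, V**2 / P, of an ideal
    constant-power load such as a tightly regulated switching converter."""
    return voltage_v ** 2 / power_w


# Hypothetical values for illustration only.
f0, z0 = lc_filter_characteristics(inductance_h=100e-6, capacitance_f=10e-6)
r_neg = regulator_negative_resistance(voltage_v=28.0, power_w=50.0)
print(f"f0 = {f0:.0f} Hz, Z0 = {z0:.2f} ohm, |R_in| = {r_neg:.2f} ohm")
```

An undamped filter's output impedance can greatly exceed Z₀ near resonance, so comparing Z₀ alone with |R_in| understates the risk; the sketch only shows which quantities such a screening check would compare.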
Item response theory analysis of the mechanics baseline test

NASA Astrophysics Data System (ADS)

Cardamone, Caroline N.; Abbott, Jonathan E.; Rayyan, Saif; Seaton, Daniel T.; Pawl, Andrew; Pritchard, David E.

2012-02-01

Item response theory is useful in both the development and evaluation of assessments and in computing standardized measures of student performance. In item response theory, individual parameters (difficulty, discrimination) for each item or question are fit by item response models. These parameters provide a means for evaluating a test and offer a better measure of student skill than a raw test score, because each skill calculation considers not only the number of questions answered correctly, but the individual properties of all questions answered. Here, we present the results from an analysis of the Mechanics Baseline Test given at MIT during 2005-2010. Using the item parameters, we identify questions on the Mechanics Baseline Test that are not effective in discriminating between MIT students of different abilities. We show that a limited subset of the highest quality questions on the Mechanics Baseline Test returns accurate measures of student skill. We compare student skills as determined by item response theory to the more traditional measurement of the raw score and show that a comparable measure of learning gain can be computed.
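To illustrate why an IRT ability estimate can differ from a raw score, the sketch below uses the two-parameter logistic (2PL) model, P(correct | θ) = 1 / (1 + exp(−a(θ − b))), with discrimination a and difficulty b. This is an assumption for illustration, not the study's calibration: the item parameters are invented, and ability is estimated by a crude grid-search maximum likelihood rather than by whatever estimation procedure was used for the MIT data.

```python
import math


def p_correct(theta: float, a: float, b: float) -> float:
    """Two-parameter logistic (2PL) probability of a correct answer."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def log_likelihood(theta, responses, items):
    """Log-likelihood of a 0/1 response pattern; items is a list of (a, b)."""
    ll = 0.0
    for x, (a, b) in zip(responses, items):
        p = p_correct(theta, a, b)
        ll += math.log(p) if x == 1 else math.log(1.0 - p)
    return ll


def estimate_theta(responses, items, lo=-4.0, hi=4.0, steps=8001):
    """Maximum-likelihood ability estimate by a simple grid search on [lo, hi]."""
    grid = [lo + (hi - lo) * k / (steps - 1) for k in range(steps)]
    return max(grid, key=lambda t: log_likelihood(t, responses, items))


# Hypothetical items: one highly and one weakly discriminating, both of average difficulty.
items = [(2.0, 0.0), (0.5, 0.0)]
# Two students with the same raw score (1 of 2) but different response patterns.
theta_a = estimate_theta([1, 0], items)
theta_b = estimate_theta([0, 1], items)
print(theta_a, theta_b)
```

Both hypothetical students answer one question correctly, yet the student who answers the more discriminating item correctly receives the higher estimate, because under the 2PL the likelihood weights each response by its item's discrimination.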
Computerized adaptive testing: the capitalization on chance problem.

PubMed

Olea, Julio; Barrada, Juan Ramón; Abad, Francisco J; Ponsoda, Vicente; Cuevas, Lara

2012-03-01

This paper describes several simulation studies that examine the effects of capitalization on chance in the selection of items and the ability estimation in CAT, employing the 3-parameter logistic model. In order to generate different estimation errors for the item parameters, the calibration sample size was manipulated (N = 500, 1000 and 2000 subjects) as was the ratio of item bank size to test length (banks of 197 and 788 items, test lengths of 20 and 40 items), both in a CAT and in a random test. Results show that capitalization on chance is particularly serious in CAT, as revealed by the large positive bias found in the small sample calibration conditions. For broad ranges of theta, the overestimation of the precision (asymptotic Se) reaches levels of 40%, something that does not occur with the RMSE (theta). The problem is greater as the item bank size to test length ratio increases. Potential solutions were tested in a second study, where two exposure control methods were incorporated into the item selection algorithm. Some alternative solutions are discussed.
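Maximum-information item selection is one plausible reason CAT is especially exposed to capitalization on chance: items whose discrimination was overestimated during calibration look more informative than they are and get picked preferentially. A minimal Python sketch of that selection rule under the 3PL model, using the standard item information I(θ) = D²a² (Q/P) ((P − c)/(1 − c))². The scaling constant D = 1.7, the bank layout, and the function names are assumptions for illustration; no exposure control (as studied in the paper's second study) is included.

```python
import math

D = 1.7  # conventional scaling constant so the logistic approximates the normal ogive


def p3pl(theta, a, b, c):
    """3PL probability of a correct response (c is the pseudo-guessing parameter)."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def item_information(theta, a, b, c):
    """Fisher information of a 3PL item at ability theta."""
    p = p3pl(theta, a, b, c)
    q = 1.0 - p
    return (D * a) ** 2 * (q / p) * ((p - c) / (1.0 - c)) ** 2


def select_next_item(theta_hat, bank, administered):
    """Index of the not-yet-administered item with maximum information at theta_hat.

    bank is a list of (a, b, c) tuples; administered is a set of used indices.
    """
    candidates = [i for i in range(len(bank)) if i not in administered]
    return max(candidates, key=lambda i: item_information(theta_hat, *bank[i]))


# Hypothetical bank: equal discrimination, no guessing, three difficulty levels.
bank = [(1.0, -2.0, 0.0), (1.0, 0.0, 0.0), (1.0, 2.0, 0.0)]
print(select_next_item(0.1, bank, administered=set()))  # 1
print(select_next_item(0.1, bank, administered={1}))    # 2
```

Because the rule always favors the largest estimated a near the current θ estimate, any upward error in calibrated discriminations feeds directly into both item selection and the reported precision.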
Using Likert-type and ipsative/forced choice items in sequence to generate a preference.

PubMed

Ried, L Douglas

2014-01-01

Collaboration and implementation of a minimum, standardized set of core global educational and professional competencies seems appropriate given the expanding international evolution of pharmacy practice. However, winnowing down hundreds of competencies from a plethora of local, national and international competency frameworks to select the most highly preferred to be included in the core set is a daunting task. The objective of this paper is to describe a combination of strategies used to ascertain the most highly preferred items among a large number of disparate items. In this case, the items were more than 100 educational and professional competencies that might be incorporated as the core components of new and existing competency frameworks. Panelists (n = 30) from the European Union (EU) and United States (USA) were chosen to reflect a variety of practice settings. Each panelist completed two electronic surveys. The first survey presented competencies in a Likert-type format and the second survey presented many of the same competencies in an ipsative/forced choice format. Item mean scores were calculated for each competency, the competencies were ranked, and non-parametric statistical tests were used to ascertain the consistency in the rankings achieved by the two strategies. This exploratory study presented over 100 competencies to the panelists at the outset. The two methods provided similar results, as indicated by the significant correlation between the rankings (Spearman's rho = 0.30, P < 0.09). A two-step strategy using Likert-type and ipsative/forced choice formats in sequence appears to be useful in a situation where a clear preference is required from among a large number of choices. The ipsative/forced choice format resulted in some differences in the competency preferences because the panelists could not rate them equally by design.
While this strategy was used for the selection of professional educational competencies in this exploratory study, it is applicable in other situations where a smaller set of highly preferred items might be selected from a large list of choices in other areas of inquiry (e.g., patient reported outcomes). Copyright © 2014 Elsevier Inc. All rights reserved.
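The consistency check above compares two rankings of the same competencies. A minimal Python sketch of Spearman's rho, computed as the Pearson correlation of the two rank vectors with tied values given average ranks; the mean scores in the example are hypothetical, the function names are illustrative, and no P value is computed.

```python
def ranks(values):
    """Average ranks (1 = smallest); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical item mean scores for five competencies under each survey format.
likert_means = [4.6, 4.1, 3.9, 3.9, 2.8]
forced_choice_means = [0.71, 0.40, 0.55, 0.20, 0.10]
print(round(spearman_rho(likert_means, forced_choice_means), 3))
```

Using rank correlation rather than correlating raw means sidesteps the fact that Likert-type and ipsative scores are on different, non-comparable scales.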
How does aging affect the types of error made in a visual short-term memory ‘object-recall’ task?

PubMed Central

Sapkota, Raju P.; van der Linde, Ian; Pardhan, Shahina

2015-01-01

This study examines how normal aging affects the occurrence of different types of incorrect responses in a visual short-term memory (VSTM) object-recall task. Seventeen young (Mean = 23.3 years, SD = 3.76) and 17 normally aging older (Mean = 66.5 years, SD = 6.30) adults participated. Memory stimuli comprised two or four real world objects (the memory load) presented sequentially, each for 650 ms, at random locations on a computer screen. After a 1000 ms retention interval, a test display was presented, comprising an empty box at one of the previously presented two or four memory stimulus locations. Participants were asked to report the name of the object presented at the cued location. Error rates were compared for reports naming objects that had been presented in the memory display but not at the cued location (non-target errors) vs. objects that had not been presented at all in the memory display (non-memory errors). Significant effects of aging, memory load and target recency on error type and absolute error rates were found. Non-target error rate was higher than non-memory error rate in both age groups, indicating that VSTM may have been more often than not populated with partial traces of previously presented items. At high memory load, non-memory error rate was higher in young participants (compared to older participants) when the memory target had been presented at the earliest temporal position. However, non-target error rates exhibited a reversed trend, i.e., greater error rates were found in older participants when the memory target had been presented at the two most recent temporal positions. Data are interpreted in terms of proactive interference (earlier examined non-target items interfering with more recent items), false memories (non-memory items which have a categorical relationship to presented items, interfering with memory targets), slot and flexible resource models, and spatial coding deficits. PMID:25653615
The Impact of Test Dimensionality, Common-Item Set Format, and Scale Linking Methods on Mixed-Format Test Equating

ERIC Educational Resources Information Center

Öztürk-Gübes, Nese; Kelecioglu, Hülya

2016-01-01

The purpose of this study was to examine the impact of dimensionality, common-item set format, and different scale linking methods on preserving the equity property in mixed-format test equating. Item response theory (IRT) true-score equating (TSE) and IRT observed-score equating (OSE) methods were used under common-item nonequivalent groups design.…
Traumatic stress is linked to a deficit in associative episodic memory.

PubMed

Guez, Jonathan; Naveh-Benjamin, Moshe; Yankovsky, Yan; Cohen, Jonathan; Shiber, Asher; Shalev, Hadar

2011-06-01

Individuals with posttraumatic stress disorder (PTSD) are haunted by persistent memories of the trauma, but ironically are impaired in memories of daily life. The current set of 4 experiments compared new learning and memory of emotionally neutral content in 2 groups of patients and age- and education-matched controls: 20 patients diagnosed with chronic posttraumatic stress disorder (C-PTSD) and 20 patients diagnosed with acute stress disorder (ASD). In all experiments, participants studied a list of stimulus pairs (words or pictures) and were then tested for their memory of the items, or for the association between items in each pair. Results indicated that both types of patients showed associative memory impairment compared with the control group, although their item memory performance was relatively intact. Potential mechanisms underlying such associative memory deficits in posttraumatic patients are discussed. Copyright © 2011 International Society for Traumatic Stress Studies.
Cognitive interviews to test and refine questionnaires.

PubMed

García, Alexandra A

2011-01-01

Survey data are compromised when respondents do not interpret questions in the way researchers expect. Cognitive interviews are used to detect problems respondents have in understanding survey instructions and items, and in formulating answers. This paper describes methods for conducting cognitive interviews and describes the processes and lessons learned with an illustrative case study. The case study used cognitive interviews to elicit respondents' understanding and perceptions of the format, instructions, items, and responses that make up the Diabetes Symptom Self-Care Inventory (DSSCI), a questionnaire designed to measure Mexican Americans' symptoms of type 2 diabetes and their symptom management strategies. Responses to cognitive interviews formed the basis for revisions in the format, instructions, items, and translation of the DSSCI. All those who develop and revise surveys are urged to incorporate cognitive interviews into their instrumentation methods so that they may produce more reliable and valid measurements. © 2011 Wiley Periodicals, Inc.
Attitude measurement: Judging the emotional intensity of likert-type science attitude statements

NASA Astrophysics Data System (ADS)

Shrigley, Robert L.; Koballa, Thomas R., Jr.

Emotional intensity, the readiness of a teacher to respond favorably or unfavorably toward such psychological objects as science or the teaching of science, is the quality that distinguishes the attitude concept from other related psychological concepts. It would seem, then, that valid attitude statements, if they are to reflect the definition of attitude, would evoke emotional intensity, that is, responses in both favorable and unfavorable directions from a group of teachers on each item of a science attitude scale. Science educators who design or modify science attitude scales should continue using item-total correlations and other quantitative techniques to test for emotional intensity, but qualitative judgments are necessary, too. In addition, the frequency distribution of data generated by each statement should be examined for skewness and high percentages of neutral responses, both of which can impair the emotional intensity of an item.
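The quantitative screens recommended above can be computed per statement. A minimal Python sketch, assuming complete responses on a 1 to 5 agreement scale with 3 as the neutral midpoint; the data layout, function names, midpoint coding, and example responses are illustrative assumptions. It reports the corrected item-total correlation (the item against the sum of the remaining items), the moment skewness of the item's responses, and the share of neutral answers. It does not replace the qualitative judgments the authors also call for, and it fails with division by zero for an item whose responses never vary.

```python
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation using population moments."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pstdev(x) * pstdev(y))


def screen_item(responses, item, neutral=3):
    """Screening statistics for one Likert-type statement.

    responses: list of per-teacher answer lists (1..5 scale assumed)
    item: column index of the statement being screened
    Returns (corrected item-total r, skewness, share of neutral answers).
    """
    x = [row[item] for row in responses]
    rest = [sum(row) - row[item] for row in responses]  # total excluding this item
    r = pearson(x, rest)
    m, s = mean(x), pstdev(x)
    skew = sum((v - m) ** 3 for v in x) / len(x) / s ** 3
    neutral_share = sum(v == neutral for v in x) / len(x)
    return r, skew, neutral_share


# Hypothetical answers from five teachers to a three-statement scale.
responses = [[5, 4, 5], [4, 4, 4], [2, 1, 2], [1, 2, 1], [3, 3, 3]]
print(screen_item(responses, item=0))
```

A statement with a low corrected item-total correlation, a strongly skewed distribution, or a large neutral share would be flagged for the closer qualitative review the authors recommend.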
Location Indices for Ordinal Polytomous Items Based on Item Response Theory. Research Report. ETS RR-15-20

ERIC Educational Resources Information Center

Ali, Usama S.; Chang, Hua-Hua; Anderson, Carolyn J.

2015-01-01

Polytomous items are typically described by multiple category-related parameters; situations, however, arise in which a single index is needed to describe an item's location along a latent trait continuum. Situations in which a single index would be needed include item selection in computerized adaptive testing or test assembly. Therefore single…
