selected test items: Topics by Science.gov

Sample records for selected test items

Selecting Items for Criterion-Referenced Tests.

ERIC Educational Resources Information Center

Mellenbergh, Gideon J.; van der Linden, Wim J.

1982-01-01

Three item selection methods for criterion-referenced tests are examined: the classical theory of item difficulty and item-test correlation; the latent trait theory of item characteristic curves; and a decision-theoretic approach for optimal item selection. Item contribution to the standardized expected utility of mastery testing is discussed. (CM)
The Selection of Test Items for Decision Making with a Computer Adaptive Test.

ERIC Educational Resources Information Center

Spray, Judith A.; Reckase, Mark D.

The issue of test-item selection in support of decision making in adaptive testing is considered. The number of items needed to make a decision is compared for two approaches: selecting items from an item pool that are most informative at the decision point or selecting items that are most informative at the examinee's ability level. The first…
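As a rough illustration of the two strategies compared in the abstract above, the following sketch picks the most informative item either at a decision point or at the examinee's current ability estimate. It assumes a two-parameter logistic (2PL) model and a hypothetical three-item pool. Neither the model choice nor the names `pool`, `cut_score`, and `ability_estimate` come from the record itself.

```python
import math

def p_2pl(theta, a, b):
    """Probability of a correct response under an assumed 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def info_2pl(theta, a, b):
    """Fisher information of a 2PL item at ability theta: a^2 * P * (1 - P)."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

def most_informative(pool, theta):
    """Return the name of the item with the largest information at theta."""
    return max(pool, key=lambda name: info_2pl(theta, *pool[name]))

# Hypothetical pool: name -> (discrimination a, difficulty b).
pool = {"easy": (1.0, -1.0), "medium": (1.0, 0.0), "hard": (1.0, 1.0)}
cut_score = 0.0          # assumed decision point
ability_estimate = 1.2   # assumed current ability estimate

print(most_informative(pool, cut_score))
print(most_informative(pool, ability_estimate))
```

With equal discriminations, the item whose difficulty lies closest to the evaluation point carries the most information, so the two strategies can pick different items whenever the examinee is far from the cut score.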
A New Item Selection Procedure for Mixed Item Type in Computerized Classification Testing.

ERIC Educational Resources Information Center

Lau, C. Allen; Wang, Tianyou

This paper proposes a new Information-Time index as the basis for item selection in computerized classification testing (CCT) and investigates how this new item selection algorithm can help improve test efficiency for item pools with mixed item types. It also investigates how practical constraints such as item exposure rate control, test…
A Comparison of Three Types of Test Development Procedures Using Classical and Latent Trait Methods.

ERIC Educational Resources Information Center

Benson, Jeri; Wilson, Michael

Three methods of item selection were used to select sets of 38 items from a 50-item verbal analogies test and the resulting item sets were compared for internal consistency, standard errors of measurement, item difficulty, biserial item-test correlations, and relative efficiency. Three groups of 1,500 cases each were used for item selection. First…
Influence of Fallible Item Parameters on Test Information During Adaptive Testing.

ERIC Educational Resources Information Center

Wetzel, C. Douglas; McBride, James R.

Computer simulation was used to assess the effects of item parameter estimation errors on different item selection strategies used in adaptive and conventional testing. To determine whether these effects reduced the advantages of certain optimal item selection strategies, simulations were repeated in the presence and absence of item parameter…
A Comparison of the One- and Three-Parameter Logistic Models on Measures of Test Efficiency.

ERIC Educational Resources Information Center

Benson, Jeri

Two methods of item selection were used to select sets of 40 items from a 50-item verbal analogies test, and the resulting item sets were compared for relative efficiency. The BICAL program was used to select the 40 items having the best mean square fit to the one parameter logistic (Rasch) model. The LOGIST program was used to select the 40 items…
Procedures for Selecting Items for Computerized Adaptive Tests.

ERIC Educational Resources Information Center

Kingsbury, G. Gage; Zara, Anthony R.

1989-01-01

Several classical approaches and alternative approaches to item selection for computerized adaptive testing (CAT) are reviewed and compared. The study also describes procedures for constrained CAT that may be added to classical item selection approaches to allow them to be used for applied testing. (TJH)
Bayesian Item Selection in Constrained Adaptive Testing Using Shadow Tests

ERIC Educational Resources Information Center

Veldkamp, Bernard P.

2010-01-01

Application of Bayesian item selection criteria in computerized adaptive testing might result in improvement of bias and MSE of the ability estimates. The question remains how to apply Bayesian item selection criteria in the context of constrained adaptive testing, where large numbers of specifications have to be taken into account in the item…
An Efficiency Balanced Information Criterion for Item Selection in Computerized Adaptive Testing

ERIC Educational Resources Information Center

Han, Kyung T.

2012-01-01

Successful administration of computerized adaptive testing (CAT) programs in educational settings requires that test security and item exposure control issues be taken seriously. Developing an item selection algorithm that strikes the right balance between test precision and level of item pool utilization is the key to successful implementation…
The Impact of Receiving the Same Items on Consecutive Computer Adaptive Test Administrations.

ERIC Educational Resources Information Center

O'Neill, Thomas; Lunz, Mary E.; Thiede, Keith

2000-01-01

Studied item exposure in a computerized adaptive test when the item selection algorithm presents examinees with questions they were asked in a previous test administration. Results with 178 repeat examinees on a medical technologists' test indicate that the combined use of an adaptive algorithm to select items and latent trait theory to estimate…
Applying Bayesian Item Selection Approaches to Adaptive Tests Using Polytomous Items

ERIC Educational Resources Information Center

Penfield, Randall D.

2006-01-01

This study applied the maximum expected information (MEI) and the maximum posterior-weighted information (MPI) approaches of computer adaptive testing item selection to the case of a test using polytomous items following the partial credit model. The MEI and MPI approaches are described. A simulation study compared the efficiency of ability…
Stratified and Maximum Information Item Selection Procedures in Computer Adaptive Testing

ERIC Educational Resources Information Center

Deng, Hui; Ansley, Timothy; Chang, Hua-Hua

2010-01-01

In this study we evaluated and compared three item selection procedures: the maximum Fisher information procedure (F), the a-stratified multistage computer adaptive testing (CAT) (STR), and a refined stratification procedure that allows more items to be selected from the high a strata and fewer items from the low a strata (USTR), along with…
Automated Test-Form Generation

ERIC Educational Resources Information Center

van der Linden, Wim J.; Diao, Qi

2011-01-01

In automated test assembly (ATA), the methodology of mixed-integer programming is used to select test items from an item bank to meet the specifications for a desired test form and optimize its measurement accuracy. The same methodology can be used to automate the formatting of the set of selected items into the actual test form. Three different…
Expertise sensitive item selection.

PubMed

Chow, P; Russell, H; Traub, R E

2000-12-01

In this paper we describe and illustrate a procedure for selecting items from a large pool for a certification test. The proposed procedure, which is intended to improve the alignment of the certification test with on-the-job performance, is based on an expertise sensitive index. This index for an item is the difference between the item's p values for experts and novices. An example is provided of the application of the index for selecting items to be used in certifying bakers.
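The index described in the abstract above (an item's p value for experts minus its p value for novices) can be sketched in a few lines. The response data and the helper names `p_value`, `expertise_index`, and `select_items` are hypothetical and serve only as illustration.

```python
def p_value(responses):
    """Classical item p value: proportion of correct (1) responses."""
    return sum(responses) / len(responses)

def expertise_index(expert_responses, novice_responses):
    """Difference between the item's p values for experts and novices."""
    return p_value(expert_responses) - p_value(novice_responses)

def select_items(items, n):
    """Return the n item names with the largest expertise-sensitive index."""
    ranked = sorted(items, key=lambda k: expertise_index(*items[k]), reverse=True)
    return ranked[:n]

# Hypothetical 0/1 responses: name -> (expert responses, novice responses).
items = {
    "A": ([1, 1, 1, 1], [0, 0, 1, 0]),  # index 0.75
    "B": ([1, 1, 0, 1], [1, 1, 0, 1]),  # index 0.00
    "C": ([1, 1, 1, 0], [0, 1, 0, 0]),  # index 0.50
}
print(select_items(items, 2))
```

Item B is answered equally well by both groups, so it contributes nothing toward separating experts from novices and ranks last.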
Item Selection and Ability Estimation Procedures for a Mixed-Format Adaptive Test

ERIC Educational Resources Information Center

Ho, Tsung-Han; Dodd, Barbara G.

2012-01-01

In this study we compared five item selection procedures using three ability estimation methods in the context of a mixed-format adaptive test based on the generalized partial credit model. The item selection procedures used were maximum posterior weighted information, maximum expected information, maximum posterior weighted Kullback-Leibler…
A Feedback Control Strategy for Enhancing Item Selection Efficiency in Computerized Adaptive Testing

ERIC Educational Resources Information Center

Weissman, Alexander

2006-01-01

A computerized adaptive test (CAT) may be modeled as a closed-loop system, where item selection is influenced by trait level (θ) estimation and vice versa. When discrepancies exist between an examinee's estimated and true θ levels, nonoptimal item selection is a likely result. Nevertheless, examinee response behavior consistent with…
Missouri Assessment Program (MAP), Spring 2000: Elementary Health/Physical Education, Released Items, Grade 5.

ERIC Educational Resources Information Center

Missouri State Dept. of Elementary and Secondary Education, Jefferson City.

This document presents 10 released items from the Health/Physical Education Missouri Assessment Program (MAP) test given in the spring of 2000 to fifth graders. Items from the test sessions include: selected-response (multiple choice), constructed-response, and a performance event. The selected-response items consist of individual questions…
Evolution of a Test Item

ERIC Educational Resources Information Center

Spaan, Mary

2007-01-01

This article follows the development of test items (see "Language Assessment Quarterly", Volume 3 Issue 1, pp. 71-79 for the article "Test and Item Specifications Development"), beginning with a review of test and item specifications, then proceeding to writing and editing of items, pretesting and analysis, and finally selection of an item for a…
Varying levels of difficulty index of skills-test items randomly selected by examinees on the Korean emergency medical technician licensing examination.

PubMed

Koh, Bongyeun; Hong, Sunggi; Kim, Soon-Sim; Hyun, Jin-Sook; Baek, Milye; Moon, Jundong; Kwon, Hayran; Kim, Gyoungyong; Min, Seonggi; Kang, Gu-Hyun

2016-01-01

The goal of this study was to characterize the difficulty index of the items in the skills test components of the class I and II Korean emergency medical technician licensing examination (KEMTLE), which requires examinees to select items randomly. The results of 1,309 class I KEMTLE examinations and 1,801 class II KEMTLE examinations in 2013 were subjected to analysis. Items from the basic and advanced skills test sections of the KEMTLE were compared to determine whether some were significantly more difficult than others. In the class I KEMTLE, all 4 of the items on the basic skills test showed significant variation in difficulty index (P<0.01), as well as 4 of the 5 items on the advanced skills test (P<0.05). In the class II KEMTLE, 4 of the 5 items on the basic skills test showed significantly different difficulty index (P<0.01), as well as all 3 of the advanced skills test items (P<0.01). In the skills test components of the class I and II KEMTLE, the procedure in which examinees randomly select questions should be revised to require examinees to respond to a set of fixed items in order to improve the reliability of the national licensing examination.
Missouri Assessment Program (MAP), Spring 2000: High School Health/Physical Education, Released Items, Grade 9.

ERIC Educational Resources Information Center

Missouri State Dept. of Elementary and Secondary Education, Jefferson City.

This document presents 10 released items from the Health/Physical Education Missouri Assessment Program (MAP) test given in the spring of 2000 to ninth graders. Items from the test sessions include: selected-response (multiple choice), constructed-response, and a performance event. The selected-response items consist of individual questions…

Using Response-Time Constraints in Item Selection To Control for Differential Speededness in Computerized Adaptive Testing. LSAC Research Report Series.

ERIC Educational Resources Information Center

van der Linden, Wim J.; Scrams, David J.; Schnipke, Deborah L.

This paper proposes an item selection algorithm that can be used to neutralize the effect of time limits in computer adaptive testing. The method is based on a statistical model for the response-time distributions of the test takers on the items in the pool that is updated each time a new item has been administered. Predictions from the model are…
Optimizing the Use of Response Times for Item Selection in Computerized Adaptive Testing

ERIC Educational Resources Information Center

Choe, Edison M.; Kern, Justin L.; Chang, Hua-Hua

2018-01-01

Despite common operationalization, measurement efficiency of computerized adaptive testing should not only be assessed in terms of the number of items administered but also the time it takes to complete the test. To this end, a recent study introduced a novel item selection criterion that maximizes Fisher information per unit of expected response…
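A minimal sketch of an information-per-time criterion in the spirit of the abstract above. It assumes a 2PL model and treats each item's expected response time as a fixed, hypothetical number of seconds rather than a quantity derived from a response-time model. The pool and function names are illustrative only.

```python
import math

def info_2pl(theta, a, b):
    """Fisher information of an assumed 2PL item at ability theta."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def pick_by_information(pool, theta):
    """Conventional criterion: maximize information only."""
    return max(pool, key=lambda k: info_2pl(theta, pool[k][0], pool[k][1]))

def pick_by_information_per_time(pool, theta):
    """Time-aware criterion: maximize information per expected second."""
    return max(pool, key=lambda k: info_2pl(theta, pool[k][0], pool[k][1]) / pool[k][2])

# Hypothetical pool: name -> (a, b, expected response time in seconds).
pool = {"quick": (1.0, 0.0, 30.0), "slow": (1.5, 0.0, 120.0)}

print(pick_by_information(pool, 0.0))
print(pick_by_information_per_time(pool, 0.0))
```

In this toy pool the slower item is more informative in absolute terms, but the quicker item yields more information per second, so the two criteria disagree.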
Item Selection and Pre-equating with Empirical Item Characteristic Curves.

ERIC Educational Resources Information Center

Livingston, Samuel A.

An empirical item characteristic curve shows the probability of a correct response as a function of the student's total test score. These curves can be estimated from large-scale pretest data. They enable test developers to select items that discriminate well in the score region where decisions are made. A similar set of curves can be used to…
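The definition in the abstract above (probability of a correct response as a function of total test score) can be estimated by grouping examinees on total score. The sketch below uses hypothetical pretest data and applies no smoothing, which a real large-scale application would probably need.

```python
from collections import defaultdict

def empirical_icc(total_scores, item_correct):
    """Proportion answering the item correctly at each observed total score."""
    counts = defaultdict(lambda: [0, 0])  # score -> [number correct, number of examinees]
    for score, correct in zip(total_scores, item_correct):
        counts[score][0] += correct
        counts[score][1] += 1
    return {s: c / n for s, (c, n) in sorted(counts.items())}

# Hypothetical data: one total score and one 0/1 item response per examinee.
total_scores = [1, 1, 2, 2, 3, 3]
item_correct = [0, 0, 0, 1, 1, 1]
print(empirical_icc(total_scores, item_correct))
```

An item whose curve rises steeply around the decision score discriminates well in that region.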
Relevance of Item Analysis in Standardizing an Achievement Test in Teaching of Physical Science in B.Ed Syllabus

ERIC Educational Resources Information Center

Marie, S. Maria Josephine Arokia; Edannur, Sreekala

2015-01-01

This paper focused on the analysis of test items constructed in the paper of teaching Physical Science for B.Ed. class. It involved the analysis of difficulty level and discrimination power of each test item. Item analysis allows selecting or omitting items from the test, but more importantly item analysis is a tool to help the item writer improve…
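The two statistics named in the abstract above, difficulty level and discrimination power, are often computed as the proportion correct and the upper-minus-lower group difference. The sketch below assumes that common classical formulation. The `fraction` parameter and the sample data are hypothetical, and the record does not say which formulas the authors used.

```python
def item_analysis(scores, fraction=0.27):
    """Return (difficulty, discrimination) per item from a 0/1 score matrix.

    scores: list of examinee rows, each a list of 0/1 item scores.
    fraction: assumed share of examinees in the upper and lower groups.
    """
    ranked = sorted(scores, key=sum, reverse=True)   # highest total first
    k = max(1, round(len(ranked) * fraction))        # group size
    upper, lower = ranked[:k], ranked[-k:]
    results = []
    for j in range(len(scores[0])):
        difficulty = sum(row[j] for row in scores) / len(scores)
        discrimination = (sum(r[j] for r in upper) - sum(r[j] for r in lower)) / k
        results.append((difficulty, discrimination))
    return results

# Hypothetical data: 4 examinees by 3 items.
scores = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
for item, (p, d) in enumerate(item_analysis(scores, fraction=0.5), start=1):
    print(item, p, d)
```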
Multidimensional CAT Item Selection Methods for Domain Scores and Composite Scores with Item Exposure Control and Content Constraints

ERIC Educational Resources Information Center

Yao, Lihua

2014-01-01

The intent of this research was to find an item selection procedure in the multidimensional computer adaptive testing (CAT) framework that yielded higher precision for both the domain and composite abilities, had a higher usage of the item pool, and controlled the exposure rate. Five multidimensional CAT item selection procedures (minimum angle;…
Comparing Methods for Item Analysis: The Impact of Different Item-Selection Statistics on Test Difficulty

ERIC Educational Resources Information Center

Jones, Andrew T.

2011-01-01

Practitioners often depend on item analysis to select items for exam forms and have a variety of options available to them. These include the point-biserial correlation, the agreement statistic, the B index, and the phi coefficient. Although research has demonstrated that these statistics can be useful for item selection, no research as of yet has…
Item Selection in Multidimensional Computerized Adaptive Testing--Gaining Information from Different Angles

ERIC Educational Resources Information Center

Wang, Chun; Chang, Hua-Hua

2011-01-01

Over the past thirty years, obtaining diagnostic information from examinees' item responses has become an increasingly important feature of educational and psychological testing. The objective can be achieved by sequentially selecting multidimensional items to fit the class of latent traits being assessed, and therefore Multidimensional…
Varying levels of difficulty index of skills-test items randomly selected by examinees on the Korean emergency medical technician licensing examination

PubMed Central

2016-01-01

Purpose: The goal of this study was to characterize the difficulty index of the items in the skills test components of the class I and II Korean emergency medical technician licensing examination (KEMTLE), which requires examinees to select items randomly. Methods: The results of 1,309 class I KEMTLE examinations and 1,801 class II KEMTLE examinations in 2013 were subjected to analysis. Items from the basic and advanced skills test sections of the KEMTLE were compared to determine whether some were significantly more difficult than others. Results: In the class I KEMTLE, all 4 of the items on the basic skills test showed significant variation in difficulty index (P<0.01), as well as 4 of the 5 items on the advanced skills test (P<0.05). In the class II KEMTLE, 4 of the 5 items on the basic skills test showed significantly different difficulty index (P<0.01), as well as all 3 of the advanced skills test items (P<0.01). Conclusion: In the skills test components of the class I and II KEMTLE, the procedure in which examinees randomly select questions should be revised to require examinees to respond to a set of fixed items in order to improve the reliability of the national licensing examination. PMID:26883810
Assembling a Computerized Adaptive Testing Item Pool as a Set of Linear Tests

ERIC Educational Resources Information Center

van der Linden, Wim J.; Ariel, Adelaide; Veldkamp, Bernard P.

2006-01-01

Test-item writing efforts typically result in item pools with an undesirable correlational structure between the content attributes of the items and their statistical information. If such pools are used in computerized adaptive testing (CAT), the algorithm may be forced to select items with less than optimal information, that violate the content…
Using Mutual Information for Adaptive Item Comparison and Student Assessment

ERIC Educational Resources Information Center

Liu, Chao-Lin

2005-01-01

The author analyzes properties of mutual information between dichotomous concepts and test items. The properties generalize some common intuitions about item comparison, and provide principled foundations for designing item-selection heuristics for student assessment in computer-assisted educational systems. The proposed item-selection strategies…
Combining computer adaptive testing technology with cognitively diagnostic assessment.

PubMed

McGlohen, Meghan; Chang, Hua-Hua

2008-08-01

A major advantage of computerized adaptive testing (CAT) is that it allows the test to home in on an examinee's ability level in an interactive manner. The aim of the new area of cognitive diagnosis is to provide information about specific content areas in which an examinee needs help. The goal of this study was to combine the benefit of specific feedback from cognitively diagnostic assessment with the advantages of CAT. In this study, three approaches to combining these were investigated: (1) item selection based on the traditional ability level estimate (theta), (2) item selection based on the attribute mastery feedback provided by cognitively diagnostic assessment (alpha), and (3) item selection based on both the traditional ability level estimate (theta) and the attribute mastery feedback provided by cognitively diagnostic assessment (alpha). The results from these three approaches were compared for theta estimation accuracy, attribute mastery estimation accuracy, and item exposure control. The theta- and alpha-based condition outperformed the alpha-based condition regarding theta estimation, attribute mastery pattern estimation, and item exposure control. Both the theta-based condition and the theta- and alpha-based condition performed similarly with regard to theta estimation, attribute mastery estimation, and item exposure control, but the theta- and alpha-based condition has an additional advantage in that it uses the shadow test method, which allows the administrator to incorporate additional constraints in the item selection process, such as content balancing, item type constraints, and so forth, and also to select items on the basis of both the current theta and alpha estimates, which can be built on top of existing 3PL testing programs.
Integrating Test-Form Formatting into Automated Test Assembly

ERIC Educational Resources Information Center

Diao, Qi; van der Linden, Wim J.

2013-01-01

Automated test assembly uses the methodology of mixed integer programming to select an optimal set of items from an item bank. Automated test-form generation uses the same methodology to optimally order the items and format the test form. From an optimization point of view, production of fully formatted test forms directly from the item pool using…
A Comparison of the One-, the Modified Three-, and the Three-Parameter Item Response Theory Models in the Test Development Item Selection Process.

ERIC Educational Resources Information Center

Eignor, Daniel R.; Douglass, James B.

This paper attempts to provide some initial information about the use of a variety of item response theory (IRT) models in the item selection process; its purpose is to compare the information curves derived from the selection of items characterized by several different IRT models and their associated parameter estimation programs. These…
Effects of Content Balancing and Item Selection Method on Ability Estimation in Computerized Adaptive Tests

ERIC Educational Resources Information Center

Sahin, Alper; Ozbasi, Durmus

2017-01-01

Purpose: This study aims to reveal effects of content balancing and item selection method on ability estimation in computerized adaptive tests by comparing Fisher's maximum information (FMI) and likelihood weighted information (LWI) methods. Research Methods: Four groups of examinees (250, 500, 750, 1000) and a bank of 500 items with 10 different…
Using Response Times for Item Selection in Adaptive Testing

ERIC Educational Resources Information Center

van der Linden, Wim J.

2008-01-01

Response times on items can be used to improve item selection in adaptive testing provided that a probabilistic model for their distribution is available. In this research, the author used a hierarchical modeling framework with separate first-level models for the responses and response times and a second-level model for the distribution of the…
A Comparison of Four Item-Selection Methods for Severely Constrained CATs

ERIC Educational Resources Information Center

He, Wei; Diao, Qi; Hauser, Carl

2014-01-01

This study compared four item-selection procedures developed for use with severely constrained computerized adaptive tests (CATs). Severely constrained CATs refer to those adaptive tests that seek to meet a complex set of constraints that are often not conclusive to each other (i.e., an item may contribute to the satisfaction of several…
Bias in Testing: A Presentation of Selected Methods.

ERIC Educational Resources Information Center

Merz, William R.; Rudner, Lawrence M.

A variety of terms related to test bias or test fairness have been used in a variety of ways, but in this document the "fair use of tests" is defined as equitable selection procedures by means of intact tests, and "test item bias" refers to the study of separate items with respect to the tests of which they are a part. Seven…
Development of a short version of the new brief job stress questionnaire.

PubMed

Inoue, Akiomi; Kawakami, Norito; Shimomitsu, Teruichi; Tsutsumi, Akizumi; Haratani, Takashi; Yoshikawa, Toru; Shimazu, Akihito; Odagiri, Yuko

2014-01-01

This study aimed to investigate the test-retest reliability and validity of a short version of the New Brief Job Stress Questionnaire (New BJSQ) whose scales have one item selected from a standard version. Based on the results from an anonymous web-based questionnaire of occupational health staffs and personnel/labor staffs, we selected higher-priority scales from the standard version. After selecting one item with the highest item-total correlation coefficient from each scale, a 23-item questionnaire was developed. A nationally representative survey was administered to Japanese employees (n=1,633) to examine test-retest reliability and validity. Most scales (or items) showed modest but adequate levels of test-retest reliability (r>0.50). Furthermore, job demands and job resources scales (or items) were associated with mental and physical stress reactions while job resources scales (or items) were also associated with positive outcomes. These findings provided evidence that the short version of the New BJSQ is reliable and valid.
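The selection step in the abstract above (keeping, for each scale, the one item with the highest item-total correlation) might look like the following sketch. It assumes an uncorrected item-total correlation, where the total includes the item itself, and uses hypothetical responses. The record does not state whether a corrected correlation was used.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)  # assumes neither sequence is constant

def best_item(scale):
    """Return the item whose scores correlate most with the scale total."""
    totals = [sum(vals) for vals in zip(*scale.values())]
    return max(scale, key=lambda name: pearson(scale[name], totals))

# Hypothetical scale: item name -> responses from four respondents.
scale = {"q1": [1, 2, 3, 4], "q2": [2, 2, 3, 3], "q3": [4, 1, 3, 2]}
print(best_item(scale))
```

Repeating `best_item` over every retained scale would yield a one-item-per-scale short form.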
Development of a Short Version of the New Brief Job Stress Questionnaire

PubMed Central

INOUE, Akiomi; KAWAKAMI, Norito; SHIMOMITSU, Teruichi; TSUTSUMI, Akizumi; HARATANI, Takashi; YOSHIKAWA, Toru; SHIMAZU, Akihito; ODAGIRI, Yuko

2014-01-01

This study aimed to investigate the test-retest reliability and validity of a short version of the New Brief Job Stress Questionnaire (New BJSQ) whose scales have one item selected from a standard version. Based on the results from an anonymous web-based questionnaire of occupational health staffs and personnel/labor staffs, we selected higher-priority scales from the standard version. After selecting one item with the highest item-total correlation coefficient from each scale, a 23-item questionnaire was developed. A nationally representative survey was administered to Japanese employees (n=1,633) to examine test-retest reliability and validity. Most scales (or items) showed modest but adequate levels of test-retest reliability (r>0.50). Furthermore, job demands and job resources scales (or items) were associated with mental and physical stress reactions while job resources scales (or items) were also associated with positive outcomes. These findings provided evidence that the short version of the New BJSQ is reliable and valid. PMID:24975108
An Evaluation of "Intentional" Weighting of Extended-Response or Constructed-Response Items in Tests with Mixed Item Types.

ERIC Educational Resources Information Center

Ito, Kyoko; Sykes, Robert C.

This study investigated the practice of weighting a type of test item, such as constructed response, more than other types of items, such as selected response, to compute student scores for a mixed-item type of test. The study used data from statewide writing field tests in grades 3, 5, and 8 and considered two contexts, that in which a single…

[Mokken scaling of the Cognitive Screening Test].

PubMed

Diesfeldt, H F A

2009-10-01

The Cognitive Screening Test (CST) is a twenty-item orientation questionnaire in Dutch, that is commonly used to evaluate cognitive impairment. This study applied Mokken Scale Analysis, a non-parametric set of techniques derived from item response theory (IRT), to CST-data of 466 consecutive participants in psychogeriatric day care. The full item set and the standard short version of fourteen items both met the assumptions of the monotone homogeneity model, with scalability coefficient H = 0.39, which is considered weak. In order to select items that would fulfil the assumption of invariant item ordering or the double monotonicity model, the subjects were randomly partitioned into a training set (50% of the sample) and a test set (the remaining half). By means of an automated item selection eleven items were found to measure one latent trait, with H = 0.67 and item H coefficients larger than 0.51. Cross-validation of the item analysis in the remaining half of the subjects gave comparable values (H = 0.66; item H coefficients larger than 0.56). The selected items involve year, place of residence, birth date, the monarch's and prime minister's names, and their predecessors. Applying optimal discriminant analysis (ODA) it was found that the full set of twenty CST items performed best in distinguishing two predefined groups of patients of lower or higher cognitive ability, as established by an independent criterion derived from the Amsterdam Dementia Screening Test. The chance corrected predictive value or prognostic utility was 47.5% for the full item set, 45.2% for the fourteen items of the standard short version of the CST, and 46.1% for the homogeneous, unidimensional set of selected eleven items. The results of the item analysis support the application of the CST in cognitive assessment, and revealed a more reliable 'short' version of the CST than the standard short version (CST14).
Utilizing Response Time Distributions for Item Selection in CAT

ERIC Educational Resources Information Center

Fan, Zhewen; Wang, Chun; Chang, Hua-Hua; Douglas, Jeffrey

2012-01-01

Traditional methods for item selection in computerized adaptive testing only focus on item information without taking into consideration the time required to answer an item. As a result, some examinees may receive a set of items that take a very long time to finish, and information is not accrued as efficiently as possible. The authors propose two…
Developing a Strategy for Using Technology-Enhanced Items in Large-Scale Standardized Tests

ERIC Educational Resources Information Center

Bryant, William

2017-01-01

As large-scale standardized tests move from paper-based to computer-based delivery, opportunities arise for test developers to make use of items beyond traditional selected and constructed response types. Technology-enhanced items (TEIs) have the potential to provide advantages over conventional items, including broadening construct measurement,…
Investigating Item Exposure Control Methods in Computerized Adaptive Testing

ERIC Educational Resources Information Center

Ozturk, Nagihan Boztunc; Dogan, Nuri

2015-01-01

This study aims to investigate the effects of item exposure control methods on measurement precision and on test security under various item selection methods and item pool characteristics. In this study, the Randomesque (with item group sizes of 5 and 10), Sympson-Hetter, and Fade-Away methods were used as item exposure control methods. Moreover,…
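Of the exposure control methods named in the abstract above, the Randomesque method is the simplest to sketch: rather than always administering the single most informative item, it draws at random from the top group of candidates. The example assumes a 2PL information function and a hypothetical seven-item pool, and uses a group size of 3 to keep the example small (the study itself used sizes of 5 and 10).

```python
import math
import random

def info_2pl(theta, a, b):
    """Fisher information of an assumed 2PL item at ability theta."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def randomesque_pick(pool, theta, group_size, rng):
    """Choose uniformly at random among the group_size most informative items."""
    ranked = sorted(pool, key=lambda k: info_2pl(theta, *pool[k]), reverse=True)
    return rng.choice(ranked[:group_size])

# Hypothetical pool: name -> (a, b), all with equal discrimination.
difficulties = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]
pool = {f"item{i}": (1.0, b) for i, b in enumerate(difficulties)}

rng = random.Random(0)  # seeded only to make the sketch repeatable
picked = randomesque_pick(pool, theta=0.0, group_size=3, rng=rng)
print(picked)
```

Spreading administrations across the top group lowers the exposure rate of the single best item, at some cost in information per item.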
Mutual Information Item Selection in Adaptive Classification Testing

ERIC Educational Resources Information Center

Weissman, Alexander

2007-01-01

A general approach for item selection in adaptive multiple-category classification tests is provided. The approach uses mutual information (MI), a special case of the Kullback-Leibler distance, or relative entropy. MI works efficiently with the sequential probability ratio test and alleviates the difficulties encountered with using other local-…
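To make the mutual information criterion in the abstract above concrete, the sketch below computes MI between a discrete class variable and a dichotomous item response. It assumes that the current class probabilities and each item's probability of a correct response within each class are known. The names `prior` and `p_correct` are illustrative only.

```python
import math

def mutual_information(prior, p_correct):
    """MI (in nats) between class membership and a 0/1 item response.

    prior: current probability of each class (sums to 1).
    p_correct: probability of a correct response within each class.
    """
    p1 = sum(w * p for w, p in zip(prior, p_correct))  # marginal P(correct)
    mi = 0.0
    for w, p in zip(prior, p_correct):
        # Terms for a correct and an incorrect response; skip zero-probability terms.
        for px_given_c, px in ((p, p1), (1.0 - p, 1.0 - p1)):
            if w > 0 and px_given_c > 0:
                mi += w * px_given_c * math.log(px_given_c / px)
    return mi

prior = [0.5, 0.5]
print(mutual_information(prior, [0.6, 0.6]))  # item that cannot separate the classes
print(mutual_information(prior, [0.0, 1.0]))  # item that separates them perfectly
```

An MI-based rule would administer the unused item with the largest value. An item answered identically by every class has MI of zero, while a perfectly separating item reaches the entropy of the class distribution.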
The Performance of IRT Model Selection Methods with Mixed-Format Tests

ERIC Educational Resources Information Center

Whittaker, Tiffany A.; Chang, Wanchen; Dodd, Barbara G.

2012-01-01

When tests consist of multiple-choice and constructed-response items, researchers are confronted with the question of which item response theory (IRT) model combination will appropriately represent the data collected from these mixed-format tests. This simulation study examined the performance of six model selection criteria, including the…
Computerized Adaptive Testing: Overview and Introduction.

ERIC Educational Resources Information Center

Meijer, Rob R.; Nering, Michael L.

1999-01-01

Provides an overview of computerized adaptive testing (CAT) and introduces contributions to this special issue. CAT elements discussed include item selection, estimation of the latent trait, item exposure, measurement precision, and item-bank development. (SLD)
Best Design for Multidimensional Computerized Adaptive Testing With the Bifactor Model

PubMed Central

Seo, Dong Gi; Weiss, David J.

2015-01-01

Most computerized adaptive tests (CATs) have been studied using the framework of unidimensional item response theory. However, many psychological variables are multidimensional and might benefit from using a multidimensional approach to CATs. This study investigated the accuracy, fidelity, and efficiency of a fully multidimensional CAT algorithm (MCAT) with a bifactor model using simulated data. Four item selection methods in MCAT were examined for three bifactor pattern designs using two multidimensional item response theory models. To compare MCAT item selection and estimation methods, a fixed test length was used. The Ds-optimality item selection improved θ estimates with respect to a general factor, and either D- or A-optimality improved estimates of the group factors in three bifactor pattern designs under two multidimensional item response theory models. The MCAT model without a guessing parameter functioned better than the MCAT model with a guessing parameter. The MAP (maximum a posteriori) estimation method provided more accurate θ estimates than the EAP (expected a posteriori) method under most conditions, and MAP showed lower observed standard errors than EAP under most conditions, except for a general factor condition using Ds-optimality item selection. PMID:29795848
Science Library of Test Items. Volume Two.

ERIC Educational Resources Information Center

New South Wales Dept. of Education, Sydney (Australia).

The second volume of test items in the Science Library of Test Items is intended as a resource to assist teachers in implementing and evaluating science courses in the first 4 years of Australian secondary school. The items were selected from questions submitted to the School Certificate Development Unit by teachers in New South Wales. Only the…
Evaluating Statistical Targets for Assembling Parallel Mixed-Format Test Forms

ERIC Educational Resources Information Center

Debeer, Dries; Ali, Usama S.; van Rijn, Peter W.

2017-01-01

Test assembly is the process of selecting items from an item pool to form one or more new test forms. Often new test forms are constructed to be parallel with an existing (or an ideal) test. Within the context of item response theory, the test information function (TIF) or the test characteristic curve (TCC) are commonly used as statistical…
Computerized adaptive testing: the capitalization on chance problem.

PubMed

Olea, Julio; Barrada, Juan Ramón; Abad, Francisco J; Ponsoda, Vicente; Cuevas, Lara

2012-03-01

This paper describes several simulation studies that examine the effects of capitalization on chance in the selection of items and the ability estimation in CAT, employing the 3-parameter logistic model. In order to generate different estimation errors for the item parameters, the calibration sample size was manipulated (N = 500, 1000 and 2000 subjects) as was the ratio of item bank size to test length (banks of 197 and 788 items, test lengths of 20 and 40 items), both in a CAT and in a random test. Results show that capitalization on chance is particularly serious in CAT, as revealed by the large positive bias found in the small sample calibration conditions. For broad ranges of theta, the overestimation of the precision (asymptotic Se) reaches levels of 40%, something that does not occur with the RMSE (theta). The problem is greater as the item bank size to test length ratio increases. Potential solutions were tested in a second study, where two exposure control methods were incorporated into the item selection algorithm. Some alternative solutions are discussed.
Efforts Toward the Development of Unbiased Selection and Assessment Instruments.

ERIC Educational Resources Information Center

Rudner, Lawrence M.

Investigations into item bias provide an empirical basis for the identification and elimination of test items which appear to measure different traits across populations or cultural groups. The Psychometric rationales for six approaches to the identification of biased test items are reviewed: (1) Transformed item difficulties: within-group…
Weapon Performance Testing and Analysis: The MODI-PAC Round, the Number 4 Lead-Shot Round, and the Flying Baton

DTIC Science & Technology

1976-01-01

items. The items tested were the MODI-PAC, a proprietary item of Remington Arms Company, a standard 12-gauge round of No. 4 lead shot, and an...to refrain from testing this item. Therefore, the final selection of items for testing were (1) the MODI-PAC, (2) a standard 12-gauge shotgun round of...The first item evaluated was the MODI-PAC. The MODI-PAC, which stands for “modified impact,” is a 12-gauge shotgun shell loaded with approximately 320
Use of Jackknifing to Evaluate Effects of Anchor Item Selection on Equating with the Nonequivalent Groups with Anchor Test (NEAT) Design. Research Report. ETS RR-15-10

ERIC Educational Resources Information Center

Lu, Ru; Haberman, Shelby; Guo, Hongwen; Liu, Jinghua

2015-01-01

In this study, we apply jackknifing to anchor items to evaluate the impact of anchor selection on equating stability. In an ideal world, the choice of anchor items should have little impact on equating results. When this ideal does not correspond to reality, selection of anchor items can strongly influence equating results. This influence does not…
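The delete-one jackknife idea behind this report can be sketched in a few lines. The example is only a generic illustration: `equating_constant` is a hypothetical stand-in (a simple mean of anchor-item differences between two groups), not the equating method used in the report, and the data values are made up.

```python
import math

def equating_constant(anchor_diffs):
    """Hypothetical stand-in statistic: mean anchor-item difference."""
    return sum(anchor_diffs) / len(anchor_diffs)

def jackknife_se(values, statistic):
    """Delete-one jackknife standard error of statistic(values)."""
    n = len(values)
    # Recompute the statistic with each anchor item left out in turn.
    leave_one_out = [statistic(values[:i] + values[i + 1:]) for i in range(n)]
    mean_loo = sum(leave_one_out) / n
    spread = sum((v - mean_loo) ** 2 for v in leave_one_out)
    return math.sqrt((n - 1) / n * spread)

# Made-up differences in anchor-item difficulty between two groups.
anchor_diffs = [0.1, 0.3, -0.2, 0.4, 0.0]
se = jackknife_se(anchor_diffs, equating_constant)
```

A large jackknife standard error relative to the statistic itself would suggest that the result depends strongly on which anchor items are included.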
Are Learning Disabled Students "Test-Wise?": An Inquiry into Reading Comprehension Test Items.

ERIC Educational Resources Information Center

Scruggs, Thomas E.; Lifson, Steve

The ability to correctly answer reading comprehension test items, without having read the accompanying reading passage, was compared for third grade learning disabled students and their peers from a regular classroom. In the first experiment, fourteen multiple choice items were selected from the Stanford Achievement Test. No reading passages were…
Objective and Item Banking Computer Software and Its Use in Comprehensive Achievement Monitoring.

ERIC Educational Resources Information Center

Schriber, Peter E.; Gorth, William P.

The current emphasis on objectives and test item banks for constructing more effective tests is being augmented by increasingly sophisticated computer software. Items can be catalogued in numerous ways for retrieval. The items as well as instructional objectives can be stored and test forms can be selected and printed by the computer. It is also…
Multidimensional CAT Item Selection Methods for Domain Scores and Composite Scores: Theory and Applications

ERIC Educational Resources Information Center

Yao, Lihua

2012-01-01

Multidimensional computer adaptive testing (MCAT) can provide higher precision and reliability or reduce test length when compared with unidimensional CAT or with the paper-and-pencil test. This study compared five item selection procedures in the MCAT framework for both domain scores and overall scores through simulation by varying the structure…
Severity of Organized Item Theft in Computerized Adaptive Testing: A Simulation Study

ERIC Educational Resources Information Center

Yi, Qing; Zhang, Jinming; Chang, Hua-Hua

2008-01-01

Criteria had been proposed for assessing the severity of possible test security violations for computerized tests with high-stakes outcomes. However, these criteria resulted from theoretical derivations that assumed uniformly randomized item selection. This study investigated potential damage caused by organized item theft in computerized adaptive…
Location Indices for Ordinal Polytomous Items Based on Item Response Theory. Research Report. ETS RR-15-20

ERIC Educational Resources Information Center

Ali, Usama S.; Chang, Hua-Hua; Anderson, Carolyn J.

2015-01-01

Polytomous items are typically described by multiple category-related parameters; situations, however, arise in which a single index is needed to describe an item's location along a latent trait continuum. Situations in which a single index would be needed include item selection in computerized adaptive testing or test assembly. Therefore single…
A Comparison of Item Selection Procedures Using Different Ability Estimation Methods in Computerized Adaptive Testing Based on the Generalized Partial Credit Model

ERIC Educational Resources Information Center

Ho, Tsung-Han

2010-01-01

Computerized adaptive testing (CAT) provides a highly efficient alternative to the paper-and-pencil test. By selecting items that match examinees' ability levels, CAT not only can shorten test length and administration time but it can also increase measurement precision and reduce measurement error. In CAT, maximum information (MI) is the most…

Item Selection Criteria with Practical Constraints for Computerized Classification Testing

ERIC Educational Resources Information Center

Lin, Chuan-Ju

2011-01-01

This study compares four item selection criteria for a two-category computerized classification testing: (1) Fisher information (FI), (2) Kullback-Leibler information (KLI), (3) weighted log-odds ratio (WLOR), and (4) mutual information (MI), with respect to the efficiency and accuracy of classification decision using the sequential probability…
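Two of the four criteria listed in this abstract, Fisher information and Kullback-Leibler information, can be illustrated for a single cut score. The sketch assumes a 2PL model, a hypothetical three-item pool, and an indifference region of half-width `delta` around the cut score for the KL criterion; none of these values come from the study.

```python
import math

def p_2pl(theta, a, b):
    """Probability of a correct response under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def fisher_information(theta, a, b):
    """2PL item information at theta: a^2 * P * (1 - P)."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

def kl_information(theta_hi, theta_lo, a, b):
    """KL divergence between the response distributions at two thetas."""
    p_hi = p_2pl(theta_hi, a, b)
    p_lo = p_2pl(theta_lo, a, b)
    return (p_hi * math.log(p_hi / p_lo)
            + (1.0 - p_hi) * math.log((1.0 - p_hi) / (1.0 - p_lo)))

# Hypothetical pool: item id -> (discrimination a, difficulty b).
pool = {"item1": (0.8, -1.0), "item2": (1.5, 0.1), "item3": (1.2, 1.5)}
cut, delta = 0.0, 0.5  # assumed cut score and indifference half-width

best_fi = max(pool, key=lambda k: fisher_information(cut, *pool[k]))
best_kl = max(pool, key=lambda k: kl_information(cut + delta, cut - delta, *pool[k]))
```

In this toy pool both criteria pick the same item; the study's interest lies in conditions where the criteria, combined with practical constraints, lead to different classification efficiency and accuracy.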
A Comparison Study of Item Exposure Control Strategies in MCAT

ERIC Educational Resources Information Center

Mao, Xiuzhen; Ozdemir, Burhanettin; Wang, Yating; Xiu, Tao

2016-01-01

Four item selection indexes with and without exposure control are evaluated and compared in multidimensional computerized adaptive testing (CAT). The four item selection indices are D-optimality, Posterior expectation Kullback-Leibler information (KLP), the minimized error variance of the linear combination score with equal weight (V1), and the…
A Comparison of Item Selection Techniques for Testlets

ERIC Educational Resources Information Center

Murphy, Daniel L.; Dodd, Barbara G.; Vaughn, Brandon K.

2010-01-01

This study examined the performance of the maximum Fisher's information, the maximum posterior weighted information, and the minimum expected posterior variance methods for selecting items in a computerized adaptive testing system when the items were grouped in testlets. A simulation study compared the efficiency of ability estimation among the…
Investigating Measurement Invariance in Computer-Based Personality Testing: The Impact of Using Anchor Items on Effect Size Indices

ERIC Educational Resources Information Center

Egberink, Iris J. L.; Meijer, Rob R.; Tendeiro, Jorge N.

2015-01-01

A popular method to assess measurement invariance of a particular item is based on likelihood ratio tests with all other items as anchor items. The results of this method are often only reported in terms of statistical significance, and researchers proposed different methods to empirically select anchor items. It is unclear, however, how many…
Selective attention and recognition: effects of congruency on episodic learning.

PubMed

Rosner, Tamara M; D'Angelo, Maria C; MacLellan, Ellen; Milliken, Bruce

2015-05-01

Recent research on cognitive control has focused on the learning consequences of high selective attention demands in selective attention tasks (e.g., Botvinick, Cognit Affect Behav Neurosci 7(4):356-366, 2007; Verguts and Notebaert, Psychol Rev 115(2):518-525, 2008). The current study extends these ideas by examining the influence of selective attention demands on remembering. In Experiment 1, participants read aloud the red word in a pair of red and green spatially interleaved words. Half of the items were congruent (the interleaved words had the same identity), and the other half were incongruent (the interleaved words had different identities). Following the naming phase, participants completed a surprise recognition memory test. In this test phase, recognition memory was better for incongruent than for congruent items. In Experiment 2, context was only partially reinstated at test, and again recognition memory was better for incongruent than for congruent items. In Experiment 3, all of the items contained two different words, but in one condition the words were presented close together and interleaved, while in the other condition the two words were spatially separated. Recognition memory was better for the interleaved than for the separated items. This result rules out an interpretation of the congruency effects on recognition in Experiments 1 and 2 that hinges on stronger relational encoding for items that have two different words. Together, the results support the view that selective attention demands for incongruent items lead to encoding that improves recognition.
Comparing the Performance of Five Multidimensional CAT Selection Procedures with Different Stopping Rules

ERIC Educational Resources Information Center

Yao, Lihua

2013-01-01

Through simulated data, five multidimensional computerized adaptive testing (MCAT) selection procedures with varying test lengths are examined and compared using different stopping rules. Fixed item exposure rates are used for all the items, and the Priority Index (PI) method is used for the content constraints. Two stopping rules, standard error…
Emotional Intelligence in Applicant Selection for Care-Related Academic Programs

ERIC Educational Resources Information Center

Zysberg, Leehu; Levy, Anat; Zisberg, Anna

2011-01-01

Two studies describe the development of the Audiovisual Test of Emotional Intelligence (AVEI), aimed at candidate selection in educational settings. Study I depicts the construction of the test and the preliminary examination of its psychometric properties in a sample of 92 college students. Item analysis allowed the modification of problem items,…
Dual-Objective Item Selection Criteria in Cognitive Diagnostic Computerized Adaptive Testing

ERIC Educational Resources Information Center

Kang, Hyeon-Ah; Zhang, Susu; Chang, Hua-Hua

2017-01-01

The development of cognitive diagnostic-computerized adaptive testing (CD-CAT) has provided a new perspective for gaining information about examinees' mastery on a set of cognitive attributes. This study proposes a new item selection method within the framework of dual-objective CD-CAT that simultaneously addresses examinees' attribute mastery…
Post-hoc simulation study to adopt a computerized adaptive testing (CAT) for a Korean Medical License Examination.

PubMed

Seo, Dong Gi; Choi, Jeongwook

2018-05-17

Computerized adaptive testing (CAT) has been adopted in license examinations because of its test efficiency and accuracy. Much research on CAT has been published to demonstrate the efficiency and accuracy of measurement. This simulation study investigated scoring methods and item selection methods for implementing CAT in the Korean Medical License Examination (KMLE). This study used a post-hoc (real data) simulation design. The item bank used in this study comprised all items in the 2017 KMLE. All CAT algorithms for this study were implemented with the 'catR' package in R. In terms of accuracy, the Rasch and 2-parameter logistic (2PL) models performed better than the 3PL model. Modal a posteriori (MAP) or expected a posteriori (EAP) estimation provided more accurate estimates than MLE and WLE. Furthermore, maximum posterior weighted information (MPWI) or minimum expected posterior variance (MEPV) performed better than other item selection methods. In terms of efficiency, the Rasch model was recommended to reduce test length. A simulation study should be performed under varied test conditions before adopting a live CAT. Based on a simulation study, specific scoring and item selection methods should be predetermined before implementing a live CAT.
Application of Computerized Adaptive Testing to Entrance Examination for Graduate Studies in Turkey

ERIC Educational Resources Information Center

Bulut, Okan; Kan, Adnan

2012-01-01

Problem Statement: Computerized adaptive testing (CAT) is a sophisticated and efficient way of delivering examinations. In CAT, items for each examinee are selected from an item bank based on the examinee's responses to the items. In this way, the difficulty level of the test is adjusted based on the examinee's ability level. Instead of…
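The adaptive loop described in this abstract (select an item, score the response, update the ability estimate, repeat) can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the study: it assumes a 2PL model, maximum-information item selection, EAP scoring on a fixed grid with a standard normal prior, a made-up item bank, and a deterministic examinee who answers correctly whenever the item difficulty is below a hypothetical true ability.

```python
import math

def p_2pl(theta, a, b):
    """Probability of a correct response under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """2PL item information at theta."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

GRID = [g / 10.0 for g in range(-40, 41)]  # theta grid from -4 to 4

def eap_estimate(responses, bank):
    """Posterior mean of theta given (item_index, 0/1) responses."""
    weights = []
    for t in GRID:
        w = math.exp(-0.5 * t * t)  # unnormalized standard normal prior
        for idx, u in responses:
            p = p_2pl(t, *bank[idx])
            w *= p if u == 1 else (1.0 - p)
        weights.append(w)
    return sum(t * w for t, w in zip(GRID, weights)) / sum(weights)

def run_cat(bank, answer, test_length):
    """Fixed-length CAT: pick the most informative unused item each step."""
    theta_hat, responses, used = 0.0, [], set()
    for _ in range(test_length):
        candidates = [i for i in range(len(bank)) if i not in used]
        nxt = max(candidates, key=lambda i: item_information(theta_hat, *bank[i]))
        used.add(nxt)
        responses.append((nxt, answer(nxt)))
        theta_hat = eap_estimate(responses, bank)
    return theta_hat, [i for i, _ in responses]

# Hypothetical bank: equal discrimination, difficulties from -3 to 3.
bank = [(1.0, b / 2.0) for b in range(-6, 7)]
true_theta = 1.2  # hypothetical examinee

def answer(i):
    """Deterministic toy examinee: correct if the item is easier than true_theta."""
    return 1 if true_theta > bank[i][1] else 0

theta_hat, administered = run_cat(bank, answer, 6)
```

Because each item is chosen near the current estimate, the administered difficulties drift toward the examinee's ability level, which is the adjustment the abstract describes.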
Implementing Sympson-Hetter Item-Exposure Control in a Shadow-Test Approach to Constrained Adaptive Testing

ERIC Educational Resources Information Center

Veldkamp, Bernard P.; van der Linden, Wim J.

2008-01-01

In most operational computerized adaptive testing (CAT) programs, the Sympson-Hetter (SH) method is used to control the exposure of the items. Several modifications and improvements of the original method have been proposed. The Stocking and Lewis (1998) version of the method uses a multinomial experiment to select items. For severely constrained…
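The basic Sympson-Hetter idea can be sketched as a probabilistic filter applied after item selection: each item carries an exposure control parameter, and a selected item is administered only with that probability. The sketch below shows only this basic filter, not the Stocking and Lewis multinomial version or the shadow-test implementation discussed in the article. The item names, control values, and the fallback when every candidate is rejected are assumptions made for illustration.

```python
import random

def sympson_hetter_pick(ranked_items, control, rng):
    """Return the first item, in best-first order, that passes its exposure draw.

    control maps item id -> probability of administering the item once selected.
    """
    for item in ranked_items:
        if rng.random() < control[item]:
            return item
    # Assumption: if every candidate is rejected, fall back to the last one.
    return ranked_items[-1]

# Hypothetical candidates ordered from most to least informative.
ranked = ["item_a", "item_b", "item_c"]
control = {"item_a": 0.3, "item_b": 0.8, "item_c": 1.0}

rng = random.Random(0)  # seeded for reproducibility
n = 10000
picks = [sympson_hetter_pick(ranked, control, rng) for _ in range(n)]
rate_a = picks.count("item_a") / n  # observed exposure rate of the best item
```

In operational use the control parameters are themselves tuned through repeated simulation so that observed exposure rates stay under a target; that tuning step is omitted here.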
Construction and Analysis of Educational Tests Using Abductive Machine Learning

ERIC Educational Resources Information Center

El-Alfy, El-Sayed M.; Abdel-Aal, Radwan E.

2008-01-01

Recent advances in educational technologies and the wide-spread use of computers in schools have fueled innovations in test construction and analysis. As the measurement accuracy of a test depends on the quality of the items it includes, item selection procedures play a central role in this process. Mathematical programming and the item response…
Applications of Computerized Adaptive Testing. Proceedings of a Symposium presented at the Annual Convention of the Military Testing Association (18th, October 1976). Research Report 77-1.

ERIC Educational Resources Information Center

Weiss, David J., Ed.

This symposium consists of five papers and presents some recent developments in adaptive testing which have applications to several military testing problems. The overview, by James R. McBride, defines adaptive testing and discusses some of its item selection and scoring strategies. Item response theory, or item characteristic curve theory, is…
New decision criteria for selecting delta check methods based on the ratio of the delta difference to the width of the reference range can be generally applicable for each clinical chemistry test item.

PubMed

Park, Sang Hyuk; Kim, So-Young; Lee, Woochang; Chun, Sail; Min, Won-Ki

2012-09-01

Many laboratories use 4 delta check methods: delta difference, delta percent change, rate difference, and rate percent change. However, guidelines regarding decision criteria for selecting delta check methods have not yet been provided. We present new decision criteria for selecting delta check methods for each clinical chemistry test item. We collected 811,920 and 669,750 paired (present and previous) test results for 27 clinical chemistry test items from inpatients and outpatients, respectively. We devised new decision criteria for the selection of delta check methods based on the ratio of the delta difference to the width of the reference range (DD/RR). Delta check methods based on these criteria were compared with those based on the CV% of the absolute delta difference (ADD) as well as those reported in 2 previous studies. The delta check methods suggested by new decision criteria based on the DD/RR ratio corresponded well with those based on the CV% of the ADD except for only 2 items each in inpatients and outpatients. Delta check methods based on the DD/RR ratio also corresponded with those suggested in the 2 previous studies, except for 1 and 7 items in inpatients and outpatients, respectively. The DD/RR method appears to yield more feasible and intuitive selection criteria and can easily explain changes in the results by reflecting both the biological variation of the test item and the clinical characteristics of patients in each laboratory. We suggest this as a measure to determine delta check methods.
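The four delta check quantities and the DD/RR ratio can be written out as a short worked example. The abstract does not spell out the formulas, so the definitions below are the commonly used ones and should be read as assumptions, as should the choice of days as the time unit and the use of a signed (not absolute) delta difference in the ratio. All numbers are made up.

```python
def delta_check_values(previous, present, days_between):
    """Compute the four delta check quantities for one pair of results."""
    dd = present - previous              # delta difference
    dpc = 100.0 * dd / previous          # delta percent change
    return {
        "delta_difference": dd,
        "delta_percent_change": dpc,
        "rate_difference": dd / days_between,        # per day (assumed unit)
        "rate_percent_change": dpc / days_between,   # per day (assumed unit)
    }

def dd_rr_ratio(previous, present, ref_low, ref_high):
    """Delta difference divided by the width of the reference range."""
    return (present - previous) / (ref_high - ref_low)

# Made-up paired results: previous 100, present 120, two days apart,
# with a hypothetical reference range of 70 to 110.
values = delta_check_values(100.0, 120.0, 2.0)
ratio = dd_rr_ratio(100.0, 120.0, 70.0, 110.0)
```

Expressing the change relative to the reference-range width puts test items with very different units and scales on a common footing, which is what makes a single decision criterion across test items plausible.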
The Psychological Effect of Errors in Standardized Language Test Items on EFL Students' Responses to the Following Item

ERIC Educational Resources Information Center

Khaksefidi, Saman

2017-01-01

This study investigates the psychological effect of a wrong question with wrong items on answering the next question in a test of structure. Forty students selected through stratified random sampling are given 15 questions of a standardized test, namely a TOEFL structure test, in which questions number 7 and number 11 are wrong and their answers…
Computerized Adaptive Testing for Polytomous Motivation Items: Administration Mode Effects and a Comparison with Short Forms

ERIC Educational Resources Information Center

Hol, A. Michiel; Vorst, Harrie C. M.; Mellenbergh, Gideon J.

2007-01-01

In a randomized experiment (n = 515), a computerized and a computerized adaptive test (CAT) are compared. The item pool consists of 24 polytomous motivation items. Although items are carefully selected, calibration data show that Samejima's graded response model did not fit the data optimally. A simulation study is done to assess possible…
Wisconsin Title I Migrant Education. Section 143 Project: Development of an Item Bank. Summary Report.

ERIC Educational Resources Information Center

Brown, Frank N.; And Others

The successful Wisconsin Title I project item bank offers a valid, flexible, and efficient means of providing migrant student tests in reading and mathematics tailored to instructor curricula. The item bank system consists of nine PASCAL computer programs which maintain, search, and select from approximately 1,000 test items stored on floppy disks…
Development, Validation, and Use of an Item Bank for Police Promotion Examinations.

ERIC Educational Resources Information Center

Enger, John M.

In Arkansas, in reaction to complaints about traditional methods of selection for promotion, the civil service commission has chosen to base promotions in the police department solely on scores on locally-developed objective tests. Items developed and loaded into a computerized test bank were selected from six areas of responsibility: (1) criminal…
Precision-Based Item Selection for Exposure Control in Computerized Adaptive Testing

ERIC Educational Resources Information Center

Carroll, Ian A.

2017-01-01

Item exposure control is, relative to adaptive testing, a nascent concept that has emerged only in the last two to three decades on an academic basis as a practical issue in high-stakes computerized adaptive tests. This study aims to implement a new strategy in item exposure control by incorporating the standard error of the ability estimate into…
Tier One Performance Screen Initial Operational Test and Evaluation: 2012 Interim Report

DTIC Science & Technology

2013-12-01

are known to predict outcomes in work settings. Because the TAPAS uses item response theory (IRT) methods to construct and score items, it can be...Qualification Test (AFQT), to select new Soldiers. Although the AFQT is useful for selecting new Soldiers, other personal attributes are important to...to be and will continue to serve as a useful metric for selecting new Soldiers, other personal attributes, in particular non-cognitive attributes

TEDS-M 2008 User Guide for the International Database. Supplement 4: TEDS-M Released Mathematics and Mathematics Pedagogy Knowledge Assessment Items

ERIC Educational Resources Information Center

Brese, Falk, Ed.

2012-01-01

The goal for selecting the released set of test items was to have approximately 25% of each of the full item sets for mathematics content knowledge (MCK) and mathematics pedagogical content knowledge (MPCK) that would represent the full range of difficulty, content, and item format used in the TEDS-M study. The initial step in the selection was to…
A Method for the Comparison of Item Selection Rules in Computerized Adaptive Testing

ERIC Educational Resources Information Center

Barrada, Juan Ramon; Olea, Julio; Ponsoda, Vicente; Abad, Francisco Jose

2010-01-01

In a typical study comparing the relative efficiency of two item selection rules in computerized adaptive testing, the common result is that they simultaneously differ in accuracy and security, making it difficult to reach a conclusion on which is the more appropriate rule. This study proposes a strategy to conduct a global comparison of two or…
Item Response Models for Examinee-Selected Items

ERIC Educational Resources Information Center

Wang, Wen-Chung; Jin, Kuan-Yu; Qiu, Xue-Lan; Wang, Lei

2012-01-01

In some tests, examinees are required to choose a fixed number of items from a set of given items to answer. This practice creates a challenge to standard item response models, because more capable examinees may have an advantage by making wiser choices. In this study, we developed a new class of item response models to account for the choice…
Reading Ability and Print Exposure: Item Response Theory Analysis of the Author Recognition Test

PubMed Central

Moore, Mariah; Gordon, Peter C.

2015-01-01

In the Author Recognition Test (ART) participants are presented with a series of names and foils and are asked to indicate which ones they recognize as authors. The test is a strong predictor of reading skill, with this predictive ability generally explained as occurring because author knowledge is likely acquired through reading or other forms of print exposure. This large-scale study (1012 college student participants) used Item Response Theory (IRT) to analyze item (author) characteristics to facilitate identification of the determinants of item difficulty, provide a basis for further test development, and to optimize scoring of the ART. Factor analysis suggests a potential two factor structure of the ART differentiating between literary vs. popular authors. Effective and ineffective author names were identified so as to facilitate future revisions of the ART. Analyses showed that the ART is a highly significant predictor of time spent encoding words as measured using eye-tracking during reading. The relationship between the ART and time spent reading provided a basis for implementing a higher penalty for selecting foils, rather than the standard method of ART scoring (names selected minus foils selected). The findings provide novel support for the view that the ART is a valid indicator of reading volume. Further, they show that frequency data can be used to select items of appropriate difficulty and that frequency data from corpora based on particular time periods and types of text may allow test adaptation for different populations. PMID:25410405
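The scoring rule mentioned in this abstract (names selected minus foils selected) and a variant with a heavier foil penalty can be sketched as follows. The penalty weight is a hypothetical parameter, since the abstract does not report the value used, and the item labels are placeholders.

```python
def art_score(selected, authors, foil_penalty=1.0):
    """Score an ART response: hits minus a weighted count of selected foils.

    Anything selected that is not in `authors` is treated as a foil.
    foil_penalty=1.0 reproduces the standard scoring rule.
    """
    chosen = set(selected)
    hits = len(chosen & authors)
    false_alarms = len(chosen - authors)
    return hits - foil_penalty * false_alarms

# Placeholder item labels; a real checklist would use actual names.
authors = {"author_a", "author_b", "author_c"}
response = ["author_a", "author_b", "foil_x"]

standard = art_score(response, authors)                    # 2 hits, 1 foil
stricter = art_score(response, authors, foil_penalty=2.0)  # heavier foil penalty
```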
Reading ability and print exposure: item response theory analysis of the author recognition test.

PubMed

Moore, Mariah; Gordon, Peter C

2015-12-01

In the author recognition test (ART), participants are presented with a series of names and foils and are asked to indicate which ones they recognize as authors. The test is a strong predictor of reading skill, and this predictive ability is generally explained as occurring because author knowledge is likely acquired through reading or other forms of print exposure. In this large-scale study (1,012 college student participants), we used item response theory (IRT) to analyze item (author) characteristics in order to facilitate identification of the determinants of item difficulty, provide a basis for further test development, and optimize scoring of the ART. Factor analysis suggested a potential two-factor structure of the ART, differentiating between literary and popular authors. Effective and ineffective author names were identified so as to facilitate future revisions of the ART. Analyses showed that the ART is a highly significant predictor of the time spent encoding words, as measured using eyetracking during reading. The relationship between the ART and time spent reading provided a basis for implementing a higher penalty for selecting foils, rather than the standard method of ART scoring (names selected minus foils selected). The findings provide novel support for the view that the ART is a valid indicator of reading volume. Furthermore, they show that frequency data can be used to select items of appropriate difficulty, and that frequency data from corpora based on particular time periods and types of texts may allow adaptations of the test for different populations.
Effect of Item Response Theory (IRT) Model Selection on Testlet-Based Test Equating. Research Report. ETS RR-14-19

ERIC Educational Resources Information Center

Cao, Yi; Lu, Ru; Tao, Wei

2014-01-01

The local item independence assumption underlying traditional item response theory (IRT) models is often not met for tests composed of testlets. There are 3 major approaches to addressing this issue: (a) ignore the violation and use a dichotomous IRT model (e.g., the 2-parameter logistic [2PL] model), (b) combine the interdependent items to form a…
Development of Test Items Related to Selected Concepts Within the Scheme the Particle Nature of Matter.

ERIC Educational Resources Information Center

Doran, Rodney L.; Pella, Milton O.

The purpose of this study was to develop test items with a minimum reading demand for use with pupils at grade levels two through six. An item was judged to be acceptable if the item satisfied at least four of six criteria. Approximately 250 students in grades 2-6 participated in the study. Half of the students were given instruction to develop…
IRT Model Selection Methods for Dichotomous Items

ERIC Educational Resources Information Center

Kang, Taehoon; Cohen, Allan S.

2007-01-01

Fit of the model to the data is important if the benefits of item response theory (IRT) are to be obtained. In this study, the authors compared model selection results using the likelihood ratio test, two information-based criteria, and two Bayesian methods. An example illustrated the potential for inconsistency in model selection depending on…
Developing Computerized Tests for Classroom Teachers: A Pilot Study.

ERIC Educational Resources Information Center

Glowacki, Margaret L.; And Others

Two types of computerized testing have been defined: (1) computer-based testing, using a computer to administer conventional tests in which all examinees take the same set of items; and (2) adaptive tests, in which items are selected for administration by the computer, based on examinee's previous responses. This paper discusses an option for…
Designing P-Optimal Item Pools in Computerized Adaptive Tests with Polytomous Items

ERIC Educational Resources Information Center

Zhou, Xuechun

2012-01-01

Current CAT applications consist of predominantly dichotomous items, and CATs with polytomously scored items are limited. To ascertain the best approach to polytomous CAT, a significant amount of research has been conducted on item selection, ability estimation, and impact of termination rules based on polytomous IRT models. Few studies…
The Prediction of Item Parameters Based on Classical Test Theory and Latent Trait Theory

ERIC Educational Resources Information Center

Anil, Duygu

2008-01-01

In this study, the predictive power of experts' predictions of item characteristics, for conditions in which try-out practices cannot be applied, was examined for item characteristics computed with classical test theory and with the two-parameter logistic model of latent trait theory. The study was carried out on 9914 randomly selected students…
The Impact Analysis of Psychological Reliability of Population Pilot Study for Selection of Particular Reliable Multi-Choice Item Test in Foreign Language Research Work

ERIC Educational Resources Information Center

Fazeli, Seyed Hossein

2010-01-01

The research described in the current study concerns psychological reliability, its importance, and its application, and investigates the impact analysis of the psychological reliability of a population pilot study for the selection of a particular reliable multi-choice item test in foreign language research work. The population for subject…
Re-Fitting for a Different Purpose: A Case Study of Item Writer Practices in Adapting Source Texts for a Test of Academic Reading

ERIC Educational Resources Information Center

Green, Anthony; Hawkey, Roger

2012-01-01

The important yet under-researched role of item writers in the selection and adaptation of texts for high-stakes reading tests is investigated through a case study involving a group of trained item writers working on the International English Language Testing System (IELTS). In the first phase of the study, participants were invited to reflect in…
Geography library of Test Items. Volume Seven: A Selection of Test Items to Accompany the Resource Kit: Rice Growing & Rice Milling in South-Western New South Wales.

ERIC Educational Resources Information Center

Kouimanos, John, Ed.

Accompanying a multimedia resource unit on aspects of rice growing, volume eight of the geography collection includes a section introducing terminology, a viewing guide to the filmstrips and unit test items. Rice farming and marketing in Australia and growing methods in several countries are presented with regional studies in southeast Australia.…
American College Student Values: Their Relationship to Selected Personal and Academic Variables.

ERIC Educational Resources Information Center

Ritter, Carolyn E.

A 20-item chi-square test of independence was administered to a selected sample of college students that was stratified 50% male and 50% female. Male and female responses showed a significant difference on 18 of the 20 items. The 2 items on which attitudes of both sexes were the same were the role of government in business and a solution to the…
Selective Reminding and Free and Cued Selective Reminding in Mild Cognitive Impairment and Alzheimer Disease.

PubMed

Lemos, Raquel; Afonso, Ana; Martins, Cristina; Waters, James H; Blanco, Filipe Sobral; Simões, Mário R; Santana, Isabel

2016-01-01

The Selective Reminding Test (SRT) and the Free and Cued Selective Reminding Test (FCSRT) are multitrial memory tests that use a common "selective reminding" paradigm that aims to facilitate learning by presenting only the missing words from the previous recall trial. While in the FCSRT semantic cues are provided to elicit recall, in the SRT, participants are merely reminded of the missing items by repeating them. These tests have been used to assess age-related memory changes and to predict dementia. The performance of healthy elders on these tests has been compared before, and results have shown that twice as many words were retrieved from long-term memory in the FCSRT compared with the SRT. In this study, we compared the tests' properties and their accuracy in discriminating amnestic mild cognitive impairment (aMCI; n = 20) from Alzheimer disease (AD; n = 18). Patients with AD performed significantly worse than patients with aMCI on both tests. The percentage of items recalled during the learning trials was significantly higher for the FCSRT in both groups, and a higher number of items were later retrieved, showing the benefit of category cueing. Our key finding was that the FCSRT showed higher accuracy in discriminating patients with aMCI from those with AD.
A Comparison of Procedures for Content-Sensitive Item Selection in Computerized Adaptive Tests.

ERIC Educational Resources Information Center

Kingsbury, G. Gage; Zara, Anthony R.

1991-01-01

This simulation investigated two procedures that reduce differences between paper-and-pencil testing and computerized adaptive testing (CAT) by making CAT content sensitive. Results indicate that the price in terms of additional test items of using constrained CAT for content balancing is much smaller than that of using testlets. (SLD)
34 CFR 462.11 - What must an application contain?

Code of Federal Regulations, 2013 CFR

2013-07-01

... (ii) Examinees, for adaptive tests in which items are selected in real time. (d) Maintenance..., including the number of times the test has been administered; and (5) For a computerized adaptive test, the... termination conditions; (iii) Score the test; and (iv) Control for item exposure. (e) Match of content to the...
34 CFR 462.11 - What must an application contain?

Code of Federal Regulations, 2014 CFR

2014-07-01

... (ii) Examinees, for adaptive tests in which items are selected in real time. (d) Maintenance..., including the number of times the test has been administered; and (5) For a computerized adaptive test, the... termination conditions; (iii) Score the test; and (iv) Control for item exposure. (e) Match of content to the...
34 CFR 462.11 - What must an application contain?

Code of Federal Regulations, 2012 CFR

2012-07-01

... (ii) Examinees, for adaptive tests in which items are selected in real time. (d) Maintenance..., including the number of times the test has been administered; and (5) For a computerized adaptive test, the... termination conditions; (iii) Score the test; and (iv) Control for item exposure. (e) Match of content to the...

Use of Matrix Sampling Procedures to Assess Achievement in Solving Open Addition and Subtraction Sentences.

ERIC Educational Resources Information Center

Montague, Margariete A.

This study investigated the feasibility of concurrently and randomly sampling examinees and items in order to estimate group achievement. Seven 32-item tests reflecting a 640-item universe of simple open sentences were used such that item selection (random, systematic) and assignment (random, systematic) of items (four, eight, sixteen) to forms…
[Development of critical thinking skill evaluation scale for nursing students].

PubMed

You, So Young; Kim, Nam Cho

2014-04-01

The aim was to develop a Critical Thinking Skill Test for Nursing Students. The construct concepts were drawn from a literature review and in-depth interviews with hospital nurses, and surveys were conducted among students (n=607) from nursing colleges. The data were collected from September 13 to November 23, 2012 and analyzed using SAS version 9.2. The KR-20 coefficient for reliability, the difficulty index, the discrimination index, the item-total correlation, and the known-groups technique for validity were applied. Four domains and 27 skills were identified and 35 multiple-choice items were developed. Thirty multiple-choice items that scored higher than .80 on the content validity index were selected for the pretest. From the analysis of the pretest data, a modified set of 30 items was selected for the main test. In the main test, the KR-20 coefficient was .70 and the corrected item-total correlations ranged from .11 to .38. There was a statistically significant difference between the two academic systems (p=.001). The developed instrument is the first critical thinking skill test reflecting nursing perspectives in hospital settings and is expected to be used as a tool that contributes to improving the critical thinking ability of nursing students.
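
The two item statistics named in this abstract, the KR-20 coefficient and the corrected item-total correlation, can be illustrated with a minimal Python sketch. It is not the authors' SAS analysis: the function names and the toy response matrix are hypothetical, and the sketch assumes dichotomous (0/1) scoring and the population variance of total scores.

```python
from statistics import mean, pvariance


def kr20(responses):
    """KR-20 for a matrix of 0/1 scores (rows = examinees, columns = items).

    Assumption: population variance of the total scores is used, which
    matches the p*(1-p) item variances in the numerator.
    """
    n = len(responses)
    k = len(responses[0])
    totals = [sum(row) for row in responses]
    var_total = pvariance(totals)
    pq_sum = 0.0
    for j in range(k):
        p = sum(row[j] for row in responses) / n  # item difficulty index
        pq_sum += p * (1.0 - p)
    return (k / (k - 1)) * (1.0 - pq_sum / var_total)


def pearson(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def corrected_item_total(responses, j):
    """Correlate item j with the total score computed WITHOUT item j."""
    item = [row[j] for row in responses]
    rest = [sum(row) - row[j] for row in responses]
    return pearson(item, rest)


# Hypothetical toy data: 4 examinees, 3 items.
toy = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]]
print(kr20(toy), corrected_item_total(toy, 1))
```

Removing the item from the total before correlating is what makes the correlation "corrected"; leaving it in would inflate the value, especially on short tests.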
Racial and Ethnic Bias in Test Construction. Final Report.

ERIC Educational Resources Information Center

Green, Donald Ross

To determine if tryout samples typically used for item selection contribute to test bias against minority groups, item analyses were made of the California Achievement Tests using seven subgroups of the standardization sample: Northern White Suburban, Northern Black Urban, Southern White Suburban, Southern Black Rural, Southern White Rural,…
Racial and Ethnic Bias in Test Construction.

ERIC Educational Resources Information Center

Green, Donald Ross

To determine if tryout samples typically used for item selection contribute to test bias against minority groups, item analyses were made of the California Achievement Tests using seven sub-groups of the standardization sample: Northern White Suburban, Northern Black Urban, Southern White Suburban, Southern Black Rural, Southern White Rural,…
The Accuracy of Estimated Total Test Statistics. Final Report.

ERIC Educational Resources Information Center

Kleinke, David J.

In a post-mortem study of item sampling, 1,050 examinees were divided into ten groups 50 times. Each time, their papers were scored on four different sets of item samples from a 150-item test of academic aptitude. These samples were selected using (a) unstratified random sampling and stratification on (b) content, (c) difficulty, and (d) both.…
Directed forgetting of visual symbols: evidence for nonverbal selective rehearsal.

PubMed

Hourihan, Kathleen L; Ozubko, Jason D; MacLeod, Colin M

2009-12-01

Is selective rehearsal possible for nonverbal information? Two experiments addressed this question using the item method directed forgetting paradigm, where the advantage of remember items over forget items is ascribed to selective rehearsal favoring the remember items. In both experiments, difficult-to-name abstract symbols were presented for study, followed by a recognition test. Directed forgetting effects were evident for these symbols, regardless of whether they were or were not spontaneously named. Critically, a directed forgetting effect was observed for unnamed symbols even when the symbols were studied under verbal suppression to prevent verbal rehearsal. This pattern indicates that a form of nonverbal rehearsal can be used strategically (i.e., selectively) to enhance memory, even when verbal rehearsal is not possible.
34 CFR 462.12 - What procedures does the Secretary use to review the suitability of tests?

Code of Federal Regulations, 2010 CFR

2010-07-01

... item selection must ensure negligible overlap in items across pre- and post-testing; (iv) Includes a... regarding the suitability of a test, the Secretary publishes in the Federal Register and posts on the... suitability of tests? 462.12 Section 462.12 Education Regulations of the Offices of the Department of...
An Instrument to Predict Job Performance of Home Health Aides--Testing the Reliability and Validity.

ERIC Educational Resources Information Center

Sturges, Jack; Quina, Patricia

The development of four paper-and-pencil tests, useful in assessing the effectiveness of inservice training provided to either nurses aides or home health aides, was described. These tests were designed for utilization in employment selection and case assignment. Two tests of 37 multiple-choice items and two tests of 10 matching items were…
ITEM SELECTION TECHNIQUES AND EVALUATION OF INSTRUCTIONAL OBJECTIVES.

ERIC Educational Resources Information Center

COX, RICHARD C.

THE VALIDITY OF AN EDUCATIONAL ACHIEVEMENT TEST DEPENDS UPON THE CORRESPONDENCE BETWEEN SPECIFIED EDUCATIONAL OBJECTIVES AND THE EXTENT TO WHICH THESE OBJECTIVES ARE MEASURED BY THE EVALUATION INSTRUMENT. THIS STUDY IS DESIGNED TO EVALUATE THE EFFECT OF STATISTICAL ITEM SELECTION ON THE STRUCTURE OF THE FINAL EVALUATION INSTRUMENT AS COMPARED WITH…
Validity and Reliability of General Nutrition Knowledge Questionnaire for Adults in Uganda

PubMed Central

Bukenya, Richard; Ahmed, Abhiya; Andrade, Jeanette M.; Grigsby-Toussaint, Diana S.; Muyonga, John; Andrade, Juan E.

2017-01-01

This study sought to develop and validate a general nutrition knowledge questionnaire (GNKQ) for Ugandan adults. The initial draft consisted of 133 items on five constructs associated with nutrition knowledge: expert recommendations (16 items), food groups (70 items), selecting food (10 items), nutrition and disease relationship (23 items), and food fortification in Uganda (14 items). The questionnaire validity was evaluated in three studies. For the content validity (study 1), a panel of five content matter nutrition experts reviewed the GNKQ draft before and after face validity. For the face validity (study 2), head teachers and health workers (n = 27) completed the questionnaire before attending one of three focus groups to review the clarity of the items. For the construct and test-retest reliability (study 3), head teachers (n = 40) from private and public primary schools and nutrition (n = 52) and engineering (n = 49) students from Makerere University took the questionnaire twice (two weeks apart). Experts agreed (content validity index, CVI > 0.9; reliability, Gwet’s AC1 > 0.85) that all constructs were relevant to evaluate nutrition knowledge. After the focus groups, 29 items were identified as unclear, requiring major (n = 5) and minor (n = 24) reviews. The final questionnaire had acceptable internal consistency (Cronbach α > 0.95), test-retest reliability (r = 0.89), and differentiated (p < 0.001) nutrition knowledge scores between nutrition (67 ± 5) and engineering (39 ± 11) students. Only the construct on nutrition recommendations was unreliable (Cronbach α = 0.51, test-retest r = 0.55), which requires further optimization. The final questionnaire included topics on food groups (41 items), selecting food (2 items), nutrition and disease relationship (14 items), and food fortification in Uganda (22 items) and had good content, construct, and test-retest reliability to evaluate nutrition knowledge among Ugandan adults. PMID:28230779
A Mixture Rasch Model-Based Computerized Adaptive Test for Latent Class Identification

ERIC Educational Resources Information Center

Jiao, Hong; Macready, George; Liu, Junhui; Cho, Youngmi

2012-01-01

This study explored a computerized adaptive test delivery algorithm for latent class identification based on the mixture Rasch model. Four item selection methods based on the Kullback-Leibler (KL) information were proposed and compared with the reversed and the adaptive KL information under simulated testing conditions. When item separation was…
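
As a rough illustration of what Kullback-Leibler (KL) information means for a single item, the Python sketch below computes the KL divergence between the response distributions implied by two candidate ability values and picks the unused item that separates them best. It is a simplification under stated assumptions: it uses the plain Rasch model rather than the mixture Rasch model studied here, and the function names and difficulty values are hypothetical.

```python
import math


def rasch_p(theta, b):
    """Rasch probability of a correct response at ability theta, difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def kl_item_information(theta_hat, theta_alt, b):
    """KL divergence between the item's Bernoulli response distributions
    at theta_hat (reference) and theta_alt (alternative)."""
    p0 = rasch_p(theta_hat, b)
    p1 = rasch_p(theta_alt, b)
    return p0 * math.log(p0 / p1) + (1.0 - p0) * math.log((1.0 - p0) / (1.0 - p1))


def pick_item(theta_hat, theta_alt, difficulties, used):
    """Return the index of the unused item with the largest KL information."""
    candidates = [j for j in range(len(difficulties)) if j not in used]
    return max(
        candidates,
        key=lambda j: kl_item_information(theta_hat, theta_alt, difficulties[j]),
    )


# Hypothetical three-item pool with difficulties -3, 0 and 3.
print(pick_item(-1.0, 1.0, [-3.0, 0.0, 3.0], used=set()))
```

An item whose difficulty lies between the two candidate abilities tends to discriminate between them best, which is why the middle item is preferred in this toy pool.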
Air Force Officer Qualifying Test Form O: Development and Standardization.

ERIC Educational Resources Information Center

Rogers, Deborah L.; And Others

This report presents the rationale, development, and standardization of the Air Force Officer Qualifying Test (AFOQT) Form O. The test is used to select individuals for officer commissioning programs, and candidates for pilot and navigator training. Form O contains 380 items organized in 16 subtests. All items are administered in a single test…
Specifying the role of the left prefrontal cortex in word selection

PubMed Central

Ries, S. K; Karzmark, C. R.; Navarrete, E.; Knight, R. T.; Dronkers, N. F.

2015-01-01

Word selection allows us to choose words during language production. This is often viewed as a competitive process wherein a lexical representation is retrieved among semantically-related alternatives. The left prefrontal cortex (LPFC) is thought to help overcome competition for word selection through top-down control. However, whether the LPFC is always necessary for word selection remains unclear. We tested 6 LPFC-injured patients and controls in two picture naming paradigms varying in terms of item repetition. Both paradigms elicited the expected semantic interference effects (SIE), reflecting interference caused by semantically-related representations in word selection. However, LPFC patients as a group showed a larger SIE than controls only in the paradigm involving item repetition. We argue that item repetition increases interference caused by semantically-related alternatives, resulting in increased LPFC-dependent cognitive control demands. The remaining network of brain regions associated with word selection appears to be sufficient when items are not repeated. PMID:26291289
Force, velocity, and work: The effects of different contexts on students' understanding of vector concepts using isomorphic problems

NASA Astrophysics Data System (ADS)

Barniol, Pablo; Zavala, Genaro

2014-12-01

In this article we compare students' understanding of vector concepts in problems with no physical context, and with three mechanics contexts: force, velocity, and work. Based on our "Test of Understanding of Vectors," a multiple-choice test presented elsewhere, we designed two isomorphic shorter versions of 12 items each: a test with no physical context, and a test with mechanics contexts. For this study, we administered the items twice to students who were finishing an introductory mechanics course at a large private university in Mexico. The first time, we administered the two 12-item tests to 608 students. In the second, we only tested the items for which we had found differences in students' performances that were difficult to explain, and in this case, we asked them to show their reasoning in written form. In the first administration, we detected no significant difference between the medians obtained in the tests; however, we did identify significant differences in some of the items. For each item we analyze the type of difference found between the tests in the selection of the correct answer, the most common error on each of the tests, and the differences in the selection of incorrect answers. We also investigate the causes of the different context effects. Based on these analyses, we establish specific recommendations for the instruction of vector concepts in an introductory mechanics course. In the Supplemental Material we include both tests for other researchers studying vector learning, and for physics teachers who teach this material.
Differential item functioning analysis of the Vanderbilt Expertise Test for cars.

PubMed

Lee, Woo-Yeol; Cho, Sun-Joo; McGugin, Rankin W; Van Gulick, Ana Beth; Gauthier, Isabel

2015-01-01

The Vanderbilt Expertise Test for cars (VETcar) is a test of visual learning for contemporary car models. We used item response theory to assess the VETcar and in particular used differential item functioning (DIF) analysis to ask if the test functions the same way in laboratory versus online settings and for different groups based on age and gender. An exploratory factor analysis found evidence of multidimensionality in the VETcar, although a single dimension was deemed sufficient to capture the recognition ability measured by the test. We selected a unidimensional three-parameter logistic item response model to examine item characteristics and subject abilities. The VETcar had satisfactory internal consistency. A substantial number of items showed DIF at a medium effect size for test setting and for age group, whereas gender DIF was negligible. Because online subjects were on average older than those tested in the lab, we focused on the age groups to conduct a multigroup item response theory analysis. This revealed that most items on the test favored the younger group. DIF could be more the rule than the exception when measuring performance with familiar object categories, therefore posing a challenge for the measurement of either domain-general visual abilities or category-specific knowledge.
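
For readers unfamiliar with the unidimensional three-parameter logistic (3PL) model mentioned above, a minimal Python sketch of its response function and item information follows. The parameter names a (discrimination), b (difficulty), and c (lower asymptote, or guessing) are the conventional ones; the sketch assumes the logistic metric without the 1.7 scaling constant and is not the authors' estimation code.

```python
import math


def p_3pl(theta, a, b, c):
    """3PL probability of a correct response.

    c is the lower asymptote: the success probability of a very low-ability
    examinee. Assumption: no D = 1.7 scaling constant is applied.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def info_3pl(theta, a, b, c):
    """Fisher information of a 3PL item at ability theta."""
    p = p_3pl(theta, a, b, c)
    q = 1.0 - p
    return a * a * (q / p) * ((p - c) / (1.0 - c)) ** 2


# At theta == b the probability is halfway between c and 1.
print(p_3pl(0.0, 1.0, 0.0, 0.2))
# With c = 0 the model reduces to the 2PL, whose information peaks at a^2 / 4.
print(info_3pl(0.0, 1.0, 0.0, 0.0))
```

Differential item functioning of the kind reported in this record corresponds to these parameters differing between groups for the same item.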
The optimal sequence and selection of screening test items to predict fall risk in older disabled women: the Women's Health and Aging Study.

PubMed

Lamb, Sarah E; McCabe, Chris; Becker, Clemens; Fried, Linda P; Guralnik, Jack M

2008-10-01

Falls are a major cause of disability, dependence, and death in older people. Brief screening algorithms may be helpful in identifying risk and leading to more detailed assessment. Our aim was to determine the most effective sequence of falls screening test items from a wide selection of recommended items including self-report and performance tests, and to compare performance with other published guidelines. Data were from a prospective, age-stratified, cohort study. Participants were 1002 community-dwelling women aged 65 years old or older, experiencing at least some mild disability. Assessments of fall risk factors were conducted in participants' homes. Fall outcomes were collected at 6-monthly intervals. Algorithms were built for prediction of any fall over a 12-month period using tree classification with cross-set validation. Algorithms using performance tests provided the best prediction of fall events, and achieved moderate to strong performance when compared to commonly accepted benchmarks. The items selected by the best performing algorithm were the number of falls in the last year and, in selected subpopulations, frequency of difficulty balancing while walking, a 4 m walking speed test, body mass index, and a test of knee extensor strength. The algorithm performed better than that from the American Geriatrics Society/British Geriatrics Society/American Academy of Orthopaedic Surgeons and other guidance, although these findings should be treated with caution. Suggestions are made on the type, number, and sequence of tests that could be used to maximize estimation of the probability of falling in older disabled women.
Measurement versus prediction in the construction of patient-reported outcome questionnaires: can we have our cake and eat it?

PubMed

Smits, Niels; van der Ark, L Andries; Conijn, Judith M

2017-11-02

Two important goals when using questionnaires are (a) measurement: the questionnaire is constructed to assign numerical values that accurately represent the test taker's attribute, and (b) prediction: the questionnaire is constructed to give an accurate forecast of an external criterion. Construction methods aimed at measurement prescribe that items should be reliable. In practice, this leads to questionnaires with high inter-item correlations. By contrast, construction methods aimed at prediction typically prescribe that items have a high correlation with the criterion and low inter-item correlations. The latter approach has often been said to produce a paradox concerning the relation between reliability and validity [1-3], because it is often assumed that good measurement is a prerequisite of good prediction. To answer four questions: (1) Why are measurement-based methods suboptimal for questionnaires that are used for prediction? (2) How should one construct a questionnaire that is used for prediction? (3) Do questionnaire-construction methods that optimize measurement and prediction lead to the selection of different items in the questionnaire? (4) Is it possible to construct a questionnaire that can be used for both measurement and prediction? An empirical data set consisting of scores of 242 respondents on questionnaire items measuring mental health is used to select items by means of two methods: a method that optimizes the predictive value of the scale (i.e., forecast a clinical diagnosis), and a method that optimizes the reliability of the scale. We show that for the two scales different sets of items are selected and that a scale constructed to meet the one goal does not show optimal performance with reference to the other goal. The answers are as follows: (1) Because measurement-based methods tend to maximize inter-item correlations by which predictive validity reduces. (2) Through selecting items that correlate highly with the criterion and lowly with the remaining items. (3) Yes, these methods may lead to different item selections. (4) For a single questionnaire: Yes, but it is problematic because reliability cannot be estimated accurately. For a test battery: Yes, but it is very costly. Implications for the construction of patient-reported outcome questionnaires are discussed.
Three controversies over item disclosure in medical licensure examinations.

PubMed

Park, Yoon Soo; Yang, Eunbae B

2015-01-01

In response to views on the public's right to know, there is growing attention to item disclosure (the release of items, answer keys, and performance data to the public) in medical licensure examinations and its potential impact on the test's ability to measure competence and select qualified candidates. Recent debates on this issue have sparked legislative action internationally, including in South Korea, with prior discussions among North American countries dating back over three decades. The purpose of this study is to identify and analyze three issues associated with item disclosure in medical licensure examinations, namely 1) fairness and validity, 2) impact on passing levels, and 3) utility of item disclosure, by synthesizing existing literature in relation to standards in testing. Historically, the controversy over item disclosure has centered on fairness and validity. Proponents of item disclosure stress test takers' right to know, while opponents argue from a validity perspective. Item disclosure may bias item characteristics, such as difficulty and discrimination, and has consequences for setting passing levels. To date, there has been limited research on the utility of item disclosure for large-scale testing. These issues require ongoing and careful consideration.
Online Calibration of Polytomous Items Under the Generalized Partial Credit Model

PubMed Central

Zheng, Yi

2016-01-01

Online calibration is a technology-enhanced architecture for item calibration in computerized adaptive tests (CATs). Many CATs are administered continuously over a long term and rely on large item banks. To ensure test validity, these item banks need to be frequently replenished with new items, and these new items need to be pretested before being used operationally. Online calibration dynamically embeds pretest items in operational tests and calibrates their parameters as response data are gradually obtained through the continuous test administration. This study extends existing formulas, procedures, and algorithms for dichotomous item response theory models to the generalized partial credit model, a popular model for items scored in more than two categories. A simulation study was conducted to investigate the developed algorithms and procedures under a variety of conditions, including two estimation algorithms, three pretest item selection methods, three seeding locations, two numbers of score categories, and three calibration sample sizes. Results demonstrated acceptable estimation accuracy of the two estimation algorithms in some of the simulated conditions. A variety of findings were also revealed for the interaction effects of the included factors, and recommendations were made accordingly. PMID:29881063
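
To make the generalized partial credit model (GPCM) concrete, the Python sketch below computes its category probabilities for one item. It shows only the model's response function, not the online calibration algorithms developed in the study; the function name and the step-parameter values are hypothetical.

```python
import math


def gpcm_probs(theta, a, steps):
    """Category probabilities under the GPCM.

    theta : examinee ability
    a     : item discrimination
    steps : step parameters b_1..b_m for an item scored 0..m

    The numerator for category k is exp(sum_{v<=k} a * (theta - b_v)),
    with an empty sum (zero) for category 0.
    """
    z = [0.0]
    for b in steps:
        z.append(z[-1] + a * (theta - b))
    top = max(z)  # subtract the maximum for numerical stability
    expz = [math.exp(v - top) for v in z]
    total = sum(expz)
    return [v / total for v in expz]


# Hypothetical four-category item (scores 0..3).
print(gpcm_probs(0.3, 1.2, [-0.5, 0.4, 1.1]))
```

With a single step parameter the model reduces to the two-parameter logistic model, which is why formulas for dichotomous items can be extended to it.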
An Attempt to Influence Selected Portions of Student Learning.

ERIC Educational Resources Information Center

Anderson, Edwin R.

In an attempt to selectively improve student performance, one-half of a set of difficult test items from a FORTRAN programming class had handouts explaining the concepts underlying the items distributed to the students. Each handout contained a written learning objective, a short prose passage explaining the objective, and one or more practice…

Informed and Uninformed Naïve Assessment Constructors' Strategies for Item Selection

ERIC Educational Resources Information Center

Fives, Helenrose; Barnes, Nicole

2017-01-01

We present a descriptive analysis of 53 naïve assessment constructors' explanations for selecting test items to include on a summative assessment. We randomly assigned participants to an informed and uninformed condition (i.e., informed participants read an article describing a Table of Specifications). Through recursive thematic analyses of…
Mutual Information Item Selection Method in Cognitive Diagnostic Computerized Adaptive Testing with Short Test Length

ERIC Educational Resources Information Center

Wang, Chun

2013-01-01

Cognitive diagnostic computerized adaptive testing (CD-CAT) purports to combine the strengths of both CAT and cognitive diagnosis. Cognitive diagnosis models aim at classifying examinees into the correct mastery profile group so as to pinpoint the strengths and weaknesses of each examinee whereas CAT algorithms choose items to determine those…
Severity of Organized Item Theft in Computerized Adaptive Testing: An Empirical Study. Research Report. ETS RR-06-22

ERIC Educational Resources Information Center

Yi, Qing; Zhang, Jinming; Chang, Hua-Hua

2006-01-01

Chang and Zhang (2002, 2003) proposed several baseline criteria for assessing the severity of possible test security violations for computerized tests with high-stakes outcomes. However, these criteria were obtained from theoretical derivations that assumed uniformly randomized item selection. The current study investigated potential damage caused…
The Comparative Effectiveness of Different Item Analysis Techniques in Increasing Change Score Reliability.

ERIC Educational Resources Information Center

Crocker, Linda M.; Mehrens, William A.

Four new methods of item analysis were used to select subsets of items which would yield measures of attitude change. The sample consisted of 263 students at Michigan State University who were tested on the Inventory of Beliefs as freshmen and retested on the same instrument as juniors. Item change scores and total change scores were computed for…
The associative memory deficit in aging is related to reduced selectivity of brain activity during encoding

PubMed Central

Saverino, Cristina; Fatima, Zainab; Sarraf, Saman; Oder, Anita; Strother, Stephen C.; Grady, Cheryl L.

2016-01-01

Human aging is characterized by reductions in the ability to remember associations between items, despite intact memory for single items. Older adults also show less selectivity in task-related brain activity, such that patterns of activation become less distinct across multiple experimental tasks. This reduced selectivity, or dedifferentiation, has been found for episodic memory, which is often reduced in older adults, but not for semantic memory, which is maintained with age. We used functional magnetic resonance imaging (fMRI) to investigate whether there is a specific reduction in selectivity of brain activity during associative encoding in older adults, but not during item encoding, and whether this reduction predicts associative memory performance. Healthy young and older adults were scanned while performing an incidental-encoding task for pictures of objects and houses under item or associative instructions. An old/new recognition test was administered outside the scanner. We used agnostic canonical variates analysis and split-half resampling to detect whole brain patterns of activation that predicted item vs. associative encoding for stimuli that were later correctly recognized. Older adults had poorer memory for associations than did younger adults, whereas item memory was comparable across groups. Associative encoding trials, but not item encoding trials, were predicted less successfully in older compared to young adults, indicating less distinct patterns of associative-related activity in the older group. Importantly, higher probability of predicting associative encoding trials was related to better associative memory after accounting for age and performance on a battery of neuropsychological tests. These results provide evidence that neural distinctiveness at encoding supports associative memory and that a specific reduction of selectivity in neural recruitment underlies age differences in associative memory. PMID:27082043
Solving the measurement invariance anchor item problem in item response theory.

PubMed

Meade, Adam W; Wright, Natalie A

2012-09-01

The efficacy of tests of differential item functioning (measurement invariance) has been well established. It is clear that when properly implemented, these tests can successfully identify differentially functioning (DF) items when they exist. However, an assumption of these analyses is that the metric for different groups is linked using anchor items that are invariant. In practice, however, it is impossible to be certain which items are DF and which are invariant. This problem of anchor items, or referent indicators, has long plagued invariance research, and a multitude of suggested approaches have been put forth. Unfortunately, the relative efficacy of these approaches has not been tested. This study compares 11 variations on 5 qualitatively different approaches from recent literature for selecting optimal anchor items. A large-scale simulation study indicates that for nearly all conditions, an easily implemented 2-stage procedure recently put forth by Lopez Rivas, Stark, and Chernyshenko (2009) provided optimal power while maintaining nominal Type I error. With this approach, appropriate anchor items can be easily and quickly located, resulting in more efficacious invariance tests. Recommendations for invariance testing are illustrated using a pedagogical example of employee responses to an organizational culture measure.
Measuring Social Studies Concept Attainment: Boys and Girls. Report from the Project on A Structure of Concept Attainment Abilities.

ERIC Educational Resources Information Center

Harris, Margaret L.; Tabachnick, B. Robert

This paper describes test development efforts for measuring achievement of selected concepts in social studies. It includes descriptive item and test statistics for the tests developed. Twelve items were developed for each of 30 concepts. Subject specialists categorized the concepts into three major areas: Geographic Region, Man and Society, and…
Effect of individual thinking styles on item selection during study time allocation.

PubMed

Jia, Xiaoyu; Li, Weijian; Cao, Liren; Li, Ping; Shi, Meiling; Wang, Jingjing; Cao, Wei; Li, Xinyu

2018-04-01

The influence of individual differences on learners' study time allocation has been emphasised in recent studies; however, little is known about the role of individual thinking styles (analytical versus intuitive). In the present study, we explored the influence of individual thinking styles on learners' application of agenda-based and habitual processes when selecting the first item during a study-time allocation task. A 3-item cognitive reflection test (CRT) was used to determine individuals' degree of cognitive reliance on intuitive versus analytical cognitive processing. Significant correlations between CRT scores and the choices of first item selection were observed in both Experiment 1a (study time was 5 seconds per triplet) and Experiment 1b (study time was 20 seconds per triplet). Furthermore, analytical decision makers constructed a value-based agenda (prioritised high-reward items), whereas intuitive decision makers relied more upon habitual responding (selected items from the leftmost of the array). The findings of Experiment 1a were replicated in Experiment 2 even after ruling out possible effects of individual intelligence and working memory capacity. Overall, individual thinking style plays an important role in learners' study time allocation, and the CRT is a reliable predictor of learners' item selection strategy. © 2016 International Union of Psychological Science.
Differential item functioning analysis of the Vanderbilt Expertise Test for cars

PubMed Central

Lee, Woo-Yeol; Cho, Sun-Joo; McGugin, Rankin W.; Van Gulick, Ana Beth; Gauthier, Isabel

2015-01-01

The Vanderbilt Expertise Test for cars (VETcar) is a test of visual learning for contemporary car models. We used item response theory to assess the VETcar and in particular used differential item functioning (DIF) analysis to ask if the test functions the same way in laboratory versus online settings and for different groups based on age and gender. An exploratory factor analysis found evidence of multidimensionality in the VETcar, although a single dimension was deemed sufficient to capture the recognition ability measured by the test. We selected a unidimensional three-parameter logistic item response model to examine item characteristics and subject abilities. The VETcar had satisfactory internal consistency. A substantial number of items showed DIF at a medium effect size for test setting and for age group, whereas gender DIF was negligible. Because online subjects were on average older than those tested in the lab, we focused on the age groups to conduct a multigroup item response theory analysis. This revealed that most items on the test favored the younger group. DIF could be more the rule than the exception when measuring performance with familiar object categories, therefore posing a challenge for the measurement of either domain-general visual abilities or category-specific knowledge. PMID:26418499
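As a reading aid for the record above, the following is a minimal sketch of the standard three-parameter logistic (3PL) item response function. The parameter values are made up for illustration and are not taken from the VETcar analysis; the function name `three_pl` is hypothetical.

```python
import math

def three_pl(theta, a, b, c):
    """Probability of a correct response under the 3PL model.

    theta: subject ability, a: discrimination, b: difficulty,
    c: lower asymptote (pseudo-guessing).
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical item (values are illustrative assumptions only).
for theta in (-2.0, 0.5, 3.0):
    print(theta, round(three_pl(theta, a=1.2, b=0.5, c=0.25), 3))
```

When ability equals the item difficulty, the probability sits halfway between the guessing floor `c` and 1.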
Specifying the role of the left prefrontal cortex in word selection.

PubMed

Riès, S K; Karzmark, C R; Navarrete, E; Knight, R T; Dronkers, N F

2015-10-01

Word selection allows us to choose words during language production. This is often viewed as a competitive process wherein a lexical representation is retrieved among semantically-related alternatives. The left prefrontal cortex (LPFC) is thought to help overcome competition for word selection through top-down control. However, whether the LPFC is always necessary for word selection remains unclear. We tested 6 LPFC-injured patients and controls in two picture naming paradigms varying in terms of item repetition. Both paradigms elicited the expected semantic interference effects (SIE), reflecting interference caused by semantically-related representations in word selection. However, LPFC patients as a group showed a larger SIE than controls only in the paradigm involving item repetition. We argue that item repetition increases interference caused by semantically-related alternatives, resulting in increased LPFC-dependent cognitive control demands. The remaining network of brain regions associated with word selection appears to be sufficient when items are not repeated. Copyright © 2015 Elsevier Inc. All rights reserved.
Measurement equivalence of seven selected items of posttraumatic growth between black and white adult survivors of Hurricane Katrina.

PubMed

Rhodes, Alison M; Tran, Thanh V

2013-02-01

This study examined the equivalence or comparability of the measurement properties of seven selected items measuring posttraumatic growth among self-identified Black (n = 270) and White (n = 707) adult survivors of Hurricane Katrina, using data from the Baseline Survey of the Hurricane Katrina Community Advisory Group Study. Internal consistency reliability was equally good for both groups (Cronbach's alphas = .79), as were correlations between individual scale items and their respective overall scale. Confirmatory factor analysis of a congeneric measurement model of seven selected items of posttraumatic growth showed adequate measures of fit for both groups. The results showed only small variation in magnitude of factor loadings and measurement errors between the two samples. Tests of measurement invariance showed mixed results, but overall indicated that factor loading, error variance, and factor variance were similar between the two samples. These seven selected items can be useful for future large-scale surveys of posttraumatic growth.
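The internal consistency statistic reported above (Cronbach's alpha) can be sketched as follows. The response matrix is invented for illustration and is unrelated to the Hurricane Katrina data; `cronbach_alpha` is a hypothetical helper name.

```python
from statistics import variance

def cronbach_alpha(scores):
    """scores[r][i] is the response of respondent r to item i."""
    k = len(scores[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of the summed scale score.
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1.0 - sum(item_vars) / total_var)

# Made-up responses: 5 respondents by 3 items.
demo = [[1, 2, 2], [2, 3, 3], [3, 3, 4], [4, 5, 4], [5, 5, 5]]
print(round(cronbach_alpha(demo), 3))
```

Computing alpha separately for each group, as the study did, only requires calling the function on each group's response matrix.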
A Framework for the Development of Computerized Adaptive Tests

ERIC Educational Resources Information Center

Thompson, Nathan A.; Weiss, David J.

2011-01-01

A substantial amount of research has been conducted over the past 40 years on technical aspects of computerized adaptive testing (CAT), such as item selection algorithms, item exposure controls, and termination criteria. However, there is little literature providing practical guidance on the development of a CAT. This paper seeks to collate some…
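The abstract above names item selection algorithms without specifying one. As an illustration only, the sketch below assumes a two-parameter logistic (2PL) model and the common maximum-Fisher-information rule; the item pool, identifiers, and function names are hypothetical.

```python
import math

def p_2pl(theta, a, b):
    """Probability of a correct response under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

def select_next_item(theta, pool, administered):
    """Return the id of the unused item most informative at theta."""
    candidates = [
        (item_information(theta, a, b), item_id)
        for item_id, (a, b) in pool.items()
        if item_id not in administered
    ]
    return max(candidates)[1]

# Hypothetical pool: item id -> (discrimination, difficulty).
pool = {"q1": (1.0, -1.0), "q2": (1.5, 0.0), "q3": (0.8, 1.0)}
print(select_next_item(0.0, pool, administered=set()))
```

An operational CAT would add the exposure controls and termination criteria that the abstract also mentions; they are omitted here.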
Comprehensive Adult Student Assessment Systems Braille Reading Assessment: An Exploratory Study

ERIC Educational Resources Information Center

Posey, Virginia K.; Henderson, Barbara W.

2012-01-01

Introduction: This exploratory study determined whether transcribing selected test items on an adult life and work skills reading test into braille could maintain the same approximate scale-score range and maintain fitness within the item response theory model as used by the Comprehensive Adult Student Assessment Systems (CASAS) for developing…
Application of a Multidimensional Nested Logit Model to Multiple-Choice Test Items

ERIC Educational Resources Information Center

Bolt, Daniel M.; Wollack, James A.; Suh, Youngsuk

2012-01-01

Nested logit models have been presented as an alternative to multinomial logistic models for multiple-choice test items (Suh and Bolt in "Psychometrika" 75:454-473, 2010) and possess a mathematical structure that naturally lends itself to evaluating the incremental information provided by attending to distractor selection in scoring. One potential…
Metacognitive Control and Strategy Selection: Deciding to Practice Retrieval during Learning

ERIC Educational Resources Information Center

Karpicke, Jeffrey D.

2009-01-01

Retrieval practice is a potent technique for enhancing learning, but how often do students practice retrieval when they regulate their own learning? In 4 experiments the subjects learned foreign-language items across multiple study and test periods. When items were assigned to be repeatedly tested, repeatedly studied, or removed after they were…
Issues and Procedures in the Development of Criterion Referenced Tests.

ERIC Educational Resources Information Center

Klein, Stephen P.; Kosecoff, Jacqueline

The basic steps and procedures in the development of criterion referenced tests (CRT), as well as the issues and problems associated with these activities are discussed. In the first section of the paper, the discussions focus upon the purpose and defining characteristics of CRTs, item construction and selection, improving item quality, content…
English 30, Part B: Reading. Questions Booklet. Grade 12 Diploma Examination, January 1997.

ERIC Educational Resources Information Center

Alberta Dept. of Education, Edmonton. Student Evaluation Branch.

Intended for students taking the Grade 12 Diploma Examinations in English 30, this "questions booklet" presents 70 multiple choice test items based on 8 reading selections in the accompanying readings booklet. After instructions for students, the booklet presents the multiple choice items which test students' comprehension of the poetry,…
Development of and Field-Test Results for the CAHPS PCMH Survey

PubMed Central

Scholle, Sarah Hudson; Vuong, Oanh; Ding, Lin; Fry, Stephanie; Gallagher, Patricia; Brown, Julie A.; Hays, Ron D.; Cleary, Paul D.

2017-01-01

Objective: To develop and evaluate survey questions that assess processes of care relevant to Patient-Centered Medical Homes (PCMHs). Research Design: We convened expert panels, reviewed evidence on effective care practices and existing surveys, elicited broad public input, and conducted cognitive interviews and a field test to develop items relevant to PCMHs that could be added to the CAHPS® Clinician & Group (CG-CAHPS) 1.0 Survey. Surveys were tested using a two-contact mail protocol in 10 adult and 33 pediatric practices (both private and community health centers) in Massachusetts. A total of 4,875 completed surveys were received (overall response rate of 25%). Analyses: We calculated the rate of valid responses for each item. We conducted exploratory factor analyses and estimated item-to-total correlations, individual and site level reliability, and correlations among proposed multi-item composites. Results: Ten items in four new domains (Comprehensiveness, Information, Self-Management Support, and Shared Decision-Making) and four items in two existing domains (Access and Coordination of Care) were selected to be supplemental items to be used in conjunction with the adult CG-CAHPS 1.0 survey. For the child version, four items in each of two new domains (Information and Self-Management Support) and five items in existing domains (Access, Comprehensiveness-Prevention, Coordination of Care) were selected. Conclusions: This study provides support for the reliability and validity of new items to supplement the CG-CAHPS 1.0 survey to assess aspects of primary care that are important attributes of Patient-Centered Medical Homes. PMID:23064272
Three controversies over item disclosure in medical licensure examinations

PubMed Central

Park, Yoon Soo; Yang, Eunbae B.

2015-01-01

In response to views on the public's right to know, there is growing attention to item disclosure – release of items, answer keys, and performance data to the public – in medical licensure examinations and its potential impact on the test's ability to measure competence and select qualified candidates. Recent debates on this issue have sparked legislative action internationally, including in South Korea, with prior discussions among North American countries dating back over three decades. The purpose of this study is to identify and analyze three issues associated with item disclosure in medical licensure examinations – 1) fairness and validity, 2) impact on passing levels, and 3) utility of item disclosure – by synthesizing existing literature in relation to standards in testing. Historically, the controversy over item disclosure has centered on fairness and validity. Proponents of item disclosure stress test takers’ right to know, while opponents argue from a validity perspective. Item disclosure may bias item characteristics, such as difficulty and discrimination, and has consequences for setting passing levels. To date, there has been limited research on the utility of item disclosure for large scale testing. These issues require ongoing and careful consideration. PMID:26374693
Selection of Common Items as an Unrecognized Source of Variability in Test Equating: A Bootstrap Approximation Assuming Random Sampling of Common Items

ERIC Educational Resources Information Center

Michaelides, Michalis P.; Haertel, Edward H.

2014-01-01

The standard error of equating quantifies the variability in the estimation of an equating function. Because common items for deriving equated scores are treated as fixed, the only source of variability typically considered arises from the estimation of common-item parameters from responses of samples of examinees. Use of alternative, equally…

Development of a wheelchair mobility skills test for children and adolescents: combining evidence with clinical expertise.

PubMed

Sol, Marleen Elisabeth; Verschuren, Olaf; de Groot, Laura; de Groot, Janke Frederike

2017-02-13

Wheelchair mobility skills (WMS) training is regarded by children using a manual wheelchair and their parents as an important factor to improve participation and daily physical activity. Currently, there is no outcome measure available for the evaluation of WMS in children. Several wheelchair mobility outcome measures have been developed for adults, but none of these have been validated in children. Therefore the objective of this study is to develop a WMS outcome measure for children using the current knowledge from literature in combination with the clinical expertise of health care professionals, children and their parents. Mixed methods approach. Phase 1: Item identification of WMS items through a systematic review using the 'COnsensus-based Standards for the selection of health Measurement Instruments' (COSMIN) recommendations. Phase 2: Item selection and validation of relevant WMS items for children, using a focus group and interviews with children using a manual wheelchair, their parents and health care professionals. Phase 3: Feasibility of the newly developed Utrecht Pediatric Wheelchair Mobility Skills Test (UP-WMST) through pilot testing. Phase 1: Data analysis and synthesis of nine WMS related outcome measures showed there is no widely used outcome measure with levels of evidence across all measurement properties. However, four outcome measures showed some levels of evidence on reliability and validity for adults. Twenty-two WMS items with the best clinimetric properties were selected for further analysis in phase 2. Phase 2: Fifteen items were deemed as relevant for children, one item needed adaptation and six items were considered not relevant for assessing WMS in children. Phase 3: Two health care professionals administered the UP-WMST in eight children. The instructions of the UP-WMST were clear, but the scoring method of the height difference items needed adaptation. The outdoor items for rolling over soft surface and the side slope item were excluded in the final version of the UP-WMST due to logistic reasons. The newly developed 15-item UP-WMST is a validated outcome measure which is easy to administer in children using a manual wheelchair. More research regarding reliability, construct validity and responsiveness is warranted before the UP-WMST can be used in practice.
Decisions that Make a Difference in Detecting Differential Item Functioning

ERIC Educational Resources Information Center

Sireci, Stephen G.; Rios, Joseph A.

2013-01-01

There are numerous statistical procedures for detecting items that function differently across subgroups of examinees that take a test or survey. However, in endeavouring to detect items that may function differentially, selection of the statistical method is only one of many important decisions. In this article, we discuss the important decisions…
Measuring patient-provider communication skills in Rwanda: Selection, adaptation and assessment of psychometric properties of the Communication Assessment Tool.

PubMed

Cubaka, Vincent Kalumire; Schriver, Michael; Vedsted, Peter; Makoul, Gregory; Kallestrup, Per

2018-04-23

To identify, adapt and validate a measure for providers' communication and interpersonal skills in Rwanda. After selection, translation and piloting of the measure, structural validity, test-retest reliability, and differential item functioning were assessed. Identification and adaptation: The 14-item Communication Assessment Tool (CAT) was selected and adapted. Content validation found all items highly relevant in the local context except two, which were retained upon understanding the reasoning applied by patients. Eleven providers and 291 patients were involved in the field-testing. Confirmatory factor analysis showed a good fit for the original one factor model. Test-retest reliability assessment revealed a mean quadratic weighted Kappa = 0.81 (range: 0.69-0.89, N = 57). The average proportion of excellent scores was 15.7% (SD: 24.7, range: 9.9-21.8%, N = 180). Differential item functioning was not observed except for item 1, which focuses on greetings, for age groups (p = 0.02, N = 180). The Kinyarwanda version of CAT (K-CAT) is a reliable and valid patient-reported measure of providers' communication and interpersonal skills. K-CAT was validated on nurses and its use on other types of providers may require further validation. K-CAT is expected to be a valuable feedback tool for providers in practice and in training. Copyright © 2018 Elsevier B.V. All rights reserved.
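The test-retest statistic reported above, quadratic weighted Kappa, can be sketched as below. This is the textbook definition, not the authors' code; ratings are assumed to be coded as integers 0 to `n_categories - 1`, and the example ratings are invented.

```python
def quadratic_weighted_kappa(r1, r2, n_categories):
    """Agreement between two rating lists with quadratic disagreement weights."""
    n = len(r1)
    observed = [[0.0] * n_categories for _ in range(n_categories)]
    for a, b in zip(r1, r2):
        observed[a][b] += 1.0
    row = [sum(observed[i]) for i in range(n_categories)]
    col = [sum(observed[i][j] for i in range(n_categories))
           for j in range(n_categories)]
    num = 0.0  # weighted observed disagreement
    den = 0.0  # weighted disagreement expected by chance
    for i in range(n_categories):
        for j in range(n_categories):
            w = (i - j) ** 2 / (n_categories - 1) ** 2
            num += w * observed[i][j]
            den += w * row[i] * col[j] / n
    # den is zero when every rating falls in one category; not handled here.
    return 1.0 - num / den

# Made-up first and second administrations on a 3-point scale.
first = [0, 1, 2, 1, 2, 0]
second = [0, 1, 2, 2, 2, 1]
print(round(quadratic_weighted_kappa(first, second, 3), 3))
```

The quadratic weights penalise a two-category disagreement four times as heavily as a one-category disagreement, which suits ordinal response scales.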
Attainment of Selected Earth Science Concepts by Texas High School Seniors.

ERIC Educational Resources Information Center

Rollins, Mavis M.; And Others

The purpose of this study was to determine whether high school seniors (N=492) had attained each of five selected earth science concepts and if said attainment was influenced by the number of science courses completed. A 72-item, multiple-choice format test (12 items for each concept) was developed and piloted previous to this study to measure…
Cross-cultural adaptation and validation of the Quebec User Evaluation of Satisfaction with Assistive Technology (QUEST 2.0): the development of the Taiwanese version.

PubMed

Mao, Hui-Fen; Chen, Wan-Yin; Yao, Grace; Huang, Sheau-Ling; Lin, Chia-Chi; Huang, Wen-Ni Wennie

2010-05-01

To develop and validate a cross-cultural version of the Quebec User Evaluation of Satisfaction with Assistive Technology (QUEST 2.0) for users of assistive technology devices in Taiwan. A cross-sectional survey. The standard cultural adaptation procedure was used for questionnaire translation and cultural item design. A field test was then conducted for item selection and psychometric properties testing. One hundred and five volunteer assistive device users in the community. A questionnaire comprising 12 items of the QUEST 2.0 and 16 culture-specific items. One culture-specific item, 'Cost', was selected based on eight criteria and added to the QUEST 2.0 (12 items) to formulate the Taiwanese version of QUEST 2.0 (T-QUEST). The T-QUEST consisted of 13 items which were classified into two domains: device (8 items) and service (5 items). The internal consistencies of the device, service and total T-QUEST scores were 0.87, 0.84 and 0.90, respectively. The device, service and total T-QUEST scores achieved good test-retest stability (intraclass correlation coefficient (ICC) 0.90, 0.97, 0.95). Exploratory factor analysis revealed that T-QUEST had a two-factor structure for device and service in the construct of user satisfaction (53.42% of the variance explained). Users of assistive devices in different cultures may have different concerns regarding satisfaction. T-QUEST is the first published version of QUEST with culture-specific items added to the original translated items of QUEST 2.0. T-QUEST was a valid and reliable tool for measuring user satisfaction among Mandarin-speaking individuals using various kinds of assistive devices.
Construction and Validation of a Women's Autonomy Measurement Scale with Reference to Utilization of Maternal Health Care Services in Nepal.

PubMed

Bhandari, T R; Dangal, G; Sarma, P S; Kutty, V R

2014-01-01

Women's autonomy is one of the predictors of maternal health care service utilization. This study aimed to construct and validate a scale for measuring women's autonomy with relevance to developing countries. We constructed and validated the scale in Rupandehi district and further validated it in Kapilvastu district of Nepal. Initially, we administered a 24-item preliminary scale and finalized a 23-item scale using psychometric tests. After defining the construct of women's autonomy, we pooled 194 items and selected 24 items to develop a preliminary scale. The scale development process followed these steps: definition of the construct, generation of an item pool, pretesting, analysis of psychometric tests, and further validation. The new scale was strongly supported by Cronbach's alpha (0.84), test-retest Pearson correlation (0.87), average content validity ratio (0.8) and overall agreement (Kappa value of the items, 0.83); all values were satisfactory. From factor analysis, we selected 23 items for the final scale, which showed good convergent and discriminant validity. We removed one item from the preliminary draft; the remaining 23 items loaded on five factors. All five factors had single-loading items after suppressing absolute coefficient values less than 0.45, and the average coefficient of each factor was more than 0.60. Similarly, the factors and loaded items had good convergent and discriminant validity, which further showed the strong measurement capacity of the scale. The new scale is a reliable tool for assessing women's autonomy in developing countries. We recommend further use and validation of the scale to ensure its measurement capacity.
Managing a Test Item Bank on a Microcomputer: Can It Help You and Your Students?

ERIC Educational Resources Information Center

Peterson, Julian A.; Meister, Lynn L.

1983-01-01

Describes a test item bank developed by the Association for Medical School Departments of Biochemistry (Texas). Programs (written in Pascal) allow self-evaluation by interactive student access to questions randomly selected from a chosen category. Potential users of the system (having student, manager, and instructor modes) are invited to contact…
The Golden Rule Agreement is Psychometrically Defensible.

ERIC Educational Resources Information Center

Gonzalez-Tamayo, Eulogio

The agreement between the Educational Testing Service (ETS) and the Golden Rule Insurance Company of Illinois is interpreted as setting the general principles on which items must be selected to be included in a licensure test. These principles put a limit to the difficulty level of any item, and they also limit the size of the difference in…
Biological Science: An Ecological Approach. BSCS Green Version. Teacher's Resource Book and Test Item Bank. Sixth Edition.

ERIC Educational Resources Information Center

Biological Sciences Curriculum Study, Colorado Springs.

This book consists of four sections: (1) "Supplemental Materials"; (2) "Supplemental Investigations"; (3) "Test Item Bank"; and (4) "Blackline Masters." The first section provides additional background material related to selected chapters and investigations in the student book. Included are a periodic table of the elements, genetics problems and…
Should We Stop Developing Heuristics and Only Rely on Mixed Integer Programming Solvers in Automated Test Assembly? A Rejoinder to van der Linden and Li (2016).

PubMed

Chen, Pei-Hua

2017-05-01

This rejoinder responds to the commentary by van der Linden and Li entitled "Comment on Three-Element Item Selection Procedures for Multiple Forms Assembly: An Item Matching Approach" on the article "Three-Element Item Selection Procedures for Multiple Forms Assembly: An Item Matching Approach" by Chen. Van der Linden and Li made a strong statement calling for the cessation of test assembly heuristics development, and instead encouraged embracing mixed integer programming (MIP). This article points out the nondeterministic polynomial (NP)-hard nature of MIP problems and how solutions found using heuristics could be useful in an MIP context. Although van der Linden and Li provided several practical examples of test assembly supporting their view, the examples ignore the cases in which a slight change of constraints or item pool data might mean it would not be possible to obtain solutions as quickly as before. The article illustrates the use of heuristic solutions to improve both the performance of MIP solvers and the quality of solutions. Additional responses to the commentary by van der Linden and Li are included.
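To make the heuristic side of this debate concrete, here is a generic greedy assembly sketch. It is not Chen's item matching procedure and not van der Linden and Li's MIP formulation; the item pool, content areas, information values, and the function name `greedy_assemble` are all assumptions made for illustration.

```python
def greedy_assemble(items, length, max_per_area):
    """Pick up to `length` items by descending information, respecting
    a maximum number of items per content area.

    items: list of (item_id, content_area, information_at_cut_score).
    """
    chosen = []
    counts = {}
    for item_id, area, info in sorted(items, key=lambda t: t[2], reverse=True):
        if len(chosen) == length:
            break
        if counts.get(area, 0) < max_per_area.get(area, 0):
            chosen.append(item_id)
            counts[area] = counts.get(area, 0) + 1
    return chosen

# Hypothetical five-item pool.
pool = [
    ("i1", "algebra", 0.90),
    ("i2", "algebra", 0.85),
    ("i3", "geometry", 0.60),
    ("i4", "algebra", 0.80),
    ("i5", "geometry", 0.40),
]
form = greedy_assemble(pool, length=3, max_per_area={"algebra": 2, "geometry": 2})
print(form)
```

A feasible form produced this way could, in principle, be handed to a MIP solver as a starting solution, which is the kind of complementary use the rejoinder argues for.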
The relative price of healthy and less healthy foods available in Australian school canteens.

PubMed

Billich, Natassja; Adderley, Marijke; Ford, Laura; Keeton, Isabel; Palermo, Claire; Peeters, Anna; Woods, Julie; Backholer, Kathryn

2018-04-12

School canteens have an important role in modelling a healthy food environment. Price is a strong predictor of food and beverage choice. This study compared the relative price of healthy and less healthy lunch and snack items sold within Australian school canteens. A convenience sample of online canteen menus from five Australian states were selected (100 primary and 100 secondary schools). State-specific canteen guidelines were used to classify menu items into 'green' (eat most), 'amber' (select carefully) and 'red' (not recommended in schools). The price of the cheapest 'healthy' lunch (vegetable-based 'green') and snack ('green' fruit) item was compared to the cheapest 'less healthy' ('amber/red') lunch and snack item, respectively, using an un-paired t-test. The relative price of the 'healthy' items and the 'less healthy' items was calculated to determine the proportion of schools that sold the 'less healthy' item cheaper. The mean cost of the 'healthy' lunch items was greater than the 'less healthy' lunch items for both primary (AUD $0.70 greater) and secondary schools ($0.50 greater; p < 0.01). For 75% of primary and 57% of secondary schools, the selected 'less healthy' lunch item was cheaper than the 'healthy' lunch item. For 41% of primary and 48% of secondary schools, the selected 'less healthy' snack was cheaper than the 'healthy' snack. These proportions were greatest for primary schools located in more, compared to less, disadvantaged areas. The relative price of foods sold within Australian school canteens appears to favour less healthy foods. School canteen healthy food policies should consider the price of foods sold.
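The price comparison described above rests on an unpaired t-test. The sketch below computes the pooled-variance (Student) t statistic; whether the authors used the pooled or the Welch variant is not stated, so the pooled form is an assumption, and the prices are invented.

```python
import math
from statistics import mean, variance

def unpaired_t(x, y):
    """Return (t statistic, degrees of freedom) for two independent samples,
    assuming equal population variances."""
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    t = (mean(x) - mean(y)) / math.sqrt(pooled * (1.0 / nx + 1.0 / ny))
    return t, nx + ny - 2

# Made-up cheapest lunch prices (AUD) at a handful of schools.
healthy = [5.0, 5.5, 4.8, 6.0, 5.2]
less_healthy = [4.5, 4.0, 4.9, 5.0, 4.2]
t, df = unpaired_t(healthy, less_healthy)
print(round(t, 2), df)
```

The p-value would then be read from a t distribution with `df` degrees of freedom, which requires a statistics library or a table and is left out here.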
Optimizing data collection for public health decisions: a data mining approach

PubMed Central

2014-01-01

Background: Collecting data can be cumbersome and expensive. Lack of relevant, accurate and timely data for research to inform policy may negatively impact public health. The aim of this study was to test if the careful removal of items from two community nutrition surveys guided by a data mining technique called feature selection, can (a) identify a reduced dataset, while (b) not damaging the signal inside that data. Methods: The Nutrition Environment Measures Surveys for stores (NEMS-S) and restaurants (NEMS-R) were completed on 885 retail food outlets in two counties in West Virginia between May and November of 2011. A reduced dataset was identified for each outlet type using feature selection. Coefficients from linear regression modeling were used to weight items in the reduced datasets. Weighted item values were summed with the error term to compute reduced item survey scores. Scores produced by the full survey were compared to the reduced item scores using a Wilcoxon rank-sum test. Results: Feature selection identified 9 store and 16 restaurant survey items as significant predictors of the score produced from the full survey. The linear regression models built from the reduced feature sets had R² values of 92% and 94% for restaurant and grocery store data, respectively. Conclusions: While there are many potentially important variables in any domain, the most useful set may only be a small subset. The use of feature selection in the initial phase of data collection to identify the most influential variables may be a useful tool to greatly reduce the amount of data needed thereby reducing cost. PMID:24919484
Optimizing data collection for public health decisions: a data mining approach.

PubMed

Partington, Susan N; Papakroni, Vasil; Menzies, Tim

2014-06-12

Collecting data can be cumbersome and expensive. Lack of relevant, accurate and timely data for research to inform policy may negatively impact public health. The aim of this study was to test if the careful removal of items from two community nutrition surveys guided by a data mining technique called feature selection, can (a) identify a reduced dataset, while (b) not damaging the signal inside that data. The Nutrition Environment Measures Surveys for stores (NEMS-S) and restaurants (NEMS-R) were completed on 885 retail food outlets in two counties in West Virginia between May and November of 2011. A reduced dataset was identified for each outlet type using feature selection. Coefficients from linear regression modeling were used to weight items in the reduced datasets. Weighted item values were summed with the error term to compute reduced item survey scores. Scores produced by the full survey were compared to the reduced item scores using a Wilcoxon rank-sum test. Feature selection identified 9 store and 16 restaurant survey items as significant predictors of the score produced from the full survey. The linear regression models built from the reduced feature sets had R² values of 92% and 94% for restaurant and grocery store data, respectively. While there are many potentially important variables in any domain, the most useful set may only be a small subset. The use of feature selection in the initial phase of data collection to identify the most influential variables may be a useful tool to greatly reduce the amount of data needed thereby reducing cost.
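The comparison of full-survey and reduced-item scores above uses a Wilcoxon rank-sum test. A minimal sketch of the rank-sum statistic (with midranks for ties) follows; the scores are invented, the function names are hypothetical, and the p-value step is omitted.

```python
def rank_sum(x, y):
    """Wilcoxon rank-sum W: sum of the midranks of x in the pooled sample."""
    pooled = sorted(list(x) + list(y))
    ranks = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        # Tied values occupy ranks i+1 .. j; assign their average.
        ranks[pooled[i]] = (i + 1 + j) / 2.0
        i = j
    return sum(ranks[v] for v in x)

def mann_whitney_u(x, y):
    """Equivalent U statistic for sample x."""
    n = len(x)
    return rank_sum(x, y) - n * (n + 1) / 2.0

# Made-up survey scores for the same outlets under each scoring scheme.
full_scores = [22, 30, 18, 27, 25]
reduced_scores = [21, 31, 18, 26, 24]
print(rank_sum(full_scores, reduced_scores), mann_whitney_u(full_scores, reduced_scores))
```

Under the null hypothesis of identical distributions, W is expected to be near n_x(n_x + n_y + 1)/2, so values close to that figure are consistent with the reduced survey preserving the score distribution.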
Investigating a memory-based account of negative priming: support for selection-feature mismatch.

PubMed

MacDonald, P A; Joordens, S

2000-08-01

Using typical and modified negative priming tasks, the selection-feature mismatch account of negative priming was tested. In the modified task, participants performed selections on the basis of a semantic feature (e.g., referent size). This procedure has been shown to enhance negative priming (P. A. MacDonald, S. Joordens, & K. N. Seergobin, 1999). Across 3 experiments, negative priming occurred only when the repeated item mismatched in terms of the feature used as the basis for selections. When the repeated item was congruent on the selection feature across the prime and probe displays, positive priming arose. This pattern of results appeared in both the ignored- and the attended-repetition conditions. Negative priming does not result from previously ignoring an item. These findings strongly support the selection-feature mismatch account of negative priming and refute both the distractor inhibition and the episodic-retrieval explanations.
Local Norms and Test Characteristics for Selected Forms of the M.A.A. Placement Test.

ERIC Educational Resources Information Center

Melancon, Janet G.; Thompson, Bruce

The psychometric integrity of selected items from the Mathematical Association of America (MAA) placement tests for college students was investigated. Two alternative and parallel versions of the test were developed (Form A and Form B) for this study. Data for 539 students seeking admission into an undergraduate mathematics curriculum at a private…
Weighting Test Samples in IRT Linking and Equating: Toward an Improved Sampling Design for Complex Equating. Research Report. ETS RR-13-39

ERIC Educational Resources Information Center

Qian, Jiahe; Jiang, Yanming; von Davier, Alina A.

2013-01-01

Several factors could cause variability in item response theory (IRT) linking and equating procedures, such as the variability across examinee samples and/or test items, seasonality, regional differences, native language diversity, gender, and other demographic variables. Hence, the following question arises: Is it possible to select optimal…
Test Design Optimization in CAT Early Stage with the Nominal Response Model

ERIC Educational Resources Information Center

Passos, Valeria Lima; Berger, Martijn P. F.; Tan, Frans E.

2007-01-01

The early stage of computerized adaptive testing (CAT) refers to the phase of the trait estimation during the administration of only a few items. This phase can be characterized by bias and instability of estimation. In this study, an item selection criterion is introduced in an attempt to lessen this instability: the D-optimality criterion. A…
Distribution of Reading Time When Questions are Asked about a Restricted Category of Text Information.

ERIC Educational Resources Information Center

Reynolds, Ralph E.; And Others

1979-01-01

College students read a text either with or without inserted questions. Question groups performed better, relative to controls, on post-test items that repeated inserted questions, and on new post-test items from the same categories as the inserted questions. A selective attention interpretation of the effect of inserted questions was made.…
Does It Matter if You "Kill" the Patient or Order Too Many Tests? Scoring Alternatives for a Test of Clinical Reasoning Skill

ERIC Educational Resources Information Center

Childs, Ruth A.; Dunn, Jennifer L.; van Barneveld, Christina; Jaciw, Andrew P.

2007-01-01

This study compares five scoring approaches for a test of clinical reasoning skills. All of the approaches incorporate information about the correct item responses selected and the errors, such as selecting too many responses or selecting a response that is inappropriate and/or harmful to the patient. The approaches are combinations of theoretical…
Methodology for Developing and Evaluating the PROMIS® Smoking Item Banks

PubMed Central

Cai, Li; Stucky, Brian D.; Tucker, Joan S.; Shadel, William G.; Edelen, Maria Orlando

2014-01-01

Introduction: This article describes the procedures used in the PROMIS® Smoking Initiative for the development and evaluation of item banks, short forms (SFs), and computerized adaptive tests (CATs) for the assessment of 6 constructs related to cigarette smoking: nicotine dependence, coping expectancies, emotional and sensory expectancies, health expectancies, psychosocial expectancies, and social motivations for smoking. Methods: Analyses were conducted using response data from a large national sample of smokers. Items related to each construct were subjected to extensive item factor analyses and evaluation of differential item functioning (DIF). Final item banks were calibrated, and SF assessments were developed for each construct. The performance of the SFs and the potential use of the item banks for CAT administration were examined through simulation study. Results: Item selection based on dimensionality assessment and DIF analyses produced item banks that were essentially unidimensional in structure and free of bias. Simulation studies demonstrated that the constructs could be accurately measured with a relatively small number of carefully selected items, either through fixed SFs or CAT-based assessment. Illustrative results are presented, and subsequent articles provide detailed discussion of each item bank in turn. Conclusions: The development of the PROMIS smoking item banks provides researchers with new tools for measuring smoking-related constructs. The use of the calibrated item banks and suggested SF assessments will enhance the quality of score estimates, thus advancing smoking research. Moreover, the methods used in the current study, including innovative approaches to item selection and SF construction, may have general relevance to item bank development and evaluation. PMID:23943843

Impact of Eliminating Anchor Items Flagged from Statistical Criteria on Test Score Classifications in Common Item Equating

ERIC Educational Resources Information Center

Karkee, Thakur; Choi, Seung

2005-01-01

Proper maintenance of a scale established in the baseline year would assure the accurate estimation of growth in subsequent years. Scale maintenance is especially important when the state performance standards must be preserved for future administrations. To ensure proper maintenance of a scale, the selection of anchor items and evaluation of…
Leveling the playing field: attention mitigates the effects of intelligence on memory.

PubMed

Markant, Julie; Amso, Dima

2014-05-01

Effective attention and memory skills are fundamental to typical development and essential for achievement during the formal education years. It is critical to identify the specific mechanisms linking efficiency of attentional selection of an item and the quality of its memory retention. The present study capitalized on the spatial cueing paradigm to examine the role of selection via suppression in modulating children and adolescents' memory encoding. By varying a single parameter, the spatial cueing task can elicit either a simple orienting mechanism (i.e., facilitation) or one that involves both target selection and simultaneous suppression of competing information (i.e., IOR). We modified this paradigm to include images of common items in target locations. Participants were not instructed to learn the items and were not told they would be completing a memory test later. Following the cueing task, we imposed a 7-min delay and then asked participants to complete a recognition memory test. Results indicated that selection via suppression promoted recognition memory among 7- to 17-year-olds. Moreover, individual differences in the extent of suppression during encoding predicted recognition memory accuracy. When basic cueing facilitated orienting to target items during encoding, IQ was the best predictor of recognition memory performance for the attended items. In contrast, engaging suppression (i.e., IOR) during encoding counteracted individual differences in intelligence, effectively improving recognition memory performance among children with lower IQs. This work demonstrates that engaging selection via suppression during learning and encoding improves memory retention and has broad implications for developing effective educational techniques. Copyright © 2014 Elsevier B.V. All rights reserved.
Leveling the playing field: Attention mitigates the effects of intelligence on memory

PubMed Central

Markant, Julie; Amso, Dima

2014-01-01

Effective attention and memory skills are fundamental to typical development and essential for achievement during the formal education years. It is critical to identify the specific mechanisms linking efficiency of attentional selection of an item and the quality of its memory retention. The present study capitalized on the spatial cueing paradigm to examine the role of selection via suppression in modulating children and adolescents’ memory encoding. By varying a single parameter, the spatial cueing task can elicit either a simple orienting mechanism (i.e., facilitation) or one that involves both target selection and simultaneous suppression of competing information (i.e., IOR). We modified this paradigm to include images of common items in target locations. Participants were not instructed to learn the items and were not told they would be completing a memory test later. Following the cueing task, we imposed a seven-minute delay and then asked participants to complete a recognition memory test. Results indicated that selection via suppression promoted recognition memory among 7- to 17-year-olds. Moreover, individual differences in the extent of suppression during encoding predicted recognition memory accuracy. When basic cueing facilitated orienting to target items during encoding, IQ was the best predictor of recognition memory performance for the attended items. In contrast, engaging suppression (i.e., IOR) during encoding counteracted individual differences in intelligence, effectively improving recognition memory performance among children with lower IQs. This work demonstrates that engaging selection via suppression during learning and encoding improves memory retention and has broad implications for developing effective educational techniques. PMID:24549142
Development and validation of the Current Opioid Misuse Measure.

PubMed

Butler, Stephen F; Budman, Simon H; Fernandez, Kathrine C; Houle, Brian; Benoit, Christine; Katz, Nathaniel; Jamison, Robert N

2007-07-01

Clinicians recognize the importance of monitoring aberrant medication-related behaviors of chronic pain patients while being prescribed opioid therapy. The purpose of this study was to develop and validate the Current Opioid Misuse Measure (COMM) for those pain patients already on long-term opioid therapy. An initial pool of 177 items was developed with input from 26 pain management and addiction specialists. Concept mapping identified six primary concepts underlying medication misuse, which were used to develop an initial item pool. Twenty-two pain and addiction specialists rated the items on importance and relevance, resulting in selection of a 40-item alpha COMM. Final item selection was based on empirical evaluation of items with patients taking opioids for chronic, noncancer pain (N=227). One-week test-retest reliability was examined with 55 participants. All participants were administered the alpha version of the COMM, the Prescription Drug Use Questionnaire (PDUQ) interview, and submitted a urine sample for toxicology screening. Physician ratings of patient aberrant behaviors were also obtained. Of the 40 items, 17 items appeared to adequately measure aberrant behavior, demonstrating excellent internal consistency and test-retest reliability. Cutoff scores were examined using ROC curve analysis and reasonable sensitivity and specificity were established. To evaluate the COMM's ability to capture change in patient status, it was tested on a subset of patients (N=86) that were followed and reassessed three months later. The COMM was found to have promise as a brief, self-report measure of current aberrant drug-related behavior. Further cross-validation and replication of these preliminary results is pending.
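The cutoff-score step in the record above can be illustrated with a minimal Python sketch. It assumes a higher questionnaire score indicates the target condition and, purely for illustration, picks the cutoff that maximizes Youden's J (sensitivity + specificity - 1); the abstract does not say which rule the authors applied, and the function names and data below are hypothetical.

```python
def sens_spec(scores, labels, cutoff):
    """Sensitivity and specificity when 'score >= cutoff' is called positive.

    scores: numeric questionnaire totals (hypothetical data)
    labels: 1 = condition present, 0 = absent (per a reference standard)
    """
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y == 1 and s >= cutoff)
    fn = sum(1 for s, y in pairs if y == 1 and s < cutoff)
    tn = sum(1 for s, y in pairs if y == 0 and s < cutoff)
    fp = sum(1 for s, y in pairs if y == 0 and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def best_cutoff(scores, labels):
    """Cutoff maximizing Youden's J (an assumed rule, not the source's)."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda c: sum(sens_spec(scores, labels, c)) - 1)


# Hypothetical example data: six respondents.
scores = [2, 5, 7, 9, 11, 14]
labels = [0, 0, 0, 1, 1, 1]
cutoff = best_cutoff(scores, labels)
sensitivity, specificity = sens_spec(scores, labels, cutoff)
```

Sweeping the cutoff over every observed score traces out the points of the ROC curve; in practice the chosen cutoff would also weigh the clinical cost of false positives against false negatives.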
Can Item Keyword Feedback Help Remediate Knowledge Gaps?

PubMed

Feinberg, Richard A; Clauser, Amanda L

2016-10-01

In graduate medical education, assessment results can effectively guide professional development when both assessment and feedback support a formative model. When individuals cannot directly access the test questions and responses, a way of using assessment results formatively is to provide item keyword feedback. The purpose of the following study was to investigate whether exposure to item keyword feedback aids in learner remediation. Participants included 319 trainees who completed a medical subspecialty in-training examination (ITE) in 2012 as first-year fellows, and then 1 year later in 2013 as second-year fellows. Performance on 2013 ITE items in which keywords were, or were not, exposed as part of the 2012 ITE score feedback was compared across groups based on the amount of time studying (preparation). For the same items common to both 2012 and 2013 ITEs, response patterns were analyzed to investigate changes in answer selection. Test takers who indicated greater amounts of preparation on the 2013 ITE did not perform better on the items in which keywords were exposed compared to those who were not exposed. The response pattern analysis substantiated overall growth in performance from the 2012 ITE. For items with incorrect responses on both attempts, examinees selected the same option 58% of the time. Results from the current study were unsuccessful in supporting the use of item keywords in aiding remediation. Unfortunately, the results did provide evidence of examinees retaining misinformation.
Cigarette dependence questionnaire: development and psychometric testing with male smokers.

PubMed

Huang, Chih-Ling; Lin, Hsi-Hui; Wang, Hsiu-Hung

2010-10-01

This paper is a report of a study conducted to develop and test a theoretically derived Cigarette Dependence Questionnaire for adult male smokers. Fagerstrom questionnaires have been used worldwide to assess cigarette dependence. However, these assessments lack any theoretical perspective. A theory-based approach is needed to ensure valid assessment. In 2007, an initial pool of 103 Cigarette Dependence Questionnaire items was distributed to 109 adult smokers in Taiwan. Item analysis was conducted to select items for inclusion in the refined scale. The psychometric properties of the Cigarette Dependence Questionnaire were further evaluated in 2007-08, when it was administered to 256 respondents and their saliva was collected and analysed for cotinine levels. Criterion validity was established through the Pearson correlation between the scale and saliva cotinine levels. Exploratory factor analysis was used to test construct validity. Reliability was determined with Cronbach's alpha coefficient and a 2-week test-retest coefficient. The selection of 30 items for seven perspectives was based on item analysis. One factor accounting for 44.9% of the variance emerged from the factor analysis. The factor was named cigarette dependence. Cigarette Dependence Questionnaire scores were statistically significantly correlated with saliva cotinine levels (r = 0.21, P = 0.01). Cronbach's alpha was 0.95 and test-retest reliability using an intra-class correlation was 0.92. The Cigarette Dependence Questionnaire showed sound reliability and validity and could be used by nurses to set up smoking cessation interventions based on assessment of cigarette dependence. © 2010 Blackwell Publishing Ltd.
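For readers unfamiliar with the reliability statistic reported above, the following is a minimal Python sketch of Cronbach's alpha using the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The function name and the tiny data sets are hypothetical and are not taken from the study.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items score table.

    scores: list of respondents, each a list of k item scores.
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    # Variance of each item across respondents.
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Variance of each respondent's total score.
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical data: three items that rise and fall together.
consistent = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
# Hypothetical data: two items that are unrelated to each other.
unrelated = [[0, 0], [0, 1], [1, 0], [1, 1]]

alpha_consistent = cronbach_alpha(consistent)
alpha_unrelated = cronbach_alpha(unrelated)
```

Perfectly consistent items give an alpha of 1, and unrelated items give an alpha near 0, which is why values such as the 0.95 reported above are read as high internal consistency.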
Development of Self-Report Measures of Social Attitudes that Act as Environmental Barriers and Facilitators for People with Disabilities

PubMed Central

Garcia, Sofia F.; Hahn, Elizabeth A.; Magasi, Susan; Lai, Jin-Shei; Semik, Patrick; Hammel, Joy; Heinemann, Allen W.

2014-01-01

Objective: To describe the development of new self-report measures of social attitudes that act as environmental facilitators or barriers to the participation of people with disabilities in society. Design: A mixed methods approach included a literature review; item classification, selection and writing; cognitive interviews and field testing with participants with spinal cord injury (SCI), traumatic brain injury (TBI) or stroke; and rating scale analysis to evaluate initial psychometric properties. Setting: General community. Participants: Nine individuals with SCI, TBI or stroke participated in cognitive interviews; 305 community residents with those same conditions participated in field testing. Interventions: None. Main Outcome Measure(s): Self-report item pool of social attitudes that act as facilitators or barriers to people with disabilities participating in society. Results: An interdisciplinary team of experts classified 710 existing social environment items into content areas and wrote 32 new items. Additional qualitative item review included item refinement and winnowing of the pool prior to cognitive interviews and field testing of 82 items. Field test data indicated that the pool satisfies a one-parameter item response theory measurement model and would be appropriate for development into a calibrated item bank. Conclusions: Our qualitative item review process supported a social environment conceptual framework that includes both social support and social attitudes. We developed a new social attitudes self-report item pool. Calibration testing of that pool is underway with a larger sample in order to develop a social attitudes item bank for persons with disabilities. PMID:25045803
Development of self-report measures of social attitudes that act as environmental barriers and facilitators for people with disabilities.

PubMed

Garcia, Sofia F; Hahn, Elizabeth A; Magasi, Susan; Lai, Jin-Shei; Semik, Patrick; Hammel, Joy; Heinemann, Allen W

2015-04-01

Objective: To describe the development of new self-report measures of social attitudes that act as environmental facilitators or barriers to the participation of people with disabilities in society. Design: A mixed-methods approach included a literature review; item classification, selection, and writing; cognitive interviews and field testing of participants with spinal cord injury (SCI), traumatic brain injury (TBI), or stroke; and rating scale analysis to evaluate initial psychometric properties. Setting: General community. Participants: Individuals with SCI, TBI, or stroke participated in cognitive interviews (n=9); community residents with those same conditions participated in field testing (n=305). Interventions: None. Main Outcome Measure(s): Self-report item pool of social attitudes that act as facilitators or barriers to people with disabilities participating in society. Results: An interdisciplinary team of experts classified 710 existing social environment items into content areas and wrote 32 new items. Additional qualitative item review included item refinement and winnowing of the pool prior to cognitive interviews and field testing of 82 items. Field test data indicated that the pool satisfies a 1-parameter item response theory measurement model and would be appropriate for development into a calibrated item bank. Conclusions: Our qualitative item review process supported a social environment conceptual framework that includes both social support and social attitudes. We developed a new social attitudes self-report item pool. Calibration testing of that pool is underway with a larger sample to develop a social attitudes item bank for persons with disabilities. Copyright © 2015 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
Science Literacy: How do High School Students Solve PISA Test Items?

NASA Astrophysics Data System (ADS)

Wati, F.; Sinaga, P.; Priyandoko, D.

2017-09-01

The Programme for International Student Assessment (PISA) assesses students’ science literacy in real-life contexts and a wide variety of situations. As a result, its results do not give teachers adequate information for probing students’ science literacy, because the range of material taught at schools depends on the curriculum used. This study investigates how junior high school students in Indonesia solve PISA test items. Data were collected by administering PISA test items from the greenhouse unit to 36 ninth-grade students. Students’ answers were analyzed qualitatively, item by item, based on the competence tested in the problem. The way students answer a problem reflects their ability in a particular competence, which is influenced by a number of factors: unfamiliarity with the test construction, low reading performance, difficulty connecting the available information to the question, and limited ability to express their ideas effectively and readably. In response, selected PISA test items matched to the topic being taught can be used to familiarize students with science literacy.
The Multidimensional Assessment of Interoceptive Awareness (MAIA)

PubMed Central

Mehling, Wolf E.; Price, Cynthia; Daubenmier, Jennifer J.; Acree, Mike; Bartmess, Elizabeth; Stewart, Anita

2012-01-01

This paper describes the development of a multidimensional self-report measure of interoceptive body awareness. The systematic mixed-methods process involved reviewing the current literature, specifying a multidimensional conceptual framework, evaluating prior instruments, developing items, and analyzing focus group responses to scale items by instructors and patients of body awareness-enhancing therapies. Following refinement by cognitive testing, items were field-tested in students and instructors of mind-body approaches. Final item selection was achieved by submitting the field test data to an iterative process using multiple validation methods, including exploratory cluster and confirmatory factor analyses, comparison between known groups, and correlations with established measures of related constructs. The resulting 32-item multidimensional instrument assesses eight concepts. The psychometric properties of these final scales suggest that the Multidimensional Assessment of Interoceptive Awareness (MAIA) may serve as a starting point for research and further collaborative refinement. PMID:23133619
Item Selection for the Development of Parallel Forms from an IRT-Based Seed Test Using a Sampling and Classification Approach

ERIC Educational Resources Information Center

Chen, Pei-Hua; Chang, Hua-Hua; Wu, Haiyan

2012-01-01

Two sampling-and-classification-based procedures were developed for automated test assembly: the Cell Only and the Cell and Cube methods. A simulation study based on a 540-item bank was conducted to compare the performance of the procedures with the performance of a mixed-integer programming (MIP) method for assembling multiple parallel test…
Initial retrieval shields against retrieval-induced forgetting.

PubMed

Racsmány, Mihály; Keresztes, Attila

2015-01-01

Testing, as a form of retrieval, can enhance learning but it can also induce forgetting of related memories, a phenomenon known as retrieval-induced forgetting (RIF). In four experiments we explored whether selective retrieval and selective restudy of target memories induce forgetting of related memories with or without initial retrieval of the entire learning set. In Experiment 1, subjects studied category-exemplar associations, some of which were then either restudied or retrieved. RIF occurred on a delayed final test only when memories were retrieved and not when they were restudied. In Experiment 2, following the study phase of category-exemplar associations, subjects attempted to recall all category-exemplar associations, then they selectively retrieved or restudied some of the exemplars. We found that, despite the huge impact on practiced items, selective retrieval/restudy caused no decrease in final recall of related items. In Experiment 3, we replicated the main result of Experiment 2 by manipulating initial retrieval as a within-subject variable. In Experiment 4 we replicated the main results of the previous experiments with non-practiced (Nrp) baseline items. These findings suggest that initial retrieval of the learning set shields against the forgetting effect of later selective retrieval. Together, our results support the context shift theory of RIF.
Initial evaluation of an interactive test of sentence gist recognition.

PubMed

Tye-Murray, N; Witt, S; Castelloe, J

1996-12-01

The laser videodisc-based Sentence Gist Recognition (SGR) test consists of sets of topically related sentences that are cued by short film clips. Clients respond to test items by selecting picture illustrations and may interact with the talker by using repair strategies when they do not recognize a test item. The two experiments, involving 40 and 35 adult subjects, respectively, indicated that the SGR may better predict subjective measures of speechreading and listening performance than more traditional audiologic sentence and nonsense syllable tests. Data from cochlear implant users indicated that the SGR accounted for a greater percentage of the variance for selected items of the Communication Profile for the Hearing-Impaired and the Speechreading Questionnaire for Cochlear-Implant Users than two other audiologic tests. As in previous work, subjects were more apt to ask the talker to repeat an utterance that they did not recognize than to ask the talker to restructure it. It is suggested that the SGR may reflect the interactive nature of conversation and provide a simulated real-world listening and/or speechreading task. The principles underlying this test are consistent with the development of other computer technologies and concepts, such as compact disc-interactive and virtual reality.
Methodology for developing and evaluating the PROMIS smoking item banks.

PubMed

Hansen, Mark; Cai, Li; Stucky, Brian D; Tucker, Joan S; Shadel, William G; Edelen, Maria Orlando

2014-09-01

This article describes the procedures used in the PROMIS Smoking Initiative for the development and evaluation of item banks, short forms (SFs), and computerized adaptive tests (CATs) for the assessment of 6 constructs related to cigarette smoking: nicotine dependence, coping expectancies, emotional and sensory expectancies, health expectancies, psychosocial expectancies, and social motivations for smoking. Analyses were conducted using response data from a large national sample of smokers. Items related to each construct were subjected to extensive item factor analyses and evaluation of differential item functioning (DIF). Final item banks were calibrated, and SF assessments were developed for each construct. The performance of the SFs and the potential use of the item banks for CAT administration were examined through simulation study. Item selection based on dimensionality assessment and DIF analyses produced item banks that were essentially unidimensional in structure and free of bias. Simulation studies demonstrated that the constructs could be accurately measured with a relatively small number of carefully selected items, either through fixed SFs or CAT-based assessment. Illustrative results are presented, and subsequent articles provide detailed discussion of each item bank in turn. The development of the PROMIS smoking item banks provides researchers with new tools for measuring smoking-related constructs. The use of the calibrated item banks and suggested SF assessments will enhance the quality of score estimates, thus advancing smoking research. Moreover, the methods used in the current study, including innovative approaches to item selection and SF construction, may have general relevance to item bank development and evaluation. © The Author 2013. Published by Oxford University Press on behalf of the Society for Research on Nicotine and Tobacco. All rights reserved.
The development of a science process assessment for fourth-grade students

NASA Astrophysics Data System (ADS)

Smith, Kathleen A.; Welliver, Paul W.

In this study, a multiple-choice test entitled the Science Process Assessment was developed to measure the science process skills of students in grade four. Based on the Recommended Science Competency Continuum for Grades K to 6 for Pennsylvania Schools, this instrument measured the skills of (1) observing, (2) classifying, (3) inferring, (4) predicting, (5) measuring, (6) communicating, (7) using space/time relations, (8) defining operationally, (9) formulating hypotheses, (10) experimenting, (11) recognizing variables, (12) interpreting data, and (13) formulating models. To prepare the instrument, classroom teachers and science educators were invited to participate in two science education workshops designed to develop an item bank of test questions applicable to measuring process skill learning. Participants formed writing teams and generated 65 test items representing the 13 process skills. After a comprehensive group critique of each item, 61 items were identified for inclusion into the Science Process Assessment item bank. To establish content validity, the item bank was submitted to a select panel of science educators for the purpose of judging item acceptability. This analysis yielded 55 acceptable test items and produced the Science Process Assessment, Pilot 1. Pilot 1 was administered to 184 fourth-grade students. Students were given a copy of the test booklet; teachers read each test aloud to the students. Upon completion of this first administration, data from the item analysis yielded a reliability coefficient of 0.73. Subsequently, 40 test items were identified for the Science Process Assessment, Pilot 2. Using the test-retest method, the Science Process Assessment, Pilot 2 (Test 1 and Test 2) was administered to 113 fourth-grade students. Reliability coefficients of 0.80 and 0.82, respectively, were ascertained. The correlation between Test 1 and Test 2 was 0.77. 
The results of this study indicate that (1) the Science Process Assessment, Pilot 2, is a valid and reliable instrument applicable to measuring the science process skills of students in grade four, (2) using educational workshops as a means of developing item banks of test questions is viable and productive in the test development process, and (3) involving classroom teachers and science educators in the test development process is educationally efficient and effective.
Thyroid-specific questions on work ability showed known-groups validity among Danes with thyroid diseases.

PubMed

Nexo, Mette Andersen; Watt, Torquil; Bonnema, Steen Joop; Hegedüs, Laszlo; Rasmussen, Åse Krogh; Feldt-Rasmussen, Ulla; Bjorner, Jakob Bue

2015-07-01

We aimed to identify the best approach to work ability assessment in patients with thyroid disease by evaluating the factor structure, measurement equivalence, known-groups validity, and predictive validity of a broad set of work ability items. Based on the literature and interviews with thyroid patients, 24 work ability items were selected from previous questionnaires, revised, or developed anew. Items were tested among 632 patients with thyroid disease (non-toxic goiter, toxic nodular goiter, Graves' disease (with or without orbitopathy), autoimmune hypothyroidism, and other thyroid diseases), 391 of whom had participated in a study 5 years previously. Responses to select items were compared to general population data. We used confirmatory factor analyses for categorical data, logistic regression analyses and tests of differential item functioning, and head-to-head comparisons of relative validity in distinguishing known groups. Although all work ability items loaded on a common factor, the optimal factor solution included five factors: role physical, role emotional, thyroid-specific limitations, work limitations (without disease attribution), and work performance. The scale on thyroid-specific limitations showed the most power in distinguishing clinical groups and time since diagnosis. A global single item proved useful for comparisons with the general population, and a thyroid-specific item predicted labor market exclusion within the next 5 years (OR 5.0, 95% CI 2.7-9.1). Items on work limitations with attribution to thyroid disease were most effective in detecting impact on work ability and showed good predictive validity. Generic work ability items remain useful for general population comparisons.
The development and validation of a test of science critical thinking for fifth graders.

PubMed

Mapeala, Ruslan; Siew, Nyet Moi

2015-01-01

This paper describes the development and validation of the Test of Science Critical Thinking (TSCT) to measure three critical thinking skill constructs: comparing and contrasting, sequencing, and identifying cause and effect. The initial TSCT consisted of 55 multiple-choice test items, each of which required participants to select a correct response and the correct type of critical thinking used for their response. Data were obtained from a purposive sample of 30 fifth graders in a pilot study carried out in a primary school in Sabah, Malaysia. Students underwent teaching and learning sessions for 9 weeks using the Thinking Maps-aided Problem-Based Learning Module before they took the TSCT. Analyses were conducted to check the difficulty index (p) and discrimination index (d), internal consistency reliability, content validity, and face validity. Analysis of the test-retest reliability data was conducted separately for a group of fifth graders with similar ability. Findings of the pilot study showed that, of the initial 55 administered items, only the 30 items with a relatively good difficulty index (p) ranging from 0.40 to 0.60 and a good discrimination index (d) ranging from 0.20 to 1.00 were selected. The Kuder-Richardson reliability value was found to be appropriate and relatively high, at 0.70, 0.73 and 0.92 for identifying cause and effect, sequencing, and comparing and contrasting, respectively. The content validity index obtained from three expert judgments equalled or exceeded 0.95. In addition, test-retest reliability showed good, statistically significant correlations ([Formula: see text]). From the above results, the selected 30-item TSCT was found to have sufficient reliability and validity and would therefore represent a useful tool for measuring critical thinking ability among fifth graders in primary science.
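The item-screening rule in the record above (keep items with p between 0.40 and 0.60 and d of at least 0.20) can be sketched in Python as follows. The abstract does not state how d was computed; this sketch assumes the common upper-group minus lower-group definition, with groups formed from the top and bottom fraction of examinees by total score. All names and data are hypothetical.

```python
def item_indices(responses, group_fraction=0.27):
    """Return (p, d) for each item of a dichotomously scored test.

    responses: list of examinees, each a list of 0/1 item scores.
    p: proportion of all examinees answering the item correctly.
    d: proportion correct in the upper group minus the lower group
       (assumed definition; groups are the top and bottom
       group_fraction of examinees by total score).
    """
    n = len(responses)
    k = len(responses[0])
    ranked = sorted(responses, key=sum, reverse=True)
    g = max(1, round(n * group_fraction))
    upper, lower = ranked[:g], ranked[-g:]
    indices = []
    for j in range(k):
        p = sum(row[j] for row in responses) / n
        d = (sum(row[j] for row in upper) - sum(row[j] for row in lower)) / g
        indices.append((p, d))
    return indices


def select_items(indices, p_range=(0.40, 0.60), d_min=0.20):
    """Positions of items whose p lies in p_range and whose d >= d_min."""
    low, high = p_range
    return [j for j, (p, d) in enumerate(indices) if low <= p <= high and d >= d_min]


# Hypothetical data: four examinees, three items; halves used as groups.
responses = [[1, 1, 0], [1, 1, 1], [0, 1, 0], [0, 0, 1]]
indices = item_indices(responses, group_fraction=0.5)
kept = select_items(indices)
```

With a pilot sample as small as the 30 students described above, each group holds only a handful of examinees, so the resulting d values are coarse and best treated as screening aids rather than precise estimates.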
Item-method directed forgetting: Effects at retrieval?

PubMed

Taylor, Tracy L; Cutmore, Laura; Pries, Lotta

2018-02-01

In an item-method directed forgetting paradigm, words are presented one at a time, each followed by an instruction to Remember or Forget; a directed forgetting effect is measured as better subsequent memory for Remember words than Forget words. The dominant view is that the directed forgetting effect arises during encoding due to selective rehearsal of Remember over Forget items. In three experiments we attempted to falsify a strong view that directed forgetting effects in recognition are due only to encoding mechanisms when an item method is used. Across 3 experiments we tested for retrieval-based processes by colour-coding the recognition test items. Black colour provided no information; green colour cued a potential Remember item; and, red colour cued a potential Forget item. Recognition cues were mixed within-blocks in Experiment 1 and between-blocks in Experiments 2 and 3; Experiment 3 added explicit feedback on the accuracy of the recognition decision. Although overall recognition improved with cuing when explicit test performance feedback was added in Experiment 3, in no case was the magnitude of the directed forgetting effect influenced by recognition cueing. Our results argue against a role for retrieval-based strategies that limit recognition of Forget items at test and posit a role for encoding intentions only. Copyright © 2017 Elsevier B.V. All rights reserved.
Assessing Patients’ Experiences with Communication Across the Cancer Care Continuum

PubMed Central

Mazor, Kathleen M.; Street, Richard L.; Sue, Valerie M.; Williams, Andrew E.; Rabin, Borsika A.; Arora, Neeraj K.

2016-01-01

Objective To evaluate the relevance, performance and potential usefulness of the Patient Assessment of cancer Communication Experiences (PACE) items. Methods Items focusing on specific communication goals related to exchanging information, fostering healing relationships, responding to emotions, making decisions, enabling self-management, and managing uncertainty were tested via a retrospective, cross-sectional survey of adults who had been diagnosed with cancer. Analyses examined response frequencies, inter-item correlations, and coefficient alpha. Results A total of 366 adults were included in the analyses. Relatively few selected “Does Not Apply”, suggesting that items tap relevant communication experiences. Ratings of whether specific communication goals were achieved were strongly correlated with overall ratings of communication, suggesting item content reflects important aspects of communication. Coefficient alpha was ≥.90 for each item set, indicating excellent reliability. Variations in the percentage of respondents selecting the most positive response across items suggest results can identify strengths and weaknesses. Conclusion The PACE items tap relevant, important aspects of communication during cancer care, and may be useful to cancer care teams desiring detailed feedback. PMID:26979476
Effects of aging on neural connectivity underlying selective memory for emotional scenes

PubMed Central

Waring, Jill D.; Addis, Donna Rose; Kensinger, Elizabeth A.

2012-01-01

Older adults show age-related reductions in memory for neutral items within complex visual scenes, but just like young adults, older adults exhibit a memory advantage for emotional items within scenes compared with the background scene information. The present study examined young and older adults’ encoding-stage effective connectivity for selective memory of emotional items versus memory for both the emotional item and its background. In a functional magnetic resonance imaging (fMRI) study, participants viewed scenes containing either positive or negative items within neutral backgrounds. Outside the scanner, participants completed a memory test for items and backgrounds. Irrespective of scene content being emotionally positive or negative, older adults had stronger positive connections among frontal regions and from frontal regions to medial temporal lobe structures than did young adults, especially when items and backgrounds were subsequently remembered. These results suggest there are differences between young and older adults’ connectivity accompanying the encoding of emotional scenes. Older adults may require more frontal connectivity to encode all elements of a scene rather than just encoding the emotional item. PMID:22542836

Effects of aging on neural connectivity underlying selective memory for emotional scenes.

PubMed

Waring, Jill D; Addis, Donna Rose; Kensinger, Elizabeth A

2013-02-01

Older adults show age-related reductions in memory for neutral items within complex visual scenes, but just like young adults, older adults exhibit a memory advantage for emotional items within scenes compared with the background scene information. The present study examined young and older adults' encoding-stage effective connectivity for selective memory of emotional items versus memory for both the emotional item and its background. In a functional magnetic resonance imaging (fMRI) study, participants viewed scenes containing either positive or negative items within neutral backgrounds. Outside the scanner, participants completed a memory test for items and backgrounds. Irrespective of scene content being emotionally positive or negative, older adults had stronger positive connections among frontal regions and from frontal regions to medial temporal lobe structures than did young adults, especially when items and backgrounds were subsequently remembered. These results suggest there are differences between young and older adults' connectivity accompanying the encoding of emotional scenes. Older adults may require more frontal connectivity to encode all elements of a scene rather than just encoding the emotional item. Published by Elsevier Inc.
Development and validation of a socioculturally competent trust in physician scale for a developing country setting.

PubMed

Gopichandran, Vijayaprasad; Wouters, Edwin; Chetlapalli, Satish Kumar

2015-05-03

Trust in physicians is the unwritten covenant between the patient and the physician that the physician will do what is in the best interest of the patient. This forms the undercurrent of all healthcare relationships. Several scales exist for assessment of trust in physicians in developed healthcare settings, but to our knowledge none of these have been developed in a developing country context. The objective was to develop and validate a new trust in physician scale for a developing country setting. Dimensions of trust in physicians, which were identified in a previous qualitative study in the same setting, were used to develop a scale. This scale was administered among 616 adults selected from urban and rural areas of Tamil Nadu, south India, using a multistage sampling cross sectional survey method. The individual items were analysed using a classical test approach as well as item response theory. Cronbach's α was calculated and the item to total correlation of each item was assessed. After testing for unidimensionality and absence of local dependence, a 2 parameter logistic Samejima's graded response model was fit and item characteristics assessed. Competence, assurance of treatment, respect for the physician and loyalty to the physician were important dimensions of trust. A total of 31 items were developed using these dimensions. Of these, 22 were selected for final analysis. The Cronbach's α was 0.928. The item to total correlations were acceptable for all the 22 items. The item response analysis revealed good item characteristic curves and item information for all the items. Based on the item parameters and item information, a final 12 item scale was developed. The scale performs optimally in the low to moderate trust range. The final 12 item trust in physician scale has a good construct validity and internal consistency. Published by the BMJ Publishing Group Limited.
Development and validation of a socioculturally competent trust in physician scale for a developing country setting

PubMed Central

Gopichandran, Vijayaprasad; Wouters, Edwin; Chetlapalli, Satish Kumar

2015-01-01

Trust in physicians is the unwritten covenant between the patient and the physician that the physician will do what is in the best interest of the patient. This forms the undercurrent of all healthcare relationships. Several scales exist for assessment of trust in physicians in developed healthcare settings, but to our knowledge none of these have been developed in a developing country context. Objectives To develop and validate a new trust in physician scale for a developing country setting. Methods Dimensions of trust in physicians, which were identified in a previous qualitative study in the same setting, were used to develop a scale. This scale was administered among 616 adults selected from urban and rural areas of Tamil Nadu, south India, using a multistage sampling cross sectional survey method. The individual items were analysed using a classical test approach as well as item response theory. Cronbach's α was calculated and the item to total correlation of each item was assessed. After testing for unidimensionality and absence of local dependence, a 2 parameter logistic Samejima's graded response model was fit and item characteristics assessed. Results Competence, assurance of treatment, respect for the physician and loyalty to the physician were important dimensions of trust. A total of 31 items were developed using these dimensions. Of these, 22 were selected for final analysis. The Cronbach's α was 0.928. The item to total correlations were acceptable for all the 22 items. The item response analysis revealed good item characteristic curves and item information for all the items. Based on the item parameters and item information, a final 12 item scale was developed. The scale performs optimally in the low to moderate trust range. Conclusions The final 12 item trust in physician scale has a good construct validity and internal consistency. PMID:25941182
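The classical test statistics named in this abstract, Cronbach's α and the item to total correlation, can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis: the Likert-type score matrix is hypothetical, the item to total correlation is computed in its "corrected" form (each item against the sum of the remaining items), and population variances are used throughout.

```python
from statistics import mean, pvariance

# Hypothetical ratings: one row per respondent, one column per scale item.
scores = [
    [1, 2, 1],
    [2, 2, 3],
    [3, 4, 3],
    [4, 4, 5],
    [5, 5, 5],
]


def cronbach_alpha(rows):
    """Cronbach's alpha from item variances and the variance of total scores."""
    k = len(rows[0])
    item_vars = [pvariance([r[j] for r in rows]) for j in range(k)]
    total_var = pvariance([sum(r) for r in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def corrected_item_total(rows):
    """Correlation of each item with the sum of all other items."""
    k = len(rows[0])
    result = []
    for j in range(k):
        item = [r[j] for r in rows]
        rest = [sum(r) - r[j] for r in rows]
        result.append(pearson(item, rest))
    return result


alpha = cronbach_alpha(scores)
item_total = corrected_item_total(scores)

print("Cronbach's alpha:", round(alpha, 3))
print("corrected item-total correlations:", [round(r, 3) for r in item_total])
```

Excluding the item from its own total avoids inflating the correlation, which matters most for short scales such as the final 12-item version.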
Free recall test experience potentiates strategy-driven effects of value on memory.

PubMed

Cohen, Michael S; Rissman, Jesse; Hovhannisyan, Mariam; Castel, Alan D; Knowlton, Barbara J

2017-10-01

People tend to show better memory for information that is deemed valuable or important. By one mechanism, individuals selectively engage deeper, semantic encoding strategies for high value items (Cohen, Rissman, Suthana, Castel, & Knowlton, 2014). By another mechanism, information paired with value or reward is automatically strengthened in memory via dopaminergic projections from midbrain to hippocampus (Shohamy & Adcock, 2010). We hypothesized that the latter mechanism would primarily enhance recollection-based memory, while the former mechanism would strengthen both recollection and familiarity. We also hypothesized that providing interspersed tests during study is a key to encouraging selective engagement of strategies. To test these hypotheses, we presented participants with sets of words, and each word was associated with a high or low point value. In some experiments, free recall tests were given after each list. In all experiments, a recognition test was administered 5 minutes after the final word list. Process dissociation was accomplished via remember/know judgments at recognition, a recall test probing both item memory and memory for a contextual detail (word plurality), and a task dissociation combining a recognition test for plurality (intended to probe recollection) with a speeded item recognition test (to probe familiarity). When recall tests were administered after study lists, high value strengthened both recollection and familiarity. When memory was not tested after each study list, but rather only at the end, value increased recollection but not familiarity. These dual process dissociations suggest that interspersed recall tests guide learners' use of metacognitive control to selectively apply effective encoding strategies. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Evaluation of Item Candidates: The PROMIS Qualitative Item Review

PubMed Central

DeWalt, Darren A.; Rothrock, Nan; Yount, Susan; Stone, Arthur A.

2009-01-01

One of the PROMIS (Patient-Reported Outcome Measurement Information System) network's primary goals is the development of a comprehensive item bank for patient-reported outcomes of chronic diseases. For its first set of item banks, PROMIS chose to focus on pain, fatigue, emotional distress, physical function, and social function. An essential step for the development of an item pool is the identification, evaluation, and revision of extant questionnaire items for the core item pool. In this work, we also describe the systematic process wherein items are classified for subsequent statistical processing by the PROMIS investigators. Six phases of item development are documented: identification of extant items, item classification and selection, item review and revision, focus group input on domain coverage, cognitive interviews with individual items, and final revision before field testing. Identification of items refers to the systematic search for existing items in currently available scales. Expert item review and revision was conducted by trained professionals who reviewed the wording of each item and revised as appropriate for conventions adopted by the PROMIS network. Focus groups were used to confirm domain definitions and to identify new areas of item development for future PROMIS item banks. Cognitive interviews were used to examine individual items. Items successfully screened through this process were sent to field testing and will be subjected to innovative scale construction procedures. PMID:17443114
Can Item Keyword Feedback Help Remediate Knowledge Gaps?

PubMed Central

Feinberg, Richard A.; Clauser, Amanda L.

2016-01-01

ABSTRACT Background In graduate medical education, assessment results can effectively guide professional development when both assessment and feedback support a formative model. When individuals cannot directly access the test questions and responses, a way of using assessment results formatively is to provide item keyword feedback. Objective The purpose of the following study was to investigate whether exposure to item keyword feedback aids in learner remediation. Methods Participants included 319 trainees who completed a medical subspecialty in-training examination (ITE) in 2012 as first-year fellows, and then 1 year later in 2013 as second-year fellows. Performance on 2013 ITE items in which keywords were, or were not, exposed as part of the 2012 ITE score feedback was compared across groups based on the amount of time studying (preparation). For the same items common to both 2012 and 2013 ITEs, response patterns were analyzed to investigate changes in answer selection. Results Test takers who indicated greater amounts of preparation on the 2013 ITE did not perform better on the items in which keywords were exposed compared to those who were not exposed. The response pattern analysis substantiated overall growth in performance from the 2012 ITE. For items with incorrect responses on both attempts, examinees selected the same option 58% of the time. Conclusions Results from the current study were unsuccessful in supporting the use of item keywords in aiding remediation. Unfortunately, the results did provide evidence of examinees retaining misinformation. PMID:27777664
Predicting fatty acid profiles in blood based on food intake and the FADS1 rs174546 SNP.

PubMed

Hallmann, Jacqueline; Kolossa, Silvia; Gedrich, Kurt; Celis-Morales, Carlos; Forster, Hannah; O'Donovan, Clare B; Woolhead, Clara; Macready, Anna L; Fallaize, Rosalind; Marsaux, Cyril F M; Lambrinou, Christina-Paulina; Mavrogianni, Christina; Moschonis, George; Navas-Carretero, Santiago; San-Cristobal, Rodrigo; Godlewska, Magdalena; Surwiłło, Agnieszka; Mathers, John C; Gibney, Eileen R; Brennan, Lorraine; Walsh, Marianne C; Lovegrove, Julie A; Saris, Wim H M; Manios, Yannis; Martinez, Jose Alfredo; Traczyk, Iwona; Gibney, Michael J; Daniel, Hannelore

2015-12-01

A high intake of n-3 PUFA provides health benefits via changes in the n-6/n-3 ratio in blood. In addition to such dietary PUFAs, variants in the fatty acid desaturase 1 (FADS1) gene are also associated with altered PUFA profiles. We used mathematical modeling to predict levels of PUFA in whole blood, based on multiple hypothesis testing and bootstrapped LASSO selected food items, anthropometric and lifestyle factors, and the rs174546 genotypes in FADS1 from 1607 participants (Food4Me Study). The models were developed using data from the first reported time point (training set) and their predictive power was evaluated using data from the last reported time point (test set). Among other food items, fish, pizza, chicken, and cereals were identified as being associated with the PUFA profiles. Using these food items and the rs174546 genotypes as predictors, models explained 26-43% of the variability in PUFA concentrations in the training set and 22-33% in the test set. Selecting food items using multiple hypothesis testing is a valuable approach to determining predictors, as our models' predictive power is higher than that of analogous studies. As a unique feature, we additionally confirmed our models' power based on a test set. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Selective Maintenance in Visual Working Memory Does Not Require Sustained Visual Attention

PubMed Central

Hollingworth, Andrew; Maxcey-Richard, Ashleigh M.

2012-01-01

In four experiments, we tested whether sustained visual attention is required for the selective maintenance of objects in VWM. Participants performed a color change-detection task. During the retention interval, a valid cue indicated the item that would be tested. Change detection performance was higher in the valid-cue condition than in a neutral-cue control condition. To probe the role of visual attention in the cuing effect, on half of the trials, a difficult search task was inserted after the cue, precluding sustained attention on the cued item. The addition of the search task produced no observable decrement in the magnitude of the cuing effect. In a complementary test, search efficiency was not impaired by simultaneously prioritizing an object for retention in VWM. The results demonstrate that selective maintenance in VWM can be dissociated from the locus of visual attention. PMID:23067118
Selective maintenance in visual working memory does not require sustained visual attention.

PubMed

Hollingworth, Andrew; Maxcey-Richard, Ashleigh M

2013-08-01

In four experiments, we tested whether sustained visual attention is required for the selective maintenance of objects in visual working memory (VWM). Participants performed a color change-detection task. During the retention interval, a valid cue indicated the item that would be tested. Change-detection performance was higher in the valid-cue condition than in a neutral-cue control condition. To probe the role of visual attention in the cuing effect, on half of the trials, a difficult search task was inserted after the cue, precluding sustained attention on the cued item. The addition of the search task produced no observable decrement in the magnitude of the cuing effect. In a complementary test, search efficiency was not impaired by simultaneously prioritizing an object for retention in VWM. The results demonstrate that selective maintenance in VWM can be dissociated from the locus of visual attention. 2013 APA, all rights reserved
Reliability of a store observation tool in measuring availability of alcohol and selected foods.

PubMed

Cohen, Deborah A; Schoeff, Diane; Farley, Thomas A; Bluthenthal, Ricky; Scribner, Richard; Overton, Adrian

2007-11-01

Alcohol and food items can compromise or contribute to health, depending on the quantity and frequency with which they are consumed. How much people consume may be influenced by product availability and promotion in local retail stores. We developed and tested an observational tool to objectively measure in-store availability and promotion of alcoholic beverages and selected food items that have an impact on health. Trained observers visited 51 alcohol outlets in Los Angeles and southeastern Louisiana. Using a standardized instrument, two independent observations were conducted documenting the type of outlet, the availability and shelf space for alcoholic beverages and selected food items, the purchase price of standard brands, the placement of beer and malt liquor, and the amount of in-store alcohol advertising. Reliability of the instrument was excellent for measures of item availability, shelf space, and placement of malt liquor. Reliability was lower for alcohol advertising, beer placement, and items that measured the "least price" of apples and oranges. The average kappa was 0.87 for categorical items and the average intraclass correlation coefficient was 0.83 for continuous items. Overall, systematic observation of the availability and promotion of alcoholic beverages and food items was feasible, acceptable, and reliable. Measurement tools such as the one we evaluated should be useful in studies of the impact of availability of food and beverages on consumption and on health outcomes.
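The inter-observer agreement statistic reported for categorical items (kappa) can be illustrated with a short sketch of Cohen's kappa for two raters. The abstract does not specify which kappa variant was used, so the unweighted two-rater form is an assumption, and the observations below are hypothetical yes/no availability codes for ten outlets, not study data.

```python
from collections import Counter

# Hypothetical codes from two independent observers for the same ten outlets.
observer_a = ["yes", "yes", "yes", "yes", "yes", "yes", "no", "no", "no", "no"]
observer_b = ["yes", "yes", "yes", "yes", "yes", "no", "no", "no", "no", "yes"]


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa: agreement corrected for chance agreement."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    categories = set(count_a) | set(count_b)
    # Chance agreement from the product of each rater's marginal proportions.
    expected = sum(count_a[c] * count_b[c] for c in categories) / (n * n)
    return (observed - expected) / (1 - expected)


kappa = cohens_kappa(observer_a, observer_b)
print("kappa:", round(kappa, 3))
```

Raw percent agreement here is 80%, but kappa is lower because two observers who each code "yes" 60% of the time would already agree on about half the outlets by chance.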
Conditional Covariance-Based Subtest Selection for DIMTEST

ERIC Educational Resources Information Center

Froelich, Amy G.; Habing, Brian

2008-01-01

DIMTEST is a nonparametric hypothesis-testing procedure designed to test the assumptions of a unidimensional and locally independent item response theory model. Several previous Monte Carlo studies have found that using linear factor analysis to select the assessment subtest for DIMTEST results in a moderate to severe loss of power when the exam…
Measuring sexual orientation in adolescent health surveys: evaluation of eight school-based surveys.

PubMed

Saewyc, Elizabeth M; Bauer, Greta R; Skay, Carol L; Bearinger, Linda H; Resnick, Michael D; Reis, Elizabeth; Murphy, Aileen

2004-10-01

To examine the performance of various items measuring sexual orientation within 8 school-based adolescent health surveys in the United States and Canada from 1986 through 1999. Analyses examined nonresponse and unsure responses to sexual orientation items compared with other survey items, demographic differences in responses, tests for response set bias, and congruence of responses to multiple orientation items; analytical methods included frequencies, contingency tables with Chi-square, and ANOVA with least significant differences (LSD) post hoc tests; all analyses were conducted separately by gender. In all surveys, nonresponse rates for orientation questions were similar to other sexual questions, but not higher; younger students, immigrants, and students with learning disabilities were more likely to skip items or select "unsure." Sexual behavior items had the lowest nonresponse, but fewer than half of all students reported sexual behavior, limiting its usefulness for indicating orientation. Item placement in the survey, wording, and response set bias all appeared to influence nonresponse and unsure rates. Specific recommendations include standardizing wording across future surveys, and pilot testing items with diverse ages and ethnic groups of teens before use. All three dimensions of orientation should be assessed where possible; when limited to single items, sexual attraction may be the best choice. Specific wording suggestions are offered for future surveys.
Development of a health literacy assessment for young adult college students: a pilot study.

PubMed

Harper, Raquel

2014-01-01

The purpose of this study was to develop a comprehensive health literacy assessment tool for young adult college students. Participants were 144 undergraduate students. Two hundred and twenty-nine questions were developed, which were based on concepts identified by the US Department of Health and Human Services, the World Health Organization, and health communication scholars. Four health education experts reviewed this pool of items and helped select 87 questions for testing. Students completed an online assessment consisting of these 87 questions in June and October of 2012. Item response theory and goodness-of-fit values were used to help eliminate nonperforming questions. Fifty-one questions were selected based on good item response theory discrimination parameter values. The instrument has 51 questions that look promising for measuring health literacy in college students, but needs additional testing with a larger student population to see how these questions continue to perform.
Development of new selection tests for air traffic controllers.

DOT National Transportation Integrated Search

1977-12-01

This report describes the development of a new Multiplex Controller Aptitude Test for initial screening of FAA Air Traffic Controller applicants. Its content includes the traditional types of aptitude test items used for today's screening. In additio...
Development of an item bank for computerized adaptive test (CAT) measurement of pain.

PubMed

Petersen, Morten Aa; Aaronson, Neil K; Chie, Wei-Chu; Conroy, Thierry; Costantini, Anna; Hammerlid, Eva; Hjermstad, Marianne J; Kaasa, Stein; Loge, Jon H; Velikova, Galina; Young, Teresa; Groenvold, Mogens

2016-01-01

Patient-reported outcomes should ideally be adapted to the individual patient while maintaining comparability of scores across patients. This is achievable using computerized adaptive testing (CAT). The aim here was to develop an item bank for CAT measurement of the pain domain as measured by the EORTC QLQ-C30 questionnaire. The development process consisted of four steps: (1) literature search, (2) formulation of new items and expert evaluations, (3) pretesting and (4) field-testing and psychometric analyses for the final selection of items. In step 1, we identified 337 pain items from the literature. Twenty-nine new items fitting the QLQ-C30 item style were formulated in step 2 that were reduced to 26 items by expert evaluations. Based on interviews with 31 patients from Denmark, France and the UK, the list was further reduced to 21 items in step 3. In step 4, responses were obtained from 1103 cancer patients from five countries. Psychometric evaluations showed that 16 items could be retained in a unidimensional item bank. Evaluations indicated that use of the CAT measure may reduce sample size requirements by 15-25% compared to using the QLQ-C30 pain scale. We have established an item bank of 16 items suitable for CAT measurement of pain. While being backward compatible with the QLQ-C30, the new item bank will significantly improve measurement precision of pain. We recommend initiating CAT measurement by screening for pain using the two original QLQ-C30 pain items. The EORTC pain CAT is currently available for "experimental" purposes.
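How a CAT adapts to the individual can be sketched with the widely used maximum-information rule: at each step, administer the unused item that is most informative at the current score estimate. The abstract does not state the selection rule or the IRT model of the EORTC pain CAT, so everything below is an assumption for illustration. In particular, a dichotomous two-parameter logistic (2PL) model is used for brevity, whereas QLQ-C30-style items have several response categories, and the item names and parameters are hypothetical.

```python
import math

# Hypothetical item bank: item id -> (discrimination a, location b).
bank = {
    "pain01": (1.2, -1.0),
    "pain02": (1.5, 0.0),
    "pain03": (0.8, 0.5),
    "pain04": (1.4, 1.5),
}


def prob_2pl(theta, a, b):
    """Probability of endorsing an item under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information of a 2PL item at trait level theta."""
    p = prob_2pl(theta, a, b)
    return a * a * p * (1.0 - p)


def select_next_item(theta, items, administered):
    """Pick the not-yet-administered item with maximum information at theta."""
    candidates = [name for name in items if name not in administered]
    return max(candidates, key=lambda name: item_information(theta, *items[name]))


# First item for a respondent whose provisional estimate is theta = 0.
first = select_next_item(0.0, bank, administered=set())

# After the estimate has moved up to theta = 1.2, choose among the rest.
second = select_next_item(1.2, bank, administered={first})

print("first item:", first)
print("second item:", second)
```

A full CAT would re-estimate theta after every response and stop at a target precision or item count; those steps are omitted here.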
Initial retrieval shields against retrieval-induced forgetting

PubMed Central

Racsmány, Mihály; Keresztes, Attila

2015-01-01

Testing, as a form of retrieval, can enhance learning but it can also induce forgetting of related memories, a phenomenon known as retrieval-induced forgetting (RIF). In four experiments we explored whether selective retrieval and selective restudy of target memories induce forgetting of related memories with or without initial retrieval of the entire learning set. In Experiment 1, subjects studied category-exemplar associations, some of which were then either restudied or retrieved. RIF occurred on a delayed final test only when memories were retrieved and not when they were restudied. In Experiment 2, following the study phase of category-exemplar associations, subjects attempted to recall all category-exemplar associations, then they selectively retrieved or restudied some of the exemplars. We found that, despite the huge impact on practiced items, selective retrieval/restudy caused no decrease in final recall of related items. In Experiment 3, we replicated the main result of Experiment 2 by manipulating initial retrieval as a within-subject variable. In Experiment 4 we replicated the main results of the previous experiments with non-practiced (Nrp) baseline items. These findings suggest that initial retrieval of the learning set shields against the forgetting effect of later selective retrieval. Together, our results support the context shift theory of RIF. PMID:26052293
Anatomy of a physics test: Validation of the physics items on the Texas Assessment of Knowledge and Skills

NASA Astrophysics Data System (ADS)

Marshall, Jill A.; Hagedorn, Eric A.; O'Connor, Jerry

2009-06-01

We report the results of an analysis of the Texas Assessment of Knowledge and Skills (TAKS) designed to determine whether the TAKS is a valid indicator of whether students know and can do physics at the level necessary for success in future coursework, STEM careers, and life in a technological society. We categorized science items from the 2003 and 2004 10th and 11th grade TAKS by content area(s) covered, knowledge and skills required to select the correct answer, and overall quality. We also analyzed a 5000 student sample of item-level results from the 2004 11th grade exam, performing full-information factor analysis, calculating classical test indices, and determining each item's response curve using item response theory. Triangulation of our results revealed strengths and weaknesses of the different methods of analysis. The TAKS was found to be only weakly indicative of physics preparation and we make recommendations for increasing the validity of standardized physics testing.
[The effects of cue upon selective memorization].

PubMed

Watanabe, I

1982-10-01

The subjects listened to a list of 21 words read aloud and memorized the eight words among them indicated by the cue. In the VC (voice-cue) condition, where a change of voice between male and female cued the difference between the to-be-memorized and not-to-be-memorized items, the percentage of correct recall was higher and the number of intrusion errors was lower than in the NVC (non-voice-cue) condition, where the cue was the sound of a chime given immediately before the to-be-memorized item. The results suggest that the physical characteristics of the cue facilitate selective memorization, but do not necessarily support the early-selection theory of attention. Next, in order to confirm Watanabe's (1976) assertion that the transformation of to-be-memorized items into long-term memory and the exclusion of not-to-be-memorized items take place in parallel, the subjects were required to rehearse aloud every word as it was presented. However, it was found that the method of voiced rehearsal was inadequate to test the assertion.
PSSA Released Reading Items, 2000-2001. The Pennsylvania System of School Assessment.

ERIC Educational Resources Information Center

Pennsylvania State Dept. of Education, Harrisburg. Bureau of Curriculum and Academic Services.

This document contains materials directly related to the actual reading test of the Pennsylvania System of School Assessment (PSSA), including the reading rubric, released passages, selected-response questions with answer keys, performance tasks, and scored samples of students' responses to the tasks. All of these items may be duplicated to…
Two-Phase Item Selection Procedure for Flexible Content Balancing in CAT

ERIC Educational Resources Information Center

Cheng, Ying; Chang, Hua-Hua; Yi, Qing

2007-01-01

Content balancing is an important issue in the design and implementation of computerized adaptive testing (CAT). Content-balancing techniques that have been applied in fixed content balancing, where the number of items from each content area is fixed, include constrained CAT (CCAT), the modified multinomial model (MMM), modified constrained CAT…

An Empirically Keyed Scale for Measuring Managerial Attitudes toward Women Executives.

ERIC Educational Resources Information Center

Dubno, Peter; And Others

1979-01-01

A scale (Managerial Attitudes toward Women Executives Scale -- MATWES) provides reliability and validity measures regarding managerial attitudes toward women executives. It employs a projective test for item generation and uses a panel of women executives as Q-sorters to select items. The Scale and its value in minimizing researcher bias in its…
A sampling and classification item selection approach with content balancing.

PubMed

Chen, Pei-Hua

2015-03-01

Existing automated test assembly methods typically employ constrained combinatorial optimization. Constructing forms sequentially based on an optimization approach usually results in unparallel forms and requires heuristic modifications. Methods based on a random search approach have the major advantage of producing parallel forms sequentially without further adjustment. This study incorporated a flexible content-balancing element into the statistical perspective item selection method of the cell-only method (Chen et al. in Educational and Psychological Measurement, 72(6), 933-953, 2012). The new method was compared with a sequential interitem distance weighted deviation model (IID WDM) (Swanson & Stocking in Applied Psychological Measurement, 17(2), 151-166, 1993), a simultaneous IID WDM, and a big-shadow-test mixed integer programming (BST MIP) method to construct multiple parallel forms based on matching a reference form item-by-item. The results showed that the cell-only method with content balancing and the sequential and simultaneous versions of IID WDM yielded results comparable to those obtained using the BST MIP method. The cell-only method with content balancing is computationally less intensive than the sequential and simultaneous versions of IID WDM.
Assessing patients' experiences with communication across the cancer care continuum.

PubMed

Mazor, Kathleen M; Street, Richard L; Sue, Valerie M; Williams, Andrew E; Rabin, Borsika A; Arora, Neeraj K

2016-08-01

To evaluate the relevance, performance and potential usefulness of the Patient Assessment of cancer Communication Experiences (PACE) items. Items focusing on specific communication goals related to exchanging information, fostering healing relationships, responding to emotions, making decisions, enabling self-management, and managing uncertainty were tested via a retrospective, cross-sectional survey of adults who had been diagnosed with cancer. Analyses examined response frequencies, inter-item correlations, and coefficient alpha. A total of 366 adults were included in the analyses. Relatively few selected Does Not Apply, suggesting that items tap relevant communication experiences. Ratings of whether specific communication goals were achieved were strongly correlated with overall ratings of communication, suggesting item content reflects important aspects of communication. Coefficient alpha was ≥.90 for each item set, indicating excellent reliability. Variations in the percentage of respondents selecting the most positive response across items suggest results can identify strengths and weaknesses. The PACE items tap relevant, important aspects of communication during cancer care, and may be useful to cancer care teams desiring detailed feedback. The PACE is a new tool for eliciting patients' perspectives on communication during cancer care. It is freely available online for practitioners, researchers and others. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
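The coefficient alpha reported above can be illustrated with a minimal sketch. The function name and the toy ratings are hypothetical and are not taken from the PACE study; the code only shows the standard formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores).

```python
from statistics import variance

def cronbach_alpha(scores):
    """Coefficient alpha for a respondents-by-items table.

    scores: list of rows, one row per respondent, each row a list of
    k item ratings (hypothetical data layout, assumed for this sketch).
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item across respondents (columns of the table).
    item_var_sum = sum(variance(col) for col in zip(*scores))
    # Variance of each respondent's total score.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1.0 - item_var_sum / total_var)

# Toy example: three items that move together perfectly.
ratings = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
alpha = cronbach_alpha(ratings)
```

With perfectly consistent items, as in the toy table, alpha reaches its upper bound of 1; real item sets fall below that.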
Developing and investigating the use of single-item measures in organizational research.

PubMed

Fisher, Gwenith G; Matthews, Russell A; Gibbons, Alyssa Mitchell

2016-01-01

The validity of organizational research relies on strong research methods, which include effective measurement of psychological constructs. The general consensus is that multiple-item measures have better psychometric properties than single-item measures. However, due to practical constraints (e.g., survey length, respondent burden) there are situations in which certain single items may be useful for capturing information about constructs that might otherwise go unmeasured. We evaluated 37 items, including 18 newly developed items as well as 19 single items selected from existing multiple-item scales based on psychometric characteristics, to assess 18 constructs frequently measured in organizational and occupational health psychology research. We examined evidence of reliability; convergent, discriminant, and content validity assessments; and test-retest reliabilities at 1- and 3-month time lags for single-item measures using a multistage and multisource validation strategy across 3 studies, including data from N = 17 occupational health subject matter experts and N = 1,634 survey respondents across 2 samples. Items selected from existing scales generally demonstrated better internal consistency reliability and convergent validity, whereas these particular new items generally had higher levels of content validity. We offer recommendations regarding when use of single items may be more or less appropriate, as well as 11 items that seem acceptable, 14 items with mixed results that might be used with caution, and 12 items we do not recommend using as single-item measures. Although multiple-item measures are preferable from a psychometric standpoint, in some circumstances single-item measures can provide useful information. (c) 2016 APA, all rights reserved.
Validation of Physics Standardized Test Items

NASA Astrophysics Data System (ADS)

Marshall, Jill

2008-10-01

The Texas Physics Assessment Team (TPAT) examined the Texas Assessment of Knowledge and Skills (TAKS) to determine whether it is a valid indicator of physics preparation for future course work and employment, and of the knowledge and skills needed to act as an informed citizen in a technological society. We categorized science items from the 2003 and 2004 10th and 11th grade TAKS by content area(s) covered, knowledge and skills required to select the correct answer, and overall quality. We also analyzed a 5000-student sample of item-level results from the 2004 11th grade exam using standard statistical methods employed by test developers (factor analysis and Item Response Theory). Triangulation of our results revealed strengths and weaknesses of the different methods of analysis. The TAKS was found to be only weakly indicative of physics preparation, and we make recommendations for increasing the validity of standardized physics testing.
Criterion-Referenced Testing: A Critical Analysis of Selected Models

DTIC Science & Technology

1978-08-01

...the probability that a master will be misclassified when the cutoff score is set at 2 correct equals...used the 45-item spiral-omnibus intelligence test for screening applicants to the Australian Army or Royal Australian Navy. Samples of 608 recruit...applicants to the Citizen Military Force (CMF) and 874 recruit applicants to the Royal Australian Navy were studied. Twelve items were deleted for zero
Criterion-Referenced Testing in Foreign Language Teaching.

ERIC Educational Resources Information Center

Takala, Sauli

A review of literature serves as the basis for a discussion of various aspects of criterion-referenced tests. The aspects discussed are: teaching and evaluation objectives, criterion- and norm-referenced measurement, stages in construction of criterion-referenced tests, construction and selection of items, test validity, and test reliability.…
Cognitive control over working memory biases of selection.

PubMed

Kiyonaga, Anastasia; Egner, Tobias; Soto, David

2012-08-01

Across many studies, researchers have found that representations in working memory (WM) can guide visual attention toward items that match the features of the WM contents. While some researchers have contended that this occurs involuntarily, others have suggested that the impact of WM contents on attention can be strategically controlled. Here, we varied the probability that WM items would coincide with either targets or distractors in a visual search task to examine (1) whether participants could intentionally enhance or inhibit the influence of WM items on attention and (2) whether cognitive control over WM biases would also affect access to the memory contents in a surprise recognition test. We found visual search to be faster when the WM item coincided with the search target, and this effect was enhanced when the memory item reliably predicted the location of the target. Conversely, visual search was slowed when the memory item coincided with a search distractor, and this effect was diminished, but not abolished, when the memory item was reliably associated with distractors. This strategic dampening of the influence of WM items on attention came at a price to memory, however, as participants were slowest to perform WM recognition tests on blocks in which the WM contents were consistently invalid. These results document that attentional capture by WM contents is partly, but not fully, malleable by top-down control, which appears to adjust the state of the WM contents to optimize search behavior. These data illustrate the role of cognitive control in modulating the strength of WM biases of selection, and they support a tight coupling between WM and attention.
Development of Listening Comprehension Tests with Narrative and Expository Texts for Portuguese Students.

PubMed

Santos, Sandra; Viana, Fernanda Leopoldina; Ribeiro, Iolanda; Prieto, Gerardo; Brandão, Sara; Cadime, Irene

2015-03-03

This investigation aimed to develop and collect psychometric data for two tests assessing listening comprehension of Portuguese students in primary school: the Test of Listening Comprehension of Narrative Texts (TLC-n) and the Test of Listening Comprehension of Expository Texts (TLC-e). Two studies were conducted. The purpose of study 1 was to construct four test forms for each of the two tests to assess first, second, third and fourth grade students of the primary school. The TLC-n was administered to 1042 students, and the TLC-e was administered to 848 students. The purpose of study 2 was to test the psychometric properties of new items for the TLC-n form for fourth graders, given that the results in study 1 indicated a severe lack of difficult items. The participants were 260 fourth graders. The data were analysed using the Rasch model. Thirty items were selected for each test form. The results provided support for the model assumptions: Unidimensionality and local independence of the items. The reliability coefficients were higher than .70 for all test forms. The TLC-n and the TLC-e present good psychometric properties and represent an important contribution to the learning disabilities assessment field.
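For readers unfamiliar with the model named above, the dichotomous Rasch model and the local independence assumption can be written as follows. The symbols (person ability \theta_p, item difficulty b_i, response x_{pi}) are conventional notation assumed here, not notation taken from the study.

```latex
% Rasch model: probability that person p answers item i correctly
P(X_{pi} = 1 \mid \theta_p, b_i) = \frac{\exp(\theta_p - b_i)}{1 + \exp(\theta_p - b_i)}

% Local independence: given \theta_p, responses to the n items are independent
P(x_{p1}, \dots, x_{pn} \mid \theta_p)
  = \prod_{i=1}^{n} P(X_{pi} = 1 \mid \theta_p, b_i)^{x_{pi}}
    \bigl[1 - P(X_{pi} = 1 \mid \theta_p, b_i)\bigr]^{1 - x_{pi}}
```

Unidimensionality corresponds to the single person parameter \theta_p: one latent trait is assumed to account for all item responses.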
Development and testing of item response theory-based item banks and short forms for eye, skin and lung problems in sarcoidosis.

PubMed

Victorson, David E; Choi, Seung; Judson, Marc A; Cella, David

2014-05-01

Sarcoidosis is a multisystem disease that can negatively impact health-related quality of life (HRQL) across generic (e.g., physical, social and emotional wellbeing) and disease-specific (e.g., pulmonary, ocular, dermatologic) domains. Measurement of HRQL in sarcoidosis has largely relied on generic patient-reported outcome tools, with few disease-specific measures available. The purpose of this paper is to present the development and testing of disease-specific item banks and short forms of lung, skin and eye problems, which are a part of a new patient-reported outcome (PRO) instrument called the sarcoidosis assessment tool. After prioritizing and selecting the most important disease-specific domains, we wrote new items to reflect disease-specific problems by drawing from patient focus group and clinician expert survey data that were used to create our conceptual model of HRQL in sarcoidosis. Item pools underwent cognitive interviews by sarcoidosis patients (n = 13), and minor modifications were made. These items were administered in a multi-site study (n = 300) to obtain item calibrations and create calibrated short forms using item response theory (IRT) approaches. From the available item pools, we created four new item banks and short forms: (1) skin problems, (2) skin stigma, (3) lung problems, and (4) eye problems. We also created and tested supplemental forms of the most common constitutional symptoms and negative effects of corticosteroids. Several new sarcoidosis-specific PROs were developed and tested using IRT approaches. These new measures can advance more precise and targeted HRQL assessment in sarcoidosis clinical trials and clinical practice.
Overview and current management of computerized adaptive testing in licensing/certification examinations.

PubMed

Seo, Dong Gi

2017-01-01

Computerized adaptive testing (CAT) has been implemented in high-stakes examinations such as the National Council Licensure Examination-Registered Nurses in the United States since 1994. Subsequently, the National Registry of Emergency Medical Technicians in the United States adopted CAT for certifying emergency medical technicians in 2007. This was done with the goal of introducing the implementation of CAT for medical health licensing examinations. Most implementations of CAT are based on item response theory, which hypothesizes that both the examinee and items have their own characteristics that do not change. There are 5 steps for implementing CAT: first, determining whether the CAT approach is feasible for a given testing program; second, establishing an item bank; third, pretesting, calibrating, and linking item parameters via statistical analysis; fourth, determining the specification for the final CAT related to the 5 components of the CAT algorithm; and finally, deploying the final CAT after specifying all the necessary components. The 5 components of the CAT algorithm are as follows: item bank, starting item, item selection rule, scoring procedure, and termination criterion. CAT management includes content balancing, item analysis, item scoring, standard setting, practice analysis, and item bank updates. Remaining issues include the cost of constructing CAT platforms and deploying the computer technology required to build an item bank. In conclusion, in order to ensure more accurate estimations of examinees' ability, CAT may be a good option for national licensing examinations. Measurement theory can support its implementation for high-stakes examinations.
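The five components of the CAT algorithm listed above (item bank, starting item, item selection rule, scoring procedure, termination criterion) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the procedure of any licensing examination: it assumes a Rasch item bank described only by difficulties, maximum-information selection, expected a posteriori (EAP) scoring with a standard normal prior, and a standard-error or maximum-length stopping rule. All names and default values are hypothetical.

```python
import math

def p_correct(theta, b):
    """Rasch probability of a correct response to an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def item_information(theta, b):
    """Fisher information of a Rasch item at ability theta."""
    p = p_correct(theta, b)
    return p * (1.0 - p)

# Ability grid used for EAP scoring (assumed range -4 to 4).
GRID = [g / 10.0 for g in range(-40, 41)]

def eap_estimate(responses):
    """Scoring procedure: posterior mean and SD under a N(0, 1) prior.

    responses: list of (difficulty, score) pairs, score True or False.
    """
    weights = []
    for t in GRID:
        w = math.exp(-0.5 * t * t)  # unnormalized standard normal prior
        for b, x in responses:
            p = p_correct(t, b)
            w *= p if x else (1.0 - p)
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, weights)) / total
    return mean, math.sqrt(var)

def run_cat(bank, answer, start_theta=0.0, se_target=0.4, max_items=20):
    """Administer items adaptively until a termination criterion is met.

    bank: list of item difficulties (the item bank).
    answer: callable taking a difficulty and returning True or False.
    """
    remaining = list(bank)
    responses = []
    theta, se = start_theta, float("inf")  # starting point for the first item
    while remaining and len(responses) < max_items and se > se_target:
        # Item selection rule: most informative item at the current estimate.
        b = max(remaining, key=lambda d: item_information(theta, d))
        remaining.remove(b)
        responses.append((b, answer(b)))
        theta, se = eap_estimate(responses)
    return theta, se, len(responses)

# Toy run: a deterministic examinee who answers items easier than 1.0 correctly.
bank = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
theta_hat, se_hat, n_used = run_cat(bank, lambda b: b < 1.0)
```

Operational systems add content balancing and exposure control to the selection step, which this sketch leaves out.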
Overview and current management of computerized adaptive testing in licensing/certification examinations

PubMed Central

2017-01-01

Computerized adaptive testing (CAT) has been implemented in high-stakes examinations such as the National Council Licensure Examination-Registered Nurses in the United States since 1994. Subsequently, the National Registry of Emergency Medical Technicians in the United States adopted CAT for certifying emergency medical technicians in 2007. This was done with the goal of introducing the implementation of CAT for medical health licensing examinations. Most implementations of CAT are based on item response theory, which hypothesizes that both the examinee and items have their own characteristics that do not change. There are 5 steps for implementing CAT: first, determining whether the CAT approach is feasible for a given testing program; second, establishing an item bank; third, pretesting, calibrating, and linking item parameters via statistical analysis; fourth, determining the specification for the final CAT related to the 5 components of the CAT algorithm; and finally, deploying the final CAT after specifying all the necessary components. The 5 components of the CAT algorithm are as follows: item bank, starting item, item selection rule, scoring procedure, and termination criterion. CAT management includes content balancing, item analysis, item scoring, standard setting, practice analysis, and item bank updates. Remaining issues include the cost of constructing CAT platforms and deploying the computer technology required to build an item bank. In conclusion, in order to ensure more accurate estimations of examinees’ ability, CAT may be a good option for national licensing examinations. Measurement theory can support its implementation for high-stakes examinations. PMID:28811394
Item response theory, computerized adaptive testing, and PROMIS: assessment of physical function.

PubMed

Fries, James F; Witter, James; Rose, Matthias; Cella, David; Khanna, Dinesh; Morgan-DeWitt, Esi

2014-01-01

Patient-reported outcome (PRO) questionnaires record health information directly from research participants because observers may not accurately represent the patient perspective. Patient-reported Outcomes Measurement Information System (PROMIS) is a US National Institutes of Health cooperative group charged with bringing PRO to a new level of precision and standardization across diseases by item development and use of item response theory (IRT). With IRT methods, improved items are calibrated on an underlying concept to form an item bank for a "domain" such as physical function (PF). The most informative items can be combined to construct efficient "instruments" such as 10-item or 20-item PF static forms. Each item is calibrated on the basis of the probability that a given person will respond at a given level, and the ability of the item to discriminate people from one another. Tailored forms may cover any desired level of the domain being measured. Computerized adaptive testing (CAT) selects the best items to sharpen the estimate of a person's functional ability, based on prior responses to earlier questions. PROMIS item banks have been improved with experience from several thousand items, and are calibrated on over 21,000 respondents. In areas tested to date, PROMIS PF instruments are superior or equal to Health Assessment Questionnaire and Medical Outcome Study Short Form-36 Survey legacy instruments in clarity, translatability, patient importance, reliability, and sensitivity to change. Precise measures, such as PROMIS, efficiently incorporate patient self-report of health into research, potentially reducing research cost by lowering sample size requirements. The advent of routine IRT applications has the potential to transform PRO measurement.
Feature-based and spatial attentional selection in visual working memory.

PubMed

Heuer, Anna; Schubö, Anna

2016-05-01

The contents of visual working memory (VWM) can be modulated by spatial cues presented during the maintenance interval ("retrocues"). Here, we examined whether attentional selection of representations in VWM can also be based on features. In addition, we investigated whether the mechanisms of feature-based and spatial attention in VWM differ with respect to parallel access to noncontiguous locations. In two experiments, we tested the efficacy of valid retrocues relying on different kinds of information. Specifically, participants were presented with a typical spatial retrocue pointing to two locations, a symbolic spatial retrocue (numbers mapping onto two locations), and two feature-based retrocues: a color retrocue (a blob of the same color as two of the items) and a shape retrocue (an outline of the shape of two of the items). The two cued items were presented at either contiguous or noncontiguous locations. Overall retrocueing benefits, as compared to a neutral condition, were observed for all retrocue types. Whereas feature-based retrocues yielded benefits for cued items presented at both contiguous and noncontiguous locations, spatial retrocues were only effective when the cued items had been presented at contiguous locations. These findings demonstrate that attentional selection and updating in VWM can operate on different kinds of information, allowing for a flexible and efficient use of this limited system. The observation that the representations of items presented at noncontiguous locations could only be reliably selected with feature-based retrocues suggests that feature-based and spatial attentional selection in VWM rely on different mechanisms, as has been shown for attentional orienting in the external world.
Item generation and design testing of a questionnaire to assess degenerative joint disease-associated pain in cats.

PubMed

Zamprogno, Helia; Hansen, Bernie D; Bondell, Howard D; Sumrell, Andrea Thomson; Simpson, Wendy; Robertson, Ian D; Brown, James; Pease, Anthony P; Roe, Simon C; Hardie, Elizabeth M; Wheeler, Simon J; Lascelles, B Duncan X

2010-12-01

To determine the items (question topics) for a subjective instrument to assess degenerative joint disease (DJD)-associated chronic pain in cats and determine the instrument design most appropriate for use by cat owners. 100 randomly selected client-owned cats from 6 months to 20 years old. Cats were evaluated to determine degree of radiographic DJD and signs of pain throughout the skeletal system. Two groups were identified: high DJD pain and low DJD pain. Owner-answered questions about activity and signs of pain were compared between the 2 groups to define items relating to chronic DJD pain. Interviews with 45 cat owners were performed to generate items. Fifty-three cat owners who had not been involved in any other part of the study, 19 veterinarians, and 2 statisticians assessed 6 preliminary instrument designs. 22 cats were selected for each group; 19 important items were identified, resulting in 12 potential items for the instrument; and 3 additional items were identified from owner interviews. Owners and veterinarians selected a 5-point descriptive instrument design over 11-point or visual analogue scale formats. Behaviors relating to activity were substantially different between healthy cats and cats with signs of DJD-associated pain. Fifteen items were identified as being potentially useful, and the preferred instrument design was identified. This information could be used to construct an owner-based questionnaire to assess feline DJD-associated pain. Once validated, such a questionnaire would assist in evaluating potential analgesic treatments for these patients.
Balancing Flexible Constraints and Measurement Precision in Computerized Adaptive Testing

ERIC Educational Resources Information Center

Moyer, Eric L.; Galindo, Jennifer L.; Dodd, Barbara G.

2012-01-01

Managing test specifications--both multiple nonstatistical constraints and flexibly defined constraints--has become an important part of designing item selection procedures for computerized adaptive tests (CATs) in achievement testing. This study compared the effectiveness of three procedures: constrained CAT, flexible modified constrained CAT,…
Measuring Alexithymia via Trait Approach-I: A Alexithymia Scale Item Selection and Formation of Factor Structure

PubMed Central

TATAR, Arkun; SALTUKOĞLU, Gaye; ALİOĞLU, Seda; ÇİMEN, Sümeyye; GÜVEN, Hülya; AY, Çağla Ebru

2017-01-01

Introduction: It is not clear in the literature whether available instruments are sufficient to measure alexithymia because of its theoretical structure. Moreover, it has been reported that several measuring instruments are needed to measure this construct, and all the instruments have different error sources. The old and the new forms of Toronto Alexithymia Scale are the only instruments available in Turkish. Thus, the purpose of this study was to develop a new scale to measure alexithymia, selecting items and constructing the factor structure. Methods: A total of 1117 patients aged from 19 to 82 years (mean = 35.05 years) were included. A 100-item pool was prepared and applied to 628 women and 489 men. Data were analyzed using Exploratory Factor Analysis, Confirmatory Factor Analysis, and Item Response Theory, and 28 items were selected. The new form of 28 items was applied to 415 university students, including 271 women and 144 men aged from 18 to 30 (mean = 21.44). Results: The results of Exploratory Factor Analysis revealed a five-factor construct of “Solving and Expressing Affective Experiences,” “External Locused Cognitive Style,” “Tendency to Somatize Affections,” “Imaginary Life and Visualization,” and “Acting Impulsively,” along with a two-factor construct representing the “Affective” and “Cognitive” components. All the components of the construct showed good model fit and high internal consistency. The new form was tested in terms of internal consistency, test-retest reliability, and concurrent validity using Toronto Alexithymia Scale as the criterion and discriminative validity using Five-Factor Personality Inventory Short Form. Conclusion: The results showed that the new scale met the basic psychometric requirements. Results have been discussed in line with related studies. PMID:29033633
Testing comparison models of DASS-12 and its reliability among adolescents in Malaysia.

PubMed

Osman, Zubaidah Jamil; Mukhtar, Firdaus; Hashim, Hairul Anuar; Abdul Latiff, Latiffah; Mohd Sidik, Sherina; Awang, Hamidin; Ibrahim, Normala; Abdul Rahman, Hejar; Ismail, Siti Irma Fadhilah; Ibrahim, Faisal; Tajik, Esra; Othman, Norlijah

2014-10-01

The 21-item Depression, Anxiety and Stress Scale (DASS-21) is frequently used in non-clinical research to measure mental health factors among adults. However, previous studies have concluded that the 21 items are not stable for utilization among the adolescent population. Thus, the aims of this study are to examine the structure of the factors and to report on the reliability of the refined version of the DASS that consists of 12 items. A total of 2850 students (aged 13 to 17 years old) from three major ethnic groups in Malaysia completed the DASS-21. The study was conducted at 10 randomly selected secondary schools in the northern state of Peninsular Malaysia. The study population comprised secondary school students (Forms 1, 2 and 4) from the selected schools. Based on the results of the EFA stage, 12 items were included in a final CFA to test the fit of the model. Using maximum likelihood procedures to estimate the model, the selected fit indices indicated a close model fit (χ²=132.94, df=57, p=.000; CFI=.96; RMR=.02; RMSEA=.04). Moreover, significant loadings of all the unstandardized regression weights implied an acceptable convergent validity. Besides the convergent validity of the items, a discriminant validity of the subscales was also evident from the moderate latent factor inter-correlations, which ranged from .62 to .75. The subscale reliability was further estimated using Cronbach's alpha, and adequate reliability of the subscales was obtained (Total=.76; Depression=.68; Anxiety=.53; Stress=.52). The new version of the 12-item DASS for adolescents in Malaysia (DASS-12) is reliable and has a stable factor structure, and thus it is a useful instrument for distinguishing between depression, anxiety and stress. Copyright © 2014 Elsevier Inc. All rights reserved.
Concurrent Validity of Selected Movement Skill Items in the New Zealand Ministry of Education's Health and Physical Education Assessment

ERIC Educational Resources Information Center

Miyahara, Motohide; Clarkson, Jenny

2005-01-01

The concurrent validity of the New Zealand Ministry of Education's Health and Physical Education Assessment (HPEA) (Crooks & Flockton, 1999) was examined with the respective items from the Movement Assessment Battery for Children (Henderson & Sugden, 2000) and the Bruininks-Oseretsky Test of Motor Proficiency (Bruininks, 1978) on manual…
Role of the Dorsal Hippocampus in Object Memory Load

ERIC Educational Resources Information Center

Sannino, Sara; Russo, Fabio; Torromino, Giulia; Pendolino, Valentina; Calabresi, Paolo; De Leonibus, Elvira

2012-01-01

The dorsal hippocampus is crucial for mammalian spatial memory, but its exact role in item memory is still hotly debated. Recent evidence in humans suggested that the hippocampus might be selectively involved in item short-term memory to deal with an increasing memory load. In this study, we sought to test this hypothesis. To this aim we developed…

Talent identification model for sprinter using discriminant factor

NASA Astrophysics Data System (ADS)

Kusnanik, N. W.; Hariyanto, A.; Herdyanto, Y.; Satia, A.

2018-01-01

The main purpose of this study was to identify young talented sprinters using a discriminant factor. The research was conducted in three steps: constructing an item pool, screening the item pool, and trialing the instruments on small and large samples. A total of 315 male elementary school students aged 11-13 years participated in this study. Data were collected by measuring anthropometry (standing height, sitting height, body mass, and leg length) and testing physical fitness (40 m sprint for speed, shuttle run for agility, standing broad jump for power, multistage fitness test for endurance). Data were analyzed using discriminant analysis. Five items were selected as an instrument to identify young talented sprinters: sitting height, body mass, leg length, 40 m sprint, and multistage fitness test (MFT). The discriminant model for talent identification in sprinters was D = -24.497 + (0.155 × sitting height) + (0.080 × body mass) + (0.148 × leg length) + (-1.225 × 40 m sprint) + (0.563 × MFT). In conclusion, the selected test instruments and the resulting discriminant model can be applied to identify young talented sprinters.
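The discriminant model reported above can be evaluated directly, as in the minimal sketch below. It assumes the commas in the published coefficients are decimal separators (so -24,497 is read as -24.497). The abstract states neither measurement units nor a cutoff score, so the units in the comments and the example athlete are assumptions, and the function only computes D without classifying anyone.

```python
# Coefficients from the abstract, read with decimal points (assumption).
INTERCEPT = -24.497
COEFFICIENTS = {
    "sitting_height": 0.155,  # assumed centimetres
    "body_mass": 0.080,       # assumed kilograms
    "leg_length": 0.148,      # assumed centimetres
    "sprint_40m": -1.225,     # assumed seconds
    "mft": 0.563,             # multistage fitness test result (unit not stated)
}

def discriminant_score(measurements):
    """Return D for one athlete given a dict keyed like COEFFICIENTS."""
    return INTERCEPT + sum(
        COEFFICIENTS[name] * measurements[name] for name in COEFFICIENTS
    )

# Hypothetical athlete, values chosen only to show the arithmetic.
example = {
    "sitting_height": 70.0,
    "body_mass": 35.0,
    "leg_length": 75.0,
    "sprint_40m": 7.0,
    "mft": 5.0,
}
d_example = discriminant_score(example)
```

For the hypothetical athlete the terms are 10.85 + 2.8 + 11.1 − 8.575 + 2.815 − 24.497, which gives D = −5.507. Interpreting that value would require the group centroids or cutoff from the full paper.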
78 FR 36576 - Agency Information Collection Activities; Reinstatement of a Previously Approved Collection...

Federal Register 2010, 2011, 2012, 2013, 2014

2013-06-18

... estimate of the burden of the proposed collection of information, including the validity of the methodology... contacted a second time to participate in reliability testing for selected items. This testing will average...
The value of item response theory in clinical assessment: a review.

PubMed

Thomas, Michael L

2011-09-01

Item response theory (IRT) and related latent variable models represent modern psychometric theory, the successor to classical test theory in psychological assessment. Although IRT has become prevalent in the measurement of ability and achievement, its contributions to clinical domains have been less extensive. Applications of IRT to clinical assessment are reviewed to appraise its current and potential value. Benefits of IRT include comprehensive analyses and reduction of measurement error, creation of computer adaptive tests, meaningful scaling of latent variables, objective calibration and equating, evaluation of test and item bias, greater accuracy in the assessment of change due to therapeutic intervention, and evaluation of model and person fit. The theory may soon reinvent the manner in which tests are selected, developed, and scored. Although challenges remain to the widespread implementation of IRT, its application to clinical assessment holds great promise. Recommendations for research, test development, and clinical practice are provided.
Computerized Adaptive Testing: An Overview and an Example.

ERIC Educational Resources Information Center

McBride, James R.

The advantages of computerized adaptive testing are discussed, and an example illustrates its use in sixth grade mathematics. These tests are administered at a computer terminal, and the test items to be administered are selected according to the difficulty level appropriate to the individual's ability. Tailoring increases the psychometric…
Creating IRT-Based Parallel Test Forms Using the Genetic Algorithm Method

ERIC Educational Resources Information Center

Sun, Koun-Tem; Chen, Yu-Jen; Tsai, Shu-Yen; Cheng, Chien-Fen

2008-01-01

In educational measurement, the construction of parallel test forms is often a combinatorial optimization problem that involves the time-consuming selection of items to construct tests having approximately the same test information functions (TIFs) and constraints. This article proposes a novel method, genetic algorithm (GA), to construct parallel…
Development of the PROMIS coping expectancies of smoking item banks.

PubMed

Shadel, William G; Edelen, Maria Orlando; Tucker, Joan S; Stucky, Brian D; Hansen, Mark; Cai, Li

2014-09-01

Smoking is a coping strategy for many smokers who then have difficulty finding new ways to cope with negative affect when they quit. This paper describes analyses conducted to develop and evaluate item banks for assessing the coping expectancies of smoking for daily and nondaily smokers. Using data from a large sample of daily (N = 4,201) and nondaily (N = 1,183) smokers, we conducted a series of item factor analyses, item response theory analyses, and differential item functioning (DIF) analyses (according to gender, age, and ethnicity) to arrive at a unidimensional set of items for daily and nondaily smokers. We also evaluated performance of short forms (SFs) and computer adaptive tests (CATs) for assessing coping expectancies of smoking. For both daily and nondaily smokers, the unidimensional Coping Expectancies item banks (21 items) are relatively DIF free and are highly reliable (0.96 and 0.97, respectively). A common 4-item SF for daily and nondaily smokers also showed good reliability (0.85). Adaptive tests required an average of 4.3 and 3.7 items for simulated daily and nondaily respondents, respectively, and achieved reliabilities of 0.91 for both when the maximum test length was 10 items. This research provides a new set of items that can be used to reliably assess coping expectancies of smoking, through a SF, CAT, or a tailored set selected for a specific research purpose. © The Author 2014. Published by Oxford University Press on behalf of the Society for Research on Nicotine and Tobacco. All rights reserved.
Item-focussed Trees for the Identification of Items in Differential Item Functioning.

PubMed

Tutz, Gerhard; Berger, Moritz

2016-09-01

A novel method for the identification of differential item functioning (DIF) by means of recursive partitioning techniques is proposed. We assume an extension of the Rasch model that allows for DIF being induced by an arbitrary number of covariates for each item. Recursive partitioning on the item level results in one tree for each item and leads to simultaneous selection of items and variables that induce DIF. For each item, it is possible to detect groups of subjects with different item difficulties, defined by combinations of characteristics that are not pre-specified. The way a DIF item is determined by covariates is visualized in a small tree and therefore easily accessible. An algorithm is proposed that is based on permutation tests. Various simulation studies, including the comparison with traditional approaches to identify items with DIF, show the applicability and the competitive performance of the method. Two applications illustrate the usefulness and the advantages of the new method.
Development of a brief measure of generativity and ego-integrity for use in palliative care settings.

PubMed

Vuksanovic, Dean; Dyck, Murray; Green, Heather

2015-10-01

Our aim was to develop and test a brief measure of generativity and ego-integrity that is suitable for use in palliative care settings. Two measures of generativity and ego-integrity were modified and combined to create a new 11-item questionnaire, which was then administered to 143 adults. A principal-component analysis with oblique rotation was performed in order to identify underlying components that can best account for variation in the 11 questionnaire items. The two-component solution was consistent with the items that, on conceptual grounds, were intended to comprise the two constructs assessed by the questionnaire. Results suggest that the selected 11 items were good representatives of the larger scales from which they were selected, and they are expected to provide a useful means of measuring these concepts near the end of life.
Web-based computer adaptive assessment of individual perceptions of job satisfaction for hospital workplace employees

PubMed Central

2011-01-01

Background To develop a web-based computer adaptive testing (CAT) application for efficiently collecting data regarding workers' perceptions of job satisfaction, we examined whether a 37-item Job Content Questionnaire (JCQ-37) could evaluate the job satisfaction of individual employees as a single construct. Methods The JCQ-37 makes data collection via CAT on the internet easy, viable and fast. A Rasch rating scale model was applied to analyze data from 300 randomly selected hospital employees who participated in job-satisfaction surveys in 2008 and 2009 via non-adaptive and computer-adaptive testing, respectively. Results Of the 37 items on the questionnaire, 24 items fit the model fairly well. Person-separation reliability for the 2008 surveys was 0.88. Measures from both years and item-8 job satisfaction for groups were successfully evaluated through item-by-item analyses by using t-test. Workers aged 26 - 35 felt that job satisfaction was significantly worse in 2009 than in 2008. Conclusions A Web-CAT developed in the present paper was shown to be more efficient than traditional computer-based or pen-and-paper assessments at collecting data regarding workers' perceptions of job content. PMID:21496311
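As an illustration of the Rasch rating scale model named in this abstract, the following minimal sketch computes category probabilities for one polytomous item. It uses the standard Andrich parameterization; the function name and the example parameter values are hypothetical and are not taken from the study.

```python
import math

def rsm_category_probs(theta, b, taus):
    """Andrich rating scale model: probabilities of categories 0..len(taus).

    theta: person measure (logits); b: item location;
    taus: step thresholds shared by all items (assumed values).
    """
    # Cumulative logit for category k is the sum over j <= k of (theta - b - tau_j).
    logits = [0.0]
    for tau in taus:
        logits.append(logits[-1] + (theta - b - tau))
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(logits)
    weights = [math.exp(x - m) for x in logits]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical three-category item with thresholds at -1 and +1 logits.
print(rsm_category_probs(2.0, 0.0, [-1.0, 1.0]))
```

A person located well above the item is most likely to choose the top category, which is the behavior a CAT relies on when it picks the next item near the current estimate.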
Varying the valuating function and the presentable bank in computerized adaptive testing.

PubMed

Barrada, Juan Ramón; Abad, Francisco José; Olea, Julio

2011-05-01

In computerized adaptive testing, the most commonly used valuating function is the Fisher information function. When the goal is to keep item bank security at a maximum, the valuating function that seems most convenient is the matching criterion, valuating the distance between the estimated trait level and the point where the maximum of the information function is located. Recently, it has been proposed not to keep the same valuating function constant for all the items in the test. In this study we expand the idea of combining the matching criterion with the Fisher information function. We also manipulate the number of strata into which the bank is divided. We find that the manipulation of the number of items administered with each function makes it possible to move from the pole of high accuracy and low security to the opposite pole. It is possible to greatly improve item bank security with much fewer losses in accuracy by selecting several items with the matching criterion. In general, it seems more appropriate not to stratify the bank.
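The contrast between the two valuating functions can be sketched in a few lines. This is not the authors' implementation: it assumes a two-parameter logistic (2PL) model, under which an item's information peaks at its difficulty b, and the item bank and function names are hypothetical.

```python
import math

def p_correct(theta, a, b):
    """2PL probability of a correct response (assumed model)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def fisher_information(theta, a, b):
    """Fisher information of a 2PL item at trait level theta."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

def select_by_information(theta_hat, bank):
    """Index of the item with maximum Fisher information at theta_hat."""
    return max(range(len(bank)),
               key=lambda i: fisher_information(theta_hat, *bank[i]))

def select_by_matching(theta_hat, bank):
    """Index of the item whose information peak (b under 2PL) is nearest theta_hat."""
    return min(range(len(bank)), key=lambda i: abs(theta_hat - bank[i][1]))

# Hypothetical bank of (a, b) pairs: discrimination, difficulty.
bank = [(2.0, 0.5), (0.8, 0.0), (1.2, -1.0)]
print(select_by_information(0.0, bank))  # favors the highly discriminating item
print(select_by_matching(0.0, bank))     # ignores discrimination, matches difficulty
```

In this toy bank the two criteria disagree: maximum information picks the high-discrimination item even though its difficulty is off target, while matching picks the item located at the estimate. That difference is why the matching criterion tends to spread exposure across the bank.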
Detecting Intrajudge Inconsistency in Standard Setting Using Test Items with a Selected-Response Format. Research Report.

ERIC Educational Resources Information Center

van der Linden, Wim J.; Vos, Hans J.; Chang, Lei

In judgmental standard setting experiments, it may be difficult to specify subjective probabilities that adequately take the properties of the items into account. As a result, these probabilities are not consistent with each other in the sense that they do not refer to the same borderline level of performance. Methods to check standard setting…
FamilyPso - a new questionnaire to assess the impact of psoriasis on partners and family of patients.

PubMed

Mrowietz, U; Hartmann, A; Weißmann, W; Zschocke, I

2017-01-01

Psoriasis is a lifelong disease for which there is no cure. It has been conclusively shown across all ethnicities that patients suffering from psoriasis have a significantly reduced health-related quality of life and a high disease burden. Surprisingly little is known about the impact of a patient's psoriasis on partners or family members. To address this issue a systematic literature search has been conducted and interviews with relatives of psoriasis patients living in the same household were performed. From this collected information, items were generated that were commonly mentioned to affect living and tested in a large group of relatives before the final item selection was done. A first set of 29 items was selected and tested in a study with 96 patient relatives. After adjustment and statistical analysis, the final FamilyPso questionnaire was condensed to 15 items to assess the burden of partners or family members living together with psoriasis patients. The FamilyPso enables physicians to achieve a better understanding of the impact of psoriasis as a lifelong chronic disease on partners and the family environment. © 2016 European Academy of Dermatology and Venereology.
Parameter Estimation in Rasch Models for Examinee-Selected Items

ERIC Educational Resources Information Center

Liu, Chen-Wei; Wang, Wen-Chung

2017-01-01

The examinee-selected-item (ESI) design, in which examinees are required to respond to a fixed number of items in a given set of items (e.g., choose one item to respond from a pair of items), always yields incomplete data (i.e., only the selected items are answered and the others have missing data) that are likely nonignorable. Therefore, using…
Applying Computerized Adaptive Testing to the Negative Acts Questionnaire-Revised: Rasch Analysis of Workplace Bullying

PubMed Central

Ma, Shu-Ching; Li, Yu-Chi; Yui, Mei-Shu

2014-01-01

Background Workplace bullying is a prevalent problem in contemporary work places that has adverse effects on both the victims of bullying and organizations. With the rapid development of computer technology in recent years, there is an urgent need to prove whether item response theory–based computerized adaptive testing (CAT) can be applied to measure exposure to workplace bullying. Objective The purpose of this study was to evaluate the relative efficiency and measurement precision of a CAT-based test for hospital nurses compared to traditional nonadaptive testing (NAT). Under the preliminary conditions of a single domain derived from the scale, a CAT module bullying scale model with polytomously scored items is provided as an example for evaluation purposes. Methods A total of 300 nurses were recruited and responded to the 22-item Negative Acts Questionnaire-Revised (NAQ-R). All NAT (or CAT-selected) items were calibrated with the Rasch rating scale model and all respondents were randomly selected for a comparison of the advantages of CAT and NAT in efficiency and precision by paired t tests and the area under the receiver operating characteristic curve (AUROC). Results The NAQ-R is a unidimensional construct that can be applied to measure exposure to workplace bullying through CAT-based administration. Nursing measures derived from both tests (CAT and NAT) were highly correlated (r=.97) and their measurement precisions were not statistically different (P=.49) as expected. CAT required fewer items than NAT (an efficiency gain of 32%), suggesting a reduced burden for respondents. There were significant differences in work tenure between the 2 groups (bullied and nonbullied) at a cutoff point of 6 years at 1 worksite. An AUROC of 0.75 (95% CI 0.68-0.79) with logits greater than –4.2 (or >30 in summation) was defined as being highly likely bullied in a workplace. 
Conclusions With CAT-based administration of the NAQ-R for nurses, their burden was substantially reduced without compromising measurement precision. PMID:24534113
Variable-Length Computerized Adaptive Testing: Adaptation of the A-Stratified Strategy in Item Selection with Content Balancing

ERIC Educational Resources Information Center

Huo, Yan

2009-01-01

Variable-length computerized adaptive testing (CAT) can provide examinees with tailored test lengths. With the fixed standard error of measurement ("SEM") termination rule, variable-length CAT can achieve predetermined measurement precision by using relatively shorter tests compared to fixed-length CAT. To explore the application of…
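A fixed-SEM termination rule can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure studied in the dissertation: it uses the Rasch model, evaluates information at a known trait level rather than a running estimate, administers items in the order given, and omits content balancing and a-stratification. All names are hypothetical.

```python
import math

def rasch_information(theta, b):
    """Fisher information of a Rasch item with difficulty b at theta."""
    p = 1.0 / (1.0 + math.exp(-(theta - b)))
    return p * (1.0 - p)

def items_until_sem(theta, difficulties, sem_target, max_items=None):
    """Administer items in order until SEM = 1 / sqrt(total information)
    falls to sem_target or the item limit is reached.

    Returns (number of items used, SEM reached).
    """
    limit = len(difficulties) if max_items is None else min(max_items, len(difficulties))
    total = 0.0
    for n, b in enumerate(difficulties[:limit], start=1):
        total += rasch_information(theta, b)
        sem = 1.0 / math.sqrt(total)
        if sem <= sem_target:
            return n, sem
    return limit, (1.0 / math.sqrt(total) if total > 0 else float("inf"))

# Hypothetical pool of perfectly targeted items (b equals theta).
print(items_until_sem(0.0, [0.0] * 20, 0.5))
```

Because each perfectly targeted Rasch item contributes 0.25 units of information, the test length needed for a given SEM is easy to check by hand; poorly targeted items contribute less, so the same precision takes a longer test.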
Role of Cognitive Testing in the Development of the CAHPS® Hospital Survey

PubMed Central

Levine, Roger E; Fowler, Floyd J; Brown, Julie A

2005-01-01

Objective To describe how cognitive testing results were used to inform the modification and selection of items for the Consumer Assessment of Health Providers and Systems (CAHPS®) Hospital Survey pilot test instrument. Data Sources Cognitive interviews were conducted on 31 subjects in two rounds of testing: in December 2002–January 2003 and in February 2003. In both rounds, interviews were conducted in northern California, southern California, Massachusetts, and North Carolina. Study Design A common protocol served as the basis for cognitive testing activities in each round. This protocol was modified to enable testing of the items as interviewer-administered and self-administered items and to allow members of each of three research teams to use their preferred cognitive research tools. Data Collection/Extraction Methods Each research team independently summarized, documented, and reported their findings. Item-specific and general issues were noted. The results were reviewed and discussed by senior staff from each research team after each round of testing, to inform the acceptance, modification, or elimination of candidate items. Principal Findings Many candidate items required modification because respondents lacked the information required to answer them, respondents failed to understand them consistently, the items were not measuring the constructs they were intended to measure, the items were based on erroneous assumptions about what respondents wanted or experienced during their hospitalization, or the items were asking respondents to make distinctions that were too fine for them to make. Cognitive interviewing enabled the detection of these problems; an understanding of the etiology of the problem informed item revisions. However, for some constructs, the revisions proved to be inadequate. 
Accordingly, items could not be developed to provide acceptable measures of certain constructs such as shared decision making, coordination of care, and delays in the admissions process. Conclusions Cognitive testing is the most direct way of finding out whether respondents understand questions consistently, have the information needed to answer the questions, and can use the response alternatives provided to describe their experiences or their opinions accurately. Many of the candidate questions failed to meet these standards. Cognitive testing only evaluates the way in which respondents understand and answer questions. Although it does not directly assess the validity of the answers, it is a reasonable premise that cognitive problems will seriously compromise validity and reliability. PMID:16316437
Optimal Item Selection with Credentialing Examinations.

ERIC Educational Resources Information Center

Hambleton, Ronald K.; And Others

The study compared two promising item response theory (IRT) item-selection methods, optimal and content-optimal, with two non-IRT item selection methods, random and classical, for use in fixed-length certification exams. The four methods were used to construct 20-item exams from a pool of approximately 250 items taken from a 1985 certification…
Methodology for the development and calibration of the SCI-QOL item banks

PubMed Central

Tulsky, David S.; Kisala, Pamela A.; Victorson, David; Choi, Seung W.; Gershon, Richard; Heinemann, Allen W.; Cella, David

2015-01-01

Objective To develop a comprehensive, psychometrically sound, and conceptually grounded patient reported outcomes (PRO) measurement system for individuals with spinal cord injury (SCI). Methods Individual interviews (n = 44) and focus groups (n = 65 individuals with SCI and n = 42 SCI clinicians) were used to select key domains for inclusion and to develop PRO items. Verbatim items from other cutting-edge measurement systems (i.e. PROMIS, Neuro-QOL) were included to facilitate linkage and cross-population comparison. Items were field tested in a large sample of individuals with traumatic SCI (n = 877). Dimensionality was assessed with confirmatory factor analysis. Local item dependence and differential item functioning were assessed, and items were calibrated using the item response theory (IRT) graded response model. Finally, computer adaptive tests (CATs) and short forms were administered in a new sample (n = 245) to assess test-retest reliability and stability. Participants and Procedures A calibration sample of 877 individuals with traumatic SCI across five SCI Model Systems sites and one Department of Veterans Affairs medical center completed SCI-QOL items in interview format. Results We developed 14 unidimensional calibrated item banks and 3 calibrated scales across physical, emotional, and social health domains. When combined with the five Spinal Cord Injury – Functional Index physical function banks, the final SCI-QOL system consists of 22 IRT-calibrated item banks/scales. Item banks may be administered as CATs or short forms. Scales may be administered in a fixed-length format only. Conclusions The SCI-QOL measurement system provides SCI researchers and clinicians with a comprehensive, relevant and psychometrically robust system for measurement of physical-medical, physical-functional, emotional, and social outcomes. All SCI-QOL instruments are freely available on Assessment CenterSM. PMID:26010963
Methodology for the development and calibration of the SCI-QOL item banks.

PubMed

Tulsky, David S; Kisala, Pamela A; Victorson, David; Choi, Seung W; Gershon, Richard; Heinemann, Allen W; Cella, David

2015-05-01

To develop a comprehensive, psychometrically sound, and conceptually grounded patient reported outcomes (PRO) measurement system for individuals with spinal cord injury (SCI). Individual interviews (n=44) and focus groups (n=65 individuals with SCI and n=42 SCI clinicians) were used to select key domains for inclusion and to develop PRO items. Verbatim items from other cutting-edge measurement systems (i.e. PROMIS, Neuro-QOL) were included to facilitate linkage and cross-population comparison. Items were field tested in a large sample of individuals with traumatic SCI (n=877). Dimensionality was assessed with confirmatory factor analysis. Local item dependence and differential item functioning were assessed, and items were calibrated using the item response theory (IRT) graded response model. Finally, computer adaptive tests (CATs) and short forms were administered in a new sample (n=245) to assess test-retest reliability and stability. A calibration sample of 877 individuals with traumatic SCI across five SCI Model Systems sites and one Department of Veterans Affairs medical center completed SCI-QOL items in interview format. We developed 14 unidimensional calibrated item banks and 3 calibrated scales across physical, emotional, and social health domains. When combined with the five Spinal Cord Injury--Functional Index physical function banks, the final SCI-QOL system consists of 22 IRT-calibrated item banks/scales. Item banks may be administered as CATs or short forms. Scales may be administered in a fixed-length format only. The SCI-QOL measurement system provides SCI researchers and clinicians with a comprehensive, relevant and psychometrically robust system for measurement of physical-medical, physical-functional, emotional, and social outcomes. All SCI-QOL instruments are freely available on Assessment CenterSM.
Development of the PROMIS health expectancies of smoking item banks.

PubMed

Edelen, Maria Orlando; Tucker, Joan S; Shadel, William G; Stucky, Brian D; Cerully, Jennifer; Li, Zhen; Hansen, Mark; Cai, Li

2014-09-01

Smokers' health-related outcome expectancies are associated with a number of important constructs in smoking research, yet there are no measures currently available that focus exclusively on this domain. This paper describes the development and evaluation of item banks for assessing the health expectancies of smoking. Using data from a sample of daily (N = 4,201) and nondaily (N = 1,183) smokers, we conducted a series of item factor analyses, item response theory analyses, and differential item functioning analyses (according to gender, age, and race/ethnicity) to arrive at a unidimensional set of health expectancies items for daily and nondaily smokers. We also evaluated the performance of short forms (SFs) and computer adaptive tests (CATs) to efficiently assess health expectancies. A total of 24 items were included in the Health Expectancies item banks; 13 items are common across daily and nondaily smokers, 6 are unique to daily, and 5 are unique to nondaily. For both daily and nondaily smokers, the Health Expectancies item banks are unidimensional, reliable (reliability = 0.95 and 0.96, respectively), and perform similarly across gender, age, and race/ethnicity groups. A SF common to daily and nondaily smokers consists of 6 items (reliability = 0.87). Results from simulated CATs showed that health expectancies can be assessed with good precision with an average of 5-6 items adaptively selected from the item banks. Health expectancies of smoking can be assessed on the basis of these item banks via SFs, CATs, or through a tailored set of items selected for a specific research purpose.

Development of the PROMIS nicotine dependence item banks.

PubMed

Shadel, William G; Edelen, Maria Orlando; Tucker, Joan S; Stucky, Brian D; Hansen, Mark; Cai, Li

2014-09-01

Nicotine dependence is a core construct important for understanding cigarette smoking and smoking cessation behavior. This article describes analyses conducted to develop and evaluate item banks for assessing nicotine dependence among daily and nondaily smokers. Using data from a sample of daily (N = 4,201) and nondaily (N = 1,183) smokers, we conducted a series of item factor analyses, item response theory analyses, and differential item functioning analyses (according to gender, age, and race/ethnicity) to arrive at a unidimensional set of nicotine dependence items for daily and nondaily smokers. We also evaluated performance of short forms (SFs) and computer adaptive tests (CATs) to efficiently assess dependence. A total of 32 items were included in the Nicotine Dependence item banks; 22 items are common across daily and nondaily smokers, 5 are unique to daily smokers, and 5 are unique to nondaily smokers. For both daily and nondaily smokers, the Nicotine Dependence item banks are strongly unidimensional, highly reliable (reliability = 0.97 and 0.97, respectively), and perform similarly across gender, age, and race/ethnicity groups. SFs common to daily and nondaily smokers consist of 8 and 4 items (reliability = 0.91 and 0.81, respectively). Results from simulated CATs showed that dependence can be assessed with very good precision for most respondents using fewer than 6 items adaptively selected from the item banks. Nicotine dependence on cigarettes can be assessed on the basis of these item banks via one of the SFs, by using CATs, or through a tailored set of items selected for a specific research purpose.
Development of a subjective cognitive decline questionnaire using item response theory: a pilot study.

PubMed

Gifford, Katherine A; Liu, Dandan; Romano, Raymond; Jones, Richard N; Jefferson, Angela L

2015-12-01

Subjective cognitive decline (SCD) may indicate unhealthy cognitive changes, but no standardized SCD measurement exists. This pilot study aims to identify reliable SCD questions. 112 cognitively normal (NC, 76±8 years, 63% female), 43 mild cognitive impairment (MCI; 77±7 years, 51% female), and 33 diagnostically ambiguous participants (79±9 years, 58% female) were recruited from a research registry and completed 57 self-report SCD questions. Psychometric methods were used for item-reduction. Factor analytic models assessed unidimensionality of the latent trait (SCD); 19 items were removed with extreme response distribution or trait-fit. Item response theory (IRT) provided information about question utility; 17 items with low information were dropped. Post-hoc simulation using computerized adaptive test (CAT) modeling selected the most commonly used items (n=9 of 21 items) that represented the latent trait well (r=0.94) and differentiated NC from MCI participants (F(1,146)=8.9, p=0.003). Item response theory and computerized adaptive test modeling identified nine reliable SCD items. This pilot study is a first step toward refining SCD assessment in older adults. Replication of these findings and validation with Alzheimer's disease biomarkers will be an important next step for the creation of a SCD screener.
Measuring pain phenomena after spinal cord injury: Development and psychometric properties of the SCI-QOL Pain Interference and Pain Behavior assessment tools.

PubMed

Cohen, Matthew L; Kisala, Pamela A; Dyson-Hudson, Trevor A; Tulsky, David S

2018-05-01

To develop modern patient-reported outcome measures that assess pain interference and pain behavior after spinal cord injury (SCI). Grounded-theory based qualitative item development; large-scale item calibration field-testing; confirmatory factor analyses; graded response model item response theory analyses; statistical linking techniques to transform scores to the Patient Reported Outcome Measurement Information System (PROMIS) metric. Five SCI Model Systems centers and one Department of Veterans Affairs medical center in the United States. Adults with traumatic SCI. N/A. Spinal Cord Injury - Quality of Life (SCI-QOL) Pain Interference item bank, SCI-QOL Pain Interference short form, and SCI-QOL Pain Behavior scale. Seven hundred fifty-seven individuals with traumatic SCI completed 58 items addressing various aspects of pain. Items were then separated by whether they assessed pain interference or pain behavior, and poorly functioning items were removed. Confirmatory factor analyses confirmed that each set of items was unidimensional, and item response theory analyses were used to estimate slopes and thresholds for the items. Ultimately, 7 items (4 from PROMIS) comprised the Pain Behavior scale and 25 items (18 from PROMIS) comprised the Pain Interference item bank. Ten of these 25 items were selected to form the Pain Interference short form. The SCI-QOL Pain Interference item bank and the SCI-QOL Pain Behavior scale demonstrated robust psychometric properties. The Pain Interference item bank is available as a computer adaptive test or short form for research and clinical applications, and scores are transformed to the PROMIS metric.
Toward a More Systematic Assessment of Smoking: Development of a Smoking Module for PROMIS®

PubMed Central

Tucker, Joan S.; Shadel, William G.; Stucky, Brian D.; Cai, Li

2012-01-01

Introduction The aim of the PROMIS® Smoking Initiative is to develop, evaluate, and standardize item banks to assess cigarette smoking behavior and biopsychosocial constructs associated with smoking for both daily and non-daily smokers. Methods We used qualitative methods to develop the item pool (following the PROMIS® approach: e.g., literature search, “binning and winnowing” of items, and focus groups and cognitive interviews to finalize wording and format), and quantitative methods (e.g., factor analysis) to develop the item banks. Results We considered a total of 1622 extant items, and 44 new items for inclusion in the smoking item banks. A final set of 277 items representing 11 conceptual domains was selected for field testing in a national sample of smokers. Using data from 3021 daily smokers in the field test, an iterative series of exploratory factor analyses and project team discussions resulted in six item banks: Positive Consequences of Smoking (40 items), Smoking Dependence/Craving (55 items), Health Consequences of Smoking (26 items), Psychosocial Consequences of Smoking (37 items), Coping Aspects of Smoking (30 items), and Social Factors of Smoking (23 items). Conclusions Inclusion of a smoking domain in the PROMIS® framework will standardize measurement of key smoking constructs using state-of-the-art psychometric methods, and make them widely accessible to health care providers, smoking researchers and the large community of researchers using PROMIS® who might not otherwise include an assessment of smoking in their design. Next steps include reducing the number of items in each domain, conducting confirmatory analyses, and duplicating the process for non-daily smokers. PMID:22770824
Toward a more systematic assessment of smoking: development of a smoking module for PROMIS®.

PubMed

Edelen, Maria O; Tucker, Joan S; Shadel, William G; Stucky, Brian D; Cai, Li

2012-11-01

The aim of the PROMIS® Smoking Initiative is to develop, evaluate, and standardize item banks to assess cigarette smoking behavior and biopsychosocial constructs associated with smoking for both daily and non-daily smokers. We used qualitative methods to develop the item pool (following the PROMIS® approach: e.g., literature search, "binning and winnowing" of items, and focus groups and cognitive interviews to finalize wording and format), and quantitative methods (e.g., factor analysis) to develop the item banks. We considered a total of 1622 extant items, and 44 new items for inclusion in the smoking item banks. A final set of 277 items representing 11 conceptual domains was selected for field testing in a national sample of smokers. Using data from 3021 daily smokers in the field test, an iterative series of exploratory factor analyses and project team discussions resulted in six item banks: Positive Consequences of Smoking (40 items), Smoking Dependence/Craving (55 items), Health Consequences of Smoking (26 items), Psychosocial Consequences of Smoking (37 items), Coping Aspects of Smoking (30 items), and Social Factors of Smoking (23 items). Inclusion of a smoking domain in the PROMIS® framework will standardize measurement of key smoking constructs using state-of-the-art psychometric methods, and make them widely accessible to health care providers, smoking researchers and the large community of researchers using PROMIS® who might not otherwise include an assessment of smoking in their design. Next steps include reducing the number of items in each domain, conducting confirmatory analyses, and duplicating the process for non-daily smokers. Copyright © 2012 Elsevier Ltd. All rights reserved.
Development and psychometric evaluation of a cardiovascular risk and disease management knowledge assessment tool.

PubMed

Rosneck, James S; Hughes, Joel; Gunstad, John; Josephson, Richard; Noe, Donald A; Waechter, Donna

2014-01-01

This article describes the systematic construction and psychometric analysis of a knowledge assessment instrument for phase II cardiac rehabilitation (CR) patients measuring risk modification disease management knowledge and behavioral outcomes derived from national standards relevant to secondary prevention and management of cardiovascular disease. First, using adult curriculum based on disease-specific learning outcomes and competencies, a systematic test item development process was completed by clinical staff. Second, a panel of educational and clinical experts used an iterative process to identify test content domain and arrive at consensus in selecting items meeting criteria. Third, the resulting 31-question instrument, the Cardiac Knowledge Assessment Tool (CKAT), was piloted in CR patients to ensure use of application. Validity and reliability analyses were performed on 3638 adults before test administrations with additional focused analyses on 1999 individuals completing both pretreatment and posttreatment administrations within 6 months. Evidence of CKAT content validity was substantiated, with 85% agreement among content experts. Evidence of construct validity was demonstrated via factor analysis identifying key underlying factors. Estimates of internal consistency, for example, Cronbach's α = .852 and Spearman-Brown split-half reliability = 0.817 on pretesting, support test reliability. Item analysis, using point biserial correlation, measured relationships between performance on single items and total score (P < .01). Analyses using item difficulty and item discrimination indices further verified item stability and validity of the CKAT. A knowledge instrument specifically designed for an adult CR population was systematically developed and tested in a large representative patient population, satisfying psychometric parameters, including validity and reliability.
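The two internal-consistency statistics reported above follow standard formulas, sketched here for reference. The data below are invented for illustration and are unrelated to the CKAT sample; the function names are hypothetical.

```python
from statistics import pvariance

def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(scores[0])
    # Variance of each item across respondents.
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    # Variance of the total score across respondents.
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1.0 - sum(item_vars) / total_var)

def spearman_brown(r_half):
    """Step a half-test correlation up to an estimate of full-length reliability."""
    return 2.0 * r_half / (1.0 + r_half)

# Invented 0/1 item scores for four respondents on two items.
consistent = [[1, 1], [0, 0], [1, 1], [0, 0]]
unrelated = [[1, 1], [0, 0], [1, 0], [0, 1]]
print(cronbach_alpha(consistent), cronbach_alpha(unrelated))
print(spearman_brown(0.5))
```

Alpha depends only on the ratio of summed item variances to total-score variance, so population and sample variances give the same value as long as one convention is used throughout.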
The tinnitus functional index: development of a new clinical measure for chronic, intrusive tinnitus.

PubMed

Meikle, Mary B; Henry, James A; Griest, Susan E; Stewart, Barbara J; Abrams, Harvey B; McArdle, Rachel; Myers, Paula J; Newman, Craig W; Sandridge, Sharon; Turk, Dennis C; Folmer, Robert L; Frederick, Eric J; House, John W; Jacobson, Gary P; Kinney, Sam E; Martin, William H; Nagler, Stephen M; Reich, Gloria E; Searchfield, Grant; Sweetow, Robert; Vernon, Jack A

2012-01-01

Chronic subjective tinnitus is a prevalent condition that causes significant distress to millions of Americans. Effective tinnitus treatments are urgently needed, but evaluating them is hampered by the lack of standardized measures that are validated for both intake assessment and evaluation of treatment outcomes. This work was designed to develop a new self-report questionnaire, the Tinnitus Functional Index (TFI), that would have documented validity both for scaling the severity and negative impact of tinnitus for use in intake assessment and for measuring treatment-related changes in tinnitus (responsiveness) and that would provide comprehensive coverage of multiple tinnitus severity domains. To use preexisting knowledge concerning tinnitus-related problems, an Item Selection Panel (17 expert judges) surveyed the content (175 items) of nine widely used tinnitus questionnaires. From those items, the Panel identified 13 separate domains of tinnitus distress and selected 70 items most likely to be responsive to treatment effects. Eliminating redundant items while retaining good content validity and adding new items to achieve the recommended minimum of 3 to 4 items per domain yielded 43 items, which were then used for constructing TFI Prototype 1. Prototype 1 was tested at five clinics. The 326 participants included consecutive patients receiving tinnitus treatment who provided informed consent, constituting a convenience sample. Construct validity of Prototype 1 as an outcome measure was evaluated by measuring responsiveness of the overall scale and its individual items at 3 and 6 mo follow-up with 65 and 42 participants, respectively. Using a predetermined list of criteria, the 30 best-functioning items were selected for constructing TFI Prototype 2. Prototype 2 was tested at four clinics with 347 participants, including 155 and 86 who provided 3 and 6 mo follow-up data, respectively. Analyses were the same as for Prototype 1.
Results were used to select the 25 best-functioning items for the final TFI. Both prototypes and the final TFI displayed strong measurement properties, with few missing data, high validity for scaling of tinnitus severity, and good reliability. All TFI versions exhibited the same eight factors characterizing tinnitus severity and negative impact. Responsiveness, evaluated by computing effect sizes for responses at follow-up, was satisfactory in all TFI versions.In the final TFI, Cronbach's alpha was 0.97 and test-retest reliability 0.78. Convergent validity (r = 0.86 with Tinnitus Handicap Inventory [THI]; r = 0.75 with Visual Analog Scale [VAS]) and discriminant validity (r = 0.56 with Beck Depression Inventory-Primary Care [BDI-PC]) were good. The final TFI was successful at detecting improvement from the initial clinic visit to 3 mo with moderate to large effect sizes and from initial to 6 mo with large effect sizes. Effect sizes for the TFI were generally larger than those obtained for the VAS and THI. After careful evaluation, a 13-point reduction was considered a preliminary criterion for meaningful reduction in TFI outcome scores. The TFI should be useful in both clinical and research settings because of its responsiveness to treatment-related change, validity for scaling the overall severity of tinnitus, and comprehensive coverage of multiple domains of tinnitus severity.
Development process of an assessment tool for disruptive behavior problems in cross-cultural settings: the Disruptive Behavior International Scale – Nepal version (DBIS-N)

PubMed Central

Burkey, Matthew D.; Ghimire, Lajina; Adhikari, Ramesh P.; Kohrt, Brandon A.; Jordans, Mark J. D.; Haroz, Emily; Wissow, Lawrence

2017-01-01

Systematic processes are needed to develop valid measurement instruments for disruptive behavior disorders (DBDs) in cross-cultural settings. We employed a four-step process in Nepal to identify and select items for a culturally valid assessment instrument: 1) We extracted items from validated scales and local free-list interviews. 2) Parents, teachers, and peers (n=30) rated the perceived relevance and importance of behavior problems. 3) Highly rated items were piloted with children (n=60) in Nepal. 4) We evaluated internal consistency of the final scale. We identified 49 symptoms from 11 scales, and 39 behavior problems from free-list interviews (n=72). After dropping items for low ratings of relevance and severity and for poor item-test correlation, low frequency, and/or poor acceptability in pilot testing, 16 items remained for the Disruptive Behavior International Scale—Nepali version (DBIS-N). The final scale had good internal consistency (α=0.86). A 4-step systematic approach to scale development including local participation yielded an internally consistent scale that included culturally relevant behavior problems. PMID:28093575
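The two item statistics named in the abstract above, item-test correlation and internal consistency (Cronbach's α), can be sketched in a few lines. This is a minimal illustration using the textbook formulas, not the authors' analysis code; the function names and the small `ratings` matrix are hypothetical.

```python
from math import sqrt
from statistics import pvariance


def pearson(x, y):
    """Pearson correlation between two equal-length numeric lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / sqrt(sxx * syy)


def cronbach_alpha(scores):
    """Cronbach's alpha; `scores` is a list of respondents, each a list of k item scores."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("need at least two items")
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def corrected_item_total(scores, j):
    """Correlation of item j with the sum of the remaining items."""
    item = [row[j] for row in scores]
    rest = [sum(row) - row[j] for row in scores]
    return pearson(item, rest)


# Hypothetical ratings: 5 respondents x 3 items (invented for illustration).
ratings = [[2, 3, 3], [4, 4, 5], [1, 2, 1], [5, 4, 4], [3, 3, 2]]
alpha = cronbach_alpha(ratings)
item_total = [corrected_item_total(ratings, j) for j in range(3)]
```

In a screening step like the one described, items with a low corrected item-total correlation would be candidates for removal; the cut-off itself is a judgment call that the abstract does not specify.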
‘Forget me (not)?’ – Remembering Forget-Items Versus Un-Cued Items in Directed Forgetting

PubMed Central

Zwissler, Bastian; Schindler, Sebastian; Fischer, Helena; Plewnia, Christian; Kissler, Johanna M.

2015-01-01

Humans need to be able to selectively control their memories. This capability is often investigated in directed forgetting (DF) paradigms. In item-method DF, individual items are presented and each is followed by either a forget- or remember-instruction. On a surprise test of all items, memory is then worse for to-be-forgotten items (TBF) compared to to-be-remembered items (TBR). This is thought to result mainly from selective rehearsal of TBR, although inhibitory mechanisms also appear to be recruited by this paradigm. Here, we investigate whether the mnemonic consequences of a forget instruction differ from the ones of incidental encoding, where items are presented without a specific memory instruction. Four experiments were conducted where un-cued items (UI) were interspersed and recognition performance was compared between TBR, TBF, and UI stimuli. Accuracy was encouraged via a performance-dependent monetary bonus. Experiments varied the number of items and their presentation speed and used either letter-cues or symbolic cues. Across all experiments, including perceptually fully counterbalanced variants, memory accuracy for TBF was reduced compared to TBR, but better than for UI. Moreover, participants made consistently fewer false alarms and used a very conservative response criterion when responding to TBF stimuli. Thus, the F-cue results in active processing and reduces false alarm rate, but this does not impair recognition memory beyond an un-cued baseline condition, where only incidental encoding occurs. Theoretical implications of these findings are discussed. PMID:26635657
Online drug databases: a new method to assess and compare inclusion of clinically relevant information.

PubMed

Silva, Cristina; Fresco, Paula; Monteiro, Joaquim; Rama, Ana Cristina Ribeiro

2013-08-01

Background: Evidence-Based Practice requires health care decisions to be based on the best available evidence. The model "Information Mastery" proposes that clinicians should use sources of information that have previously evaluated relevance and validity, provided at the point of care. Drug databases (DB) allow easy and fast access to information and have the benefit of more frequent content updates. Relevant information, in the context of drug therapy, is that which supports safe and effective use of medicines. Accordingly, the European Guideline on the Summary of Product Characteristics (EG-SmPC) was used as a standard to evaluate the inclusion of relevant information contents in DB. Objective: To develop and test a method to evaluate relevancy of DB contents, by assessing the inclusion of information items deemed relevant for effective and safe drug use. Method: Hierarchical organisation and selection of the principles defined in the EG-SmPC; definition of criteria to assess inclusion of selected information items; creation of a categorisation and quantification system that allows score calculation; calculation of relative differences (RD) of scores for comparison with an "ideal" database, defined as the one that achieves the best quantification possible for each of the information items; pilot test on a sample of 9 drug databases, using 10 drugs frequently associated in literature with morbidity-mortality and also being widely consumed in Portugal. Main outcome measure: Calculate individual and global scores for clinically relevant information items of drug monographs in databases, using the categorisation and quantification system created. Results: A. Method development: selection of sections, subsections, relevant information items and corresponding requisites; system to categorise and quantify their inclusion; score and RD calculation procedure. B. Pilot test: calculated scores for the 9 databases; globally, all databases evaluated significantly differed from the "ideal" database; some DB performed better but performance was inconsistent at subsections level, within the same DB. Conclusion: The method developed allows quantification of the inclusion of relevant information items in DB and comparison with an "ideal database". It is necessary to consult diverse DB in order to find all the relevant information needed to support clinical drug use.
Working memory capacity predicts the beneficial effect of selective memory retrieval.

PubMed

Schlichting, Andreas; Aslan, Alp; Holterman, Christoph; Bäuml, Karl-Heinz T

2015-01-01

Selective retrieval of some studied items can both impair and improve recall of the other items. This study examined the role of working memory capacity (WMC) for the two effects of memory retrieval. Participants studied an item list consisting of predefined target and nontarget items. After study of the list, half of the participants performed an imagination task supposed to induce a change in mental context, whereas the other half performed a counting task which does not induce such context change. Following presentation of a second list, memory for the original list's target items was tested, either with or without preceding retrieval of the list's nontarget items. Consistent with previous work, preceding nontarget retrieval impaired target recall in the absence of the context change, but improved target recall in its presence. In particular, there was a positive relationship between WMC and the beneficial, but not the detrimental effect of memory retrieval. On the basis of the view that the beneficial effect of memory retrieval reflects context-reactivation processes, the results indicate that individuals with higher WMC are better able to capitalise on retrieval-induced context reactivation than individuals with lower WMC.
Evaluating test-retest reliability in patient-reported outcome measures for older people: A systematic review.

PubMed

Park, Myung Sook; Kang, Kyung Ja; Jang, Sun Joo; Lee, Joo Yun; Chang, Sun Ju

2018-03-01

This study aimed to evaluate the components of test-retest reliability including time interval, sample size, and statistical methods used in patient-reported outcome measures in older people and to provide suggestions on the methodology for calculating test-retest reliability for patient-reported outcomes in older people. This was a systematic literature review. MEDLINE, Embase, CINAHL, and PsycINFO were searched from January 1, 2000 to August 10, 2017 by an information specialist. This systematic review was guided by both the Preferred Reporting Items for Systematic Reviews and Meta-Analyses checklist and the guideline for systematic review published by the National Evidence-based Healthcare Collaborating Agency in Korea. The methodological quality was assessed by the Consensus-based Standards for the selection of health Measurement Instruments checklist box B. Ninety-five out of 12,641 studies were selected for the analysis. The median time interval for test-retest reliability was 14 days, and the ratio of sample size for test-retest reliability to the number of items in each measure ranged from 1:1 to 1:4. The most frequently used statistical method for continuous scores was the intraclass correlation coefficient (ICC). Among the 63 studies that used ICCs, 21 studies presented models for ICC calculations and 30 studies reported 95% confidence intervals of the ICCs. Additional analyses using 17 studies that reported a strong ICC (>0.09) showed that the mean time interval was 12.88 days and the mean ratio of the number of items to sample size was 1:5.37. When researchers plan to assess the test-retest reliability of patient-reported outcome measures for older people, they need to consider an adequate time interval of approximately 13 days and a sample size of about 5 times the number of items. In particular, statistical methods should not only be selected based on the types of scores of the patient-reported outcome measures, but should also be described clearly in the studies that report the results of test-retest reliability. Copyright © 2017 Elsevier Ltd. All rights reserved.
Using Rasch Analysis to Evaluate the Reliability and Validity of the Swallowing Quality of Life Questionnaire: An Item Response Theory Approach.

PubMed

Cordier, Reinie; Speyer, Renée; Schindler, Antonio; Michou, Emilia; Heijnen, Bas Joris; Baijens, Laura; Karaduman, Ayşe; Swan, Katina; Clavé, Pere; Joosten, Annette Veronica

2018-02-01

The Swallowing Quality of Life questionnaire (SWAL-QOL) is widely used clinically and in research to evaluate quality of life related to swallowing difficulties. It has been described as a valid and reliable tool, but was developed and tested using classic test theory. This study describes the reliability and validity of the SWAL-QOL using item response theory (IRT; Rasch analysis). SWAL-QOL data were gathered from 507 participants at risk of oropharyngeal dysphagia (OD) across four European countries. OD was confirmed in 75.7% of participants via videofluoroscopy and/or fiberoptic endoscopic evaluation, or a clinical diagnosis based on meeting selected criteria. Patients with esophageal dysphagia were excluded. Data were analysed using Rasch analysis. Item and person reliability was good for all the items combined. However, person reliability was poor for 8 subscales and item reliability was poor for one subscale. Eight subscales exhibited poor person separation and two exhibited poor item separation. Overall item and person fit statistics were acceptable. However, at an individual item fit level results indicated unpredictable item responses for 28 items, and item redundancy for 10 items. The item-person dimensionality map confirmed these findings. Results from the overall Rasch model fit and Principal Component Analysis were suggestive of a second dimension. For all the items combined, none of the item categories were 'category', 'threshold' or 'step' disordered; however, all subscales demonstrated category disordered functioning. Findings suggest an urgent need to further investigate the underlying structure of the SWAL-QOL and its psychometric characteristics using IRT.
Two systems drive attention to rewards

PubMed Central

Kovach, Christopher K.; Sutterer, Matthew J.; Rushia, Sara N.; Teriakidis, Adrianna; Jenison, Rick L.

2014-01-01

How options are framed can dramatically influence choice preference. While salience of information plays a central role in this effect, precisely how it is mediated by attentional processes remains unknown. Current models assume a simple relationship between attention and choice, according to which preference should be uniformly biased towards the attended item over the whole time-course of a decision between similarly valued items. To test this prediction we considered how framing alters the orienting of gaze during a simple choice between two options, using eye movements as a sensitive online measure of attention. In one condition participants selected the less preferred item to discard and in the other, the more preferred item to keep. We found that gaze gravitates towards the item ultimately selected, but did not observe the effect to be uniform over time. Instead, we found evidence for distinct early and late processes that guide attention according to preference in the first case and task demands in the second. We conclude that multiple time-dependent processes govern attention during choice, and that these may contribute to framing effects in different ways. PMID:24550868
The development and psychometric validation of the Ethical Awareness Scale.

PubMed

Milliken, Aimee; Ludlow, Larry; DeSanto-Madeya, Susan; Grace, Pamela

2018-04-19

To develop and psychometrically assess the Ethical Awareness Scale using Rasch measurement principles and a Rasch item response theory model. Critical care nurses must be equipped to provide good (ethical) patient care. This requires ethical awareness, which involves recognizing the ethical implications of all nursing actions. Ethical awareness is imperative in successfully addressing patient needs. Evidence suggests that the ethical import of everyday issues may often go unnoticed by nurses in practice. Assessing nurses' ethical awareness is a necessary first step in preparing nurses to identify and manage ethical issues in the highly dynamic critical care environment. A cross-sectional design was used in two phases of instrument development. Using Rasch principles, an item bank representing nursing actions was developed (33 items). Content validity testing was performed. Eighteen items were selected for face validity testing. Two rounds of operational testing were performed with critical care nurses in Boston between February and April 2017. A Rasch analysis suggests sufficient item invariance across samples and sufficient construct validity. The analysis further demonstrates a progression of items uniformly along a hierarchical continuum; items that match respondent ability levels; response categories that are sufficiently used; and adequate internal consistency. Mean ethical awareness scores were in the low/moderate range. The results suggest the Ethical Awareness Scale is a psychometrically sound, reliable and valid measure of ethical awareness in critical care nurses. © 2018 John Wiley & Sons Ltd.
Source recognition by stimulus content in the MTL.

PubMed

Park, Heekyeong; Abellanoza, Cheryl; Schaeffer, James; Gandy, Kellen

2014-03-17

Source memory is considered to be the cornerstone of episodic memory that enables us to discriminate similar but different events. In the present fMRI study, we investigated whether neural correlates of source retrieval differed by stimulus content in the medial temporal lobe (MTL) when the item and context had been integrated as a perceptually unitized entity. Participants were presented with a list of items either in verbal or pictorial form overlaid on a colored square and instructed to integrate both the item and context into a single image. At test, participants judged the study status of test items and the color in which studied items were presented. Source recognition invariant of stimulus content elicited retrieval activity in both the left anterior hippocampus extending to the perirhinal cortex and the right posterior hippocampus. Word-selective source recognition was related to activity in the left perirhinal cortex, whereas picture-selective source recognition was identified in the left posterior hippocampus. Neural activity sensitive to novelty detection common to both words and pictures was found in the left anterior and right posterior hippocampus. Novelty detection selective to words was associated with the left perirhinal cortex, while activity sensitive to new pictures was identified in the bilateral hippocampus and adjacent MTL cortices, including the parahippocampal, entorhinal, and perirhinal cortices. These findings provide further support for the integral role of the hippocampus both in source recognition and in detection of new stimuli across stimulus content. Additionally, novelty effects in the MTL reveal the integral role of the MTL cortex as the interface for processing new information. Collectively, the present findings demonstrate the importance of the MTL for both previously experienced and novel events. Copyright © 2014 Elsevier B.V. All rights reserved.
Predicting Differential Item Functioning in Cross-Lingual Testing: The Case of a High Stakes Test in the Kyrgyz Republic

ERIC Educational Resources Information Center

Drummond, Todd W.

2011-01-01

Cross-lingual tests are assessment instruments created in one language and adapted for use with another language group. Practitioners and researchers use cross-lingual tests for various descriptive, analytical and selection purposes both in comparative studies across nations and within countries marked by linguistic diversity (Hambleton, 2005).…
EXSPRT: An Expert Systems Approach to Computer-Based Adaptive Testing.

ERIC Educational Resources Information Center

Frick, Theodore W.; And Others

Expert systems can be used to aid decision making. A computerized adaptive test (CAT) is one kind of expert system, although it is not commonly recognized as such. A new approach, termed EXSPRT, was devised that combines expert systems reasoning and sequential probability ratio test stopping rules. EXSPRT-R uses random selection of test items,…
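A sequential probability ratio test (SPRT) stopping rule of the kind mentioned above can be sketched as follows. This is plain Wald SPRT with one hypothesized success probability per class, applied to items taken in the order given; it does not reproduce EXSPRT's rule base, and the function name, the probabilities 0.8 and 0.5, and the error rates are assumptions chosen for illustration.

```python
import math


def sprt_decision(responses, p_master=0.8, p_nonmaster=0.5, alpha=0.05, beta=0.05):
    """Classify an examinee from a sequence of right/wrong responses.

    responses   : iterable of booleans (True = correct), in administration order
    p_master    : assumed probability that a master answers an item correctly
    p_nonmaster : assumed probability that a nonmaster answers correctly
    alpha, beta : tolerated false-master and false-nonmaster error rates
    Returns (decision, number_of_items_used).
    """
    upper = math.log((1 - beta) / alpha)   # cross this: decide "master"
    lower = math.log(beta / (1 - alpha))   # cross this: decide "nonmaster"
    llr = 0.0                              # running log-likelihood ratio
    used = 0
    for correct in responses:
        used += 1
        if correct:
            llr += math.log(p_master / p_nonmaster)
        else:
            llr += math.log((1 - p_master) / (1 - p_nonmaster))
        if llr >= upper:
            return "master", used
        if llr <= lower:
            return "nonmaster", used
    return "undecided", used               # item supply ran out before a boundary
```

With these assumed values the test stops after a short run of consistent responses and keeps going when the evidence is mixed, which is the efficiency argument usually made for sequential stopping rules in adaptive testing.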
Using Optimal Test Assembly Methods for Shortening Patient-Reported Outcome Measures: Development and Validation of the Cochin Hand Function Scale-6: A Scleroderma Patient-Centered Intervention Network Cohort Study.

PubMed

Levis, Alexander W; Harel, Daphna; Kwakkenbos, Linda; Carrier, Marie-Eve; Mouthon, Luc; Poiraudeau, Serge; Bartlett, Susan J; Khanna, Dinesh; Malcarne, Vanessa L; Sauve, Maureen; van den Ende, Cornelia H M; Poole, Janet L; Schouffoer, Anne A; Welling, Joep; Thombs, Brett D

2016-11-01

To develop and validate a short form of the Cochin Hand Function Scale (CHFS), which measures hand disability, for use in systemic sclerosis, using objective criteria and reproducible techniques. Responses on the 18-item CHFS were obtained from English-speaking patients enrolled in the Scleroderma Patient-Centered Intervention Network Cohort. CHFS unidimensionality was verified using confirmatory factor analysis, and an item response theory model was fit to CHFS items. Optimal test assembly (OTA) methods identified a maximally precise short form for each possible form length between 1 and 17 items. The final short form selected was the form with the least number of items that maintained statistically equivalent convergent validity, compared to the full-length CHFS, with the Health Assessment Questionnaire (HAQ) disability index (DI) and the physical function domain of the 29-item Patient-Reported Outcomes Measurement Information System (PROMIS-29). There were 601 patients included. A 6-item short form of the CHFS (CHFS-6) was selected. The CHFS-6 had a Cronbach's alpha of 0.93. Correlations of the CHFS-6 summed score with HAQ DI (r = 0.79) and PROMIS-29 physical function (r = -0.54) were statistically equivalent to the CHFS (r = 0.81 and r = -0.56). The correlation with the full CHFS was high (r = 0.98). The OTA procedure generated a valid short form of the CHFS with minimal loss of information compared to the full-length form. The OTA method used was based on objective, prespecified criteria, but should be further studied for viability as a general procedure for shortening patient-reported outcome measures in health research. © 2016, American College of Rheumatology.

Do healthier foods cost more in Saudi Arabia than less healthier options?

PubMed Central

Gosadi, Ibrahim M.; Alshehri, Muner A.; Alawad, Saud H.

2016-01-01

Objectives: To investigate whether healthy foods in Saudi Arabia cost more compared with less healthy options. Method: This is a cross-sectional study conducted in Riyadh, Saudi Arabia during June and July 2015. The study targeted well-known market chains in the city of Riyadh. The selection of food items was purposive to include healthy and less healthy food items in each category. Price, caloric value, salt, fat, sugar, and fiber contents for each food item were collected. To test for the correlation between nutritional contents and average price, Spearman’s correlation coefficients were calculated. The Mann-Whitney U test was used to test for the presence of average price difference between healthy and less healthy food items. Results: A total of 162 food items were collected. Sixty-six food items were classified as healthy and 96 as less healthy. The calculated correlation coefficients indicate an association between increased cost of food and increased caloric values (0.649, p=0.0000001), increased fat content (0.610, p=0.0000003), and increased salt contents (0.273, p=0.001). Prices of food items with higher fiber contents showed a weaker association (0.191, p=0.015). The overall average cost of healthy food was approximately 10 Saudi riyals cheaper than less healthy food (p=0.000001). Conclusion: The findings of the study suggest that the cost of healthy food is lower than that of less healthy items in the Saudi market. PMID:27570859
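The two rank-based statistics named in the abstract above can be sketched with the standard library alone. The code computes Spearman's coefficient as a Pearson correlation on average ranks and returns only the Mann-Whitney U statistic, not its p-value; the function names are hypothetical and no data from the study are used.

```python
from math import sqrt


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation between two equal-length numeric lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def mann_whitney_u(a, b):
    """U statistic for sample `a`: its rank sum in the pooled data minus n_a(n_a+1)/2."""
    ranks = average_ranks(list(a) + list(b))
    rank_sum_a = sum(ranks[:len(a)])
    return rank_sum_a - len(a) * (len(a) + 1) / 2
```

Calling `spearman(prices, calories)` on two equal-length lists gives the coefficient; for significance testing a statistics package would normally be used rather than this sketch.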
On the Issue of Item Selection in Computerized Adaptive Testing with Response Times

ERIC Educational Resources Information Center

Veldkamp, Bernard P.

2016-01-01

Many standardized tests are now administered via computer rather than paper-and-pencil format. The computer-based delivery mode brings with it certain advantages. One advantage is the ability to adapt the difficulty level of the test to the ability level of the test taker in what has been termed computerized adaptive testing (CAT). A second…
AAHPER Youth Fitness Test Manual. Revised Edition.

ERIC Educational Resources Information Center

American Alliance for Health, Physical Education, and Recreation, Washington, DC.

The Revised AAHPER Youth Fitness Test is a battery of six test items designed to give a measure of physical fitness for boys and girls in grades 5-12. The tests were selected to evaluate specific aspects of physical status which, taken together, give an overall picture of fitness. Tests can be given in the gymnasium or outdoors. They are as…
A Comparison of a Bayesian and a Maximum Likelihood Tailored Testing Procedure.

ERIC Educational Resources Information Center

McKinley, Robert L.; Reckase, Mark D.

A study was conducted to compare tailored testing procedures based on a Bayesian ability estimation technique and on a maximum likelihood ability estimation technique. The Bayesian tailored testing procedure selected items so as to minimize the posterior variance of the ability estimate distribution, while the maximum likelihood tailored testing…
Indicators of Family Care for Development for Use in Multicountry Surveys

PubMed Central

Kariger, Patricia; Engle, Patrice; Britto, Pia M. Rebello; Sywulka, Sara M.; Menon, Purnima

2012-01-01

Indicators of family care for development are essential for ascertaining whether families are providing their children with an environment that leads to positive developmental outcomes. This project aimed to develop indicators from a set of items, measuring family care practices and resources important for caregiving, for use in epidemiologic surveys in developing countries. A mixed method (quantitative and qualitative) design was used for item selection and evaluation. Qualitative and quantitative analyses were conducted to examine the validity of candidate items in several country samples. Qualitative methods included the use of global expert panels to identify and evaluate the performance of each candidate item as well as in-country focus groups to test the content validity of the items. The quantitative methods included analyses of item-response distributions, using bivariate techniques. The selected items measured two family care practices (support for learning/stimulating environment and limit-setting techniques) and caregiving resources (adequacy of the alternate caregiver when the mother worked). Six play-activity items, indicative of support for learning/stimulating environment, were included in the core module of UNICEF's Multiple Indicator Cluster Survey 3. The other items were included in optional modules. This project provided, for the first time, a globally relevant set of items for assessing family care practices and resources in epidemiological surveys. These items have multiple uses, including national monitoring and cross-country comparisons of the status of family care for development used globally. The obtained information will reinforce attention to efforts to improve the support for development of children. PMID:23304914
Evaluating HIV Knowledge Questionnaires Among Men Who Have Sex with Men: A Multi-Study Item Response Theory Analysis.

PubMed

Janulis, Patrick; Newcomb, Michael E; Sullivan, Patrick; Mustanski, Brian

2018-01-01

Knowledge about the transmission, prevention, and treatment of HIV remains a critical element in psychosocial models of HIV risk behavior and is commonly used as an outcome in HIV prevention interventions. However, most HIV knowledge questions have not undergone rigorous psychometric testing such as using item response theory. The current study used data from six studies of men who have sex with men (MSM; n = 3565) to (1) examine the item properties of HIV knowledge questions, (2) test for differential item functioning on commonly studied characteristics (i.e., age, race/ethnicity, and HIV risk behavior), (3) select items with the optimal item characteristics, and (4) leverage this combined dataset to examine the potential moderating effect of age on the relationship between condomless anal sex (CAS) and HIV knowledge. Findings indicated that existing questions tend to poorly differentiate those with higher levels of HIV knowledge, but items were relatively robust across diverse individuals. Furthermore, age moderated the relationship between CAS and HIV knowledge with older MSM having the strongest association. These findings suggest that additional items are required in order to capture a more nuanced understanding of HIV knowledge and that the association between CAS and HIV knowledge may vary by age.
Age-related differences in agenda-driven monitoring of format and task information

PubMed Central

Mitchell, Karen J.; Ankudowich, Elizabeth; Durbin, Kelly A.; Greene, Erich J.; Johnson, Marcia K.

2013-01-01

Age-related source memory deficits may arise, in part, from changes in the agenda-driven processes that control what features of events are relevant during remembering. Using fMRI, we compared young and older adults on tests assessing source memory for format (picture, word) or encoding task (self-, other-referential), as well as on old-new recognition. Behaviorally, relative to old-new recognition, older adults showed disproportionate and equivalent deficits on both source tests compared to young adults. At encoding, both age groups showed expected activation associated with format in posterior visual processing areas, and with task in medial prefrontal cortex. At test, the groups showed similar selective, agenda-related activity in these representational areas. There were, however, marked age differences in the activity of control regions in lateral and medial prefrontal cortex and lateral parietal cortex. Results of correlation analyses were consistent with the idea that young adults had greater trial-by-trial agenda-driven modulation of activity (i.e., greater selectivity) than did older adults in representational regions. Thus, under selective remembering conditions where older adults showed clear differential regional activity in representational areas depending on type of test, they also showed evidence of disrupted frontal and parietal function and reduced item-by-item modulation of test-appropriate features. This pattern of results is consistent with an age-related deficit in the engagement of selective reflective attention. PMID:23357375
The Focus of Attention in Visual Working Memory: Protection of Focused Representations and Its Individual Variation.

PubMed

Heuer, Anna; Schubö, Anna

2016-01-01

Visual working memory can be modulated according to changes in the cued task relevance of maintained items. Here, we investigated the mechanisms underlying this modulation. In particular, we studied the consequences of attentional selection for selected and unselected items, and the role of individual differences in the efficiency with which attention is deployed. To this end, performance in a visual working memory task as well as the CDA/SPCN and the N2pc, ERP components associated with visual working memory and attentional processes, were analysed. Selection during the maintenance stage was manipulated by means of two successively presented retrocues providing spatial information as to which items were most likely to be tested. Results show that attentional selection serves to robustly protect relevant representations in the focus of attention while unselected representations which may become relevant again still remain available. Individuals with larger retrocueing benefits showed higher efficiency of attentional selection, as indicated by the N2pc, and showed stronger maintenance-associated activity (CDA/SPCN). The findings add to converging evidence that focused representations are protected, and highlight the flexibility of visual working memory, in which information can be weighted according to its relevance.
Development of the PROMIS positive emotional and sensory expectancies of smoking item banks.

PubMed

Tucker, Joan S; Shadel, William G; Edelen, Maria Orlando; Stucky, Brian D; Li, Zhen; Hansen, Mark; Cai, Li

2014-09-01

The positive emotional and sensory expectancies of cigarette smoking include improved cognitive abilities, positive affective states, and pleasurable sensorimotor sensations. This paper describes development of Positive Emotional and Sensory Expectancies of Smoking item banks that will serve to standardize the assessment of this construct among daily and nondaily cigarette smokers. Data came from daily (N = 4,201) and nondaily (N = 1,183) smokers who completed an online survey. To identify a unidimensional set of items, we conducted item factor analyses, item response theory analyses, and differential item functioning analyses. Additionally, we evaluated the performance of fixed-item short forms (SFs) and computer adaptive tests (CATs) to efficiently assess the construct. Eighteen items were included in the item banks (15 common across daily and nondaily smokers, 1 unique to daily, 2 unique to nondaily). The item banks are strongly unidimensional, highly reliable (reliability = 0.95 for both), and perform similarly across gender, age, and race/ethnicity groups. A SF common to daily and nondaily smokers consists of 6 items (reliability = 0.86). Results from simulated CATs indicated that, on average, fewer than 8 items are needed to assess the construct with adequate precision using the item banks. These analyses identified a new set of items that can assess the positive emotional and sensory expectancies of smoking in a reliable and standardized manner. Considerable efficiency in assessing this construct can be achieved by using the item bank SF, employing computer adaptive tests, or selecting subsets of items tailored to specific research or clinical purposes. © The Author 2014. Published by Oxford University Press on behalf of the Society for Research on Nicotine and Tobacco. All rights reserved.
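As a rough illustration of how a computer adaptive test can get by with only a handful of items, the sketch below picks the next item by maximum Fisher information at the current trait estimate. It assumes a dichotomous two-parameter logistic (2PL) model, which is a simplification: the abstract does not state the IRT model or the selection rule used in the simulated CATs. The item bank, item ids, and function names are hypothetical.

```python
import math

def p_endorse(theta, a, b):
    """2PL probability of endorsing an item at trait level theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at theta: a^2 * P * (1 - P)."""
    p = p_endorse(theta, a, b)
    return a * a * p * (1.0 - p)

def select_next_item(theta, bank, administered):
    """Return the id of the not-yet-administered item with maximum information."""
    candidates = [item_id for item_id in bank if item_id not in administered]
    return max(candidates, key=lambda i: item_information(theta, *bank[i]))

# Hypothetical bank: item id -> (discrimination a, location b)
bank = {"item1": (1.0, -1.0), "item2": (1.5, 0.0), "item3": (1.0, 1.5)}
next_item = select_next_item(theta=0.0, bank=bank, administered={"item1"})
print(next_item)
```

In a full CAT loop the trait estimate would be updated after each response and the loop would stop once the standard error falls below a chosen threshold; both steps are omitted here.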
The short- and long-term fates of memory items retained outside the focus of attention

PubMed Central

Eichenbaum, Adam S.; Starrett, Michael J.; Rose, Nathan S.; Emrich, Stephen M.; Postle, Bradley R.

2015-01-01

When a test of working memory (WM) requires the retention of multiple items, a subset of them can be prioritized. Recent studies have shown that, although prioritized (i.e., attended) items are associated with active neural representations, unprioritized (i.e., unattended) memory items can be retained in WM despite the absence of such active representations, and with no decrement in their recognition if they are cued later in the trial. These findings raise two intriguing questions about the nature of the short-term retention of information outside the focus of attention. First, when the focus of attention shifts from items in WM, is there a loss of fidelity for those unattended memory items? Second, could the retention of unattended memory items be accomplished by long-term memory mechanisms? We addressed the first question by comparing the precision of recall of attended versus unattended memory items, and found a significant decrease in precision for unattended memory items, reflecting a degradation in the quality of those representations. We addressed the second question by asking subjects to perform a WM task, followed by a surprise memory test for the items that they had seen in the WM task. Long-term memory for unattended memory items from the WM task was not better than memory for items that had remained selected by the focus of attention in the WM task. These results show that unattended WM representations are degraded in quality and are not preferentially represented in long-term memory, as compared to attended memory items. PMID:25472902
The short- and long-term fates of memory items retained outside the focus of attention.

PubMed

LaRocque, Joshua J; Eichenbaum, Adam S; Starrett, Michael J; Rose, Nathan S; Emrich, Stephen M; Postle, Bradley R

2015-04-01

When a test of working memory (WM) requires the retention of multiple items, a subset of them can be prioritized. Recent studies have shown that, although prioritized (i.e., attended) items are associated with active neural representations, unprioritized (i.e., unattended) memory items can be retained in WM despite the absence of such active representations, and with no decrement in their recognition if they are cued later in the trial. These findings raise two intriguing questions about the nature of the short-term retention of information outside the focus of attention. First, when the focus of attention shifts from items in WM, is there a loss of fidelity for those unattended memory items? Second, could the retention of unattended memory items be accomplished by long-term memory mechanisms? We addressed the first question by comparing the precision of recall of attended versus unattended memory items, and found a significant decrease in precision for unattended memory items, reflecting a degradation in the quality of those representations. We addressed the second question by asking subjects to perform a WM task, followed by a surprise memory test for the items that they had seen in the WM task. Long-term memory for unattended memory items from the WM task was not better than memory for items that had remained selected by the focus of attention in the WM task. These results show that unattended WM representations are degraded in quality and are not preferentially represented in long-term memory, as compared to attended memory items.
Qualitative modeling of the decision-making process using electrooculography.

PubMed

Zargari Marandi, Ramtin; Sabzpoushan, S H

2015-12-01

A novel method based on electrooculography (EOG) has been introduced in this work to study the decision-making process. An experiment was designed and implemented wherein subjects were asked to choose between two items from the same category that were presented within a limited time. The EOG and voice signals of the subjects were recorded during the experiment. A calibration task was performed to map the EOG signals to their corresponding gaze positions on the screen by using an artificial neural network. To analyze the data, 16 parameters were extracted from the response time and EOG signals of the subjects. Evaluation and comparison of the parameters, together with subjects' choices, revealed functional information. On the basis of this information, subjects switched their eye gazes between items about three times on average. We also found, according to statistical hypothesis testing (that is, a t test, t(10) = 71.62, SE = 1.25, p < .0001), that the correspondence rate of a subject's gaze at the moment of selection with the selected item was significant. Ultimately, on the basis of these results, we propose a qualitative choice model for the decision-making task.
The Detection and Influence of Problematic Item Content in Ability Tests: An Examination of Sensitivity Review Practices for Personnel Selection Test Development

ERIC Educational Resources Information Center

Grand, James A.; Golubovich, Juliya; Ryan, Ann Marie; Schmitt, Neal

2013-01-01

In organizational and educational practices, sensitivity reviews are commonly advocated techniques for reducing test bias and enhancing fairness. In the present paper, results from two studies are reported which investigate how effective individuals are at detecting problematic test content and the influence such content has on important testing…
Assessment of mastication in healthy children and children with cerebral palsy: a validity and consistency study.

PubMed

Remijn, L; Speyer, R; Groen, B E; Holtus, P C M; van Limbeek, J; Nijhuis-van der Sanden, M W G

2013-05-01

The aim of this study was to develop the Mastication Observation and Evaluation instrument for observing and assessing the chewing ability of children eating solid and lumpy foods. This study describes the process of item definition and item selection and reports the content validity, reproducibility and consistency of the instrument. In the developmental phase, 15 experienced speech therapists assessed item relevance and descriptions over three Delphi rounds. Potential items were selected based on the results from a literature review. At the initial Delphi round, 17 potential items were included. After three Delphi rounds, 14 items that were regarded as providing distinctive value in assessment of mastication (consensus >75%) were included in the Mastication Observation and Evaluation instrument. To test item reproducibility and consistency, two experts and five students evaluated video recordings of 20 children (10 children with cerebral palsy aged 29-65 months and 10 healthy children aged 11-42 months) eating bread and a biscuit. Reproducibility was estimated by means of the intraclass correlation coefficient (ICC). With the exception of one item concerning chewing duration, all items showed good to excellent intra-observer agreement (ICC students: 0.73-1.0). With the exception of chewing duration and number of swallows, inter-observer agreement was fair to excellent for all items (ICC experts: 0.68-1.0 and ICC students: 0.42-1.0). Results indicate that this tool is a feasible instrument and could be used in clinical practice after further research is completed on the reliability of the tool. © 2013 Blackwell Publishing Ltd.
A disease-specific quality of life instrument for non-alcoholic fatty liver disease and non-alcoholic steatohepatitis: CLDQ-NAFLD.

PubMed

Younossi, Zobair M; Stepanova, Maria; Henry, Linda; Racila, Andrei; Lam, Brian; Pham, Huong T; Hunt, Sharon

2017-08-01

Non-alcoholic fatty liver disease and non-alcoholic steatohepatitis are the most common causes of chronic liver disease with known negative impact on patients' health-related quality of life. Our aim was to validate a disease-specific health-related quality of life instrument useful for efficacy trials involving patients with non-alcoholic fatty liver disease and non-alcoholic steatohepatitis. From a long item selection questionnaire, we selected relevant items which, by factor analysis, were grouped into domains constituting Chronic Liver Disease Questionnaire-Non-Alcoholic Fatty Liver Disease version. The developed instrument was subjected to internal validity, test-retest reliability and construct validity assessment using standard methods. For development of the Chronic Liver Disease Questionnaire-Non-Alcoholic Fatty Liver Disease version instrument, a 75-item-long item selection questionnaire was administered to 25 patients with non-alcoholic fatty liver disease. After item reduction, factor analysis found that 98.7% of variance in the remaining items would be explained by six factors. Thus, the resulting Chronic Liver Disease Questionnaire-Non-Alcoholic Fatty Liver Disease version instrument had 36 items grouped into six domains: Abdominal Symptoms, Activity, Emotional, Fatigue, Systemic Symptoms, and Worry. The independent validation group included another 104 patients with non-alcoholic fatty liver disease. The Cronbach's alphas of 0.74-0.90 suggested good to excellent internal consistency of the domains. Furthermore, the presence of obesity and history of depression were discriminated best by Chronic Liver Disease Questionnaire-Non-Alcoholic Fatty Liver Disease version scores (P<.05). The domains' correlations with the most relevant domains of Short Form-36 exceeded 0.70. Test-retest reliability in a subgroup of patients (N=27) demonstrated no significant within-patient variability with multiple administrations (all median differences were zero, all P>.15, intraclass correlations .76-.88). The Chronic Liver Disease Questionnaire-Non-Alcoholic Fatty Liver Disease version is a disease-specific health-related quality of life instrument developed and validated using an established methodology and useful for clinical trials of non-alcoholic fatty liver disease. © 2017 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
14 CFR Appendix B to Part 147 - General Curriculum Subjects

Code of Federal Regulations, 2012 CFR

2012-01-01

... subject heading indicates the level of proficiency at which that item must be taught. Teaching level a... fittings. e. materials and processes (1) 14. Identify and select appropriate nondestructive testing methods...
14 CFR Appendix B to Part 147 - General Curriculum Subjects

Code of Federal Regulations, 2014 CFR

2014-01-01

... subject heading indicates the level of proficiency at which that item must be taught. Teaching level a... fittings. e. materials and processes (1) 14. Identify and select appropriate nondestructive testing methods...
14 CFR Appendix B to Part 147 - General Curriculum Subjects

Code of Federal Regulations, 2011 CFR

2011-01-01

... subject heading indicates the level of proficiency at which that item must be taught. Teaching level a... fittings. e. materials and processes (1) 14. Identify and select appropriate nondestructive testing methods...
14 CFR Appendix B to Part 147 - General Curriculum Subjects

Code of Federal Regulations, 2010 CFR

2010-01-01

... subject heading indicates the level of proficiency at which that item must be taught. Teaching level a... fittings. e. materials and processes (1) 14. Identify and select appropriate nondestructive testing methods...
14 CFR Appendix B to Part 147 - General Curriculum Subjects

Code of Federal Regulations, 2013 CFR

2013-01-01

... subject heading indicates the level of proficiency at which that item must be taught. Teaching level a... fittings. e. materials and processes (1) 14. Identify and select appropriate nondestructive testing methods...

Development and preliminary evaluation of a music-based attention assessment for patients with traumatic brain injury.

PubMed

Jeong, Eunju; Lesiuk, Teresa L

2011-01-01

Impairments in attention are commonly seen in individuals with traumatic brain injury (TBI). While visual attention assessment measurements have been rigorously developed and frequently used in cognitive neurorehabilitation, there is a paucity of auditory attention assessment measurements for patients with TBI. The purpose of this study was to field test a researcher-developed Music-based Attention Assessment (MAA), a melodic contour identification test designed to assess three different types of attention (i.e., sustained attention, selective attention, and divided attention), for patients with TBI. Additionally, this study aimed to evaluate the readability and comprehensibility of the test items and to examine the preliminary psychometric properties of the scale and test items. Fifteen patients diagnosed with TBI completed 3 different series of tasks in which they were required to identify melodic contours. The resulting data showed that (a) test items in each of the 3 subtests had an easy to moderate level of item difficulty and an acceptable to high level of item discrimination, (b) the musical characteristics (i.e., contour, congruence, and pitch interference) were associated with the level of item difficulty, and (c) the internal consistency of the MAA as computed by Cronbach's alpha was .95. Subsequent studies using a larger sample of typical participants, along with individuals with TBI, are needed to confirm construct validity and internal consistency of the MAA. In addition, the authors recommend examination of criterion validity of the MAA as correlated with current neuropsychological attention assessment measurements.
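Item difficulty and item discrimination, as reported above, are classical test theory statistics. A minimal sketch of one common way to compute them follows, assuming items scored 0/1, difficulty taken as the proportion correct, and discrimination taken as the correlation between an item and the total of the remaining items. The abstract does not say which indices the authors used, and the response matrix is made up for illustration.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def item_statistics(responses):
    """responses: one list of 0/1 item scores per examinee."""
    n_items = len(responses[0])
    stats = []
    for j in range(n_items):
        item = [row[j] for row in responses]
        # Rest score: total with item j removed, so the item is not
        # correlated with itself.
        rest = [sum(row) - row[j] for row in responses]
        stats.append({
            "difficulty": mean(item),            # proportion correct
            "discrimination": pearson(item, rest),
        })
    return stats

# Hypothetical data: 5 examinees x 3 items
responses = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0], [1, 1, 1]]
item_stats = item_statistics(responses)
for j, s in enumerate(item_stats):
    print(j, round(s["difficulty"], 2), round(s["discrimination"], 2))
```

Note that a higher "difficulty" value here means an easier item, since it is the proportion of examinees answering correctly.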
A practical guide to assessing clinical decision-making skills using the key features approach.

PubMed

Farmer, Elizabeth A; Page, Gordon

2005-12-01

This paper in the series on professional assessment provides a practical guide to writing key features problems (KFPs). Key features problems test clinical decision-making skills in written or computer-based formats. They are based on the concept of critical steps or 'key features' in decision making and represent an advance on the older, less reliable patient management problem (PMP) formats. The practical steps in writing these problems are discussed and illustrated by examples. Steps include assembling problem-writing groups, selecting a suitable clinical scenario or problem and defining its key features, writing the questions, selecting question response formats, preparing scoring keys, reviewing item quality and item banking. The KFP format provides educators with a flexible approach to testing clinical decision-making skills with demonstrated validity and reliability when constructed according to the guidelines provided.
Development and psychometric testing of the Canine Owner-Reported Quality of Life questionnaire, an instrument designed to measure quality of life in dogs with cancer.

PubMed

Giuffrida, Michelle A; Brown, Dorothy Cimino; Ellenberg, Susan S; Farrar, John T

2018-05-01

OBJECTIVE: To describe development and initial psychometric testing of an owner-reported questionnaire designed to standardize measurement of general quality of life (QOL) in dogs with cancer. DESIGN: Key-informant interviews, questionnaire development, and field trial. SAMPLE: Owners of 25 dogs with cancer for item development and pretesting and owners of 90 dogs with cancer for reliability and validity testing. PROCEDURES: Standard methods for development and testing of questionnaire instruments intended to measure subjective states were used. Items were generated, selected, scaled, and pretested for content, meaning, and readability. Response items were evaluated with exploratory factor analysis and by assessing internal consistency (Cronbach α) and convergence with global QOL as determined with a visual analog scale. Preliminary tests of stability and responsiveness were performed. RESULTS: The final questionnaire, which was named the Canine Owner-Reported Quality of Life (CORQ) questionnaire, contained 17 items related to observable behaviors commonly used by owners to evaluate QOL in their dogs. Several items pertaining to physical symptoms performed poorly and were omitted. The 17 items were assigned to 4 factors (vitality, companionship, pain, and mobility) on the basis of the items they contained. The CORQ questionnaire and its factors had high internal consistency (Cronbach α = 0.68 to 0.90) and moderate to strong correlations (r = 0.49 to 0.71) with global QOL as measured on a visual analog scale. Preliminary testing indicated good test-retest reliability and responsiveness to improvements in overall QOL. CONCLUSIONS AND CLINICAL RELEVANCE: The CORQ questionnaire was a valid, reliable owner-reported questionnaire that measured general QOL in dogs with cancer and showed promise as a clinical trial outcome measure for quantifying changes in individual dog QOL occurring in response to cancer treatment and progression.
Testing for lead in toys at day care centers.

PubMed

Sanders, Martha; Stolz, Julie; Chacon-Baker, Ashley

2013-01-01

Exposure to lead-based paint or material has been found to impact children's cognitive and behavioral development at blood lead levels far below current standards. The purpose of the project was to screen for lead in toy items in daycare centers in order to raise awareness of indoor environmental lead exposures and minimize lead-based exposures for children. Occupational therapy students in a service learning class tested for lead in ten daycare or public centers using the XRF Thermo Scientific Niton XL3t, a method accepted by the Consumer Product Safety Commission (CPSC). A total of 460 items were tested over a two-month period for an average of 66 toys per setting. Fifty-six (56) items tested > 100 ppm, which represented 12% of the entire sample. Items with high lead levels included selected toys constructed with lead-based paint, lead metals, plastics using lead as a color enhancer, and decorative objects. While the actual number of lead-based products is small, the cumulative exposure or habitual use may pose an unnecessary risk to children. Indoor exposures occurred for all day care centers regardless of socio-economic levels. Recommendations to minimize exposures are provided.
Better assessment of physical function: item improvement is neglected but essential

PubMed Central

2009-01-01

Introduction: Physical function is a key component of patient-reported outcome (PRO) assessment in rheumatology. Modern psychometric methods, such as Item Response Theory (IRT) and Computerized Adaptive Testing, can materially improve measurement precision at the item level. We present the qualitative and quantitative item-evaluation process for developing the Patient Reported Outcomes Measurement Information System (PROMIS) Physical Function item bank. Methods: The process was stepwise: we searched extensively to identify extant Physical Function items and then classified and selectively reduced the item pool. We evaluated retained items for content, clarity, relevance and comprehension, reading level, and translation ease by experts and patient surveys, focus groups, and cognitive interviews. We then assessed items by using classic test theory and IRT, used confirmatory factor analyses to estimate item parameters, and graded response modeling for parameter estimation. We retained the 20 Legacy (original) Health Assessment Questionnaire Disability Index (HAQ-DI) and the 10 SF-36's PF-10 items for comparison. Subjects were from rheumatoid arthritis, osteoarthritis, and healthy aging cohorts (n = 1,100) and a national Internet sample of 21,133 subjects. Results: We identified 1,860 items. After qualitative and quantitative evaluation, 124 newly developed PROMIS items composed the PROMIS item bank, which included revised Legacy items with good fit that met IRT model assumptions. Results showed that the clearest and best-understood items were simple, in the present tense, and straightforward. Basic tasks (like dressing) were more relevant and important versus complex ones (like dancing). Revised HAQ-DI and PF-10 items with five response options had higher item-information content than did comparable original Legacy items with fewer response options. IRT analyses showed that the Physical Function domain satisfied general criteria for unidimensionality with one-, two-, three-, and four-factor models having comparable model fits. Correlations between factors in the test data sets were > 0.90. Conclusions: Item improvement must underlie attempts to improve outcome assessment. The clear, personally important and relevant, ability-framed items in the PROMIS Physical Function item bank perform well in PRO assessment. They will benefit from further study and application in a wider variety of rheumatic diseases in diverse clinical groups, including those at the extremes of physical functioning, and in different administration modes. PMID:20015354
Better assessment of physical function: item improvement is neglected but essential.

PubMed

Bruce, Bonnie; Fries, James F; Ambrosini, Debbie; Lingala, Bharathi; Gandek, Barbara; Rose, Matthias; Ware, John E

2009-01-01

Physical function is a key component of patient-reported outcome (PRO) assessment in rheumatology. Modern psychometric methods, such as Item Response Theory (IRT) and Computerized Adaptive Testing, can materially improve measurement precision at the item level. We present the qualitative and quantitative item-evaluation process for developing the Patient Reported Outcomes Measurement Information System (PROMIS) Physical Function item bank. The process was stepwise: we searched extensively to identify extant Physical Function items and then classified and selectively reduced the item pool. We evaluated retained items for content, clarity, relevance and comprehension, reading level, and translation ease by experts and patient surveys, focus groups, and cognitive interviews. We then assessed items by using classic test theory and IRT, used confirmatory factor analyses to estimate item parameters, and graded response modeling for parameter estimation. We retained the 20 Legacy (original) Health Assessment Questionnaire Disability Index (HAQ-DI) and the 10 SF-36's PF-10 items for comparison. Subjects were from rheumatoid arthritis, osteoarthritis, and healthy aging cohorts (n = 1,100) and a national Internet sample of 21,133 subjects. We identified 1,860 items. After qualitative and quantitative evaluation, 124 newly developed PROMIS items composed the PROMIS item bank, which included revised Legacy items with good fit that met IRT model assumptions. Results showed that the clearest and best-understood items were simple, in the present tense, and straightforward. Basic tasks (like dressing) were more relevant and important versus complex ones (like dancing). Revised HAQ-DI and PF-10 items with five response options had higher item-information content than did comparable original Legacy items with fewer response options. IRT analyses showed that the Physical Function domain satisfied general criteria for unidimensionality with one-, two-, three-, and four-factor models having comparable model fits. Correlations between factors in the test data sets were > 0.90. Item improvement must underlie attempts to improve outcome assessment. The clear, personally important and relevant, ability-framed items in the PROMIS Physical Function item bank perform well in PRO assessment. They will benefit from further study and application in a wider variety of rheumatic diseases in diverse clinical groups, including those at the extremes of physical functioning, and in different administration modes.
Encoding Processes and Sex-Role Preferences

ERIC Educational Resources Information Center

Kail, Robert V., Jr.; Levine, Laura E.

1976-01-01

Seven- and 10-year-olds were tested on memory and sex-role preference tasks. The memory task was the Wickens release from proactive inhibition paradigm in which short-term recall of words is tested on successive trials. Children selected favorite pictures from an array including masculine and feminine items. (JH)
State Minimum Competency Testing Programs. Resource Catalog. Final Report.

ERIC Educational Resources Information Center

Mills, Gladys H.

Focusing on state-mandated minimum competency testing programs, this annotated bibliography cites 200 items selected from more than 700. The Resource Catalog is intended for state education policy makers and therefore includes resource and study guides; legislative and board action; conference speeches, reports and proceedings; curriculum guides,…
Value-based modulation of memory encoding involves strategic engagement of fronto-temporal semantic processing regions

PubMed Central

Cohen, Michael S.; Rissman, Jesse; Suthana, Nanthia A.; Castel, Alan D.; Knowlton, Barbara J.

2014-01-01

A number of prior fMRI studies have focused on the ways in which the midbrain dopaminergic reward system co-activates with hippocampus to potentiate memory for valuable items. However, another means by which people could selectively remember more valuable to-be-remembered items is to be selective in their use of effective but effortful encoding strategies. To broadly examine the neural mechanisms of value on subsequent memory, we used fMRI to examine how differences in brain activity at encoding as a function of value relate to subsequent free recall for words. Each word was preceded by an arbitrarily assigned point value, and participants went through multiple study-test cycles with feedback on their point total at the end of each list, allowing for sculpting of cognitive strategies. We examined the correlation between value-related modulation of brain activity and participants’ selectivity index, a measure of how close participants were to their optimal point total given the number of items recalled. Greater selectivity scores were associated with greater differences in activation of semantic processing regions, including left inferior frontal gyrus and left posterior lateral temporal cortex, during encoding of high-value words relative to low-value words. Although we also observed value-related modulation within midbrain and ventral striatal reward regions, our fronto-temporal findings suggest that strategic engagement of deep semantic processing may be an important mechanism for selectively encoding valuable items. PMID:24683066
Value-based modulation of memory encoding involves strategic engagement of fronto-temporal semantic processing regions.

PubMed

Cohen, Michael S; Rissman, Jesse; Suthana, Nanthia A; Castel, Alan D; Knowlton, Barbara J

2014-06-01

A number of prior fMRI studies have focused on the ways in which the midbrain dopaminergic reward system coactivates with hippocampus to potentiate memory for valuable items. However, another means by which people could selectively remember more valuable to-be-remembered items is to be selective in their use of effective but effortful encoding strategies. To broadly examine the neural mechanisms of value on subsequent memory, we used fMRI to assess how differences in brain activity at encoding as a function of value relate to subsequent free recall for words. Each word was preceded by an arbitrarily assigned point value, and participants went through multiple study-test cycles with feedback on their point total at the end of each list, allowing for sculpting of cognitive strategies. We examined the correlation between value-related modulation of brain activity and participants' selectivity index, which measures how close participants were to their optimal point total, given the number of items recalled. Greater selectivity scores were associated with greater differences in the activation of semantic processing regions, including left inferior frontal gyrus and left posterior lateral temporal cortex, during the encoding of high-value words relative to low-value words. Although we also observed value-related modulation within midbrain and ventral striatal reward regions, our fronto-temporal findings suggest that strategic engagement of deep semantic processing may be an important mechanism for selectively encoding valuable items.
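The selectivity index is described above only in words. The sketch below shows one formulation that appears in the value-directed remembering literature, in which the points actually earned are compared with a chance baseline and with the ideal score for the same number of recalled items. Treat the formula as an assumption, since the abstract does not give it; the function name and the point values are hypothetical.

```python
def selectivity_index(recalled_values, all_values):
    """(actual - chance) / (ideal - chance) for the number of items recalled.

    recalled_values: point values of the words the participant recalled.
    all_values: point values of every word on the studied list.
    """
    n = len(recalled_values)
    actual = sum(recalled_values)
    # Ideal: the n highest-valued words on the list.
    ideal = sum(sorted(all_values, reverse=True)[:n])
    # Chance: n words recalled without regard to value.
    chance = n * sum(all_values) / len(all_values)
    if ideal == chance:
        return 0.0  # undefined case, e.g. all words worth the same
    return (actual - chance) / (ideal - chance)

# Hypothetical list with point values 1..10; participant recalls three words
list_values = list(range(1, 11))
si = selectivity_index([10, 9, 3], list_values)
print(round(si, 3))
```

Under this formulation a value of 1 means the participant recalled only the highest-valued words, 0 means recall was insensitive to value, and negative values mean low-value words were favored.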
A new item response theory model to adjust data allowing examinee choice

PubMed Central

Costa, Marcelo Azevedo; Braga Oliveira, Rivert Paulo

2018-01-01

In a typical questionnaire testing situation, examinees are not allowed to choose which items they answer because of a technical issue in obtaining satisfactory statistical estimates of examinee ability and item difficulty. This paper introduces a new item response theory (IRT) model that incorporates information from a novel representation of questionnaire data using network analysis. Three scenarios in which examinees select a subset of items were simulated. In the first scenario, the assumptions required to apply the standard Rasch model are met, thus establishing a reference for parameter accuracy. The second and third scenarios include five increasing levels of violating those assumptions. The results show substantial improvements over the standard model in item parameter recovery. Furthermore, the accuracy was closer to the reference in almost every evaluated scenario. To the best of our knowledge, this is the first proposal to obtain satisfactory IRT statistical estimates in the last two scenarios. PMID:29389996
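For reference, the standard Rasch model that serves as the baseline above gives the probability of a correct response as a logistic function of the difference between examinee ability and item difficulty. The sketch below shows that baseline only, not the network-based model the paper proposes. It handles items an examinee did not choose by skipping them in the likelihood, which is an assumption made for illustration: it treats the choice as ignorable missing data. Function names and values are hypothetical.

```python
import math

def rasch_probability(theta, b):
    """P(correct) under the Rasch model for ability theta and difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def log_likelihood(theta, difficulties, responses):
    """Log-likelihood of one examinee's responses.

    responses holds 1 (correct), 0 (incorrect), or None for an item the
    examinee did not choose; unchosen items are skipped.
    """
    total = 0.0
    for b, x in zip(difficulties, responses):
        if x is None:
            continue
        p = rasch_probability(theta, b)
        total += math.log(p) if x == 1 else math.log(1.0 - p)
    return total

# Hypothetical examinee who answered two of three items
difficulties = [-0.5, 0.0, 1.0]
responses = [1, None, 0]
ll = log_likelihood(0.0, difficulties, responses)
print(round(ll, 4))
```

If examinees tend to pick the items they expect to get right, skipping unchosen items in this way can bias the estimates.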
Subjective health literacy: Development of a brief instrument for school-aged children.

PubMed

Paakkari, Olli; Torppa, Minna; Kannas, Lasse; Paakkari, Leena

2016-12-01

The present paper focuses on the measurement of health literacy (HL), which is an important determinant of health and health behaviours. HL starts to develop in childhood and adolescence; hence, there is a need for instruments to monitor HL among younger age groups. These instruments are still rare. The aim of the project reported here was, therefore, to develop a brief, multidimensional, theory-based instrument to measure subjective HL among school-aged children. The development of the instrument covered four phases: item generation based on a conceptual framework; a pilot study (n = 405); test-retest (n = 117); and construction of the instrument (n = 3853). All the samples were taken from Finnish 7th and 9th graders. Initially, 65 items were generated, of which 32 items were selected for the pilot study. After item reduction, the instrument contained 16 items. The test-retest phase produced estimates of stability. In the final phase a 10-item instrument was constructed, referred to as Health Literacy for School-Aged Children (HLSAC). The instrument exhibited a high Cronbach alpha (0.93), and included two items from each of the five predetermined theoretical components (theoretical knowledge, practical knowledge, critical thinking, self-awareness, citizenship). The iterative and validity-driven development process made it possible to construct a brief multidimensional HLSAC instrument. Such instruments are suitable for large-scale studies, and for use with children and adolescents. Validation will require further testing for use in other countries.
Assessing psychological well-being: self-report instruments for the NIH Toolbox.

PubMed

Salsman, John M; Lai, Jin-Shei; Hendrie, Hugh C; Butt, Zeeshan; Zill, Nicholas; Pilkonis, Paul A; Peterson, Christopher; Stoney, Catherine M; Brouwers, Pim; Cella, David

2014-02-01

Psychological well-being (PWB) has a significant relationship with physical and mental health. As a part of the NIH Toolbox for the Assessment of Neurological and Behavioral Function, we developed self-report item banks and short forms to assess PWB. Expert feedback and literature review informed the selection of PWB concepts and the development of item pools for positive affect, life satisfaction, and meaning and purpose. Items were tested with a community-dwelling US Internet panel sample of adults aged 18 and above (N = 552). Classical and item response theory (IRT) approaches were used to evaluate unidimensionality, fit of items to the overall measure, and calibrations of those items, including differential item functioning (DIF). IRT-calibrated item banks were produced for positive affect (34 items), life satisfaction (16 items), and meaning and purpose (18 items). Their psychometric properties were supported based on the results of factor analysis, fit statistics, and DIF evaluation. All banks measured the concepts precisely (reliability ≥0.90) for more than 98% of participants. These adult scales and item banks for PWB provide the flexibility, efficiency, and precision necessary to promote future epidemiological, observational, and intervention research on the relationship of PWB with physical and mental health.
[Development and validation of the Korean patient safety culture scale for nursing homes].

PubMed

Yoon, Sook Hee; Kim, Byungsoo; Kim, Se Young

2013-06-01

The purpose of this study was to develop a tool to evaluate patient safety culture in nursing homes and to test its validity and reliability. A preliminary tool was developed through focus group interviews, content validity tests, and a pilot study. A nationwide survey was conducted from February to April 2011 using self-report questionnaires. Participants were 982 employees in nursing homes. Data were analyzed using Cronbach's alpha, item analysis, factor analysis, and multitrait/multi-item analysis. From the results of the analysis, 27 final items were selected from 49 items on the preliminary tool. Items with low correlation with the total scale were excluded. The 4 factors sorted by factor analysis accounted for 63.4% of the variance in the total scale. The factors were labeled as leadership, organizational system, working attitude, and management practice. Cronbach's alpha for internal consistency was .95 and the range for the 4 factors was from .86 to .93. The results of this study indicate that the Korean Patient Safety Culture Scale has reliability and validity and is suitable for evaluation of patient safety culture in Korean nursing homes.
Assessment of Genetics Understanding. Under What Conditions Do Situational Features Have an Impact on Measures?

NASA Astrophysics Data System (ADS)

Schmiemann, Philipp; Nehm, Ross H.; Tornabene, Robyn E.

2017-12-01

Understanding how situational features of assessment tasks impact reasoning is important for many educational pursuits, notably the selection of curricular examples to illustrate phenomena, the design of formative and summative assessment items, and determination of whether instruction has fostered the development of abstract schemas divorced from particular instances. The goal of our study was to employ an experimental research design to quantify the degree to which situational features impact inferences about participants' understanding of Mendelian genetics. Two participant samples from different educational levels and cultural backgrounds (high school, n = 480; university, n = 444; Germany and USA) were used to test for context effects. A multi-matrix test design was employed, and item packets differing in situational features (e.g., plant, animal, human, fictitious) were randomly distributed to participants in the two samples. Rasch analyses of participant scores from both samples produced good item fit, person reliability, and item reliability and indicated that the university sample displayed stronger performance on the items compared to the high school sample. We found, surprisingly, that in both samples, no significant differences in performance occurred among the animal, plant, and human item contexts, or between the fictitious and "real" item contexts. In the university sample, we were also able to test for differences in performance between genders, among ethnic groups, and by prior biology coursework. None of these factors had a meaningful impact upon performance or context effects. Thus some, but not all, types of genetics problem solving or item formats are impacted by situational features.
The Dominance Concept Inventory: A Tool for Assessing Undergraduate Student Alternative Conceptions about Dominance in Mendelian and Population Genetics

PubMed Central

Perez, Kathryn E.; Price, Rebecca M.

2014-01-01

Despite the impact of genetics on daily life, biology undergraduates understand some key genetics concepts poorly. One concept requiring attention is dominance, which many students understand as a fixed property of an allele or trait and regularly conflate with frequency in a population or selective advantage. We present the Dominance Concept Inventory (DCI), an instrument to gather data on selected alternative conceptions about dominance. During development of the 16-item test, we used expert surveys (n = 12), student interviews (n = 42), and field tests (n = 1763) from introductory and advanced biology undergraduates at public and private, majority- and minority-serving, 2- and 4-yr institutions in the United States. In the final field test across all subject populations (n = 709), item difficulty ranged from 0.08 to 0.84 (0.51 ± 0.049 SEM), while item discrimination ranged from 0.11 to 0.82 (0.50 ± 0.048 SEM). Internal reliability (Cronbach's alpha) was 0.77, while test–retest reliability values were 0.74 (product moment correlation) and 0.77 (intraclass correlation). The prevalence of alternative conceptions in the field tests shows that introductory and advanced students retain confusion about dominance after instruction. All measures support the DCI as a useful instrument for measuring undergraduate biology student understanding and alternative conceptions about dominance. PMID:26086665
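The item statistics reported in the record above (item difficulty, item discrimination, Cronbach's alpha) can be illustrated with a minimal classical test theory sketch. Everything here is an assumption for illustration: the toy 0/1 response matrix is invented, the function names are hypothetical, and discrimination is computed as a corrected item-total correlation, which may differ from the index the DCI authors used.

```python
from statistics import mean, pvariance

def item_difficulty(responses):
    """Proportion of examinees answering each item correctly (0/1 scoring)."""
    n_items = len(responses[0])
    return [mean(row[j] for row in responses) for j in range(n_items)]

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either variable is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0

def item_discrimination(responses):
    """Corrected item-total correlation: item score vs. total of the other items."""
    n_items = len(responses[0])
    result = []
    for j in range(n_items):
        item = [row[j] for row in responses]
        rest = [sum(row) - row[j] for row in responses]
        result.append(pearson(item, rest))
    return result

def cronbach_alpha(responses):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)."""
    k = len(responses[0])
    item_vars = [pvariance([row[j] for row in responses]) for j in range(k)]
    total_var = pvariance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical data: 5 examinees (rows) by 4 dichotomous items (columns).
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
difficulty = item_difficulty(responses)
discrimination = item_discrimination(responses)
alpha = cronbach_alpha(responses)
```

In this convention a higher difficulty value means an easier item, since it is the proportion answering correctly.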
Development of a questionnaire to measure consumers' perceptions of service quality in Australian community pharmacies.

PubMed

Mirzaei, Ardalan; Carter, Stephen R; Chen, Jenny Yimin; Rittsteuer, Claudia; Schneider, Carl R

2018-06-11

Recent changes within community pharmacy have seen a shift towards some pharmacies providing "value-added" services. However, providing high levels of service is resource intensive yet revenues from dispensing are declining. Of significance therefore, is how consumers perceive service quality (SQ). However, at present there are no validated and reliable instruments to measure consumers' perceptions of SQ in Australian community pharmacies. The aim of this study was to build a theory-grounded model of service quality (SQ) in community pharmacies and to create a valid survey instrument to measure consumers' perceptions of service quality. Stage 1 dealt with item generation using theory, prior research and qualitative interviews with pharmacy consumers. Selected items were then subjected to content validity and face validity. Stages 2 and 3 included psychometric testing among English-speaking adult consumers of Australian pharmacies. Exploratory factor analysis was used for item reduction and to explain the domains of SQ. In stage 1, item generation for SQ initially generated 113 items which were then refined, through content and face validity, down to 61 items. In stage 2, after subjecting the questionnaire to psychometric testing on the data from the first pharmacy (n = 374), the use of the primary dimensions of SQ was abandoned leaving 32 items representing 5 domains of SQ. In stage 3, the questionnaire was subject to further testing and item reduction in 3 other pharmacies (n = 320). SQ was best described using 23 items representing 6 domains: 'health and medicines advice', 'relationship quality', 'technical quality', 'environmental quality', 'non-prescription service', and 'health outcomes'. This research presents a theoretically-grounded and robust measurement scale developed for consumer perceptions of SQ in a community pharmacy.
Attention modulates maintenance of representations in visual short-term memory.

PubMed

Kuo, Bo-Cheng; Stokes, Mark G; Nobre, Anna Christina

2012-01-01

Recent studies have shown that selective attention is of considerable importance for encoding task-relevant items into visual short-term memory (VSTM) according to our behavioral goals. However, it is not known whether top-down attentional biases can continue to operate during the maintenance period of VSTM. We used ERPs to investigate this question across two experiments. Specifically, we tested whether orienting attention to a given spatial location within a VSTM representation resulted in modulation of the contralateral delay activity (CDA), a lateralized ERP marker of VSTM maintenance generated when participants selectively encode memory items from one hemifield. In both experiments, retrospective cues during the maintenance period could predict a specific item (spatial retrocue) or multiple items (neutral retrocue) that would be probed at the end of the memory delay. Our results revealed that VSTM performance is significantly improved by orienting attention to the location of a task-relevant item. The behavioral benefit was accompanied by modulation of neural activity involved in VSTM maintenance. Spatial retrocues reduced the magnitude of the CDA, consistent with a reduction in memory load. Our results provide direct evidence that top-down control modulates neural activity associated with maintenance in VSTM, biasing competition in favor of the task-relevant information.
Sociolinguistic and Measurement Considerations for Construction of Armed Services Selection Batteries. Final Report for Period October 1975-June 1977.

ERIC Educational Resources Information Center

Boldt, R. F.; And Others

Test fairness or bias may be defined in many different ways, and the existence of possible bias is difficult to demonstrate. Sociolinguistic analysis may be used to check for fairness or bias in test directions, test content specifications, or test items. Four sociolinguistic principles are held to be relevant for this task: (1) pragmatics--that…
Item-cued directed forgetting of related words and pictures in children and adults: selective rehearsal versus cognitive inhibition.

PubMed

Lehman, E B; McKinley-Pace, M; Leonard, A M; Thompson, D; Johns, K

2001-01-01

The main purpose of this study was to compare the relative importance of selective rehearsal and cognitive inhibition in accounting for developmental changes in the directed-forgetting paradigm developed by R. A. Bjork (1972). In two experiments, children in Grades 2 and 5 and college students were asked to remember some words or pictures and to forget others when items were categorically related. Their memory for both items and the associated remember or forget cues was then tested with recall and recognition. Fifth graders recognized more of the forget-cued words than college students did. The pattern of results suggested that age differences in rehearsal and source monitoring (i.e., remembering whether a word had been cued remember or forget) were better explanatory mechanisms for children's forgetting inefficiencies than retrieval inhibition was. The results are discussed in terms of a multiple process view of inhibition.

Comparing preference assessments: selection- versus duration-based preference assessment procedures.

PubMed

Kodak, Tiffany; Fisher, Wayne W; Kelley, Michael E; Kisamore, April

2009-01-01

In the current investigation, the results of a selection- and a duration-based preference assessment procedure were compared. A Multiple Stimulus With Replacement (MSW) preference assessment [Windsor, J., Piché, L. M., & Locke, P. A. (1994). Preference testing: A comparison of two presentation methods. Research in Developmental Disabilities, 15, 439-455] and a variation of a Free-Operant (FO) preference assessment procedure [Roane, H. S., Vollmer, T. R., Ringdahl, J. E., & Marcus, B. A. (1998). Evaluation of a brief stimulus preference assessment. Journal of Applied Behavior Analysis, 31, 605-620] were conducted with four participants. A reinforcer assessment was conducted to determine which preference assessment procedure identified the item that produced the highest rates of responding. The items identified as most highly preferred were different across preference assessment procedures for all participants. Results of the reinforcer assessment showed that the MSW identified the item that functioned as the most effective reinforcer for two participants.
Development of Officer Selection Battery Forms 3 and 4. Technical Report 603.

ERIC Educational Resources Information Center

Fischl, M. A.; And Others

This report describes the development, standardization, and validation of two parallel forms of the Officer Selection Battery, a 2-hour, group administrable, paper and pencil test for assessing men and women applying for the Reserve Officers Training Corps (ROTC). Based on an extensive job analysis, 1,400 experimental items in 12 job areas were…
Flexible Execution of Cognitive Procedures.

DTIC Science & Technology

1987-06-30

were drawn from three third-grade classrooms. The classrooms were pre-tested twice using a paper-and-pencil diagnostic test. We selected 33 students...tested individually in a small room adjacent to their classroom. Each student solved an individualized paper-and-pencil test whose items were designed...tablet, and students filled out the test with a special pen. Equipment malfunctions caused the data from 7 students to be lost. Tablet data from each of
Teacher understanding of the nature of science and its impact on student learning about the nature of science in STS/Constructivist classrooms

NASA Astrophysics Data System (ADS)

Lieu, Sang-Chong

In the National Science Education Standards both STS/Constructivist teaching strategies and student understanding of the nature of science are stressed. If certain teaching practices can achieve both goals at one time, many problems will be solved. Such relationships were investigated in this study. Teacher subjects were selected based on two extremes of scores on the Testing on Understanding Science. The Secondary Teacher Analysis Matrix - Science Version was used to categorize teachers into their use of STS/Constructivist or more traditional strategies based on their teaching behaviors observed from video tapes. After the teacher subjects were selected, a non-equivalent control group design was adapted for the administration of items from the Views on Science-Technology-Society (VOSTS) to the students of these teachers. Pre- and post-test data were collected using 20 VOSTS items. VOSTS options were categorized into a Congruent/Partially Congruent/Naive format by a panel of six science educators. A special scoring procedure was devised for the VOSTS items to allow the use of inferential statistics. When performance on 17 VOSTS items was studied, more understanding of the nature of science by teachers, the presence of an STS/Constructivist learning environment in the classroom, or a combination of both factors was not found to help students learn more about the nature of science. Explanations for such results are offered. A McNemar test was performed to take a closer look at the 17 VOSTS items individually. The results indicated that students who were taught by STS/Constructivist teachers with high TOUS scores moved toward "congruent" views concerning the nature of science on a number of VOSTS items. Also, students who were taught by more traditional teachers with low TOUS scores moved toward "naive" views on other VOSTS items.
The findings support the fact that teachers who know more about the nature of science and who practice many of the STS/Constructivist teaching strategies assist students in learning more about the nature of science.
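The McNemar test mentioned in the record above compares paired binary outcomes, such as whether each student held a "congruent" view on one item before and after instruction. The following is a minimal sketch under stated assumptions: the counts are invented, the function name is hypothetical, and the continuity-corrected chi-square form is used here because the abstract does not say which variant the author applied.

```python
from math import erfc, sqrt

def mcnemar(b, c, continuity=True):
    """McNemar chi-square test on the discordant pairs of a paired 2x2 table.

    b: count that changed in one direction (e.g. naive -> congruent)
    c: count that changed in the other direction (e.g. congruent -> naive)
    Returns (statistic, p_value) for a chi-square distribution with 1 df.
    """
    if b + c == 0:
        return 0.0, 1.0  # no discordant pairs, so no evidence of change
    diff = abs(b - c)
    if continuity:
        diff = max(diff - 1, 0)  # continuity correction
    stat = diff ** 2 / (b + c)
    # Survival function of chi-square with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2))
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value

# Hypothetical counts for one item: 15 students moved toward the congruent
# view and 5 moved away from it between pre-test and post-test.
stat, p_value = mcnemar(15, 5)
```

Only the students whose classification changed enter the statistic; students with the same view at both time points do not affect it.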
Development and validation of a patient-reported outcome measure for stroke patients.

PubMed

Luo, Yanhong; Yang, Jie; Zhang, Yanbo

2015-05-08

Family support and patient satisfaction with treatment are crucial for aiding in the recovery from stroke. However, current validated stroke-specific questionnaires may not adequately capture the impact of these two variables on patients undergoing clinical trials of new drugs. Therefore, the aim of this study was to develop and evaluate a new stroke patient-reported outcome measure (Stroke-PROM) instrument for capturing more comprehensive effects of stroke on patients participating in clinical trials of new drugs. A conceptual framework and a pool of items for the preliminary Stroke-PROM were generated by consulting the relevant literature and other questionnaires created in China and other countries, and interviewing 20 patients and 4 experts to ensure that all germane parameters were included. During the first item-selection phase, classical test theory and item response theory were applied to an initial scale completed by 133 patients with stroke. During the item-revaluation phase, classical test theory and item response theory were used again, this time with 475 patients with stroke and 104 healthy participants. During the scale assessment phase, confirmatory factor analysis was applied to the final scale of the Stroke-PROM using the same study population as in the second item-selection phase. Reliability, validity, responsiveness and feasibility of the final scale were tested. The final scale of Stroke-PROM contained 46 items describing four domains (physiology, psychology, society and treatment). These four domains were subdivided into 10 subdomains. Cronbach's α coefficients for the four domains ranged from 0.861 to 0.908. Confirmatory factor analysis supported the validity of the final scale, and the model fit index satisfied the criterion. Differences in the Stroke-PROM mean scores were significant between patients with stroke and healthy participants in nine subdomains (P < 0.001), indicating that the scale showed good responsiveness. 
The Stroke-PROM is a patient-reported outcome multidimensional questionnaire developed especially for clinical trials of new drugs and is focused on issues of family support and patient satisfaction with treatment. Extensive data analyses supported the validity, reliability and responsiveness of the Stroke-PROM.
Selection of 3013 Containers for Field Surveillance. Fiscal Year 2016 Update

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kelly, Elizabeth J.; Berg, John M.; Cheadle, Jesse

2016-04-19

This update is the eighth in a series of reports that document the binning and sample selection of 3013 containers for the Field Surveillance program as part of the Integrated Surveillance Program. This report documents changes made to both the container binning assignments and the sample selection approach. Binning changes documented in this update are a result of changes to the prompt gamma calibration curves and the reassignment of a small number of Hanford items from the Pressure bin to the Pressure and Corrosion (P&C) bin. Field Surveillance sample selection changes are primarily a result of focusing future destructive examinations (DEs) on the potential for stress corrosion cracking in higher moisture containers in the P&C bin. The decision to focus the Field Surveillance program on higher moisture items is based on findings from both the Shelf-life testing program and DEs.
Multiple Mini-Interviews in the Age of the Internet: Does Preparation Help Applicants to Medical School?

ERIC Educational Resources Information Center

Moshinsky, Avital; Ziegler, David; Gafni, Naomi

2017-01-01

Many medical schools have adopted multiple mini-interviews (MMI) as an advanced selection tool. MMIs are expensive and used to test only a few dozen candidates per day, making it infeasible to develop a different test version for each test administration. Therefore, some items are reused both within and across years. This study investigated the…
Applied Reading Test--Forms A and B, Interim Manual, and Answer Sheets.

ERIC Educational Resources Information Center

Australian Council for Educational Research, Hawthorn.

Designed for use in the selection of apprentices, trainees, technical and trade personnel, and any other persons who need to read and understand text of a technical nature, this Applied Reading Test specimen set contains six passages and 32 items, has a 30-minute time limit, and is presented in a reusable multiple choice test booklet. The specimen…
The Utility of the GRE Analytical Score for Selection into a Graduate Program in Educational Psychology.

ERIC Educational Resources Information Center

Mowsesian, Richard; Hays, William L.

The Graduate Record Examination (GRE) Aptitude Test has been in use since 1938. In 1975 the GRE Aptitude Test was broadened to include an experimental set of items designed to tap a respondent's recognition of logical relationships and consistency of interrelated statements, and to make inferences from abstract relationships. To test the…
Lower-fat menu items in restaurants satisfy customers.

PubMed

Fitzpatrick, M P; Chapman, G E; Barr, S I

1997-05-01

To evaluate a restaurant-based nutrition program by measuring customer satisfaction with lower-fat menu items and assessing patrons' reactions to the program. Questionnaires to assess satisfaction with menu items were administered to patrons in eight of the nine restaurants that volunteered to participate in the nutrition program. One patron from each participating restaurant was randomly selected for a semistructured interview about nutrition programming in restaurants. Persons dining in eight participating restaurants over a 1-week period (n = 686). Independent samples t tests were used to compare respondents' satisfaction with lower-fat and regular menu items. Two-way analysis of variance tests were completed using overall satisfaction as the dependent variable and menu-item classification (ie, lower fat or regular) and one of eight other menu item and respondent characteristics as independent variables. Qualitative methods were used to analyze interview transcripts. Of 1,127 menu items rated for satisfaction, 205 were lower fat, 878 were regular, and 44 were of unknown classification. Customers were significantly more satisfied with lower-fat than with regular menu items (P < .001). Overall satisfaction did not vary by any of the other independent variables. Interview results indicate the importance of restaurant dining as an indulgent experience. High satisfaction with lower-fat menu items suggests that customers will support restaurants providing such choices. Dietitians can use these findings to encourage restaurateurs to include lower-fat choices on their menus, and to assure clients that their expectations of being indulged are not incompatible with these choices.
Emergency department documentation templates: variability in template selection and association with physical examination and test ordering in dizziness presentations.

PubMed

Kerber, Kevin A; Hofer, Timothy P; Meurer, William J; Fendrick, A Mark; Morgenstern, Lewis B

2011-03-24

Clinical documentation systems, such as templates, have been associated with process utilization. The T-System emergency department (ED) templates are widely used, but analyses of the templates' association with processes are lacking. This system is also unique because of the many different template options available, and thus the selection of the template may also be important. We aimed to describe the selection of templates in ED dizziness presentations and to investigate the association between items on templates and process utilization. Dizziness visits were captured from a population-based study of EDs that use documentation templates. Two relevant process outcomes were assessed: head computerized tomography (CT) scan and nystagmus examination. Multivariable logistic regression was used to estimate the probability of each outcome for patients who did or did not receive a relevant-item template. Propensity scores were also used to adjust for selection effects. The final cohort was 1,485 visits. Thirty-one different templates were used. Use of a template with a head CT item was associated with an increase in the adjusted probability of head CT utilization from 12.2% (95% CI, 8.9%-16.6%) to 29.3% (95% CI, 26.0%-32.9%). The adjusted probability of documentation of a nystagmus assessment increased from 12.0% (95% CI, 8.8%-16.2%) when a nystagmus-item template was not used to 95.0% (95% CI, 92.8%-96.6%) when a nystagmus-item template was used. The associations remained significant after propensity score adjustments. Providers use many different templates in dizziness presentations. Important differences exist in the various templates and the template that is used likely impacts process utilization, even though selection may be arbitrary. The optimal design and selection of templates may offer a feasible and effective opportunity to improve care delivery.
Impaired working memory capacity is not caused by failures of selective attention in schizophrenia.

PubMed

Erickson, Molly A; Hahn, Britta; Leonard, Carly J; Robinson, Benjamin; Gray, Brad; Luck, Steven J; Gold, James

2015-03-01

The cognitive impairments associated with schizophrenia have long been known to involve deficits in working memory (WM) capacity. To date, however, the causes of WM capacity deficits remain unknown. The present study examined selective attention impairments as a putative contributor to observed capacity deficits in this population. To test this hypothesis, we used an experimental paradigm that assesses the role of selective attention in WM encoding and has been shown to involve the prefrontal cortex and the basal ganglia. In experiment 1, participants were required to remember the locations of 3 or 5 target items (red circles). In another condition, 3-target items were accompanied by 2 distractor items (yellow circles), which participants were instructed to ignore. People with schizophrenia (PSZ) exhibited significant impairment in memory for the locations of target items, consistent with reduced WM capacity, but PSZ and healthy control subjects did not differ in their ability to filter the distractors. This pattern was replicated in experiment 2 for distractors that were more salient. Taken together, these results demonstrate that reduced WM capacity in PSZ is not attributable to a failure of filtering irrelevant distractors.
Georgia Vocational Student Assessment Project. Final Report.

ERIC Educational Resources Information Center

Vocational Technical Education Consortium of States, Atlanta, GA.

A project was conducted to develop vocational education tests for use in Georgia secondary schools, specifically for welding, machine shop, and sheet metal courses. The project team developed an outline of an assessment model that included the following components: (1) select a program for use in developing test items; (2) verify duties, tasks,…
Effects of Presentation Mode and Computer Familiarity on Summarization of Extended Texts

ERIC Educational Resources Information Center

Yu, Guoxing

2010-01-01

Comparability studies on computer- and paper-based reading tests have focused on short texts and selected-response items via almost exclusively statistical modeling of test performance. The psychological effects of presentation mode and computer familiarity on individual students are under-researched. In this study, 157 students read extended…
Selected Test Items in American History. Bulletin Number 6, Fifth Edition.

ERIC Educational Resources Information Center

Anderson, Howard R.; Lindquist, E. F.

Designed for high school students, this bulletin provides an extensive file of 1,062 multiple-choice questions in American history. Taken largely from the Iowa Every-Pupil Program and the Cooperative Test Service standardized examinations, the questions are chronologically divided into 16 topic areas. They include exploration and discovery;…
The Robustness of IRT-Based Vertical Scaling Methods to Violation of Unidimensionality

ERIC Educational Resources Information Center

Yin, Liqun

2013-01-01

In recent years, many states have adopted Item Response Theory (IRT) based vertically scaled tests due to their compelling features in a growth-based accountability context. However, selection of a practical and effective calibration/scaling method and proper understanding of issues with possible multidimensionality in the test data is critical to…
Empirical versus Random Item Selection in the Design of Intelligence Test Short Forms--The WISC-R Example.

ERIC Educational Resources Information Center

Goh, David S.

1979-01-01

The advantages of using psychometric theory to design short forms of intelligence tests are demonstrated by comparing such usage to a systematic random procedure that has previously been used. The Wechsler Intelligence Scale for Children Revised (WISC-R) Short Form is presented as an example. (JKS)
Problems in Criterion-Referenced Measurement. CSE Monograph Series in Evaluation, 3.

ERIC Educational Resources Information Center

Harris, Chester W., Ed.; And Others

Six essays on technical measurement problems in criterion referenced tests and four essays by psychometricians proposing solutions are presented: (1) "Criterion-Referenced Measurement" and Other Such Terms, by Marvin C. Alkin which is an overview of the first six papers; (2) Selecting Objectives and Generating Test Items for Objectives-Based…
Collaborative Systems and Multi-user Interfaces: Computer-based Tools for Cooperative Problem Solving

DTIC Science & Technology

1986-10-31

Reference Card Given to Participants) Cognoter Reference Select = LeftButton Menu = MiddleButton TitleBar menu for tool operations Item menu for item...collaborative tools and their uses, the Colab system and the Cognoter presentation tool were implemented and used for both real and posed idea organization...tasks. To test the system design and its effect on structured problem-solving, many early Colab/Cognoter meetings were monitored and a series of
A computer adaptive testing version of the Addiction Severity Index-Multimedia Version (ASI-MV): The Addiction Severity CAT.

PubMed

Butler, Stephen F; Black, Ryan A; McCaffrey, Stacey A; Ainscough, Jessica; Doucette, Ann M

2017-05-01

The purpose of this study was to develop and validate a computer adaptive testing (CAT) version of the Addiction Severity Index-Multimedia Version (ASI-MV), the Addiction Severity CAT. This goal was accomplished in 4 steps. First, new candidate items for Addiction Severity CAT domains were evaluated after brainstorming sessions with experts in substance abuse treatment. Next, this new item bank was psychometrically evaluated on a large nonclinical (n = 4,419) and substance abuse treatment (n = 845) sample. Based on these results, final items were selected and calibrated for the creation of the Addiction Severity CAT algorithms. Once the algorithms were developed for the entire assessment, a fully functioning prototype of an Addiction Severity CAT was created. CAT simulations were conducted, and optimal termination criteria were selected for the Addiction Severity CAT algorithms. Finally, construct validity of the CAT algorithms was evaluated by examining convergent and discriminant validity and sensitivity to change. The Addiction Severity CAT was determined to be valid, sensitive to change, and reliable. Further, the Addiction Severity CAT's time of completion was found to be significantly less than the average time of completion for the ASI-MV composite scores. This study represents the initial validation of an Addiction Severity CAT based on item response theory, and further exploration of the Addiction Severity CAT is needed.

Development of Korean Smartphone addiction proneness scale for youth.

PubMed

Kim, Dongil; Lee, Yunhee; Lee, Juyoung; Nam, JeeEun Karin; Chung, Yeoju

2014-01-01

This study developed a Smartphone Addiction Proneness Scale (SAPS) based on the existing internet and cellular phone addiction scales. For the development of this scale, 29 items (1.5 times the final number of items) were initially selected as preliminary items, based on the previous studies on internet/phone addiction as well as the clinical experience of involved experts. The preliminary scale was administered to a nationally representative sample of 795 students in elementary, middle, and high schools across South Korea. Then, final 15 items were selected according to the reliability test results. The final scale consisted of four subdomains: (1) disturbance of adaptive functions, (2) virtual life orientation, (3) withdrawal, and (4) tolerance. The final scale indicated a high reliability with Cronbach's α of .880. Support for the scale's criterion validity has been demonstrated by its relationship to the internet addiction scale, KS-II (r = .49). For the analysis of construct validity, we tested the Structural Equation Model. The results showed the four-factor structure to be valid (NFI = .943, TLI = .902, CFI = .902, RMSEA = .034). Smartphone addiction is gaining a greater spotlight as possibly a new form of addiction along with internet addiction. The SAPS appears to be a reliable and valid diagnostic scale for screening adolescents who may be at risk of smartphone addiction. Further implications and limitations are discussed.
Development of Korean Smartphone Addiction Proneness Scale for Youth

PubMed Central

Kim, Dongil; Lee, Yunhee; Lee, Juyoung; Nam, JeeEun Karin; Chung, Yeoju

2014-01-01

This study developed a Smartphone Addiction Proneness Scale (SAPS) based on the existing internet and cellular phone addiction scales. For the development of this scale, 29 items (1.5 times the final number of items) were initially selected as preliminary items, based on the previous studies on internet/phone addiction as well as the clinical experience of involved experts. The preliminary scale was administered to a nationally representative sample of 795 students in elementary, middle, and high schools across South Korea. Then, final 15 items were selected according to the reliability test results. The final scale consisted of four subdomains: (1) disturbance of adaptive functions, (2) virtual life orientation, (3) withdrawal, and (4) tolerance. The final scale indicated a high reliability with Cronbach's α of .880. Support for the scale's criterion validity has been demonstrated by its relationship to the internet addiction scale, KS-II (r = .49). For the analysis of construct validity, we tested the Structural Equation Model. The results showed the four-factor structure to be valid (NFI = .943, TLI = .902, CFI = .902, RMSEA = .034). Smartphone addiction is gaining a greater spotlight as possibly a new form of addiction along with internet addiction. The SAPS appears to be a reliable and valid diagnostic scale for screening adolescents who may be at risk of smartphone addiction. Further implications and limitations are discussed. PMID:24848006
The psychometric properties of the "Reading the Mind in the Eyes" Test: an item response theory (IRT) analysis.

PubMed

Preti, Antonio; Vellante, Marcello; Petretto, Donatella R

2017-05-01

The "Reading the Mind in the Eyes" Test (hereafter: Eyes Test) is considered an advanced task of the Theory of Mind aimed at assessing the performance of the participant in perspective-taking, that is, the ability to sense or understand other people's cognitive and emotional states. In this study, the item response theory analysis was applied to the adult version of the Eyes Test. The Italian version of the Eyes Test was administered to 200 undergraduate students of both genders (males = 46%). Modified parallel analysis (MPA) was used to test unidimensionality. Marginal maximum likelihood estimation was used to fit the 1-, 2-, and 3-parameter logistic (PL) model to the data. Differential Item Functioning (DIF) due to gender was explored with five independent methods. MPA provided evidence in favour of unidimensionality. The Rasch model (1-PL) was superior to the other two models in explaining participants' responses to the Eyes Test. There was no robust evidence of gender-related DIF in the Eyes Test, although some differences may exist for some items as a reflection of real differences by group. The study results support a one-factor model of the Eyes Test. Performance on the Eyes Test is defined by the participant's ability in perspective-taking. Researchers should cease using arbitrarily selected subscores in assessing the performance of participants on the Eyes Test. Lack of gender-related DIF favours the use of the Eyes Test in the investigation of gender differences concerning empathy and social cognition.
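For readers unfamiliar with the three logistic models compared in this abstract, the standard textbook forms can be written as below. This is a general reference sketch, not the parameterization reported by the authors; the symbols (ability θ, discrimination a_i, difficulty b_i, lower asymptote c_i) are the conventional ones and are assumed here.

```latex
% 3-parameter logistic (3-PL) model: probability that a respondent with
% ability \theta answers item i correctly
P_i(\theta) = c_i + \frac{1 - c_i}{1 + \exp\!\bigl(-a_i(\theta - b_i)\bigr)}

% 2-PL: no guessing parameter
c_i = 0 \quad\Rightarrow\quad
P_i(\theta) = \frac{1}{1 + \exp\!\bigl(-a_i(\theta - b_i)\bigr)}

% 1-PL (Rasch): additionally, a common discrimination for all items
c_i = 0,\; a_i = 1 \quad\Rightarrow\quad
P_i(\theta) = \frac{1}{1 + \exp\!\bigl(-(\theta - b_i)\bigr)}
```

Each model is nested in the next, which is what makes a comparison of fit across the 1-, 2-, and 3-PL models meaningful.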
Modifying the test of understanding graphs in kinematics

NASA Astrophysics Data System (ADS)

Zavala, Genaro; Tejeda, Santa; Barniol, Pablo; Beichner, Robert J.

2017-12-01

In this article, we present several modifications to the Test of Understanding Graphs in Kinematics. The most significant changes are (i) the addition and removal of items to achieve parallelism in the objectives (dimensions) of the test, thus allowing comparisons of students' performance that were not possible with the original version, and (ii) changes to the distractors of some of the original items that represent the most frequent alternative conceptions. The final modified version (after an iterative process involving four administrations of test variations over two years) was administered to 471 students of an introductory university physics course at a large private university in Mexico. When analyzing the final modified version of the test it was found that the added items satisfied the statistical tests of difficulty, discriminatory power, and reliability; also, that the great majority of the modified distractors were effective in terms of their frequency selection and discriminatory power; and, that the final modified version of the test satisfied the reliability and discriminatory power criteria as well as the original test. Here, we also show the use of the new version of the test, presenting a new analysis of students' understanding not possible to do before with the original version of the test, specifically regarding the objectives and items that in the new version meet parallelisms. Finally, in the PhysPort project (physport.org), we present the final modified version of the test. It can be used by teachers and researchers to assess students' understanding of graphs in kinematics, as well as their learning about them.
Standardizing hysteroscopy teaching: development of a curriculum using the Delphi method.

PubMed

Neveu, Marie-Emmanuelle; Debras, Elodie; Niro, Julien; Fernandez, Hervé; Panel, Pierre

2017-12-01

Hysteroscopy is performed often and in many indications but is challenging to learn. Hands-on training in live patients faces ethical, legal, and economic obstacles. Virtual reality simulation may hold promise as a hysteroscopy training tool. No validated curriculum specific in hysteroscopy exists. The aim of this study was to develop a hysteroscopy curriculum, using the Delphi method to identify skill requirements. Based on a literature review using the key words "curriculum," "simulation," and "hysteroscopy," we identified five technical and non-technical areas in which skills were required. Twenty hysteroscopy experts from different French hospital departments participated in Delphi rounds to select items in these five areas. The rounds were to be continued until 80-100% agreement was obtained for at least 60% of items. A curriculum was built based on the selected items and was evaluated in residents. From November 2014 to April 2015, 18 of 20 invited experts participated in three Delphi rounds. Of the 51 items selected during the first round, only 25 (49%) had 80-100% agreement during the second round, and a third round was therefore conducted. During this last round, 80-100% agreement was achieved for 31 (61%) items, which were used to create the curriculum. All 14 residents tested felt that a simulator training session was acceptable and helped them to improve their skills. We describe a simulation-based hysteroscopy curriculum focusing on skill requirements identified by a Delphi procedure. Its development allows standardization of training programs offered to residents.
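The stopping rule described in this abstract (continue until at least 60% of items reach 80-100% agreement) can be checked against the reported counts with a few lines of arithmetic. This is a minimal sketch; the function names are hypothetical and only the counts (25 and 31 of 51 items) come from the abstract.

```python
def consensus_share(items_with_consensus: int, total_items: int) -> float:
    """Fraction of items that reached the 80-100% agreement band."""
    return items_with_consensus / total_items


def rounds_can_stop(items_with_consensus: int, total_items: int,
                    required_share: float = 0.60) -> bool:
    """True when enough items reached consensus to end the Delphi rounds.

    required_share = 0.60 reflects the 'at least 60% of items' rule
    stated in the abstract.
    """
    return consensus_share(items_with_consensus, total_items) >= required_share


TOTAL_ITEMS = 51  # items selected during the first round

# Second round: 25 items in the agreement band, so a third round is needed.
round2_share = consensus_share(25, TOTAL_ITEMS)
round2_stop = rounds_can_stop(25, TOTAL_ITEMS)

# Third round: 31 items in the agreement band, so the rounds can end.
round3_share = consensus_share(31, TOTAL_ITEMS)
round3_stop = rounds_can_stop(31, TOTAL_ITEMS)

print(f"round 2: {round2_share:.0%}, stop={round2_stop}")
print(f"round 3: {round3_share:.0%}, stop={round3_stop}")
```

The rounded shares match the 49% and 61% figures given in the abstract.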
Selection of multiple cued items is possible during visual short-term memory maintenance.

PubMed

Matsukura, Michi; Vecera, Shaun P

2015-07-01

Recent neuroimaging studies suggest that maintenance of a selected object feature held in visual short-term/working memory (VSTM/VWM) is supported by the same neural mechanisms that encode the sensory information. If VSTM operates by retaining "reasonable copies" of scenes constructed during sensory processing (Serences, Ester, Vogel, & Awh, 2009, p. 207, the sensory recruitment hypothesis), then attention should be able to select multiple items represented in VSTM as long as the number of these attended items does not exceed the typical VSTM capacity. It is well known that attention can select at least two noncontiguous locations at the same time during sensory processing. However, empirical reports from the studies that examined this possibility are inconsistent. In the present study, we demonstrate that (1) attention can indeed select more than a single item during VSTM maintenance when observers are asked to recognize a set of items in the manner that these items were originally attended, and (2) attention can select multiple cued items regardless of whether these items are perceptually organized into a single group (contiguous locations) or not (noncontiguous locations). The results also replicate and extend the recent finding that selective attention that operates during VSTM maintenance is sensitive to the observers' goal and motivation to use the cueing information.
Airworthiness and Flight Characteristics Test (A&FC) of the CH-47D helicopter

DTIC Science & Technology

1984-02-01

Development Specification which were evaluated during this test. The Advanced Flight Control System heading select capability and the pressure refueling...determine compliance with the CH-47D Prime Item Development Specification (PIDS). 2. This Directorate agrees with the report conclusions and...Evaluations (PAE) (refs 1 and 2. app A), climatic laboratory tests (ref 3), and icing tests (ref 4). The US Army Aviation Research and Development
Nondestructive testing techniques

NASA Astrophysics Data System (ADS)

Bray, Don E.; McBride, Don

A comprehensive reference covering a broad range of techniques in nondestructive testing is presented. Based on years of extensive research and application at NASA and other government research facilities, the book provides practical guidelines for selecting the appropriate testing methods and equipment. Topics discussed include visual inspection, penetrant and chemical testing, nuclear radiation, sonic and ultrasonic, thermal and microwave, magnetic and electromagnetic techniques, and training and human factors. (No individual items are abstracted in this volume)
Perception and Action Selection Dissociate Human Ventral and Dorsal Cortex

ERIC Educational Resources Information Center

Ikkai, Akiko; Jerde, Trenton A.; Curtis, Clayton E.

2011-01-01

We test theories about the functional organization of the human cortex by correlating brain activity with demands on perception versus action selection. Subjects covertly searched for a target among an array of 4, 8, or 12 items (perceptual manipulation) and then, depending on the color of the array, made a saccade toward, away from, or at a right…
Selective rehearsal is affected by the emotionality of the encoding context in item-method directed forgetting: An event-related potential study.

PubMed

Liu, Tzu-Ling; Chen, Nai-Feng; Cheng, Shih-Kuen

2017-02-01

Emotional items are often remembered more clearly than neutral items. However, whether stimuli embedded in an emotional context are more resistant to directed forgetting than those presented in a neutral context remains unclear. This question was tested by recording event-related potentials (ERPs) in an item-method directed forgetting paradigm involving neutral words that were embedded in neutral or negative contexts. During the study phase, participants were asked to associate a neutral word with a negative or neutral picture. A remember (R) or forget (F) cue was then designated to indicate whether the word was a to-be-remembered (TBR) or to-be-forgotten (TBF) word. In the test phase, participants were asked to identify all previously presented old words regardless of the R/F cues. The behavioral results indicated a significant interaction between the valence of the encoding contexts and the R/F cues. The hit rate was lower for the TBR words encoded in negative contexts relative to those encoded in neutral contexts. No such valence effect was observed in the hit rates of the TBF words. For the ERP data, the R cues elicited a P3b-like effect that has been linked to the selective rehearsal of the TBR items. This effect was more sustained in the negative encoding context than in the neutral context. The F cues elicited a frontal positivity that has been linked to the active inhibition of the TBF words; however, this positivity was not modulated by the valence of the encoding context. The sustained P3b-like effect for the R cues in the negative encoding context might reflect a compensative encoding for the TBR words caused by the attention-capturing negative contexts. Therefore, we argue that the emotional context affected the selective elaboration of the TBR words; however, we also argue that there was no supportive evidence of an emotional effect on the forgetting of TBF items. Copyright © 2016 Elsevier B.V. All rights reserved.
The Stigma Resistance Scale: A multi-sample validation of a new instrument to assess mental illness stigma resistance.

PubMed

Firmin, Ruth L; Lysaker, Paul H; McGrew, John H; Minor, Kyle S; Luther, Lauren; Salyers, Michelle P

2017-12-01

Although associated with key recovery outcomes, stigma resistance remains under-studied largely due to limitations of existing measures. This study developed and validated a new measure of stigma resistance. Preliminary items, derived from qualitative interviews of people with lived experience, were pilot tested online with people self-reporting a mental illness diagnosis (n = 489). Best performing items were selected, and the refined measure was administered to an independent sample of people with mental illness at two state mental health consumer recovery conferences (n = 202). Confirmatory factor analyses (CFA) guided by theory were used to test item fit, correlations between the refined stigma resistance measure and theoretically relevant measures were examined for validity, and test-retest correlations of a subsample were examined for stability. CFA demonstrated strong fit for a 5-factor model. The final 20-item measure demonstrated good internal consistency for each of the 5 subscales, adequate test-retest reliability at 3 weeks, and strong construct validity (i.e., positive associations with quality of life, recovery, and self-efficacy, and negative associations with overall symptoms, defeatist beliefs, and self-stigma). The new measure offers a more reliable and nuanced assessment of stigma resistance. It may afford greater personalization of interventions targeting stigma resistance. Copyright © 2017 Elsevier B.V. All rights reserved.
Evaluation of the Clinical LOINC (Logical Observation Identifiers, Names, and Codes) Semantic Structure as a Terminology Model for Standardized Assessment Measures

PubMed Central

Bakken, Suzanne; Cimino, James J.; Haskell, Robert; Kukafka, Rita; Matsumoto, Cindi; Chan, Garrett K.; Huff, Stanley M.

2000-01-01

Objective: The purpose of this study was to test the adequacy of the Clinical LOINC (Logical Observation Identifiers, Names, and Codes) semantic structure as a terminology model for standardized assessment measures. Methods: After extension of the definitions, 1,096 items from 35 standardized assessment instruments were dissected into the elements of the Clinical LOINC semantic structure. An additional coder dissected at least one randomly selected item from each instrument. When multiple scale types occurred in a single instrument, a second coder dissected one randomly selected item representative of each scale type. Results: The results support the adequacy of the Clinical LOINC semantic structure as a terminology model for standardized assessments. Using the revised definitions, the coders were able to dissect into the elements of Clinical LOINC all the standardized assessment items in the sample instruments. Percentage agreement for each element was as follows: component, 100 percent; property, 87.8 percent; timing, 82.9 percent; system/sample, 100 percent; scale, 92.6 percent; and method, 97.6 percent. Discussion: This evaluation was an initial step toward the representation of standardized assessment items in a manner that facilitates data sharing and re-use. Further clarification of the definitions, especially those related to time and property, is required to improve inter-rater reliability and to harmonize the representations with similar items already in LOINC. PMID:11062226
Evaluation and simplification of the occupational slip, trip and fall risk-assessment test

PubMed Central

NAKAMURA, Takehiro; OYAMA, Ichiro; FUJINO, Yoshihisa; KUBO, Tatsuhiko; KADOWAKI, Koji; KUNIMOTO, Masamizu; ODOI, Haruka; TABATA, Hidetoshi; MATSUDA, Shinya

2016-01-01

Objective: The purpose of this investigation is to evaluate the efficacy of the occupational slip, trip and fall (STF) risk assessment test developed by the Japan Industrial Safety and Health Association (JISHA). We further intended to simplify the test to improve efficiency. Methods: A previous cohort study was performed using 540 employees aged ≥50 years who took the JISHA’s STF risk assessment test. We conducted multivariate analysis using these previous results as baseline values and answers to questionnaire items or score on physical fitness tests as variables. The screening efficiency of each model was evaluated based on the obtained receiver operating characteristic (ROC) curve. Results: The area under the ROC obtained in multivariate analysis was 0.79 when using all items. Six of the 25 questionnaire items were selected for stepwise analysis, giving an area under the ROC curve of 0.77. Conclusion: Based on the results of follow-up performed one year after the initial examination, we successfully determined the usefulness of the STF risk assessment test. Administering a questionnaire alone is sufficient for screening subjects at risk of STF during the subsequent one-year period. PMID:27021057
Evidence against associative blocking as a cause of cue-independent retrieval-induced forgetting.

PubMed

Hulbert, Justin C; Shivde, Geeta; Anderson, Michael C

2012-01-01

Selectively retrieving an item from long-term memory reduces the accessibility of competing traces, a phenomenon known as retrieval-induced forgetting (RIF). RIF exhibits cue independence, or the tendency for forgetting to generalize to novel test cues, suggesting an inhibitory basis for this phenomenon. An alternative view (Camp, Pecher, & Schmidt, 2007; Camp et al., 2009; Perfect et al., 2004) suggests that using novel test cues to measure cue independence actually engenders associative interference when participants covertly supplement retrieval with practiced cues that then associatively block retrieval. Accordingly, the covert-cueing hypothesis assumes that the relative strength of the practiced items at final test – and not the inhibition levied on the unpracticed items during retrieval practice – underlies cue-independent forgetting. As such, this perspective predicts that strengthening practiced items by any means, even if not via retrieval practice, should induce forgetting. Contrary to these predictions, however, we present clear evidence that cue-independent forgetting is induced by retrieval practice and not by repeated study exposures. This dissociation occurred despite significant, comparable levels of strengthening of practiced items in each case, and despite the use of Anderson and Spellman's original (1995) independent probe method criticized by covert-cueing theorists as being especially conducive to associative blocking. These results demonstrate that cue-independent RIF is unrelated to the strengthening of practiced items, and thereby fail to support a key prediction of the covert-cueing hypothesis. The results, instead, favor a role of inhibition in resolving retrieval interference. © 2011 Hogrefe Publishing
Cross-cultural adaptation and validation of a Bengali version of the modified fibromyalgia impact questionnaire.

PubMed

Muquith, Mohammed A; Islam, Md Nazrul; Haq, Syed A; Ten Klooster, Peter M; Rasker, Johannes J; Yunus, Muhammad B

2012-08-27

Currently, no validated instruments are available to measure the health status of Bangladeshi patients with fibromyalgia (FM). The aims of this study were to cross-culturally adapt the modified Fibromyalgia Impact Questionnaire (FIQ) into Bengali (B-FIQ) and to test its validity and reliability in Bangladeshi patients with FM. The FIQ was translated following cross-cultural adaptation guidelines and pretested in 30 female patients with FM. Next, the adapted B-FIQ was physician-administered to 102 consecutive female FM patients together with the Health Assessment Questionnaire (HAQ), selected subscales of the SF-36, and visual analog scales for current clinical symptoms. A tender point count (TPC) was performed by an experienced rheumatologist. Forty randomly selected patients completed the B-FIQ again after 7 days. Two control groups of 50 healthy people and 50 rheumatoid arthritis (RA) patients also completed the B-FIQ. For the final B-FIQ, five physical function sub-items were replaced with culturally appropriate equivalents. Internal consistency was adequate for both the 11-item physical function subscale (α = 0.73) and the total scale (α = 0.83). With exception of the physical function subscale, expected correlations were generally observed between the B-FIQ items and selected subscales of the SF-36, HAQ, clinical symptoms, and TPC. The B-FIQ was able to discriminate between FM patients and healthy controls and between FM patients and RA patients. Test-retest reliability was adequate for the physical function subscale (r = 0.86) and individual items (r = 0.73-0.86), except anxiety (r = 0.27) and morning tiredness (r = 0.64). This study supports the reliability and validity of the B-FIQ as a measure of functional disability and health status in Bangladeshi women with FM.
Relative Utility of Selected Software Requirement Metrics

DTIC Science & Technology

1991-12-01

testing. They can also help in deciding if and how to use complexity reduction techniques. In summary, requirement metrics can be useful because they...answer items in a test instrument. In order to differentiate between misinterpretation and comprehension, the measurement technique must be able to...effectively test a requirement, it is verifiable. Ramamoorthy and others have proposed requirements complexity metrics that can be used to infer the
Nutrition Report Cards: An Opportunity to Improve School Lunch Selection

PubMed Central

Wansink, Brian; Just, David R.; Patterson, Richard W.; Smith, Laura E.

2013-01-01

Objective: To explore the feasibility and implementation efficiency of Nutritional Report Cards (NRCs) in helping children make healthier food choices at school. Methods: Pilot testing was conducted in a rural New York school district (K-12). Over a five-week period, 27 parents received a weekly e-mail containing a NRC listing how many meal components (fruits, vegetables, starches, milk), snacks, and a-la-carte foods their child selected. We analyzed choices of students in the NRC group vs. the control group, both prior to and during the intervention period. Point-of-sale system data for a-la-carte items was analyzed using Generalized Least Squares regressions with clustered standard errors. Results: NRCs encouraged more home conversations about nutrition and more awareness of food selections. Despite the small sample, the NRC was associated with reduced selection of some items, such as the percentage of those selecting cookies which decreased from 14.3 to 6.5 percent. Additionally, despite requiring new keys on the check-out registers to generate the NRC, checkout times increased by only 0.16 seconds per transaction, and compiling and sending the NRCs required a total weekly investment of 30 minutes of staff time. Conclusions: This test of concept suggests that NRCs are a feasible and inexpensive tool to guide children towards healthier choices. PMID:24098324
Nutrition Report Cards: an opportunity to improve school lunch selection.

PubMed

Wansink, Brian; Just, David R; Patterson, Richard W; Smith, Laura E

2013-01-01

To explore the feasibility and implementation efficiency of Nutritional Report Cards (NRCs) in helping children make healthier food choices at school. Pilot testing was conducted in a rural New York school district (K-12). Over a five-week period, 27 parents received a weekly e-mail containing a NRC listing how many meal components (fruits, vegetables, starches, milk), snacks, and a-la-carte foods their child selected. We analyzed choices of students in the NRC group vs. the control group, both prior to and during the intervention period. Point-of-sale system data for a-la-carte items was analyzed using Generalized Least Squares regressions with clustered standard errors. NRCs encouraged more home conversations about nutrition and more awareness of food selections. Despite the small sample, the NRC was associated with reduced selection of some items, such as the percentage of those selecting cookies which decreased from 14.3 to 6.5 percent. Additionally, despite requiring new keys on the check-out registers to generate the NRC, checkout times increased by only 0.16 seconds per transaction, and compiling and sending the NRCs required a total weekly investment of 30 minutes of staff time. This test of concept suggests that NRCs are a feasible and inexpensive tool to guide children towards healthier choices.
Applications of computerized adaptive testing (CAT) to the assessment of headache impact.

PubMed

Ware, John E; Kosinski, Mark; Bjorner, Jakob B; Bayliss, Martha S; Batenhorst, Alice; Dahlöf, Carl G H; Tepper, Stewart; Dowson, Andrew

2003-12-01

To evaluate the feasibility of computerized adaptive testing (CAT) and the reliability and validity of CAT-based estimates of headache impact scores in comparison with 'static' surveys. Responses to the 54-item Headache Impact Test (HIT) were re-analyzed for recent headache sufferers (n = 1016) who completed telephone interviews during the National Survey of Headache Impact (NSHI). Item response theory (IRT) calibrations and the computerized dynamic health assessment (DYNHA) software were used to simulate CAT assessments by selecting the most informative items for each person and estimating impact scores according to pre-set precision standards (CAT-HIT). Results were compared with IRT estimates based on all items (total-HIT), computerized 6-item dynamic estimates (CAT-HIT-6), and a developmental version of a 'static' 6-item form (HIT-6-D). Analyses focused on: respondent burden (survey length and administration time), score distributions ('ceiling' and 'floor' effects), reliability and standard errors, and clinical validity (diagnosis, level of severity). A random sample (n = 245) was re-assessed to test responsiveness. A second study (n = 1103) compared actual CAT surveys and an improved 'static' HIT-6 among current headache sufferers sampled on the Internet. Respondents completed measures from the first study and the generic SF-8 Health Survey; some (n = 540) were re-tested on the Internet after 2 weeks. In the first study, simulated CAT-HIT and total-HIT scores were highly correlated (r = 0.92) without 'ceiling' or 'floor' effects and with a substantial reduction (90.8%) in respondent burden. Six of the 54 items accounted for the great majority of item administrations (3603/5028, 77.6%). CAT-HIT reliability estimates were very high (0.975-0.992) in the range where 95% of respondents scored, and relative validity (RV) coefficients were high for diagnosis (RV = 0.87) and severity (RV = 0.89); patient-level classifications were accurate 91.3% for a diagnosis of migraine. For all three criteria of change, CAT-HIT scores were more responsive than all other measures. In the second study, estimates of respondent burden, item usage, reliability and clinical validity were replicated. The test-retest reliability of CAT-HIT was 0.79 and alternate forms coefficients ranged from 0.85 to 0.91. All correlations with the generic SF-8 were negative. CAT-based administrations of headache impact items achieved very large reductions in respondent burden without compromising validity for purposes of patient screening or monitoring changes in headache impact over time. IRT models and CAT-based dynamic health assessments warrant testing among patients with other conditions.
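The core CAT loop described in this abstract (administer the most informative remaining item, stop once a pre-set precision standard is met) can be illustrated with a simplified sketch. It assumes dichotomous items under a two-parameter logistic model, whereas survey items such as those in the HIT are typically polytomous; the item bank, parameter values, and function names are all hypothetical, and this is not the DYNHA software's algorithm.

```python
import math


def p_endorse(theta, a, b):
    """2PL probability of endorsing an item at trait level theta (assumed model)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information of a 2PL item at theta: a^2 * P * (1 - P)."""
    p = p_endorse(theta, a, b)
    return a * a * p * (1.0 - p)


def select_next_item(theta, bank, administered):
    """Return the id of the not-yet-administered item most informative at theta."""
    candidates = [item_id for item_id in bank if item_id not in administered]
    return max(candidates, key=lambda item_id: item_information(theta, *bank[item_id]))


def standard_error(theta, administered_params):
    """Approximate SE of the trait estimate: 1 / sqrt(total test information)."""
    total = sum(item_information(theta, a, b) for a, b in administered_params)
    return 1.0 / math.sqrt(total)


# Hypothetical item bank: id -> (discrimination a, difficulty b)
bank = {"i1": (1.0, -1.0), "i2": (2.0, 0.0), "i3": (1.5, 1.0)}

theta_estimate = 0.0  # provisional trait estimate (held fixed in this sketch)
first = select_next_item(theta_estimate, bank, administered=set())
second = select_next_item(theta_estimate, bank, administered={first})
se_after_first = standard_error(theta_estimate, [bank[first]])

print(first, second, round(se_after_first, 3))
```

In a full CAT the trait estimate would be updated after every response and the loop would end once the standard error fell below the chosen precision threshold; both steps are omitted here to keep the selection rule visible.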
An updated Italian normative dataset for the Stroop color word test (SCWT).

PubMed

Brugnolo, A; De Carli, F; Accardo, J; Amore, M; Bosia, L E; Bruzzaniti, C; Cappa, S F; Cocito, L; Colazzo, G; Ferrara, M; Ghio, L; Magi, E; Mancardi, G L; Nobili, F; Pardini, M; Rissotto, R; Serrati, C; Girtler, N

2016-03-01

The Stroop color and word test (SCWT) is widely used to evaluate attention, information processing speed, selective attention, and cognitive flexibility. Normative values for the Italian population are available only for selected age groups, or for the short version of the test. The aim of this study was to provide updated normal values for the full version, balancing groups across gender, age decades, and education. Two kinds of indexes were derived from the performance of 192 normal subjects, divided by decade (from 20 to 90) and level of education (4 levels: 3-5; 6-8; 9-13; >13 years). They were (i) the correct answers achieved for each table in the first 30 s (word items, WI; color items, CI; color word items, CWI) and (ii) the total time required for reading the three tables (word time, WT; color time, CT; color word time, CWT). For each index, the regression model was evaluated using age, education, and gender as independent variables. The normative data were then computed following the equivalent scores method. In the regression model, age and education significantly influenced the performance in each of the 6 indexes, whereas gender had no significant effect. This study confirms the effect of age and education on the main indexes of the Stroop test and provides updated normative data for an Italian healthy population, well balanced across age, education, and gender. It will be useful to Italian researchers studying attentional functions in health and disease.

Domestic violence on children: development and validation of an instrument to evaluate knowledge of health professionals

PubMed Central

Oliveira, Lanuza Borges; Soares, Fernanda Amaral; Silveira, Marise Fagundes; de Pinho, Lucinéia; Caldeira, Antônio Prates; Leite, Maísa Tavares de Souza

2016-01-01

ABSTRACT Objective: to develop and validate an instrument to evaluate the knowledge of health professionals about domestic violence on children. Method: this was a study conducted with 194 physicians, nurses and dentists. A literature review was performed for preparation of the items and identification of the dimensions. Apparent and content validation was performed using analysis of three experts and 27 professors of the pediatric health discipline. For construct validation, Cronbach's alpha was used, and the Kappa test was applied to verify reproducibility. The criterion validation was conducted using the Student's t-test. Results: the final instrument included 56 items; the Cronbach alpha was 0.734, the Kappa test showed a correlation greater than 0.6 for most items, and the Student t-test showed a statistically significant value to the level of 5% for the two selected variables: years of education and using the Family Health Strategy. Conclusion: the instrument is valid and can be used as a promising tool to develop or direct actions in public health and evaluate knowledge about domestic violence on children. PMID:27556878
Volume 42, Issue 5 (May 2005): Developmental growth in students' concept of energy: Analysis of selected items from the TIMSS database

NASA Astrophysics Data System (ADS)

Liu, Xiufeng; McKeough, Anne

2005-05-01

The aim of this study was to develop a model of students' energy concept development. Applying Case's (1985, 1992) structural theory of cognitive development, we hypothesized that students' concept of energy undergoes a series of transitions, corresponding to systematic increases in working memory capacity. The US national sample from the Third International Mathematics and Science Study (TIMSS) database was used to test our hypothesis. Items relevant to the energy concept in the TIMSS test booklets for three populations were identified. Item difficulty from Rasch modeling was used to test the hypothesized developmental sequence, and percentage of students' correct responses was used to test the correspondence between students' age/grade level and level of the energy concepts. The analysis supported our hypothesized sequence of energy concept development and suggested mixed effects of maturation and schooling on energy concept development. Further, the results suggest that curriculum and instruction design take into consideration the developmental progression of students' concept of energy.
A knowledge-based theory of rising scores on "culture-free" tests.

PubMed

Fox, Mark C; Mitchum, Ainsley L

2013-08-01

Secular gains in intelligence test scores have perplexed researchers since they were documented by Flynn (1984, 1987). Gains are most pronounced on abstract, so-called culture-free tests, prompting Flynn (2007) to attribute them to problem-solving skills availed by scientifically advanced cultures. We propose that recent-born individuals have adopted an approach to analogy that enables them to infer higher level relations requiring roles that are not intrinsic to the objects that constitute initial representations of items. This proposal is translated into item-specific predictions about differences between cohorts in pass rates and item-response patterns on the Raven's Matrices (Flynn, 1987), a seemingly culture-free test that registers the largest Flynn effect. Consistent with predictions, archival data reveal that individuals born around 1940 are less able to map objects at higher levels of relational abstraction than individuals born around 1990. Polytomous Rasch models verify predicted violations of measurement invariance, as raw scores are found to underestimate the number of analogical rules inferred by members of the earlier cohort relative to members of the later cohort who achieve the same overall score. The work provides a plausible cognitive account of the Flynn effect, furthers understanding of the cognition of matrix reasoning, and underscores the need to consider how test-takers select item responses. PsycINFO Database Record (c) 2013 APA, all rights reserved.
The "None of the Above" Option in Multiple-Choice Testing: An Experimental Study

ERIC Educational Resources Information Center

DiBattista, David; Sinnige-Egger, Jo-Anne; Fortuna, Glenda

2014-01-01

The authors assessed the effects of using "none of the above" as an option in a 40-item, general-knowledge multiple-choice test administered to undergraduate students. Examinees who selected "none of the above" were given an incentive to write the correct answer to the question posed. Using "none of the above" as the…
The Effects of Feedback and Selected Personality Variables on Aesthetic Judgment.

ERIC Educational Resources Information Center

West, Charles K.; And Others

This study attempts to investigate the extent to which knowledge of results in various forms (true, none, and false) may modify aesthetic judgment. Seventy-two graduate students were administered an aesthetic judgment test of fifty items. On half of the test, twenty-four subjects received correct feedback and twenty-four received false…
[Egypt: Selected Readings, Egyptian Mummies, and the Egyptian Pyramid.]

ERIC Educational Resources Information Center

National Museum of Natural History, Washington, DC.

This resource packet presents information and resources on ancient Egypt. The bibliography includes readings divided into five sections: (1) "General Information" (46 items); (2) "Religion" (8 items); (3) "Art" (8 items); (4) "Hieroglyphics" (6 items); and (5) selections "For Young Readers" (11…
Test Program for Assessing Vulnerability of Industrial Equipment to Nuclear Air Blast.

DTIC Science & Technology

1983-10-01

Scientific Service, Inc., 517 East Bayshore, Redwood City, CA 94063. Work Unit 1124F. ...vulnerability, but perhaps less expensive, to be selected and substituted, with an eye to cost control. 5. MODELING AND SCALING CONSIDERATIONS Reiterating...behavior and properties of the test items and interfaces that control behavior (e.g., test objects/flow field, test objects/interfacing surface of
Development and preliminary testing of a computerized adaptive assessment of chronic pain.

PubMed

Anatchkova, Milena D; Saris-Baglama, Renee N; Kosinski, Mark; Bjorner, Jakob B

2009-09-01

The aim of this article is to report the development and preliminary testing of a prototype computerized adaptive test of chronic pain (CHRONIC PAIN-CAT) conducted in 2 stages: (1) evaluation of various item selection and stopping rules through real data-simulated administrations of CHRONIC PAIN-CAT; (2) a feasibility study of the actual prototype CHRONIC PAIN-CAT assessment system conducted in a pilot sample. Item calibrations developed from a US general population sample (N = 782) were used to program a pain severity and impact item bank (k = 45), and real data simulations were conducted to determine a CAT stopping rule. The CHRONIC PAIN-CAT was programmed on a tablet PC using QualityMetric's Dynamic Health Assessment (DYNHA) software and administered to a clinical sample of pain sufferers (n = 100). The CAT was completed in significantly less time than the static (full item bank) assessment (P < .001). On average, 5.6 items were dynamically administered by CAT to achieve a precise score. Scores estimated from the 2 assessments were highly correlated (r = .89), and both assessments discriminated across pain severity levels (P < .001, RV = .95). Patients' evaluations of the CHRONIC PAIN-CAT were favorable. This report demonstrates that the CHRONIC PAIN-CAT is feasible for administration in a clinic. The application has the potential to improve pain assessment and help clinicians manage chronic pain.
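The item selection and stopping rules mentioned in the abstract above can be illustrated with a small sketch. It is not the CHRONIC PAIN-CAT algorithm: it assumes dichotomous items under a two-parameter logistic (2PL) model, maximum-information selection, and a standard-error stopping rule, none of which the abstract specifies, and the actual item bank may well use polytomous items. All function names, the toy bank, and the thresholds are hypothetical.

```python
from math import exp, sqrt


def p_2pl(theta, a, b):
    """Probability of endorsing an item under an assumed 2PL model."""
    return 1.0 / (1.0 + exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability/severity level theta."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)


def next_item(theta, bank, administered):
    """Pick the not-yet-administered item with maximum information at theta."""
    candidates = [i for i in range(len(bank)) if i not in administered]
    return max(candidates, key=lambda i: item_information(theta, *bank[i]))


def should_stop(theta, bank, administered, se_target=0.3, max_items=10):
    """Stop when the standard error is small enough or the item budget is used."""
    info = sum(item_information(theta, *bank[i]) for i in administered)
    se = 1.0 / sqrt(info) if info > 0 else float("inf")
    out_of_items = len(administered) >= min(max_items, len(bank))
    return se <= se_target or out_of_items


# Toy bank of (discrimination a, location b) pairs; values are illustrative only.
bank = [(1.0, -2.0), (1.0, 0.0), (1.0, 2.0)]
print(next_item(0.0, bank, set()))  # index of the item located nearest theta = 0
```

In a full adaptive test the provisional estimate of theta would be updated after each response before calling `next_item` again; that estimation step is omitted here.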
Development and Validation of the Conceptual Assessment of Natural Selection (CANS)

PubMed Central

Kalinowski, Steven T.; Leonard, Mary J.; Taper, Mark L.

2016-01-01

We developed and validated the Conceptual Assessment of Natural Selection (CANS), a multiple-choice test designed to assess how well college students understand the central principles of natural selection. The expert panel that reviewed the CANS concluded its questions were relevant to natural selection and generally did a good job sampling the specific concepts they were intended to assess. Student interviews confirmed questions on the CANS provided accurate reflections of how students think about natural selection. And, finally, statistical analysis of student responses using item response theory showed that the CANS did a very good job of estimating how well students understood natural selection. The empirical reliability of the CANS was substantially higher than that of the Force Concept Inventory, a highly regarded test in physics that has a similar purpose. PMID:27856552
Health measurement using the ICF: Test-retest reliability study of ICF codes and qualifiers in geriatric care

PubMed Central

Okochi, Jiro; Utsunomiya, Sakiko; Takahashi, Tai

2005-01-01

Background The International Classification of Functioning, Disability and Health (ICF) was published by the World Health Organization (WHO) to standardize descriptions of health and disability. Little is known about the reliability and clinical relevance of measurements using the ICF and its qualifiers. This study examines the test-retest reliability of ICF codes, and the rate of immeasurability in long-term care settings of the elderly to evaluate the clinical applicability of the ICF and its qualifiers, and the ICF checklist. Methods Reliability of 85 body function (BF) items and 152 activity and participation (AP) items of the ICF was studied using a test-retest procedure with a sample of 742 elderly persons from 59 institutional and at home care service centers. Test-retest reliability was estimated using the weighted kappa statistic. The clinical relevance of the ICF was estimated by calculating immeasurability rate. The effect of the measurement settings and evaluators' experience was analyzed by stratification of these variables. The properties of each item were evaluated using both the kappa statistic and immeasurability rate to assess the clinical applicability of WHO's ICF checklist in the elderly care setting. Results The median of the weighted kappa statistics of 85 BF and 152 AP items were 0.46 and 0.55 respectively. The reproducibility statistics improved when the measurements were performed by experienced evaluators. Some chapters such as genitourinary and reproductive functions in the BF domain and major life area in the AP domain contained more items with lower test-retest reliability measures and rated as immeasurable than in the other chapters. Some items in the ICF checklist were rated as unreliable and immeasurable. Conclusion The reliability of the ICF codes when measured with the current ICF qualifiers is relatively low. 
The increase in reliability with evaluators' experience suggests that proper education will help raise reliability. The ICF checklist contains some items that are difficult to apply in geriatric care settings. Improvements should be achieved by selecting the most relevant items for each measurement and by developing appropriate qualifiers for each code according to the interest of the users. PMID:16050960
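The weighted kappa statistic used for test-retest reliability in the study above can be sketched as follows. This is an illustration under stated assumptions, not the authors' computation: the abstract does not say whether linear or quadratic disagreement weights were used, so both are offered, and the function name `weighted_kappa` and the example ratings are hypothetical.

```python
def weighted_kappa(first, second, categories, weight="linear"):
    """Cohen's weighted kappa for two ratings of the same subjects.

    `categories` lists the ordinal levels in order (for example, qualifier
    values 0 to 4). Hypothetical helper for illustration.
    """
    k = len(categories)
    if k < 2:
        raise ValueError("need at least two categories")
    index = {c: i for i, c in enumerate(categories)}
    n = len(first)

    # Observed joint proportions of (first rating, second rating).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(first, second):
        observed[index[a]][index[b]] += 1.0 / n

    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i, j):
        d = abs(i - j) / (k - 1)
        return d if weight == "linear" else d * d

    # Weighted disagreement actually observed vs. expected by chance.
    num = sum(disagreement(i, j) * observed[i][j]
              for i in range(k) for j in range(k))
    den = sum(disagreement(i, j) * row[i] * col[j]
              for i in range(k) for j in range(k))
    if den == 0:
        raise ValueError("all ratings fall in one category; kappa undefined")
    return 1.0 - num / den


# Toy test-retest ratings (not from the study).
test_scores = [0, 1, 2, 3, 4, 2]
retest_scores = [0, 1, 2, 4, 4, 1]
print(weighted_kappa(test_scores, retest_scores, categories=[0, 1, 2, 3, 4]))
```

Identical ratings on both occasions give a value of 1, and agreement no better than chance gives a value near 0.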
[Design and validation of a questionnaire for psychosocial nursing diagnosis in Primary Care].

PubMed

Brito-Brito, Pedro Ruymán; Rodríguez-Álvarez, Cristobalina; Sierra-López, Antonio; Rodríguez-Gómez, José Ángel; Aguirre-Jaime, Armando

2012-01-01

To develop a valid, reliable and easy-to-use questionnaire for a psychosocial nursing diagnosis. The study was performed in two phases: first phase, questionnaire design and construction; second phase, validity and reliability tests. A bank of items was constructed using the NANDA classification as a theoretical framework. Each item was assigned a Likert scale or dichotomous response. The combination of responses to the items constituted the diagnostic rules to assign up to 28 labels. A group of experts carried out the validity test for content. Other validated scales were used as reference standards for the criterion validity tests. Forty-five nurses provided the questionnaire to the patients on three separate occasions over a period of three weeks, and the other validated scales only once to 188 randomly selected patients in Primary Care centres in Tenerife (Spain). Validity tests for construct confirmed the six dimensions of the questionnaire with 91% of total variance explained. Validity tests for criterion showed a specificity of 66%-100%, and showed high correlations with the reference scales when the questionnaire was assigning nursing diagnoses. Reliability tests showed agreement of 56%-91% (P<.001), and a 93% internal consistency. The Questionnaire for Psychosocial Nursing Diagnosis was called CdePS, and included 61 items. The CdePS is a valid, reliable and easy-to-use tool in Primary Care centres to improve the assigning of a psychosocial nursing diagnosis. Copyright © 2011 Elsevier España, S.L. All rights reserved.
An Analysis of Saudi Arabian High School Students' Misconceptions about Physics Concepts.

NASA Astrophysics Data System (ADS)

Al-Rubayea, Abdullah A. M.

This study was conducted to explore Saudi high school students' misconceptions in selected physics concepts. It also examined the effects of gender, grade level and location of school on Saudi high school students' misconceptions. In addition, students' misconceptions in each question were analyzed further, and the correlation between students' responses, confidence in answers and sensibleness was examined. The sources of students' answers were also investigated. Finally, this study included an analysis of students' selection of reasons only in the instrument. The instrument used to detect the students' misconceptions was a modified form of the Misconception Identification in Science Questionnaire (MISQ). This instrument was developed by Franklin (1992) to detect students' misconceptions in selected physics concepts. This test is a two-tier multiple choice test that examines four areas of physics: force and motion, heat and temperature, light and color, and electricity and magnetism. This study included a sample of 1080 Saudi high school students who were randomly selected from six Saudi educational districts. This study also included both genders, the three grade levels of Saudi high schools, six different educational districts, and a city and a town in each educational district. The sample was equally divided between genders, grade levels, and educational districts. The results of this study revealed that Saudi Arabian high school students hold numerous misconceptions about selected physics concepts. They also showed that tenth grade students were significantly different from the other grades. The results also showed that different misconceptions are held by the students for each concept in the MISQ. A positive correlation between students' responses, confidence in answers and sensibleness in many questions was shown. In addition, guessing was the most dominant source of misconceptions.
The results revealed that gender and grade level had an effect on students' choice of decision on the MISQ items. A positive change in the means of gender and grade levels in the multiple choice test and gender differences in selection of reason may be associated with specific concepts. No significant differences in the frequencies of the reasons chosen by the students to justify their answers were found in most of the items (10 items).
The development of the "Cantonese receptive vocabulary test" for children aged 2-6 in Hong Kong.

PubMed

Cheung, P S; Lee, K Y; Lee, L W

1997-01-01

The study aims to develop a Cantonese receptive vocabulary test to assess 2-6-year-old children in Hong Kong. The test consists of 100 test items. Each target item is accompanied by a phonological distractor, a semantic distractor and an unrelated distractor. A sample of 609 normal children from four Maternal and Child Health Centres and nine kindergartens was selected. The results show that there is a significant effect of age on the correct score. ANOVA was performed to look at the age effect on each distractor individually. It was found that the scores of the three distractors decrease in their own patterns as age increases. With strong content validity, strong construct validity and high correlation coefficients in the split-half reliability, this test could be used as a reliable measurement for the Cantonese-speaking population in Hong Kong.
Design and development of food safety knowledge and attitude scales for consumer food safety education.

PubMed

Medeiros, Lydia C; Hillers, Virginia N; Chen, Gang; Bergmann, Verna; Kendall, Patricia; Schroeder, Mary

2004-11-01

The objective of this study was to design and develop food safety knowledge and attitude scales based on food-handling guidelines developed by a national panel of food safety experts. Knowledge (n=43) and attitude (n=49) questions were developed and pilot-tested with a variety of consumer groups. Final questions were selected based on item analysis and on validity and reliability statistical tests. Knowledge questions were tested in Washington State with participants in low-income nutrition education programs (pretest/posttest n=58, test/retest n=19) and college students (pretest/posttest n=34). Attitude questions were tested in Ohio with nutrition education program participants (n=30) and college students (non-nutrition majors n=138, nutrition majors n=57). Item analysis, paired sample t tests, Pearson's correlation coefficients, and Cronbach's alpha were used. Reliability and validity tests of individual items and the question sets were used to reduce the scales to 18 knowledge questions and 10 attitude questions. The knowledge and attitude scales covered topics ranked as important by a national panel of experts and met most validity and reliability standards. The 18-item knowledge questionnaire had instructional sensitivity (mean score increase of more than three points after instruction), internal reliability (Cronbach's alpha >.75), and produced similar results in test-retest without intervention (coefficient of stability=.81). Knowledge of correct procedures for hand washing and avoiding cross-contamination was widespread before instruction. Knowledge was limited regarding avoiding food preparation while ill, cooking hamburgers, high-risk foods, and whether cooked rice and potatoes could be stored at room temperature. The 10-item attitude scale had an appropriate range of responses (item difficulty) and produced similar results in test-retest ( P
Executive Functioning in Three Groups of Pupils in D-KEFS: Selected Issues in Adapting the Test Battery for Slovakia

ERIC Educational Resources Information Center

Ferjencík, Ján; Slavkovská, Miriam; Kresila, Juraj

2015-01-01

The paper reports on the adaptation of a D-KEFS test battery for Slovakia. Drawing on concrete examples, it describes and illustrates the key issues relating to the transfer of test items from one socio-cultural environment to another. The standardisation sample of the population of Slovak pupils in the fourth year of primary school included 250…
Normative data for the Rappel libre/Rappel indicé à 16 items (16-item Free and Cued Recall) in the elderly Quebec-French population.

PubMed

Dion, Mélissa; Potvin, Olivier; Belleville, Sylvie; Ferland, Guylaine; Renaud, Mélanie; Bherer, Louis; Joubert, Sven; Vallet, Guillaume T; Simard, Martine; Rouleau, Isabelle; Lecomte, Sarah; Macoir, Joël; Hudon, Carol

2015-01-01

Performance on verbal memory tests is generally associated with socio-demographic variables such as age, sex, and education level. Performance also varies between different cultural groups. The present study aimed to establish normative data for the Rappel libre/Rappel indicé à 16 items (16-item Free and Cued Recall; RL/RI-16), a French adaptation of the Free and Cued Selective Reminding Test (Buschke, 1984; Grober, Buschke, Crystal, Bang, & Dresner, 1988). The sample consisted of 566 healthy French-speaking older adults (50-88 years old) from the province of Quebec, Canada. Normative data for the RL/RI-16 were derived from 80% of the total sample (normative sample) and cross-validated using the remaining participants (20%; validation sample). The effects of participants' age, sex, and education level were assessed on different indices of memory performance. Results indicated that these variables were independently associated with performance. Normative data are presented as regression equations with standard deviations (symmetric distributions) and percentiles (asymmetric distributions).
Construction of an efficient evaluative instrument for myasthenia gravis: the MG composite.

PubMed

Burns, Ted M; Conaway, Mark R; Cutter, Gary R; Sanders, Donald B

2008-12-01

We assessed the performance of items from the Quantitative Myasthenia Gravis (QMG), MMT (Manual Muscle Test), and MG-ADL (Myasthenia Gravis - Activities of Daily Living) scales, using data from two recently completed treatment trials of generalized MG. Items were selected that were relevant to manifestations of MG, meaningful to both the physician and the patient, and responsive to clinical change. After the 10 items were chosen, they were weighted based on input from MG experts from around the world, considering factors such as quality of life, disease severity, risk, prognosis, validity, and reliability. The MG Composite is easy to administer, takes less than 5 minutes to complete, and requires no equipment. Weighting of the response options of the 10 items should result in ordinal scores that better represent MG status and are more responsive to meaningful clinical change. To better determine its suitability for clinical use and for treatment trials, the MG Composite will be tested prospectively at several academic medical centers and will be used as a secondary measure of efficacy in pending clinical trials of MG.
Visual search by chimpanzees (Pan): assessment of controlling relations.

PubMed Central

Tomonaga, M

1995-01-01

Three experimentally sophisticated chimpanzees (Pan), Akira, Chloe, and Ai, were trained on visual search performance using a modified multiple-alternative matching-to-sample task in which a sample stimulus was followed by the search display containing one target identical to the sample and several uniform distractors (i.e., negative comparison stimuli were identical to each other). After they acquired this task, they were tested for transfer of visual search performance to trials in which the sample was not followed by the uniform search display (odd-item search). Akira showed positive transfer of visual search performance to odd-item search even when the display size (the number of stimulus items in the search display) was small, whereas Chloe and Ai showed a transfer only when the display size was large. Chloe and Ai used some nonrelational cues such as perceptual isolation of the target among uniform distractors (so-called pop-out). In addition to the odd-item search test, various types of probe trials were presented to clarify the controlling relations in multiple-alternative matching to sample. Akira showed a decrement of accuracy as a function of the display size when the search display was nonuniform (i.e., each "distractor" stimulus was not the same), whereas Chloe and Ai showed perfect performance. Furthermore, when the sample was identical to the uniform distractors in the search display, Chloe and Ai never selected an odd-item target, but Akira selected it when the display size was large. These results indicated that Akira's behavior was controlled mainly by relational cues of target-distractor oddity, whereas an identity relation between the sample and the target strongly controlled the performance of Chloe and Ai. PMID:7714449
Development and Validation of the Spanish Numeracy Understanding in Medicine Instrument.

PubMed

Jacobs, Elizabeth A; Walker, Cindy M; Miller, Tamara; Fletcher, Kathlyn E; Ganschow, Pamela S; Imbert, Diana; O'Connell, Maria; Neuner, Joan M; Schapira, Marilyn M

2016-11-01

The Spanish-speaking population in the U.S. is large and growing and is known to have lower health literacy than the English-speaking population. Less is known about the health numeracy of this population due to a lack of health numeracy measures in Spanish. We aimed to develop and validate a short, easy-to-use measure of health numeracy for Spanish-speaking adults: the Spanish Numeracy Understanding in Medicine Instrument (Spanish-NUMi). Items were generated based on qualitative studies in English- and Spanish-speaking adults and translated into Spanish using a group translation and consensus process. Candidate items for the Spanish NUMi were selected from an eight-item validated English Short NUMi. Differential Item Functioning (DIF) analysis was conducted to evaluate equivalence between English and Spanish items. Cronbach's alpha was computed as a measure of reliability and a Pearson's correlation was used to evaluate the association between test scores and the Spanish Test of Functional Health Literacy (S-TOFHLA) and education level. Two hundred and thirty-two Spanish-speaking Chicago residents were included in the study. The study population was diverse in age, gender, and level of education and 70% reported Mexico as their country of origin. Two items of the English eight-item Short NUMi demonstrated DIF and were dropped. The resulting six-item test had a Cronbach's alpha of 0.72, a range of difficulty using classical test statistics (percent correct: 0.48 to 0.86), and adequate discrimination (item-total score correlation: 0.34-0.49). Scores were positively correlated with print literacy as measured by the S-TOFHLA (r = 0.67; p < 0.001) and varied as predicted across grade level; mean scores for up to eighth grade, ninth through twelfth grade, and some college experience or more, respectively, were 2.48 (SD ± 1.64), 4.15 (SD ± 1.45), and 4.82 (SD ± 0.37).
The Spanish NUMi is a reliable and valid measure of important numerical concepts used in communicating health information.
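The classical test statistics cited in the abstract above (proportion correct as item difficulty, item-total score correlation as discrimination) can be sketched as follows. This is an illustration, not the authors' analysis: it assumes dichotomously scored items and an uncorrected item-total correlation (the item is included in the total), which the abstract does not specify. The names `pearson` and `item_statistics` and the toy responses are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; assumes neither variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def item_statistics(responses):
    """Return (difficulty, discrimination) for each item.

    `responses` is a list of respondents, each a list of 0/1 item scores.
    Difficulty is the proportion correct; discrimination is the correlation
    between the item score and the total score.
    """
    totals = [sum(row) for row in responses]
    stats = []
    for i in range(len(responses[0])):
        item = [row[i] for row in responses]
        difficulty = sum(item) / len(item)
        discrimination = pearson(item, totals)
        stats.append((difficulty, discrimination))
    return stats


# Toy responses (not from the study): four respondents, two items.
for difficulty, discrimination in item_statistics([[1, 1], [1, 0], [0, 0], [1, 1]]):
    print(round(difficulty, 2), round(discrimination, 2))
```

A corrected variant would correlate each item with the total of the remaining items, which avoids inflating the coefficient on short tests.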
Work Functioning Among Firefighters: A Comparison Between Self-Reported Limitations and Functional Task Performance.

PubMed

MacDermid, Joy C; Tang, Kenneth; Sinden, Kathryn E; D'Amico, Robert

2018-05-25

Purpose Performance-based and disease indicators have been widely studied in firefighters; self-reported work role limitations have not. The aim of this study was to describe the distributions and correlations of a generic self-reported Work Limitations Questionnaire (WLQ-26) and firefighting-specific task performance-based tests. Methods Active firefighters from the City of Hamilton Fire Services (n = 293) were recruited. Participants completed the WLQ-26 to quantify on-the-job difficulties over five work domains: work scheduling (4 items), output demands (7 items), physical demands (8 items), mental demands (4 items), and social demands (3 items). A subset of participants (n = 149) were also assessed on hose drag and stair climb with a high-rise pack performance-based tests. Descriptive statistics and correlations were used to compare item/subscale performance; and to describe the inter-relationships between tests. Results The mean WLQ-26 item scores (/5) ranged from 4.1 to 4.4 (median = 5 for all items); most firefighters (54.5-80.5%) selected "difficult none of the time" response option on all items. A substantial ceiling effect was observed across all five WLQ-26 subscales as 44.0-55.6% were in the highest category. Subscale means ranged from 61.8 (social demands) to 78.7 (output demands and physical demands). Internal consistency exceeded 0.90 on all subscales. For the hose drag task, the mean time-to-completion was 48.0 s (SD = 14.5; range 20.4-95.0). For the stair climb task, the mean time-to-completion was 76.7 s (SD = 37.2; range 21.0-218.0). There were no significant correlations between self-report work limitations and performance of firefighting tasks. Conclusions The WLQ-26 measured five domains, but had ceiling effects in firefighters. Performance-based testing showed wider score range, lacked ceiling effects and did not correlate to the WLQ-26. 
A firefighter-specific, self-report role functioning scale may be needed to identify compromised work role capabilities in firefighters.

How to bet on a memory: developmental linkages between subjective recollection and decision making.

PubMed

Hembacher, Emily; Ghetti, Simona

2013-07-01

The current study investigated the development of subjective recollection and its role in supporting decisions in 6- and 7-year-olds, 9- and 10-year-olds, and adults (N=78). Participants encoded items and details about them. Later, they were asked to recognize the items, recall the details, and report on subjective feelings of recollection and familiarity for test items. Critically, they were required to select a subset of trials to be evaluated for the possibility of a reward. All age groups were more likely to report subjective recollection when they accurately recalled details, demonstrating an ability to introspect on subtle differences in subjective memory states, although 6- and 7-year-olds could do so reliably only for color details. However, only 9- and 10-year-olds and adults were more likely to select trials that were associated with subjective recollection, suggesting that a connection between this subjective experience and decision making emerges later during middle childhood. Copyright © 2013 Elsevier Inc. All rights reserved.
Episodic foresight beyond the very next event in 3- and 4-year-old children.

PubMed

Boden, Hannah; Labuschagne, Lisa G; Hinten, Ashley E; Scarf, Damian

2017-11-01

Testing episodic foresight in children generally involves presenting them with a problem in one location (e.g., Room A) and, after a delay spent in a different location, telling them they will be returning to Room A. Before they go, children are presented with a number of items, one of which will allow them to solve the problem in Room A. At around 3 to 4 years of age children display episodic foresight, selecting the item that will allow them to solve the problem. To date, however, no study has assessed whether 3- and 4-year-old children can plan beyond the very next event, selecting the correct item when there is a delay before returning to Room A. Here, we show that 3- and 4-year-old children can pass when a delay is imposed but that their performance is significantly worse than when they are planning for an immediate event. © 2017 Wiley Periodicals, Inc.
Validation of the Erlangen Test of Activities of Daily Living in Persons with Mild Dementia or Mild Cognitive Impairment (ETAM).

PubMed

Luttenberger, Katharina; Reppermund, Simone; Schmiedeberg-Sohn, Anke; Book, Stephanie; Graessel, Elmar

2016-05-26

There are currently no valid, fast, and easy-to-administer performance tests that are designed to assess the capacities to perform activities of daily living in persons with mild dementia and mild cognitive impairment (MCI). However, such measures are urgently needed for determining individual support needs as well as the efficacy of interventions. The aim of the present study was therefore to validate the Erlangen Test of Activities of Daily Living in Persons with Mild Dementia and Mild Cognitive Impairment (ETAM), a performance test that is based on the International Classification of Functioning and Health (ICF), which assesses the relevant domains of living in older adults with MCI and mild dementia who live independently. The 10 ICF-based items on the research version of the ETAM were tested in a final sample of 81 persons with MCI or mild dementia. The items were selected for the final version in accordance with 6 criteria: 1) all domains must be represented and have equal weight, 2) all items must load on the same factor, 3) item difficulties and item discriminatory powers, 4) convergent validity (Bayer Activities of Daily Living Scale [B-ADL]) and discriminant validity (Mini Mental State Examination [MMSE], Geriatric Depression Scale 15 [GDS-15]), 5) inter-rater reliabilities of the individual items, 6) as little material as possible. Retest reliability was also examined. Cohen's ds were calculated to determine the magnitudes of the differences in ETAM scores between participants diagnosed with different grades of severity of cognitive impairment. The final version of the ETAM consists of 6 items that cover the five ICF domains communication, mobility, self-care, domestic life (assessed by two 3-point items), and major life areas (specifically, the economic life sub-category) and load on a single factor. The maximum achievable score is 30 points (6 points per domain). 
The average administration time was 35 min, 19 of which were needed for pure item performance. The internal consistency was α = .71. The three-week test-retest reliability was r = .78, and the inter-rater reliability was r = .97. The ETAM also provided satisfactory discrimination between healthy individuals and persons with MCI or mild dementia as well as between persons with mild and moderate dementia. The 6-item final version of the ETAM shows satisfactory psychometric characteristics and can be administered quickly. It is therefore suitable for use in both clinical practice and research.
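The Cohen's d effect sizes used above to compare ETAM scores between severity groups can be sketched as follows. This is a minimal illustration assuming the common pooled-standard-deviation form for two independent groups; the abstract does not state which variant was used, and the function name `cohens_d` and the toy scores are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d for two independent groups, using the pooled sample SD."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * variance(group_a)
                  + (nb - 1) * variance(group_b)) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Toy scores (not from the study).
print(cohens_d([2, 4, 6], [1, 3, 5]))  # 0.5
```

The sign follows the order of the arguments, so swapping the groups flips it.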
The COPD-SIB: a newly developed disease-specific item bank to measure health-related quality of life in patients with chronic obstructive pulmonary disease.

PubMed

Paap, Muirne C S; Lenferink, Lonneke I M; Herzog, Nadine; Kroeze, Karel A; van der Palen, Job

2016-06-27

Health-related quality of life (HRQoL) is widely used as an outcome measure in the evaluation of treatment interventions in patients with chronic obstructive pulmonary disease (COPD). In order to address challenges associated with existing fixed-length measures (e.g., too long to be used routinely, too short to ensure both content validity and reliability), a COPD-specific item bank (COPD-SIB) was developed. Items were selected based on literature review and interviews with Dutch COPD patients, with a strong focus on both content validity and item comprehension. The psychometric quality of the item bank was evaluated using Mokken Scale Analysis and parametric Item Response Theory, using data of 666 COPD patients. The final item bank contains 46 items that form a strong scale, tapping into eight important themes that were identified based on literature review and patient interviews: Coping with disease/symptoms, adaptability; Autonomy; Anxiety about the course/end-state of the disease, hopelessness; Positive psychological functioning; Situations triggering or enhancing breathing problems; Symptoms; Activity; Impact. The 46-item COPD-SIB has good psychometric properties and content validity. Items are available in Dutch and English. The COPD-SIB can be used as a stand-alone instrument, or to inform computerised adaptive testing.
Preliminary psychometric testing of the Fox Simple Quality-of-Life Scale.

PubMed

Fox, Sherry

2004-06-01

Although quality of life is extensively defined as subjective and multidimensional with both affective and cognitive components, few instruments capture important dimensions of the construct, and few are both conceptually congruent and user friendly for the clinical setting. The aim of this study was to develop and test a measure that would be easy to use clinically and capture both cognitive and affective components of quality of life. Initial item sources for the Fox Simple Quality-of-Life Scale (FSQOLS) were literature-based. Thirty items were compiled for content validity assessment by a panel of expert healthcare clinicians from various disciplines, predominantly nursing. Five items were removed as a result of the review because they reflected negatively worded or redundant items. The 25-item scale was mailed to 177 people with lung, colon, and ovarian cancer in various stages. Cancer types were selected theoretically, based on similarity in prognosis, degree of symptom burden, and possible meaning and experience. Of the 145 participants, all provided complete data on the FSQOLS. Psychometric evaluation of the FSQOLS included item-total correlations, principal components analysis with varimax rotation revealing two factors explaining 50% variance, reliability estimation using alpha estimates, and item-factor correlations. The FSQOLS exhibited significant convergent validity with four popular quality-of-life instruments: the Ferrans and Powers Quality of Life Index, the Functional Assessment of Cancer Therapy Scale, the Short-Form-36 Health Survey, and the General Well-Being Scale. Content validity of the scale was explored and supported using qualitative interviews of 14 participants with lung, colon and ovarian cancer, who were a subgroup of the sample for the initial instrument testing.
Distinct Effects of Lexical and Semantic Competition during Picture Naming in Younger Adults, Older Adults, and People with Aphasia

PubMed Central

Britt, Allison E.; Ferrara, Casey; Mirman, Daniel

2016-01-01

Producing a word requires selecting among a set of similar alternatives. When many semantically related items become activated, the difficulty of the selection process is increased. Experiment 1 tested naming of items with either multiple synonymous labels (“Alternate Names,” e.g., gift/present) or closely semantically related but non-equivalent responses (“Near Semantic Neighbors,” e.g., jam/jelly). Picture naming was fastest and most accurate for pictures with only one label (“High Name Agreement”), slower and less accurate in the Alternate Names condition, and slowest and least accurate in the Near Semantic Neighbors condition. These results suggest that selection mechanisms in picture naming operate at two distinct levels of processing: selecting between similar but non-equivalent names requires two selection processes (semantic and lexical), whereas selecting among equivalent names only requires one selection at the lexical level. Experiment 2 examined how these selection mechanisms are affected by normal aging and found that older adults had significantly more difficulty in the Near Semantic Neighbors condition, but not in the Alternate Names condition. This suggests that aging affects semantic processing and selection more strongly than it affects lexical selection. Experiment 3 examined the role of the left inferior frontal gyrus (LIFG) in these selection processes by testing individuals with aphasia secondary to stroke lesions that either affected the LIFG or spared it. Surprisingly, there was no interaction between condition and lesion group: the presence of LIFG damage was not associated with substantively worse naming performance for pictures with multiple acceptable labels. These results are not consistent with a simple view of LIFG as the locus of lexical selection and suggest a more nuanced view of the neural basis of lexical and semantic selection. PMID:27458393
Using Systematic Item Selection Methods to Improve Universal Design of Assessments. Policy Directions. Number 18

ERIC Educational Resources Information Center

Johnstone, Christopher; Thurlow, Martha; Moore, Michael; Altman, Jason

2006-01-01

The No Child Left Behind Act of 2001 (NCLB) and other recent changes in federal legislation have placed greater emphasis on accountability in large-scale testing. Included in this emphasis are regulations that require assessments to be accessible. States are accountable for the success of all students, and tests should be designed in a way that…
Survey Response-Related Biases in Contingent Valuation: Concepts, Remedies, and Empirical Application to Valuing Aquatic Plant Management

Treesearch

Mark L. Messonnier; John C. Bergstrom; Chrisopher M. Cornwell; R. Jeff Teasley; H. Ken Cordell

2000-01-01

Simple nonresponse and selection biases that may occur in survey research such as contingent valuation applications are discussed and tested. Correction mechanisms for these types of biases are demonstrated. Results indicate the importance of testing and correcting for unit and item nonresponse bias in contingent valuation survey data. When sample nonresponse and...
Ballistic Missile Defense System (BMDS)

DTIC Science & Technology

2015-12-01

Assessment and Program Evaluation CARD - Cost Analysis Requirements Description CDD - Capability Development Document CLIN - Contract Line Item Number CPD...Estimate RDT&E - Research, Development, Test, and Evaluation SAR - Selected Acquisition Report SCP - Service Cost Position TBD - To Be Determined TY - Then...BMDS December 2015 SAR March 23, 2016 16:29:09 UNCLASSIFIED 5 Mission and Description Mission and Description To develop, test, and field a layered
A New Tool for Nutrition App Quality Evaluation (AQEL): Development, Validation, and Reliability Testing

PubMed Central

Huang, Wenhao; Chapman-Novakofski, Karen M

2017-01-01

Background The extensive availability and increasing use of mobile apps for nutrition-based health interventions makes evaluation of the quality of these apps crucial for integration of apps into nutritional counseling. Objective The goal of this research was the development, validation, and reliability testing of the app quality evaluation (AQEL) tool, an instrument for evaluating apps’ educational quality and technical functionality. Methods Items for evaluating app quality were adapted from website evaluations, with additional items added to evaluate the specific characteristics of apps, resulting in 79 initial items. Expert panels of nutrition and technology professionals and app users reviewed items for face and content validation. After recommended revisions, nutrition experts completed a second AQEL review to ensure clarity. On the basis of 150 sets of responses using the revised AQEL, principal component analysis was completed, reducing AQEL into 5 factors that underwent reliability testing, including internal consistency, split-half reliability, test-retest reliability, and interrater reliability (IRR). Two additional modifiable constructs for evaluating apps based on the age and needs of the target audience as selected by the evaluator were also tested for construct reliability. IRR testing using intraclass correlations (ICC) with all 7 constructs was conducted, with 15 dietitians evaluating one app. Results Development and validation resulted in the 51-item AQEL. These were reduced to 25 items in 5 factors after principal component analysis, plus 9 modifiable items in two constructs that were not included in principal component analysis. Internal consistency and split-half reliability of the following constructs derived from principal components analysis was good (Cronbach alpha >.80, Spearman-Brown coefficient >.80): behavior change potential, support of knowledge acquisition, app function, and skill development. 
App purpose split half-reliability was .65. Test-retest reliability showed no significant change over time (P>.05) for all but skill development (P=.001). Construct reliability was good for items assessing age appropriateness of apps for children, teens, and a general audience. In addition, construct reliability was acceptable for assessing app appropriateness for various target audiences (Cronbach alpha >.70). For the 5 main factors, ICC (1,k) was >.80, with a P value of <.05. When 15 nutrition professionals evaluated one app, ICC (2,15) was .98, with a P value of <.001 for all 7 constructs when the modifiable items were specified for adults seeking weight loss support. Conclusions Our preliminary effort shows that AQEL is a valid, reliable instrument for evaluating nutrition apps’ qualities for clinical interventions by nutrition clinicians, educators, and researchers. Further efforts in validating AQEL in various contexts are needed. PMID:29079554
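The split-half reliability with Spearman-Brown correction reported in this abstract can be sketched as follows. This is a minimal illustration, assuming an odd/even item split (the abstract does not say how the halves were formed). The ratings are invented, and the helper names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman_brown(r_half):
    """Step the half-test correlation up to full-test length."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(responses):
    """responses: one list of item scores per evaluator (odd/even split assumed)."""
    half_1 = [sum(row[0::2]) for row in responses]
    half_2 = [sum(row[1::2]) for row in responses]
    return spearman_brown(pearson_r(half_1, half_2))


# Hypothetical ratings from 5 evaluators on a 4-item construct.
ratings = [[4, 5, 4, 5], [2, 3, 2, 2], [5, 5, 4, 4], [3, 3, 3, 4], [1, 2, 2, 1]]
print(round(split_half_reliability(ratings), 2))
```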
Can We Retrieve the Information Which Was Intentionally Forgotten? Electrophysiological Correlates of Strategic Retrieval in Directed Forgetting.

PubMed

Mao, Xinrui; Tian, Mengxi; Liu, Yi; Li, Bingcan; Jin, Yan; Wu, Yanhong; Guo, Chunyan

2017-01-01

The retrieval inhibition hypothesis of directed forgetting effects assumes that TBF (to-be-forgotten) items are not retrieved intentionally, while the selective rehearsal hypothesis assumes that the memory representation of retrieved TBF items is weaker than that of TBR (to-be-remembered) items. Previous studies indicated that directed forgetting effects in the item-cueing method resulted from selective rehearsal at encoding, but the mechanism by which retrieval inhibition affected directed forgetting of TBF items was not clear. Strategic retrieval is a control process allowing the selective retrieval of target information, which includes retrieval orientation and strategic recollection. Retrieval orientation, assessed via the comparison of tasks, refers to the specific form of processing resulting from retrieval efforts. Strategic recollection is the type of strategy used to recollect studied items for the successful retrieval of targets. Using a "directed forgetting" paradigm combined with a memory exclusion task, our investigation of strategic retrieval in directed forgetting helped to explore how retrieval inhibition played a role in directed forgetting effects. When TBF items were targeted, retrieval orientation showed more positive ERPs to new items, indicating that TBF items demanded more retrieval effort. The results of strategic recollection indicated that: (a) when TBR items were retrieval targets, late parietal old/new effects were evoked only by TBR items and not by TBF items, indicating the retrieval inhibition of TBF items; (b) when TBF items were retrieval targets, the late parietal old/new effect was evoked by both TBR items and TBF items, indicating that strategic retrieval could overcome retrieval inhibition of TBF items. These findings suggest that strategic retrieval modulates the retrieval inhibition of directed forgetting, supporting the view that directed forgetting effects are caused not only by selective rehearsal but also by retrieval inhibition.
Can We Retrieve the Information Which Was Intentionally Forgotten? Electrophysiological Correlates of Strategic Retrieval in Directed Forgetting

PubMed Central

Mao, Xinrui; Tian, Mengxi; Liu, Yi; Li, Bingcan; Jin, Yan; Wu, Yanhong; Guo, Chunyan

2017-01-01

The retrieval inhibition hypothesis of directed forgetting effects assumes that TBF (to-be-forgotten) items are not retrieved intentionally, while the selective rehearsal hypothesis assumes that the memory representation of retrieved TBF items is weaker than that of TBR (to-be-remembered) items. Previous studies indicated that directed forgetting effects in the item-cueing method resulted from selective rehearsal at encoding, but the mechanism by which retrieval inhibition affected directed forgetting of TBF items was not clear. Strategic retrieval is a control process allowing the selective retrieval of target information, which includes retrieval orientation and strategic recollection. Retrieval orientation, assessed via the comparison of tasks, refers to the specific form of processing resulting from retrieval efforts. Strategic recollection is the type of strategy used to recollect studied items for the successful retrieval of targets. Using a “directed forgetting” paradigm combined with a memory exclusion task, our investigation of strategic retrieval in directed forgetting helped to explore how retrieval inhibition played a role in directed forgetting effects. When TBF items were targeted, retrieval orientation showed more positive ERPs to new items, indicating that TBF items demanded more retrieval effort. The results of strategic recollection indicated that: (a) when TBR items were retrieval targets, late parietal old/new effects were evoked only by TBR items and not by TBF items, indicating the retrieval inhibition of TBF items; (b) when TBF items were retrieval targets, the late parietal old/new effect was evoked by both TBR items and TBF items, indicating that strategic retrieval could overcome retrieval inhibition of TBF items. These findings suggest that strategic retrieval modulates the retrieval inhibition of directed forgetting, supporting the view that directed forgetting effects are caused not only by selective rehearsal but also by retrieval inhibition. PMID:28900411
Concept development of X-ray mass thickness detection for irradiated items upon electron beam irradiation processing

NASA Astrophysics Data System (ADS)

Qin, Huaili; Yang, Guang; Kuang, Shan; Wang, Qiang; Liu, Jingjing; Zhang, Xiaomin; Li, Cancan; Han, Zhiwei; Li, Yuanjing

2018-02-01

The present project adopts the principle and technology of X-ray imaging to quickly measure the mass thickness of irradiated items (mass thickness = density of the item × thickness of the item) and thus to determine whether the packaging size and internal placement of an item meet the thickness requirements for electron beam irradiation processing. The development of the algorithm for the X-ray mass thickness detector, as well as the prediction of dose distribution, has been completed. The algorithm was based on X-ray attenuation. Four standard modules (Al sheet, Al ladders, PMMA sheet, and PMMA ladders) were selected for the algorithm development. The algorithm was optimized until the error between the tested mass thickness and the standard mass thickness was less than 5%. The dose distribution at all energies (1-10 MeV) for each mass thickness was obtained using the Monte Carlo method and used for the analysis of dose distribution, which indicates whether the item will be penetrated, as well as the maximum dose, minimum dose, and DUR of the whole item.
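The relationship between X-ray attenuation and mass thickness can be sketched under a simplifying assumption. The abstract does not describe the detector algorithm itself, so the example below assumes a monoenergetic beam obeying the Beer-Lambert law, I = I0 · exp(−(μ/ρ) · ρt). The mass attenuation coefficient, intensities, and function names are illustrative values chosen for this sketch, not figures from the study.

```python
import math


def mass_thickness_from_attenuation(i0, i, mass_atten_coeff):
    """Mass thickness (g/cm^2) from I = I0 * exp(-(mu/rho) * rho*t).

    Assumes a monoenergetic beam; mass_atten_coeff is mu/rho in cm^2/g.
    """
    if not 0 < i <= i0:
        raise ValueError("transmitted intensity must be in (0, i0]")
    return math.log(i0 / i) / mass_atten_coeff


def relative_error(measured, reference):
    """Relative deviation of a measured value from its reference."""
    return abs(measured - reference) / reference


MU_RHO = 0.28          # cm^2/g, illustrative value only
DENSITY = 2.70         # g/cm^3, roughly aluminium
THICKNESS = 1.0        # cm, hypothetical sheet
reference = DENSITY * THICKNESS  # mass thickness = density x thickness

# Simulate the transmitted intensity, then invert it.
incident = 1000.0
transmitted = incident * math.exp(-MU_RHO * reference)
estimate = mass_thickness_from_attenuation(incident, transmitted, MU_RHO)
print(relative_error(estimate, reference) < 0.05)  # 5% criterion from the abstract
```

A real detector sees a polychromatic spectrum, so a calibration against standard modules, as the abstract describes, would replace the single coefficient used here.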
The ELPAT living organ donor Psychosocial Assessment Tool (EPAT): from 'what' to 'how' of psychosocial screening - a pilot study.

PubMed

Massey, Emma K; Timmerman, Lotte; Ismail, Sohal Y; Duerinckx, Nathalie; Lopes, Alice; Maple, Hannah; Mega, Inês; Papachristou, Christina; Dobbels, Fabienne

2018-01-01

Thorough psychosocial screening of donor candidates is required in order to minimize potential negative consequences and to strive for optimal safety within living donation programmes. We aimed to develop an evidence-based tool to standardize the psychosocial screening process. Key concepts of psychosocial screening were used to structure our tool: motivation and decision-making, personal resources, psychopathology, social resources, ethical and legal factors and information and risk processing. We (i) discussed how each item per concept could be measured, (ii) reviewed and rated available validated tools, (iii) where necessary developed new items, (iv) assessed content validity and (v) pilot-tested the new items. The resulting ELPAT living organ donor Psychosocial Assessment Tool (EPAT) consists of a selection of validated questionnaires (28 items in total), a semi-structured interview (43 questions) and a Red Flag Checklist. We outline optimal procedures and conditions for implementing this tool. The EPAT and user manual are available from the authors. Use of this tool will standardize the psychosocial screening procedure ensuring that no psychosocial issues are overlooked and ensure that comparable selection criteria are used and facilitate generation of comparable psychosocial data on living donor candidates. © 2017 Steunstichting ESOT.
Dimensional analyses of frontal posed smile attractiveness in Japanese female patients.

PubMed

Hata, Kyoko; Arai, Kazuhito

2016-01-01

To identify appropriate dimensional items in objective diagnostic analysis for attractiveness of frontal posed smile in Japanese female patients by comparing with the result of human judgments. Photographs of frontal posed smiles of 100 Japanese females after orthodontic treatment were evaluated by 20 dental students (10 males and 10 females) using a visual analogue scale (VAS). The photographs were ranked based on the VAS evaluations and the 25 photographs with the highest evaluations were selected as group A, and the 25 photos with the lowest evaluations were designated group B. Then 12 dimensional items of objective analysis selected from a literature review were measured. Means and standard deviations for measurements of the dimensional items were compared between the groups using the unpaired t-test with a significance level of P < .05. Mean values were significantly smaller in group A than in group B for interlabial gap, intervermilion distance, maxillary gingival display, maximum incisor exposure, and lower lip to incisor (P < .05). Significant differences were observed only in the vertical dimension, not in the transverse dimension. Five of the 12 objective diagnostic items were correlated with human judgments of the attractiveness of frontal posed smile in Japanese females after orthodontic treatment.
Minimum Sample Size Requirements for Mokken Scale Analysis

ERIC Educational Resources Information Center

Straat, J. Hendrik; van der Ark, L. Andries; Sijtsma, Klaas

2014-01-01

An automated item selection procedure in Mokken scale analysis partitions a set of items into one or more Mokken scales, if the data allow. Two algorithms are available that pursue the same goal of selecting Mokken scales of maximum length: Mokken's original automated item selection procedure (AISP) and a genetic algorithm (GA). Minimum…
Head and neck cancer-specific quality of life: instrument validation.

PubMed

Terrell, J E; Nanavati, K A; Esclamado, R M; Bishop, J K; Bradford, C R; Wolf, G T

1997-10-01

The disfigurement and dysfunction associated with head and neck cancer affect emotional well-being and some of the most basic functions of life. Most cancer-specific quality-of-life assessments give a single composite score for head and neck cancer-related quality of life. To develop and evaluate an improved multidimensional instrument to assess head and neck cancer-related functional status and well-being. The item selection process included literature review, interviews with health care workers, and patient surveys. A survey with 37 disease-specific questions and the SF-12 survey were administered to 253 patients in 3 large medical centers. Factor analysis was performed to identify disease-specific domains. Domain scores were calculated as the standardized score of the component items. These domains were assessed for construct validity based on clinical hypotheses and test-retest reliability. Four relevant domains were identified: Eating (6 items), Communication (4 items), Pain (4 items), and Emotion (6 items). Each had an internal consistency (Cronbach alpha value) of greater than 0.80. Construct validity was demonstrated by moderate correlations with the SF-12 Physical and Mental component scores (r=0.43-0.60). Test-retest reliability for each domain demonstrated strong reliability between the 2 time points. Correlations were strong for each individual question, ranging from 0.53 to 0.93. Construct validity testing demonstrated that the direction of differences for each domain were as hypothesized. The Head and Neck Quality of Life questionnaire is a promising multidimensional tool with which to assess head and neck cancer-specific quality of life.
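The internal consistency statistic cited in this abstract (Cronbach alpha) follows a standard formula, α = k/(k−1) · (1 − Σ item variances / variance of total score). A minimal sketch is given below. The responses are invented and the function name is hypothetical. Nothing here reproduces the study's data.

```python
import statistics


def cronbach_alpha(responses):
    """responses: one list of k item scores per respondent."""
    k = len(responses[0])
    item_variances = [
        statistics.variance([row[j] for row in responses]) for j in range(k)
    ]
    total_variance = statistics.variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses from 5 patients to a 4-item domain.
domain_items = [
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [5, 5, 4, 4],
    [3, 3, 3, 4],
    [1, 2, 2, 1],
]
print(round(cronbach_alpha(domain_items), 2))
```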
Asymmetric effects of emotion on mnemonic interference

PubMed Central

Leal, Stephanie L.; Tighe, Sarah K.; Yassa, Michael A.

2014-01-01

Emotional experiences can strengthen memories so that they can be used to guide future behavior. Emotional arousal, mediated by the amygdala, is thought to modulate storage by the hippocampus, which may encode unique episodic memories via pattern separation – the process by which similar memories are stored using non-overlapping representations. While prior work has examined mnemonic interference due to similarity and emotional modulation of memory independently, examining the mechanisms by which emotion influences mnemonic interference has not been previously accomplished in humans. To this end, we developed an emotional memory task where emotional content and stimulus similarity were varied to examine the effect of emotion on fine mnemonic discrimination (a putative behavioral correlate of hippocampal pattern separation). When tested immediately after encoding, discrimination was reduced for similar emotional items compared to similar neutral items, consistent with a reduced bias towards pattern separation. After 24 h, recognition of emotional target items was preserved compared to neutral items, whereas similar emotional item discrimination was further diminished. This suggests a potential mechanism for the emotional modulation of memory with a selective remembering of gist, as well as a selective forgetting of detail, indicating an emotion-induced reduction in pattern separation. This can potentially increase the effective signal-to-noise ratio in any given situation to promote survival. Furthermore, we found that individuals with depressive symptoms hyper-discriminate negative items, which correlated with their symptom severity. This suggests that utilizing mnemonic discrimination paradigms allows us to tease apart the nuances of disorders with aberrant emotional mnemonic processing. PMID:24607286
Inductive Selectivity in Children’s Cross-classified Concepts

PubMed Central

Nguyen, Simone P.

2012-01-01

Cross-classified items pose an interesting challenge to children’s induction since these items belong to many different categories, each of which may serve as a basis for a different type of inference. Inductive selectivity is the ability to appropriately make different types of inferences about a single cross-classifiable item based on its different category memberships. This research includes five experiments that examine the development of inductive selectivity in 3-, 4-, and 5-year-olds (N = 272). Overall, the results show that by age 4 years, children have inductive selectivity with taxonomic and script categories. That is, children use taxonomic categories to make biochemical inferences about an item whereas children use script categories to make situational inferences about an item. PMID:22803510
Development of the Facial Skin Care Index: A Health-Related Outcomes Index for Skin Cancer Patients

PubMed Central

Matthews, B. Alex; Rhee, John S.; Neuburg, Marcy; Burzynski, Mary L.; Nattinger, Ann B.

2006-01-01

BACKGROUND Existing health-related quality-of-life (HRQOL) tools do not appear to capture patients' specific skin cancer concerns. OBJECTIVE To describe the conceptual foundation, item generation, reduction process, and reliability testing for the Facial Skin Cancer Index (FSCI), a HRQOL outcomes tool for skin cancer researchers and clinicians. METHODS Participants in Phases I to III consisted of adult patients (N = 134) diagnosed with biopsy-proven nonmelanoma cervicofacial skin cancer. Data were collected via self-report surveys and clinical records. RESULTS Seventy-one distinct items were generated in Phase I and rated for their importance by an independent sample during Phase II; 36 items representing six theoretical HRQOL domains were retained. Test–retest I results indicated that four subscales showed adequate reliability coefficients (α = 0.60 to 0.91). Twenty-six items remained for test–retest II. Results indicated excellent internal consistency for emotional, social, appearance, and modified financial/work subscales (range 0.79 to 0.95); test–retest correlation coefficients were consistent across time (range 0.81 to 0.97; lifestyle omitted). CONCLUSION Pretesting afforded the opportunity to select items that optimally met our a priori conceptual and psychometric criteria for high data quality. Phase IV testing (validity and sensitivity before surgery and 4 months after Mohs micrographic surgery) for the 20-item FSCI is under way. PMID:16875475

Analysis of single items on the Self-Esteem and Relationship questionnaire in men treated with sildenafil citrate for erectile dysfunction: results of two double-blind, placebo-controlled trials.

PubMed

Cappelleri, Joseph C; Althof, Stanley E; O'Leary, Michael P; Tseng, Li-Jung

2008-04-01

To evaluate the effect of sildenafil citrate on each item of the 14-item Self-Esteem And Relationship (SEAR) questionnaire, which is used to measure self-esteem, confidence, satisfaction with sexual relationship, and overall relationship satisfaction in men with erectile dysfunction (ED). Data were combined from two 12-week, double-blind, placebo-controlled, flexible-dose sildenafil trials having identical protocols, one conducted in the USA and the other in Mexico, Brazil, Australia and Japan. All men had ED and were aged ≥18 years. Response categories of each SEAR item used a 4-week reference period and were based on a five-point scale (1, almost never/never; 2, a few times; 3, sometimes; 4, most times; 5, almost always/always). The difference (sildenafil vs placebo) in the change from baseline to week 12 was evaluated with a Wilcoxon rank sum test using ridit analysis, and an analysis of covariance model that included treatment group, centre, study and baseline item score. Compared with the 274 patients receiving placebo, the 279 receiving sildenafil reported significantly greater mean and median improvements (P < 0.001) in each of the 14 SEAR items. The probability of increased psychosocial benefit from baseline to week 12 was higher with sildenafil for each SEAR item, and ranged from 0.60 ('My partner was unhappy with the quality of our sexual relations' [item reverse-scored]) to 0.72 ('I was satisfied with my sexual performance'). Across all items, the mean (sd) probability was 0.67 (0.04) that a randomly selected patient in the sildenafil group would have a more favourable change relative to a randomly selected patient in the placebo group. Sildenafil produced substantial and meaningful improvements at the item-specific level. This analysis complements previously published work on self-esteem, confidence and relationship satisfaction.
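The probability that a randomly selected treated patient shows a more favourable change than a randomly selected placebo patient can be estimated by comparing all cross-group pairs. The sketch below counts ties as half a win, which is an assumption about how the ridit-based analysis handled tied responses. The change scores are invented and the function name is hypothetical.

```python
def probability_of_superiority(treatment, control):
    """Estimate P(T > C) + 0.5 * P(T == C) over all cross-group pairs."""
    wins = 0
    ties = 0
    for t in treatment:
        for c in control:
            if t > c:
                wins += 1
            elif t == c:
                ties += 1
    return (wins + 0.5 * ties) / (len(treatment) * len(control))


# Hypothetical changes from baseline on a five-point item, NOT trial data.
treated_change = [1, 2, 3]
placebo_change = [0, 1, 2]
print(round(probability_of_superiority(treated_change, placebo_change), 2))
```

A value of 0.5 indicates no difference between groups, so the reported range of 0.60 to 0.72 corresponds to a consistent advantage for the treated group.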
Testing and Selection of Fire-Resistant Materials for Spacecraft Use

NASA Technical Reports Server (NTRS)

Friedman, Robert; Jackson, Brian; Olson, Sandra

2000-01-01

Spacecraft fire-safety strategy emphasizes prevention, mostly through the selection of onboard items classified according to their fire resistance. The principal NASA acceptance tests described in this paper assess the flammability of materials and components under "worst-case" normal-gravity conditions of upward flame spread in controlled-oxygen atmospheres. Tests conducted on the ground, however, cannot duplicate the unique fire characteristics in the nonbuoyant low-gravity environment of orbiting spacecraft. Research shows that flammability and fire-spread rates in low gravity are sensitive to forced convection (ventilation flows) and atmospheric-oxygen concentration. These research results are helping to define new material-screening test methods that will better evaluate material performance in spacecraft.
Promising Areas for Psychometric Research.

ERIC Educational Resources Information Center

Angoff, William H.

1988-01-01

An overview of four papers on useful future directions for psychometric research is provided. The papers were drawn from American Psychological Association symposia; they cover the nature of general intelligence, item bias and selection, cut scores, equating problems, computer-adaptive testing, and individual and group achievement measurement.…
Developing a situational judgment test blueprint for assessing the non-cognitive skills of applicants to the University of Utah School of Medicine, the United States

PubMed Central

2015-01-01

Purpose: The situational judgment test (SJT) shows promise for assessing the non-cognitive skills of medical school applicants, but has only been used in Europe. Since the admissions processes and education levels of applicants to medical school are different in the United States and in Europe, it is necessary to obtain validity evidence of the SJT based on a sample of United States applicants. Methods: Ninety SJT items were developed and Kane’s validity framework was used to create a test blueprint. A total of 489 applicants selected for assessment/interview day at the University of Utah School of Medicine during the 2014-2015 admissions cycle completed one of five SJTs, which assessed professionalism, coping with pressure, communication, patient focus, and teamwork. Item difficulty, each item’s discrimination index, internal consistency, and the categorization of items by two experts were used to create the test blueprint. Results: The majority of item scores were within an acceptable range of difficulty, as measured by the difficulty index (0.50-0.85) and had fair to good discrimination. However, internal consistency was low for each domain, and 63% of items appeared to assess multiple domains. The concordance of categorization between the two educational experts ranged from 24% to 76% across the five domains. Conclusion: The results of this study will help medical school admissions departments determine how to begin constructing a SJT. Further testing with a more representative sample is needed to determine if the SJT is a useful assessment tool for measuring the non-cognitive skills of medical school applicants. PMID:26582629
Comparison between three option, four option and five option multiple choice question tests for quality parameters: A randomized study.

PubMed

Vegada, Bhavisha; Shukla, Apexa; Khilnani, Ajeetkumar; Charan, Jaykaran; Desai, Chetna

2016-01-01

Most academic teachers use four or five options per item in multiple choice question (MCQ) tests for formative and summative assessment. The optimal number of options in an MCQ item is a matter of considerable debate among academic teachers of various educational fields. There is a scarcity of published literature regarding the optimum number of options in each MCQ item in the field of medical education. The aim was to compare three-option, four-option, and five-option MCQ tests on the quality parameters of reliability, validity, item analysis, distracter analysis, and time analysis. Participants were 3rd semester M.B.B.S. students. Students were divided randomly into three groups. Each group was randomly given one MCQ test set out of the three-option, four-option, and five-option versions. Following the marking of the multiple choice tests, the participants' option selections were analyzed and the three option formats were compared on mean marks, mean time, validity, reliability, facility value, discrimination index, point biserial value, and distracter analysis. Students scored more (P = 0.000) and took less time (P = 0.009) to complete the three-option test as compared to the four-option and five-option groups. Facility value was higher (P = 0.004) in the three-option group as compared to the four- and five-option groups. There was no significant difference between the three groups for validity, reliability, and item discrimination. Nonfunctioning distracters were more common in the four- and five-option groups as compared to the three-option group. Assessment based on three-option MCQs can be preferred over four-option and five-option MCQs.
The Child-care Food and Activity Practices Questionnaire (CFAPQ): development and first validation steps.

PubMed

Gubbels, Jessica S; Sleddens, Ester Fc; Raaijmakers, Lieke Ch; Gies, Judith M; Kremers, Stef Pj

2016-08-01

To develop and validate a questionnaire to measure food-related and activity-related practices of child-care staff, based on existing, validated parenting practices questionnaires. A selection of items from the Comprehensive Feeding Practices Questionnaire (CFPQ) and the Preschooler Physical Activity Parenting Practices (PPAPP) questionnaire was made to include items most suitable for the child-care setting. The converted questionnaire was pre-tested among child-care staff during cognitive interviews and pilot-tested among a larger sample of child-care staff. Factor analyses with Varimax rotation and internal consistencies were used to examine the scales. Spearman correlations, t tests and ANOVA were used to examine associations between the scales and staff's background characteristics (e.g. years of experience, gender). Child-care centres in the Netherlands. The qualitative pre-test included ten child-care staff members. The quantitative pilot test included 178 child-care staff members. The new questionnaire, the Child-care Food and Activity Practices Questionnaire (CFAPQ), consists of sixty-three items (forty food-related and twenty-three activity-related items), divided over twelve scales (seven food-related and five activity-related scales). The CFAPQ scales are to a large extent similar to the original CFPQ and PPAPP scales. The CFAPQ scales show sufficient internal consistency with Cronbach's α ranging between 0·53 and 0·96, and average corrected item-total correlations within acceptable ranges (0·30-0·89). Several of the scales were significantly associated with child-care staff's background characteristics. Scale psychometrics of the CFAPQ indicate it is a valid questionnaire that assesses child-care staff's practices related to both food and activities.
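The internal consistency statistic reported above, Cronbach's α, can be sketched as follows. This is a minimal illustration of the standard formula, α = k/(k − 1) · (1 − Σ item variances / variance of total scores), using sample variances; the function name and the toy data are hypothetical and are not taken from the CFAPQ study.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items score matrix.

    scores: list of rows, one row per respondent, one column per item.
    Uses sample variances (n - 1 denominator) throughout.
    """
    k = len(scores[0])
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Two perfectly consistent items give alpha = 1.
consistent = [[1, 1], [2, 2], [3, 3]]
alpha_consistent = cronbach_alpha(consistent)

# Partly inconsistent items give a lower alpha.
mixed = [[1, 2], [2, 1], [3, 3]]
alpha_mixed = cronbach_alpha(mixed)
```

Each scale would be passed in separately, so a twelve-scale questionnaire yields twelve α values.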
Testing primary-school children's understanding of the nature of science.

PubMed

Koerber, Susanne; Osterhaus, Christopher; Sodian, Beate

2015-03-01

Understanding the nature of science (NOS) is a critical aspect of scientific reasoning, yet few studies have investigated its developmental beginnings and initial structure. One contributing reason is the lack of an adequate instrument. Two studies assessed NOS understanding among third graders using a multiple-select (MS) paper-and-pencil test. Study 1 investigated the validity of the MS test by presenting the items to 68 third graders (9-year-olds) and subsequently interviewing them on their underlying NOS conception of the items. All items were significantly related between formats, indicating that the test was valid. Study 2 applied the same instrument to a larger sample of 243 third graders, and their performance was compared to a multiple-choice (MC) version of the test. Although the MC format inflated the guessing probability, there was a significant relation between the two formats. In summary, the MS format was a valid method revealing third graders' NOS understanding, thereby representing an economical test instrument. A latent class analysis identified three groups of children with expertise in qualitatively different aspects of NOS, suggesting that there is not a single common starting point for the development of NOS understanding; instead, multiple developmental pathways may exist. © 2014 The British Psychological Society.
Limitations of the Neurological Evolutional Exam (ENE) as a motor assessment for first graders.

PubMed

Caçola, Priscila M; Bobbio, Tatiana G; Arias, Amabile V; Gonçalves, Vanda G; Gabbard, Carl

2010-01-01

Many clinicians and researchers in Brazil consider the Neurological Developmental Exam (NDE) a valid and reliable assessment for Brazilian school-aged children. However, since its inception, several tests have emerged that, according to some researchers, provide more in-depth evaluation of motor ability and go beyond the detection of general motor status (soft neurological signs). The objective was to highlight the limitations of the NDE as a motor skill assessment for first graders. Thirty-five children were compared on seven selected items of the NDE, seven of the Bruininks-Oseretsky Test (BOT), and seven of the Visual-Motor Integration test (VMI). Participants received a "pass" or "fail" score for each item, as prescribed by the respective test manual. Chi-square and ANOVA results indicated that the vast majority of children (74%) passed the NDE items, whereas values for the other tests were 29% (BOT) and 20% (VMI). Analysis of specific categories (e.g. visual, fine, and gross motor coordination) revealed a similar outcome. Our data suggest that while the NDE may be a valid and reliable test for the detection of general motor status, its use as a diagnostic/remedial tool for identifying motor ability is questionable. One of our recommendations is the consideration of a revised NDE in light of the current needs of clinicians and researchers.
Design and development of a meal system for the elderly. [public health - nutrition/diet]

NASA Technical Reports Server (NTRS)

1975-01-01

Food preference surveys (taste tests) were performed for 95 food items (which were selected from an original list of 150 items), and 21 menus were developed from the survey results. Each menu contains an entree, two side dishes, dessert, and a beverage. Food manufacturing specifications for freeze dried foods, frozen foods, and beverages are examined, and product labeling and packaging requirements are discussed. The nutritional value of the various foods is listed in tabular form, and sample product labels are shown. Cost estimates per serving are also included.
Design Criteria for Controlling Stress Corrosion Cracking

NASA Technical Reports Server (NTRS)

Franklin, D. B.

1987-01-01

This document sets forth the criteria to be used in the selection of materials for space vehicles and associated equipment and facilities so that failure resulting from stress corrosion will be prevented. The requirements established herein apply to all metallic components proposed for use in space vehicles and other flight hardware, ground support equipment, and facilities for testing. These requirements are applicable not only to items designed and fabricated by MSFC (Marshall Space Flight Center) and its prime contractors, but also to items supplied to the prime contractor by subcontractors and vendors.
Children's Learning from Broadcast Television: The Relationship between the Amount of Time a Child Watches Television with and without Adults and That Child's Learning from Television.

ERIC Educational Resources Information Center

Storm, Susan Ruotsala

A study examined young children's learning from selected television program content in varied subject matter and the relationship between that learning and the amount of time a child watches television with and without adults. A 28-item learning test based on instructional design principles was developed from selected television segments and…
Contamination by ten harmful elements in toys and children's jewelry bought on the North American market.

PubMed

Guney, Mert; Zagury, Gerald J

2013-06-04

Toys and children's jewelry may contain metals to which children can be orally exposed. The objectives of this research were (1) to determine total concentrations (TC's) of As, Ba, Cd, Cr, Cu, Mn, Ni, Pb, Sb, and Se in toys and jewelry (n = 72) bought on the North American market and compare TC's to regulatory limits, and (2) to estimate oral metal bioavailability in selected items (n = 4) via bioaccessibility testing. For metallic toys and children's jewelry (n = 24) 20 items had TC's exceeding migratable concentration limits (European Union). Seven of seventeen jewelry items did not comply with TC limits in U.S. and Canadian regulations. Samples included articles with very high Cd (37% [w/w]), Pb (65%), and Cu (71%) concentrations. For plastic toys (n = 18), toys with paint or coating (n = 12), and brittle or pliable toys (n = 18), TC's were below the EU migration limits (except in one toy for each category). Bioaccessibility tests showed that a tested jewelry item strongly leached Pb (gastric: 698 μg, intestinal: 705 μg) and some Cd (1.38 and 1.42 μg). Especially in metallic toys and jewelry, contamination by Pb and Cd, and to a lesser extent by Cu, Ni, As, and Sb, still poses an acute problem in North America.
Development and validation of a questionnaire to evaluate patient satisfaction with diabetes disease management.

PubMed

Paddock, L E; Veloski, J; Chatterton, M L; Gevirtz, F O; Nash, D B

2000-07-01

To develop a reliable and valid questionnaire to measure patient satisfaction with diabetes disease management programs. Questions related to structure, process, and outcomes were categorized into 14 domains defining the essential elements of diabetes disease management. Health professionals confirmed the content validity. Face validity was established by a patient focus group. The questionnaire was mailed to 711 patients with diabetes who participated in a disease management program. To reduce the number of questionnaire items, a principal components analysis was performed using a varimax rotation. The scree test was used to select significant components. To further assess reliability and validity, Cronbach's alpha and product-moment correlations were calculated for components having ≥3 items with loadings >0.50. The validated 73-item mailed satisfaction survey had a 34.1% response rate. Principal components analysis yielded 13 components with eigenvalues >1.0. The scree test proposed a 6-component solution (39 items), which explained 59% of the total variation. Internal consistency reliabilities computed for the first 6 components (alpha = 0.79-0.95) were acceptable. The final questionnaire, the Diabetes Management Evaluation Tool (DMET), was designed to assess patient satisfaction with diabetes disease management programs. Although more extensive testing of the questionnaire is appropriate, preliminary reliability and validity of the DMET have been demonstrated.
Improving Inpatient Surveys: Web-Based Computer Adaptive Testing Accessed via Mobile Phone QR Codes

PubMed Central

2016-01-01

Background: The National Health Service (NHS) 70-item inpatient questionnaire surveys inpatients on their perceptions of their hospitalization experience. However, it imposes more burden on the patient than other similar surveys. The literature shows that computerized adaptive testing (CAT) based on item response theory can help shorten the item length of a questionnaire without compromising its precision. Objective: Our aim was to investigate whether CAT can be (1) efficient with item reduction and (2) used with quick response (QR) codes scanned by mobile phones. Methods: After downloading the 2008 inpatient survey data from the Picker Institute Europe website and analyzing the difficulties of this 70-item questionnaire, we used an author-made Excel program using the Rasch partial credit model to simulate 1000 patients’ true scores followed by a standard normal distribution. The CAT was compared to two other scenarios of answering all items (AAI) and the randomized selection method (RSM), as we investigated item length (efficiency) and measurement accuracy. The author-made Web-based CAT program for gathering patient feedback was effectively accessed from mobile phones by scanning the QR code. Results: We found that the CAT can be more efficient for patients answering questions (ie, fewer items to respond to) than either AAI or RSM without compromising its measurement accuracy. A Web-based CAT inpatient survey accessed by scanning a QR code on a mobile phone was viable for gathering inpatient satisfaction responses. Conclusions: With advances in technology, patients can now be offered alternatives for providing feedback about hospitalization satisfaction. This Web-based CAT is a possible option in health care settings for reducing the number of survey items, as well as offering an innovative QR code access. PMID:26935793
Improving Inpatient Surveys: Web-Based Computer Adaptive Testing Accessed via Mobile Phone QR Codes.

PubMed

Chien, Tsair-Wei; Lin, Weir-Sen

2016-03-02

The National Health Service (NHS) 70-item inpatient questionnaire surveys inpatients on their perceptions of their hospitalization experience. However, it imposes more burden on the patient than other similar surveys. The literature shows that computerized adaptive testing (CAT) based on item response theory can help shorten the item length of a questionnaire without compromising its precision. Our aim was to investigate whether CAT can be (1) efficient with item reduction and (2) used with quick response (QR) codes scanned by mobile phones. After downloading the 2008 inpatient survey data from the Picker Institute Europe website and analyzing the difficulties of this 70-item questionnaire, we used an author-made Excel program using the Rasch partial credit model to simulate 1000 patients' true scores followed by a standard normal distribution. The CAT was compared to two other scenarios of answering all items (AAI) and the randomized selection method (RSM), as we investigated item length (efficiency) and measurement accuracy. The author-made Web-based CAT program for gathering patient feedback was effectively accessed from mobile phones by scanning the QR code. We found that the CAT can be more efficient for patients answering questions (ie, fewer items to respond to) than either AAI or RSM without compromising its measurement accuracy. A Web-based CAT inpatient survey accessed by scanning a QR code on a mobile phone was viable for gathering inpatient satisfaction responses. With advances in technology, patients can now be offered alternatives for providing feedback about hospitalization satisfaction. This Web-based CAT is a possible option in health care settings for reducing the number of survey items, as well as offering an innovative QR code access.
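The item selection step at the heart of a CAT can be sketched in a few lines. This is a simplified illustration only: it assumes the dichotomous Rasch model rather than the partial credit model the authors used, and it picks the unadministered item with maximum Fisher information at the current ability estimate, which is a common rule but not one the abstract confirms. All function names are hypothetical.

```python
import math


def rasch_probability(theta, b):
    """Probability of endorsing an item of difficulty b at ability theta."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def item_information(theta, b):
    """Fisher information of a dichotomous Rasch item: p * (1 - p)."""
    p = rasch_probability(theta, b)
    return p * (1.0 - p)


def select_next_item(theta, difficulties, administered):
    """Index of the most informative item not yet administered."""
    candidates = [i for i in range(len(difficulties)) if i not in administered]
    return max(candidates, key=lambda i: item_information(theta, difficulties[i]))


# Hypothetical three-item bank; ability estimate starts at 0.
difficulties = [-2.0, 0.1, 1.5]
first = select_next_item(0.0, difficulties, administered=set())
second = select_next_item(0.0, difficulties, administered={first})
```

Because information peaks where item difficulty matches the ability estimate, the respondent sees the best-targeted items first, which is why a CAT can stop after fewer items than a fixed-length survey.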
Reevaluation of the Amsterdam Inventory for Auditory Disability and Handicap Using Item Response Theory.

PubMed

Boeschen Hospers, J Mirjam; Smits, Niels; Smits, Cas; Stam, Mariska; Terwee, Caroline B; Kramer, Sophia E

2016-04-01

We reevaluated the psychometric properties of the Amsterdam Inventory for Auditory Disability and Handicap (AIADH; Kramer, Kapteyn, Festen, & Tobi, 1995) using item response theory. Item response theory describes item functioning along an ability continuum. Cross-sectional data from 2,352 adults with and without hearing impairment, ages 18-70 years, were analyzed. They completed the AIADH in the web-based prospective cohort study "Netherlands Longitudinal Study on Hearing." A graded response model was fitted to the AIADH data. Category response curves, item information curves, and the standard error as a function of self-reported hearing ability were plotted. The graded response model showed a good fit. Item information curves were most reliable for adults who reported having hearing disability and less reliable for adults with normal hearing. The standard error plot showed that self-reported hearing ability is most reliably measured for adults reporting mild up to moderate hearing disability. This is one of the few item response theory studies on audiological self-reports. All AIADH items could be hierarchically placed on the self-reported hearing ability continuum, meaning they measure the same construct. This provides a promising basis for developing a clinically useful computerized adaptive test, where item selection adapts to the hearing ability of individuals, resulting in efficient assessment of hearing disability.
Metacognition for strategy selection during arithmetic problem-solving in young and older adults.

PubMed

Geurten, Marie; Lemaire, Patrick

2018-04-19

We examined participants' strategy choices and metacognitive judgments during arithmetic problem-solving. Metacognitive judgments were collected either prospectively or retrospectively. We tested whether metacognitive judgments are related to strategy choices on the current problems and on the immediately following problems, and age-related differences in relations between metacognition and strategy choices. Data showed that both young and older adults were able to make accurate retrospective, but not prospective, judgments. Moreover, the accuracy of retrospective judgments was comparable in young and older adults when participants had to select and execute the better strategy. Metacognitive accuracy was even higher in older adults when participants had to only select the better strategy. Finally, low-confidence judgments on current items were more frequently followed by better strategy selection on immediately succeeding items than high-confidence judgments in both young and older adults. Implications of these findings to further our understanding of age-related differences and similarities in adults' metacognitive monitoring and metacognitive regulation for strategy selection in the context of arithmetic problem solving are discussed.
The Dominance Concept Inventory: A Tool for Assessing Undergraduate Student Alternative Conceptions about Dominance in Mendelian and Population Genetics.

PubMed

Abraham, Joel K; Perez, Kathryn E; Price, Rebecca M

2014-01-01

Despite the impact of genetics on daily life, biology undergraduates understand some key genetics concepts poorly. One concept requiring attention is dominance, which many students understand as a fixed property of an allele or trait and regularly conflate with frequency in a population or selective advantage. We present the Dominance Concept Inventory (DCI), an instrument to gather data on selected alternative conceptions about dominance. During development of the 16-item test, we used expert surveys (n = 12), student interviews (n = 42), and field tests (n = 1763) from introductory and advanced biology undergraduates at public and private, majority- and minority-serving, 2- and 4-yr institutions in the United States. In the final field test across all subject populations (n = 709), item difficulty ranged from 0.08 to 0.84 (0.51 ± 0.049 SEM), while item discrimination ranged from 0.11 to 0.82 (0.50 ± 0.048 SEM). Internal reliability (Cronbach's alpha) was 0.77, while test-retest reliability values were 0.74 (product moment correlation) and 0.77 (intraclass correlation). The prevalence of alternative conceptions in the field tests shows that introductory and advanced students retain confusion about dominance after instruction. All measures support the DCI as a useful instrument for measuring undergraduate biology student understanding and alternative conceptions about dominance. © 2014 J. K. Abraham et al. CBE—Life Sciences Education © 2014 The American Society for Cell Biology. This article is distributed by The American Society for Cell Biology under license from the author(s). It is available to the public under an Attribution–Noncommercial–Share Alike 3.0 Unported Creative Commons License (http://creativecommons.org/licenses/by-nc-sa/3.0).
Restricted interests and teacher presentation of items.

PubMed

Stocco, Corey S; Thompson, Rachel H; Rodriguez, Nicole M

2011-01-01

Restricted and repetitive behavior (RRB) is more pervasive, prevalent, frequent, and severe in individuals with autism spectrum disorders (ASDs) than in their typical peers. One subtype of RRB is restricted interests in items or activities, which is evident in the manner in which individuals engage with items (e.g., repetitious wheel spinning), the types of items or activities they select (e.g., preoccupation with a phone book), or the range of items or activities they select (i.e., narrow range of items). We sought to describe the relation between restricted interests and teacher presentation of items. Overall, we observed 5 teachers interacting with 2 pairs of students diagnosed with an ASD. Each pair included 1 student with restricted interests. During these observations, teachers were free to present any items from an array of 4 stimuli selected by experimenters. We recorded student responses to teacher presentation of items and analyzed the data to determine the relation between teacher presentation of items and the consequences for presentation provided by the students. Teacher presentation of items corresponded with differential responses provided by students with ASD, and those with restricted preferences experienced a narrower array of items.
Item-location binding in working memory: is it hippocampus-dependent?

PubMed

Allen, Richard J; Vargha-Khadem, Faraneh; Baddeley, Alan D

2014-07-01

A general consensus is emerging that the hippocampus has an important and active role in the creation of new long-term memory representations of associations or bindings between elements. However, it is less clear whether this contribution can be extended to the creation of temporary bound representations in working memory, involving the retention of small numbers of items over short delays. We examined this by administering a series of recognition and recall tests of working memory for colour-location binding and object-location binding to a patient with highly selective hippocampal damage (Jon), and groups of control participants. Jon achieved high levels of accuracy in all working memory tests of recognition and recall binding across retention intervals of up to 10s. In contrast, Jon performed at chance on an unexpected delayed test of the same object-location binding information. These findings indicate a clear dissociation between working memory and long-term memory, with no evidence for a critical hippocampal contribution to item-location binding in working memory. Copyright © 2014 Elsevier Ltd. All rights reserved.

[Development of a standardized guide for optimizing drug adherence information to be dispensed during a pharmaceutical counseling with a multiple myeloma patient: Initial validation].

PubMed

Favier-Archinard, Camille; Leguelinel-Blache, Géraldine; Dubois, Florent; Le Gall, Tanguy; Bourquard, Pascal; Passemard, Nadège; Tora, Sandrine; Rey, Aurélie; Rossi, Marie; Chevallier, Thierry; Cousin, Christelle; Favier, Mireille

2018-05-01

The safety of the community treatment with oral anticancer therapies is a strong theme of the cancer plan 2014-2019. The objective of this study was to develop a Pharmaceutical Counseling Guide to improve medication adherence in patients treated for multiple myeloma with oral anticancer therapies. A multidisciplinary professional working group selected a list of relevant medication adherence-related items that served as the framework for the design of the pharmaceutical counseling support materials in patient-accessible language. The readability, understanding and memorization of the information were validated in ten patients treated for myeloma. Twelve items were selected for treatment information (5 items), treatment planning (5 items), and adverse drug effects (2 items). A pharmacist guide, a patient guide, a medication schedule, and three self-questionnaires to evaluate medication knowledge and understanding of patients were developed. The patient test resulted in changes in these documents. This study carried out the initial validation of documents to standardize the pharmaceutical counseling for patients treated for myeloma so that it can be reproduced from one patient to another regardless of the pharmacist, by standardizing the information issued. This study needs to be completed by a final validation in myeloma patients, free from oral anticancer therapies. Copyright © 2018 Société Française du Cancer. Published by Elsevier Masson SAS. All rights reserved.
Gender-, age-, and race/ethnicity-based differential item functioning analysis of the movement disorder society-sponsored revision of the Unified Parkinson's disease rating scale.

PubMed

Goetz, Christopher G; Liu, Yuanyuan; Stebbins, Glenn T; Wang, Lu; Tilley, Barbara C; Teresi, Jeanne A; Merkitch, Douglas; Luo, Sheng

2016-12-01

Assess MDS-UPDRS items for gender-, age-, and race/ethnicity-based differential item functioning. Assessing differential item functioning is a core rating scale validation step. For the MDS-UPDRS, differential item functioning occurs if item-score probability among people with similar levels of parkinsonism differs according to selected covariates (gender, age, race/ethnicity). If the magnitude of differential item functioning is clinically relevant, item-score interpretation must consider influences by these covariates. Differential item functioning can be nonuniform (covariate variably influences an item-score across different levels of parkinsonism) or uniform (covariate influences an item-score consistently over all levels of parkinsonism). Using the MDS-UPDRS translation database of more than 5,000 PD patients from 14 languages, we tested gender-, age-, and race/ethnicity-based differential item functioning. To designate an item as having clinically relevant differential item functioning, we required statistical confirmation by 2 independent methods, along with a McFadden pseudo-R² magnitude statistic greater than "negligible." Most items showed no gender-, age- or race/ethnicity-based differential item functioning. When differential item functioning was identified, the magnitude statistic was always in the "negligible" range, and the scale-level impact was minimal. The absence of clinically relevant differential item functioning across all items and all parts of the MDS-UPDRS is strong evidence that the scale can be used confidently. As studies of Parkinson's disease increasingly involve multinational efforts and the MDS-UPDRS has several validated non-English translations, the findings support the scale's broad applicability in populations with varying gender, age, and race/ethnicity distributions. © 2016 International Parkinson and Movement Disorder Society.
The VCOP Scale: a measure of overprotection in parents of physically vulnerable children.

PubMed

Wright, L; Mullen, T; West, K; Wyatt, P

1993-11-01

A scale is developed for measuring the overprotecting vs. optimal developmental stimulation tendencies for parents of physically "vulnerable" children. A series of items were administered to parents whose parenting techniques had been rated as either highly overprotective or as optimal by a group of MDs and other professionals. Correlations were estimated between each of the items and parental tendencies as rated by professionals. Twenty-eight items were selected that provided maximum prediction of overprotection. The resulting R² was extraordinarily high (.94). Coefficient alpha and test-retest coefficients were acceptable. It is hoped that release of the new instrument (VCOPS) at this time will allow others to join in determining the clinical and experimental validity of this scale.
Eye-Movement Analysis Demonstrates Strategic Influences on Intelligence

ERIC Educational Resources Information Center

Vigneau, Francois; Caissie, Andre F.; Bors, Douglas A.

2006-01-01

Taking into account various models and findings pertaining to the nature of analogical reasoning, this study explored quantitative and qualitative individual differences in intelligence using latency and eye-movement data. Fifty-five university students were administered 14 selected items of the Raven's Advanced Progressive Matrices test. Results…
Applications of Decision Theory to Test-Based Decision Making. Project Psychometric Aspects of Item Banking No. 23. Research Report 87-9.

ERIC Educational Resources Information Center

van der Linden, Wim J.

The use of Bayesian decision theory to solve problems in test-based decision making is discussed. Four basic decision problems are distinguished: (1) selection; (2) mastery; (3) placement; and (4) classification, the situation where each treatment has its own criterion. Each type of decision can be identified as a specific configuration of one or…
Identification and analysis of student conceptions used to solve chemical equilibrium problems

NASA Astrophysics Data System (ADS)

Voska, Kirk William

This study identified and quantified chemistry conceptions students use when solving chemical equilibrium problems requiring the application of Le Chatelier's principle, and explored the feasibility of designing a paper and pencil test for this purpose. It also demonstrated the utility of conditional probabilities to assess test quality. A 10-item pencil-and-paper, two-tier diagnostic instrument, the Test to Identify Student Conceptualizations (TISC) was developed and administered to 95 second-semester university general chemistry students after they received regular course instruction concerning equilibrium in homogeneous aqueous, heterogeneous aqueous, and homogeneous gaseous systems. The content validity of TISC was established through a review of TISC by a panel of experts; construct validity was established through semi-structured interviews and conditional probabilities. Nine students were then selected from a stratified random sample for interviews to validate TISC. The probability that TISC correctly identified an answer given by a student in an interview was p = .64, while the probability that TISC correctly identified a reason given by a student in an interview was p=.49. Each TISC item contained two parts. In the first part the student selected the correct answer to a problem from a set of four choices. In the second part students wrote reasons for their answer to the first part. TISC questions were designed to identify students' conceptions concerning the application of Le Chatelier's principle, the constancy of the equilibrium constant, K, and the effect of a catalyst. Eleven prevalent incorrect conceptions were identified. This study found students consistently selected correct answers more frequently (53% of the time) than they provided correct reasons (33% of the time). The association between student answers and respective reasons on each TISC item was quantified using conditional probabilities calculated from logistic regression coefficients. 
The probability a student provided correct reasoning (B) when the student selected a correct answer (A) ranged from P(B|A) = .32 to P(B|A) = .82. However, the probability a student selected a correct answer when they provided correct reasoning ranged from P(A|B) = .96 to P(A|B) = 1. The K-R 20 reliability for TISC was found to be .79.
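As an illustration of the two statistics reported in this abstract, the following minimal sketch computes a K-R 20 coefficient from dichotomous item scores and the conditional probabilities P(B|A) and P(A|B) from raw counts. It is not the study's procedure: the abstract says its conditional probabilities were derived from logistic regression coefficients, whereas this sketch uses simple frequencies. The function names, the use of the population variance, and the sample data are all assumptions made for illustration.

```python
from statistics import pvariance


def kr20(responses):
    """Kuder-Richardson 20 for rows of 0/1 item scores (one row per examinee).

    Assumption: population variance of total scores is used; some texts
    use the sample variance instead, which changes the value slightly.
    """
    n_examinees = len(responses)
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]
    var_total = pvariance(totals)
    # Sum of p*q over items, where p is the proportion answering correctly.
    pq_sum = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in responses) / n_examinees
        pq_sum += p * (1 - p)
    return (n_items / (n_items - 1)) * (1 - pq_sum / var_total)


def conditional_probabilities(pairs):
    """Return (P(B|A), P(A|B)) from (answer_correct, reason_correct) pairs."""
    n_a = sum(1 for ans, _ in pairs if ans)
    n_b = sum(1 for _, rea in pairs if rea)
    n_both = sum(1 for ans, rea in pairs if ans and rea)
    return n_both / n_a, n_both / n_b


# Hypothetical data: 4 examinees, 3 items.
scores = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(kr20(scores))

# Hypothetical (answer, reason) outcomes on one two-tier item.
item_pairs = [(1, 1), (1, 0), (1, 0), (0, 0), (1, 1)]
print(conditional_probabilities(item_pairs))
```

In the hypothetical item above, every student with a correct reason also chose the correct answer, so P(A|B) is 1 while P(B|A) is lower. This is the same asymmetry the abstract reports.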
Recognizing What Matters: Value Improves Recognition by Selectively Enhancing Recollection

PubMed Central

Hennessee, Joseph P.; Castel, Alan D.; Knowlton, Barbara J.

2017-01-01

We examined the effects of value on recognition by assessing its contribution to recollection and familiarity. In three experiments, participants studied English words, each associated with a point-value they would earn for correct recognition, with the goal of maximizing their score. In Experiment 1, participants provided Remember/Know judgments. In Experiment 2 participants indicated whether items were recollected or if not, their degree of familiarity along a 6-point scale. In Experiment 3, recognition of words was accompanied by a test of memory for incidental details. Across all experiments, participants were more likely to recognize items with higher point-value. Furthermore, value appeared to primarily enhance recollection, as effects on familiarity were small and not consistent across experiments. Recollection of high-value items appears to be accompanied by fewer incidental details, suggesting that value increases focus on items at the expense of irrelevant information. PMID:28827894
Directed forgetting of complex pictures in an item method paradigm.

PubMed

Hauswald, Anne; Kissler, Johanna

2008-11-01

An item-cued directed forgetting paradigm was used to investigate the ability to control episodic memory and selectively encode complex coloured pictures. A series of photographs was presented to 21 participants who were instructed to either remember or forget each picture after it was presented. Memory performance was later tested with a recognition task where all presented items had to be retrieved, regardless of the initial instructions. A directed forgetting effect--that is, better recognition of "to-be-remembered" than of "to-be-forgotten" pictures--was observed, although its size was smaller than previously reported for words or line drawings. The magnitude of the directed forgetting effect correlated negatively with participants' depression and dissociation scores. The results indicate that, at least in an item method, directed forgetting occurs for complex pictures as well as words and simple line drawings. Furthermore, people with higher levels of dissociative or depressive symptoms exhibit altered memory encoding patterns.
Attention Effects During Visual Short-Term Memory Maintenance: Protection or Prioritization?

PubMed Central

Matsukura, Michi; Luck, Steven J.; Vecera, Shaun P.

2007-01-01

Interactions between visual attention and visual short-term memory (VSTM) play a central role in cognitive processing. For example, attention can assist in selectively encoding items into visual memory. Attention appears to be able to influence items already stored in visual memory as well; cues that appear long after the presentation of an array of objects can affect memory for those objects (Griffin & Nobre, 2003). In five experiments, we distinguished two possible mechanisms for the effects of cues on items currently stored in VSTM. A protection account proposes that attention protects the cued item from becoming degraded during the retention interval. By contrast, a prioritization account suggests that attention increases a cued item’s priority during the comparison process that occurs when memory is tested. The results of the experiments were consistent with the first of these possibilities, suggesting that attention can serve to protect VSTM representations while they are being maintained. PMID:18078232
One portion size of foods frequently consumed by Korean adults

PubMed Central

Choi, Mi-Kyeong; Hyun, Wha-Jin; Lee, Sim-Yeol; Park, Hong-Ju; Kim, Se-Na

2010-01-01

This study aimed to define a one-portion size for frequently consumed food items, for convenient use by Koreans in food selection, diet planning, and nutritional evaluation. We analyzed the original data on 5,436 persons (60.87%) aged 20 ~ 64 years among the 8,930 persons in NHANES 2005 and selected food items with an intake frequency of 30 or higher among the 500 most frequently consumed food items. A total of 374 varieties of food items of regular use were selected. The portion size of each food item was set on the basis of the median (50th percentile) of the portion size for a single intake by a single person. In cereals, the portion size of well polished rice was 80 g. In meats, the portion size of Korean beef cattle was 25 g. Among vegetable items, the portion size of Baechukimchi was 40 g. The portion size of the food items of regular use set in this study will be conveniently and effectively used by general consumers in selecting food items for a nutritionally balanced diet. In addition, these will be used as the basic data in setting the serving size in meal planning. PMID:20198213
Anatomical constraints on attention: Hemifield independence is a signature of multifocal spatial selection

PubMed Central

Alvarez, George A; Gill, Jonathan; Cavanagh, Patrick

2012-01-01

Previous studies have shown independent attentional selection of targets in the left and right visual hemifields during attentional tracking (Alvarez & Cavanagh, 2005) but not during a visual search (Luck, Hillyard, Mangun, & Gazzaniga, 1989). Here we tested whether multifocal spatial attention is the critical process that operates independently in the two hemifields. It is explicitly required in tracking (attend to a subset of object locations, suppress the others) but not in the standard visual search task (where all items are potential targets). We used a modified visual search task in which observers searched for a target within a subset of display items, where the subset was selected based on location (Experiments 1 and 3A) or based on a salient feature difference (Experiments 2 and 3B). The results show hemifield independence in this subset visual search task with location-based selection but not with feature-based selection; this effect cannot be explained by general difficulty (Experiment 4). Combined, these findings suggest that hemifield independence is a signature of multifocal spatial attention and highlight the need for cognitive and neural theories of attention to account for anatomical constraints on selection mechanisms. PMID:22637710
Introducing a short version of the physical self description questionnaire: new strategies, short-form evaluative criteria, and applications of factor analyses.

PubMed

Marsh, Herbert W; Martin, Andrew J; Jackson, Susan

2010-08-01

Based on the Physical Self Description Questionnaire (PSDQ) normative archive (n = 1,607 Australian adolescents), 40 of 70 items were selected to construct a new short form (PSDQ-S). The PSDQ-S was evaluated in a new cross-validation sample of 708 Australian adolescents and four additional samples: 349 Australian elite-athlete adolescents, 986 Spanish adolescents, 395 Israeli university students, 760 Australian older adults. Across these six groups, the 11 PSDQ-S factors had consistently high reliabilities and invariant factor structures. Study 1, using a missing-by-design variation of multigroup invariance tests, showed invariance across 40 PSDQ-S items and 70 PSDQ items. Study 2 demonstrated factorial invariance over a 1-year interval (test-retest correlations .57-.90; Mdn = .77), and good convergent and discriminant validity in relation to time. Study 3 showed good and nearly identical support for convergent and discriminant validity of PSDQ and PSDQ-S responses in relation to two other physical self-concept instruments.
Contextual behavior and neural circuits

PubMed Central

Lee, Inah; Lee, Choong-Hee

2013-01-01

Animals including humans engage in goal-directed behavior flexibly in response to items and their background, which is called contextual behavior in this review. Although the concept of context has long been studied, there are differences among researchers in defining and experimenting with the concept. The current review aims to provide a categorical framework within which not only the neural mechanisms of contextual information processing but also the contextual behavior can be studied in more concrete ways. For this purpose, we categorize contextual behavior into three subcategories as follows by considering the types of interactions among context, item, and response: contextual response selection, contextual item selection, and contextual item–response selection. Contextual response selection refers to the animal emitting different types of responses to the same item depending on the context in the background. Contextual item selection occurs when there are multiple items that need to be chosen in a contextual manner. Finally, when multiple items and multiple contexts are involved, contextual item–response selection takes place whereby the animal either chooses an item or inhibits such a response depending on item–context paired association. The literature suggests that the rhinal cortical regions and the hippocampal formation play key roles in mnemonically categorizing and recognizing contextual representations and the associated items. In addition, it appears that the fronto-striatal cortical loops in connection with the contextual information-processing areas critically control the flexible deployment of adaptive action sets and motor responses for maximizing goals. We suggest that contextual information processing should be investigated in experimental settings where contextual stimuli and resulting behaviors are clearly defined and measurable, considering the dynamic top-down and bottom-up interactions among the neural systems for contextual behavior. 
PMID:23675321
An Item Gains and Losses Analysis of False Memories Suggests Critical Items Receive More Item-Specific Processing than List Items

ERIC Educational Resources Information Center

Burns, Daniel J.; Martens, Nicholas J.; Bertoni, Alicia A.; Sweeney, Emily J.; Lividini, Michelle D.

2006-01-01

In a repeated testing paradigm, list items receiving item-specific processing are more likely to be recovered across successive tests (item gains), whereas items receiving relational processing are likely to be forgotten progressively less on successive tests. Moreover, analysis of cumulative-recall curves has shown that item-specific processing…
Discriminant content validity: a quantitative methodology for assessing content of theory-based measures, with illustrative applications.

PubMed

Johnston, Marie; Dixon, Diane; Hart, Jo; Glidewell, Liz; Schröder, Carin; Pollard, Beth

2014-05-01

In studies involving theoretical constructs, it is important that measures have good content validity and that there is not contamination of measures by content from other constructs. While reliability and construct validity are routinely reported, to date, there has not been a satisfactory, transparent, and systematic method of assessing and reporting content validity. In this paper, we describe a methodology of discriminant content validity (DCV) and illustrate its application in three studies. Discriminant content validity involves six steps: construct definition, item selection, judge identification, judgement format, single-sample test of content validity, and assessment of discriminant items. In three studies, these steps were applied to a measure of illness perceptions (IPQ-R) and control cognitions. The IPQ-R performed well with most items being purely related to their target construct, although timeline and consequences had small problems. By contrast, the study of control cognitions identified problems in measuring constructs independently. In the final study, direct estimation response formats for theory of planned behaviour constructs were found to have as good DCV as Likert format. The DCV method allowed quantitative assessment of each item and can therefore inform the content validity of the measures assessed. The methods can be applied to assess content validity before or after collecting data to select the appropriate items to measure theoretical constructs. Further, the data reported for each item in Appendix S1 can be used in item or measure selection. Statement of contribution What is already known on this subject? There are agreed methods of assessing and reporting construct validity of measures of theoretical constructs, but not their content validity. Content validity is rarely reported in a systematic and transparent manner. What does this study add? 
The paper proposes discriminant content validity (DCV), a systematic and transparent method of assessing and reporting whether items assess the intended theoretical construct and only that construct. In three studies, DCV was applied to measures of illness perceptions, control cognitions, and theory of planned behaviour response formats. Appendix S1 gives content validity indices for each item of each questionnaire investigated. Discriminant content validity is ideally applied while the measure is being developed, before using to measure the construct(s), but can also be applied after using a measure. © 2014 The British Psychological Society.
Interservice Procedures for Instructional Systems Development. Phase 3. Develop

DTIC Science & Technology

1975-08-01

Occur at wide intervals to be learned *Reads about the actions to *Occur at the end, but before be learned tests or on-the-job performance *Watches a...the particular sub-category. Use the learning objective action statement, conditions, standards, and the test item to help select which guidelines to...objective. EXAMPLE If you have a CLASSIFYING objective like "identifying poisonous plants," when you get to guideline 16. "To test learning, require the
Memory capacity, selective control, and value-directed remembering in children with and without attention-deficit/hyperactivity disorder (ADHD).

PubMed

Castel, Alan D; Lee, Steve S; Humphreys, Kathryn L; Moore, Amy N

2011-01-01

The ability to select what is important to remember, to attend to this information, and to recall high-value items leads to the efficient use of memory. The present study examined how children with and without attention-deficit/hyperactivity disorder (ADHD) performed on an incentive-based selectivity task in which to-be-remembered items were worth different point values. Participants were 6-9 year old children with ADHD (n = 57) and without ADHD (n = 59). Using a selectivity task, participants studied words paired with point values and were asked to maximize their score, which was the overall value of the items they recalled. This task allows for measures of memory capacity and the ability to selectively remember high-value items. Although there were no significant between-groups differences in the number of words recalled (memory capacity), children with ADHD were less selective than children in the control group in terms of the value of the items they recalled (control of memory). All children recalled more high-value items than low-value items and showed some learning with task experience, but children with ADHD Combined type did not efficiently maximize memory performance (as measured by a selectivity index) relative to children with ADHD Inattentive type and healthy controls, who did not differ significantly from one another. Children with ADHD Combined type exhibit impairments in the strategic and efficient encoding and recall of high-value items. The findings have implications for theories of memory dysfunction in childhood ADHD and the key role of metacognition, cognitive control, and value-directed remembering when considering the strategic use of memory. (c) 2010 APA, all rights reserved
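The abstract refers to a selectivity index without defining it. The sketch below shows one formulation commonly used in value-directed remembering tasks: the recalled score is compared with the best possible score for the same number of recalled items and with the score expected by chance. Treat the formula, the function name, and the 1-to-12 point values as assumptions, not as the study's exact scoring rule.

```python
def selectivity_index(recalled_values, all_values):
    """(actual - chance) / (ideal - chance) for one study-test list.

    recalled_values: point values of the items the participant recalled.
    all_values: point values of every item on the studied list.
    Returns None when nothing was recalled; 1.0 means only the
    highest-value items were recalled, 0.0 means no sensitivity to value.
    """
    n = len(recalled_values)
    if n == 0:
        return None
    actual = sum(recalled_values)
    # Ideal score: the n highest values available on the list.
    ideal = sum(sorted(all_values, reverse=True)[:n])
    # Chance score: n items drawn without regard to value.
    chance = n * sum(all_values) / len(all_values)
    if ideal == chance:
        return 0.0
    return (actual - chance) / (ideal - chance)


# Hypothetical 12-item list with point values 1..12.
values = list(range(1, 13))
print(selectivity_index([12, 11, 10], values))  # recalled the top three
print(selectivity_index([1, 2, 3], values))     # recalled the bottom three
```

Because the index is normalized by the number of items recalled, two participants with the same memory capacity can still differ in selectivity. That is the distinction the abstract draws between the ADHD and control groups.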
Free Recall Test Experience Potentiates Strategy-Driven Effects of Value on Memory

ERIC Educational Resources Information Center

Cohen, Michael S.; Rissman, Jesse; Hovhannisyan, Mariam; Castel, Alan D.; Knowlton, Barbara J.

2017-01-01

People tend to show better memory for information that is deemed valuable or important. By one mechanism, individuals selectively engage deeper, semantic encoding strategies for high value items (Cohen, Rissman, Suthana, Castel, & Knowlton, 2014). By another mechanism, information paired with value or reward is automatically strengthened in…
Unidimensional IRT Item Parameter Estimates across Equivalent Test Forms with Confounding Specifications within Dimensions

ERIC Educational Resources Information Center

Matlock, Ki Lynn; Turner, Ronna

2016-01-01

When constructing multiple test forms, the number of items and the total test difficulty are often equivalent. Not all test developers match the number of items and/or average item difficulty within subcontent areas. In this simulation study, six test forms were constructed having an equal number of items and average item difficulty overall.…
The Bangor Voice Matching Test: A standardized test for the assessment of voice perception ability.

PubMed

Mühl, Constanze; Sheil, Orla; Jarutytė, Lina; Bestelmeyer, Patricia E G

2017-11-09

Recognising the identity of conspecifics is an important yet highly variable skill. Approximately 2 % of the population suffers from a socially debilitating deficit in face recognition. More recently the existence of a similar deficit in voice perception has emerged (phonagnosia). Face perception tests have been readily available for years, advancing our understanding of underlying mechanisms in face perception. In contrast, voice perception has received less attention, and the construction of standardized voice perception tests has been neglected. Here we report the construction of the first standardized test for voice perception ability. Participants make a same/different identity decision after hearing two voice samples. Item Response Theory guided item selection to ensure the test discriminates between a range of abilities. The test provides a starting point for the systematic exploration of the cognitive and neural mechanisms underlying voice perception. With a high test-retest reliability (r=.86) and short assessment duration (~10 min) this test examines individual abilities reliably and quickly and therefore also has potential for use in developmental and neuropsychological populations.

Implementing AORN recommended practices for selection and use of packaging systems for sterilization.

PubMed

Morton, Paula J; Conner, Ramona

2014-04-01

The delivery of sterile products to the sterile field is essential to perioperative practice. The use of protective packaging for sterilized items is crucial to helping ensure that patients receive sterile items for surgical procedures. AORN's "Recommended practices for selection and use of packaging systems for sterilization" offers guidance to perioperative team members in evaluating, selecting, and using packaging systems that permit sterilization of the contents, prevent contamination of sterilized items until the package is opened for use, protect the items from damage during transport and storage, and permit aseptic delivery of the items to the sterile field. Copyright © 2014 AORN, Inc. Published by Elsevier Inc. All rights reserved.
Measurement of fatigue: Comparison of the reliability and validity of single-item and short measures to a comprehensive measure.

PubMed

Kim, Hee-Ju; Abraham, Ivo

2017-01-01

Evidence is needed on the clinicometric properties of single-item or short measures as alternatives to comprehensive measures. We examined whether two single-item fatigue measures (i.e., Likert scale, numeric rating scale) or a short fatigue measure were comparable to a comprehensive measure in reliability (i.e., internal consistency and test-retest reliability) and validity (i.e., convergent, concurrent, and predictive validity) in Korean young adults. For this quantitative study, we selected the Functional Assessment of Chronic Illness Therapy-Fatigue for the comprehensive measure and the Profile of Mood States-Brief, Fatigue subscale for the short measure; and constructed two single-item measures. A total of 368 students from four nursing colleges in South Korea participated. We used Cronbach's alpha and item-total correlation for internal consistency reliability and intraclass correlation coefficient for test-retest reliability. We assessed Pearson's correlation with a comprehensive measure for convergent validity, with perceived stress level and sleep quality for concurrent validity and the receiver operating characteristic curve for predictive validity. The short measure was comparable to the comprehensive measure in internal consistency reliability (Cronbach's alpha=0.81 vs. 0.88); test-retest reliability (intraclass correlation coefficient=0.66 vs. 0.61); convergent validity (r with comprehensive measure=0.79); concurrent validity (r with perceived stress=0.55, r with sleep quality=0.39) and predictive validity (area under curve=0.88). Single-item measures were not comparable to the comprehensive measure. A short fatigue measure exhibited similar levels of reliability and validity to the comprehensive measure in Korean young adults. Copyright © 2016 Elsevier Ltd. All rights reserved.
Relationship of college student characteristics and inquiry-based geometrical optics instruction to knowledge of image formation with light-ray tracing

NASA Astrophysics Data System (ADS)

Isik, Hakan

This study is premised on the fact that student conceptions of optics appear to be unrelated to student characteristics of gender, age, years since high school graduation, or previous academic experiences. This study investigated the relationships between student characteristics and student performance on image formation test items and the changes in student conceptions of optics after an introductory inquiry-based physics course. Data was collected from 39 college students who were involved in an inquiry-based physics course teaching topics of geometrical optics. Student data concerning characteristics and previous experiences with optics and mathematics were collected. Assessment of student understanding of optics knowledge for pinholes, plane mirrors, refraction, and convex lenses was collected with the Test of Image Formation with Light-Ray Tracing instrument. Total scale and subscale scores representing the optics instrument content were derived from student pretest and posttest responses. The types of knowledge needed to answer each optics item correctly were categorized as situational, conceptual, procedural, and strategic knowledge. These types of knowledge were associated with student correct and incorrect responses to each item to explain the existence of and changes in student scientific and naive conceptions. Correlation and stepwise multiple regression analyses were conducted to identify the student characteristics and academic experiences that significantly predicted scores on the subscales of the test. The results showed that student experience with calculus was a significant predictor of student performance on the total scale as well as on the refraction subscale of the Test of Image Formation with Light-Ray Tracing. A combination of student age and previous academic experience with precalculus was a significant predictor of student performance on the pretest pinhole subscale.
Student characteristic of years since high school graduation significantly predicted the gain in student scores on pinhole and plane-mirror items from the pretest to the posttest, with those students who were most recent graduates from high school doing better. Multivariate and univariate analyses of variance of the Test of Image Formation with Light-Ray Tracing pinhole scale and individual item changes from the pretest to the posttest resulted in statistically significant mean differences between total scores as well as between various individual pinhole items. There were no significant changes for individual plane-mirror items from pretest to posttest. Results revealed that there is a perceivable relationship between student optics-content knowledge and the types of knowledge required by items. At the pretest, the greatest number of wrong responses related to the items requiring situational knowledge, and the fewest wrong responses related to the items requiring procedural knowledge.
Student selection of wrong options for each item revealed the following naive optics conceptions: pinholes do not create reversed images (pretest), size and sharpness of pinhole images are related to the focus of a pinhole camera (pretest and posttest); propagation of light rays are interpreted as being radial rather than directional (pretest and posttest); no conception of image formation and observation for parallel mirrors (pretest and posttest), the place of an image depends on the position of the observer (pretest and posttest), a plane mirror reflects the images of the objects placed at one side of the mirror and the observers who were positioned at the other side of the mirror can see them (pretest and posttest); applying the law of reflection to plane mirrors without considering the variations in angles of incidence and reflection (pretest and posttest), and image observation is confused with the image formation in mirrors placed perpendicular to one another (pretest and posttest). Future research should focus on the acquisition, development, and identification of reliable measures of optics concepts, processes, types of knowledge, and specific optics understanding (i.e., pinhole, plane-mirror). Future research should focus on the identification of the more critical concepts such as changes in size and sharpness of pinhole images, image observation, image formation in general, and image formation and observation in parallel mirrors. Future research can be conducted with a larger set of participants so as to compare different instructional methods and address instructional deficiencies using more efficient statistical methods. Comparative studies can be conducted to investigate the relations of various instructional strategies on student conceptions of optics.
Clinical evaluation of satisfaction in patients rehabilitated with an immediately loaded implant-supported prosthesis: a controlled prospective study.

PubMed

Scala, Rudy; Cucchi, Alessandro; Ghensi, Paolo; Vartolo, Francesco

2012-01-01

The purpose of this controlled prospective study was to compare the satisfaction of patients rehabilitated with an immediately loaded implant-supported prosthesis and patients rehabilitated with a conventional denture in the mandible. Selected mandibular partially or totally edentulous patients were included in this prospective study. Patients' mandibles were completely rehabilitated with immediately loaded implants supporting a screw-retained full-arch prosthesis (test group) or with a conventional denture (control group). The Satisfaction Profile (SAT-P), which investigates a number of psychologic aspects related to the function and esthetics of the stomatognathic apparatus, was administered to each patient 1 month before and 3 months after provisional prosthetic rehabilitation. The questionnaire comprised four different SAT-P items: quality of eating, eating behavior, mood, and self-confidence. A visual analog scale was used to elicit patient responses. SAT-P item scores were analyzed statistically by means of the Student t test and the chi-square test (or the Mann-Whitney nonparametric test), with P < .05 considered significant. Forty-one patients were consecutively treated with 205 immediately loaded implants supporting a screw-retained full-arch prosthesis (test group); 38 patients were consecutively treated with a conventional denture (control group). Statistically significant differences were observed between the test and control groups for all four SAT-P items. The test group reported greater satisfaction for all items versus the control group. In both groups, the differences between pre- and postrehabilitation values were statistically significant. Each patient was satisfied with their treatment outcomes, but patients who received an implant-supported prosthesis were more satisfied than the patients who received a conventional denture. 
The results suggest that a screw-retained full-arch prosthesis on immediately loaded implants is a predictable means of enhancing patient satisfaction.
Using classical test theory, item response theory, and Rasch measurement theory to evaluate patient-reported outcome measures: a comparison of worked examples.

PubMed

Petrillo, Jennifer; Cano, Stefan J; McLeod, Lori D; Coon, Cheryl D

2015-01-01

To provide comparisons and a worked example of item- and scale-level evaluations based on three psychometric methods used in patient-reported outcome development-classical test theory (CTT), item response theory (IRT), and Rasch measurement theory (RMT)-in an analysis of the National Eye Institute Visual Functioning Questionnaire (VFQ-25). Baseline VFQ-25 data from 240 participants with diabetic macular edema from a randomized, double-masked, multicenter clinical trial were used to evaluate the VFQ at the total score level. CTT, RMT, and IRT evaluations were conducted, and results were assessed in a head-to-head comparison. Results were similar across the three methods, with IRT and RMT providing more detailed diagnostic information on how to improve the scale. CTT led to the identification of two problematic items that threaten the validity of the overall scale score, sets of redundant items, and skewed response categories. IRT and RMT additionally identified poor fit for one item, many locally dependent items, poor targeting, and disordering of over half the response categories. Selection of a psychometric approach depends on many factors. Researchers should justify their evaluation method and consider the intended audience. If the instrument is being developed for descriptive purposes and on a restricted budget, a cursory examination of the CTT-based psychometric properties may be all that is possible. In a high-stakes situation, such as the development of a patient-reported outcome instrument for consideration in pharmaceutical labeling, however, a thorough psychometric evaluation including IRT or RMT should be considered, with final item-level decisions made on the basis of both quantitative and qualitative results. Copyright © 2015. Published by Elsevier Inc.
The e-MSWS-12: improving the multiple sclerosis walking scale using item response theory.

PubMed

Engelhard, Matthew M; Schmidt, Karen M; Engel, Casey E; Brenton, J Nicholas; Patek, Stephen D; Goldman, Myla D

2016-12-01

The Multiple Sclerosis Walking Scale (MSWS-12) is the predominant patient-reported measure of multiple sclerosis (MS)-related walking ability, yet it had not been analyzed using item response theory (IRT), the emerging standard for patient-reported outcome (PRO) validation. This study aims to reduce MSWS-12 measurement error and facilitate computerized adaptive testing by creating an IRT model of the MSWS-12 and distributing it online. MSWS-12 responses from 284 subjects with MS were collected by mail and used to fit and compare several IRT models. Following model selection and assessment, subpopulations based on age and sex were tested for differential item functioning (DIF). Model comparison favored a one-dimensional graded response model (GRM). This model met fit criteria and explained 87% of response variance. The performance of each MSWS-12 item was characterized using category response curves (CRCs) and item information. IRT-based MSWS-12 scores correlated with traditional MSWS-12 scores (r = 0.99) and timed 25-foot walk (T25FW) speed (r = -0.70). Item 2 showed DIF based on age (χ² = 19.02, df = 5, p < 0.01), and Item 11 showed DIF based on sex (χ² = 13.76, df = 5, p = 0.02). MSWS-12 measurement error depends on walking ability, but could be lowered by improving or replacing items with low information or DIF. The e-MSWS-12 includes IRT-based scoring, error checking, and an estimated T25FW derived from MSWS-12 responses. It is available at https://ms-irt.shinyapps.io/e-MSWS-12 .
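To make the category response curves mentioned above concrete, here is a minimal sketch of how a graded response model (in Samejima's standard form) turns a latent trait value into response-category probabilities for one item. The discrimination and threshold values are hypothetical and are not the fitted MSWS-12 parameters, which the abstract does not report.

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Category probabilities for one graded-response item.

    theta: latent trait value for the respondent.
    a: item discrimination (assumed positive).
    thresholds: increasing list of m thresholds, giving m + 1 categories.
    Each cumulative curve is P(X >= k) = logistic(a * (theta - b_k));
    a category probability is the difference of adjacent cumulative curves.
    """
    cumulative = [1.0]
    cumulative += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cumulative.append(0.0)
    return [cumulative[k] - cumulative[k + 1] for k in range(len(cumulative) - 1)]


# Hypothetical item with three thresholds (four response categories).
probs = grm_category_probs(theta=0.0, a=1.0, thresholds=[-1.0, 0.0, 1.0])
print([round(p, 4) for p in probs])
print(sum(probs))
```

Evaluating this function over a grid of theta values traces out one category response curve per response option.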
Frequency of consumption of cariogenic food items by 4-month-old to 24-month-old children: comparison between two rural communities in KwaZulu-Natal, South Africa.

PubMed

MacKeown, Jennifer M; Faber, Mieke

2005-03-01

The objective of the study was to compare the frequency of consumption of cariogenic food items among 4-month-old to 24-month-old children in two neighbouring rural areas in KwaZulu-Natal Province, South Africa: Nyuswa/Embo (Area A) (n = 127) and Ndunakazi (Area B) (n = 105). Dietary intake was assessed using a food frequency questionnaire. Mothers or caregivers were interviewed by a team of Zulu-speaking fieldworkers. The percentage of children consuming the individual food items (consumers) and the weekly consumption for consumers were calculated for the two areas separately. The food items were ranked in descending order according to the combined group of children and reported for each area within five selected food groups (carbohydrates, sugars, fruit and vegetables, milk and milk products, and other foods and snacks). Food items were 'flagged' according to their cariogenic potential. Fisher's exact test on absolute numbers tested for significant differences in the frequency of intake between individual food items between the two groups. Significance was set at P < 0.05. The frequency of consumption of certain listed cariogenic food items showed significant differences between the two areas. A higher percentage of children in Area A than in Area B consumed most of the food items and also more frequently. Children mainly consumed foods with a cariogenic score of 2, solid foods with 8-20% sugars as well as foods high in starch with less than 10% sugars. This knowledge is essential to gain insight into the eating pattern among rural communities and will provide a baseline for developing and adapting dietary advice specifically for young rural South African children with particular emphasis on the prevention of dental caries.
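The between-area comparison described in this record relies on Fisher's exact test applied to counts. A minimal sketch of a two-sided test for a 2x2 table, using only the Python standard library, might look as follows. The function name is hypothetical and the counts in the example are invented for illustration, not taken from the study.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell with margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one
    # (small relative tolerance guards against floating-point ties).
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))

# Invented counts: consumers vs. non-consumers of one food item in two areas.
p_value = fisher_exact_two_sided(90, 37, 50, 55)
significant = p_value < 0.05  # the study's stated significance threshold
```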
Development and Validation of a Novel Generic Health-related Quality of Life Instrument With 20 Items (HINT-20).

PubMed

Jo, Min-Woo; Lee, Hyeon-Jeong; Kim, Soo Young; Kim, Seon-Ha; Chang, Hyejung; Ahn, Jeonghoon; Ock, Minsu

2017-01-01

Few attempts have been made to develop a generic health-related quality of life (HRQoL) instrument and to examine its validity and reliability in Korea. We aimed to do this in our present study. After a literature review of existing generic HRQoL instruments, a focus group discussion, in-depth interviews, and expert consultations, we selected 30 tentative items for a new HRQoL measure. These items were evaluated by assessing their ceiling effects, difficulty, and redundancy in the first survey. To validate the HRQoL instrument that was developed, known-groups validity and convergent/discriminant validity were evaluated and its test-retest reliability was examined in the second survey. Of the 30 items originally assessed for the HRQoL instrument, four were excluded due to high ceiling effects and six were removed due to redundancy. We ultimately developed a HRQoL instrument with a reduced number of 20 items, known as the Health-related Quality of Life Instrument with 20 items (HINT-20), incorporating physical, mental, social, and positive health dimensions. The results of the HINT-20 for known-groups validity were poorer in women, the elderly, and those with a low income. For convergent/discriminant validity, the correlation coefficients of items (except vitality) in the physical health dimension with the physical component summary of the Short Form 36 version 2 (SF-36v2) were generally higher than the correlations of those items with the mental component summary of the SF-36v2, and vice versa. Regarding test-retest reliability, the intraclass correlation coefficient of the total HINT-20 score was 0.813 (p<0.001). A novel generic HRQoL instrument, the HINT-20, was developed for the Korean general population and showed acceptable validity and reliability.
Neural Correlates of Encoding Within- and Across-Domain Inter-Item Associations

PubMed Central

Park, Heekyeong; Rugg, Michael D.

2012-01-01

The neural correlates of the encoding of associations between pairs of words, pairs of pictures, and word-picture pairs were compared. The aims were to determine first, whether the neural correlates of associative encoding vary according to study material and second, whether encoding of across- versus within-material item pairs is associated with dissociable patterns of hippocampal and perirhinal activity, as predicted by the ‘domain dichotomy’ hypothesis of medial temporal lobe (MTL) function. While undergoing fMRI scanning, subjects (n = 24) were presented with the three classes of study pairs, judging which of the denoted objects fit into the other. Outside of the scanner, subjects then undertook an associative recognition task, discriminating between intact study pairs, rearranged pairs comprising items that had been presented on different study trials, and unstudied item pairs. The neural correlates of successful associative encoding – subsequent associative memory effects – were operationalized as the difference in activity between study pairs correctly judged intact versus pairs incorrectly judged rearranged on the subsequent memory test. Pair type-independent subsequent memory effects were evident in the left inferior frontal gyrus (IFG) and the hippocampus. Picture-picture pairs elicited material-selective effects in regions of fusiform cortex that were also activated to a greater extent on picture trials than word trials, while word-word pairs elicited material-selective subsequent memory effects in left lateral temporal cortex. Contrary to the domain-dichotomy hypothesis, neither hippocampal nor perirhinal subsequent memory effects differed depending on whether they were elicited by within- versus across-material study pairs. 
It is proposed that the left IFG plays a domain-general role in associative encoding, that associative encoding can also be facilitated by enhanced processing in material-selective cortical regions, and that the hippocampus and perirhinal cortex contribute equally to the formation of inter-item associations regardless of whether the items belong to the same or to different processing domains. PMID:21254802
Validation of a General and Sport Nutrition Knowledge Questionnaire in Adolescents and Young Adults: GeSNK.

PubMed

Calella, Patrizia; Iacullo, Vittorio Maria; Valerio, Giuliana

2017-04-29

Good knowledge of nutrition is widely thought to be important for maintaining a balanced and healthy diet. The aim of this study was to develop and validate a new reliable tool to measure general and sport nutrition knowledge (GeSNK) in people who used to practice sports at different levels. The development of the GeSNK was carried out in six phases: (1) item development and selection by a panel of experts; (2) a pilot study to assess item difficulty and item discrimination; (3) measurement of internal consistency; (4) reliability assessment with a 2-week test-retest analysis; (5) concurrent validity testing by administering the questionnaire along with two other similar tools; and (6) construct validity testing by administering the questionnaire to three groups of young adults with different general nutrition and sport nutrition knowledge. The final questionnaire consisted of 62 items from the original 183 questions. It is a consistent, valid, and suitable instrument that can be applied over time, making it a promising tool to look at the relationship between nutrition knowledge, demographic characteristics, and dietary behavior in adolescents and young adults.
Cross-cultural adaptation and validation of a Bengali version of the modified fibromyalgia impact questionnaire

PubMed Central

2012-01-01

Background: Currently, no validated instruments are available to measure the health status of Bangladeshi patients with fibromyalgia (FM). The aims of this study were to cross-culturally adapt the modified Fibromyalgia Impact Questionnaire (FIQ) into Bengali (B-FIQ) and to test its validity and reliability in Bangladeshi patients with FM. Methods: The FIQ was translated following cross-cultural adaptation guidelines and pretested in 30 female patients with FM. Next, the adapted B-FIQ was physician-administered to 102 consecutive female FM patients together with the Health Assessment Questionnaire (HAQ), selected subscales of the SF-36, and visual analog scales for current clinical symptoms. A tender point count (TPC) was performed by an experienced rheumatologist. Forty randomly selected patients completed the B-FIQ again after 7 days. Two control groups of 50 healthy people and 50 rheumatoid arthritis (RA) patients also completed the B-FIQ. Results: For the final B-FIQ, five physical function sub-items were replaced with culturally appropriate equivalents. Internal consistency was adequate for both the 11-item physical function subscale (α = 0.73) and the total scale (α = 0.83). With the exception of the physical function subscale, expected correlations were generally observed between the B-FIQ items and selected subscales of the SF-36, HAQ, clinical symptoms, and TPC. The B-FIQ was able to discriminate between FM patients and healthy controls and between FM patients and RA patients. Test-retest reliability was adequate for the physical function subscale (r = 0.86) and individual items (r = 0.73-0.86), except anxiety (r = 0.27) and morning tiredness (r = 0.64). Conclusion: This study supports the reliability and validity of the B-FIQ as a measure of functional disability and health status in Bangladeshi women with FM. PMID:22925458
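The internal consistency figures in this record are Cronbach's alpha values. A minimal sketch of how alpha is computed from a respondents-by-items score matrix is given below. The function name and the toy scores are hypothetical and unrelated to the B-FIQ data.

```python
from statistics import pvariance

def cronbach_alpha(rows):
    """Cronbach's alpha; rows is a list of respondents, each a list of k item scores."""
    k = len(rows[0])
    # Variance of each item across respondents.
    item_vars = [pvariance([r[j] for r in rows]) for j in range(k)]
    # Variance of the total (summed) score across respondents.
    total_var = pvariance([sum(r) for r in rows])
    # Population variances are used throughout; the ratio is the same as with
    # sample variances because the normalizing constant cancels.
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Toy data (invented): three respondents answering two items.
alpha = cronbach_alpha([[1, 2], [2, 1], [3, 3]])
```

Alpha approaches 1 when items rise and fall together across respondents and drops when item scores are only weakly related.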
Readability Level of Standardized Test Items and Student Performance: The Forgotten Validity Variable

ERIC Educational Resources Information Center

Hewitt, Margaret A.; Homan, Susan P.

2004-01-01

Test validity issues considered by test developers and school districts rarely include individual item readability levels. In this study, items from a major standardized test were examined for individual item readability level and item difficulty. The Homan-Hewitt Readability Formula was applied to items across three grade levels. Results of…
A New Tool for Nutrition App Quality Evaluation (AQEL): Development, Validation, and Reliability Testing.

PubMed

DiFilippo, Kristen Nicole; Huang, Wenhao; Chapman-Novakofski, Karen M

2017-10-27

The extensive availability and increasing use of mobile apps for nutrition-based health interventions make evaluation of the quality of these apps crucial for integration of apps into nutritional counseling. The goal of this research was the development, validation, and reliability testing of the app quality evaluation (AQEL) tool, an instrument for evaluating apps' educational quality and technical functionality. Items for evaluating app quality were adapted from website evaluations, with additional items added to evaluate the specific characteristics of apps, resulting in 79 initial items. Expert panels of nutrition and technology professionals and app users reviewed items for face and content validation. After recommended revisions, nutrition experts completed a second AQEL review to ensure clarity. On the basis of 150 sets of responses using the revised AQEL, principal component analysis was completed, reducing AQEL into 5 factors that underwent reliability testing, including internal consistency, split-half reliability, test-retest reliability, and interrater reliability (IRR). Two additional modifiable constructs for evaluating apps based on the age and needs of the target audience as selected by the evaluator were also tested for construct reliability. IRR testing using intraclass correlations (ICC) with all 7 constructs was conducted, with 15 dietitians evaluating one app. Development and validation resulted in the 51-item AQEL. These were reduced to 25 items in 5 factors after principal component analysis, plus 9 modifiable items in two constructs that were not included in principal component analysis. Internal consistency and split-half reliability of the following constructs derived from principal component analysis were good (Cronbach alpha >.80, Spearman-Brown coefficient >.80): behavior change potential, support of knowledge acquisition, app function, and skill development. App purpose split-half reliability was .65.
Test-retest reliability showed no significant change over time (P>.05) for all but skill development (P=.001). Construct reliability was good for items assessing age appropriateness of apps for children, teens, and a general audience. In addition, construct reliability was acceptable for assessing app appropriateness for various target audiences (Cronbach alpha >.70). For the 5 main factors, ICC (1,k) was >.80, with a P value of <.05. When 15 nutrition professionals evaluated one app, ICC (2,15) was .98, with a P value of <.001 for all 7 constructs when the modifiable items were specified for adults seeking weight loss support. Our preliminary effort shows that AQEL is a valid, reliable instrument for evaluating nutrition apps' qualities for clinical interventions by nutrition clinicians, educators, and researchers. Further efforts in validating AQEL in various contexts are needed. ©Kristen Nicole DiFilippo, Wenhao Huang, Karen M. Chapman-Novakofski. Originally published in JMIR Mhealth and Uhealth (http://mhealth.jmir.org), 27.10.2017.
Assessing the equivalence of Web-based and paper-and-pencil questionnaires using differential item and test functioning (DIF and DTF) analysis: a case of the Four-Dimensional Symptom Questionnaire (4DSQ).

PubMed

Terluin, Berend; Brouwers, Evelien P M; Marchand, Miquelle A G; de Vet, Henrica C W

2018-05-01

Many paper-and-pencil (P&P) questionnaires have been migrated to electronic platforms. Differential item and test functioning (DIF and DTF) analysis constitutes a superior research design to assess measurement equivalence across modes of administration. The purpose of this study was to demonstrate an item response theory (IRT)-based DIF and DTF analysis to assess the measurement equivalence of a Web-based version and the original P&P format of the Four-Dimensional Symptom Questionnaire (4DSQ), measuring distress, depression, anxiety, and somatization. The P&P group (n = 2031) and the Web group (n = 958) consisted of primary care psychology clients. Unidimensionality and local independence of the 4DSQ scales were examined using IRT and Yen's Q3. Bifactor modeling was used to assess the scales' essential unidimensionality. Measurement equivalence was assessed using IRT-based DIF analysis using a 3-stage approach: linking on the latent mean and variance, selection of anchor items, and DIF testing using the Wald test. DTF was evaluated by comparing expected scale scores as a function of the latent trait. The 4DSQ scales proved to be essentially unidimensional in both modalities. Five items, belonging to the distress and somatization scales, displayed small amounts of DIF. DTF analysis revealed that the impact of DIF on the scale level was negligible. IRT-based DIF and DTF analysis is demonstrated as a way to assess the equivalence of Web-based and P&P questionnaire modalities. Data obtained with the Web-based 4DSQ are equivalent to data obtained with the P&P version.
Environmental opportunities questionnaire: development of a measure of the environment supporting early motor development in the first year of life.

PubMed

Doralp, Samantha; Bartlett, Doreen J

2013-09-01

This study describes the development and testing of a measure evaluating the quality and variability of the home environment as it relates to the motor development of infants during the first year of life. A sample of 112 boys and 95 girls with a mean age of 7.1 months (SD 1.8) and gestational age (GA) of 39.6 weeks (SD 1.5) participated in the study. The measurement development process was divided into three phases: measurement development (item generation or selection of items from existing measurement tools), pilot testing to determine acceptability and feasibility to parents, and exploratory factor analysis to organize items into meaningful concepts. Test-retest reliability and internal consistency were also determined. The environmental opportunities questionnaire (EOQ) is a feasible 21-item measure comprising three factors: opportunities in the play space, sensory variety, and parental encouragement. Overall, test-retest reliability was 0.92 (CI 0.84-0.96) and internal consistency was 0.79. The EOQ emphasizes quality of the environment and access to equipment and toys that have the potential to facilitate early motor development. The preliminary analyses reported here suggest more work could be done on the EOQ to strengthen its use for research or clinical purposes; however, it is adequate for use in its current form. Implications for Rehabilitation: The EOQ is a new and feasible 21-item questionnaire that enables identification of malleable environmental factors that serve as potential points of intervention for children who are not developing typically. It is a therapeutic tool for use by therapists to inform and guide discussions with caregivers about potential influences of environmental, social and attitudinal factors in their child's early development.
There’s more to food store choice than proximity: a questionnaire development study

PubMed Central

2013-01-01

Background: Proximity of food stores is associated with dietary intake and obesity; however, individuals frequently shop at stores that are not the most proximal. Little is known about other factors that influence food store choice. The current research describes the development of the Food Store Selection Questionnaire (FSSQ) and describes preliminary results of field testing the questionnaire. Methods: Development of the FSSQ involved a multidisciplinary literature review, qualitative analysis of focus group transcripts, and expert and community reviews. Field testing consisted of 100 primary household food shoppers (93% female, 64% African American), in rural and urban Arkansas communities, rating FSSQ items as to their importance in store choice and indicating their top two reasons. After eliminating 14 items due to low mean importance scores and high correlations with other items, the final FSSQ questionnaire consists of 49 items. Results: Items rated highest in importance were: meat freshness; store maintenance; store cleanliness; meat varieties; and store safety. Items most commonly rated as top reasons were: low prices; proximity to home; fruit/vegetable freshness; fruit/vegetable variety; and store cleanliness. Conclusions: The FSSQ is a comprehensive questionnaire for detailing key reasons in food store choice. Although proximity to home was a consideration for participants, there were clearly other key factors in their choice of a food store. Understanding the relative importance of these different dimensions driving food store choice in specific communities may be beneficial in informing policies and programs designed to support healthy dietary intake and obesity prevention. PMID:23773428
There's more to food store choice than proximity: a questionnaire development study.

PubMed

Krukowski, Rebecca A; Sparks, Carla; DiCarlo, Marisha; McSweeney, Jean; West, Delia Smith

2013-06-17

Proximity of food stores is associated with dietary intake and obesity; however, individuals frequently shop at stores that are not the most proximal. Little is known about other factors that influence food store choice. The current research describes the development of the Food Store Selection Questionnaire (FSSQ) and describes preliminary results of field testing the questionnaire. Development of the FSSQ involved a multidisciplinary literature review, qualitative analysis of focus group transcripts, and expert and community reviews. Field testing consisted of 100 primary household food shoppers (93% female, 64% African American), in rural and urban Arkansas communities, rating FSSQ items as to their importance in store choice and indicating their top two reasons. After eliminating 14 items due to low mean importance scores and high correlations with other items, the final FSSQ questionnaire consists of 49 items. Items rated highest in importance were: meat freshness; store maintenance; store cleanliness; meat varieties; and store safety. Items most commonly rated as top reasons were: low prices; proximity to home; fruit/vegetable freshness; fruit/vegetable variety; and store cleanliness. The FSSQ is a comprehensive questionnaire for detailing key reasons in food store choice. Although proximity to home was a consideration for participants, there were clearly other key factors in their choice of a food store. Understanding the relative importance of these different dimensions driving food store choice in specific communities may be beneficial in informing policies and programs designed to support healthy dietary intake and obesity prevention.
The Mini-OAKHQOL for knee and hip osteoarthritis quality of life was obtained following recent shortening guidelines.

PubMed

Guillemin, Francis; Rat, Anne-Christine; Goetz, Christophe; Spitz, Elisabeth; Pouchot, Jacques; Coste, Joël

2016-01-01

To develop a short form of the knee and hip osteoarthritis quality of life questionnaire, the Mini-OAKHQOL, preserving the conceptual model and, as far as possible, the content and the psychometric properties of the original instrument. A two-step shortening procedure was used: (1) a consensus Delphi method, with a panel of patients and another of professionals independently asked to select items and (2) a nominal group, where patients, professionals, and methodologists reached consensus on the final selection of items, using information from the panels and from modern measurement and classical test theory analyses. The psychometric properties of the Mini-OAKHQOL were assessed in an independent population-based sample of 581 subjects with knee or hip osteoarthritis. The two-step shortening procedure resulted in a 20-item questionnaire. Confirmatory factor analysis showed preservation of the original five-dimensional structure. Rasch analyses showed the unidimensionality and invariance by sex, age, and joint of the main dimensions. Convergent validity, reproducibility, and internal consistency were similar to or better than those of the original OAKHQOL. The 20-item Mini-OAKHQOL has good psychometric properties and can be used for the measurement of quality of life in subjects with osteoarthritis of the lower limbs. Copyright © 2016 Elsevier Inc. All rights reserved.
The SPARK Tool to prioritise questions for systematic reviews in health policy and systems research: development and initial validation.

PubMed

Akl, Elie A; Fadlallah, Racha; Ghandour, Lilian; Kdouh, Ola; Langlois, Etienne; Lavis, John N; Schünemann, Holger; El-Jardali, Fadi

2017-09-04

Groups or institutions funding or conducting systematic reviews in health policy and systems research (HPSR) should prioritise topics according to the needs of policymakers and stakeholders. The aim of this study was to develop and validate a tool to prioritise questions for systematic reviews in HPSR. We developed the tool following a four-step approach consisting of (1) the definition of the purpose and scope of the tool, (2) item generation and reduction, (3) testing for content and face validity, and (4) pilot testing of the tool. The research team involved international experts in HPSR, systematic review methodology and tool development, led by the Center for Systematic Reviews on Health Policy and Systems Research (SPARK). We followed an inclusive approach in determining the final selection of items to allow customisation to the user's needs. The purpose of the SPARK tool was to prioritise questions in HPSR in order to address them in systematic reviews. In the item generation and reduction phase, an extensive literature search yielded 40 relevant articles, which were reviewed by the research team to create a preliminary list of 19 candidate items for inclusion in the tool. As part of testing for content and face validity, input from international experts led to the refining, changing, merging and addition of new items, and to organisation of the tool into two modules. Following pilot testing, we finalised the tool, with 22 items organised in two modules: the first module including 13 items to be rated by policymakers and stakeholders, and the second including 9 items to be rated by systematic review teams. Users can customise the tool to their needs by omitting items that may not be applicable to their settings. We also developed a user manual that provides guidance on how to use the SPARK tool, along with signaling questions.
We have developed and conducted initial validation of the SPARK tool to prioritise questions for systematic reviews in HPSR, along with a user manual. By aligning systematic review production to policy priorities, the tool will help support evidence-informed policymaking and reduce research waste. We invite others to contribute with additional real-life implementation of the tool.
ADHD and retrieval-induced forgetting: evidence for a deficit in the inhibitory control of memory.

PubMed

Storm, Benjamin C; White, Holly A

2010-04-01

Research on retrieval-induced forgetting has shown that the selective retrieval of some information can cause the forgetting of other information. Such forgetting is believed to result from inhibitory processes that function to resolve interference during retrieval. The current study examined whether individuals with ADHD demonstrate normal levels of retrieval-induced forgetting. A total of 40 adults with ADHD and 40 adults without ADHD participated in a standard retrieval-induced forgetting experiment. Critically, half of the items were tested using category cues and the other half of the items were tested using category-plus-one-letter-stem cues. Whereas both ADHD and non-ADHD participants demonstrated retrieval-induced forgetting on the final category-cued recall test, only non-ADHD participants demonstrated retrieval-induced forgetting on the final category-plus-stem-cued recall test. These results suggest that individuals with ADHD do have a deficit in the inhibitory control of memory, but that this deficit may only be apparent when output interference is adequately controlled on the final test.

The effects of relative food item size on optimal tooth cusp sharpness during brittle food item processing

PubMed Central

Berthaume, Michael A.; Dumont, Elizabeth R.; Godfrey, Laurie R.; Grosse, Ian R.

2014-01-01

Teeth are often assumed to be optimal for their function, which allows researchers to derive dietary signatures from tooth shape. Most tooth shape analyses normalize for tooth size, potentially masking the relationship between relative food item size and tooth shape. Here, we model how relative food item size may affect optimal tooth cusp radius of curvature (RoC) during the fracture of brittle food items using a parametric finite-element (FE) model of a four-cusped molar. Morphospaces were created for four different food item sizes by altering cusp RoCs to determine whether optimal tooth shape changed as food item size changed. The morphospaces were also used to investigate whether variation in efficiency metrics (i.e. stresses, energy and optimality) changed as food item size changed. We found that optimal tooth shape changed as food item size changed, but that all optimal morphologies were similar, with one dull cusp that promoted high stresses in the food item and three cusps that acted to stabilize the food item. There were also positive relationships between food item size and the coefficients of variation for stresses in food item and optimality, and negative relationships between food item size and the coefficients of variation for stresses in the enamel and strain energy absorbed by the food item. These results suggest that relative food item size may play a role in selecting for optimal tooth shape, and the magnitude of these selective forces may change depending on food item size and which efficiency metric is being selected. PMID:25320068
Development Test II of Time Division Digital Multiplexer TD-1069( )/G

DTIC Science & Technology

1976-11-01

fungi: (1) Aspergillus flavus (2) Aspergillus niger (3) Aspergillus versicolor (4) Penicillium funiculosum (5) Chaetomium globosum c The fungi... Aspergillus flavus and a negligible amount of Aspergillus niger were observed on the exterior surface of the test item. (2) Top...interior. The wire ties maintained a moderate amount of Aspergillus versicolor and spotted colonies of Penicillium funiculosum. The voltage select
Random one-of-N selector

DOEpatents

Kronberg, J.W.

1993-04-20

An apparatus for selecting at random one item of N items on the average comprising counter and reset elements for counting repeatedly between zero and N, a number selected by the user, a circuit for activating and deactivating the counter, a comparator to determine if the counter stopped at a count of zero, an output to indicate an item has been selected when the count is zero or not selected if the count is not zero. Randomness is provided by having the counter cycle very often while varying the relatively longer duration between activation and deactivation of the count. The passive circuit components of the activating/deactivating circuit and those of the counter are selected for the sensitivity of their response to variations in temperature and other physical characteristics of the environment so that the response time of the circuitry varies. Additionally, the items themselves, which may be people, may vary in shape or the time they press a pushbutton, so that, for example, an ultrasonic beam broken by the item or person passing through it will add to the duration of the count and thus to the randomness of the selection.
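The selection principle described in this record can be mimicked in software. The sketch below is a simulation under stated assumptions, not the patented circuit: the counter is assumed to have N states (0 to N-1), and the variable activation time is modelled as a random number of ticks. All function and parameter names are hypothetical.

```python
import random

def one_of_n_trial(n, rng, mean_ticks=10_000, jitter=2_000):
    """Simulate one activation of a free-running counter that cycles 0..n-1.

    The counter runs for a variable number of ticks (standing in for the
    varying activation duration) and the item is selected if it stops at zero.
    """
    ticks = mean_ticks + rng.randrange(-jitter, jitter + 1)
    return ticks % n == 0

def selection_rate(n, trials, seed=0):
    """Fraction of simulated activations that end in a selection."""
    rng = random.Random(seed)
    return sum(one_of_n_trial(n, rng) for _ in range(trials)) / trials

# With n = 5, roughly one activation in five should end in a selection.
rate = selection_rate(5, 20_000)
```

Because the counter cycles many times within the spread of possible stop times, the stopping count is close to uniform over its states, which is what makes the long-run selection rate approach 1/N.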
Random one-of-N selector

DOEpatents

Kronberg, James W.

1993-01-01

An apparatus for selecting at random one item of N items on the average comprising counter and reset elements for counting repeatedly between zero and N, a number selected by the user, a circuit for activating and deactivating the counter, a comparator to determine if the counter stopped at a count of zero, an output to indicate an item has been selected when the count is zero or not selected if the count is not zero. Randomness is provided by having the counter cycle very often while varying the relatively longer duration between activation and deactivation of the count. The passive circuit components of the activating/deactivating circuit and those of the counter are selected for the sensitivity of their response to variations in temperature and other physical characteristics of the environment so that the response time of the circuitry varies. Additionally, the items themselves, which may be people, may vary in shape or the time they press a pushbutton, so that, for example, an ultrasonic beam broken by the item or person passing through it will add to the duration of the count and thus to the randomness of the selection.
The Piper Fatigue Scale-12 (PFS-12): psychometric findings and item reduction in a cohort of breast cancer survivors.

PubMed

Reeve, Bryce B; Stover, Angela M; Alfano, Catherine M; Smith, Ashley Wilder; Ballard-Barbash, Rachel; Bernstein, Leslie; McTiernan, Anne; Baumgartner, Kathy B; Piper, Barbara F

2012-11-01

Brief, valid measures of fatigue, a prevalent and distressing cancer symptom, are needed for use in research. This study's primary aim was to create a shortened version of the revised Piper Fatigue Scale (PFS-R) based on data from a diverse cohort of breast cancer survivors. A secondary aim was to determine whether the PFS captured multiple distinct aspects of fatigue (a multidimensional model) or a single overall fatigue factor (a unidimensional model). Breast cancer survivors (n = 799; stages in situ through IIIa; ages 29-86 years) were recruited through three SEER registries (New Mexico, Western Washington, and Los Angeles, CA) as part of the Health, Eating, Activity, and Lifestyle (HEAL) study. Fatigue was measured approximately 3 years post-diagnosis using the 22-item PFS-R that has four subscales (Behavior, Affect, Sensory, and Cognition). Confirmatory factor analysis was used to compare unidimensional and multidimensional models. Six criteria were used to make item selections to shorten the PFS-R: scale's content validity, items' relationship with fatigue, content redundancy, differential item functioning by race and/or education, scale reliability, and literacy demand. Factor analyses supported the original 4-factor structure. There was also evidence from the bi-factor model for a dominant underlying fatigue factor. Six items tested positive for differential item functioning between African-American and Caucasian survivors. Four additional items either showed poor association, local dependence, or content validity concerns. After removing these 10 items, the reliability of the PFS-12 subscales ranged from 0.87 to 0.89, compared to 0.90-0.94 prior to item removal. The newly developed PFS-12 can be used to assess fatigue in African-American and Caucasian breast cancer survivors and reduces response burden without compromising reliability or validity. 
This is the first study to determine PFS literacy demand and to compare PFS-R responses in African-American and Caucasian breast cancer survivors. Further testing in diverse populations is warranted.
The Effect of the Position of an Item within a Test on the Item Difficulty Value.

ERIC Educational Resources Information Center

Rubin, Lois S.; Mott, David E. W.

An investigation of the effect on the difficulty value of an item due to position placement within a test was made. Using a 60-item operational test comprised of 5 subtests, 60 items were placed as experimental items on a number of spiralled test forms in three different positions (first, middle, last) within the subtest composed of like items.…
Effects of adding an Italian theme to a restaurant on the perceived ethnicity, acceptability, and selection of foods.

PubMed

Bell, R; Meiselman, H L; Pierson, B J; Reeve, W G

1994-02-01

We investigated whether a change in the perceived ethnicity of a food can be produced without manipulating the food item itself, and if that change in ethnic perception is accompanied by a change in acceptability and food selection behavior. Italian and British foods were offered in a British restaurant for four days. Foods were offered for 2 days under control conditions, when the restaurant was decorated as usual. The identical foods then were offered in the restaurant for 2 more days under experimental conditions, when ethnic names were used on the menu to describe foods, and the restaurant was decorated with an Italian theme. Perceived ethnicity and acceptability of items were rated by customers each day, and item selection was tracked. The Italian theme increased selection of pasta and dessert items, and decreased the selection of fish. The Italian theme also increased the perceived Italian ethnicity of British pasta items, fish and veal, and increased the perceived Italian ethnicity of the meal overall. These findings show that changes in perceived ethnicity and food selection can be accomplished without altering food items, but merely by manipulating the environment, and may imply a unique strategy for increasing perceived menu variety.
Mixed-Format Test Score Equating: Effect of Item-Type Multidimensionality, Length and Composition of Common-Item Set, and Group Ability Difference

ERIC Educational Resources Information Center

Wang, Wei

2013-01-01

Mixed-format tests containing both multiple-choice (MC) items and constructed-response (CR) items are now widely used in many testing programs. Mixed-format tests often are considered to be superior to tests containing only MC items although the use of multiple item formats leads to measurement challenges in the context of equating conducted under…
Test item linguistic complexity and assessments for deaf students.

PubMed

Cawthon, Stephanie

2011-01-01

Linguistic complexity of test items is one test format element that has been studied in the context of struggling readers and their participation in paper-and-pencil tests. The present article presents findings from an exploratory study on the potential relationship between linguistic complexity and test performance for deaf readers. A total of 64 students completed 52 multiple-choice items, 32 in mathematics and 20 in reading. These items were coded for linguistic complexity components of vocabulary, syntax, and discourse. Mathematics items had higher linguistic complexity ratings than reading items, but there were no significant relationships between item linguistic complexity scores and student performance on the test items. The discussion addresses issues related to the subject area, student proficiency levels in the test content, factors to look for in determining a "linguistic complexity effect," and areas for further research in test item development and deaf students.
A Computer Adaptive Testing Version of the Addiction Severity Index-Multimedia Version (ASI-MV): The Addiction Severity CAT

PubMed Central

Butler, Stephen F.; Black, Ryan A.; McCaffrey, Stacey A.; Ainscough, Jessica; Doucette, Ann M.

2017-01-01

The purpose of this study was to develop and validate a computer adaptive testing (CAT) version of the Addiction Severity Index-Multimedia Version (ASI-MV®), the Addiction Severity CAT. This goal was accomplished in four steps. First, new candidate items for Addiction Severity CAT domains were evaluated after brainstorming sessions with experts in substance abuse treatment. Next, this new item bank was psychometrically evaluated on a large non-clinical sample (n = 4419) and a substance abuse treatment sample (n = 845). Based on these results, final items were selected and calibrated for the creation of the Addiction Severity CAT algorithms. Once the algorithms were developed for the entire assessment, a fully functioning prototype of an Addiction Severity CAT was created. CAT simulations were conducted and optimal termination criteria were selected for the Addiction Severity CAT algorithms. Finally, construct validity of the CAT algorithms was evaluated by examining convergent/discriminant validity and sensitivity to change. The Addiction Severity CAT was determined to be valid, sensitive to change, and reliable. Further, the Addiction Severity CAT's time of administration was found to be significantly less than the average time of administration for the ASI-MV composite scores. This study represents the initial validation of an IRT-based Addiction Severity CAT, and further exploration of the Addiction Severity CAT is needed. PMID:28230387
Motivated encoding selectively promotes memory for future inconsequential semantically-related events.

PubMed

Oyarzún, Javiera P; Packard, Pau A; de Diego-Balaguer, Ruth; Fuentemilla, Lluis

2016-09-01

Neurobiological models of long-term memory explain how memory for inconsequential events fades, unless these happen before or after other relevant (i.e., rewarding or aversive) or novel events. Recently, it has been shown in humans that retrospective and prospective memories are selectively enhanced if semantically related events are paired with aversive stimuli. However, it remains unclear whether motivating stimuli, as opposed to aversive ones, have the same effect in humans. Here, participants performed a three-phase incidental encoding task in which one semantic category was rewarded during the second phase. A memory test 24 h after encoding, but not immediately after it, revealed that memory for inconsequential items was selectively enhanced only if items from the same category had been previously, but not subsequently, paired with rewards. This result suggests that prospective memory enhancement of reward-related information requires, as previously reported for aversive memories, a period of memory consolidation. The current findings provide the first empirical evidence in humans that the effects of motivated encoding are selectively and prospectively prolonged over time. Copyright © 2016 Elsevier Inc. All rights reserved.
Direction of Wording Effects in Balanced Scales.

ERIC Educational Resources Information Center

Miller, Timothy R.; Cleary, T. Anne

1993-01-01

The degree to which statistical item selection reduces direction-of-wording effects in balanced affective measures developed from relatively small item pools was investigated with 171 male and 228 female undergraduate and graduate students at 2 U.S. universities. Clearest direction-of-wording effects result from selection of items with high…
Assessing Correspondence Following Acquisition of an Exchange-Based Communication System

ERIC Educational Resources Information Center

Sigafoos, Jeff; Ganz, Jennifer B.; O'Reilly, Mark; Lancioni, Giulio E.; Schlosser, Ralf W.

2007-01-01

Two students with developmental disabilities were taught to request six snack items. Requesting involved giving a graphic symbol to the trainer in exchange for the matching snack item. Following acquisition, we assessed the correspondence between requests and subsequent item selections by requiring the student to select the previously requested…
Decision making: rational or hedonic?

PubMed Central

Cabanac, Michel; Bonniot-Cabanac, Marie-Claude

2007-01-01

Three experiments studied the hedonicity of decision making. Participants rated their pleasure/displeasure while reading item-sentences describing political and social problems followed by different decisions (Questionnaire 1). Questionnaire 2 was multiple-choice, grouping the items from Questionnaire 1. In Experiment 1, participants answered Questionnaire 2 rapidly or slowly. Both groups selected what they had rated as pleasant, but the 'leisurely' group maximized pleasure less. In Experiment 2, participants selected the most rational responses. The selected behaviors were pleasant, but less so than spontaneous behaviors. In Experiment 3, Questionnaire 2 was presented once with items grouped by theme, and once with items shuffled. Participants maximized the pleasure of their decisions, but the items selected on Questionnaire 2 differed when presented in a different order. All groups maximized pleasure equally in their decisions. These results support the view that decisions are made predominantly in the hedonic dimension of consciousness. PMID:17848195
Selecting for memory? The influence of selective attention on the mnemonic binding of contextual information

PubMed Central

Uncapher, Melina R.; Rugg, Michael D.

2009-01-01

Not all of what is experienced is remembered later. Behavioral evidence suggests that the manner in which an event is processed influences which aspects of the event will later be remembered. The present experiment investigated the neural correlates of ‘selective encoding’, or the mechanisms that support the encoding of some elements of an event in preference to others. Event-related functional magnetic resonance imaging (fMRI) data were acquired while volunteers selectively attended to one of two different contextual features of study items (color or location). A surprise memory test for the items and both contextual features was subsequently administered to determine the influence of selective attention on the neural correlates of contextual encoding. Activity in several cortical regions indexed later memory success selectively for color or location information, and this encoding-related activity was enhanced by selective attention to the relevant feature. Critically, a region in the hippocampus responded selectively to attended source information (whether color or location), demonstrating encoding-related activity for attended but not for nonattended source features. Together, the findings suggest that selective attention modulates the magnitude of activity in cortical regions engaged by different aspects of an event, and hippocampal encoding mechanisms seem to be sensitive to this modulation. Thus, the information that is encoded into a memory representation is biased by selective attention, and this bias is mediated by cortico-hippocampal interactions. PMID:19553466
Selecting for memory? The influence of selective attention on the mnemonic binding of contextual information.

PubMed

Uncapher, Melina R; Rugg, Michael D

2009-06-24

Not all of what is experienced is remembered later. Behavioral evidence suggests that the manner in which an event is processed influences which aspects of the event will later be remembered. The present experiment investigated the neural correlates of "selective encoding," or the mechanisms that support the encoding of some elements of an event in preference to others. Event-related MRI data were acquired while volunteers selectively attended to one of two different contextual features of study items (color or location). A surprise memory test for the items and both contextual features was subsequently administered to determine the influence of selective attention on the neural correlates of contextual encoding. Activity in several cortical regions indexed later memory success selectively for color or location information, and this encoding-related activity was enhanced by selective attention to the relevant feature. Critically, a region in the hippocampus responded selectively to attended source information (whether color or location), demonstrating encoding-related activity for attended but not for nonattended source features. Together, the findings suggest that selective attention modulates the magnitude of activity in cortical regions engaged by different aspects of an event, and hippocampal encoding mechanisms seem to be sensitive to this modulation. Thus, the information that is encoded into a memory representation is biased by selective attention, and this bias is mediated by cortical-hippocampal interactions.
Development of the PROMIS negative psychosocial expectancies of smoking item banks.

PubMed

Stucky, Brian D; Edelen, Maria Orlando; Tucker, Joan S; Shadel, William G; Cerully, Jennifer; Kuhfeld, Megan; Hansen, Mark; Cai, Li

2014-09-01

Negative psychosocial expectancies of smoking include aspects of social disapproval and disappointment in oneself. This paper describes analyses conducted to develop and evaluate item banks for assessing psychosocial expectancies among daily and nondaily smokers. Using data from a sample of daily (N = 4,201) and nondaily (N = 1,183) smokers, we conducted a series of item factor analyses, item response theory analyses, and differential item functioning analyses (according to gender, age, and race/ethnicity) to arrive at a unidimensional set of psychosocial expectancies items for daily and nondaily smokers. We also evaluated performance of short forms (SFs) and computer adaptive tests (CATs) to efficiently assess psychosocial expectancies. A total of 21 items were included in the Psychosocial Expectancies item banks: 14 items are common across daily and nondaily smokers, 6 are unique to daily, and 1 is unique to nondaily. For both daily and nondaily smokers, the Psychosocial Expectancies item banks are strongly unidimensional, highly reliable (reliability = 0.95 and 0.93, respectively), and perform similarly across gender, age, and race/ethnicity groups. A SF common to daily and nondaily smokers consists of 6 items (reliability = 0.85). Results from simulated CATs showed that, on average, fewer than 8 items are needed to assess psychosocial expectancies with adequate precision when using the item banks. Psychosocial expectancies of smoking can be assessed on the basis of these item banks via the SF, by using CAT, or through a tailored set of items selected for a specific research purpose. © The Author 2014. Published by Oxford University Press on behalf of the Society for Research on Nicotine and Tobacco. All rights reserved.
Development of an Instrument to Measure Pharmacy Student Attitudes Toward Social Media Professionalism.

PubMed

Chisholm-Burns, Marie A; Spivey, Christina A; Jaeger, Melanie C; Williams, Jennifer; George, Christa

2017-05-01

Objectives. To develop and validate a scale measuring pharmacy students' attitudes toward social media professionalism, and assess the impact of an educational presentation on social media professionalism. Methods. A social media professionalism scale was used in a pre- and post-survey to determine the effects of a social media professionalism presentation. The 26-item scale was administered to 197 first-year pharmacy (P1) students during orientation. Exploratory factor analysis was applied to determine the number of underlying factors responsible for covariation of the data. Principal components analysis was used as the extraction method. Varimax was selected as the rotation method. Cronbach's alpha was estimated. Wilcoxon signed rank test was used to compare pre- and post-scores of each item, subscale, and total scale. Results. There were 187 (95%) students who participated. The final scale had five subscales and 15 items. Subscales were named according to the professionalism tenet they best represented. Scores of items addressing reading/posting to social media during class, an employer's use of social media when making hiring decisions, and a college/university's use of social media as a measure of professional conduct significantly increased from pre-test to post-test. The "honesty and integrity" subscale score also significantly increased. Conclusion. The social media professionalism scale measures five tenets of professionalism and exhibits satisfactory reliability. The presentation improved P1 students' attitudes regarding social media professionalism.
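The record above reports that Cronbach's alpha was estimated for the scale. As a minimal sketch of the standard formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), the snippet below uses a small made-up response matrix; the function name `cronbach_alpha` and the data are illustrative assumptions and do not come from the study.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondent rows (one score per item).

    Uses sample variances for both the items and the total score.
    """
    k = len(rows[0])  # number of items
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    total_var = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical data: 4 respondents x 3 items (not from the study).
scores = [
    [1, 2, 3],
    [2, 3, 4],
    [3, 4, 5],
    [4, 5, 6],
]
alpha = cronbach_alpha(scores)
print(round(alpha, 3))
```

In this toy matrix the items rise and fall together perfectly, so alpha reaches its upper bound of 1; real scale data would normally give a lower value.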
Validity of the Eating Attitude Test among Exercisers.

PubMed

Lane, Helen J; Lane, Andrew M; Matheson, Hilary

2004-12-01

Theory testing and construct measurement are inextricably linked. To date, no published research has looked at the factorial validity of an existing eating attitude inventory for use with exercisers. The Eating Attitude Test (EAT) is a 26-item measure that yields a single index of disordered eating attitudes. The original factor analysis showed three interrelated factors: dieting behavior (13 items), oral control (7 items), and bulimia nervosa-food preoccupation (6 items). The primary purpose of the study was to examine the factorial validity of the EAT among a sample of exercisers. The second purpose was to investigate relationships between eating attitudes scores and selected psychological constructs. In stage one, 598 regular exercisers completed the EAT. Confirmatory factor analysis (CFA) was used to test a single-factor model, a three-factor model, and a four-factor model, which distinguished bulimia from food preoccupation. CFA of the single-factor model (RCFI = 0.66, RMSEA = 0.10) and the three-factor model (RCFI = 0.74; RMSEA = 0.09) showed poor model fit. There was marginal fit for the four-factor model (RCFI = 0.91, RMSEA = 0.06). Results indicated that five items showed poor factor loadings. After these five items were discarded, the three models were re-analyzed. CFA results indicated that the single-factor model (RCFI = 0.76, RMSEA = 0.10) and three-factor model (RCFI = 0.82, RMSEA = 0.08) showed poor fit. CFA results for the four-factor model showed acceptable fit indices (RCFI = 0.98, RMSEA = 0.06). Stage two explored relationships between EAT scores, mood, self-esteem, and motivational indices toward exercise in terms of self-determination, enjoyment and competence. Correlation results indicated that depressed mood scores positively correlated with bulimia and dieting scores. Further, dieting was inversely related with self-determination toward exercising.
Collectively, findings suggest that a 21-item four-factor model shows promising validity coefficients among exercise participants, and that future research is needed to investigate eating attitudes among samples of exercisers. Key Points: Validity of psychometric measures should be thoroughly investigated. Researchers should not assume that a scale validation on one sample will show the same validity coefficients in a different population. The Eating Attitude Test is a commonly used scale. The present study shows a revised 21-item scale was suitable for exercisers. Researchers using the Eating Attitude Test should use subscales of Dieting, Oral control, Food preoccupation, and Bulimia. Future research should involve qualitative techniques and interview exercise participants to explore the nature of eating attitudes.
Development and psychometric evaluation of an information literacy self-efficacy survey and an information literacy knowledge test.

PubMed

Tepe, Rodger; Tepe, Chabha

2015-03-01

To develop and psychometrically evaluate an information literacy (IL) self-efficacy survey and an IL knowledge test. In this test-retest reliability study, a 25-item IL self-efficacy survey and a 50-item IL knowledge test were developed and administered to a convenience sample of 53 chiropractic students. Item analyses were performed on all questions. The IL self-efficacy survey demonstrated good reliability (test-retest correlation = 0.81) and good/very good internal consistency (mean κ = .56 and Cronbach's α = .92). A total of 25 questions with the best item analysis characteristics were chosen from the 50-item IL knowledge test, resulting in a 25-item IL knowledge test that demonstrated good reliability (test-retest correlation = 0.87), very good internal consistency (mean κ = .69, KR20 = 0.85), and good item discrimination (mean point-biserial = 0.48). This study resulted in the development of three instruments: a 25-item IL self-efficacy survey, a 50-item IL knowledge test, and a 25-item IL knowledge test. The information literacy self-efficacy survey and the 25-item version of the information literacy knowledge test have shown preliminary evidence of adequate reliability and validity to justify continuing study with these instruments.
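The record above reports KR-20 and point-biserial values for a dichotomously scored knowledge test. As a rough sketch of how those two statistics are conventionally computed, the snippet below uses a tiny invented 0/1 response matrix. The function names and data are assumptions for illustration, and the point-biserial here is the plain item-total Pearson correlation (the study may have used a corrected variant).

```python
from statistics import mean, pstdev, pvariance


def kr20(responses):
    """Kuder-Richardson 20 for rows of 0/1 item scores (population variance)."""
    k = len(responses[0])
    total_var = pvariance([sum(row) for row in responses])
    # Sum of p*q over items, where p is the proportion answering correctly.
    pq = 0.0
    for j in range(k):
        p = mean(row[j] for row in responses)
        pq += p * (1 - p)
    return (k / (k - 1)) * (1 - pq / total_var)


def point_biserial(responses, j):
    """Pearson correlation between item j (0/1) and the total score."""
    item = [row[j] for row in responses]
    totals = [sum(row) for row in responses]
    mx, my = mean(item), mean(totals)
    cov = mean((x - mx) * (y - my) for x, y in zip(item, totals))
    return cov / (pstdev(item) * pstdev(totals))


# Hypothetical data: 5 examinees x 4 items (not from the study).
answers = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(kr20(answers), 3))
print(round(point_biserial(answers, 0), 3))
```

Items with low or negative point-biserial values are the usual candidates for removal when a longer test is trimmed to a shorter form.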

The Bereavement Guilt Scale.

PubMed

Li, Jie; Stroebe, Margaret; Chan, Cecilia L W; Chow, Amy Y M

2017-06-01

The rationale, development, and validation of the Bereavement Guilt Scale (BGS) are described in this article. The BGS was based on a theoretically developed, multidimensional conceptualization of guilt. Part 1 describes the generation of the item pool, derived from in-depth interviews, and review of the scientific literature. Part 2 details statistical analyses for further item selection (Sample 1, N = 273). Part 3 covers the psychometric properties of the emergent-BGS (Sample 2, N = 600, and Sample 3, N = 479). Confirmatory factor analysis indicated that a five-factor model fit the data best. Correlations of BGS scores with depression, anxiety, self-esteem, self-forgiveness, and mode of death were consistent with theoretical predictions, supporting the construct validity of the measure. The internal consistency and test-retest reliability were also supported. Thus, initial testing or examination suggests that the BGS is a valid tool to assess multiple components of bereavement guilt. Further psychometric testing across cultures is recommended.
The Impact of Non-attempted and Dually-Attempted Items on Person Abilities Using Item Response Theory

PubMed Central

Sideridis, Georgios D.; Tsaousis, Ioannis; Al Harbi, Khaleel

2016-01-01

The purpose of the present study was to relate response strategy with person ability estimates. Two behavioral strategies were examined: (a) the strategy to skip items in order to save time on timed tests, and, (b) the strategy to select two responses on an item, with the hope that one of them may be considered correct. Participants were 4,422 individuals who were administered a standardized achievement measure related to math, biology, chemistry, and physics. In the present evaluation, only the physics subscale was employed. Two analyses were conducted: (a) a person-based one to identify differences between groups and potential correlates of those differences, and, (b) a measure-based analysis in order to identify the parts of the measure that were responsible for potential group differentiation. For (a) person abilities, the 2-PL model was employed and later the 3-PL and 4-PL models in order to estimate upper and lower asymptotes of person abilities. For (b) differential item functioning, differential test functioning, and differential distractor functioning were investigated. Results indicated that there were significant differences between groups with completers having the highest ability compared to both non-attempters and dual responders. There were no significant differences between non-attempters and dual responders. The present findings have implications for response strategy efficacy and measure evaluation, revision, and construction. PMID:27790174
The Impact of Non-attempted and Dually-Attempted Items on Person Abilities Using Item Response Theory.

PubMed

Sideridis, Georgios D; Tsaousis, Ioannis; Al Harbi, Khaleel

2016-01-01

The purpose of the present study was to relate response strategy with person ability estimates. Two behavioral strategies were examined: (a) the strategy to skip items in order to save time on timed tests, and, (b) the strategy to select two responses on an item, with the hope that one of them may be considered correct. Participants were 4,422 individuals who were administered a standardized achievement measure related to math, biology, chemistry, and physics. In the present evaluation, only the physics subscale was employed. Two analyses were conducted: (a) a person-based one to identify differences between groups and potential correlates of those differences, and, (b) a measure-based analysis in order to identify the parts of the measure that were responsible for potential group differentiation. For (a) person abilities, the 2-PL model was employed and later the 3-PL and 4-PL models in order to estimate upper and lower asymptotes of person abilities. For (b) differential item functioning, differential test functioning, and differential distractor functioning were investigated. Results indicated that there were significant differences between groups with completers having the highest ability compared to both non-attempters and dual responders. There were no significant differences between non-attempters and dual responders. The present findings have implications for response strategy efficacy and measure evaluation, revision, and construction.
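The two records above refer to the 2-PL, 3-PL, and 4-PL models and their lower and upper asymptotes. As a small sketch of the logistic item response function in its usual textbook form, P(theta) = c + (d - c) / (1 + exp(-a(theta - b))), the snippet below shows how the three models nest. The function name `irf` and the parameter values are illustrative assumptions, not estimates from the study.

```python
import math


def irf(theta, a, b, c=0.0, d=1.0):
    """Probability of a correct response under the 4-PL logistic model.

    a: discrimination, b: difficulty,
    c: lower asymptote (c=0 with d=1 gives the 2-PL model),
    d: upper asymptote (d=1 with c>0 gives the 3-PL model).
    """
    return c + (d - c) / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical item with a=1, b=0, evaluated at theta=0 (not from the study).
p_2pl = irf(0.0, a=1.0, b=0.0)
p_3pl = irf(0.0, a=1.0, b=0.0, c=0.2)
p_4pl = irf(0.0, a=1.0, b=0.0, c=0.2, d=0.9)
print(p_2pl, p_3pl, p_4pl)
```

At theta equal to the difficulty b, the probability sits halfway between the two asymptotes, which is why raising c or lowering d shifts the midpoint of the curve.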
EORTC QLQ-COMU26: a questionnaire for the assessment of communication between patients and professionals. Phase III of the module development in ten countries.

PubMed

Arraras, Juan Ignacio; Wintner, Lisa M; Sztankay, Monika; Tomaszewski, Krzysztof A; Hofmeister, Dirk; Costantini, Anna; Bredart, Anne; Young, Teresa; Kuljanic, Karin; Tomaszewska, Iwona M; Kontogianni, Meropi; Chie, Wei-Chu; Kulis, Dagmara; Greimel, Eva

2017-05-01

Communication between patients and professionals is one major aspect of the support offered to cancer patients. The European Organisation for Research and Treatment of Cancer (EORTC) Quality of Life Group (QLG) has developed a cancer-specific instrument for the measurement of different issues related to the communication between cancer patients and their health care professionals. Questionnaire development followed the EORTC QLG Module Development Guidelines. A provisional questionnaire was pre-tested (phase III) in a multicenter study within ten countries from five cultural areas (Northern and Southern Europe, UK, Poland and Taiwan). Patients from seven subgroups (before, during and after treatment, for localized and advanced disease each, plus palliative patients) were recruited. Structured interviews were conducted. Qualitative and quantitative analyses were performed. One hundred forty patients were interviewed. Nine items were deleted and one shortened. Patients' comments had a key role in item selection. No item was deleted on quantitative criteria alone. Consistency was observed in patients' answers across cultural areas. The revised version of the module EORTC QLQ-COMU26 has 26 items, organized in 6 scales and 4 individual items. The EORTC COMU26 questionnaire can be used in daily clinical practice and research, in various patient groups from different cultures. The next step will be an international field test with a large heterogeneous group of cancer patients.
To call a cloud 'cirrus': sound symbolism in names for categories or items.

PubMed

Ković, Vanja; Sučević, Jelena; Styles, Suzy J

2017-01-01

The aim of the present paper is to experimentally test whether sound symbolism has selective effects on labels with different ranges-of-reference within a simple noun-hierarchy. In two experiments, adult participants learned the make-up of two categories of unfamiliar objects ('alien life forms'), and were passively exposed to either category-labels or item-labels, in a learning-by-guessing categorization task. Following category training, participants were tested on their visual discrimination of object pairs. For different groups of participants, the labels were either congruent or incongruent with the objects. In Experiment 1, when trained on items with individual labels, participants were worse (made more errors) at detecting visual object mismatches when trained labels were incongruent. In Experiment 2, when participants were trained on items in labelled categories, participants were faster at detecting a match if the trained labels were congruent, and faster at detecting a mismatch if the trained labels were incongruent. This pattern of results suggests that sound symbolism in category labels facilitates later similarity judgments when congruent, and discrimination when incongruent, whereas for item labels incongruence generates error in judgments of visual object differences. These findings reveal that sound symbolic congruence has a different outcome at different levels of labelling within a noun hierarchy. These effects emerged in the absence of the label itself, indicating subtle but pervasive effects on visual object processing.
Measuring assessment standards in undergraduate medical programs: Development and validation of AIM tool.

PubMed

Sajjad, Madiha; Khan, Rehan Ahmed; Yasmeen, Rahila

2018-01-01

To develop a tool to evaluate faculty perceptions of assessment quality in an undergraduate medical program. The Assessment Implementation Measure (AIM) tool was developed by a mixed method approach. A preliminary questionnaire developed through literature review was submitted to a panel of 10 medical education experts for a three-round 'Modified Delphi technique'. Panel agreement of > 75% was considered the criterion for inclusion of items in the questionnaire. Cognitive pre-testing with five faculty members was conducted. A pilot study was done with 30 randomly selected faculty members. Content validity index (CVI) was calculated for individual items (I-CVI) and composite scale (S-CVI). Cronbach's alpha was calculated to determine the internal consistency reliability of the tool. The final AIM tool had 30 items after the Delphi process. S-CVI was 0.98 with the S-CVI/Avg method and 0.86 by S-CVI/UA method, suggesting good content validity. Cut-off value of < 0.9 I-CVI was taken as criterion for item deletion. Cognitive pre-testing revealed good item interpretation. Cronbach's alpha calculated for the AIM was 0.9, whereas Cronbach's alpha for the four domains ranged from 0.67 to 0.80. 'AIM' is a relevant and useful instrument with good content validity and reliability of results, and may be used to evaluate the teachers' perceptions about assessment quality.
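The record above reports item-level and scale-level content validity indices (I-CVI, S-CVI/Avg, S-CVI/UA). The sketch below illustrates the common definitions, assuming experts rate relevance on a 4-point scale and that ratings of 3 or 4 count as "relevant"; the rating scale, function names, and data are assumptions for illustration and are not taken from the study.

```python
def i_cvi(ratings, relevant=(3, 4)):
    """Item-level CVI: share of experts who rate the item as relevant."""
    return sum(r in relevant for r in ratings) / len(ratings)


def s_cvi(items):
    """Return (S-CVI/Avg, S-CVI/UA) for a list of per-item rating lists.

    S-CVI/Avg is the mean of the I-CVIs; S-CVI/UA is the share of items
    on which every expert agreed the item is relevant (I-CVI of 1).
    """
    cvis = [i_cvi(r) for r in items]
    average = sum(cvis) / len(cvis)
    universal = sum(c == 1.0 for c in cvis) / len(cvis)
    return average, universal


# Hypothetical ratings: 3 items x 4 experts (not from the study).
panel = [
    [4, 4, 3, 4],  # all experts rate as relevant -> I-CVI = 1.0
    [4, 3, 2, 4],  # one expert rates 2           -> I-CVI = 0.75
    [3, 3, 4, 4],  # all experts rate as relevant -> I-CVI = 1.0
]
avg, ua = s_cvi(panel)
print(round(avg, 3), round(ua, 3))
```

Because S-CVI/UA requires unanimous agreement, it is never higher than S-CVI/Avg, which matches the direction of the two values reported in the abstract.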
A Process for Reviewing and Evaluating Generated Test Items

ERIC Educational Resources Information Center

Gierl, Mark J.; Lai, Hollis

2016-01-01

Testing organizations need large numbers of high-quality items due to the proliferation of alternative test administration methods and modern test designs. But the current demand for items far exceeds the supply. Test items, as they are currently written, evoke a process that is both time-consuming and expensive because each item is written,…
Aerobic fitness and executive control of relational memory in preadolescent children.

PubMed

Chaddock, Laura; Hillman, Charles H; Buck, Sarah M; Cohen, Neal J

2011-02-01

The neurocognitive benefits of an active lifestyle in childhood have public health and educational implications, especially as children in today's technological society are becoming increasingly overweight, unhealthy, and unfit. Human and animal studies show that aerobic exercise affects both prefrontal executive control and hippocampal function. This investigation attempts to bridge these research threads by using a cognitive task to examine the relationship between aerobic fitness and executive control of relational memory in preadolescent 9- and 10-yr-old children. Higher-fit and lower-fit children studied faces and houses under individual item (i.e., nonrelational) and relational encoding conditions, and the children were subsequently tested with recognition memory trials consisting of previously studied pairs and pairs of completely new items. With each subject participating in both item and relational encoding conditions, and with recognition test trials amenable to the use of both item and relational memory cues, this task afforded a challenge to the flexible use of memory, specifically in the use of appropriate encoding and retrieval strategies. Hence, the task provided a test of both executive control and memory processes. Lower-fit children showed poorer recognition memory performance than higher-fit children, selectively in the relational encoding condition. No association between aerobic fitness and recognition performance was found for faces and houses studied as individual items (i.e., nonrelationally). The findings implicate childhood aerobic fitness as a factor in the ability to use effective encoding and retrieval executive control processes for relational memory material and, possibly, in the strategic engagement of prefrontal- and hippocampal-dependent systems.
Cross-Culture Validation of the HIV/AIDS Stress Scale: The Development of a Revised Chinese Version.

PubMed

Niu, Lu; Qiu, Yangyang; Luo, Dan; Chen, Xi; Wang, Min; Pakenham, Kenneth I; Zhang, Xixing; Huang, Zhulin; Xiao, Shuiyuan

2016-01-01

Being HIV-infected is a stressful experience for many individuals. A measure with satisfactory psychometric properties for assessing HIV-related stress in the Chinese context has yet to be developed. This study aimed to examine the psychometric characteristics of a simplified Chinese version of the HIV/AIDS Stress Scale (SS-HIV) among people living with HIV/AIDS in central China. A total of 667 people living with HIV (92% were male) were recruited from March 1, 2014 to August 31, 2015 by consecutive sampling. A standard questionnaire package containing the Chinese HIV/AIDS Stress Scale (CSS-HIV), the Chinese Patient Health Questionnaire-9 (PHQ-9), and the Chinese Generalized Anxiety Disorder Scale (GAD-7) was administered to all participants, and 38 of the participants were selected randomly to be re-tested four weeks after the initial testing. Our data supported that a revised 17-item CSS-HIV had adequate psychometric properties. It consisted of 3 factors: emotional stress (6 items), social stress (6 items) and instrumental stress (5 items). The overall Cronbach's α was 0.906, and the test-retest reliability coefficient was 0.832. The revised CSS-HIV was significantly correlated with the number of HIV-related symptoms, as well as scores on the PHQ-9 and GAD-7, indicating acceptable concurrent validity. The 17-item Chinese version of the SS-HIV has potential research and clinical utility in identifying important stressors among the Chinese HIV-infected population and in understanding the effects of stress on adjustment to HIV.
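The Cronbach's α reported above is an internal-consistency coefficient. The following is a minimal sketch of how such a coefficient is commonly computed from a respondents-by-items score matrix, using the standard formula α = k/(k-1) · (1 - Σ item variances / variance of total scores). The function name and the toy data are illustrative assumptions and are not taken from the study.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    Hypothetical helper for illustration; assumes k >= 2 and that the
    total scores are not all identical (otherwise the variance is zero).
    """
    k = len(scores[0])
    # Sample variance of each item (column) across respondents.
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Toy data (made up): two items that rank respondents identically.
toy_scores = [[1, 1], [2, 2], [3, 3]]
alpha = cronbach_alpha(toy_scores)
```

With perfectly consistent items, as in the toy data, the coefficient reaches its maximum of 1; real scales such as the one described above fall below that.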
What's in a Topic? Exploring the Interaction between Test-Taker Age and Item Content in High-Stakes Testing

ERIC Educational Resources Information Center

Banerjee, Jayanti; Papageorgiou, Spiros

2016-01-01

The research reported in this article investigates differential item functioning (DIF) in a listening comprehension test. The study explores the relationship between test-taker age and the items' language domains across multiple test forms. The data comprise test-taker responses (N = 2,861) to a total of 133 unique items, 46 items of which were…
Selecting Soldiers and Civilians into the U.S. Army Officer Candidate School: Developing Empirical Selection Composites

DTIC Science & Technology

2014-07-01

a biographical instrument measuring personality; (b) a Work Values instrument representing work preferences investigated in prior officer and...items used in SelectOCS Phase 2 (see Table 2.5). TAPAS uses multidimensional pairwise preference (MDPP) personality items scored using item response...presented respondents with a list of 30 traits and 30 skills (derived from leadership and personality literature) and instructed them to rate the
The impact of experimental measurement errors on long-term viscoelastic predictions. [of structural materials]

NASA Technical Reports Server (NTRS)

Tuttle, M. E.; Brinson, H. F.

1986-01-01

The impact of slight errors in measured viscoelastic parameters on subsequent long-term viscoelastic predictions is numerically evaluated using the Schapery nonlinear viscoelastic model. Of the seven Schapery parameters, the results indicated that long-term predictions were most sensitive to errors in the power law parameter n. Although errors in the other parameters were significant as well, errors in n dominated all other factors at long times. The process of selecting an appropriate short-term test cycle so as to ensure an accurate long-term prediction was considered, and a short-term test cycle was selected using material properties typical for T300/5208 graphite-epoxy at 149 C. The process of selection is described, and its individual steps are itemized.
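Why an error in the exponent n dominates at long times can be illustrated with a single power-law term of the form D1·t^n. This is a simplified stand-in chosen for illustration, not the full seven-parameter Schapery model used in the report, and the numerical values below are hypothetical.

```python
def power_law_term(t, d1, n):
    """Transient power-law term d1 * t**n (simplified, illustrative form)."""
    return d1 * t ** n


def relative_error_from_n(t, n_true, n_measured):
    """Relative error in the power-law term caused by a mis-measured exponent.

    The coefficient d1 cancels, leaving t**(n_measured - n_true) - 1,
    which grows with t whenever the exponent is overestimated.
    """
    return t ** (n_measured - n_true) - 1.0


# Hypothetical values: a 0.01 overestimate in n, evaluated at two times.
short_term = relative_error_from_n(1e2, 0.20, 0.21)
long_term = relative_error_from_n(1e6, 0.20, 0.21)
```

Under these assumed values the same small error in n produces a noticeably larger relative error at the longer time, which is consistent with the qualitative finding described above.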
Applying Item Response Theory methods to design a learning progression-based science assessment

NASA Astrophysics Data System (ADS)

Chen, Jing

Learning progressions are used to describe how students' understanding of a topic progresses over time and to classify the progress of students into steps or levels. This study applies Item Response Theory (IRT) based methods to investigate how to design learning progression-based science assessments. The research questions of this study are: (1) how to use items in different formats to classify students into levels on the learning progression, (2) how to design a test to give good information about students' progress through the learning progression of a particular construct and (3) what characteristics of test items support their use for assessing students' levels. Data used for this study were collected from 1500 elementary and secondary school students during 2009-2010. The written assessment was developed in several formats such as the Constructed Response (CR) items, Ordered Multiple Choice (OMC) and Multiple True or False (MTF) items. The following are the main findings from this study. The OMC, MTF and CR items might measure different components of the construct. A single construct explained most of the variance in students' performances. However, additional dimensions in terms of item format can explain a certain amount of the variance in student performance. So additional dimensions need to be considered when we want to capture the differences in students' performances on different types of items targeting the understanding of the same underlying progression. Items in each item format need to be improved in certain ways to classify students more accurately into the learning progression levels. This study establishes some general steps that can be followed to design other learning progression-based tests as well. For example, first, the boundaries between levels on the IRT scale can be defined by using the means of the item thresholds across a set of good items. 
Second, items in multiple formats can be selected to achieve the information criterion at all the defined boundaries. This ensures the accuracy of the classification. Third, when item threshold parameters vary a bit, the scoring rubrics and the items need to be reviewed to make the threshold parameters similar across items. This is because one important design criterion of the learning progression-based items is that ideally, a student should be at the same level across items, which means that the item threshold parameters (d1, d2 and d3) should be similar across items. To design a learning progression-based science assessment, we need to understand whether the assessment measures a single construct or several constructs and how items are associated with the constructs being measured. Results from dimension analyses indicate that items of different carbon transforming processes measure different aspects of the carbon cycle construct. However, items of different practices assess the same construct. In general, there are high correlations among different processes or practices. It is not clear whether the strong correlations are due to the inherent links among these process/practice dimensions or due to the fact that the student sample does not show much variation in these process/practice dimensions. Future data are needed to examine the dimensionalities in terms of process/practice in detail. Finally, based on item characteristics analysis, recommendations are made to write more discriminative CR items and better OMC and MTF options. Item writers can follow these recommendations to write better learning progression-based items.
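The first design step described above, defining level boundaries as the means of item thresholds, can be sketched as follows. This is a minimal illustration under stated assumptions: each item is assumed to have three thresholds (d1, d2, d3) on a common IRT scale, the threshold values are made up, and the function names and the level numbering (1 to 4) are hypothetical rather than taken from the study.

```python
from statistics import mean


def level_boundaries(item_thresholds):
    """Average each threshold (d1, d2, d3) across items to get level boundaries."""
    return [mean(column) for column in zip(*item_thresholds)]


def classify(theta, boundaries):
    """Assign a level (1 = lowest) by counting the boundaries at or below theta."""
    return 1 + sum(theta >= b for b in boundaries)


# Made-up thresholds for three items on a logit scale.
thresholds = [(-1.0, 0.0, 1.0), (-0.8, 0.2, 1.2), (-1.2, -0.2, 0.8)]
boundaries = level_boundaries(thresholds)
student_level = classify(0.5, boundaries)
```

When thresholds are similar across items, as the abstract recommends, the averaged boundaries sit close to every item's own thresholds, so a student is placed at the same level regardless of which item is considered.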
The Swiss Health Literacy Survey: development and psychometric properties of a multidimensional instrument to assess competencies for health

PubMed Central

Wang, Jen; Thombs, Brett D.; Schmid, Margareta R.

2012-01-01

Background: Growing recognition of the role of citizens and patients in health and health care has placed a spotlight on health literacy and patient education. Objective: To identify specific competencies for health in definitions of health literacy and patient‐centred concepts and empirically test their dimensionality in the general population. Methods: A thorough review of the literature on health literacy, self‐management, patient empowerment, patient education and shared decision making revealed considerable conceptual overlap as competencies for health and identified a corpus of 30 generic competencies for health. A questionnaire containing 127 items covering the 30 competencies was fielded as a telephone interview in German, French and Italian among 1255 respondents randomly selected from the resident population in Switzerland. Findings: Analyses with the software MPlus to model items with mixed response categories showed that the items do not load onto a single factor. Multifactorial models with good fit could be erected for each of five dimensions defined a priori and their corresponding competencies: information and knowledge (four competencies, 17 items), general cognitive skills (four competencies, 17 items), social roles (two competencies, seven items), medical management (four competencies, 27 items) and healthy lifestyle (two competencies, six items). Multiple indicators and multiple causes models identified problematic differential item functioning for only six items belonging to two competencies. Conclusions: The psychometric analyses of this instrument support broader conceptualization of health literacy not as a single competence but rather as a package of competencies for health. PMID:22390287
77 FR 51580 - Advisory Committee on Reactor Safeguards; Notice of Meeting

Federal Register 2010, 2011, 2012, 2013, 2014

2012-08-24

... Pike, Rockville, Maryland. Thursday, September 6, 2012, Conference Room T2-B1, 11545 Rockville Pike... p.m.-3:15 p.m.: Selected Chapters of the Safety Evaluation Reports (SERs) with Open Items Associated... and Peach Bottom,'' and (2) NUREG/CR-7040, ``Evaluation of JNES Equipment Fragility Tests for Use in...
The Influence of Distractor Strength and Response Order on MCQ Responding

ERIC Educational Resources Information Center

Kiat, John Emmanuel; Ong, Ai Rene; Ganesan, Asha

2018-01-01

Multiple-choice questions (MCQs) play a key role in standardised testing and in-class assessment. Research into the influence of within-item response order on MCQ characteristics has been mixed. While some researchers have shown preferential selection of response options presented earlier in the answer list, others have failed to replicate these…
Defining and Comparing the Reading Comprehension Construct: A Cognitive-Psychometric Modeling Approach

ERIC Educational Resources Information Center

Svetina, Dubravka; Gorin, Joanna S.; Tatsuoka, Kikumi K.

2011-01-01

As a construct definition, the current study develops a cognitive model describing the knowledge, skills, and abilities measured by critical reading test items on a high-stakes assessment used for selection decisions in the United States. Additionally, in order to establish generalizability of the construct meaning to other similarly structured…
Growing Gardens, Growing Minds

ERIC Educational Resources Information Center

Hebert, Terri; Martin, Deb; Slattery, Tracy

2014-01-01

The authors present a program where students and family members were involved in a taste-testing to select the items to be planted in the school's garden at Stephenson Elementary. A simple rubric of facial recognition is used. Smiles for the favorites; frowns for the disqualifiers. With the help of the school's leadership team consisting…
Item validity vs. item discrimination index: a redundancy?

NASA Astrophysics Data System (ADS)

Panjaitan, R. L.; Irawati, R.; Sujana, A.; Hanifah, N.; Djuanda, D.

2018-03-01

In the literature on evaluation and test analysis, it is common to find calculations of both item validity and the item discrimination index (D), each with a different formula. Meanwhile, other sources state that the item discrimination index can be obtained by calculating the correlation between the testee’s score on a particular item and the testee’s score on the overall test, which is actually the same concept as item validity. Some research reports, especially undergraduate theses, tend to include both item validity and item discrimination index in the instrument analysis. It seems that these concepts might overlap, for both reflect the quality of the test in measuring the examinees’ ability. In this paper, examples of results of data processing on item validity and item discrimination index are compared. The paper discusses whether item validity and item discrimination index can be represented by only one of them, or whether it is better to present both calculations for simple test analysis, especially in undergraduate theses where test analyses are included.
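The two quantities contrasted above can be computed side by side. The sketch below is illustrative only: it assumes dichotomous (0/1) item scores, uses the common convention of comparing the upper and lower 27% of examinees for D, and treats item validity as the item-total (point-biserial) correlation. The function names and the data are hypothetical and are not taken from the paper.

```python
from statistics import mean, pstdev


def discrimination_index(item, total, frac=0.27):
    """D = proportion correct in the upper group minus that in the lower group.

    Groups are the top and bottom `frac` of examinees by total score
    (27% is a common convention, assumed here).
    """
    order = sorted(range(len(total)), key=lambda i: total[i])
    size = max(1, round(frac * len(total)))
    lower, upper = order[:size], order[-size:]
    return mean(item[i] for i in upper) - mean(item[i] for i in lower)


def item_total_correlation(item, total):
    """Pearson correlation between a 0/1 item score and the total test score."""
    mean_item, mean_total = mean(item), mean(total)
    cov = mean((x - mean_item) * (y - mean_total) for x, y in zip(item, total))
    return cov / (pstdev(item) * pstdev(total))


# Made-up responses of ten examinees to one item, with their total scores.
item_scores = [0, 0, 0, 1, 1, 1, 0, 1, 1, 1]
total_scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
d_value = discrimination_index(item_scores, total_scores)
r_value = item_total_correlation(item_scores, total_scores)
```

Both statistics are high when stronger examinees answer the item correctly more often than weaker ones, which is the overlap the paper examines; they differ in that D uses only the extreme groups while the correlation uses every examinee.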
Dimensionality and predictive validity of the HAM-Nat, a test of natural sciences for medical school admission

PubMed Central

2011-01-01

Background: Knowledge in natural sciences generally predicts study performance in the first two years of the medical curriculum. In order to reduce delay and dropout in the preclinical years, Hamburg Medical School decided to develop a natural science test (HAM-Nat) for student selection. In the present study, two different approaches to scale construction are presented: a unidimensional scale and a scale composed of three subject specific dimensions. Their psychometric properties and relations to academic success are compared. Methods: 334 first year medical students of the 2006 cohort responded to 52 multiple choice items from biology, physics, and chemistry. For the construction of scales we generated two random subsamples, one for development and one for validation. In the development sample, unidimensional item sets were extracted from the item pool by means of weighted least squares (WLS) factor analysis, and subsequently fitted to the Rasch model. In the validation sample, the scales were subjected to confirmatory factor analysis and, again, Rasch modelling. The outcome measure was academic success after two years. Results: Although the correlational structure within the item set is weak, a unidimensional scale could be fitted to the Rasch model. However, psychometric properties of this scale deteriorated in the validation sample. A model with three highly correlated subject specific factors performed better. All summary scales predicted academic success with an odds ratio of about 2.0. Prediction was independent of high school grades and there was a slight tendency for prediction to be better in females than in males. Conclusions: A model separating biology, physics, and chemistry into different Rasch scales seems to be more suitable for item bank development than a unidimensional model, even when these scales are highly correlated and enter into a global score. 
When such a combination scale is used to select the upper quartile of applicants, the proportion of successful completion of the curriculum after two years is expected to rise substantially. PMID:21999767

Dimensionality and predictive validity of the HAM-Nat, a test of natural sciences for medical school admission.

PubMed

Hissbach, Johanna C; Klusmann, Dietrich; Hampe, Wolfgang

2011-10-14

Knowledge in natural sciences generally predicts study performance in the first two years of the medical curriculum. In order to reduce delay and dropout in the preclinical years, Hamburg Medical School decided to develop a natural science test (HAM-Nat) for student selection. In the present study, two different approaches to scale construction are presented: a unidimensional scale and a scale composed of three subject specific dimensions. Their psychometric properties and relations to academic success are compared. 334 first year medical students of the 2006 cohort responded to 52 multiple choice items from biology, physics, and chemistry. For the construction of scales we generated two random subsamples, one for development and one for validation. In the development sample, unidimensional item sets were extracted from the item pool by means of weighted least squares (WLS) factor analysis, and subsequently fitted to the Rasch model. In the validation sample, the scales were subjected to confirmatory factor analysis and, again, Rasch modelling. The outcome measure was academic success after two years. Although the correlational structure within the item set is weak, a unidimensional scale could be fitted to the Rasch model. However, psychometric properties of this scale deteriorated in the validation sample. A model with three highly correlated subject specific factors performed better. All summary scales predicted academic success with an odds ratio of about 2.0. Prediction was independent of high school grades and there was a slight tendency for prediction to be better in females than in males. A model separating biology, physics, and chemistry into different Rasch scales seems to be more suitable for item bank development than a unidimensional model, even when these scales are highly correlated and enter into a global score. 
When such a combination scale is used to select the upper quartile of applicants, the proportion of successful completion of the curriculum after two years is expected to rise substantially.
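For readers unfamiliar with the Rasch model mentioned above, the following minimal sketch shows its defining response function: the probability of a correct answer depends only on the difference between person ability and item difficulty. The function names and the example values are illustrative assumptions and do not reproduce the study's estimation procedure.

```python
import math


def rasch_probability(theta, b):
    """Probability of a correct response for ability theta and item difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def rasch_information(theta, b):
    """Item information under the Rasch model: p * (1 - p)."""
    p = rasch_probability(theta, b)
    return p * (1.0 - p)


# Hypothetical example: an examinee whose ability equals the item difficulty.
p_match = rasch_probability(0.0, 0.0)
```

An item is most informative for examinees whose ability is close to its difficulty, which is one reason item banks aim to cover the ability range of the applicants being selected.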
Development of a PROMIS item bank to measure pain interference.

PubMed

Amtmann, Dagmar; Cook, Karon F; Jensen, Mark P; Chen, Wen-Hung; Choi, Seung; Revicki, Dennis; Cella, David; Rothrock, Nan; Keefe, Francis; Callahan, Leigh; Lai, Jin-Shei

2010-07-01

This paper describes the psychometric properties of the PROMIS-pain interference (PROMIS-PI) bank. An initial candidate item pool (n=644) was developed and evaluated based on the review of existing instruments, interviews with patients, and consultation with pain experts. From this pool, a candidate item bank of 56 items was selected and responses to the items were collected from large community and clinical samples. A total of 14,848 participants responded to all or a subset of candidate items. The responses were calibrated using an item response theory (IRT) model. A final 41-item bank was evaluated with respect to IRT assumptions, model fit, differential item functioning (DIF), precision, and construct and concurrent validity. Items of the revised bank had good fit to the IRT model (CFI and NNFI/TLI ranged from 0.974 to 0.997), and the data were strongly unidimensional (e.g., ratio of first and second eigenvalue=35). Nine items exhibited statistically significant DIF. However, adjusting for DIF had little practical impact on score estimates and the items were retained without modifying scoring. Scores provided substantial information across levels of pain; for scores in the T-score range 50-80, the reliability was equivalent to 0.96-0.99. Patterns of correlations with other health outcomes supported the construct validity of the item bank. The scores discriminated among persons with different numbers of chronic conditions, disabling conditions, levels of self-reported health, and pain intensity (p<0.0001). The results indicated that the PROMIS-PI items constitute a psychometrically sound bank. Computerized adaptive testing and short forms are available. Copyright 2010 International Association for the Study of Pain. All rights reserved.
Examining Differential Item Functions of Different Item Ordered Test Forms According to Item Difficulty Levels

ERIC Educational Resources Information Center

Çokluk, Ömay; Gül, Emrah; Dogan-Gül, Çilem

2016-01-01

The study aims to examine whether differential item function is displayed in three different test forms that have item orders of random and sequential versions (easy-to-hard and hard-to-easy), based on Classical Test Theory (CTT) and Item Response Theory (IRT) methods and bearing item difficulty levels in mind. In the correlational research, the…
Assessing child and adolescent pragmatic language competencies: toward evidence-based assessments.

PubMed

Russell, Robert L; Grizzle, Kenneth L

2008-06-01

Using language appropriately and effectively in social contexts requires pragmatic language competencies (PLCs). Increasingly, deficits in PLCs are linked to child and adolescent disorders, including autism spectrum, externalizing, and internalizing disorders. As the role of PLCs expands in diagnosis and treatment of developmental psychopathology, psychologists and educators will need to appraise and select clinical and research PLC instruments for use in assessments and/or studies. To assist in this appraisal, 24 PLC instruments, containing 1,082 items, are assessed by addressing four questions: (1) Can PLC domains targeted by assessment items be reliably identified?, (2) What are the core PLC domains that emerge across the 24 instruments?, (3) Do PLC questionnaires and tests assess similar PLC domains?, and (4) Do the instruments achieve content, structural, diagnostic, and ecological validity? Results indicate that test and questionnaire items can be reliably categorized into PLC domains, that PLC domains featured in questionnaires and tests significantly differ, and that PLC instruments need empirical confirmation of their dimensional structure, content validity across all developmental age bands, and ecological validity. Progress in building a better evidence base for PLC assessments should be a priority in future research.
The revised transliminality scale: reliability and validity data from a Rasch top-down purification procedure.

PubMed

Lange, R; Thalbourne, M A; Houran, J; Storm, L

2000-12-01

The concept of transliminality ("a hypothesized tendency for psychological material to cross thresholds into or out of consciousness") was anticipated by William James (1902/1982), but it was only recently given an empirical definition by Thalbourne in terms of a 29-item Transliminality Scale. This article presents the 17-item Revised Transliminality Scale (or RTS) that corrects age and gender biases, is unidimensional by a Rasch criterion, and has a reliability of .82. The scale defines a probabilistic hierarchy of items that address magical ideation, mystical experience, absorption, hyperaesthesia, manic experience, dream interpretation, and fantasy proneness. These findings validate the suggestions by James and Thalbourne that some mental phenomena share a common underlying dimension with selected sensory experiences (such as being overwhelmed by smells, bright lights, sights, and sounds). Low scores on transliminality remain correlated with "tough mindedness" on the Cattell 16PF test, as well as "self-control" and "rule consciousness," whereas high scores are associated with "abstractedness" and an "openness to change" on that test. An independent validation study confirmed the predictions implied by our definition of transliminality. Implications for test construction are discussed. Copyright 2000 Academic Press.
Empirical Tests of Acceptance Sampling Plans

NASA Technical Reports Server (NTRS)

White, K. Preston, Jr.; Johnson, Kenneth L.

2012-01-01

Acceptance sampling is a quality control procedure applied as an alternative to 100% inspection. A random sample of items is drawn from a lot to determine the fraction of items which have a required quality characteristic. Both the number of items to be inspected and the criterion for determining conformance of the lot to the requirement are given by an appropriate sampling plan with specified risks of Type I and Type II sampling errors. In this paper, we present the results of empirical tests of the accuracy of selected sampling plans reported in the literature. These plans are for measurable quality characteristics which are known to have either binomial, exponential, normal, gamma, Weibull, inverse Gaussian, or Poisson distributions. In the main, results support the accepted wisdom that variables acceptance plans are superior to attributes (binomial) acceptance plans, in the sense that these provide comparable protection against risks at reduced sampling cost. For the Gaussian and Weibull plans, however, there are ranges of the shape parameters for which the required sample sizes are in fact larger than the corresponding attributes plans, dramatically so for instances of large skew. Tests further confirm that the published inverse-Gaussian (IG) plan is flawed, as reported by White and Johnson (2011).
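The attributes (binomial) plans mentioned above can be illustrated with a short calculation. The sketch assumes a single-sampling plan that inspects n items and accepts the lot if at most c are nonconforming. The plan parameters, the quality levels, and the function names are hypothetical and are not taken from the paper.

```python
from math import comb


def accept_probability(n, c, p):
    """Probability of accepting a lot: at most c nonconforming items in a sample of n.

    p is the true fraction nonconforming; the count is modeled as binomial.
    """
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(c + 1))


def plan_risks(n, c, p_good, p_bad):
    """Type I (producer's) and Type II (consumer's) risks of the plan.

    Type I: rejecting a lot at the acceptable level p_good.
    Type II: accepting a lot at the rejectable level p_bad.
    """
    return 1.0 - accept_probability(n, c, p_good), accept_probability(n, c, p_bad)


# Hypothetical plan: inspect 50 items, accept if at most 2 are nonconforming.
alpha_risk, beta_risk = plan_risks(50, 2, p_good=0.01, p_bad=0.10)
```

Comparing plans then amounts to asking which sample size achieves the specified pair of risks, which is the sense in which the abstract compares the sampling cost of variables and attributes plans.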
The Effects of Test Length and Sample Size on Item Parameters in Item Response Theory

ERIC Educational Resources Information Center

Sahin, Alper; Anil, Duygu

2017-01-01

This study investigates the effects of sample size and test length on item-parameter estimation in test development utilizing three unidimensional dichotomous models of item response theory (IRT). For this purpose, a real language test comprised of 50 items was administered to 6,288 students. Data from this test was used to obtain data sets of…
[Perceptions on item disclosure for the Korean medical licensing examination].

PubMed

Yang, Eunbae B

2015-09-01

This study analyzed the perceptions of medical students and faculty regarding disclosure of test items on the Korean medical licensing examination. I conducted a survey of medical students from medical colleges and professional medical schools nationwide. Responses were analyzed from 718 participants as well as 69 faculty members who participated in creating the medical licensing examination item sets. Data were analyzed using descriptive statistics and the chi-square test. It is important to maintain test quality and to keep the test items unavailable to the public. There are also concerns among students that disclosure of test items would prompt increasing difficulty of test items (48.3%). Further, few students found it desirable to disclose test items regardless of any considerations (28.5%). The professors, who had experience in designing the test items, also expressed their opposition to test item disclosure (60.9%). It is desirable not to disclose the test items of the Korean medical licensing examination to the public on the condition that students are provided with a sufficient amount of information regarding the examination. This is so that the exam can appropriately identify candidates with the required qualifications.
Inductive Selectivity in Children's Cross-Classified Concepts

ERIC Educational Resources Information Center

Nguyen, Simone P.

2012-01-01

Cross-classified items pose an interesting challenge to children's induction as these items belong to many different categories, each of which may serve as a basis for a different type of inference. Inductive selectivity is the ability to appropriately make different types of inferences about a single cross-classifiable item based on its different…
A Review of Classical Methods of Item Analysis.

ERIC Educational Resources Information Center

French, Christine L.

Item analysis is a very important consideration in the test development process. It is a statistical procedure to analyze test items that combines methods used to evaluate the important characteristics of test items, such as difficulty, discrimination, and distractibility of the items in a test. This paper reviews some of the classical methods for…
Modeling Item-Position Effects within an IRT Framework

ERIC Educational Resources Information Center

Debeer, Dries; Janssen, Rianne

2013-01-01

Changing the order of items between alternate test forms to prevent copying and to enhance test security is a common practice in achievement testing. However, these changes in item order may affect item and test characteristics. Several procedures have been proposed for studying these item-order effects. The present study explores the use of…
ACER Chemistry Test Item Collection. ACER Chemtic Year 12.

ERIC Educational Resources Information Center

Australian Council for Educational Research, Hawthorn.

The chemistry test item bank contains 225 multiple-choice questions suitable for diagnostic and achievement testing; a three-page teacher's guide; answer key with item facilities; an answer sheet; and a 45-item sample achievement test. Although written for the new grade 12 chemistry course in Victoria, Australia, the items are widely applicable.…
Preschoolers’ Novel Noun Extensions: Shape in Spite of Knowing Better

PubMed Central

Saalbach, Henrik; Schalk, Lennart

2011-01-01

We examined the puzzling research findings that when extending novel nouns, preschoolers rely on shape similarity (rather than categorical relations) while in other task contexts (e.g., property induction) they rely on categorical relations. Taking into account research on children’s word learning, categorization, and inductive inference we assume that preschoolers have both a shape-based and a category-based word extension strategy available and can switch between these two depending on which information is easily available. To this end, we tested preschoolers on two versions of a novel-noun label extension task. First, we paralleled the standard extension task commonly used by previous research. In this case, as expected, preschoolers predominantly selected same-shape items. Second, we supported preschoolers’ retrieval of item-related information from memory by asking them simple questions about each item prior to the label extension task. Here, they switched to a category-based strategy, thus, predominantly selecting same-category items. Finally, we revealed that this shape-to-category shift is specific to the word learning context as we did not find it in a non-lexical classification task. These findings support our assumption that preschoolers’ decisions about word extension change in accordance with the availability of information (from task context or by memory retrieval). We conclude by suggesting that preschoolers’ noun extensions can be conceptualized within the framework of heuristic decision-making. This provides an ecologically plausible processing account with respect to which information is selected and how this information is integrated to act as a guideline for decision-making when novel words have to be generalized. PMID:22073036
Binding of Visual and Spatial Short-Term Memory in Williams Syndrome and Moderate Learning Disability

ERIC Educational Resources Information Center

Jarrold, Christopher; Phillips, Caroline; Baddeley, Alan D

2007-01-01

A main aim of this study was to test the claim that individuals with Williams syndrome have selectively impaired memory for spatial as opposed to visual information. The performance of 16 individuals with Williams syndrome (six males, 10 females; mean age 18y 7mo [SD 7y 6mo], range 9y 1mo-30y 7mo) on tests of short-term memory for item and…
Science Library of Test Items. Volume Nineteen. A Collection of Multiple Choice Test Items Relating Mainly to Geology.

ERIC Educational Resources Information Center

New South Wales Dept. of Education, Sydney (Australia).

As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items are made available to teachers for the construction of unit tests or term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The test items meet syllabus…
Science Library of Test Items. Volume Seventeen. A Collection of Multiple Choice Test Items Relating Mainly to Biology.

ERIC Educational Resources Information Center

New South Wales Dept. of Education, Sydney (Australia).

As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items are made available to teachers for the construction of unit tests or term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The test items meet syllabus…
Science Library of Test Items. Volume Eighteen. A Collection of Multiple Choice Test Items Relating Mainly to Chemistry.

ERIC Educational Resources Information Center

New South Wales Dept. of Education, Sydney (Australia).

As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items are made available to teachers for the construction of unit tests or term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The test items meet syllabus…
[Development of a scale to measure the self concept of cesarean section mothers].

PubMed

Lee, M L; Cho, J H

1990-08-01

Recently, the rate of cesarean section in Korea has been increasing. Several previous studies in other countries on the emotional responses of cesarean section mothers showed that they might experience difficulties in mother-infant interaction due to fatigue, lack of early mother-infant interaction, disappointment, anger, feelings of loss of control, and other factors. Human behavior is said to be determined by one's self concept, and self concept is influenced by both internal and external environmental factors. A scale to measure the self concept of cesarean section mothers was needed in order to identify those who might have difficulties in mother-infant interaction in the future. The purposes of this study were to develop a measuring scale and to test its reliability and validity. The process of this study was as follows. A structured interview was conducted with 50 cesarean section and vaginal delivery mothers to determine their emotional reactions after giving birth. Based on the results of the interviews, a 50-item Likert scale was developed. The self concept of 268 cesarean section and vaginal delivery mothers who were hospitalized at six hospitals in Seoul was measured during the period between Feb. 1 and April 30. After reviewing the discriminating power of each item by means of crosstabulation, ten items were selected for the final scale. The reliability and validity of this ten-item scale were tested by Cronbach's alpha and t-test, using the SPSS/PC+ package. The results of this study and recommendations are as follows. 1. The ten selected items were as follows. I feel pains in my breast. (-) I have a good appetite now. (+) I feel pains in my flank. (-) I feel fine now. (+) My body seems to have returned to its prepregnant state. (+) Thinking of the delivery process, I feel sorry. (-) I want to hold my baby in my arms. (+) I want to keep my own life, even if I became a mother. (-) I want to delegate the care of the baby to my mother/mother in law. (-) I think baby is my alter ego. (+) 2. The reliability of this scale was tested by Cronbach's alpha, and the coefficient of this scale was .8066. 3. The construct validity of this scale was tested by means of the known-groups method. The value of self concept for cesarean section mothers was significantly lower than for vaginal delivery mothers (t = -5.51, df = 266, p = 0.007). (ABSTRACT TRUNCATED AT 400 WORDS)
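The reliability coefficient reported above, Cronbach's alpha, can be illustrated with a minimal sketch. The formula is the standard one, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). Everything else here is an assumption for illustration only: the function names, the 1-to-5 response range, the reverse-scoring of (-) items, and the small response matrix are hypothetical and are not taken from the study.

```python
from statistics import variance


def reverse_score(score, low=1, high=5):
    """Flip a negatively keyed response (assumes a low..high Likert range)."""
    return low + high - score


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondent rows (one score per item)."""
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item (column) across respondents.
    item_variances = [variance(column) for column in zip(*rows)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in rows])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical data: three items keyed (-), (+), (+); four respondents.
keys = ["-", "+", "+"]
raw = [
    [5, 1, 2],
    [4, 2, 2],
    [2, 4, 3],
    [1, 5, 5],
]
# Reverse-score the negatively keyed items before computing alpha.
scored = [
    [reverse_score(s) if key == "-" else s for s, key in zip(row, keys)]
    for row in raw
]
alpha = cronbach_alpha(scored)
print(round(alpha, 4))
```

Reverse-scoring first matters because alpha depends on the items covarying in the same direction; leaving (-) items unflipped would deflate the coefficient.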
Confirming the cognition of rising scores: Fox and Mitchum (2013) predicts violations of measurement invariance in series completion between age-matched cohorts.

PubMed

Fox, Mark C; Mitchum, Ainsley L

2014-01-01

The trend of rising scores on intelligence tests raises important questions about the comparability of variation within and between time periods. Descriptions of the processes that mediate selection of item responses provide meaningful psychological criteria upon which to base such comparisons. In a recent paper, Fox and Mitchum presented and tested a cognitive theory of rising scores on analogical and inductive reasoning tests that is specific enough to make novel predictions about cohort differences in patterns of item responses for tests such as the Raven's Matrices. In this paper we extend the same proposal in two important ways by (1) testing it against a dataset that enables the effects of cohort to be isolated from those of age, and (2) applying it to two other inductive reasoning tests that exhibit large Flynn effects: Letter Series and Word Series. Following specification and testing of a confirmatory item response model, predicted violations of measurement invariance are observed between two age-matched cohorts that are separated by only 20 years, as members of the later cohort are found to map objects at higher levels of abstraction than members of the earlier cohort who possess the same overall level of ability. Results have implications for the Flynn effect and cognitive aging while underscoring the value of establishing psychological criteria for equating members of distinct groups who achieve the same scores.
Consensus on measurement properties and feasibility of performance tests for the exercise and sport sciences: a Delphi study.

PubMed

Robertson, Sam; Kremer, Peter; Aisbett, Brad; Tran, Jacqueline; Cerin, Ester

2017-12-01

Performance tests are used for multiple purposes in exercise and sport science. Ensuring that a test displays an appropriate level of measurement properties for use within a population is important to ensure confidence in test findings. The aim of this study was to obtain subject matter expert consensus on the measurement and feasibility properties that should be considered for performance tests used in the exercise and sport sciences and how these should be defined. This information was used to develop a checklist for broader dissemination. A two-round Delphi study was undertaken including 33 exercise scientists, academics and sport scientists. Participants were asked to rate the importance of a range of measurement properties relevant to performance tests in exercise and sport science. Responses were obtained in binary and Likert-scale formats, with consensus defined as achieving 67% agreement on each question. Consensus was reached on definitions and terminology for all items. Ten level 1 items (those that achieved consensus on all four questions) and nine level 2 items (those achieving consensus on ≥2 questions) were included. Both levels were included in the final checklist. The checklist developed from this study can be used to inform decision-making and test selection for practitioners and researchers in the exercise and sport sciences. This can facilitate knowledge sharing and performance comparisons across sub-disciplines, thereby improving existing field practice and research methodological quality.
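The consensus rule described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: "achieving 67% agreement" is read as a proportion of at least 0.67, items with consensus on fewer than two questions are treated as excluded, and the function names and vote data are hypothetical rather than drawn from the study.

```python
def consensus_reached(votes, threshold=0.67):
    """True if the share of agreeing votes meets the threshold (assumed >=)."""
    return sum(votes) / len(votes) >= threshold


def classify_item(question_votes, threshold=0.67):
    """Classify one checklist item from its per-question vote lists.

    Level 1: consensus on every question (four per item in the abstract).
    Level 2: consensus on at least two questions.
    Otherwise the item is treated as excluded (an assumption).
    """
    reached = sum(consensus_reached(v, threshold) for v in question_votes)
    if reached == len(question_votes):
        return "level 1"
    if reached >= 2:
        return "level 2"
    return "excluded"


# Hypothetical panel of 33 raters: True means the rater agreed.
strong = [True] * 30 + [False] * 3   # about 91% agreement
weak = [True] * 15 + [False] * 18    # about 45% agreement

print(classify_item([strong, strong, strong, strong]))
print(classify_item([strong, strong, weak, weak]))
print(classify_item([strong, weak, weak, weak]))
```

Under this reading, a panel of 33 needs 23 agreeing raters per question, since 22 of 33 is about 66.7% and falls just short of the 67% threshold.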
