Koh, Bongyeun; Hong, Sunggi; Kim, Soon-Sim; Hyun, Jin-Sook; Baek, Milye; Moon, Jundong; Kwon, Hayran; Kim, Gyoungyong; Min, Seonggi; Kang, Gu-Hyun
2016-01-01
Purpose: The goal of this study was to characterize the difficulty index of the items in the skills test components of the class I and II Korean emergency medical technician licensing examination (KEMTLE), which requires examinees to select items randomly. Methods: The results of 1,309 class I KEMTLE examinations and 1,801 class II KEMTLE examinations in 2013 were subjected to analysis. Items from the basic and advanced skills test sections of the KEMTLE were compared to determine whether some were significantly more difficult than others. Results: In the class I KEMTLE, all 4 of the items on the basic skills test showed significant variation in difficulty index (P<0.01), as well as 4 of the 5 items on the advanced skills test (P<0.05). In the class II KEMTLE, 4 of the 5 items on the basic skills test showed significantly different difficulty index (P<0.01), as well as all 3 of the advanced skills test items (P<0.01). Conclusion: In the skills test components of the class I and II KEMTLE, the procedure in which examinees randomly select questions should be revised to require examinees to respond to a set of fixed items in order to improve the reliability of the national licensing examination. PMID:26883810
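The item comparison described in this abstract can be illustrated with a short sketch. The abstract does not say which statistical test the authors used, so the chi-square test of homogeneity below is an assumption, and the item names and counts are hypothetical rather than KEMTLE data. The difficulty index is taken here in its classical sense, the proportion of examinees who pass an item.

```python
from typing import Dict, Tuple


def difficulty_index(passed: int, attempted: int) -> float:
    """Classical difficulty index: proportion of examinees who passed the item."""
    if attempted <= 0:
        raise ValueError("attempted must be positive")
    return passed / attempted


def chi_square_homogeneity(counts: Dict[str, Tuple[int, int]]) -> float:
    """Pearson chi-square statistic for the hypothesis that all items share
    one pass rate. counts maps item name -> (passed, attempted).
    Degrees of freedom: len(counts) - 1.
    """
    total_pass = sum(p for p, _ in counts.values())
    total_n = sum(n for _, n in counts.values())
    pooled = total_pass / total_n  # pass rate if every item were equally hard
    stat = 0.0
    for passed, attempted in counts.values():
        expected_pass = attempted * pooled
        expected_fail = attempted * (1.0 - pooled)
        stat += (passed - expected_pass) ** 2 / expected_pass
        stat += ((attempted - passed) - expected_fail) ** 2 / expected_fail
    return stat


# Hypothetical counts for three randomly drawn skill items.
example_counts = {"item_a": (90, 100), "item_b": (70, 100), "item_c": (80, 100)}
example_stat = chi_square_homogeneity(example_counts)  # about 12.5 with 2 df
```

With 2 degrees of freedom the 1% critical value of the chi-square distribution is about 9.21, so a statistic of 12.5 in this made-up example would indicate that the items differ in difficulty, which is the kind of result that argues against letting examinees draw items at random.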
ERIC Educational Resources Information Center
Baghaei, Purya; Ravand, Hamdollah
2016-01-01
In this study the magnitudes of local dependence generated by cloze test items and reading comprehension items were compared and their impact on parameter estimates and test precision was investigated. An advanced English as a foreign language reading comprehension test containing three reading passages and a cloze test was analyzed with a…
Chan, Raymond Javan; Yates, Patsy; McCarthy, Alexandra L
Fatigue is one of the most distressing and commonly experienced symptoms in patients with advanced cancer. Although the self-management (SM) of cancer-related symptoms has received increasing attention, no research instrument assessing fatigue SM outcomes for patients with advanced cancer is available. The aim of this study was to describe the development and preliminary testing of an interviewer-administered instrument for assessing the frequency and perceived levels of effectiveness and self-efficacy associated with fatigue SM behaviors in patients with advanced cancer. The development and testing of the Self-efficacy in Managing Symptoms Scale-Fatigue Subscale for Patients With Advanced Cancer (SMSFS-A) involved a number of procedures: item generation using a comprehensive literature review and semistructured interviews, content validity evaluation using expert panel reviews, and face validity and test-retest reliability evaluation using pilot testing. Initially, 23 items (22 specific behaviors with 1 global item) were generated from the literature review and semistructured interviews. After 2 rounds of expert panel review, the final scale was reduced to 17 items (16 behaviors with 1 global item). Participants in the pilot test (n = 10) confirmed that the questions in this scale were clear and easy to understand. Bland-Altman analysis showed agreement of results over a 1-week interval. The SMSFS-A items were generated using multiple sources. This tool demonstrated preliminary validity and reliability. The SMSFS-A has the potential to be used for clinical and research purposes. Nurses can use this instrument for collecting data to inform the initiation of appropriate fatigue SM support for this population.
Advanced Marketing Core Curriculum. Test Items and Assessment Techniques.
ERIC Educational Resources Information Center
Smith, Clifton L.; And Others
This document contains duties and tasks, multiple-choice test items, and other assessment techniques for Missouri's advanced marketing core curriculum. The core curriculum begins with a list of 13 suggested textbook resources. Next, nine duties with their associated tasks are given. Under each task appears one or more citations to appropriate…
ERIC Educational Resources Information Center
Bennett, Randy Elliot; And Others
1990-01-01
The relationship of an expert-system-scored constrained free-response item type to multiple-choice and free-response items was studied using data for 614 students on the College Board's Advanced Placement Computer Science (APCS) Examination. Implications for testing and the APCS test are discussed. (SLD)
Construction and Analysis of Educational Tests Using Abductive Machine Learning
ERIC Educational Resources Information Center
El-Alfy, El-Sayed M.; Abdel-Aal, Radwan E.
2008-01-01
Recent advances in educational technologies and the wide-spread use of computers in schools have fueled innovations in test construction and analysis. As the measurement accuracy of a test depends on the quality of the items it includes, item selection procedures play a central role in this process. Mathematical programming and the item response…
ERIC Educational Resources Information Center
Hole, Arne; Grønmo, Liv Sissel; Onstad, Torgeir
2018-01-01
Background: This paper discusses a framework for analyzing the dependence on mathematical theory in test items, that is, a framework for discussing to what extent knowledge of mathematical theory is helpful for the student in solving the item. The framework can be applied to any test in which some knowledge of mathematical theory may be useful,…
Devine, J; Otto, C; Rose, M; Barthel, D; Fischer, F; Mühlan, H; Nolte, S; Schmidt, S; Ottova-Jordan, V; Ravens-Sieberer, U
2015-04-01
Assessing health-related quality of life (HRQoL) via Computerized Adaptive Tests (CAT) provides greater measurement precision coupled with a lower test burden compared to conventional tests. Currently, there are no European pediatric HRQoL CATs available. This manuscript aims to describe the development of a HRQoL CAT for children and adolescents: the Kids-CAT, which was developed based on the established KIDSCREEN-27 HRQoL domain structure. The Kids-CAT was developed combining classical test theory and item response theory methods and using large archival data of European KIDSCREEN norm studies (n = 10,577-19,580). Methods were applied in line with the US PROMIS project. Item bank development included the investigation of unidimensionality, local independence, exploration of Differential Item Functioning (DIF), evaluation of Item Response Curves (IRCs), estimation and norming of item parameters as well as first CAT simulations. The Kids-CAT was successfully built covering five item banks (with 26-46 items each) to measure physical well-being, psychological well-being, parent relations, social support and peers, and school well-being. The Kids-CAT item banks showed excellent psychometric properties: high content validity, unidimensionality, local independence, low DIF, and model-conforming IRCs. In CAT simulations, seven items were needed to achieve a measurement precision between .8 and .9 (reliability). The Kids-CAT has a child-friendly design, is easily accessible online and gives immediate feedback reports of scores. It has the potential to advance pediatric HRQoL measurement by making it less burdensome and enhancing patient-doctor communication.
Development of an instrument to measure self-efficacy in caregivers of people with advanced cancer.
Ugalde, Anna; Krishnasamy, Meinir; Schofield, Penelope
2013-06-01
Informal caregivers of people with advanced cancer experience many negative impacts as a result of their role. There is a lack of suitable measures specifically designed to assess their experience. This study aimed to develop a new measure to assess self-efficacy in caregivers of people with advanced cancer. The development and testing of the new measure consisted of four separate, sequential phases: generation of issues, development of issues into items, pilot testing and field testing. In the generation of issues, 17 caregivers were interviewed to generate data. These data were analysed to generate codes, which were then systematically developed into items to construct the instrument. The instrument was pilot tested with 14 health professionals and five caregivers. It was then administered to a large sample for field testing to establish the psychometric properties, with established measures including the Brief Cope and the Family Appraisals for Caregiving Questionnaire for Palliative Care. Ninety-four caregivers completed the questionnaire booklet to establish the factor structure, reliability and validity. The factor analysis resulted in a 21-item, four-factor instrument, with the subscales being termed Resilience, Self-Maintenance, Emotional Connectivity and Instrumental Caregiving. The test-retest reliability and internal consistency were both excellent, ranging from 0.73 to 0.85 and 0.81 to 0.94, respectively. Six convergent and divergent hypotheses were made, and five were supported. This study has developed a new instrument to assess self-efficacy in caregivers of people with advanced cancer. The result is a four-factor, 21-item instrument with demonstrated reliability and validity. Copyright © 2012 John Wiley & Sons, Ltd.
Fundamentals of Marketing Core Curriculum. Test Items and Assessment Techniques.
ERIC Educational Resources Information Center
Smith, Clifton L.; And Others
This document contains multiple choice test items and assessment techniques for Missouri's fundamentals of marketing core curriculum. The core curriculum is divided into these nine occupational duties: (1) communications in marketing; (2) economics and marketing; (3) employment and advancement; (4) human relations in marketing; (5) marketing…
The development and validation of the advance care planning questionnaire in Malaysia.
Lai, Pauline Siew Mei; Mohd Mudri, Salinah; Chinna, Karuthan; Othman, Sajaratulnisah
2016-10-18
Advance care planning is a voluntary process whereby individual preferences, values and beliefs are used to aid a person in planning for end-of-life care. Currently, there is no local instrument to assess an individual's awareness and attitude towards advance care planning. This study aimed to develop an Advance Care Planning Questionnaire and to determine its validity and reliability among older people in Malaysia. The Advance Care Planning Questionnaire was developed based on literature review. Face and content validity was verified by an expert panel, and the questionnaire was piloted among 15 participants. Our study was conducted from October 2013 to February 2014, at an urban primary care clinic in Malaysia. Included were those aged >50 years, who could understand English. A retest was conducted 2 weeks after the first administration. Participants from the pilot study did not encounter any problems in answering the Advance Care Planning Questionnaire. Hence, no further modifications were made. Flesch reading ease was 71. The final version of the Advance Care Planning Questionnaire consists of 66 items: 30 items were measured on a nominal scale, whilst 36 items were measured on a Likert-like scale; of which we were only able to validate 22 items, as the remaining 14 items were descriptive in nature. A total of 245 eligible participants were approached; of which 230 agreed to participate (response rate = 93.9 %). Factor analysis on the 22 items measured on a Likert-scale revealed four domains: "feelings regarding advance care planning", "justifications for advance care planning", "justifications for not having advance care planning: fate and religion", and "justifications for not having advance care planning: avoid thinking about death". The Cronbach's alpha values for items in each domain ranged from 0.637 to 0.915. In test-retest, kappa values ranged from 0.738 to 0.947. The final Advance Care Planning Questionnaire consisted of 63 items and 4 domains. It was found to be a valid and reliable instrument to assess the awareness and attitude of older people in Malaysia towards advance care planning.
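For readers unfamiliar with the internal-consistency statistic reported in this abstract, the following is a minimal sketch of the standard Cronbach's alpha formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The function name and the small score matrix are hypothetical illustrations, not data from the study.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(scores: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items score matrix.

    scores[r][i] is respondent r's score on item i.
    """
    if len(scores) < 2 or len(scores[0]) < 2:
        raise ValueError("need at least 2 respondents and 2 items")
    n_items = len(scores[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in scores]) for i in range(n_items)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in scores])
    return n_items / (n_items - 1) * (1.0 - sum(item_vars) / total_var)


# Hypothetical Likert-type responses: 4 respondents, 3 items in one domain.
example_scores = [[2, 3, 3], [4, 4, 5], [1, 2, 2], [5, 5, 4]]
example_alpha = cronbach_alpha(example_scores)  # about 0.944
```

Values closer to 1 mean the items in a domain vary together; the range of 0.637 to 0.915 quoted in the abstract would be obtained by applying such a calculation to each of the four domains separately.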
ERIC Educational Resources Information Center
Vigneau, Francois; Bors, Douglas A.
2008-01-01
Various taxonomies of Raven's Advanced Progressive Matrices (APM) items have been proposed in the literature to account for performance on the test. In the present article, three such taxonomies based on information processing, namely Carpenter, Just and Shell's [Carpenter, P.A., Just, M.A., & Shell, P., (1990). What one intelligence test…
Understanding the Equals Sign as a Gateway to Algebraic Thinking
ERIC Educational Resources Information Center
Matthews, Percival G.; Rittle-Johnson, Bethany; Taylor, Roger S.; McEldoon, Katherine L.
2010-01-01
In this study, the authors wanted to examine whether success on items testing basic equivalence knowledge, such as the meaning of the equal sign and ability to solve problems such as 3 + 5 = 4 + _, predicted success on items testing more advanced algebraic thinking (i.e. principles of equality and solving equations that use letter variables). This…
ERIC Educational Resources Information Center
Domyancich, John M.
2014-01-01
Multiple-choice questions are an important part of large-scale summative assessments, such as the advanced placement (AP) chemistry exam. However, past AP chemistry exam items often lacked the ability to test conceptual understanding and higher-order cognitive skills. The redesigned AP chemistry exam shows a distinctive shift in item types toward…
Measurement in Cross-Cultural Neuropsychology
Pedraza, Otto; Mungas, Dan
2010-01-01
The measurement of cognitive abilities across diverse cultural, racial, and ethnic groups has a contentious history, with broad political, legal, economic, and ethical repercussions. Advances in psychometric methods and converging scientific ideas about genetic variation afford new tools and theoretical contexts to move beyond the reflective analysis of between-group test score discrepancies. Neuropsychology is poised to benefit from these advances to cultivate a richer understanding of the factors that underlie cognitive test score disparities. To this end, the present article considers several topics relevant to the measurement of cognitive abilities across groups from diverse ancestral origins, including fairness and bias, equivalence, diagnostic validity, item response theory, and differential item functioning. PMID:18814034
Specification for Qualification and Certification for Level II - Advanced Welders.
ERIC Educational Resources Information Center
American Welding Society, Miami, FL.
This document defines the requirements and program for the American Welding Society (AWS) to certify advanced-level welders through an evaluation process entailing performance qualification and practical knowledge tests requiring the use of advanced reading, computational, and manual skills. The following items are included: statement of the…
Jelínek, Martin; Květon, Petr; Vobořil, Dalibor
2015-02-01
Despite initial expectations, which have emerged with the advancement of computer technology over the last decade of the twentieth century, scientific literature does not contain many relevant references regarding the development and use of innovative items in psychological testing. Our study presents and evaluates two novel item types. One item type is derived from a standard schematic test item used for the assessment of the spatial perception aspect of spatial ability, enhanced by an interactive response module. The performance on this item type is correlated with the performance on its paper and pencil counterpart. The other innovative item type used complex stimuli in the form of a short video of a ride through a city presented in an on-route perspective, which is intended to measure navigation skills and the ability to keep oneself oriented in space. In this case, the scores were related to the capacity of visuo-spatial working memory and also to the overall score in the paper/pencil test of spatial ability. The second relationship was moderated by gender.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Santi, Peter Angelo; Cutler, Theresa Elizabeth; Favalli, Andrea
In order to improve the accuracy and capabilities of neutron multiplicity counting, additional quantifiable information is needed in order to address the assumptions that are present in the point model. Extracting and utilizing higher order moments (Quads and Pents) from the neutron pulse train represents the most direct way of extracting additional information from the measurement data to allow for an improved determination of the physical properties of the item of interest. The extraction of higher order moments from a neutron pulse train required the development of advanced dead time correction algorithms which could correct for dead time effects in all of the measurement moments in a self-consistent manner. In addition, advanced analysis algorithms have been developed to address specific assumptions that are made within the current analysis model, namely that all neutrons are created at a single point within the item of interest, and that all neutrons that are produced within an item are created with the same energy distribution. This report will discuss the current status of implementation and initial testing of the advanced dead time correction and analysis algorithms that have been developed in an attempt to utilize higher order moments to improve the capabilities of correlated neutron measurement techniques.
Ability evaluation by binary tests: Problems, challenges & recent advances
NASA Astrophysics Data System (ADS)
Bashkansky, E.; Turetsky, V.
2016-11-01
Binary tests designed to measure abilities of objects under test (OUTs) are widely used in different fields of measurement theory and practice. The number of test items in such tests is usually very limited. The response to each test item provides only one bit of information per OUT. The problem of correct ability assessment is even more complicated when the levels of difficulty of the test items are unknown beforehand. This fact makes the search for effective ways of planning and processing the results of such tests highly relevant. In recent years, there has been some progress in this direction, generated by both the development of computational tools and the emergence of new ideas. The latter are associated with the use of so-called "scale invariant item response models". Together with the maximum likelihood estimation (MLE) approach, they helped to solve some problems of engineering and proficiency testing. However, several issues related to the assessment of uncertainties, replications scheduling, the use of placebo, as well as evaluation of multidimensional abilities still present a challenge for researchers. The authors attempt to outline the ways to solve the above problems.
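The abstract mentions item response models and maximum likelihood estimation without giving formulas. As a rough illustration only, the sketch below substitutes the standard Rasch (one-parameter logistic) model, which is not necessarily the authors' scale-invariant model, and assumes the simpler case in which item difficulties are already known. All names and numbers are hypothetical.

```python
import math
from typing import Sequence


def rasch_prob(theta: float, difficulty: float) -> float:
    """Rasch model: probability of a positive response given ability theta."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def mle_ability(responses: Sequence[int], difficulties: Sequence[float],
                max_iter: int = 50, tol: float = 1e-10) -> float:
    """Maximum likelihood ability estimate via Newton-Raphson.

    responses are 0/1 outcomes, one per item; difficulties are assumed known.
    """
    if len(responses) != len(difficulties):
        raise ValueError("responses and difficulties must have equal length")
    score = sum(responses)
    if score == 0 or score == len(responses):
        # The likelihood has no finite maximum for all-0 or all-1 patterns.
        raise ValueError("MLE is not finite for all-0 or all-1 responses")
    theta = 0.0
    for _ in range(max_iter):
        probs = [rasch_prob(theta, b) for b in difficulties]
        gradient = score - sum(probs)                   # d logL / d theta
        information = sum(p * (1.0 - p) for p in probs)  # Fisher information
        step = gradient / information
        theta += step
        if abs(step) < tol:
            break
    return theta


# Hypothetical three-item binary test with known difficulties.
example_theta = mle_ability([1, 1, 0], [-1.0, 0.0, 1.0])
```

The check for all-0 or all-1 patterns reflects the point made in the abstract that each response carries only one bit of information: with very few items, many response patterns give no finite estimate at all, which is one reason uncertainty assessment remains difficult.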
Above-Level Test Item Functioning across Examinee Age Groups
ERIC Educational Resources Information Center
Warne, Russell T.; Doty, Kristine J.; Malbica, Anne Marie; Angeles, Victor R.; Innes, Scott; Hall, Jared; Masterson-Nixon, Kelli
2016-01-01
"Above-level testing" (also called "above-grade testing," "out-of-level testing," and "off-level testing") is the practice of administering to a child a test that is designed for an examinee population that is older or in a more advanced grade. Above-level testing is frequently used to help educators design…
Bodenburg, Sebastian; Dopslaff, Nina
2008-01-01
The Dysexecutive Questionnaire (DEX; Behavioral assessment of the dysexecutive syndrome, 1996) is a standardized instrument to measure possible behavioral changes as a result of the dysexecutive syndrome. Although initially intended only as a qualitative instrument, the DEX has also been used increasingly to address quantitative problems. Until now there have not been more fundamental statistical analyses of the questionnaire's testing quality. The present study is based on an unselected sample of 191 patients with acquired brain injury and reports on the data relating to the quality of the items, the reliability and the factorial structure of the DEX. Item 3 displayed too great an item difficulty, whereas item 11 was not sufficiently discriminating. The DEX's reliability in self-rating is r = 0.85. In addition to presenting the statistical values of the tests, a clinical severity classification of the overall scores of the 4 found factors and of the questionnaire as a whole is carried out on the basis of quartile standards.
Fiori, Marina; Antonietti, Jean-Philippe; Mikolajczak, Moira; Luminet, Olivier; Hansenne, Michel; Rossier, Jérôme
2014-01-01
The ability approach has been indicated as promising for advancing research in emotional intelligence (EI). However, there is a scarcity of tests measuring EI as a form of intelligence. The Mayer Salovey Caruso Emotional Intelligence Test, or MSCEIT, is among the few available and the most widespread measure of EI as an ability. This implies that conclusions about the value of EI as a meaningful construct and about its utility in predicting various outcomes mainly rely on the properties of this test. We tested whether individuals who have the highest probability of choosing the most correct response on any item of the test are also those who have the strongest EI ability. Results showed that this is not the case for most items: The answer indicated by experts as the most correct in several cases was not associated with the highest ability; furthermore, items appeared too easy to challenge individuals high in EI. Overall results suggest that the MSCEIT is best suited to discriminate persons at the low end of the trait. Results are discussed in light of applied and theoretical considerations.
Enhancing a Computer-Based Testing Environment with Optimum Item Response Time
ERIC Educational Resources Information Center
Delen, Erhan
2015-01-01
As technology has become more advanced and accessible in instructional settings, there has been an upward trend in computer-based testing in the last decades. The present experimental study examines students' behaviors during computer-based testing in two different conditions and explores how these conditions affect the test results. Results…
Victorson, David E; Choi, Seung; Judson, Marc A; Cella, David
2014-05-01
Sarcoidosis is a multisystem disease that can negatively impact health-related quality of life (HRQL) across generic (e.g., physical, social and emotional wellbeing) and disease-specific (e.g., pulmonary, ocular, dermatologic) domains. Measurement of HRQL in sarcoidosis has largely relied on generic patient-reported outcome tools, with few disease-specific measures available. The purpose of this paper is to present the development and testing of disease-specific item banks and short forms of lung, skin and eye problems, which are a part of a new patient-reported outcome (PRO) instrument called the sarcoidosis assessment tool. After prioritizing and selecting the most important disease-specific domains, we wrote new items to reflect disease-specific problems by drawing from patient focus group and clinician expert survey data that were used to create our conceptual model of HRQL in sarcoidosis. Item pools underwent cognitive interviews by sarcoidosis patients (n = 13), and minor modifications were made. These items were administered in a multi-site study (n = 300) to obtain item calibrations and create calibrated short forms using item response theory (IRT) approaches. From the available item pools, we created four new item banks and short forms: (1) skin problems, (2) skin stigma, (3) lung problems, and (4) eye problems. We also created and tested supplemental forms of the most common constitutional symptoms and negative effects of corticosteroids. Several new sarcoidosis-specific PROs were developed and tested using IRT approaches. These new measures can advance more precise and targeted HRQL assessment in sarcoidosis clinical trials and clinical practice.
Costantini, Massimo; Rabitti, Elisa; Beccaro, Monica; Fusco, Flavio; Peruselli, Carlo; La Ciura, Pietro; Valle, Alessandro; Suriani, Cinzia; Berardi, Maria Alejandra; Valenti, Danila; Mosso, Felicita; Morino, Piero; Zaninetta, Giovanni; Tubere, Giorgio; Piazza, Massimo; Sofia, Michele; Di Leo, Silvia; Higginson, Irene J
2016-02-26
There is an increasing requirement to assess outcomes, but few measures have been tested for advanced medical illness. We aimed to test the validity, reliability and responsiveness of the Palliative care Outcome Scale (POS), and to analyse predictors of change after the transition to palliative care. Phase 1: multicentre, mixed method study comprising cognitive and qualitative interviews with patients and staff, cultural refinement and adaption. Phase 2: consecutive cancer patients on admission to 8 inpatient hospices and 7 home-based teams were asked to complete the POS, the EORTC QLQ-C15-PAL and the FACIT-Sp (T0), to assess internal consistency, convergent and divergent validity. After 6 days (T1) patients and staff completed the POS to assess responsiveness to change (T1-T0), and agreement between self-assessed POS and POS completed by the staff. Finally, we asked hospices for an assessment 24-48 h after T1 to assess its reliability (test re-test analysis). Phase 1: 209 completed POS questionnaires and 29 cognitive interviews were assessed, revisions made and one item substituted. Phase 2: 295 consecutive patients admitted to 15 PCTs were approached, 175 (59.3 %) were eligible, and 150 (85.7 %) consented. Consent was limited by the severity of illness in 40 % of patients. We found good convergent validity, with strong and moderate correlations (r ranged from 0.5 to 0.8) between similar items from the POS, the QLQ-C15-PAL and the FACIT-Sp. As hypothesised, the physical function subscale of QLQ-C15-PAL was not correlated with any POS item (r ranged from -0.16 to 0.02). We found acceptable to good test re-test reliability in both versions for 6 items. We found significant clinical improvements during the first week of palliative care in 7/10 items assessed: pain, other symptoms, patient and family anxiety, information, feeling at peace and wasted time. Both the patient self-assessed and professional POS versions are valid and have acceptable internal consistency. POS detected significant clinical improvements during palliative care, at a time when patients are usually expected to deteriorate. These results suggest that there is room for substantial improvement in the management of patients with advanced disease, across all key domains: symptoms, psychological, information, social and spiritual.
The Promise of NLP and Speech Processing Technologies in Language Assessment
ERIC Educational Resources Information Center
Chapelle, Carol A.; Chung, Yoo-Ree
2010-01-01
Advances in natural language processing (NLP) and automatic speech recognition and processing technologies offer new opportunities for language testing. Despite their potential uses on a range of language test item types, relatively little work has been done in this area, and it is therefore not well understood by test developers, researchers or…
Rail Impact Testing. Test Operations Procedure (TOP)
2008-09-15
impact test. The rail impact test is used to verify structural integrity of the test item and the adequacy of the tie-down system and tie-down...strength of provisions, connection and supporting structural frame, paragraph 5.2.3 ** Superscript...parts, to include outriggers and booms) without advanced approval by SDDCTEA. Torque nuts on wire rope clips to their correct value. Torque cable
Latimer, Shane; Meade, Tanya; Tennant, Alan
2014-07-30
The purpose of this study was to investigate the application of item banking to questionnaire items intended to measure Deliberate Self-Harm (DSH) behaviours. The Rasch measurement model was used to evaluate behavioural items extracted from seven published DSH scales administered to 568 Australians aged 18-30 years (62% university students, 21% mental health patients, and 17% community members). Ninety-four items were calibrated in the item bank (including 12 items with differential item functioning for gender and age). Tailored scale construction was demonstrated by extracting scales covering different combinations of DSH methods but with the same raw score for each person location on the latent DSH construct. A simulated computer adaptive test (starting with common self-harm methods to minimise presentation of extreme behaviours) demonstrated that 11 items (on average) were needed to achieve a standard error of measurement of 0.387 (corresponding to a Cronbach's alpha of 0.85). This study lays the groundwork for advancing DSH measurement to an item bank approach with the flexibility to measure a specific definitional orientation (e.g., non-suicidal self-injury) or a broad continuum of self-harmful acts, as appropriate to a particular research/clinical purpose. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
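The correspondence between a standard error of measurement of 0.387 and a reliability of 0.85 can be checked with a few lines of arithmetic. The sketch below uses the classical relation reliability = 1 - (SEM / SD)^2 and assumes, as an illustration, that the latent trait is scaled to a standard deviation of 1; the abstract does not state the scaling, and the function names are hypothetical.

```python
import math


def reliability_from_sem(sem: float, trait_sd: float = 1.0) -> float:
    """Reliability implied by a standard error of measurement (SEM).

    trait_sd is the standard deviation of the trait; 1.0 is an assumption.
    """
    return 1.0 - (sem / trait_sd) ** 2


def sem_for_reliability(reliability: float, trait_sd: float = 1.0) -> float:
    """SEM a test must reach to achieve the target reliability."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    return trait_sd * math.sqrt(1.0 - reliability)


implied_reliability = reliability_from_sem(0.387)  # about 0.850
stopping_sem = sem_for_reliability(0.85)           # about 0.387
```

Under the same unit-variance assumption, a computer adaptive test can use such a target SEM as its stopping rule: items are administered until the standard error of the ability estimate falls below the threshold, which in this simulation took 11 items on average.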
NASA Astrophysics Data System (ADS)
Bramwell-Lalor, Sharon; Rainford, Marcia
2014-03-01
This paper reports on teachers' use of concept mapping as an alternative assessment strategy in advanced level biology classes and its effects on students' cognitive skills on selected biology concepts. Using a mixed methods approach, the study employed a pre-test/post-test quasi-experimental design involving 156 students and 8 teachers from intact classes. A researcher-constructed Biology Cognitive Skills Test was used to collect the quantitative data. Qualitative data were collected through interviews and students' personal documents. The data showed that the participants utilized concept mapping in various ways and they described positive experiences while being engaged in its use. The main challenge cited by teachers was the limited time available for more consistent use. The results showed that the use of concept mapping in advanced level biology can lead to learning gains that exceed those achieved in classes where mainly traditional methods are used. The students in the concept mapping experimental groups performed significantly better than their peers in the control group on both the lower-order (F(1) = 21.508; p < .001) and higher-order (F(1) = 42.842, p < .001) cognitive items of the biology test. A mean effect size of .56 was calculated representing the contribution of treatment to the students' performance on the test items.
ERIC Educational Resources Information Center
Laing-Kean, Claudine A. M.
2010-01-01
Programs supported by the Carl D. Perkins Act of 2006 are required to operate under the state or national content standards, and are expected to carry out evaluation procedures that address accountability. The Indiana high school course, "Advanced Life Science: Foods" ("ALS: Foods") operates under the auspices of the Perkins…
Lai, Jin-Shei; Cella, David; Choi, Seung; Junghaenel, Doerte U; Christodoulou, Christopher; Gershon, Richard; Stone, Arthur
2011-10-01
Objective: To illustrate how measurement practices can be advanced by using as an example the fatigue item bank (FIB) and its applications (short forms and computerized adaptive testing [CAT]) that were developed through the National Institutes of Health Patient Reported Outcomes Measurement Information System (PROMIS) Cooperative Group. Design: Psychometric analysis of data collected by an Internet survey company using item response theory-related techniques. Setting: A U.S. general population representative sample collected through the Internet. Participants: Respondents used for dimensionality evaluation of the PROMIS FIB (N=603) and item calibrations (N=14,931). Interventions: Not applicable. Main Outcome Measures: Fatigue items (112) developed by the PROMIS fatigue domain working group, 13-item Functional Assessment of Chronic Illness Therapy-Fatigue, and 4-item Medical Outcomes Study 36-Item Short Form Health Survey Vitality scale. Results: The PROMIS FIB version 1, which consists of 95 items, showed acceptable psychometric properties. CAT showed consistently better precision than short forms. However, all 3 short forms showed good precision for most participants in that more than 95% of the sample could be measured precisely with reliability greater than 0.9. Conclusions: Measurement practice can be advanced by using a psychometrically sound measurement tool and its applications. This example shows that CAT and short forms derived from the PROMIS FIB can reliably estimate fatigue reported by the U.S. general population. Evaluation in clinical populations is warranted before the item bank can be used for clinical trials. Copyright © 2011 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
Collective Protection (COLPRO) Novel Closures Testing
2013-03-28
science and technology programs for future ColPro systems may include interfaces such as novel designs using zippers, hook-and-pile closures, and...necessitate new testing procedures. Additionally, standards of performance must be adjusted as technologies advance. Test procedures and parameters...listed in this TOP may require updating to accommodate new technologies in test items or in test instrumentation. Any variation to the TOP procedures
Development of an instrument for the evaluation of advanced life support performance.
Peltonen, L-M; Peltonen, V; Salanterä, S; Tommila, M
2017-10-01
Assessing advanced life support (ALS) competence requires validated instruments. Existing instruments include aspects of technical skills (TS), non-technical skills (NTS) or both, but one instrument for detailed assessment that suits all resuscitation situations is lacking. This study aimed to develop an instrument for the evaluation of the overall ALS performance of the whole team. This instrument development study had four phases. First, we reviewed literature and resuscitation guidelines to explore items to include in the instrument. Thereafter, we interviewed resuscitation team professionals (n = 66), using the critical incident technique, to determine possible additional aspects associated with the performance of ALS. Second, we developed an instrument based on the findings. Third, we used an expert panel (n = 20) to assess the validity of the developed instrument. Finally, we revised the instrument based on the experts' comments and tested it with six experts who evaluated 22 video recorded resuscitations. The final version of the developed instrument had 69 items divided into adherence to guidelines (28 items), clinical decision-making (5 items), workload management (12 items), team behaviour (8 items), information management (6 items), patient integrity and consideration of laymen (4 items) and work routines (6 items). The Cronbach's α values were good, and strong correlations between the overall performance and the instrument were observed. The instrument may be useful for detailed assessment of the team's overall performance, but the numerous items make the use demanding. The instrument is still under development, and more research is needed to determine its psychometric properties. © 2017 The Acta Anaesthesiologica Scandinavica Foundation. Published by John Wiley & Sons Ltd.
ERIC Educational Resources Information Center
Wylie, Elaine, Ed.
The proceedings of a working group conference on proficiency testing of Japanese as a second language contain a brief background paper distributed to conference invitees, a list of items included in the pre-conference portfolio, an advance organizer of potential discussion topics, a 77-item annotated list of bibliographies on second language…
Perez, Kathryn E.; Price, Rebecca M.
2014-01-01
Despite the impact of genetics on daily life, biology undergraduates understand some key genetics concepts poorly. One concept requiring attention is dominance, which many students understand as a fixed property of an allele or trait and regularly conflate with frequency in a population or selective advantage. We present the Dominance Concept Inventory (DCI), an instrument to gather data on selected alternative conceptions about dominance. During development of the 16-item test, we used expert surveys (n = 12), student interviews (n = 42), and field tests (n = 1763) from introductory and advanced biology undergraduates at public and private, majority- and minority-serving, 2- and 4-yr institutions in the United States. In the final field test across all subject populations (n = 709), item difficulty ranged from 0.08 to 0.84 (0.51 ± 0.049 SEM), while item discrimination ranged from 0.11 to 0.82 (0.50 ± 0.048 SEM). Internal reliability (Cronbach's alpha) was 0.77, while test–retest reliability values were 0.74 (product moment correlation) and 0.77 (intraclass correlation). The prevalence of alternative conceptions in the field tests shows that introductory and advanced students retain confusion about dominance after instruction. All measures support the DCI as a useful instrument for measuring undergraduate biology student understanding and alternative conceptions about dominance. PMID:26086665
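The item statistics reported above (item difficulty, item discrimination, and Cronbach's alpha) can be illustrated with a minimal sketch. The abstract does not say which discrimination index was used, so the corrected item-total correlation below is an assumption, and the function names and the toy response matrix are hypothetical.

```python
from statistics import mean, pvariance


def item_difficulty(responses):
    """Proportion of examinees answering each item correctly.

    `responses` is a list of rows (one per examinee) of 0/1 item scores.
    """
    n_items = len(responses[0])
    return [mean(row[j] for row in responses) for j in range(n_items)]


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else float("nan")


def item_discrimination(responses):
    """Corrected item-total correlation (assumed index, not the paper's)."""
    n_items = len(responses[0])
    result = []
    for j in range(n_items):
        item = [row[j] for row in responses]
        rest = [sum(row) - row[j] for row in responses]  # total without item j
        result.append(pearson(item, rest))
    return result


def cronbach_alpha(responses):
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / total variance)."""
    k = len(responses[0])
    item_vars = [pvariance([row[j] for row in responses]) for j in range(k)]
    total_var = pvariance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical toy data: 5 examinees x 4 items, scored 0/1.
toy = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
```

Calling `item_difficulty(toy)`, `item_discrimination(toy)`, and `cronbach_alpha(toy)` returns the three kinds of statistic summarized in the abstract, here for the toy matrix only.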
ERIC Educational Resources Information Center
Moshinsky, Avital; Ziegler, David; Gafni, Naomi
2017-01-01
Many medical schools have adopted multiple mini-interviews (MMI) as an advanced selection tool. MMIs are expensive and used to test only a few dozen candidates per day, making it infeasible to develop a different test version for each test administration. Therefore, some items are reused both within and across years. This study investigated the…
NASA Astrophysics Data System (ADS)
Haydel, Angela Michelle
The purpose of this dissertation was to advance theoretical understanding about fit between the personal resources of individuals and the characteristics of science achievement tasks. Testing continues to be pervasive in schools, yet we know little about how students perceive tests and what they think and feel while they are actually working on test items. This study focused on both the personal (cognitive and motivational) and situational factors that may contribute to individual differences in achievement-related outcomes. 387 eighth grade students first completed a survey including measures of science achievement goals, capability beliefs, efficacy related to multiple-choice items and performance assessments, validity beliefs about multiple-choice items and performance assessments, and other perceptions of these item formats. Students then completed science achievement tests including multiple-choice items and two performance assessments. A sample of students was asked to verbalize both thoughts and feelings as they worked through the test items. These think-alouds were transcribed and coded for evidence of cognitive, metacognitive and motivational engagement. Following each test, all students completed measures of effort, mood, energy level and strategy use during testing. Students reported that performance assessments were more challenging, authentic, interesting and valid than multiple-choice tests. They also believed that comparisons between students were easier using multiple-choice items. Overall, students tried harder, felt better, had higher levels of energy and used more strategies while working on performance assessments. Findings suggested that performance assessments might be more congruent with a mastery achievement goal orientation, while multiple-choice tests might be more congruent with a performance achievement goal orientation. 
A variable-centered analytic approach including regression analyses provided information about how students, on average, who differed in terms of their teachers' ratings of their science ability, achievement goals, capability beliefs and experiences with science achievement tasks perceived, engaged in, and performed on multiple-choice items and performance assessments. Person-centered analyses provided information about the perceptions, engagement and performance of subgroups of individuals who had different motivational characteristics. Generally, students' personal goals and capability beliefs related more strongly to test perceptions, but not performance, while teacher ratings of ability and test-specific beliefs related to performance.
O'Kelly, Julian; Bodak, Rebeka
2016-01-01
Background: Case studies of people with Huntington's disease (HD) report that music therapy provides a range of benefits that may improve quality of life; however, no robust music therapy assessment tools exist for this population. Objective: Develop and conduct preliminary psychometric testing of a music therapy assessment tool for patients with advanced HD. Methods: First, we established content and face validity of the Music Therapy Assessment Tool for Advanced HD (MATA-HD) through focus groups and field testing. Second, we examined psychometric properties of the resulting MATA-HD in terms of its construct validity, internal consistency, and inter-rater and intra-rater reliability over 10 group music therapy sessions with 19 patients. Results: The resulting MATA-HD included a total of 15 items across six subscales (Arousal/Attention, Physical Presentation, Communication, Musical, Cognition, and Psychological/Behavioral). We found good construct validity (r ≥ 0.7) for Mood, Communication Level, Communication Effectiveness, Choice, Social Behavior, Arousal, and Attention items. Cronbach's α of 0.825 indicated good internal consistency across 11 items with a common focus of engagement in therapy. The inter-rater reliability (IRR) intraclass correlation coefficient (ICC) scores averaged 0.65, and a mean intra-rater ICC of 0.68 was obtained. Further training and retesting provided a mean IRR ICC of 0.7. Conclusions: Preliminary data indicate that the MATA-HD is a promising tool for measuring patient responses to music therapy interventions across psychological, physical, social, and communication domains of functioning in patients with advanced HD. © the American Music Therapy Association 2016. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Advances Afoot in Microbiology
Karon, Brad S.
2017-01-01
In 2016, the American Academy of Microbiology convened a colloquium to examine point-of-care (POC) microbiology testing and to evaluate its effects on clinical microbiology. Colloquium participants included representatives from clinical microbiology laboratories, industry, and the government, who together made recommendations regarding the implementation, oversight, and evaluation of POC microbiology testing. The colloquium report is timely and well written (V. Dolen et al., Changing Diagnostic Paradigms for Microbiology, 2017, https://www.asm.org/index.php/colloquium-reports/item/6421-changing-diagnostic-paradigms-for-microbiology?utm_source=Commentary&utm_medium=referral&utm_campaign=diagnostics). Emerging POC microbiology tests, especially nucleic acid amplification tests, have the potential to advance medical care. PMID:28539341
2012-05-01
tilted metamorphic rock. Typically, the surface layer of the soil is a brown gravelly silt with sand, about 4 inches thick. The subsoil is yellowish red...site setup, the placement of 200 seed items for use in measuring the capabilities of the advanced EMI sensors tested, the subsequent collection of...advanced sensors. The second team was responsible for the cued survey of 1,491 of the 2,143 targets using the MetalMapper, one of the advanced
Airworthiness and Flight Characteristics Test (A&FC) of the CH-47D helicopter
1984-02-01
Development Specification which were evaluated during this test. The Advanced Flight Control System heading select capability and the pressure refueling...determine compliance with the CH-47D Prime Item Development Specification (PIDS). 2. This Directorate agrees with the report conclusions and...Evaluations (PAE) (refs 1 and 2, app A), climatic laboratory tests (ref 3), and icing tests (ref 4). The US Army Aviation Research and Development
Morris, Scott; Bass, Mike; Lee, Mirinae; Neapolitan, Richard E
2017-09-01
The Patient Reported Outcomes Measurement Information System (PROMIS) initiative developed an array of patient reported outcome (PRO) measures. To reduce the number of questions administered, PROMIS utilizes unidimensional item response theory and unidimensional computer adaptive testing (UCAT), which means a separate set of questions is administered for each measured trait. Multidimensional item response theory (MIRT) and multidimensional computer adaptive testing (MCAT) simultaneously assess correlated traits. The objective was to investigate the extent to which MCAT reduces patient burden relative to UCAT in the case of PROs. One MIRT and 3 unidimensional item response theory models were developed using the related traits anxiety, depression, and anger. Using these models, MCAT and UCAT performance was compared with simulated individuals. Surprisingly, the root mean squared error for both methods increased with the number of items. These results were driven by large errors for individuals with low trait levels. A second analysis focused on individuals aligned with item content. For these individuals, both MCAT and UCAT accuracies improved with additional items. Furthermore, MCAT reduced the test length by 50%. For the PROMIS Emotional Distress banks, neither UCAT nor MCAT provided accurate estimates for individuals at low trait levels. Because the items in these banks were designed to detect clinical levels of distress, there is little information for individuals with low trait values. However, trait estimates for individuals targeted by the banks were accurate and MCAT asked substantially fewer questions. By reducing the number of items administered, MCAT can allow clinicians and researchers to assess a wider range of PROs with less patient burden. © The Author 2017. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com
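The finding that item banks built to detect clinical levels of distress carry little information for respondents at low trait levels follows from how item information behaves in a logistic IRT model. The sketch below uses a two-parameter logistic (2PL) model as an illustration; the item parameters are hypothetical, and this is not the PROMIS calibration or the authors' simulation code.

```python
import math


def p_2pl(theta, a, b):
    """Probability of endorsing an item under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information of a 2PL item at trait level theta: a^2 * P * (1 - P)."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)


def standard_error(theta, items):
    """Asymptotic standard error of the trait estimate from a set of items."""
    total = sum(item_information(theta, a, b) for a, b in items)
    return 1.0 / math.sqrt(total)


def next_item(theta, items, used):
    """Adaptive selection: pick the unused item most informative at theta."""
    candidates = [i for i in range(len(items)) if i not in used]
    return max(candidates, key=lambda i: item_information(theta, *items[i]))


# Hypothetical bank of (discrimination a, location b) pairs, all located
# at elevated trait levels, as a clinically targeted bank would be.
bank = [(1.5, 0.5), (1.5, 1.0), (1.5, 1.5), (1.5, 2.0), (1.5, 2.5)]

se_low = standard_error(-2.0, bank)     # respondent far below every item
se_target = standard_error(1.5, bank)   # respondent in the targeted range
```

With these assumed parameters, `se_low` is several times larger than `se_target`: no choice of items from such a bank, adaptive or not, can measure a low-trait respondent precisely, because every item's information peaks near its own location `b`.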
SPSS Syntax for Missing Value Imputation in Test and Questionnaire Data
ERIC Educational Resources Information Center
van Ginkel, Joost R.; van der Ark, L. Andries
2005-01-01
A well-known problem in the analysis of test and questionnaire data is that some item scores may be missing. Advanced methods for the imputation of missing data are available, such as multiple imputation under the multivariate normal model and imputation under the saturated logistic model (Schafer, 1997). Accompanying software was made available…
NASA Technical Reports Server (NTRS)
Trabanino, Rudy; Murphy, George L.; Yakut, M. M.
1986-01-01
An Advanced Food Hardware System galley for the initial operating capability (IOC) Space Station is discussed. Space Station will employ food hardware items that have never been flown in space, such as a dishwasher, microwave oven, blender/mixer, bulk food and beverage dispensers, automated food inventory management, a trash compactor, and an advanced technology refrigerator/freezer. These new technologies and designs are described and the trades, design, development, and testing associated with each are summarized.
Acquisition of specialized testing equipment for advanced cement-based materials : addendum.
DOT National Transportation Integrated Search
2014-07-01
The purpose of this addendum is to cover the installation cost associated with several of the specialized pieces of equipment purchased in project 00038844. See report below from Missouri S&T Physical Facilities itemizing the scope of work and as...
Advances Afoot in Microbiology.
Patel, Robin; Karon, Brad S
2017-07-01
In 2016, the American Academy of Microbiology convened a colloquium to examine point-of-care (POC) microbiology testing and to evaluate its effects on clinical microbiology. Colloquium participants included representatives from clinical microbiology laboratories, industry, and the government, who together made recommendations regarding the implementation, oversight, and evaluation of POC microbiology testing. The colloquium report is timely and well written (V. Dolen et al., Changing Diagnostic Paradigms for Microbiology, 2017, https://www.asm.org/index.php/colloquium-reports/item/6421-changing-diagnostic-paradigms-for-microbiology?utm_source=Commentary&utm_medium=referral&utm_campaign=diagnostics). Emerging POC microbiology tests, especially nucleic acid amplification tests, have the potential to advance medical care. Copyright © 2017 American Society for Microbiology.
Is the Factor Observed in Investigations on the Item-Position Effect Actually the Difficulty Factor?
Schweizer, Karl; Troche, Stefan
2018-02-01
In confirmatory factor analysis, quite similar models of measurement serve the detection of the difficulty factor and the factor due to the item-position effect. The item-position effect refers to the increasing dependency among the responses to successively presented items of a test, whereas the difficulty factor is ascribed to the wide range of item difficulties. The similarity of the models of measurement hampers the dissociation of these factors. Since the item-position effect should theoretically be independent of the item difficulties, the statistical ex post manipulation of the difficulties should enable the discrimination of the two types of factors. This method was investigated in two studies. In the first study, Advanced Progressive Matrices (APM) data of 300 participants were investigated. As expected, the factor thought to be due to the item-position effect was observed. In the second study, using data simulated to show the major characteristics of the APM data, the wide range of item difficulties was set to zero to reduce the likelihood of detecting the difficulty factor. Despite this reduction, however, the factor, now identified as the item-position factor, was observed in virtually all simulated datasets.
Application of advanced technologies to small, short-haul aircraft
NASA Technical Reports Server (NTRS)
Andrews, D. G.; Brubaker, P. W.; Bryant, S. L.; Clay, C. W.; Giridharadas, B.; Hamamoto, M.; Kelly, T. J.; Proctor, D. K.; Myron, C. E.; Sullivan, R. L.
1978-01-01
The results of a preliminary design study which investigates the use of selected advanced technologies to achieve low-cost design for small (50-passenger), short-haul (50 to 1000 mile) transports are reported. The largest single item in the cost of manufacturing an airplane of this type is labor. A careful examination of the application of advanced technology to airframe structure was performed, since structures are one of the most labor-intensive parts of the airplane. Also, a preliminary investigation of advanced aerodynamics, flight controls, ride control and gust load alleviation systems, aircraft systems, and turboprop propulsion systems was performed. The most beneficial advanced technology examined was bonded aluminum primary structure. The use of this structure in large wing panels and body sections resulted in a greatly reduced number of parts and fasteners and, therefore, labor hours. The resultant cost of assembled airplane structure was reduced by 40% and the total airplane manufacturing cost by 16%, a major cost reduction. With further development, test verification, and optimization, appreciable weight saving is also achievable. Other advanced technology items which showed significant gains are as follows: (1) advanced turboprop: reduced block fuel by 15-30% depending on range; (2) configuration revisions (vee-tail): empennage cost reduction of 25%; (3) leading-edge flap addition: weight reduction of 2500 pounds.
Preti, Antonio; Vellante, Marcello; Petretto, Donatella R
2017-05-01
The "Reading the Mind in the Eyes" Test (hereafter: Eyes Test) is considered an advanced task of the Theory of Mind aimed at assessing the performance of the participant in perspective-taking, that is, the ability to sense or understand other people's cognitive and emotional states. In this study, item response theory analysis was applied to the adult version of the Eyes Test. The Italian version of the Eyes Test was administered to 200 undergraduate students of both genders (males = 46%). Modified parallel analysis (MPA) was used to test unidimensionality. Marginal maximum likelihood estimation was used to fit the 1-, 2-, and 3-parameter logistic (PL) models to the data. Differential Item Functioning (DIF) due to gender was explored with five independent methods. MPA provided evidence in favour of unidimensionality. The Rasch model (1-PL) was superior to the other two models in explaining participants' responses to the Eyes Test. There was no robust evidence of gender-related DIF in the Eyes Test, although some differences may exist for some items as a reflection of real differences by group. The study results support a one-factor model of the Eyes Test. Performance on the Eyes Test is defined by the participant's ability in perspective-taking. Researchers should cease using arbitrarily selected subscores in assessing the performance of participants on the Eyes Test. Lack of gender-related DIF favours the use of the Eyes Test in the investigation of gender differences concerning empathy and social cognition.
Logistics Reduction and Repurposing Technology for Long Duration Space Missions
NASA Technical Reports Server (NTRS)
Broyan, James L.; Chu, Andrew; Ewert, Michael K.
2014-01-01
One of NASA's Advanced Exploration Systems (AES) projects is the Logistics Reduction and Repurposing (LRR) project, which has the goal of reducing logistics resupply items through direct and indirect means. Various technologies under development in the project will reduce the launch mass of consumables and their packaging, enable reuse and repurposing of items and make logistics tracking more efficient. Repurposing also reduces the trash burden onboard spacecraft and indirectly reduces launch mass by replacing some items on the manifest. Examples include reuse of trash as radiation shielding or propellant. This paper provides the status of the LRR technologies in their third year of development under AES. Advanced clothing systems (ACS) are being developed to enable clothing to be worn longer, directly reducing launch mass. ACS has completed a ground exercise clothing study in preparation for an International Space Station (ISS) technology demonstration in 2014. Development of launch packaging containers and other items that can be repurposed on-orbit as part of habitation outfitting has resulted in a logistics-to-living (L2L) concept. L2L has fabricated and evaluated several multi-purpose cargo transfer bags (MCTBs) for potential reuse on orbit. Autonomous logistics management (ALM) is using radio frequency identification (RFID) to track items and thus reduce crew requirements for logistics functions. An RFID dense reader prototype is under construction and plans for integrated testing are being made. Development of a heat melt compactor (HMC) second generation unit for processing trash into compact and stable tiles is nearing completion. The HMC prototype compaction chamber has been completed and system development testing is underway. Research has been conducted on the conversion of trash-to-gas (TtG) for high levels of volume reduction and for use in propulsion systems. A steam reformation system was selected for further system definition of the TtG technology. 
Benefits analyses of all LRR technologies have also been updated with the latest test and analysis results.
Hagquist, Curt; Andrich, David
2017-09-19
Rasch analysis with a focus on Differential Item Functioning (DIF) is increasingly used for examination of psychometric properties of health outcome measures. To take account of DIF in order to retain precision of measurement, splitting DIF items into separate sample-specific items has become a frequently used technique. The purpose of the paper is to present and summarise recent advances in the analysis of DIF in a unified methodology. In particular, the paper focuses on the use of analysis of variance (ANOVA) as a method to simultaneously detect uniform and non-uniform DIF, the need to distinguish between real and artificial DIF, and the trade-off between reliability and validity. An illustrative example from health research is used to demonstrate how DIF, in this case between genders, can be identified, quantified and, under specific circumstances, accounted for using the Rasch model. Rasch analyses of DIF were conducted on a composite measure of psychosomatic problems using Swedish data from the Health Behaviour in School-aged Children study for grade 9 students collected over the period 1985-2014. The procedures demonstrate how DIF can be identified efficiently by ANOVA of residuals, and how the magnitude of DIF can be quantified and potentially accounted for by resolving items according to identifiable groups and using principles of test equating on the resolved items. The results of the analysis also show that the real DIF in some items does affect person measurement estimates. Firstly, in order to distinguish between real and artificial DIF, the items showing DIF initially should not be resolved simultaneously but sequentially. Secondly, while resolving instead of deleting a DIF item may retain reliability, both options may affect the content validity negatively. Resolving items with DIF is not justified if the source of the DIF is relevant for the content of the variable; then resolving DIF may deteriorate the validity of the instrument. 
Generally, decisions on resolving items to deal with DIF should also rely on external information.
Echeverri, Margarita; Anderson, David; Nápoles, Anna María
2016-01-01
Objective: Describe the adaptation and initial validation of the Cancer Health Literacy Test (CHLT) for Spanish speakers. Methods: Cross-sectional field test of the CHLT Spanish version (CHLT-30-DKspa) among healthy Latinos in Louisiana. Diagonally Weighted Least Squares estimation was used to confirm the factor structure. Item-response analysis using 2-parameter logistic estimates was used to identify questions that may require modification to avoid bias. Cronbach's alpha coefficients estimated scale internal consistency reliability. Analysis of variance was used to test for significant differences in CHLT-30-DKspa scores by gender, origin, age and education. Results: Mean CHLT-30-DKspa score (N=400) was 17.13 (range 0 to 30; SD 6.65). Results confirmed a unidimensional structure (χ²[405] = 461.55, p = .027, CFI = .993, TLI = .992, RMSEA = .0180). Cronbach's alpha was 0.88. Items Q1-High calorie and Q15-Tumor spread had the lowest item-scale correlations (.148 and .288) and standardized factor loadings (.152 and .302). Items Q1-High Calories, Q8-Palliative Care, and Q19-Smoking Risk had the highest item-difficulty parameters (diff = 1.12, 1.21, and 2.40). Conclusions: Results generally supported the applicability of the CHLT-30-DKspa for Spanish-speaking healthy populations, with the exception of four items that need to be deleted or revised and further studied (Q1, Q8, Q15, and Q19). Practical Implications: The CHLT-30-DKspa can be used to assess cancer health literacy among Spanish-speaking populations to advance research on cancer health literacy and outcomes. PMID:27043760
Davies, Louise; Donnelly, Kyla Z; Goodman, Daisy J; Ogrinc, Greg
2016-04-01
The Standards for Quality Improvement Reporting Excellence (SQUIRE) Guideline was published in 2008 (SQUIRE 1.0) and was the first publication guideline specifically designed to advance the science of healthcare improvement. Advances in the discipline of improvement prompted us to revise it. We adopted a novel approach to the revision by asking end-users to 'road test' a draft version of SQUIRE 2.0. The aim was to determine whether they understood and implemented the guidelines as intended by the developers. Forty-four participants were assigned a manuscript section (ie, introduction, methods, results, discussion) and asked to use the draft Guidelines to guide their writing process. They indicated the text that corresponded to each SQUIRE item used and submitted it along with a confidential survey. The survey examined usability of the Guidelines using Likert-scaled questions and participants' interpretation of key concepts in SQUIRE using open-ended questions. On the submitted text, we evaluated concordance between participants' item usage/interpretation and the developers' intended application. For the survey, the Likert-scaled responses were summarised using descriptive statistics and the open-ended questions were analysed by content analysis. Consistent with the SQUIRE Guidelines' recommendation that not every item be included, less than one-third (n=14) of participants applied every item in their section in full. Of the 85 instances when an item was partially used or was omitted, only 7 (8.2%) of these instances were due to participants not understanding the item. 
Usage of Guideline items was highest for items most similar to standard scientific reporting (ie, 'Specific aim of the improvement' (introduction), 'Description of the improvement' (methods) and 'Implications for further studies' (discussion)) and lowest (<20% of the time) for those unique to healthcare improvement (ie, 'Assessment methods for context factors that contributed to success or failure' and 'Costs and strategic trade-offs'). Items unique to healthcare improvement, specifically 'Evolution of the improvement', 'Context elements that influenced the improvement', 'The logic on which the improvement was based', 'Process and outcome measures', demonstrated poor concordance between participants' interpretation and developers' intended application. User testing of a draft version of SQUIRE 2.0 revealed which items have poor concordance between developer intent and author usage, which will inform final editing of the Guideline and development of supporting supplementary materials. It also identified the items that require special attention when teaching about scholarly writing in healthcare improvement. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://www.bmj.com/company/products-services/rights-and-licensing/
Planning a Study for Testing the Rasch Model given Missing Values due to the use of Test-booklets.
Yanagida, Takuya; Kubinger, Klaus D; Rasch, Dieter
2015-01-01
Though calibration of an achievement test within a psychological and educational context is very often carried out by the Rasch model, data sampling is hardly ever designed according to statistical foundations. However, Kubinger, Rasch, and Yanagida (2009, 2011) suggested an approach for the determination of sample size according to a given Type-I and Type-II risk and a certain effect of model contradiction when testing the Rasch model. The approach uses a three-way analysis of variance design with mixed classification. So far, their simulation studies have dealt with complete data, meaning every examinee is administered all of the items of an item pool. The simulation study presented in this paper deals with the practically relevant case, in particular for large-scale assessments, in which items are presented in several test-booklets. As a consequence, there are missing values by design. Therefore, the question to be considered is whether this approach works in this case as well. Besides the fact that the data are not normally distributed but dichotomous (an examinee either solves an item or fails to solve it), at most a single entry exists for each cell in the given three-way analysis of variance design, due to missing values. Hence, the obligatory test-statistic's distribution may not be retained, in contrast to the case of having no missing values. The result of our simulation study, despite applying only to a very special scenario, is that this approach does indeed work: whether test-booklets are used or every examinee is administered all of the items changes nothing with respect to the actual Type-I risk or the power of the test, given almost the same amount of information of examinees per item. However, as the results are limited to a special scenario, we currently recommend that any interested researcher simulate the appropriate scenario in advance by him/herself.
A summary of NASA/Air Force full scale engine research programs using the F100 engine
NASA Technical Reports Server (NTRS)
Deskin, W. J.; Hurrell, H. G.
1979-01-01
A full scale engine research (FSER) program conducted with the F100 engine is presented. The program mechanism is described and the F100 test vehicles utilized are illustrated. Technology items were addressed in the areas of swirl augmentation, flutter phenomena, advanced electronic control logic theory, strain gage technology and distortion sensitivity. The associated test programs are described. The FSER approach utilizes existing state of the art engine hardware to evaluate advanced technology concepts and problem areas. Aerodynamic phenomena not previously considered by design systems were identified and incorporated into industry design tools.
Latent class analysis of diagnostic science assessment data using Bayesian networks
NASA Astrophysics Data System (ADS)
Steedle, Jeffrey Thomas
2008-10-01
Diagnostic science assessments seek to draw inferences about student understanding by eliciting evidence about the mental models that underlie students' reasoning about physical systems. Measurement techniques for analyzing data from such assessments embody one of two contrasting assessment programs: learning progressions and facet-based assessments. Learning progressions assume that students have coherent theories that they apply systematically across different problem contexts. In contrast, the facet approach makes no such assumption, so students should not be expected to reason systematically across different problem contexts. A systematic comparison of these two approaches is of great practical value to assessment programs such as the National Assessment of Educational Progress as they seek to incorporate small clusters of related items in their tests for the purpose of measuring depth of understanding. This dissertation describes an investigation comparing learning progression and facet models. Data comprised student responses to small clusters of multiple-choice diagnostic science items focusing on narrow aspects of understanding of Newtonian mechanics. Latent class analysis was employed using Bayesian networks in order to model the relationship between students' science understanding and item responses. Separate models reflecting the assumptions of the learning progression and facet approaches were fit to the data. The technical qualities of inferences about student understanding resulting from the two models were compared in order to determine if either modeling approach was more appropriate. Specifically, models were compared on model-data fit, diagnostic reliability, diagnostic certainty, and predictive accuracy. In addition, the effects of test length were evaluated for both models in order to inform the number of items required to obtain adequately reliable latent class diagnoses. 
Lastly, changes in student understanding over time were studied with a longitudinal model in order to provide educators and curriculum developers with a sense of how students advance in understanding over the course of instruction. Results indicated that expected student response patterns rarely reflected the assumptions of the learning progression approach. That is, students tended not to systematically apply a coherent set of ideas across different problem contexts. Even those students expected to express scientifically-accurate understanding had substantial probabilities of reporting certain problematic ideas. The learning progression models failed to make as many substantively-meaningful distinctions among students as the facet models. In statistical comparisons, model-data fit was better for the facet model, but the models were quite comparable on all other statistical criteria. Studying the effects of test length revealed that approximately 8 items are needed to obtain adequate diagnostic certainty, but more items are needed to obtain adequate diagnostic reliability. The longitudinal analysis demonstrated that students either advance in their understanding (i.e., switch to the more advanced latent class) over a short period of instruction or stay at the same level. There was no significant relationship between the probability of changing latent classes and time between testing occasions. In all, this study is valuable because it provides evidence informing decisions about modeling and reporting on student understanding, it assesses the quality of measurement available from short clusters of diagnostic multiple-choice items, and it provides educators with knowledge of the paths that students may take as they advance from novice to expert understanding over the course of instruction.
Østergaard, Mia L; Nielsen, Kristina R; Albrecht-Beste, Elisabeth; Konge, Lars; Nielsen, Michael B
2018-01-01
This study aimed to develop a test with validity evidence for abdominal diagnostic ultrasound with a pass/fail-standard to facilitate mastery learning. The simulator had 150 real-life patient abdominal scans of which 15 cases with 44 findings were selected, representing level 1 from The European Federation of Societies for Ultrasound in Medicine and Biology. Four groups of experience levels were constructed: Novices (medical students), trainees (first-year radiology residents), intermediates (third- to fourth-year radiology residents) and advanced (physicians with ultrasound fellowship). Participants were tested in a standardized setup and scored by two blinded reviewers prior to an item analysis. The item analysis excluded 14 diagnoses. Both internal consistency (Cronbach's alpha 0.96) and inter-rater reliability (0.99) were good and there were statistically significant differences (p < 0.001) between all four groups, except the intermediate and advanced groups (p = 1.0). There was a statistically significant correlation between experience and test scores (Pearson's r = 0.82, p < 0.001). The pass/fail-standard failed all novices (no false positives) and passed all advanced (no false negatives). All intermediate participants and six out of 14 trainees passed. We developed a test for diagnostic abdominal ultrasound with solid validity evidence and a pass/fail-standard without any false-positive or false-negative scores. • Ultrasound training can benefit from competency-based education based on reliable tests. • This simulation-based test can differentiate between competency levels of ultrasound examiners. • This test is suitable for competency-based education, e.g. mastery learning. • We provide a pass/fail standard without false-negative or false-positive scores.
Anorexia/cachexia-related quality of life for children with cancer.
Lai, Jin-Shei; Cella, David; Peterman, Amy; Barocas, Joshua; Goldman, Stewart
2005-10-01
Anorexia is a common symptom in patients with cancer, which can lead to poor tolerance of treatment and can contribute to cachexia in extreme cases. Children with advanced-stage cancer are especially vulnerable to malnutrition resulting from anorexia and cachexia. Currently, there are no instruments that measure common concerns specifically associated with anorexia and cachexia in children with cancer. The purpose of the current article was to test the psychometric properties of a newly developed pediatric Functional Assessment of Anorexia and Cachexia Therapy (peds-FAACT) for children with cancer. Ninety-six patients (ages 7-17 yrs) receiving cancer treatment and their parents were asked to complete the 12-item peds-FAACT. The authors implemented both classical test theory and item response theory to evaluate the agreement between parents and patients, internal consistency and unidimensionality of the scale, and stability of items across subgroups. As a result, a patient-reported six-item scale was recommended as the core measure for all pediatric patients with cancer and four additional peripheral items were recommended for adolescent patients. The peds-FAACT demonstrated good psychometric properties, differentiated patients with different functional performance status, and was determined to be a useful tool for future clinical trials.
Eye-Movement Analysis Demonstrates Strategic Influences on Intelligence
ERIC Educational Resources Information Center
Vigneau, Francois; Caissie, Andre F.; Bors, Douglas A.
2006-01-01
Taking into account various models and findings pertaining to the nature of analogical reasoning, this study explored quantitative and qualitative individual differences in intelligence using latency and eye-movement data. Fifty-five university students were administered 14 selected items of the Raven's Advanced Progressive Matrices test. Results…
A knowledge-based theory of rising scores on "culture-free" tests.
Fox, Mark C; Mitchum, Ainsley L
2013-08-01
Secular gains in intelligence test scores have perplexed researchers since they were documented by Flynn (1984, 1987). Gains are most pronounced on abstract, so-called culture-free tests, prompting Flynn (2007) to attribute them to problem-solving skills availed by scientifically advanced cultures. We propose that recent-born individuals have adopted an approach to analogy that enables them to infer higher level relations requiring roles that are not intrinsic to the objects that constitute initial representations of items. This proposal is translated into item-specific predictions about differences between cohorts in pass rates and item-response patterns on the Raven's Matrices (Flynn, 1987), a seemingly culture-free test that registers the largest Flynn effect. Consistent with predictions, archival data reveal that individuals born around 1940 are less able to map objects at higher levels of relational abstraction than individuals born around 1990. Polytomous Rasch models verify predicted violations of measurement invariance, as raw scores are found to underestimate the number of analogical rules inferred by members of the earlier cohort relative to members of the later cohort who achieve the same overall score. The work provides a plausible cognitive account of the Flynn effect, furthers understanding of the cognition of matrix reasoning, and underscores the need to consider how test-takers select item responses. PsycINFO Database Record (c) 2013 APA, all rights reserved.
Prior Knowledge Assessment Guide
2014-12-01
marksmanship, advanced rifle marksmanship, and even specialized shooting courses. A comparison of the means on the test for the two groups showed that the...hands-on evaluations of student knowledge and/or skills. Pretests, however, determine how much knowledge a student currently possesses of the course...content; thus, questions on pretests assess knowledge about what is to be taught in the course. Also, most pretests will include test items
ERIC Educational Resources Information Center
He, Qingping
2012-01-01
Background: Although on-demand testing is being increasingly used in many areas of assessment, it has not been adopted in high stakes examinations like the General Certificate of Secondary Education (GCSE) and General Certificate of Education Advanced level (GCE A level) offered by awarding organisations (AOs) in the UK. One of the major issues…
Synthetic Vision Technology Demonstration. Volume 4. Appendices
1993-12-01
Synthetic Vision System and where advanced miaoebaro. technology would make significant improvements in capgwy or production cost. 2-6 SVSTDISIED Program Plan...achievement. 4. Determination of the pilot (test subject) mix and the test repetition needed to assure reasonable confidence in the results. 5...will contain the following elements: 1. Description - statement of what is to be accomplished. 2. Initial Conditions - items which must be accomplished
Dowling, N Maritza; Bolt, Daniel M; Deng, Sien
2016-12-01
When assessments are primarily used to measure change over time, it is important to evaluate items according to their sensitivity to change, specifically. Items that demonstrate good sensitivity to between-person differences at baseline may not show good sensitivity to change over time, and vice versa. In this study, we applied a longitudinal factor model of change to a widely used cognitive test designed to assess global cognitive status in dementia, and contrasted the relative sensitivity of items to change. Statistically nested models were estimated introducing distinct latent factors related to initial status differences between test-takers and within-person latent change across successive time points of measurement. Models were estimated using all available longitudinal item-level data from the Alzheimer's Disease Assessment Scale-Cognitive subscale, including participants representing the full-spectrum of disease status who were enrolled in the multisite Alzheimer's Disease Neuroimaging Initiative. Five of the 13 Alzheimer's Disease Assessment Scale-Cognitive items demonstrated noticeably higher loadings with respect to sensitivity to change. Attending to performance change on only these 5 items yielded a clearer picture of cognitive decline more consistent with theoretical expectations in comparison to the full 13-item scale. Items that show good psychometric properties in cross-sectional studies are not necessarily the best items at measuring change over time, such as cognitive decline. Applications of the methodological approach described and illustrated in this study can advance our understanding regarding the types of items that best detect fine-grained early pathological changes in cognition. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
Logistics Reduction and Repurposing Technology for Long Duration Space Missions
NASA Technical Reports Server (NTRS)
Broyan, James Lee, Jr.; Chu, Andrew; Ewert, Michael K.
2014-01-01
One of NASA's Advanced Exploration Systems (AES) projects is the Logistics Reduction and Repurposing (LRR) project, which has the goal of reducing logistics resupply items through direct and indirect means. Various technologies under development in the project will reduce the launch mass of consumables and their packaging, enable reuse and repurposing of items, and make logistics tracking more efficient. Repurposing also reduces the trash burden onboard spacecraft and indirectly reduces launch mass by one manifest item having two purposes rather than two manifest items each having only one purpose. This paper provides the status of each of the LRR technologies in their third year of development under AES. Advanced clothing systems (ACSs) are being developed to enable clothing to be worn longer, directly reducing launch mass. ACS has completed a ground exercise clothing study in preparation for an International Space Station technology demonstration in 2014. Development of launch packaging containers and other items that can be repurposed on-orbit as part of habitation outfitting has resulted in a logistics-to-living (L2L) concept. L2L has fabricated and evaluated several multi-purpose cargo transfer bags for potential reuse on-orbit. Autonomous logistics management is using radio frequency identification (RFID) to track items and thus reduce crew time for logistics functions. An RFID dense reader prototype is under construction and plans for integrated testing are being made. A heat melt compactor (HMC) second generation unit for processing trash into compact and stable tiles is nearing completion. The HMC prototype compaction chamber has been completed and system development testing is under way. Research has been conducted on the conversion of trash-to-gas (TtG) for high levels of volume reduction and for use in propulsion systems. A steam reformation system was selected for further system definition of the TtG technology.
Atkinson, Thomas M.; DeBusk, Kendra P.A.; Liepa, Astra M.; Scanlon, Michael; Coons, Stephen Joel
2016-01-01
PURPOSE To describe the process and results of the preliminary qualitative development of a new symptom-based PRO measure intended to assess treatment benefit in advanced non-small cell lung cancer (NSCLC) clinical trials. METHODS Individual qualitative interviews were conducted with adult NSCLC (Stage I–IV) patients in the US. Experienced interviewers conducted concept elicitation (CE) and cognitive interviews using semi-structured interview guides. The CE interview guide was used to elicit spontaneous reports of symptom experiences along with probing to further explore and confirm concepts. Interview transcripts were coded and analyzed by professional qualitative coders using Atlas.ti software, and were summarized by like-content using an iterative coding framework. Data from the CE interviews were considered alongside existing literature and clinical expert opinion during an item-generation process, leading to development of a preliminary version of the NSCLC Symptom Assessment Questionnaire (NSCLC-SAQ). Three waves of cognitive interviews were conducted to evaluate concept relevance, item interpretability, and structure of the draft items to facilitate further instrument refinement. FINDINGS Fifty-one patients (mean age 64.9 [SD=11.2]; 51.0% female) participated in the CE interviews. A total of 1,897 expressions of NSCLC-related symptoms were identified and coded in interview transcripts, representing approximately 42 distinct symptom concepts. A 9-item initial draft instrument was developed for testing in three waves of cognitive interviews with additional NSCLC patients (n=20), during which both paper and electronic versions of the instrument were evaluated and refined. Participant responses and feedback during cognitive interviews led to the removal of 2 items and substantial modifications to others. IMPLICATIONS The NSCLC-SAQ is a 7-item PRO measure intended for use in advanced NSCLC clinical trials to support medical product labelling. 
The NSCLC-SAQ uses a 7-day recall period and verbal rating scales. It was developed in accordance with the FDA’s PRO Guidance and scientific best practices, and the resulting qualitative interview data provide evidence of content validity. The NSCLC-SAQ has been prepared in both paper and electronic administration formats and a tablet computer-based version is currently undergoing quantitative testing to confirm its measurement properties and support FDA qualification. PMID:27041408
Abraham, Joel K; Perez, Kathryn E; Price, Rebecca M
2014-01-01
Despite the impact of genetics on daily life, biology undergraduates understand some key genetics concepts poorly. One concept requiring attention is dominance, which many students understand as a fixed property of an allele or trait and regularly conflate with frequency in a population or selective advantage. We present the Dominance Concept Inventory (DCI), an instrument to gather data on selected alternative conceptions about dominance. During development of the 16-item test, we used expert surveys (n = 12), student interviews (n = 42), and field tests (n = 1763) from introductory and advanced biology undergraduates at public and private, majority- and minority-serving, 2- and 4-yr institutions in the United States. In the final field test across all subject populations (n = 709), item difficulty ranged from 0.08 to 0.84 (0.51 ± 0.049 SEM), while item discrimination ranged from 0.11 to 0.82 (0.50 ± 0.048 SEM). Internal reliability (Cronbach's alpha) was 0.77, while test-retest reliability values were 0.74 (product moment correlation) and 0.77 (intraclass correlation). The prevalence of alternative conceptions in the field tests shows that introductory and advanced students retain confusion about dominance after instruction. All measures support the DCI as a useful instrument for measuring undergraduate biology student understanding and alternative conceptions about dominance. © 2014 J. K. Abraham et al. CBE—Life Sciences Education © 2014 The American Society for Cell Biology. This article is distributed by The American Society for Cell Biology under license from the author(s). It is available to the public under an Attribution–Noncommercial–Share Alike 3.0 Unported Creative Commons License (http://creativecommons.org/licenses/by-nc-sa/3.0).
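The classical test theory statistics reported in this abstract (item difficulty as the proportion of correct responses, and Cronbach's alpha as internal reliability) can be sketched in a few lines of Python. This is a minimal illustration on a made-up 4-examinee, 3-item score matrix. The data and function names are hypothetical and are not taken from the DCI study.

```python
from statistics import pvariance


def item_difficulty(scores):
    """Proportion of examinees answering each item correctly (0/1 scores)."""
    n_examinees = len(scores)
    n_items = len(scores[0])
    return [sum(row[i] for row in scores) / n_examinees for i in range(n_items)]


def cronbach_alpha(scores):
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / total-score variance)."""
    k = len(scores[0])
    item_variances = [pvariance([row[i] for row in scores]) for i in range(k)]
    total_variance = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical toy data: rows are examinees, columns are items (1 = correct).
toy_scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]

print(item_difficulty(toy_scores))
print(cronbach_alpha(toy_scores))
```

The population variance is used for both the item and total-score variances. Using the sample variance throughout would give the same alpha, because the correction factor cancels in the ratio.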
Buiza, Cristina; Navarro, Ana; Díaz-Orueta, Unai; González, Mari Feli; Alaba, Javier; Arriola, Enrique; Hernández, Carmen; Zulaica, Amaia; Yanguas, José Javier
2011-01-01
The cognitive assessment of patients with advanced dementia needs proper screening instruments that make it possible to obtain information about the cognitive state and resources that these individuals still have. The present work conducts a Spanish validation study of the Severe Mini Mental State Examination (SMMSE). Forty-seven patients with advanced dementia (Mini-Cognitive Examination [MEC]<11) were evaluated with Reisberg's Global Deterioration Scale (GDS), MEC, SMMSE and Severe Cognitive Impairment Profile scales. All test items were discriminative. The test showed high internal consistency (α=0.88), test-retest (0.64 to 1.00, P<.01) and between-observer reliabilities (0.69-1.00, p<0.01), both for total scores and for each item separately. Construct validity was tested through correlations between the instrument and MEC scores (r=0.59, P<0.01). Further information on the construct validity was obtained by dividing the sample into groups that scored above or below 5 points in the MEC and recalculating their correlations with SMMSE. The correlation between the scores in the SMMSE and MEC was significant in the MEC 0-5 group (r=0.55, P<.05), but not in the MEC>5 group. Additionally, differences in scores were found in the SMMSE, but not in the MEC, between the three GDS groups (5, 6 and 7) (H=11.1, P<.05). The SMMSE is an instrument for the assessment of advanced cognitive impairment which prevents the floor effect through an extension of the lower measurement range relative to that of the MEC. From our results, this rapid and easy-to-administer screening tool can be considered valid and reliable. Copyright © 2010 SEGG. Published by Elsevier Espana. All rights reserved.
2017-04-06
Research Hypothesis ... 15 Research Design ...user community and of accommodating advancing software applications by the vendors. Research Design My approach to this project was to conduct... design descriptions, requirements specifications, test documentation, interface requirement specifications, product specifications, and software
Specification for Qualification and Certification for Level III - Expert Welders.
ERIC Educational Resources Information Center
American Welding Society, Miami, FL.
This document defines the requirements and program for the American Welding Society to certify expert welders through an evaluation process entailing performance qualification and practical knowledge tests requiring the use of advanced reading, computational, and manual skills. The following items are included: statement of the standard's scope;…
The Development and Validation of a Teacher Preparation Program: Follow-Up Survey
ERIC Educational Resources Information Center
Schulte, Laura E.
2008-01-01
Students in my applied advanced statistics course for educational administration doctoral students developed a follow-up survey for teacher preparation programs, using the following scale development processes: adopting a framework; developing items; providing evidence of content validity; conducting a pilot test; and analyzing data. The students…
Davies, Louise; Donnelly, Kyla Z; Goodman, Daisy J; Ogrinc, Greg
2016-01-01
Background The Standards for Quality Improvement Reporting Excellence (SQUIRE) Guideline was published in 2008 (SQUIRE 1.0) and was the first publication guideline specifically designed to advance the science of healthcare improvement. Advances in the discipline of improvement prompted us to revise it. We adopted a novel approach to the revision by asking end-users to ‘road test’ a draft version of SQUIRE 2.0. The aim was to determine whether they understood and implemented the guidelines as intended by the developers. Methods Forty-four participants were assigned a manuscript section (ie, introduction, methods, results, discussion) and asked to use the draft Guidelines to guide their writing process. They indicated the text that corresponded to each SQUIRE item used and submitted it along with a confidential survey. The survey examined usability of the Guidelines using Likert-scaled questions and participants’ interpretation of key concepts in SQUIRE using open-ended questions. On the submitted text, we evaluated concordance between participants’ item usage/interpretation and the developers’ intended application. For the survey, the Likert-scaled responses were summarised using descriptive statistics and the open-ended questions were analysed by content analysis. Results Consistent with the SQUIRE Guidelines’ recommendation that not every item be included, less than one-third (n=14) of participants applied every item in their section in full. Of the 85 instances when an item was partially used or was omitted, only 7 (8.2%) of these instances were due to participants not understanding the item. 
Usage of Guideline items was highest for items most similar to standard scientific reporting (ie, ‘Specific aim of the improvement’ (introduction), ‘Description of the improvement’ (methods) and ‘Implications for further studies’ (discussion)) and lowest (<20% of the time) for those unique to healthcare improvement (ie, ‘Assessment methods for context factors that contributed to success or failure’ and ‘Costs and strategic trade-offs’). Items unique to healthcare improvement, specifically ‘Evolution of the improvement’, ‘Context elements that influenced the improvement’, ‘The logic on which the improvement was based’, ‘Process and outcome measures’, demonstrated poor concordance between participants’ interpretation and developers’ intended application. Conclusions User testing of a draft version of SQUIRE 2.0 revealed which items have poor concordance between developer intent and author usage, which will inform final editing of the Guideline and development of supporting supplementary materials. It also identified the items that require special attention when teaching about scholarly writing in healthcare improvement. PMID:26263916
ERIC Educational Resources Information Center
Wilcox, Rand R.
A mastery test is frequently described as follows: an examinee responds to n dichotomously scored test items. Depending upon the examinee's observed (number correct) score, a mastery decision is made and the examinee is advanced to the next level of instruction. Otherwise, a nonmastery decision is made and the examinee is given remedial work. This…
Pedraza, Otto; Graff-Radford, Neill R.; Smith, Glenn E.; Ivnik, Robert J.; Willis, Floyd B.; Petersen, Ronald C.; Lucas, John A.
2010-01-01
Scores on the Boston Naming Test (BNT) are frequently lower for African American adults than for Caucasian adults. Although demographically-based norms can mitigate the impact of this discrepancy on the likelihood of erroneous diagnostic impressions, a growing consensus suggests that group norms do not sufficiently address or advance our understanding of the underlying psychometric and sociocultural factors that lead to between-group score discrepancies. Using item response theory and methods to detect differential item functioning (DIF), the current investigation moves beyond comparisons of the summed total score to examine whether the conditional probability of responding correctly to individual BNT items differs between African American and Caucasian adults. Participants included 670 adults age 52 and older who took part in Mayo's Older Americans and Older African Americans Normative Studies. Under a 2-parameter logistic IRT framework and after correction for the false discovery rate, 12 items were shown to demonstrate DIF. Six of these 12 items (“dominoes,” “escalator,” “muzzle,” “latch,” “tripod,” and “palette”) were also identified in additional analyses using hierarchical logistic regression models and represent the strongest evidence for race/ethnicity-based DIF. These findings afford a finer characterization of the psychometric properties of the BNT and expand our understanding of between-group performance. PMID:19570311
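Two ingredients named in this abstract can be sketched briefly in Python: the 2-parameter logistic (2PL) item response function, and a false discovery rate correction applied to per-item p-values. The abstract does not say which FDR procedure was used, so the Benjamini-Hochberg step-up procedure below is an assumption. The function names and the p-values are hypothetical and are not results from the study.

```python
import math


def two_pl_probability(theta, a, b):
    """2PL model: probability of a correct response given ability theta,
    item discrimination a, and item difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def benjamini_hochberg(p_values, q=0.05):
    """Return the sorted indices of hypotheses rejected at FDR level q.

    Step-up rule: find the largest rank r such that the r-th smallest
    p-value is <= (r / m) * q, then reject the r smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])


# Hypothetical per-item DIF p-values (one per item).
example_p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
flagged_items = benjamini_hochberg(example_p_values, q=0.05)
print(flagged_items)
```

Under the 2PL model, DIF means that two groups with the same ability `theta` have different item parameters `a` or `b`, and therefore different probabilities of a correct response on that item.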
Commercial portion-controlled foods in research studies: how accurate are label weights?
Conway, Joan M; Rhodes, Donna G; Rumpler, William V
2004-09-01
The purpose of this study was to evaluate the reliability of label weights as surrogates for actual weights in commercial portion-controlled foods used in a research setting. Actual weights of replicate samples of 82 portion-controlled food items and 17 discrete units of food from larger packaging were determined over time. Comparison was made to the package label weights for the portion-controlled food items and the per-serving weights for the discrete units. The study was conducted at the US Department of Agriculture's Beltsville Human Nutrition Research Center's Human Study Facility, which houses a metabolic kitchen and human nutrition research facility. The primary outcome measures were the actual and label weights of 99 food items consumed by human volunteers during controlled feeding studies. Statistical analyses performed The difference between label and actual weights was tested by the paired t test for those data that complied with the assumptions of normality. The Wilcoxon signed rank test was used for the remainder of the data. Compliance with federal guidelines for packaged weights was also assessed. There was no statistical difference between actual and label weights for only 37 food items. The actual weights of 15 portion-controlled food items were 1% or more less than label weights, making them potentially out of compliance with federal guidelines. With advance planning and continuous monitoring, well-controlled feeding studies could incorporate portion-controlled food items and discrete units, especially beverages and confectionery products. Dietetics professionals should encourage individuals with diabetes and others on strict dietary regimens to check actual weights of portion-controlled products carefully against package weights.
2011-01-01
Background Organizational context has the potential to influence the use of new knowledge. However, despite advances in understanding the theoretical base of organizational context, its measurement has not been adequately addressed, limiting our ability to quantify and assess context in healthcare settings and thus, advance development of contextual interventions to improve patient care. We developed the Alberta Context Tool (the ACT) to address this concern. It consists of 58 items representing 10 modifiable contextual concepts. We reported the initial validation of the ACT in 2009. This paper presents the second stage of the psychometric validation of the ACT. Methods We used the Standards for Educational and Psychological Testing to frame our validity assessment. Data from 645 English speaking healthcare aides from 25 urban residential long-term care facilities (nursing homes) in the three Canadian Prairie Provinces were used for this stage of validation. In this stage we focused on: (1) advanced aspects of internal structure (e.g., confirmatory factor analysis) and (2) relations with other variables validity evidence. To assess reliability and validity of scores obtained using the ACT we conducted: Cronbach's alpha, confirmatory factor analysis, analysis of variance, and tests of association. We also assessed the performance of the ACT when individual responses were aggregated to the care unit level, because the instrument was developed to obtain unit-level scores of context. Results Item-total correlations exceeded acceptable standards (> 0.3) for the majority of items (51 of 58). We ran three confirmatory factor models. Model 1 (all ACT items) displayed unacceptable fit overall and for five specific items (1 item on adequate space for resident care in the Organizational Slack-Space ACT concept and 4 items on use of electronic resources in the Structural and Electronic Resources ACT concept). This prompted specification of two additional models. 
Model 2 used the 7 scaled ACT concepts while Model 3 used the 3 count-based ACT concepts. Both models displayed substantially improved fit in comparison to Model 1. Cronbach's alpha for the 10 ACT concepts ranged from 0.37 to 0.92 with 2 concepts performing below the commonly accepted standard of 0.70. Bivariate associations between the ACT concepts and instrumental research utilization levels (which the ACT should predict) were statistically significant at the 5% level for 8 of the 10 ACT concepts. The majority (8/10) of the ACT concepts also showed a statistically significant trend of increasing mean scores when arrayed across the lowest to the highest levels of instrumental research use. Conclusions The validation process in this study demonstrated additional empirical support for construct validity of the ACT, when completed by healthcare aides in nursing homes. The overall pattern of the data was consistent with the structure hypothesized in the development of the ACT and supports the ACT as an appropriate measure for assessing organizational context in nursing homes. Caution should be applied in using the one space and four electronic resource items that displayed misfit in this study with healthcare aides until further assessments are made. PMID:21767378
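As an illustration of the internal-consistency statistic reported in this abstract, the following is a minimal sketch of Cronbach's alpha using the standard formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores). The function name and the response matrix are hypothetical and are not ACT data.

```python
import statistics


def cronbach_alpha(rows):
    """rows: one list of item scores per respondent (same items in each)."""
    if len(rows) < 2 or len(rows[0]) < 2:
        raise ValueError("need at least 2 respondents and 2 items")
    k = len(rows[0])
    # Sample variance of each item across respondents.
    item_vars = [statistics.variance([r[j] for r in rows]) for j in range(k)]
    # Sample variance of each respondent's total score.
    total_var = statistics.variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical responses: 4 respondents x 3 items on a 1-5 scale.
responses = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [5, 4, 4],
]
alpha = cronbach_alpha(responses)
```

A value computed this way would be read against the 0.70 threshold that the abstract cites as the commonly accepted standard.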
Quantifying Media Literacy: Development, Reliability, and Validity of a New Measure
ERIC Educational Resources Information Center
Arke, Edward T.; Primack, Brian A.
2009-01-01
Media literacy has the potential to alter outcomes in various fields, including education, communication, and public health. However, measurement of media literacy remains a critical challenge in advancing this field of inquiry. In this manuscript, we describe the development and testing of a pilot measure of media literacy. Items were formed…
Ren, Xuezhu; Schweizer, Karl; Wang, Tengfei; Chu, Pei; Gong, Qin
2017-10-01
The aim of the current study is to provide new insights into the relationship between executive functions and intelligence measures in considering the item-position effect observed in intelligence items. Raven's Advanced Progressive Matrices (APM) and Horn's LPS reasoning test were used to assess fluid intelligence which served as criterion in investigating the relationship between intelligence and executive functions. A battery of six experimental tasks measured the updating, shifting, and inhibition processes of executive functions. Data were collected from 205 university students. Fluid intelligence showed substantial correlations with the updating and inhibition processes and no correlation with the shifting process without considering the item-position effect. Next, the fixed-link model was applied to APM and LPS data separately to decompose them into an ability component and an item-position component. The results of relating the components to executive functions showed that the updating and shifting processes mainly contributed to the item-position component whereas the inhibition process was mainly associated with the ability component of each fluid intelligence test. These findings suggest that improvements in the efficiency of updating and shifting processes are likely to occur during the course of completing intelligence measures and inhibition is important for intelligence in general. Copyright © 2017 Elsevier B.V. All rights reserved.
Pollard, Beth; Dixon, Diane; Dieppe, Paul; Johnston, Marie
2009-01-01
Background The International Classification of Functioning, Disability and Health (ICF) proposes three main health outcomes, Impairment (I), Activity Limitation (A) and Participation Restriction (P), but good measures of these constructs are needed. The aim of this study was to use both Classical Test Theory (CTT) and Item Response Theory (IRT) methods to carry out an item analysis to improve measurement of these three components in patients having joint replacement surgery mainly for osteoarthritis (OA). Methods A geographical cohort of patients about to undergo lower limb joint replacement was invited to participate. Five hundred and twenty four patients completed ICF items that had been previously identified as measuring only a single ICF construct in patients with osteoarthritis. There were 13 I, 26 A and 20 P items. The SF-36 was used to explore the construct validity of the resultant I, A and P measures. The CTT and IRT analyses were run separately to identify items for inclusion or exclusion in the measurement of each construct. The results from both analyses were compared and contrasted. Results Overall, the item analysis resulted in the removal of 4 I items, 9 A items and 11 P items. CTT and IRT identified the same 14 items for removal, with CTT additionally excluding 3 items, and IRT a further 7 items. In a preliminary exploration of reliability and validity, the new measures appeared acceptable. Conclusion New measures were developed that reflect the ICF components of Impairment, Activity Limitation and Participation Restriction for patients with advanced arthritis. The resulting Aberdeen IAP measures (Ab-IAP) comprising I (Ab-I, 9 items), A (Ab-A, 17 items), and P (Ab-P, 9 items) met the criteria of conventional psychometric (CTT) analyses and the additional criteria (information and discrimination) of IRT. The use of both methods was more informative than the use of only one of these methods. 
Thus combining CTT and IRT appears to be a valuable tool in the development of measures. PMID:19422677
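The IRT criteria named in this abstract (information and discrimination) can be illustrated with a minimal sketch. The abstract does not state which IRT model was fitted, so the two-parameter logistic (2PL) model used here is an assumption chosen for simplicity, and the parameter values are hypothetical. Under the 2PL model, item information at ability θ is a² · P(θ) · (1 − P(θ)).

```python
import math


def prob_2pl(theta, a, b):
    """Probability of endorsing an item under the 2PL model
    (a = discrimination, b = difficulty/location)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information contributed by one 2PL item at ability theta."""
    p = prob_2pl(theta, a, b)
    return a * a * p * (1.0 - p)


# Hypothetical items: same location, different discrimination.
info_low = item_information(0.0, a=1.0, b=0.0)   # weakly discriminating item
info_high = item_information(0.0, a=2.0, b=0.0)  # strongly discriminating item
```

In this sketch the more discriminating item contributes four times the information at its location, which is one reason low-discrimination items are candidates for removal in an IRT-based item analysis.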
Technology development status at McDonnell Douglas
NASA Technical Reports Server (NTRS)
Rowe, W. T.
1981-01-01
The significant technology items of the Concorde and the conceptual MCD baseline advanced supersonic transport are compared. The four major improvements are in the areas of range performance, structures (materials), aerodynamics, and in community noise. Presentation charts show aerodynamic efficiency; the reoptimized wing; low scale lift/drag ratio; control systems; structural modeling and analysis; weight and cost comparisons for superplasticity diffusion bonded titanium sandwich structures and for aluminum brazed titanium honeycomb structures; operating cost reduction; suppressor nozzles; noise reduction and range; the bicone inlet; a market summary; environmental issues; high priority items; the titanium wing and fuselage test components; and technology validation.
Improving Inpatient Surveys: Web-Based Computer Adaptive Testing Accessed via Mobile Phone QR Codes
2016-01-01
Background The National Health Service (NHS) 70-item inpatient questionnaire surveys inpatients on their perceptions of their hospitalization experience. However, it imposes more burden on the patient than other similar surveys. The literature shows that computerized adaptive testing (CAT) based on item response theory can help shorten the item length of a questionnaire without compromising its precision. Objective Our aim was to investigate whether CAT can be (1) efficient with item reduction and (2) used with quick response (QR) codes scanned by mobile phones. Methods After downloading the 2008 inpatient survey data from the Picker Institute Europe website and analyzing the difficulties of this 70-item questionnaire, we used an author-made Excel program based on the Rasch partial credit model to simulate 1000 patients’ true scores following a standard normal distribution. The CAT was compared with two other scenarios, answering all items (AAI) and the randomized selection method (RSM), in terms of item length (efficiency) and measurement accuracy. The author-made Web-based CAT program for gathering patient feedback was effectively accessed from mobile phones by scanning the QR code. Results We found that the CAT can be more efficient for patients answering questions (ie, fewer items to respond to) than either AAI or RSM without compromising its measurement accuracy. A Web-based CAT inpatient survey accessed by scanning a QR code on a mobile phone was viable for gathering inpatient satisfaction responses. Conclusions With advances in technology, patients can now be offered alternatives for providing feedback about hospitalization satisfaction. This Web-based CAT is a possible option in health care settings for reducing the number of survey items, as well as offering innovative QR code access. PMID:26935793
Improving Inpatient Surveys: Web-Based Computer Adaptive Testing Accessed via Mobile Phone QR Codes.
Chien, Tsair-Wei; Lin, Weir-Sen
2016-03-02
The National Health Service (NHS) 70-item inpatient questionnaire surveys inpatients on their perceptions of their hospitalization experience. However, it imposes more burden on the patient than other similar surveys. The literature shows that computerized adaptive testing (CAT) based on item response theory can help shorten the item length of a questionnaire without compromising its precision. Our aim was to investigate whether CAT can be (1) efficient with item reduction and (2) used with quick response (QR) codes scanned by mobile phones. After downloading the 2008 inpatient survey data from the Picker Institute Europe website and analyzing the difficulties of this 70-item questionnaire, we used an author-made Excel program based on the Rasch partial credit model to simulate 1000 patients' true scores following a standard normal distribution. The CAT was compared with two other scenarios, answering all items (AAI) and the randomized selection method (RSM), in terms of item length (efficiency) and measurement accuracy. The author-made Web-based CAT program for gathering patient feedback was effectively accessed from mobile phones by scanning the QR code. We found that the CAT can be more efficient for patients answering questions (ie, fewer items to respond to) than either AAI or RSM without compromising its measurement accuracy. A Web-based CAT inpatient survey accessed by scanning a QR code on a mobile phone was viable for gathering inpatient satisfaction responses. With advances in technology, patients can now be offered alternatives for providing feedback about hospitalization satisfaction. This Web-based CAT is a possible option in health care settings for reducing the number of survey items, as well as offering innovative QR code access.
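The item-selection step at the heart of a CAT can be sketched as follows. This is a simplified illustration, not the authors' program: it assumes the dichotomous Rasch model rather than the partial credit model named in the abstract, and the function names and item difficulties are hypothetical. At each step the sketch picks the unanswered item with the greatest information at the current ability estimate, which under the Rasch model is the item whose difficulty lies closest to that estimate.

```python
import math


def rasch_prob(theta, b):
    """Probability of a positive response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def item_info(theta, b):
    """Rasch item information at ability theta: P * (1 - P)."""
    p = rasch_prob(theta, b)
    return p * (1.0 - p)


def next_item(theta, difficulties, administered):
    """Index of the most informative item not yet administered, or None."""
    candidates = [i for i in range(len(difficulties)) if i not in administered]
    if not candidates:
        return None
    return max(candidates, key=lambda i: item_info(theta, difficulties[i]))


# Hypothetical item difficulties in logits.
difficulties = [-2.0, -0.5, 0.3, 1.5]
first = next_item(0.0, difficulties, set())
second = next_item(0.0, difficulties, {first})
```

A full CAT would re-estimate the respondent's ability after every answer and stop once a precision target is reached; both steps are omitted here.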
Validating the Assessment for Measuring Indonesian Secondary School Students Performance in Ecology
NASA Astrophysics Data System (ADS)
Rachmatullah, A.; Roshayanti, F.; Ha, M.
2017-09-01
The aims of the current study are to validate the American Association for the Advancement of Science (AAAS) Ecology assessment and to examine the performance of Indonesian secondary school students on the assessment. A total of 611 Indonesian secondary school students (218 middle school students and 393 high school students) participated in the study. Forty-five items of the AAAS assessment on the topic of Interdependence in Ecosystems were divided into two versions, each of which has 21 similar items. The linking item method was used to combine the two versions of the assessment, and Rasch analyses were then used to validate the instrument. An independent sample t-test was also run to compare the performance of Indonesian students and American students based on the mean of item difficulty. We found that, of the total of 45 items, three were identified as misfitting. We also found that both Indonesian middle and high school students performed significantly lower than American students, with very large and medium effect sizes. We discuss our findings with regard to validation issues and their connection to Indonesian students’ science literacy.
The Bangor Voice Matching Test: A standardized test for the assessment of voice perception ability.
Mühl, Constanze; Sheil, Orla; Jarutytė, Lina; Bestelmeyer, Patricia E G
2017-11-09
Recognising the identity of conspecifics is an important yet highly variable skill. Approximately 2 % of the population suffers from a socially debilitating deficit in face recognition. More recently the existence of a similar deficit in voice perception has emerged (phonagnosia). Face perception tests have been readily available for years, advancing our understanding of underlying mechanisms in face perception. In contrast, voice perception has received less attention, and the construction of standardized voice perception tests has been neglected. Here we report the construction of the first standardized test for voice perception ability. Participants make a same/different identity decision after hearing two voice samples. Item Response Theory guided item selection to ensure the test discriminates between a range of abilities. The test provides a starting point for the systematic exploration of the cognitive and neural mechanisms underlying voice perception. With a high test-retest reliability (r=.86) and short assessment duration (~10 min) this test examines individual abilities reliably and quickly and therefore also has potential for use in developmental and neuropsychological populations.
Measuring Advance Care Planning: Optimizing the Advance Care Planning Engagement Survey.
Sudore, Rebecca L; Heyland, Daren K; Barnes, Deborah E; Howard, Michelle; Fassbender, Konrad; Robinson, Carole A; Boscardin, John; You, John J
2017-04-01
A validated 82-item Advance Care Planning (ACP) Engagement Survey measures a broad range of behaviors. However, concise surveys are needed. The objective of this study was to validate shorter versions of the survey. The survey included 57 process (e.g., readiness) and 25 action items (e.g., discussions). For item reduction, we systematically eliminated questions based on face validity, item nonresponse, redundancy, ceiling effects, and factor analysis. We assessed internal consistency (Cronbach's alpha) and construct validity with cross-sectional correlations and the ability of the progressively shorter survey versions to detect change one week after exposure to an ACP intervention (Pearson correlation coefficients). Five hundred one participants (four Canadian and three US sites) were included in item reduction (mean age 69 years [±10], 41% nonwhite). Because of high correlations between readiness and action items, all action items were removed. Because of high correlations and ceiling effects, two process items were removed. Successive factor analysis then created 55-, 34-, 15-, nine-, and four-item versions; 664 participants (from three US ACP clinical trials) were included in validity analysis (age 65 years [±8], 72% nonwhite, 34% Spanish speaking). Cronbach's alphas were high for all versions (four items 0.84; 55 items 0.97). Compared with the original survey, cross-sectional correlations were high (four items 0.85; 55 items 0.97) as were delta correlations (four items 0.68; 55 items 0.93). Shorter versions of the ACP Engagement Survey are valid, internally consistent, and able to detect change across a broad range of ACP behaviors for English and Spanish speakers. Shorter ACP surveys can efficiently measure broad ACP behaviors in research and clinical settings. Published by Elsevier Inc.
Methodology for Developing and Evaluating the PROMIS® Smoking Item Banks
Cai, Li; Stucky, Brian D.; Tucker, Joan S.; Shadel, William G.; Edelen, Maria Orlando
2014-01-01
Introduction: This article describes the procedures used in the PROMIS® Smoking Initiative for the development and evaluation of item banks, short forms (SFs), and computerized adaptive tests (CATs) for the assessment of 6 constructs related to cigarette smoking: nicotine dependence, coping expectancies, emotional and sensory expectancies, health expectancies, psychosocial expectancies, and social motivations for smoking. Methods: Analyses were conducted using response data from a large national sample of smokers. Items related to each construct were subjected to extensive item factor analyses and evaluation of differential item functioning (DIF). Final item banks were calibrated, and SF assessments were developed for each construct. The performance of the SFs and the potential use of the item banks for CAT administration were examined through simulation study. Results: Item selection based on dimensionality assessment and DIF analyses produced item banks that were essentially unidimensional in structure and free of bias. Simulation studies demonstrated that the constructs could be accurately measured with a relatively small number of carefully selected items, either through fixed SFs or CAT-based assessment. Illustrative results are presented, and subsequent articles provide detailed discussion of each item bank in turn. Conclusions: The development of the PROMIS smoking item banks provides researchers with new tools for measuring smoking-related constructs. The use of the calibrated item banks and suggested SF assessments will enhance the quality of score estimates, thus advancing smoking research. Moreover, the methods used in the current study, including innovative approaches to item selection and SF construction, may have general relevance to item bank development and evaluation. PMID:23943843
Multi-step routes of capuchin monkeys in a laser pointer traveling salesman task.
Howard, Allison M; Fragaszy, Dorothy M
2014-09-01
Prior studies have claimed that nonhuman primates plan their routes multiple steps in advance. However, a recent reexamination of multi-step route planning in nonhuman primates indicated that there is no evidence for planning more than one step ahead. We tested multi-step route planning in capuchin monkeys using a pointing device to "travel" to distal targets while stationary. This device enabled us to determine whether capuchins distinguish the spatial relationship between goals and themselves and spatial relationships between goals and the laser dot, allocentrically. In Experiment 1, two subjects were presented with identical food items in Near-Far (one item nearer to subject) and Equidistant (both items equidistant from subject) conditions with a laser dot visible between the items. Subjects moved the laser dot to the items using a joystick. In the Near-Far condition, one subject demonstrated a bias for items closest to self but the other subject chose efficiently. In the second experiment, subjects retrieved three food items in similar Near-Far and Equidistant arrangements. Both subjects preferred food items nearest the laser dot and showed no evidence of multi-step route planning. We conclude that these capuchins do not make choices on the basis of multi-step look ahead strategies. © 2014 Wiley Periodicals, Inc.
Halm, Margo A
2018-05-14
Proficiency in evidence-based practice (EBP) is essential for relevant research findings to be integrated into clinical care when congruent with patient preferences. Few valid and reliable tools are available to evaluate the effectiveness of educational programs in advancing EBP attitudes, knowledge, skills, or behaviors, and ongoing competency. The Fresno test is one objective method to evaluate EBP knowledge and skills; however, the original and modified versions were validated with family physicians, physical therapists, and speech and language therapists. The aim was to adapt the Modified Fresno-Acute Care Nursing test and develop a psychometrically sound tool for use in academic and practice settings. In Phase 1, modified Fresno (Tilson, 2010) items were adapted for acute care nursing. In Phase 2, content validity was established with an expert panel. Item content validity indices (I-CVI) ranged from .75 to 1.0. Scale CVI was .95. A cross-sectional convenience sample of acute care nurses (n = 90) in novice, master, and expert cohorts completed the Modified Fresno-Acute Care Nursing test administered electronically via SurveyMonkey. Total scores were significantly different between training levels (p < .0001). Novice nurses scored significantly lower than master or expert nurses, but differences were not found between the latter cohorts. Total score reliability was acceptable (interrater ICC [2, 1] = .88). Cronbach's alpha was 0.70. Psychometric properties of most modified items were satisfactory; however, six require further revision and testing to meet acceptable standards. The Modified Fresno-Acute Care Nursing test is a 14-item test for objectively assessing EBP knowledge and skills of acute care nurses. While preliminary psychometric properties for this new EBP knowledge measure for acute care nursing are promising, further validation of some of the items and scoring rubric is needed. © 2018 Sigma Theta Tau International.
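The content validity indices reported in this abstract can be illustrated with a minimal sketch. It assumes the common convention in which each expert rates an item's relevance on a 4-point scale, the item-level index (I-CVI) is the proportion of experts giving a rating of 3 or 4, and the scale-level index is the average of the I-CVIs; the abstract does not state which rating scale or scale-level method was used, and the panel ratings below are hypothetical.

```python
def item_cvi(ratings, relevant_min=3):
    """Proportion of experts rating the item as relevant (>= relevant_min)."""
    if not ratings:
        raise ValueError("at least one expert rating is required")
    return sum(1 for r in ratings if r >= relevant_min) / len(ratings)


def scale_cvi_ave(panel):
    """Scale-level CVI (averaging method): mean of the item-level CVIs."""
    return sum(item_cvi(ratings) for ratings in panel) / len(panel)


# Hypothetical panel: 3 items, each rated by 4 experts on a 1-4 scale.
panel = [
    [4, 4, 3, 4],
    [4, 3, 2, 4],
    [3, 4, 4, 4],
]
i_cvis = [item_cvi(ratings) for ratings in panel]
s_cvi = scale_cvi_ave(panel)
```

Both indices are proportions between 0 and 1, which is why values such as .75 and .95 are reported without a percent sign.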
Lohse, Barbara; Satter, Ellyn; Arnold, Kristen
2014-04-01
Accurate early assessment and targeted intervention with problematic parent/child feeding dynamics is critical for the prevention and treatment of child obesity. The division of responsibility in feeding (sDOR), articulated by the Satter Feeding Dynamics Model (fdSatter), has been demonstrated clinically as an effective approach to reduce child feeding problems, including those leading to obesity. Lack of a tested instrument to examine adherence to fdSatter stimulated initial construction of the Satter Feeding Dynamics Inventory (fdSI). The aim of this project was to refine the item pool to establish translational validity, making the fdSI suitable for advanced psychometric analysis. Cognitive interviews (n = 80) with caregivers of varied socioeconomic strata informed revisions that demonstrated face and content validity. fdSI responses were mapped to interviews using an iterative, multi-phase thematic approach to provide an instrument ready for construct validation. fdSI development required five interview phases over 32 months: Foundational; Refinement; Transitional; Assurance; and Launching. Each phase was associated with item reduction and revision. Thirteen items were removed from the 38-item Foundational phase and seven were revised in the Refinement phase. Revisions, deletions, and additions prompted by Transitional and Assurance phase interviews resulted in the 15-item Launching phase fdSI. Only one Foundational phase item was carried through all development phases, emphasizing the need to test for item comprehension and interpretation before psychometric analyses. Psychometric studies of item pools without encrypted meanings will facilitate progress toward a tool that accurately detects adherence to sDOR. Ability to measure sDOR will facilitate focus on feeding behaviors associated with reduced risk of childhood obesity.
Roberts, Chris; Zoanetti, Nathan; Rothnie, Imogene
2009-04-01
The multiple mini-interview (MMI) was initially designed to test non-cognitive characteristics related to professionalism in entry-level students. However, it may be testing cognitive reasoning skills. Candidates to medical and dental schools come from diverse backgrounds and it is important for the validity and fairness of the MMI that these background factors do not impact on their scores. A suite of advanced psychometric techniques drawn from item response theory (IRT) was used to validate an MMI question bank in order to establish the conceptual equivalence of the questions. Bias against candidate subgroups of equal ability was investigated using differential item functioning (DIF) analysis. All 39 questions had a good fit to the IRT model. Of the 195 checklist items, none were found to have significant DIF after visual inspection of expected score curves, consideration of the number of applicants per category, and evaluation of the magnitude of the DIF parameter estimates. The question bank contains items that have been studied carefully in terms of model fit and DIF. Questions appear to measure a cognitive unidimensional construct, 'entry-level reasoning skills in professionalism', as suggested by goodness-of-fit statistics. The lack of items exhibiting DIF is encouraging in a contemporary high-stakes admission setting where candidates of diverse personal, cultural and academic backgrounds are assessed by common means. This IRT approach has potential to provide assessment designers with a quality control procedure that extends to the level of checklist items.
Arraras, Juan Ignacio; Wintner, Lisa M; Sztankay, Monika; Tomaszewski, Krzysztof A; Hofmeister, Dirk; Costantini, Anna; Bredart, Anne; Young, Teresa; Kuljanic, Karin; Tomaszewska, Iwona M; Kontogianni, Meropi; Chie, Wei-Chu; Kulis, Dagmara; Greimel, Eva
2017-05-01
Communication between patients and professionals is one major aspect of the support offered to cancer patients. The European Organisation for Research and Treatment of Cancer (EORTC) Quality of Life Group (QLG) has developed a cancer-specific instrument for the measurement of different issues related to the communication between cancer patients and their health care professionals. Questionnaire development followed the EORTC QLG Module Development Guidelines. A provisional questionnaire was pre-tested (phase III) in a multicenter study within ten countries from five cultural areas (Northern and South Europe, UK, Poland and Taiwan). Patients from seven subgroups (before, during and after treatment, for localized and advanced disease each, plus palliative patients) were recruited. Structured interviews were conducted. Qualitative and quantitative analyses have been performed. One hundred forty patients were interviewed. Nine items were deleted and one shortened. Patients' comments had a key role in item selection. No item was deleted due to just quantitative criteria. Consistency was observed in patients' answers across cultural areas. The revised version of the module EORTC QLQ-COMU26 has 26 items, organized in 6 scales and 4 individual items. The EORTC COMU26 questionnaire can be used in daily clinical practice and research, in various patient groups from different cultures. The next step will be an international field test with a large heterogeneous group of cancer patients.
Advanced quantitative measurement methodology in physics education research
NASA Astrophysics Data System (ADS)
Wang, Jing
The ultimate goal of physics education research (PER) is to develop a theoretical framework to understand and improve the learning process. In this journey of discovery, assessment serves as our headlamp and alpenstock. It sometimes detects signals in student mental structures, and sometimes presents the difference between expert understanding and novice understanding. Quantitative assessment is an important area in PER. Developing research-based effective assessment instruments and making meaningful inferences based on these instruments have always been important goals of the PER community. Quantitative studies are often conducted to provide bases for test development and result interpretation. Statistics are frequently used in quantitative studies. The selection of statistical methods and interpretation of the results obtained by these methods shall be connected to the education background. In this connecting process, the issues of educational models are often raised. Many widely used statistical methods do not make assumptions on the mental structure of subjects, nor do they provide explanations tailored to the educational audience. There are also other methods that consider the mental structure and are tailored to provide strong connections between statistics and education. These methods often involve model assumption and parameter estimation, and are complicated mathematically. The dissertation provides a practical view of some advanced quantitative assessment methods. The common feature of these methods is that they all make educational/psychological model assumptions beyond the minimum mathematical model. The purpose of the study is to provide a comparison between these advanced methods and the pure mathematical methods. The comparison is based on the performance of the two types of methods under physics education settings. In particular, the comparison uses both physics content assessments and scientific ability assessments. 
The dissertation includes three parts. The first part involves the comparison between item response theory (IRT) and classical test theory (CTT). The two theories both provide test item statistics for educational inferences and decisions. The two theories are both applied to Force Concept Inventory data obtained from students enrolled in The Ohio State University. Effort was made to examine the similarity and difference between the two theories, and the possible explanation to the difference. The study suggests that item response theory is more sensitive to the context and conceptual features of the test items than classical test theory. The IRT parameters provide a better measure than CTT parameters for the educational audience to investigate item features. The second part of the dissertation is on the measure of association for binary data. In quantitative assessment, binary data is often encountered because of its simplicity. The current popular measures of association fail under some extremely unbalanced conditions. However, the occurrence of these conditions is not rare in educational data. Two popular association measures, the Pearson's correlation and the tetrachoric correlation are examined. A new method, model based association is introduced, and an educational testing constraint is discussed. The existing popular methods are compared with the model based association measure with and without the constraint. Connections between the value of association and the context and conceptual features of questions are discussed in detail. Results show that all the methods have their advantages and disadvantages. Special attention to the test and data conditions is necessary. The last part of the dissertation is focused on exploratory factor analysis (EFA). The theoretical advantages of EFA are discussed. Typical misunderstanding and misusage of EFA are explored. 
The EFA is performed on Lawson's Classroom Test of Scientific Reasoning (LCTSR), a widely used assessment on scientific reasoning skills. The reasoning ability structures for U.S. and Chinese students at different educational levels are given by the analysis. A final discussion on the advanced quantitative assessment methodology and the pure mathematical methodology is presented at the end.
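The sensitivity of Pearson's correlation to unbalanced binary data, which the second part of this abstract discusses, can be illustrated with a minimal sketch. For two binary items, Pearson's correlation reduces to the phi coefficient of the 2×2 table. The counts below are hypothetical, and whether this particular effect is the failure mode the dissertation examines is an assumption.

```python
import math


def phi_coefficient(n11, n10, n01, n00):
    """Pearson correlation between two binary items from 2x2 counts.
    n11: both correct, n10: only first correct,
    n01: only second correct, n00: both wrong."""
    denom = math.sqrt(
        (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    )
    if denom == 0:
        raise ValueError("an item has no variance; phi is undefined")
    return (n11 * n00 - n10 * n01) / denom


# Balanced items, perfectly consistent responses.
phi_balanced = phi_coefficient(n11=50, n10=0, n01=0, n00=50)

# Unbalanced items: 90% pass the first, 10% pass the second, and
# everyone who passes the second also passes the first.
phi_unbalanced = phi_coefficient(n11=10, n10=80, n01=0, n00=10)
```

In the unbalanced table the responses are as consistent as the marginal pass rates allow, yet phi stays near 0.11, so a low value need not indicate that two questions are conceptually unrelated.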
Methodology for developing and evaluating the PROMIS smoking item banks.
Hansen, Mark; Cai, Li; Stucky, Brian D; Tucker, Joan S; Shadel, William G; Edelen, Maria Orlando
2014-09-01
This article describes the procedures used in the PROMIS Smoking Initiative for the development and evaluation of item banks, short forms (SFs), and computerized adaptive tests (CATs) for the assessment of 6 constructs related to cigarette smoking: nicotine dependence, coping expectancies, emotional and sensory expectancies, health expectancies, psychosocial expectancies, and social motivations for smoking. Analyses were conducted using response data from a large national sample of smokers. Items related to each construct were subjected to extensive item factor analyses and evaluation of differential item functioning (DIF). Final item banks were calibrated, and SF assessments were developed for each construct. The performance of the SFs and the potential use of the item banks for CAT administration were examined through simulation study. Item selection based on dimensionality assessment and DIF analyses produced item banks that were essentially unidimensional in structure and free of bias. Simulation studies demonstrated that the constructs could be accurately measured with a relatively small number of carefully selected items, either through fixed SFs or CAT-based assessment. Illustrative results are presented, and subsequent articles provide detailed discussion of each item bank in turn. The development of the PROMIS smoking item banks provides researchers with new tools for measuring smoking-related constructs. The use of the calibrated item banks and suggested SF assessments will enhance the quality of score estimates, thus advancing smoking research. Moreover, the methods used in the current study, including innovative approaches to item selection and SF construction, may have general relevance to item bank development and evaluation. © The Author 2013. Published by Oxford University Press on behalf of the Society for Research on Nicotine and Tobacco. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Bender, Andrew R.; Raz, Naftali
2012-01-01
Ability to form new associations between unrelated items is particularly sensitive to aging, but the reasons for such differential vulnerability are unclear. In this study, we examined the role of objective and subjective factors (working memory and beliefs about memory strategies) on differential relations of age with recognition of items and associations. Healthy adults (N = 100, age 21 to 79) studied word pairs, completed item and association recognition tests, and rated the effectiveness of shallow (e.g., repetition) and deep (e.g., imagery or sentence generation) encoding strategies. Advanced age was associated with reduced working memory (WM) capacity and poorer associative recognition. In addition, reduced WM capacity, beliefs in the utility of ineffective encoding strategies, and lack of endorsement of effective ones were independently associated with impaired associative memory. Thus, maladaptive beliefs about memory in conjunction with reduced cognitive resources account in part for differences in associative memory commonly attributed to aging. PMID:22251381
A practical guide to assessing clinical decision-making skills using the key features approach.
Farmer, Elizabeth A; Page, Gordon
2005-12-01
This paper in the series on professional assessment provides a practical guide to writing key features problems (KFPs). Key features problems test clinical decision-making skills in written or computer-based formats. They are based on the concept of critical steps or 'key features' in decision making and represent an advance on the older, less reliable patient management problem (PMP) formats. The practical steps in writing these problems are discussed and illustrated by examples. Steps include assembling problem-writing groups, selecting a suitable clinical scenario or problem and defining its key features, writing the questions, selecting question response formats, preparing scoring keys, reviewing item quality and item banking. The KFP format provides educators with a flexible approach to testing clinical decision-making skills with demonstrated validity and reliability when constructed according to the guidelines provided.
Antunes, Bárbara; Murtagh, Fliss; Bausewein, Claudia; Harding, Richard; Higginson, Irene J
2015-02-01
Depression is common among patients with advanced disease but often difficult to detect. To assess the Palliative care Outcome Scale (POS) (10 items) against the Geriatric Depression Scale (GDS)-10 total score and the Hospital Anxiety and Depression Scale (HADS)-Depression subscale total score and determine if the POS has appropriate items to screen for depression among people with advanced disease. This was a secondary analysis performed on five studies. Four psychometric properties were assessed: data quality, scaling assumptions, acceptability, and internal consistency (reliability). Receiver operating characteristic (ROC) curves were used to determine the area under the curve. Sensitivity, specificity, positive and negative predictive values, false positive and negative rates, and positive and negative likelihood ratios were computed. The overall sample had 416 patients from Germany and England: 144 had cancer and 267 had nonmalignant conditions. Prevalence of depression across the sample was 17.5%. Floor and ceiling effects were rare. Cronbach's alpha coefficients for POS items 7 and 8 summed, GDS-10 and HADS-Depression items varied: 0.61 (heart failure) and 0.80 (cancer). Two items combined (Item 7-feeling depressed and Item 8-feeling good about yourself) consistently presented the highest area under the ROC curve, ranging from 0.76 (95% CI 0.60, 0.93) (Germany, lung cancer) to 0.97 (95% CI 0.91, 1.0) (heart failure), highest negative predictive value, and lowest false negative rate. For the overall sample, the cutoff 2/3 presented a negative predictive value of 89.4% (95% CI 84.7, 92.8) and false negative rate of 10.6 (95% CI 7.2, 15.3). POS items 7 and 8 summed are potentially useful to screen for depression in advanced disease populations. Copyright © 2015 American Academy of Hospice and Palliative Medicine. Published by Elsevier Inc. All rights reserved.
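The screening statistics listed in this abstract all derive from a 2×2 table of screen result against reference diagnosis. The following is a minimal Python sketch under stated assumptions: the function name `screening_metrics` and the counts in the usage line are hypothetical, since the abstract does not report the underlying table. Note that the reported false negative rate (10.6) is the complement of the reported negative predictive value (89.4%), which corresponds to the quantity labeled `false_omission_rate` below rather than to 1 − sensitivity; that reading is an inference from the numbers, not something the abstract states.

```python
def screening_metrics(tp, fp, fn, tn):
    """Compute common screening statistics from a 2x2 table.

    tp/fp/fn/tn are counts of true positives, false positives,
    false negatives, and true negatives (hypothetical inputs).
    """
    sensitivity = tp / (tp + fn)          # cases correctly flagged
    specificity = tn / (tn + fp)          # non-cases correctly cleared
    ppv = tp / (tp + fp)                  # positive predictive value
    npv = tn / (tn + fn)                  # negative predictive value
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        # Likelihood ratios combine sensitivity and specificity.
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
        # Share of negative screens that are actually cases (1 - NPV).
        "false_omission_rate": fn / (tn + fn),
    }


if __name__ == "__main__":
    # Invented counts, for illustration only.
    for name, value in screening_metrics(tp=40, fp=60, fn=10, tn=90).items():
        print(f"{name}: {value:.3f}")
```

A high negative predictive value is what makes a brief screener useful for ruling out depression, which is why the abstract emphasizes it alongside the ROC results.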
24 CFR 242.48 - Insured advances for certain equipment and long lead items.
Code of Federal Regulations, 2010 CFR
2010-04-01
... 24 Housing and Urban Development 2 2010-04-01 2010-04-01 false Insured advances for certain equipment and long lead items. 242.48 Section 242.48 Housing and Urban Development Regulations Relating to Housing and Urban Development (Continued) OFFICE OF ASSISTANT SECRETARY FOR HOUSING-FEDERAL HOUSING...
Mazefsky, Carla A; Yu, Lan; White, Susan W; Siegel, Matthew; Pilkonis, Paul A
2018-06-01
Individuals with autism spectrum disorder (ASD) often present with prominent emotion dysregulation that requires treatment but can be difficult to measure. The Emotion Dysregulation Inventory (EDI) was created using methods developed by the Patient-Reported Outcomes Measurement Information System (PROMIS®) to capture observable indicators of poor emotion regulation. Caregivers of 1,755 youth with ASD completed 66 candidate EDI items, and the final 30 items were selected based on classical test theory and item response theory (IRT) analyses. The analyses identified two factors: (a) Reactivity, characterized by intense, rapidly escalating, sustained, and poorly regulated negative emotional reactions, and (b) Dysphoria, characterized by anhedonia, sadness, and nervousness. The final items did not show differential item functioning (DIF) based on gender, age, intellectual ability, or verbal ability. Because the final items were calibrated using IRT, even a small number of items offers high precision, minimizing respondent burden. IRT co-calibration of the EDI with related measures demonstrated its superiority in assessing the severity of emotion dysregulation with as few as seven items. Validity of the EDI was supported by expert review, its association with related constructs (e.g., anxiety and depression symptoms, aggression), higher scores in psychiatric inpatients with ASD compared to a community ASD sample, and demonstration of test-retest stability and sensitivity to change. In sum, the EDI provides an efficient and sensitive method to measure emotion dysregulation for clinical assessment, monitoring, and research in youth with ASD of any level of cognitive or verbal ability. Autism Res 2018, 11: 928-941. © 2018 International Society for Autism Research, Wiley Periodicals, Inc.
This paper describes a new measure of poor emotional control called the Emotion Dysregulation Inventory (EDI). Caregivers of 1,755 youth with ASD completed candidate items, and advanced statistical techniques were applied to identify the best final items. The EDI is unique because it captures common emotional problems in ASD and is appropriate for both nonverbal and verbal youth. It is an efficient and sensitive measure for use in clinical assessments, monitoring, and research with youth with ASD.
Cataudella, Danielle; Morley, Tara Elise; Nesin, April; Fernandez, Conrad V; Johnston, Donna Lynn; Sung, Lillian; Zelcer, Shayna
2014-10-01
There are currently no published, validated measures available that comprehensively capture quality of life (QoL) symptoms for children with poor-prognosis malignancies. The pediatric advanced care-quality of life scale (PAC-QoL) has been developed to address this gap. The current paper describes the first two phases in the development of this measure. The first two phases included: (1) construct and item generation, and (2) preliminary content validation. Domains of QoL relevant to this population were identified from the literature and items generated to capture each; items were then adapted to create versions sensitive to age/developmental differences. Two types of experts reviewed the draft PAC-QoL and rated items for relevance, understandability, and sensitivity of wording: bereaved parents (n = 8) and health care professionals (HCP; n = 7). Content validity was calculated using the index of content validity (CVI [Lynn. Nurs Res 1986;35:382-385]). One hundred and forty-one candidate items congruent with the domains identified as relevant to children with advanced malignancies were generated, and four report versions with a 5-choice response scale created. Parent mean scores for importance, understandability, and sensitivity of wording ranged from 4.29 (SD = 0.52) to 4.66 (SD = 0.50). The CVI ranged from 95% to 100%. These steps resulted in reductions of the PAC-QoL to 57-65 items, as well as a modification of the response scale to a 4-choice option with new anchors. The next phase of this study will be to conduct cognitive probing with the intended population to further modify and reduce candidate items prior to psychometric evaluation. © 2014 Wiley Periodicals, Inc.
NASA Astrophysics Data System (ADS)
Slater, Stephanie
2009-05-01
The Test Of Astronomy STandards (TOAST) assessment instrument is a multiple-choice survey tightly aligned to the consensus learning goals stated by the American Astronomical Society - Chair's Conference on ASTRO 101, the American Association for the Advancement of Science's Project 2061 Benchmarks, and the National Research Council's National Science Education Standards. Researchers from the Cognition in Astronomy, Physics and Earth sciences Research (CAPER) Team at the University of Wyoming's Science and Math Teaching Center (UWYO SMTC) have been conducting a question-by-question distractor analysis procedure to determine the sensitivity and effectiveness of each item. In brief, the frequency of each possible answer choice, known as a foil or distractor on a multiple-choice test, is determined and compared to the existing literature on the teaching and learning of astronomy. In addition to having statistical difficulty and discrimination values, a well-functioning assessment item will show students selecting distractors in relative proportions consistent with how we expect them to respond based on known misconceptions and reasoning difficulties. Our distractor analysis suggests that all items are functioning as expected. These results add weight to the validity of the Test Of Astronomy STandards (TOAST) assessment instrument, which is designed to help instructors and researchers measure the impact of course-length instructional strategies for undergraduate science survey courses with learning goals tightly aligned to the consensus goals of the astronomy education community.
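The core step of the distractor analysis described above, tallying how often each answer choice is selected for an item, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the CAPER Team's procedure: the function name `distractor_proportions`, the five-option answer format, and the sample responses are all hypothetical.

```python
from collections import Counter


def distractor_proportions(responses, choices="ABCDE"):
    """Return the proportion of examinees selecting each answer choice.

    responses: sequence of single-letter answers for ONE item.
    choices:   the answer options offered (assumed five options here).
    """
    if not responses:
        raise ValueError("responses must not be empty")
    counts = Counter(responses)
    n = len(responses)
    # Include every offered choice, even those nobody selected.
    return {choice: counts.get(choice, 0) / n for choice in choices}


if __name__ == "__main__":
    # Invented responses for one item, for illustration only.
    item_responses = ["A", "A", "B", "C", "A", "B", "A", "D"]
    for choice, share in distractor_proportions(item_responses).items():
        print(f"{choice}: {share:.3f}")
```

The resulting proportions would then be compared, by hand or against a coded table, with the misconceptions documented in the astronomy education literature; a distractor that nobody selects, or that outdraws the keyed answer, flags an item for review.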
Buiza, Cristina; Yanguas, Javier; Zulaica, Amaia; Antón, Iván; Arriola, Enrique; García, Alvaro
2018-04-13
Adaptation and validation in the Basque language of tests to assess advanced cognitive impairment is an unmet need for Basque-speaking people. The present work reports the validation of the Basque version of the Severe Mini Mental State Examination (SMMSE), referred to as the SMMSE-eus. A total of 109 people with advanced dementia (MEC<15) took part in the validation study, and were classified as stages 5-7 on the Global Deterioration Scale (GDS). All participants were Spanish-Basque bilingual. The SMMSE-eus showed high internal consistency (alpha=0.92), good test-retest reliability (r=0.88; P<.01), and high inter-rater reliability (ICC=0.99; P<.00) for the overall score, as well as for each item. The high internal consistency and inter-rater reliability and, to a lesser extent, the test-retest reliability make the SMMSE-eus a valid test for the brief assessment of cognitive status in Basque-speaking people with advanced dementia. For this reason, the SMMSE-eus is a usable and reliable alternative for assessing Basque-speaking people in their mother tongue or preferred language. Copyright © 2017 SEGG. Published by Elsevier España, S.L.U. All rights reserved.
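Internal consistency of the kind reported here is usually summarized with Cronbach's alpha, defined for k items as alpha = k/(k-1) × (1 − sum of item variances / variance of total scores). The Python sketch below implements that textbook formula; the function name `cronbach_alpha` and the score matrix are hypothetical, and nothing here reproduces the study's data or software.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items score matrix.

    scores: list of respondents, each a list of k item scores.
    Uses sample variances (n - 1 denominator) throughout.
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_columns = list(zip(*scores))                  # one tuple per item
    item_variance_sum = sum(variance(col) for col in item_columns)
    total_variance = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


if __name__ == "__main__":
    # Invented scores: 4 respondents, 3 items.
    example = [[2, 3, 3], [4, 4, 5], [1, 2, 2], [5, 5, 4]]
    print(f"alpha = {cronbach_alpha(example):.3f}")
```

Alpha rises when items covary strongly relative to their individual variances, so a value such as the 0.92 reported above indicates that the items move together across respondents.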
Selecting Items for Criterion-Referenced Tests.
ERIC Educational Resources Information Center
Mellenbergh, Gideon J.; van der Linden, Wim J.
1982-01-01
Three item selection methods for criterion-referenced tests are examined: the classical theory of item difficulty and item-test correlation; the latent trait theory of item characteristic curves; and a decision-theoretic approach for optimal item selection. Item contribution to the standardized expected utility of mastery testing is discussed. (CM)
Validation of Single-Item Screening Measures for Provider Burnout in a Rural Health Care Network.
Waddimba, Anthony C; Scribani, Melissa; Nieves, Melinda A; Krupa, Nicole; May, John J; Jenkins, Paul
2016-06-01
We validated three single-item measures for emotional exhaustion (EE) and depersonalization (DP) among rural physician/nonphysician practitioners. We linked cross-sectional survey data (on provider demographics, satisfaction, resilience, and burnout) with administrative information from an integrated health care network (1 academic medical center, 6 community hospitals, 31 clinics, and 19 school-based health centers) in an eight-county underserved area of upstate New York. In total, 308 physicians and advanced-practice clinicians completed a self-administered, multi-instrument questionnaire (65.1% response rate). Significant proportions of respondents reported high EE (36.1%) and DP (9.9%). In multivariable linear mixed models, scores on EE/DP subscales of the Maslach Burnout Inventory were regressed on each single-item measure. The Physician Work-Life Study's single-item measure (classifying 32.8% of respondents as burning out/completely burned out) was correlated with EE and DP (Spearman's ρ = .72 and .41, p < .0001; Kruskal-Wallis χ² = 149.9 and 56.5, p < .0001, respectively). In multivariable models, it predicted high EE (but neither low EE nor low/high DP). EE/DP single items were correlated with parent subscales (Spearman's ρ = .89 and .81, p < .0001; Kruskal-Wallis χ² = 230.98 and 197.84, p < .0001, respectively). In multivariable models, the EE item predicted high/low EE, whereas the DP item predicted only low DP. Therefore, the three single-item measures tested varied in effectiveness as screeners for EE/DP dimensions of burnout. © The Author(s) 2015.
Schlenstedt, Christian; Brombacher, Stephanie; Hartwigsen, Gesa; Weisser, Burkhard; Möller, Bettina; Deuschl, Günther
2016-04-01
The correct identification of patients with Parkinson disease (PD) at risk for falling is important to initiate appropriate treatment early. This study compared the Fullerton Advanced Balance (FAB) scale with the Mini-Balance Evaluation Systems Test (Mini-BESTest) and Berg Balance Scale (BBS) to identify individuals with PD at risk for falls and to analyze which of the items of the scales best predict future falls. This was a prospective study to assess predictive criterion-related validity. The study was conducted at a university hospital in an urban community. Eighty-five patients with idiopathic PD (Hoehn and Yahr stages: 1-4) participated in the study. Measures were number of falls (assessed prospectively over 6 months), FAB scale, Mini-BESTest, BBS, and Unified Parkinson's Disease Rating Scale. The FAB scale, Mini-BESTest, and BBS showed similar accuracy to predict future falls, with values for area under the curve (AUC) of the receiver operating characteristic (ROC) curve of 0.68, 0.65, and 0.69, respectively. A model combining the items "tandem stance," "rise to toes," "one-leg stance," "compensatory stepping backward," "turning," and "placing alternate foot on stool" had an AUC of 0.84 of the ROC curve. There was a dropout rate of 19/85 participants. The FAB scale, Mini-BESTest, and BBS provide moderate capacity to predict "fallers" (people with one or more falls) from "nonfallers." Only some items of the 3 scales contribute to the detection of future falls. Clinicians should particularly focus on the item "tandem stance" along with the items "one-leg stance," "rise to toes," "compensatory stepping backward," "turning 360°," and "placing foot on stool" when analyzing postural control deficits related to fall risk. Future research should analyze whether balance training including the aforementioned items is effective in reducing fall risk. © 2016 American Physical Therapy Association.
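The area under the ROC curve reported for each balance scale can be read as the probability that a randomly chosen faller receives a higher risk score than a randomly chosen nonfaller, with ties counted as one half. The Python sketch below computes the AUC directly from that pairwise definition. It is an illustration under stated assumptions: the function name `roc_auc` and the scores are hypothetical, and because lower balance-scale scores would normally indicate higher fall risk, a scale total would need to be negated (or otherwise reversed) before being passed in as a risk score.

```python
def roc_auc(risk_scores_positive, risk_scores_negative):
    """AUC via pairwise comparison (equivalent to the Mann-Whitney U statistic).

    risk_scores_positive: risk scores for cases (e.g. fallers).
    risk_scores_negative: risk scores for non-cases (e.g. nonfallers).
    Higher scores are assumed to mean higher risk.
    """
    if not risk_scores_positive or not risk_scores_negative:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for pos in risk_scores_positive:
        for neg in risk_scores_negative:
            if pos > neg:
                wins += 1.0      # case ranked above non-case
            elif pos == neg:
                wins += 0.5      # tie counts as half
    return wins / (len(risk_scores_positive) * len(risk_scores_negative))


if __name__ == "__main__":
    # Invented risk scores, for illustration only.
    fallers = [3, 4, 5]
    nonfallers = [1, 2, 4]
    print(f"AUC = {roc_auc(fallers, nonfallers):.3f}")
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation, which is why values of 0.65 to 0.69 are described in the abstract as moderate.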
Esplen, Mary Jane; Cappelli, Mario; Wong, Jiahui; Bottorff, Joan L; Hunter, Jon; Carroll, June; Dorval, Michel; Wilson, Brenda; Allanson, Judith; Semotiuk, Kara; Aronson, Melyssa; Bordeleau, Louise; Charlemagne, Nicole; Meschino, Wendy
2013-01-01
Objectives: To develop a brief, reliable and valid instrument to screen psychosocial risk among those who are undergoing genetic testing for Adult-Onset Hereditary Disease (AOHD). Design: A prospective two-phase cohort study. Setting: 5 genetic testing centres for AOHD, such as cancer, Huntington's disease or haemochromatosis, in ambulatory clinics of tertiary hospitals across Canada. Participants: 141 individuals undergoing genetic testing were approached and consented to the instrument development phase of the study (Phase I). The Genetic Psychosocial Risk Instrument (GPRI) developed in Phase I was tested in Phase II for item refinement and validation. A separate cohort of 722 individuals consented to the study, 712 completed the baseline package and 463 completed all follow-up assessments. Most participants were female, at the mid-life stage. Individuals in advanced stages of the illness or with cognitive impairment or a language barrier were excluded. Interventions: Phase I: GPRI items were generated from (1) a review of the literature, (2) input from genetic counsellors and (3) phase I participants. Phase II: further item refinement and validation were conducted with a second cohort of participants who completed the GPRI at baseline and were followed for psychological distress 1-month postgenetic testing results. Primary and secondary outcome measures: GPRI, Hamilton Depression Rating Scale (HAM-D), Hamilton Anxiety Rating Scale (HAM-A), Brief Symptom Inventory (BSI) and Impact of Event Scale (IES). Results: The final 20-item GPRI had a high reliability, with Cronbach's α at 0.81. The construct validity was supported by high correlations between GPRI and BSI and IES. The predictive value was demonstrated by a receiver operating characteristic curve of 0.78 plotting GPRI against follow-up assessments using HAM-D and HAM-A. Conclusions: With a cut-off score of 50, GPRI identified 84% of participants who displayed distress postgenetic testing results, supporting its potential usefulness in a clinical setting. PMID:23485718
Zargar, Homayoun; van den Bergh, Roderick; Moon, Daniel; Lawrentschuk, Nathan; Costello, Anthony; Murphy, Declan
2017-01-01
To assess the impact of the United States Preventive Services Task Force (USPSTF) recommendations on prostate-specific antigen (PSA) testing, prostate biopsy, and prostatectomy in Australian men based on the available Medicare data. Events were identified using Medicare item numbers for PSA testing (66655, 66659), prostate biopsy (37219), prostatectomy (37210), and prostatectomy with lymph node dissection (37211). The occurrence of each procedure was queried per 100 000 capita for consecutive financial years over the period 2000-2015. For each item number, reports were also generated for all Australian States. For PSA testing the data was stratified into three age groups of 45-54, 55-64, and 65-74 years. For assessing the rate of prostatectomy, the per capita rates for the two item numbers of prostatectomy (37210) and prostatectomy with lymph node dissection (37211) were combined. Steady declines in per capita incidences of all five item numbers assessed were seen for the three consecutive financial years (2013-2015) since the publication of the USPSTF recommendation statement. These declines were seen across all Australian States. When examining the rate of PSA testing for the three age brackets 45-54, 55-64, and 65-74 years, similar trends were identified. Since the introduction of the USPSTF recommendation statement there has been a steady nationwide decline in per capita incidences of PSA testing, prostate biopsy, and prostatectomy based on the Australian Medicare data. Whether these declines are in the right direction toward reduction in over-diagnosis and overtreatment of clinically insignificant prostate cancer or stage migration toward more locally advanced disease due to lost opportunity in diagnosing and treating early clinically significant prostate cancer remains to be seen. © 2016 The Authors BJU International © 2016 BJU International Published by John Wiley & Sons Ltd.
Analysis of the construct of dignity and content validity of the patient dignity inventory
2011-01-01
Background: Maintaining dignity, the quality of being worthy of esteem or respect, is considered as a goal of palliative care. The aim of this study was to analyse the construct of personal dignity and to assess the content validity of the Patient Dignity Inventory (PDI) in people with an advance directive in the Netherlands. Methods: Data were collected within the framework of an advance directives cohort study. This cohort study is aiming to get a better insight into how decisions are made at the end of life with regard to advance directives in the Netherlands. One half of the cohort (n = 2404) received an open-ended question concerning factors relevant to dignity. Content labels were assigned to issues mentioned in the responses to the open-ended question. The other half of the cohort (n = 2537) received a written questionnaire including the PDI. The relevance and comprehensiveness of the PDI items were assessed with the COSMIN checklist ('COnsensus-based Standards for the selection of health status Measurement INstruments'). Results: The majority of the PDI items were found to be relevant for the construct to be measured, the study population, and the purpose of the study but the items were not completely comprehensive. The responses to the open-ended question indicated that communication and care-related aspects were also important for dignity. Conclusions: This study demonstrated that the PDI items were relevant for people with an advance directive in the Netherlands. The comprehensiveness of the items can be improved by including items concerning communication and care. PMID:21682924
Analysis of the construct of dignity and content validity of the patient dignity inventory.
Albers, Gwenda; Pasman, H Roeline W; Rurup, Mette L; de Vet, Henrica C W; Onwuteaka-Philipsen, Bregje D
2011-06-19
Maintaining dignity, the quality of being worthy of esteem or respect, is considered as a goal of palliative care. The aim of this study was to analyse the construct of personal dignity and to assess the content validity of the Patient Dignity Inventory (PDI) in people with an advance directive in the Netherlands. Data were collected within the framework of an advance directives cohort study. This cohort study is aiming to get a better insight into how decisions are made at the end of life with regard to advance directives in the Netherlands. One half of the cohort (n = 2404) received an open-ended question concerning factors relevant to dignity. Content labels were assigned to issues mentioned in the responses to the open-ended question. The other half of the cohort (n = 2537) received a written questionnaire including the PDI. The relevance and comprehensiveness of the PDI items were assessed with the COSMIN checklist ('COnsensus-based Standards for the selection of health status Measurement INstruments'). The majority of the PDI items were found to be relevant for the construct to be measured, the study population, and the purpose of the study but the items were not completely comprehensive. The responses to the open-ended question indicated that communication and care-related aspects were also important for dignity. This study demonstrated that the PDI items were relevant for people with an advance directive in the Netherlands. The comprehensiveness of the items can be improved by including items concerning communication and care.
NASA Technical Reports Server (NTRS)
Fink, Patrick; Arndt, G. D.; Bondyopadhyay, P.; Shaw, Roland
1994-01-01
A communications experiment is described as a link between the Space Shuttle Orbiter (SSO) and the Advanced Communications Technology Satellite (ACTS). Breadboarding for this experiment has led to two items with potential for commercial application: a 1-Watt Ka-band amplifier and a Ka-band, circularly polarized microstrip antenna. Results of the hybrid Ka-band amplifier show gain at 30 dB and a saturated output power of 28.5 dBm. A second version comprised of MMIC amplifiers is discussed. Test results of the microstrip antenna subarray show a gain of approximately 13 dB and excellent circular polarization.
ERIC Educational Resources Information Center
Burns, Daniel J.; Martens, Nicholas J.; Bertoni, Alicia A.; Sweeney, Emily J.; Lividini, Michelle D.
2006-01-01
In a repeated testing paradigm, list items receiving item-specific processing are more likely to be recovered across successive tests (item gains), whereas items receiving relational processing are likely to be forgotten progressively less on successive tests. Moreover, analysis of cumulative-recall curves has shown that item-specific processing…
Balsis, Steve; Choudhury, Tabina K; Geraci, Lisa; Benge, Jared F; Patrick, Christopher J
2018-04-01
Alzheimer's disease (AD) affects neurological, cognitive, and behavioral processes. Thus, to accurately assess this disease, researchers and clinicians need to combine and incorporate data across these domains. This presents not only distinct methodological and statistical challenges but also unique opportunities for the development and advancement of psychometric techniques. In this article, we describe relatively recent research using item response theory (IRT) that has been used to make progress in assessing the disease across its various symptomatic and pathological manifestations. We focus on applications of IRT to improve scoring, test development (including cross-validation and adaptation), and linking and calibration. We conclude by describing potential future multidimensional applications of IRT techniques that may improve the precision with which AD is measured.
ERIC Educational Resources Information Center
Matlock, Ki Lynn; Turner, Ronna
2016-01-01
When constructing multiple test forms, the number of items and the total test difficulty are often equivalent. Not all test developers match the number of items and/or average item difficulty within subcontent areas. In this simulation study, six test forms were constructed having an equal number of items and average item difficulty overall.…
ERIC Educational Resources Information Center
Spaan, Mary
2007-01-01
This article follows the development of test items (see "Language Assessment Quarterly", Volume 3 Issue 1, pp. 71-79 for the article "Test and Item Specifications Development"), beginning with a review of test and item specifications, then proceeding to writing and editing of items, pretesting and analysis, and finally selection of an item for a…
ERIC Educational Resources Information Center
Hewitt, Margaret A.; Homan, Susan P.
2004-01-01
Test validity issues considered by test developers and school districts rarely include individual item readability levels. In this study, items from a major standardized test were examined for individual item readability level and item difficulty. The Homan-Hewitt Readability Formula was applied to items across three grade levels. Results of…
Retest of a Principal Components Analysis of Two Household Environmental Risk Instruments.
Oneal, Gail A; Postma, Julie; Odom-Maryon, Tamara; Butterfield, Patricia
2016-08-01
Household Risk Perception (HRP) and Self-Efficacy in Environmental Risk Reduction (SEERR) instruments were developed for a public health nurse-delivered intervention designed to reduce home-based, environmental health risks among rural, low-income families. The purpose of this study was to test both instruments in a second low-income population that differed geographically and economically from the original sample. Participants (N = 199) were recruited from the Women, Infants, and Children (WIC) program. Paper-and-pencil surveys were collected at WIC sites by research-trained student nurses. Exploratory principal components analysis (PCA) was conducted, and comparisons were made to the original PCA for the purpose of data reduction. Instruments showed satisfactory Cronbach alpha values for all components. HRP components were reduced from five to four, which explained 70% of variance. The components were labeled sensed risks, unseen risks, severity of risks, and knowledge. In contrast to the original testing, environmental tobacco smoke (ETS) items were not a separate component of the HRP. The SEERR analysis demonstrated four components explaining 71% of variance, with similar patterns of items as in the first study, including a component on ETS, but some differences in item location. Although low-income populations constituted both samples, differences in demographics and risk exposures may have played a role in component and item locations. Findings provided justification for changing or reducing items, and for tailoring the instruments to population-level risks and behaviors. Although analytic refinement will continue, both instruments advance the measurement of environmental health risk perception and self-efficacy. © 2016 Wiley Periodicals, Inc.
The Effect of the Position of an Item within a Test on the Item Difficulty Value.
ERIC Educational Resources Information Center
Rubin, Lois S.; Mott, David E. W.
An investigation of the effect on the difficulty value of an item due to position placement within a test was made. Using a 60-item operational test comprised of 5 subtests, 60 items were placed as experimental items on a number of spiralled test forms in three different positions (first, middle, last) within the subtest composed of like items.…
ERIC Educational Resources Information Center
Marie, S. Maria Josephine Arokia; Edannur, Sreekala
2015-01-01
This paper focused on the analysis of test items constructed in the paper of teaching Physical Science for B.Ed. class. It involved the analysis of difficulty level and discrimination power of each test item. Item analysis allows selecting or omitting items from the test, but more importantly item analysis is a tool to help the item writer improve…
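The two statistics named in this abstract, difficulty level and discrimination power, have standard classical-test-theory definitions: difficulty is the proportion of examinees answering an item correctly, and the upper-lower discrimination index is the difference in that proportion between the highest- and lowest-scoring groups. The Python sketch below is a minimal illustration under stated assumptions, not the authors' procedure: the function name `item_analysis`, the conventional 27% group fraction, and the response matrix are all hypothetical.

```python
def item_analysis(responses, fraction=0.27):
    """Classical item difficulty and upper-lower discrimination.

    responses: list of examinees, each a list of 0/1 item scores.
    fraction:  share of examinees placed in the upper and lower groups
               (27% is a common convention; assumed here).
    Returns a list of (difficulty, discrimination) tuples, one per item.
    """
    n = len(responses)
    k = len(responses[0])
    # Rank examinees by total score, highest first.
    ranked = sorted(responses, key=sum, reverse=True)
    group_size = max(1, round(n * fraction))
    upper, lower = ranked[:group_size], ranked[-group_size:]

    results = []
    for j in range(k):
        difficulty = sum(row[j] for row in responses) / n
        upper_correct = sum(row[j] for row in upper)
        lower_correct = sum(row[j] for row in lower)
        discrimination = (upper_correct - lower_correct) / group_size
        results.append((difficulty, discrimination))
    return results


if __name__ == "__main__":
    # Invented 0/1 responses: 4 examinees, 3 items.
    data = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
    for idx, (p, d) in enumerate(item_analysis(data, fraction=0.5), start=1):
        print(f"item {idx}: difficulty={p:.2f}, discrimination={d:.2f}")
```

Items with difficulty near 0 or 1, or with discrimination near zero or negative, are the usual candidates for revision or omission.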
12 CFR 950.12 - Intradistrict transfer of advances.
Code of Federal Regulations, 2010 CFR
2010-01-01
... 12 Banks and Banking 7 2010-01-01 2010-01-01 false Intradistrict transfer of advances. 950.12 Section 950.12 Banks and Banking FEDERAL HOUSING FINANCE BOARD FEDERAL HOME LOAN BANK ASSETS AND OFF-BALANCE SHEET ITEMS ADVANCES Advances to Members § 950.12 Intradistrict transfer of advances. (a) Advances...
ERIC Educational Resources Information Center
Wang, Wei
2013-01-01
Mixed-format tests containing both multiple-choice (MC) items and constructed-response (CR) items are now widely used in many testing programs. Mixed-format tests often are considered to be superior to tests containing only MC items although the use of multiple item formats leads to measurement challenges in the context of equating conducted under…
A summary of NASA/Air Force Full Scale Engine Research programs using the F100 engine
NASA Technical Reports Server (NTRS)
Deskin, W. J.; Hurrell, H. G.
1979-01-01
This paper summarizes a joint NASA/Air Force Full Scale Engine Research (FSER) program conducted with the F100 engine during the period 1974 through 1979. The program mechanism is described and the F100 test vehicles utilized are illustrated. Technology items which have been addressed in the areas of swirl augmentation, flutter phenomenon, advanced electronic control logic theory, strain gage technology, and distortion sensitivity are identified and the associated test programs conducted at the NASA-Lewis Research Center are described. Results presented show that the FSER approach, which utilizes existing state-of-the-art engine hardware to evaluate advanced technology concepts and problem areas, can contribute a significant data base for future system applications. Aerodynamic phenomena not previously considered by current design systems have been identified and incorporated into current industry design tools.
Test item linguistic complexity and assessments for deaf students.
Cawthon, Stephanie
2011-01-01
Linguistic complexity of test items is one test format element that has been studied in the context of struggling readers and their participation in paper-and-pencil tests. The present article presents findings from an exploratory study on the potential relationship between linguistic complexity and test performance for deaf readers. A total of 64 students completed 52 multiple-choice items, 32 in mathematics and 20 in reading. These items were coded for linguistic complexity components of vocabulary, syntax, and discourse. Mathematics items had higher linguistic complexity ratings than reading items, but there were no significant relationships between item linguistic complexity scores and student performance on the test items. The discussion addresses issues related to the subject area, student proficiency levels in the test content, factors to look for in determining a "linguistic complexity effect," and areas for further research in test item development and deaf students.
A Single-System Model Predicts Recognition Memory and Repetition Priming in Amnesia
Kessels, Roy P.C.; Wester, Arie J.; Shanks, David R.
2014-01-01
We challenge the claim that there are distinct neural systems for explicit and implicit memory by demonstrating that a formal single-system model predicts the pattern of recognition memory (explicit) and repetition priming (implicit) in amnesia. In the current investigation, human participants with amnesia categorized pictures of objects at study and then, at test, identified fragmented versions of studied (old) and nonstudied (new) objects (providing a measure of priming), and made a recognition memory judgment (old vs new) for each object. Numerous results in the amnesic patients were predicted in advance by the single-system model, as follows: (1) deficits in recognition memory and priming were evident relative to a control group; (2) items judged as old were identified at greater levels of fragmentation than items judged new, regardless of whether the items were actually old or new; and (3) the magnitude of the priming effect (the identification advantage for old vs new items) overall was greater than that of items judged new. Model evidence measures also favored the single-system model over two formal multiple-systems models. The findings support the single-system model, which explains the pattern of recognition and priming in amnesia primarily as a reduction in the strength of a single dimension of memory strength, rather than a selective explicit memory system deficit. PMID:25122896
Crows spontaneously exhibit analogical reasoning.
Smirnova, Anna; Zorina, Zoya; Obozova, Tanya; Wasserman, Edward
2015-01-19
Analogical reasoning is vital to advanced cognition and behavioral adaptation. Many theorists deem analogical thinking to be uniquely human and to be foundational to categorization, creative problem solving, and scientific discovery. Comparative psychologists have long been interested in the species generality of analogical reasoning, but they initially found it difficult to obtain empirical support for such thinking in nonhuman animals (for pioneering efforts, see [2, 3]). Researchers have since mustered considerable evidence and argument that relational matching-to-sample (RMTS) effectively captures the essence of analogy, in which the relevant logical arguments are presented visually. In RMTS, choice of test pair BB would be correct if the sample pair were AA, whereas choice of test pair EF would be correct if the sample pair were CD. Critically, no items in the correct test pair physically match items in the sample pair, thus demanding that only relational sameness or differentness is available to support accurate choice responding. Initial evidence suggested that only humans and apes can successfully learn RMTS with pairs of sample and test items; however, monkeys have subsequently done so. Here, we report that crows too exhibit relational matching behavior. Even more importantly, crows spontaneously display relational responding without ever having been trained on RMTS; they had only been trained on identity matching-to-sample (IMTS). Such robust and uninstructed relational matching behavior represents the most convincing evidence yet of analogical reasoning in a nonprimate species, as apes alone have spontaneously exhibited RMTS behavior after only IMTS training. Copyright © 2015 Elsevier Ltd. All rights reserved.
The Selection of Test Items for Decision Making with a Computer Adaptive Test.
ERIC Educational Resources Information Center
Spray, Judith A.; Reckase, Mark D.
The issue of test-item selection in support of decision making in adaptive testing is considered. The number of items needed to make a decision is compared for two approaches: selecting items from an item pool that are most informative at the decision point or selecting items that are most informative at the examinee's ability level. The first…
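The two selection rules compared in this abstract can be sketched with an item information function. The abstract does not name the response model, so the two-parameter logistic (2PL) model, the item pool, the cut score, and the ability estimate below are all assumptions chosen for illustration.

```python
from math import exp


def prob_correct(theta, a, b):
    """2PL probability of a correct response (model choice is an assumption)."""
    return 1.0 / (1.0 + exp(-a * (theta - b)))


def information(theta, a, b):
    """Fisher information of a 2PL item at ability theta: a^2 * P * (1 - P)."""
    p = prob_correct(theta, a, b)
    return a * a * p * (1.0 - p)


def most_informative(pool, theta):
    """Name of the item in the pool with maximum information at theta."""
    return max(pool, key=lambda name: information(theta, *pool[name]))


# Hypothetical pool: name -> (discrimination a, difficulty b).
pool = {"easy": (1.0, -1.0), "medium": (1.0, 0.0), "hard": (1.0, 1.5)}
cut_score = 0.0          # hypothetical decision point
ability_estimate = 1.4   # hypothetical current estimate for one examinee

print(most_informative(pool, cut_score))         # rule 1: decision point
print(most_informative(pool, ability_estimate))  # rule 2: examinee's ability
```

For an examinee far from the cut score the two rules pick different items, which is why the number of items needed to reach a decision can differ between them.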
Tepe, Rodger; Tepe, Chabha
2015-03-01
To develop and psychometrically evaluate an information literacy (IL) self-efficacy survey and an IL knowledge test. In this test-retest reliability study, a 25-item IL self-efficacy survey and a 50-item IL knowledge test were developed and administered to a convenience sample of 53 chiropractic students. Item analyses were performed on all questions. The IL self-efficacy survey demonstrated good reliability (test-retest correlation = 0.81) and good/very good internal consistency (mean κ = .56 and Cronbach's α = .92). A total of 25 questions with the best item analysis characteristics were chosen from the 50-item IL knowledge test, resulting in a 25-item IL knowledge test that demonstrated good reliability (test-retest correlation = 0.87), very good internal consistency (mean κ = .69, KR20 = 0.85), and good item discrimination (mean point-biserial = 0.48). This study resulted in the development of three instruments: a 25-item IL self-efficacy survey, a 50-item IL knowledge test, and a 25-item IL knowledge test. The information literacy self-efficacy survey and the 25-item version of the information literacy knowledge test have shown preliminary evidence of adequate reliability and validity to justify continuing study with these instruments.
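The internal-consistency statistics reported in this abstract (Cronbach's α and KR20) can be sketched as follows. This is a minimal illustration on a made-up 0/1 score matrix, not the authors' analysis; population variances are used throughout so that the two coefficients coincide for dichotomous items.

```python
from statistics import pvariance


def cronbach_alpha(responses):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of totals)."""
    k = len(responses[0])
    item_vars = sum(pvariance([row[j] for row in responses]) for j in range(k))
    total_var = pvariance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_vars / total_var)


def kr20(responses):
    """KR-20 for 0/1 items: item variance is p * (1 - p)."""
    k = len(responses[0])
    n = len(responses)
    pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in responses) / n
        pq += p * (1 - p)
    total_var = pvariance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - pq / total_var)


# Hypothetical 0/1 score matrix: rows are examinees, columns are items.
scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]

print(round(kr20(scores), 4), round(cronbach_alpha(scores), 4))
```

KR-20 is the special case of α for dichotomously scored items, which is why a knowledge test scored right/wrong is typically reported with KR20 and a Likert-type survey with α.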
A New Item Selection Procedure for Mixed Item Type in Computerized Classification Testing.
ERIC Educational Resources Information Center
Lau, C. Allen; Wang, Tianyou
This paper proposes a new Information-Time index as the basis for item selection in computerized classification testing (CCT) and investigates how this new item selection algorithm can help improve test efficiency for item pools with mixed item types. It also investigates how practical constraints such as item exposure rate control, test…
Semon, Natalie L.; Lating, Jeffrey M.; Everly, George S.; Perry, Charlene J.; Moore, Suzanne Straub; Mosley, Adrian M.; Thompson, Carol B.; Links, Jonathan M.
2014-01-01
Objectives: Faculty and affiliates of the Johns Hopkins Preparedness and Emergency Response Research Center partnered with local health departments and faith-based organizations to develop a dual-intervention model of capacity-building for public mental health preparedness and community resilience. Project objectives included (1) determining the feasibility of the tri-partite collaborative concept; (2) designing, delivering, and evaluating psychological first aid (PFA) training and guided preparedness planning (GPP); and (3) documenting preliminary evidence of the sustainability and impact of the model. Methods: We evaluated intervention effectiveness by analyzing pre- and post-training changes in participant responses on knowledge-acquisition tests administered to three urban and four rural community cohorts. Changes in percent of correct items and mean total correct items were evaluated. Criteria for model sustainability and impact were, respectively, observations of nonacademic partners engaging in efforts to advance post-project preparedness alliances, and project-attributable changes in preparedness-related practices of local or state governments. Results: The majority (11 of 14) of test items addressing technical or practical PFA content showed significant improvement; we observed comparable testing results for GPP training. Government and faith partners developed ideas and tools for sustaining preparedness activities, and numerous project-driven changes in local and state government policies were documented. Conclusions: Results suggest that the model could be an effective approach to promoting public health preparedness and community resilience. PMID:25355980
McCabe, O Lee; Semon, Natalie L; Lating, Jeffrey M; Everly, George S; Perry, Charlene J; Moore, Suzanne Straub; Mosley, Adrian M; Thompson, Carol B; Links, Jonathan M
2014-01-01
Faculty and affiliates of the Johns Hopkins Preparedness and Emergency Response Research Center partnered with local health departments and faith-based organizations to develop a dual-intervention model of capacity-building for public mental health preparedness and community resilience. Project objectives included (1) determining the feasibility of the tri-partite collaborative concept; (2) designing, delivering, and evaluating psychological first aid (PFA) training and guided preparedness planning (GPP); and (3) documenting preliminary evidence of the sustainability and impact of the model. We evaluated intervention effectiveness by analyzing pre- and post-training changes in participant responses on knowledge-acquisition tests administered to three urban and four rural community cohorts. Changes in percent of correct items and mean total correct items were evaluated. Criteria for model sustainability and impact were, respectively, observations of nonacademic partners engaging in efforts to advance post-project preparedness alliances, and project-attributable changes in preparedness-related practices of local or state governments. The majority (11 of 14) of test items addressing technical or practical PFA content showed significant improvement; we observed comparable testing results for GPP training. Government and faith partners developed ideas and tools for sustaining preparedness activities, and numerous project-driven changes in local and state government policies were documented. Results suggest that the model could be an effective approach to promoting public health preparedness and community resilience.
Knowledge of the ordinal position of list items in pigeons.
Scarf, Damian; Colombo, Michael
2011-10-01
Ordinal knowledge is a fundamental aspect of advanced cognition. It is self-evident that humans represent ordinal knowledge, and over the past 20 years it has become clear that nonhuman primates share this ability. In contrast, evidence that nonprimate species represent ordinal knowledge is missing from the comparative literature. To address this issue, in the present experiment we trained pigeons on three 4-item lists and then tested them with derived lists in which, relative to the training lists, the ordinal position of the items was either maintained or changed. Similar to the findings with human and nonhuman primates, our pigeons performed markedly better on the maintained lists compared to the changed lists, and displayed errors consistent with the view that they used their knowledge of ordinal position to guide responding on the derived lists. These findings demonstrate that the ability to acquire ordinal knowledge is not unique to the primate lineage. (PsycINFO Database Record (c) 2011 APA, all rights reserved).
Martirosov, Amber Lanae; Michael, Angela; McCarty, Melissa; Bacon, Opal; DiLodovico, John R; Jantz, Arin; Kostoff, Diana; MacDonald, Nancy C; Mikulandric, Nancy; Neme, Klodiana; Sulejmani, Nimisha; Summers, Bryant B
2018-05-29
The use of the ASHP Ambulatory Care Self-Assessment Tool to advance pharmacy practice at 8 ambulatory care clinics of a large academic medical center is described. The ASHP Ambulatory Care Self-Assessment Tool was developed to help ambulatory care pharmacists assess how their current practices align with the ASHP Practice Advancement Initiative. The Henry Ford Hospital Ambulatory Care Advisory Group (ACAG) opted to use the "Practitioner Track" sections of the tool to assess pharmacy practices within each of 8 ambulatory care clinics individually. The responses to self-assessment items were then compiled and discussed by ACAG members. The group identified best practices and ways to implement action items to advance ambulatory care practice throughout the institution. Three recommended action items were common to most clinics: (1) identify and evaluate solutions to deliver financially viable services, (2) develop technology to improve patient care, and (3) optimize the role of pharmacy technicians and support personnel. The ACAG leadership met with pharmacy administrators to discuss how action items that were both feasible and deemed likely to have a medium-to-high impact aligned with departmental goals and used this information to develop an ambulatory care strategic plan. This process informed and enabled initiatives to advance ambulatory care pharmacy practice within the system. The ASHP Ambulatory Care Self-Assessment Tool was useful in identifying opportunities for practice advancement in a large academic medical center. Copyright © 2018 by the American Society of Health-System Pharmacists, Inc. All rights reserved.
A Process for Reviewing and Evaluating Generated Test Items
ERIC Educational Resources Information Center
Gierl, Mark J.; Lai, Hollis
2016-01-01
Testing organizations need large numbers of high-quality items due to the proliferation of alternative test administration methods and modern test designs. But the current demand for items far exceeds the supply. Test items, as they are currently written, evoke a process that is both time-consuming and expensive because each item is written,…
ERIC Educational Resources Information Center
Banerjee, Jayanti; Papageorgiou, Spiros
2016-01-01
The research reported in this article investigates differential item functioning (DIF) in a listening comprehension test. The study explores the relationship between test-taker age and the items' language domains across multiple test forms. The data comprise test-taker responses (N = 2,861) to a total of 133 unique items, 46 items of which were…
Identifying Core Competencies of Infection Control Nurse Specialists in Hong Kong.
Chan, Wai Fong; Bond, Trevor G; Adamson, Bob; Chow, Meyrick
2016-01-01
To confirm a core competency scale for Hong Kong infection control nurses at the advanced nursing practice level from the core competency items proposed in a previous phase of this study. This would serve as the foundation of competency assurance in Hong Kong hospitals. A cross-sectional survey design was used. All public and private hospitals in Hong Kong. All infection control nurses in hospitals of Hong Kong. The 83-item proposed core competency list established in an earlier study was transformed into a questionnaire and sent to 112 infection control nurses in 48 hospitals in Hong Kong. They were asked to rate the importance of each infection prevention and control item using Likert-style response categories. Data were analyzed using the Rasch model. The response rate of 81.25% was achieved. Seven items were removed from the proposed core competency list, leaving a scale of 76 items that fit the measurement requirements of the unidimensional Rasch model. Essential core competency items of advanced practice for infection control nurses in Hong Kong were identified based on the measurement criteria of the Rasch model. Several items of the scale that reflect local Hong Kong contextual characteristics are distinguished from the overseas standards. This local-specific competency list could serve as the foundation for education and for certification of infection control nurse specialists in Hong Kong. Rasch measurement is an appropriate analytical tool for identifying core competencies of advanced practice nurses in other specialties and in other locations in a manner that incorporates practitioner judgment and expertise.
Petry, Heidi; Suter-Riederer, Susanne; Kerker-Specker, Carmen; Imhof, Lorenz
2014-12-01
Patient-centred and individually tailored counselling of older people with a chronic condition who live at home is a useful intervention to support their independence. The paper presents the development and psychometric testing of the APN-BQ instrument to measure patient-centredness. To measure the quality of an in-home counselling intervention, a 23-item questionnaire was developed and tested with 206 people 80 years and older. Principal component analysis with Varimax rotation was conducted (n = 206). Analysis revealed a four-factor (fs = 0.91) model scoring in 19 items. All factors loaded > 0.45. Cronbach's alpha was 0.86. The utility and acceptance of the instrument were confirmed by the high response rate (100%) and the fact that participants answered 98.8% of all questions. The APN-BQ has been shown to be a reliable instrument with good content and construct validity. It is a tool for APNs to measure structure, process, and outcome quality of a patient-centred and individually tailored counselling program, including the degree of patient participation and patient empowerment.
Item validity vs. item discrimination index: a redundancy?
NASA Astrophysics Data System (ADS)
Panjaitan, R. L.; Irawati, R.; Sujana, A.; Hanifah, N.; Djuanda, D.
2018-03-01
In the literature on evaluation and test analysis, it is common to find calculations of item validity as well as of the item discrimination index (D), with a different formula for each. Other sources, however, state that the item discrimination index can be obtained by calculating the correlation between the testee's score on a particular item and the testee's score on the overall test, which is the same concept as item validity. Some research reports, especially undergraduate theses, tend to include both item validity and the item discrimination index in the instrument analysis. These concepts appear to overlap, because both reflect the quality of the test in measuring the examinees' ability. In this paper, examples of data-processing results for item validity and the item discrimination index are compared. We discuss whether one of the two can represent both, or whether it is better to present both calculations for simple test analysis, especially in undergraduate theses that include test analyses.
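The correlation-based definition mentioned in this abstract can be sketched as an item-total (point-biserial) correlation. The code below is illustrative only: the `scores` matrix is made up, and the total score is left uncorrected (it includes the item itself), which is a simplifying assumption.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; with a 0/1 variable this is the point-biserial."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def item_total_correlation(responses, item):
    """Correlation between scores on one item and total test scores."""
    item_scores = [row[item] for row in responses]
    totals = [sum(row) for row in responses]  # uncorrected: includes the item
    return pearson(item_scores, totals)


# Hypothetical 0/1 score matrix: rows are examinees, columns are items.
scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]

for j in range(3):
    print(j, round(item_total_correlation(scores, j), 3))
```

If both "item validity" and "discrimination" are computed this way, they are the same number; they differ only when D is computed from upper and lower score groups instead.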
A Comparison of Three Types of Test Development Procedures Using Classical and Latent Trait Methods.
ERIC Educational Resources Information Center
Benson, Jeri; Wilson, Michael
Three methods of item selection were used to select sets of 38 items from a 50-item verbal analogies test and the resulting item sets were compared for internal consistency, standard errors of measurement, item difficulty, biserial item-test correlations, and relative efficiency. Three groups of 1,500 cases each were used for item selection. First…
ERIC Educational Resources Information Center
Çokluk, Ömay; Gül, Emrah; Dogan-Gül, Çilem
2016-01-01
The study aims to examine whether differential item function is displayed in three different test forms that have item orders of random and sequential versions (easy-to-hard and hard-to-easy), based on Classical Test Theory (CTT) and Item Response Theory (IRT) methods and bearing item difficulty levels in mind. In the correlational research, the…
Measuring Knowledge of Introductory Psychology: What Are the Relevant Constructs?
ERIC Educational Resources Information Center
Milewski, Glenn B.; Patelis, Thanos
The 1999 Advanced Placement® (AP®) Psychology Examination contains items drawn from 13 factors related to the study of psychology. This factor structure had not been explored previously. This study focuses on evaluating the fit of confirmatory factor analysis (CFA) models to examination items. Since examination items were dichotomous and…
The Effects of Test Length and Sample Size on Item Parameters in Item Response Theory
ERIC Educational Resources Information Center
Sahin, Alper; Anil, Duygu
2017-01-01
This study investigates the effects of sample size and test length on item-parameter estimation in test development utilizing three unidimensional dichotomous models of item response theory (IRT). For this purpose, a real language test comprised of 50 items was administered to 6,288 students. Data from this test was used to obtain data sets of…
[Perceptions on item disclosure for the Korean medical licensing examination].
Yang, Eunbae B
2015-09-01
This study analyzed the perceptions of medical students and faculty regarding disclosure of test items on the Korean medical licensing examination. I conducted a survey of medical students from medical colleges and professional medical schools nationwide. Responses were analyzed from 718 participants as well as 69 faculty members who participated in creating the medical licensing examination item sets. Data were analyzed using descriptive statistics and the chi-square test. It is important to maintain test quality and to keep the test items unavailable to the public. There are also concerns among students that disclosure of test items would prompt increasing difficulty of test items (48.3%). Further, few students found it desirable to disclose test items regardless of any considerations (28.5%). The professors, who had experience in designing the test items, also expressed their opposition to test item disclosure (60.9%). It is desirable not to disclose the test items of the Korean medical licensing examination to the public on the condition that students are provided with a sufficient amount of information regarding the examination. This is so that the exam can appropriately identify candidates with the required qualifications.
A Review of Classical Methods of Item Analysis.
ERIC Educational Resources Information Center
French, Christine L.
Item analysis is a very important consideration in the test development process. It is a statistical procedure to analyze test items that combines methods used to evaluate the important characteristics of test items, such as difficulty, discrimination, and distractibility of the items in a test. This paper reviews some of the classical methods for…
Modeling Item-Position Effects within an IRT Framework
ERIC Educational Resources Information Center
Debeer, Dries; Janssen, Rianne
2013-01-01
Changing the order of items between alternate test forms to prevent copying and to enhance test security is a common practice in achievement testing. However, these changes in item order may affect item and test characteristics. Several procedures have been proposed for studying these item-order effects. The present study explores the use of…
ACER Chemistry Test Item Collection. ACER Chemtic Year 12.
ERIC Educational Resources Information Center
Australian Council for Educational Research, Hawthorn.
The chemistry test item bank contains 225 multiple-choice questions suitable for diagnostic and achievement testing; a three-page teacher's guide; answer key with item facilities; an answer sheet; and a 45-item sample achievement test. Although written for the new grade 12 chemistry course in Victoria, Australia, the items are widely applicable.…
Parpa, Efi; Galanopoulou, Natasa; Tsilika, Eleni; Galanos, Antonis; Mystakidou, Kyriaki
2017-08-01
To investigate the psychometric properties of the Greek 13-item measure of patients' satisfaction (FAMCARE-P13) in a palliative care setting. One hundred patients completed the FAMCARE-P13. Exploratory factor analysis and confirmatory factor analysis (CFA) were conducted. CFA revealed a two-factor solution. The questionnaire was administered to an initial validation sample and then, for test-retest, to a sample of 40 patients 3 days later. The Rosenberg Self-Esteem Scale, which measures global self-esteem, was also used as a gold standard for construct validity. Subscale and known-groups validity were also tested for the FAMCARE-P13. A reduced 13-item version of our measure (FAMCARE-P13) possessed a 2-factor structure with high reliability. Patient satisfaction was correlated with physical distress, communication and relationship with health-care providers, and caregiver satisfaction. We recommend the use of the Greek FAMCARE-P13 to assess care satisfaction of patients with advanced-stage cancer.
Finite Element Based Optimization of Material Parameters for Enhanced Ballistic Protection
NASA Astrophysics Data System (ADS)
Ramezani, Arash; Huber, Daniel; Rothe, Hendrik
2013-06-01
The threat imposed by terrorist attacks is a major hazard for military installations, vehicles and other items. The large amounts of firearms and projectiles that are available pose serious threats to military forces and even civilian facilities. An important task for international research and development is to avert danger to life and limb. This work will evaluate the effect of modern armor with numerical simulations. It will also provide a brief overview of ballistic tests in order to offer some basic knowledge of the subject, serving as a basis for the comparison of simulation results. The objective of this work is to develop and improve the modern armor used in the security sector. Numerical simulations should replace the expensive ballistic tests and find vulnerabilities of items and structures. By progressively changing the material parameters, the armor is to be optimized. Using a sensitivity analysis, information regarding decisive variables is yielded and vulnerabilities are easily found and eliminated afterwards. To facilitate the simulation, advanced numerical techniques have been employed in the analyses.
Bolcic-Jankovic, Dragana; Lu, Fengxin; Colten, Mary Ellen; McCarthy, Ellen P
2016-02-01
We report the results from cognitive interviews with Asian American patients and their caregivers. We interviewed seven caregivers and six patients who were all bilingual Asian Americans. The main goal of the cognitive interviews was to test a survey instrument developed for a study about perspectives of Asian American patients with advanced cancer who are facing decisions around end-of-life care. We were particularly interested to see whether items commonly used in White and Black populations are culturally meaningful and equivalent in Asian populations, primarily those of Chinese and Vietnamese ethnicity. Our exploration shows that understanding respondents' language proficiency, degree of acculturation, and cultural context of receiving, processing, and communicating information about medical care can help design questions that are appropriate for Asian American patients and caregivers, and therefore can help researchers obtain quality data about the care Asian American cancer patients receive. © The Author(s) 2016.
Development and validation of the Body and Appearance Self-Conscious Emotions Scale (BASES).
Castonguay, Andrée L; Sabiston, Catherine M; Crocker, Peter R E; Mack, Diane E
2014-03-01
The purpose of these studies was to develop a psychometrically sound measure of shame, guilt, authentic pride, and hubristic pride for use in body and appearance contexts. In Study 1, 41 potential items were developed and assessed for item quality and comprehension. In Study 2, a panel of experts (N=8; M=11, SD=6.5 years of experience) reviewed the scale and items for evidence of content validity. Participants in Study 3 (n=135 males, n=300 females) completed the BASES and various body image, personality, and emotion scales. A separate sample (n=155; 35.5% male) in Study 3 completed the BASES twice using a two-week time interval. The BASES subscale scores demonstrated evidence for internal consistency, item-total correlations, concurrent, convergent, incremental, and discriminant validity, and 2-week test-retest reliability. The 4-factor solution was a good fit in confirmatory factor analysis, reflecting body-related shame, guilt, authentic and hubristic pride subscales of the BASES. The development and validation of the BASES may help advance body image and self-conscious emotion research by providing a foundation to examine the unique antecedents and outcomes of these specific emotional experiences. Copyright © 2014 Elsevier Ltd. All rights reserved.
Contraband detection using acoustic technology
NASA Astrophysics Data System (ADS)
George, Robert D.; Gauthier, Ronald D.; Denslow, Kayte D.; Cinson, Anthony M.; Diaz, Aaron A.; Griffin, Molly
2008-03-01
Maritime security personnel have a need for advanced technologies to address issues such as identification, confirmation or classification of substances and materials in sealed containers, both non-invasively and nondestructively in field and first response operations. Such substances include items such as hazardous/flammable liquids, drugs, contraband, and precursor chemicals used in the fabrication of illicit materials. Our initial efforts focused specifically on a commercial portable acoustic detector technology that was evaluated under operational conditions in a maritime environment. Technical/operational limitations were identified and enhancements were incorporated that would address these limitations. In this paper, application-specific improvements and performance testing/evaluation results will be described. Such enhancements will provide personnel/users of the detector a significantly more reliable method of screening materials for contraband items that might be hidden in cargo containers.
ERIC Educational Resources Information Center
New South Wales Dept. of Education, Sydney (Australia).
As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items are made available to teachers for the construction of unit tests or term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The test items meet syllabus…
Content validity of the NCCN-FACT ovarian symptom index-18 (NFOSI-18).
Jensen, Sally E; Kaiser, Karen; Lacson, Leilani; Schink, Julian; Cella, David
2015-02-01
This study examined the content validity of the NCCN-FACT Ovarian Symptom Index-18 (NFOSI-18), an advanced ovarian cancer symptom index comprised of symptoms perceived as most important by clinical experts and women with advanced ovarian cancer. Eighteen women with advanced ovarian cancer completed the NFOSI-18 and participated in cognitive interviews to assess: (a) the understandability of the NFOSI-18; (b) the things patients have in mind when responding to the item, "I am bothered by side effects of treatment;" and (c) the interpretation patients place on items relating to fatigue and lack of energy. Interviews were recorded and transcribed for qualitative analysis. All but 2 participants (89%) indicated that each item was clear and understandable, and the same proportion (89%) stated they were "very confident" or "confident" about providing accurate answers to all but one item. When responding to the item, "I am bothered by side effects of treatment," fatigue, nausea, and neuropathy constituted the most frequently mentioned concerns. Among the participants who were asked, eight responded that "fatigue" and "lack of energy" were the same concept and nine responded that they were different. Participants associated "fatigue" with tiredness and associated "lack of energy" with the inability to perform daily tasks and activities. The findings support the content validity of the NFOSI-18. Item revisions, deletions or additions do not appear warranted. Future research can address the reliability and validity of the NFOSI-18 in clinical research. Copyright © 2014 Elsevier Inc. All rights reserved.
Methods to Develop the Eye-tem Bank to Measure Ophthalmic Quality of Life.
Khadka, Jyoti; Fenwick, Eva; Lamoureux, Ecosse; Pesudovs, Konrad
2016-12-01
There is an increasing demand for high-standard, comprehensive, and reliable patient-reported outcome (PRO) instruments in all the disciplines of health care, including ophthalmology and optometry. Over the past two decades, a plethora of PRO instruments have been developed to assess the impact of eye diseases and their treatments. Despite this large number of instruments, significant shortcomings exist for the measurement of ophthalmic quality of life (QoL). Most PRO instruments are short-form instruments designed for clinical use, but this limits their content coverage, and they often poorly target any study population other than the one for which they were developed. Also, existing instruments are static, paper-and-pencil based, and cannot be updated easily, leading to outdated and irrelevant item content. Scores obtained from different PRO instruments may not be directly comparable. These shortcomings can be addressed using item banking implemented with computer-adaptive testing (CAT). Therefore, we designed a multicenter project (The Eye-tem Bank project) to develop and validate such PROs to enable comprehensive measurement of ophthalmic QoL in eye diseases. Development of the Eye-tem Bank follows four phases: Phase I, Content Development; Phase II, Pilot Testing and Item Calibration; Phase III, Validation; and Phase IV, Evaluation. This project will deliver technologically advanced comprehensive QoL PROs in the form of item banking implemented via a CAT system in eye diseases. Here, we present a detailed methodological framework of this project.
Brédart, A; Anota, A; Young, T; Tomaszewski, K A; Arraras, J I; Moura De Albuquerque Melo, H; Schmidt, H; Friend, E; Bergenmar, M; Costantini, A; Vassiliou, V; Hureaux, J; Marchal, F; Tomaszewska, I M; Chie, W-C; Ramage, J; Beaudeau, A; Conroy, T; Bleiker, E; Kulis, D; Bonnetain, F; Aaronson, N K
2018-01-01
Advances in cancer care delivery require revision and further development of questionnaires assessing patients' perceived quality of care. This study pre-tested the revised EORTC satisfaction with cancer care core questionnaire applicable in both the cancer inpatient and outpatient settings, and its new, outpatient-specific complementary module. The process of revision, development of the extended application, and pre-testing of these questionnaires was based on phases I to III of the "EORTC Quality of Life Group Module Development Guidelines." In phase III, patients in 11 countries in four European regions, South America and Asia completed provisional versions of the questionnaires. Fifty-seven relevant issues selected from literature reviews and input from experts were operationalized into provisional items, and subsequently translated into ten languages. Assessment of understanding, acceptability, redundancy and relevance by patients (n = 151) from oncology inpatient wards, and outpatient chemotherapy, radiotherapy and consultation settings, led to retention of, deletion of and merging of 40, 14 and 6 items respectively. Cronbach's alpha coefficients for hypothesized questionnaire scales were above 0.80. Our results provide preliminary support for the 33-item EORTC Satisfaction with cancer care core questionnaire and the 7-item complementary module specific for the outpatient care setting. A large scale phase IV cross-cultural psychometric study is now underway. © 2017 John Wiley & Sons Ltd.
Integral Color Anodizing of Aluminum Alloy 7075-T6 Upper Receivers of the M16A1 Rifle
1981-06-01
and control upper receivers were carried, fired, and maintained by soldiers in the field undergoing basic and advanced Infantry training and by other...soldiers undergoing Ranger training. The test and control items were subjected to typical field usage conditions involving rough handling, firing...ICA hardcoat treatment will provide a longer inservice life for the M16A1 rifle receivers than will the low-temperature hardcoat process.
Assembling a Computerized Adaptive Testing Item Pool as a Set of Linear Tests
ERIC Educational Resources Information Center
van der Linden, Wim J.; Ariel, Adelaide; Veldkamp, Bernard P.
2006-01-01
Test-item writing efforts typically result in item pools with an undesirable correlational structure between the content attributes of the items and their statistical information. If such pools are used in computerized adaptive testing (CAT), the algorithm may be forced to select items with less than optimal information, that violate the content…
Evaluation of Northwest University, Kano Post-UTME Test Items Using Item Response Theory
ERIC Educational Resources Information Center
Bichi, Ado Abdu; Hafiz, Hadiza; Bello, Samira Abdullahi
2016-01-01
High-stakes testing is used for the purposes of providing results that have important consequences. Validity is the cornerstone upon which all measurement systems are built. This study applied the Item Response Theory principles to analyse Northwest University Kano Post-UTME Economics test items. The developed fifty (50) economics test items were…
Item Specifications, Science Grade 8. Blue Prints for Testing Minimum Performance Test.
ERIC Educational Resources Information Center
Arkansas State Dept. of Education, Little Rock.
These item specifications were developed as a part of the Arkansas "Minimum Performance Testing Program" (MPT). There is one item specification for each instructional objective included in the MPT. The purpose of an item specification is to provide an overview of the general content and format of test items used to measure an…
Item Specifications, Science Grade 6. Blue Prints for Testing Minimum Performance Test.
ERIC Educational Resources Information Center
Arkansas State Dept. of Education, Little Rock.
These item specifications were developed as a part of the Arkansas "Minimum Performance Testing Program" (MPT). There is one item specification for each instructional objective included in the MPT. The purpose of an item specification is to provide an overview of the general content and format of test items used to measure an…
Criterion-Referenced Test Items for Welding.
ERIC Educational Resources Information Center
Davis, Diane, Ed.
This test item bank on welding contains test questions based upon competencies found in the Missouri Welding Competency Profile. Some test items are keyed for multiple competencies. These criterion-referenced test items are designed to work with the Vocational Instructional Management System. Questions have been statistically sampled and validated…
Jang, Yoonhee; Wixted, John T.; Pecher, Diane; Zeelenberg, René; Huber, David E.
2012-01-01
Even without feedback, test practice enhances delayed performance compared to study practice, but the size of the effect is variable across studies. We investigated the benefit of testing, separating initially retrievable items from initially non-retrievable items. In two experiments, an initial test determined item retrievability. Retrievable or non-retrievable items were subsequently presented for repeated study or test practice. Collapsing across items, in Experiment 1, we obtained the typical crossover interaction between retention interval and practice type. For retrievable items, however, the crossover interaction was quantitatively different, with a small study benefit for an immediate test and a larger testing benefit after a delay. For non-retrievable items, there was a large study benefit for an immediate test, but one week later there was no difference between the study and test practice conditions. In Experiment 2, initially non-retrievable items were given additional study followed by either an immediate test or even more additional study, and one week later performance did not differ between the two conditions. These results indicate that the effect size of study/test practice is due to the relative contribution of retrievable and non-retrievable items. PMID:22304454
Optimal Test Design with Rule-Based Item Generation
ERIC Educational Resources Information Center
Geerlings, Hanneke; van der Linden, Wim J.; Glas, Cees A. W.
2013-01-01
Optimal test-design methods are applied to rule-based item generation. Three different cases of automated test design are presented: (a) test assembly from a pool of pregenerated, calibrated items; (b) test generation on the fly from a pool of calibrated item families; and (c) test generation on the fly directly from calibrated features defining…
ERIC Educational Resources Information Center
New South Wales Dept. of Education, Sydney (Australia).
As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items are made available to teachers for the construction of unit tests or term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The test items meet syllabus…
ERIC Educational Resources Information Center
New South Wales Dept. of Education, Sydney (Australia).
As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items are made available to teachers for the construction of unit tests or term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The test items meet syllabus…
ERIC Educational Resources Information Center
New South Wales Dept. of Education, Sydney (Australia).
As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items are made available to teachers for the construction of unit tests or term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The test items meet syllabus…
Criterion-Referenced Test Items for Small Engines.
ERIC Educational Resources Information Center
Herd, Amon
This notebook contains criterion-referenced test items for testing students' knowledge of small engines. The test items are based upon competencies found in the Missouri Small Engine Competency Profile. The test item bank is organized in 18 sections that cover the following duties: shop procedures; tools and equipment; fasteners; servicing fuel…
An Investigation of the Impact of Guessing on Coefficient α and Reliability
2014-01-01
Guessing is known to influence the test reliability of multiple-choice tests. Although there are many studies that have examined the impact of guessing, they used rather restrictive assumptions (e.g., parallel test assumptions, homogeneous inter-item correlations, homogeneous item difficulty, and homogeneous guessing levels across items) to evaluate the relation between guessing and test reliability. Based on the item response theory (IRT) framework, this study investigated the extent of the impact of guessing on reliability under more realistic conditions where item difficulty, item discrimination, and guessing levels actually vary across items with three different test lengths (TL). By accommodating multiple item characteristics simultaneously, this study also focused on examining interaction effects between guessing and other variables entered in the simulation to be more realistic. The simulation of the more realistic conditions and calculations of reliability and classical test theory (CTT) item statistics were facilitated by expressing CTT item statistics, coefficient α, and reliability in terms of IRT model parameters. In addition to the general negative impact of guessing on reliability, results showed interaction effects between TL and guessing and between guessing and test difficulty.
12 CFR 950.2 - Authorization and application for advances; obligation to repay advances.
Code of Federal Regulations, 2010 CFR
2010-01-01
... 12 Banks and Banking 7 2010-01-01 2010-01-01 false Authorization and application for advances; obligation to repay advances. 950.2 Section 950.2 Banks and Banking FEDERAL HOUSING FINANCE BOARD FEDERAL HOME LOAN BANK ASSETS AND OFF-BALANCE SHEET ITEMS ADVANCES Advances to Members § 950.2 Authorization...
Evaluating the Psychometric Characteristics of Generated Multiple-Choice Test Items
ERIC Educational Resources Information Center
Gierl, Mark J.; Lai, Hollis; Pugh, Debra; Touchie, Claire; Boulais, André-Philippe; De Champlain, André
2016-01-01
Item development is a time- and resource-intensive process. Automatic item generation integrates cognitive modeling with computer technology to systematically generate test items. To date, however, items generated using cognitive modeling procedures have received limited use in operational testing situations. As a result, the psychometric…
Tepe, Rodger; Tepe, Chabha
2015-01-01
Objective: To develop and psychometrically evaluate an information literacy (IL) self-efficacy survey and an IL knowledge test. Methods: In this test–retest reliability study, a 25-item IL self-efficacy survey and a 50-item IL knowledge test were developed and administered to a convenience sample of 53 chiropractic students. Item analyses were performed on all questions. Results: The IL self-efficacy survey demonstrated good reliability (test–retest correlation = 0.81) and good/very good internal consistency (mean κ = .56 and Cronbach's α = .92). A total of 25 questions with the best item analysis characteristics were chosen from the 50-item IL knowledge test, resulting in a 25-item IL knowledge test that demonstrated good reliability (test–retest correlation = 0.87), very good internal consistency (mean κ = .69, KR20 = 0.85), and good item discrimination (mean point-biserial = 0.48). Conclusions: This study resulted in the development of three instruments: a 25-item IL self-efficacy survey, a 50-item IL knowledge test, and a 25-item IL knowledge test. The information literacy self-efficacy survey and the 25-item version of the information literacy knowledge test have shown preliminary evidence of adequate reliability and validity to justify continuing study with these instruments. PMID:25517736
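For readers unfamiliar with the KR20 statistic reported in the abstract above, the following is a minimal sketch of the standard Kuder-Richardson formula 20 for dichotomously scored items. It is not the authors' analysis code; the function name `kr20` and the toy response matrix are hypothetical, and the sketch assumes the population variance of total scores is used (some texts use the sample variance, which gives slightly different values).

```python
from statistics import pvariance


def kr20(responses):
    """Kuder-Richardson 20 for a matrix of 0/1 item scores.

    responses: list of examinee rows, each a list of 0/1 scores,
    one entry per item. Assumption: population variance of totals.
    """
    n_examinees = len(responses)
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]
    var_total = pvariance(totals)
    if var_total == 0:
        raise ValueError("total scores have zero variance; KR-20 is undefined")
    # Sum of p * q over items, where p is the proportion answering correctly.
    sum_pq = 0.0
    for i in range(n_items):
        p = sum(row[i] for row in responses) / n_examinees
        sum_pq += p * (1.0 - p)
    return (n_items / (n_items - 1)) * (1.0 - sum_pq / var_total)


# Hypothetical toy data: 4 examinees, 3 items.
toy = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(kr20(toy))
```

Because KR-20 is the special case of Cronbach's α for 0/1 items, the same function applied to dichotomous data should agree with an α computed using population variances.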
Integrating Test-Form Formatting into Automated Test Assembly
ERIC Educational Resources Information Center
Diao, Qi; van der Linden, Wim J.
2013-01-01
Automated test assembly uses the methodology of mixed integer programming to select an optimal set of items from an item bank. Automated test-form generation uses the same methodology to optimally order the items and format the test form. From an optimization point of view, production of fully formatted test forms directly from the item pool using…
ERIC Educational Resources Information Center
Gierl, Mark J.; Lai, Hollis
2013-01-01
Changes to the design and development of our educational assessments are resulting in the unprecedented demand for a large and continuous supply of content-specific test items. One way to address this growing demand is with automatic item generation (AIG). AIG is the process of using item models to generate test items with the aid of computer…
48 CFR 32.408 - Application for advance payments.
Code of Federal Regulations, 2010 CFR
2010-10-01
... amount of advance payments. (4) The name and address of the financial institution at which the contractor... 48 Federal Acquisition Regulations System 1 2010-10-01 2010-10-01 false Application for advance... GENERAL CONTRACTING REQUIREMENTS CONTRACT FINANCING Advance Payments for Non-Commercial Items 32.408...
A Procedure To Detect Test Bias Present Simultaneously in Several Items.
ERIC Educational Resources Information Center
Shealy, Robin; Stout, William
A statistical procedure is presented that is designed to test for unidirectional test bias existing simultaneously in several items of an ability test, based on the assumption that test bias is incipient within the two groups' ability differences. The proposed procedure--Simultaneous Item Bias (SIB)--is based on a multidimensional item response…
An Item Response Theory Model for Test Bias.
ERIC Educational Resources Information Center
Shealy, Robin; Stout, William
This paper presents a conceptualization of test bias for standardized ability tests which is based on multidimensional, non-parametric, item response theory. An explanation of how individually-biased items can combine through a test score to produce test bias is provided. It is contended that bias, although expressed at the item level, should be…
ERIC Educational Resources Information Center
Quaigrain, Kennedy; Arhin, Ato Kwamina
2017-01-01
Item analysis is essential in improving items which will be used again in later tests; it can also be used to eliminate misleading items in a test. The study focused on item and test quality and explored the relationship between difficulty index (p-value) and discrimination index (DI) with distractor efficiency (DE). The study was conducted among…
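The difficulty index (p-value) and discrimination index (DI) named in the abstract above can be illustrated with a short sketch. This uses the common classical definitions, not necessarily the exact procedure of the study: difficulty is the proportion of examinees answering correctly, and DI is the difference in correct counts between upper and lower scoring groups divided by the group size. The 27% group fraction and both function names are assumptions made for illustration; distractor efficiency is not covered here.

```python
def difficulty_index(item_scores):
    """Proportion of examinees who answered the item correctly (0/1 scores)."""
    return sum(item_scores) / len(item_scores)


def discrimination_index(item_scores, total_scores, fraction=0.27):
    """Upper-lower discrimination index for one item.

    item_scores: 0/1 score on the item for each examinee.
    total_scores: total test score for each examinee (same order).
    fraction: share of examinees in each extreme group (assumed 27%).
    """
    n = len(item_scores)
    group_size = max(1, round(n * fraction))
    # Rank examinees by total score, lowest first.
    order = sorted(range(n), key=lambda j: total_scores[j])
    lower = order[:group_size]
    upper = order[-group_size:]
    correct_upper = sum(item_scores[j] for j in upper)
    correct_lower = sum(item_scores[j] for j in lower)
    return (correct_upper - correct_lower) / group_size


# Hypothetical toy data: 10 examinees.
item = [1, 1, 1, 0, 0, 1, 0, 0, 1, 0]
totals = [9, 8, 7, 3, 2, 6, 1, 4, 10, 5]
print(difficulty_index(item), discrimination_index(item, totals))
```

Ties in total score are broken by input order in this sketch; a production analysis would need an explicit tie-handling rule.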
Code of Federal Regulations, 2010 CFR
2010-10-01
... Chief Financial Officer, Division of Accounting and Finance, to ensure completeness of contractor... REQUIREMENTS CONTRACT FINANCING Advance Payments for Non-Commercial Items 2032.402 General. (a) The contracting... contract terms concerning advance payments. (b) Before authorizing any advance payment agreements, except...
42 CFR 421.214 - Advance payments to suppliers furnishing items or services under Part B.
Code of Federal Regulations, 2010 CFR
2010-10-01
... integrity investigation. (3) Has not submitted any claims. (4) Has not accepted claims' assignments within... must determine and issue advance payments based on some other methodology approved by CMS. (v) Advance...
ERIC Educational Resources Information Center
Snyder, James
2010-01-01
This dissertation research examined the changes in item RIT calibration that occurred when adding audio to a set of currently calibrated RIT items and then placing these new items as field test items in the modified assessments on the NWEA MAP test platform. The researcher used test results from over 600 students in the Poway School District in…
Galindo-Garre, Francisca; Hendriks, Simone A; Volicer, Ladislav; Smalbrugge, Martin; Hertogh, Cees M P M; van der Steen, Jenny T
2014-02-01
The Bedford Alzheimer Nursing-Severity Scale (BANS-S) assesses disease severity in patients with advanced Alzheimer's disease. Since Alzheimer's disease is progressive, studying the hierarchy of the items in the scale can be useful to evaluate the progression of the disease. Data from 164 Alzheimer's patients and 186 patients with other dementia were analyzed using the Mokken Scaling Methodology to determine whether respondents can be ordered on the trait of dementia severity, and to study whether an ordering between the items exists. The scalability of the scale was evaluated by the H coefficient. Results showed that the BANS-S is a reliable scale of medium strength (0.4≤H<0.5) for the Alzheimer group. All items with the exception of the item about mobility could be ordered. When the latter item was eliminated from the scale, the H coefficient decreased, indicating that the scalability of the scale in the original form is higher than in the shorter version. For the other dementia group, the BANS-S did not fit any of the Mokken Scaling models because the scale was not unidimensional. In this group, a shorter version of the scale without the sleeping cycle item and the mobility item has better reliability and scalability properties than the original scale.
Student science achievement and the integration of Indigenous knowledge on standardized tests
NASA Astrophysics Data System (ADS)
Dupuis, Juliann; Abrams, Eleanor
2017-09-01
In this article, we examine how American Indian students in Montana performed on standardized state science assessments when a small number of test items based upon traditional science knowledge from a cultural curriculum, "Indian Education for All", were included. Montana is the first state in the US to mandate the use of a culturally relevant curriculum in all schools and to incorporate this curriculum into a portion of the standardized assessment items. This study compares White and American Indian student test scores on these particular test items to determine how White and American Indian students perform on culturally relevant test items compared to traditional standard science test items. The connections between student achievement on adapted culturally relevant science test items versus traditional items bring valuable insights to the fields of science education, research on student assessments, and Indigenous studies.
Computerized Adaptive Test (CAT) Applications and Item Response Theory Models for Polytomous Items
ERIC Educational Resources Information Center
Aybek, Eren Can; Demirtasli, R. Nukhet
2017-01-01
This article aims to provide a theoretical framework for computerized adaptive tests (CAT) and item response theory models for polytomous items. Besides that, it aims to introduce the simulation and live CAT software to the related researchers. Computerized adaptive test algorithm, assumptions of item response theory models, nominal response…
An Effect Size Measure for Raju's Differential Functioning for Items and Tests
ERIC Educational Resources Information Center
Wright, Keith D.; Oshima, T. C.
2015-01-01
This study established an effect size measure for differential functioning for items and tests' noncompensatory differential item functioning (NCDIF). The Mantel-Haenszel parameter served as the benchmark for developing NCDIF's effect size measure for reporting moderate and large differential item functioning in test items. The effect size of…
Detecting a Gender-Related DIF Using Logistic Regression and Transformed Item Difficulty
ERIC Educational Resources Information Center
Abedlaziz, Nabeel; Ismail, Wail; Hussin, Zaharah
2011-01-01
Test items are designed to provide information about the examinees. Difficult items are designed to be more demanding and easy items are less so. However, sometimes, test items carry with them demands other than those intended by the test developer (Scheuneman & Gerritz, 1990). When personal attributes such as gender systematically affect…
Influence of Fallible Item Parameters on Test Information During Adaptive Testing.
ERIC Educational Resources Information Center
Wetzel, C. Douglas; McBride, James R.
Computer simulation was used to assess the effects of item parameter estimation errors on different item selection strategies used in adaptive and conventional testing. To determine whether these effects reduced the advantages of certain optimal item selection strategies, simulations were repeated in the presence and absence of item parameter…
A Guide to Item Banking in Education. (Third Edition).
ERIC Educational Resources Information Center
Naccarato, Richard W.
The current status of banks of test items existing across the United States was determined through a survey conducted between September and December 1987. Item "bank" in this context does not imply that the test items are available in computerized form, but simply that "deposited" test items can be withdrawn for use. Emphasis…
48 CFR 970.3101-9 - Advance agreements.
Code of Federal Regulations, 2013 CFR
2013-10-01
... 48 Federal Acquisition Regulations System 5 2013-10-01 2013-10-01 false Advance agreements. 970....3101-9 Advance agreements. (i) At any time, in accordance with the contract terms and conditions, the contracting officer may pursue an advance agreement in connection with any cost item under a contract. ...
48 CFR 970.3101-9 - Advance agreements.
Code of Federal Regulations, 2010 CFR
2010-10-01
... 48 Federal Acquisition Regulations System 5 2010-10-01 2010-10-01 false Advance agreements. 970....3101-9 Advance agreements. (i) At any time, in accordance with the contract terms and conditions, the contracting officer may pursue an advance agreement in connection with any cost item under a contract. ...
48 CFR 970.3101-9 - Advance agreements.
Code of Federal Regulations, 2014 CFR
2014-10-01
... 48 Federal Acquisition Regulations System 5 2014-10-01 2014-10-01 false Advance agreements. 970....3101-9 Advance agreements. (i) At any time, in accordance with the contract terms and conditions, the contracting officer may pursue an advance agreement in connection with any cost item under a contract. ...
Development and validation of an energy-balance knowledge test for fourth- and fifth-grade students.
Chen, Senlin; Zhu, Xihe; Kang, Minsoo
2017-05-01
A valid test measuring children's energy-balance (EB) knowledge is lacking in research. This study developed and validated the energy-balance knowledge test (EBKT) for fourth- and fifth-grade students. The original EBKT contained 25 items but was reduced to 23 items based on pilot results and intensive expert panel discussion. De-identified data were collected from 468 fourth- and fifth-grade students enrolled in four schools to examine the psychometric properties of the EBKT items. The Rasch model analysis was conducted using the Winsteps 3.65.0 software. Differential item functioning (DIF) analysis flagged 1 item (item #4) functioning differently between boys and girls, which was deleted. The final 22-item EBKT showed desirable model-data fit indices. The item difficulties varied widely, ranging from -3.58 logits (item #10, the easiest) to 1.70 logits (item #3, the hardest). The average person ability on the test was 0.28 logits (SD = .78). Additional analyses supported known-group difference validity of the EBKT scores in capturing gender- and grade-based ability differences. The test was overall valid but could be further improved by expanding test items to discern various ability levels. For lack of a better test, researchers and practitioners may use the EBKT to assess fourth- and fifth-grade students' EB knowledge.
NASA Astrophysics Data System (ADS)
Rakkapao, Suttida; Prasitpong, Singha; Arayathanitkul, Kwan
2016-12-01
This study investigated the multiple-choice test of understanding of vectors (TUV), by applying item response theory (IRT). The difficulty, discrimination, and guessing parameters of the TUV items were fit with the three-parameter logistic model of IRT, using the PARSCALE program. The TUV ability is an ability parameter, here estimated assuming unidimensionality and local independence. Moreover, all distractors of the TUV were analyzed from item response curves (IRC) that represent simplified IRT. Data were gathered on 2392 science and engineering freshmen, from three universities in Thailand. The results revealed IRT analysis to be useful in assessing the test since its item parameters are independent of the ability parameters. The IRT framework reveals item-level information, and indicates appropriate ability ranges for the test. Moreover, the IRC analysis can be used to assess the effectiveness of the test's distractors. Both IRT and IRC approaches reveal test characteristics beyond those revealed by the classical analysis methods of tests. Test developers can apply these methods to diagnose and evaluate the features of items at various ability levels of test takers.
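The three-parameter logistic model mentioned in the abstract above has a compact closed form, sketched here for orientation. This is the textbook item response function, not the study's fitting code: the function name `p_correct_3pl` is hypothetical, the parameter values are invented, and the sketch assumes the logistic metric without the 1.7 scaling constant that some programs apply.

```python
import math


def p_correct_3pl(theta, a, b, c):
    """Probability of a correct response under the 3PL model.

    theta: examinee ability
    a: item discrimination
    b: item difficulty
    c: guessing parameter (lower asymptote)
    Assumption: no D = 1.7 scaling constant is applied.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical item: moderate discrimination, average difficulty,
# and a 20% chance of guessing correctly.
for theta in (-2.0, 0.0, 2.0):
    print(theta, p_correct_3pl(theta, a=1.0, b=0.0, c=0.2))
```

At `theta == b` the probability sits halfway between the guessing floor `c` and 1, which is why the guessing parameter shifts the whole curve upward for low-ability examinees.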
Machine Shop. Criterion-Referenced Test (CRT) Item Bank.
ERIC Educational Resources Information Center
Davis, Diane, Ed.
This drafting criterion-referenced test item bank is keyed to the machine shop competency profile developed by industry and education professionals in Missouri. The 16 references used for drafting the test items are listed. Test items are arranged under these categories: orientation to machine shop; performing mathematical calculations; performing…
Rescuing Computerized Testing by Breaking Zipf's Law.
ERIC Educational Resources Information Center
Wainer, Howard
2000-01-01
Suggests that because of the nonlinear relationship between item usage and item security, the problems of test security posed by continuous administration of standardized tests cannot be resolved merely by increasing the size of the item pool. Offers alternative strategies to overcome these problems, distributing test items so as to avoid the…
Hoben, Matthias; Bär, Marion; Mahler, Cornelia; Berger, Sarah; Squires, Janet E; Estabrooks, Carole A; Kruse, Andreas; Behrens, Johann
2014-01-31
To study the association between organizational context and research utilization in German residential long-term care (LTC), we translated three Canadian assessment instruments: the Alberta Context Tool (ACT), Estabrooks' Kinds of Research Utilization (RU) items and the Conceptual Research Utilization Scale. Target groups for the tools were health care aides (HCAs), registered nurses (RNs), allied health professionals (AHPs), clinical specialists and care managers. Through a cognitive debriefing process, we assessed response process validity, an initial stage of validity that is necessary before more advanced validity assessment. We included 39 participants (16 HCAs, 5 RNs, 7 AHPs, 5 specialists and 6 managers) from five residential LTC facilities. We created lists of questionnaire items containing problematic items plus items randomly selected from the pool of remaining items. After participants completed the questionnaires, we conducted individual semi-structured cognitive interviews using verbal probing. We asked participants to reflect on their answers for list items in detail. Participants' answers were compared to concept maps defining the instrument concepts in detail. If at least two participants gave answers not matching concept map definitions, items were revised and re-tested with new target group participants. Cognitive debriefings started with HCAs. Based on the first round, we modified 4 of 58 ACT items, 1 ACT item stem and all 8 items of the RU tools. All items were understood by participants after another two rounds. We included revised HCA ACT items in the questionnaires for the other provider groups. In the RU tools for the other provider groups, we used different wording than the HCA version, as was done in the original English instruments. Only one cognitive debriefing round was needed with each of the other provider groups.
Cognitive debriefing is essential to detect and respond to problematic instrument items, particularly when translating instruments for heterogeneous, less well educated provider groups such as HCAs. Cognitive debriefing is an important step in research tool development and a vital component of establishing response process validity evidence. Publishing cognitive debriefing results helps researchers to determine potentially critical elements of the translated tools and assists with interpreting scores.
2014-01-01
Background To study the association between organizational context and research utilization in German residential long term care (LTC), we translated three Canadian assessment instruments: the Alberta Context Tool (ACT), Estabrooks’ Kinds of Research Utilization (RU) items and the Conceptual Research Utilization Scale. Target groups for the tools were health care aides (HCAs), registered nurses (RNs), allied health professionals (AHPs), clinical specialists and care managers. Through a cognitive debriefing process, we assessed response processes validity–an initial stage of validity, necessary before more advanced validity assessment. Methods We included 39 participants (16 HCAs, 5 RNs, 7 AHPs, 5 specialists and 6 managers) from five residential LTC facilities. We created lists of questionnaire items containing problematic items plus items randomly selected from the pool of remaining items. After participants completed the questionnaires, we conducted individual semi-structured cognitive interviews using verbal probing. We asked participants to reflect on their answers for list items in detail. Participants’ answers were compared to concept maps defining the instrument concepts in detail. If at least two participants gave answers not matching concept map definitions, items were revised and re-tested with new target group participants. Results Cognitive debriefings started with HCAs. Based on the first round, we modified 4 of 58 ACT items, 1 ACT item stem and all 8 items of the RU tools. All items were understood by participants after another two rounds. We included revised HCA ACT items in the questionnaires for the other provider groups. In the RU tools for the other provider groups, we used different wording than the HCA version, as was done in the original English instruments. Only one cognitive debriefing round was needed with each of the other provider groups. 
Conclusion Cognitive debriefing is essential to detect and respond to problematic instrument items, particularly when translating instruments for heterogeneous, less well educated provider groups such as HCAs. Cognitive debriefing is an important step in research tool development and a vital component of establishing response process validity evidence. Publishing cognitive debriefing results helps researchers to determine potentially critical elements of the translated tools and assists with interpreting scores. PMID:24479645
2010-04-07
Commercialization Pilot Programs – Portable Fuel Analyzer – Non-woven FR Materials – Automatic Test Equipment – Night Vision Fusion • Significant efforts – Sensing...contract with the government". Advertising material, commercial item offer, or contribution, as defined in FAR 15.601 shall not be considered to...systems through the entire lifecycle. Our portfolio includes: • Individual & crew-served weapons ranging from 9 mm handguns to 87mm mortar systems
ERIC Educational Resources Information Center
Ito, Kyoko; Sykes, Robert C.
This study investigated the practice of weighting a type of test item, such as constructed response, more than other types of items, such as selected response, to compute student scores for a mixed-item type of test. The study used data from statewide writing field tests in grades 3, 5, and 8 and considered two contexts, that in which a single…
ERIC Educational Resources Information Center
Atalmis, Erkan Hasan
2016-01-01
Multiple-choice (MC) items are commonly used in high-stake tests. Thus, each item of such tests should be meticulously constructed to increase the accuracy of decisions based on test results. Haladyna and his colleagues (2002) addressed the valid item-writing guidelines to construct high quality MC items in order to increase test reliability and…
Soleimani, Farin; Azari, Nadia; Vameghi, Roshanak; Sajedi, Firoozeh; Shahshahani, Soheila; Karimi, Hossein; Kraskian, Adis; Shahrokhi, Amin; Teymouri, Robab; Gharib, Masoud
2016-10-01
Advances in perinatal and neonatal care have substantially improved the survival of at-risk infants over the past two decades. The purpose of this study was to assess the reliability and validity of the Bayley Scales of Infant and Toddler Development Screening Test in Persian-speaking children. This was a cross-sectional prospective study of 403 children aged 1 to 42 months. The Bayley scales screening instrument, which consists of five domains (cognitive, receptive communication, expressive communication, fine motor, and gross motor), was used to measure infants' and toddlers' development. The psychometric properties examined included the face and content validity of the scale, in addition to cultural and linguistic modifications to the scale and its test-retest and inter-rater reliability. An expert team changed some of the test items relating to cultural and linguistic issues. In almost all the age groups, cultural or linguistic changes were made to items in the communication domains. According to Cronbach's alpha for internal consistency, the reliability of the cognitive scale was r = 0.79, and the reliability of the receptive scale was r = 0.76. The reliability for expressive communication, fine motor, and gross motor scales was r = 0.81, r = 0.80, and r = 0.81, respectively. The construct validity of the tests was confirmed using a factor analysis and comparison of the mean scores of the age groups. The intra- and inter-rater reliabilities of the Bayley Scales were good-to-excellent. The results indicated that the Bayley Scales had a high level of reliability in the present study. Thus, the scale can be used in a Persian population.
Item difficulty and item validity for the Children's Group Embedded Figures Test.
Rusch, R R; Trigg, C L; Brogan, R; Petriquin, S
1994-02-01
The validity and reliability of the Children's Group Embedded Figures Test was reported for students in Grade 2 by Cromack and Stone in 1980; however, a search of the literature indicates no evidence for internal consistency or item analysis. Hence the purpose of this study was to examine the item difficulty and item validity of the test with children in Grades 1 and 2. Confusion in the literature over development and use of this test was seemingly resolved through analysis of these descriptions and through an interview with the test developer. One early-appearing item was unreasonably difficult. Two or three other items were quite difficult and made little contribution to the total score. Caution is recommended, however, in any reordering or elimination of items based on these findings, given the limited number of subjects (n = 84).
1976-01-01
items. The items tested were the MODI-PAC, a proprietary item of Remington Arms Company, a standard 12-gauge round of No. 4 lead shot, and an...to refrain from testing this item. Therefore, the final selection of items for testing were (1) the MODI-PAC, (2) a standard 12-gauge shotgun round of...The first item evaluated was the MODI-PAC. The MODI-PAC, which stands for “modified impact,” is a 12-gauge shotgun shell loaded with approximately 320
Australian Chemistry Test Item Bank: Years 11 & 12. Volume 1.
ERIC Educational Resources Information Center
Commons, C., Ed.; Martin, P., Ed.
Volume 1 of the Australian Chemistry Test Item Bank, consisting of two volumes, contains nearly 2000 multiple-choice items related to the chemistry taught in Year 11 and Year 12 courses in Australia. Items which were written during 1979 and 1980 were initially published in the "ACER Chemistry Test Item Collection" and in the "ACER…
Australian Chemistry Test Item Bank: Years 11 and 12. Volume 2.
ERIC Educational Resources Information Center
Commons, C., Ed.; Martin, P., Ed.
The second volume of the Australian Chemistry Test Item Bank, consisting of two volumes, contains nearly 2000 multiple-choice items related to the chemistry taught in Year 11 and Year 12 courses in Australia. Items which were written during 1979 and 1980 were initially published in the "ACER Chemistry Test Item Collection" and in the…
Interactions Between Item Content And Group Membership on Achievement Test Items.
ERIC Educational Resources Information Center
Linn, Robert L.; Harnisch, Delwyn L.
The purpose of this investigation was to examine the interaction of item content and group membership on achievement test items. Estimates of the parameters of the three parameter logistic model were obtained on the 46 item math test for the sample of eighth grade students (N = 2055) participating in the Illinois Inventory of Educational Progress,…
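As a rough illustration of the three-parameter logistic model named in the abstract above, the sketch below computes the probability of a correct response for one item. The symbols theta (ability), a (discrimination), b (difficulty) and c (lower asymptote, or pseudo-guessing) follow the usual textbook convention and are assumptions here; the abstract does not state the parameterization the authors used, and the scaling constant D = 1.7 that some authors include is omitted.

```python
import math


def p_correct_3pl(theta, a, b, c):
    """Probability of a correct response under a three-parameter logistic model.

    theta: examinee ability
    a: item discrimination
    b: item difficulty
    c: lower asymptote (pseudo-guessing), between 0 and 1
    Assumed form: P = c + (1 - c) / (1 + exp(-a * (theta - b)))
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


if __name__ == "__main__":
    # Illustrative values only; these are not estimates from the study.
    for theta in (-2.0, 0.0, 2.0):
        print(theta, round(p_correct_3pl(theta, a=1.2, b=0.0, c=0.2), 3))
```

When theta equals b, the probability sits halfway between c and 1, which is one way to read the difficulty parameter in this form of the model.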
Effects of Item Exposure for Conventional Examinations in a Continuous Testing Environment.
ERIC Educational Resources Information Center
Hertz, Norman R.; Chinn, Roberta N.
This study explored the effect of item exposure on two conventional examinations administered as computer-based tests. A principal hypothesis was that item exposure would have little or no effect on average difficulty of the items over the course of an administrative cycle. This hypothesis was tested by exploring conventional item statistics and…
McInnes, Matthew D F; Moher, David; Thombs, Brett D; McGrath, Trevor A; Bossuyt, Patrick M; Clifford, Tammy; Cohen, Jérémie F; Deeks, Jonathan J; Gatsonis, Constantine; Hooft, Lotty; Hunt, Harriet A; Hyde, Christopher J; Korevaar, Daniël A; Leeflang, Mariska M G; Macaskill, Petra; Reitsma, Johannes B; Rodin, Rachel; Rutjes, Anne W S; Salameh, Jean-Paul; Stevens, Adrienne; Takwoingi, Yemisi; Tonelli, Marcello; Weeks, Laura; Whiting, Penny; Willis, Brian H
2018-01-23
Systematic reviews of diagnostic test accuracy synthesize data from primary diagnostic studies that have evaluated the accuracy of 1 or more index tests against a reference standard, provide estimates of test performance, allow comparisons of the accuracy of different tests, and facilitate the identification of sources of variability in test accuracy. To develop the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) diagnostic test accuracy guideline as a stand-alone extension of the PRISMA statement. Modifications to the PRISMA statement reflect the specific requirements for reporting of systematic reviews and meta-analyses of diagnostic test accuracy studies and the abstracts for these reviews. Established standards from the Enhancing the Quality and Transparency of Health Research (EQUATOR) Network were followed for the development of the guideline. The original PRISMA statement was used as a framework on which to modify and add items. A group of 24 multidisciplinary experts used a systematic review of articles on existing reporting guidelines and methods, a 3-round Delphi process, a consensus meeting, pilot testing, and iterative refinement to develop the PRISMA diagnostic test accuracy guideline. The final version of the PRISMA diagnostic test accuracy guideline checklist was approved by the group. The systematic review (produced 64 items) and the Delphi process (provided feedback on 7 proposed items; 1 item was later split into 2 items) identified 71 potentially relevant items for consideration. The Delphi process reduced these to 60 items that were discussed at the consensus meeting. Following the meeting, pilot testing and iterative feedback were used to generate the 27-item PRISMA diagnostic test accuracy checklist. To reflect specific or optimal contemporary systematic review methods for diagnostic test accuracy, 8 of the 27 original PRISMA items were left unchanged, 17 were modified, 2 were added, and 2 were omitted. 
The 27-item PRISMA diagnostic test accuracy checklist provides specific guidance for reporting of systematic reviews. The PRISMA diagnostic test accuracy guideline can facilitate the transparent reporting of reviews, and may assist in the evaluation of validity and applicability, enhance replicability of reviews, and make the results from systematic reviews of diagnostic test accuracy studies more useful.
Zhou, Guiyun; Stoltzfus, Jill C; Houldin, Arlene D; Parks, Susan M; Swan, Beth Ann
2010-11-01
To establish initial reliability and validity of a Web-based survey focused on oncology advanced practice nurses' (APNs') knowledge, attitudes, and practice behaviors regarding advanced care planning, and to obtain preliminary understanding of APNs' knowledge, attitudes, and practice behaviors and perceived barriers to advanced care planning. Descriptive, cross-sectional, pilot survey study. The eastern United States. 300 oncology APNs. Guided by the Theory of Planned Behavior, a knowledge, attitudes, and practice behaviors survey was developed and reviewed for content validity. The survey was distributed to 300 APNs via e-mail and sent again to the 89 APNs who responded to the initial survey. Exploratory factor analysis was used to examine the construct validity and test-retest reliability of the survey's attitudinal and practice behavior portions. Respondents' demographics, knowledge, attitudes, practice behaviors, and perceived barriers to advanced care planning practice. Exploratory factor analysis yielded a five-factor solution from the survey's attitudes and practice behavior portions with internal consistency using Cronbach alpha. Respondents achieved an average of 67% correct answers in the 12-item knowledge section and scored positively in attitudes toward advanced care planning. Their practice behavior scores were marginally positive. The most common reported barriers were from patients' and families' as well as physicians' reluctance to discuss advanced care planning. The attitudinal and practice behaviors portions of the survey demonstrated preliminary construct validity and test-retest reliability. Regarding advanced care planning, respondents were moderately knowledgeable, but their advanced care planning practice was not routine. Validly assessing oncology APNs' knowledge, attitudes, and practice behaviors regarding advanced care planning will enable more tailored approaches to improve end-of-life care outcomes.
Worldwide Emerging Environmental Issues Affecting the U.S. Military
2010-11-01
in Cancun, Mexico, November 29-December 10, 2010, expectations of reaching agreement for a post-Kyoto greenhouse gas emissions treaty are low...and analysis of this report. Expanded details for some items are in the Appendix beginning on page 13. Item 1. NATO’s New Strategic Concept...by Diminishing Low-Cost Phosphorus; Item 4. Renewed Protection for Refugees in Latin America; Item 5. Technological Advances
An Efficiency Balanced Information Criterion for Item Selection in Computerized Adaptive Testing
ERIC Educational Resources Information Center
Han, Kyung T.
2012-01-01
Successful administration of computerized adaptive testing (CAT) programs in educational settings requires that test security and item exposure control issues be taken seriously. Developing an item selection algorithm that strikes the right balance between test precision and level of item pool utilization is the key to successful implementation…
ERIC Educational Resources Information Center
Arendasy, Martin E.; Sommer, Markus
2012-01-01
The use of new test administration technologies such as computerized adaptive testing in high-stakes educational and occupational assessments demands large item pools. Classic item construction processes and previous approaches to automatic item generation faced the problems of a considerable loss of items after the item calibration phase. In this…
Item Purification Does Not Always Improve DIF Detection: A Counterexample with Angoff's Delta Plot
ERIC Educational Resources Information Center
Magis, David; Facon, Bruno
2013-01-01
Item purification is an iterative process that is often advocated as improving the identification of items affected by differential item functioning (DIF). With test-score-based DIF detection methods, item purification iteratively removes the items currently flagged as DIF from the test scores to get purified sets of items, unaffected by DIF. The…
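The iterative loop described in the abstract above can be sketched generically as follows. This is a minimal sketch, not the authors' procedure and not an implementation of Angoff's delta plot: `flag_dif` is a hypothetical callable standing in for any test-score-based DIF method, and the stopping rule (stop when the flagged set no longer changes, or after `max_iter` passes) is an assumption.

```python
def purify(items, flag_dif, max_iter=20):
    """Generic item purification loop (illustrative sketch).

    items: iterable of item identifiers
    flag_dif: hypothetical callable (all_items, anchor_items) -> iterable of
        flagged items, where the matching score uses anchor_items only
    max_iter: assumed safety cap on the number of passes
    Returns the set of items flagged on the last pass.
    """
    items = list(items)
    flagged = set()
    for _ in range(max_iter):
        # Purified set: every item not currently flagged as DIF.
        anchor = [i for i in items if i not in flagged]
        if not anchor:
            break  # nothing left to build a matching score from
        new_flagged = set(flag_dif(items, anchor))
        if new_flagged == flagged:
            break  # flagged set is stable, so stop iterating
        flagged = new_flagged
    return flagged
```

A toy flagging rule, such as comparing each item's group difference with the mean difference over the anchor items, is enough to exercise the loop. Whether purification helps or hurts detection depends on the method plugged in, which is the question the abstract raises.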
Conrad, Martina; Engelmann, Dorit; Friedrich, Michael; Scheffold, Katharina; Philipp, Rebecca; Schulz-Kindermann, Frank; Härter, Martin; Mehnert, Anja; Koranyi, Susan
2018-04-13
Few valid instruments measure couples' communication in patients with cancer in German-speaking countries. The Couple Communication Scale (CCS) is an established instrument for assessing couples' communication. However, the psychometric properties of the German version of the CCS have not been examined until now, and the assumed one-factor structure of the CCS has not yet been verified for patients with advanced cancer. The CCS was validated as part of the study "Managing cancer and living meaningfully" (CALM) in N=136 patients with advanced cancer (≥18 years, UICC stage III/IV). The psychometric properties of the scale were calculated (factor reliability, item reliability, average variance extracted [DEV]) and a confirmatory factor analysis was conducted (maximum likelihood estimation). Concurrent validity was tested against symptoms of anxiety (GAD-7), depression (BDI-II) and attachment insecurity (ECR-M16). In the confirmatory factor analysis, the one-factor structure showed a low but acceptable model fit and explained on average 49% of each item's variance (DEV). The CCS has excellent internal consistency (Cronbach's α=0.91) and was negatively associated with attachment insecurity (ECR-M16: anxiety: r=-0.55, p<0.01; avoidance: r=-0.42, p<0.01) as well as with anxiety (GAD-7: r=-0.20, p<0.05) and depression (BDI-II: r=-0.27, p<0.01). The CCS is a reliable and valid instrument for measuring couples' communication in patients with advanced cancer. © Georg Thieme Verlag KG Stuttgart · New York.
Study on the Automatic Detection Method and System of Multifunctional Hydrocephalus Shunt
NASA Astrophysics Data System (ADS)
Sun, Xuan; Wang, Guangzhen; Dong, Quancheng; Li, Yuzhong
2017-07-01
To address the difficulty of micro-pressure detection and micro-flow control when testing hydrocephalus shunts, this study analyzed the principles behind several shunt performance tests and used an advanced micro-pressure sensor and a micro-flow peristaltic pump to handle micro-pressure detection and micro-flow control. The study also integrated many common experimental procedures and successfully developed an automatic detection system for shunt performance testing that achieves high precision, high efficiency and automation.
Hardware-in-the-Loop Testing of Utility-Scale Wind Turbine Generators
DOE Office of Scientific and Technical Information (OSTI.GOV)
Schkoda, Ryan; Fox, Curtiss; Hadidi, Ramtin
2016-01-26
Historically, wind turbine prototypes were tested in the field, which was, and continues to be, a slow and expensive process. As a result, wind turbine dynamometer facilities were developed to provide a more cost-effective alternative to field testing. New turbine designs were tested and the design models were validated using dynamometers to drive the turbines in a controlled environment. Over the years, both wind turbine dynamometer testing and computer technology have matured and improved, and the two are now being joined to provide hardware-in-the-loop (HIL) testing. This type of testing uses a computer to simulate the items that are missing from a dynamometer test, such as grid stiffness, voltage, frequency, rotor, and hub. Furthermore, wind input and changing electric grid conditions can now be simulated in real time. This recent advance has greatly increased the utility of dynamometer testing for the development of wind turbine systems.
Jia, Lin-Zhi; Ya-Jun, Ma; Cao, Yi; Qian, Fen; Li, Xiang-Yu
2012-04-30
The quality indices of "Medical Parasitology" exam papers and the measured data for students in three majors at the university in 2010 were compared and analyzed. The exam papers were assembled from the test item bank. The alpha reliability coefficients of the three exam papers were above 0.70. The knowledge structure and capacity structure of the exam papers were basically balanced. However, the alpha reliability coefficient for the second major was the lowest, mainly because of the quality of the test items in that exam paper and the failure to revise the item bank indices in time. This observation shows that revising the test items and their indices in the item bank according to the measured data can improve the quality of question setting from the item bank and reduce the differences among exam papers.
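For readers unfamiliar with the alpha reliability coefficient reported above, the sketch below computes Cronbach's alpha from a table of item scores using the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The function name and the input layout (one row per examinee, one column per item) are assumptions for illustration; the abstract does not describe how the authors computed their coefficients.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a score table (illustrative sketch).

    scores: list of rows, one per examinee; each row is a list of item
        scores, and all rows have the same length (assumed layout)
    """
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two examinees")
    # Sample variance of each item across examinees.
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of the examinees' total scores.
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1.0 - sum(item_vars) / total_var)
```

The function divides by the variance of the total scores, so a table in which every examinee has the same total raises a ZeroDivisionError rather than returning a value.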
The Role of Item Models in Automatic Item Generation
ERIC Educational Resources Information Center
Gierl, Mark J.; Lai, Hollis
2012-01-01
Automatic item generation represents a relatively new but rapidly evolving research area where cognitive and psychometric theories are used to produce tests that include items generated using computer technology. Automatic item generation requires two steps. First, test development specialists create item models, which are comparable to templates…
Item Review and the Rearrangement Procedure: Its Process and Its Results
ERIC Educational Resources Information Center
Papanastasiou, Elena C.
2005-01-01
Permitting item review is to the benefit of the examinees who typically increase their test scores with item review. However, testing companies do not prefer item review since it does not follow the logic on which adaptive tests are based, and since it is prone to cheating strategies. Consequently, item review is not permitted in many adaptive…
A Model-Based Method for Content Validation of Automatically Generated Test Items
ERIC Educational Resources Information Center
Zhang, Xinxin; Gierl, Mark
2016-01-01
The purpose of this study is to describe a methodology to recover the item model used to generate multiple-choice test items with a novel graph theory approach. Beginning with the generated test items and working backward to recover the original item model provides a model-based method for validating the content used to automatically generate test…
Optimal Bayesian Adaptive Design for Test-Item Calibration.
van der Linden, Wim J; Ren, Hao
2015-06-01
An optimal adaptive design for test-item calibration based on Bayesian optimality criteria is presented. The design adapts the choice of field-test items to the examinees taking an operational adaptive test using both the information in the posterior distributions of their ability parameters and the current posterior distributions of the field-test parameters. Different criteria of optimality based on the two types of posterior distributions are possible. The design can be implemented using an MCMC scheme with alternating stages of sampling from the posterior distributions of the test takers' ability parameters and the parameters of the field-test items while reusing samples from earlier posterior distributions of the other parameters. Results from a simulation study demonstrated the feasibility of the proposed MCMC implementation for operational item calibration. A comparison of performances for different optimality criteria showed faster calibration of substantial numbers of items for the criterion of D-optimality relative to A-optimality, a special case of c-optimality, and random assignment of items to the test takers.
NASA Astrophysics Data System (ADS)
Lee, Hyunho; Jeong, Seonghoon; Jo, Yunhui; Yoon, Myonggeun
2015-07-01
Quality assurance (QA) for medical linear accelerators is indispensable for appropriate cancer treatment. Some international organizations and advanced Western countries have provided QA guidelines for linear accelerators. Currently, QA regulations for linear accelerators in Korean hospitals specify a system in which each hospital stipulates its own hospital-based protocols for QA procedures (HP_QAPs) and conducts QA based on those HP_QAPs, while regulatory authorities verify whether the items under those HP_QAPs have been performed. However, because this regulatory method cannot guarantee consistent treatment quality, and because QA items and tolerance criteria differ among hospitals, standardized QA items and tolerance criteria are essential. In this study, QA items in HP_QAPs from various hospitals were compared with those presented by international organizations, such as the International Atomic Energy Agency, the European Union, and the American Association of Physicists in Medicine, and by advanced Western countries, such as the USA, the UK, and Canada. Concordance rates between the QA items presented by the aforementioned organizations and those currently implemented in Korean hospitals were 50% for daily QA, 22% for weekly QA, 43% for monthly QA, and 65% for annual QA, and the overall concordance rate across all QA items was approximately 48%. In the comparison between QA items implemented in Korean hospitals and those implemented in advanced Western countries, concordance rates were 50% for daily QA, 33% for weekly QA, 60% for monthly QA, and 67% for annual QA, and the overall concordance rate across all QA items was approximately 57%. The results of this study indicate that the HP_QAPs currently implemented by Korean hospitals as QA standards for linear accelerators used in radiation therapy do not meet international standards. 
If this problem is to be solved, national standardized QA items and procedures for linear accelerators need to be developed.
Marfeo, Elizabeth E.; Ni, Pengsheng; Haley, Stephen M.; Jette, Alan M.; Bogusz, Kara; Meterko, Mark; McDonough, Christine M.; Chan, Leighton; Brandt, Diane E.; Rasch, Elizabeth K.
2014-01-01
Objectives To develop a broad set of claimant-reported items to assess behavioral health functioning relevant to the Social Security disability determination processes, and to evaluate the underlying structure of behavioral health functioning for use in development of a new functional assessment instrument. Design Cross-sectional. Setting Community. Participants Item pools of behavioral health functioning were developed, refined, and field-tested in a sample of persons applying for Social Security disability benefits (N=1015) who reported difficulties working due to mental or both mental and physical conditions. Interventions None. Main Outcome Measure Social Security Administration Behavioral Health (SSA-BH) measurement instrument Results Confirmatory factor analysis (CFA) specified that a 4-factor model (self-efficacy, mood and emotions, behavioral control, and social interactions) had the optimal fit with the data and was also consistent with our hypothesized conceptual framework for characterizing behavioral health functioning. When the items within each of the four scales were tested in CFA, the fit statistics indicated adequate support for characterizing behavioral health as a unidimensional construct along these four distinct scales of function. Conclusion This work represents a significant advance both conceptually and psychometrically in assessment methodologies for work related behavioral health. The measurement of behavioral health functioning relevant to the context of work requires the assessment of multiple dimensions of behavioral health functioning. Specifically, we identified a 4-factor model solution that represented key domains of work related behavioral health functioning. These results guided the development and scale formation of a new SSA-BH instrument. PMID:23548542
Marfeo, Elizabeth E; Ni, Pengsheng; Haley, Stephen M; Jette, Alan M; Bogusz, Kara; Meterko, Mark; McDonough, Christine M; Chan, Leighton; Brandt, Diane E; Rasch, Elizabeth K
2013-09-01
To develop a broad set of claimant-reported items to assess behavioral health functioning relevant to the Social Security disability determination processes, and to evaluate the underlying structure of behavioral health functioning for use in development of a new functional assessment instrument. Cross-sectional. Community. Item pools of behavioral health functioning were developed, refined, and field tested in a sample of persons applying for Social Security disability benefits (N=1015) who reported difficulties working because of mental or both mental and physical conditions. None. Social Security Administration Behavioral Health (SSA-BH) measurement instrument. Confirmatory factor analysis (CFA) specified that a 4-factor model (self-efficacy, mood and emotions, behavioral control, social interactions) had the optimal fit with the data and was also consistent with our hypothesized conceptual framework for characterizing behavioral health functioning. When the items within each of the 4 scales were tested in CFA, the fit statistics indicated adequate support for characterizing behavioral health as a unidimensional construct along these 4 distinct scales of function. This work represents a significant advance both conceptually and psychometrically in assessment methodologies for work-related behavioral health. The measurement of behavioral health functioning relevant to the context of work requires the assessment of multiple dimensions of behavioral health functioning. Specifically, we identified a 4-factor model solution that represented key domains of work-related behavioral health functioning. These results guided the development and scale formation of a new SSA-BH instrument. Copyright © 2013 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
State Assessment Program Item Banks: Model Language for Request for Proposals (RFP) and Contracts
ERIC Educational Resources Information Center
Swanson, Leonard C.
2010-01-01
This document provides recommendations for request for proposal (RFP) and contract language that state education agencies can use to specify their requirements for access to test item banks. An item bank is a repository for test items and data about those items. Item banks are used by state agency staff to view items and associated data; to…
NASA Technical Reports Server (NTRS)
Uhran, M. L.; Youngblood, W. W.; Georgekutty, T.; Fiske, M. R.; Wear, W. O.
1986-01-01
Taking advantage of the microgravity environment of space, NASA has initiated the preliminary design of a permanently manned space station that will support technological advances in process science and stimulate the development of new and improved materials having applications across the commercial spectrum. Previous studies have been performed to define, from the researcher's perspective, the requirements for laboratory equipment to accommodate microgravity experiments on the space station. Functional requirements for the identified experimental apparatus and support equipment were determined. From these hardware requirements, several items were selected for concept designs and subsequent formulation of development plans. This report documents the concept designs and development plans for two items of experiment apparatus (the Combustion Tunnel and the Advanced Modular Furnace) and two items of support equipment (the Laser Diagnostic System and the Integrated Electronics Laboratory). For each concept design, key technology developments were identified that are required to enable or enhance the development of the respective hardware.
The Impact of Receiving the Same Items on Consecutive Computer Adaptive Test Administrations.
ERIC Educational Resources Information Center
O'Neill, Thomas; Lunz, Mary E.; Thiede, Keith
2000-01-01
Studied item exposure in a computerized adaptive test when the item selection algorithm presents examinees with questions they were asked in a previous test administration. Results with 178 repeat examinees on a medical technologists' test indicate that the combined use of an adaptive algorithm to select items and latent trait theory to estimate…
ERIC Educational Resources Information Center
Saß, Steffani; Schütte, Kerstin
2016-01-01
Solving test items might require abilities in test-takers other than the construct the test was designed to assess. Item and student characteristics such as item format or reading comprehension can impact the test result. This experiment is based on cognitive theories of text and picture comprehension. It examines whether integration aids, which…
Uncertainties in the Item Parameter Estimates and Robust Automated Test Assembly
ERIC Educational Resources Information Center
Veldkamp, Bernard P.; Matteucci, Mariagiulia; de Jong, Martijn G.
2013-01-01
Item response theory parameters have to be estimated, and because of the estimation process, they do have uncertainty in them. In most large-scale testing programs, the parameters are stored in item banks, and automated test assembly algorithms are applied to assemble operational test forms. These algorithms treat item parameters as fixed values,…
Identifying Differential Item Functioning in Multi-Stage Computer Adaptive Testing
ERIC Educational Resources Information Center
Gierl, Mark J.; Lai, Hollis; Li, Johnson
2013-01-01
The purpose of this study is to evaluate the performance of CATSIB (Computer Adaptive Testing-Simultaneous Item Bias Test) for detecting differential item functioning (DIF) when items in the matching and studied subtest are administered adaptively in the context of a realistic multi-stage adaptive test (MST). MST was simulated using a 4-item…
Code of Federal Regulations, 2014 CFR
2014-10-01
... REQUIREMENTS CONTRACT FINANCING Advance Payments for Non-Commercial Items 32.404 Exclusions. (a) This subpart... equivalent amount of the applicable foreign currency); and (ii) The advance payment is required by the laws...
Code of Federal Regulations, 2010 CFR
2010-10-01
... REQUIREMENTS CONTRACT FINANCING Advance Payments for Non-Commercial Items 32.404 Exclusions. (a) This subpart... equivalent amount of the applicable foreign currency); and (ii) The advance payment is required by the laws...
A Stepwise Test Characteristic Curve Method to Detect Item Parameter Drift
ERIC Educational Resources Information Center
Guo, Rui; Zheng, Yi; Chang, Hua-Hua
2015-01-01
An important assumption of item response theory is item parameter invariance. Sometimes, however, item parameters are not invariant across different test administrations due to factors other than sampling error; this phenomenon is termed item parameter drift. Several methods have been developed to detect drifted items. However, most of the…
Shen, Linjun; Li, Feiming; Wattleworth, Roberta; Filipetto, Frank
2010-10-01
The Comprehensive Osteopathic Medical Licensing Examination conducted a trial of multimedia items in the 2008-2009 Level 3 testing cycle to determine (1) if multimedia items were able to test additional elements of medical knowledge and skills and (2) how to develop effective multimedia items. Forty-four content-matched multimedia and text multiple-choice items were randomly delivered to Level 3 candidates. Logistic regression and paired-samples t tests were used for pairwise and group-level comparisons, respectively. Nine pairs showed significant differences in difficulty, discrimination, or both. Content analysis found that, if text narrations were less direct, multimedia materials could make items easier. When textbook terminologies were replaced by multimedia presentations, multimedia items could become more difficult. Moreover, a multimedia item was found not to be uniformly difficult for candidates at different ability levels, possibly because multimedia and text items tested different elements of the same concept. Multimedia items may be capable of measuring some constructs different from what text items can measure. Effective multimedia items with reasonable psychometric properties can be intentionally developed.
Item Analysis in Introductory Economics Testing.
ERIC Educational Resources Information Center
Tinari, Frank D.
1979-01-01
Computerized analysis of multiple choice test items is explained. Examples of item analysis applications in the introductory economics course are discussed with respect to three objectives: to evaluate learning; to improve test items; and to help improve classroom instruction. Problems, costs and benefits of the procedures are identified. (JMD)
NASA Astrophysics Data System (ADS)
Ilich, Maria O.
Psychometricians and test developers evaluate standardized tests for potential bias against groups of test-takers by using differential item functioning (DIF). English language learners (ELLs) are a diverse group of students whose native language is not English. While they are still learning the English language, they must take their standardized tests for their school subjects, including science, in English. In this study, linguistic complexity was examined as a possible source of DIF that may result in test scores that confound science knowledge with a lack of English proficiency among ELLs. Two years of fifth-grade state science tests were analyzed for evidence of DIF using two DIF methods, Simultaneous Item Bias Test (SIBTest) and logistic regression. The tests presented a unique challenge in that the test items were grouped together into testlets, groups of items referring to a scientific scenario to measure knowledge of different science content or skills. Very large samples of 10,256 students in 2006 and 13,571 students in 2007 were examined. Half of each sample was composed of Spanish-speaking ELLs; the balance consisted of native English speakers. The two DIF methods were in agreement about the items that favored non-ELLs and the items that favored ELLs. Logistic regression effect sizes were all negligible, while SIBTest flagged items with low to high DIF. A decrease in socioeconomic status and Spanish-speaking ELL diversity may have led to inconsistent SIBTest effect sizes for items used in both testing years. The DIF results for the testlets suggested that ELLs lacked sufficient opportunity to learn science content. The DIF results further suggest that those constructed response test items requiring the student to draw a conclusion about a scientific investigation or to plan a new investigation tended to favor ELLs.
NASA Astrophysics Data System (ADS)
Wren, David A.
The research presented in this dissertation culminated in a 10-item Thermochemistry Concept Inventory (TCI). The development of the TCI can be divided into two main phases: qualitative studies and quantitative studies. Both phases focused on the primary stakeholders of the TCI, college-level general chemistry instructors and students. Each phase was designed to collect evidence for the validity of the interpretations and uses of TCI testing data. A central use of TCI testing data is to identify student conceptual misunderstandings, which are represented as incorrect options of multiple-choice TCI items. Therefore, quantitative and qualitative studies focused heavily on collecting evidence at the item-level, where important interpretations may be made by TCI users. Qualitative studies included student interviews (N = 28) and online expert surveys (N = 30). Think-aloud student interviews (N = 12) were used to identify conceptual misunderstandings used by students. Novice response process validity interviews (N = 16) helped provide information on how students interpreted and answered TCI items and were the basis of item revisions. Practicing general chemistry instructors (N = 18), or experts, defined boundaries of thermochemistry content included on the TCI. Once TCI items were in the later stages of development, an online version of the TCI was used in an expert response process validity survey (N = 12) to provide expert feedback on item content, format and consensus of the correct answer for each item. Quantitative studies included three phases: beta testing of TCI items (N = 280), pilot testing of a 12-item TCI (N = 485), and a large data collection using a 10-item TCI (N = 1331). In addition to traditional classical test theory analysis, Rasch model analysis was also used for evaluation of testing data at the test and item level.
The TCI was administered in both formative assessment (beta and pilot testing) and summative assessment (large data collection), with items performing well in both. One item, item K, did not have acceptable psychometric properties when the TCI was used as a quiz (summative assessment), but was retained in the final version of the TCI based on the acceptable psychometric properties displayed in pilot testing (formative assessment).
ERIC Educational Resources Information Center
Li, Yanmei
2012-01-01
In a common-item (anchor) equating design, the common items should be evaluated for item parameter drift. Drifted items are often removed. For a test that contains mostly dichotomous items and only a small number of polytomous items, removing some drifted polytomous anchor items may result in anchor sets that no longer resemble mini-versions of…
Sinharay, Sandip
2017-09-01
Benefiting from item preknowledge is a major type of fraudulent behavior during educational assessments. Belov suggested the posterior shift statistic for detection of item preknowledge and showed its performance to be better on average than that of seven other statistics for detection of item preknowledge for a known set of compromised items. Sinharay suggested a statistic based on the likelihood ratio test for detection of item preknowledge; the advantage of the statistic is that its null distribution is known. Results from simulated and real data and adaptive and nonadaptive tests are used to demonstrate that the Type I error rate and power of the statistic based on the likelihood ratio test are very similar to those of the posterior shift statistic. Thus, the statistic based on the likelihood ratio test appears promising in detecting item preknowledge when the set of compromised items is known.
ERIC Educational Resources Information Center
McLeod, Lori D.; Lewis, Charles; Thissen, David
With the increased use of computerized adaptive testing, which allows for continuous testing, new concerns about test security have evolved, one being the assurance that items in an item pool are safeguarded from theft. In this paper, the risk of score inflation and procedures to detect test takers using item preknowledge are explored. When test…
ERIC Educational Resources Information Center
Van Kuren, Lynda, Ed.
2001-01-01
Nine issues of the newsletter of the Council for Exceptional Children (CEC) include articles, news items, meeting announcements, news items of individual divisions, and professional advancement opportunities. Some major articles are: (1) "Home Schooling--A Viable Alternative for Students with Special Needs"; (2) "High Stakes Testing…
Payload software technology: Software technology development plan
NASA Technical Reports Server (NTRS)
1977-01-01
Programmatic requirements for the advancement of software technology are identified for meeting the space flight requirements in the 1980 to 1990 time period. The development items are described, and software technology item derivation worksheets are presented along with the cost/time/priority assessments.
Effect of Multiple Testing Adjustment in Differential Item Functioning Detection
ERIC Educational Resources Information Center
Kim, Jihye; Oshima, T. C.
2013-01-01
In a typical differential item functioning (DIF) analysis, a significance test is conducted for each item. As a test consists of multiple items, such multiple testing may increase the possibility of making a Type I error at least once. The goal of this study was to investigate how to control a Type I error rate and power using adjustment…
Item Response Theory Models for Performance Decline during Testing
ERIC Educational Resources Information Center
Jin, Kuan-Yu; Wang, Wen-Chung
2014-01-01
Sometimes, test-takers may not be able to attempt all items to the best of their ability (with full effort) due to personal factors (e.g., low motivation) or testing conditions (e.g., time limit), resulting in poor performances on certain items, especially those located toward the end of a test. Standard item response theory (IRT) models fail to…
Differential item functioning analysis of the Vanderbilt Expertise Test for cars.
Lee, Woo-Yeol; Cho, Sun-Joo; McGugin, Rankin W; Van Gulick, Ana Beth; Gauthier, Isabel
2015-01-01
The Vanderbilt Expertise Test for cars (VETcar) is a test of visual learning for contemporary car models. We used item response theory to assess the VETcar and in particular used differential item functioning (DIF) analysis to ask if the test functions the same way in laboratory versus online settings and for different groups based on age and gender. An exploratory factor analysis found evidence of multidimensionality in the VETcar, although a single dimension was deemed sufficient to capture the recognition ability measured by the test. We selected a unidimensional three-parameter logistic item response model to examine item characteristics and subject abilities. The VETcar had satisfactory internal consistency. A substantial number of items showed DIF at a medium effect size for test setting and for age group, whereas gender DIF was negligible. Because online subjects were on average older than those tested in the lab, we focused on the age groups to conduct a multigroup item response theory analysis. This revealed that most items on the test favored the younger group. DIF could be more the rule than the exception when measuring performance with familiar object categories, therefore posing a challenge for the measurement of either domain-general visual abilities or category-specific knowledge.
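As a reading aid for the record above, the following is a minimal sketch of the standard unidimensional three-parameter logistic (3PL) item response function, which is the model family the abstract names. The function name and all parameter values are hypothetical and are not taken from the VETcar study; the two-group comparison only illustrates what it means for an item to function differently across groups and is not the authors' DIF procedure.

```python
import math


def p_3pl(theta, a, b, c):
    """Probability of a correct response under the 3PL model.

    theta: examinee ability; a: discrimination; b: difficulty;
    c: lower asymptote (pseudo-guessing). All values are illustrative.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical item parameters for one item calibrated in two groups.
younger = {"a": 1.2, "b": -0.3, "c": 0.2}
older = {"a": 1.2, "b": 0.4, "c": 0.2}

# At equal ability, a higher difficulty in one group lowers that group's
# probability of success, which is the pattern a DIF analysis looks for.
theta = 0.0
gap = p_3pl(theta, **younger) - p_3pl(theta, **older)
print(f"Difference in success probability at theta=0: {gap:.3f}")
```

A positive gap here would mean the hypothetical item is easier for the first group at the same ability level.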
Samejima Items in Multiple-Choice Tests: Identification and Implications
ERIC Educational Resources Information Center
Rahman, Nazia
2013-01-01
Samejima hypothesized that non-monotonically increasing item response functions (IRFs) of ability might occur for multiple-choice items (referred to here as "Samejima items") if low ability test takers with some, though incomplete, knowledge or skill are drawn to a particularly attractive distractor, while very low ability test takers…
Computerized Numerical Control Test Item Bank.
ERIC Educational Resources Information Center
Reneau, Fred; And Others
This guide contains 285 test items for use in teaching a course in computerized numerical control. All test items were reviewed, revised, and validated by incumbent workers and subject matter instructors. Items are provided for assessing student achievement in such aspects of programming and planning, setting up, and operating machines with…
Using a Linear Regression Method to Detect Outliers in IRT Common Item Equating
ERIC Educational Resources Information Center
He, Yong; Cui, Zhongmin; Fang, Yu; Chen, Hanwei
2013-01-01
Common test items play an important role in equating alternate test forms under the common item nonequivalent groups design. When the item response theory (IRT) method is applied in equating, inconsistent item parameter estimates among common items can lead to large bias in equated scores. It is prudent to evaluate inconsistency in parameter…
ERIC Educational Resources Information Center
He, Yong
2013-01-01
Common test items play an important role in equating multiple test forms under the common-item nonequivalent groups design. Inconsistent item parameter estimates among common items can lead to large bias in equated scores for IRT true score equating. Current methods extensively focus on detection and elimination of outlying common items, which…
ERIC Educational Resources Information Center
Scheuneman, Janice Dowd; Gerritz, Kalle
1990-01-01
Differential item functioning (DIF) methodology for revealing sources of item difficulty and performance characteristics of different groups was explored. A total of 150 Scholastic Aptitude Test items and 132 Graduate Record Examination general test items were analyzed. DIF was evaluated for males and females and Blacks and Whites. (SLD)
Item Structural Properties as Predictors of Item Difficulty and Item Association.
ERIC Educational Resources Information Center
Solano-Flores, Guillermo
1993-01-01
Studied the ability of logical test design (LTD) to predict student performance in reading Roman numerals for 211 sixth graders in Mexico City tested on Roman numeral items varying on LTD-related and non-LTD-related variables. The LTD-related variable item iterativity was found to be the best predictor of item difficulty. (SLD)
Investigating Item Exposure Control Methods in Computerized Adaptive Testing
ERIC Educational Resources Information Center
Ozturk, Nagihan Boztunc; Dogan, Nuri
2015-01-01
This study aims to investigate the effects of item exposure control methods on measurement precision and on test security under various item selection methods and item pool characteristics. In this study, the Randomesque (with item group sizes of 5 and 10), Sympson-Hetter, and Fade-Away methods were used as item exposure control methods. Moreover,…
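The Randomesque method named in the record above is commonly described as choosing at random among the few most informative remaining items rather than always administering the single best one. The sketch below assumes a two-parameter logistic model and maximum-information selection; the pool layout, function names, and parameter values are hypothetical and do not come from the study.

```python
import math
import random


def info_2pl(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)


def randomesque_select(theta, pool, administered, group_size=5, rng=random):
    """Pick one item at random from the group_size most informative items.

    pool: dict mapping item id -> (a, b); administered: ids already used.
    """
    candidates = [item for item in pool if item not in administered]
    # Rank the remaining items by information at the current ability estimate.
    candidates.sort(key=lambda item: info_2pl(theta, *pool[item]), reverse=True)
    # Random choice within the top group spreads exposure across items.
    return rng.choice(candidates[:group_size])


# Hypothetical three-item pool.
pool = {"i1": (1.0, 0.0), "i2": (1.0, 2.0), "i3": (2.0, 0.0)}
next_item = randomesque_select(0.0, pool, administered=set(), group_size=2)
```

With group_size=1 the procedure reduces to plain maximum-information selection; larger groups trade a little measurement precision for lower exposure of the best items.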
ERIC Educational Resources Information Center
Lee, Woo-yeol; Cho, Sun-Joo
2017-01-01
Cross-level invariance in a multilevel item response model can be investigated by testing whether the within-level item discriminations are equal to the between-level item discriminations. Testing the cross-level invariance assumption is important to understand constructs in multilevel data. However, in most multilevel item response model…
Item Pool Design for an Operational Variable-Length Computerized Adaptive Test
ERIC Educational Resources Information Center
He, Wei; Reckase, Mark D.
2014-01-01
For computerized adaptive tests (CATs) to work well, they must have an item pool with sufficient numbers of good quality items. Many researchers have pointed out that, in developing item pools for CATs, not only is the item pool size important but also the distribution of item parameters and practical considerations such as content distribution…
ERIC Educational Resources Information Center
Yoon, Su-Youn; Lee, Chong Min; Houghton, Patrick; Lopez, Melissa; Sakano, Jennifer; Loukina, Anastasia; Krovetz, Bob; Lu, Chi; Madani, Nitin
2017-01-01
In this study, we developed assistive tools and resources to support TOEIC® Listening test item generation. There has recently been an increased need for a large pool of items for these tests. This need has, in turn, inspired efforts to increase the efficiency of item generation while maintaining the quality of the created items. We aimed to…
ERIC Educational Resources Information Center
Nissan, Susan; And Others
One of the item types in the Listening Comprehension section of the Test of English as a Foreign Language (TOEFL) test is the dialogue. Because the dialogue item pool needs to have an appropriate balance of items at a range of difficulty levels, test developers have examined items at various difficulty levels in an attempt to identify their…
Park, In Sook; Suh, Yeon Ok; Park, Hae Sook; Kang, So Young; Kim, Kwang Sung; Kim, Gyung Hee; Choi, Yeon-Hee; Kim, Hyun-Ju
2017-01-01
The purpose of this study was to improve the quality of items on the Korean Nursing Licensing Examination by developing and evaluating case-based items that reflect integrated nursing knowledge. We conducted a cross-sectional observational study to develop new case-based items. The methods for developing test items included expert workshops, brainstorming, and verification of content validity. After a mock examination of undergraduate nursing students using the newly developed case-based items, we evaluated the appropriateness of the items through classical test theory and item response theory. A total of 50 case-based items were developed for the mock examination, and content validity was evaluated. The question items integrated 34 discrete elements of integrated nursing knowledge. The mock examination was taken by 741 baccalaureate students in their fourth year of study at 13 universities. Their average score on the mock examination was 57.4, and the examination showed a reliability of 0.40. According to classical test theory, the average item difficulty was 57.4% (80%-100% for 12 items; 60%-80% for 13 items; and less than 60% for 25 items). The mean discrimination index was 0.19, and was above 0.30 for 11 items and 0.20 to 0.29 for 15 items. According to item response theory, the item discrimination parameter (in the logistic model) was none for 10 items (0.00), very low for 20 items (0.01 to 0.34), low for 12 items (0.35 to 0.64), moderate for 6 items (0.65 to 1.34), high for 1 item (1.35 to 1.69), and very high for 1 item (above 1.70). The item difficulty was very easy for 24 items (below -2.0), easy for 8 items (-2.0 to -0.5), medium for 6 items (-0.5 to 0.5), hard for 3 items (0.5 to 2.0), and very hard for 9 items (2.0 or above). The goodness-of-fit test in terms of the 2-parameter item response model between the range of 2.0 to 0.5 revealed that 12 items had an ideal correct answer rate.
We surmised that the low reliability of the mock examination was influenced by the timing of the test for the examinees and the inappropriate difficulty of the items. Our study suggested a methodology for the development of future case-based items for the Korean Nursing Licensing Examination.
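The classical test theory quantities reported in the record above (item difficulty as percent correct, and a discrimination index) can be sketched as follows. This is a minimal illustration, not the authors' analysis: the abstract does not say how its discrimination index was computed, so the upper-lower group method with 27% tails is an assumption, and the function name and data are hypothetical.

```python
import numpy as np


def item_analysis(responses, tail=0.27):
    """Classical item difficulty and upper-lower discrimination index.

    responses: 2-D array, rows = examinees, columns = items, entries 0 or 1.
    tail: fraction of examinees in each extreme group (27% is an assumed,
    conventional choice).
    """
    responses = np.asarray(responses, dtype=float)
    n_examinees = responses.shape[0]
    total = responses.sum(axis=1)
    # Difficulty index: proportion of examinees answering each item correctly.
    difficulty = responses.mean(axis=0)
    # Sort examinees by total score and take the lowest and highest groups.
    order = np.argsort(total, kind="stable")
    k = max(1, int(round(tail * n_examinees)))
    lower = responses[order[:k]]
    upper = responses[order[-k:]]
    # Discrimination: proportion correct in the upper group minus the lower.
    discrimination = upper.mean(axis=0) - lower.mean(axis=0)
    return difficulty, discrimination


# Hypothetical responses from four examinees on three items.
difficulty, discrimination = item_analysis(
    [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
)
```

Multiplying the difficulty values by 100 gives the percent-correct scale used in the abstract.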
The beneficial effect of testing: an event-related potential study
Bai, Cheng-Hua; Bridger, Emma K.; Zimmer, Hubert D.; Mecklinger, Axel
2015-01-01
The enhanced memory performance for items that are tested as compared to being restudied (the testing effect) is a frequently reported memory phenomenon. According to the episodic context account of the testing effect, this beneficial effect of testing is related to a process which reinstates the previously learnt episodic information. Few studies have explored the neural correlates of this effect at the time point when testing takes place, however. In this study, we utilized the ERP correlates of successful memory encoding to address this issue, hypothesizing that if the benefit of testing is due to retrieval-related processes at test then subsequent memory effects (SMEs) should resemble the ERP correlates of retrieval-based processing in their temporal and spatial characteristics. Participants were asked to learn Swahili-German word pairs before items were presented in either a testing or a restudy condition. Memory performance was assessed immediately and 1 day later with a cued recall task. Successfully recalling items at test increased the likelihood that items were remembered over time compared to items which were only restudied. An ERP subsequent memory contrast (later remembered vs. later forgotten tested items), which reflects the engagement of processes that ensure items are recallable the next day, was topographically comparable with the ERP correlate of immediate recollection (immediately remembered vs. immediately forgotten tested items). This result shows that the processes which allow items to be more memorable over time share qualitatively similar neural correlates with the processes that relate to successful retrieval at test. This finding supports the notion that testing is more beneficial than restudying on memory performance over time because of its engagement of retrieval processes, such as the re-encoding of actively retrieved memory representations. PMID:26441577
48 CFR 32.405 - Applying Pub. L. 85-804 to advance payments under sealed bid contracts.
Code of Federal Regulations, 2010 CFR
2010-10-01
... advance payments under sealed bid contracts. 32.405 Section 32.405 Federal Acquisition Regulations System... Non-Commercial Items 32.405 Applying Pub. L. 85-804 to advance payments under sealed bid contracts. (a... provisions of law relating to contracts, as explained in 50.101-1(a), also include making advance payments...
The development of a science process assessment for fourth-grade students
NASA Astrophysics Data System (ADS)
Smith, Kathleen A.; Welliver, Paul W.
In this study, a multiple-choice test entitled the Science Process Assessment was developed to measure the science process skills of students in grade four. Based on the Recommended Science Competency Continuum for Grades K to 6 for Pennsylvania Schools, this instrument measured the skills of (1) observing, (2) classifying, (3) inferring, (4) predicting, (5) measuring, (6) communicating, (7) using space/time relations, (8) defining operationally, (9) formulating hypotheses, (10) experimenting, (11) recognizing variables, (12) interpreting data, and (13) formulating models. To prepare the instrument, classroom teachers and science educators were invited to participate in two science education workshops designed to develop an item bank of test questions applicable to measuring process skill learning. Participants formed writing teams and generated 65 test items representing the 13 process skills. After a comprehensive group critique of each item, 61 items were identified for inclusion into the Science Process Assessment item bank. To establish content validity, the item bank was submitted to a select panel of science educators for the purpose of judging item acceptability. This analysis yielded 55 acceptable test items and produced the Science Process Assessment, Pilot 1. Pilot 1 was administered to 184 fourth-grade students. Students were given a copy of the test booklet; teachers read each test aloud to the students. Upon completion of this first administration, data from the item analysis yielded a reliability coefficient of 0.73. Subsequently, 40 test items were identified for the Science Process Assessment, Pilot 2. Using the test-retest method, the Science Process Assessment, Pilot 2 (Test 1 and Test 2) was administered to 113 fourth-grade students. Reliability coefficients of 0.80 and 0.82, respectively, were ascertained. The correlation between Test 1 and Test 2 was 0.77. 
The results of this study indicate that (1) the Science Process Assessment, Pilot 2, is a valid and reliable instrument applicable to measuring the science process skills of students in grade four, (2) using educational workshops as a means of developing item banks of test questions is viable and productive in the test development process, and (3) involving classroom teachers and science educators in the test development process is educationally efficient and effective.
Michaelides, Michalis P.
2010-01-01
Many studies have investigated the topic of change or drift in item parameter estimates in the context of item response theory (IRT). Content effects, such as instructional variation and curricular emphasis, as well as context effects, such as the wording, position, or exposure of an item have been found to impact item parameter estimates. The issue becomes more critical when items with estimates exhibiting differential behavior across test administrations are used as common for deriving equating transformations. This paper reviews the types of effects on IRT item parameter estimates and focuses on the impact of misbehaving or aberrant common items on equating transformations. Implications relating to test validity and the judgmental nature of the decision to keep or discard aberrant common items are discussed, with recommendations for future research into more informed and formal ways of dealing with misbehaving common items. PMID:21833230
Raykov, Tenko; Marcoulides, George A
2016-04-01
The frequently neglected and often misunderstood relationship between classical test theory and item response theory is discussed for the unidimensional case with binary measures and no guessing. It is pointed out that popular item response models can be directly obtained from classical test theory-based models by accounting for the discrete nature of the observed items. Two distinct observational equivalence approaches are outlined that render the item response models from corresponding classical test theory-based models, and can each be used to obtain the former from the latter models. Similarly, classical test theory models can be furnished using the reverse application of either of those approaches from corresponding item response models.
Locally Dependent Linear Logistic Test Model with Person Covariates
ERIC Educational Resources Information Center
Ip, Edward H.; Smits, Dirk J. M.; De Boeck, Paul
2009-01-01
The article proposes a family of item-response models that allow the separate and independent specification of three orthogonal components: item attribute, person covariate, and local item dependence. Special interest lies in extending the linear logistic test model, which is commonly used to measure item attributes, to tests with embedded item…
Applying Bayesian Item Selection Approaches to Adaptive Tests Using Polytomous Items
ERIC Educational Resources Information Center
Penfield, Randall D.
2006-01-01
This study applied the maximum expected information (MEI) and the maximum posterior-weighted information (MPI) approaches of computer adaptive testing item selection to the case of a test using polytomous items following the partial credit model. The MEI and MPI approaches are described. A simulation study compared the efficiency of ability…
Do Reading Experts Agree with MCAT Verbal Reasoning Item Classifications?
ERIC Educational Resources Information Center
Jackson, Evelyn W.; And Others
1994-01-01
Examined whether expert raters (n=5) could agree about classification of Medical College Admission Test (MCAT) items and whether they agreed with MCAT student manual in labeling skill being measured by each test item. Results revealed difficulties in replicating authors' labeling of skills for reading items on practice test provided with 1991 MCAT…
ACER Chemistry Test Item Collection (ACER CHEMTIC Year 12 Supplement).
ERIC Educational Resources Information Center
Australian Council for Educational Research, Hawthorn.
This publication contains 317 multiple-choice chemistry test items related to topics covered in the Victorian (Australia) Year 12 chemistry course. It allows teachers access to a range of items suitable for diagnostic and achievement purposes, supplementing the ACER Chemistry Test Item Collection--Year 12 (CHEMTIC). The topics covered are: organic…
Differential Item Functioning: Its Consequences. Research Report. ETS RR-10-01
ERIC Educational Resources Information Center
Lee, Yi-Hsuan; Zhang, Jinming
2010-01-01
This report examines the consequences of differential item functioning (DIF) using simulated data. Its impact on total score, item response theory (IRT) ability estimate, and test reliability was evaluated in various testing scenarios created by manipulating the following four factors: test length, percentage of DIF items per form, sample sizes of…
Electronics. Criterion-Referenced Test (CRT) Item Bank.
ERIC Educational Resources Information Center
Davis, Diane, Ed.
This document contains 519 criterion-referenced multiple choice and true or false test items for a course in electronics. The test item bank is designed to work with both the Vocational Instructional Management System (VIMS) and the Vocational Administrative Management System (VAMS) in Missouri. The items are grouped into 15 units covering the…
Auto Mechanics. Criterion-Referenced Test (CRT) Item Bank.
ERIC Educational Resources Information Center
Tannehill, Dana, Ed.
This document contains 546 criterion-referenced multiple choice and true or false test items for a course in auto mechanics. The test item bank is designed to work with both the Vocational Instructional Management System (VIMS) and Vocational Administrative Management System (VAMS) in Missouri. The items are grouped into 35 units covering the…
Developing a Strategy for Using Technology-Enhanced Items in Large-Scale Standardized Tests
ERIC Educational Resources Information Center
Bryant, William
2017-01-01
As large-scale standardized tests move from paper-based to computer-based delivery, opportunities arise for test developers to make use of items beyond traditional selected and constructed response types. Technology-enhanced items (TEIs) have the potential to provide advantages over conventional items, including broadening construct measurement,…
Doig, Emmah; Prescott, Sarah; Fleming, Jennifer; Cornwell, Petrea; Kuipers, Pim
2016-01-01
To examine the internal reliability and test-retest reliability of the Client-Centeredness of Goal Setting (C-COGS) scale. The C-COGS scale was administered to 42 participants with acquired brain injury after completion of multidisciplinary goal planning. Internal reliability of scale items was examined using item-partial total correlations and Cronbach's α coefficient. The scale was readministered within a 1-mo period to a subsample of 12 participants to examine test-retest reliability by calculating exact and close percentage agreement for each item. After examination of item-partial total correlations, test items were revised. The revised items demonstrated stronger internal consistency than the original items. Preliminary evaluation of test-retest reliability was fair, with an average exact percent agreement across all test items of 67%. Findings support the preliminary reliability of the C-COGS scale as a tool to evaluate and promote client-centered goal planning in brain injury rehabilitation. Copyright © 2016 by the American Occupational Therapy Association, Inc.
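Note on the statistics named in the record above: Cronbach's α and exact percentage agreement can both be computed in a few lines. The sketch below is illustrative only. The function names and the demo ratings are hypothetical and are not taken from the C-COGS study, and the study's "close" agreement criterion is not reproduced because the abstract does not define it.

```python
from statistics import variance

def cronbach_alpha(scores):
    """scores: list of respondents, each a list of k item scores."""
    k = len(scores[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of the respondents' total scores.
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

def exact_agreement(first, second):
    """Share of items rated identically on two administrations."""
    matches = sum(1 for a, b in zip(first, second) if a == b)
    return matches / len(first)

# Made-up ratings for illustration only (4 respondents x 3 items).
demo = [[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6]]
alpha = cronbach_alpha(demo)
```

In the demo data every item moves in lockstep with the others, so α reaches its upper bound; real scale data would normally give a lower value.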
Item-Writing Guidelines for Physics
ERIC Educational Resources Information Center
Regan, Tom
2015-01-01
A teacher learning how to write test questions (test items) will almost certainly encounter item-writing guidelines--lists of item-writing do's and don'ts. Item-writing guidelines usually are presented as applicable across all assessment settings. Table I shows some guidelines that I believe to be generally applicable and two will be briefly…
Unidimensional Interpretations for Multidimensional Test Items
ERIC Educational Resources Information Center
Kahraman, Nilufer
2013-01-01
This article considers potential problems that can arise in estimating a unidimensional item response theory (IRT) model when some test items are multidimensional (i.e., show a complex factorial structure). More specifically, this study examines (1) the consequences of model misfit on IRT item parameter estimates due to unintended minor item-level…
Kisala, Pamela A.; Victorson, David; Pace, Natalie; Heinemann, Allen W.; Choi, Seung W.; Tulsky, David S.
2015-01-01
Objective: To describe the development and psychometric properties of the SCI-QOL Psychological Trauma item bank and short form. Design: Using a mixed-methods design, we developed and tested a Psychological Trauma item bank with patient and provider focus groups, cognitive interviews, and item response theory based analytic approaches, including tests of model fit, differential item functioning (DIF) and precision. Setting: We tested a 31-item pool at several medical institutions across the United States, including the University of Michigan, Kessler Foundation, Rehabilitation Institute of Chicago, the University of Washington, Craig Hospital and the James J. Peters/Bronx Veterans Administration hospital. Participants: A total of 716 individuals with SCI completed the trauma items. Results: The 31 items fit a unidimensional model (CFI=0.952; RMSEA=0.061) and demonstrated good precision (theta range between 0.6 and 2.5). Nine items demonstrated negligible DIF with little impact on score estimates. The final calibrated item bank contains 19 items. Conclusion: The SCI-QOL Psychological Trauma item bank is a psychometrically robust measurement tool from which a short form and a computer adaptive test (CAT) version are available. PMID:26010967
Vaughn, Kalif E; Rawson, Katherine A; Pyc, Mary A
2013-12-01
A wealth of previous research has established that retrieval practice promotes memory, particularly when retrieval is successful. Although successful retrieval promotes memory, it remains unclear whether successful retrieval promotes memory equally well for items of varying difficulty. Will easy items still outperform difficult items on a final test if all items have been correctly recalled equal numbers of times during practice? In two experiments, normatively difficult and easy Lithuanian-English word pairs were learned via test-restudy practice until each item had been correctly recalled a preassigned number of times (from 1 to 11 correct recalls). Despite equating the numbers of successful recalls during practice, performance on a delayed final cued-recall test was lower for difficult than for easy items. Experiment 2 was designed to diagnose whether the disadvantage for difficult items was due to deficits in cue memory, target memory, and/or associative memory. The results revealed a disadvantage for the difficult versus the easy items only on the associative recognition test, with no differences on cue recognition, and even an advantage on target recognition. Although successful retrieval enhanced memory for both difficult and easy items, equating retrieval success during practice did not eliminate normative item difficulty differences.
Test Bias: An Objective Definition for Test Items.
ERIC Educational Resources Information Center
Durovic, Jerry J.
A test bias definition, applicable at the item-level of a test is presented. The definition conceptually equates test bias with measuring different things in different groups, and operationally equates test bias with a difference in item fit to the Rasch Model, greater than one, between groups. It is suggested that the proposed definition avoids…
2013-01-01
Background: Despite the widespread use of multiple-choice assessments in medical education assessment, current practice and published advice concerning the number of response options remains equivocal. This article describes an empirical study contrasting the quality of three 60-item multiple-choice test forms within the Royal Australian and New Zealand College of Obstetricians and Gynaecologists (RANZCOG) Fetal Surveillance Education Program (FSEP). The three forms are described below. Methods: The first form featured four response options per item. The second form featured three response options, having removed the least functioning option from each item in the four-option counterpart. The third test form was constructed by retaining the best performing version of each item from the first two test forms. It contained both three and four option items. Results: Psychometric and educational factors were taken into account in formulating an approach to test construction for the FSEP. The four-option test performed better than the three-option test overall, but some items were improved by the removal of options. The mixed-option test demonstrated better measurement properties than the fixed-option tests, and has become the preferred test format in the FSEP program. The criteria used were reliability, errors of measurement and fit to the item response model. Conclusions: The position taken is that decisions about the number of response options be made at the item level, with plausible options being added to complete each item on both psychometric and educational grounds rather than complying with a uniform policy. The point is to construct the better performing item in providing the best psychometric and educational information. PMID:23453056
Zoanetti, Nathan; Beaves, Mark; Griffin, Patrick; Wallace, Euan M
2013-03-04
Despite the widespread use of multiple-choice assessments in medical education assessment, current practice and published advice concerning the number of response options remains equivocal. This article describes an empirical study contrasting the quality of three 60 item multiple-choice test forms within the Royal Australian and New Zealand College of Obstetricians and Gynaecologists (RANZCOG) Fetal Surveillance Education Program (FSEP). The three forms are described below. The first form featured four response options per item. The second form featured three response options, having removed the least functioning option from each item in the four-option counterpart. The third test form was constructed by retaining the best performing version of each item from the first two test forms. It contained both three and four option items. Psychometric and educational factors were taken into account in formulating an approach to test construction for the FSEP. The four-option test performed better than the three-option test overall, but some items were improved by the removal of options. The mixed-option test demonstrated better measurement properties than the fixed-option tests, and has become the preferred test format in the FSEP program. The criteria used were reliability, errors of measurement and fit to the item response model. The position taken is that decisions about the number of response options be made at the item level, with plausible options being added to complete each item on both psychometric and educational grounds rather than complying with a uniform policy. The point is to construct the better performing item in providing the best psychometric and educational information.
Lazenby, Mark; Dixon, Jane; Bai, Mei; McCorkle, Ruth
2014-02-01
Distress screening guidelines call for rapid screening for emotional distress at the time of cancer diagnosis. The purpose of this study was to examine the distress thermometer's (DT) ability to screen in patients in treatment for advanced cancer who may be depressed. Using cross-sectional data collected from patients within 30 days of diagnosis with advanced cancer, this study used ROC analysis to determine the optimal cutoff point of the distress thermometer (DT) for screening for depression as measured by the Patient Health Questionnaire (PHQ)-9; inter-test reliability analysis to compare the DT with the PHQ-2 for screening in possible cases of depression; and multivariate analysis to examine associations among the DT emotional problem list (EPL) items with cases of depression. The average age of the 123 patients in the study was 59.9 (12.9) years. Seventy (56.9%) were female. All had Stage 3 or 4 cancers (40% gastrointestinal, 19% gynecologic, 20% head and neck, 21% lung). The mean DT score was 4 (2.7)/10; and 56 (43%) were depressed as measured by the PHQ-9 ≥ 5. The optimal DT cut-off score to screen in possible cases of depression was ≥ 2/10, with a sensitivity of .96, compared to a sensitivity of .32 of the PHQ-2 ≥ 2. Correlation coefficients for the DT ≥ 2 and the PHQ-2 with the PHQ-9 ≥ 5 were 0.4 and -0.2, respectively. EPL items associated with cases of depression were Depression (OR = 0.15, 0.02-0.85) and Sadness (OR = 0.21, 0.06-0.72). The optimal DT threshold for identifying possible cases of depression at the time of diagnosis is ≥ 2; this threshold is more sensitive than the PHQ-2 ≥ 2. EPL items may be used with the DT score to triage patients for evaluation.
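Note on the cutoff analysis in the record above: sensitivity and specificity at a given screening threshold, and a search over candidate thresholds, can be sketched as follows. The abstract does not say which criterion defined the "optimal" cutoff; maximizing Youden's J (sensitivity + specificity - 1) is an assumption made here for illustration. The function names and the demo data are hypothetical and unrelated to the study's patients.

```python
def sens_spec(scores, is_case, cutoff):
    """Sensitivity and specificity when 'score >= cutoff' flags a case."""
    pairs = list(zip(scores, is_case))
    tp = sum(1 for s, c in pairs if c and s >= cutoff)
    fn = sum(1 for s, c in pairs if c and s < cutoff)
    tn = sum(1 for s, c in pairs if not c and s < cutoff)
    fp = sum(1 for s, c in pairs if not c and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)

def best_cutoff(scores, is_case):
    """Cutoff maximizing Youden's J (an assumed criterion, see lead-in)."""
    def youden(cutoff):
        sens, spec = sens_spec(scores, is_case, cutoff)
        return sens + spec - 1
    return max(sorted(set(scores)), key=youden)

# Made-up screening scores and reference-standard labels.
demo_scores = [0, 1, 1, 2, 3, 4, 6, 8]
demo_cases = [False, False, True, False, True, True, True, True]
```

A screening programme that prioritizes sensitivity over specificity, as distress screening often does, might instead fix a minimum sensitivity and pick the highest cutoff that still meets it.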
Detecting Gender Bias Through Test Item Analysis
NASA Astrophysics Data System (ADS)
González-Espada, Wilson J.
2009-03-01
Many physical science and physics instructors might not be trained in pedagogically appropriate test construction methods. This could lead to test items that do not measure what they are intended to measure. A subgroup of these items might show bias against some groups of students. This paper describes how the author became aware of potentially biased items against females in his examinations, which led to the exploration of fundamental issues related to item validity, gender bias, and differential item functioning, or DIF. A brief discussion of DIF in the context of university courses, as well as practical suggestions to detect possible gender-biased items, follows.
Estimating Total-test Scores from Partial Scores in a Matrix Sampling Design.
ERIC Educational Resources Information Center
Sachar, Jane; Suppes, Patrick
It is sometimes desirable to obtain an estimated total-test score for an individual who was administered only a subset of the items in a total test. The present study compared six methods, two of which utilize the content structure of items, to estimate total-test scores using 450 students in grades 3-5 and 60 items of the 110-item Stanford Mental…
Differential item functioning analysis of the Vanderbilt Expertise Test for cars
Lee, Woo-Yeol; Cho, Sun-Joo; McGugin, Rankin W.; Van Gulick, Ana Beth; Gauthier, Isabel
2015-01-01
The Vanderbilt Expertise Test for cars (VETcar) is a test of visual learning for contemporary car models. We used item response theory to assess the VETcar and in particular used differential item functioning (DIF) analysis to ask if the test functions the same way in laboratory versus online settings and for different groups based on age and gender. An exploratory factor analysis found evidence of multidimensionality in the VETcar, although a single dimension was deemed sufficient to capture the recognition ability measured by the test. We selected a unidimensional three-parameter logistic item response model to examine item characteristics and subject abilities. The VETcar had satisfactory internal consistency. A substantial number of items showed DIF at a medium effect size for test setting and for age group, whereas gender DIF was negligible. Because online subjects were on average older than those tested in the lab, we focused on the age groups to conduct a multigroup item response theory analysis. This revealed that most items on the test favored the younger group. DIF could be more the rule than the exception when measuring performance with familiar object categories, therefore posing a challenge for the measurement of either domain-general visual abilities or category-specific knowledge. PMID:26418499
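Note on the model named in the record above: under the three-parameter logistic (3PL) model, the probability of a correct response depends on ability θ, discrimination a, difficulty b and a lower asymptote c. The sketch below shows that response function and, as a simplified illustration of uniform DIF, the gap between two groups whose difficulty parameters differ. The function names and parameter values are hypothetical; this is not the multigroup analysis used in the VETcar study.

```python
import math

def p_3pl(theta, a, b, c):
    """3PL probability of a correct response at ability theta."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def dif_gap(theta, a, b_ref, b_focal, c):
    """Reference-minus-focal probability at the same ability.

    Assumes the groups differ only in difficulty (uniform DIF),
    which is a simplification chosen for this illustration.
    """
    return p_3pl(theta, a, b_ref, c) - p_3pl(theta, a, b_focal, c)
```

A positive gap at a given θ means that examinees of equal ability in the focal group are less likely to answer correctly, which is the pattern DIF analysis is meant to detect.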
ERIC Educational Resources Information Center
Gattamorta, Karina A.; Penfield, Randall D.; Myers, Nicholas D.
2012-01-01
Measurement invariance is a common consideration in the evaluation of the validity and fairness of test scores when the tested population contains distinct groups of examinees, such as examinees receiving different forms of a translated test. Measurement invariance in polytomous items has traditionally been evaluated at the item-level,…
Science Library of Test Items. Volume Two.
ERIC Educational Resources Information Center
New South Wales Dept. of Education, Sydney (Australia).
The second volume of test items in the Science Library of Test Items is intended as a resource to assist teachers in implementing and evaluating science courses in the first 4 years of Australian secondary school. The items were selected from questions submitted to the School Certificate Development Unit by teachers in New South Wales. Only the…
Measuring the Instructional Sensitivity of ESL Reading Comprehension Items.
ERIC Educational Resources Information Center
Brutten, Sheila R.; And Others
A study attempted to estimate the instructional sensitivity of items in three reading comprehension tests in English as a second language (ESL). Instructional sensitivity is a test-item construct defined as the tendency for a test item to vary in difficulty as a function of instruction. Similar tasks were given to readers at different proficiency…
Reducing the Impact of Inappropriate Items on Reviewable Computerized Adaptive Testing
ERIC Educational Resources Information Center
Yen, Yung-Chin; Ho, Rong-Guey; Liao, Wen-Wei; Chen, Li-Ju
2012-01-01
In a test, the score would be closer to the examinee's actual ability if careless mistakes were corrected. In computerized adaptive testing (CAT), however, changing the answer to one item might make the items that follow no longer appropriate for estimating the examinee's ability. These inappropriate items in a reviewable CAT might in turn introduce bias in ability…
ERIC Educational Resources Information Center
Lau, C. Allen; Wang, Tianyou
The purposes of this study were to: (1) extend the sequential probability ratio testing (SPRT) procedure to polytomous item response theory (IRT) models in computerized classification testing (CCT); (2) compare polytomous items with dichotomous items using the SPRT procedure for their accuracy and efficiency; (3) study a direct approach in…
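Note on the procedure named in the record above: the sequential probability ratio test accumulates a log-likelihood ratio between two ability points on either side of a cut score and stops once Wald's bounds are crossed. The sketch below uses dichotomous items under a two-parameter logistic model, which is a simplification of the polytomous extension the study describes. All names, parameter values and error rates are assumptions for illustration.

```python
import math

def p_2pl(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def sprt_classify(responses, items, theta_lo, theta_hi, alpha=0.05, beta=0.05):
    """Return (decision, items_used) for 0/1 responses.

    items holds (a, b) pairs; theta_lo and theta_hi bracket the cut score.
    """
    upper = math.log((1 - beta) / alpha)   # cross it: classify as "pass"
    lower = math.log(beta / (1 - alpha))   # cross it: classify as "fail"
    llr = 0.0
    for n, (u, (a, b)) in enumerate(zip(responses, items), start=1):
        p_hi = p_2pl(theta_hi, a, b)
        p_lo = p_2pl(theta_lo, a, b)
        if u == 1:
            llr += math.log(p_hi / p_lo)
        else:
            llr += math.log((1 - p_hi) / (1 - p_lo))
        if llr >= upper:
            return "pass", n
        if llr <= lower:
            return "fail", n
    return "undecided", len(responses)

# Hypothetical pool of 20 identical items, for illustration only.
demo_items = [(1.5, 0.0)] * 20
```

With these demo items each correct answer adds 0.75 to the log-likelihood ratio, so a run of consistent responses reaches a decision after four items.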
A Conditional Exposure Control Method for Multidimensional Adaptive Testing
ERIC Educational Resources Information Center
Finkelman, Matthew; Nering, Michael L.; Roussos, Louis A.
2009-01-01
In computerized adaptive testing (CAT), ensuring the security of test items is a crucial practical consideration. A common approach to reducing item theft is to define maximum item exposure rates, i.e., to limit the proportion of examinees to whom a given item can be administered. Numerous methods for controlling exposure rates have been proposed…
ERIC Educational Resources Information Center
Downing, Steven M.; Maatsch, Jack L.
To test the effect of clinically relevant multiple-choice item content on the validity of statistical discriminations of physicians' clinical competence, data were collected from a field test of the Emergency Medicine Examination, test items for the certification of specialists in emergency medicine. Two 91-item multiple-choice subscales were…
The Effect of Including or Excluding Students with Testing Accommodations on IRT Calibrations.
ERIC Educational Resources Information Center
Karkee, Thakur; Lewis, Dan M.; Barton, Karen; Haug, Carolyn
This study aimed to determine the degree to which the inclusion of accommodated students with disabilities in the calibration sample affects the characteristics of item parameters and the test results. Investigated were effects on test reliability, item fit to the applicable item response theory (IRT) model, item parameter estimates, and students'…
Scheijen, Jean L J M; Clevers, Egbert; Engelen, Lian; Dagnelie, Pieter C; Brouns, Fred; Stehouwer, Coen D A; Schalkwijk, Casper G
2016-01-01
The aim of this study was to validate an ultra-performance liquid chromatography tandem mass-spectrometry (UPLC-MS/MS) method for the determination of advanced glycation endproducts (AGEs) in food items and to analyze AGEs in a selection of food items commonly consumed in a Western diet. N(ε)-(carboxymethyl)lysine (CML), N(ε)-(1-carboxyethyl)lysine (CEL) and N(δ)-(5-hydro-5-methyl-4-imidazolon-2-yl)-ornithine (MG-H1) were quantified in the protein fractions of 190 food items using UPLC-MS/MS. Intra- and inter-day accuracy and precision were 2-29%. The calibration curves showed perfect linearity in water and food matrices. We found the highest AGE levels in high-heat processed nut or grain products, and canned meats. Fruits, vegetables, butter and coffee had the lowest AGE content. The described method proved to be suitable for the quantification of three major AGEs in food items. The presented dietary AGE database opens the possibility to further quantify actual dietary exposure to AGEs and to explore its physiological impact on human health. Copyright © 2015 Elsevier Ltd. All rights reserved.
Three controversies over item disclosure in medical licensure examinations.
Park, Yoon Soo; Yang, Eunbae B
2015-01-01
In response to views on the public's right to know, there is growing attention to item disclosure - release of items, answer keys, and performance data to the public - in medical licensure examinations and their potential impact on the test's ability to measure competence and select qualified candidates. Recent debates on this issue have sparked legislative action internationally, including in South Korea, with prior discussions among North American countries dating back over three decades. The purpose of this study is to identify and analyze three issues associated with item disclosure in medical licensure examinations - 1) fairness and validity, 2) impact on passing levels, and 3) utility of item disclosure - by synthesizing existing literature in relation to standards in testing. Historically, the controversy over item disclosure has centered on fairness and validity. Proponents of item disclosure stress test takers' right to know, while opponents argue from a validity perspective. Item disclosure may bias item characteristics, such as difficulty and discrimination, and has consequences for setting passing levels. To date, there has been limited research on the utility of item disclosure for large-scale testing. These issues require ongoing and careful consideration.
Online Calibration of Polytomous Items Under the Generalized Partial Credit Model
Zheng, Yi
2016-01-01
Online calibration is a technology-enhanced architecture for item calibration in computerized adaptive tests (CATs). Many CATs are administered continuously over a long term and rely on large item banks. To ensure test validity, these item banks need to be frequently replenished with new items, and these new items need to be pretested before being used operationally. Online calibration dynamically embeds pretest items in operational tests and calibrates their parameters as response data are gradually obtained through the continuous test administration. This study extends existing formulas, procedures, and algorithms for dichotomous item response theory models to the generalized partial credit model, a popular model for items scored in more than two categories. A simulation study was conducted to investigate the developed algorithms and procedures under a variety of conditions, including two estimation algorithms, three pretest item selection methods, three seeding locations, two numbers of score categories, and three calibration sample sizes. Results demonstrated acceptable estimation accuracy of the two estimation algorithms in some of the simulated conditions. A variety of findings were also revealed for the interacted effects of included factors, and recommendations were made respectively. PMID:29881063
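Note on the model named in the record above: in the generalized partial credit model, the probability of each score category is proportional to the exponential of a cumulative sum of a(θ - b_v) over the step parameters passed so far, with category 0 assigned an empty sum. The sketch below computes those category probabilities only. It does not attempt the online calibration algorithms of the study, and the function name and parameters are hypothetical.

```python
import math

def gpcm_probs(theta, a, steps):
    """Category probabilities for one item with len(steps) + 1 categories."""
    # Cumulative sums of a * (theta - b_v); category 0 has an empty sum of 0.
    cum = [0.0]
    for b in steps:
        cum.append(cum[-1] + a * (theta - b))
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(cum)
    weights = [math.exp(z - top) for z in cum]
    total = sum(weights)
    return [w / total for w in weights]
```

With a single step parameter the function reduces to the two-parameter logistic model, which is a quick consistency check on the implementation.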
Evaluating Statistical Targets for Assembling Parallel Mixed-Format Test Forms
ERIC Educational Resources Information Center
Debeer, Dries; Ali, Usama S.; van Rijn, Peter W.
2017-01-01
Test assembly is the process of selecting items from an item pool to form one or more new test forms. Often new test forms are constructed to be parallel with an existing (or an ideal) test. Within the context of item response theory, the test information function (TIF) or the test characteristic curve (TCC) are commonly used as statistical…
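Note on the targets named in the record above: for a form of two-parameter logistic items, the test information function is the sum of a²P(1 - P) over items and the test characteristic curve is the sum of P. The sketch below assumes dichotomous 2PL items for brevity, although the study concerns mixed-format forms. The function names and item parameters are hypothetical.

```python
import math

def p_2pl(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def tif(theta, items):
    """Test information: sum of a^2 * P * (1 - P) over (a, b) items."""
    total = 0.0
    for a, b in items:
        p = p_2pl(theta, a, b)
        total += a * a * p * (1.0 - p)
    return total

def tcc(theta, items):
    """Test characteristic curve: expected number-correct score."""
    return sum(p_2pl(theta, a, b) for a, b in items)

# Two identical hypothetical items, for illustration only.
demo_form = [(1.0, 0.0), (1.0, 0.0)]
```

Two candidate forms could be compared by evaluating either function on a grid of θ values and looking at the largest absolute difference.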
Nickel and cobalt release from jewellery and metal clothing items in Korea.
Cheong, Seung Hyun; Choi, You Won; Choi, Hae Young; Byun, Ji Yeon
2014-01-01
In Korea, the prevalence of nickel allergy has shown a sharply increasing trend. Cobalt contact allergy is often associated with concomitant reactions to nickel, and is more common in Korea than in western countries. The aim of the present study was to investigate the prevalence of items that release nickel and cobalt on the Korean market. A total of 471 items that included 193 branded jewellery, 202 non-branded jewellery and 76 metal clothing items were sampled and studied with a dimethylglyoxime (DMG) test and a cobalt spot test to detect nickel and cobalt release, respectively. Nickel release was detected in 47.8% of the tested items. The positive rates in the DMG test were 12.4% for the branded jewellery, 70.8% for the non-branded jewellery, and 76.3% for the metal clothing items. Cobalt release was found in 6.2% of items. Among the types of jewellery, belts and hair pins showed higher positive rates in both the DMG test and the cobalt spot test. Our study shows that the prevalence of items that release nickel or cobalt among jewellery and metal clothing items is high in Korea. © 2013 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
The Role of Item Feedback in Self-Adapted Testing.
ERIC Educational Resources Information Center
Roos, Linda L.; And Others
1997-01-01
The importance of item feedback in self-adapted testing was studied by comparing feedback and no feedback conditions for computerized adaptive tests and self-adapted tests taken by 363 college students. Results indicate that item feedback is not necessary to realize score differences between self-adapted and computerized adaptive testing. (SLD)
Criterion-Referenced Test Items for Auto Body.
ERIC Educational Resources Information Center
Tannehill, Dana, Ed.
This test item bank on auto body repair contains criterion-referenced test questions based upon competencies found in the Missouri Auto Body Competency Profile. Some test items are keyed for multiple competencies. The tests cover the following 26 competency areas in the auto body curriculum: auto body careers; measuring and mixing; tools and…
Automated Test-Form Generation
ERIC Educational Resources Information Center
van der Linden, Wim J.; Diao, Qi
2011-01-01
In automated test assembly (ATA), the methodology of mixed-integer programming is used to select test items from an item bank to meet the specifications for a desired test form and optimize its measurement accuracy. The same methodology can be used to automate the formatting of the set of selected items into the actual test form. Three different…
Passik, Steven D; Inman, Alice; Kirsh, Kenneth; Theobald, Dale; Dickerson, Pamela
2003-03-01
The problem of boredom in people with cancer has received little research attention, and yet clinical experience suggests that it has the potential to profoundly affect quality of life in those patients. We were interested in developing a Purposelessness, Understimulation, and Boredom (PUB) Scale to identify this problem and to begin to differentiate it from depression. Cancer patients and professionals were interviewed using a semi-structured format to elicit their perceptions of the incidence, causes, scope, and consequences of boredom. From their responses, 45 questions were developed, edited for clarity, and piloted. A total of 100 cancer patients were recruited to participate in the study. Preliminary validation of the PUB using a cross-sectional survey of the measure was conducted. Other instruments used for purposes of convergent and divergent validity included the Functional Assessment of Cancer Therapy Scale-Anemia, Zung Self-Rating Depression Scale, Boredom Proneness Scale, Leisure Boredom Scale, Cancer Behavior Inventory, Systems of Belief Inventory, and the Eastern Cooperative Oncology Group Performance Status Scale. The average age of the sample was 62.37 years (SD = 13.43) and was comprised of 60 women (60.00%) and 40 men (40.00%). The results of a factor analysis on the 45 initial items (selected on the basis of professional and patient interviews) created a two-factor scale. The eight items from the strongest factor (items 1, 2, 3, 4, 5, 6, 9, 10) seemed to best tap the construct that could be deemed as overt boredom whereas the six items of the second factor (items 36, 38, 39, 42, 44, 45) seemed to tap the construct of boredom related to meaning and spirituality. Total scale internal consistency, when all 14 items were included in the analysis, yielded a coefficient alpha of 0.84 and good test-retest reliability at 2 weeks (r = .80, p < .001). 
The novel 14-item PUB Scale was significantly correlated with other measures of boredom: the Boredom Proneness Scale (r = -.588, p < .001) and the Leisure Boredom Scale (r = .576, p < .001). The PUB Scale was found to be a statistically viable tool with the ability to detect boredom and differentiate it from depression. In many respects this work is in concert with much of the current research and clinical effort in psycho-oncology that defines components of distress that, in sum, redefine depression in advanced cancer.
ERIC Educational Resources Information Center
Kouimanos, John, Ed.
As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items of value from past tests are made available to teachers for the construction of unit tests, term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The…
The Advanced Trauma Operative Management course in a Canadian residency program
Ali, Jameel; Ahmed, Najma; Jacobs, Lenworth M.; Luk, Stephen S.
2008-01-01
Background: The Advanced Trauma Operative Management (ATOM) course was first introduced into Canada in 2003 at the University of Toronto, with senior general surgery residents being the primary focus. We present an assessment of the course in this Canadian general surgery residency program. Methods: We compared trainees' pre- and postcourse self-efficacy scores and multiple choice question (MCQ) examination results, using paired t tests and resident (n = 24) and faculty (n = 7) course ratings made according to a 10-item, 5-point Likert scale. Faculty were previously trained as ATOM instructors. Results: Mean pre- and postcourse self-efficacy scores were 68.9 (standard deviation [SD] 24.0) and 101.4 (SD 14.8), respectively (p < 0.001). Mean pre- and post-MCQ scores were 16.4 (SD 3.2) and 18.8 (SD 2.7), respectively (p = 0.006). On the Likert scale (1 = strongly disagree, 5 = strongly agree), all faculty and residents rated the following items as 4–5: objectives were met; knowledge, skills, clinical training, judgment and confidence improved; the live animal is a useful representation of clinical trauma; and the course should be continued but would be more appropriate for the fourth rather than the fifth year of residency. Residents rated as 1–2 the item that the human cadaver would be preferable for learning the surgical skills. Of 24 residents, 20 rated as 3 or less the item stating that the course prepares them for trauma management more adequately than their regular training program. Conclusion: Self-efficacy, trauma knowledge and skills improved significantly with ATOM training. Preference was expressed for the live animal versus cadaver model, for ATOM training in the fourth rather than fifth year of residency and for the view that it complements general surgery trauma training. The data suggest that including ATOM training in Canadian general surgical residency should be considered. PMID:18682791
A Computerized Interactive Vocabulary Development System for Advanced Learners.
ERIC Educational Resources Information Center
Kukulska-Hulme, Agnes
1988-01-01
Argues that the process of recording newly encountered vocabulary items in a typical language learning situation can be improved through a computerized system of vocabulary storage based on database management software that improves the discovery and recording of meaning, subsequent retrieval of items for productive use, and memory retention.…
Solving the measurement invariance anchor item problem in item response theory.
Meade, Adam W; Wright, Natalie A
2012-09-01
The efficacy of tests of differential item functioning (measurement invariance) has been well established. It is clear that when properly implemented, these tests can successfully identify differentially functioning (DF) items when they exist. However, an assumption of these analyses is that the metric for different groups is linked using anchor items that are invariant. In practice, however, it is impossible to be certain which items are DF and which are invariant. This problem of anchor items, or referent indicators, has long plagued invariance research, and a multitude of suggested approaches have been put forth. Unfortunately, the relative efficacy of these approaches has not been tested. This study compares 11 variations on 5 qualitatively different approaches from recent literature for selecting optimal anchor items. A large-scale simulation study indicates that for nearly all conditions, an easily implemented 2-stage procedure recently put forth by Lopez Rivas, Stark, and Chernyshenko (2009) provided optimal power while maintaining nominal Type I error. With this approach, appropriate anchor items can be easily and quickly located, resulting in more efficacious invariance tests. Recommendations for invariance testing are illustrated using a pedagogical example of employee responses to an organizational culture measure.
When Listening Is Better Than Reading: Performance Gains on Cardiac Auscultation Test Questions.
Short, Kathleen; Bucak, S Deniz; Rosenthal, Francine; Raymond, Mark R
2018-05-01
In 2007, the United States Medical Licensing Examination embedded multimedia simulations of heart sounds into multiple-choice questions. This study investigated changes in item difficulty as determined by examinee performance over time. The data reflect outcomes obtained following initial use of multimedia items from 2007 through 2012, after which an interface change occurred. A total of 233,157 examinees responded to 1,306 cardiology test items over the six-year period; 138 items included multimedia simulations of heart sounds, while 1,168 text-based items without multimedia served as controls. The authors compared changes in difficulty of multimedia items over time with changes in difficulty of text-based cardiology items over time. Further, they compared changes in item difficulty for both groups of items between graduates of Liaison Committee on Medical Education (LCME)-accredited and non-LCME-accredited (i.e., international) medical schools. Examinee performance on cardiology test items with multimedia heart sounds improved by 12.4% over the six-year period, while performance on text-based cardiology items improved by approximately 1.4%. These results were similar for graduates of LCME-accredited and non-LCME-accredited medical schools. Examinees' ability to interpret auscultation findings in test items that include multimedia presentations increased from 2007 to 2012.
Revisiting the role of recollection in item versus forced-choice recognition memory.
Cook, Gabriel I; Marsh, Richard L; Hicks, Jason L
2005-08-01
Many memory theorists have assumed that forced-choice recognition tests can rely more on familiarity, whereas item (yes-no) tests must rely more on recollection. In actuality, several studies have found no differences in the contributions of recollection and familiarity underlying the two different test formats. Using word frequency to manipulate stimulus characteristics, the present study demonstrated that the contributions of recollection to item versus forced-choice tests are variable. Low word frequency resulted in significantly more recollection in an item test than did a forced-choice procedure, but high word frequency produced the opposite result. These results clearly constrain any uniform claim about the degree to which recollection supports responding in item versus forced-choice tests.
A Comparison of Methods of Vertical Equating.
ERIC Educational Resources Information Center
Loyd, Brenda H.; Hoover, H. D.
Rasch model vertical equating procedures were applied to three mathematics computation tests for grades six, seven, and eight. Each level of the test was composed of 45 items in three sets of 15 items, arranged in such a way that tests for adjacent grades had two sets (30 items) in common, and the sixth and eighth grades had 15 items in common. In…
ERIC Educational Resources Information Center
Zebehazy, Kim T.; Zigmond, Naomi; Zimmerman, George J.
2012-01-01
Introduction: This study investigated differential item functioning (DIF) of test items on Pennsylvania's Alternate System of Assessment (PASA) for students with visual impairments and severe cognitive disabilities and what the reasons for the differences may be. Methods: The Wilcoxon signed ranks test was used to analyze differences in the scores…
Objective and Item Banking Computer Software and Its Use in Comprehensive Achievement Monitoring.
ERIC Educational Resources Information Center
Schriber, Peter E.; Gorth, William P.
The current emphasis on objectives and test item banks for constructing more effective tests is being augmented by increasingly sophisticated computer software. Items can be catalogued in numerous ways for retrieval. The items as well as instructional objectives can be stored and test forms can be selected and printed by the computer. It is also…
48 CFR 32.403 - Applicability.
Code of Federal Regulations, 2011 CFR
2011-10-01
... Section 32.403 Federal Acquisition Regulations System FEDERAL ACQUISITION REGULATION GENERAL CONTRACTING REQUIREMENTS CONTRACT FINANCING Advance Payments for Non-Commercial Items 32.403 Applicability. Advance payments may be considered useful and appropriate for the following: (a) Contracts for experimental...
An Item-Driven Adaptive Design for Calibrating Pretest Items. Research Report. ETS RR-14-38
ERIC Educational Resources Information Center
Ali, Usama S.; Chang, Hua-Hua
2014-01-01
Adaptive testing is advantageous in that it provides more efficient ability estimates with fewer items than linear testing does. Item-driven adaptive pretesting may also offer similar advantages, and verification of such a hypothesis about item calibration was the main objective of this study. A suitability index (SI) was introduced to adaptively…
Fitting the Rasch Model to Account for Variation in Item Discrimination
ERIC Educational Resources Information Center
Weitzman, R. A.
2009-01-01
Building on the Kelley and Gulliksen versions of classical test theory, this article shows that a logistic model having only a single item parameter can account for varying item discrimination, as well as difficulty, by using item-test correlations to adjust incorrect-correct (0-1) item responses prior to an initial model fit. The fit occurs…
Weighted Maximum-a-Posteriori Estimation in Tests Composed of Dichotomous and Polytomous Items
ERIC Educational Resources Information Center
Sun, Shan-Shan; Tao, Jian; Chang, Hua-Hua; Shi, Ning-Zhong
2012-01-01
For mixed-type tests composed of dichotomous and polytomous items, polytomous items often yield more information than dichotomous items. To reflect the difference between the two types of items and to improve the precision of ability estimation, an adaptive weighted maximum-a-posteriori (WMAP) estimation is proposed. To evaluate the performance of…
ERIC Educational Resources Information Center
Sengul Avsar, Asiye; Tavsancil, Ezel
2017-01-01
This study analysed polytomous items' psychometric properties according to nonparametric item response theory (NIRT) models. Thus, simulated datasets--three different test lengths (10, 20 and 30 items), three sample distributions (normal, right and left skewed) and three sample sizes (100, 250 and 500)--were generated by conducting 20…
Rasch Measurement and Item Banking: Theory and Practice.
ERIC Educational Resources Information Center
Nakamura, Yuji
The Rasch Model is a one-parameter item response theory model stating that the probability of a correct response on a test is a function of the difficulty of the item and the ability of the candidate. Item banking is useful for language testing. The Rasch Model provides estimates of item difficulties that are meaningful,…
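As a minimal sketch of the relationship described in this abstract, the standard logistic form of the Rasch model can be written as a short function. The function name and the example values are illustrative assumptions, not taken from the report.

```python
import math


def rasch_probability(ability: float, difficulty: float) -> float:
    """Probability of a correct response under the Rasch model.

    P(correct) = 1 / (1 + exp(-(ability - difficulty))),
    with ability and difficulty on the same logit scale.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


# Hypothetical candidate and items, for illustration only.
for difficulty in (-1.0, 0.0, 1.0):
    p = rasch_probability(ability=0.5, difficulty=difficulty)
    print(f"difficulty={difficulty:+.1f}  P(correct)={p:.3f}")
```

When ability equals difficulty the probability is exactly 0.5, which is what makes the difficulty estimates directly interpretable on the ability scale.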
Test Design Project: Studies in Test Bias. Annual Report.
ERIC Educational Resources Information Center
McArthur, David
Item bias in a multiple-choice test can be detected by appropriate analyses of the persons x items scoring matrix. This permits comparison of groups of examinees tested with the same instrument. The test may be biased if it is not measuring the same thing in comparable groups, if groups are responding to different aspects of the test items, or if…
ERIC Educational Resources Information Center
Truell, Allen D.; Zhao, Jensen J.; Alexander, Melody W.
2005-01-01
The purposes of this study were to determine if there is a significant difference in postsecondary business student scores and test completion time based on settable test item exposure control interface format, and to determine if there is a significant difference in student scores and test completion time based on settable test item exposure…
Estimating Total-Test Scores from Partial Scores in a Matrix Sampling Design.
ERIC Educational Resources Information Center
Sachar, Jane; Suppes, Patrick
1980-01-01
The present study compared six methods, two of which utilize the content structure of items, to estimate total-test scores using 450 students and 60 items of the 110-item Stanford Mental Arithmetic Test. Three methods yielded fairly good estimates of the total-test score. (Author/RL)
Code of Federal Regulations, 2010 CFR
2010-10-01
... adequate security; (ii) The advance payments will not exceed the unpaid contract price (see 32.410(b... REQUIREMENTS CONTRACT FINANCING Advance Payments for Non-Commercial Items 32.402 General. (a) A limitation on authority to grant advance payments under Pub. L. 85-804 (50 U.S.C. 1431-1435) is described at 50.102-3(b)(4...
ERIC Educational Resources Information Center
Penfield, Randall D.; Algina, James
2006-01-01
One approach to measuring unsigned differential test functioning is to estimate the variance of the differential item functioning (DIF) effect across the items of the test. This article proposes two estimators of the DIF effect variance for tests containing dichotomous and polytomous items. The proposed estimators are direct extensions of the…
Smolen, Tomasz; Chuderski, Adam
2015-01-01
Fluid intelligence (Gf) is a crucial cognitive ability that involves abstract reasoning in order to solve novel problems. Recent research demonstrated that Gf strongly depends on the individual effectiveness of working memory (WM). We investigated a popular claim that if the storage capacity underlay the WM-Gf correlation, then such a correlation should increase with an increasing number of items or rules (load) in a Gf-test. As often no such link is observed, on that basis the storage-capacity account is rejected, and alternative accounts of Gf (e.g., related to executive control or processing speed) are proposed. Using both analytical inference and numerical simulations, we demonstrated that the load-dependent change in correlation is primarily a function of the amount of floor/ceiling effect for particular items. Thus, the item-wise WM correlation of a Gf-test depends on its overall difficulty, and the difficulty distribution across its items. When the early test items yield huge ceiling, but the late items do not approach floor, that correlation will increase throughout the test. If the early items locate themselves between ceiling and floor, but the late items approach floor, the respective correlation will decrease. For a hallmark Gf-test, the Raven-test, whose items span from ceiling to floor, the quadratic relationship is expected, and it was shown empirically using a large sample and two types of WMC tasks. In consequence, no changes in correlation due to varying WM/Gf load, or lack of them, can yield an argument for or against any theory of WM/Gf. Moreover, as the mathematical properties of the correlation formula make it relatively immune to ceiling/floor effects for overall moderate correlations, only minor changes (if any) in the WM-Gf correlation should be expected for many psychological tests.
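The floor/ceiling argument in this abstract can be illustrated with a small simulation. This is a sketch under assumed conditions, not the authors' procedure: the latent correlation of 0.6, the sample size, and the pass thresholds are all hypothetical choices made for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def item_wm_correlation(threshold, latent_r=0.6, n=20000, seed=1):
    """Correlation between a WM score and a pass/fail Gf item.

    The item is passed when latent Gf exceeds `threshold`; a very low
    threshold mimics a ceiling item, a very high one a floor item.
    """
    rng = random.Random(seed)
    wm, item = [], []
    for _ in range(n):
        w = rng.gauss(0.0, 1.0)
        g = latent_r * w + math.sqrt(1.0 - latent_r ** 2) * rng.gauss(0.0, 1.0)
        wm.append(w)
        item.append(1.0 if g > threshold else 0.0)
    return pearson(wm, item)


for label, thr in (("ceiling item", -2.0), ("moderate item", 0.0), ("floor item", 2.0)):
    print(label, round(item_wm_correlation(thr), 3))
```

The latent WM-Gf relationship is identical for all three items, yet the observed item-wise correlation is largest for the moderate item, which matches the abstract's point that such changes reflect item difficulty rather than load.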
NASA Technical Reports Server (NTRS)
1974-01-01
The Terminal Area Compatibility (TAC) study is briefly summarized for background information. The most important research items for the areas of noise, congestion, and emissions are identified. Other key research areas are also discussed. The 50 recommended research items are categorized by flight phase, technology, and compatibility benefits. The relationship of the TAC recommendations to the previous ATT recommendations is discussed. The bulk of the document contains the 50 recommended research items. For each item, the potential payoff, state of readiness, recommended action and estimated cost and schedule are given.
Overview of Recent Flight Flutter Testing Research at NASA Dryden
NASA Technical Reports Server (NTRS)
Brenner, Martin J.; Lind, Richard C.; Voracek, David F.
1997-01-01
In response to the concerns of the aeroelastic community, NASA Dryden Flight Research Center, Edwards, California, is conducting research into improving the flight flutter (including aeroservoelasticity) test process with more accurate and automated techniques for stability boundary prediction. The important elements of this effort so far include the following: (1) excitation mechanisms for enhanced vibration data to reduce uncertainty levels in stability estimates; (2) investigation of a variety of frequency, time, and wavelet analysis techniques for signal processing, stability estimation, and nonlinear identification; and (3) robust flutter boundary prediction to substantially reduce the test matrix for flutter clearance. These are critical research topics addressing the concerns of a recent AGARD Specialists' Meeting on Advanced Aeroservoelastic Testing and Data Analysis. This paper addresses these items using flight test data from the F/A-18 Systems Research Aircraft and the F/A-18 High Alpha Research Vehicle.
Item response theory analysis of the mechanics baseline test
NASA Astrophysics Data System (ADS)
Cardamone, Caroline N.; Abbott, Jonathan E.; Rayyan, Saif; Seaton, Daniel T.; Pawl, Andrew; Pritchard, David E.
2012-02-01
Item response theory is useful in both the development and evaluation of assessments and in computing standardized measures of student performance. In item response theory, individual parameters (difficulty, discrimination) for each item or question are fit by item response models. These parameters provide a means for evaluating a test and offer a better measure of student skill than a raw test score, because each skill calculation considers not only the number of questions answered correctly, but the individual properties of all questions answered. Here, we present the results from an analysis of the Mechanics Baseline Test given at MIT during 2005-2010. Using the item parameters, we identify questions on the Mechanics Baseline Test that are not effective in discriminating between MIT students of different abilities. We show that a limited subset of the highest quality questions on the Mechanics Baseline Test returns accurate measures of student skill. We compare student skills as determined by item response theory to the more traditional measurement of the raw score and show that a comparable measure of learning gain can be computed.
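One way to see how difficulty and discrimination parameters can flag weakly discriminating questions is the item information function of a two-parameter logistic (2PL) model. The sketch below assumes a 2PL model and uses made-up item parameters; the abstract does not state which model or values the authors used.

```python
import math


def p_2pl(theta, a, b):
    """2PL probability of a correct answer (a: discrimination, b: difficulty)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information of a 2PL item at skill level theta: a^2 * P * (1 - P)."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)


def rank_items(items, theta):
    """Return item names ordered from most to least informative at theta."""
    return sorted(items, key=lambda name: item_information(theta, *items[name]),
                  reverse=True)


# Hypothetical (discrimination, difficulty) pairs, for illustration only.
items = {"Q1": (0.3, 0.0), "Q2": (1.5, 0.2), "Q3": (1.0, -1.0)}
print(rank_items(items, theta=0.0))
```

An item with low discrimination, such as the hypothetical Q1, contributes little information at any skill level, which is the sense in which a subset of high-quality questions can carry most of the measurement.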
Computerized adaptive testing: the capitalization on chance problem.
Olea, Julio; Barrada, Juan Ramón; Abad, Francisco J; Ponsoda, Vicente; Cuevas, Lara
2012-03-01
This paper describes several simulation studies that examine the effects of capitalization on chance in the selection of items and the ability estimation in CAT, employing the 3-parameter logistic model. In order to generate different estimation errors for the item parameters, the calibration sample size was manipulated (N = 500, 1000 and 2000 subjects) as was the ratio of item bank size to test length (banks of 197 and 788 items, test lengths of 20 and 40 items), both in a CAT and in a random test. Results show that capitalization on chance is particularly serious in CAT, as revealed by the large positive bias found in the small sample calibration conditions. For broad ranges of theta, the overestimation of the precision (asymptotic Se) reaches levels of 40%, something that does not occur with the RMSE (theta). The problem is greater as the item bank size to test length ratio increases. Potential solutions were tested in a second study, where two exposure control methods were incorporated into the item selection algorithm. Some alternative solutions are discussed.
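The overestimation of precision described here can be sketched with a deliberately simplified simulation: every item has the same true 3-parameter logistic (3PL) parameters, only the discrimination estimates carry calibration noise, and items are picked by maximum estimated information at a single ability value. The noise level, seed, and parameter values are assumptions for illustration and do not reproduce the paper's design.

```python
import math
import random


def p_3pl(theta, a, b, c):
    """3PL probability of a correct response (c: pseudo-guessing)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def info_3pl(theta, a, b, c):
    """3PL item information: a^2 * (Q / P) * ((P - c) / (1 - c))^2."""
    p = p_3pl(theta, a, b, c)
    return a * a * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2


def simulate_selection(bank_size=197, test_length=20, noise_sd=0.2, seed=0):
    """Return (believed, actual) test information for the selected items."""
    rng = random.Random(seed)
    theta, true_a, b, c = 0.0, 1.0, 0.0, 0.2
    # Noisy discrimination estimates, as from a small calibration sample.
    est_a = [max(0.1, true_a + rng.gauss(0.0, noise_sd)) for _ in range(bank_size)]
    est_info = [info_3pl(theta, a, b, c) for a in est_a]
    # Maximum-information selection picks the largest (over)estimates.
    chosen = sorted(range(bank_size), key=lambda i: est_info[i], reverse=True)[:test_length]
    believed = sum(est_info[i] for i in chosen)
    actual = test_length * info_3pl(theta, true_a, b, c)
    return believed, actual


believed, actual = simulate_selection()
print(f"believed information: {believed:.2f}, actual information: {actual:.2f}")
```

Because selection favors items whose discrimination happens to be overestimated, the information the algorithm believes it has exceeds the information actually delivered, and the gap widens as the bank grows relative to the test length.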
ERIC Educational Resources Information Center
Öztürk-Gübes, Nese; Kelecioglu, Hülya
2016-01-01
The purpose of this study was to examine the impact of dimensionality, common-item set format, and different scale linking methods on preserving equity property with mixed-format test equating. Item response theory (IRT) true-score equating (TSE) and IRT observed-score equating (OSE) methods were used under common-item nonequivalent groups design.…
Force Limited Vibration Testing
NASA Technical Reports Server (NTRS)
Scharton, Terry; Chang, Kurng Y.
2005-01-01
This slide presentation reviews the concept and applications of Force Limited Vibration Testing. The goal of vibration testing of aerospace hardware is to identify problems that would result in flight failures. The commonly used aerospace vibration tests use artificially high shaker forces and responses at the resonance frequencies of the test item. It has become common to limit the acceleration responses in the test to those predicted for the flight. This requires an analysis of the acceleration response, and requires placing accelerometers on the test item. With the advent of piezoelectric gages it has become possible to improve vibration testing. The basic equations are reviewed. Force limits are analogous and complementary to the acceleration specifications used in conventional vibration testing. Just as the acceleration specification is the frequency spectrum envelope of the in-flight acceleration at the interface between the test item and flight mounting structure, the force limit is the envelope of the in-flight force at the interface. In force limited vibration tests, both the acceleration and force specifications are needed, and the force specification is generally based on and proportional to the acceleration specification. Therefore, force limiting does not compensate for errors in the development of the acceleration specification, e.g., too much conservatism or the lack thereof. These errors will carry over into the force specification. Since in-flight vibratory force data are scarce, force limits are often derived from coupled system analyses and impedance information obtained from measurements or finite element models (FEM). Fortunately, data on the interface forces between systems and components are now available from system acoustic and vibration tests of development test models and from a few flight experiments.
Semi-empirical methods of predicting force limits are currently being developed on the basis of the limited flight and system test data. A simple two degree of freedom system is shown and the governing equations for basic force limiting results for this system are reviewed. The design and results of the shuttle vibration forces (SVF) experiments are reviewed. The Advanced Composition Explorer (ACE) also was used to validate force limiting. Test instrumentation and supporting equipment are reviewed including piezo-electric force transducers, signal processing and conditioning systems, test fixtures, and vibration controller systems. Several examples of force limited vibration testing are presented with some results.
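The statement that the force specification is proportional to the acceleration specification can be sketched as follows. This assumes a commonly cited semi-empirical form in which the force spectral density equals (C x M)^2 times the acceleration spectral density up to a break frequency and rolls off above it; the constant C, the mass M, the break frequency, and the roll-off exponent used here are illustrative assumptions, not values from the presentation.

```python
def force_limit_psd(freq_hz, accel_psd, c, mass, f_break_hz, rolloff_n=2.0):
    """Semi-empirical force spectral density limit (illustrative sketch).

    At or below the break frequency: S_FF = (c * mass)^2 * S_AA.
    Above it the limit is rolled off by (f_break / f)^rolloff_n.
    Units follow the inputs, e.g. accel_psd in g^2/Hz and mass as weight
    in lb give a force limit in lb^2/Hz.
    """
    base = (c * mass) ** 2 * accel_psd
    if freq_hz <= f_break_hz:
        return base
    return base * (f_break_hz / freq_hz) ** rolloff_n


# Hypothetical flat acceleration specification and test item.
for f in (50.0, 100.0, 200.0, 400.0):
    limit = force_limit_psd(f, accel_psd=0.04, c=1.2, mass=10.0, f_break_hz=100.0)
    print(f"{f:6.0f} Hz  force limit = {limit:.3f}")
```

Because the limit scales directly with the acceleration spectral density, any conservatism or error in the acceleration specification passes straight through to the force limit, as the abstract notes.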
ERIC Educational Resources Information Center
Ali, Usama S.; Chang, Hua-Hua; Anderson, Carolyn J.
2015-01-01
Polytomous items are typically described by multiple category-related parameters; situations, however, arise in which a single index is needed to describe an item's location along a latent trait continuum. Situations in which a single index would be needed include item selection in computerized adaptive testing or test assembly. Therefore single…
Designing a Virtual Item Bank Based on the Techniques of Image Processing
ERIC Educational Resources Information Center
Liao, Wen-Wei; Ho, Rong-Guey
2011-01-01
One of the major weaknesses of the item exposure rates of figural items in Intelligence Quotient (IQ) tests lies in their inaccuracy. In this study, a new approach is proposed and a useful test tool known as the Virtual Item Bank (VIB) is introduced. The VIB combines Automatic Item Generation theory and image processing theory with the concepts of…
The Rasch Model and Missing Data, with an Emphasis on Tailoring Test Items.
ERIC Educational Resources Information Center
de Gruijter, Dato N. M.
Many applications of educational testing have a missing data aspect (MDA). This MDA is perhaps most pronounced in item banking, where each examinee responds to a different subtest of items from a large item pool and where both person and item parameter estimates are needed. The Rasch model is emphasized, and its non-parametric counterpart (the…
Three controversies over item disclosure in medical licensure examinations
Park, Yoon Soo; Yang, Eunbae B.
2015-01-01
In response to views on the public's right to know, there is growing attention to item disclosure – release of items, answer keys, and performance data to the public – in medical licensure examinations and their potential impact on the test's ability to measure competence and select qualified candidates. Recent debates on this issue have sparked legislative action internationally, including South Korea, with prior discussions among North American countries dating back over three decades. The purpose of this study is to identify and analyze three issues associated with item disclosure in medical licensure examinations – 1) fairness and validity, 2) impact on passing levels, and 3) utility of item disclosure – by synthesizing existing literature in relation to standards in testing. Historically, the controversy over item disclosure has centered on fairness and validity. Proponents of item disclosure stress test takers’ right to know, while opponents argue from a validity perspective. Item disclosure may bias item characteristics, such as difficulty and discrimination, and has consequences on setting passing levels. To date, there has been limited research on the utility of item disclosure for large scale testing. These issues require ongoing and careful consideration. PMID:26374693
Code of Federal Regulations, 2013 CFR
2013-10-01
... 48 Federal Acquisition Regulations System 5 2013-10-01 2013-10-01 false Interest. 1432.407 Section... REQUIREMENTS CONTRACT FINANCING Advance Payments for Non-Commercial Items 1432.407 Interest. The HCA may authorize advance payments without interest pursuant to FAR 32.407. ...
Code of Federal Regulations, 2012 CFR
2012-10-01
... 48 Federal Acquisition Regulations System 7 2012-10-01 2012-10-01 false Interest. 2932.407 Section... CONTRACT FINANCING Advance Payments for Non-Commercial Items 2932.407 Interest. The HCA may authorize advance payments without interest pursuant to FAR 32.407. ...
Code of Federal Regulations, 2012 CFR
2012-10-01
... 48 Federal Acquisition Regulations System 5 2012-10-01 2012-10-01 false Interest. 1432.407 Section... REQUIREMENTS CONTRACT FINANCING Advance Payments for Non-Commercial Items 1432.407 Interest. The HCA may authorize advance payments without interest pursuant to FAR 32.407. ...
Code of Federal Regulations, 2010 CFR
2010-10-01
... 48 Federal Acquisition Regulations System 7 2010-10-01 2010-10-01 false Interest. 2932.407 Section... CONTRACT FINANCING Advance Payments for Non-Commercial Items 2932.407 Interest. The HCA may authorize advance payments without interest pursuant to FAR 32.407. ...
Code of Federal Regulations, 2010 CFR
2010-10-01
... 48 Federal Acquisition Regulations System 5 2010-10-01 2010-10-01 false Interest. 1432.407 Section... REQUIREMENTS CONTRACT FINANCING Advance Payments for Non-Commercial Items 1432.407 Interest. The HCA may authorize advance payments without interest pursuant to FAR 32.407. ...
Code of Federal Regulations, 2013 CFR
2013-10-01
... 48 Federal Acquisition Regulations System 7 2013-10-01 2012-10-01 true Interest. 2932.407 Section... CONTRACT FINANCING Advance Payments for Non-Commercial Items 2932.407 Interest. The HCA may authorize advance payments without interest pursuant to FAR 32.407. ...
Code of Federal Regulations, 2014 CFR
2014-10-01
... 48 Federal Acquisition Regulations System 5 2014-10-01 2014-10-01 false Interest. 1432.407 Section... REQUIREMENTS CONTRACT FINANCING Advance Payments for Non-Commercial Items 1432.407 Interest. The HCA may authorize advance payments without interest pursuant to FAR 32.407. ...
Code of Federal Regulations, 2011 CFR
2011-10-01
... 48 Federal Acquisition Regulations System 7 2011-10-01 2011-10-01 false Interest. 2932.407 Section... CONTRACT FINANCING Advance Payments for Non-Commercial Items 2932.407 Interest. The HCA may authorize advance payments without interest pursuant to FAR 32.407. ...
Code of Federal Regulations, 2011 CFR
2011-10-01
... 48 Federal Acquisition Regulations System 5 2011-10-01 2011-10-01 false Interest. 1432.407 Section... REQUIREMENTS CONTRACT FINANCING Advance Payments for Non-Commercial Items 1432.407 Interest. The HCA may authorize advance payments without interest pursuant to FAR 32.407. ...
Code of Federal Regulations, 2014 CFR
2014-10-01
... 48 Federal Acquisition Regulations System 7 2014-10-01 2014-10-01 false Interest. 2932.407 Section... CONTRACT FINANCING Advance Payments for Non-Commercial Items 2932.407 Interest. The HCA may authorize advance payments without interest pursuant to FAR 32.407. ...
Chien, Tsair-Wei; Shao, Yang; Kuo, Shu-Chun
2017-01-10
Many continuous item responses (CIRs) are encountered in healthcare settings, but no one uses item response theory's (IRT) probabilistic modeling to present graphical presentations for interpreting CIR results. A computer module that is programmed to deal with CIRs is required. The objectives were to present a computer module, validate it, and verify its usefulness in dealing with CIR data, and then to apply the model to real healthcare data in order to show how the CIR module can be applied to healthcare settings, with an example regarding a safety attitude survey. Using Microsoft Excel VBA (Visual Basic for Applications), we designed a computer module that minimizes the residuals and calculates the model's expected scores according to person responses across items. Rasch models based on a Wright map and on KIDMAP were demonstrated to interpret results of the safety attitude survey. The author-made CIR module yielded OUTFIT mean square (MNSQ) and person measures equivalent to those yielded by professional Rasch Winsteps software. The probabilistic modeling of the CIR module provides messages that are much more valuable to users and show the CIR advantage over classic test theory. Because of advances in computer technology, healthcare users who are familiar with MS Excel can easily apply the study CIR module to deal with continuous variables to benefit comparisons of data with a logistic distribution and model fit statistics.
NASA Technical Reports Server (NTRS)
Cook, J.; Dumbacher, D.; Ise, M.; Singer, C.
1990-01-01
A modified space shuttle main engine (SSME), which primarily includes an enlarged throat main combustion chamber with the acoustic cavities removed and a main injector with the stability control baffles removed, was tested. This one-of-a-kind engine's design changes are being evaluated for potential incorporation in the shuttle flight program in the mid-1990's. Engine testing was initiated on September 15, 1988 and has accumulated 1,915 seconds and 19 starts. Testing is being conducted to characterize the engine system performance, combustion stability with the baffle-less injector, and both low pressure oxidizer turbopump (LPOTP) and high pressure oxidizer turbopump (HPOTP) for suction performance. These test results are summarized and compared with the SSME flight configuration data base. Testing of this new generation SSME is the first product from the technology test bed (TTB). Future test plans for the TTB include the highly instrumented flight configuration SSME and advanced liquid propulsion technology items.
Silent Aircraft Initiative Concept Risk Assessment
NASA Technical Reports Server (NTRS)
Nickol, Craig L.
2008-01-01
A risk assessment of the Silent Aircraft Initiative's SAX-40 concept design for extremely low noise has been performed. A NASA team developed a list of 27 risk items, and evaluated the level of risk for each item in terms of the likelihood that the risk would occur and the consequences of the occurrence. The following risk items were identified as high risk, meaning that the combination of likelihood and consequence put them into the top one-fourth of the risk matrix: structures and weight prediction; boundary-layer ingestion (BLI) and inlet design; variable-area exhaust and thrust vectoring; displaced-threshold and continuous descent approach (CDA) operational concepts; cost; human factors; and overall noise performance. Several advanced-technology baseline concepts were created to serve as a basis for comparison to the SAX-40 concept. These comparisons indicate that the SAX-40 would have significantly greater research, development, test, and engineering (RDT&E) and production costs than a conventional aircraft with similar technology levels. Therefore, the cost of obtaining the extremely low noise capability that has been estimated for the SAX-40 is significant. The SAX-40 concept design proved successful in focusing attention toward low noise technologies and in raising public awareness of the issue.
Bayesian Item Selection in Constrained Adaptive Testing Using Shadow Tests
ERIC Educational Resources Information Center
Veldkamp, Bernard P.
2010-01-01
Application of Bayesian item selection criteria in computerized adaptive testing might result in improvement of bias and MSE of the ability estimates. The question remains how to apply Bayesian item selection criteria in the context of constrained adaptive testing, where large numbers of specifications have to be taken into account in the item…
Mathematics Library of Test Items. Volume One.
ERIC Educational Resources Information Center
Fraser, Graham, Ed.
As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items of value from previous tests are made available to teachers for the construction of pretests or posttests, reference tests for inter-class comparisons and general assignments. The collection was reviewed for content…
Are Learning Disabled Students "Test-Wise?": An Inquiry into Reading Comprehension Test Items.
ERIC Educational Resources Information Center
Scruggs, Thomas E.; Lifson, Steve
The ability to correctly answer reading comprehension test items, without having read the accompanying reading passage, was compared for third grade learning disabled students and their peers from a regular classroom. In the first experiment, fourteen multiple choice items were selected from the Stanford Achievement Test. No reading passages were…
Agriculture Library of Test Items.
ERIC Educational Resources Information Center
Sutherland, Duncan, Ed.
As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items of value from past tests are made available to teachers for the construction of unit tests, term examinations or as a basis for class discussion. Each collection is reviewed for content validity and reliability. The test…
ERIC Educational Resources Information Center
Bermundo, Cesar B.; Bermundo, Alex B.; Ballester, Rex C.
2012-01-01
iBank is a project that uses software to create an item bank that stores quality questions, generates tests, and prints exams. The items come from analyzed teacher-constructed test questions, which provide the basis for discussing test results by determining why a test item is or is not discriminating between the better and poorer students, and by…
Effects of Test Item Disclosure on Medical Licensing Examination
ERIC Educational Resources Information Center
Yang, Eunbae B.; Lee, Myung Ae; Park, Yoon Soo
2018-01-01
In 2012, the National Health Personnel Licensing Examination Board of Korea decided to publicly disclose all test items and answers to satisfy the test takers' right to know and enhance the transparency of tests administered by the government. This study investigated the effects of item disclosure on the medical licensing examination (MLE),…
Controlling Item Exposure Conditional on Ability in Computerized Adaptive Testing.
ERIC Educational Resources Information Center
Stocking, Martha L.; Lewis, Charles
1998-01-01
Ensuring item and pool security in a continuous testing environment is explored through a new method of controlling exposure rate of items conditional on ability level in computerized testing. Properties of this conditional control on exposure rate, when used in conjunction with a particular adaptive testing algorithm, are explored using simulated…
Battalion Combat Operations Center (COC) Test. Volume II. Test Report,
1982-02-08
reveal, perhaps, that item X can perform a task faster than item Y. A utility assessment from an experienced, knowledgeable test participant, however...can ascertain whether or not item X can better enable him to accomplish his mission than item Y. 2.4 GENERALIZED TEST FACILITY. The capabilities of... [Table residue: mixes A-D rated on a scale running from easy to very difficult for ability to control data and ability to exploit data]
V-TECS Criterion-Referenced Test Item Bank for Radiologic Technology Occupations.
ERIC Educational Resources Information Center
Reneau, Fred; And Others
This Vocational-Technical Education Consortium of States (V-TECS) criterion-referenced test item bank provides 696 multiple-choice items and 33 matching items for radiologic technology occupations. These job titles are included: radiologic technologist, chief; radiologic technologist; nuclear medicine technologist; radiation therapy technologist;…
The multiple sclerosis visual pathway cohort: understanding neurodegeneration in MS.
Martínez-Lapiscina, Elena H; Fraga-Pumar, Elena; Gabilondo, Iñigo; Martínez-Heras, Eloy; Torres-Torres, Ruben; Ortiz-Pérez, Santiago; Llufriu, Sara; Tercero, Ana; Andorra, Magi; Roca, Marc Figueras; Lampert, Erika; Zubizarreta, Irati; Saiz, Albert; Sanchez-Dalmau, Bernardo; Villoslada, Pablo
2014-12-15
Multiple Sclerosis (MS) is an immune-mediated disease of the Central Nervous System with two major underlying etiopathogenic processes: inflammation and neurodegeneration. The latter determines the prognosis of this disease. MS is the main cause of non-traumatic disability in middle-aged populations. The MS-VisualPath Cohort was set up to study the neurodegenerative component of MS using advanced imaging techniques by focusing on analysis of the visual pathway in a middle-aged MS population in Barcelona, Spain. We started the recruitment of patients in the early phase of MS in 2010 and it remains permanently open. All patients undergo a complete neurological and ophthalmological examination including measurements of physical and cognitive disability (Expanded Disability Status Scale; Multiple Sclerosis Functional Composite and neuropsychological tests), disease activity (relapses) and visual function testing (visual acuity, color vision and visual field). The MS-VisualPath protocol also assesses the presence of anxiety and depressive symptoms (Hospital Anxiety and Depression Scale), general quality of life (SF-36) and visual quality of life (25-Item National Eye Institute Visual Function Questionnaire with the 10-Item Neuro-Ophthalmic Supplement). In addition, the imaging protocol includes both retinal (Optical Coherence Tomography and Wide-Field Fundus Imaging) and brain imaging (Magnetic Resonance Imaging). Finally, multifocal Visual Evoked Potentials are used to perform neurophysiological assessment of the visual pathway. The analysis of the visual pathway with advanced imaging and electrophysiological tools in parallel with clinical information will provide significant and new knowledge regarding neurodegeneration in MS and provide new clinical and imaging biomarkers to help monitor disease progression in these patients.
ERIC Educational Resources Information Center
Magno, Carlo
2009-01-01
The present report demonstrates the difference between the classical test theory (CTT) and item response theory (IRT) approaches using actual test data from junior high school chemistry students. The CTT and IRT were compared across two samples and two forms of test on their item difficulty, internal consistency, and measurement errors. The specific…
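As a rough illustration of the two frameworks named in this abstract, the sketch below computes a CTT item difficulty (the proportion of examinees answering correctly) and a one-parameter (Rasch) response probability. The response matrix and function names are hypothetical and are not taken from the report; the report's own estimation procedure is not described in the excerpt.

```python
import math


def ctt_difficulty(responses):
    """CTT item difficulty: proportion of examinees answering each item correctly."""
    n_examinees = len(responses)
    n_items = len(responses[0])
    return [sum(row[j] for row in responses) / n_examinees for j in range(n_items)]


def rasch_probability(theta, b):
    """Rasch model: probability of a correct response for ability theta and item difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


# Hypothetical scored responses (1 = correct): rows are examinees, columns are items.
responses = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 1, 0],
]

p_values = ctt_difficulty(responses)       # one proportion per item
p_matched = rasch_probability(0.0, 0.0)    # ability equal to difficulty
```

Note the opposite directions of the two scales: a higher CTT proportion means an easier item, whereas a higher Rasch difficulty `b` means a harder one.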
Modeling Local Item Dependence Due to Common Test Format with a Multidimensional Rasch Model
ERIC Educational Resources Information Center
Baghaei, Purya; Aryadoust, Vahid
2015-01-01
Research shows that test method can exert a significant impact on test takers' performance and thereby contaminate test scores. We argue that common test method can exert the same effect as common stimuli and violate the conditional independence assumption of item response theory models because, in general, subsets of items which have a shared…
Garcia, Sofia F.; Hahn, Elizabeth A.; Magasi, Susan; Lai, Jin-Shei; Semik, Patrick; Hammel, Joy; Heinemann, Allen W.
2014-01-01
Objective To describe the development of new self-report measures of social attitudes that act as environmental facilitators or barriers to the participation of people with disabilities in society. Design A mixed methods approach included a literature review; item classification, selection and writing; cognitive interviews and field testing with participants with spinal cord injury (SCI), traumatic brain injury (TBI) or stroke; and rating scale analysis to evaluate initial psychometric properties. Setting General community. Participants Nine individuals with SCI, TBI or stroke participated in cognitive interviews; 305 community residents with those same conditions participated in field testing. Interventions None. Main Outcome Measure(s) Self-report item pool of social attitudes that act as facilitators or barriers to people with disabilities participating in society. Results An interdisciplinary team of experts classified 710 existing social environment items into content areas and wrote 32 new items. Additional qualitative item review included item refinement and winnowing of the pool prior to cognitive interviews and field testing 82 items. Field test data indicated that the pool satisfies a one-parameter item response theory measurement model and would be appropriate for development into a calibrated item bank. Conclusions Our qualitative item review process supported a social environment conceptual framework that includes both social support and social attitudes. We developed a new social attitudes self-report item pool. Calibration testing of that pool is underway with a larger sample in order to develop a social attitudes item bank for persons with disabilities. PMID:25045803
Garcia, Sofia F; Hahn, Elizabeth A; Magasi, Susan; Lai, Jin-Shei; Semik, Patrick; Hammel, Joy; Heinemann, Allen W
2015-04-01
To describe the development of new self-report measures of social attitudes that act as environmental facilitators or barriers to the participation of people with disabilities in society. A mixed-methods approach included a literature review; item classification, selection, and writing; cognitive interviews and field testing of participants with spinal cord injury (SCI), traumatic brain injury (TBI), or stroke; and rating scale analysis to evaluate initial psychometric properties. General community. Individuals with SCI, TBI, or stroke participated in cognitive interviews (n=9); community residents with those same conditions participated in field testing (n=305). None. Self-report item pool of social attitudes that act as facilitators or barriers to people with disabilities participating in society. An interdisciplinary team of experts classified 710 existing social environment items into content areas and wrote 32 new items. Additional qualitative item review included item refinement and winnowing of the pool prior to cognitive interviews and field testing of 82 items. Field test data indicated that the pool satisfies a 1-parameter item response theory measurement model and would be appropriate for development into a calibrated item bank. Our qualitative item review process supported a social environment conceptual framework that includes both social support and social attitudes. We developed a new social attitudes self-report item pool. Calibration testing of that pool is underway with a larger sample to develop a social attitudes item bank for persons with disabilities. Copyright © 2015 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
Mitchell, Alex J; Smith, Adam B; Al-salihy, Zerak; Rahim, Twana A; Mahmud, Mahmud Q; Muhyaldin, Asma S
2011-10-01
We aimed to redefine the optimal self-report symptoms of depression suitable for creation of an item bank that could be used in computer adaptive testing or to develop a simplified screening tool for DSM-V. Four hundred subjects (200 patients with primary depression and 200 non-depressed subjects), living in Iraqi Kurdistan were interviewed. The Mini International Neuropsychiatric Interview (MINI) was used to define the presence of major depression (DSM-IV criteria). We examined symptoms of depression using four well-known scales delivered in Kurdish. The Partial Credit Model was applied to each instrument. Common-item equating was subsequently used to create an item bank and differential item functioning (DIF) explored for known subgroups. A symptom level Rasch analysis reduced the original 45 items to 24 after the exclusion of 21 misfitting items. A further six items (CESD13 and CESD17, HADS-D4, HADS-D5 and HADS-D7, and CDSS3 and CDSS4) were removed due to misfit as the items were added together to form the item bank, and two items were subsequently removed following the DIF analysis by diagnosis (CESD20 and CDSS9, both of which were harder to endorse for women). Therefore the remaining optimal item bank consisted of 17 items and produced an area under the curve (AUC) of 0.987. Using a bank restricted to the optimal nine items revealed only minor loss of accuracy (AUC = 0.989, sensitivity 96%, specificity 95%). Finally, when restricted to only four items accuracy was still high (AUC was still 0.976; sensitivity 93%, specificity 96%). An item bank of 17 items may be useful in computer adaptive testing and nine or even four items may be used to develop a simplified screening tool for DSM-V major depressive disorder (MDD). Further examination of this item bank should be conducted in different cultural settings.
Kalpakjian, Claire Z.; Tate, Denise G.; Kisala, Pamela A.; Tulsky, David S.
2015-01-01
Objective To describe the development and psychometric properties of the Spinal Cord Injury-Quality of Life (SCI-QOL) Self-esteem item bank. Design Using a mixed-methods design, we developed and tested a self-esteem item bank through the use of focus groups with individuals with SCI and clinicians with expertise in SCI, cognitive interviews, and item-response theory- (IRT) based analytic approaches, including tests of model fit, differential item functioning (DIF) and precision. Setting We tested a pool of 30 items at several medical institutions across the United States, including the University of Michigan, Kessler Foundation, the Rehabilitation Institute of Chicago, the University of Washington, Craig Hospital, and the James J. Peters/Bronx Department of Veterans Affairs hospital. Participants A total of 717 individuals with SCI completed the self-esteem items. Results A unidimensional model was observed (CFI = 0.946; RMSEA = 0.087) and measurement precision was good (theta range between −2.7 and 0.7). Eleven items were flagged for DIF; however, effect sizes were negligible with little practical impact on score estimates. The final calibrated item bank resulted in 23 retained items. Conclusion This study indicates that the SCI-QOL Self-esteem item bank represents a psychometrically robust measurement tool. Short form items are also suggested and computer adaptive tests are available. PMID:26010972
Kalpakjian, Claire Z; Tate, Denise G; Kisala, Pamela A; Tulsky, David S
2015-05-01
To describe the development and psychometric properties of the Spinal Cord Injury-Quality of Life (SCI-QOL) Self-esteem item bank. Using a mixed-methods design, we developed and tested a self-esteem item bank through the use of focus groups with individuals with SCI and clinicians with expertise in SCI, cognitive interviews, and item-response theory-(IRT) based analytic approaches, including tests of model fit, differential item functioning (DIF) and precision. We tested a pool of 30 items at several medical institutions across the United States, including the University of Michigan, Kessler Foundation, the Rehabilitation Institute of Chicago, the University of Washington, Craig Hospital, and the James J. Peters/Bronx Department of Veterans Affairs hospital. A total of 717 individuals with SCI completed the self-esteem items. A unidimensional model was observed (CFI=0.946; RMSEA=0.087) and measurement precision was good (theta range between -2.7 and 0.7). Eleven items were flagged for DIF; however, effect sizes were negligible with little practical impact on score estimates. The final calibrated item bank resulted in 23 retained items. This study indicates that the SCI-QOL Self-esteem item bank represents a psychometrically robust measurement tool. Short form items are also suggested and computer adaptive tests are available.
Victorson, David; Tulsky, David S; Kisala, Pamela A; Kalpakjian, Claire Z; Weiland, Brian; Choi, Seung W
2015-05-01
To describe the development and psychometric properties of the Spinal Cord Injury--Quality of Life (SCI-QOL) Resilience item bank and short form. Using a mixed-methods design, we developed and tested a resilience item bank through the use of focus groups with individuals with SCI and clinicians with expertise in SCI, cognitive interviews, and item-response theory based analytic approaches, including tests of model fit and differential item functioning (DIF). We tested a 32-item pool at several medical institutions across the United States, including the University of Michigan, Kessler Foundation, the Rehabilitation Institute of Chicago, the University of Washington, Craig Hospital and the James J. Peters/Bronx Department of Veterans Affairs medical center. A total of 717 individuals with SCI completed the Resilience items. A unidimensional model was observed (CFI=0.968; RMSEA=0.074) and measurement precision was good (theta range between -3.1 and 0.9). Ten items were flagged for DIF, however, after examination of effect sizes we found this to be negligible with little practical impact on score estimates. The final calibrated item bank resulted in 21 retained items. This study indicates that the SCI-QOL Resilience item bank represents a psychometrically robust measurement tool. Short form items are also suggested and computer adaptive tests are available.
Victorson, David; Tulsky, David S.; Kisala, Pamela A.; Kalpakjian, Claire Z.; Weiland, Brian; Choi, Seung W.
2015-01-01
Objective To describe the development and psychometric properties of the Spinal Cord Injury - Quality of Life (SCI-QOL) Resilience item bank and short form. Design Using a mixed-methods design, we developed and tested a resilience item bank through the use of focus groups with individuals with SCI and clinicians with expertise in SCI, cognitive interviews, and item-response theory based analytic approaches, including tests of model fit and differential item functioning (DIF). Setting We tested a 32-item pool at several medical institutions across the United States, including the University of Michigan, Kessler Foundation, the Rehabilitation Institute of Chicago, the University of Washington, Craig Hospital and the James J. Peters/Bronx Department of Veterans Affairs medical center. Participants A total of 717 individuals with SCI completed the Resilience items. Results A unidimensional model was observed (CFI = 0.968; RMSEA = 0.074) and measurement precision was good (theta range between −3.1 and 0.9). Ten items were flagged for DIF, however, after examination of effect sizes we found this to be negligible with little practical impact on score estimates. The final calibrated item bank resulted in 21 retained items. Conclusion This study indicates that the SCI-QOL Resilience item bank represents a psychometrically robust measurement tool. Short form items are also suggested and computer adaptive tests are available. PMID:26010971
Grundgeiger, Tobias
2014-04-01
Retrieving a subset of learned items can lead to the forgetting of related items. Such retrieval-induced forgetting (RIF) can be explained by the inhibition of irrelevant items in order to overcome retrieval competition when the target item is retrieved. According to the retrieval inhibition account, such retrieval competition is a necessary condition for RIF. However, research has indicated that noncompetitive retrieval practice can also cause RIF by strengthening cue-item associations. According to the strength-dependent competition account, the strengthened items interfere with the retrieval of weaker items, resulting in impaired recall of weaker items in the final memory test. The aim of this study was to replicate RIF caused by noncompetitive retrieval practice and to determine whether this forgetting is also observed in recognition tests. In the context of RIF, it has been assumed that recognition tests circumvent interference and, therefore, should not be sensitive to forgetting due to strength-dependent competition. However, this has not been empirically tested, and it has been suggested that participants may reinstate learned cues as retrieval aids during the final test. In the present experiments, competitive practice or noncompetitive practice was followed by either final cued-recall tests or recognition tests. In cued-recall tests, RIF was observed in both competitive and noncompetitive conditions. However, in recognition tests, RIF was observed only in the competitive condition and was absent in the noncompetitive condition. The result underscores the contribution of strength-dependent competition to RIF. However, recognition tests seem to be a reliable way of distinguishing between RIF due to retrieval inhibition or strength-dependent competition.
Translation and adaption of the interRAI suite to local requirements in Belgian hospitals
2012-01-01
Background The interRAI Suite contains comprehensive geriatric assessment tools designed for various healthcare settings. Although each instrument is developed for a particular population, together they form an integrated health evaluation system. The interRAI Acute Care Minimum Data Set (interRAI AC) is tailored for hospitalized older persons. Our aim in this study was to translate and adapt the interRAI AC to the Belgian hospital context, where it can be used together with the interRAI Home Care (HC) and the interRAI Long Term Care Facility (LTCF). Methods A systematic, comprehensive, and rigorous 10-step approach was used to adapt the interRAI AC to local requirements. After linguistic translation by an official translator, five researchers assessed the translation for appropriate hospital jargon. Three researchers double-checked for translation accuracy and proposed additional items. A provisional version was converted into the three official languages of Belgium—Flemish, French, and German. Next, a multidisciplinary panel of nine experts judged item relevance to the Belgian care context and advised which country-specific items should be added. After these suggestions were incorporated into the interRAI AC, hospital staff from nine Flemish hospitals field-tested the tool in their practice. After evaluating field-test results, we compared the interRAI AC with Belgian versions of the interRAI HC and interRAI LTCF. Next, the Flemish, French, and German versions of the Belgian interRAI portfolio were harmonized. Finally, we submitted the Belgian interRAI AC to the interRAI organization for ratification. Results Eighteen administrative items of the interRAI AC were adapted to the Belgian healthcare context (e.g., usual residence, formal community services prior to admission). Fourteen items assessing the ‘informal caregiver’, and 17 items, including country-specific items, were added (e.g., advanced directive for euthanasia). 
Conclusions The interRAI AC was adapted to local requirements using a meticulous and recursive 10-step approach. As use of the interRAI Suite continues to grow worldwide and as it continues to expand to other care settings and populations, this procedure can guide future translations. This procedure might also be used by others facing similar challenges of complex translation and adaptation situations, where multidimensional instruments are used across multiple care settings in multiple languages. PMID:22958520
A Quantum Chemistry Concept Inventory for Physical Chemistry Classes
ERIC Educational Resources Information Center
Dick-Perez, Marilu; Luxford, Cynthia J.; Windus, Theresa L.; Holme, Thomas
2016-01-01
A 14-item, multiple-choice diagnostic assessment tool, the quantum chemistry concept inventory or QCCI, is presented. Items were developed based on published student misconceptions and content coverage and then piloted and used in advanced physical chemistry undergraduate courses. In addition to the instrument itself, data from both a pretest,…
75 FR 25763 - Addition to the List of Validated End-Users: Advanced Micro Devices China, Inc.
Federal Register 2010, 2011, 2012, 2013, 2014
2010-05-10
.... Additional Validated End-User in the PRC and Its Respective ``Eligible Items (By ECCN)'' and ``Eligible... to the ``development'' of products under ECCN 4A003). This authorization was made based on an... Country Validated end-user Eligible items (by ECCN) Eligible destination China (People's Republic of...
Federal Register 2010, 2011, 2012, 2013, 2014
2013-02-26
... Definition To Address Advanced Fuel Designs,'' Using the Consolidated Line Item Improvement Process AGENCY...-specific adoption using the Consolidated Line Item Improvement Process (CLIIP). Additionally, the NRC staff..., which may be more reactive at shutdown temperatures above 68 °F. This STS improvement is part...
Adaptive Mental Testing: The State of the Art
1979-11-01
typically vary in their psychometric properties--particularly in their difficulty--the test designer must decide what configuration of these item...psychometric properties best suits the test's purpose. There are two extreme rationales to guide that decision. One rationale is to choose items that are...development of item response theory (Rasch, 1960; Lord, 1952, 1970, 1974a; Birnbaum, 1968) that provided the needed invariance properties for item
ERIC Educational Resources Information Center
Pohl, Steffi; Gräfe, Linda; Rose, Norman
2014-01-01
Data from competence tests usually show a number of missing responses on test items due to both omitted and not-reached items. Different approaches for dealing with missing responses exist, and there are no clear guidelines on which of those to use. While classical approaches rely on an ignorable missing data mechanism, the most recently developed…
Procedures for Selecting Items for Computerized Adaptive Tests.
ERIC Educational Resources Information Center
Kingsbury, G. Gage; Zara, Anthony R.
1989-01-01
Several classical approaches and alternative approaches to item selection for computerized adaptive testing (CAT) are reviewed and compared. The study also describes procedures for constrained CAT that may be added to classical item selection approaches to allow them to be used for applied testing. (TJH)
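One commonly described item selection rule for CAT is maximum-information selection: administer the unused item that is most informative at the current ability estimate. The sketch below illustrates that rule under a Rasch model. Whether it matches the specific procedures reviewed in the study is an assumption, and the item difficulties and function names are hypothetical.

```python
import math


def rasch_probability(theta, b):
    """Rasch model: probability of a correct response for ability theta and item difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def item_information(theta, b):
    """Fisher information of a Rasch item at ability theta: p * (1 - p)."""
    p = rasch_probability(theta, b)
    return p * (1.0 - p)


def select_next_item(theta, difficulties, administered):
    """Return the index of the unused item with maximum information at theta."""
    candidates = [i for i in range(len(difficulties)) if i not in administered]
    return max(candidates, key=lambda i: item_information(theta, difficulties[i]))


# Hypothetical item pool (difficulty parameters) and a current ability estimate of 0.0.
difficulties = [-1.5, -0.5, 0.2, 1.0, 2.0]
first_item = select_next_item(0.0, difficulties, administered=set())
second_item = select_next_item(0.0, difficulties, administered={first_item})
```

Under the Rasch model, information peaks where item difficulty equals ability, so this rule reduces to picking the unused item whose difficulty is closest to the current estimate. Constraints such as content balancing or exposure control would be layered on top of a rule like this one.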
Efforts Toward the Development of Unbiased Selection and Assessment Instruments.
ERIC Educational Resources Information Center
Rudner, Lawrence M.
Investigations into item bias provide an empirical basis for the identification and elimination of test items which appear to measure different traits across populations or cultural groups. The psychometric rationales for six approaches to the identification of biased test items are reviewed: (1) Transformed item difficulties: within-group…
Effect of Differential Item Functioning on Test Equating
ERIC Educational Resources Information Center
Kabasakal, Kübra Atalay; Kelecioglu, Hülya
2015-01-01
This study examines the effect of differential item functioning (DIF) items on test equating through multilevel item response models (MIRMs) and traditional IRMs. The performances of three different equating models were investigated under 24 different simulation conditions, and the variables whose effects were examined included sample size, test…
Ramsay-Curve Differential Item Functioning
ERIC Educational Resources Information Center
Woods, Carol M.
2011-01-01
Differential item functioning (DIF) occurs when an item on a test, questionnaire, or interview has different measurement properties for one group of people versus another, irrespective of true group-mean differences on the constructs being measured. This article is focused on item response theory based likelihood ratio testing for DIF (IRT-LR or…
ERIC Educational Resources Information Center
Çikirikçi Demirtasli, Nükhet; Ulutas, Seher
2015-01-01
Problem Statement: Item bias occurs when individuals from different groups (different gender, cultural background, etc.) have different probabilities of responding correctly to a test item despite having the same skill levels. It is important that tests or items do not have bias in order to ensure the accuracy of decisions taken according to test…
ERIC Educational Resources Information Center
Egberink, Iris J. L.; Meijer, Rob R.; Tendeiro, Jorge N.
2015-01-01
A popular method to assess measurement invariance of a particular item is based on likelihood ratio tests with all other items as anchor items. The results of this method are often only reported in terms of statistical significance, and researchers proposed different methods to empirically select anchor items. It is unclear, however, how many…
ERIC Educational Resources Information Center
Masters, James S.
2010-01-01
With the need for larger and larger banks of items to support adaptive testing and to meet security concerns, large-scale item generation is a requirement for many certification and licensure programs. As part of the mass production of items, it is critical that the difficulty and the discrimination of the items be known without the need for…
Unilateral neglect: further validation of the baking tray task.
Appelros, Peter; Karlsson, Gunnel M; Thorwalls, Annika; Tham, Kerstin; Nydevik, Ingegerd
2004-11-01
The Baking Tray Task is a comprehensible, simple-to-perform test for use in assessing unilateral neglect. The aim of this study was to validate further its use with stroke patients. The Baking Tray Task was compared with 2 versions of the Behaviour Inattention Test and a test for personal neglect. A total of 270 patients were subjected to a 3-item version of the Behaviour Inattention Test and 40 patients were subjected to an 8-item version of the Behaviour Inattention Test, besides the Baking Tray Task and the personal neglect test. The Baking Tray Task was more sensitive than the 3-item Behaviour Inattention Test, but the 8-item Behaviour Inattention Test was more sensitive than the Baking Tray Task. The best combination of any 3 tests was Baking Tray Task, Reading an article, and Figure copying; the 2 last-mentioned being a part of the 8-item Behaviour Inattention Test. Multi-item tests detect more cases of neglect than do single tests. However, it is tiresome for the patient to undergo a larger test battery than necessary. It is also time-consuming for the staff. Behavioural tests seem more appropriate when assessing neglect. The Baking Tray Task seems to be one of the most sensitive single tests, but its sensitivity can be further enhanced when it is used in combination with other tests.
Adjusting for cross-cultural differences in computer-adaptive tests of quality of life.
Gibbons, C J; Skevington, S M
2018-04-01
Previous studies using the WHOQOL measures have demonstrated that the relationship between individual items and the underlying quality of life (QoL) construct may differ between cultures. If unaccounted for, these differing relationships can lead to measurement bias which, in turn, can undermine the reliability of results. We used item response theory (IRT) to assess differential item functioning (DIF) in WHOQOL data from diverse language versions collected in UK, Zimbabwe, Russia, and India (total N = 1332). Data were fitted to the partial credit 'Rasch' model. We used four item banks previously derived from the WHOQOL-100 measure, which provided excellent measurement for physical, psychological, social, and environmental quality of life domains (40 items overall). Cross-cultural differential item functioning was assessed using analysis of variance for item residuals and post hoc Tukey tests. Simulated computer-adaptive tests (CATs) were conducted to assess the efficiency and precision of the four item banks. Splitting item parameters by DIF results in four linked item banks without DIF or other breaches of IRT model assumptions. Simulated CATs were more precise and efficient than longer paper-based alternatives. Assessing differential item functioning using item response theory can identify measurement invariance between cultures which, if uncontrolled, may undermine accurate comparisons in computer-adaptive testing assessments of QoL. We demonstrate how compensating for DIF using item anchoring allowed data from all four countries to be compared on a common metric, thus facilitating assessments which were both sensitive to cultural nuance and comparable between countries.
Item analysis of three Spanish naming tests: a cross-cultural investigation.
Marquez de la Plata, Carlos; Arango-Lasprilla, Juan Carlos; Alegret, Montse; Moreno, Alexander; Tárraga, Luis; Lara, Mar; Hewlitt, Margaret; Hynan, Linda; Cullum, C Munro
2009-01-01
Neuropsychological evaluations conducted in the United States and abroad commonly include the use of tests translated from English to Spanish. The use of translated naming tests for evaluating predominately Spanish-speakers has recently been challenged on the grounds that translating test items may compromise a test's construct validity. The Texas Spanish Naming Test (TNT) has been developed in Spanish specifically for use with Spanish-speakers; however, it is unlikely patients from diverse Spanish-speaking geographical regions will perform uniformly on a naming test. The present study evaluated and compared the internal consistency and patterns of item-difficulty and -discrimination for the TNT and two commonly used translated naming tests in three countries (i.e., United States, Colombia, Spain). Two hundred fifty two subjects (136 demented, 116 nondemented) across three countries were administered the TNT, Modified Boston Naming Test-Spanish, and the naming subtest from the CERAD. The TNT demonstrated superior internal consistency to its counterparts, a superior item difficulty pattern than the CERAD naming test, and a superior item discrimination pattern than the MBNT-S across countries. Overall, all three Spanish naming tests differentiated nondemented and moderately demented individuals, but the results suggest the items of the TNT are most appropriate to use with Spanish-speakers. Preliminary normative data for the three tests examined in each country are provided.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Turkington, T.
This education session will cover the physics and operation principles of gamma cameras and PET scanners. The first talk will focus on PET imaging. An overview of the principles of PET imaging will be provided, including positron decay physics, and the transition from 2D to 3D imaging. More recent advances in hardware and software will be discussed, such as time-of-flight imaging, and improvements in reconstruction algorithms that provide for options such as depth-of-interaction corrections. Quantitative applications of PET will be discussed, as well as the requirements for doing accurate quantitation. Relevant performance tests will also be described. Learning Objectives: Be able to describe basic physics principles of PET and operation of PET scanners. Learn about recent advances in PET scanner hardware technology. Be able to describe advances in reconstruction techniques and improvements. Be able to list relevant performance tests. The second talk will focus on gamma cameras. The Nuclear Medicine subcommittee has charged a task group (TG177) to develop a report on the current state of physics testing of gamma cameras, SPECT, and SPECT/CT systems. The report makes recommendations for performance tests to be done for routine quality assurance, annual physics testing, and acceptance tests, and identifies those needed to satisfy the ACR accreditation program and The Joint Commission imaging standards. The report is also intended to be used as a manual with detailed instructions on how to perform tests under widely varying conditions. Learning Objectives: At the end of the presentation members of the audience will: Be familiar with the tests recommended for routine quality assurance, annual physics testing, and acceptance tests of gamma cameras for planar imaging. Be familiar with the tests recommended for routine quality assurance, annual physics testing, and acceptance tests of SPECT systems. Be familiar with the tests of a SPECT/CT system that include the CT images for SPECT reconstructions. Become knowledgeable of items to be included in annual acceptance testing reports including CT dosimetry and PACS monitor measurements. T. Turkington, GE Healthcare.
The Effects of an Afternoon Nap on Episodic Memory in Young and Older Adults
Fairley, Jacqueline; Decker, Michael J.; Bliwise, Donald L.
2017-01-01
Study Objectives: In young adults, napping is hypothesized to benefit episodic memory retention (eg, via consolidation). Whether this relationship is present in older adults has not been adequately tested but is an important question because older adults display marked changes in sleep and memory. Design: Between-subjects design. Setting: Sleep laboratory at Emory University School of Medicine. Participants: Fifty healthy young adults (18–29) and 45 community-dwelling older adults (58–83). Intervention: Participants were randomly assigned to a 90-minute nap opportunity or an equal interval of quiet wakefulness. Measurements and Results: Participants underwent an item-wise directed forgetting learning procedure in which they studied words that were individually followed by the instruction to “remember” or “forget.” Following a 90-minute retention interval filled with quiet wakefulness or a nap opportunity, they were asked to free recall and recognize those words. Young adults retained significantly more words following a nap interval than a quiet wakefulness interval on both free recall and recognition tests. There was modest evidence for greater nap-related retention of “remember” items relative to “forget” items for free recall but not recognition. Older adults’ memory retention did not differ across nap and quiet wakefulness conditions, although they demonstrated greater fragmentation, lower N3, and lower rapid eye movement duration than the young adults. Conclusions: In young adults, an afternoon nap benefits episodic memory retention, but such benefits decrease with advancing age. PMID:28329381
Ivanova, Masha Y; Achenbach, Thomas; Leite, Manuela; Almeida, Vera; Caldas, Carlos; Turner, Lori; Dumas, Julie A
2018-05-01
As the world population ages, mental health professionals increasingly need empirically supported assessment instruments for older adult psychopathology. This study tested the degree to which syndromes derived from self-ratings of psychopathology by elders in the US would fit self-ratings by elders in Portugal. The Older Adult Self-Report (OASR) was completed by 352 60- to 102-year-olds in Portuguese community and residential settings. Confirmatory factor analyses tested the fit of the 7-syndrome OASR model to self-ratings by Portuguese elders. The primary fit index (Root Mean Square Error of Approximation) showed good fit, while secondary fit indices (the Comparative Fit Index and the Tucker-Lewis Index) showed acceptable fit. Loadings of 95 of the 97 items on their expected syndromes were statistically significant (mean = .63), indicating that the items measured the syndromes well. Correlations between latent factors, ie, between the hypothesized syndrome constructs measured by the items, averaged .66. The correlations between syndromes reflect varying degrees of comorbidity between problems comprising particular pairs of syndromes. The results support the syndrome structure of the OASR for Portuguese elders, offering Portuguese clinicians and researchers a useful instrument for assessing a broad spectrum of psychopathology. The results also offer a core of empirically supported taxonomic constructs of later life psychopathology as a basis for advancing clinical practice, training, and cross-cultural research. Copyright © 2017 John Wiley & Sons, Ltd.
Testing enhances both encoding and retrieval for both tested and untested items.
Cho, Kit W; Neely, James H; Crocco, Stephanie; Vitrano, Deana
2017-07-01
In forward testing effects, taking a test enhances memory for subsequently studied material. These effects have been observed for previously studied and tested items, a potentially item-specific testing effect, and newly studied untested items, a purely generalized testing effect. We directly compared item-specific and generalized forward testing effects using procedures to separate testing benefits due to encoding versus retrieval. Participants studied two lists of Swahili-English word pairs, with the second study list containing "new" pairs intermixed with the previously studied "old" pairs. Participants completed a review phase in which they took a cued-recall test on only the "old" pairs or restudied them. In Experiments 1a, 1b, and 2, the review phase was given either before or after the second study list. Testing benefited memory to the same degree for both "new" and "old" pairs, suggesting that there were no pair-specific benefits of testing. The larger benefit from testing when review was given before rather than after the second study list suggests that the memory enhancement was due to both testing-enhanced encoding and testing-enhanced retrieval. To better equate generalized testing effects for "new" and "old" pairs, Experiment 3 intermixed them in the review phase. A statistically significant pair-specific testing effect for "old" items was now observed. Overall, these results show that forward testing effects are due to both testing-enhanced encoding and retrieval effects and that direct, pair-specific forward testing benefits are considerably smaller than indirect, generalized forward testing benefits.
Marchand, Alain; Haines, Victor Y; Dextras-Gauthier, Julie
2013-05-04
This study advances a measurement approach for the study of organizational culture in population-based occupational health research, and tests how different organizational culture types are associated with psychological distress, depression, emotional exhaustion, and well-being. Data were collected over a sample of 1,164 employees nested in 30 workplaces. Employees completed the 26-item OCP instrument. Psychological distress was measured with the General Health Questionnaire (12-item); depression with the Beck Depression Inventory (21-item); and emotional exhaustion with five items from the Maslach Burnout Inventory general survey. Exploratory factor analysis evaluated the dimensionality of the OCP scale. Multilevel regression models estimated workplace-level variations, and the contribution of organizational culture factors to mental health and well-being after controlling for gender, age, and living with a partner. Exploratory factor analysis of OCP items revealed four factors explaining about 75% of the variance, and supported the structure of the Competing Values Framework. Factors were labeled Group, Hierarchical, Rational and Developmental. Cronbach's alphas were high (0.82-0.89). Multilevel regression analysis suggested that the four culture types varied significantly between workplaces, and correlated with mental health and well-being outcomes. The Group culture type best distinguished between workplaces and had the strongest associations with the outcomes. This study provides strong support for the use of the OCP scale for measuring organizational culture in population-based occupational health research in a way that is consistent with the Competing Values Framework. The Group organizational culture needs to be considered as a relevant factor in occupational health studies.
The Influence of Item Calibration Error on Variable-Length Computerized Adaptive Testing
ERIC Educational Resources Information Center
Patton, Jeffrey M.; Cheng, Ying; Yuan, Ke-Hai; Diao, Qi
2013-01-01
Variable-length computerized adaptive testing (VL-CAT) allows both items and test length to be "tailored" to examinees, thereby achieving the measurement goal (e.g., scoring precision or classification) with as few items as possible. Several popular test termination rules depend on the standard error of the ability estimate, which in turn depends…
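To illustrate the kind of standard-error-based termination rule this record mentions, here is a minimal sketch. It is not taken from the paper: it assumes a two-parameter logistic (2PL) model, treats the item parameters as known, and uses hypothetical names (`standard_error`, `should_stop`) and hypothetical thresholds (`se_target`, `max_items`).

```python
import math

def p_correct(theta, a, b):
    """2PL probability of a correct response (assumed model, not from the paper)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def standard_error(theta, items):
    """Asymptotic standard error of the ability estimate: 1 / sqrt(test information).

    `items` is a list of (a, b) pairs for the items administered so far.
    """
    info = 0.0
    for a, b in items:
        p = p_correct(theta, a, b)
        info += a * a * p * (1.0 - p)  # Fisher information of one 2PL item
    return math.inf if info == 0.0 else 1.0 / math.sqrt(info)

def should_stop(theta, administered, se_target=0.3, max_items=30):
    """Hypothetical variable-length rule: stop once the SE is small enough or the cap is hit."""
    if len(administered) >= max_items:
        return True
    return standard_error(theta, administered) <= se_target
```

Because the standard error is computed from the item parameters, any error in those parameters feeds directly into the decision of when to stop, which is why a rule of this shape is sensitive to calibration quality.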
A Paradox in the Study of the Benefits of Test-Item Review
ERIC Educational Resources Information Center
van der Linden, Wim J.; Jeon, Minjeong; Ferrara, Steve
2011-01-01
According to a popular belief, test takers should trust their initial instinct and retain their initial responses when they have the opportunity to review test items. More than 80 years of empirical research on item review, however, has contradicted this belief and shown minor but consistently positive score gains for test takers who changed…
Geography Library of Test Items. Volume Four.
ERIC Educational Resources Information Center
Kouimanos, John, Ed.
As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items of value from past tests are made available to teachers for the construction of unit tests, term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The…
Home Science Library of Test Items. Volume One.
ERIC Educational Resources Information Center
Smith, Jan, Ed.
As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items of value from past tests are made available to teachers for the construction of unit tests, term examinations or as a basis for class discussion. Each collection is reviewed for content validity and reliability. The test…
Languages Library of Test Items. Volume Two: German, Latin.
ERIC Educational Resources Information Center
Campbell, Thomas; And Others
As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items of value from past tests are made available to teachers for the construction of unit tests, term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The…
Languages Library of Test Items. Volume One: French, Indonesian.
ERIC Educational Resources Information Center
Campbell, Thomas; And Others
As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items of value from past tests are made available to teachers for the construction of unit tests, term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The…
Geography Library of Test Items. Volume Three.
ERIC Educational Resources Information Center
Kouimanos, John, Ed.
As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items of value from past tests are made available to teachers for the construction of unit tests, term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The…
Commerce Library of Test Items. Volume One.
ERIC Educational Resources Information Center
Meeve, Brian, Ed.
As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items of value from past tests are made available to teachers for the construction of unit tests, term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The…
Geography Library of Test Items. Volume Five.
ERIC Educational Resources Information Center
Kouimanos, John, Ed.
As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items of value from past tests are made available to teachers for the construction of unit tests, term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The…
Textiles and Design Library of Test Items. Volume I.
ERIC Educational Resources Information Center
Smith, Jan, Ed.
As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items of value from past tests are made available to teachers for the construction of unit tests, term examinations or as a basis for class discussion. Each collection is reviewed for content validity and reliability. The test…
Commerce Library of Test Items. Volume Two.
ERIC Educational Resources Information Center
Meeve, Brian, Ed.
As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items of value from past tests are made available to teachers for the construction of unit tests, term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The…
Geography Library of Test Items. Volume Six.
ERIC Educational Resources Information Center
Kouimanos, John, Ed.
As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items of value from past tests are made available to teachers for the construction of unit tests, term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The…
Geography: Library of Test Items. Volume II.
ERIC Educational Resources Information Center
Kouimanos, John, Ed.
As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items of value from past tests are made available to teachers for the construction of unit tests, term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The…
Sex Differences in the Tendency to Omit Items on Multiple-Choice Tests: 1980-2000
ERIC Educational Resources Information Center
von Schrader, Sarah; Ansley, Timothy
2006-01-01
Much has been written concerning the potential group differences in responding to multiple-choice achievement test items. This discussion has included references to possible disparities in tendency to omit such test items. When test scores are used for high-stakes decision making, even small differences in scores and rankings that arise from male…
A Person Fit Test for IRT Models for Polytomous Items
ERIC Educational Resources Information Center
Glas, C. A. W.; Dagohoy, Anna Villa T.
2007-01-01
A person fit test based on the Lagrange multiplier test is presented for three item response theory models for polytomous items: the generalized partial credit model, the sequential model, and the graded response model. The test can also be used in the framework of multidimensional ability parameters. It is shown that the Lagrange multiplier…
How Big Is Big Enough? Sample Size Requirements for CAST Item Parameter Estimation
ERIC Educational Resources Information Center
Chuah, Siang Chee; Drasgow, Fritz; Luecht, Richard
2006-01-01
Adaptive tests offer the advantages of reduced test length and increased accuracy in ability estimation. However, adaptive tests require large pools of precalibrated items. This study looks at the development of an item pool for 1 type of adaptive administration: the computer-adaptive sequential test. An important issue is the sample size required…
An Explanatory Item Response Theory Approach for a Computer-Based Case Simulation Test
ERIC Educational Resources Information Center
Kahraman, Nilüfer
2014-01-01
Problem: Practitioners working with multiple-choice tests have long utilized Item Response Theory (IRT) models to evaluate the performance of test items for quality assurance. The use of similar applications for performance tests, however, is often encumbered due to the challenges encountered in working with complicated data sets in which local…
Geography Library of Test Items. Volume One.
ERIC Educational Resources Information Center
Kouimanos, John, Ed.
As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items of value from past tests are made available to teachers for the construction of unit tests, term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The…
Electronic Quality of Life Assessment Using Computer-Adaptive Testing
2016-01-01
Background: Quality of life (QoL) questionnaires are desirable for clinical practice but can be time-consuming to administer and interpret, making their widespread adoption difficult. Objective: Our aim was to assess the performance of the World Health Organization Quality of Life (WHOQOL)-100 questionnaire as four item banks to facilitate adaptive testing using simulated computer adaptive tests (CATs) for physical, psychological, social, and environmental QoL. Methods: We used data from the UK WHOQOL-100 questionnaire (N=320) to calibrate item banks using item response theory, which included psychometric assessments of differential item functioning, local dependency, unidimensionality, and reliability. We simulated CATs to assess the number of items administered before prespecified levels of reliability were met. Results: The item banks (40 items) all displayed good model fit (P>.01) and were unidimensional (fewer than 5% of t tests significant), reliable (Person Separation Index>.70), and free from differential item functioning (no significant analysis of variance interaction) or local dependency (residual correlations < +.20). When matched for reliability, the item banks were between 45% and 75% shorter than paper-based WHOQOL measures. Across the four domains, a high standard of reliability (alpha>.90) could be gained with a median of 9 items. Conclusions: Using CAT, simulated assessments were as reliable as paper-based forms of the WHOQOL with a fraction of the number of items. These properties suggest that these item banks are suitable for computerized adaptive assessment. These item banks have the potential for international development using existing alternative language versions of the WHOQOL items. PMID:27694100
Bernhardt, Jay M; Stellefson, Michael; Weiler, Robert M; Anderson-Lewis, Charkarra; Miller, M David; MacInnes, Jann
2015-01-01
Background: Social media can promote healthy behaviors by facilitating engagement and collaboration among health professionals and the public. Thus, social media is quickly becoming a vital tool for health promotion. While guidelines and trainings exist for public health professionals, there are currently no standardized measures to assess individual social media competency among Certified Health Education Specialists (CHES) and Master Certified Health Education Specialists (MCHES). Objective: The aim of this study was to design, develop, and test the Social Media Competency Inventory (SMCI) for CHES and MCHES. Methods: The SMCI was designed in three sequential phases: (1) Conceptualization and Domain Specifications, (2) Item Development, and (3) Inventory Testing and Finalization. Phase 1 consisted of a literature review, concept operationalization, and expert reviews. Phase 2 involved an expert panel (n=4) review, think-aloud sessions with a small representative sample of CHES/MCHES (n=10), a pilot test (n=36), and classical test theory analyses to develop the initial version of the SMCI. Phase 3 included a field test of the SMCI with a random sample of CHES and MCHES (n=353), factor and Rasch analyses, and development of SMCI administration and interpretation guidelines. Results: Six constructs adapted from the unified theory of acceptance and use of technology and the integrated behavioral model were identified for assessing social media competency: (1) Social Media Self-Efficacy, (2) Social Media Experience, (3) Effort Expectancy, (4) Performance Expectancy, (5) Facilitating Conditions, and (6) Social Influence. The initial item pool included 148 items. After the pilot test, 16 items were removed or revised because of low item discrimination (r<.30), high interitem correlations (ρ>.90), or based on feedback received from pilot participants. During the psychometric analysis of the field test data, 52 items were removed due to low discrimination, evidence of content redundancy, low R-squared value, or poor item infit or outfit. Psychometric analyses of the data revealed acceptable reliability evidence for the following scales: Social Media Self-Efficacy (alpha=.98, item reliability=.98, item separation=6.76), Social Media Experience (alpha=.98, item reliability=.98, item separation=6.24), Effort Expectancy (alpha=.74, item reliability=.95, item separation=4.15), Performance Expectancy (alpha=.81, item reliability=.99, item separation=10.09), Facilitating Conditions (alpha=.66, item reliability=.99, item separation=16.04), and Social Influence (alpha=.66, item reliability=.93, item separation=3.77). There was some evidence of local dependence among the scales, with several observed residual correlations above |.20|. Conclusions: Through the multistage instrument-development process, sufficient reliability and validity evidence was collected in support of the purpose and intended use of the SMCI. The SMCI can be used to assess the readiness of health education specialists to effectively use social media for health promotion research and practice. Future research should explore associations across constructs within the SMCI and evaluate the ability of SMCI scores to predict social media use and performance among CHES and MCHES. PMID:26399428
Alber, Julia M; Bernhardt, Jay M; Stellefson, Michael; Weiler, Robert M; Anderson-Lewis, Charkarra; Miller, M David; MacInnes, Jann
2015-09-23
Social media can promote healthy behaviors by facilitating engagement and collaboration among health professionals and the public. Thus, social media is quickly becoming a vital tool for health promotion. While guidelines and trainings exist for public health professionals, there are currently no standardized measures to assess individual social media competency among Certified Health Education Specialists (CHES) and Master Certified Health Education Specialists (MCHES). The aim of this study was to design, develop, and test the Social Media Competency Inventory (SMCI) for CHES and MCHES. The SMCI was designed in three sequential phases: (1) Conceptualization and Domain Specifications, (2) Item Development, and (3) Inventory Testing and Finalization. Phase 1 consisted of a literature review, concept operationalization, and expert reviews. Phase 2 involved an expert panel (n=4) review, think-aloud sessions with a small representative sample of CHES/MCHES (n=10), a pilot test (n=36), and classical test theory analyses to develop the initial version of the SMCI. Phase 3 included a field test of the SMCI with a random sample of CHES and MCHES (n=353), factor and Rasch analyses, and development of SMCI administration and interpretation guidelines. Six constructs adapted from the unified theory of acceptance and use of technology and the integrated behavioral model were identified for assessing social media competency: (1) Social Media Self-Efficacy, (2) Social Media Experience, (3) Effort Expectancy, (4) Performance Expectancy, (5) Facilitating Conditions, and (6) Social Influence. The initial item pool included 148 items. After the pilot test, 16 items were removed or revised because of low item discrimination (r<.30), high interitem correlations (ρ>.90), or based on feedback received from pilot participants. During the psychometric analysis of the field test data, 52 items were removed due to low discrimination, evidence of content redundancy, low R-squared value, or poor item infit or outfit. Psychometric analyses of the data revealed acceptable reliability evidence for the following scales: Social Media Self-Efficacy (alpha=.98, item reliability=.98, item separation=6.76), Social Media Experience (alpha=.98, item reliability=.98, item separation=6.24), Effort Expectancy (alpha=.74, item reliability=.95, item separation=4.15), Performance Expectancy (alpha=.81, item reliability=.99, item separation=10.09), Facilitating Conditions (alpha=.66, item reliability=.99, item separation=16.04), and Social Influence (alpha=.66, item reliability=.93, item separation=3.77). There was some evidence of local dependence among the scales, with several observed residual correlations above |.20|. Through the multistage instrument-development process, sufficient reliability and validity evidence was collected in support of the purpose and intended use of the SMCI. The SMCI can be used to assess the readiness of health education specialists to effectively use social media for health promotion research and practice. Future research should explore associations across constructs within the SMCI and evaluate the ability of SMCI scores to predict social media use and performance among CHES and MCHES.
ERIC Educational Resources Information Center
Lee, Guemin; Park, In-Yong
2012-01-01
Previous assessments of the reliability of test scores for testlet-composed tests have indicated that item-based estimation methods overestimate reliability. This study was designed to address issues related to the extent to which item-based estimation methods overestimate the reliability of test scores composed of testlets and to compare several…
Bai, Mei; Dixon, Jane K
2014-01-01
The purpose of this study was to reexamine the factor pattern of the 12-item Functional Assessment of Chronic Illness Therapy-Spiritual Well-Being Scale (FACIT-Sp-12) using exploratory factor analysis in people newly diagnosed with advanced cancer. Principal components analysis (PCA) and 3 common factor analysis methods were used to explore the factor pattern of the FACIT-Sp-12. Factorial validity was assessed in association with quality of life (QOL). Principal factor analysis (PFA), iterative PFA, and maximum likelihood suggested retrieving 3 factors: Peace, Meaning, and Faith. Both Peace and Meaning positively related to QOL, whereas only Peace uniquely contributed to QOL. This study supported the 3-factor model of the FACIT-Sp-12. Suggestions for revision of items and further validation of the identified factor pattern were provided.
Harrison, Peter M C; Collins, Tom; Müllensiefen, Daniel
2017-06-15
Modern psychometric theory provides many useful tools for ability testing, such as item response theory, computerised adaptive testing, and automatic item generation. However, these techniques have yet to be integrated into mainstream psychological practice. This is unfortunate, because modern psychometric techniques can bring many benefits, including sophisticated reliability measures, improved construct validity, avoidance of exposure effects, and improved efficiency. In the present research we therefore use these techniques to develop a new test of a well-studied psychological capacity: melodic discrimination, the ability to detect differences between melodies. We calibrate and validate this test in a series of studies. Studies 1 and 2 respectively calibrate and validate an initial test version, while Studies 3 and 4 calibrate and validate an updated test version incorporating additional easy items. The results support the new test's viability, with evidence for strong reliability and construct validity. We discuss how these modern psychometric techniques may also be profitably applied to other areas of music psychology and psychological science in general.
Application of Item Response Theory to Tests of Substance-related Associative Memory
Shono, Yusuke; Grenard, Jerry L.; Ames, Susan L.; Stacy, Alan W.
2015-01-01
A substance-related word association test (WAT) is one of the commonly used indirect tests of substance-related implicit associative memory and has been shown to predict substance use. This study applied an item response theory (IRT) modeling approach to evaluate psychometric properties of the alcohol- and marijuana-related WATs and their items among 775 ethnically diverse at-risk adolescents. After examining the IRT assumptions, item fit, and differential item functioning (DIF) across gender and age groups, the original 18 WAT items were reduced to 14 and 15 items in the alcohol- and marijuana-related WATs, respectively. Thereafter, unidimensional one- and two-parameter logistic models (1PL and 2PL models) were fitted to the revised WAT items. The results demonstrated that both alcohol- and marijuana-related WATs have good psychometric properties. These results were discussed in light of the framework of a unified concept of construct validity (Messick, 1975, 1989, 1995). PMID:25134051
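For readers unfamiliar with the 1PL and 2PL models named in this record, the following is a minimal sketch of their item response functions and of the log-likelihood of a scored response pattern. It is illustrative only and not the authors' code: the function names (`irf_2pl`, `irf_1pl`, `log_likelihood`) are hypothetical, and the 1PL is written, as one common convention, as a 2PL with a shared discrimination fixed at 1.

```python
import math

def irf_2pl(theta, a, b):
    """2PL item response function: P(response = 1 | theta) with discrimination a, difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def irf_1pl(theta, b, a=1.0):
    """1PL as a special case of the 2PL with one common discrimination (assumed 1.0 here)."""
    return irf_2pl(theta, a, b)

def log_likelihood(theta, responses, items):
    """Log-likelihood of binary responses (0/1) given theta and a list of (a, b) item parameters.

    Assumes local independence, so item contributions simply add up.
    """
    total = 0.0
    for u, (a, b) in zip(responses, items):
        p = irf_2pl(theta, a, b)
        total += math.log(p) if u == 1 else math.log(1.0 - p)
    return total
```

In this form, comparing the two models amounts to asking whether letting `a` vary by item improves the fit enough to justify the extra parameters.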
Bäuml, Karl-Heinz T; Holterman, Christoph; Abel, Magdalena
2014-11-01
The testing effect refers to the finding that retrieval practice in comparison to restudy of previously encoded contents can improve memory performance and reduce time-dependent forgetting. Naturally, long retention intervals include both wake and sleep delay, which can influence memory contents differently. In fact, sleep immediately after encoding can induce a mnemonic benefit, stabilizing and strengthening the encoded contents. We investigated in a series of 5 experiments whether sleep influences the testing effect. After initial study of categorized item material (Experiments 1, 2, and 4A), paired associates (Experiment 3), or educational text material (Experiment 4B), subjects were asked to restudy encoded contents or engage in active retrieval practice. A final recall test was conducted after a 12-hr delay that included diurnal wakefulness or nocturnal sleep. The results consistently showed typical testing effects after the wake delay. However, these testing effects were reduced or even eliminated after sleep, because sleep benefited recall of restudied items but left recall of retrieved items unaffected. The findings are consistent with the bifurcation model of the testing effect (Kornell, Bjork, & Garcia, 2011), according to which the distribution of memory strengths across items is shifted differentially by retrieving and restudying, with retrieval strengthening items to a much higher degree than restudy does. On the basis of this model, most of the retrieved items already fall above recall threshold in the absence of sleep, so additional sleep-induced strengthening may not improve recall of retrieved items any further. PsycINFO Database Record (c) 2014 APA, all rights reserved.
ERIC Educational Resources Information Center
van der Linden, Wim J.; Scrams, David J.; Schnipke, Deborah L.
This paper proposes an item selection algorithm that can be used to neutralize the effect of time limits in computer adaptive testing. The method is based on a statistical model for the response-time distributions of the test takers on the items in the pool that is updated each time a new item has been administered. Predictions from the model are…
Identification of metallic items that caused nickel dermatitis in Danish patients.
Thyssen, Jacob P; Menné, Torkil; Johansen, Jeanne D
2010-09-01
Nickel allergy is prevalent as assessed by epidemiological studies. In an attempt to further identify and characterize sources that may result in nickel allergy and dermatitis, we analysed items identified by nickel-allergic dermatitis patients as causative of nickel dermatitis by using the dimethylglyoxime (DMG) test. Dermatitis patients with nickel allergy of current relevance were identified over a 2-year period in a tertiary referral patch test centre. When possible, their work tools and personal items were examined with the DMG test. Among 95 nickel-allergic dermatitis patients, 70 (73.7%) had metallic items investigated for nickel release. A total of 151 items were investigated, and 66 (43.7%) gave positive DMG test reactions. Objects were nearly all purchased or acquired after the introduction of the EU Nickel Directive. Only one object had been inherited, and only two objects had been purchased outside of Denmark. DMG testing is valuable as a screening test for nickel release and should be used to identify relevant exposures in nickel-allergic patients. Mainly consumer items, but also work tools used in an occupational setting, released nickel in dermatitis patients. This study confirmed 'risk items' from previous studies, including mobile phones.
Alterio, D; Franco, P; Numico, G; Licitra, L; Cossu Rocca, M; Ferrari, A; Pinto, C; Russi, E G; Ricardi, U; Jereczek Fossa, B A
2016-07-01
Chemoradiotherapy is the most widely used organ preservation (OP) strategy worldwide in advanced laryngo-hypopharyngeal cancer. Because published data on pre-treatment assessment and treatment schedules in this setting are heterogeneous, the Italian societies of radiation oncology and medical oncology conducted an online survey of their members on the Italian approach to larynx preservation in clinical practice. The survey covered demographics (11 items), pre-treatment evaluation (12 items), treatment schedules (10 items) and outcomes (3 items). The survey was completed by 116 clinical oncologists (64 % radiation and 36 % medical oncologists). Results highlighted that pre-treatment evaluation was not homogeneous among the respondents. The treatment of choice for the OP program was concurrent chemoradiotherapy (66 %). Induction chemotherapy was proposed mostly for aggressive tumors, such as those of advanced stage (T4 or N3) and/or unfavorable primary site (hypopharynx). Moreover, for patients who responded to induction chemotherapy, the largest share of participants (46 %) proposed concurrent chemoradiotherapy, while 18 and 19 % proposed radiotherapy alone or radiotherapy and cetuximab, respectively. For patients with stable disease after induction chemotherapy, the respondents suggested surgery, radiotherapy and cetuximab, or radiotherapy alone in 38, 32 and 15 % of cases, respectively. The results of the present survey highlight the variability of therapeutic approaches offered in clinical practice to candidates for a larynx OP program. Analysis of these results may offer the chance to modify some clinical attitudes and lay the groundwork for future clinical investigation in this field.
Portuguese Medical Students' Knowledge and Attitudes Towards Homosexuality.
Lopes, Lucas; Gato, Jorge; Esteves, Manuel
2016-11-01
Lesbian, gay, bisexual and transgender people still face discrimination in healthcare environments and physicians often report lack of knowledge on this population's specific healthcare needs. In fact, recommendations have been put forward to include lesbian, gay, bisexual and transgender health in medical curricula. This study aimed to explore factors associated with medical students' knowledge and attitudes towards homosexuality in different years of the medical course. An anonymous online questionnaire was sent to all medical students enrolled at the Faculty of Medicine - University of Porto, Portugal, in December 2015. The questionnaire included socio-demographic questions, the Multidimensional Scale of Attitudes Toward Lesbians and Gay Men (27 items) and a Homosexuality Knowledge Questionnaire (17 items). Descriptive statistics, ANOVAs, Chi-square tests and Pearson's correlations were used in the analysis. A total of 489 completed responses was analyzed. Male gender, religiosity and absence of lesbian, gay or bisexual friends were associated with more negative attitudes towards homosexuality. Attitudinal scores did not correlate with advanced years in the medical course or contact with lesbian, gay or bisexual patients. Students aiming to pursue technique-oriented specialties presented higher scores in the 'Modern Heterosexism' subscale than students seeking patient-oriented specialties. Although advanced years in the medical course correlated significantly with higher knowledge scores, items related to lesbian, gay or bisexual health showed the lowest percentage of correct answers. There seems to be a lack of exploration of medical students' personal attitudes towards lesbians and gay men, and also a lack of knowledge on lesbian, gay or bisexual specific healthcare needs. This study highlights the importance of inclusive undergraduate curriculum development in order to foster quality healthcare.
The rationale, development, and standardization of a basic word vocabulary test.
Dupuy, H J
1974-04-01
The results of the studies to date indicate that the Basic Word Vocabulary Test provides a range of items in terms of item difficulty levels useful in printed form from about the third grade to the highest educational levels. Since pictorial and orally given vocabulary tests are used from about ages 2 to 8 years, further work should be done to extend the scale downward so that a single comprehensive vocabulary scale ranging from age 2 years to the highest level of verbal development is available for general use. Validation studies should also be conducted with other well-known intelligence tests so that scores can be compared. Alternate forms need to be developed to allow for longitudinal studies of growth and development. The use of a single standard of measurement of vocabulary development, suitable over a wide range of age and ability levels, by different investigators should materially aid in comparing results across studies and samples and lead to more consistent findings, advances in knowledge, and wider application of findings in practical circumstances. The findings presented in this report indicate that the Basic Word Vocabulary Test adequately measures basic word knowledge acquisition and development. The BWVT is suitable for evaluation of individuals and for use in making group comparisons in levels of basic word knowledge attainment, growth, and development. All material appearing in this report is in the public domain and may be reproduced or copied without permission; citation as to source, however, is appreciated.
Hecimovich, Mark; Marais, Ida
2017-06-26
Awareness of sport-related concussion (SRC) is an essential step in increasing the number of athletes or parents who report on SRC. This awareness is important, as there is no established data on medical care at youth-level sports and may be limited to individuals with only first aid training. In this circumstance, aside from the coach, it is the players and their parents who need to be aware of possible signs and symptoms. The aim of this study was to examine the psychometric properties of a parent and player concussion survey intended for use before and after an education campaign regarding SRC. 1441 questionnaires were received from parents and 284 questionnaires from players. The responses to the sixteen-item section of the questionnaire's 'recognition of signs and symptoms' were submitted to psychometric analysis using the dichotomous and polytomous Rasch model via the Rasch Unidimensional Measurement Model software RUMM2030. The Rasch model of Modern Test Theory can be considered a refinement of, or advance on, traditional analyses of an instrument's psychometric properties. The main finding is that these sixteen items measure two factors: items that are symptoms of concussion and items that are not symptoms of concussion. Parents and athletes were able to identify most or all of the symptoms, but were not as good at distinguishing symptoms that are not symptoms of concussion. Analyzing these responses revealed differential item functioning for parents and athletes on non-symptom items. When the DIF was resolved a significant difference was found between parents and athletes. The main finding is that the items measure two 'dimensions' in concussion symptom recognition. The first dimension consists of those items that are symptoms of concussion and the second dimension of those items that are not symptoms of concussion. 
Parents and players were able to identify most or all of the symptoms of concussion, so one would not expect to pick up any positive change on these items after an education campaign. Parents and players were not as good at distinguishing symptoms that are not symptoms of concussion. It is on these items that one may possibly expect improvement to manifest, so to evaluate the effectiveness of an education campaign it would pay to look for improvement in distinguishing symptoms that are not symptoms of concussion.
A Comparison of the One-and Three-Parameter Logistic Models on Measures of Test Efficiency.
ERIC Educational Resources Information Center
Benson, Jeri
Two methods of item selection were used to select sets of 40 items from a 50-item verbal analogies test, and the resulting item sets were compared for relative efficiency. The BICAL program was used to select the 40 items having the best mean square fit to the one parameter logistic (Rasch) model. The LOGIST program was used to select the 40 items…
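For orientation, the two models named in this abstract are usually written as below. This is the standard textbook form, not notation taken from the paper itself: the symbols θ (examinee ability), b_i (item difficulty), a_i (item discrimination) and c_i (lower asymptote, often called pseudo-guessing) are conventional choices and are assumptions here.

```latex
% One-parameter logistic (Rasch) model: items differ only in difficulty b_i
P_i^{\mathrm{1PL}}(\theta) = \frac{1}{1 + e^{-(\theta - b_i)}}

% Three-parameter logistic model: adds discrimination a_i and lower asymptote c_i
P_i^{\mathrm{3PL}}(\theta) = c_i + (1 - c_i)\,\frac{1}{1 + e^{-a_i(\theta - b_i)}}
```

Setting a_i = 1 and c_i = 0 in the second expression recovers the first, which is why the two models can be compared on the same item set.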
ERIC Educational Resources Information Center
Liu, Jinghua; Zu, Jiyun; Curley, Edward; Carey, Jill
2014-01-01
The purpose of this study is to investigate the impact of discrete anchor items versus passage-based anchor items on observed score equating using empirical data. This study compares an "SAT"® critical reading anchor that contains more discrete items proportionally, compared to the total tests to be equated, to another anchor that…
Computerized Adaptive Testing: Overview and Introduction.
ERIC Educational Resources Information Center
Meijer, Rob R.; Nering, Michael L.
1999-01-01
Provides an overview of computerized adaptive testing (CAT) and introduces contributions to this special issue. CAT elements discussed include item selection, estimation of the latent trait, item exposure, measurement precision, and item-bank development. (SLD)
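The item-selection element mentioned in this overview can be illustrated with a minimal sketch of one widely used rule, maximum Fisher information under a two-parameter logistic model. This is an assumption-laden illustration, not the procedure of any paper in the special issue: the function names, the (a, b) item tuples and the example bank are all hypothetical.

```python
import math

def p_correct(theta, a, b):
    """2PL probability of a correct response at ability theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at theta: a^2 * P * (1 - P)."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

def select_next_item(theta, bank, administered):
    """Index of the not-yet-administered item with maximum information at theta."""
    candidates = [i for i in range(len(bank)) if i not in administered]
    return max(candidates, key=lambda i: item_information(theta, *bank[i]))

# Hypothetical item bank: (discrimination a, difficulty b) per item.
bank = [(1.0, -1.5), (1.2, 0.0), (0.8, 0.1), (1.5, 2.0)]
first = select_next_item(0.0, bank, administered=set())
second = select_next_item(0.0, bank, administered={first})
```

In an operational CAT this rule is normally combined with item-exposure control, another element the overview lists, because pure maximum-information selection tends to reuse the same few highly discriminating items.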
Flens, Gerard; Smits, Niels; Terwee, Caroline B; Dekker, Joost; Huijbrechts, Irma; de Beurs, Edwin
2017-03-01
We developed a Dutch-Flemish version of the patient-reported outcomes measurement information system (PROMIS) adult V1.0 item bank for depression as input for computerized adaptive testing (CAT). As item bank, we used the Dutch-Flemish translation of the original PROMIS item bank (28 items) and additionally translated 28 U.S. depression items that failed to make the final U.S. item bank. Through psychometric analysis of a combined clinical and general population sample (N = 2,010), 8 added items were removed. With the final item bank, we performed several CAT simulations to assess the efficiency of the extended (48 items) and the original item bank (28 items), using various stopping rules. Both item banks resulted in highly efficient and precise measurement of depression and showed high similarity between the CAT simulation scores and the full item bank scores. We discuss the implications of using each item bank and stopping rule for further CAT development.
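What a CAT stopping rule looks like can be sketched as follows, assuming a rule that ends the test once the standard error of the ability estimate falls to a target or a maximum test length is reached. The abstract does not state which rules were simulated, so the thresholds (`se_target`, `max_items`) and function names are hypothetical, and the sketch uses a dichotomous two-parameter logistic model for brevity, whereas PROMIS items are polytomous.

```python
import math

def standard_error(theta, administered_items):
    """SE of theta: 1 / sqrt(total Fisher information of the administered 2PL items)."""
    total = 0.0
    for a, b in administered_items:
        p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
        total += a * a * p * (1.0 - p)
    return float("inf") if total == 0.0 else 1.0 / math.sqrt(total)

def should_stop(theta, administered_items, se_target=0.32, max_items=12):
    """Stop when precision is sufficient or the length cap is reached (hypothetical values)."""
    if len(administered_items) >= max_items:
        return True
    return standard_error(theta, administered_items) <= se_target
```

A stricter `se_target` generally buys more precise scores at the cost of longer tests, which is the trade-off a simulation across stopping rules is meant to quantify.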
ERIC Educational Resources Information Center
Swiggett, Wanda D.; Kotloff, Laurie; Ezzo, Chelsea; Adler, Rachel; Oliveri, Maria Elena
2014-01-01
The computer-based "Graduate Record Examinations"® ("GRE"®) revised General Test includes interactive item types and testing environment tools (e.g., test navigation, on-screen calculator, and help). How well do test takers understand these innovations? If test takers do not understand the new item types, these innovations may…
Effects of advanced aging on the neural correlates of successful recognition memory
Wang, Tracy H.; Kruggel, Frithjof; Rugg, Michael D.
2009-01-01
Functional neuroimaging studies have reported that the neural correlates of retrieval success (old>new effects) are larger and more widespread in older than in young adults. In the present study we investigated whether this pattern of age-related ‘over-recruitment’ continues into advanced age. Using functional magnetic resonance imaging (fMRI), retrieval-related activity from two groups (N = 18 per group) of older adults aged 84–96 yrs (‘old-old’) and 64–77 yrs (‘young-old’) was contrasted. Subjects studied a series of pictures, half of which were presented once, and half twice. At test, subjects indicated whether each presented picture was old or new. Recognition performance of the old-old subjects for twice-studied items was equivalent to that of the young-old subjects for once-studied items. Old>new effects common to the two groups were identified in several cortical regions, including medial and lateral parietal and prefrontal cortex. There were no regions where these effects were of greater magnitude in the old-old group, and thus no evidence of over-recruitment in this group relative to the young-old individuals. In one region of medial parietal cortex, effects were greater (and only significant) in the young-old group. The failure to find evidence of over-recruitment in the old-old subjects relative to the young-old group, despite their markedly poorer cognitive performance, suggests that age-related over-recruitment effects plateau in advanced age. The findings for the medial parietal cortex underscore the sensitivity of this cortical region to increasing age. PMID:19428399
Severity of Organized Item Theft in Computerized Adaptive Testing: A Simulation Study
ERIC Educational Resources Information Center
Yi, Qing; Zhang, Jinming; Chang, Hua-Hua
2008-01-01
Criteria had been proposed for assessing the severity of possible test security violations for computerized tests with high-stakes outcomes. However, these criteria resulted from theoretical derivations that assumed uniformly randomized item selection. This study investigated potential damage caused by organized item theft in computerized adaptive…
Detecting Item Drift in Large-Scale Testing
ERIC Educational Resources Information Center
Guo, Hongwen; Robin, Frederic; Dorans, Neil
2017-01-01
The early detection of item drift is an important issue for frequently administered testing programs because items are reused over time. Unfortunately, operational data tend to be very sparse and do not lend themselves to frequent monitoring analyses, particularly for on-demand testing. Building on existing residual analyses, the authors propose…
Tree versus Geometric Representation of Tests and Items.
ERIC Educational Resources Information Center
Beller, Michael
1990-01-01
Geometric approaches to representing interrelations among tests and items are compared with an additive tree model (ATM), using 2,644 examinees and 2 other data sets. The ATM's close fit to the data and its coherence of presentation indicate that it is the best means of representing tests and items. (TJH)
Superficial Priming in Episodic Recognition
ERIC Educational Resources Information Center
Dopkins, Stephen; Sargent, Jesse; Ngo, Catherine T.
2010-01-01
We explored the effect of superficial priming in episodic recognition and found it to be different from the effect of semantic priming in episodic recognition. Participants made recognition judgments to pairs of items, with each pair consisting of a prime item and a test item. Correct positive responses to the test item were impeded if the prime…
Statistical Indexes for Monitoring Item Behavior under Computer Adaptive Testing Environment.
ERIC Educational Resources Information Center
Zhu, Renbang; Yu, Feng; Liu, Su
A computerized adaptive test (CAT) administration usually requires a large supply of items with accurately estimated psychometric properties, such as item response theory (IRT) parameter estimates, to ensure the precision of examinee ability estimation. However, an estimated IRT model of a given item in any given pool does not always correctly…
Using Item Response Theory to Describe the Nonverbal Literacy Assessment (NVLA)
ERIC Educational Resources Information Center
Fleming, Danielle; Wilson, Mark; Ahlgrim-Delzell, Lynn
2018-01-01
The Nonverbal Literacy Assessment (NVLA) is a literacy assessment designed for students with significant intellectual disabilities. The 218-item test was initially examined using confirmatory factor analysis. This method showed that the test worked as expected, but the items loaded onto a single factor. This article uses item response theory to…
Aggregating Polytomous DIF Results over Multiple Test Administrations
ERIC Educational Resources Information Center
Zwick, Rebecca; Ye, Lei; Isham, Steven
2018-01-01
In typical differential item functioning (DIF) assessments, an item's DIF status is not influenced by its status in previous test administrations. An item that has shown DIF at multiple administrations may be treated the same way as an item that has shown DIF in only the most recent administration. Therefore, much useful information about the…
A Comparison of Linking and Concurrent Calibration under the Graded Response Model.
ERIC Educational Resources Information Center
Kim, Seock-Ho; Cohen, Allan S.
Applications of item response theory to practical testing problems including equating, differential item functioning, and computerized adaptive testing, require that item parameter estimates be placed onto a common metric. In this study, two methods for developing a common metric for the graded response model under item response theory were…
ERIC Educational Resources Information Center
Missouri State Dept. of Elementary and Secondary Education, Jefferson City.
This document presents 10 released items from the Health/Physical Education Missouri Assessment Program (MAP) test given in the spring of 2000 to fifth graders. Items from the test sessions include: selected-response (multiple choice), constructed-response, and a performance event. The selected-response items consist of individual questions…
ERIC Educational Resources Information Center
Nitko, Anthony J.; Hsu, Tse-chi
Item analysis procedures appropriate for domain-referenced classroom testing are described. A conceptual framework within which item statistics can be considered and promising statistics in light of this framework are presented. The sampling fluctuations of the more promising item statistics for sample sizes comparable to the typical classroom…
Fissile interrogation using gamma rays from oxygen
Smith, Donald; Micklich, Bradley J.; Fessler, Andreas
2004-04-20
The subject apparatus provides a means to identify the presence of fissionable material or other nuclear material contained within an item to be tested. The system employs a portable accelerator to accelerate and direct protons to a fluorine-compound target. The interaction of the protons with the fluorine-compound target produces gamma rays which are directed at the item to be tested. If the item to be tested contains either a fissionable material or other nuclear material, the interaction of the gamma rays with the material contained within the test item will result in the production of neutrons. A system of neutron detectors is positioned to intercept any neutrons generated by the test item. The results from the neutron detectors are analyzed to determine the presence of a fissionable material or other nuclear material.
Methods of rapid diagnosis for the etiology of meningitis in adults
Bahr, Nathan C; Boulware, David R
2014-01-01
Infectious meningitis may be due to bacterial, mycobacterial, fungal or viral agents. Diagnosis of meningitis must take into account numerous items of patient history and symptomatology along with regional epidemiology and basic cerebrospinal fluid testing (protein, etc.) to allow the clinician to stratify the likelihood of etiology possibilities and rationally select additional diagnostic tests. Culture is the mainstay for diagnosis in many cases, but technology is evolving to provide more rapid, reliable diagnosis. The cryptococcal antigen lateral flow assay (Immuno-Mycologics) has revolutionized diagnosis of cryptococcosis and automated nucleic acid amplification assays hold promise for improving diagnosis of bacterial and mycobacterial meningitis. This review will focus on a holistic approach to diagnosis of meningitis as well as recent technological advances. PMID:25402579
Code of Federal Regulations, 2014 CFR
2014-10-01
... 48 Federal Acquisition Regulations System 5 2014-10-01 2014-10-01 false Interest. 1332.407 Section... CONTRACT FINANCING Advance Payments for Non-Commercial Items 1332.407 Interest. The designee authorized to approve advance payment without interest is as set forth in CAM 1301.70. ...
Code of Federal Regulations, 2010 CFR
2010-10-01
... 48 Federal Acquisition Regulations System 5 2010-10-01 2010-10-01 false Interest. 932.407 Section... CONTRACT FINANCING Advance Payments for Non-Commercial Items 932.407 Interest. (d)(4) Advance payments may be made without interest under cost-reimbursement contracts for construction or engineering services. ...
Code of Federal Regulations, 2012 CFR
2012-10-01
... 48 Federal Acquisition Regulations System 5 2012-10-01 2012-10-01 false Interest. 932.407 Section... CONTRACT FINANCING Advance Payments for Non-Commercial Items 932.407 Interest. (d)(4) Advance payments may be made without interest under cost-reimbursement contracts for construction or engineering services. ...
Code of Federal Regulations, 2014 CFR
2014-10-01
... 48 Federal Acquisition Regulations System 5 2014-10-01 2014-10-01 false Interest. 932.407 Section... CONTRACT FINANCING Advance Payments for Non-Commercial Items 932.407 Interest. (d)(4) Advance payments may be made without interest under cost-reimbursement contracts for construction or engineering services. ...
Code of Federal Regulations, 2013 CFR
2013-10-01
... 48 Federal Acquisition Regulations System 5 2013-10-01 2013-10-01 false Interest. 1332.407 Section... CONTRACT FINANCING Advance Payments for Non-Commercial Items 1332.407 Interest. The designee authorized to approve advance payment without interest is as set forth in CAM 1301.70. ...
Code of Federal Regulations, 2012 CFR
2012-10-01
... 48 Federal Acquisition Regulations System 5 2012-10-01 2012-10-01 false Interest. 1332.407 Section... CONTRACT FINANCING Advance Payments for Non-Commercial Items 1332.407 Interest. The designee authorized to approve advance payment without interest is as set forth in CAM 1301.70. ...
Code of Federal Regulations, 2011 CFR
2011-10-01
... 48 Federal Acquisition Regulations System 5 2011-10-01 2011-10-01 false Interest. 932.407 Section... CONTRACT FINANCING Advance Payments for Non-Commercial Items 932.407 Interest. (d)(4) Advance payments may be made without interest under cost-reimbursement contracts for construction or engineering services. ...
Code of Federal Regulations, 2010 CFR
2010-10-01
... 48 Federal Acquisition Regulations System 5 2010-10-01 2010-10-01 false Interest. 1332.407 Section... CONTRACT FINANCING Advance Payments for Non-Commercial Items 1332.407 Interest. The designee authorized to approve advance payment without interest is as set forth in CAM 1301.70. ...
Code of Federal Regulations, 2013 CFR
2013-10-01
... 48 Federal Acquisition Regulations System 5 2013-10-01 2013-10-01 false Interest. 932.407 Section... CONTRACT FINANCING Advance Payments for Non-Commercial Items 932.407 Interest. (d)(4) Advance payments may be made without interest under cost-reimbursement contracts for construction or engineering services. ...
Code of Federal Regulations, 2011 CFR
2011-10-01
... 48 Federal Acquisition Regulations System 5 2011-10-01 2011-10-01 false Interest. 1332.407 Section... CONTRACT FINANCING Advance Payments for Non-Commercial Items 1332.407 Interest. The designee authorized to approve advance payment without interest is as set forth in CAM 1301.70. ...
Validation of a clinical critical thinking skills test in nursing.
Shin, Sujin; Jung, Dukyoo; Kim, Sungeun
2015-01-27
The purpose of this study was to develop a revised version of the clinical critical thinking skills test (CCTS) and to subsequently validate its performance. This study is a secondary analysis of the CCTS. Data were obtained from a convenience sample of 284 college students in June 2011. Thirty items were analyzed using item response theory and test reliability was assessed. Test-retest reliability was measured using the results of 20 nursing college and graduate school students in July 2013. The content validity of the revised items was analyzed by calculating the degree of agreement between instrument developer intention in item development and the judgments of six experts. To analyze response process validity, qualitative data related to the response processes of nine nursing college students obtained through cognitive interviews were analyzed. Out of the initial 30 items, 11 items were excluded after analysis of the difficulty and discrimination parameters. When the 19 items of the revised version of the CCTS were analyzed, levels of item difficulty were found to be relatively low and levels of discrimination were found to be appropriate or high. The degree of agreement between item developer intention and expert judgments equaled or exceeded 50%. From the above results, evidence of response process validity was demonstrated, indicating that subjects responded as intended by the test developer. The revised 19-item CCTS was found to have sufficient reliability and validity and therefore represents a more convenient measurement of critical thinking ability.
Validation of a clinical critical thinking skills test in nursing
2015-01-01
Purpose: The purpose of this study was to develop a revised version of the clinical critical thinking skills test (CCTS) and to subsequently validate its performance. Methods: This study is a secondary analysis of the CCTS. Data were obtained from a convenience sample of 284 college students in June 2011. Thirty items were analyzed using item response theory and test reliability was assessed. Test-retest reliability was measured using the results of 20 nursing college and graduate school students in July 2013. The content validity of the revised items was analyzed by calculating the degree of agreement between instrument developer intention in item development and the judgments of six experts. To analyze response process validity, qualitative data related to the response processes of nine nursing college students obtained through cognitive interviews were analyzed. Results: Out of the initial 30 items, 11 items were excluded after analysis of the difficulty and discrimination parameters. When the 19 items of the revised version of the CCTS were analyzed, levels of item difficulty were found to be relatively low and levels of discrimination were found to be appropriate or high. The degree of agreement between item developer intention and expert judgments equaled or exceeded 50%. Conclusion: From the above results, evidence of response process validity was demonstrated, indicating that subjects responded as intended by the test developer. The revised 19-item CCTS was found to have sufficient reliability and validity and therefore represents a more convenient measurement of critical thinking ability. PMID:25622716
ERIC Educational Resources Information Center
Schroeders, Ulrich; Robitzsch, Alexander; Schipolowski, Stefan
2014-01-01
C-tests are a specific variant of cloze tests that are considered time-efficient, valid indicators of general language proficiency. They are commonly analyzed with models of item response theory assuming local item independence. In this article we estimated local interdependencies for 12 C-tests and compared the changes in item difficulties,…
ERIC Educational Resources Information Center
Lynch, Mervin D.; Chaves, John
Items from Peirs-Harris and Coopersmith self-concept tests were evaluated against independent measures on three self-constructs, idealized, empathic, and worth. Construct measurements were obtained with the semantic differential and D statistic. Ratings were obtained from 381 children, grades 4-6. For each test, item ratings and construct measures…
ERIC Educational Resources Information Center
Browning, Robert; And Others
1979-01-01
Effects that item order and basal and ceiling rules have on test means, variances, and internal consistency estimates for the Peabody Individual Achievement Test mathematics and reading recognition subtests were examined. Items on the math and reading recognition subtests were significantly easier or harder than test placements indicated. (Author)
Current State of Test Development, Administration, and Analysis: A Study of Faculty Practices.
Bristol, Timothy J; Nelson, John W; Sherrill, Karin J; Wangerin, Virginia S
Developing valid and reliable test items is a critical skill for nursing faculty. This research analyzed the test item writing practice of 674 nursing faculty. Relationships between faculty characteristics and their test item writing practices were analyzed. Findings reveal variability in practice and a gap in implementation of evidence-based standards when developing and evaluating teacher-made examinations.
A Review of Guidelines on Home Drug Testing Websites for Parents
Washio, Yukiko; Fairfax-Columbo, Jaymes; Ball, Emily; Cassey, Heather; Arria, Amelia M.; Bresani, Elena; Curtis, Brenda L.; Kirby, Kimberly C.
2014-01-01
Purpose: To update and extend prior work reviewing websites that discuss home drug testing for parents, and to assess the quality of information that the websites provide to help them decide when and how to use home drug testing. Methods: We conducted a World Wide Web search that identified eight websites providing information for parents on home drug testing. We assessed the information on the sites using a checklist developed with field experts in adolescent substance abuse and psychosocial interventions that focus on urine testing. Results: None of the websites covered all of the items on the 24-item checklist, and only three covered at least half of the items (12, 14, and 21 items, respectively). The five remaining websites covered fewer than half of the checklist items. The mean number of items covered by the websites was 11. Conclusions: Among the websites that we reviewed, few provided thorough information to parents regarding empirically-supported strategies to effectively use drug testing to intervene on adolescent substance use. Furthermore, most websites did not provide thorough information regarding the risks and benefits to inform parents’ decision to use home drug testing. Empirical evidence regarding efficacy, benefits, risks, and limitations of home drug testing is needed. PMID:25026103
ERIC Educational Resources Information Center
New South Wales Dept. of Education, Sydney (Australia).
Continuing a series of short tests aimed at measuring student mastery of specific skills in the natural sciences, this supplementary volume includes teachers' notes, a users' guide and inspection copies of test items 27 to 50. Answer keys and test scoring statistics are provided. The items are designed for grades 7 through 10, and a list of the…
ERIC Educational Resources Information Center
Weiss, David J., Ed.
This symposium consists of five papers and presents some recent developments in adaptive testing which have applications to several military testing problems. The overview, by James R. McBride, defines adaptive testing and discusses some of its item selection and scoring strategies. Item response theory, or item characteristic curve theory, is…
From sweaty towels to genetic stats: stalking athletes for their genetic information.
Suter, Sonia M
2012-12-01
With recent advances in genetics, sports fans may soon have access to a new category of statistics: genetic information. With patented correlations between genetics and athletics, and with the emergence of a growing and unregulated market in direct-to-consumer ("DTC") genetic testing, fans may be able to obtain an athlete's genetic information on their own, as long as they can access any item that may have some DNA on it. In some jurisdictions, they may even be able to do so legally, notwithstanding the potential harms to the athletes and their privacy.
DeGeest, David Scott; Schmidt, Frank
2015-01-01
Our objective was to apply the rigorous test developed by Browne (1992) to determine whether the circumplex model fits Big Five personality data. This test has yet to be applied to personality data. Another objective was to determine whether blended items explained correlations among the Big Five traits. We used two working adult samples, the Eugene-Springfield Community Sample and the Professional Worker Career Experience Survey. Fit to the circumplex was tested via Browne's (1992) procedure. Circumplexes were graphed to identify items with loadings on multiple traits (blended items), and to determine whether removing these items changed five-factor model (FFM) trait intercorrelations. In both samples, the circumplex structure fit the FFM traits well. Each sample had items with dual-factor loadings (8 items in the first sample, 21 in the second). Removing blended items had little effect on construct-level intercorrelations among FFM traits. We conclude that rigorous tests show that the fit of personality data to the circumplex model is good. This finding means the circumplex model is competitive with the factor model in understanding the organization of personality traits. The circumplex structure also provides a theoretically and empirically sound rationale for evaluating intercorrelations among FFM traits. Even after eliminating blended items, FFM personality traits remained correlated.
Advanced Progressive Matrices and Sex Differences: Comment to Mackintosh and Bennett (2005)
ERIC Educational Resources Information Center
Colom, Roberto; Abad, Francisco J.
2007-01-01
Mackintosh and Bennett's [Mackintosh, N. J. and Bennett, E. S. (2005). "What do Raven's Matrices measure? An analysis in terms of sex differences." Intelligence 33: 663-674.] study shows that males outperform females in some APM items but not in others, implying that these items are measuring discriminable mental processes. The present…
The Graded Unfolding Model: A Unidimensional Item Response Model for Unfolding Graded Responses.
ERIC Educational Resources Information Center
Roberts, James S.; Laughlin, James E.
Binary or graded disagree-agree responses to attitude items are often collected for the purpose of attitude measurement. Although such data are sometimes analyzed with cumulative measurement models, recent investigations suggest that unfolding models are more appropriate (J. S. Roberts, 1995; W. H. Van Schuur and H. A. L. Kiers, 1994). Advances in…
ERIC Educational Resources Information Center
Burney, Laurie; Zascavage, Victoria; Matherly, Michele
2017-01-01
Literature consistently documents a positive, direct effect of students' attitudes on learning (Lizzio, Wilson, & Simons, 2002). Hence, accounting studies describing active learning activities often report student attitudes as evidence of efficacy (e.g., Matherly & Burney, 2013), but rely on single-item instead of multi-item scales. This…
Free-Response and Multiple-Choice Items: Measures of the Same Ability?
ERIC Educational Resources Information Center
Bennett, Randy Elliot; And Others
This study examined the relationship of multiple-choice and free-response items contained on the College Board's Advanced Placement Computer Science (APCS) examination. Subjects were two samples of 1,000 randomly drawn from the population of 7,372 high school students taking the 1988 examination of the APCS "AB" form. Most were high…
[Mokken scaling of the Cognitive Screening Test].
Diesfeldt, H F A
2009-10-01
The Cognitive Screening Test (CST) is a twenty-item orientation questionnaire in Dutch, that is commonly used to evaluate cognitive impairment. This study applied Mokken Scale Analysis, a non-parametric set of techniques derived from item response theory (IRT), to CST-data of 466 consecutive participants in psychogeriatric day care. The full item set and the standard short version of fourteen items both met the assumptions of the monotone homogeneity model, with scalability coefficient H = 0.39, which is considered weak. In order to select items that would fulfil the assumption of invariant item ordering or the double monotonicity model, the subjects were randomly partitioned into a training set (50% of the sample) and a test set (the remaining half). By means of an automated item selection eleven items were found to measure one latent trait, with H = 0.67 and item H coefficients larger than 0.51. Cross-validation of the item analysis in the remaining half of the subjects gave comparable values (H = 0.66; item H coefficients larger than 0.56). The selected items involve year, place of residence, birth date, the monarch's and prime minister's names, and their predecessors. Applying optimal discriminant analysis (ODA) it was found that the full set of twenty CST items performed best in distinguishing two predefined groups of patients of lower or higher cognitive ability, as established by an independent criterion derived from the Amsterdam Dementia Screening Test. The chance corrected predictive value or prognostic utility was 47.5% for the full item set, 45.2% for the fourteen items of the standard short version of the CST, and 46.1% for the homogeneous, unidimensional set of selected eleven items. The results of the item analysis support the application of the CST in cognitive assessment, and revealed a more reliable 'short' version of the CST than the standard short version (CST14).
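As an illustration of the scalability coefficient H reported above, here is a minimal sketch for dichotomous (0/1) items. It assumes Loevinger's definition (the sum of observed inter-item covariances divided by the sum of the maximum covariances attainable given the item marginals); the function name scalability_h and the toy data are hypothetical and are not taken from the study, which analyzed CST data with Mokken Scale Analysis software.

```python
from itertools import combinations


def scalability_h(data):
    """Loevinger's H for a list of respondents, each a list of 0/1 item scores.

    Assumes every item has some variance (0 < proportion correct < 1);
    otherwise the denominator can be zero.
    """
    n = len(data)
    k = len(data[0])
    # Proportion of respondents scoring 1 on each item.
    p = [sum(row[i] for row in data) / n for i in range(k)]
    cov_sum = 0.0
    max_cov_sum = 0.0
    for i, j in combinations(range(k), 2):
        # Observed joint proportion of scoring 1 on both items.
        p_ij = sum(row[i] * row[j] for row in data) / n
        cov_sum += p_ij - p[i] * p[j]
        # Largest joint proportion possible given the two marginals.
        max_cov_sum += min(p[i], p[j]) - p[i] * p[j]
    return cov_sum / max_cov_sum


# Toy data: a perfect Guttman pattern, for which H equals 1.
guttman = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(scalability_h(guttman))
```

Values closer to 1 indicate that the items order respondents more consistently; the abstract describes H = 0.39 as weak.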
Osth, Adam F; Jansson, Anna; Dennis, Simon; Heathcote, Andrew
2018-08-01
A robust finding in recognition memory is that performance declines monotonically across test trials. Despite the prevalence of this decline, there is a lack of consensus on the mechanism responsible. Three hypotheses have been put forward: (1) interference is caused by learning of test items, (2) the test items cause a shift in the context representation used to cue memory, and (3) participants change their speed-accuracy thresholds through the course of testing. We implemented all three possibilities in a combined model of recognition memory and decision making, which inherits the memory retrieval elements of the Osth and Dennis (2015) model and uses the diffusion decision model (DDM: Ratcliff, 1978) to generate choice and response times. We applied the model to four datasets that represent three challenges, the findings that: (1) the number of test items plays a larger role in determining performance than the number of studied items, (2) performance decreases less for strong items than weak items in pure lists but not in mixed lists, and (3) lexical decision trials interspersed between recognition test trials do not increase the rate at which performance declines. Analysis of the model's parameter estimates suggests that item interference plays a weak role in explaining the effects of recognition testing, while context drift plays a very large role. These results are consistent with prior work showing a weak role for item noise in recognition memory and that retrieval is a strong cause of context change in episodic memory.
Multistage Computerized Adaptive Testing with Uniform Item Exposure
ERIC Educational Resources Information Center
Edwards, Michael C.; Flora, David B.; Thissen, David
2012-01-01
This article describes a computerized adaptive test (CAT) based on the uniform item exposure multi-form structure (uMFS). The uMFS is a specialization of the multi-form structure (MFS) idea described by Armstrong, Jones, Berliner, and Pashley (1998). In an MFS CAT, the examinee first responds to a small fixed block of items. The items comprising…
Primary Science Assessment Item Setters' Misconceptions Concerning the State Changes of Water
ERIC Educational Resources Information Center
Boo, Hong Kwen
2006-01-01
Assessment is an integral and vital part of teaching and learning, providing feedback on progress through the assessment period to both learners and teachers. However, if test items are flawed because of misconceptions held by the questions setter, then such test items are invalid as assessment tools. Moreover, such flawed items are also likely to…
Stratified and Maximum Information Item Selection Procedures in Computer Adaptive Testing
ERIC Educational Resources Information Center
Deng, Hui; Ansley, Timothy; Chang, Hua-Hua
2010-01-01
In this study we evaluated and compared three item selection procedures: the maximum Fisher information procedure (F), the a-stratified multistage computer adaptive testing (CAT) (STR), and a refined stratification procedure that allows more items to be selected from the high a strata and fewer items from the low a strata (USTR), along with…
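To make the maximum Fisher information procedure (F) concrete, the sketch below picks the unused item with the largest information at the current ability estimate. It assumes a two-parameter logistic (2PL) model, which the truncated abstract does not confirm; the pool layout (item id mapped to a discrimination a and difficulty b) and all function names are hypothetical.

```python
import math


def prob_correct(theta, a, b):
    """2PL probability of a correct response (assumed model)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def fisher_information(theta, a, b):
    """Item information under the 2PL model: a^2 * P * (1 - P)."""
    p = prob_correct(theta, a, b)
    return a * a * p * (1.0 - p)


def select_max_information(theta, pool, administered):
    """Return the id of the not-yet-administered item with maximum information."""
    candidates = [item for item in pool if item not in administered]
    return max(candidates, key=lambda item: fisher_information(theta, *pool[item]))


# Hypothetical pool: item id -> (a, b).
pool = {"easy": (1.0, -2.0), "mid": (1.0, 0.0), "sharp": (2.0, 0.0)}
print(select_max_information(0.0, pool, administered=set()))
print(select_max_information(0.0, pool, administered={"sharp"}))
```

Because information grows with the square of a, this rule tends to pick high-a items first, which is the exposure pattern that stratifying the pool by a is meant to control.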
Assessment of Differential Item Functioning in Testlet-Based Items Using the Rasch Testlet Model
ERIC Educational Resources Information Center
Wang, Wen-Chung; Wilson, Mark
2005-01-01
This study presents a procedure for detecting differential item functioning (DIF) for dichotomous and polytomous items in testlet-based tests, whereby DIF is taken into account by adding DIF parameters into the Rasch testlet model. Simulations were conducted to assess recovery of the DIF and other parameters. Two independent variables, test type…
Ethnic Group Bias in Intelligence Test Items.
ERIC Educational Resources Information Center
Scheuneman, Janice
In previous studies of ethnic group bias in intelligence test items, the question of bias has been confounded with ability differences between the ethnic group samples compared. The present study is based on a conditional probability model in which an unbiased item is defined as one where the probability of a correct response to an item is the…
Primary Science Assessment Item Setters' Misconceptions Concerning Biological Science Concepts
ERIC Educational Resources Information Center
Boo, Hong Kwen
2007-01-01
Assessment is an integral and vital part of teaching and learning, providing feedback on progress through the assessment period to both learners and teachers. However, if test items are flawed because of misconceptions held by the question setter, then such test items are invalid as assessment tools. Moreover, such flawed items are also likely to…
Examination of Different Item Response Theory Models on Tests Composed of Testlets
ERIC Educational Resources Information Center
Kogar, Esin Yilmaz; Kelecioglu, Hülya
2017-01-01
The purpose of this research is to first estimate the item and ability parameters and the standard error values related to those parameters obtained from Unidimensional Item Response Theory (UIRT), bifactor (BIF) and Testlet Response Theory models (TRT) in the tests including testlets, when the number of testlets, number of independent items, and…
A Monte Carlo Study of an Iterative Wald Test Procedure for DIF Analysis
ERIC Educational Resources Information Center
Cao, Mengyang; Tay, Louis; Liu, Yaowu
2017-01-01
This study examined the performance of a proposed iterative Wald approach for detecting differential item functioning (DIF) between two groups when preknowledge of anchor items is absent. The iterative approach utilizes the Wald-2 approach to identify anchor items and then iteratively tests for DIF items with the Wald-1 approach. Monte Carlo…
A Semiparametric Model for Jointly Analyzing Response Times and Accuracy in Computerized Testing
ERIC Educational Resources Information Center
Wang, Chun; Fan, Zhewen; Chang, Hua-Hua; Douglas, Jeffrey A.
2013-01-01
The item response times (RTs) collected from computerized testing represent an underutilized type of information about items and examinees. In addition to knowing the examinees' responses to each item, we can investigate the amount of time examinees spend on each item. Current models for RTs mainly focus on parametric models, which have the…
ERIC Educational Resources Information Center
Missouri State Dept. of Elementary and Secondary Education, Jefferson City.
This document presents 10 released items from the Health/Physical Education Missouri Assessment Program (MAP) test given in the spring of 2000 to ninth graders. Items from the test sessions include: selected-response (multiple choice), constructed-response, and a performance event. The selected-response items consist of individual questions…
An Empirical Investigation of Methods for Assessing Item Fit for Mixed Format Tests
ERIC Educational Resources Information Center
Chon, Kyong Hee; Lee, Won-Chan; Ansley, Timothy N.
2013-01-01
Empirical information regarding performance of model-fit procedures has been a persistent need in measurement practice. Statistical procedures for evaluating item fit were applied to real test examples that consist of both dichotomously and polytomously scored items. The item fit statistics used in this study included the PARSCALE's G[squared],…
ERIC Educational Resources Information Center
Missouri State Dept. of Elementary and Secondary Education, Jefferson City.
This document deals with testing in intermediate communication arts for seventh graders in Missouri public schools. The document contains the following items from the Session 1 Test Booklet: "Swimming in Snow" (Diana C. Conway) (Items 1, 2, and 5); "Discovery" (Marion Dane Bauer) (Item 13); writing prompt; and a writer's…
Automated Item Generation with Recurrent Neural Networks.
von Davier, Matthias
2018-03-12
Utilizing technology for automated item generation is not a new idea. However, test items used in commercial testing programs or in research are still predominantly written by humans, in most cases by content experts or professional item writers. Human experts are a limited resource and testing agencies incur high costs in the process of continuous renewal of item banks to sustain testing programs. Using algorithms instead holds the promise of providing unlimited resources for this crucial part of assessment development. The approach presented here deviates in several ways from previous attempts to solve this problem. In the past, automatic item generation relied either on generating clones of narrowly defined item types such as those found in language-free intelligence tests (e.g., Raven's progressive matrices) or on an extensive analysis of task components and derivation of schemata to produce items with pre-specified variability that are hoped to have predictable levels of difficulty. It is somewhat unlikely that researchers utilizing these previous approaches would look at the proposed approach with favor; however, recent applications of machine learning show success in solving tasks that seemed impossible for machines not too long ago. The proposed approach uses deep learning to implement probabilistic language models, not unlike what Google Brain and Amazon Alexa use for language processing and generation.
Significant issues in proof testing: A critical appraisal
NASA Technical Reports Server (NTRS)
Chell, G. G.; Mcclung, R. C.; Russell, D. A.; Chang, K. J.; Donnelly, B.
1994-01-01
Issues which impact on the interpretation and quantification of proof test benefits are reviewed. The importance of each issue in contributing to the extra quality assurance conferred by proof testing components is discussed, particularly with respect to the application of advanced fracture mechanics concepts to enhance the flaw screening capability of a proof test analysis. Items covered include the role in proof testing of elastic-plastic fracture mechanics, ductile instability analysis, deterministic versus probabilistic analysis, single versus multiple cycle proof testing, and non-destructive examination (NDE). The effects of proof testing on subsequent service life are reviewed, particularly with regard to stress redistribution and changes in fracture behavior resulting from the overload. The importance of proof test conditions is also addressed, covering aspects related to test temperature, simulation of service environments, test media and the application of real-time NDE. The role of each issue in a proof test methodology is assessed with respect to its ability to: promote proof test practice to a state-of-the-art; aid optimization of proof test design; and increase awareness and understanding of outstanding issues.
Assessing the Conceptual Understanding about Heat and Thermodynamics at Undergraduate Level
ERIC Educational Resources Information Center
Kulkarni, Vasudeo Digambar; Tambade, Popat Savaleram
2013-01-01
In this study, a Thermodynamic Concept Test (TCT) was designed to assess students' conceptual understanding of heat and thermodynamics at undergraduate level. Different statistical tests, such as the item difficulty index, item discrimination index, and point biserial coefficient, were used for assessing the TCT. For each item of the test these indices were…
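The three indices named above can be sketched with their common classical-test-theory definitions. This is an assumption: the truncated abstract does not say how the authors computed them, and the 27% upper/lower grouping for the discrimination index is a widespread convention rather than something stated in the source. All names and data are hypothetical.

```python
from statistics import mean, pstdev


def difficulty_index(item):
    """Proportion of examinees answering the item correctly (0/1 scores)."""
    return mean(item)


def discrimination_index(item, totals, fraction=0.27):
    """Proportion correct in the top-scoring group minus the bottom-scoring group."""
    n = len(item)
    k = max(1, round(n * fraction))
    order = sorted(range(n), key=lambda s: totals[s])  # ascending total score
    lower, upper = order[:k], order[-k:]
    return mean(item[s] for s in upper) - mean(item[s] for s in lower)


def point_biserial(item, totals):
    """Pearson correlation between the 0/1 item score and the total score.

    Undefined (division by zero) if the item or the totals have no variance.
    """
    mi, mt = mean(item), mean(totals)
    cov = mean((x - mi) * (t - mt) for x, t in zip(item, totals))
    return cov / (pstdev(item) * pstdev(totals))


# Hypothetical data: one item's scores and total test scores for four students.
item = [1, 1, 1, 0]
totals = [3, 2, 1, 0]
print(difficulty_index(item))
print(discrimination_index(item, totals))
print(point_biserial(item, totals))
```

Here the total score includes the item itself; some analyses instead correlate the item with the total of the remaining items.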
A Study of Inference in Standardized Reading Test Items and Its Relationship to Difficulty.
ERIC Educational Resources Information Center
Marzano, Robert J.
To study the relationship between inferences made on standardized reading tests and item difficulty, 50 items on the reading comprehension section of the Metropolitan Achievement Test were analyzed independently in this study by two raters using four general categories of inferences: (1) reference inferences, (2) between proposition inferences,…
Questions and Problems in Science.
ERIC Educational Resources Information Center
Dressel, Paul L.; Nelson, Clarence H.
This folio of test items, contributed by a number of colleges and universities from their course, placement, entrance, or other institutional examinations, was compiled to aid teachers in constructing tests. Only those science courses offered in the first two years of college are represented by the scope of the items. The test items may also serve…
Effects of Using Modified Items to Test Students with Persistent Academic Difficulties
ERIC Educational Resources Information Center
Elliott, Stephen N.; Kettler, Ryan J.; Beddow, Peter A.; Kurz, Alexander; Compton, Elizabeth; McGrath, Dawn; Bruen, Charles; Hinton, Kent; Palmer, Porter; Rodriguez, Michael C.; Bolt, Daniel; Roach, Andrew T.
2010-01-01
This study investigated the effects of using modified items in achievement tests to enhance accessibility. An experiment determined whether tests composed of modified items would reduce the performance gap between students eligible for an alternate assessment based on modified achievement standards (AA-MAS) and students not eligible, and the…
Optimal Stratification of Item Pools in a-Stratified Computerized Adaptive Testing.
ERIC Educational Resources Information Center
Chang, Hua-Hua; van der Linden, Wim J.
2003-01-01
Developed a method based on 0-1 linear programming to stratify an item pool optimally for use in alpha-stratified adaptive testing. Applied the method to a previous item pool from the computerized adaptive test of the Graduate Record Examinations. Results show the new method performs well in practical situations. (SLD)
The Development and Validation of a Formula for Measuring Single-Sentence Test Item Readability.
ERIC Educational Resources Information Center
Homan, Susan; And Others
1994-01-01
A study was conducted with 782 elementary school students to determine whether the Homan-Hewitt Readability Formula could identify the readability of a single-sentence test item. Results indicate that a relationship exists between students' reading grade levels and responses to test items written at higher readability levels. (SLD)
Development and Validation of a Computer Adaptive EFL Test
ERIC Educational Resources Information Center
He, Lianzhen; Min, Shangchao
2017-01-01
The first aim of this study was to develop a computer adaptive EFL test (CALT) that assesses test takers' listening and reading proficiency in English with dichotomous items and polytomous testlets. We reported in detail on the development of the CALT, including item banking, determination of suitable item response theory (IRT) models for item…
The Development and Management of Banks of Performance Based Test Items.
ERIC Educational Resources Information Center
Curtis, H. A., Ed.
Symposium papers presented at an Annual Meeting of the National Council on Measurement in Education (Chicago, 1972), all of which concern banks of test items for use in constructing criterion referenced tests, comprise this document. The first paper, "Locally Produced Item Banks" by Thomas J. Slocum, presents information on the…
Code of Federal Regulations, 2011 CFR
2011-10-01
... 48 Federal Acquisition Regulations System 7 2011-10-01 2011-10-01 false Interest. 3432.407 Section... CONTRACTING REQUIREMENTS CONTRACT FINANCING Advance Payments for Non-Commercial Items 3432.407 Interest. The HCA is designated as the official who may authorize advance payments without interest under FAR 32.407...
Code of Federal Regulations, 2013 CFR
2013-10-01
... 48 Federal Acquisition Regulations System 7 2013-10-01 2012-10-01 true Interest. 3432.407 Section... CONTRACTING REQUIREMENTS CONTRACT FINANCING Advance Payments for Non-Commercial Items 3432.407 Interest. The HCA is designated as the official who may authorize advance payments without interest under FAR 32.407...
Code of Federal Regulations, 2012 CFR
2012-10-01
... 48 Federal Acquisition Regulations System 7 2012-10-01 2012-10-01 false Interest. 3432.407 Section... CONTRACTING REQUIREMENTS CONTRACT FINANCING Advance Payments for Non-Commercial Items 3432.407 Interest. The HCA is designated as the official who may authorize advance payments without interest under FAR 32.407...
Code of Federal Regulations, 2014 CFR
2014-10-01
... 48 Federal Acquisition Regulations System 7 2014-10-01 2014-10-01 false Interest. 3432.407 Section... CONTRACTING REQUIREMENTS CONTRACT FINANCING Advance Payments for Non-Commercial Items 3432.407 Interest. The HCA is designated as the official who may authorize advance payments without interest under FAR 32.407...
Test-retest stability of the Task and Ego Orientation Questionnaire.
Lane, Andrew M; Nevill, Alan M; Bowes, Neal; Fox, Kenneth R
2005-09-01
Establishing stability, defined as observing minimal measurement error in a test-retest assessment, is vital to validating psychometric tools. Correlational methods, such as Pearson product-moment, intraclass, and kappa are tests of association or consistency, whereas stability or reproducibility (regarded here as synonymous) assesses the agreement between test-retest scores. Indexes of reproducibility using the Task and Ego Orientation in Sport Questionnaire (TEOSQ; Duda & Nicholls, 1992) were investigated using correlational (Pearson product-moment, intraclass, and kappa) methods, repeated measures multivariate analysis of variance, and calculating the proportion of agreement within a referent value of +/-1 as suggested by Nevill, Lane, Kilgour, Bowes, and Whyte (2001). Two hundred thirteen soccer players completed the TEOSQ on two occasions, 1 week apart. Correlation analyses indicated a stronger test-retest correlation for the Ego subscale than the Task subscale. Multivariate analysis of variance indicated stability for ego items but with significant increases in four task items. The proportion of test-retest agreement scores indicated that all ego items reported relatively poor stability statistics with test-retest scores within a range of +/-1, ranging from 82.7-86.9%. By contrast, all task items showed test-retest difference scores ranging from 92.5-99%, although further analysis indicated that four task subscale items increased significantly. Findings illustrated that correlational methods (Pearson product-moment, intraclass, and kappa) are influenced by the range in scores, and calculating the proportion of agreement of test-retest differences with a referent value of +/-1 could provide additional insight into the stability of the questionnaire. It is suggested that the item-by-item proportion of agreement method proposed by Nevill et al. (2001) should be used to supplement existing methods and could be especially helpful in identifying rogue items in the initial stages of psychometric questionnaire validation.
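The proportion-of-agreement statistic described above can be sketched as follows: for one item, count the respondents whose retest score differs from their first score by no more than the referent value (+/-1 in the abstract). The function name and the sample scores are hypothetical.

```python
def proportion_of_agreement(test, retest, referent=1):
    """Share of respondents whose test-retest difference lies within +/- referent."""
    diffs = [second - first for first, second in zip(test, retest)]
    within = sum(1 for d in diffs if abs(d) <= referent)
    return within / len(diffs)


# Hypothetical responses of five players to one questionnaire item.
test = [3, 4, 5, 2, 1]
retest = [3, 5, 2, 2, 3]
print(proportion_of_agreement(test, retest))  # differences 0, 1, -3, 0, 2 -> 3 of 5
```

Unlike a correlation coefficient, this value does not depend on how widely scores are spread across respondents, which is the property the abstract highlights.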
ERIC Educational Resources Information Center
Samejima, Fumiko; Changas, Paul S.
The methods and approaches for estimating the operating characteristics of the discrete item responses without assuming any mathematical form have been developed and expanded. It has been made possible that, even if the test information function of a given test is not constant for the interval of ability of interest, it is used as the Old Test.…
Automatic Generation of Rasch-Calibrated Items: Figural Matrices Test GEOM and Endless-Loops Test EC
ERIC Educational Resources Information Center
Arendasy, Martin
2005-01-01
The future of test construction for certain psychological ability domains that can be analyzed well in a structured manner may lie--at the very least for reasons of test security--in the field of automatic item generation. In this context, a question that has not been explicitly addressed is whether it is possible to embed an item response theory…
Evaluation of Floors and Item Gradients for Reading and Math Tests for Young Children
ERIC Educational Resources Information Center
Bradley-Johnson, Sharon; Durmusoglu, Gokce
2005-01-01
Ignoring the adequacy of floors and item gradients for tests used with young children can have serious consequences. Thus, because of the importance of early intervention for reading and math problems, we used the criteria suggested by Bracken for adequate floors and item gradients, and reviewed 15 reading tests and 12 math tests for ages 4-0…
ERIC Educational Resources Information Center
Khaksefidi, Saman
2017-01-01
This study investigates the psychological effect of a wrong question with wrong items on answering the next question in a test of structure. Forty students selected through stratified random sampling are given 15 questions of a standardized test, namely a TOEFL structure test, in which questions number 7 and number 11 are wrong and their answers…
ITEM ANALYSIS OF THREE SPANISH NAMING TESTS: A CROSS-CULTURAL INVESTIGATION
de la Plata, Carlos Marquez; Arango-Lasprilla, Juan Carlos; Alegret, Montse; Moreno, Alexander; Tárraga, Luis; Lara, Mar; Hewlitt, Margaret; Hynan, Linda; Cullum, C. Munro
2009-01-01
Neuropsychological evaluations conducted in the United States and abroad commonly include the use of tests translated from English to Spanish. The use of translated naming tests for evaluating predominately Spanish-speakers has recently been challenged on the grounds that translating test items may compromise a test’s construct validity. The Texas Spanish Naming Test (TNT) has been developed in Spanish specifically for use with Spanish-speakers; however, it is unlikely patients from diverse Spanish-speaking geographical regions will perform uniformly on a naming test. The present study evaluated and compared the internal consistency and patterns of item-difficulty and -discrimination for the TNT and two commonly used translated naming tests in three countries (i.e., United States, Colombia, Spain). Two hundred fifty two subjects (126 demented, 116 nondemented) across three countries were administered the TNT, Modified Boston Naming Test-Spanish, and the naming subtest from the CERAD. The TNT demonstrated superior internal consistency to its counterparts, a superior item difficulty pattern than the CERAD naming test, and a superior item discrimination pattern than the MBNT-S across countries. Overall, all three Spanish naming tests differentiated nondemented and moderately demented individuals, but the results suggest the items of the TNT are most appropriate to use with Spanish-speakers. Preliminary normative data for the three tests examined in each country are provided. PMID:19208960
Identifying predictors of physics item difficulty: A linear regression approach
NASA Astrophysics Data System (ADS)
Mesic, Vanes; Muratovic, Hasnija
2011-06-01
Large-scale assessments of student achievement in physics are often approached with an intention to discriminate students based on the attained level of their physics competencies. Therefore, for purposes of test design, it is important that items display an acceptable discriminatory behavior. To that end, it is recommended to avoid extraordinarily difficult and very easy items. Knowing the factors that influence physics item difficulty makes it possible to model the item difficulty even before the first pilot study is conducted. Thus, by identifying predictors of physics item difficulty, we can improve the test-design process. Furthermore, we get additional qualitative feedback regarding the basic aspects of student cognitive achievement in physics that are directly responsible for the obtained, quantitative test results. In this study, we conducted a secondary analysis of data that came from two large-scale assessments of student physics achievement at the end of compulsory education in Bosnia and Herzegovina. Foremost, we explored the concept of “physics competence” and performed a content analysis of 123 physics items that were included within the above-mentioned assessments. Thereafter, an item database was created. Items were described by variables which reflect some basic cognitive aspects of physics competence. For each of the assessments, Rasch item difficulties were calculated in separate analyses. In order to make the item difficulties from different assessments comparable, a virtual test equating procedure had to be implemented. Finally, a regression model of physics item difficulty was created. It has been shown that 61.2% of item difficulty variance can be explained by factors which reflect the automaticity, complexity, and modality of the knowledge structure that is relevant for generating the most probable correct solution, as well as by the divergence of required thinking and interference effects between intuitive and formal physics knowledge structures. Identified predictors point out the fundamental cognitive dimensions of student physics achievement at the end of compulsory education in Bosnia and Herzegovina, whose level of development influenced the test results within the conducted assessments.
Stochl, Jan; Böhnke, Jan R; Pickett, Kate E; Croudace, Tim J
2016-05-20
Recent developments in psychometric modeling and technology allow pooling well-validated items from existing instruments into larger item banks and their deployment through methods of computerized adaptive testing (CAT). Use of item response theory-based bifactor methods and integrative data analysis overcomes barriers in cross-instrument comparison. This paper presents the joint calibration of an item bank for researchers keen to investigate population variations in general psychological distress (GPD). Multidimensional item response theory was used on existing health survey data from the Scottish Health Education Population Survey (n = 766) to calibrate an item bank consisting of pooled items from the short common mental disorder screen (GHQ-12) and the Affectometer-2 (a measure of "general happiness"). Computer simulation was used to evaluate usefulness and efficacy of its adaptive administration. A bifactor model capturing variation across a continuum of population distress (while controlling for artefacts due to item wording) was supported. The numbers of items for different required reliabilities in adaptive administration demonstrated promising efficacy of the proposed item bank. Psychometric modeling of the common dimension captured by more than one instrument offers the potential of adaptive testing for GPD using individually sequenced combinations of existing survey items. The potential for linking other item sets with alternative candidate measures of positive mental health is discussed since an optimal item bank may require even more items than these.
Expertise sensitive item selection.
Chow, P; Russell, H; Traub, R E
2000-12-01
In this paper we describe and illustrate a procedure for selecting items from a large pool for a certification test. The proposed procedure, which is intended to improve the alignment of the certification test with on-the-job performance, is based on an expertise sensitive index. This index for an item is the difference between the item's p values for experts and novices. An example is provided of the application of the index for selecting items to be used in certifying bakers.
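The expertise sensitive index defined above (an item's p value for experts minus its p value for novices) can be sketched in a few lines. The index follows the abstract; the selection rule shown here, keeping the n items with the largest index, is an assumption, as are the function names and the toy item pool.

```python
def p_value(responses):
    """Proportion of correct (1) responses to a single item."""
    return sum(responses) / len(responses)


def expertise_sensitive_index(expert_responses, novice_responses):
    """Expert p value minus novice p value for one item."""
    return p_value(expert_responses) - p_value(novice_responses)


def select_items(pool, n_items):
    """Rank items by the index (largest first) and keep n_items (assumed rule)."""
    ranked = sorted(
        pool,
        key=lambda item: expertise_sensitive_index(*pool[item]),
        reverse=True,
    )
    return ranked[:n_items]


# Hypothetical pool: item id -> (expert 0/1 responses, novice 0/1 responses).
pool = {
    "knead": ([1, 1, 1, 1], [0, 0, 1, 0]),  # index 0.75
    "oven": ([1, 1, 0, 1], [1, 1, 0, 1]),   # index 0.0
    "proof": ([1, 1, 1, 0], [0, 1, 0, 0]),  # index 0.5
}
print(select_items(pool, 2))
```

An item that experts and novices answer equally well, such as "oven" here, gets an index of zero and is the first to be dropped.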
Chen, Cheng-Te; Chen, Yu-Lan; Lin, Yu-Ching; Hsieh, Ching-Lin; Tzeng, Jeng-Yi
2018-01-01
Objective: The purpose of this study was to construct a computerized adaptive test (CAT) for measuring self-care performance (the CAT-SC) in children with developmental disabilities (DD) aged from 6 months to 12 years in a content-inclusive, precise, and efficient fashion. Methods: The study was divided into 3 phases: (1) item bank development, (2) item testing, and (3) a simulation study to determine the stopping rules for the administration of the CAT-SC. A total of 215 caregivers of children with DD were interviewed with the 73-item CAT-SC item bank. An item response theory model was adopted for examining the construct validity to estimate item parameters after investigation of the unidimensionality, equality of slope parameters, item fitness, and differential item functioning (DIF). In the last phase, the reliability and concurrent validity of the CAT-SC were evaluated. Results: The final CAT-SC item bank contained 56 items. The stopping rules suggested were (a) reliability coefficient greater than 0.9 or (b) 14 items administered. The results of simulation also showed that 85% of the estimated self-care performance scores would reach a reliability higher than 0.9 with a mean test length of 8.5 items, and the mean reliability for the rest was 0.86. Administering the CAT-SC could reduce the number of items administered by 75% to 84%. In addition, self-care performances estimated by the CAT-SC and the full item bank were very similar to each other (Pearson r = 0.98). Conclusion: The newly developed CAT-SC can efficiently measure self-care performance in children with DD whose performances are comparable to those of typically developing (TD) children aged from 6 months to 12 years as precisely as the whole item bank. The item bank of the CAT-SC has good reliability and a unidimensional self-care construct, and the CAT can estimate self-care performance with fewer than 25% of the items in the item bank. Therefore, the CAT-SC could be useful for measuring self-care performance in children with DD in clinical and research settings. PMID:29561879
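The two stopping rules reported above (reliability greater than 0.9, or 14 items administered) can be sketched as a single check run after each item. This is only an illustration, not the authors' implementation: the conversion from the standard error of the ability estimate to reliability assumes an ability scale standardized to unit variance (reliability = 1 − SE²), and the function and parameter names are hypothetical.

```python
def reliability_from_se(se: float) -> float:
    """Convert a standard error of the ability estimate to a reliability.

    Assumption: the ability scale has unit variance, so that
    reliability = 1 - SE**2. The abstract does not state the formula used.
    """
    return 1.0 - se ** 2


def should_stop(
    n_administered: int,
    se: float,
    min_reliability: float = 0.9,  # rule (a) from the abstract
    max_items: int = 14,           # rule (b) from the abstract
) -> bool:
    """Return True when either stopping rule is satisfied."""
    reliable_enough = reliability_from_se(se) > min_reliability
    length_reached = n_administered >= max_items
    return reliable_enough or length_reached
```

Under these assumptions, a test would stop after 5 items if the standard error had dropped to 0.3 (reliability 0.91), but would continue at a standard error of 0.4 (reliability 0.84) until the 14-item cap was reached. The cap itself implies a saving of at least 1 − 14/56 = 75% relative to the 56-item bank, which matches the lower end of the reported range.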
Chen, Cheng-Te; Chen, Yu-Lan; Lin, Yu-Ching; Hsieh, Ching-Lin; Tzeng, Jeng-Yi; Chen, Kuan-Lin
2018-01-01
The purpose of this study was to construct a computerized adaptive test (CAT) for measuring self-care performance (the CAT-SC) in children with developmental disabilities (DD) aged from 6 months to 12 years in a content-inclusive, precise, and efficient fashion. The study was divided into 3 phases: (1) item bank development, (2) item testing, and (3) a simulation study to determine the stopping rules for the administration of the CAT-SC. A total of 215 caregivers of children with DD were interviewed with the 73-item CAT-SC item bank. An item response theory model was adopted for examining the construct validity to estimate item parameters after investigation of the unidimensionality, equality of slope parameters, item fitness, and differential item functioning (DIF). In the last phase, the reliability and concurrent validity of the CAT-SC were evaluated. The final CAT-SC item bank contained 56 items. The stopping rules suggested were (a) reliability coefficient greater than 0.9 or (b) 14 items administered. The results of simulation also showed that 85% of the estimated self-care performance scores would reach a reliability higher than 0.9 with a mean test length of 8.5 items, and the mean reliability for the rest was 0.86. Administering the CAT-SC could reduce the number of items administered by 75% to 84%. In addition, self-care performances estimated by the CAT-SC and the full item bank were very similar to each other (Pearson r = 0.98). The newly developed CAT-SC can efficiently measure self-care performance in children with DD whose performances are comparable to those of typically developing (TD) children aged from 6 months to 12 years as precisely as the whole item bank. The item bank of the CAT-SC has good reliability and a unidimensional self-care construct, and the CAT can estimate self-care performance with fewer than 25% of the items in the item bank. Therefore, the CAT-SC could be useful for measuring self-care performance in children with DD in clinical and research settings.
Fabrication Control Plan for ORNL RH-LOCA ATF Test Specimens to be Irradiated in the ATR
DOE Office of Scientific and Technical Information (OSTI.GOV)
Field, Kevin G.; Howard, Richard; Teague, Michael
2014-06-01
The purpose of this fabrication plan is (1) to summarize the design of a set of rodlets that will be fabricated and then irradiated in the Advanced Test Reactor (ATR) and (2) provide requirements for fabrication and acceptance criteria for inspections of the Light Water Reactor (LWR) – Accident Tolerant Fuels (ATF) rodlet components. The functional and operational (F&OR) requirements for the ATF program are identified in the ATF Test Plan. The scope of this document only covers fabrication and inspections of rodlet components detailed in drawings 604496 and 604497. It does not cover the assembly of these items to form a completed test irradiation assembly or the inspection of the final assembly, which will be included in a separate INL final test assembly specification/inspection document. The controls support the requirements that the test irradiations must be performed safely and that subsequent examinations must provide valid results.