difficulty index item: Topics by Science.gov

Sample records for difficulty index item

Varying levels of difficulty index of skills-test items randomly selected by examinees on the Korean emergency medical technician licensing examination.

PubMed

Koh, Bongyeun; Hong, Sunggi; Kim, Soon-Sim; Hyun, Jin-Sook; Baek, Milye; Moon, Jundong; Kwon, Hayran; Kim, Gyoungyong; Min, Seonggi; Kang, Gu-Hyun

2016-01-01

The goal of this study was to characterize the difficulty index of the items in the skills test components of the class I and II Korean emergency medical technician licensing examination (KEMTLE), which requires examinees to select items randomly. The results of 1,309 class I KEMTLE examinations and 1,801 class II KEMTLE examinations in 2013 were subjected to analysis. Items from the basic and advanced skills test sections of the KEMTLE were compared to determine whether some were significantly more difficult than others. In the class I KEMTLE, all 4 of the items on the basic skills test showed significant variation in difficulty index (P<0.01), as well as 4 of the 5 items on the advanced skills test (P<0.05). In the class II KEMTLE, 4 of the 5 items on the basic skills test showed significantly different difficulty index (P<0.01), as well as all 3 of the advanced skills test items (P<0.01). In the skills test components of the class I and II KEMTLE, the procedure in which examinees randomly select questions should be revised to require examinees to respond to a set of fixed items in order to improve the reliability of the national licensing examination.
Varying levels of difficulty index of skills-test items randomly selected by examinees on the Korean emergency medical technician licensing examination

PubMed Central

2016-01-01

Purpose: The goal of this study was to characterize the difficulty index of the items in the skills test components of the class I and II Korean emergency medical technician licensing examination (KEMTLE), which requires examinees to select items randomly. Methods: The results of 1,309 class I KEMTLE examinations and 1,801 class II KEMTLE examinations in 2013 were subjected to analysis. Items from the basic and advanced skills test sections of the KEMTLE were compared to determine whether some were significantly more difficult than others. Results: In the class I KEMTLE, all 4 of the items on the basic skills test showed significant variation in difficulty index (P<0.01), as well as 4 of the 5 items on the advanced skills test (P<0.05). In the class II KEMTLE, 4 of the 5 items on the basic skills test showed significantly different difficulty index (P<0.01), as well as all 3 of the advanced skills test items (P<0.01). Conclusion: In the skills test components of the class I and II KEMTLE, the procedure in which examinees randomly select questions should be revised to require examinees to respond to a set of fixed items in order to improve the reliability of the national licensing examination. PMID:26883810
Item analysis of examinations in the Faculty of Medicine of Tunis.

PubMed

Hermi, Amene; Achour, Wafa

2016-04-01

Introduction: Item analysis is the process of collecting, summarizing and using information from students' responses to assess test items' quality. This study used this approach to evaluate the quality of items and examinations given in the Faculty of Medicine of Tunis (FMT). Methods: This study concerned the examinations of 2012-2013 (principal session). It analyzed 3138 items from 66 examinations, of which 46 were multidisciplinary (187 disciplines). A total of 2515 students took the examinations. The "AnItem.xls" file was used for the analysis, which focused on difficulty, discrimination and internal consistency. Results: Mean difficulty for all examinations was optimum (mean difficulty index: 0.59). The majority of items (89.17%) were either easy or of acceptable difficulty. Mean discrimination for all examinations was moderate (mean item discrimination coefficient: 0.28), with poor discrimination in 23.62% of items. Maximal discrimination occurred in disciplines with a difficulty index between 0.4 and 0.6. "Ideal" items represented 27.02%. Mean internal consistency for all examinations was acceptable (Cronbach's alpha: 0.79). Disciplines with nonacceptable internal consistency (68.45%) contained a maximum of 33 items each and showed a positive correlation between their alpha and the number of their questions. Distributions were mostly platykurtic (72.73%) and negatively asymmetric (89.39%). The first year of studies had the best parameters. Conclusion: Our examinations had an acceptable internal consistency and a good level of difficulty and discrimination. They tended to be easy and discriminated mainly among students of medium level. Item analysis is useful as a guide to item writers to improve the overall quality of questions in the future.
Using Reliability and Item Analysis to Evaluate a Teacher-Developed Test in Educational Measurement and Evaluation

ERIC Educational Resources Information Center

Quaigrain, Kennedy; Arhin, Ato Kwamina

2017-01-01

Item analysis is essential in improving items which will be used again in later tests; it can also be used to eliminate misleading items in a test. The study focused on item and test quality and explored the relationship between difficulty index (p-value) and discrimination index (DI) with distractor efficiency (DE). The study was conducted among…
Assessing the Conceptual Understanding about Heat and Thermodynamics at Undergraduate Level

ERIC Educational Resources Information Center

Kulkarni, Vasudeo Digambar; Tambade, Popat Savaleram

2013-01-01

In this study, a Thermodynamic Concept Test (TCT) was designed to assess students' conceptual understanding of heat and thermodynamics at undergraduate level. Different statistical tests, such as the item difficulty index, item discrimination index, and point biserial coefficient, were used for assessing the TCT. For each item of the test these indices were…
Relationship between item difficulty and discrimination indices in true/false-type multiple choice questions of a para-clinical multidisciplinary paper.

PubMed

Sim, Si-Mui; Rasiah, Raja Isaiah

2006-02-01

This paper reports the relationship between the difficulty level and the discrimination power of true/false-type multiple-choice questions (MCQs) in a multidisciplinary paper for the para-clinical year of an undergraduate medical programme. MCQ items in papers taken from Year II Parts A, B and C examinations for Sessions 2001/02, and Part B examinations for 2002/03 and 2003/04, were analysed to obtain their difficulty indices and discrimination indices. Each paper consisted of 250 true/false items (50 questions of 5 items each) on topics drawn from different disciplines. The questions were first constructed and vetted by the individual departments before being submitted to a central committee, where the final selection of the MCQs was made, based purely on the academic judgement of the committee. There was a wide distribution of item difficulty indices in all the MCQ papers analysed. Furthermore, the relationship between the difficulty index (P) and discrimination index (D) of the MCQ items in a paper was not linear, but more dome-shaped. Maximal discrimination (D = 51% to 71%) occurred with moderately easy/difficult items (P = 40% to 74%). On average, about 38% of the MCQ items in each paper were "very easy" (P ≥ 75%), while about 9% were "very difficult" (P < 25%). About two-thirds of these very easy/difficult items had "very poor" or even negative discrimination (D ≤ 20%). MCQ items that demonstrate good discriminating potential tend to be moderately difficult items, and the moderately-to-very difficult items are more likely to show negative discrimination. There is a need to evaluate the effectiveness of our MCQ items.
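The percentage cut-offs quoted in this abstract can be expressed as a small classifier. This is a minimal sketch, not the authors' procedure: the function name `classify_item` and the "moderate" label for items between the two difficulty cut-offs are assumptions made for illustration.

```python
def classify_item(p_percent, d_percent):
    """Label an item using the percentage cut-offs quoted in the abstract."""
    if p_percent >= 75:
        difficulty = "very easy"
    elif p_percent < 25:
        difficulty = "very difficult"
    else:
        difficulty = "moderate"  # hypothetical label; the abstract names no single band here
    # The abstract describes D <= 20% as "very poor" or even negative discrimination.
    poor_discrimination = d_percent <= 20
    return difficulty, poor_discrimination


# Usage with made-up item statistics (not data from the study).
print(classify_item(80, 10))
print(classify_item(50, 60))
```

Keeping the difficulty label and the discrimination flag separate makes it easy to count how many very easy or very difficult items also discriminate poorly.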
Modeling the Severity of Drinking Consequences in First-Year College Women: An Item Response Theory Analysis of the Rutgers Alcohol Problem Index*

PubMed Central

Cohn, Amy M.; Hagman, Brett T.; Graff, Fiona S.; Noel, Nora E.

2011-01-01

Objective: The present study examined the latent continuum of alcohol-related negative consequences among first-year college women using methods from item response theory and classical test theory. Method: Participants (N = 315) were college women in their freshman year who reported consuming any alcohol in the past 90 days and who completed assessments of alcohol consumption and alcohol-related negative consequences using the Rutgers Alcohol Problem Index. Results: Item response theory analyses showed poor model fit for five items identified in the Rutgers Alcohol Problem Index. Two-parameter item response theory logistic models were applied to the remaining 18 items to examine estimates of item difficulty (i.e., severity) and discrimination parameters. The item difficulty parameters ranged from 0.591 to 2.031, and the discrimination parameters ranged from 0.321 to 2.371. Classical test theory analyses indicated that the omission of the five misfit items did not significantly alter the psychometric properties of the construct. Conclusions: Findings suggest that those consequences that had greater severity and discrimination parameters may be used as screening items to identify female problem drinkers at risk for an alcohol use disorder. PMID:22051212
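The two-parameter logistic model mentioned in the Results can be sketched as follows. This is an illustration under assumptions rather than the authors' code: the function name is hypothetical, and the plain logistic form without a scaling constant is assumed, since the abstract does not state the parameterization used.

```python
import math


def two_pl_probability(theta, a, b):
    """Probability of endorsing an item under a 2PL logistic model.

    theta: respondent's position on the latent continuum
    a:     item discrimination parameter
    b:     item difficulty (severity) parameter
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Parameter values taken from the ends of the ranges quoted in the abstract;
# pairing them on one item is an assumption made for illustration only.
for theta in (0.0, 0.591, 2.031):
    print(theta, two_pl_probability(theta, a=2.371, b=0.591))
```

When theta equals b the probability is exactly 0.5, which is why b is read as the severity level at which a consequence becomes as likely as not to be reported.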
An instrument to measure nurses' knowledge in palliative care: Validation of the Spanish version of Palliative Care Quiz for Nurses

PubMed Central

2017-01-01

Background: Palliative care is nowadays essential in nursing care, due to the increasing number of patients who require attention in final stages of their life. Nurses need to acquire specific knowledge and abilities to provide quality palliative care. Palliative Care Quiz for Nurses is a questionnaire that evaluates their basic knowledge about palliative care. The Palliative Care Quiz for Nurses (PCQN) is useful to evaluate basic knowledge about palliative care, but its adaptation into the Spanish language and the analysis of its effectiveness and utility for Spanish culture is lacking. Purpose: To report the adaptation into the Spanish language and the psychometric analysis of the Palliative Care Quiz for Nurses. Method: The Palliative Care Quiz for Nurses-Spanish Version (PCQN-SV) was obtained from a process including translation, back-translation, comparison with versions in other languages, revision by experts, and pilot study. Content validity and reliability of the questionnaire were analyzed. Difficulty and discrimination indexes of each item were also calculated according to Item Response Theory (IRT). Findings: Adequate internal consistency was found (S-CVI = 0.83); Cronbach's alpha coefficient of 0.67 and KR-20 test result of 0.72 reflected the reliability of PCQN-SV. The questionnaire had a global difficulty index of 0.55, with six items which could be considered as difficult or very difficult, and five items which could be considered easy or very easy. The discrimination indexes of the 20 items show that eight items are good or very good, while six items discriminate poorly between good and bad respondents. Discussion: Although it shows internal consistency, reliability and difficulty indexes similar to those obtained by versions of PCQN in other languages, a reformulation of the items with lowest content validity or discrimination indexes and those showing difficulties with their comprehension is an aspect to take into account in order to improve the PCQN-SV. Conclusion: The PCQN-SV is a useful Spanish language instrument for measuring Spanish nurses’ knowledge in palliative care and it is adequate to establish international comparisons. PMID:28545037
An instrument to measure nurses' knowledge in palliative care: Validation of the Spanish version of Palliative Care Quiz for Nurses.

PubMed

Chover-Sierra, Elena; Martínez-Sabater, Antonio; Lapeña-Moñux, Yolanda Raquel

2017-01-01

Palliative care is nowadays essential in nursing care, due to the increasing number of patients who require attention in final stages of their life. Nurses need to acquire specific knowledge and abilities to provide quality palliative care. Palliative Care Quiz for Nurses is a questionnaire that evaluates their basic knowledge about palliative care. The Palliative Care Quiz for Nurses (PCQN) is useful to evaluate basic knowledge about palliative care, but its adaptation into the Spanish language and the analysis of its effectiveness and utility for Spanish culture is lacking. To report the adaptation into the Spanish language and the psychometric analysis of the Palliative Care Quiz for Nurses. The Palliative Care Quiz for Nurses-Spanish Version (PCQN-SV) was obtained from a process including translation, back-translation, comparison with versions in other languages, revision by experts, and pilot study. Content validity and reliability of the questionnaire were analyzed. Difficulty and discrimination indexes of each item were also calculated according to Item Response Theory (IRT). Adequate internal consistency was found (S-CVI = 0.83); Cronbach's alpha coefficient of 0.67 and KR-20 test result of 0.72 reflected the reliability of PCQN-SV. The questionnaire had a global difficulty index of 0.55, with six items which could be considered as difficult or very difficult, and five items which could be considered easy or very easy. The discrimination indexes of the 20 items show that eight items are good or very good, while six items discriminate poorly between good and bad respondents. Although it shows internal consistency, reliability and difficulty indexes similar to those obtained by versions of PCQN in other languages, a reformulation of the items with lowest content validity or discrimination indexes and those showing difficulties with their comprehension is an aspect to take into account in order to improve the PCQN-SV. The PCQN-SV is a useful Spanish language instrument for measuring Spanish nurses' knowledge in palliative care and it is adequate to establish international comparisons.
Can Item Analysis of MCQs Accomplish the Need of a Proper Assessment Strategy for Curriculum Improvement in Medical Education?

ERIC Educational Resources Information Center

Pawade, Yogesh R.; Diwase, Dipti S.

2016-01-01

Item analysis of Multiple Choice Questions (MCQs) is the process of collecting, summarizing and utilizing information from students' responses to evaluate the quality of test items. Difficulty Index (p-value), Discrimination Index (DI) and Distractor Efficiency (DE) are the parameters which help to evaluate the quality of MCQs used in an…
Systemic factors of errors in the case identification process of the national routine health information system: A case study of Modified Field Health Services Information System in the Philippines

PubMed Central

2011-01-01

Background The quality of data in national health information systems has been questionable in most developing countries. However, the mechanisms of errors in the case identification process are not fully understood. This study aimed to investigate the mechanisms of errors in the case identification process in the existing routine health information system (RHIS) in the Philippines by measuring the risk of committing errors for health program indicators used in the Field Health Services Information System (FHSIS 1996), and characterizing those indicators accordingly. Methods A structured questionnaire on the definitions of 12 selected indicators in the FHSIS was administered to 132 health workers in 14 selected municipalities in the province of Palawan. A proportion of correct answers (difficulty index) and a disparity of two proportions of correct answers between higher and lower scored groups (discrimination index) were calculated, and the patterns of wrong answers for each of the 12 items were abstracted from 113 valid responses. Results None of 12 items reached a difficulty index of 1.00. The average difficulty index of 12 items was 0.266 and the discrimination index that showed a significant difference was 0.216 and above. Compared with these two cut-offs, six items showed non-discrimination against lower difficulty indices of 0.035 (4/113) to 0.195 (22/113), two items showed a positive discrimination against lower difficulty indices of 0.142 (16/113) and 0.248 (28/113), and four items showed a positive discrimination against higher difficulty indices of 0.469 (53/113) to 0.673 (76/113). 
Conclusions The results suggest three characteristics of definitions of indicators such as those that are (1) unsupported by the current conditions in the health system, i.e., (a) data are required from a facility that cannot directly generate the data and, (b) definitions of indicators are not consistent with its corresponding program; (2) incomplete or ambiguous, which allow several interpretations; and (3) complete yet easily misunderstood by health workers. Taking systemic factors into account, the case identification step needs to be reviewed and designed to generate intended data in health information systems. PMID:21995369
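The two indices defined in the Methods of this record can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names are hypothetical, and the abstract does not say how the higher and lower scored groups were formed, so the discrimination function simply takes the group counts as inputs. The counts in the usage loop are the ones quoted in the Results.

```python
def difficulty_index(n_correct, n_total):
    """Proportion of respondents who answered the item correctly."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return n_correct / n_total


def discrimination_index(upper_correct, upper_total, lower_correct, lower_total):
    """Difference between the proportions correct in the higher and lower scored groups."""
    return (difficulty_index(upper_correct, upper_total)
            - difficulty_index(lower_correct, lower_total))


# Counts of correct answers quoted in the abstract, out of 113 valid responses.
for n_correct in (4, 22, 16, 28, 53, 76):
    print(n_correct, round(difficulty_index(n_correct, 113), 3))
```

Rounded to three decimals, these ratios reproduce the difficulty indices reported in the abstract (0.035, 0.195, 0.142, 0.248, 0.469 and 0.673).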
Detecting unexpected variables in the MMPI 2 Social Introversion scale.

PubMed

Chang, C H; Wright, B D

2001-01-01

The standard scoring structure of the revised Minnesota Multiphasic Personality Inventory (MMPI-2) Social Introversion (Si) scale was reexamined with Rasch Measurement. The 69-item Si scale split into two distinct dimensions when their standardized residuals were factor analyzed. Items keyed "true" to Si defined one dimension and items keyed "false" defined another. Relationships between Lexile values (an index of reading difficulty and comprehension) and item difficulties were also explored. The article shows how to use Rasch Measurement to understand and improve personality assessment.
Comparing Methods for Item Analysis: The Impact of Different Item-Selection Statistics on Test Difficulty

ERIC Educational Resources Information Center

Jones, Andrew T.

2011-01-01

Practitioners often depend on item analysis to select items for exam forms and have a variety of options available to them. These include the point-biserial correlation, the agreement statistic, the B index, and the phi coefficient. Although research has demonstrated that these statistics can be useful for item selection, no research as of yet has…
CTTITEM: SAS macro and SPSS syntax for classical item analysis.

PubMed

Lei, Pui-Wa; Wu, Qiong

2007-08-01

This article describes the functions of a SAS macro and an SPSS syntax that produce common statistics for conventional item analysis including Cronbach's alpha, item difficulty index (p-value or item mean), and item discrimination indices (D-index, point biserial and biserial correlations for dichotomous items and item-total correlation for polytomous items). These programs represent an improvement over the existing SAS and SPSS item analysis routines in terms of completeness and user-friendliness. To promote routine evaluations of item qualities in instrument development of any scale, the programs are available at no charge for interested users. The program codes along with a brief user's manual that contains instructions and examples are downloadable from suen.ed.psu.edu/-pwlei/plei.htm.
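For readers without SAS or SPSS, three of the statistics named in this abstract can be sketched in plain Python. This is not a port of CTTITEM: the function names and the toy data are assumptions, and the point-biserial shown is the uncorrected version in which the total score includes the item itself.

```python
from statistics import mean, pstdev, pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha; scores is a list of respondent rows of item scores."""
    k = len(scores[0])
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def item_difficulty(scores, j):
    """Item mean; for 0/1 items this is the proportion correct (p-value)."""
    return mean(row[j] for row in scores)


def point_biserial(scores, j):
    """Pearson correlation between dichotomous item j and the total score."""
    item = [row[j] for row in scores]
    total = [sum(row) for row in scores]
    mi, mt = mean(item), mean(total)
    cov = mean((x - mi) * (t - mt) for x, t in zip(item, total))
    return cov / (pstdev(item) * pstdev(total))


# Toy data: 4 respondents x 3 dichotomous items (made up for illustration).
toy = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(cronbach_alpha(toy), item_difficulty(toy, 0), point_biserial(toy, 1))
```

Population variances are used throughout; because alpha depends only on a ratio of variances, using sample variances consistently would give the same value.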
The hierarchy of the activities of daily living in the Katz index in residents of skilled nursing facilities.

PubMed

Gerrard, Paul

2013-01-01

Nursing facility patients are a population that has not been well studied with regard to functional status and independence previously. As such, the manner in which activities of daily living (ADL) relate to one another is not well understood in this population. An understanding of ADL difficulty ordering has helped to devise systems of functional independence grading in other populations, which have value in understanding patients' global levels of independence and providing expectations regarding changes in function. This study seeks to examine the hierarchy of ADL in the nursing facility population. Data were analyzed from the 2004 National Nursing Home Survey, a cross-sectional data set of 13 507 skilled nursing facility subjects with functional independence items. The ADL difficulty hierarchy was determined using Rasch analysis. Item fit values for the Rasch model using Mean-Square infit statistics were also determined. The robustness of the hierarchy was tested for each ADL. Two grading systems were devised from the results of the item difficulty ordering. One assigned a grade based on the most difficult item that a subject could perform, and the other assigned a grade based on the least difficult item that a subject could not perform. A total of 13 113 patients were included in this analysis, the majority of whom were female and white. They had an average age of 81 years. An ordered hierarchy of ADL was found with eating being the easiest and bathing the most difficult. All items in the Katz index fit the Rasch model adequately well. The majority of patients able to perform any particular ADL were also able to perform all easier ADL. Cohen's κ for the 2 grading systems was 0.73. This study is the first to show the expected hierarchy of difficulty of the 6 activities of daily living proposed in the Katz index in the nursing facility population. The hierarchy found in this population matches the original hierarchy found in older adults in the community and acute care settings. It is also similar to the hierarchy found in the inpatient rehabilitation setting. Patients would be expected to lose or gain function based on the order of difficulty, but this remains to be confirmed. Among the 6 activities of daily living tested here, their order from easiest to most difficult is eating, maintaining continence, transferring, toileting, dressing, and bathing. In addition, the index formed by these 6 items has construct validity in the nursing facility population.
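The two grading systems described in this abstract can be sketched as follows. This is an illustration under assumptions, not the author's implementation: the function names and the 0 to 6 numeric scale are hypothetical, and only the ADL order is taken from the abstract.

```python
# ADL order from easiest to most difficult, as reported in the abstract.
ADL_ORDER = ["eating", "maintaining continence", "transferring",
             "toileting", "dressing", "bathing"]


def grade_by_hardest_performed(can_perform):
    """Rank (1-6) of the most difficult ADL the subject can perform; 0 if none."""
    grade = 0
    for rank, adl in enumerate(ADL_ORDER, start=1):
        if can_perform.get(adl, False):
            grade = rank
    return grade


def grade_by_easiest_failed(can_perform):
    """Number of ADLs easier than the least difficult one the subject cannot perform."""
    for rank, adl in enumerate(ADL_ORDER, start=1):
        if not can_perform.get(adl, False):
            return rank - 1
    return len(ADL_ORDER)


# A response pattern that follows the hierarchy (made up for illustration).
subject = {"eating": True, "maintaining continence": True, "transferring": True}
print(grade_by_hardest_performed(subject), grade_by_easiest_failed(subject))
```

The two grades coincide whenever a subject's pattern follows the hierarchy and diverge when an easier activity is failed while a harder one is performed, which is the kind of disagreement an agreement statistic such as Cohen's κ summarizes.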
Investigating the Performance of Omega Index According to Item Parameters and Ability Levels

ERIC Educational Resources Information Center

Sunbul, Onder; Yormaz, Seha

2018-01-01

Purpose: Several studies can be found in the literature that investigate the performance of ω under various conditions. However, no study for the effects of item difficulty, item discrimination, and ability restrictions on the performance of ω could be found. The current study aims to investigate the performance of ω for the conditions given below.…
Probing University Students' Pre-Knowledge in Quantum Physics with QPCS Survey

ERIC Educational Resources Information Center

Asikainen, Mervi A.

2017-01-01

The study investigated the use of Quantum Physics Conceptual Survey (QPCS) in probing student understanding of quantum physics. Altogether 103 Finnish university students responded to QPCS. The mean scores of the student responses were calculated and the test was evaluated using common five indices: Item difficulty index, Item discrimination…
Content Validity and Psychometric Characteristics of the "Knowledge about Older Patients Quiz" for Nurses Using Item Response Theory.

PubMed

Dikken, Jeroen; Hoogerduijn, Jita G; Kruitwagen, Cas; Schuurmans, Marieke J

2016-11-01

To assess the content validity and psychometric characteristics of the Knowledge about Older Patients Quiz (KOP-Q), which measures nurses' knowledge regarding older hospitalized adults and their certainty regarding this knowledge. Cross-sectional. Content validity: general hospitals. Psychometric characteristics: nursing school and general hospitals in the Netherlands. Content validity: 12 nurse specialists in geriatrics. Psychometric characteristics: 107 first-year and 78 final-year bachelor of nursing students, 148 registered nurses, and 20 nurse specialists in geriatrics. Content validity: The nurse specialists rated each item of the initial KOP-Q (52 items) on relevance. Ratings were used to calculate Item-Content Validity Index and average Scale-Content Validity Index (S-CVI/ave) scores. Items with insufficient content validity were removed. Psychometric characteristics: Ratings of students, nurses, and nurse specialists were used to test for differential item functioning (DIF) and unidimensionality before item characteristics (discrimination and difficulty) were examined using Item Response Theory. Finally, norm references were calculated and nomological validity was assessed. Content validity: Forty-three items remained after assessing content validity (S-CVI/ave = 0.90). Psychometric characteristics: Of the 43 items, two demonstrating ceiling effects and 11 distorting ability estimates (DIF) were subsequently excluded. Item characteristics were assessed for the remaining 30 items, all of which demonstrated good discrimination and difficulty parameters. Knowledge was positively correlated with certainty about this knowledge. The final 30-item KOP-Q is a valid, psychometrically sound, comprehensive instrument that can be used to assess the knowledge of nursing students, hospital nurses, and nurse specialists in geriatrics regarding older hospitalized adults. It can identify knowledge and certainty deficits for research purposes or serve as a tool in educational or quality improvement programs.
The item level psychometrics of the behaviour rating inventory of executive function-adult (BRIEF-A) in a TBI sample.

PubMed

Waid-Ebbs, J Kay; Wen, Pey-Shan; Heaton, Shelley C; Donovan, Neila J; Velozo, Craig

2012-01-01

To determine whether the psychometrics of the BRIEF-A are adequate for individuals diagnosed with TBI. A prospective observational study in which the BRIEF-A was collected as part of a larger study. Informant ratings of the 75-item BRIEF-A on 89 individuals diagnosed with TBI were examined to determine item-level psychometrics for each of the two BRIEF-A indexes: Behaviour Rating Index (BRI) and Metacognitive Index (MI). Patients were either outpatients or at least 1 year post-injury. Each index measured a latent trait, separating individuals into five-to-six ability levels, and demonstrated good reliability (0.94 and 0.96). Four items were identified that did not meet the infit criteria. The results provide support for the use of the BRIEF-A as a supplemental assessment of executive function in TBI populations. However, further validation is needed with other measures of executive function. Recommendations include use of the index scores over the Global Executive Composite score and use of the difficulty hierarchy for setting therapy goals.
The development and validation of a test of science critical thinking for fifth graders.

PubMed

Mapeala, Ruslan; Siew, Nyet Moi

2015-01-01

The paper described the development and validation of the Test of Science Critical Thinking (TSCT) to measure the three critical thinking skill constructs: comparing and contrasting, sequencing, and identifying cause and effect. The initial TSCT consisted of 55 multiple choice test items, each of which required participants to select a correct response and a correct choice of critical thinking used for their response. Data were obtained from a purposive sampling of 30 fifth graders in a pilot study carried out in a primary school in Sabah, Malaysia. Students underwent the sessions of teaching and learning activities for 9 weeks using the Thinking Maps-aided Problem-Based Learning Module before they answered the TSCT test. Analyses were conducted to check on difficulty index (p) and discrimination index (d), internal consistency reliability, content validity, and face validity. Analysis of the test-retest reliability data was conducted separately for a group of fifth graders with similar ability. Findings of the pilot study showed that, out of the initial 55 administered items, only 30 items, with a relatively good difficulty index (p) ranging from 0.40 to 0.60 and a good discrimination index (d) ranging from 0.20 to 1.00, were selected. The Kuder-Richardson reliability value was found to be appropriate and relatively high with 0.70, 0.73 and 0.92 for identifying cause and effect, sequencing, and comparing and contrasting respectively. The content validity index obtained from three expert judgments equalled or exceeded 0.95. In addition, test-retest reliability showed good, statistically significant correlations ([Formula: see text]). From the above results, the selected 30-item TSCT was found to have sufficient reliability and validity and would therefore represent a useful tool for measuring critical thinking ability among fifth graders in primary science.
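The item-selection rule reported in this abstract (p from 0.40 to 0.60 and d from 0.20 to 1.00) can be written as a simple filter. This is a hedged sketch: the function name, the dictionary layout and the sample statistics are assumptions made for illustration, not data or code from the study.

```python
def select_items(item_stats, p_range=(0.40, 0.60), d_range=(0.20, 1.00)):
    """Return ids of items whose difficulty p and discrimination d fall in both ranges."""
    return [item_id for item_id, (p, d) in item_stats.items()
            if p_range[0] <= p <= p_range[1] and d_range[0] <= d <= d_range[1]]


# Made-up statistics as (p, d) pairs keyed by a hypothetical item id.
stats = {"q1": (0.50, 0.30), "q2": (0.70, 0.50), "q3": (0.45, 0.10)}
print(select_items(stats))
```

Passing the ranges as parameters keeps the thresholds visible and lets other cut-offs be tried without changing the function.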

Developing a situational judgment test blueprint for assessing the non-cognitive skills of applicants to the University of Utah School of Medicine, the United States

PubMed Central

2015-01-01

Purpose: The situational judgment test (SJT) shows promise for assessing the non-cognitive skills of medical school applicants, but has only been used in Europe. Since the admissions processes and education levels of applicants to medical school are different in the United States and in Europe, it is necessary to obtain validity evidence of the SJT based on a sample of United States applicants. Methods: Ninety SJT items were developed and Kane’s validity framework was used to create a test blueprint. A total of 489 applicants selected for assessment/interview day at the University of Utah School of Medicine during the 2014-2015 admissions cycle completed one of five SJTs, which assessed professionalism, coping with pressure, communication, patient focus, and teamwork. Item difficulty, each item’s discrimination index, internal consistency, and the categorization of items by two experts were used to create the test blueprint. Results: The majority of item scores were within an acceptable range of difficulty, as measured by the difficulty index (0.50-0.85) and had fair to good discrimination. However, internal consistency was low for each domain, and 63% of items appeared to assess multiple domains. The concordance of categorization between the two educational experts ranged from 24% to 76% across the five domains. Conclusion: The results of this study will help medical school admissions departments determine how to begin constructing a SJT. Further testing with a more representative sample is needed to determine if the SJT is a useful assessment tool for measuring the non-cognitive skills of medical school applicants. PMID:26582629
A Rasch measure of teachers' views of teacher-student relationships in the primary school.

PubMed

Leitao, Natalie; Waugh, Russell F

2012-01-01

This study investigated teacher-student relationships from the teachers' point of view at Perth metropolitan schools in Western Australia. The study identified three key social and emotional aspects that affect teacher-student relationships, namely, Connectedness, Availability and Communication. Data were collected by questionnaire (N = 139) with stem-items answered in three perspectives: (1) Idealistic: this is what I would like to happen; (2) Capability: this is what I am capable of; and (3) Behaviour: this is what actually happens, using four ordered response categories: not at all (score 1), some of the time (score 2), most of the time (score 3), and almost always (score 4). Data were analysed with a Rasch measurement model and a uni-dimensional, linear scale with 24 items, ordered from easy to hard, was created. The data were shown to be highly reliable, so that valid inferences could be made from the scale. The Person Separation Index (akin to a reliability index) was 0.93; there was good global teacher and item fit to the measurement model; there was good item fit; the targeting of the item difficulties against the teacher measures was good, and the response categories were answered consistently and logically. Teachers said that the ideal items were all easier than their corresponding capability items which were in turn easier than the behaviour items (where the items fitted the model), as conceptualized. The easiest ideal items were: I like this child and This child and I get along well together. The hardest ideal item (but still easy) was: I am available for this child. The easiest behaviour item (but still hard) was: This child and I get along well together. The hardest behaviour item (and very hard) was: I am interested to learn about this child's personal thoughts, feelings and experiences. The difficulties of the items supported the conceptual structure of the variable.
Analysis of Validity and Reliability of the Health Literacy Index for Female Marriage Immigrants (HLI-FMI).

PubMed

Yang, Sook Ja; Chee, Yeon Kyung; An, Jisook; Park, Min Hee; Jung, Sunok

2016-05-01

The purpose of this study was to obtain an independent evaluation of the factor structure of the 12-item Health Literacy Index for Female Marriage Immigrants (HLI-FMI), the first measure for assessing health literacy for FMIs in Korea. Participants were 250 Asian women who migrated from China, Vietnam, and the Philippines to marry. The HLI-FMI was originally developed and administered in Korean, and other questionnaires were translated into participants' native languages. The HLI-FMI consisted of 2 factors: (1) Access-Understand Health Literacy (7 items) and (2) Appraise-Apply Health Literacy (5 items); Cronbach's α = .73. Confirmatory factor analysis indicated adequate fit for the 2-factor model. HLI-FMI scores were positively associated with time since immigration and Korean proficiency. Based on classical test theory and item response theory, strong support was provided for item discrimination and item difficulty. Findings suggested that the HLI-FMI is an easily administered, reliable, and valid scale. © 2016 APJPH.
Psychometric Properties of the Heart Disease Knowledge Scale: Evidence from Item and Confirmatory Factor Analyses

PubMed Central

Lim, Bee Chiu; Kueh, Yee Cheng; Arifin, Wan Nor; Ng, Kok Huan

2016-01-01

Background: Heart disease knowledge is an important concept for health education, yet there is a lack of evidence on properly validated instruments used to measure levels of heart disease knowledge in the Malaysian context. Methods: A cross-sectional survey was conducted to examine the psychometric properties of the adapted English version of the Heart Disease Knowledge Questionnaire (HDKQ). Using proportionate cluster sampling, 788 undergraduate students at Universiti Sains Malaysia, Malaysia, were recruited and completed the HDKQ. Item analysis and confirmatory factor analysis (CFA) were used for the psychometric evaluation. Construct validity of the measurement model was included. Results: Most of the students were Malay (48%), female (71%), and from the field of science (51%). An acceptable range was obtained with respect to both the difficulty and discrimination indices in the item analysis results. The difficulty index ranged from 0.12 to 0.91, and a discrimination index of ≥ 0.20 was reported for the final 23 retained items. The final CFA model showed an adequate fit to the data, yielding a 23-item, one-factor model [weighted least squares mean and variance adjusted scaled chi-square difference = 1.22, degrees of freedom = 2, P-value = 0.544, the root mean square error of approximation = 0.03 (90% confidence interval = 0.03, 0.04); close-fit P-value > 0.950]. Conclusion: Adequate psychometric values were obtained for Malaysian undergraduate university students using the 23-item, one-factor model of the adapted HDKQ. PMID:27660543
Psychometric Properties of the Heart Disease Knowledge Scale: Evidence from Item and Confirmatory Factor Analyses.

PubMed

Lim, Bee Chiu; Kueh, Yee Cheng; Arifin, Wan Nor; Ng, Kok Huan

2016-07-01

Heart disease knowledge is an important concept for health education, yet there is a lack of evidence on properly validated instruments used to measure levels of heart disease knowledge in the Malaysian context. A cross-sectional survey was conducted to examine the psychometric properties of the adapted English version of the Heart Disease Knowledge Questionnaire (HDKQ). Using proportionate cluster sampling, 788 undergraduate students at Universiti Sains Malaysia, Malaysia, were recruited and completed the HDKQ. Item analysis and confirmatory factor analysis (CFA) were used for the psychometric evaluation. Construct validity of the measurement model was included. Most of the students were Malay (48%), female (71%), and from the field of science (51%). An acceptable range was obtained with respect to both the difficulty and discrimination indices in the item analysis results. The difficulty index ranged from 0.12 to 0.91, and a discrimination index of ≥ 0.20 was reported for the final 23 retained items. The final CFA model showed an adequate fit to the data, yielding a 23-item, one-factor model [weighted least squares mean and variance adjusted scaled chi-square difference = 1.22, degrees of freedom = 2, P-value = 0.544, the root mean square error of approximation = 0.03 (90% confidence interval = 0.03, 0.04); close-fit P-value > 0.950]. Adequate psychometric values were obtained for Malaysian undergraduate university students using the 23-item, one-factor model of the adapted HDKQ.
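The two classical item statistics reported in this abstract can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes dichotomous (0/1) item scores, takes the difficulty index as the proportion of correct answers, and computes the discrimination index as the difference in proportion correct between the top and bottom 27% of examinees ranked by total score. The 27% split is a common convention assumed here, and the function names and data are hypothetical.

```python
def difficulty_index(item_scores):
    """Proportion of examinees who answered the item correctly (scores are 0/1)."""
    return sum(item_scores) / len(item_scores)


def discrimination_index(item_scores, total_scores, fraction=0.27):
    """Upper-group minus lower-group proportion correct.

    Groups are the top and bottom `fraction` of examinees by total test score.
    The 27% default is an assumed convention, not taken from the abstract.
    """
    n = len(item_scores)
    k = max(1, round(n * fraction))
    # Examinee indices ordered from lowest to highest total score
    order = sorted(range(n), key=lambda i: total_scores[i])
    lower, upper = order[:k], order[-k:]
    p_upper = sum(item_scores[i] for i in upper) / k
    p_lower = sum(item_scores[i] for i in lower) / k
    return p_upper - p_lower


# Hypothetical data: one item's 0/1 scores and total test scores for 10 examinees
item = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]
totals = [9, 8, 8, 7, 6, 5, 4, 3, 2, 1]
print(difficulty_index(item))
print(discrimination_index(item, totals))
```

Under these definitions, a higher difficulty index means an easier item, which is why an index of 0.91 marks the easiest end of the reported range.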
Psychometrics of Multiple Choice Questions with Non-Functioning Distracters: Implications to Medical Education.

PubMed

Deepak, Kishore K; Al-Umran, Khalid Umran; Al-Sheikh, Mona H; Dkoli, B V; Al-Rubaish, Abdullah

2015-01-01

The functionality of distracters in a multiple choice question plays a very important role. We examined the frequency and impact of functioning and non-functioning distracters on psychometric properties of 5-option items in clinical disciplines. We analyzed item statistics of 1115 multiple choice questions from 15 summative assessments of undergraduate medical students and classified the items into five groups by their number of non-functioning distracters. We analyzed the effect of varying degrees of non-functionality, ranging from 0 to 4, on test reliability, difficulty index, discrimination index and point biserial correlation. The non-functionality of distracters inversely affected the test reliability and quality of items in a predictable manner. The non-functioning distracters made the items easier and lowered the discrimination index significantly. Three non-functional distracters in a 5-option MCQ significantly affected all psychometric properties (p < 0.5). The corrected point biserial correlation revealed that the items with 3 functional options were psychometrically as effective as 5-option items. Our study reveals that a multiple choice question with 3 functional options is the lowermost limit of item format that has adequate psychometric properties. Tests containing items with fewer functioning options have significantly lower reliability. Distracter function analysis and revision of non-functioning distracters can serve as important methods to improve the psychometrics and reliability of assessment.
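A distracter function analysis of the kind described can be sketched as below. The abstract does not state how a non-functioning distracter was defined; this sketch assumes the commonly used rule that a distracter is non-functioning when fewer than 5% of examinees choose it. The threshold, function name, and response data are all hypothetical.

```python
from collections import Counter


def non_functioning_distractors(responses, options, key, threshold=0.05):
    """Return the incorrect options selected by fewer than `threshold` of examinees.

    `responses` is the list of options chosen, `options` the full option list,
    and `key` the correct answer. The 5% default is an assumed convention.
    """
    counts = Counter(responses)
    n = len(responses)
    return [opt for opt in options if opt != key and counts[opt] / n < threshold]


# Hypothetical 5-option item answered by 100 examinees; "A" is the key
responses = ["A"] * 70 + ["B"] * 20 + ["C"] * 8 + ["D"] * 2
print(non_functioning_distractors(responses, ["A", "B", "C", "D", "E"], key="A"))
```

Counting the returned list per item gives the grouping variable (0 to 4 non-functioning distracters) that the study relates to reliability and discrimination.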
[Development of critical thinking skill evaluation scale for nursing students].

PubMed

You, So Young; Kim, Nam Cho

2014-04-01

To develop a Critical Thinking Skill Test for Nursing Students. The construct concepts were drawn from a literature review and in-depth interviews with hospital nurses and surveys were conducted among students (n=607) from nursing colleges. The data were collected from September 13 to November 23, 2012 and analyzed using the SAS program, 9.2 version. The KR 20 coefficient for reliability, difficulty index, discrimination index, item-total correlation and known group technique for validity were performed. Four domains and 27 skills were identified and 35 multiple choice items were developed. Thirty multiple choice items which had scores higher than .80 on the content validity index were selected for the pre test. From the analysis of the pre test data, a modified 30 items were selected for the main test. In the main test, the KR 20 coefficient was .70 and Corrected Item-Total Correlations range was .11-.38. There was a statistically significant difference between two academic systems (p=.001). The developed instrument is the first critical thinking skill test reflecting nursing perspectives in hospital settings and is expected to be utilized as a tool which contributes to improvement of the critical thinking ability of nursing students.
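The KR 20 coefficient cited above is the Kuder-Richardson Formula 20 for dichotomously scored items: KR20 = k/(k-1) * (1 - sum(p_j * q_j) / var(total)), where k is the number of items, p_j the proportion correct on item j, and q_j = 1 - p_j. The sketch below is a plain-Python illustration on hypothetical data, not the SAS procedure used in the study; it assumes the population variance of total scores.

```python
def kr20(score_matrix):
    """Kuder-Richardson Formula 20 for a persons x items matrix of 0/1 scores."""
    n = len(score_matrix)        # number of examinees
    k = len(score_matrix[0])     # number of items
    totals = [sum(row) for row in score_matrix]
    mean_total = sum(totals) / n
    # Population variance of total scores (an assumption of this sketch)
    var_total = sum((t - mean_total) ** 2 for t in totals) / n
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in score_matrix) / n
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / var_total)


# Hypothetical responses: 4 examinees, 3 items
scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(kr20(scores))
```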
Cancer Health Literacy Test-30-Spanish (CHLT-30-DKspa), a New Spanish-Language Version of the Cancer Health Literacy Test (CHLT-30) for Spanish-Speaking Latinos.

PubMed

Echeverri, Margarita; Anderson, David; Nápoles, Anna María

2016-01-01

This article describes the adaptation and initial validation of the Cancer Health Literacy Test (CHLT) for Spanish speakers. A cross-sectional field test of the Spanish version of the CHLT (CHLT-30-DKspa) was conducted among healthy Latinos in Louisiana. Diagonally weighted least squares was used to confirm the factor structure. Item response analysis using 2-parameter logistic estimates was used to identify questions that may require modification to avoid bias. Cronbach's alpha coefficients estimated scale internal consistency reliability. Analysis of variance was used to test for significant differences in CHLT-30-DKspa scores by gender, origin, age and education. The mean CHLT-30-DKspa score (N = 400) was 17.13 (range = 0-30, SD = 6.65). Results confirmed a unidimensional structure, χ²(405) = 461.55, p = .027, comparative fit index = .993, Tucker-Lewis index = .992, root mean square error of approximation = .0180. Cronbach's alpha was .88. Items Q1-High Calorie and Q15-Tumor Spread had the lowest item-scale correlations (.148 and .288, respectively) and standardized factor loadings (.152 and .302, respectively). Items Q19-Smoking Risk, Q8-Palliative Care, and Q1-High Calorie had the highest item difficulty parameters (difficulty = 1.12, 1.21, and 2.40, respectively). Results generally support the applicability of the CHLT-30-DKspa for healthy Spanish-speaking populations, with the exception of 4 items that need to be deleted or revised and further studied: Q1, Q8, Q15, and Q19.
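To show what a 2-parameter logistic difficulty parameter implies, the sketch below evaluates the standard 2PL item response function P(theta) = 1 / (1 + exp(-a * (theta - b))) at the three difficulty values reported above. The abstract does not give the discrimination parameters for these items, so a = 1.0 is an assumed placeholder, and the difficulties are assumed to be on the usual standardized ability scale.

```python
import math


def p_2pl(theta, a, b):
    """Probability of a correct response under the 2-parameter logistic model.

    theta: respondent ability, a: item discrimination, b: item difficulty.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Difficulty parameters as reported in the abstract
difficulties = {"Q19": 1.12, "Q8": 1.21, "Q1": 2.40}
assumed_a = 1.0  # hypothetical discrimination; not reported in the abstract

for item, b in difficulties.items():
    # Probability that a respondent of average ability (theta = 0) answers correctly
    print(item, round(p_2pl(0.0, assumed_a, b), 3))
```

In this model a respondent whose ability equals the item difficulty has a 50% chance of answering correctly, so a difficulty of 2.40 means only respondents well above average reach that point.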
Psychometric Evaluation of a Cultural Competency Assessment Instrument for Health Professionals

PubMed Central

Haywood, Sonja H.; Goode, Tawara; Gao, Yong; Smith, Kristyn; Bronheim, Suzanne; Flocke, Susan A; Zyzanski, Steve

2012-01-01

Background: Few valid and reliable measures exist for health care professionals interested in determining their levels of cultural and linguistic competence. Objective: To evaluate the measurement properties of the Cultural Competence Health Practitioner Assessment (CCHPA-129). Methods: The CCHPA-129 is a 129-item web-based instrument developed by the National Center for Cultural Competence (NCCC). Responses on the CCHPA-129 were examined using factor analysis, Rasch modeling, and Differential Item Functioning (DIF) across race, ethnicity, gender, and profession. Subjects: 2504 practitioners, including 1864 nurses (RN/LPN/BSN), 341 clinicians (PA/NP), and 299 physicians (MD/DO), who completed the CCHPA-129 online between 2005 and 2008. Results: Three factors representing domains of knowledge, adapting practice, and promoting health for culturally and linguistically diverse populations accounted for 46% of the variance. Among Knowledge factor items, 53% (23/43) fit the Rasch model, item difficulties ranged from −1.01 logits (least difficult) to +1.11 logits (most difficult), separation index (SI) 13.82, and Cronbach’s α 0.92. Forty-seven percent (21/44) of Adapting Practice factor items fit the model, item difficulties −0.07 to +1.11 logits, SI 11.59, Cronbach’s α 0.88; and 58% (23/39) of Promoting Health factor items fit the model, item difficulties −1.01 to +1.38 logits, SI 22.64, Cronbach’s α 0.92. Early evidence of validity was established by known groups having statistically different scores. Conclusion: The 67-item CCHPA-67 is psychometrically sound. This shortened instrument can be used to establish associations between practitioners’ cultural and linguistic competence and health outcomes as well as to evaluate interventions to increase practitioners’ cultural and linguistic competence. PMID:22437625
Testing the Index of Problematic Online Experiences (I-POE) with a national sample of adolescents.

PubMed

Mitchell, Kimberly J; Jones, Lisa M; Wells, Melissa

2013-12-01

This article assesses the utility of the Index of Problematic Online Experiences (I-POE) in a national sample of adolescents in the United States. The study was based on a cross-sectional national telephone survey of 1560 Internet users, ages 10 through 17. Data were collected between August, 2010 and January, 2011. The I-POE is an 18-item binary response index which can be used to assess problematic internet use across multiple behaviors and activities. Exploratory and confirmatory factor analysis supported a revised index with two factors: a 9-item "excessive use" scale and a 9-item "online social and communication problems" scale among this population. The I-POE showed favorable psychometric properties including adequate internal consistency for the overall scale and for the two subscales. Scores correlate with offline emotional and behavioral difficulties and the I-POE could have value for use as a part of broad mental health assessment procedures in clinical or school settings. Copyright © 2013 The Foundation for Professionals in Services for Adolescents. Published by Elsevier Ltd. All rights reserved.
Item Writer Judgments of Item Difficulty versus Actual Item Difficulty: A Case Study

ERIC Educational Resources Information Center

Sydorenko, Tetyana

2011-01-01

This study investigates how accurate one item writer can be on item difficulty estimates and whether factors affecting item writer judgments correspond to predictors of actual item difficulty. The items were based on conversational dialogs (presented as videos online) that focus on pragmatic functions. Thirty-five 2nd-, 3rd-, and 4th-year learners…
Factor and Rasch analysis of the Fonseca anamnestic index for the diagnosis of myogenous temporomandibular disorder.

PubMed

Rodrigues-Bigaton, Delaine; de Castro, Ester M; Pires, Paulo F

Rasch analysis has been used in recent studies to test the psychometric properties of a questionnaire. The conditions for use of the Rasch model are one-dimensionality (assessed via prior factor analysis) and local independence (the probability of getting a particular item right or wrong should not be conditioned upon success or failure in another). To evaluate the dimensionality and the psychometric properties of the Fonseca anamnestic index (FAI), such as the fit of the data to the model, the degree of difficulty of the items, and the ability to respond in patients with myogenous temporomandibular disorder (TMD). The sample consisted of 94 women with myogenous TMD, diagnosed by the Research Diagnostic Criteria for Temporomandibular Disorders (RDC/TMD), who answered the FAI. For the factor analysis, we applied the Kaiser-Meyer-Olkin test, Bartlett's sphericity, Spearman's correlation, and the determinant of the correlation matrix. For extraction of the factors/dimensions, an eigenvalue >1.0 was used, followed by oblique oblimin rotation. The Rasch analysis was conducted on the dimension that showed the highest proportion of variance explained. Adequate sample "n" and FAI multidimensionality were observed. Dimension 1 (primary) consisted of items 1, 2, 3, 6, and 7. All items of dimension 1 showed adequate fit to the model, being observed according to the degree of difficulty (from most difficult to easiest), respectively, items 2, 1, 3, 6, and 7. The FAI presented multidimensionality with its main dimension consisting of five reliable items with adequate fit to the composition of its structure. Copyright © 2017 Associação Brasileira de Pesquisa e Pós-Graduação em Fisioterapia. Publicado por Elsevier Editora Ltda. All rights reserved.
Development and psychometric characteristics of the SCI-QOL Bladder Management Difficulties and Bowel Management Difficulties item banks and short forms and the SCI-QOL Bladder Complications scale.

PubMed

Tulsky, David S; Kisala, Pamela A; Tate, Denise G; Spungen, Ann M; Kirshblum, Steven C

2015-05-01

To describe the development and psychometric properties of the Spinal Cord Injury--Quality of Life (SCI-QOL) Bladder Management Difficulties and Bowel Management Difficulties item banks and Bladder Complications scale. Using a mixed-methods design, a pool of items assessing bladder and bowel-related concerns were developed using focus groups with individuals with spinal cord injury (SCI) and SCI clinicians, cognitive interviews, and item response theory (IRT) analytic approaches, including tests of model fit and differential item functioning. Thirty-eight bladder items and 52 bowel items were tested at the University of Michigan, Kessler Foundation Research Center, the Rehabilitation Institute of Chicago, the University of Washington, Craig Hospital, and the James J. Peters VA Medical Center, Bronx, NY. Seven hundred fifty-seven adults with traumatic SCI. The final item banks demonstrated unidimensionality (Bladder Management Difficulties CFI=0.965; RMSEA=0.093; Bowel Management Difficulties CFI=0.955; RMSEA=0.078) and acceptable fit to a graded response IRT model. The final calibrated Bladder Management Difficulties bank includes 15 items, and the final Bowel Management Difficulties item bank consists of 26 items. Additionally, 5 items related to urinary tract infections (UTI) did not fit with the larger Bladder Management Difficulties item bank but performed relatively well independently (CFI=0.992, RMSEA=0.050) and were thus retained as a separate scale. The SCI-QOL Bladder Management Difficulties and Bowel Management Difficulties item banks are psychometrically robust and are available as computer adaptive tests or short forms. The SCI-QOL Bladder Complications scale is a brief, fixed-length outcomes instrument for individuals with a UTI.
Item development process and analysis of 50 case-based items for implementation on the Korean Nursing Licensing Examination.

PubMed

Park, In Sook; Suh, Yeon Ok; Park, Hae Sook; Kang, So Young; Kim, Kwang Sung; Kim, Gyung Hee; Choi, Yeon-Hee; Kim, Hyun-Ju

2017-01-01

The purpose of this study was to improve the quality of items on the Korean Nursing Licensing Examination by developing and evaluating case-based items that reflect integrated nursing knowledge. We conducted a cross-sectional observational study to develop new case-based items. The methods for developing test items included expert workshops, brainstorming, and verification of content validity. After a mock examination of undergraduate nursing students using the newly developed case-based items, we evaluated the appropriateness of the items through classical test theory and item response theory. A total of 50 case-based items were developed for the mock examination, and content validity was evaluated. The question items integrated 34 discrete elements of integrated nursing knowledge. The mock examination was taken by 741 baccalaureate students in their fourth year of study at 13 universities. Their average score on the mock examination was 57.4, and the examination showed a reliability of 0.40. According to classical test theory, the average level of item difficulty of the items was 57.4% (80%-100% for 12 items; 60%-80% for 13 items; and less than 60% for 25 items). The mean discrimination index was 0.19, and was above 0.30 for 11 items and 0.20 to 0.29 for 15 items. According to item response theory, the item discrimination parameter (in the logistic model) was none for 10 items (0.00), very low for 20 items (0.01 to 0.34), low for 12 items (0.35 to 0.64), moderate for 6 items (0.65 to 1.34), high for 1 item (1.35 to 1.69), and very high for 1 item (above 1.70). The item difficulty was very easy for 24 items (below -2.0), easy for 8 items (-2.0 to -0.5), medium for 6 items (-0.5 to 0.5), hard for 3 items (0.5 to 2.0), and very hard for 9 items (2.0 or above). The goodness-of-fit test in terms of the 2-parameter item response model between the range of 2.0 to 0.5 revealed that 12 items had an ideal correct answer rate. 
We surmised that the low reliability of the mock examination was influenced by the timing of the test for the examinees and the inappropriate difficulty of the items. Our study suggested a methodology for the development of future case-based items for the Korean Nursing Licensing Examination.
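The interpretive bands this abstract applies to the item response theory parameters can be written as two small lookup functions. The cut points are taken from the abstract; how values falling exactly on a boundary (for example 0.5 or 1.70) are assigned is not stated, so the boundary handling below is an assumption, as are the function names.

```python
def discrimination_band(a):
    """Label an IRT discrimination parameter using the bands listed in the abstract."""
    if a <= 0.0:
        return "none"
    if a < 0.35:
        return "very low"
    if a < 0.65:
        return "low"
    if a < 1.35:
        return "moderate"
    if a < 1.70:
        return "high"
    return "very high"  # assumption: 1.70 itself is treated as very high


def difficulty_band(b):
    """Label an IRT difficulty parameter using the bands listed in the abstract."""
    if b < -2.0:
        return "very easy"
    if b < -0.5:
        return "easy"
    if b < 0.5:
        return "medium"    # assumption: boundaries belong to the harder band
    if b < 2.0:
        return "hard"
    return "very hard"


# Hypothetical parameter estimates for three items
for a, b in [(0.19, -2.6), (0.70, 0.1), (1.80, 2.3)]:
    print(a, discrimination_band(a), b, difficulty_band(b))
```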
Anesthesiology Journal club assessment by means of semantic changes.

PubMed

Vieira, Joaquim Edson; Torres, Marcelo Luís Abramides; Pose, Regina Albanese; Auler, José Otávio Costa Junior

2014-01-01

The interactive approach of a journal club has been described in the medical education literature. The aim of this investigation is to present an assessment of journal club as a tool to address the question of whether residents read more and critically. This study reports the performance of medical residents in anesthesiology from the Clinics Hospital - University of São Paulo Medical School. All medical residents were invited to answer five questions derived from discussed papers. The answer sheet consisted of an affirmative statement with a Likert type scale (totally disagree-disagree-not sure-agree-totally agree), each related to one of the chosen articles. The results were evaluated by means of item analysis - difficulty index and discrimination power. Residents filled out one hundred and seventy-three evaluations in the months of December 2011 (n=51), July 2012 (n=66) and December 2012 (n=56). The first exam presented all items as straight statements; the second and third exams presented mixed items. Separating "totally agree" from "agree" increased the difficulty indices, but did not improve the discrimination power. The use of a journal club assessment with straight and inverted statements and a five-point agreement scale has been shown to increase its item difficulty and discrimination power. This may reflect involvement either with the reading or the discussion during the journal meeting. Copyright © 2013 Sociedade Brasileira de Anestesiologia. Published by Elsevier Editora Ltda. All rights reserved.
Is the Factor Observed in Investigations on the Item-Position Effect Actually the Difficulty Factor?

PubMed

Schweizer, Karl; Troche, Stefan

2018-02-01

In confirmatory factor analysis quite similar models of measurement serve the detection of the difficulty factor and the factor due to the item-position effect. The item-position effect refers to the increasing dependency among the responses to successively presented items of a test whereas the difficulty factor is ascribed to the wide range of item difficulties. The similarity of the models of measurement hampers the dissociation of these factors. Since the item-position effect should theoretically be independent of the item difficulties, the statistical ex post manipulation of the difficulties should enable the discrimination of the two types of factors. This method was investigated in two studies. In the first study, Advanced Progressive Matrices (APM) data of 300 participants were investigated. As expected, the factor thought to be due to the item-position effect was observed. In the second study, using data simulated to show the major characteristics of the APM data, the wide range of items with various difficulties was set to zero to reduce the likelihood of detecting the difficulty factor. Despite this reduction, however, the factor now identified as item-position factor, was observed in virtually all simulated datasets.
Critical success factors in awareness of and choice towards low vision rehabilitation.

PubMed

Fraser, Sarah A; Johnson, Aaron P; Wittich, Walter; Overbury, Olga

2015-01-01

The goal of the current study was to examine the critical factors indicative of an individual's choice to access low vision rehabilitation services. Seven hundred and forty-nine visually impaired individuals, from the Montreal Barriers Study, completed a structured interview and questionnaires (on visual function, coping, depression, satisfaction with life). Seventy-five factors from the interview and questionnaires were entered into a data-driven Classification and Regression Tree Analysis in order to determine the best predictors of awareness group: positive personal choice (I knew and I went), negative personal choice (I knew and did not go), and lack of information (Nobody told me, and I did not know). Having a response of moderate to no difficulty on item 6 (reading signs) of the Visual Function Index 14 (VF-14) indicated that the person had made a positive personal choice to seek rehabilitation, whereas reporting a great deal of difficulty on this item was associated with a lack of information on low vision rehabilitation. In addition to this factor, symptom duration of under nine years, moderate difficulty or less on item 5 (seeing steps or curbs) of the VF-14, and an indication of little difficulty or less on item 3 (reading large print) of the VF-14 further identified those who were more likely to have made a positive personal choice. Individuals in the lack of information group also reported greater difficulty on items 3 and 5 of the VF-14 and were more likely to be male. The duration-of-symptoms factor suggests that, even in the positive choice group, it may be best to offer rehabilitation services early. Being male and responding moderate difficulty or greater to the VF-14 questions about far, medium-distance and near situations involving vision was associated with individuals that lack information. Consequently, these individuals may need additional education about the benefits of low vision services in order to make a positive personal choice. 
© 2014 The Authors Ophthalmic & Physiological Optics © 2014 The College of Optometrists.
The Standardization of the Concepts about Print into Greek

ERIC Educational Resources Information Center

Tafa, Eufimia

2009-01-01

The purpose of this study was to translate and standardize Concepts About Print (C.A.P.) into Greek, and to assess its psychometric properties. Particularly, this study evaluated the reliability and validity of the Greek version of C.A.P., and item difficulty and discrimination index and examined whether there were differences between boys and…
Psychometric properties of the neck disability index amongst patients with chronic neck pain using item response theory.

PubMed

Saltychev, Mikhail; Mattie, Ryan; McCormick, Zachary; Laimi, Katri

2017-05-13

The Neck Disability Index (NDI) is commonly used for clinical and research assessment for chronic neck pain, yet the original version of this tool has not undergone significant validity testing, and in particular, there has been minimal assessment using Item Response Theory. The goal of the present study was to investigate the psychometric properties of the original version of the NDI in a large sample of individuals with chronic neck pain by defining its internal consistency, construct structure and validity, and its ability to discriminate between different degrees of functional limitation. This is a cross-sectional cohort study of 585 consecutive patients with chronic neck pain seen in a university hospital rehabilitation clinic. Internal consistency was evaluated using Cronbach's alpha, construct structure was evaluated by exploratory factor analysis, and discrimination ability was determined by Item Response Theory. The NDI demonstrated good internal consistency assessed by Cronbach's alpha (0.87). The exploratory factor analysis identified only one factor with eigenvalue considered significant (cutoff 1.0). When analyzed by Item Response Theory, eight out of 10 items demonstrated almost ideal difficulty parameter estimates. In addition, eight out of 10 items showed high to perfect estimates of discrimination ability (overall range 0.8 to 2.9). Amongst patients with chronic neck pain, the NDI was found to have good internal consistency, have unidimensional properties, and an excellent ability to distinguish patients with different levels of perceived disability. Implications for Rehabilitation The Neck Disability Index has good internal consistency, unidimensional properties, and an excellent ability to distinguish patients with different levels of perceived disability. The Neck Disability Index is recommended for use when selecting patients for rehabilitation, setting rehabilitation goals, and measuring the outcome of intervention.
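The internal consistency statistic reported here, Cronbach's alpha, is alpha = k/(k-1) * (1 - sum(item variances) / variance(total score)) for k items. The sketch below computes it with the standard library on a small hypothetical matrix of item ratings; it is an illustration of the formula, not a reproduction of the study's analysis.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a persons x items matrix of numeric item scores."""
    k = len(scores[0])
    # Variance of each item across respondents
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    # Variance of respondents' total scores
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical ratings: 5 respondents, 4 items
ratings = [
    [0, 1, 1, 0],
    [1, 1, 2, 1],
    [2, 3, 2, 2],
    [3, 3, 4, 2],
    [4, 5, 4, 5],
]
print(round(cronbach_alpha(ratings), 3))
```

Using population or sample variance gives the same alpha as long as the choice is consistent for items and totals, because the scaling factor cancels in the ratio.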
Comparability of item quality indices from sparse data matrices with random and non-random missing data patterns.

PubMed

Wolfe, Edward W; McGill, Michael T

2011-01-01

This article summarizes a simulation study of the performance of five item quality indicators (the weighted and unweighted versions of the mean square and standardized mean square fit indices and the point-measure correlation) under conditions of relatively high and low amounts of missing data under both random and conditional patterns of missing data for testing contexts such as those encountered in operational administrations of a computerized adaptive certification or licensure examination. The results suggest that weighted fit indices, particularly the standardized mean square index, and the point-measure correlation provide the most consistent information between random and conditional missing data patterns and that these indices perform more comparably for items near the passing score than for items with extreme difficulty values.

Unidimensional IRT Item Parameter Estimates across Equivalent Test Forms with Confounding Specifications within Dimensions

ERIC Educational Resources Information Center

Matlock, Ki Lynn; Turner, Ronna

2016-01-01

When constructing multiple test forms, the number of items and the total test difficulty are often equivalent. Not all test developers match the number of items and/or average item difficulty within subcontent areas. In this simulation study, six test forms were constructed having an equal number of items and average item difficulty overall.…
Exploring Item Characteristics That Are Related to the Difficulty of TOEFL Dialogue Items. Research Reports. RR-79. RR-04-11

ERIC Educational Resources Information Center

Kostin, Irene

2004-01-01

The purpose of this study is to explore the relationship between a set of item characteristics and the difficulty of TOEFL[R] dialogue items. Identifying characteristics that are related to item difficulty has the potential to improve the efficiency of the item-writing process. The study employed 365 TOEFL dialogue items, which were coded on 49…
Statistical Approaches to the Study of Item Difficulty.

ERIC Educational Resources Information Center

Olson, John F.; And Others

Traditionally, item difficulty has been defined in terms of the performance of examinees. For test development purposes, a more useful concept would be some kind of intrinsic item difficulty, defined in terms of the item's content, context, or characteristics and the task demands set by the item. In this investigation, the measurement literature…
Modelling Question Difficulty in an A Level Physics Examination

ERIC Educational Resources Information Center

Crisp, Victoria; Grayson, Rebecca

2013-01-01

"Item difficulty modelling" is a technique used for a number of purposes such as to support future item development, to explore validity in relation to the constructs that influence difficulty and to predict the difficulty of items. This research attempted to explore the factors influencing question difficulty in a general qualification…
Predicting Item Difficulty of Science National Curriculum Tests: The Case of Key Stage 2 Assessments

ERIC Educational Resources Information Center

El Masri, Yasmine H.; Ferrara, Steve; Foltz, Peter W.; Baird, Jo-Anne

2017-01-01

Predicting item difficulty is highly important in education for both teachers and item writers. Despite identifying a large number of explanatory variables, predicting item difficulty remains a challenge in educational assessment with empirical attempts rarely exceeding 25% of variance explained. This paper analyses 216 science items of key stage…
An Improved Internal Consistency Reliability Estimate.

ERIC Educational Resources Information Center

Cliff, Norman

1984-01-01

The proposed coefficient is derived by assuming that the average Goodman-Kruskal gamma between items of identical difficulty would be the same for items of different difficulty. An estimate of covariance between items of identical difficulty leads to an estimate of the correlation between two tests with identical distributions of difficulty.…
The Confounding Effects of Ability, Item Difficulty, and Content Balance within Multiple Dimensions on the Estimation of Unidimensional Thetas

ERIC Educational Resources Information Center

Matlock, Ki Lynn

2013-01-01

When test forms that have equal total test difficulty and number of items vary in difficulty and length within sub-content areas, an examinee's estimated score may vary across equivalent forms, depending on how well his or her true ability in each sub-content area aligns with the difficulty of items and number of items within these areas.…
Development and Psychometric Evaluation of the Gay Male Sexual Difficulties Scale.

PubMed

McDonagh, Lorraine K; Stewart, Ian; Morrison, Melanie A; Morrison, Todd G

2016-08-01

Sexual difficulties (i.e., disturbances in normal sexual responding) have the potential to significantly and negatively affect men's social and psychological well-being. However, a review of published measurement tools indicates that most have limited applicability to gay men, and none offer a nuanced understanding of sexual difficulties, as experienced by members of this population. To address this omission, the Gay Male Sexual Difficulties Scale (GMSDS) was developed using a sequential mixed-methods approach. The 25-item GMSDS uses a 6-point frequency Likert-type response format and examines: difficulties with receptive and insertive anal intercourse (5 items each); erectile difficulties (4 items); foreskin difficulties (4 items); body embarrassment (4 items); and seminal fluid concerns (3 items). The measure's scale score dimensionality, assessed using both exploratory and confirmatory factor analyses, as well as scale score reliability and validity (e.g., known-groups and convergent) was tested and deemed to be satisfactory. Limitations of the current series of studies and directions for future research are discussed.
Comparison of Alternate and Original Items on the Montreal Cognitive Assessment.

PubMed

Lebedeva, Elena; Huang, Mei; Koski, Lisa

2016-03-01

The Montreal Cognitive Assessment (MoCA) is a screening tool for mild cognitive impairment (MCI) in elderly individuals. We hypothesized that measurement error when using the new alternate MoCA versions to monitor change over time could be related to the use of items that are not of comparable difficulty to their corresponding originals of similar content. The objective of this study was to compare the difficulty of the alternate MoCA items to the original ones. Five selected items from alternate versions of the MoCA were included with items from the original MoCA administered adaptively to geriatric outpatients (N = 78). Rasch analysis was used to estimate the difficulty level of the items. None of the five items from the alternate versions matched the difficulty level of their corresponding original items. This study demonstrates the potential benefits of a Rasch analysis-based approach for selecting items during the process of development of parallel forms. The results suggest that better match of the items from different MoCA forms by their difficulty would result in higher sensitivity to changes in cognitive function over time.
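For readers unfamiliar with Rasch analysis: in the dichotomous Rasch model, the probability of a correct response depends only on the difference between person ability (theta) and item difficulty (b), so two items are of comparable difficulty when their b estimates coincide. The sketch below shows only this response function. It is an illustration under that assumption, not the estimation procedure used in the study, and the parameter values are made up.

```python
import math


def rasch_probability(theta, b):
    """Probability of a correct response under the dichotomous Rasch model.

    theta: person ability in logits; b: item difficulty in logits.
    """
    return 1.0 / (1.0 + math.exp(-(theta - b)))


# Hypothetical example: one person, an original item and a harder alternate.
p_original = rasch_probability(theta=0.5, b=0.0)
p_alternate = rasch_probability(theta=0.5, b=1.0)
```

When ability equals difficulty the probability is 0.5, and a larger b lowers the probability for every person, which is the sense in which an alternate item can fail to match its original.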
The Utrecht questionnaire (U-CEP) measuring knowledge on clinical epidemiology proved to be valid.

PubMed

Kortekaas, Marlous F; Bartelink, Marie-Louise E L; de Groot, Esther; Korving, Helen; de Wit, Niek J; Grobbee, Diederick E; Hoes, Arno W

2017-02-01

Knowledge on clinical epidemiology is crucial to practice evidence-based medicine. We describe the development and validation of the Utrecht questionnaire on knowledge on Clinical epidemiology for Evidence-based Practice (U-CEP); an assessment tool to be used in the training of clinicians. The U-CEP was developed in two formats: two sets of 25 questions and a combined set of 50. The validation was performed among postgraduate general practice (GP) trainees, hospital trainees, GP supervisors, and experts. Internal consistency, internal reliability (item-total correlation), item discrimination index, item difficulty, content validity, construct validity, responsiveness, test-retest reliability, and feasibility were assessed. The questionnaire was externally validated. Internal consistency was good with a Cronbach alpha of 0.8. The median item-total correlation and mean item discrimination index were satisfactory. Both sets were perceived as relevant to clinical practice. Construct validity was good. Both sets were responsive but failed on test-retest reliability. One set took 24 minutes and the other 33 minutes to complete, on average. External GP trainees had comparable results. The U-CEP is a valid questionnaire to assess knowledge on clinical epidemiology, which is a prerequisite for practicing evidence-based medicine in daily clinical practice. Copyright © 2016 Elsevier Inc. All rights reserved.
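The internal consistency figure reported here, Cronbach's alpha, is computed from the item variances and the variance of the total score: alpha = k/(k-1) * (1 - sum of item variances / total-score variance), where k is the number of items. The following is a minimal sketch using sample variances. The function name and the data are hypothetical and unrelated to the U-CEP results.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondent rows of item scores."""
    k = len(scores[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical data: three respondents, two perfectly consistent items.
alpha = cronbach_alpha([[1, 1], [2, 2], [3, 3]])
```

The function needs at least two items and at least two respondents with differing total scores, since otherwise the total-score variance is zero.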
An Analysis of Factors Affecting the Difficulty of Dialogue Items in TOEFL Listening Comprehension. TOEFL Research Reports, 51.

ERIC Educational Resources Information Center

Nissan, Susan; And Others

One of the item types in the Listening Comprehension section of the Test of English as a Foreign Language (TOEFL) test is the dialogue. Because the dialogue item pool needs to have an appropriate balance of items at a range of difficulty levels, test developers have examined items at various difficulty levels in an attempt to identify their…
Sources of difficulty in assessment: example of PISA science items

NASA Astrophysics Data System (ADS)

Le Hebel, Florence; Montpied, Pascale; Tiberghien, Andrée; Fontanieu, Valérie

2017-03-01

The understanding of what makes a question difficult is a crucial concern in assessment. To study the difficulty of test questions, we focus on the case of PISA, which assesses to what degree 15-year-old students have acquired knowledge and skills essential for full participation in society. Our research question is to identify PISA science item characteristics that could influence the item's proficiency level. It is based on an a-priori item analysis and a statistical analysis. Results show that only the cognitive complexity and the format out of the different characteristics of PISA science items determined in our a-priori analysis have an explanatory power on an item's proficiency levels. The proficiency level cannot be explained by the dependence/independence of the information provided in the unit and/or item introduction and the competence. We conclude that in PISA, it appears possible to anticipate a high proficiency level, that is, students' low scores for items displaying a high cognitive complexity. In the case of a middle or low cognitive complexity level item, the cognitive complexity level is not sufficient to predict item difficulty. Other characteristics play a crucial role in item difficulty. We discuss anticipating the difficulties in assessment in a broader perspective.
Identifying predictors of physics item difficulty: A linear regression approach

NASA Astrophysics Data System (ADS)

Mesic, Vanes; Muratovic, Hasnija

2011-06-01

Large-scale assessments of student achievement in physics are often approached with an intention to discriminate students based on the attained level of their physics competencies. Therefore, for purposes of test design, it is important that items display an acceptable discriminatory behavior. To that end, it is recommended to avoid extraordinarily difficult and very easy items. Knowing the factors that influence physics item difficulty makes it possible to model the item difficulty even before the first pilot study is conducted. Thus, by identifying predictors of physics item difficulty, we can improve the test-design process. Furthermore, we get additional qualitative feedback regarding the basic aspects of student cognitive achievement in physics that are directly responsible for the obtained, quantitative test results. In this study, we conducted a secondary analysis of data that came from two large-scale assessments of student physics achievement at the end of compulsory education in Bosnia and Herzegovina. Foremost, we explored the concept of “physics competence” and performed a content analysis of 123 physics items that were included within the above-mentioned assessments. Thereafter, an item database was created. Items were described by variables which reflect some basic cognitive aspects of physics competence. For each of the assessments, Rasch item difficulties were calculated in separate analyses. In order to make the item difficulties from different assessments comparable, a virtual test equating procedure had to be implemented. Finally, a regression model of physics item difficulty was created. It has been shown that 61.2% of item difficulty variance can be explained by factors which reflect the automaticity, complexity, and modality of the knowledge structure that is relevant for generating the most probable correct solution, as well as by the divergence of required thinking and interference effects between intuitive and formal physics knowledge structures. Identified predictors point out the fundamental cognitive dimensions of student physics achievement at the end of compulsory education in Bosnia and Herzegovina, whose level of development influenced the test results within the conducted assessments.
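The "variance explained" figure in this record is the coefficient of determination (R squared) of a regression of item difficulty on item characteristics. The sketch below computes R squared for an ordinary least-squares fit with a single predictor. This is a deliberate simplification, since the study used several predictors, and the variable names and numbers are hypothetical.

```python
def r_squared(x, y):
    """R squared of a simple least-squares regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares around the fitted line.
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    return 1 - ss_res / syy


# Hypothetical complexity ratings and Rasch difficulties for five items.
complexity = [1, 2, 3, 4, 5]
difficulty = [-1.2, -0.3, 0.1, 0.9, 1.1]
explained = r_squared(complexity, difficulty)
```

A value of 0.612 from such a model would correspond to the 61.2% reported above, with the remainder of the difficulty variance left unexplained by the chosen predictors.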
The Utility of the Family Empowerment Scale With Custodial Grandmothers

PubMed Central

Hayslip, Bert; Smith, Gregory C.; Montoro-Rodriguez, Julian; Streider, Frederick H.; Merchant, William

2016-01-01

The Family Empowerment Scale (FES) was developed specifically to assess empowerment in families with emotional disorders. Its relevance to custodial grandfamilies is reflected in the difficulties in grandchildren's social, emotional, and behavioral functioning, wherein such difficulties may be explained via either reactions to changes in their family structure or in their responses to the newly formed family unit. Utilizing 27 items derived from the 34-item version of the FES, which had represented differential levels of empowerment (family, service system, community) as indexed by one's attitudes, knowledge, and behavior, we explored the factor structure, internal consistency, construct, and convergent validity of the FES with grandparent caregivers. Three-hundred forty-three (M age = 58.45, SD = 8.22, n Caucasian = 152, n African American = 149, n Hispanic = 38) custodial grandmothers caring for grandchildren between ages 4 and 12 years completed the 27 FES items and various measures of their psychological well-being, grandchild psychological difficulties, emotional support, and parenting practices. Factor analysis revealed three factors that differed slightly from the originally proposed FES subscales: Parental Self-Efficacy/Self-Confidence, Service Activism, and Service Knowledge. Each of the factors was internally consistent, and derived factor scores were moderately interrelated, speaking to the question of convergent validity. The construct validity of these three factors was evidenced by meaningful patterns of statistically significant correlations with grandmothers’ psychological well-being, grandchild psychological difficulties, emotional support, and parenting practices. These factor scores were independent of grandmother age, health, and education. These findings suggest the newly identified FES factors to be valuable in understanding empowerment among grandmother caregivers. PMID:26452627
The Utility of the Family Empowerment Scale With Custodial Grandmothers.

PubMed

Hayslip, Bert; Smith, Gregory C; Montoro-Rodriguez, Julian; Streider, Frederick H; Merchant, William

2017-03-01

The Family Empowerment Scale (FES) was developed specifically to assess empowerment in families with emotional disorders. Its relevance to custodial grandfamilies is reflected in the difficulties in grandchildren's social, emotional, and behavioral functioning, wherein such difficulties may be explained via either reactions to changes in their family structure or in their responses to the newly formed family unit. Utilizing 27 items derived from the 34-item version of the FES, which had represented differential levels of empowerment (family, service system, community) as indexed by one's attitudes, knowledge, and behavior, we explored the factor structure, internal consistency, construct, and convergent validity of the FES with grandparent caregivers. Three-hundred forty-three ( M age = 58.45, SD = 8.22, n Caucasian = 152, n African American = 149, n Hispanic = 38) custodial grandmothers caring for grandchildren between ages 4 and 12 years completed the 27 FES items and various measures of their psychological well-being, grandchild psychological difficulties, emotional support, and parenting practices. Factor analysis revealed three factors that differed slightly from the originally proposed FES subscales: Parental Self-Efficacy/Self-Confidence, Service Activism, and Service Knowledge. Each of the factors was internally consistent, and derived factor scores were moderately interrelated, speaking to the question of convergent validity. The construct validity of these three factors was evidenced by meaningful patterns of statistically significant correlations with grandmothers' psychological well-being, grandchild psychological difficulties, emotional support, and parenting practices. These factor scores were independent of grandmother age, health, and education. These findings suggest the newly identified FES factors to be valuable in understanding empowerment among grandmother caregivers.
Is the Factor Observed in Investigations on the Item-Position Effect Actually the Difficulty Factor?

ERIC Educational Resources Information Center

Schweizer, Karl; Troche, Stefan

2018-01-01

In confirmatory factor analysis quite similar models of measurement serve the detection of the difficulty factor and the factor due to the item-position effect. The item-position effect refers to the increasing dependency among the responses to successively presented items of a test whereas the difficulty factor is ascribed to the wide range of…
Comparison of Alternate and Original Items on the Montreal Cognitive Assessment

PubMed Central

Lebedeva, Elena; Huang, Mei; Koski, Lisa

2016-01-01

Background: The Montreal Cognitive Assessment (MoCA) is a screening tool for mild cognitive impairment (MCI) in elderly individuals. We hypothesized that measurement error when using the new alternate MoCA versions to monitor change over time could be related to the use of items that are not of comparable difficulty to their corresponding originals of similar content. The objective of this study was to compare the difficulty of the alternate MoCA items to the original ones. Methods: Five selected items from alternate versions of the MoCA were included with items from the original MoCA administered adaptively to geriatric outpatients (N = 78). Rasch analysis was used to estimate the difficulty level of the items. Results: None of the five items from the alternate versions matched the difficulty level of their corresponding original items. Conclusions: This study demonstrates the potential benefits of a Rasch analysis-based approach for selecting items during the process of development of parallel forms. The results suggest that better match of the items from different MoCA forms by their difficulty would result in higher sensitivity to changes in cognitive function over time. PMID:27076861
The Assessment of Physiotherapy Practice (APP) is a valid measure of professional competence of physiotherapy students: a cross-sectional study with Rasch analysis.

PubMed

Dalton, Megan; Davidson, Megan; Keating, Jenny

2011-01-01

Is the Assessment of Physiotherapy Practice (APP) a valid instrument for the assessment of entry-level competence in physiotherapy students? Cross-sectional study with Rasch analysis of initial (n=326) and validation samples (n=318). Students were assessed on completion of 4, 5, or 6-week clinical placements across one university semester. 298 clinical educators and 456 physiotherapy students at nine universities in Australia and New Zealand provided 644 completed APP instruments. APP data in both samples showed overall fit to a Rasch model of expected item functioning for interval scale measurement. Item 6 (Written communication) exhibited misfit in both samples, but was retained as an important element of competence. The hierarchy of item difficulty was the same in both samples with items related to professional behaviour and communication the easiest to achieve and items related to clinical reasoning the most difficult. Item difficulty was well targeted to person ability. No Differential Item Functioning was identified, indicating that the scale performed in a comparable way regardless of the student's age, gender or amount of prior clinical experience, and the educator's age, gender, or experience as an educator, or the type of facility, university, or clinical area. The instrument demonstrated unidimensionality confirming the appropriateness of summing the scale scores on each item to provide an overall score of clinical competence and was able to discriminate four levels of professional competence (Person Separation Index=0.96). Person ability and raw APP scores had a linear relationship (r²=0.99). Rasch analysis supports the interpretation that a student's APP score is an indication of their underlying level of professional competence in workplace practice. Copyright © 2011 Australian Physiotherapy Association. Published by .. All rights reserved.
Examining Differential Item Functions of Different Item Ordered Test Forms According to Item Difficulty Levels

ERIC Educational Resources Information Center

Çokluk, Ömay; Gül, Emrah; Dogan-Gül, Çilem

2016-01-01

The study aims to examine whether differential item function is displayed in three different test forms that have item orders of random and sequential versions (easy-to-hard and hard-to-easy), based on Classical Test Theory (CTT) and Item Response Theory (IRT) methods and bearing item difficulty levels in mind. In the correlational research, the…
Using Differential Item Functioning Procedures to Explore Sources of Item Difficulty and Group Performance Characteristics.

ERIC Educational Resources Information Center

Scheuneman, Janice Dowd; Gerritz, Kalle

1990-01-01

Differential item functioning (DIF) methodology for revealing sources of item difficulty and performance characteristics of different groups was explored. A total of 150 Scholastic Aptitude Test items and 132 Graduate Record Examination general test items were analyzed. DIF was evaluated for males and females and Blacks and Whites. (SLD)

Item Structural Properties as Predictors of Item Difficulty and Item Association.

ERIC Educational Resources Information Center

Solano-Flores, Guillermo

1993-01-01

Studied the ability of logical test design (LTD) to predict student performance in reading Roman numerals for 211 sixth graders in Mexico City tested on Roman numeral items varying on LTD-related and non-LTD-related variables. The LTD-related variable item iterativity was found to be the best predictor of item difficulty. (SLD)
Predicting Item Difficulty in a Reading Comprehension Test with an Artificial Neural Network.

ERIC Educational Resources Information Center

Perkins, Kyle; And Others

1995-01-01

This article reports the results of using a three-layer back propagation artificial neural network to predict item difficulty in a reading comprehension test. Three classes of variables were examined: text structure, propositional analysis, and cognitive demand. Results demonstrate that the networks can consistently predict item difficulty. (JL)
Multiple choice questions can be designed or revised to challenge learners' critical thinking.

PubMed

Tractenberg, Rochelle E; Gushta, Matthew M; Mulroney, Susan E; Weissinger, Peggy A

2013-12-01

Multiple choice (MC) questions from a graduate physiology course were evaluated by cognitive-psychology (but not physiology) experts, and analyzed statistically, in order to test the independence of content expertise and cognitive complexity ratings of MC items. Integration of higher order thinking into MC exams is important, but widely known to be challenging, perhaps especially when content experts must think like novices. Expertise in the domain (content) may actually impede the creation of higher-complexity items. Three cognitive psychology experts independently rated cognitive complexity for 252 multiple-choice physiology items using a six-level cognitive complexity matrix that was synthesized from the literature. Rasch modeling estimated item difficulties. The complexity ratings and difficulty estimates were then analyzed together to determine the relative contributions (and independence) of complexity and difficulty to the likelihood of correct answers on each item. Cognitive complexity was found to be statistically independent of difficulty estimates for 88% of items. Using the complexity matrix, modifications were identified to increase some item complexities by one level, without affecting the item's difficulty. Cognitive complexity can effectively be rated by non-content experts. The six-level complexity matrix, if applied by faculty peer groups trained in cognitive complexity and without domain-specific expertise, could lead to improvements in the complexity targeted with item writing and revision. Targeting higher order thinking with MC questions can be achieved without changing item difficulties or other test characteristics, but this may be less likely if the content expert is left to assess items within their domain of expertise.
Measuring impairments of functioning and health in patients with axial spondyloarthritis by using the ASAS Health Index and the Environmental Item Set: translation and cross-cultural adaptation into 15 languages.

PubMed

Kiltz, U; van der Heijde, D; Boonen, A; Bautista-Molano, W; Burgos-Vargas, R; Chiowchanwisawakit, P; Duruoz, T; El-Zorkany, B; Essers, I; Gaydukova, I; Géher, P; Gossec, L; Grazio, S; Gu, J; Khan, M A; Kim, T J; Maksymowych, W P; Marzo-Ortega, H; Navarro-Compán, V; Olivieri, I; Patrikos, D; Pimentel-Santos, F M; Schirmer, M; van den Bosch, F; Weber, U; Zochling, J; Braun, J

2016-01-01

The Assessments of SpondyloArthritis international society Health Index (ASAS HI) measures functioning and health in patients with spondyloarthritis (SpA) across 17 aspects of health and 9 environmental factors (EF). The objective was to translate and adapt the original English version of the ASAS HI, including the EF Item Set, cross-culturally into 15 languages. Translation and cross-cultural adaptation has been carried out following the forward-backward procedure. In the cognitive debriefing, 10 patients/country across a broad spectrum of sociodemographic background, were included. The ASAS HI and the EF Item Set were translated into Arabic, Chinese, Croatian, Dutch, French, German, Greek, Hungarian, Italian, Korean, Portuguese, Russian, Spanish, Thai and Turkish. Some difficulties were experienced with translation of the contextual factors indicating that these concepts may be more culturally-dependent. A total of 215 patients with axial SpA across 23 countries (62.3% men, mean (SD) age 42.4 (13.9) years) participated in the field test. Cognitive debriefing showed that items of the ASAS HI and EF Item Set are clear, relevant and comprehensive. All versions were accepted with minor modifications with respect to item wording and response option. The wording of three items had to be adapted to improve clarity. As a result of cognitive debriefing, a new response option 'not applicable' was added to two items of the ASAS HI to improve appropriateness. This study showed that the items of the ASAS HI including the EFs were readily adaptable throughout all countries, indicating that the concepts covered were comprehensive, clear and meaningful in different cultures.
Measuring impairments of functioning and health in patients with axial spondyloarthritis by using the ASAS Health Index and the Environmental Item Set: translation and cross-cultural adaptation into 15 languages

PubMed Central

Kiltz, U; van der Heijde, D; Boonen, A; Bautista-Molano, W; Burgos-Vargas, R; Chiowchanwisawakit, P; Duruoz, T; El-Zorkany, B; Essers, I; Gaydukova, I; Géher, P; Gossec, L; Grazio, S; Gu, J; Khan, M A; Kim, T J; Maksymowych, W P; Marzo-Ortega, H; Navarro-Compán, V; Olivieri, I; Patrikos, D; Pimentel-Santos, F M; Schirmer, M; van den Bosch, F; Weber, U; Zochling, J; Braun, J

2016-01-01

Introduction: The Assessments of SpondyloArthritis international society Health Index (ASAS HI) measures functioning and health in patients with spondyloarthritis (SpA) across 17 aspects of health and 9 environmental factors (EF). The objective was to translate and adapt the original English version of the ASAS HI, including the EF Item Set, cross-culturally into 15 languages. Methods: Translation and cross-cultural adaptation has been carried out following the forward–backward procedure. In the cognitive debriefing, 10 patients/country across a broad spectrum of sociodemographic background, were included. Results: The ASAS HI and the EF Item Set were translated into Arabic, Chinese, Croatian, Dutch, French, German, Greek, Hungarian, Italian, Korean, Portuguese, Russian, Spanish, Thai and Turkish. Some difficulties were experienced with translation of the contextual factors indicating that these concepts may be more culturally-dependent. A total of 215 patients with axial SpA across 23 countries (62.3% men, mean (SD) age 42.4 (13.9) years) participated in the field test. Cognitive debriefing showed that items of the ASAS HI and EF Item Set are clear, relevant and comprehensive. All versions were accepted with minor modifications with respect to item wording and response option. The wording of three items had to be adapted to improve clarity. As a result of cognitive debriefing, a new response option ‘not applicable’ was added to two items of the ASAS HI to improve appropriateness. Discussion: This study showed that the items of the ASAS HI including the EFs were readily adaptable throughout all countries, indicating that the concepts covered were comprehensive, clear and meaningful in different cultures. PMID:27752358
Development and initial evaluation of the SCI-FI/AT

PubMed Central

Jette, Alan M.; Slavin, Mary D.; Ni, Pengsheng; Kisala, Pamela A.; Tulsky, David S.; Heinemann, Allen W.; Charlifue, Susie; Tate, Denise G.; Fyffe, Denise; Morse, Leslie; Marino, Ralph; Smith, Ian; Williams, Steve

2015-01-01

Objectives: To describe the domain structure and calibration of the Spinal Cord Injury Functional Index for samples using Assistive Technology (SCI-FI/AT) and report the initial psychometric properties of each domain. Design: Cross sectional survey followed by computerized adaptive test (CAT) simulations. Setting: Inpatient and community settings. Participants: A sample of 460 adults with traumatic spinal cord injury (SCI) stratified by level of injury, completeness of injury, and time since injury. Interventions: None. Main outcome measure: SCI-FI/AT. Results: Confirmatory factor analysis (CFA) and Item response theory (IRT) analyses identified 4 unidimensional SCI-FI/AT domains: Basic Mobility (41 items), Self-care (71 items), Fine Motor Function (35 items), and Ambulation (29 items). High correlations of full item banks with 10-item simulated CATs indicated high accuracy of each CAT in estimating a person's function, and there was high measurement reliability for the simulated CAT scales compared with the full item bank. SCI-FI/AT item difficulties in the domains of Self-care, Fine Motor Function, and Ambulation were less difficult than the same items in the original SCI-FI item banks. Conclusion: With the development of the SCI-FI/AT, clinicians and investigators have available multidimensional assessment scales that evaluate function for users of AT to complement the scales available in the original SCI-FI. PMID:26010975
Development and initial evaluation of the SCI-FI/AT.

PubMed

Jette, Alan M; Slavin, Mary D; Ni, Pengsheng; Kisala, Pamela A; Tulsky, David S; Heinemann, Allen W; Charlifue, Susie; Tate, Denise G; Fyffe, Denise; Morse, Leslie; Marino, Ralph; Smith, Ian; Williams, Steve

2015-05-01

Objectives: To describe the domain structure and calibration of the Spinal Cord Injury Functional Index for samples using Assistive Technology (SCI-FI/AT) and report the initial psychometric properties of each domain. Design: Cross sectional survey followed by computerized adaptive test (CAT) simulations. Setting: Inpatient and community settings. Participants: A sample of 460 adults with traumatic spinal cord injury (SCI) stratified by level of injury, completeness of injury, and time since injury. Interventions: None. Main outcome measure: SCI-FI/AT. Results: Confirmatory factor analysis (CFA) and Item response theory (IRT) analyses identified 4 unidimensional SCI-FI/AT domains: Basic Mobility (41 items), Self-care (71 items), Fine Motor Function (35 items), and Ambulation (29 items). High correlations of full item banks with 10-item simulated CATs indicated high accuracy of each CAT in estimating a person's function, and there was high measurement reliability for the simulated CAT scales compared with the full item bank. SCI-FI/AT item difficulties in the domains of Self-care, Fine Motor Function, and Ambulation were less difficult than the same items in the original SCI-FI item banks. Conclusion: With the development of the SCI-FI/AT, clinicians and investigators have available multidimensional assessment scales that evaluate function for users of AT to complement the scales available in the original SCI-FI.
Classical test theory and Rasch analysis validation of the Upper Limb Functional Index in subjects with upper limb musculoskeletal disorders.

PubMed

Bravini, Elisabetta; Franchignoni, Franco; Giordano, Andrea; Sartorio, Francesco; Ferriero, Giorgio; Vercelli, Stefano; Foti, Calogero

2015-01-01

To perform a comprehensive analysis of the psychometric properties and dimensionality of the Upper Limb Functional Index (ULFI) using both classical test theory and Rasch analysis (RA). Prospective, single-group observational design. Freestanding rehabilitation center. Convenience sample of Italian-speaking subjects with upper limb musculoskeletal disorders (N=174). Not applicable. The Italian version of the ULFI. Data were analyzed using parallel analysis, exploratory factor analysis, and RA for evaluating dimensionality, functioning of rating scale categories, item fit, hierarchy of item difficulties, and reliability indices. Parallel analysis revealed 2 factors explaining 32.5% and 10.7% of the response variance. RA confirmed the failure of the unidimensionality assumption, and 6 items out of the 25 misfitted the Rasch model. When the analysis was rerun excluding the misfitting items, the scale showed acceptable fit values, loading meaningfully to a single factor. Item separation reliability and person separation reliability were .98 and .89, respectively. Cronbach alpha was .92. RA revealed weakness of the scale concerning dimensionality and internal construct validity. However, a set of 19 ULFI items defined through the statistical process demonstrated a unidimensional structure, good psychometric properties, and clinical meaningfulness. These findings represent a useful starting point for further analyses of the tool (based on modern psychometric approaches and confirmatory factor analysis) in larger samples, including different patient populations and nationalities. Copyright © 2015 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
The Effects of Judgment-Based Stratum Classifications on the Efficiency of Stratum Scored CATs.

ERIC Educational Resources Information Center

Finney, Sara J.; Smith, Russell W.; Wise, Steven L.

Two operational item pools were used to investigate the performance of stratum computerized adaptive tests (CATs) when items were assigned to strata based on empirical estimates of item difficulty or human judgments of item difficulty. Items from the first data set consisted of 54 5-option multiple choice items from a form of the ACT mathematics…
Repeated retrieval practice and item difficulty: does criterion learning eliminate item difficulty effects?

PubMed

Vaughn, Kalif E; Rawson, Katherine A; Pyc, Mary A

2013-12-01

A wealth of previous research has established that retrieval practice promotes memory, particularly when retrieval is successful. Although successful retrieval promotes memory, it remains unclear whether successful retrieval promotes memory equally well for items of varying difficulty. Will easy items still outperform difficult items on a final test if all items have been correctly recalled equal numbers of times during practice? In two experiments, normatively difficult and easy Lithuanian-English word pairs were learned via test-restudy practice until each item had been correctly recalled a preassigned number of times (from 1 to 11 correct recalls). Despite equating the numbers of successful recalls during practice, performance on a delayed final cued-recall test was lower for difficult than for easy items. Experiment 2 was designed to diagnose whether the disadvantage for difficult items was due to deficits in cue memory, target memory, and/or associative memory. The results revealed a disadvantage for the difficult versus the easy items only on the associative recognition test, with no differences on cue recognition, and even an advantage on target recognition. Although successful retrieval enhanced memory for both difficult and easy items, equating retrieval success during practice did not eliminate normative item difficulty differences.
The Effect of the Position of an Item within a Test on the Item Difficulty Value.

ERIC Educational Resources Information Center

Rubin, Lois S.; Mott, David E. W.

An investigation of the effect on the difficulty value of an item due to position placement within a test was made. Using a 60-item operational test comprised of 5 subtests, 60 items were placed as experimental items on a number of spiralled test forms in three different positions (first, middle, last) within the subtest composed of like items.…
Internet-based survey of factors associated with subjective feeling of insomnia, depression, and low health-related quality of life among Japanese adults with sleep difficulty.

PubMed

Aritake, Sayaka; Asaoka, Shoichi; Kagimura, Tatsuo; Shimura, Akiyoshi; Futenma, Kunihiro; Komada, Yoko; Inoue, Yuichi

2015-04-01

This study was conducted to determine what symptom components or conditions of insomnia are related to subjective feelings of insomnia, low health-related quality of life (HRQOL), or depression. Data from 7,027 Japanese adults obtained using an Internet-based questionnaire survey was analyzed to examine associations between demographic variables and each sleep difficulty symptom item on the Pittsburgh Sleep Quality Index (PSQI) with the presence/absence of subjective insomnia and scores on the Short Form-8 (SF-8) and Center for Epidemiologic Studies Depression Scale (CES-D). Prevalence of subjective insomnia was 12.2% (n = 860). Discriminant function analysis revealed that item scores for sleep quality, sleep latency, and sleep medication use on the PSQI and CES-D showed relatively high discriminant function coefficients for identifying positivity for the subjective feeling of insomnia. Among respondents with subjective insomnia, a low SF-8 physical component summary score was associated with higher age, depressive state, and PSQI items for sleep difficulty and daytime dysfunction, whereas a low SF-8 mental component summary score was associated with depressive state, PSQI sleep latency, sleeping medication use, and daytime dysfunction. Depressive state was significantly associated with sleep latency, sleeping medication use, and daytime dysfunction. Among insomnia symptom components, disturbed sleep quality and sleep onset insomnia may be specifically associated with subjective feelings of the disorder. The existence of a depressive state could be significantly associated with not only subjective insomnia but also mental and physical QOL. Our results also suggest that different components of sleep difficulty, as measured by the PSQI, might be associated with mental and physical QOL and depressive status.
Item Response Theory Modeling of the Philadelphia Naming Test.

PubMed

Fergadiotis, Gerasimos; Kellough, Stacey; Hula, William D

2015-06-01

In this study, we investigated the fit of the Philadelphia Naming Test (PNT; Roach, Schwartz, Martin, Grewal, & Brecher, 1996) to an item-response-theory measurement model, estimated the precision of the resulting scores and item parameters, and provided a theoretical rationale for the interpretation of PNT overall scores by relating explanatory variables to item difficulty. This article describes the statistical model underlying the computer adaptive PNT presented in a companion article (Hula, Kellough, & Fergadiotis, 2015). Using archival data, we evaluated the fit of the PNT to 1- and 2-parameter logistic models and examined the precision of the resulting parameter estimates. We regressed the item difficulty estimates on three predictor variables: word length, age of acquisition, and contextual diversity. The 2-parameter logistic model demonstrated marginally better fit, but the fit of the 1-parameter logistic model was adequate. Precision was excellent for both person ability and item difficulty estimates. Word length, age of acquisition, and contextual diversity all independently contributed to variance in item difficulty. Item-response-theory methods can be productively used to analyze and quantify anomia severity in aphasia. Regression of item difficulty on lexical variables supported the validity of the PNT and interpretation of anomia severity scores in the context of current word-finding models.
Using the Nudge and Shove Methods to Adjust Item Difficulty Values.

PubMed

Royal, Kenneth D

2015-01-01

In any examination, it is important that a sufficient mix of items with varying degrees of difficulty be present to produce desirable psychometric properties and increase instructors' ability to make appropriate and accurate inferences about what a student knows and/or can do. The purpose of this "teaching tip" is to demonstrate how examination items can be affected by the quality of distractors, and to present a simple method for adjusting items to meet difficulty specifications.
Component Identification and Item Difficulty of Raven's Matrices Items.

ERIC Educational Resources Information Center

Green, Kathy E.; Kluever, Raymond C.

Item components that might contribute to the difficulty of items on the Raven Colored Progressive Matrices (CPM) and the Standard Progressive Matrices (SPM) were studied. Subjects providing responses to CPM items were 269 children aged 2 years 9 months to 11 years 8 months, most of whom were referred for testing as potentially gifted. A second…
Rasch Measurement and Item Banking: Theory and Practice.

ERIC Educational Resources Information Center

Nakamura, Yuji

The Rasch Model is a one-parameter item response theory model which states that the probability of a correct response on a test is a function of the difficulty of the item and the ability of the candidate. Item banking is useful for language testing. The Rasch Model provides estimates of item difficulties that are meaningful,…
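The relationship described in this record can be illustrated with a minimal sketch of the standard dichotomous Rasch formula, in which the probability of a correct response depends only on the difference between candidate ability and item difficulty (both in logits). The function name and the example values are hypothetical and are not taken from the record.

```python
import math


def rasch_probability(ability: float, difficulty: float) -> float:
    """Probability of a correct response under the dichotomous Rasch model.

    Both arguments are expressed in logits; only their difference matters.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


# Hypothetical illustration: one candidate (ability = 1.0 logit) facing
# an easy item and a hard item.
p_easy = rasch_probability(1.0, -1.0)  # item well below the candidate's ability
p_hard = rasch_probability(1.0, 2.0)   # item above the candidate's ability
```

When ability equals difficulty the function returns 0.5, which is why Rasch item difficulties can be read directly on the same scale as person abilities.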
Differential Item Functioning in Primary Healthcare Evaluation Instruments by French/English Version, Educational Level and Urban/Rural Location

PubMed Central

Haggerty, Jeannie L.; Bouharaoui, Fatima; Santor, Darcy A.

2011-01-01

Evaluating the extent to which groups or subgroups of individuals differ with respect to primary healthcare experience depends on first ruling out the possibility of bias. Objective: To determine whether item or subscale performance differs systematically between French/English, high/low education subgroups and urban/rural residency. Method: A sample of 645 adult users balanced by French/English language (in Quebec and Nova Scotia, respectively), high/low education and urban/rural residency responded to six validated instruments: the Primary Care Assessment Survey (PCAS); the Primary Care Assessment Tool – Short Form (PCAT-S); the Components of Primary Care Index (CPCI); the first version of the EUROPEP (EUROPEP-I); the Interpersonal Processes of Care Survey, version II (IPC-II); and part of the Veterans Affairs National Outpatient Customer Satisfaction Survey (VANOCSS). We normalized subscale scores to a 0-to-10 scale and tested for between-group differences using ANOVA tests. We used a parametric item response model to test for differences between subgroups in item discriminability and item difficulty. We re-examined group differences after removing items with differential item functioning. Results: Experience of care was assessed more positively in the English-speaking (Nova Scotia) than in the French-speaking (Quebec) respondents. We found differential English/French item functioning in 48% of the 153 items: discriminability in 20% and differential difficulty in 28%. English items were more discriminating generally than the French. Removing problematic items did not change the differences in French/English assessments. Differential item functioning by high/low education status affected 27% of items, with items being generally more discriminating in high-education groups. Between-group comparisons were unchanged. In contrast, only 9% of items showed differential item functioning by geography, affecting principally the accessibility attribute. 
Removing problematic items reversed a previously non-significant finding, revealing poorer first-contact access in rural than in urban areas. Conclusion: Differential item functioning does not bias or invalidate French/English comparisons on subscales, but additional development is required to make French and English items equivalent. These instruments are relatively robust by educational status and geography, but results suggest potential differences in the underlying construct in low-education and rural respondents. PMID:23205035
When Listening Is Better Than Reading: Performance Gains on Cardiac Auscultation Test Questions.

PubMed

Short, Kathleen; Bucak, S Deniz; Rosenthal, Francine; Raymond, Mark R

2018-05-01

In 2007, the United States Medical Licensing Examination embedded multimedia simulations of heart sounds into multiple-choice questions. This study investigated changes in item difficulty as determined by examinee performance over time. The data reflect outcomes obtained following initial use of multimedia items from 2007 through 2012, after which an interface change occurred. A total of 233,157 examinees responded to 1,306 cardiology test items over the six-year period; 138 items included multimedia simulations of heart sounds, while 1,168 text-based items without multimedia served as controls. The authors compared changes in difficulty of multimedia items over time with changes in difficulty of text-based cardiology items over time. Further, they compared changes in item difficulty for both groups of items between graduates of Liaison Committee on Medical Education (LCME)-accredited and non-LCME-accredited (i.e., international) medical schools. Examinee performance on cardiology test items with multimedia heart sounds improved by 12.4% over the six-year period, while performance on text-based cardiology items improved by approximately 1.4%. These results were similar for graduates of LCME-accredited and non-LCME-accredited medical schools. Examinees' ability to interpret auscultation findings in test items that include multimedia presentations increased from 2007 to 2012.
Enhancing the Equating of Item Difficulty Metrics: Estimation of Reference Distribution. Research Report. ETS RR-14-07

ERIC Educational Resources Information Center

Ali, Usama S.; Walker, Michael E.

2014-01-01

Two methods are currently in use at Educational Testing Service (ETS) for equating observed item difficulty statistics. The first method involves the linear equating of item statistics in an observed sample to reference statistics on the same items. The second method, or the item response curve (IRC) method, involves the summation of conditional…
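As a rough illustration of the first method mentioned in this record, the sketch below applies a generic mean-and-standard-deviation linear transformation that places observed item statistics on the scale of reference statistics for the same items. The record is truncated, so this is an assumed textbook form of linear equating, not necessarily the procedure used at ETS; the function name and the numbers are hypothetical.

```python
from statistics import mean, pstdev


def linear_equate(observed, reference):
    """Linearly rescale observed item statistics to the reference scale.

    `observed` and `reference` hold statistics for the same items, in the
    same order. The transformation matches the mean and standard deviation
    of the observed values to those of the reference values.
    """
    slope = pstdev(reference) / pstdev(observed)
    intercept = mean(reference) - slope * mean(observed)
    return [slope * x + intercept for x in observed]


# Hypothetical difficulty statistics for three items.
observed_stats = [10.0, 12.0, 14.0]
reference_stats = [11.0, 13.0, 15.0]
equated = linear_equate(observed_stats, reference_stats)
```

The item response curve (IRC) method named in the record is not sketched here because its description is cut off in the source.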
Do Images Influence Assessment in Anatomy? Exploring the Effect of Images on Item Difficulty and Item Discrimination

ERIC Educational Resources Information Center

Vorstenbosch, Marc A. T. M.; Klaassen, Tim P. F. M.; Kooloos, Jan G. M.; Bolhuis, Sanneke M.; Laan, Roland F. J. M.

2013-01-01

Anatomists often use images in assessments and examinations. This study aims to investigate the influence of different types of images on item difficulty and item discrimination in written assessments. A total of 210 of 460 students volunteered for an extra assessment in a gross anatomy course. This assessment contained 39 test items grouped in…

Validating an Agency-based Tool for Measuring Women's Empowerment in a Complex Public Health Trial in Rural Nepal.

PubMed

Gram, Lu; Morrison, Joanna; Sharma, Neha; Shrestha, Bhim; Manandhar, Dharma; Costello, Anthony; Saville, Naomi; Skordis-Worrall, Jolene

2017-01-02

Despite the rising popularity of indicators of women's empowerment in global development programmes, little work has been done on the validity of existing measures of such a complex concept. We present a mixed methods validation of the use of the Relative Autonomy Index for measuring Amartya Sen's notion of agency freedom in rural Nepal. Analysis of think-aloud interviews (n = 7) indicated adequate respondent understanding of questionnaire items, but multiple problems of interpretation including difficulties with the four-point Likert scale, questionnaire item ambiguity and difficulties with translation. Exploratory Factor Analysis of a calibration sample (n = 511) suggested two positively correlated factors (r = 0.64) loading on internally and externally motivated behaviour. Both factors increased with decreasing education and decision-making power on large expenditures and food preparation. Confirmatory Factor Analysis on a validation sample (n = 509) revealed good fit (Root Mean Square Error of Approximation 0.05-0.08, Comparative Fit Index 0.91-0.99). In conclusion, we caution against uncritical use of agency-based quantification of women's empowerment. While qualitative and quantitative analysis revealed overall satisfactory construct and content validity, the positive correlation between external and internal motivations suggests the existence of adaptive preferences. High scores on internally motivated behaviour may reflect internalized oppression rather than agency freedom.
Validating an Agency-based Tool for Measuring Women’s Empowerment in a Complex Public Health Trial in Rural Nepal

PubMed Central

Gram, Lu; Morrison, Joanna; Sharma, Neha; Shrestha, Bhim; Manandhar, Dharma; Costello, Anthony; Saville, Naomi; Skordis-Worrall, Jolene

2017-01-01

Despite the rising popularity of indicators of women’s empowerment in global development programmes, little work has been done on the validity of existing measures of such a complex concept. We present a mixed methods validation of the use of the Relative Autonomy Index for measuring Amartya Sen’s notion of agency freedom in rural Nepal. Analysis of think-aloud interviews (n = 7) indicated adequate respondent understanding of questionnaire items, but multiple problems of interpretation including difficulties with the four-point Likert scale, questionnaire item ambiguity and difficulties with translation. Exploratory Factor Analysis of a calibration sample (n = 511) suggested two positively correlated factors (r = 0.64) loading on internally and externally motivated behaviour. Both factors increased with decreasing education and decision-making power on large expenditures and food preparation. Confirmatory Factor Analysis on a validation sample (n = 509) revealed good fit (Root Mean Square Error of Approximation 0.05–0.08, Comparative Fit Index 0.91–0.99). In conclusion, we caution against uncritical use of agency-based quantification of women’s empowerment. While qualitative and quantitative analysis revealed overall satisfactory construct and content validity, the positive correlation between external and internal motivations suggests the existence of adaptive preferences. High scores on internally motivated behaviour may reflect internalized oppression rather than agency freedom. PMID:28303173
Outcome-based self-assessment on a team-teaching subject in the medical school

PubMed Central

Cho, Sa Sun

2014-01-01

We investigated why students received lower grades in gross anatomy and how the teaching method could be improved, since gaps between teaching and learning had emerged under a recently changed integrated curriculum. General characteristics of students and exploratory factors used to test validity were compared between 2011 and 2012. Students were asked to complete a short survey with a Likert scale. The results were as follows: although the percentage of acceptable items was similar between professors, professor C preferred questions with adequate item discrimination and inappropriate item difficulty, whereas professor Y preferred adequate item discrimination and appropriate item difficulty, a statistically significant difference (P<0.01). The survey revealed that 26.5% of all students gave up on the gross anatomy exam of professor Y, irrespective of year. These results suggested that students' academic achievement was affected by the corrected item difficulty rather than by item discrimination. Therefore, professors in a team-teaching subject should reach a consensus on item difficulty with proper teaching methods. PMID:25548724
A Comparison of Three Test Formats to Assess Word Difficulty

ERIC Educational Resources Information Center

Culligan, Brent

2015-01-01

This study compared three common vocabulary test formats, the Yes/No test, the Vocabulary Knowledge Scale (VKS), and the Vocabulary Levels Test (VLT), as measures of vocabulary difficulty. Vocabulary difficulty was defined as the item difficulty estimated through Item Response Theory (IRT) analysis. Three tests were given to 165 Japanese students,…
The Relationship between Older Adults’ Risk for a Future Fall and Difficulty Performing Activities of Daily Living

PubMed Central

Mamikonian-Zarpas, Ani; Laganá, Luciana

2016-01-01

Functional status is often defined by cumulative scores across indices of independence in performing basic and instrumental activities of daily living (ADL/IADL), but little is known about the unique relationship of each daily activity item with the fall outcome. The purpose of this retrospective study was to examine the level of relative risk for a future fall associated with difficulty with performing various tasks of normal daily functioning among older adults who had fallen at least once in the past 12 months. The sample was comprised of community-dwelling individuals 70 years and older from the 1984–1990 Longitudinal Study of Aging by Kovar, Fitti, and Chyba (1992). Risk analysis was performed on individual items quantifying 6 ADLs and 7 IADLs, as well as 10 items related to mobility limitations. Within a subsample of 1,675 older adults with a history of at least one fall within the past year, the responses of individuals who reported multiple falls were compared to the responses of participants who had a single fall and reported 1) difficulty with walking and/or balance (FRAIL group, n = 413) vs. 2) no difficulty with walking or dizziness (NDW+ND group, n = 415). The items that had the strongest relationships and highest risk ratios for the FRAIL group (which had the highest probabilities for a future fall) included difficulty with: eating (73%); managing money (70%); biting or chewing food (66%); walking a quarter of a mile (65%); using fingers to grasp (65%); and dressing without help (65%). For the NDW+ND group, the most noteworthy items included difficulty with: bathing or showering (79%); managing money (77%); shopping for personal items (75%); walking up 10 steps without rest (72%); difficulty with walking a quarter of a mile (72%); and stooping/crouching/kneeling (70%). 
These findings suggest that individual items quantifying specific ADLs and IADLs have substantive relationships with the fall outcome among older adults who have difficulty with walking and balance, as well as among older individuals without dizziness or difficulty with walking. Furthermore, the examination of the relationships between items that are related to more challenging activities and the fall outcome revealed that higher functioning older adults who reported difficulty with the 6 items that yielded the highest risk ratios may also be at elevated risk for a fall. PMID:27200366
Rasch analysis of the participation scale (P-scale): usefulness of the P-scale to a rehabilitation services network.

PubMed

Souza, Mariana Angélica Peixoto; Coster, Wendy Jane; Mancini, Marisa Cotta; Dutra, Fabiana Caetano Martins Silva; Kramer, Jessica; Sampaio, Rosana Ferreira

2017-12-08

A person's participation is acknowledged as an important outcome of the rehabilitation process. The Participation Scale (P-Scale) is an instrument that was designed to assess the participation of individuals with a health condition or disability. The scale was developed in an effort to better describe the participation of people living in middle-income and low-income countries. The aim of this study was to use Rasch analysis to examine whether the Participation Scale is suitable to assess the perceived ability to take part in participation situations by patients with diverse levels of function. The sample comprised 302 patients from a public rehabilitation services network. Participants had orthopaedic or neurological health conditions, were at least 18 years old, and completed the Participation Scale. Rasch analysis was conducted using the Winsteps software. The mean age of all participants was 45.5 years (standard deviation = 14.4), 52% were male, 86% had orthopaedic conditions, and 52% had chronic symptoms. Rasch analysis was performed using a dichotomous rating scale, and only one item showed misfit. Dimensionality analysis supported the existence of only one Rasch dimension. The person separation index was 1.51, and the item separation index was 6.38. Items N2 and N14 showed Differential Item Functioning between men and women. Items N6 and N12 showed Differential Item Functioning between acute and chronic conditions. The item difficulty range was -1.78 to 2.09 logits, while the sample ability range was -2.41 to 4.61 logits. The P-Scale was found to be useful as a screening tool for participation problems reported by patients in a rehabilitation context, despite some issues that should be addressed to further improve the scale.
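Separation indices such as those reported in this record are often easier to interpret on a reliability scale. The sketch below uses the standard Rasch relationship between a separation index G and separation reliability R, namely R = G² / (1 + G²). The abstract itself reports no reliability values, so the figures computed here are implied by that relationship rather than taken from the study, and the function name is hypothetical.

```python
def separation_to_reliability(separation: float) -> float:
    """Convert a Rasch separation index G to separation reliability R = G^2 / (1 + G^2)."""
    return separation ** 2 / (1.0 + separation ** 2)


# Separation indices quoted in the abstract.
person_reliability = separation_to_reliability(1.51)  # roughly 0.70
item_reliability = separation_to_reliability(6.38)    # roughly 0.98
```

Under this relationship, the item hierarchy is estimated far more reproducibly than the ordering of persons, which is consistent with the abstract's cautious recommendation of the scale as a screening tool.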
Factors Affecting Item Difficulty in English Listening Comprehension Tests

ERIC Educational Resources Information Center

Sung, Pei-Ju; Lin, Su-Wei; Hung, Pi-Hsia

2015-01-01

Task difficulty is a critical issue affecting test developers. Controlling or balancing the item difficulty of an assessment improves its validity and discrimination. Test developers construct tests from the cognitive perspective, by making the test constructing process more scientific and efficient; thus, the scores obtained more precisely…
Comparison of university students' understanding of graphs in different contexts

NASA Astrophysics Data System (ADS)

Planinic, Maja; Ivanjek, Lana; Susac, Ana; Milin-Sipus, Zeljka

2013-12-01

This study investigates university students’ understanding of graphs in three different domains: mathematics, physics (kinematics), and contexts other than physics. Eight sets of parallel mathematics, physics, and other context questions about graphs were developed. A test consisting of these eight sets of questions (24 questions in all) was administered to 385 first year students at University of Zagreb who were either prospective physics or mathematics teachers or prospective physicists or mathematicians. Rasch analysis of data was conducted and linear measures for item difficulties were obtained. Average difficulties of items in three domains (mathematics, physics, and other contexts) and over two concepts (graph slope, area under the graph) were computed and compared. Analysis suggests that the variation of average difficulty among the three domains is much smaller for the concept of graph slope than for the concept of area under the graph. Most of the slope items are very close in difficulty, suggesting that students who have developed sufficient understanding of graph slope in mathematics are generally able to transfer it almost equally successfully to other contexts. A large difference was found between the difficulty of the concept of area under the graph in physics and other contexts on one side and mathematics on the other side. Comparison of average difficulty of the three domains suggests that mathematics without context is the easiest domain for students. Adding either physics or other context to mathematical items generally seems to increase item difficulty. No significant difference was found between the average item difficulty in physics and contexts other than physics, suggesting that physics (kinematics) remains a difficult context for most students despite the received instruction on kinematics in high school.
An Investigation of the Impact of Guessing on Coefficient α and Reliability

PubMed Central

2014-01-01

Guessing is known to influence the test reliability of multiple-choice tests. Although there are many studies that have examined the impact of guessing, they used rather restrictive assumptions (e.g., parallel test assumptions, homogeneous inter-item correlations, homogeneous item difficulty, and homogeneous guessing levels across items) to evaluate the relation between guessing and test reliability. Based on the item response theory (IRT) framework, this study investigated the extent of the impact of guessing on reliability under more realistic conditions where item difficulty, item discrimination, and guessing levels actually vary across items with three different test lengths (TL). By accommodating multiple item characteristics simultaneously, this study also focused on examining interaction effects between guessing and other variables entered in the simulation to be more realistic. The simulation of the more realistic conditions and calculations of reliability and classical test theory (CTT) item statistics were facilitated by expressing CTT item statistics, coefficient α, and reliability in terms of IRT model parameters. In addition to the general negative impact of guessing on reliability, results showed interaction effects between TL and guessing and between guessing and test difficulty.
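For readers unfamiliar with the statistic studied in this record, coefficient α can be computed from a persons-by-items score matrix with the usual formula α = k/(k−1) · (1 − Σ item variances / variance of total scores). The sketch below is a plain classical computation on hypothetical data; it does not reproduce the IRT-based expressions or the simulation design of the study.

```python
from statistics import pvariance


def coefficient_alpha(scores):
    """Coefficient alpha for a list of examinee rows, each a list of item scores."""
    k = len(scores[0])
    # Variance of each item across examinees.
    item_variances = [pvariance([row[j] for row in scores]) for j in range(k)]
    # Variance of the examinees' total scores.
    total_variance = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1.0 - sum(item_variances) / total_variance)


# Hypothetical responses: four examinees, three dichotomous items.
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
alpha = coefficient_alpha(responses)
```

Random guessing adds noise to the item scores, which raises the item variances relative to the total-score variance and so tends to lower α, in line with the general negative impact the record reports.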
Clinical vs. Self-report Versions of the Quick Inventory of Depressive Symptomatology in a Public Sector Sample

PubMed Central

Bernstein, Ira H.; Rush, A. John; Carmody, Thomas J.; Woo, Ada; Trivedi, Madhukar H.

2007-01-01

Objectives: Recent work using classical test theory (CTT) and item response theory (IRT) has found that the self-report (QIDS-SR16) and clinician-rated (QIDS-C16) versions of the 16-item Quick Inventory of Depressive Symptomatology were generally comparable in outpatients with nonpsychotic major depressive disorder (MDD). This report extends this comparison to a less well-educated, more treatment-resistant sample that included more ethnic/racial minorities using IRT and selected classical test analyses. Methods: The QIDS-SR16 and QIDS-C16 were obtained in a sample of 441 outpatients with nonpsychotic MDD seen in the public sector in the Texas Medication Algorithm Project (TMAP). The Samejima graded response IRT model was used to compare the QIDS-SR16 and QIDS-C16. Results: The nine symptom domains in the QIDS-SR16 and QIDS-C16 related well to overall depression. The slopes of the item response functions (a), which index the strength of relationship between overall depression and each symptom, were extremely similar with the two measures. Likewise, the CTT and IRT indices of symptom frequency (item means and locations of the item response functions, b_i) were also similar with these two measures. For example, sad mood and difficulty with concentration/decision making were highly related to the overall depression severity with both the QIDS-C16 and QIDS-SR16. Likewise, sleeping difficulties were commonly reported, even though they were not as strongly related to overall magnitude of depression. Conclusion: In this less educated, socially disadvantaged sample, differences between the QIDS-C16 and QIDS-SR16 were minor. The QIDS-SR16 is a satisfactory substitute for the more time-consuming QIDS-C16 in a broad range of adult, nonpsychotic, depressed outpatients. PMID:16716351
Clinical vs. self-report versions of the quick inventory of depressive symptomatology in a public sector sample.

PubMed

Bernstein, Ira H; Rush, A John; Carmody, Thomas J; Woo, Ada; Trivedi, Madhukar H

2007-01-01

Recent work using classical test theory (CTT) and item response theory (IRT) has found that the self-report (QIDS-SR(16)) and clinician-rated (QIDS-C(16)) versions of the 16-item quick inventory of depressive symptomatology were generally comparable in outpatients with nonpsychotic major depressive disorder (MDD). This report extends this comparison to a less well-educated, more treatment-resistant sample that included more ethnic/racial minorities using IRT and selected classical test analyses. The QIDS-SR(16) and QIDS-C(16) were obtained in a sample of 441 outpatients with nonpsychotic MDD seen in the public sector in the Texas Medication Algorithm Project (TMAP). The Samejima graded response IRT model was used to compare the QIDS-SR(16) and QIDS-C(16). The nine symptom domains in the QIDS-SR(16) and QIDS-C(16) related well to overall depression. The slopes of the item response functions, a, which index the strength of relationship between overall depression and each symptom, were extremely similar with the two measures. Likewise, the CTT and IRT indices of symptom frequency (item means and locations of the item response functions, b(i)) were also similar with these two measures. For example, sad mood and difficulty with concentration/decision making were highly related to the overall depression severity with both the QIDS-C(16) and QIDS-SR(16). Likewise, sleeping difficulties were commonly reported, even though they were not as strongly related to overall magnitude of depression. In this less educated, socially disadvantaged sample, differences between the QIDS-C(16) and QIDS-SR(16) were minor. The QIDS-SR(16) is a satisfactory substitute for the more time-consuming QIDS-C(16) in a broad range of adult, nonpsychotic, depressed outpatients.
Maternal Antenatal Attachment Scale (MAAS): adaptation to Spanish and proposal for a brief version of 12 items.

PubMed

Navarro-Aresti, Lucía; Iraurgi, Ioseba; Iriarte, Leire; Martínez-Pampliega, Ana

2016-02-01

The psychometric properties of the adapted Spanish version of the Maternal Antenatal Attachment Scale were examined. The main goal was to investigate the reliability and construct validity of the conceptual structure of Condon's proposal. Five hundred twenty-five pregnant women, attending maternal education classes in Bizkaia (Spain), answered the translated and back-translated version of the Maternal Antenatal Attachment Scale. This scale comprises 19 items with five answer choices divided into two subscales: quality of attachment and intensity of attachment. Participants also answered a questionnaire about the reproductive history that was developed ad hoc for the present study. The final Spanish adaptation of the Maternal Antenatal Attachment Scale comprises 12 items: seven items were removed due to their inadequate psychometric properties. Internal consistency of the inventory is moderate-high (.73) and it ranges from .68 (intensity of attachment) to .75 (quality of attachment) for the dimensions. Three alternative structural models were tested using a confirmatory factor analysis. Lastly, the two-related-factor model was chosen, as it obtained suitable fit indexes (χ² = 102.28; p < .001; goodness-of-fit index (GFI) = .92; comparative fit index (CFI) = .95; root mean square error of approximation (RMSEA) = .042, 90% CI [.030-.054]). Due to its adequate psychometric properties, the Spanish version of the Maternal Antenatal Attachment Scale can be proposed as a suitable instrument for the purpose of measuring antenatal attachment. The study of antenatal attachment helps to detect possible difficulties for the mother in establishing an affective relationship with the foetus. This may affect the foetus's growth, delivery and the future mother-child relationship.
A Study of Inference in Standardized Reading Test Items and Its Relationship to Difficulty.

ERIC Educational Resources Information Center

Marzano, Robert J.

To study the relationship between inferences made on standardized reading tests and item difficulty, 50 items on the reading comprehension section of the Metropolitan Achievement Test were analyzed independently in this study by two raters using four general categories of inferences: (1) reference inferences, (2) between proposition inferences,…
The Definition of Difficulty and Discrimination for Multidimensional Item Response Theory Models.

ERIC Educational Resources Information Center

Reckase, Mark D.; McKinley, Robert L.

A study was undertaken to develop guidelines for the interpretation of the parameters of three multidimensional item response theory models and to determine the relationship between the parameters and traditional concepts of item difficulty and discrimination. The three models considered were multidimensional extensions of the one-, two-, and…
An opportunity in difficulty: Japan-Korea-Taiwan expert Delphi consensus on surgical difficulty during laparoscopic cholecystectomy.

PubMed

Iwashita, Yukio; Hibi, Taizo; Ohyama, Tetsuji; Honda, Goro; Yoshida, Masahiro; Miura, Fumihiko; Takada, Tadahiro; Han, Ho-Seong; Hwang, Tsann-Long; Shinya, Satoshi; Suzuki, Kenji; Umezawa, Akiko; Yoon, Yoo-Seok; Choi, In-Seok; Huang, Wayne Shih-Wei; Chen, Kuo-Hsin; Watanabe, Manabu; Abe, Yuta; Misawa, Takeyuki; Nagakawa, Yuichi; Yoon, Dong-Sup; Jang, Jin-Young; Yu, Hee Chul; Ahn, Keun Soo; Kim, Song Cheol; Song, In Sang; Kim, Ji Hoon; Yun, Sung Su; Choi, Seong Ho; Jan, Yi-Yin; Shan, Yan-Shen; Ker, Chen-Guo; Chan, De-Chuan; Wu, Cheng-Chung; Lee, King-Teh; Toyota, Naoyuki; Higuchi, Ryota; Nakamura, Yoshiharu; Mizuguchi, Yoshiaki; Takeda, Yutaka; Ito, Masahiro; Norimizu, Shinji; Yamada, Shigetoshi; Matsumura, Naoki; Shindoh, Junichi; Sunagawa, Hiroki; Gocho, Takeshi; Hasegawa, Hiroshi; Rikiyama, Toshiki; Sata, Naohiro; Kano, Nobuyasu; Kitano, Seigo; Tokumura, Hiromi; Yamashita, Yuichi; Watanabe, Goro; Nakagawa, Kunitoshi; Kimura, Taizo; Yamakawa, Tatsuo; Wakabayashi, Go; Mori, Rintaro; Endo, Itaru; Miyazaki, Masaru; Yamamoto, Masakazu

2017-04-01

We previously identified 25 intraoperative findings during laparoscopic cholecystectomy (LC) as potential indicators of surgical difficulty per nominal group technique. This study aimed to build a consensus among expert LC surgeons on the impact of each item on surgical difficulty. Surgeons from Japan, Korea, and Taiwan (n = 554) participated in a Delphi process and graded the 25 items on a seven-stage scale (range, 0-6). Consensus was defined as (1) the interquartile range (IQR) of overall responses ≤2 and (2) ≥66% of the responses concentrated within a median ± 1 after stratification by workplace and LC experience level. Response rates for the first and the second-round Delphi were 92.6% and 90.3%, respectively. Final consensus was reached for all the 25 items. 'Diffuse scarring in the Calot's triangle area' in the 'Factors related to inflammation of the gallbladder' category had the strongest impact on surgical difficulty (median, 5; IQR, 1). Surgeons agreed that the surgical difficulty increases as more fibrotic change and scarring develop. The median point for each item was set as the difficulty score. A Delphi consensus was reached among expert LC surgeons on the impact of intraoperative findings on surgical difficulty. © 2017 Japanese Society of Hepato-Biliary-Pancreatic Surgery.
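The two-part consensus rule described in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `reaches_consensus`, the dictionary layout, and the choice of quartile method are hypothetical, and the abstract does not say whether the "median ± 1" share was checked within each stratum (as assumed here) or in some other way.

```python
from statistics import median, quantiles

def reaches_consensus(scores_by_stratum, max_iqr=2, min_share=0.66):
    """Check a Delphi consensus rule for one item.

    scores_by_stratum maps a stratum label (hypothetical, e.g. workplace or
    experience level) to the list of 0-6 ratings given by that stratum.
    """
    # Criterion 1: IQR of the pooled responses must not exceed max_iqr.
    pooled = [s for scores in scores_by_stratum.values() for s in scores]
    q1, _, q3 = quantiles(pooled, n=4, method="inclusive")
    if q3 - q1 > max_iqr:
        return False
    # Criterion 2 (assumed per stratum): at least min_share of the ratings
    # lie within one point of that stratum's median.
    for scores in scores_by_stratum.values():
        m = median(scores)
        share = sum(1 for s in scores if abs(s - m) <= 1) / len(scores)
        if share < min_share:
            return False
    return True

# Illustrative, made-up ratings for a single item.
ratings = {
    "academic": [5, 5, 4, 5, 6, 5, 4, 5],
    "community": [5, 4, 5, 5, 6, 5, 5, 3],
}
print(reaches_consensus(ratings))
```

Different quartile conventions can shift the IQR slightly for small panels, so the quartile method is worth fixing in advance when a threshold such as IQR ≤ 2 decides the outcome.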
Item analysis of the Spanish version of the Boston Naming Test with a Spanish speaking adult population from Colombia.

PubMed

Kim, Stella H; Strutt, Adriana M; Olabarrieta-Landa, Laiene; Lequerica, Anthony H; Rivera, Diego; De Los Reyes Aragon, Carlos Jose; Utria, Oscar; Arango-Lasprilla, Juan Carlos

2018-02-23

The Boston Naming Test (BNT) is a widely used measure of confrontation naming ability that has been criticized for its questionable construct validity for non-English speakers. This study investigated item difficulty and construct validity of the Spanish version of the BNT to assess cultural and linguistic impact on performance. Subjects were 1298 healthy Spanish speaking adults from Colombia. They were administered the 60- and 15-item Spanish version of the BNT. A Rasch analysis was computed to assess dimensionality, item hierarchy, targeting, reliability, and item fit. Both versions of the BNT satisfied requirements for unidimensionality. Although internal consistency was excellent for the 60-item BNT, order of difficulty did not increase consistently with item number and there were a number of items that did not fit the Rasch model. For the 15-item BNT, a total of 5 items changed position on the item hierarchy with 7 poor fitting items. Internal consistency was acceptable. Construct validity of the BNT remains a concern when it is administered to non-English speaking populations. Similar to previous findings, the order of item presentation did not correspond with increasing item difficulty, and both versions were inadequate at assessing high naming ability.
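For readers unfamiliar with the Rasch model mentioned above, its dichotomous form gives the probability of a correct response as a logistic function of the difference between person ability and item difficulty, both in logits. The sketch below uses hypothetical names and made-up difficulty values; it is not the analysis run in the study.

```python
import math

def rasch_probability(ability, difficulty):
    """Probability of a correct response under the dichotomous Rasch model.

    Both arguments are in logits; the probability is 0.5 when they are equal.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

# Hypothetical item difficulties (logits), ordered from easy to hard.
item_difficulties = [-2.0, -1.0, 0.0, 1.0]

# A person whose ability lies well above every item is expected to pass
# nearly all of them, so such a test says little about high ability.
for b in item_difficulties:
    print(b, round(rasch_probability(3.0, b), 3))
```

This is one way to read the targeting finding: when most item difficulties sit far below the abilities of the strongest examinees, their response probabilities cluster near 1 and the items cannot separate them.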
The Standardization of the Clock Drawing Test (CDT) for People with Stroke Using Rasch Analysis

PubMed Central

Yoo, Doo Han; Hong, Deok Gi; Lee, Jae Shin

2014-01-01

[Purpose] The aim of this study was to standardize the clock drawing test (CDT) for people with stroke using Rasch analysis. [Subjects and Methods] Seventeen items of the CDT identified through a literature review were performed by 159 stroke patients. The data were analyzed with Winsteps version 3.57 using the Rasch model to examine the unidimensionality of the items' fit, the distribution of the items' difficulty, and the reliability and appropriateness of the rating scale. [Result] Ten of the 159 participants (6.2%) were considered misfit subjects, and one item of the CDT was determined to be a misfit item based on Rasch analysis. The rating scales were judged as suitable because the observed average showed an array of vertical orders and MNSQ values < 2. The separation index and reliability of the subjects (1.98, 0.80) and items (6.45, 0.97) showed relatively high values. [Conclusion] This study is the first to examine the CDT scale in stroke patients by Rasch analysis. The CDT is expected to be useful for screening stroke patients with cognitive problems. PMID:24409026
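The paired separation and reliability values reported above are linked by a standard Rasch relationship, R = G² / (1 + G²), where G is the separation index. The short sketch below (function name hypothetical) shows that the reported pairs agree with this formula to within rounding.

```python
def reliability_from_separation(separation):
    """Convert a Rasch separation index G into reliability R = G^2 / (1 + G^2)."""
    g_squared = separation ** 2
    return g_squared / (1.0 + g_squared)

# Separation values taken from the abstract above.
for label, g in [("subjects", 1.98), ("items", 6.45)]:
    print(label, reliability_from_separation(g))
```

Because reliability is bounded at 1 while separation is not, separation is often the more informative figure when reliability is already high.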
What Does a Verbal Test Measure? A New Approach to Understanding Sources of Item Difficulty.

ERIC Educational Resources Information Center

Berk, Eric J. Vanden; Lohman, David F.; Cassata, Jennifer Coyne

Assessing the construct relevance of mental test results continues to present many challenges, and it has proven to be particularly difficult to assess the construct relevance of verbal items. This study was conducted to gain a better understanding of the conceptual sources of verbal item difficulty using a unique approach that integrates…
On Maximizing Item Information and Matching Difficulty with Ability.

ERIC Educational Resources Information Center

Bickel, Peter; Buyske, Steven; Chang, Huahua; Ying, Zhiliang

2001-01-01

Examined the assumption that matching difficulty levels of test items with an examinee's ability makes a test more efficient and challenged this assumption through a class of one-parameter item response theory models. Found the validity of the fundamental assumption to be closely related to the van Zwet tail ordering of symmetric distributions (W.…
Detecting a Gender-Related Differential Item Functioning Using Transformed Item Difficulty

ERIC Educational Resources Information Center

Abedalaziz, Nabeel; Leng, Chin Hai; Alahmadi, Ahlam

2014-01-01

The purpose of the study was to examine gender differences in performance on a multiple-choice mathematical ability test administered within the context of a high school graduation test that was designed to match the eleventh grade curriculum. The transformed item difficulty (TID) was used to detect gender-related DIF. A random sample of 1400 eleventh…

International Semiotics: Item Difficulty and the Complexity of Science Item Illustrations in the PISA-2009 International Test Comparison

ERIC Educational Resources Information Center

Solano-Flores, Guillermo; Wang, Chao; Shade, Chelsey

2016-01-01

We examined multimodality (the representation of information in multiple semiotic modes) in the context of international test comparisons. Using Program of International Student Assessment (PISA)-2009 data, we examined the correlation of the difficulty of science items and the complexity of their illustrations. We observed statistically…
An Investigation of Gender Differences in the Components Influencing the Difficulty of Spatial Ability Items.

ERIC Educational Resources Information Center

Kramer, Gene A.; Smith, Richard M.

2001-01-01

Examined the role that gender differences play in the determination of the components influencing the difficulty of spatial ability items. Results for 2,245 examinees taking a spatial ability test that is part of the Dental School Admission Battery show that component difficulties show little variation across gender. (SLD)
Development of multiple choice pictorial test for measuring the dimensions of knowledge

NASA Astrophysics Data System (ADS)

Nahadi; Siswaningsih, Wiwi; Erna

2017-05-01

This study aims to develop a multiple choice pictorial test as a tool to measure the dimensions of knowledge in the subject of chemical equilibrium. The method used is Research and Development, with validation conducted during the preliminary studies and model development. The product is a multiple choice pictorial test. The test comprises 22 items and was administered to 64 high school students in grade XII. The quality of the test was determined by its validity, reliability, difficulty index, discrimination power, and distractor effectiveness. Validity was determined by calculating the content validity ratio (CVR) with 8 validators (4 university teachers and 4 high school teachers), giving an average CVR of 0.89. The reliability of the test falls in the very high category, with a value of 0.87. For discrimination power, 32% of items fall in the very good category, 59% in the good category, and 20% in the sufficient category. The test has varying levels of difficulty: 23% of items are in the difficult category, 50% in the medium category, and 27% in the easy category. For distractor effectiveness, 1% of items are in the very poor category, 1% in the poor category, 4% in the medium category, 39% in the good category, and 55% in the very good category. The dimensions of knowledge measured consist of factual knowledge, conceptual knowledge, and procedural knowledge. Based on the questionnaire, students responded quite well to the developed test, and most students preferred this kind of multiple choice pictorial test, which includes pictures, as an evaluation tool over narrative tests dominated by text.
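Two of the quantities named above have simple closed forms. The sketch below assumes the classical difficulty index (proportion of correct answers) and Lawshe's formula for the CVR, CVR = (n_e - N/2) / (N/2); the abstract does not state which CVR formula was used, so treat that choice, along with the function names, as an assumption.

```python
def difficulty_index(item_scores):
    """Proportion of examinees who answered the item correctly.

    item_scores is a list of 1 (correct) and 0 (incorrect); higher values
    mean an easier item.
    """
    return sum(item_scores) / len(item_scores)

def content_validity_ratio(n_essential, n_panelists):
    """Lawshe's CVR (assumed formula): (n_e - N/2) / (N/2)."""
    half = n_panelists / 2
    return (n_essential - half) / half

# Made-up responses of eight examinees to one item.
print(difficulty_index([1, 0, 1, 1, 0, 1, 1, 1]))

# With 8 validators, as in the study: 7 of 8 rating an item essential.
print(content_validity_ratio(7, 8))
```

With a panel of 8, each item's CVR can only take values in steps of 0.25, so an average of 0.89 across items would correspond to most items being endorsed by 7 or 8 validators under this formula.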
Item validity vs. item discrimination index: a redundancy?

NASA Astrophysics Data System (ADS)

Panjaitan, R. L.; Irawati, R.; Sujana, A.; Hanifah, N.; Djuanda, D.

2018-03-01

In the literature on evaluation and test analysis, it is common to find calculations of both item validity and the item discrimination index (D), each with a different formula. Other sources, however, state that the item discrimination index can be obtained by calculating the correlation between examinees' scores on a particular item and their scores on the overall test, which is the same concept as item validity. Some research reports, especially undergraduate theses, tend to include both item validity and the item discrimination index in the instrument analysis. These concepts may overlap, since both reflect the quality of the test in measuring examinees' ability. In this paper, examples of data-processing results for item validity and the item discrimination index are compared. We discuss whether one of the two can stand for both, or whether it is better to present both calculations in a simple test analysis, especially in undergraduate theses that include test analyses.
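To make the comparison concrete, the sketch below computes both quantities for one dichotomous item. It assumes the common upper-lower group definition of D (here with 27% groups) and the point-biserial item-total correlation as the "item validity" coefficient; the paper may use other variants, and the function names and data are illustrative only.

```python
from statistics import mean, pstdev

def discrimination_index(item, totals, fraction=0.27):
    """D = proportion correct in the top group minus that in the bottom group.

    Groups are the top and bottom `fraction` of examinees by total score.
    """
    n = len(totals)
    k = max(1, round(n * fraction))
    order = sorted(range(n), key=lambda i: totals[i])
    lower, upper = order[:k], order[-k:]
    return mean(item[i] for i in upper) - mean(item[i] for i in lower)

def point_biserial(item, totals):
    """Point-biserial correlation between a 0/1 item and the total score.

    Assumes the item has both correct and incorrect responses and that the
    total scores are not all equal.
    """
    p = mean(item)
    mean_correct = mean(t for x, t in zip(item, totals) if x == 1)
    mean_wrong = mean(t for x, t in zip(item, totals) if x == 0)
    return (mean_correct - mean_wrong) / pstdev(totals) * (p * (1 - p)) ** 0.5

# Made-up data: ten examinees, one item, and their total test scores.
item = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]
totals = [2, 3, 4, 5, 5, 6, 7, 8, 9, 10]
print(discrimination_index(item, totals), point_biserial(item, totals))
```

The two statistics usually rank items similarly, but D uses only the extreme groups while the correlation uses every examinee, which is one reason reports sometimes present both.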
Psychometric properties of the Global Operative Assessment of Laparoscopic Skills (GOALS) using item response theory.

PubMed

Watanabe, Yusuke; Madani, Amin; Ito, Yoichi M; Bilgic, Elif; McKendy, Katherine M; Feldman, Liane S; Fried, Gerald M; Vassiliou, Melina C

2017-02-01

The extent to which each item assessed using the Global Operative Assessment of Laparoscopic Skills (GOALS) contributes to the total score remains unknown. The purpose of this study was to evaluate the level of difficulty and discriminative ability of each of the 5 GOALS items using item response theory (IRT). A total of 396 GOALS assessments for a variety of laparoscopic procedures over a 12-year time period were included. Threshold parameters of item difficulty and discrimination power were estimated for each item using IRT. The higher slope parameters seen with "bimanual dexterity" and "efficiency" are indicative of greater discriminative ability than "depth perception", "tissue handling", and "autonomy". IRT psychometric analysis indicates that the 5 GOALS items do not demonstrate uniform difficulty and discriminative power, suggesting that they should not be scored equally. "Bimanual dexterity" and "efficiency" seem to have stronger discrimination. Weighted scores based on these findings could improve the accuracy of assessing individual laparoscopic skills. Copyright © 2016 Elsevier Inc. All rights reserved.
Psychometric evaluation of a new instrument in Spanish to measure the wellness of university nursing faculty.

PubMed

Hurtado-Pardos, Barbara; Casas, Irma; Lluch-Canut, Teresa; Moreno-Arroyo, Carmen; Nebot-Bergua, Carlos; Roldán-Merino, Juan

2018-01-02

The aim of this study was to design and validate an instrument to measure wellness among university nursing faculty. The study was performed in two phases. Phase I consisted of the development of the instrument with discussion groups and participant consensus. We designed an instrument including the 21 items or psychosocial risk factors identified and estimated an index by evaluating the frequency and intensity of each item. The items were grouped into 3 dimensions: teaching work demands, curricular demands, and organizational difficulties. In Phase II, we evaluated the psychometric properties of the tool in a sample of 263 participants. Exploratory factor analysis showed a 3-factor structure that explained 53% of the total variance. The internal consistency of the instrument was 0.91 for the whole instrument. The results indicate that the tool developed is valid and reliable and may be a good instrument to monitor the wellness of university nursing faculty.
Why Are the Mathematics National Examination Items Difficult and What Is Teachers' Strategy to Overcome It?

ERIC Educational Resources Information Center

Retnawati, Heri; Kartowagiran, Badrun; Arlinwibowo, Janu; Sulistyaningsih, Eny

2017-01-01

The quality of national examination items plays an enormous role in identifying students' competencies mastery and their difficulties. This study aims to identify the difficult items in the Junior High School Mathematics National Examination, to find the factors that cause students' difficulty and to reveal the strategies that the teachers and the…
Faster on Easy Items, More Accurate on Difficult Ones: Cognitive Ability and Performance on a Task of Varying Difficulty

ERIC Educational Resources Information Center

Dodonova, Yulia A.; Dodonov, Yury S.

2013-01-01

Using more complex items than those commonly employed within the information-processing approach, but still easier than those used in intelligence tests, this study analyzed how the association between processing speed and accuracy level changes as the difficulty of the items increases. The study involved measuring cognitive ability using Raven's…
Predicting Item Difficulty in a Reading Comprehension Test with an Artificial Neural Network.

ERIC Educational Resources Information Center

Perkins, Kyle; And Others

This paper reports the results of using a three-layer backpropagation artificial neural network to predict item difficulty in a reading comprehension test. Two network structures were developed, one with and one without a sigmoid function in the output processing unit. The data set, which consisted of a table of coded test items and corresponding…
Some factors underlying individual differences in speech recognition on PRESTO: a first report.

PubMed

Tamati, Terrin N; Gilbert, Jaimie L; Pisoni, David B

2013-01-01

Previous studies investigating speech recognition in adverse listening conditions have found extensive variability among individual listeners. However, little is currently known about the core underlying factors that influence speech recognition abilities. To investigate sensory, perceptual, and neurocognitive differences between good and poor listeners on the Perceptually Robust English Sentence Test Open-set (PRESTO), a new high-variability sentence recognition test under adverse listening conditions. Participants who fell in the upper quartile (HiPRESTO listeners) or lower quartile (LoPRESTO listeners) on key word recognition on sentences from PRESTO in multitalker babble completed a battery of behavioral tasks and self-report questionnaires designed to investigate real-world hearing difficulties, indexical processing skills, and neurocognitive abilities. Young, normal-hearing adults (N = 40) from the Indiana University community participated in the current study. Participants' assessment of their own real-world hearing difficulties was measured with a self-report questionnaire on situational hearing and hearing health history. Indexical processing skills were assessed using a talker discrimination task, a gender discrimination task, and a forced-choice regional dialect categorization task. Neurocognitive abilities were measured with the Auditory Digit Span Forward (verbal short-term memory) and Digit Span Backward (verbal working memory) tests, the Stroop Color and Word Test (attention/inhibition), the WordFam word familiarity test (vocabulary size), the Behavioral Rating Inventory of Executive Function-Adult Version (BRIEF-A) self-report questionnaire on executive function, and two performance subtests of the Wechsler Abbreviated Scale of Intelligence (WASI) Performance Intelligence Quotient (IQ; nonverbal intelligence). Scores on self-report questionnaires and behavioral tasks were tallied and analyzed by listener group (HiPRESTO and LoPRESTO). 
The extreme groups did not differ overall on self-reported hearing difficulties in real-world listening environments. However, an item-by-item analysis of questions revealed that LoPRESTO listeners reported significantly greater difficulty understanding speakers in a public place. HiPRESTO listeners were significantly more accurate than LoPRESTO listeners at gender discrimination and regional dialect categorization, but they did not differ on talker discrimination accuracy or response time, or gender discrimination response time. HiPRESTO listeners also had longer forward and backward digit spans, higher word familiarity ratings on the WordFam test, and lower (better) scores for three individual items on the BRIEF-A questionnaire related to cognitive load. The two groups did not differ on the Stroop Color and Word Test or either of the WASI performance IQ subtests. HiPRESTO listeners and LoPRESTO listeners differed in indexical processing abilities, short-term and working memory capacity, vocabulary size, and some domains of executive functioning. These findings suggest that individual differences in the ability to encode and maintain highly detailed episodic information in speech may underlie the variability observed in speech recognition performance in adverse listening conditions using high-variability PRESTO sentences in multitalker babble. American Academy of Audiology.
Some Factors Underlying Individual Differences in Speech Recognition on PRESTO: A First Report

PubMed Central

Tamati, Terrin N.; Gilbert, Jaimie L.; Pisoni, David B.

2013-01-01

Background Previous studies investigating speech recognition in adverse listening conditions have found extensive variability among individual listeners. However, little is currently known about the core, underlying factors that influence speech recognition abilities. Purpose To investigate sensory, perceptual, and neurocognitive differences between good and poor listeners on PRESTO, a new high-variability sentence recognition test under adverse listening conditions. Research Design Participants who fell in the upper quartile (HiPRESTO listeners) or lower quartile (LoPRESTO listeners) on key word recognition on sentences from PRESTO in multitalker babble completed a battery of behavioral tasks and self-report questionnaires designed to investigate real-world hearing difficulties, indexical processing skills, and neurocognitive abilities. Study Sample Young, normal-hearing adults (N = 40) from the Indiana University community participated in the current study. Data Collection and Analysis Participants’ assessment of their own real-world hearing difficulties was measured with a self-report questionnaire on situational hearing and hearing health history. Indexical processing skills were assessed using a talker discrimination task, a gender discrimination task, and a forced-choice regional dialect categorization task. Neurocognitive abilities were measured with the Auditory Digit Span Forward (verbal short-term memory) and Digit Span Backward (verbal working memory) tests, the Stroop Color and Word Test (attention/inhibition), the WordFam word familiarity test (vocabulary size), the BRIEF-A self-report questionnaire on executive function, and two performance subtests of the WASI Performance IQ (non-verbal intelligence). Scores on self-report questionnaires and behavioral tasks were tallied and analyzed by listener group (HiPRESTO and LoPRESTO). Results The extreme groups did not differ overall on self-reported hearing difficulties in real-world listening environments. 
However, an item-by-item analysis of questions revealed that LoPRESTO listeners reported significantly greater difficulty understanding speakers in a public place. HiPRESTO listeners were significantly more accurate than LoPRESTO listeners at gender discrimination and regional dialect categorization, but they did not differ on talker discrimination accuracy or response time, or gender discrimination response time. HiPRESTO listeners also had longer forward and backward digit spans, higher word familiarity ratings on the WordFam test, and lower (better) scores for three individual items on the BRIEF-A questionnaire related to cognitive load. The two groups did not differ on the Stroop Color and Word Test or either of the WASI performance IQ subtests. Conclusions HiPRESTO listeners and LoPRESTO listeners differed in indexical processing abilities, short-term and working memory capacity, vocabulary size, and some domains of executive functioning. These findings suggest that individual differences in the ability to encode and maintain highly detailed episodic information in speech may underlie the variability observed in speech recognition performance in adverse listening conditions using high-variability PRESTO sentences in multitalker babble. PMID:24047949
An Alternate Definition of the ETS Delta Scale of Item Difficulty. Program Statistics Research.

ERIC Educational Resources Information Center

Holland, Paul W.; Thayer, Dorothy T.

An alternative definition has been developed of the delta scale of item difficulty used at Educational Testing Service. The traditional delta scale uses an inverse normal transformation based on normal ogive models developed years ago. However, no use is made of this fact in typical uses of item deltas. It is simply one way to make the probability…
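For orientation, the traditional delta scale referred to above is usually written as delta = 13 + 4z, where z is the standard normal deviate that cuts off the proportion answering incorrectly. The constants 13 and 4 are not given in this abstract and are stated here from general knowledge of the ETS convention; the function name is hypothetical. A minimal sketch of that traditional transformation (not the alternative definition the report proposes):

```python
from statistics import NormalDist

def ets_delta(proportion_correct):
    """Traditional delta: 13 + 4 * inverse-normal of the proportion incorrect.

    Harder items (lower proportion correct) receive higher deltas. The input
    must lie strictly between 0 and 1.
    """
    z = NormalDist().inv_cdf(1.0 - proportion_correct)
    return 13.0 + 4.0 * z

# An item answered correctly by half the group sits at the scale midpoint.
for p in (0.16, 0.50, 0.84):
    print(p, ets_delta(p))
```

Because the transformation is undefined at exactly 0 or 1, items that everyone or no one answers correctly need separate handling in practice.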
The Genetics Concept Assessment: a new concept inventory for gauging student understanding of genetics.

PubMed

Smith, Michelle K; Wood, William B; Knight, Jennifer K

2008-01-01

We have designed, developed, and validated a 25-question Genetics Concept Assessment (GCA) to test achievement of nine broad learning goals in majors and nonmajors undergraduate genetics courses. Written in everyday language with minimal jargon, the GCA is intended for use as a pre- and posttest to measure student learning gains. The assessment was reviewed by genetics experts, validated by student interviews, and taken by >600 students at three institutions. Normalized learning gains on the GCA were positively correlated with averaged exam scores, suggesting that the GCA measures understanding of topics relevant to instructors. Statistical analysis of our results shows that differences in the item difficulty and item discrimination index values between different questions on pre- and posttests can be used to distinguish between concepts that are well or poorly learned during a course.
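The normalized learning gain mentioned above is commonly computed as the fraction of the possible improvement that was actually achieved, (post - pre) / (100 - pre) for percentage scores. The abstract does not spell out the formula, so the definition and the function name below are assumptions.

```python
def normalized_gain(pre_percent, post_percent):
    """Normalized gain: achieved improvement divided by possible improvement.

    Scores are percentages (0-100). A pretest score of 100 leaves no room to
    improve, so the gain is undefined in that case.
    """
    if pre_percent >= 100:
        raise ValueError("normalized gain is undefined for a perfect pretest")
    return (post_percent - pre_percent) / (100 - pre_percent)

# A student moving from 40% to 70% has closed half of the remaining gap.
print(normalized_gain(40, 70))
```

Unlike a raw difference, this measure does not penalize students who start with high pretest scores and therefore have little room to gain.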
The Genetics Concept Assessment: A New Concept Inventory for Gauging Student Understanding of Genetics

PubMed Central

Wood, William B.; Knight, Jennifer K.

2008-01-01

We have designed, developed, and validated a 25-question Genetics Concept Assessment (GCA) to test achievement of nine broad learning goals in majors and nonmajors undergraduate genetics courses. Written in everyday language with minimal jargon, the GCA is intended for use as a pre- and posttest to measure student learning gains. The assessment was reviewed by genetics experts, validated by student interviews, and taken by >600 students at three institutions. Normalized learning gains on the GCA were positively correlated with averaged exam scores, suggesting that the GCA measures understanding of topics relevant to instructors. Statistical analysis of our results shows that differences in the item difficulty and item discrimination index values between different questions on pre- and posttests can be used to distinguish between concepts that are well or poorly learned during a course. PMID:19047428
The role of difficulty and gender in numbers, algebra, geometry and mathematics achievement

NASA Astrophysics Data System (ADS)

Rabab'h, Belal Sadiq Hamed; Veloo, Arsaythamby; Perumal, Selvan

2015-05-01

This study aims to identify the role of difficulty and gender in numbers, algebra, geometry and mathematics achievement among secondary school students in Jordan. The respondents were 337 students from eight public secondary schools in Alkoura district, selected by stratified random sampling. The sample comprised 179 (53%) male and 158 (47%) female students. The mathematics test comprises 30 items: eight items for numbers, 14 items for algebra and eight items for geometry. Regarding difficulty among male and female students, the findings showed that in numbers, item 4 (fractions - 0.34) was most difficult for male students and item 6 (square roots - 0.39) for female students. In algebra, item 11 (inequality - 0.23) was most difficult for male students and item 6 (algebraic expressions - 0.35) for female students. In geometry, item 3 (reflection - 0.34) was most difficult for male students and item 8 (volume - 0.33) for female students. Regarding gender differences, female students showed higher achievement in numbers and algebra compared to male students. On the other hand, there was no difference between male and female students' achievement in the geometry test. This study suggests that teachers need to give more attention to numbers and algebra when teaching mathematics.
Vocal Problems in Sports and Fitness Instructors: A Study of Prevalence, Risk Factors, and Need for Prevention in France.

PubMed

Fontan, Lionel; Fraval, Marie; Michon, Anne; Déjean, Sébastien; Welby-Gieusse, Muriel

2017-03-01

Sports and fitness instructors (SFIs) are known for being a high-risk population for voice difficulties (VD). However, past studies have encountered various methodological difficulties in determining prevalence and risk factors for VD in SFIs, such as limited population, gender and selection biases, or poor statistical power, because VD were studied as a binary variable. The present research work addresses these issues and aims at studying the prevalence of vocal problems and risk factors in French SFIs, a population in which no such study has yet been conducted. Another objective is to survey the French SFIs' habits and expectations regarding vocal prevention and care. This is a cross-sectional study. Three hundred and twenty SFIs answered a questionnaire, in either an online (n = 267) or a paper (n = 53) version. The questionnaire consisted of 31 items addressing self-reported vocal difficulties, supposed risk factors, and personal health-care history, followed by the Voice Handicap Index assessment. Prevalence of self-reported vocal difficulties is 55%. The Voice Handicap Index is significantly associated with gender, age, and variables related to work environment (noise and music) and habits (shouting, frequency of classes), as well as with daily sleeping time. Results also indicate that a minority of the SFIs (37%) received information on vocal difficulties, whereas a majority (80%) declare being interested in participating in prevention programs. This work confirms that SFIs are a high-risk population for VD, underlines the need for specific information programs in France, and provides relevant data for driving such preventive actions. Copyright © 2017 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
Stereotype threat in classroom settings: the interactive effect of domain identification, task difficulty and stereotype threat on female students' maths performance.

PubMed

Keller, Johannes

2007-06-01

Stereotype threat research revealed that negative stereotypes can disrupt the performance of persons targeted by such stereotypes. This paper contributes to stereotype threat research by providing evidence that domain identification and the difficulty level of test items moderate stereotype threat effects on female students' maths performance. The study was designed to test theoretical ideas derived from stereotype threat theory and assumptions outlined in the Yerkes-Dodson law proposing a nonlinear relationship between arousal, task difficulty and performance. Participants were 108 high school students attending secondary schools. Participants worked on a test comprising maths problems of different difficulty levels. Half of the participants learned that the test had been shown to produce gender differences (stereotype threat). The other half learned that the test had been shown not to produce gender differences (no threat). The degree to which participants identify with the domain of maths was included as a quasi-experimental factor. Maths-identified female students showed performance decrements under conditions of stereotype threat. Moreover, the stereotype threat manipulation had different effects on low and high domain identifiers' performance depending on test item difficulty. On difficult items, low identifiers showed higher performance under threat (vs. no threat) whereas the reverse was true in high identifiers. This interaction effect did not emerge on easy items. Domain identification and test item difficulty are two important factors that need to be considered in the attempt to understand the impact of stereotype threat on performance.
Psychometrics of the Fitness-to-Drive Screening Measure.

PubMed

Classen, Sherrilene; Velozo, Craig A; Winter, Sandra M; Bédard, Michel; Wang, Yanning

2015-01-01

We employed item response theory (IRT), specifically using Rasch modeling, to determine the measurement precision of the Fitness-to-Drive Screening Measure (FTDS), a tool that can be used by caregivers and occupational therapists to help detect at-risk drivers. We examined unidimensionality through the factor structure (how items contribute to the central construct of fitness to drive), rating scale (use of the categories of the rating scale), item/person-level separation (distinguishing between items with different difficulty levels or persons with different ability levels) and reliability, item hierarchy (easier driving items advancing to more difficult driving items), rater reliability, rater effects (severity vs. leniency of a rater), and criterion validity of the FTDS to an on-road assessment, via three rater groups (n = 200 older drivers; n = 200 caregivers; n = 2 evaluators). The FTDS is unidimensional, the rating scale performed well, has good person (> 3.07) and item (> 5.43) separation, good person (> 0.90) and item reliability (> 0.97), with < 10% misfitting items for two rater groups (caregivers and drivers). The intraclass correlation (ICC) coefficient among the three rater groups was significant (.253, p < .001) and the evaluators were the most severe raters. When comparing the caregivers' FTDS rating with the drivers' on-road assessment, the areas under the curve (index of discriminability; caregivers .726, p < .001) suggested concurrent validity between the FTDS and the on-road assessment. Despite limitations, the FTDS is a reliable and accurate screening measure for caregivers to help identify at-risk older drivers and for occupational therapy practitioners to start conversations about driving.
The Arabic Version of The Depression Anxiety Stress Scale-21: Cumulative scaling and discriminant-validation testing.

PubMed

Ali, Amira Mohammed; Ahmed, Anwar; Sharaf, Amira; Kawakami, Norito; Abdeldayem, Samia M; Green, Joseph

2017-12-01

This study aimed to examine the validity of the Arabic version of the Depression Anxiety Stress Scale-21 (DASS-21) in 149 illicit drug users. We calculated the α coefficient, inter-item and item-total correlations, coefficients of reproducibility and scalability (CR and CS), and item difficulty and discrimination indices. The DASS-21 had acceptable reliability, but values of the CR and the CS were less than acceptable. Items varied in difficulty and discrimination; some items are candidates for elimination. The DASS-21 is a probabilistic and not a deterministic measure of distress; it has problematic items and needs further investigation.
Incorporation of core competency questions into an annual national self-assessment examination for residents in physical medicine and rehabilitation: results and implications.

PubMed

Webster, Joseph B

2009-03-01

To determine the performance and change over time when incorporating questions in the core competency domains of practice-based learning and improvement (PBLI), systems-based practice (SBP), and professionalism (PROF) into the national PM&R Self-Assessment Examination for Residents (SAER). Prospective, longitudinal analysis. The national Self-Assessment Examination for Residents (SAER) in Physical Medicine and Rehabilitation, which is administered annually. Approximately 1100 PM&R residents who take the examination annually. Inclusion of progressively more challenging questions in the core competency domains of PBLI, SBP, and PROF. Individual test item level of difficulty (P value) and discrimination (point biserial index). Compared with the overall test, questions in the subtopic areas of PBLI, SBP, and PROF were relatively easier and less discriminating (correlation of resident performance on these domains compared with that on the total test). These differences became smaller during the 3-year time period. The difficulty level of the questions in each of the subtopic domains was raised during the 3 year period to a level close to the overall exam. Discrimination of the test items improved or remained stable. This study demonstrates that, with careful item writing and review, multiple-choice items in the PBLI, SBP, and PROF domains can be successfully incorporated into an annual, national self-assessment examination for residents. The addition of these questions had value in assessing competency while not compromising the overall validity and reliability of the exam. It is yet to be determined if resident performance on these questions corresponds to performance on other measures of competency in the areas of PBLI, SBP, and PROF.
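The two item statistics named in this abstract, the difficulty P value and the point-biserial discrimination index, can be illustrated with a minimal sketch. It assumes dichotomously scored items (1 = correct, 0 = incorrect) and uses the total test score as the criterion; the function names and sample data are hypothetical and not taken from the study.

```python
import math


def item_difficulty(item_scores):
    """Classical difficulty index: proportion of examinees answering correctly."""
    return sum(item_scores) / len(item_scores)


def point_biserial(item_scores, total_scores):
    """Point-biserial correlation between a 0/1 item and the total score.

    Uses r_pb = (M_correct - M_all) / SD_all * sqrt(p / (1 - p)),
    with SD_all the population standard deviation of total scores.
    Returns NaN when the item or the totals have no variance.
    """
    n = len(item_scores)
    p = sum(item_scores) / n
    mean_all = sum(total_scores) / n
    sd_all = math.sqrt(sum((t - mean_all) ** 2 for t in total_scores) / n)
    if p in (0.0, 1.0) or sd_all == 0.0:
        return float("nan")
    mean_correct = (
        sum(t for x, t in zip(item_scores, total_scores) if x == 1)
        / sum(item_scores)
    )
    return (mean_correct - mean_all) / sd_all * math.sqrt(p / (1.0 - p))


# Hypothetical data: five examinees, one item, and their total test scores.
item = [1, 1, 1, 0, 0]
totals = [5, 4, 3, 2, 1]
p_value = item_difficulty(item)
r_pb = point_biserial(item, totals)
```

A higher P value means an easier item, and a higher point-biserial value means the item separates high and low scorers more sharply.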

What Aspect of Dependence Does the Fagerström Test for Nicotine Dependence Measure?

PubMed Central

DiFranza, Joseph R.; Wellman, Robert J.; Savageau, Judith A.; Beccia, Ariel; Ursprung, W. W. Sanouri A.; McMillen, Robert

2013-01-01

Although the Fagerström Test for Nicotine Dependence (FTND) and the Heaviness of Smoking Index (HSI) are widely used, there is uncertainty regarding what is measured by these scales. We examined associations between these instruments and items assessing different aspects of dependence. Adult current smokers (n = 422, mean age 33.3 years, 61.9% female) completed a web-based survey comprising items related to demographics and smoking behavior plus (1) the FTND and HSI; (2) the Autonomy over Tobacco Scale (AUTOS) with subscales measuring Withdrawal, Psychological Dependence, and Cue-Induced Cravings; (3) 6 questions tapping smokers' wanting, craving, or needing experiences in response to withdrawal and the latency to each experience during abstinence; (4) 3 items concerning how smokers prepare to cope with periods of abstinence. In regression analyses the Withdrawal subscale of the AUTOS was the strongest predictor of FTND and HSI scores, followed by taking precautions not to run out of cigarettes or smoking extra to prepare for abstinence. The FTND and its six items, including the HSI, consistently showed the strongest correlations with withdrawal, suggesting that the behaviors described by the items of the FTND are primarily indicative of a difficulty maintaining abstinence because of withdrawal symptoms. PMID:25969829
What aspect of dependence does the fagerström test for nicotine dependence measure?

PubMed

DiFranza, Joseph R; Wellman, Robert J; Savageau, Judith A; Beccia, Ariel; Ursprung, W W Sanouri A; McMillen, Robert

2013-01-01

Although the Fagerström Test for Nicotine Dependence (FTND) and the Heaviness of Smoking Index (HSI) are widely used, there is uncertainty regarding what is measured by these scales. We examined associations between these instruments and items assessing different aspects of dependence. Adult current smokers (n = 422, mean age 33.3 years, 61.9% female) completed a web-based survey comprising items related to demographics and smoking behavior plus (1) the FTND and HSI; (2) the Autonomy over Tobacco Scale (AUTOS) with subscales measuring Withdrawal, Psychological Dependence, and Cue-Induced Cravings; (3) 6 questions tapping smokers' wanting, craving, or needing experiences in response to withdrawal and the latency to each experience during abstinence; (4) 3 items concerning how smokers prepare to cope with periods of abstinence. In regression analyses the Withdrawal subscale of the AUTOS was the strongest predictor of FTND and HSI scores, followed by taking precautions not to run out of cigarettes or smoking extra to prepare for abstinence. The FTND and its six items, including the HSI, consistently showed the strongest correlations with withdrawal, suggesting that the behaviors described by the items of the FTND are primarily indicative of a difficulty maintaining abstinence because of withdrawal symptoms.
Do item-writing flaws reduce examinations psychometric quality?

PubMed

Pais, João; Silva, Artur; Guimarães, Bruno; Povo, Ana; Coelho, Elisabete; Silva-Pereira, Fernanda; Lourinho, Isabel; Ferreira, Maria Amélia; Severo, Milton

2016-08-11

The psychometric characteristics of multiple-choice questions (MCQ) changed when taking into account their anatomical sites and the presence of item-writing flaws (IWF). The aim is to understand the impact of the anatomical sites and the presence of IWF on the psychometric qualities of the MCQ. 800 Clinical Anatomy MCQ from eight examinations were classified as standard or flawed items and according to one of the eight anatomical sites. An item was classified as flawed if it violated at least one of the principles of item writing. The difficulty and discrimination indices of each item were obtained. 55.8% of the MCQ were flawed items. The anatomical site of the items explained 6.2% and 3.2% of the difficulty and discrimination parameters and the IWF explained 2.8% and 0.8%, respectively. The impact of the IWF was heterogeneous: the Writing the Stem and Writing the Choices categories had a negative impact (higher difficulty and lower discrimination) while the other categories did not have any impact. The anatomical site effect was higher than the IWF effect on the psychometric characteristics of the examination. When constructing MCQ, the focus should be on the topic/area of the items and only afterwards on the presence of IWF.
Selecting Items for Criterion-Referenced Tests.

ERIC Educational Resources Information Center

Mellenbergh, Gideon J.; van der Linden, Wim J.

1982-01-01

Three item selection methods for criterion-referenced tests are examined: the classical theory of item difficulty and item-test correlation; the latent trait theory of item characteristic curves; and a decision-theoretic approach for optimal item selection. Item contribution to the standardized expected utility of mastery testing is discussed. (CM)
YOCAS©® Yoga Reduces Self-reported Memory Difficulty in Cancer Survivors in a Nationwide Randomized Clinical Trial: Investigating Relationships Between Memory and Sleep.

PubMed

Janelsins, Michelle C; Peppone, Luke J; Heckler, Charles E; Kesler, Shelli R; Sprod, Lisa K; Atkins, James; Melnik, Marianne; Kamen, Charles; Giguere, Jeffrey; Messino, Michael J; Mohile, Supriya G; Mustian, Karen M

2016-09-01

Background Interventions are needed to alleviate memory difficulty in cancer survivors. We previously showed in a phase III randomized clinical trial that YOCAS©® yoga, a program that consists of breathing exercises, postures, and meditation, significantly improved sleep quality in cancer survivors. This study assessed the effects of YOCAS©® on memory and identified relationships between memory and sleep. Survivors were randomized to standard care (SC) or SC with YOCAS©®. 328 participants who provided data on the memory difficulty item of the MD Anderson Symptom Inventory are included. Sleep quality was measured using the Pittsburgh Sleep Quality Index. General linear modeling (GLM) determined the group effect of YOCAS©® on memory difficulty compared with SC. GLM also determined moderation of baseline memory difficulty on postintervention sleep and vice versa. Path modeling assessed the mediating effects of changes in memory difficulty on YOCAS©® changes in sleep and vice versa. YOCAS©® significantly reduced memory difficulty at postintervention compared with SC (mean change: yoga=-0.60; SC=-0.16; P<.05). Baseline memory difficulty did not moderate the effects of postintervention sleep quality in YOCAS©® compared with SC. Baseline sleep quality did moderate the effects of postintervention memory difficulty in YOCAS©® compared with SC (P<.05). Change in sleep quality was a significant mediator of reduced memory difficulty in YOCAS©® compared with SC (P<.05); however, changes in memory difficulty did not significantly mediate improved sleep quality in YOCAS©® compared with SC. In this large nationwide trial, YOCAS©® yoga significantly reduced patient-reported memory difficulty in cancer survivors.
76 FR 31991 - All Items Consumer Price Index for All Urban Consumers; United States City Average

Federal Register 2010, 2011, 2012, 2013, 2014

2011-06-02

... DEPARTMENT OF LABOR Office of the Secretary All Items Consumer Price Index for All Urban Consumers... United States City Average All Items Consumer Price Index for All Urban Consumers (1967=100) increased... 1974 as a base (1974=100), I certify that the United States City Average All Items Consumer Price Index...
78 FR 35054 - All Items Consumer Price Index for All Urban Consumers United States City Average

Federal Register 2010, 2011, 2012, 2013, 2014

2013-06-11

... DEPARTMENT OF LABOR Office of the Secretary All Items Consumer Price Index for All Urban Consumers... United States City Average All Items Consumer Price Index for All Urban Consumers (1967=100) increased... 1974 as a base (1974=100), I certify that the United States City Average All Items Consumer Price Index...
75 FR 22164 - All Items Consumer Price Index for All Urban Consumers United States City Average

Federal Register 2010, 2011, 2012, 2013, 2014

2010-04-27

... DEPARTMENT OF LABOR Office of the Secretary All Items Consumer Price Index for All Urban Consumers... United States City Average All Items Consumer Price Index for All Urban Consumers (1967=100) increased... 1974 as a base (1974=100), I certify that the United States City Average All Items Consumer Price Index...
Item Difficulty Modeling of Paragraph Comprehension Items

ERIC Educational Resources Information Center

Gorin, Joanna S.; Embretson, Susan E.

2006-01-01

Recent assessment research joining cognitive psychology and psychometric theory has introduced a new technology, item generation. In algorithmic item generation, items are systematically created based on specific combinations of features that underlie the processing required to correctly solve a problem. Reading comprehension items have been more…
Measurement Equivalence in ADL and IADL Difficulty Across International Surveys of Aging: Findings From the HRS, SHARE, and ELSA

PubMed Central

Kasper, Judith D.; Brandt, Jason; Pezzin, Liliana E.

2012-01-01

Objective. To examine the measurement equivalence of items on disability across three international surveys of aging. Method. Data for persons aged 65 and older were drawn from the Health and Retirement Survey (HRS, n = 10,905), English Longitudinal Study of Aging (ELSA, n = 5,437), and Survey of Health, Ageing and Retirement in Europe (SHARE, n = 13,408). Differential item functioning (DIF) was assessed using item response theory (IRT) methods for activities of daily living (ADL) and instrumental activities of daily living (IADL) items. Results. HRS and SHARE exhibited measurement equivalence, but 6 of 11 items in ELSA demonstrated meaningful DIF. At the scale level, this item-level DIF affected scores reflecting greater disability. IRT methods also spread out score distributions and shifted scores higher (toward greater disability). Results for mean disability differences by demographic characteristics, using original and DIF-adjusted scores, were the same overall but differed for some subgroup comparisons involving ELSA. Discussion. Testing and adjusting for DIF is one means of minimizing measurement error in cross-national survey comparisons. IRT methods were used to evaluate potential measurement bias in disability comparisons across three international surveys of aging. The analysis also suggested DIF was mitigated for scales including both ADL and IADL and that summary indexes (counts of limitations) likely underestimate mean disability in these international populations. PMID:22156662
Item difficulty and item validity for the Children's Group Embedded Figures Test.

PubMed

Rusch, R R; Trigg, C L; Brogan, R; Petriquin, S

1994-02-01

The validity and reliability of the Children's Group Embedded Figures Test was reported for students in Grade 2 by Cromack and Stone in 1980; however, a search of the literature indicates no evidence for internal consistency or item analysis. Hence the purpose of this study was to examine the item difficulty and item validity of the test with children in Grades 1 and 2. Confusion in the literature over development and use of this test was seemingly resolved through analysis of these descriptions and through an interview with the test developer. One early-appearing item was unreasonably difficult. Two or three other items were quite difficult and made little contribution to the total score. Caution is recommended, however, in any reordering or elimination of items based on these findings, given the limited number of subjects (n = 84).
North American Veterinary Licensing Examination pacing study.

PubMed

Subhiyah, Raja G; Boyce, John R

2010-01-01

The National Board of Veterinary Medical Examiners was interested in the possible effects of word count on the outcomes of the North American Veterinary Licensing Examination. In this study, the authors investigated the effects of increasing word count on the pacing of examinees during each section of the examination and on the performance of examinees on the items. Specifically, the authors analyzed the effect of item word count on the average time spent on each item within a section of the examination, the average number of items omitted at the end of a section, and the average difficulty of items as a function of presentation order. The average word count per item increased from 2001 to 2008. As expected, there was a relationship between word count and time spent on the item. No significant relationship was found between word count and item difficulty, and an analysis of omitted items and pacing patterns showed no indication of overall pacing problems.
A Comparison between Discrimination Indices and Item-Response Theory Using the Rasch Model in a Clinical Course Written Examination of a Medical School.

PubMed

Park, Jong Cook; Kim, Kwang Sig

2012-03-01

The reliability of a test is determined by each item's characteristics. Item analysis is achieved by classical test theory and item response theory. The purpose of the study was to compare the discrimination indices with item response theory using the Rasch model. Thirty-one 4th-year medical school students participated in the clinical course written examination, which included 22 A-type items and 3 R-type items. Point biserial correlation coefficient (C(pbs)) was compared to method of extreme group (D), biserial correlation coefficient (C(bs)), item-total correlation coefficient (C(it)), and corrected item-total correlation coefficient (C(cit)). Rasch model was applied to estimate item difficulty and examinee's ability and to calculate item fit statistics using joint maximum likelihood. Explanatory power (r²) of C(pbs) decreased in the following order: C(cit) (1.00), C(it) (0.99), C(bs) (0.94), and D (0.45). The ranges of difficulty logit and standard error and ability logit and standard error were -0.82 to 0.80 and 0.37 to 0.76, -3.69 to 3.19 and 0.45 to 1.03, respectively. Items 9 and 23 have outfit ≥ 1.3. Students 1, 5, 7, 18, 26, 30, and 32 have fit ≥ 1.3. C(pbs), C(cit), and C(it) are good discrimination parameters. Rasch model can estimate item difficulty parameter and examinee's ability parameter with standard error. The fit statistics can identify bad items and unpredictable examinee's responses.
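The difficulty and ability logits reported in this abstract live on the same scale, which is what lets the Rasch model turn their difference into a probability of a correct response. A minimal sketch of the dichotomous Rasch model follows; the function name is illustrative, and the estimation step (joint maximum likelihood in the study) is not reproduced here.

```python
import math


def rasch_probability(ability, difficulty):
    """Probability of a correct response under the dichotomous Rasch model.

    P(X = 1) = exp(theta - b) / (1 + exp(theta - b)), where theta is the
    examinee's ability and b the item difficulty, both in logits.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


# An examinee of average ability (0 logits) facing the easiest and hardest
# items in the reported difficulty range (-0.82 and 0.80 logits).
p_easiest = rasch_probability(0.0, -0.82)  # about 0.69
p_hardest = rasch_probability(0.0, 0.80)   # about 0.31
```

When ability equals difficulty the probability is exactly 0.5, which is the sense in which an item is "targeted" at a given ability level.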
Difficulty and Discriminability of Introductory Psychology Test Items.

ERIC Educational Resources Information Center

Scialfa, Charles; Legare, Connie; Wenger, Larry; Dingley, Louis

2001-01-01

Analyzes multiple-choice questions provided in test banks for introductory psychology textbooks. Study 1 offered a consistent picture of the objective difficulty of multiple-choice tests for introductory psychology students, while both studies 1 and 2 indicated that test items taken from commercial test banks have poor psychometric properties.…
Psychometric Properties of Difficulties of Working with Patients with Personality Disorders and Attitudes Towards Patients with Personality Disorders Scales.

PubMed

Eren, Nurhan

2014-12-01

In this study, we aimed to develop two reliable and valid assessment instruments for investigating the level of difficulties mental health workers experience while working with patients with personality disorders and the attitudes they develop toward the patients. The research was carried out based on the general screening model. The study sample consisted of 332 mental health workers in several mental health clinics of Turkey, with a certain amount of experience in working with personality disorders, who were selected with a random assignment method. In order to collect data, the Personal Information Questionnaire, Difficulty of Working with Personality Disorders Scale (PD-DWS), and Attitudes Towards Patients with Personality Disorders Scale (PD-APS), which are being examined for reliability and validity, were applied. To determine construct validity, the Adjective Check List, Maslach Burnout Inventory, and State and Trait Anxiety Inventory were used. Explanatory factor analysis was used for investigating the structural validity, and Cronbach alpha, Spearman-Brown, Guttman Split-Half reliability analyses were utilized to examine the reliability. Also, item reliability and validity computations were carried out by investigating the corrected item-total correlations and discriminative indexes of the items in the scales. For the PD-DWS KMO test, the value was .946; also, a significant difference was found for the Bartlett sphericity test (p<.001). The computed test-retest coefficient reliability was .702; the Cronbach alpha value of the total test score was .952. For PD-APS KMO, the value was .925; a significant difference was found in Bartlett sphericity test (p<.001); the computed reliability coefficient based on continuity was .806; and the Cronbach alpha value of the total test score was .913. Analyses on both scales were based on total scores.
It was found that PD-DWS and PD-APS have good psychometric properties, measuring the structure that is being investigated, are compatible with other scales, have high levels of internal reliability between their items, and are consistent across time. Therefore, it was concluded that both scales are valid and reliable instruments.
Development and validity of a questionnaire to test the knowledge of primary care personnel regarding nutrition in obese adolescents.

PubMed

de Pinho, Lucinéia; Moura, Paulo Henrique Tolentino; Silveira, Marise Fagundes; de Botelho, Ana Cristina Carvalho; Caldeira, Antônio Prates

2013-07-18

In light of its epidemic proportions in developed and developing countries, obesity is considered a serious public health issue. In order to increase knowledge concerning the ability of health care professionals in caring for obese adolescents and adopt more efficient preventive and control measures, a questionnaire was developed and validated to assess non-dietitian health professionals regarding their Knowledge of Nutrition in Obese Adolescents (KNOA). The development and evaluation of a questionnaire to assess the knowledge of primary care practitioners with respect to nutrition in obese adolescents was carried out in five phases, as follows: 1) definition of study dimensions; 2) development of 42 questions and preliminary evaluation of the questionnaire by a panel of experts; 3) characterization and selection of primary care practitioners (35 dietitians and 265 non-dietitians) and measurement of questionnaire criteria by contrasting the responses of dietitians and non-dietitians; 4) reliability assessment by question exclusion based on item difficulty (too easy and too difficult for non-dietitian practitioners), item discrimination, internal consistency and reproducibility index determination; and 5) scoring the completed questionnaires. Dietitians obtained higher scores than non-dietitians (Mann-Whitney U test, P < 0.05), confirming the validity of the questionnaire criteria. Items were discriminated by correlating the score for each item with the total score, using a minimum of 0.2 as a correlation coefficient cutoff value. Item difficulty was controlled by excluding questions answered correctly by more than 90% of the non-dietitian subjects (too easy) or by less than 10% of them (too difficult). The final questionnaire contained 26 of the original 42 questions, increasing Cronbach's α value from 0.788 to 0.807. Test-retest agreement between respondents was classified as good to very good (Kappa test, >0.60).
The KNOA questionnaire developed for primary care practitioners is a valid, consistent and suitable instrument that can be applied over time, making it a promising tool for developing and guiding public health policies.
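The item-exclusion rules described in this abstract (drop items answered correctly by more than 90% or fewer than 10% of respondents, and items whose correlation with the total score is below 0.2) can be sketched as a simple filter. This is an illustration under assumptions, not the authors' procedure: it assumes 0/1 scoring, applies the difficulty rule to all respondents rather than to a non-dietitian subgroup, and uses hypothetical function and parameter names.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns NaN if either variable has no variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def screen_items(responses, p_min=0.10, p_max=0.90, r_min=0.20):
    """Return indices of items that pass the difficulty and discrimination rules.

    responses: list of rows, one per respondent, each a list of 0/1 item scores.
    """
    totals = [sum(row) for row in responses]
    kept = []
    for j in range(len(responses[0])):
        column = [row[j] for row in responses]
        p = sum(column) / len(column)     # proportion answering correctly
        r = pearson(column, totals)       # item-total correlation
        # A NaN correlation compares as False, so zero-variance items are dropped.
        if p_min <= p <= p_max and r >= r_min:
            kept.append(j)
    return kept
```

With the defaults shown, an item everyone answers correctly fails the difficulty rule, and an item that barely tracks the total score fails the discrimination rule.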
Working memory capacity and fluid abilities: the more difficult the item, the more more is better.

PubMed

Little, Daniel R; Lewandowsky, Stephan; Craig, Stewart

2014-01-01

The relationship between fluid intelligence and working memory is of fundamental importance to understanding how capacity-limited structures such as working memory interact with inference abilities to determine intelligent behavior. Recent evidence has suggested that the relationship between a fluid abilities test, Raven's Progressive Matrices, and working memory capacity (WMC) may be invariant across difficulty levels of the Raven's items. We show that this invariance can only be observed if the overall correlation between Raven's and WMC is low. Simulations of Raven's performance revealed that as the overall correlation between Raven's and WMC increases, the item-wise point bi-serial correlations involving WMC are no longer constant but increase considerably with item difficulty. The simulation results were confirmed by two studies that used a composite measure of WMC, which yielded a higher correlation between WMC and Raven's than reported in previous studies. As expected, with the higher overall correlation, there was a significant positive relationship between Raven's item difficulty and the extent of the item-wise correlation with WMC.
Item selection via Bayesian IRT models.

PubMed

Arima, Serena

2015-02-10

With reference to a questionnaire that aimed to assess the quality of life for dysarthric speakers, we investigate the usefulness of a model-based procedure for reducing the number of items. We propose a mixed cumulative logit model, which is known in the psychometrics literature as the graded response model: responses to different items are modelled as a function of individual latent traits and as a function of item characteristics, such as their difficulty and their discrimination power. We jointly model the discrimination and the difficulty parameters by using a k-component mixture of normal distributions. Mixture components correspond to disjoint groups of items. Items that belong to the same groups can be considered equivalent in terms of both difficulty and discrimination power. According to decision criteria, we select a subset of items such that the reduced questionnaire is able to provide the same information that the complete questionnaire provides. The model is estimated by using a Bayesian approach, and the choice of the number of mixture components is justified according to information criteria. We illustrate the proposed approach on the basis of data that are collected for 104 dysarthric patients by local health authorities in Lecce and in Milan.
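The cumulative logit structure of the graded response model mentioned in this abstract can be shown for a single item. The sketch below assumes the standard parameterisation, in which the probability of responding in category k or higher is a logistic function of a(θ − b_k); the mixture prior and Bayesian estimation described by the author are not reproduced, and the names are illustrative.

```python
import math


def grm_category_probs(theta, discrimination, thresholds):
    """Category probabilities for one item under the graded response model.

    theta: latent trait of the respondent.
    discrimination: item discrimination parameter a.
    thresholds: increasing difficulty thresholds b_1 < ... < b_K,
                giving K + 1 ordered response categories.
    """
    # Cumulative probabilities P(Y >= k), padded with 1 (k = 0) and 0 (k = K + 1).
    cumulative = [1.0]
    cumulative += [
        1.0 / (1.0 + math.exp(-discrimination * (theta - b))) for b in thresholds
    ]
    cumulative.append(0.0)
    # Each category probability is the difference of adjacent cumulative terms.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(cumulative) - 1)]


# Three-category item with thresholds symmetric around the respondent's trait.
probs = grm_category_probs(theta=0.0, discrimination=1.0, thresholds=[-1.0, 1.0])
```

Raising the discrimination parameter concentrates probability in fewer categories, while shifting the thresholds upward makes the higher categories harder to reach.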
Bilingual health literacy assessment using the Talking Touchscreen/la Pantalla Parlanchina: Development and pilot testing.

PubMed

Yost, Kathleen J; Webster, Kimberly; Baker, David W; Choi, Seung W; Bode, Rita K; Hahn, Elizabeth A

2009-06-01

Current health literacy measures are too long, imprecise, or have questionable equivalence of English and Spanish versions. The purpose of this paper is to describe the development and pilot testing of a new bilingual computer-based health literacy assessment tool. We analyzed literacy data from three large studies. Using a working definition of health literacy, we developed new prose, document and quantitative items in English and Spanish. Items were pilot tested on 97 English- and 134 Spanish-speaking participants to assess item difficulty. Items covered topics relevant to primary care patients and providers. English- and Spanish-speaking participants understood the tasks involved in answering each type of question. The English Talking Touchscreen was easy to use and the English and Spanish items provided good coverage of the difficulty continuum. Qualitative and quantitative results provided useful information on computer acceptability and initial item difficulty. After the items have been administered on the Talking Touchscreen (la Pantalla Parlanchina) to 600 English-speaking (and 600 Spanish-speaking) primary care patients, we will develop a computer adaptive test. This health literacy tool will enable clinicians and researchers to more precisely determine the level at which low health literacy adversely affects health and healthcare utilization.
A set of high quality colour images with Spanish norms for seven relevant psycholinguistic variables: the Nombela naming test.

PubMed

Moreno-Martinez, Francisco Javier; Montoro, Pedro R; Laws, Keith R

2011-05-01

This paper presents a new corpus of 140 high quality colour images belonging to 14 subcategories and covering a range of naming difficulty. One hundred and six Spanish speakers named the items and provided data for several psycholinguistic variables: age of acquisition, familiarity, manipulability, name agreement, typicality and visual complexity. Furthermore, we also present lexical frequency data derived from internet search hits. Apart from the large number of variables evaluated, these stimuli present an important advantage with respect to other comparable image corpora in so far as naming performance in healthy individuals is less prone to ceiling effect problems. Reliability and validity indexes showed that our items display similar psycholinguistic characteristics to those of other corpora. In sum, this set of ecologically valid stimuli provides a useful tool for scientists engaged in cognitive and neuroscience-based research.

Mokken scaling of the Myocardial Infarction Dimensional Assessment Scale (MIDAS).

PubMed

Thompson, David R; Watson, Roger

2011-02-01

The purpose of this study was to examine the hierarchical and cumulative nature of the 35 items of the Myocardial Infarction Dimensional Assessment Scale (MIDAS), a disease-specific health-related quality of life measure. Data from 668 participants who completed the MIDAS were analysed using the Mokken Scaling Procedure, which is a computer program that searches polychotomous data for hierarchical and cumulative scales on the basis of a range of diagnostic criteria. Fourteen MIDAS items were retained in a Mokken scale and these items included physical activity, insecurity, emotional reaction and dependency items but excluded items related to diet, medication or side-effects. Item difficulty, in item response theory terms, ran from physical activity items (low difficulty) to insecurity, suggesting that the most severe quality of life effect of myocardial infarction is loneliness and isolation. Items from the MIDAS form a strong and reliable Mokken scale, which provides new insight into the relationship between items in the MIDAS and the measurement of quality of life after myocardial infarction.
Validation of an instrument to assess visual ability in children with visual impairment in China.

PubMed

Huang, Jinhai; Khadka, Jyoti; Gao, Rongrong; Zhang, Sifang; Dong, Wenpeng; Bao, Fangjun; Chen, Haisi; Wang, Qinmei; Chen, Hao; Pesudovs, Konrad

2017-04-01

To validate a visual ability instrument for school-aged children with visual impairment in China by translating, culturally adapting and Rasch scaling the Cardiff Visual Ability Questionnaire for Children (CVAQC). The 25-item CVAQC was translated into Mandarin using a standard protocol. The translated version (CVAQC-CN) was subjected to cognitive testing to ensure a proper cultural adaptation of its content. Then, the CVAQC-CN was interviewer-administered to 114 school-aged children and young people with visual impairment. Rasch analysis was carried out to assess its psychometric properties. The correlations between the CVAQC-CN visual ability scores and clinical measures of vision (visual acuity, VA, and contrast sensitivity, CS) were assessed using Spearman's r. Based on cultural adaptation exercise, cognitive testing, missing data and Rasch metrics-based iterative item removal, three items were removed from the original 25. The 22-item CVAQC-CN demonstrated excellent measurement precision (person separation index, 3.08), content validity (item separation, 10.09) and item reliability (0.99). Moreover, the CVAQC-CN was unidimensional and had no item bias. The person-item map indicated good targeting of item difficulty to person ability. The CVAQC-CN had moderate correlations with CS (-0.53, p<0.00001) and VA (0.726, p<0.00001), respectively, indicating its validity. The 22-item CVAQC-CN is a psychometrically robust and valid instrument to measure visual ability in children with visual impairment in China. The instrument can be used as a clinical and research outcome measure to assess the change in visual ability after low vision rehabilitation intervention.
Interpretation of the Rasch Ability and Difficulty Scales for Educational Purposes.

ERIC Educational Resources Information Center

Woodcock, Richard W.

Though many test developers have utilized item response theory in their work, few have taken advantage of the potential of item response theory for providing new interpretation procedures that accentuate the educational implications to be drawn from test scores. This paper describes several features, based upon the Rasch difficulty and ability…
The Effect of Anchor Test Construction on Scale Drift

ERIC Educational Resources Information Center

Antal, Judit; Proctor, Thomas P.; Melican, Gerald J.

2014-01-01

In common-item equating the anchor block is generally built to represent a miniature form of the total test in terms of content and statistical specifications. The statistical properties frequently reflect equal mean and spread of item difficulty. Sinharay and Holland (2007) suggested that the requirement for equal spread of difficulty may be too…
A Non-Parametric Item Response Theory Evaluation of the CAGE Instrument Among Older Adults.

PubMed

Abdin, Edimansyah; Sagayadevan, Vathsala; Vaingankar, Janhavi Ajit; Picco, Louisa; Chong, Siow Ann; Subramaniam, Mythily

2018-02-23

The validity of the CAGE has not yet been examined using item response theory (IRT) in the older adult population. This study aims to investigate the psychometric properties of the CAGE using both non-parametric and parametric IRT models, assess whether there is any differential item functioning (DIF) by age, gender and ethnicity, and examine the measurement precision at the cut-off scores. We used data from the Well-being of the Singapore Elderly study to conduct Mokken scaling analysis (MSA) and to fit dichotomous Rasch and 2-parameter logistic IRT models. The measurement precision at the cut-off scores was evaluated using classification accuracy (CA) and classification consistency (CC). The MSA showed the overall scalability H index was 0.459, indicating a medium performing instrument. All items were found to be homogeneous, measuring the same construct and able to discriminate well between respondents with high levels of the construct and those with lower levels. The item discrimination ranged from 1.07 to 6.73 while the item difficulty ranged from 0.33 to 2.80. Significant DIF was found for 2 items across ethnic groups. More than 90% (CC and CA ranged from 92.5% to 94.3%) of the respondents were consistently and accurately classified by the CAGE cut-off scores of 2 and 3. The current study provides new evidence on the validity of the CAGE from the IRT perspective. This study provides valuable information on each item in the assessment of the overall severity of alcohol problems and on the precision of the cut-off scores in the older adult population.
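As a reading aid for the discrimination and difficulty parameters reported above, the following is a minimal sketch of the standard 2-parameter logistic item response function, of which the dichotomous Rasch model is the special case with discrimination fixed at 1. The function names are hypothetical, and the parameter pairings in the usage note are illustrative only: the abstract reports ranges, not which discrimination belongs to which difficulty.

```python
import math


def irt_2pl(theta, a, b):
    """Probability of endorsing an item under the 2PL model.

    theta: respondent's latent trait level
    a:     item discrimination (slope)
    b:     item difficulty (location)
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def rasch(theta, b):
    """Dichotomous Rasch model: the 2PL form with discrimination fixed at 1."""
    return irt_2pl(theta, 1.0, b)
```

For example, `irt_2pl(0.33, 1.07, 0.33)` is 0.5, because a respondent whose trait level equals the item difficulty has an even chance of endorsing the item whatever the discrimination. A larger discrimination makes the probability rise more steeply as the trait level moves past the difficulty.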
Simple mental addition in children with and without mild mental retardation.

PubMed

Janssen, R; De Boeck, P; Viaene, M; Vallaeys, L

1999-11-01

The speeded performance on simple mental addition problems of 6- and 7-year-old children with and without mild mental retardation is modeled from a person perspective and an item perspective. On the person side, it was found that a single cognitive dimension spanned the performance differences between the two ability groups. However, a discontinuity, or "jump," was observed in the performance of the normal ability group on the easier items. On the item side, the addition problems were almost perfectly ordered in difficulty according to their problem size. Differences in difficulty were explained by factors related to the difficulty of executing nonretrieval strategies. All findings were interpreted within the framework of Siegler's (e.g., R. S. Siegler & C. Shipley, 1995) model of children's strategy choices in arithmetic. Models from item response theory were used to test the hypotheses. Copyright 1999 Academic Press.
Readability Level of Standardized Test Items and Student Performance: The Forgotten Validity Variable

ERIC Educational Resources Information Center

Hewitt, Margaret A.; Homan, Susan P.

2004-01-01

Test validity issues considered by test developers and school districts rarely include individual item readability levels. In this study, items from a major standardized test were examined for individual item readability level and item difficulty. The Homan-Hewitt Readability Formula was applied to items across three grade levels. Results of…
Assisting Australians with mental health problems and financial difficulties: a Delphi study to develop guidelines for financial counsellors, financial institution staff, mental health professionals and carers.

PubMed

Bond, Kathy S; Chalmers, Kathryn J; Jorm, Anthony F; Kitchener, Betty A; Reavley, Nicola J

2015-06-03

There is a strong association between mental health problems and financial difficulties. Therefore, people who work with those who have financial difficulties (financial counsellors and financial institution staff) need to have knowledge and helping skills relevant to mental health problems. Conversely, people who support those with mental health problems (mental health professionals and carers) may need to have knowledge and helping skills relevant to financial difficulties. The Delphi expert consensus method was used to develop guidelines for people who work with or support those with mental health problems and financial difficulties. A systematic review of websites, books and journal articles was conducted to develop a questionnaire containing items about the knowledge, skills and actions relevant to working with or supporting someone with mental health problems and financial difficulties. These items were rated over three rounds by five Australian expert panels comprising financial counsellors (n = 33), financial institution staff (n = 54), mental health professionals (n = 31), consumers (n = 20) and carers (n = 24). A total of 897 items were rated, with 462 items endorsed by at least 80% of members of each of the expert panels. These endorsed statements were used to develop a set of guidelines for financial counsellors, financial institution staff, mental health professionals and carers about how to assist someone with mental health problems and financial difficulties. A diverse group of expert panel members was able to reach substantial consensus on the knowledge, skills and actions needed to work with and support people with mental health problems and financial difficulties. These guidelines can be used to inform policy and practice in the financial and mental health sectors.
Assessment of Differential Item Functioning in the Experiences of Discrimination Index

PubMed Central

Cunningham, Timothy J.; Berkman, Lisa F.; Gortmaker, Steven L.; Kiefe, Catarina I.; Jacobs, David R.; Seeman, Teresa E.; Kawachi, Ichiro

2011-01-01

The psychometric properties of instruments used to measure self-reported experiences of discrimination in epidemiologic studies are rarely assessed, especially regarding construct validity. The authors used 2000–2001 data from the Coronary Artery Risk Development in Young Adults (CARDIA) Study to examine differential item functioning (DIF) in 2 versions of the Experiences of Discrimination (EOD) Index, an index measuring self-reported experiences of racial/ethnic and gender discrimination. DIF may confound interpretation of subgroup differences. Large DIF was observed for 2 of 7 racial/ethnic discrimination items: White participants reported more racial/ethnic discrimination for the “at school” item, and black participants reported more racial/ethnic discrimination for the “getting housing” item. The large DIF by race/ethnicity in the index for racial/ethnic discrimination probably reflects item impact and is the result of valid group differences between blacks and whites regarding their respective experiences of discrimination. The authors also observed large DIF by race/ethnicity for 3 of 7 gender discrimination items. This is more likely to have been due to item bias. Users of the EOD Index must consider the advantages and disadvantages of DIF adjustment (omitting items, constructing separate measures, and retaining items). The EOD Index has substantial usefulness as an instrument that can assess self-reported experiences of discrimination. PMID:22038104
A New Functional Health Literacy Scale for Japanese Young Adults Based on Item Response Theory.

PubMed

Tsubakita, Takashi; Kawazoe, Nobuo; Kasano, Eri

2017-03-01

Health literacy predicts health outcomes. Despite concerns surrounding the health of Japanese young adults, to date there has been no objective assessment of health literacy in this population. This study aimed to develop a Functional Health Literacy Scale for Young Adults (funHLS-YA) based on item response theory. Each item in the scale requires participants to choose the most relevant term from 3 choices in relation to a target item, thus assessing objective rather than perceived health literacy. The 20-item scale was administered to 1816 university students and 1751 responded. Cronbach's α coefficient was .73. Difficulty and discrimination parameters of each item were estimated, resulting in the exclusion of 1 item. Some items showed different difficulty parameters for male and female participants, reflecting that some aspects of health literacy may differ by gender. The current 19-item version of funHLS-YA can reliably assess the objective health literacy of Japanese young adults.
Using Student Ability and Item Difficulty for Making Defensible Pass/Fail Decisions for Borderline Grades

ERIC Educational Resources Information Center

Shulruf, Boaz; Jones, Phil; Turner, Rolf

2015-01-01

The determination of Pass/Fail decisions over Borderline grades, (i.e., grades which do not clearly distinguish between the competent and incompetent examinees) has been an ongoing challenge for academic institutions. This study utilises the Objective Borderline Method (OBM) to determine examinee ability and item difficulty, and from that…
Psychometric Properties of the Chinese Version of the Beck Depression Inventory-II Using the Rasch Model

ERIC Educational Resources Information Center

Wu, Pei-Chen; Chang, Lily

2008-01-01

The authors investigated the Chinese version of the Beck Depression Inventory-II (BDI-II-C; Chinese Behavioral Science Corporation, 2000) within the Rasch framework in terms of dimensionality, item difficulty, and category functioning. Two underlying scale dimensions, relatively high item difficulties, and a need for collapsing 2 response…
A Comparison of Three Types of Test Development Procedures Using Classical and Latent Trait Methods.

ERIC Educational Resources Information Center

Benson, Jeri; Wilson, Michael

Three methods of item selection were used to select sets of 38 items from a 50-item verbal analogies test and the resulting item sets were compared for internal consistency, standard errors of measurement, item difficulty, biserial item-test correlations, and relative efficiency. Three groups of 1,500 cases each were used for item selection. First…
Effects of Anchor Item Methods on the Detection of Differential Item Functioning within the Family of Rasch Models

ERIC Educational Resources Information Center

Wang, Wen-Chung

2004-01-01

Scale indeterminacy in analysis of differential item functioning (DIF) within the framework of item response theory can be resolved by imposing 3 anchor item methods: the equal-mean-difficulty method, the all-other anchor item method, and the constant anchor item method. In this article, applicability and limitations of these 3 methods are…
Classical Item Analysis Using Latent Variable Modeling: A Note on a Direct Evaluation Procedure

ERIC Educational Resources Information Center

Raykov, Tenko; Marcoulides, George A.

2011-01-01

A directly applicable latent variable modeling procedure for classical item analysis is outlined. The method allows one to point and interval estimate item difficulty, item correlations, and item-total correlations for composites consisting of categorical items. The approach is readily employed in empirical research and as a by-product permits…
Item Estimates under Low-Stakes Conditions: How Should Omits Be Treated?

ERIC Educational Resources Information Center

DeMars, Christine

Using data from a pilot test of science and math from students in 30 high schools, item difficulties were estimated with a one-parameter model (partial-credit model for the multi-point items). Some items were multiple-choice items, and others were constructed-response items (open-ended). Four sets of estimates were obtained: estimates for males…
Fostering a student's skill for analyzing test items through an authentic task

NASA Astrophysics Data System (ADS)

Setiawan, Beni; Sabtiawan, Wahyu Budi

2017-08-01

Analyzing test items is a skill that prospective teachers must master in order to determine the quality of the test questions they have written. The main aim of this research was to describe the effectiveness of an authentic task in fostering students' skill in analyzing test items, covering validity, reliability, item discrimination index, level of difficulty, and distractor functioning. The participants were students of the science education study program, science and mathematics faculty, Universitas Negeri Surabaya, enrolled in the assessment course. The research design was a one-group posttest design. As the treatment, the students were given an authentic task that led them to develop test items and then analyze the items like a professional assessor using Microsoft Excel and Anates Software. The data were analyzed descriptively: the students' skill data were presented and then related to theory and previous empirical studies. The research showed that the task helped the students acquire the skills. Thirty-one students achieved a perfect score on the analysis, five students achieved 97% mastery, two students had 92% mastery, and another two students achieved 89% and 79% mastery. The finding implies that students given authentic tasks that require them to perform like professionals are more likely to have acquired professional skills by the end of the course.
Conflict and metacognitive control: the mismatch-monitoring hypothesis of how others' knowledge states affect recall.

PubMed

Fraundorf, Scott H; Benjamin, Aaron S

2016-09-01

Information about others' success in remembering is frequently available. For example, students taking an exam may assess its difficulty by monitoring when others turn in their exams. In two experiments, we investigated how rememberers use this information to guide recall. Participants studied paired associates, some semantically related (and thus easier to retrieve) and some unrelated (and thus harder). During a subsequent cued recall test, participants viewed fictive information about an opponent's accuracy on each item. In Experiment 1, participants responded to each cue once before seeing the opponent's performance and once afterwards. Participants reconsidered their responses least often when the opponent's accuracy matched the item difficulty (easy items the opponent recalled, hard items the opponent forgot) and most often when the opponent's accuracy and the item difficulty mismatched. When participants responded only after seeing the opponent's performance (Experiment 2), the same mismatch conditions that led to reconsideration even produced superior recall. These results suggest that rememberers monitor whether others' knowledge states accord or conflict with their own experience, and that this information shifts how they interrogate their memory and what they recall.
Intervention for children with word-finding difficulties: a parallel group randomised control trial.

PubMed

Best, Wendy; Hughes, Lucy Mari; Masterson, Jackie; Thomas, Michael; Fedor, Anna; Roncoli, Silvia; Fern-Pollak, Liory; Shepherd, Donna-Lynn; Howard, David; Shobbrook, Kate; Kapikian, Anna

2017-07-31

The study investigated the outcome of a word-web intervention for children diagnosed with word-finding difficulties (WFDs). Twenty children age 6-8 years with WFDs confirmed by a discrepancy between comprehension and production on the Test of Word Finding-2, were randomly assigned to intervention (n = 11) and waiting control (n = 9) groups. The intervention group had six sessions of intervention which used word-webs and targeted children's meta-cognitive awareness and word-retrieval. On the treated experimental set (n = 25 items) the intervention group gained on average four times as many items as the waiting control group (d = 2.30). There were also gains on personally chosen items for the intervention group. There was little change on untreated items for either group. The study is the first randomised control trial to demonstrate an effect of word-finding therapy with children with language difficulties in mainstream school. The improvement in word-finding for treated items was obtained following a clinically realistic intervention in terms of approach, intensity and duration.
Identification of technical item flaws leads to improvement of the quality of single best Multiple Choice Questions.

PubMed

Fayyaz Khan, Humaira; Farooq Danish, Khalid; Saeed Awan, Azra; Anwar, Masood

2013-05-01

The purpose of the study was to identify technical item flaws in the multiple choice questions submitted for the final exams for the years 2009, 2010 and 2011. This descriptive analytical study was carried out at Islamic International Medical College (IIMC). The data were collected from the MCQs submitted by the faculty for the final exams for the years 2009, 2010 and 2011. The data were compiled and evaluated by a three-member assessment committee. The data were analyzed for frequencies and percentages, and the categorical data were analyzed by chi-square test. The overall percentage of flawed items was 67% for the year 2009, of which 21% were for testwiseness and 40% were for irrelevant difficulty. In the year 2010 the total item flaws were 36%, of which 11% were for testwiseness and 22% were for irrelevant difficulty. The year 2011 data showed decreased overall flaws of 21%; flaws of testwiseness were 7% and of irrelevant difficulty 11%. Technical item flaws are frequently encountered during MCQ construction, and the identification of flaws leads to improved quality of single best MCQs.

An enhanced functional ability questionnaire (faVIQ) to measure the impact of rehabilitation services on the visually impaired

PubMed Central

Wolffsohn, James Stuart; Jackson, Jonathan; Hunt, Olivia Anne; Cottriall, Charles; Lindsay, Jennifer; Gilmour, Richard; Sinclair, Anne; Harper, Robert

2014-01-01

Aim: To develop a short, enhanced functional ability Quality of Vision (faVIQ) instrument based on previous questionnaires employing comprehensive modern statistical techniques to ensure the use of an appropriate response scale, items and scoring of the visual related difficulties experienced by patients with visual impairment. Methods: Items in current quality-of-life questionnaires for the visually impaired were refined by a multi-professional group and visually impaired focus groups. The resulting 76 items were completed by 293 visually impaired patients with stable vision on two occasions separated by a month. The faVIQ scores of 75 patients with no ocular pathology were compared to 75 age and gender matched patients with visual impairment. Results: Rasch analysis reduced the faVIQ items to 27. Correlation to standard visual metrics was moderate (r=0.32-0.46) and to the NEI-VFQ was 0.48. The faVIQ was able to clearly discriminate between age and gender matched populations with no ocular pathology and visual impairment with an index of 0.983 and 95% sensitivity and 95% specificity using a cut off of 29. Conclusion: The faVIQ allows sensitive assessment of quality-of-life in the visually impaired and should support studies which evaluate the effectiveness of low vision rehabilitation services. PMID:24634868
An enhanced functional ability questionnaire (faVIQ) to measure the impact of rehabilitation services on the visually impaired.

PubMed

Wolffsohn, James Stuart; Jackson, Jonathan; Hunt, Olivia Anne; Cottriall, Charles; Lindsay, Jennifer; Gilmour, Richard; Sinclair, Anne; Harper, Robert

2014-01-01

To develop a short, enhanced functional ability Quality of Vision (faVIQ) instrument based on previous questionnaires employing comprehensive modern statistical techniques to ensure the use of an appropriate response scale, items and scoring of the visual related difficulties experienced by patients with visual impairment. Items in current quality-of-life questionnaires for the visually impaired were refined by a multi-professional group and visually impaired focus groups. The resulting 76 items were completed by 293 visually impaired patients with stable vision on two occasions separated by a month. The faVIQ scores of 75 patients with no ocular pathology were compared to 75 age and gender matched patients with visual impairment. Rasch analysis reduced the faVIQ items to 27. Correlation to standard visual metrics was moderate (r=0.32-0.46) and to the NEI-VFQ was 0.48. The faVIQ was able to clearly discriminate between age and gender matched populations with no ocular pathology and visual impairment with an index of 0.983 and 95% sensitivity and 95% specificity using a cut off of 29. The faVIQ allows sensitive assessment of quality-of-life in the visually impaired and should support studies which evaluate the effectiveness of low vision rehabilitation services.
77 FR 23282 - All Items Consumer Price Index for All Urban Consumers; United States City Average

Federal Register 2010, 2011, 2012, 2013, 2014

2012-04-18

... DEPARTMENT OF LABOR Office of the Secretary All Items Consumer Price Index for All Urban Consumers... the United States City Average All Items Consumer Price Index for All Urban Consumers (1967 = 100... Price Index for All Urban Consumers thus increased 356.2 percent from its 1974 annual average of 100 to...
A Review of Classical Methods of Item Analysis.

ERIC Educational Resources Information Center

French, Christine L.

Item analysis is a very important consideration in the test development process. It is a statistical procedure to analyze test items that combines methods used to evaluate the important characteristics of test items, such as difficulty, discrimination, and distractibility of the items in a test. This paper reviews some of the classical methods for…
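Two of the classical statistics named in this abstract can be illustrated with a minimal sketch. It assumes dichotomously scored (0/1) items, takes difficulty as the proportion of examinees answering correctly, and uses the common upper-lower group discrimination index with the conventional 27% tails. The function names are hypothetical, and the paper itself may review different or additional formulas.

```python
def item_difficulty(item_scores):
    """Classical difficulty index: proportion of examinees scoring 1 on the item."""
    return sum(item_scores) / len(item_scores)


def discrimination_index(item_scores, total_scores, fraction=0.27):
    """Upper-lower discrimination index.

    Examinees are ranked by total test score; the index is the item's
    proportion correct in the top `fraction` of examinees minus the
    proportion correct in the bottom `fraction`.
    """
    n = len(total_scores)
    k = max(1, round(n * fraction))  # size of each tail group
    order = sorted(range(n), key=lambda i: total_scores[i])
    lower, upper = order[:k], order[-k:]
    p_upper = sum(item_scores[i] for i in upper) / k
    p_lower = sum(item_scores[i] for i in lower) / k
    return p_upper - p_lower
```

With made-up data for ten examinees, `item_scores = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]` and `total_scores = [9, 8, 8, 7, 6, 5, 4, 4, 2, 1]`, the difficulty index is 0.5 and the discrimination index is 1.0, since all three top scorers answered the item correctly and none of the three bottom scorers did.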
Detecting a Gender-Related DIF Using Logistic Regression and Transformed Item Difficulty

ERIC Educational Resources Information Center

Abedlaziz, Nabeel; Ismail, Wail; Hussin, Zaharah

2011-01-01

Test items are designed to provide information about the examinees. Difficult items are designed to be more demanding and easy items are less so. However, sometimes, test items carry with their demands other than those intended by the test developer (Scheuneman & Gerritz, 1990). When personal attributes such as gender systematically affect…
Adaptable Learning Assistant for Item Bank Management

ERIC Educational Resources Information Center

Nuntiyagul, Atorn; Naruedomkul, Kanlaya; Cercone, Nick; Wongsawang, Damras

2008-01-01

We present PKIP, an adaptable learning assistant tool for managing question items in item banks. PKIP is not only able to automatically assist educational users to categorize the question items into predefined categories by their contents but also to correctly retrieve the items by specifying the category and/or the difficulty level. PKIP adapts…
Expertise sensitive item selection.

PubMed

Chow, P; Russell, H; Traub, R E

2000-12-01

In this paper we describe and illustrate a procedure for selecting items from a large pool for a certification test. The proposed procedure, which is intended to improve the alignment of the certification test with on-the-job performance, is based on an expertise sensitive index. This index for an item is the difference between the item's p values for experts and novices. An example is provided of the application of the index for selecting items to be used in certifying bakers.
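The index defined in this abstract, the difference between an item's p values for experts and for novices, can be sketched as below. The function names, the toy item pool, and the rule of keeping the highest-ranked items are assumptions made for illustration; the abstract does not state how the authors turn the index into a selection.

```python
def p_value(item_scores):
    """Classical item p value: proportion of a group answering correctly (0/1 scores)."""
    return sum(item_scores) / len(item_scores)


def expertise_sensitive_index(expert_scores, novice_scores):
    """Item p value among experts minus item p value among novices."""
    return p_value(expert_scores) - p_value(novice_scores)


def select_items(pool, n_items):
    """Return the n_items item ids with the largest index.

    pool maps an item id to a pair (expert_scores, novice_scores).
    Ranking by the index is an assumed selection rule, not the paper's.
    """
    ranked = sorted(
        pool,
        key=lambda item_id: expertise_sensitive_index(*pool[item_id]),
        reverse=True,
    )
    return ranked[:n_items]
```

For a hypothetical pool `{"knead": ([1, 1, 1, 1], [0, 1, 0, 0]), "proof": ([1, 1, 0, 1], [1, 1, 0, 1]), "score": ([1, 0, 1, 1], [0, 0, 1, 0])}`, the indices are 0.75, 0.0 and 0.5, so `select_items(pool, 2)` returns `["knead", "score"]`. An item that experts and novices answer equally well, such as "proof" here, carries no information about expertise under this index.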
Two-item same/different discrimination in rhesus monkeys (Macaca mulatta).

PubMed

Basile, Benjamin M; Moylan, Emily J; Charles, David P; Murray, Elisabeth A

2015-11-01

Almost all nonhuman animals can recognize when one item is the same as another item. It is less clear whether nonhuman animals possess abstract concepts of "same" and "different" that can be divorced from perceptual similarity. Pigeons and monkeys show inconsistent performance, and often surprising difficulty, in laboratory tests of same/different learning that involve only two items. Previous results from tests using multi-item arrays suggest that nonhumans compute sameness along a continuous scale of perceptual variability, which would explain the difficulty of making two-item same/different judgments. Here, we provide evidence that rhesus monkeys can learn a two-item same/different discrimination similar to those on which monkeys and pigeons have previously failed. Monkeys' performance transferred to novel stimuli and was not affected by perceptual variations in stimulus size, rotation, view, or luminance. Success without the use of multi-item arrays, and the lack of effect of perceptual variability, suggests a computation of sameness that is more categorical, and perhaps more abstract, than previously thought.
Item analysis of university-wide multiple choice objective examinations: the experience of a Nigerian private university.

PubMed

Odukoya, Jonathan A; Adekeye, Olajide; Igbinoba, Angie O; Afolabi, A

2018-01-01

Teachers and students worldwide often dance to the tune of tests and examinations. Assessments are powerful tools for catalyzing the achievement of educational goals, especially if done rightly. One of the tools for 'doing it rightly' is item analysis. The core objective of this study, therefore, was to ascertain the item difficulty and distractive indices of the university-wide courses. Between 112 and 1956 undergraduate students participated in this study. With the use of secondary data, the ex-post facto design was adopted for this project. In virtually all cases, the majority of the items (ranging between 65% and 97% of the 70 items fielded in each course) did not meet psychometric standards in terms of difficulty and distractive indices and consequently needed to be moderated or deleted. Considering the importance of these courses, the need to apply item analyses when developing these tests was emphasized.
Relevance of Item Analysis in Standardizing an Achievement Test in Teaching of Physical Science in B.Ed Syllabus

ERIC Educational Resources Information Center

Marie, S. Maria Josephine Arokia; Edannur, Sreekala

2015-01-01

This paper focused on the analysis of test items constructed in the paper of teaching Physical Science for B.Ed. class. It involved the analysis of difficulty level and discrimination power of each test item. Item analysis allows selecting or omitting items from the test, but more importantly item analysis is a tool to help the item writer improve…
Fitting the Rasch Model to Account for Variation in Item Discrimination

ERIC Educational Resources Information Center

Weitzman, R. A.

2009-01-01

Building on the Kelley and Gulliksen versions of classical test theory, this article shows that a logistic model having only a single item parameter can account for varying item discrimination, as well as difficulty, by using item-test correlations to adjust incorrect-correct (0-1) item responses prior to an initial model fit. The fit occurs…
Effects of Item Exposure for Conventional Examinations in a Continuous Testing Environment.

ERIC Educational Resources Information Center

Hertz, Norman R.; Chinn, Roberta N.

This study explored the effect of item exposure on two conventional examinations administered as computer-based tests. A principal hypothesis was that item exposure would have little or no effect on average difficulty of the items over the course of an administrative cycle. This hypothesis was tested by exploring conventional item statistics and…
Efforts Toward the Development of Unbiased Selection and Assessment Instruments.

ERIC Educational Resources Information Center

Rudner, Lawrence M.

Investigations into item bias provide an empirical basis for the identification and elimination of test items which appear to measure different traits across populations or cultural groups. The Psychometric rationales for six approaches to the identification of biased test items are reviewed: (1) Transformed item difficulties: within-group…
Automatic Item Generation of Probability Word Problems

ERIC Educational Resources Information Center

Holling, Heinz; Bertling, Jonas P.; Zeuch, Nina

2009-01-01

Mathematical word problems represent a common item format for assessing student competencies. Automatic item generation (AIG) is an effective way of constructing many items with predictable difficulties, based on a set of predefined task parameters. The current study presents a framework for the automatic generation of probability word problems…
Screening of Cognitive Impairment in Schizophrenia: Reliability, Sensitivity, and Specificity of the Repeatable Battery for the Assessment of Neuropsychological Status in a Spanish Sample.

PubMed

De la Torre, Gabriel G; Perez, Maria J; Ramallo, Miguel A; Randolph, Christopher; González-Villegas, Macarena Bernal

2016-04-01

In recent years, a number of studies focusing on the evaluation of neuropsychological deficits in individuals with schizophrenia have shown deficits that include several cognitive functions. Attention deficits as well as memory or executive function deficits are common in this kind of disorder together with sustained attention problems, working memory deficiencies, and problem-solving difficulties, among many others. Currently, the Repeatable Battery for the Assessment of Neuropsychological Status (RBANS) is gaining special importance in the evaluation of the cognitive deficits associated with schizophrenia. In this article, we describe an RBANS screening in a sample of 88 Spanish patients diagnosed with schizophrenia. We also aimed to check the battery's reliability, sensitivity, and specificity in the studied sample. We performed a comparative study with 88 healthy participants. The results showed a reliability index value of α = .795 and an item value of α = .762. For total test reliability, we obtained an index value of α = .761 and an item value of α = .762. Sensitivity score was 87.5% and specificity 86.4%. RBANS obtained good reliability, sensitivity, and specificity scores and represents a good screening tool in detecting cognitive deficits associated with schizophrenia. © The Author(s) 2015.
The Strengths and Difficulties Questionnaire (SDQ) Revisited in a French-Speaking Population: Proposition of a Reduced Version of the Parent SDQ

ERIC Educational Resources Information Center

Chauvin, Bruno; Leonova, Tamara

2016-01-01

Key concerns about the psychometric properties of the 25-item version of the Strengths and Difficulties Questionnaire (SDQ) have consistently been raised in the literature. The present study aimed at examining the meaningfulness of an alternative model to the SDQ in which 7 problematic items are excluded. French-speaking parents of 262 boys and…
The quadratic relationship between difficulty of intelligence test items and their correlations with working memory.

PubMed

Smolen, Tomasz; Chuderski, Adam

2015-01-01

Fluid intelligence (Gf) is a crucial cognitive ability that involves abstract reasoning in order to solve novel problems. Recent research demonstrated that Gf strongly depends on the individual effectiveness of working memory (WM). We investigated a popular claim that if the storage capacity underlay the WM-Gf correlation, then such a correlation should increase with an increasing number of items or rules (load) in a Gf-test. As often no such link is observed, on that basis the storage-capacity account is rejected, and alternative accounts of Gf (e.g., related to executive control or processing speed) are proposed. Using both analytical inference and numerical simulations, we demonstrated that the load-dependent change in correlation is primarily a function of the amount of floor/ceiling effect for particular items. Thus, the item-wise WM correlation of a Gf-test depends on its overall difficulty, and the difficulty distribution across its items. When the early test items yield huge ceiling, but the late items do not approach floor, that correlation will increase throughout the test. If the early items locate themselves between ceiling and floor, but the late items approach floor, the respective correlation will decrease. For a hallmark Gf-test, the Raven-test, whose items span from ceiling to floor, the quadratic relationship is expected, and it was shown empirically using a large sample and two types of WMC tasks. In consequence, no changes in correlation due to varying WM/Gf load, or lack of them, can yield an argument for or against any theory of WM/Gf. Moreover, as the mathematical properties of the correlation formula make it relatively immune to ceiling/floor effects for overall moderate correlations, only minor changes (if any) in the WM-Gf correlation should be expected for many psychological tests.
Conflict and metacognitive control: The mismatch-monitoring hypothesis of how others’ knowledge states affect recall

PubMed Central

Fraundorf, Scott H.; Benjamin, Aaron S.

2015-01-01

Information about others’ success in remembering is frequently available. For example, students taking an exam may assess its difficulty by monitoring when others turn in their exams. In two experiments, we investigated how rememberers use this information to guide recall. Participants studied paired associates, some semantically related (and thus easier to retrieve) and some unrelated (and thus harder). During a subsequent cued recall test, participants viewed fictive information about an opponent’s accuracy on each item. In Experiment 1, participants responded to each cue once before seeing the opponent’s performance and once afterwards. Participants reconsidered their responses least often when the opponent’s accuracy matched the item difficulty (easy items the opponent recalled, hard items the opponent forgot) and most often when the opponent’s accuracy and the item difficulty mismatched. When participants responded only after seeing the opponent’s performance (Experiment 2), the same mismatch conditions that led to reconsideration even produced superior recall. These results suggest that rememberers monitor whether others’ knowledge states accord or conflict with their own experience, and that this information shifts how they interrogate their memory and what they recall. PMID:26247369
Rasch Analysis of the Power as Knowing Participation in Change Tool--the Brazilian version.

PubMed

Guedes, Erika de Souza; Orozco-Vargas, Luiz Carlos; Turrini, Ruth Natália Teresa; de Sousa, Regina Márcia Cardoso; dos Santos, Mariana Alvina; da Cruz, Diná de Almeida Lopes Monteiro

2013-01-01

The objective of this study was to evaluate the items contained in the Brazilian version of the Power as Knowing Participation in Change Tool (PKPCT) by investigating the psychometric properties of the questionnaire through Rasch analysis. Data from 952 nursing assistants and 627 baccalaureate nurses were analyzed (average age 44.1 (SD=9.5); 13.0% men). The subscales Choices, Awareness, Freedom and Involvement were tested separately and presented unidimensionality; the response categories were collapsed from 7 to 3 levels and the items fit the model well, except for the following/leading item, for which the infit and outfit values were above 1.4; this item also presented Differential Item Functioning (DIF) according to the participant's role. The reliability of the items was 0.99 and the reliability of the participants ranged from 0.80 to 0.84 in the subscales. Items with extremely high levels of difficulty were not identified. The PKPCT should not be viewed as unidimensional; items with extremely high levels of difficulty need to be created for the scale, and the differential functioning of some items has to be further investigated.
Development and validity of a questionnaire to test the knowledge of primary care personnel regarding nutrition in obese adolescents

PubMed Central

2013-01-01

Background: In light of its epidemic proportions in developed and developing countries, obesity is considered a serious public health issue. In order to increase knowledge concerning the ability of health care professionals in caring for obese adolescents and adopt more efficient preventive and control measures, a questionnaire was developed and validated to assess non-dietitian health professionals regarding their Knowledge of Nutrition in Obese Adolescents (KNOA). Methods: The development and evaluation of a questionnaire to assess the knowledge of primary care practitioners with respect to nutrition in obese adolescents was carried out in five phases, as follows: 1) definition of study dimensions; 2) development of 42 questions and preliminary evaluation of the questionnaire by a panel of experts; 3) characterization and selection of primary care practitioners (35 dietitians and 265 non-dietitians) and measurement of questionnaire criteria by contrasting the responses of dietitians and non-dietitians; 4) reliability assessment by question exclusion based on item difficulty (too easy and too difficult for non-dietitian practitioners), item discrimination, internal consistency and reproducibility index determination; and 5) scoring the completed questionnaires. Results: Dietitians obtained higher scores than non-dietitians (Mann–Whitney U test, P < 0.05), confirming the validity of the questionnaire criteria. Items were discriminated by correlating the score for each item with the total score, using a minimum of 0.2 as a correlation coefficient cutoff value. Item difficulty was controlled by excluding questions answered correctly by more than 90% of the non-dietitian subjects (too easy) or by less than 10% of them (too difficult). The final questionnaire contained 26 of the original 42 questions, increasing Cronbach’s α value from 0.788 to 0.807. Test-retest agreement between respondents was classified as good to very good (Kappa test, >0.60). Conclusion: The KNOA questionnaire developed for primary care practitioners is a valid, consistent and suitable instrument that can be applied over time, making it a promising tool for developing and guiding public health policies. PMID:23865564
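The item-screening rules described in this abstract (drop items answered correctly by more than 90% or fewer than 10% of respondents, keep items whose item-total correlation is at least 0.2, then report Cronbach's α) can be sketched as follows. This is an illustrative sketch only: the function names and the toy 0/1 score matrix are hypothetical, and the uncorrected item-total correlation is assumed because the abstract does not say whether a corrected version was used.

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items score matrix."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_variances = scores.var(axis=0, ddof=1).sum()
    total_variance = scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1.0 - item_variances / total_variance)

def screen_items(scores, too_easy=0.90, too_hard=0.10, min_r=0.2):
    """Return indices of items that pass the difficulty and discrimination cutoffs."""
    scores = np.asarray(scores, dtype=float)
    difficulty = scores.mean(axis=0)   # proportion answering correctly
    total = scores.sum(axis=1)         # total score per respondent
    kept = []
    for j in range(scores.shape[1]):
        if difficulty[j] > too_easy or difficulty[j] < too_hard:
            continue                   # too easy or too difficult
        r = np.corrcoef(scores[:, j], total)[0, 1]
        if r >= min_r:
            kept.append(j)
    return kept

# Hypothetical data: 6 respondents, 4 items scored 0/1.
demo = [
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
]
kept = screen_items(demo)
print("items kept:", kept)
print("alpha of kept items:", cronbach_alpha(np.asarray(demo)[:, kept]))
```

In the toy data, item 0 is dropped as too easy and item 3 is dropped because it correlates negatively with the total score.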

Causal attribution for success and failure in mathematics among MDAB pre-diploma students

NASA Astrophysics Data System (ADS)

Maidinsah, Hamidah; Embong, Rokiah; Wahab, Zubaidah Abd

2014-07-01

The Program Mengubah Destini Anak Bangsa (MDAB) is a pre-diploma programme catering to SPM school leavers who do not meet the minimum requirement to enter any of UiTM's diploma programmes. The study aims to evaluate the perceptions of MDAB students toward the main causal attribution factors underlying students' success and failure in mathematics. The research sample comprised 482 students from five UiTM branch campuses. The research instrument was the GALUS questionnaire, consisting of 36 items based on the Weiner Attribution Theory. The four causal attribution factors evaluated for success and failure were ability, effort, question difficulty and environment. The GALUS reliability index was 0.93. The research found that effort appears to be the main causal attribution factor in students' success and failure in mathematics, followed by environment, question difficulty and ability. High-achieving students strongly agreed that the ability factor influenced their success, while low-achieving students strongly agreed that all attribution factors influenced their failures in mathematics.
Do Reading Experts Agree with MCAT Verbal Reasoning Item Classifications?

ERIC Educational Resources Information Center

Jackson, Evelyn W.; And Others

1994-01-01

Examined whether expert raters (n=5) could agree about classification of Medical College Admission Test (MCAT) items and whether they agreed with MCAT student manual in labeling skill being measured by each test item. Results revealed difficulties in replicating authors' labeling of skills for reading items on practice test provided with 1991 MCAT…
Measuring Student Learning with Item Response Theory

ERIC Educational Resources Information Center

Lee, Young-Jin; Palazzo, David J.; Warnakulasooriya, Rasil; Pritchard, David E.

2008-01-01

We investigate short-term learning from hints and feedback in a Web-based physics tutoring system. Both the skill of students and the difficulty and discrimination of items were determined by applying item response theory (IRT) to the first answers of students who are working on for-credit homework items in an introductory Newtonian physics…
Combining the Best of Two Standard Setting Methods: The Ordered Item Booklet Angoff

ERIC Educational Resources Information Center

Smith, Russell W.; Davis-Becker, Susan L.; O'Leary, Lisa S.

2014-01-01

This article describes a hybrid standard setting method that combines characteristics of the Angoff (1971) and Bookmark (Mitzel, Lewis, Patz & Green, 2001) methods. The proposed approach utilizes strengths of each method while addressing weaknesses. An ordered item booklet, with items sorted based on item difficulty, is used in combination…
Comparative Racial Analysis of Enlisted Advancement Exams: Item-Difficulty.

DTIC Science & Technology

1975-07-01

Item analysis; Promotion; Racial comparison; Equal opportunity ... improving equal opportunity in career growth for minority groups. The study of exam item-difficulty levels is the first of a series of technical reports ... under Exploratory Development Task Area PF55.521.032 (Contemporary Social Issues). J. J. Clarkin, Commanding Officer. Summary. Purpose: A number of ...
Location Indices for Ordinal Polytomous Items Based on Item Response Theory. Research Report. ETS RR-15-20

ERIC Educational Resources Information Center

Ali, Usama S.; Chang, Hua-Hua; Anderson, Carolyn J.

2015-01-01

Polytomous items are typically described by multiple category-related parameters; situations, however, arise in which a single index is needed to describe an item's location along a latent trait continuum. Situations in which a single index would be needed include item selection in computerized adaptive testing or test assembly. Therefore single…
Understanding Orgasmic Difficulty in Women.

PubMed

Rowland, David L; Kolba, Tiffany N

2016-08-01

Women's primary issue with the orgasmic phase is usually difficulty reaching orgasm. The aims of this study were to identify predictors of orgasmic difficulty in women within the context of a partnered sexual experience; to assess the relation between orgasmic difficulty and self-reported levels of sexual desire or interest and arousal in women; and to assess the interrelations among three dimensions of orgasmic response during partnered sex: self-reported time to reach orgasm, general difficulty or ease of reaching orgasm, and level of distress or concern. Drawing from a community-based sample using the Internet, 866 women were queried on a 26-item survey regarding their difficulty reaching orgasm during partnered sex. Four hundred sixteen women who indicated difficulty also responded to items assessing arousal and desire difficulties, level of distress about their condition, and their estimated time to reach orgasm. The main outcome measures were answers to the 26-item survey on women's difficulty reaching orgasm during partnered sex. Age, arousal difficulty, and lubrication difficulty predicted difficulty reaching orgasm in the overall sample. In the subsample of women reporting difficulty, approximately half reported issues with arousal. Women with arousal problems reported greater difficulty reaching orgasm but did not differ from those without arousal problems on measurements of orgasm latency or levels of distress. Slightly more than half the women experiencing difficulty reaching orgasm were distressed by their condition; distressed women reported greater difficulty reaching orgasm and longer latencies to orgasm than non-distressed counterparts. They also reported lower satisfaction with their sexual relationship. This study indicates the importance of assessing multiple parameters when investigating orgasmic problems in women, including arousal issues, levels of distress, and latency to orgasm. Results also clarify that women with arousal problems do not differ substantially from those without arousal problems; in contrast, women distressed by their condition differ from non-distressed women along some critical dimensions. Although orgasmic problems decreased with age, the overall relation of this variable to distress, arousal, and latency to orgasm was essentially unchanged across age groups. Copyright © 2016 International Society for Sexual Medicine. Published by Elsevier Inc. All rights reserved.
Item response theory-based validation of a short form of the Eating Behavior Scale for Japanese adults

PubMed Central

Tayama, Jun; Ogawa, Sayaka; Takeoka, Atsushi; Kobayashi, Masakazu; Shirabe, Susumu

2017-01-01

Obesity has become a serious social problem in industrialized countries in recent years. Clinically, although the evaluation of dietary behavior abnormalities is as important as any method of risk assessment for obesity, almost all existing scales contain many items, which may create practical difficulties in clinical use. In this study, we aimed to prepare a short questionnaire to assess the dietary behavior abnormalities related to obesity. A total of 1032 individuals aged 20 to 59 years participated in the present study. Using item response theory (IRT), we selected the items for a short version from among the 30 items of the Sakata Eating Behavior Scale (EBS), which is widely used in Japan. As a result of the IRT-based analysis of the original 30-item version, 7 items were adopted as the short version. The correlation between the total score of the original EBS and the EBS short form was extremely high (r = 0.93, P = .001). In examining the criterion validity, for all participants (n = 1032), males (n = 516), and females (n = 516), the correlation coefficients between the total score of the EBS short form and body mass index (BMI) were r = 0.26, r = 0.28, and r = 0.28, respectively. In a receiver operating characteristic (ROC) analysis with obesity (BMI > 25 kg/m2) as the dependent variable, the area under the curve was significantly higher for the 7-item version than for the total score of the original items (P = .0005). In conclusion, the 7-item EBS short form was created, and it was found to be a reliable and valid measure that can be used as an indicator of obesity in both clinical and research settings. PMID:29049248
Separating relational from item load effects in paired recognition: temporoparietal and middle frontal gyral activity with increased associates, but not items during encoding and retention.

PubMed

Phillips, Steven; Niki, Kazuhisa

2002-10-01

Working memory is affected by items stored and the relations between them. However, separating these factors has been difficult, because increased items usually accompany increased associations/relations. Hence, some have argued, relational effects are reducible to item effects. We overcome this problem by manipulating index length: the fewest number of item positions at which there is a unique item, or tuple of items (if length >1), for every instance in the relational (memory) set. Longer indexes imply greater similarity (number of shared items) between instances and higher load on encoding processes. Subjects were given lists of study pairs and asked to make a recognition judgement. The number of unique items and index length in the three list conditions were: (1) AB, CD: four/one; (2) AB, CD, EF: six/one; and (3) AB, AD, CB: four/two, respectively. Japanese letters were used in Experiments 1 (kanji-ideograms) and 2 (hiragana-phonograms); numbers in Experiment 3; and shapes generated from Fourier descriptors in Experiment 4. Across all materials, right dominant temporoparietal and middle frontal gyral activity was found with increased index length, but not items during study. In Experiment 5, a longer delay was used to isolate retention effects in the absence of visual stimuli. Increased left hemispheric activity was observed in the precuneus, middle frontal gyrus, and superior temporal gyrus with increased index length for the delay period. These results show that relational load is not reducible to item load.
Estimating the Number of Examinees Who Did Not Reach the Last Item of a Section.

ERIC Educational Resources Information Center

Wainer, Howard

It is important to estimate the number of examinees who reached a test item, because item difficulty is defined by the number who answered correctly divided by the number who reached the item. A new method is presented and compared to the previously used definition of three categories of response to an item: (1) answered; (2) omitted--a…
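The definition quoted above (item difficulty as the number who answered correctly divided by the number who reached the item) can be written out as a short worked example. This is a hedged sketch and not the estimation method the report proposes: it assumes, as a simple convention, that an examinee "reached" an item if they gave any response at or after that position, so only a trailing run of blanks counts as not reached. The function names and data are hypothetical.

```python
def reached_count(responses, item):
    """Count examinees with a non-blank response at or after `item`.

    Assumption (not from the source): trailing blanks mean "not reached",
    while blanks followed by a later response mean "omitted".
    """
    count = 0
    for row in responses:
        last_answered = max(
            (i for i, r in enumerate(row) if r is not None), default=-1
        )
        if last_answered >= item:
            count += 1
    return count

def item_difficulty(responses, key, item):
    """Proportion correct among examinees who reached the item."""
    reached = reached_count(responses, item)
    if reached == 0:
        return None  # nobody reached the item, so difficulty is undefined
    correct = sum(1 for row in responses if row[item] == key[item])
    return correct / reached

# Hypothetical three-item section; None marks a blank response.
key = ["A", "B", "C"]
responses = [
    ["A", "B", "C"],      # answered everything
    ["A", None, "C"],     # omitted item 1 but reached item 2
    ["A", "B", None],     # did not reach item 2
    ["B", None, None],    # reached only item 0
]
for j in range(len(key)):
    print(f"item {j}: difficulty {item_difficulty(responses, key, j)}")
```

Dividing by the number who reached the item, rather than by all examinees, keeps late items from looking harder simply because fewer people got to them.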
A Comparison of Traditional Test Blueprinting and Item Development to Assessment Engineering in a Licensure Context

ERIC Educational Resources Information Center

Masters, James S.

2010-01-01

With the need for larger and larger banks of items to support adaptive testing and to meet security concerns, large-scale item generation is a requirement for many certification and licensure programs. As part of the mass production of items, it is critical that the difficulty and the discrimination of the items be known without the need for…
The PROactive instruments to measure physical activity in patients with chronic obstructive pulmonary disease.

PubMed

Gimeno-Santos, Elena; Raste, Yogini; Demeyer, Heleen; Louvaris, Zafeiris; de Jong, Corina; Rabinovich, Roberto A; Hopkinson, Nicholas S; Polkey, Michael I; Vogiatzis, Ioannis; Tabberer, Maggie; Dobbels, Fabienne; Ivanoff, Nathalie; de Boer, Willem I; van der Molen, Thys; Kulich, Karoly; Serra, Ignasi; Basagaña, Xavier; Troosters, Thierry; Puhan, Milo A; Karlsson, Niklas; Garcia-Aymerich, Judith

2015-10-01

No current patient-centred instrument captures all dimensions of physical activity in chronic obstructive pulmonary disease (COPD). Our objective was item reduction and initial validation of two instruments to measure physical activity in COPD. Physical activity was assessed in a 6-week, randomised, two-way cross-over, multicentre study using PROactive draft questionnaires (daily and clinical visit versions) and two activity monitors. Item reduction followed an iterative process including classical and Rasch model analyses, and input from patients and clinical experts. 236 COPD patients from five European centres were included. Results indicated the concept of physical activity in COPD had two domains, labelled "amount" and "difficulty". After item reduction, the daily PROactive instrument comprised nine items and the clinical visit contained 14. Both demonstrated good model fit (person separation index >0.7). Confirmatory factor analysis supported the bidimensional structure. Both instruments had good internal consistency (Cronbach's α>0.8), test-retest reliability (intraclass correlation coefficient ≥0.9) and exhibited moderate-to-high correlations (r>0.6) with related constructs and very low correlations (r<0.3) with unrelated constructs, providing evidence for construct validity. Daily and clinical visit "PROactive physical activity in COPD" instruments are hybrid tools combining a short patient-reported outcome questionnaire and two activity monitor variables which provide simple, valid and reliable measures of physical activity in COPD patients. Copyright ©ERS 2015.
A Comparison of Alternate-Choice and True-False Item Forms Used in Classroom Examinations.

ERIC Educational Resources Information Center

Maihoff, N. A.; Mehrens, Wm. A.

A comparison is presented of alternate-choice and true-false item forms used in an undergraduate natural science course. The alternate-choice item is a modified two-choice multiple-choice item in which the two responses are included within the question stem. This study (1) compared the difficulty level, discrimination level, reliability, and…
Measuring the Instructional Sensitivity of ESL Reading Comprehension Items.

ERIC Educational Resources Information Center

Brutten, Sheila R.; And Others

A study attempted to estimate the instructional sensitivity of items in three reading comprehension tests in English as a second language (ESL). Instructional sensitivity is a test-item construct defined as the tendency for a test item to vary in difficulty as a function of instruction. Similar tasks were given to readers at different proficiency…
Explaining and Controlling for the Psychometric Properties of Computer-Generated Figural Matrix Items

ERIC Educational Resources Information Center

Freund, Philipp Alexander; Hofer, Stefan; Holling, Heinz

2008-01-01

Figural matrix items are a popular task type for assessing general intelligence (Spearman's g). Items of this kind can be constructed rationally, allowing the implementation of computerized generation algorithms. In this study, the influence of different task parameters on the degree of difficulty in matrix items was investigated. A sample of N =…
Item Difficulty in the Evaluation of Computer-Based Instruction: An Example from Neuroanatomy

ERIC Educational Resources Information Center

Chariker, Julia H.; Naaz, Farah; Pani, John R.

2012-01-01

This article reports large item effects in a study of computer-based learning of neuroanatomy. Outcome measures of the efficiency of learning, transfer of learning, and generalization of knowledge diverged by a wide margin across test items, with certain sets of items emerging as particularly difficult to master. In addition, the outcomes of…
Estimation of Item Response Theory Parameters in the Presence of Missing Data

ERIC Educational Resources Information Center

Finch, Holmes

2008-01-01

Missing data are a common problem in a variety of measurement settings, including responses to items on both cognitive and affective assessments. Researchers have shown that such missing data may create problems in the estimation of item difficulty parameters in the Item Response Theory (IRT) context, particularly if they are ignored. At the same…
The consequences of language proficiency and difficulty of lexical access for translation performance and priming.

PubMed

Francis, Wendy S; Tokowicz, Natasha; Kroll, Judith F

2014-01-01

Repetition priming was used to assess how proficiency and the ease or difficulty of lexical access influence bilingual translation. Two experiments, conducted at different universities with different Spanish-English bilingual populations and materials, showed repetition priming in word translation for same-direction and different-direction repetitions. Experiment 1, conducted in an English-dominant environment, revealed an effect of translation direction but not of direction match, whereas Experiment 2, conducted in a more balanced bilingual environment, showed an effect of direction match but not of translation direction. A combined analysis on the items common to both studies revealed that bilingual proficiency was negatively associated with response time (RT), priming, and the degree of translation asymmetry in RTs and priming. An item analysis showed that item difficulty was positively associated with RTs, priming, and the benefit of same-direction over different-direction repetition. Thus, although both participant accuracy and item accuracy are indices of learning, they have distinct effects on translation RTs and on the learning that is captured by the repetition-priming paradigm.
The second version of the L. V. Prasad-functional vision questionnaire.

PubMed

Gothwal, Vijaya K; Sumalini, Rebecca; Bharani, Seelam; Reddy, Shailaja P; Bagga, Deepak K

2012-11-01

The L. V. Prasad-Functional Vision Questionnaire (LVP-FVQ) was developed using Rasch analysis to assess self-reported difficulties in performing daily tasks in school children with visual impairment (VI) in India. However, the LVP-FVQ has psychometric problems of inadequate measurement precision and lack of detailed assessment of dimensionality. Furthermore, items pertaining to use of technology are lacking. The aim of this study was to present the development and validation of the second version of LVP-FVQ (LVP-FVQ II). Development of LVP-FVQ II involved extracting items from other similar questionnaires (albeit developed for Western populations) and focus group discussions of children with VI and their parents that resulted in a 32-item pilot questionnaire. Overall, six items from the LVP-FVQ were retained. The questionnaire underwent pilot testing in 25 such children, following which a 27-item LVP-FVQ II emerged, and this was administered to 150 children with VI. Response to each item was rated on a three-category scale. Rasch analysis was used to validate the LVP-FVQ II. Participants used the rating scale as intended. Four mobility-related items required deletion, as these did not contribute toward measurement of a single construct, indicating a secondary dimension. Deletion of the four items resulted in the 23-item unidimensional LVP-FVQ II, with good measurement precision, effective targeting of item difficulty to participant ability, and lack of notable differential item functioning. The LVP-FVQ II has high reliability, indicating that it is effectively able to discriminate between visual disability of school children in India, and is valid across age, gender, duration of VI, and location of residence. Given the superior measurement properties and the interval-level scores, the LVP-FVQ II appears to offer advantages over LVP-FVQ in assessment of difficulties in performing daily tasks in this population. It can be adapted for use in other developing countries.
Item Difficulty in the Evaluation of Computer-Based Instruction: An Example from Neuroanatomy

PubMed Central

Chariker, Julia H.; Naaz, Farah; Pani, John R.

2012-01-01

This article reports large item effects in a study of computer-based learning of neuroanatomy. Outcome measures of the efficiency of learning, transfer of learning, and generalization of knowledge diverged by a wide margin across test items, with certain sets of items emerging as particularly difficult to master. In addition, the outcomes of comparisons between instructional methods changed with the difficulty of the items to be learned. More challenging items better differentiated between instructional methods. This set of results is important for two reasons. First, it suggests that instruction may be more efficient if sets of consistently difficult items are the targets of instructional methods particularly suited to them. Second, there is wide variation in the published literature regarding the outcomes of empirical evaluations of computer-based instruction. As a consequence, many questions arise as to the factors that may affect such evaluations. The present paper demonstrates that the level of challenge in the material that is presented to learners is an important factor to consider in the evaluation of a computer-based instructional system. PMID:22231801

Item difficulty in the evaluation of computer-based instruction: an example from neuroanatomy.

PubMed

Chariker, Julia H; Naaz, Farah; Pani, John R

2012-01-01

This article reports large item effects in a study of computer-based learning of neuroanatomy. Outcome measures of the efficiency of learning, transfer of learning, and generalization of knowledge diverged by a wide margin across test items, with certain sets of items emerging as particularly difficult to master. In addition, the outcomes of comparisons between instructional methods changed with the difficulty of the items to be learned. More challenging items better differentiated between instructional methods. This set of results is important for two reasons. First, it suggests that instruction may be more efficient if sets of consistently difficult items are the targets of instructional methods particularly suited to them. Second, there is wide variation in the published literature regarding the outcomes of empirical evaluations of computer-based instruction. As a consequence, many questions arise as to the factors that may affect such evaluations. The present article demonstrates that the level of challenge in the material that is presented to learners is an important factor to consider in the evaluation of a computer-based instructional system. Copyright © 2011 American Association of Anatomists.
Redintegration, task difficulty, and immediate serial recall tasks.

PubMed

Ritchie, Gabrielle; Tolan, Georgina Anne; Tehan, Gerald

2015-03-01

While current theoretical models remain somewhat inconclusive in their explanation of short-term memory (STM), many theories suggest at least a contribution of long-term memory (LTM) to the short-term system. A number of researchers refer to this process as redintegration (e.g., Schweickert, 1993). Under short-term recall conditions, the current study investigated the effects of redintegration and task difficulty in order to extend research conducted by Neale and Tehan (2007). Thirty participants in Experiment 1 and 26 participants in Experiment 2 completed a serial recall task in which retention interval, presentation rate, and articulatory suppression were used to modify task difficulty. Redintegration was examined by manipulating the characteristics of the to-be-remembered items; lexicality in Experiment 1 and wordlikeness in Experiment 2. Responses were scored based on correct-in-position recall, item scoring, and order accuracy scoring. In line with the Neale and Tehan results, as the difficulty of the task increased so did the effects of redintegration. This was evident in that the advantage for words in Experiment 1 and wordlikeness in Experiment 2 decreased as task difficulty increased. This relationship was observed for item but not order memory, and findings were discussed in relation to the theory of redintegration. (PsycINFO Database Record (c) 2015 APA, all rights reserved).
An index for evaluating difficulty of Chewing Index for chewable tablets.

PubMed

Gupta, Abhay; Chidambaram, Nallaperumal; Khan, Mansoor A

2015-02-01

Chewing difficulty index, a potential measure of difficulty in chewing the chewable tablets, has been described herein as the product of tablet thickness and tablet hardness measured under the diametral loading. The proposed index was evaluated by measuring the dimensions and mechanical strength of commercial and in-house prepared chewable tablets. Data collected on tablets with different thickness but same hardness or tensile strength suggests that the proposed index provides a good assessment of the force needed to chew the chewable tablets. Influence of brief exposure to salivary fluid during chewing on the mechanical strength of the chewable tablets was also evaluated. Thirty seconds exposure to the simulated salivary fluid was also found to significantly reduce (p < 0.05) the hardness and the chewing difficulty index of a number of evaluated chewable tablet drug products.
Item Information in the Rasch Model. Project Psychometric Aspects of Item Banking No. 34. Research Report 88-7.

ERIC Educational Resources Information Center

Engelen, Ron J. H.; And Others

Fisher's information measure for the item difficulty parameter in the Rasch model and its marginal and conditional formulations are investigated. It is shown that expected item information in the unconditional model equals information in the marginal model, provided the assumption of sampling examinees from an ability distribution is made. For the…
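For orientation, the quantities named in this abstract can be computed for the standard Rasch model, in which the probability of a correct response is logistic in (θ - b) and the Fisher information from a single response equals P(1 - P). The sketch below is an illustration under assumptions, not the report's derivation: the standard-normal ability distribution and the function names are hypothetical choices, and the expectation over ability is approximated with Gauss-Hermite quadrature.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def rasch_probability(theta, b):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

def item_information(theta, b):
    """Fisher information from one response: P * (1 - P)."""
    p = rasch_probability(theta, b)
    return p * (1.0 - p)

def expected_information(b, n_nodes=41):
    """Average item information over an assumed N(0, 1) ability distribution."""
    # hermegauss integrates against exp(-x^2 / 2); dividing by sqrt(2*pi)
    # turns the weighted sum into an expectation under N(0, 1).
    nodes, weights = hermegauss(n_nodes)
    return float(np.sum(weights * item_information(nodes, b)) / np.sqrt(2.0 * np.pi))

print("information at theta = b:", item_information(0.0, 0.0))
print("expected information, b = 0:", expected_information(0.0))
print("expected information, b = 3:", expected_information(3.0))
```

Information peaks at 0.25 when ability equals difficulty, so under the assumed ability distribution an item centred on the population carries more expected information than one far from it.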
Physics 30 Program Machine-Scorable Open-Ended Questions: Unit 2: Electric and Magnetic Forces. Diploma Examinations Program.

ERIC Educational Resources Information Center

Alberta Dept. of Education, Edmonton.

This document outlines the use of machine-scorable open-ended questions for the evaluation of Physics 30 in Alberta. Contents include: (1) an introduction to the questions; (2) sample instruction sheet; (3) fifteen sample items; (4) item information including the key, difficulty, and source of each item; (5) solutions to items having multiple…
Revised multicultural perspective index and measures of depression, life satisfaction, shyness, and self-esteem.

PubMed

Mowrer, Robert R; Parker, Keesha N

2004-12-01

In a 2002 publication, Mowrer and McCarver reported weak but significant correlations (r = .24) between scores on the Multicultural Perspective Index and scores on Neugarten, Havighurst, and Tobin's 1961 Life Satisfaction Index-A and the Life Satisfaction Scale developed in 1985 by Diener, Emmons, Larsen, and Griffin. Using 382 undergraduate students, the present study reduced the Index from 42 to 29 items based on each item's correlation with total items. An additional 104 undergraduate students then completed the modified 29-item version, Rosenberg's Self-esteem Scale, Cheek and Buss's Shyness Scale, the Self-rating Depression Scale by Zung, and the Neugarten, et al. Life Satisfaction Index-A. Scores on the modified Index were negatively correlated with those on the Depression and Shyness scales and positively correlated with scores on the Self-esteem and Life Satisfaction scales (p < .05).
Fractionating the Neural Substrates of Incidental Recognition Memory

ERIC Educational Resources Information Center

Greene, Ciara M.; Vidaki, Kleio; Soto, David

2015-01-01

Familiar stimuli are typically accompanied by decreases in neural response relative to the presentation of novel items, but these studies often include explicit instructions to discriminate old and new items; this creates difficulties in partialling out the contribution of top-down intentional orientation to the items based on recognition goals.…
[Difference analysis among majors in medical parasitology exam papers by test item bank proposition].

PubMed

Jia, Lin-Zhi; Ya-Jun, Ma; Cao, Yi; Qian, Fen; Li, Xiang-Yu

2012-04-30

The quality index among "Medical Parasitology" exam papers and measured data for students in three majors from the university in 2010 were compared and analyzed. The exam papers were formed from the test item bank. The alpha reliability coefficients of the three exam papers were above 0.70. The knowledge structure and capacity structure of the exam papers were basically balanced. But the alpha reliability coefficient of the second major was the lowest, mainly due to the quality of the test items in the exam paper and the failure to revise the index of the test item bank in time. This observation demonstrated that revising the test items and their index in the item bank according to the measured data can improve the quality of test item bank proposition and reduce the difference among exam papers.
Teacher Perceived Difficulty in Implementing Differentiated Instructional Strategies in Primary School

ERIC Educational Resources Information Center

Gaitas, Sérgio; Alves Martins, Margarida

2017-01-01

This study analyses teacher perceived difficulty in implementing differentiated instructional strategies in regular classes. The participants were 273 Portuguese primary school teachers with teaching experience ranging from 1 to 33 years. A 39-item questionnaire was used to evaluate teacher perceived difficulty in relation to different…
Measuring and Predicting Graded Reader Difficulty

ERIC Educational Resources Information Center

Holster, Trevor A.; Lake, J. W.; Pellowe, William R.

2017-01-01

This study used many-faceted Rasch measurement to investigate the difficulty of graded readers using a 3-item survey. Book difficulty was compared with Kyoto Level, Yomiyasusa Level, Lexile Level, book length, mean sentence length, and mean word frequency. Word frequency and Kyoto Level were found to be ineffective in predicting students'…
Japanese version of the Dermatology Life Quality Index: validity and reliability in patients with acne.

PubMed

Takahashi, Natsuko; Suzukamo, Yoshimi; Nakamura, Motonobu; Miyachi, Yoshiki; Green, Joseph; Ohya, Yukihiro; Finlay, Andrew Y; Fukuhara, Shunichi

2006-08-03

Patient-reported quality of life is strongly affected by some dermatologic conditions. We developed a Japanese version of the Dermatology Life Quality Index (DLQI-J) and used psychometric methods to examine its validity and reliability. The Japanese version of the DLQI was created from the original (English) version, using a standard method. The DLQI-J was then completed by 197 people, to examine its validity and reliability. Some participants completed the DLQI-J a second time, 3 days later, to examine the reproducibility of their responses. In addition to the DLQI-J, the participants completed parts of the SF-36 and gave data on their demographic and clinical characteristics. Their physicians provided information on the location and clinical severity of the skin disease. The participants reported no difficulties in answering the DLQI-J items. Their mean age was 24.8 years, 77.2% were female, and 78.7% had acne vulgaris. The mean score of DLQI was 3.99 (SD: 3.99). The responses were found to be reproducible and stable. Results of principal-component and factor analysis suggested that this scale measured one construct. The correlations of DLQI-J scores with sex or age were very poor, but those with SF-36 scores and with clinical severity were high. The DLQI-J provides valid and reliable data despite having only a small number of items.
Subjective Sleep Related to Post Traumatic Stress Disorder Symptoms among Trauma-Exposed Men and Women.

PubMed

Gibson, Carolyn J; Richards, Anne; Villanueva, Cynthia; Barrientos, Maureen; Neylan, Thomas C; Inslicht, Sabra S

2017-11-27

Sleep difficulty is both a common symptom of posttraumatic stress disorder (PTSD) and a risk factor for the development and maintenance of PTSD symptomatology. Gender differences in sleep following trauma exposure have been posited to contribute to the increased risk for the development of PTSD among women, but the persistence and long-term contributions of these potential differences to the maintenance and severity of PTSD symptoms is unclear. Men and women reporting a history of trauma exposure (n = 112, 63% female) participated in this study. Subjective sleep complaints and PTSD symptom severity were assessed using well-validated measures (Pittsburgh Sleep Quality Index, PTSD Symptom Checklist). Multivariable regression models (full sample and gender-stratified) were used to predict PTSD symptom severity from global, subscale, and individual item sleep parameters, adjusted for gender, age, race/ethnicity, education, and body mass index. In the full sample, traditional measures of sleep quality and sleep disturbance were associated with PTSD symptom severity. Difficulty falling asleep, poor sleep quality, and sleep disturbance from a variety of sources were related to higher PTSD symptom severity in men, while self-reported sleep disturbance related to nightmares and emotional regulation were associated with PTSD symptom severity among women. These findings add to the limited literature on gender-specific risk factors related to sleep and PTSD, and may inform intervention development and implementation related to PTSD severity among vulnerable adults.
Impact of sociodemographic attributes and dental caries on quality of life of intellectual disabled children using ECOHIS

PubMed Central

Aggarwal, Vikram Pal; Mathur, Anmol; Dileep, C.L; Batra, Manu; Makkar, Diljot Kaur

2016-01-01

Objectives: To assess the impact of oral health outcomes on Oral Health-Related Quality of Life (OHRQoL) among intellectually disabled children and their families. Methodology: An OHRQoL-based study was conducted among 150 intellectually disabled schoolchildren in Rajasthan, in the north-west part of India. Guardians were asked to complete a questionnaire on socioeconomic status and the Early Childhood Oral Health Impact Scale (ECOHIS) on their perception of the children's OHRQoL. Clinical assessment included dental caries and the OHI-S index. Univariate regression analysis was fitted to assess covariates for the prevalence of impacts on OHRQoL. Results: 54% of the caregivers reported that their child had an impact on at least one ECOHIS item. Negative impacts were more prevalent on items related to difficulty in eating some foods, difficulty in pronouncing words, and missed preschool, day-care or school. The univariate Poisson regression analysis showed that dental caries was significantly associated with the outcome. The prevalence of any impact on OHRQoL was approximately 1.32 and 2.84 times higher for children with low and higher severity of dental caries, respectively, when compared with those who were free of caries. Conclusion: Patient-oriented outcomes like OHRQoL will enhance our understanding of the relationship between oral health and general health and demonstrate to clinical researchers and practitioners that improving the quality of patients' well-being goes beyond simply treating dental disease and disorders. PMID:27833512
Cross-cultural adaptation and construct validity of the Korean version of a physical activity measure for community-dwelling elderly.

PubMed

Choi, Bongsam

2018-01-01

[Purpose] This study aimed to cross-culturally adapt and validate the Korean version of a physical activity measure (K-PAM) for community-dwelling elderly. [Subjects and Methods] One hundred and thirty-eight community-dwelling elderly people, 32 males and 106 females, participated in the study. All participants were asked to fill out a fifty-one-item questionnaire measuring perceived difficulty in the activities of daily living (ADL) for the elderly. The one-parameter model of item response theory (Rasch analysis) was applied to determine the construct validity and to inspect item-level psychometric properties of the 51 ADL items of the K-PAM. [Results] Person separation reliability (analogous to Cronbach's alpha) for internal consistency ranged from 0.93 to 0.94. A total of 16 items misfit the Rasch model. After misfit item deletion, 35 ADL items of the K-PAM were placed in an empirically meaningful hierarchy from easy to hard. The item-person map analysis showed that item difficulty was well matched to elderly people with moderate and low ability, except for a ceiling effect at high ability. [Conclusion] The cross-culturally adapted K-PAM was shown to be sufficient for establishing construct validity and stable psychometric properties, confirmed by person separation reliability and fit statistics.
Increased susceptibility to proactive interference in adults with dyslexia?

PubMed

Bogaerts, Louisa; Szmalec, Arnaud; Hachmann, Wibke M; Page, Mike P A; Woumans, Evy; Duyck, Wouter

2015-01-01

Recent findings show that people with dyslexia have an impairment in serial-order memory. Based on these findings, the present study aimed to test the hypothesis that people with dyslexia have difficulties dealing with proactive interference (PI) in recognition memory. A group of 25 adults with dyslexia and a group of matched controls were subjected to a 2-back recognition task, which required participants to indicate whether an item (mis)matched the item that had been presented 2 trials before. PI was elicited using lure trials in which the item matched the item in the 3-back position instead of the targeted 2-back position. Our results demonstrate that the introduction of lure trials affected 2-back recognition performance more severely in the dyslexic group than in the control group, suggesting greater difficulty in resisting PI in dyslexia.
Psychometric properties of the medical outcomes study sleep scale in Spanish postmenopausal women.

PubMed

Zagalaz-Anula, Noelia; Hita-Contreras, Fidel; Martínez-Amat, Antonio; Cruz-Díaz, David; Lomas-Vega, Rafael

2017-07-01

This study aimed to analyze the reliability and validity of the Spanish version of the Medical Outcomes Study Sleep Scale (MOS-SS), and its ability to discriminate between poor and good sleepers among a Spanish population with vestibular disorders. In all, 121 women (50-76 years old) completed the Spanish version of the MOS-SS. Internal consistency, test-retest reliability, and construct validity (exploratory factor analysis) were analyzed. Concurrent validity was evaluated using the Pittsburgh Sleep Quality Index and the 36-item Short Form Health Survey. To analyze the ability of the MOS-SS scores to discriminate between poor and good sleepers, a receiver-operating characteristic curve analysis was performed. The Spanish version of the MOS-SS showed excellent and substantial reliability in Sleep Problems Index I (two sleep disturbance items, one somnolence item, two sleep adequacy items, and awaken short of breath or with headache) and Sleep Problems Index II (four sleep disturbance items, two somnolence items, two sleep adequacy items, and awaken short of breath or with headache), respectively, and good internal consistency with optimal Cronbach's alpha values in all domains and indexes (0.70-0.90). Factor analysis suggested a coherent four-factor structure (explained variance 70%). In concurrent validity analysis, MOS-SS indexes showed significant and strong correlation with the Pittsburgh Sleep Quality Index total score, and moderate with the 36-item Short Form Health Survey component summaries. Several domains and the two indexes were significantly able to discriminate between poor and good sleepers (P < 0.05). Optimal cut-off points were above 20 for "sleep disturbance" domain, with above 22.22 and above 33.33 for Sleep Problems Index I and II. The Spanish version of the MOS-SS is a valid and reliable instrument, suitable to assess sleep quality in Spanish postmenopausal women, with satisfactory general psychometric properties. It discriminates well between good and poor sleepers.
Task-based learning versus problem-oriented lecture in neurology continuing medical education.

PubMed

Vakani, Farhan; Jafri, Wasim; Ahmad, Amina; Sonawalla, Aziz; Sheerani, Mughis

2014-01-01

To determine whether general practitioners learned better with task-based learning or problem-oriented lecture in a Continuing Medical Education (CME) set-up. Quasi-experimental study. The Aga Khan University, Karachi campus, from April to June 2012. Fifty-nine physicians were given a choice to opt for either Task-based Learning (TBL) or Problem Oriented Lecture (PBL) in a continuing medical education set-up about headaches. The TBL group had 30 participants divided into 10 small groups, and were assigned case-based tasks. The lecture group had 29 participants. Both groups were given a pre- and a post-test. Pre/post assessment was done using one-best MCQs. The reliability coefficient of scores for both the groups was estimated through Cronbach's alpha. An item analysis for difficulty and discriminatory indices was calculated for both the groups. Paired t-test was used to determine the difference between pre- and post-test scores of both groups. Independent t-test was used to compare the impact of the two teaching methods in terms of learning through scores produced by MCQ test. Cronbach's alpha was 0.672 for the lecture group and 0.881 for TBL group. Item analysis for difficulty (p) and discriminatory indexes (d) was obtained for both groups. The results for the lecture group showed pre-test (p) = 42% vs. post-test (p) = 43%; pre-test (d) = 0.60 vs. post-test (d) = 0.40. The TBL group showed pre-test (p) = 48% vs. post-test (p) = 70%; pre-test (d) = 0.69 vs. post-test (d) = 0.73. Lecture group pre-/post-test mean scores were (8.52 ± 2.95 vs. 12.41 ± 2.65; p < 0.001), where TBL group showed (9.70 ± 3.65 vs. 14 ± 3.99; p < 0.001). Independent t-test exhibited an insignificant difference at baseline (lecture 8.52 ± 2.95 vs. TBL 9.70 ± 3.65; p = 0.177). The post-scores were not statistically different (lecture 12.41 ± 2.65 vs. TBL 14 ± 3.99; p = 0.07). Both delivery methods were found to be equally effective, showing statistically insignificant differences. However, the TBL group's higher post-test mean scores and radical increase in the post-test difficulty index demonstrated improved learning through TBL delivery and call for further exploration of longitudinal studies in the context of CME.
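The difficulty index (p), discrimination index (d), and Cronbach's alpha reported in the record above can be illustrated with a minimal Python sketch. It is not the authors' procedure: it assumes dichotomously scored (0/1) items, takes p as the proportion of examinees answering correctly, and computes d from upper and lower scoring groups of 27% each, a common convention that the abstract does not confirm. The function names and the toy data are hypothetical.

```python
from statistics import pvariance


def difficulty_index(responses, item):
    """Proportion of examinees who answered `item` correctly (0/1 scoring)."""
    return sum(row[item] for row in responses) / len(responses)


def discrimination_index(responses, item, fraction=0.27):
    """Upper-minus-lower group difference in proportion correct.

    The 27% group size is an assumed convention, not taken from the abstract.
    """
    ranked = sorted(responses, key=sum, reverse=True)  # highest totals first
    k = max(1, round(len(ranked) * fraction))
    upper, lower = ranked[:k], ranked[-k:]
    return (sum(r[item] for r in upper) - sum(r[item] for r in lower)) / k


def cronbach_alpha(responses):
    """Cronbach's alpha from item variances and total-score variance."""
    n_items = len(responses[0])
    item_vars = [pvariance([r[i] for r in responses]) for i in range(n_items)]
    total_var = pvariance([sum(r) for r in responses])
    return n_items / (n_items - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical toy data: 4 examinees (rows) by 3 items (columns).
toy = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(difficulty_index(toy, 0), discrimination_index(toy, 0), cronbach_alpha(toy))
```

Under this definition a higher p means an easier item, which is why a rise in p from pre-test to post-test is read as improved learning.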
The use of Subject Matter Experts in Validating an Oral Health-Related Quality of Life measure in Korean.

PubMed

Seo, Jaesung; MacEntee, Michael; Brondani, Mario

2015-09-04

This paper aimed to employ subject matter experts (SMEs) to assess the extent to which the Korean version of the short-form of the OHIP (OHIP-14 K) is culturally valid and equivalent in Korean. We approached 17 bilingual Korean SMEs, of whom 10 independently rated the clarity, relevance, and cultural equivalence of the OHIP-14 K. The SMEs had between 10 and 41 years of clinical experience and were mostly male (n = 7). We used the Item-level Content Validity Index (I-CVI) to gauge the proportion of SMEs who considered the content of OHIP items (e.g., instruction, response format, etc.) to be culturally valid. We also performed additional analysis to determine the level of agreement between the SMEs. The experts rated most of the items to be clear (S-CVI = 0.93) while having difficulties in assigning relevance of the questions to the expected domains (S-CVI = 0.42). Moreover, considerable disagreement existed among the experts in regard to the relevance (Kfree = 0.19 to 1.00) and the cultural equivalence indexes (ADM = 0.36 to 0.96). The content of the OHIP-14 K for the most part clearly reproduced the language of the original OHIP-14. However, experts disagreed on the relevance and conceptual equivalence of the OHIP-14 K for a Korean population. Patient-oriented outcome measures such as the OHIP can be used across cultures once they are indeed assessing the same domains and constructs of interest. The CVI technique seems to be an alternative tool for evaluating content validity and equivalency of an OHQoL measure. A more refined, culturally relevant version of OHIP-14 K was proposed, although there are no data yet to support better score validity, reliability and responsiveness of this proposed version.
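As a rough illustration of the I-CVI described in the record above (the proportion of experts who endorse an item), the following Python sketch may help. It assumes a 4-point rating scale on which ratings of 3 or 4 count as endorsement, and a scale-level index taken as the average of the item-level values; neither detail is stated in the abstract, and the function names and ratings are hypothetical.

```python
def i_cvi(item_ratings, threshold=3):
    """Proportion of experts rating the item at or above `threshold`.

    The 4-point scale and the cut-off of 3 are assumptions for illustration.
    """
    return sum(1 for r in item_ratings if r >= threshold) / len(item_ratings)


def s_cvi_ave(ratings_by_item, threshold=3):
    """Scale-level CVI computed as the mean of the item-level CVIs."""
    values = [i_cvi(ratings, threshold) for ratings in ratings_by_item]
    return sum(values) / len(values)


# Hypothetical ratings: 2 items, each rated by 4 experts.
ratings = [[4, 3, 4, 4], [2, 1, 3, 4]]
print([i_cvi(r) for r in ratings], s_cvi_ave(ratings))
```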
Analysis Test of Understanding of Vectors with the Three-Parameter Logistic Model of Item Response Theory and Item Response Curves Technique

ERIC Educational Resources Information Center

Rakkapao, Suttida; Prasitpong, Singha; Arayathanitkul, Kwan

2016-01-01

This study investigated the multiple-choice test of understanding of vectors (TUV), by applying item response theory (IRT). The difficulty, discriminatory, and guessing parameters of the TUV items were fit with the three-parameter logistic model of IRT, using the parscale program. The TUV ability is an ability parameter, here estimated assuming…
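For readers unfamiliar with the three-parameter logistic (3PL) model named in the record above, a minimal Python sketch of its item response function follows. The symbols a (discrimination), b (difficulty), c (guessing), and theta (ability) follow standard IRT notation; the optional scaling constant D and the function name are assumptions for illustration, and the sketch does not reproduce the estimation performed by the parscale program.

```python
import math


def p_correct_3pl(theta, a, b, c, D=1.0):
    """Probability of a correct response under the 3PL model.

    theta: examinee ability; a: discrimination; b: difficulty;
    c: guessing (lower asymptote); D: optional scaling constant (assumed 1.0).
    """
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


# At theta == b the probability lies halfway between c and 1.
print(p_correct_3pl(theta=0.5, a=1.2, b=0.5, c=0.2))
```

Setting a = 1 and c = 0 reduces this to the one-parameter (Rasch) form used in several other records in this listing.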
A Graphical Approach to Item Analysis. Research Report. ETS RR-04-10

ERIC Educational Resources Information Center

Livingston, Samuel A.; Dorans, Neil J.

2004-01-01

This paper describes an approach to item analysis that is based on the estimation of a set of response curves for each item. The response curves show, at a glance, the difficulty and the discriminating power of the item and the popularity of each distractor, at any level of the criterion variable (e.g., total score). The curves are estimated by…

Psychometric Properties of the Children's Depression Inventory: An Item Response Theory Analysis across Age in a Nonclinical, Longitudinal, Adolescent Sample

ERIC Educational Resources Information Center

Lee, Young-Sun; Krishnan, Anita; Park, Yoon Soo

2012-01-01

The purpose of this study was to investigate psychometric properties of the Children's Depression Inventory within a nonclinical and longitudinal sample (8th and 12th grades). Using the Rasch rating scale, most items represented one dimension. There was adequate separation among items and no overlap between ranges of item difficulties with latent…
Do the Guideline Violations Influence Test Difficulty of High-Stake Test?: An Investigation on University Entrance Examination in Turkey

ERIC Educational Resources Information Center

Atalmis, Erkan Hasan

2016-01-01

Multiple-choice (MC) items are commonly used in high-stake tests. Thus, each item of such tests should be meticulously constructed to increase the accuracy of decisions based on test results. Haladyna and his colleagues (2002) addressed the valid item-writing guidelines to construct high quality MC items in order to increase test reliability and…
Defining Resilience and Vulnerability Based on Ontology Engineering Approach

NASA Astrophysics Data System (ADS)

Kumazawa, T.; Matsui, T.; Endo, A.

2014-12-01

It is necessary to reflect the concepts of resilience and vulnerability in the assessment framework of "Human-Environmental Security", but it is also difficult to identify the linkage between the two concepts because different academic communities have discussed each of them. The authors have been developing an ontology that deals with the sustainability of social-ecological systems (SESs). Resilience and vulnerability are also concepts in the target world that this ontology covers. On this basis, this paper aims to explicate the semantic relationship between the concepts of resilience and vulnerability using an ontology engineering approach. For this purpose, we first examine the definitions of resilience and vulnerability proposed in the existing literature. Second, we incorporate the definitions into the ontology dealing with the sustainability of SESs. Finally, we focus on the "Water-Energy-Food Nexus Index" to assess Human-Environmental Security, and clarify how the concepts of resilience and vulnerability are linked semantically through the concepts included in these index items.
Differential Gender Effects in the Relationship between Perceived Immune Functioning and Autistic Traits.

PubMed

Mackus, Marlou; Kruijff, Deborah de; Otten, Leila S; Kraneveld, Aletta D; Garssen, Johan; Verster, Joris C

2017-04-12

Altered immune functioning has been demonstrated in individuals with autism spectrum disorder (ASD). The current study explores the relationship between perceived immune functioning and experiencing ASD traits in healthy young adults. N = 410 students from Utrecht University completed a survey on immune functioning and autistic traits. In addition to a 1-item perceived immune functioning rating, the Immune Function Questionnaire (IFQ) was completed to assess perceived immune functioning. The Dutch translation of the Autism-Spectrum Quotient (AQ) was completed to examine variation in autistic traits, including the domains "social insights and behavior", "difficulties with change", "communication", "phantasy and imagination", and "detail orientation". The 1-item perceived immune functioning score did not significantly correlate with the total AQ score. However, a significant negative correlation was found between perceived immune functioning and the AQ subscale "difficulties with change" (r = -0.119, p = 0.019). In women, 1-item perceived immune functioning correlated significantly with the AQ subscales "difficulties with change" (r = -0.149, p = 0.029) and "communication" (r = -0.145, p = 0.032). In men, none of the AQ subscales significantly correlated with 1-item perceived immune functioning. In conclusion, a modest relationship between perceived immune functioning and several autistic traits was found.
Modeling Booklet Effects for Nonequivalent Group Designs in Large-Scale Assessment

ERIC Educational Resources Information Center

Hecht, Martin; Weirich, Sebastian; Siegle, Thilo; Frey, Andreas

2015-01-01

Multiple matrix designs are commonly used in large-scale assessments to distribute test items to students. These designs comprise several booklets, each containing a subset of the complete item pool. Besides reducing the test burden of individual students, using various booklets allows aligning the difficulty of the presented items to the assumed…
Effects of Using Modified Items to Test Students with Persistent Academic Difficulties

ERIC Educational Resources Information Center

Elliott, Stephen N.; Kettler, Ryan J.; Beddow, Peter A.; Kurz, Alexander; Compton, Elizabeth; McGrath, Dawn; Bruen, Charles; Hinton, Kent; Palmer, Porter; Rodriguez, Michael C.; Bolt, Daniel; Roach, Andrew T.

2010-01-01

This study investigated the effects of using modified items in achievement tests to enhance accessibility. An experiment determined whether tests composed of modified items would reduce the performance gap between students eligible for an alternate assessment based on modified achievement standards (AA-MAS) and students not eligible, and the…
Regression Effects in Angoff Ratings: Examples from Credentialing Exams

ERIC Educational Resources Information Center

Wyse, Adam E.

2018-01-01

This article discusses regression effects that are commonly observed in Angoff ratings where panelists tend to think that hard items are easier than they are and easy items are more difficult than they are in comparison to estimated item difficulties. Analyses of data from two credentialing exams illustrate these regression effects and the…
A Five-Year Evaluation of Examination Structure in a Cardiovascular Pharmacotherapy Course

PubMed Central

Kolar, Claire; Janke, Kristin K.

2015-01-01

Objective. To evaluate the composition and effectiveness as an assessment tool of a criterion-referenced examination comprised of clinical cases tied to practice decisions, to examine the effect of varying audience response system (ARS) questions on student examination preparation, and to articulate guidelines for structuring examinations to maximize evaluation of student learning. Design. Multiple-choice items developed over 5 years were evaluated using Bloom’s Taxonomy classification, point biserial correlation, item difficulty, and grade distribution. In addition, examination items were classified into categories based on similarity to items used in ARS preparation. Assessment. As the number of items directly tied to clinical practice rose, Bloom’s Taxonomy level and item difficulty also rose. In examination years where Bloom’s levels were high but preparation was minimal, average grade distribution was lower compared with years in which student preparation was higher. Conclusion. Criterion-referenced examinations can benefit from systematic evaluation of their composition and effectiveness as assessment tools. Calculated design and delivery of classroom preparation is an asset in improving examination performance on rigorous, practice-relevant examinations. PMID:27168611
Performance of the Generalized S-X[Superscript 2] Item Fit Index for Polytomous IRT Models

ERIC Educational Resources Information Center

Kang, Taehoon; Chen, Troy T.

2008-01-01

Orlando and Thissen's S-X[superscript 2] item fit index has performed better than traditional item fit statistics such as Yen's Q[subscript 1] and McKinley and Mills' G[superscript 2] for dichotomous item response theory (IRT) models. This study extends the utility of S-X[superscript 2] to polytomous IRT models, including the generalized partial…
Effects of spacing of item repetitions in continuous recognition memory: does item retrieval difficulty promote item retention in older adults?

PubMed

Kılıç, Aslı; Hoyer, William J; Howard, Marc W

2013-01-01

BACKGROUND/STUDY CONTEXT: Older adults exhibit an age-related deficit in item memory as a function of the length of the retention interval, but older adults and young adults usually show roughly equivalent benefits due to the spacing of item repetitions in continuous memory tasks. The current experiment investigates the seemingly paradoxical effects of retention interval and spacing in young and older adults using a continuous recognition memory procedure. Fifty young adults and 52 older adults gave memory confidence ratings to words that were presented once (P1), twice (P2), or three times (P3), and the effects of the lag length and retention interval were assessed at P2 and at P3, respectively. Response times at P2 were disproportionately longer for older adults than for younger adults as a function of the number of items occurring between P1 and P2, suggestive of age-related loss in item memory. Ratings of confidence in memory responses revealed that older adults remembered fewer items at P2 with a high degree of certainty. Confidence ratings given at P3 suggested that young and older adults derived equivalent benefits from the spacing between P1 and P2. Findings of this study support theoretical accounts that suggest that recursive reminding and/or item retrieval difficulty promote item retention in older adults.
The Social, Emotional and Behavioural Difficulties of Primary School Children with Poor Attendance Records

ERIC Educational Resources Information Center

Carroll, H. C. M.

2013-01-01

Two complementary studies of poor and better attenders are presented. To measure emotional and behavioural difficulties (EBD) different teacher-completed rating scales were employed, and to determine social difficulties, the studies used sociometry and some items from the scales. One study had a longitudinal design. It revealed that, after…
The consumer quality index anthroposophic healthcare: a construction and validation study.

PubMed

Koster, Evi B; Ong, Rob R S; Heybroek, Rachel; Delnoij, Diana M J; Baars, Erik W

2014-04-02

Accounting for the patients' perspective on quality of care has become increasingly important in the development of Evidence Based Medicine as well as in governmental policies. In the Netherlands the Consumer Quality (CQ) Index has been developed to measure the quality of care from the patients' perspective in different healthcare sectors in a standardized manner. Although the scientific accountability of anthroposophic healthcare as a form of integrative medicine is growing, patient experiences with anthroposophic healthcare have not been measured systematically. In addition, the specific anthroposophic aspects are not measured by means of existing CQ Indexes. To enable accountability of quality of the anthroposophic healthcare from the patients' perspective the aim of this study is the construction and validation of a CQ Index for anthroposophic healthcare. Construction in three phases: Phase 1. Determining anthroposophic quality aspects: literature study and focus groups. Phase 2. Adding new questions and validating the new questionnaire. Research population: random sample from 7910 patients of 22 anthroposophic GPs. Survey: mixed mode by means of the Dillman method. Measuring instrument: experience questionnaire: CQ Index General Practice (56 items), with 27 new anthroposophic items added, and an item-importance questionnaire (anthroposophic items only). Factor analysis, scale construction, internal consistency (Cronbach's alpha), inter-item-correlation, discriminative ability (Intra Class Correlation) and inter-factor-correlations. Phase 3. Modulation and selection of new questions based on results. Criteria for retaining items: general: a limited number of items; statistical: part of a reliable scale and inter-item-correlation <0,7; and theoretical. Phase 1. 27 anthroposophic items. Phase 2. Two new anthroposophic scales: Scale AntroposophicTreatmentGP: seven items, Alpha=0,832, ICC=4,2, Inter-factor-correlation with existing GP-scales range from r=0,24 (Accessibility) to r=0,56 (TailoredCare). Scale InteractionalStyleGP: five items, Alpha=0,810, ICC=5,8, Inter-factor-correlation with existing GP-scales range from r=0,32 (Accessibility) to r=0,76 (TailoredCare). Inter-factor-correlation between new scales: r=0,50. Phase 3: Adding both scales and four single items. Removing eleven items and reformulating two items. The CQ Index Anthroposophic Healthcare measures patient experiences with anthroposophic GPs validly and reliably. Regarding the inter-factor-correlations, anthroposophic quality aspects from the patients' perspective are mostly associated with individually tailored care and patient centeredness.
Middle school students' reading comprehension of mathematical texts and algebraic equations

NASA Astrophysics Data System (ADS)

Duru, Adem; Koklu, Onder

2011-06-01

In this study, middle school students' abilities to translate mathematical texts into algebraic representations and vice versa were investigated. In addition, students' difficulties in making such translations and the potential sources for these difficulties were also explored. Both qualitative and quantitative methods were used to collect data for this study: questionnaire and clinical interviews. The questionnaire consisted of two general types of items: (1) selected-response (multiple-choice) items for which the respondent selects from multiple options and (2) open-ended items for which the respondent constructs a response. In order to further investigate the students' strategies while they were translating the given mathematical texts to algebraic equations and vice versa, five randomly chosen (n = 5) students were interviewed. Data were collected in the 2007-2008 school year from 185 middle-school students in five teachers' classrooms in three different schools in the city of Adıyaman, Turkey. After the analysis of data, it was found that students who participated in this study had difficulties in translating the mathematical texts into algebraic equations by using symbols. It was also observed that these students had difficulties in translating the symbolic representations into mathematical texts because of their weak reading comprehension. In addition, the findings of this research revealed that students' difficulties in translating the given mathematical texts into symbolic representations or vice versa come from different sources.
The perceptual learning of time-compressed speech: A comparison of training protocols with different levels of difficulty

PubMed Central

Gabay, Yafit; Karni, Avi; Banai, Karen

2017-01-01

Speech perception can improve substantially with practice (perceptual learning) even in adults. Here we compared the effects of four training protocols that differed in whether and how task difficulty was changed during a training session, in terms of the gains attained and the ability to apply (transfer) these gains to previously un-encountered items (tokens) and to different talkers. Participants trained in judging the semantic plausibility of sentences presented as time-compressed speech and were tested on their ability to reproduce, in writing, the target sentences; trial-by-trial feedback was afforded in all training conditions. In two conditions task difficulty (low or high compression) was kept constant throughout the training session, whereas in the other two conditions task difficulty was changed in an adaptive manner (incrementally from easy to difficult, or using a staircase procedure). Compared to a control group (no training), all four protocols resulted in significant post-training improvement in the ability to reproduce the trained sentences accurately. However, training in the constant-high-compression protocol elicited the smallest gains in deciphering and reproducing trained items and in reproducing novel, untrained, items after training. Overall, these results suggest that training procedures that start off with relatively little signal distortion (“easy” items, not far removed from standard speech) may be advantageous compared to conditions wherein severe distortions are presented to participants from the very beginning of the training session. PMID:28545039
Adaptive Mental Testing: The State of the Art

DTIC Science & Technology

1979-11-01

typically vary in their psychometric properties --particularly in their difficulty--the test designer must decide what configuration of these item...psychometric properties best suits the test’s purpose. There are two extreme rationales to guide that decision. One rationale is to choose items that are...development of item response theory (Rasch, 1960; Lord, 1952, 1970, 1974a; Birnbaum, 1968) that provided the needed invariance properties for item
An Empirical Bayes Approach to Item Banking. Project Psychometric Aspects of Item Banking No. 6. Research Report 86-6.

ERIC Educational Resources Information Center

van der Linden, Wim J.; Eggen, Theo J. H. M.

A procedure for the sequential optimization of the calibration of an item bank is given. The procedure is based on an empirical Bayes approach to a reformulation of the Rasch model as a model for paired comparisons between the difficulties of test items in which ties are allowed to occur. First, it is indicated how a paired-comparisons design…
Assessment of item-writing flaws in multiple-choice questions.

PubMed

Nedeau-Cayo, Rosemarie; Laughlin, Deborah; Rus, Linda; Hall, John

2013-01-01

This study evaluated the quality of multiple-choice questions used in a hospital's e-learning system. Constructing well-written questions is fraught with difficulty, and item-writing flaws are common. Study results revealed that most items contained flaws and were written at the knowledge/comprehension level. Few items had linked objectives, and no association was found between the presence of objectives and flaws. Recommendations include education for writing test questions.
Comparison of the functional rating index and the 18-item Roland-Morris Disability Questionnaire: responsiveness and reliability.

PubMed

Chansirinukor, Wunpen; Maher, Christopher G; Latimer, Jane; Hush, Julia

2005-01-01

Retrospective design. To compare the responsiveness and test-retest reliability of the Functional Rating Index and the 18-item version of the Roland-Morris Disability Questionnaire in detecting change in disability in patients with work-related low back pain. Many low back pain-specific disability questionnaires are available, including the Functional Rating Index and the 18-item version of the Roland-Morris Disability Questionnaire. No previous study has compared the responsiveness and reliability of these questionnaires. Files of patients who had been treated for work-related low back pain at a physical therapy clinic were reviewed, and those containing initial and follow-up Functional Rating Index and 18-item Roland-Morris Disability Questionnaires were selected. The responsiveness of both questionnaires was compared using two different methods. First, using the assumption that patients receiving treatment improve over time, various responsiveness coefficients were calculated. Second, using change in work status as an external criterion to identify improved and nonimproved patients, Spearman's rho and receiver operating characteristic curves were calculated. Reliability was estimated from the subset of patients who reported no change in their condition over this period and expressed with the intraclass correlation coefficient and the minimal detectable change. One hundred and forty-three patient files were retrieved. The responsiveness coefficients for the Functional Rating Index were greater than for the 18-item Roland-Morris Disability Questionnaire. The intraclass correlation coefficient values for both questionnaires calculated from 96 patient files were similar, but the minimal detectable change for the Functional Rating Index was less than for the 18-item Roland-Morris Disability Questionnaire. The Functional Rating Index seems preferable to the 18-item Roland-Morris Disability Questionnaire for use in clinical trials and clinical practice.
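The abstract above expresses reliability through the intraclass correlation coefficient (ICC) and the minimal detectable change (MDC). A minimal sketch of one commonly used relationship between the two, assuming the MDC at 95% confidence is derived from the standard error of measurement (SEM = SD * sqrt(1 - ICC), MDC95 = 1.96 * sqrt(2) * SEM). The function names and the example numbers are hypothetical and are not taken from the study.

```python
import math


def standard_error_of_measurement(sd: float, icc: float) -> float:
    """SEM from the score standard deviation and a reliability coefficient."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie in [0, 1]")
    return sd * math.sqrt(1.0 - icc)


def minimal_detectable_change(sd: float, icc: float, z: float = 1.96) -> float:
    """MDC at the confidence level implied by z (1.96 for roughly 95%).

    The sqrt(2) factor reflects that a change score combines the
    measurement error of two administrations.
    """
    return z * math.sqrt(2.0) * standard_error_of_measurement(sd, icc)


if __name__ == "__main__":
    # Illustrative values only: SD of 10 points, ICC of 0.84.
    print(round(minimal_detectable_change(sd=10.0, icc=0.84), 2))
```

Under this formula, two questionnaires with similar ICC values can still differ in MDC when their score spreads differ, which is consistent with the pattern the abstract describes.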
Spatial short-term memory in children with nonverbal learning disabilities: impairment in encoding spatial configuration.

PubMed

Narimoto, Tadamasa; Matsuura, Naomi; Takezawa, Tomohiro; Mitsuhashi, Yoshinori; Hiratani, Michio

2013-01-01

The authors investigated whether impaired spatial short-term memory exhibited by children with nonverbal learning disabilities is due to a problem in the encoding process. Children with or without nonverbal learning disabilities performed a simple spatial test that required them to remember 3, 5, or 7 spatial items presented simultaneously in random positions (i.e., spatial configuration) and to decide if a target item was changed or all items including the target were in the same position. The results showed that, even when the spatial positions in the encoding and probe phases were similar, the mean proportion correct of children with nonverbal learning disabilities was 0.58 while that of children without nonverbal learning disabilities was 0.84. The authors argue with the results that children with nonverbal learning disabilities have difficulty encoding relational information between spatial items, and that this difficulty is responsible for their impaired spatial short-term memory.
Application of Computerized Adaptive Testing to Entrance Examination for Graduate Studies in Turkey

ERIC Educational Resources Information Center

Bulut, Okan; Kan, Adnan

2012-01-01

Problem Statement: Computerized adaptive testing (CAT) is a sophisticated and efficient way of delivering examinations. In CAT, items for each examinee are selected from an item bank based on the examinee's responses to the items. In this way, the difficulty level of the test is adjusted based on the examinee's ability level. Instead of…
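The item-selection idea described above can be sketched as follows. This is a minimal illustration, assuming a Rasch (one-parameter logistic) model and a maximum-information selection rule; the truncated abstract does not say which model or rule the study used, and the item bank, identifiers, and function names here are hypothetical.

```python
import math


def rasch_probability(theta: float, difficulty: float) -> float:
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def item_information(theta: float, difficulty: float) -> float:
    """Fisher information of a Rasch item at ability theta: P * (1 - P)."""
    p = rasch_probability(theta, difficulty)
    return p * (1.0 - p)


def select_next_item(theta: float, bank: dict, administered: set) -> str:
    """Pick the unadministered item that is most informative at theta.

    `bank` maps an item identifier to its difficulty in logits.
    """
    candidates = [item for item in bank if item not in administered]
    if not candidates:
        raise ValueError("item bank exhausted")
    return max(candidates, key=lambda item: item_information(theta, bank[item]))


if __name__ == "__main__":
    bank = {"easy": -1.0, "medium": 0.2, "hard": 1.5}  # hypothetical difficulties
    print(select_next_item(theta=0.0, bank=bank, administered=set()))
```

Under the Rasch model the most informative item is the one whose difficulty is closest to the current ability estimate, which is how the test difficulty tracks the examinee's level. A full CAT would also re-estimate the ability after each response and apply a stopping rule.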

Examining the Invariance of Rater and Project Calibrations Using a Multi-facet Rasch Model.

ERIC Educational Resources Information Center

O'Neill, Thomas R.; Lunz, Mary E.

To generalize test results beyond the particular test administration, an examinee's ability estimate must be independent of the particular items attempted, and the item difficulty calibrations must be independent of the particular sample of people attempting the items. This stability is a key concept of the Rasch model, a latent trait model of…
Rasch Based Analysis of Oral Proficiency Test Data.

ERIC Educational Resources Information Center

Nakamura, Yuji

2001-01-01

This paper examines the rating scale data of oral proficiency tests analyzed by a Rasch Analysis focusing on an item map and factor analysis. In discussing the item map, the difficulty order of six items and students' answering patterns are analyzed using descriptive statistics and measures of central tendency of test scores. The data ranks the…
Application of Item Analysis to Assess Multiple-Choice Examinations in the Mississippi Master Cattle Producer Program

ERIC Educational Resources Information Center

Parish, Jane A.; Karisch, Brandi B.

2013-01-01

Item analysis can serve as a useful tool in improving multiple-choice questions used in Extension programming. It can identify gaps between instruction and assessment. An item analysis of Mississippi Master Cattle Producer program multiple-choice examination responses was performed to determine the difficulty of individual examinations, assess the…
Exploring the Manifestations of Anxiety in Children with Autism Spectrum Disorders

ERIC Educational Resources Information Center

Hallett, Victoria; Lecavalier, Luc; Sukhodolsky, Denis G.; Cipriano, Noreen; Aman, Michael G.; McCracken, James T.; McDougle, Christopher J.; Tierney, Elaine; King, Bryan H.; Hollander, Eric; Sikich, Linmarie; Bregman, Joel; Anagnostou, Evdokia; Donnelly, Craig; Katsovich, Lily; Dukes, Kimberly; Vitiello, Benedetto; Gadow, Kenneth; Scahill, Lawrence

2013-01-01

This study explores the manifestation and measurement of anxiety symptoms in 415 children with ASDs on a 20-item, parent-rated, DSM-IV referenced anxiety scale. In both high and low-functioning children (IQ above vs. below 70), commonly endorsed items assessed restlessness, tension and sleep difficulties. Items requiring verbal expression of worry…
Sensitivity of Equated Aggregate Scores to the Treatment of Misbehaving Common Items

ERIC Educational Resources Information Center

Michaelides, Michalis P.

2010-01-01

The delta-plot method (Angoff, 1972) is a graphical technique used in the context of test equating for identifying common items with aberrant changes in their item difficulties across administrations or alternate forms. This brief research report explores the effects on equated aggregate scores when delta-plot outliers are either retained in or…
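A minimal sketch of the delta-plot computation, under stated assumptions: proportions correct are converted to the delta scale (13 + 4z, where z is the normal deviate for the proportion incorrect), a principal-axis line is fitted to the paired deltas, and items are screened by their perpendicular distance from that line. The cutoff used to flag outliers is a matter of convention and is left to the caller; the function names and data are hypothetical and do not come from the report.

```python
import math
from statistics import NormalDist, mean


def delta(p_correct: float) -> float:
    """Delta-scale difficulty: harder items (lower p) get larger deltas."""
    return 13.0 - 4.0 * NormalDist().inv_cdf(p_correct)


def principal_axis(x: list, y: list) -> tuple:
    """Slope and intercept of the principal (major) axis through the points."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxy == 0:
        raise ValueError("deltas are uncorrelated; the axis is undefined")
    slope = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    return slope, my - slope * mx


def perpendicular_distances(x: list, y: list) -> list:
    """Signed perpendicular distance of each item from the principal axis."""
    slope, intercept = principal_axis(x, y)
    norm = math.sqrt(slope ** 2 + 1.0)
    return [(slope * xi - yi + intercept) / norm for xi, yi in zip(x, y)]


if __name__ == "__main__":
    # Hypothetical proportions correct for five common items on two forms.
    form_a = [delta(p) for p in (0.85, 0.70, 0.55, 0.40, 0.25)]
    form_b = [delta(p) for p in (0.84, 0.71, 0.30, 0.41, 0.24)]
    for d in perpendicular_distances(form_a, form_b):
        print(round(d, 2))
```

Items with a large absolute distance are the candidates for "misbehaving" common items; whether they are then retained or dropped from the anchor set is the question the report examines.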
Response pattern of depressive symptoms among college students: What lies behind items of the Beck Depression Inventory-II?

PubMed

de Sá Junior, Antonio Reis; de Andrade, Arthur Guerra; Andrade, Laura Helena; Gorenstein, Clarice; Wang, Yuan-Pang

2018-07-01

This study examines the response pattern of depressive symptoms in a nationwide student sample, through item analyses of a rating scale by both classical test theory (CTT) and item response theory (IRT). The 21-item Beck Depression Inventory-II (BDI-II) was administered to 12,711 college students. First, the psychometric properties of the scale were described. Thereafter, the endorsement probability of depressive symptom in each scale item was analyzed through CTT and IRT. Graphical plots depicted the endorsement probability of scale items and intensity of depression. Three items of different difficulty level were compared through CTT and IRT approach. Four in five students reported the presence of depressive symptoms. The BDI-II items presented good reliability and were distributed along the symptomatic continuum of depression. Similarly, in both CTT and IRT approaches, the item 'changes in sleep' was easily endorsed, 'loss of interest' moderately and 'suicidal thoughts' hardly. Graphical representation of BDI-II of both methods showed much equivalence in terms of item discrimination and item difficulty. The item characteristic curve of the IRT method provided informative evaluation of item performance. The inventory was applied only in college students. Depressive symptoms were frequent psychopathological manifestations among college students. The performance of the BDI-II items indicated convergent results from both methods of analysis. While the CTT was easy to understand and to apply, the IRT was more complex to understand and to implement. Comprehensive assessment of the functioning of each BDI-II item might be helpful in efficient detection of depressive conditions in college students. Copyright © 2018 Elsevier B.V. All rights reserved.
Using an analytical hierarchy process (AHP) for weighting items of a measurement scale: a pilot study.

PubMed

Benaïm, C; Perennou, D-A; Pelissier, J-Y; Daures, J-P

2010-02-01

Many clinical scales contain items that are scored separately prior to being compiled into a single score. However, if the items have different degrees of importance, they should be weighted differently before being compiled. The principal aims of this study were to show how the "analytic hierarchy process" (AHP), which has never been used for this purpose, can be applied to weighting the six items of the "London handicap scale", and to compare the AHP to the "conjoint analysis" (CA), which was previously implemented by Harwood et al. (1994) [1]. In order to assess the relative importance of the six items, we submitted AHP and CA to a group of 10 physiatrists. We compared the methods in terms of item ranking according to importance, assessment of fictitious patients based on weights determined by each method, and perceived difficulty by the physiatrist. For both techniques, "Physical independence" (PHY) was the best-weighted item, but other ranks varied depending on the technique. AHP was better than CA in terms of accuracy (global assessment of the clinical status) and perceived difficulty. AHP may be used to reveal the importance that experts assign to the items of a multidimensional scale, and to calculate the appropriate weights for specific items. For this purpose, AHP seems to be more accurate than CA.
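How AHP turns pairwise judgments into item weights can be sketched as follows, assuming the usual formulation in which the weights are the normalized principal eigenvector of a reciprocal pairwise-comparison matrix. The three-item matrix is hypothetical and does not reproduce the physiatrists' judgments or the six London handicap scale items; power iteration is used here only to keep the sketch dependency-free.

```python
def ahp_weights(matrix: list, iterations: int = 100) -> list:
    """Approximate the principal eigenvector of a pairwise-comparison matrix.

    matrix[i][j] states how much more important item i is than item j,
    with matrix[j][i] == 1 / matrix[i][j]. The returned weights sum to 1.
    """
    n = len(matrix)
    weights = [1.0 / n] * n
    for _ in range(iterations):
        product = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
        total = sum(product)
        weights = [value / total for value in product]
    return weights


if __name__ == "__main__":
    # Hypothetical judgments: item 0 is twice as important as item 1
    # and six times as important as item 2.
    comparisons = [
        [1.0, 2.0, 6.0],
        [1 / 2, 1.0, 3.0],
        [1 / 6, 1 / 3, 1.0],
    ]
    print([round(w, 3) for w in ahp_weights(comparisons)])
```

A weighted total score is then the sum of each item score multiplied by its weight. Real judgments are rarely perfectly consistent, so AHP applications typically also report a consistency ratio, which this sketch omits.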
Validity of a Protocol for Adult Self-Report of Dyslexia and Related Difficulties

ERIC Educational Resources Information Center

Snowling, Margaret; Dawes, Piers; Nash, Hannah; Hulme, Charles

2012-01-01

Background: There is an increased prevalence of reading and related difficulties in children of dyslexic parents. In order to understand the causes of these difficulties, it is important to quantify the risk factors passed from parents to their offspring. Method: 417 adults completed a protocol comprising a 15-item questionnaire rating reading and…
Influence of dominant- as compared with nondominant-side symptoms on Disabilities of the Arm, Shoulder and Hand and Western Ontario Rotator Cuff scores in patients with rotator cuff tendinopathy.

PubMed

Christiansen, David Høyrup; Michener, Lori; Roy, Jean-Sébastien

2018-02-13

The Disabilities of the Arm, Shoulder and Hand (DASH) questionnaire and the Western Ontario Rotator Cuff (WORC) index are 2 widely used patient-reported questionnaires in individuals with rotator cuff (RC) tendinopathy. In contrast to the WORC index, for which the items are specific to the affected shoulder, the items of the DASH questionnaire assess the ability to perform activities regardless of the arm used. The objective of this study is to determine whether scores on the DASH questionnaire and WORC index are affected if the symptoms are on the dominant or nondominant side in individuals with RC tendinopathy. Given the number of items that can be influenced by dominance, the hypothesis is that DASH scores will be impacted by the side of the symptoms. Individuals with RC tendinopathy (N = 149) completed questions on symptomatology and hand dominance, the DASH questionnaire, and the WORC index. Differences in total scores (independent t test) and single items (Wilcoxon rank sum test) were compared between groups of participants with dominant-side symptoms and those without dominant-side symptoms. No significant differences were observed for WORC or DASH total scores when comparing participants with and without symptoms on their dominant side. Single-item comparison revealed more items being affected by symptom side on the DASH questionnaire (6 of 30 items) than on the WORC index (2 of 21 items). The side of the symptoms does not influence the DASH and WORC total scores, as there are no systematic differences between individuals with and without symptoms in their dominant shoulder. However, the presence of dominant symptoms does influence item scores more on the DASH questionnaire than on the WORC index. Copyright © 2018 Journal of Shoulder and Elbow Surgery Board of Trustees. Published by Elsevier Inc. All rights reserved.
An Investigation of the Performance of the Generalized S-X[superscript 2] Item-Fit Index for Polytomous IRT Models. ACT Research Report Series, 2007-1

ERIC Educational Resources Information Center

Kang, Taehoon; Chen, Troy T.

2007-01-01

Orlando and Thissen (2000, 2003) proposed an item-fit index, S-X[superscript 2], for dichotomous item response theory (IRT) models, which has performed better than traditional item-fit statistics such as Yen's (1981) Q[subscript 1] and McKinley and Mill's (1985) G[superscript 2]. This study extends the utility of S-X[superscript 2] to polytomous…
Instrument validation process: a case study using the Paediatric Pain Knowledge and Attitudes Questionnaire.

PubMed

Peirce, Deborah; Brown, Janie; Corkish, Victoria; Lane, Marguerite; Wilson, Sally

2016-06-01

To compare two methods of calculating interrater agreement while determining content validity of the Paediatric Pain Knowledge and Attitudes Questionnaire for use with Australian nurses. Paediatric pain assessment and management documentation was found to be suboptimal revealing a need to assess paediatric nurses' knowledge and attitude to pain. The Paediatric Pain Knowledge and Attitudes Questionnaire was selected as it had been reported as valid and reliable in the United Kingdom with student nurses. The questionnaire required content validity determination prior to use in the Australian context. A two phase process of expert review. Ten paediatric nurses completed a relevancy rating of all 68 questionnaire items. In phase two, five pain experts reviewed the items of the questionnaire that scored an unacceptable item level content validity. Item and scale level content validity indices and intraclass correlation coefficients were calculated. In phase one, 31 items received an item level content validity index <0·78 and the scale level content validity index average was 0·80 which were below levels required for acceptable validity. The intraclass correlation coefficient was 0·47. In phase two, 10 items were amended and four items deleted. The revised questionnaire provided a scale level content validity index average >0·90 and an intraclass correlation coefficient of 0·94 demonstrating excellent agreement between raters therefore acceptable content validity. Equivalent outcomes were achieved using the content validity index and the intraclass correlation coefficient. To assess content validity the content validity index has the advantage of providing an item level score and is a simple calculation. The intraclass correlation coefficient requires statistical knowledge, or support, and has the advantage of accounting for the possibility of chance agreement. © 2016 John Wiley & Sons Ltd.
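The content validity index described above is, as the abstract notes, a simple calculation. A minimal sketch, assuming the common convention that experts rate relevance on a 4-point scale and that ratings of 3 or 4 count as "relevant"; the abstract does not state the rating scale, so that threshold, the function names, and the ratings below are assumptions for illustration.

```python
def item_cvi(ratings: list, relevant_min: int = 3) -> float:
    """Item-level CVI: share of experts rating the item as relevant."""
    if not ratings:
        raise ValueError("at least one rating is required")
    return sum(r >= relevant_min for r in ratings) / len(ratings)


def scale_cvi_average(rating_matrix: list, relevant_min: int = 3) -> float:
    """Scale-level CVI (averaging method): mean of the item-level CVIs."""
    item_values = [item_cvi(row, relevant_min) for row in rating_matrix]
    return sum(item_values) / len(item_values)


if __name__ == "__main__":
    # Hypothetical ratings: one row per item, one column per expert.
    ratings = [
        [4, 4, 3, 4, 2],
        [1, 2, 3, 4, 4],
        [4, 4, 4, 4, 4],
    ]
    for row in ratings:
        print(item_cvi(row))
    print(round(scale_cvi_average(ratings), 2))
```

Items whose item-level value falls below the chosen cutoff (0·78 in the abstract) would be the ones sent for revision or expert review.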
TEDS-M 2008 User Guide for the International Database. Supplement 4: TEDS-M Released Mathematics and Mathematics Pedagogy Knowledge Assessment Items

ERIC Educational Resources Information Center

Brese, Falk, Ed.

2012-01-01

The goal for selecting the released set of test items was to have approximately 25% of each of the full item sets for mathematics content knowledge (MCK) and mathematics pedagogical content knowledge (MPCK) that would represent the full range of difficulty, content, and item format used in the TEDS-M study. The initial step in the selection was to…
Development of the Computer-Adaptive Version of the Late-Life Function and Disability Instrument

PubMed Central

Tian, Feng; Kopits, Ilona M.; Moed, Richard; Pardasaney, Poonam K.; Jette, Alan M.

2012-01-01

Background. Having psychometrically strong disability measures that minimize response burden is important in assessing older adults. Methods. Using the original 48 items from the Late-Life Function and Disability Instrument and newly developed items, a 158-item Activity Limitation and a 62-item Participation Restriction item pool were developed. The item pools were administered to a convenience sample of 520 community-dwelling adults 60 years or older. Confirmatory factor analysis and item response theory were employed to identify content structure, calibrate items, and build the computer-adaptive tests (CATs). We evaluated real-data simulations of 10-item CAT subscales. We collected data from 102 older adults to validate the 10-item CATs against the Veteran’s Short Form-36 and assessed test–retest reliability in a subsample of 57 subjects. Results. Confirmatory factor analysis revealed a bifactor structure, and multi-dimensional item response theory was used to calibrate an overall Activity Limitation Scale (141 items) and an overall Participation Restriction Scale (55 items). Fit statistics were acceptable (Activity Limitation: comparative fit index = 0.95, Tucker Lewis Index = 0.95, root mean square error approximation = 0.03; Participation Restriction: comparative fit index = 0.95, Tucker Lewis Index = 0.95, root mean square error approximation = 0.05). Correlations of 10-item CATs with full item banks were substantial (Activity Limitation: r = .90; Participation Restriction: r = .95). Test–retest reliability estimates were high (Activity Limitation: r = .85; Participation Restriction r = .80). Strength and pattern of correlations with Veteran’s Short Form-36 subscales were as hypothesized. Each CAT, on average, took 3.56 minutes to administer. Conclusions. The Late-Life Function and Disability Instrument CATs demonstrated strong reliability, validity, accuracy, and precision. The Late-Life Function and Disability Instrument CAT can achieve psychometrically sound disability assessment in older persons while reducing respondent burden. Further research is needed to assess their ability to measure change in older adults. PMID:22546960
Validation of a clinical critical thinking skills test in nursing.

PubMed

Shin, Sujin; Jung, Dukyoo; Kim, Sungeun

2015-01-27

The purpose of this study was to develop a revised version of the clinical critical thinking skills test (CCTS) and to subsequently validate its performance. This study is a secondary analysis of the CCTS. Data were obtained from a convenience sample of 284 college students in June 2011. Thirty items were analyzed using item response theory and test reliability was assessed. Test-retest reliability was measured using the results of 20 nursing college and graduate school students in July 2013. The content validity of the revised items was analyzed by calculating the degree of agreement between instrument developer intention in item development and the judgments of six experts. To analyze response process validity, qualitative data related to the response processes of nine nursing college students obtained through cognitive interviews were analyzed. Out of the initial 30 items, 11 items were excluded after the analysis of the difficulty and discrimination parameters. When the 19 items of the revised version of the CCTS were analyzed, levels of item difficulty were found to be relatively low and levels of discrimination were found to be appropriate or high. The degree of agreement between item developer intention and expert judgments equaled or exceeded 50%. From the above results, evidence of the response process validity was demonstrated, indicating that subjects responded as intended by the test developer. The revised 19-item CCTS was found to have sufficient reliability and validity and therefore represents a more convenient measurement of critical thinking ability.
Validation of a clinical critical thinking skills test in nursing

PubMed Central

2015-01-01

Purpose: The purpose of this study was to develop a revised version of the clinical critical thinking skills test (CCTS) and to subsequently validate its performance. Methods: This study is a secondary analysis of the CCTS. Data were obtained from a convenience sample of 284 college students in June 2011. Thirty items were analyzed using item response theory and test reliability was assessed. Test-retest reliability was measured using the results of 20 nursing college and graduate school students in July 2013. The content validity of the revised items was analyzed by calculating the degree of agreement between instrument developer intention in item development and the judgments of six experts. To analyze response process validity, qualitative data related to the response processes of nine nursing college students obtained through cognitive interviews were analyzed. Results: Out of the initial 30 items, 11 items were excluded after the analysis of the difficulty and discrimination parameters. When the 19 items of the revised version of the CCTS were analyzed, levels of item difficulty were found to be relatively low and levels of discrimination were found to be appropriate or high. The degree of agreement between item developer intention and expert judgments equaled or exceeded 50%. Conclusion: From the above results, evidence of the response process validity was demonstrated, indicating that subjects responded as intended by the test developer. The revised 19-item CCTS was found to have sufficient reliability and validity and therefore represents a more convenient measurement of critical thinking ability. PMID:25622716
Identifying Measurement Disturbance Effects Using Rasch Item Fit Statistics and the Logit Residual Index.

ERIC Educational Resources Information Center

Mount, Robert E.; Schumacker, Randall E.

1998-01-01

A Monte Carlo study was conducted using simulated dichotomous data to determine the effects of guessing on Rasch item fit statistics and the Logit Residual Index. Results indicate that no significant differences were found between the mean Rasch item fit statistics for each distribution type as the probability of guessing the correct answer…
Anxiety and depression in chronic hemodialysis: some somatopsychic determinants.

PubMed

Jadoulle, V; Hoyois, P; Jadoul, M

2005-02-01

Depression and anxiety are so common in hemodialysis (HD) patients that we found it useful to study the respective contributions of the subjective somatic sensations and of the objective medical comorbidity to psychological distress. We also hypothesized that denial has a protective effect against anxiety and depression, and that alexithymia is, on the contrary, a risk factor. In a cross-sectional design, we investigated relationships between psychological distress and somatic complaints, Charlson comorbidity index, denial and alexithymia, in a group of 54 patients on in-center HD. They filled in psychometric self-rated questionnaires (State Anxiety Inventory, Hospital Anxiety and Depression Scale, 13-item Short Beck Depression Inventory, Kidney Disease Quality of Life Short Form, 20-item Toronto Alexithymia Scale). A principal component analysis allowed us to focus on HADS-total score, which was confirmed to be representative of anxio-depression. Then, correlational analyses and a stepwise regression analysis were performed. HADS-total score is inversely associated with the use of denial as a psychological defence mechanism (p < 0.001), and positively correlated with difficulties in identifying emotions (p < 0.001), with difficulties in expressing feelings (p < 0.05), and with the intensity of subjective somatic complaints (p < 0.001). On the contrary, it is not related to the somatic comorbidity. In the stepwise regression, the somatic complaints, the denial and the difficulties in recognizing emotions emerge as the three main variables related to the HADS-total score (p < 0.001). Subjective physical complaints are here associated with psychological distress in chronic HD patients, while objective organic comorbidity does not seem to influence their mood and anxiety status. Denial is an efficient coping style against negative emotions, but it can diminish compliance. So, the subjective perception of the disease seems to have an important impact on the anxiety and mood levels, which can also be influenced by the emotional regulation abilities.
A Comparison of Different Psychometric Approaches to Modeling Testlet Structures: An Example with C-Tests

ERIC Educational Resources Information Center

Schroeders, Ulrich; Robitzsch, Alexander; Schipolowski, Stefan

2014-01-01

C-tests are a specific variant of cloze tests that are considered time-efficient, valid indicators of general language proficiency. They are commonly analyzed with models of item response theory assuming local item independence. In this article we estimated local interdependencies for 12 C-tests and compared the changes in item difficulties,…
Demonstrating the Difference between Classical Test Theory and Item Response Theory Using Derived Test Data

ERIC Educational Resources Information Center

Magno, Carlo

2009-01-01

The present report demonstrates the difference between the classical test theory (CTT) and item response theory (IRT) approaches using actual test data for chemistry junior high school students. The CTT and IRT were compared across two samples and two forms of test on their item difficulty, internal consistency, and measurement errors. The specific…
The Accuracy of Estimated Total Test Statistics. Final Report.

ERIC Educational Resources Information Center

Kleinke, David J.

In a post-mortem study of item sampling, 1,050 examinees were divided into ten groups 50 times. Each time, their papers were scored on four different sets of item samples from a 150-item test of academic aptitude. These samples were selected using (a) unstratified random sampling and stratification on (b) content, (c) difficulty, and (d) both.…

Are Faculty Predictions or Item Taxonomies Useful for Estimating the Outcome of Multiple-Choice Examinations?

ERIC Educational Resources Information Center

Kibble, Jonathan D.; Johnson, Teresa

2011-01-01

The purpose of this study was to evaluate whether multiple-choice item difficulty could be predicted either by a subjective judgment by the question author or by applying a learning taxonomy to the items. Eight physiology faculty members teaching an upper-level undergraduate human physiology course consented to participate in the study. The…
Applying Item Response Theory to the Development of a Screening Adaptation of the Goldman-Fristoe Test of Articulation-Second Edition

ERIC Educational Resources Information Center

Brackenbury, Tim; Zickar, Michael J.; Munson, Benjamin; Storkel, Holly L.

2017-01-01

Purpose: Item response theory (IRT) is a psychometric approach to measurement that uses latent trait abilities (e.g., speech sound production skills) to model performance on individual items that vary by difficulty and discrimination. An IRT analysis was applied to preschoolers' productions of the words on the Goldman-Fristoe Test of…
The Effect of Sequential Dependence on the Sampling Distributions of KR-20, KR-21, and Split-Halves Reliabilities.

ERIC Educational Resources Information Center

Sullins, Walter L.

Five-hundred dichotomously scored response patterns were generated with sequentially independent (SI) items and 500 with dependent (SD) items for each of thirty-six combinations of sampling parameters (i.e., three test lengths, three sample sizes, and four item difficulty distributions). KR-20, KR-21, and Split-Half (S-H) reliabilities were…
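For reference, the two Kuder-Richardson coefficients named above can be sketched as follows. This is a minimal illustration using the standard formulas, assuming population variance for the total scores (some texts use the sample variance instead); the response matrix and function names are hypothetical and unrelated to the simulated data in the report.

```python
from statistics import mean, pvariance


def kr20(responses: list) -> float:
    """KR-20 for a matrix of 0/1 responses (rows: examinees, columns: items)."""
    k = len(responses[0])
    totals = [sum(row) for row in responses]
    variance = pvariance(totals)
    if variance == 0:
        raise ValueError("total scores have zero variance")
    # Sum over items of p * (1 - p), where p is the proportion correct.
    pq_sum = 0.0
    for j in range(k):
        p = mean(row[j] for row in responses)
        pq_sum += p * (1.0 - p)
    return (k / (k - 1)) * (1.0 - pq_sum / variance)


def kr21(responses: list) -> float:
    """KR-21: simplification of KR-20 that assumes equal item difficulty."""
    k = len(responses[0])
    totals = [sum(row) for row in responses]
    variance = pvariance(totals)
    if variance == 0:
        raise ValueError("total scores have zero variance")
    m = mean(totals)
    return (k / (k - 1)) * (1.0 - m * (k - m) / (k * variance))


if __name__ == "__main__":
    data = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]  # hypothetical responses
    print(kr20(data), kr21(data))
```

Because KR-21 treats all items as equally difficult, it does not exceed KR-20 and falls below it when item difficulties vary, which is one reason the distribution of item difficulties matters in studies of this kind.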
Exploring Alternative Conceptions from Newtonian Dynamics and Simple DC Circuits: Links between Item Difficulty and Item Confidence

ERIC Educational Resources Information Center

Planinic, Maja; Boone, William J.; Krsnik, Rudolf; Beilfuss, Meredith L.

2006-01-01

Croatian 1st-year and 3rd-year high-school students (N = 170) completed a conceptual physics test. Students were evaluated with regard to two physics topics: Newtonian dynamics and simple DC circuits. Students answered test items and also indicated their confidence in each answer. Rasch analysis facilitated the calculation of three linear…
A measure of early physical functioning (EPF) post-stroke.

PubMed

Finch, Lois E; Higgins, Johanne; Wood-Dauphinee, Sharon; Mayo, Nancy E

2008-07-01

To develop a comprehensive measure of Early Physical Functioning (EPF) post-stroke quantified through Rasch analysis and conceptualized using the International Classification of Functioning Disability and Health (ICF). An observational cohort study. A cohort of 262 subjects (mean age 71.6 (standard deviation 12.5) years) hospitalized post-acute stroke. Functional assessments were made within 3 days of stroke with items from valid and reliable indices commonly utilized to evaluate stroke survivors. Information on important variables was also collected. Principal component and Rasch analysis confirmed the factor structure and dimensionality of the measure. Rasch analysis combined items across ICF components to develop the measure. Items were deleted iteratively; those retained fit the model and were related to the construct; reliability and validity were assessed. A 38-item unidimensional measure of the EPF met all Rasch model requirements. The item difficulty matched the person ability (mean person measure: -0.31; standard error 0.37 logits); reliability of the person-item-hierarchy was excellent at 0.97. Initial validity was adequate. The 38-item EPF measure was developed. It expands the range of assessment post acute stroke; it covers a broad spectrum of difficulty with good initial psychometric properties that, once revalidated, can assist in planning and evaluating early interventions.
Everyday technology use among people with mental retardation: relevance, perceived difficulty, and influencing factors.

PubMed

Hällgren, Monica; Nygård, Louise; Kottorp, Anders

2014-05-01

While the development and possibilities of technology today are commonly regarded to be unlimited, knowledge regarding the technological needs of people with mental retardation is fairly limited. The aim of this study was to enhance knowledge of perceived relevance and difficulty in using everyday technology (ET) such as stoves, cell phones, and elevators in adults with mental retardation. 120 participants with different levels of mental retardation were interviewed with the Everyday Technology Use Questionnaire (ETUQ) about their use of such technologies in their everyday life. Analyses of variance, post hoc tests, and regression analyses were used to explore the data. Participants with moderate and severe mental retardation differed in mean perceived difficulty from those with mild mental retardation, suggesting that increased perceived difficulty in ET use is related to the level of mental retardation. Differences between groups were also found in the proportion of items that were relevant for each person. The variables Level of Mental Retardation, Additional Disabilities, and Proportional Relevance of ET Items could together predict 67.2% of the variation in perceived difficulty in technology use. The findings also indicate that age, housing, gender, and geographical district do not covary with perceived difficulty in ET use.
BCQ+: a body constitution questionnaire to assess Yang-Xu. Part I: establishment of a first final version through a Delphi process.

PubMed

Su, Yi-Chang; Chen, Li-Li; Lin, Jun-Dai; Lin, Jui-Shan; Huang, Yi-Chia; Lai, Jim-Shoung

2008-12-01

Assessing an individual's level of Yang deficiency (Yang-Xu) by its manifestations is a frequent issue in traditional Chinese medicine (TCM) clinical trials. To this end, an objective, reliable and rigorous diagnostic tool is required. This study aimed to develop a first final version of the Yang-Xu Constitution Questionnaire. We conducted 3 steps to develop such an objective measurement tool: 1) the research team was formed and a panel of 26 experts was selected for the Delphi process; 2) items for the questionnaire were generated by literature review and a Delphi process; items were reworded into colloquial questions; face and content validity of the items were evaluated through a Delphi process again; 3) the difficulty of the questionnaire was evaluated in a pilot study with 81 subjects aged 20-60 years. The literature review retrieved 35 relevant items which matched the definition of 'constitution' and 'Yang-Xu'. After a first Delphi process, 22 items were retained and translated into colloquial questions. According to the second part of the Delphi process, the content validity index of each of the 22 questions ranged between 0.85 and 1. These 22 questions were evaluated by 81 subjects; 2 questions that were hard to tell apart were combined, and 3 questions were modified after the research team had discussed the participants' feedback. Finally, the questionnaire was established with 21 questions. This first final version of a questionnaire to assess Yang-Xu constitution with considerable face and content validity may serve as a basis to develop an advanced Yang-Xu questionnaire. 2008 S. Karger AG, Basel.
How well do TTM measures work among a sample of individuals with unhealthy alcohol use that is characterized by low readiness to change?

PubMed

Baumann, Sophie; Gaertner, Beate; Schnuerer, Inga; Bischof, Gallus; John, Ulrich; Freyer-Adam, Jennis

2013-09-01

Little is known about the applicability of the transtheoretical model of intentional behavior change (TTM) to individuals with unhealthy alcohol use that is primarily characterized by low readiness to change. This study examined the psychometric properties of short measures by assessing three core constructs of the TTM: the 20-item Processes of Change (POC-20) scale, and short versions of the Alcohol Decisional Balance Scale (ADBS) and the Alcohol Abstinence Self-Efficacy (AASE) scale. A sample of 427 individuals with unhealthy alcohol use (Mage = 30 years, 65% men), identified at job agencies in northeastern Germany, completed all three scales. Item difficulty (d), selectivity (rit), and Cronbach's alpha were calculated. Confirmatory factor analyses were used to test for construct validity and latent mean differences across the stages. The psychometric properties of the 8-item AASE were adequate (d range: 0.59-0.78; rit range: 0.59-0.68; α range: 0.74-0.81), except for one subscale. Most items of the POC-20 and the 10-item ADBS were difficult (dPOC range: 0.08-0.40; dADBS range: 0.21-0.58); selectivity (ritPOC range: 0.26-0.62; ritADBS range: 0.34-0.68) and internal consistency (αPOC range: 0.41-0.76; αADBS range: 0.64-0.78) were low to moderate. Construct validity was acceptable (Comparative Fit Index range: 0.95-0.99). The association between stages and TTM constructs partially followed expected patterns. Suggestions for modifications of TTM measures are discussed for better applicability among proactively recruited samples of individuals with unhealthy alcohol use and with primarily low readiness to change. (PsycINFO Database Record (c) 2013 APA, all rights reserved).
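The three statistics named in this record (item difficulty d, selectivity rit, and Cronbach's alpha) can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes items are scored from 0 to a common maximum, takes difficulty as the mean item score divided by that maximum, and takes selectivity as the corrected item-total correlation (item versus the sum of the remaining items). All names and data are hypothetical.

```python
from statistics import mean, pvariance


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def item_statistics(scores, max_score):
    """Return (difficulty list, selectivity list, Cronbach's alpha)."""
    k = len(scores[0])
    totals = [sum(row) for row in scores]
    items = [[row[i] for row in scores] for i in range(k)]
    # Difficulty: mean score as a share of the maximum (low = hard to endorse).
    difficulty = [mean(col) / max_score for col in items]
    # Selectivity: correlate each item with the total of the other items.
    selectivity = [
        pearson(col, [t - c for t, c in zip(totals, col)]) for col in items
    ]
    # Cronbach's alpha from item variances and total-score variance.
    alpha = (k / (k - 1)) * (
        1 - sum(pvariance(col) for col in items) / pvariance(totals)
    )
    return difficulty, selectivity, alpha


# Hypothetical responses: five respondents, three items scored 0-3.
scores = [[0, 1, 0], [1, 1, 0], [1, 2, 1], [2, 2, 1], [2, 3, 2]]
d, r_it, alpha = item_statistics(scores, max_score=3)
```

Under this convention a low d, such as the 0.08 reported for some POC-20 items, marks an item that few respondents endorse.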
Building an Evaluation Scale using Item Response Theory.

PubMed

Lalor, John P; Wu, Hao; Yu, Hong

2016-11-01

Evaluation of NLP methods requires testing against a previously vetted gold-standard test set and reporting standard metrics (accuracy/precision/recall/F1). The current assumption is that all items in a given test set are equal with regards to difficulty and discriminating power. We propose Item Response Theory (IRT) from psychometrics as an alternative means for gold-standard test-set generation and NLP system evaluation. IRT is able to describe characteristics of individual items - their difficulty and discriminating power - and can account for these characteristics in its estimation of human intelligence or ability for an NLP task. In this paper, we demonstrate IRT by generating a gold-standard test set for Recognizing Textual Entailment. By collecting a large number of human responses and fitting our IRT model, we show that our IRT model compares NLP systems with the performance in a human population and is able to provide more insight into system performance than standard evaluation metrics. We show that a high accuracy score does not always imply a high IRT score, which depends on the item characteristics and the response pattern.
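To make the last point of this abstract concrete, here is a small sketch under a two-parameter logistic (2PL) model, in which each item has a discrimination a and a difficulty b. The choice of the 2PL form, the grid-search estimator, and the item parameters are assumptions for illustration only; the record does not say which IRT model or estimation method the authors used.

```python
import math


def p_correct(theta, a, b):
    """2PL probability that ability theta answers an item (a, b) correctly."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def log_likelihood(theta, items, responses):
    """Log-likelihood of a 0/1 response pattern at ability theta."""
    total = 0.0
    for (a, b), u in zip(items, responses):
        p = p_correct(theta, a, b)
        total += math.log(p) if u == 1 else math.log(1.0 - p)
    return total


def estimate_ability(items, responses, lo=-4.0, hi=4.0, step=0.01):
    """Maximum-likelihood ability by brute-force grid search (illustrative)."""
    n_steps = int(round((hi - lo) / step))
    grid = [lo + i * step for i in range(n_steps + 1)]
    return max(grid, key=lambda t: log_likelihood(t, items, responses))


# Hypothetical items: two weakly and two strongly discriminating, same difficulty.
items = [(0.5, 0.0), (0.5, 0.0), (2.0, 0.0), (2.0, 0.0)]
theta_weak = estimate_ability(items, [1, 1, 0, 0])    # correct on weak items only
theta_strong = estimate_ability(items, [0, 0, 1, 1])  # correct on strong items only
```

Both response patterns have the same accuracy (2 of 4), yet the pattern that gets the strongly discriminating items right receives the higher ability estimate, which is the sense in which accuracy and an IRT score can diverge.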
Building an Evaluation Scale using Item Response Theory

PubMed Central

Lalor, John P.; Wu, Hao; Yu, Hong

2016-01-01

Evaluation of NLP methods requires testing against a previously vetted gold-standard test set and reporting standard metrics (accuracy/precision/recall/F1). The current assumption is that all items in a given test set are equal with regards to difficulty and discriminating power. We propose Item Response Theory (IRT) from psychometrics as an alternative means for gold-standard test-set generation and NLP system evaluation. IRT is able to describe characteristics of individual items - their difficulty and discriminating power - and can account for these characteristics in its estimation of human intelligence or ability for an NLP task. In this paper, we demonstrate IRT by generating a gold-standard test set for Recognizing Textual Entailment. By collecting a large number of human responses and fitting our IRT model, we show that our IRT model compares NLP systems with the performance in a human population and is able to provide more insight into system performance than standard evaluation metrics. We show that a high accuracy score does not always imply a high IRT score, which depends on the item characteristics and the response pattern. PMID:28004039
Using wound care algorithms: a content validation study.

PubMed

Beitz, J M; van Rijswijk, L

1999-09-01

Valid and reliable heuristic devices facilitating optimal wound care are lacking. The objectives of this study were to establish content validation data for a set of wound care algorithms, to identify their associated strengths and weaknesses, and to gain insight into the wound care decision-making process. Forty-four registered nurse wound care experts were surveyed and interviewed at national and regional educational meetings. Using a cross-sectional study design and an 83-item, 4-point Likert-type scale, this purposive sample was asked to quantify the degree of validity of the algorithms' decisions and components. Participants' comments were tape-recorded, transcribed, and themes were derived. On a scale of 1 to 4, the mean score of the entire instrument was 3.47 (SD +/- 0.87), the instrument's Content Validity Index was 0.86, and the individual Content Validity Index of 34 of 44 participants was > 0.8. Item scores were lower for those related to packing deep wounds (P < .001). No other significant differences were observed. Qualitative data analysis revealed themes of difficulty associated with wound assessment and care issues, that is, the absence of valid and reliable definitions. The wound care algorithms studied proved valid. However, the lack of valid and reliable wound assessment and care definitions hinders optimal use of these instruments. Further research documenting their clinical use is warranted. Research-based practice recommendations should direct the development of future valid and reliable algorithms designed to help nurses provide optimal wound care.
Validation of the Brazilian version of the 'Spanish Burnout Inventory' in teachers.

PubMed

Gil-Monte, Pedro R; Carlotto, Mary Sandra; Câmara, Sheila Gonçalves

2010-02-01

To assess factorial validity and internal consistency of the Brazilian version of the 'Spanish Burnout Inventory' (SBI). The translation process of the SBI into Brazilian Portuguese included translation, back translation, and semantic equivalence. A confirmatory factor analysis was carried out using a four-factor model, which was similar to the original SBI. The sample consisted of 714 teachers working in schools in the metropolitan area of the city of Porto Alegre, Southern Brazil, in 2008. The instrument comprises 20 items and four subscales: Enthusiasm towards job (5 items), Psychological exhaustion (4 items), Indolence (6 items), and Guilt (5 items). The model was analyzed using LISREL 8. Goodness-of-Fit statistics showed that the hypothesized model had adequate fit: χ²(164) = 605.86 (p<0.000); Goodness-of-Fit Index = 0.92; Adjusted Goodness-of-Fit Index = 0.90; Root Mean Square Error of Approximation = 0.062; Nonnormed Fit Index = 0.91; Comparative Fit Index = 0.92; and Parsimony Normed Fit Index = 0.77. Cronbach's alpha measures for all subscales were higher than 0.70. The study showed that the SBI has adequate factorial validity and internal consistency to assess burnout in Brazilian teachers.
Rasch measurement: the Arm Activity measure (ArmA) passive function sub-scale.

PubMed

Ashford, Stephen; Siegert, Richard J; Alexandrescu, Roxana

2016-01-01

To evaluate the conformity of the Arm Activity measure (ArmA) passive function sub-scale to the Rasch model. A consecutive cohort of patients (n = 92) undergoing rehabilitation, including upper limb rehabilitation and spasticity management, at two specialist rehabilitation units were included. Rasch analysis was used to examine scaling and conformity to the model. Responses were analysed using Rasch unidimensional measurement models (RUMM 2030). The following aspects were considered: overall model and individual item fit statistics and fit residuals, internal reliability, item response threshold ordering, item bias, local dependency and unidimensionality. ArmA contains both active and passive function sub-scales, but in this analysis only the passive function sub-scale was considered. Four of the seven items in the ArmA passive function sub-scale initially had disordered thresholds. These items were rescored to four response options, which resulted in ordered thresholds for all items. Once the items with disordered thresholds had been rescored, item bias was not identified for age, global disability level or diagnosis, but with a small difference in difficulty between males and females for one item of the scale. Local dependency was not observed and the unidimensionality of the sub-scale was supported and good fit to the Rasch model was identified. The person separation index (PSI) was 0.95 indicating that the scale is able to reliably differentiate at least two groups of patients. The ArmA passive function sub-scale was shown in this evaluation to conform to the Rasch model once disordered thresholds had been addressed. Using the logit scores produced by the Rasch model it was possible to convert this back to the original scale range. 
Implications for Rehabilitation: The ArmA passive function sub-scale was shown, in this evaluation, to conform to the Rasch model once disordered thresholds had been addressed and therefore to be a clinically applicable and potentially useful hierarchical measure. Using Rasch logit scores it has been possible to convert back to the original ordinal scale range and provide an indication of real change to enable evaluation of clinical outcome of importance to patients and clinicians.
Measuring student learning using initial and final concept test in an STEM course

NASA Astrophysics Data System (ADS)

Kaw, Autar; Yalcin, Ali

2012-06-01

Effective assessment is a cornerstone in measuring student learning in higher education. For a course in Numerical Methods, a concept test was used as an assessment tool to measure student learning and its improvement during the course. The concept test comprised 16 multiple-choice questions and was given at the beginning and end of the course for three semesters. Values of Hake's gain index, a measure of learning gains from pre- to post-test, of 0.36 to 0.41 were recorded. The validity and reliability of the concept test were checked via standard measures such as Cronbach's alpha, content and criterion-related validity, item characteristic curves and difficulty and discrimination indices. The performance of various subgroups defined by pre-requisite grades, transfer status, gender and age was also studied.
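For readers unfamiliar with Hake's gain index, a minimal sketch follows. It uses the usual normalized-gain definition, (post - pre) / (100 - pre) with scores expressed as percentages; the function name and the example scores are hypothetical and are not data from the study.

```python
def hake_gain(pre_percent, post_percent):
    """Normalized gain: share of the possible improvement actually achieved."""
    if not 0 <= pre_percent < 100:
        raise ValueError("pre-test score must be in [0, 100)")
    return (post_percent - pre_percent) / (100.0 - pre_percent)


# Hypothetical class averages: 40% on the pre-test, 64% on the post-test.
example_gain = hake_gain(40, 64)
```

In this made-up case the class closes 24 of the 60 percentage points available to it, giving a gain of 0.4, which is of the same order as the 0.36 to 0.41 reported above.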
Influence of cognitive function on quality of life in anorexia nervosa patients.

PubMed

Hamatani, Sayo; Tomotake, Masahito; Takeda, Tomoya; Kameoka, Naomi; Kawabata, Masashi; Kubo, Hiroko; Tada, Yukio; Tomioka, Yukiko; Watanabe, Shinya; Inoshita, Masatoshi; Kinoshita, Makoto; Ohta, Masashi; Ohmori, Tetsuro

2017-05-01

The purpose of this study was to elucidate determinants of quality of life (QOL) in anorexia nervosa (AN) patients. Twenty-one female patients with AN participated in the study. QOL was assessed with the 36-Item Short Form Health Survey (SF-36), and cognitive function was evaluated using the Wisconsin Card Sorting Test Keio version, the Rey Complex Figure Test, and the Social Cognition Screening Questionnaire. Clinical symptoms were evaluated with the Beck Depression Inventory-II, the State-Trait Anxiety Inventory-Form JYZ (STAI-JYZ), and the Maudsley Obsessive Compulsive Inventory. The Difficulty Maintaining Set score of the Wisconsin Card Sorting Test Keio version was negatively correlated with the SF-36 Physical Component Summary. Scores of the Beck Depression Inventory-II and the STAI-JYZ State and Trait were negatively correlated with the SF-36 Mental Component Summary (MCS), and the Central Coherence Index 30-min Delayed Recall score of the Rey Complex Figure Test was positively correlated with the MCS. Stepwise regression analysis showed that the Difficulty Maintaining Set score was an independent predictor of the Physical Component Summary and that scores for Central Coherence Index 30-min Delayed Recall and the STAI-JYZ Trait predicted the MCS. These results suggest that not only trait anxiety but also poor central coherence and an impaired ability to maintain a new rule worsen AN patients' QOL. © 2016 The Authors. Psychiatry and Clinical Neurosciences © 2016 Japanese Society of Psychiatry and Neurology.
Comparison of Factor Simplicity Indices for Dichotomous Data: DETECT R, Bentler's Simplicity Index, and the Loading Simplicity Index

ERIC Educational Resources Information Center

Finch, Holmes; Stage, Alan Kirk; Monahan, Patrick

2008-01-01

A primary assumption underlying several of the common methods for modeling item response data is unidimensionality, that is, test items tap into only one latent trait. This assumption can be assessed several ways, using nonlinear factor analysis and DETECT, a method based on the item conditional covariances. When multidimensionality is identified,…
Rasch Measurement of Collaborative Problem Solving in an Online Environment.

PubMed

Harding, Susan-Marie E; Griffin, Patrick E

2016-01-01

This paper describes an approach to the assessment of human to human collaborative problem solving using a set of online interactive tasks completed by student dyads. Within the dyad, roles were nominated as either A or B and students selected their own roles. The question as to whether role selection affected individual student performance measures is addressed. Process stream data was captured from 3402 students in six countries who explored the problem space by clicking, dragging the mouse, moving the cursor and collaborating with their partner through a chat box window. Process stream data were explored to identify behavioural indicators that represented elements of a conceptual framework. These indicative behaviours were coded into a series of dichotomous items. These items represented actions and chats performed by students. The frequency of occurrence was used as a proxy measure of item difficulty. Then given a measure of item difficulty, student ability could be estimated using the difficulty estimates of the range of items demonstrated by the student. The Rasch simple logistic model was used to review the indicators to identify those that were consistent with the assumptions of the model and were invariant across national samples, language, curriculum and age of the student. The data were analysed using a one and two dimension, one parameter model. Rasch separation reliability, fit to the model, distribution of students and items on the underpinning construct, estimates for each country and the effect of role differences are reported. This study provides evidence that collaborative problem solving can be assessed in an online environment involving human to human interaction using behavioural indicators shown to have a consistent relationship between the estimate of student ability, and the probability of demonstrating the behaviour.
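The two ideas in this abstract, a Rasch (one-parameter logistic) response probability and the use of how often a behaviour occurs as a stand-in for item difficulty, can be sketched as below. Converting a frequency to logits with log((1 - p) / p) is a common rough first approximation and is an assumption here, not the estimation procedure reported by the authors; all names are hypothetical.

```python
import math


def rasch_probability(theta, b):
    """Rasch model: probability that ability theta shows a behaviour of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def difficulty_from_frequency(p):
    """Rough logit difficulty from the proportion of students showing the behaviour."""
    if not 0.0 < p < 1.0:
        raise ValueError("proportion must be strictly between 0 and 1")
    return math.log((1.0 - p) / p)


# Hypothetical indicators: one shown by 80% of students, one by 20%.
common_indicator = difficulty_from_frequency(0.8)
rare_indicator = difficulty_from_frequency(0.2)
```

A rarely observed behaviour gets a higher difficulty, and a student whose ability equals an item's difficulty has a 0.5 probability of demonstrating it.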
Validation of a new measure of availability and accommodation of health care that is valid for rural and urban contexts.

PubMed

Haggerty, Jeannie L; Levesque, Jean-Frédéric

2017-04-01

Patients are the most valid source for evaluating the accessibility of services, but a previous study observed differential psychometric performance of instruments in rural and urban respondents. To validate a measure of organizational accessibility free of differential rural-urban performance that predicts consequences of difficult access for patient-initiated care. Sequential qualitative-quantitative study. Qualitative findings used to adapt or develop evaluative and reporting items. Quantitative validation study. Primary data by telephone from 750 urban, rural and remote respondents in Quebec, Canada; follow-up mailed questionnaire to a subset of 316. Items were developed for barriers along the care trajectory. We used common factor and confirmatory factor analysis to identify constructs and compare models. We used item response theory analysis to test for differential rural-urban performance; examine individual item performance; adjust response options; and exclude redundant or non-discriminatory items. We used logistic regression to examine predictive validity of the subscale on access difficulty (outcome). Initial factor resolution suggested geographic and organizational dimensions, plus consequences of access difficulty. After second administration, organizational accommodation and geographic indicators were integrated into a 6-item subscale of Effective Availability and Accommodation, which demonstrates good variability and internal consistency (α = 0.84) and no differential functioning by geographic area. Each unit increase predicts decreased likelihood of consequences of access difficulties (unmet need and problem aggravation). The new subscale is a practical, valid and reliable measure for patients to evaluate first-contact health services accessibility, yielding valid comparisons between urban and rural contexts. © 2016 The Authors. Health Expectations published by John Wiley & Sons Ltd.
Which dimensions of disability does the HIV Disability Questionnaire (HDQ) measure? A factor analysis.

PubMed

O'Brien, Kelly K; Bayoumi, Ahmed M; Stratford, Paul; Solomon, Patricia

2015-01-01

To assess the dimensions of disability measured by the HIV Disability Questionnaire (HDQ), a newly developed 72-item self-administered questionnaire that describes the presence, severity and episodic nature of disability experienced by people living with HIV. We recruited adults living with HIV from hospital clinics, AIDS service organizations and a specialty hospital and administered the HDQ followed by a demographic questionnaire. We conducted an exploratory factor analysis using disability severity scores to determine the domains of disability in the HDQ. We used the following steps: (a) ensured correlations between items were >0.30 and <0.80; (b) conducted a principal components analysis to extract factors; (c) used the Scree Test and eigenvalue threshold >1.5 to determine the number of factors to retain; and (d) used oblique rotation to simplify the factor loading matrix. We assigned items to factors based on factor loadings of >0.30. Of the 361 participants, 80% were men and 77% reported living with at least two concurrent health conditions in addition to HIV. The exploratory factor analysis suggested retaining six factors. Items related to symptoms and impairments loaded on three factors (physical [20 items], cognitive [3 items], and mental and emotional health [11 items]) and items related to worrying about the future, daily activities, and personal relationships loaded on three additional factors (uncertainty [14 items], difficulties with day-to-day activities [9 items], social inclusion [12 items]). The HDQ has six domains: physical symptoms and impairments; cognitive symptoms and impairments; mental and emotional health symptoms and impairments; uncertainty; difficulties with day-to-day activities and challenges to social inclusion. These domains establish the scoring structure for the dimensions of disability measured by the HDQ.
Implications for Rehabilitation: As individuals live longer and age with HIV, they may be living with the health-related consequences of HIV and concurrent health conditions, a concept that may be termed disability. Measuring disability is important to understand the impact of HIV and its comorbidities. The HIV Disability Questionnaire (HDQ) is a self-administered questionnaire developed to describe the presence, severity and episodic nature of disability experienced by people living with HIV. The HDQ is comprised of six domains of disability including: physical symptoms and impairments (20 items); cognitive symptoms and impairments (3 items); mental and emotional health symptoms and impairments (11 items); uncertainty (14 items); difficulties with day-to-day activities (9 items) and challenges to social inclusion (12 items). These domains represent the dimensions of disability measured by the HDQ. The HDQ is the first known HIV-specific disability measure for adults living with HIV. The HDQ may be used by clinicians and researchers to assess disability experienced by adults living with HIV.
Improvements in throat function and qualities of sore throat from locally applied flurbiprofen 8.75 mg in spray or lozenge format: findings from a randomized trial of patients with upper respiratory tract infection in the Russian Federation

PubMed Central

Burova, Natalia; Bychkova, Valeria; Shephard, Adrian

2018-01-01

Objective: To assess the speed of relief provided by flurbiprofen 8.75 mg spray and lozenge and their effect on many of the different qualities and characteristics of throat pain and discomfort, and the many articulations of the broad term “sore throat” (ST). Patients and methods: Four hundred and forty adults with recent-onset, moderate-to-severe ST due to upper respiratory tract infection (URTI) were randomized to a single dose of either flurbiprofen 8.75 mg spray (n=218) or flurbiprofen 8.75 mg lozenge (n=222). Throat swabs for bacterial culture were taken at baseline. ST relief was assessed at 1 minute, 1 and 2 hours post-dose using the Sore Throat Relief Rating Scale. The change from baseline at 1 and 2 hours post-dose in difficulty swallowing and swollen throat was assessed using the difficulty swallowing scale and the swollen throat scale, respectively. Patients’ experience of URTI symptoms was assessed using a URTI questionnaire at baseline and 2 hours post-dose. The change in Qualities of Sore Throat Index, a 10-item index of qualities of ST, from baseline at 2 hours post-dose was also measured. Results: ST relief was evident in the spray and the lozenge treatment groups at 1 minute, 1 and 2 hours post-dose (P>0.05). In both groups, scores for difficulty swallowing and swollen throat significantly improved at 1 and 2 hours post-dose compared with baseline. At 2 hours post-dose, the number of patients experiencing URTI symptoms that can be attributed to or associated with ST decreased relative to baseline. The mean change from baseline to 2 hours post-dose for each individual score on the Qualities of Sore Throat Index showed significant improvements for flurbiprofen spray and lozenge (all P<0.0001). Conclusion: Non-inferiority was established, and flurbiprofen spray and lozenge provided effective relief from ST pain and many of the other commonly reported qualities of ST.

Improvements in throat function and qualities of sore throat from locally applied flurbiprofen 8.75 mg in spray or lozenge format: findings from a randomized trial of patients with upper respiratory tract infection in the Russian Federation.

PubMed

Burova, Natalia; Bychkova, Valeria; Shephard, Adrian

2018-01-01

To assess the speed of relief provided by flurbiprofen 8.75 mg spray and lozenge and their effect on many of the different qualities and characteristics of throat pain and discomfort, and the many articulations of the broad term "sore throat" (ST). Four hundred and forty adults with recent-onset, moderate-to-severe ST due to upper respiratory tract infection (URTI) were randomized to a single dose of either flurbiprofen 8.75 mg spray (n=218) or flurbiprofen 8.75 mg lozenge (n=222). Throat swabs for bacterial culture were taken at baseline. ST relief was assessed at 1 minute, 1 and 2 hours post-dose using the Sore Throat Relief Rating Scale. The change from baseline at 1 and 2 hours post-dose in difficulty swallowing and swollen throat was assessed using the difficulty swallowing scale and the swollen throat scale, respectively. Patients' experience of URTI symptoms was assessed using a URTI questionnaire at baseline and 2 hours post-dose. The change in Qualities of Sore Throat Index, a 10-item index of qualities of ST, from baseline at 2 hours post-dose was also measured. ST relief was evident in the spray and the lozenge treatment groups at 1 minute, 1 and 2 hours post-dose (P>0.05). In both groups, scores for difficulty swallowing and swollen throat significantly improved at 1 and 2 hours post-dose compared with baseline. At 2 hours post-dose, the number of patients experiencing URTI symptoms that can be attributed to or associated with ST decreased relative to baseline. The mean change from baseline to 2 hours post-dose for each individual score on the Qualities of Sore Throat Index showed significant improvements for flurbiprofen spray and lozenge (all P<0.0001). Non-inferiority was established, and flurbiprofen spray and lozenge provided effective relief from ST pain and many of the other commonly reported qualities of ST.
Overall quality of life and difficulty paying for ostomy supplies in the Veterans Affairs ostomy health-related quality of life study: an exploratory analysis.

PubMed

Coons, Stephen Joel; Chongpison, Yuda; Wendel, Christopher S; Grant, Marcia; Krouse, Robert S

2007-09-01

To explore whether there was a significant relationship between difficulty paying for ostomy supplies and overall quality of life among a sample of ostomates receiving care from the Veterans Health Administration (VHA). The data were collected as part of the Veterans Affairs (VA) Ostomy Health-Related Quality of Life Study, in which 511 respondents (239 cases, 272 controls) completed a survey instrument that included the modified City of Hope Quality of Life (mCOH-QOL) Ostomy questionnaire, SF-36V, and sociodemographic items. Responses from the 239 cases (ie, patients with intestinal stomas) were used in this analysis. The modified City of Hope Quality of Life Ostomy questionnaire item, "How good is your overall quality of life?," was the dependent variable for this analysis. The primary independent variable was the response (yes/no) to the item, "If you pay for any of the (ostomy) costs, is it difficult for you?" A hierarchical regression model was used to examine whether difficulty paying was significantly related to overall quality of life after adjusting for age, income, race/ethnicity, and physical health. After accounting for the proportion of variance explained by age, income, race/ethnicity, and physical health, the additional proportion of variance explained by difficulty paying was statistically significant. Individuals reporting difficulty paying had a roughly 1 point lower (ie, beta-coefficient = -1.052; SE = 0.481) overall quality of life score on the 11-point scale. We found a significant association between difficulty paying for ostomy supplies and overall quality of life. Although the cross-sectional study design does not allow causal inference, the results suggest a relationship that merits further examination.
The analysis of senior high school students' physics HOTS in Bantul District measured using PhysReMChoTHOTS

NASA Astrophysics Data System (ADS)

Istiyono, Edi

2017-08-01

The purpose of this research is to describe the results of measuring higher order thinking skills in physics (PhysHOTS), including: (1) the percentage of students at each PhysHOTS level and (2) the percentage of the dominant response category of students for each analyzing, evaluating, and creating skill. The respondents were 404 10th grade students in Bantul District. The instrument used for measurement was PhysReMChoTHOTS. It was divided into two sets consisting of 44 items and including 8 anchor items, judged valid by a physicist, a physics education expert, and a physics education measurement expert. The instrument fit the partial credit model (PCM). The reliability coefficient of the test is 0.71, and the difficulty index of the items ranges from -0.61 to 0.51. The results of the measurement show that: (1) the percentages of 10th grade students in Bantul District in the very low, low, medium, high, and very high PhysHOTS categories are 4.75%, 40.30%, 33.45%, 19.50%, and 2.00%, respectively; and (2) the order of analyzing skills, starting from the weakest, is attributing, differentiating, and organizing. The order of evaluating skills, starting from the weakest, is critiquing and checking. Meanwhile, the order of creating skills, starting from the weakest, is producing, planning, and generating.
Tradeoffs between Price and Quality: How a Value Index Affects Preference Formation.

ERIC Educational Resources Information Center

Creyer, Elizabeth H.; Ross, William T., Jr.

1997-01-01

Some of a group of 143 consumers were given a choice between higher-priced, higher-quality items and items with lower price and quality but higher value index (benefit/cost tradeoff); others were given price and quality information only. Consumers were more likely to choose lower-priced, higher-value options when the index information was…
76 FR 31991 - All Items Consumer Price Index for All Urban Consumers; United States City Average

Federal Register 2010, 2011, 2012, 2013, 2014

2011-06-02

... DEPARTMENT OF LABOR Office of the Secretary All Items Consumer Price Index for All Urban Consumers; United States City Average Pursuant to Section 33105(c) of Title 49, United States Code, and the... Consumer Price Index for All Urban Consumers (1967 = 100) increased 110.0 percent from its 1984 annual...
77 FR 23283 - All Items Consumer Price Index for All Urban Consumers; United States City Average

Federal Register 2010, 2011, 2012, 2013, 2014

2012-04-18

... DEPARTMENT OF LABOR Office of the Secretary All Items Consumer Price Index for All Urban Consumers; United States City Average Pursuant to Section 33105(c) of Title 49, United States Code, and the... Consumer Price Index for All Urban Consumers (1967 = 100) increased 116.6 percent from its 1984 annual...
78 FR 35054 - All Items Consumer Price Index for All Urban Consumers; United States City Average

Federal Register 2010, 2011, 2012, 2013, 2014

2013-06-11

... DEPARTMENT OF LABOR Office of the Secretary All Items Consumer Price Index for All Urban Consumers... delegation of the Secretary of Transportation's responsibilities under that Act to the Administrator of the... Consumer Price Index for All Urban Consumers (1967=100) increased 121.1 percent from its 1984 annual...
75 FR 22164 - All Items Consumer Price Index for All Urban Consumers; United States City Average

Federal Register 2010, 2011, 2012, 2013, 2014

2010-04-27

... DEPARTMENT OF LABOR Office of the Secretary All Items Consumer Price Index for All Urban Consumers... delegation of the Secretary of Transportation's responsibilities under that Act to the Administrator of the... Consumer Price Index for All Urban Consumers (1967=100) increased 106.6 percent from its 1984 annual...
A Monte Carlo Simulation Investigating the Validity and Reliability of Ability Estimation in Item Response Theory with Speeded Computer Adaptive Tests

ERIC Educational Resources Information Center

Schmitt, T. A.; Sass, D. A.; Sullivan, J. R.; Walker, C. M.

2010-01-01

Imposed time limits on computer adaptive tests (CATs) can result in examinees having difficulty completing all items, thus compromising the validity and reliability of ability estimates. In this study, the effects of speededness were explored in a simulated CAT environment by varying examinee response patterns to end-of-test items. Expectedly,…
The Influence of Task Demands, Verbal Ability and Executive Functions on Item and Source Memory in Autism Spectrum Disorder

ERIC Educational Resources Information Center

Semino, Sara; Ring, Melanie; Bowler, Dermot M.; Gaigg, Sebastian B.

2018-01-01

Autism Spectrum Disorder (ASD) is generally associated with difficulties in contextual source memory but not single item memory. There are surprising inconsistencies in the literature, however, that the current study seeks to address by examining item and source memory in age and ability matched groups of 22 ASD and 21 comparison adults. Results…
Can manual ability be measured with a generic ABILHAND scale? A cross-sectional study conducted on six diagnostic groups

PubMed Central

Arnould, Carlyne; Vandervelde, Laure; Batcho, Charles Sèbiyo; Penta, Massimo; Thonnard, Jean-Louis

2012-01-01

Objectives: Several ABILHAND Rasch-built manual ability scales were previously developed for chronic stroke (CS), cerebral palsy (CP), rheumatoid arthritis (RA), systemic sclerosis (SSc) and neuromuscular disorders (NMD). The present study aimed to explore the applicability of a generic manual ability scale unbiased by diagnosis and to study the nature of manual ability across diagnoses. Design: Cross-sectional study. Setting: Outpatient clinic homes (CS, CP, RA), specialised centres (CP), reference centres (CP, NMD) and university hospitals (SSc). Participants: 762 patients from six diagnostic groups: 103 CS adults, 113 CP children, 112 RA adults, 156 SSc adults, 124 NMD children and 124 NMD adults. Primary and secondary outcome measures: Manual ability as measured by the ABILHAND disease-specific questionnaires, diagnosis and nature (ie, uni-manual or bi-manual involvement and proximal or distal joints involvement) of the ABILHAND manual activities. Results: The difficulties of most manual activities were diagnosis dependent. A principal component analysis highlighted that 57% of the variance in the item difficulty between diagnoses was explained by the symmetric or asymmetric nature of the disorders. A generic scale was constructed, from a metric point of view, with 11 items sharing a common difficulty among diagnoses and 41 items displaying a category-specific location (asymmetric: CS, CP; and symmetric: RA, SSc, NMD). This generic scale showed that CP and NMD children had significantly less manual ability than RA patients, who had significantly less manual ability than CS, SSc and NMD adults. However, the generic scale was less discriminative and responsive to small deficits than disease-specific instruments. Conclusions: Our finding that most of the manual item difficulties were disease-dependent emphasises the danger of using generic scales without prior investigation of item invariance across diagnostic groups. Nevertheless, a generic manual ability scale could be developed by adjusting and accounting for activities perceived differently in various disorders. PMID:23117570
Lawton IADL scale in dementia: can item response theory make it more informative?

PubMed

McGrory, Sarah; Shenkin, Susan D; Austin, Elizabeth J; Starr, John M

2014-07-01

Impairment of functional abilities represents a crucial component of dementia diagnosis. Current functional measures rely on the traditional aggregate method of summing raw scores. While this summary score provides a quick representation of a person's ability, it disregards useful information on the item level. The objective was to use item response theory (IRT) methods to increase the interpretive power of the Lawton Instrumental Activities of Daily Living (IADL) scale by establishing a hierarchy of item 'difficulty' and 'discrimination'. This cross-sectional study applied IRT methods to the analysis of IADL outcomes. Participants were 202 members of the Scottish Dementia Research Interest Register (mean age = 76.39, range = 56-93, SD = 7.89 years) with complete itemised data available. A Mokken scale with good reliability (Molenaar Sijtsma statistic 0.79) was obtained, satisfying the IRT assumption that the items comprise a single unidimensional scale. The eight items in the scale could be placed on a hierarchy of 'difficulty' (H coefficient = 0.55), with 'Shopping' being the most 'difficult' item and 'Telephone use' being the least 'difficult' item. 'Shopping' was the most discriminatory item, differentiating well between patients of different levels of ability. IRT methods are capable of providing more information about functional impairment than a summed score. 'Shopping' and 'Telephone use' were identified as items that reveal key information about a patient's level of ability, and could be useful screening questions for clinicians. © The Author 2013. Published by Oxford University Press on behalf of the British Geriatrics Society. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
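To make the scalability idea concrete, the following is a minimal Python sketch of Loevinger's H for a set of items. It assumes that the reported H coefficient is this scalability coefficient and that items are scored 0/1; neither is confirmed by the abstract, and the function name `scale_h` and the data are hypothetical illustrations, not the study's data or software.

```python
from itertools import combinations

def scale_h(responses):
    """Loevinger's scale H for 0/1 item scores (rows = persons, columns = items).

    H is the sum of observed pairwise item covariances divided by the sum of
    the largest covariances possible given each pair's item proportions.
    """
    n = len(responses)
    k = len(responses[0])
    p = [sum(row[j] for row in responses) / n for j in range(k)]
    cov_sum = 0.0
    cov_max_sum = 0.0
    for i, j in combinations(range(k), 2):
        p_ij = sum(row[i] * row[j] for row in responses) / n
        cov_sum += p_ij - p[i] * p[j]
        cov_max_sum += min(p[i], p[j]) - p[i] * p[j]
    return cov_sum / cov_max_sum

# Hypothetical data: a perfect Guttman pattern (items ordered easy to hard)
guttman = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
# The same data plus one person who passes only the hardest item
with_error = guttman + [[0, 0, 0, 1]]

h_perfect = scale_h(guttman)
h_error = scale_h(with_error)
```

A perfect hierarchy gives H = 1, and response patterns that violate the item ordering pull H below 1.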
The optimal sequence and selection of screening test items to predict fall risk in older disabled women: the Women's Health and Aging Study.

PubMed

Lamb, Sarah E; McCabe, Chris; Becker, Clemens; Fried, Linda P; Guralnik, Jack M

2008-10-01

Falls are a major cause of disability, dependence, and death in older people. Brief screening algorithms may be helpful in identifying risk and leading to more detailed assessment. Our aim was to determine the most effective sequence of falls screening test items from a wide selection of recommended items including self-report and performance tests, and to compare performance with other published guidelines. Data were from a prospective, age-stratified, cohort study. Participants were 1002 community-dwelling women aged 65 years old or older, experiencing at least some mild disability. Assessments of fall risk factors were conducted in participants' homes. Fall outcomes were collected at 6 monthly intervals. Algorithms were built for prediction of any fall over a 12-month period using tree classification with cross-set validation. Algorithms using performance tests provided the best prediction of fall events, and achieved moderate to strong performance when compared to commonly accepted benchmarks. The items selected by the best performing algorithm were the number of falls in the last year and, in selected subpopulations, frequency of difficulty balancing while walking, a 4 m walking speed test, body mass index, and a test of knee extensor strength. The algorithm performed better than that from the American Geriatric Society/British Geriatric Society/American Academy of Orthopaedic Surgeons and other guidance, although these findings should be treated with caution. Suggestions are made on the type, number, and sequence of tests that could be used to maximize estimation of the probability of falling in older disabled women.
Predictive factors for dementia and cognitive impairment among residents living in the veterans' retirement communities in Taiwan: Implications for cognitive health promotion activities.

PubMed

Chen, Liang-Yu; Wu, Yi-Hui; Huang, Chung-Yu; Liu, Li-Kuo; Hwang, An-Chun; Peng, Li-Ning; Lin, Ming-Hsieh; Chen, Liang-Kung

2017-04-01

To identify potentially modifiable risk factors for cognitive decline among veterans' home residents in Taiwan. The present retrospective cohort study was part of the Veteran Affairs-Comprehensive Geriatric Assessment study that retrieved data of the comprehensive geriatric assessment for 946 residents living at four veterans' homes in Taiwan. The study participants were interviewed every 3-6 months from January 2012 to December 2014. Demographic characteristics, multimorbidity by Charlson's Comorbidities Index, physical function by the Barthel Index, cognition by the Mini-Mental State Examination (MMSE), depression by the five-item Geriatric Depression Scale and nutritional status by the Mini-Nutrition Assessment-Short Form were collected for analysis. A generalized estimating equation model, adjusted for age, educational level, five-item Geriatric Depression Scale score, and communication difficulty, was used to identify potential modifiable risk factors for cognitive decline. The mean age of the participants was 85.7 ± 5.2 years, with a mean follow-up period of 41 ± 21.6 weeks. The prevalence of cognitive impairment (defined by MMSE <24) was 65.6%, whereas 34% of the study participants were positive for depressive symptoms. Approximately one-fifth of the study participants were using psychotropic agents, and use was higher among participants with cognitive impairment than among those without (23.6% vs 15.6%, P < 0.05). In the generalized estimating equation model, physical function, nutritional status, depressive symptoms, ex-drinker, multimorbidity and stool incontinence were positively correlated with MMSE score, whereas advanced age, low educational level (<6 years), presence of communication difficulty and use of psychotropic agents were inversely associated with the MMSE score. Physical function and nutritional status were positively associated with the MMSE score, and use of psychotropic agents was negatively correlated with cognitive function. Further intervention studies are required to improve the cognitive health of older adults living in the veterans' retirement communities. Geriatr Gerontol Int 2017: 17 (Suppl. 1): 7-13. © 2017 Japan Geriatrics Society.
The Relation between Item Identification Difficulty and Elaborative Conceptual Processing for Children and Adults.

ERIC Educational Resources Information Center

Ackerman, Brian P.; And Others

1990-01-01

Results of four experiments show that developmental differences in elaborative conceptual processing at acquisition and retrieval contribute independently to developmental increases in recall. Item identification processes for both words and pictures constrain children's elaborative processing. The constraints are time limited. (RH)
Treatment of Not-Administered Items on Individually Administered Intelligence Tests

ERIC Educational Resources Information Center

He, Wei; Wolfe, Edward W.

2012-01-01

In administration of individually administered intelligence tests, items are commonly presented in a sequence of increasing difficulty, and test administration is terminated after a predetermined number of incorrect answers. This practice produces stochastically censored data, a form of nonignorable missing data. By manipulating four factors…
49 CFR 826.6 - Allowable fees and expenses.

Code of Federal Regulations, 2011 CFR

2011-10-01

... these rules may exceed $75 indexed as follows: ER14JN94.001 The CPI to be used is the annual average CPI, All Urban Consumers, U.S. City Average, All Items, except where a local, All Item index is available. Where a local index is available, but results in a manifest inequity vis-a-vis the U.S. City Average...
49 CFR 826.6 - Allowable fees and expenses.

Code of Federal Regulations, 2010 CFR

2010-10-01

... these rules may exceed $75 indexed as follows: ER14JN94.001 The CPI to be used is the annual average CPI, All Urban Consumers, U.S. City Average, All Items, except where a local, All Item index is available. Where a local index is available, but results in a manifest inequity vis-a-vis the U.S. City Average...
Perceptions of Korean Pre-Service Special Educators Regarding Teaching Competencies for Students with Disabilities

ERIC Educational Resources Information Center

Kim, Yu-Ri; Park, Jiyeon; Lee, Suk-Hyang

2015-01-01

The purpose of this study is to develop a Teaching competency index in special education and to investigate Korean pre-service special educators' (PSSEs') perceptions regarding each item of the index. Based on a review of the literature on exemplary instruction in special education, we developed an index composed of 44 items. The six sub-domains…
A distributed air index based on maximum boundary rectangle over grid-cells for wireless non-flat spatial data broadcast.

PubMed

Im, Seokjin; Choi, JinTak

2014-06-17

In the pervasive computing environment using smart devices equipped with various sensors, a wireless data broadcasting system for spatial data items is a natural way to efficiently provide a location dependent information service, regardless of the number of clients. A non-flat wireless broadcast system can support the clients in accessing quickly their preferred data items by disseminating the preferred data items more frequently than regular data on the wireless channel. To efficiently support the processing of spatial window queries in a non-flat wireless data broadcasting system, we propose a distributed air index based on a maximum boundary rectangle (MaxBR) over grid-cells (abbreviated DAIM), which uses MaxBRs for filtering out hot data items on the wireless channel. Unlike the existing index that repeats regular data items in close proximity to hot items at same frequency as hot data items in a broadcast cycle, DAIM makes it possible to repeat only hot data items in a cycle and reduces the length of the broadcast cycle. Consequently, DAIM helps the clients access the desired items quickly, improves the access time, and reduces energy consumption. In addition, a MaxBR helps the clients decide whether they have to access regular data items or not. Simulation studies show the proposed DAIM outperforms existing schemes with respect to the access time and energy consumption.

Constructing three emotion knowledge tests from the invariant measurement approach

PubMed Central

Prieto, Gerardo; Burin, Debora I.

2017-01-01

Background: Psychological constructionist models like the Conceptual Act Theory (CAT) postulate that complex states such as emotions are composed of basic psychological ingredients that are more clearly respected by the brain than basic emotions. The objective of this study was the construction and initial validation of Emotion Knowledge (EK) measures from the CAT frame by means of an invariant measurement approach, the Rasch Model (RM). Psychological distance theory was used to inform item generation. Methods: Three EK tests, namely emotion vocabulary (EV), close emotional situations (CES) and far emotional situations (FES), were constructed and tested with the RM in a community sample of 100 females and 100 males (age range: 18–65), both separately and conjointly. Results: It was corroborated that data-RM fit was sufficient. Then, the effect of type of test and emotion on Rasch-modelled item difficulty was tested. Significant effects of emotion on EK item difficulty were found, but the only statistically significant difference was that between “happiness” and the remaining emotions; neither type of test nor interaction effects on EK item difficulty were statistically significant. The testing of gender differences was carried out after corroborating that differential item functioning (DIF) would not be a plausible alternative hypothesis for the results. No statistically significant sex-related differences were found in EV, CES, FES, or total EK. However, the sign of d indicates that female participants were consistently better than male ones, a result that will be of interest for future meta-analyses. Discussion: The three EK tests are ready to be used as components of a higher-level measurement process. PMID:28929013
A Study of General Education Astronomy Students' Understandings of Cosmology. Part III. Evaluating Four Conceptual Cosmology Surveys: An Item Response Theory Approach

ERIC Educational Resources Information Center

Wallace, Colin S.; Prather, Edward E.; Duncan, Douglas K.

2012-01-01

This is the third of five papers detailing our national study of general education astronomy students' conceptual and reasoning difficulties with cosmology. In this paper, we use item response theory to analyze students' responses to three out of the four conceptual cosmology surveys we developed. The specific item response theory model we use is…
Item response theory analysis of the Utrecht Work Engagement Scale for Students (UWES-S) using a sample of Japanese university and college students majoring medical science, nursing, and natural science.

PubMed

Tsubakita, Takashi; Shimazaki, Kazuyo; Ito, Hiroshi; Kawazoe, Nobuo

2017-10-30

The Utrecht Work Engagement Scale for Students has been used internationally to assess students' academic engagement, but it has not been analyzed via item response theory. The purpose of this study was to conduct an item response theory analysis of the Japanese version of the Utrecht Work Engagement Scale for Students, translated by the authors. Using a two-parameter model and Samejima's graded response model, difficulty and discrimination parameters were estimated after confirming the factor structure of the scale. The 14 items on the scale were analyzed with a sample of 3214 university and college students majoring in medical science, nursing, or natural science in Japan. The preliminary parameter estimation was conducted with the two-parameter model and indicated that three items should be removed because they had outlying parameters. Final parameter estimation was conducted using the remaining 11 items and indicated that all difficulty and discrimination parameters were acceptable. The test information curve suggested that the scale assesses higher engagement better than average engagement. The estimated parameters provide a basis for future comparative studies. The results also suggested that a 7-point Likert scale is too broad; thus, the scale should be modified to use fewer response categories.
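As a rough illustration of the two models named in this abstract, the sketch below evaluates the standard textbook response functions in Python. It does not reproduce the authors' estimation procedure; the parameter values (discrimination `a`, difficulty `b`, thresholds) are hypothetical and chosen only to show the shape of the computation.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def p_2pl(theta, a, b):
    """Two-parameter logistic model: probability of endorsing a dichotomous
    item with discrimination a and difficulty b at trait level theta."""
    return logistic(a * (theta - b))

def grm_category_probs(theta, a, thresholds):
    """Samejima's graded response model: probabilities of the K ordered
    categories given K-1 increasing thresholds.

    Each cumulative probability P(X >= k) is a 2PL curve; a category
    probability is the difference of adjacent cumulative probabilities.
    """
    cum = [1.0] + [logistic(a * (theta - b)) for b in thresholds] + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]

# Hypothetical 4-category item
probs = grm_category_probs(theta=0.0, a=1.5, thresholds=[-1.0, 0.0, 1.0])
p_mid = p_2pl(theta=0.5, a=1.2, b=0.5)
```

When the trait level equals the item difficulty, the 2PL probability is exactly 0.5, and the graded-response category probabilities always sum to 1.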
Item analysis of three Spanish naming tests: a cross-cultural investigation.

PubMed

Marquez de la Plata, Carlos; Arango-Lasprilla, Juan Carlos; Alegret, Montse; Moreno, Alexander; Tárraga, Luis; Lara, Mar; Hewlitt, Margaret; Hynan, Linda; Cullum, C Munro

2009-01-01

Neuropsychological evaluations conducted in the United States and abroad commonly include the use of tests translated from English to Spanish. The use of translated naming tests for evaluating predominately Spanish-speakers has recently been challenged on the grounds that translating test items may compromise a test's construct validity. The Texas Spanish Naming Test (TNT) has been developed in Spanish specifically for use with Spanish-speakers; however, it is unlikely patients from diverse Spanish-speaking geographical regions will perform uniformly on a naming test. The present study evaluated and compared the internal consistency and patterns of item-difficulty and -discrimination for the TNT and two commonly used translated naming tests in three countries (i.e., United States, Colombia, Spain). Two hundred fifty two subjects (136 demented, 116 nondemented) across three countries were administered the TNT, Modified Boston Naming Test-Spanish, and the naming subtest from the CERAD. The TNT demonstrated superior internal consistency to its counterparts, a superior item difficulty pattern than the CERAD naming test, and a superior item discrimination pattern than the MBNT-S across countries. Overall, all three Spanish naming tests differentiated nondemented and moderately demented individuals, but the results suggest the items of the TNT are most appropriate to use with Spanish-speakers. Preliminary normative data for the three tests examined in each country are provided.
Examination of the item structure of the Alberta infant motor scale.

PubMed

Liao, Pai-Jun M; Campbell, Suzann K

2004-01-01

The Alberta Infant Motor Scale (AIMS) is a screening tool for identifying delayed motor development from birth to 18 months of age. The purpose of this study was to examine the psychometric structure of the AIMS, including the hierarchical scale of items and the precision for measuring infant ability at different ages. Ninety-seven infants with varying degrees of risk of developmental disability were recruited from three hospitals or from the community in the Chicago metropolitan area. Infants were tested on the AIMS at three, six, nine, and 12 months of age. The hierarchical structure and the range and distribution of item difficulty on the AIMS were analyzed using Rasch psychometric analysis. The Rasch analysis confirmed that items for each of the four testing positions (supine, prone, sitting, and standing) were arranged in increasing order of difficulty, but a ceiling effect was present. Gaps exist at six ability levels, indicating low precision of measurement for differentiating among infants after about nine months of age. The AIMS shows a ceiling effect, measures infant ability best from three to nine months of age, and has few items available for discriminating among infants after they pass the controlled lowering through standing item. Clinical impressions should be drawn with caution at ages when the precision of measurement is low.
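The link between gaps in item difficulty and low measurement precision can be sketched with the dichotomous Rasch model, in which each item contributes information P(1 - P) at a given ability level. The Python below is an illustrative sketch only: the item difficulties are hypothetical logit values, not AIMS calibrations, and the function names are made up for this example.

```python
import math

def rasch_prob(theta, b):
    """Rasch model: probability that a person of ability theta passes an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def scale_information(theta, difficulties):
    """Sum of item informations P * (1 - P) at ability theta."""
    return sum(rasch_prob(theta, b) * (1.0 - rasch_prob(theta, b)) for b in difficulties)

def standard_error(theta, difficulties):
    """Approximate standard error of the ability estimate: 1 / sqrt(information)."""
    return 1.0 / math.sqrt(scale_information(theta, difficulties))

# Hypothetical item difficulties clustered at the low end of the scale
difficulties = [-2.0, -1.5, -1.0, -0.5, 0.0]

se_within_range = standard_error(-1.0, difficulties)  # ability matched by many items
se_above_range = standard_error(3.0, difficulties)    # ability beyond the hardest item
```

An ability level with no items nearby yields a larger standard error, which is the sense in which a ceiling effect or a gap lowers precision.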
The concept of research utilization as understood by Swedish nurses: demarcations of instrumental, conceptual, and persuasive research utilization.

PubMed

Strandberg, Elisabeth; Catrine Eldh, Ann; Forsman, Henrietta; Rudman, Ann; Gustavsson, Petter; Wallin, Lars

2014-02-01

The literature implies research utilization (RU) to be a multifaceted and complex phenomenon, difficult to trace in clinical practice. A deeper understanding of the concept of RU in a nursing context is needed, in particular, for the development of instruments for measuring nurses' RU, which could facilitate the evaluation of interventions to support the implementation of evidence-based practice. In this paper, we explored nurses' demarcation of instrumental RU (IRU), conceptual RU (CRU), and persuasive RU (PRU) using an item pool proposed to measure IRU, CRU, and PRU. The item pool (12 items) was presented to two samples: one of practicing registered nurses (n = 890) in Sweden 4 years after graduating and one of recognized content experts (n = 7). Correlation analyses and content validity index (CVI) calculations were used together with qualitative content analysis, in a mixed methods design. According to the item and factor analyses, CRU and PRU could not be distinguished, whereas IRU could. Analyses also revealed problems in linking the CRU items to the external criteria. The CVIs, however, showed excellent or good results for the IRU, CRU, and PRU items as well as at the scale level. The qualitative data indicated that IRU was the least problematic for the experts to categorize, whereas CRU and PRU were harder to demarcate. Our findings illustrate a difficulty in explicitly demarcating between CRU and PRU in clinical nursing. We suggest this overlap is related to conceptual incoherence, indicating a need for further studies. The findings constitute new knowledge about the RU concepts in a clinical nursing context, and highlight differences in how the concepts can be understood by RNs in clinical practice and experts within the field. We suggest that the findings are useful for defining RU in nursing and further development of measures of RU. © 2013 Sigma Theta Tau International.
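A content validity index is commonly computed as the share of experts who rate an item as relevant. The Python sketch below assumes a 4-point relevance scale where ratings of 3 or 4 count as relevant, and averages item-level values for a scale-level index; the abstract does not state the rating scale or the scale-level method used, and the ratings shown are hypothetical.

```python
def item_cvi(ratings, relevant_min=3):
    """Item-level CVI: proportion of experts rating the item at or above relevant_min."""
    return sum(r >= relevant_min for r in ratings) / len(ratings)

def scale_cvi_ave(rating_matrix, relevant_min=3):
    """Scale-level CVI (averaging approach): mean of the item-level CVIs."""
    cvis = [item_cvi(ratings, relevant_min) for ratings in rating_matrix]
    return sum(cvis) / len(cvis)

# Hypothetical ratings: 3 items, each rated by 7 experts on a 1-4 scale
ratings = [
    [4, 4, 3, 4, 3, 4, 4],  # all 7 experts rate it relevant
    [4, 3, 2, 4, 3, 4, 1],  # 5 of 7
    [2, 3, 4, 4, 4, 3, 3],  # 6 of 7
]

i_cvis = [item_cvi(r) for r in ratings]
s_cvi = scale_cvi_ave(ratings)
```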
Establishing Reliability and Validity of the Criterion Referenced Exam of GeoloGy Standards EGGS

NASA Astrophysics Data System (ADS)

Guffey, S. K.; Slater, S. J.; Slater, T. F.; Schleigh, S.; Burrows, A. C.

2016-12-01

Discipline-based geoscience education researchers have considerable need for a criterion-referenced conceptual diagnostic survey that is easy to administer and score, for undergraduates taking introductory science survey courses, so that faculty can better monitor the learning impacts of various interactive teaching approaches. To support ongoing education research across the geosciences, we are continuing to work rigorously and systematically to firmly establish the reliability and validity of the recently released Exam of GeoloGy Standards, EGGS. In educational testing, reliability refers to the consistency or stability of test scores, whereas validity refers to the accuracy of the inferences or interpretations one makes from test scores. Several types of reliability measures are being applied to the iterative refinement of the EGGS survey, including test-retest, alternate form, split-half, internal consistency, and interrater reliability measures. EGGS rates strongly on most measures of reliability. For one, Cronbach's alpha measures inter-item correlations and provides a quantitative index of the extent to which students answer items consistently throughout the test. Traditional item analysis methods, including item difficulty and item discrimination, further quantify the degree to which a particular item reliably assesses students. Validity, on the other hand, is perhaps best described by the word accuracy. For example, content validity is the extent to which a measurement reflects the specific intended domain of the content, stemming from judgments of people who are either experts in the testing of that particular content area or content experts. Perhaps more importantly, face validity is a judgement of how well an instrument reflects the science "at face value" and refers to the extent to which a test appears to measure the targeted scientific domain as viewed by laypersons, examinees, test users, the public, and other invested stakeholders.
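The classical statistics named in this abstract can be sketched in a few lines of Python. The example below assumes dichotomously scored (0/1) items and uses the corrected item-total correlation as the discrimination index; both are assumptions for illustration, the response matrix is hypothetical, and none of this is the EGGS authors' analysis code.

```python
from statistics import mean, pvariance

def item_difficulty(responses):
    """Proportion of examinees answering each item correctly (higher = easier)."""
    n_items = len(responses[0])
    return [mean([row[j] for row in responses]) for j in range(n_items)]

def cronbach_alpha(responses):
    """Cronbach's alpha from item variances and total-score variance."""
    k = len(responses[0])
    item_vars = [pvariance([row[j] for row in responses]) for j in range(k)]
    total_var = pvariance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

def item_discrimination(responses, j):
    """Corrected item-total correlation: item j against the sum of the other items."""
    item = [row[j] for row in responses]
    rest = [sum(row) - row[j] for row in responses]
    m_item, m_rest = mean(item), mean(rest)
    cov = mean([(a - m_item) * (b - m_rest) for a, b in zip(item, rest)])
    return cov / (pvariance(item) ** 0.5 * pvariance(rest) ** 0.5)

# Hypothetical scored responses: 5 examinees x 4 items
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

difficulty = item_difficulty(scores)
alpha = cronbach_alpha(scores)
discrimination = [item_discrimination(scores, j) for j in range(4)]
```

For this toy matrix the item difficulties fall from 0.8 to 0.2 and alpha works out to 0.8; real analyses would use far more examinees.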
Assessing Goodness of Fit in Item Response Theory with Nonparametric Models: A Comparison of Posterior Probabilities and Kernel-Smoothing Approaches

ERIC Educational Resources Information Center

Sueiro, Manuel J.; Abad, Francisco J.

2011-01-01

The distance between nonparametric and parametric item characteristic curves has been proposed as an index of goodness of fit in item response theory in the form of a root integrated squared error index. This article proposes to use the posterior distribution of the latent trait as the nonparametric model and compares the performance of an index…
Construction and validation of a measure of integrative well-being in seven languages: the Pemberton Happiness Index.

PubMed

Hervás, Gonzalo; Vázquez, Carmelo

2013-04-22

We introduce the Pemberton Happiness Index (PHI), a new integrative measure of well-being in seven languages, detailing the validation process and presenting psychometric data. The scale includes eleven items related to different domains of remembered well-being (general, hedonic, eudaimonic, and social well-being) and ten items related to experienced well-being (i.e., positive and negative emotional events that possibly happened the day before); the sum of these items produces a combined well-being index. A distinctive characteristic of this study is that to construct the scale, an initial pool of items, covering the remembered and experienced well-being domains, were subjected to a complete selection and validation process. These items were based on widely used scales (e.g., PANAS, Satisfaction With Life Scale, Subjective Happiness Scale, and Psychological Well-Being Scales). Both the initial items and reference scales were translated into seven languages and completed via Internet by participants (N = 4,052) aged 16 to 60 years from nine countries (Germany, India, Japan, Mexico, Russia, Spain, Sweden, Turkey, and USA). Results from this initial validation study provided very good support for the psychometric properties of the PHI (i.e., internal consistency, a single-factor structure, and convergent and incremental validity). Given the PHI's good psychometric properties, this simple and integrative index could be used as an instrument to monitor changes in well-being. We discuss the utility of this integrative index to explore well-being in individuals and communities.
A New Item Selection Procedure for Mixed Item Type in Computerized Classification Testing.

ERIC Educational Resources Information Center

Lau, C. Allen; Wang, Tianyou

This paper proposes a new Information-Time index as the basis for item selection in computerized classification testing (CCT) and investigates how this new item selection algorithm can help improve test efficiency for item pools with mixed item types. It also investigates how practical constraints such as item exposure rate control, test…
Universal Ontology: Attentive Tracking of Objects and Substances across Languages and over Development

ERIC Educational Resources Information Center

Cacchione, Trix; Indino, Marcello; Fujita, Kazuo; Itakura, Shoji; Matsuno, Toyomi; Schaub, Simone; Amici, Federica

2014-01-01

Previous research has demonstrated that adults are successful at visually tracking rigidly moving items, but experience great difficulties when tracking substance-like "pouring" items. Using a comparative approach, we investigated whether the presence/absence of the grammatical count-mass distinction influences adults' and children's…
The Handling of Missing Binary Data in Language Research

ERIC Educational Resources Information Center

Pichette, François; Béland, Sébastien; Jolani, Shahab; Lesniewska, Justyna

2015-01-01

Researchers are frequently confronted with unanswered questions or items on their questionnaires and tests, due to factors such as item difficulty, lack of testing time, or participant distraction. This paper first presents results from a poll confirming previous claims (Rietveld & van Hout, 2006; Schafer & Graham, 2002) that data…
Decimal Fraction Arithmetic: Logical Error Analysis and Its Validation.

ERIC Educational Resources Information Center

Standiford, Sally N.; And Others

This report illustrates procedures of item construction for addition and subtraction examples involving decimal fractions. Using a procedural network of skills required to solve such examples, an item characteristic matrix of skills analysis was developed to describe the characteristics of the content domain by projected student difficulties. Then…
Mutual Information Item Selection in Adaptive Classification Testing

ERIC Educational Resources Information Center

Weissman, Alexander

2007-01-01

A general approach for item selection in adaptive multiple-category classification tests is provided. The approach uses mutual information (MI), a special case of the Kullback-Leibler distance, or relative entropy. MI works efficiently with the sequential probability ratio test and alleviates the difficulties encountered with using other local-…
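To make the idea of mutual information as a selection criterion concrete, here is a minimal sketch for binary items and a discrete set of classes. It computes the mutual information between class membership and the item response as the Kullback-Leibler divergence between their joint distribution and the product of their marginals, then picks the most informative item. The function names, the item pool, and the probabilities are hypothetical; the article's actual procedure may differ.

```python
import math

def mutual_information(prior, p_correct):
    """Mutual information (in nats) between class and a binary response.

    prior[c]     : current probability of class c
    p_correct[c] : P(correct response | class c) for the item
    """
    p_marg = sum(pc * p for pc, p in zip(prior, p_correct))
    mi = 0.0
    for pc, p in zip(prior, p_correct):
        # Sum over the two response outcomes: correct and incorrect.
        for cond, marg in ((p, p_marg), (1 - p, 1 - p_marg)):
            if pc > 0 and cond > 0:
                mi += pc * cond * math.log(cond / marg)
    return mi

def select_item(prior, items):
    """Return the name of the item with the largest mutual information."""
    return max(items, key=lambda name: mutual_information(prior, items[name]))

# Hypothetical three-class problem with two candidate items.
prior = [0.2, 0.5, 0.3]
items = {"item_a": [0.30, 0.35, 0.40], "item_b": [0.10, 0.50, 0.90]}
print(select_item(prior, items))
```

An item whose success probability is the same in every class carries no information about class membership, so its mutual information is zero.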
Interference resolution in major depression.

PubMed

Joormann, Jutta; Nee, Derek Evan; Berman, Marc G; Jonides, John; Gotlib, Ian H

2010-03-01

In two experiments, we investigated individual differences in the ability to resolve interference in participants diagnosed with major depressive disorder (MDD). Participants were administered the "Ignore/Suppress" task, a short-term memory task composed of two steps. In Step 1 ("ignore"), participants were instructed to memorize a set of stimuli while ignoring simultaneously presented irrelevant material. In Step 2 ("suppress"), participants were instructed to forget a subset of the previously memorized material. The ability to resolve interference was indexed by response latencies on two recognition tasks in which participants decided whether a probe was a member of the target set. In Step 1, we compared response latencies to probes from the to-be-ignored list with response latencies to nonrecently presented items. In Step 2, we compared response latencies to probes from the to-be-suppressed list with response latencies to nonrecently presented items. The results indicate that, compared with control participants, depressed participants exhibited increased interference in the "suppress" but not in the "ignore" step of the task, when the stimuli were negative words. No group differences were obtained when we presented letters instead of emotional words. These findings indicate that depression is associated with difficulty in removing irrelevant negative material from short-term memory.
Assessing the Life Science Knowledge of Students and Teachers Represented by the K–8 National Science Standards

PubMed Central

Sadler, Philip M.; Coyle, Harold; Smith, Nancy Cook; Miller, Jaimie; Mintzes, Joel; Tanner, Kimberly; Murray, John

2013-01-01

We report on the development of an item test bank and associated instruments based on the National Research Council (NRC) K–8 life sciences content standards. Utilizing hundreds of studies in the science education research literature on student misconceptions, we constructed 476 unique multiple-choice items that measure the degree to which test takers hold either a misconception or an accepted scientific view. Tested nationally with 30,594 students, following their study of life science, and their 353 teachers, these items reveal a range of interesting results, particularly student difficulties in mastering the NRC standards. Teachers also answered test items and demonstrated a high level of subject matter knowledge reflecting the standards of the grade level at which they teach, but exhibiting few misconceptions of their own. In addition, teachers predicted the difficulty of each item for their students and which of the wrong answers would be the most popular. Teachers were found to generally overestimate their own students’ performance and to have a high level of awareness of the particular misconceptions that their students hold on the K–4 standards, but a low level of awareness of misconceptions related to the 5–8 standards. PMID:24006402
Assessing the life science knowledge of students and teachers represented by the K-8 national science standards.

PubMed

Sadler, Philip M; Coyle, Harold; Smith, Nancy Cook; Miller, Jaimie; Mintzes, Joel; Tanner, Kimberly; Murray, John

2013-01-01

We report on the development of an item test bank and associated instruments based on the National Research Council (NRC) K-8 life sciences content standards. Utilizing hundreds of studies in the science education research literature on student misconceptions, we constructed 476 unique multiple-choice items that measure the degree to which test takers hold either a misconception or an accepted scientific view. Tested nationally with 30,594 students, following their study of life science, and their 353 teachers, these items reveal a range of interesting results, particularly student difficulties in mastering the NRC standards. Teachers also answered test items and demonstrated a high level of subject matter knowledge reflecting the standards of the grade level at which they teach, but exhibiting few misconceptions of their own. In addition, teachers predicted the difficulty of each item for their students and which of the wrong answers would be the most popular. Teachers were found to generally overestimate their own students' performance and to have a high level of awareness of the particular misconceptions that their students hold on the K-4 standards, but a low level of awareness of misconceptions related to the 5-8 standards.
Equating with Miditests Using IRT

ERIC Educational Resources Information Center

Fitzpatrick, Joseph; Skorupski, William P.

2016-01-01

The equating performance of two internal anchor test structures--miditests and minitests--is studied for four IRT equating methods using simulated data. Originally proposed by Sinharay and Holland, miditests are anchors that have the same mean difficulty as the overall test but less variance in item difficulties. Four popular IRT equating methods…
A Test of the Similar Sequence Hypothesis.

ERIC Educational Resources Information Center

Silverstein, A. B.; And Others

1982-01-01

Scales for object permanence and spatial relationships were administered to 98 severely and profoundly mentally retarded children (mean age 13 years) on three occasions, 6 months apart. Differences in the difficulty of the items were quite stable, but their order of difficulty differed appreciably from that for nonretarded infants. (Author/SB)
Reproduction of Inflectional Markers in French-Speaking Children with Reading Impairment

ERIC Educational Resources Information Center

St-Pierre, Marie-Catherine; Beland, Renee

2010-01-01

Purpose: Children with reading impairment (RI) experience difficulties in oral and written production of inflectional markers. The origin of these difficulties is not well documented in French. According to some authors, acquisition of irregular items by typically developing children is predicted by token frequency, whereas acquisition of regular…

Reliability and Validity of the Alcohol Short Index of Problems and a Newly Constructed Drug Short Index of Problems

PubMed Central

Alterman, Arthur I.; Cacciola, John S.; Ivey, Megan A.; Lynch, Kevin G.

2009-01-01

Objective: This study evaluated the psychometric properties of the 15-item alcohol Short Index of Problems (SIP) instrument and those of a newly constructed 15-item drug Short Index of Problems (SIP-D) instrument in 277 newly entered substance-abuse patients. Method: The SIP is derived from the longer, 50-item Drinker Inventory of Consequences (DrInC), which was designed to assess adverse consequences of alcohol use. The SIP-D was constructed by substituting the term “drug use” for the term “drinking” in each SIP item. A 3-month recall interval was employed. Results: Factor analyses of each of the instruments revealed similar solutions, with only one main factor accounting for the majority of variance. Nonparametric item response theory methods produced the same finding. Internal consistency reliability estimates for the SIP and SIP-D total scores were .98 and .97, respectively. Concurrent validity was demonstrated by examining the correlations of the total scores for each of the instruments with the recent summary indexes of the newly revised Addiction Severity Index (ASI-Version 6): alcohol, drug, medical, economic, legal, family/social, and psychiatric problems. Conclusions: This study is the first to confirm the psychometric validity of the SIP when used as an independent instrument unembedded within the DrInC. The study also supports the use of the SIP-D as a brief measure of adverse consequences of drug use. The findings strongly support the unidimensional structure of both measures. PMID:19261243
Reliability and validity of the alcohol short index of problems and a newly constructed drug short index of problems.

PubMed

Alterman, Arthur I; Cacciola, John S; Ivey, Megan A; Habing, Brian; Lynch, Kevin G

2009-03-01

This study evaluated the psychometric properties of the 15-item alcohol Short Index of Problems (SIP) instrument and those of a newly constructed 15-item drug Short Index of Problems (SIP-D) instrument in 277 newly entered substance-abuse patients. The SIP is derived from the longer, 50-item Drinker Inventory of Consequences (DrInC), which was designed to assess adverse consequences of alcohol use. The SIP-D was constructed by substituting the term "drug use" for the term "drinking" in each SIP item. A 3-month recall interval was employed. Factor analyses of each of the instruments revealed similar solutions, with only one main factor accounting for the majority of variance. Nonparametric item response theory methods produced the same finding. Internal consistency reliability estimates for the SIP and SIP-D total scores were .98 and .97, respectively. Concurrent validity was demonstrated by examining the correlations of the total scores for each of the instruments with the recent summary indexes of the newly revised Addiction Severity Index (ASI-Version 6): alcohol, drug, medical, economic, legal, family/social, and psychiatric problems. This study is the first to confirm the psychometric validity of the SIP when used as an independent instrument unembedded within the DrInC. The study also supports the use of the SIP-D as a brief measure of adverse consequences of drug use. The findings strongly support the unidimensional structure of both measures.
Design and development of an instrument to measure overall lifestyle habits for epidemiological research: the Mediterranean Lifestyle (MEDLIFE) index.

PubMed

Sotos-Prieto, Mercedes; Moreno-Franco, Belén; Ordovás, Jose M; León, Montse; Casasnovas, Jose A; Peñalvo, Jose L

2015-04-01

To design and develop a questionnaire that can account for an individual's adherence to a Mediterranean lifestyle including the assessment of diet and physical activity patterns, as well as social interaction. The Mediterranean Lifestyle (MEDLIFE) index was created based on the current Spanish Mediterranean food guide pyramid. MEDLIFE is a twenty-eight-item derived index consisting of questions about food consumption (fifteen items), traditional Mediterranean dietary habits (seven items) and physical activity, rest and social interaction habits (six items). Linear regression models and Spearman rank correlation were fitted to assess content validity and internal consistency. A subset of participants (n = 988) in the Aragon Workers' Health Study cohort (Zaragoza, Spain) provided the data for development of MEDLIFE. Mean MEDLIFE score was 11.3 (SD 2.6; range: 0-28), and the quintile distribution of MEDLIFE score showed a significant association with each of the individual items as well as with specific nutrients and lifestyle indicators (intra-validity). We also quantified MEDLIFE correspondence with previously reported diet quality indices and found significant correlations (ρ range: 0.44-0.53; P<0.001) for the Alternate Healthy Eating Index, the Alternate Mediterranean Diet Index and Mediterranean Diet Adherence Screener. MEDLIFE is the first index to include an overall assessment of lifestyle habits. It is expected to be a more holistic tool to measure adherence to the Mediterranean lifestyle in epidemiological studies.
Leader Effectiveness Index Manual.

ERIC Educational Resources Information Center

Moss, Jerome, Jr.; And Others

The "Leader Effectiveness Index" (LEI) is a multirater instrument designed to assess the effectiveness of leadership performance of vocational educators. It consists of seven items. The first six items are statements of six broad tasks (or responsibilities) of a leader in vocational education: (1) inspires shared vision and establishes…
Social Exclusion Index-for Health Surveys (SEI-HS): a prospective nationwide study to extend and validate a multidimensional social exclusion questionnaire.

PubMed

van Bergen, Addi P L; Hoff, Stella J M; Schreurs, Hanneke; van Loon, Annelies; van Hemert, Albert M

2017-03-14

Social exclusion (SE) refers to the inability of certain groups or individuals to fully participate in society. SE is associated with socioeconomic inequalities in health, and its measurement in routine public health monitoring is considered key to designing effective health policies. In an earlier retrospective analysis we demonstrated that in all four major Dutch cities, SE could largely be measured with existing local public health monitoring data. The current prospective study is aimed at constructing and validating an extended national measure for SE that optimally employs available items. In 2012, a stratified general population sample of 258,928 Dutch adults completed a version of the Netherlands Public Health Monitor (PHM) questionnaire in which 9 items were added covering aspects of SE that were found to be missing in our previous research. Items were derived from the SCP social exclusion index, a well-constructed 15-item instrument developed by the Netherlands Institute for Social Research (SCP). The dataset was randomly divided into a development sample (N = 129,464) and a validation sample (N = 129,464). Canonical correlation analysis was conducted in the development sample. The psychometric properties were studied and compared with those of the original SCP index. All analyses were then replicated in the validation sample. The analysis yielded a four dimensional index, the Social Exclusion Index for Health Surveys (SEI-HS), containing 8 SCP items and 9 PHM items. The four dimensions: "lack of social participation", "material deprivation", "lack of normative integration" and "inadequate access to basic social rights", were each measured with 3 to 6 items. The SEI-HS showed adequate internal consistency for both the general index and for two of four dimension scales. The internal structure and construct validity of the SEI-HS were satisfactory and similar to the original SCP index. Replication of the SEI-HS in the validation sample confirmed its generalisability. This study demonstrates that the SEI-HS offers epidemiologists and public health researchers a uniform, reliable, valid and efficient means of assessing social exclusion and its underlying dimensions. The study also provides valuable insights into how to develop embedded measures for public health surveillance.
Bank of Items for H.S.C. Biology Level III and Division 1 with Computerised Self-Moderation and Error Analysis Procedures Using the Items from the Bank.

ERIC Educational Resources Information Center

Palmer, D. G.

This publication presents an organized collection of biology questions, designed for use in evaluation at the secondary level in Tasmania. Each item has been tried for quality and is accompanied by its difficulty percentage as well as by its content area and the mental processes required to answer it. The content areas include: Diversity,…
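The abstract does not say how each item's difficulty percentage was obtained. Under the usual classical test theory convention (an assumption here), it is simply the percentage of examinees who answer the item correctly, which the following minimal sketch computes. The function name and the response data are hypothetical.

```python
def difficulty_percentage(responses):
    """Percentage of examinees answering an item correctly.

    `responses` is a sequence of 1 (correct) and 0 (incorrect) scores.
    Higher values indicate an easier item.
    """
    if not responses:
        raise ValueError("at least one response is required")
    return 100.0 * sum(responses) / len(responses)

# Hypothetical item answered correctly by three of four examinees.
print(difficulty_percentage([1, 1, 0, 1]))  # 75.0
```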
Psychometric properties of the Chinese version of the Menopause-Specific Quality-of-Life questionnaire.

PubMed

Nie, Guangning; Yang, Hongyan; Liu, Jian; Zhao, ChunMei; Wang, Xiaoyun

2017-05-01

The Menopause-Specific Quality-of-Life (MENQOL) questionnaire was developed as a specific tool to measure the health-related quality-of-life of postmenopausal women. Thus far, the Chinese version questionnaire has not been subjected to psychometric assessment with a large sample. This study aims to evaluate the validity and reliability of the Chinese version of the MENQOL specific to postmenopausal women in China. A total of 1,137 menopausal symptomatic and 491 menopausal asymptomatic women from eight cities in China were recruited using a convenience sampling method. Psychometric properties were evaluated by descriptive statistics, validity, and reliability. Reliability was assessed for each subscale of the MENQOL through internal consistency reliability with Cronbach's α and intersubscale correlations. Item-domain correlations, principal components analysis (PCA), and confirmatory factor analysis were performed to determine construct validity. t tests were used to compare the differences between the menopausal symptomatic and asymptomatic women and to evaluate the discriminant validity. Pearson correlation coefficients were calculated between MENQOL scores and the Kupperman index to assess criterion-related validity. The most common symptoms in Chinese menopausal symptomatic women were "experiencing poor memory" (94.4%), "feeling tired or worn out" (93.8%), "aching in muscle and joints" (89.4%), "low backache" (86.9%), "decrease in physical strength" (86.6%), "aches in back of neck or head" (86.2%), "difficulty sleeping" (83.6%), "accomplishing less than I used to" (83.4%), "feeling a lack of energy" (83.3%), "change in your sexual desire" (81%), and "hot flash" (80.7%) among others. The symptoms of "increased facial hair" were rarely seen (9.9%). The vasomotor domain, as well as psychosocial, physical, and sexual domains showed high reliability (Cronbach's α 0.84, 0.87, 0.89, and 0.86, respectively). Item-domain correlation analysis showed that all items correlated more strongly with their own domains than with other domains. In the PCA, after deleting the "increased facial hair" item, items in the vasomotor, sexual, and psychosocial subscales loaded on their respective domains by and large, and items in the physical subscale divided into two factors. The PCA revealed a latent structure of the Chinese version of MENQOL nearly identical to the original MENQOL domains. The confirmatory factor analysis demonstrated that the questionnaire fits well with a four-domain model. The MENQOL can discriminate between menopausal symptomatic and asymptomatic women, as it showed good discriminant validity. Criterion-related validity was confirmed by a significant correlation between MENQOL scores and the Kupperman index. This study showed that the Chinese version of MENQOL has good psychometric properties and would be suitable to measure the health-related quality-of-life of Chinese menopausal women except for item 21 (increased facial hair).
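For readers unfamiliar with the reliability statistic reported above, the following is a minimal sketch of Cronbach's α for one subscale, using the standard formula α = k/(k − 1) · (1 − Σ item variances / variance of total scores). The function name and the score matrix are hypothetical and are not taken from the study's data.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items score matrix.

    `scores` is a list of respondents, each a list of k item scores
    belonging to one subscale. Sample variances are used throughout.
    """
    k = len(scores[0])
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical three-item subscale answered by four respondents.
subscale = [[1, 2, 1], [2, 3, 3], [4, 4, 3], [5, 5, 6]]
print(round(cronbach_alpha(subscale), 3))
```

Values closer to 1 indicate that the items of the subscale vary together, which is the sense in which the abstract calls the four domains highly reliable.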
Development and assessment of floor and ceiling items for the PROMIS physical function item bank

PubMed Central

2013-01-01

Introduction: Disability and Physical Function (PF) outcome assessment has had limited ability to measure functional status at the floor (very poor functional abilities) or the ceiling (very high functional abilities). We sought to identify, develop and evaluate new floor and ceiling items to enable broader and more precise assessment of PF outcomes for the NIH Patient-Reported-Outcomes Measurement Information System (PROMIS). Methods: We conducted two cross-sectional studies using NIH PROMIS item improvement protocols with expert review, participant survey and focus group methods. In Study 1, respondents with low PF abilities evaluated new floor items, and those with high PF abilities evaluated new ceiling items for clarity, importance and relevance. In Study 2, we compared difficulty ratings of new floor items by low functioning respondents and ceiling items by high functioning respondents to reference PROMIS PF-10 items. We used frequencies, percentages, means and standard deviations to analyze the data. Results: In Study 1, low (n = 84) and high (n = 90) functioning respondents were mostly White, women, 70 years old, with some college, and disability scores of 0.62 and 0.30. More than 90% of the 31 new floor and 31 new ceiling items were rated as clear, important and relevant, leaving 26 ceiling and 30 floor items for Study 2. Low (n = 246) and high (n = 637) functioning Study 2 respondents were mostly White, women, 70 years old, with some college, and Health Assessment Questionnaire (HAQ) scores of 1.62 and 0.003. Compared to difficulty ratings of reference items, ceiling items were rated to be 10% more to greater than 40% more difficult to do, and floor items were rated to be about 12% to nearly 90% less difficult to do. Conclusions: These new floor and ceiling items considerably extend the measurable range of physical function at either extreme. They will help improve instrument performance in populations with broad functional ranges and those concentrated at one or the other extreme ends of functioning. Optimal use of these new items will be assisted by computerized adaptive testing (CAT), reducing questionnaire burden and ensuring item administration to appropriate individuals. PMID:24286166
Effects of Ignoring Item Interaction on Item Parameter Estimation and Detection of Interacting Items

ERIC Educational Resources Information Center

Chen, Cheng-Te; Wang, Wen-Chung

2007-01-01

This study explores the effects of ignoring item interaction on item parameter estimation and the efficiency of using the local dependence index Q3 and the SAS NLMIXED procedure to detect item interaction under the three-parameter logistic model and the generalized partial credit model. Through simulations, it was found that ignoring…
Reliability of scores between stroke patients and significant others on the Reintegration to Normal Living (RNL) Index.

PubMed

Tooth, Leigh R; McKenna, Kryss T; Smith, Melinda; O'Rourke, Peter K

2003-05-06

This study measured reliability between stroke patients' and significant others' scores on items on the Reintegration to Normal Living (RNL) Index and whether there were any scoring biases. The 11-item RNL Index was administered to 57 pairs of patients and significant others six months after stroke rehabilitation. The index was scored using a 10-point visual analogue scale. Patient and significant other demographic information and data on patients' clinical, functional and cognitive status were collected. Reliability was measured using the intra-class correlation coefficient (ICC) and percent agreement. Overall poor reliability was found for the RNL Index total score (ICC=.36, 95% CI .07 to .59) and the daily functioning subscale (ICC=.24, 95% CI -.003 to .46) and moderate reliability was found for the perception of self subscale (ICC=.55, 95% CI .28 to .73). There was a moderate bias for patients to rate themselves as achieving better reintegration than was indicated by significant others, although no demographic or clinical factors were associated with this bias. Exact match agreement was best for the subjective items and worse for items reflecting mobility around the community and participation in a work activity. Caution is needed when interpreting patient information reported by significant others on the RNL Index. The use of a shorter scale to rate the RNL Index requires investigation.
Evaluation of Resident Evacuations in Urban Rainstorm Waterlogging Disasters Based on Scenario Simulation: Daoli District (Harbin, China) as an Example

PubMed Central

Chen, Peng; Zhang, Jiquan; Zhang, Lifeng; Sun, Yingyue

2014-01-01

With the acceleration of urbanization, waterlogging has become an increasingly serious issue. Road waterlogging has a great influence on residents’ travel and traffic safety. Thus, evaluation of residents’ travel difficulties caused by rainstorm waterlogging disasters is of great significance for their travel safety and emergency shelter needs. This study investigated urban rainstorm waterlogging disasters and evaluated the impact of their evolution on residents’ evacuation, using Daoli District (Harbin, China) as the demonstration area. The empirical research combined scenario simulations, questionnaires, GIS spatial analysis and a hydrodynamics method to establish a numerical simulation model of urban rainstorm waterlogging. The results show that under the conditions of a 10-year frequency rainstorm, there are three street sections in the study area with a high difficulty index and five street sections with a medium difficulty index, while the index is low in the other districts. Under the conditions of a 50-year frequency rainstorm, there are five street sections with a high difficulty index and nine street sections with a medium difficulty index, and the other districts all have a low index. These research results can help set the foundation for further small-scale urban rainstorm waterlogging disaster scenario simulations and emergency shelter planning as well as forecasting and warning, and provide a new approach and research method for studying residents’ safe travel. PMID:25264676
Construction and validation of a measure of integrative well-being in seven languages: The Pemberton Happiness Index

PubMed Central

2013-01-01

Purpose: We introduce the Pemberton Happiness Index (PHI), a new integrative measure of well-being in seven languages, detailing the validation process and presenting psychometric data. The scale includes eleven items related to different domains of remembered well-being (general, hedonic, eudaimonic, and social well-being) and ten items related to experienced well-being (i.e., positive and negative emotional events that possibly happened the day before); the sum of these items produces a combined well-being index. Methods: A distinctive characteristic of this study is that to construct the scale, an initial pool of items, covering the remembered and experienced well-being domains, were subjected to a complete selection and validation process. These items were based on widely used scales (e.g., PANAS, Satisfaction With Life Scale, Subjective Happiness Scale, and Psychological Well-Being Scales). Both the initial items and reference scales were translated into seven languages and completed via Internet by participants (N = 4,052) aged 16 to 60 years from nine countries (Germany, India, Japan, Mexico, Russia, Spain, Sweden, Turkey, and USA). Results: Results from this initial validation study provided very good support for the psychometric properties of the PHI (i.e., internal consistency, a single-factor structure, and convergent and incremental validity). Conclusions: Given the PHI’s good psychometric properties, this simple and integrative index could be used as an instrument to monitor changes in well-being. We discuss the utility of this integrative index to explore well-being in individuals and communities. PMID:23607679
How Task Features Impact Evidence from Assessments Embedded in Simulations and Games

ERIC Educational Resources Information Center

Almond, Russell G.; Kim, Yoon Jeon; Velasquez, Gertrudes; Shute, Valerie J.

2014-01-01

One of the key ideas of evidence-centered assessment design (ECD) is that task features can be deliberately manipulated to change the psychometric properties of items. ECD identifies a number of roles that task-feature variables can play, including determining the focus of evidence, guiding form creation, determining item difficulty and…
An Eye-Movement Study of Relational Memory in Adults with Autism Spectrum Disorder

ERIC Educational Resources Information Center

Ring, Melanie; Bowler, Dermot M.; Gaigg, Sebastian B.

2017-01-01

Persons with Autism Spectrum Disorder (ASD) demonstrate good memory for single items but difficulties remembering contextual information related to these items. Recently, we found compromised explicit but intact implicit retrieval of object-location information in ASD (Ring et al. "Autism Res" 8(5):609-619, 2015). Eye-movement data…
Identifying Predictors of Physics Item Difficulty: A Linear Regression Approach

ERIC Educational Resources Information Center

Mesic, Vanes; Muratovic, Hasnija

2011-01-01

Large-scale assessments of student achievement in physics are often approached with an intention to discriminate students based on the attained level of their physics competencies. Therefore, for purposes of test design, it is important that items display an acceptable discriminatory behavior. To that end, it is recommended to avoid extraordinary…
Cognitive Complexity in the Remote Association Test--Chinese Version

ERIC Educational Resources Information Center

Hung, Su-Pin; Huang, Po-Sheng; Chen, Hsueh-Chih

2016-01-01

The remote association test (RAT) has been applied in various fields; however, evidence of construct validity for the original version and subsequent extensions of the RAT remains limited. This study aimed to elucidate the dimensionality and the relationship between item features and item difficulties for the RAT--Chinese Version (RAT-C) using the…
Analysis of Open-Ended Statistics Questions with Many Facet Rasch Model

ERIC Educational Resources Information Center

Güler, Nese

2014-01-01

Problem Statement: The most significant disadvantage of open-ended items that allow the valid measurement of upper level cognitive behaviours, such as synthesis and evaluation, is scoring. The difficulty associated with objectively scoring the answers to the items contributes to the reduction of the reliability of the scores. Moreover, other…
Developing and Evaluating a Machine-Scorable, Constrained Constructed-Response Item.

ERIC Educational Resources Information Center

Braun, Henry I.; And Others

The use of constructed response items in large scale standardized testing has been hampered by the costs and difficulties associated with obtaining reliable scores. The advent of expert systems may signal the eventual removal of this impediment. This study investigated the accuracy with which expert systems could score a new, non-multiple choice…
A Comparison between Element Salience versus Context as Item Difficulty Factors in Raven's Matrices

ERIC Educational Resources Information Center

Perez-Salas, Claudia P.; Streiner, David L.; Roberts, Maxwell J.

2012-01-01

The nature of contextual facilitation effects for items derived from Raven's Progressive Matrices was investigated in two experiments. For these, the original matrices were modified, creating either abstract versions with high element salience, or versions which comprised realistic entities set in familiar contexts. In order to replicate and…
An Application of the Rasch Model.

ERIC Educational Resources Information Center

Veitch, William R.

The one parameter latent trait theory of Georg Rasch has two assumptions: that student abilities can be measured on an equal interval scale, and that the success of a student with a given item is a function of student achievement and item difficulty. The grade four Michigan Educational Assessment Program reading test was designed to measure…
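The second assumption in the abstract above, that success on an item is a function of student achievement and item difficulty, is usually written in the Rasch model as a logistic function of the difference between the two. A minimal sketch, assuming the standard logit form of the model; the function name and the numeric values are illustrative and do not come from the report:

```python
import math


def rasch_probability(ability: float, difficulty: float) -> float:
    """Probability of a correct response under the one-parameter (Rasch) model.

    Both arguments are on the same logit scale; only their difference matters.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


# Hypothetical abilities (in logits) against an item of difficulty 0.0.
for ability in (-1.0, 0.0, 1.0):
    print(ability, round(rasch_probability(ability, difficulty=0.0), 3))
```

Because only the difference between ability and difficulty enters the formula, a student one logit above an item has the same success probability wherever the pair sits on the scale, which is the sense in which the scale is treated as equal-interval.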

[THE NEED FOR HELP OF FAMILY CAREGIVERS OF PERSONS WITH MENTAL ILLNESS IN A UNIQUE SERVICE FOR FAMILIES AT THE BEER SHEVA MENTAL HEALTH CENTER].

PubMed

Shalev, Anat; Shor, Ron

2016-12-01

Limited research attention has been given to the needs of family caregivers of persons with mental illness in psychiatric hospitals despite the stressors and difficulties they experience. In light of the recognition of the significance of helping family caregivers, a new model of consultation and support centers for family caregivers, called Meital, has been developed. To examine the needs of family caregivers who receive help in Meital, at the Beer Sheva Mental Health Center. Eighty-five family caregivers participated in the research. They completed a structured questionnaire constructed for this research two weeks after they started receiving services from Meital. The questionnaire included four areas of needs for help. These areas examined the extent of the need for help with respect to each of the items in the instrument. The mean of the extent of need for help of the items in the 'information and knowledge' subscale was the highest. Average to high means of the items of the subscales were found in the subscales relating to 'difficulties stemming from the impact of the situation of the person with mental illness on the function of the family caregiver receiving help,' 'on the function of other family members' and 'difficulties coping with the person with mental illness.' The mean of the items of the subscale 'relationships with professionals and informal systems' was the lowest. An examination of the items within the subscales indicated that items relating to the 'impact of the situation of the person with mental illness on the family caregiver who receives help' were ranked higher than the items relating to the 'impact on the function of other family caregivers.' Items relating to 'relationships with professionals' were ranked higher than items relating to 'relationships with informal systems.' This research emphasizes the importance of implementing the family-centered approach, the basis of the Meital Model, in psychiatric institutions. 
The focus of this approach is on the need for help of family caregivers beyond the help needed for them to function as a resource of help for the ill person. The findings also illuminate the importance of making information and knowledge accessible for family caregivers.
The Effect of Visual-Chunking-Representation Accommodation on Geometry Testing for Students with Math Disabilities

ERIC Educational Resources Information Center

Zhang, Dake; Ding, Yi; Stegall, Joanna; Mo, Lei

2012-01-01

Students who struggle with learning mathematics often have difficulties with geometry problem solving, which requires strong visual imagery skills. These difficulties have been correlated with deficiencies in visual working memory. Cognitive psychology has shown that chunking of visual items accommodates students' working memory deficits. This…
Revisiting the Factor Structure of the Strengths and Difficulties Questionnaire: United States, 2001.

ERIC Educational Resources Information Center

Dickey, Wayne C.; Blumberg, Stephen J.

2004-01-01

Objective: The Strengths and Difficulties Questionnaire is a 25-item instrument developed to assess emotional and behavioral problems. The current study attempted to replicate previous European structural analyses and to describe the latent dimensions that underlie responses to the parent-reported version of the Strengths and Difficulties…
Eye Movements Reveal How Task Difficulty Moulds Visual Search

ERIC Educational Resources Information Center

Young, Angela H.; Hulleman, Johan

2013-01-01

In two experiments we investigated the relationship between eye movements and performance in visual search tasks of varying difficulty. Experiment 1 provided evidence that a single process is used for search among static and moving items. Moreover, we estimated the functional visual field (FVF) from the gaze coordinates and found that its size…
Comparison of Difficulties and Reliabilities of Math-Completion and Multiple-Choice Item Formats.

ERIC Educational Resources Information Center

Oosterhof, Albert C.; Coats, Pamela K.

Instructors who develop classroom examinations that require students to provide a numerical response to a mathematical problem are often very concerned about the appropriateness of the multiple-choice format. The present study augments previous research relevant to this concern by comparing the difficulty and reliability of multiple-choice and…
Belief-bias reasoning in non-clinical delusion-prone individuals.

PubMed

Anandakumar, T; Connaughton, E; Coltheart, M; Langdon, R

2017-03-01

It has been proposed that people with delusions have difficulty inhibiting beliefs (i.e., "doxastic inhibition") so as to reason about them as if they might not be true. We used a continuity approach to test this proposal in non-clinical adults scoring high and low in psychometrically assessed delusion-proneness. High delusion-prone individuals were expected to show greater difficulty than low delusion-prone individuals on "conflict" items of a "belief-bias" reasoning task (i.e. when required to reason logically about statements that conflicted with reality), but not on "non-conflict" items. Twenty high delusion-prone and twenty low delusion-prone participants (according to the Peters et al. Delusions Inventory) completed a belief-bias reasoning task and tests of IQ, working memory and general inhibition (Excluded Letter Fluency, Stroop and Hayling Sentence Completion). High delusion-prone individuals showed greater difficulty than low delusion-prone individuals on the Stroop and Excluded Letter Fluency tests of inhibition, but no greater difficulty on the conflict versus non-conflict items of the belief-bias task. They did, however, make significantly more errors overall on the belief-bias task, despite controlling for IQ, working memory and general inhibitory control. The study had a relatively small sample size and used non-clinical participants to test a theory of cognitive processing in individuals with clinically diagnosed delusions. Results failed to support a role for doxastic inhibitory failure in non-clinical delusion-prone individuals. These individuals did, however, show difficulty with conditional reasoning about statements that may or may not conflict with reality, independent of any general cognitive or inhibitory deficits. Copyright © 2016 Elsevier Ltd. All rights reserved.
Belief-bias reasoning in non-clinical delusion-prone individuals.

PubMed

Anandakumar, T; Connaughton, E; Coltheart, M; Langdon, R

2017-09-01

It has been proposed that people with delusions have difficulty inhibiting beliefs (i.e., "doxastic inhibition") so as to reason about them as if they might not be true. We used a continuity approach to test this proposal in non-clinical adults scoring high and low in psychometrically assessed delusion-proneness. High delusion-prone individuals were expected to show greater difficulty than low delusion-prone individuals on "conflict" items of a "belief-bias" reasoning task (i.e. when required to reason logically about statements that conflicted with reality), but not on "non-conflict" items. Twenty high delusion-prone and twenty low delusion-prone participants (according to the Peters et al. Delusions Inventory) completed a belief-bias reasoning task and tests of IQ, working memory and general inhibition (Excluded Letter Fluency, Stroop and Hayling Sentence Completion). High delusion-prone individuals showed greater difficulty than low delusion-prone individuals on the Stroop and Excluded Letter Fluency tests of inhibition, but no greater difficulty on the conflict versus non-conflict items of the belief-bias task. They did, however, make significantly more errors overall on the belief-bias task, despite controlling for IQ, working memory and general inhibitory control. The study had a relatively small sample size and used non-clinical participants to test a theory of cognitive processing in individuals with clinically diagnosed delusions. Results failed to support a role for doxastic inhibitory failure in non-clinical delusion-prone individuals. These individuals did, however, show difficulty with conditional reasoning about statements that may or may not conflict with reality, independent of any general cognitive or inhibitory deficits. Copyright © 2016 Elsevier Ltd. All rights reserved.
Cross-cultural comparisons of the Mini-mental State Examination between Japanese and U.S. cohorts

PubMed Central

Meguro, Kenichi; Ishii, Hiroshi; Yamaguchi, Satoshi; Saxton, Judith A.; Ganguli, Mary

2009-01-01

Background: The Mini-mental State Examination (MMSE) is widely used in Japan and the U.S.A. for cognitive screening in the clinical setting and in epidemiological studies. A previous Japanese community study reported distributions of the MMSE total score very similar to that of the U.S.A. Methods: Data were obtained from the Monongahela Valley Independent Elder's Study (MoVIES), a representative sample of community-dwelling elderly people aged 65 and older living near Pittsburgh, U.S.A., and from the Tajiri Project, with similar aims in Tajiri, Japan. We examined item-by-item distributions of the MMSE between two cohorts, comparing (1) percentage of correct answers for each item within each cohort, and (2) relative difficulty of each item measured by Item Characteristic Curve analysis (ICC), which estimates log odds of obtaining a correct answer adjusted for the remaining MMSE items, demographic variables (age, gender, education) and interactions of demographic variables and cohort. Results: Median MMSE scores were very similar between the two samples within the same education groups. However, the relative difficulty of each item differed substantially between the two cohorts. Specifically, recall and auditory comprehension were easier for the Tajiri group, but reading comprehension and sentence construction were easier for the MoVIES group. Conclusions: Our results reaffirm the importance of validation and examination of thresholds in each cohort to be studied when a common instrument is used as a dementia screening tool or for defining cognitive impairment. PMID:18925977
Visual search among items of different salience: removal of visual attention mimics a lesion in extrastriate area V4.

PubMed

Braun, J

1994-02-01

In more than one respect, visual search for the most salient or the least salient item in a display are different kinds of visual tasks. The present work investigated whether this difference is primarily one of perceptual difficulty, or whether it is more fundamental and relates to visual attention. Display items of different salience were produced by varying either size, contrast, color saturation, or pattern. Perceptual masking was employed and, on average, mask onset was delayed longer in search for the least salient item than in search for the most salient item. As a result, the two types of visual search presented comparable perceptual difficulty, as judged by psychophysical measures of performance, effective stimulus contrast, and stability of decision criterion. To investigate the role of attention in the two types of search, observers attempted to carry out a letter discrimination and a search task concurrently. To discriminate the letters, observers had to direct visual attention at the center of the display and, thus, leave unattended the periphery, which contained target and distractors of the search task. In this situation, visual search for the least salient item was severely impaired while visual search for the most salient item was only moderately affected, demonstrating a fundamental difference with respect to visual attention. A qualitatively identical pattern of results was encountered by Schiller and Lee (1991), who used similar visual search tasks to assess the effect of a lesion in extrastriate area V4 of the macaque.
Using a Process Dissociation Approach to Assess Verbal Short-Term Memory for Item and Order Information in a Sample of Individuals with a Self-Reported Diagnosis of Dyslexia

PubMed Central

Wang, Xiaoli; Xuan, Yifu; Jarrold, Christopher

2016-01-01

Previous studies have examined whether difficulties in short-term memory for verbal information, that might be associated with dyslexia, are driven by problems in retaining either information about to-be-remembered items or the order in which these items were presented. However, such studies have not used process-pure measures of short-term memory for item or order information. In this work we adapt a process dissociation procedure to properly distinguish the contributions of item and order processes to verbal short-term memory in a group of 28 adults with a self-reported diagnosis of dyslexia and a comparison sample of 29 adults without a dyslexia diagnosis. In contrast to previous work that has suggested that individuals with dyslexia experience item deficits resulting from inefficient phonological representation and language-independent order memory deficits, the results showed no evidence of specific problems in short-term retention of either item or order information among the individuals with a self-reported diagnosis of dyslexia, despite this group showing expected difficulties on separate measures of word and non-word reading. However, there was some suggestive evidence of a link between order memory for verbal material and individual differences in non-word reading, consistent with other claims for a role of order memory in phonologically mediated reading. The data from the current study therefore provide empirical evidence to question the extent to which item and order short-term memory are necessarily impaired in dyslexia. PMID:26941679
Using a Process Dissociation Approach to Assess Verbal Short-Term Memory for Item and Order Information in a Sample of Individuals with a Self-Reported Diagnosis of Dyslexia.

PubMed

Wang, Xiaoli; Xuan, Yifu; Jarrold, Christopher

2016-01-01

Previous studies have examined whether difficulties in short-term memory for verbal information, that might be associated with dyslexia, are driven by problems in retaining either information about to-be-remembered items or the order in which these items were presented. However, such studies have not used process-pure measures of short-term memory for item or order information. In this work we adapt a process dissociation procedure to properly distinguish the contributions of item and order processes to verbal short-term memory in a group of 28 adults with a self-reported diagnosis of dyslexia and a comparison sample of 29 adults without a dyslexia diagnosis. In contrast to previous work that has suggested that individuals with dyslexia experience item deficits resulting from inefficient phonological representation and language-independent order memory deficits, the results showed no evidence of specific problems in short-term retention of either item or order information among the individuals with a self-reported diagnosis of dyslexia, despite this group showing expected difficulties on separate measures of word and non-word reading. However, there was some suggestive evidence of a link between order memory for verbal material and individual differences in non-word reading, consistent with other claims for a role of order memory in phonologically mediated reading. The data from the current study therefore provide empirical evidence to question the extent to which item and order short-term memory are necessarily impaired in dyslexia.
Feeding practices in infancy associated with caries incidence in early childhood.

PubMed

Chaffee, Benjamin W; Feldens, Carlos Alberto; Rodrigues, Priscila Humbert; Vítolo, Márcia Regina

2015-08-01

Early-life feeding behaviors foretell later dietary habits and health outcomes. Few studies have examined infant dietary patterns and caries occurrence prospectively. Assess whether patterns in food and drink consumption before age 12 months are associated with caries incidence by preschool age. We collected early-life feeding data within a birth cohort from low-income families in Porto Alegre, Brazil. Three dietary indexes were defined, based on refined sugar content and/or previously reported caries associations: a count of sweet foods or drinks introduced <6-months (e.g., candy, cookies, soft drinks), a count of other, nonsweet items introduced <6-months (e.g., beans, meat), and a count of sweet items consumed at 12 months. Incidence of severe early childhood caries (S-ECC) at age 38 months (N = 458) was compared by score tertile on each index, adjusted for family, maternal, and child characteristics using regression modeling. Introduction to a greater number of presumably cariogenic items in infancy was positively associated with future caries. S-ECC incidence was highest in the uppermost tertile of the '6-month sweet index' (adjusted cumulative incidence ratio, RR, versus lowest tertile: 1.46; 95% CI: 0.97, 2.04) and the uppermost tertile of the '12-month sweet index' (RR: 1.55; 95% CI: 1.17, 2.23). The association was specific for sweet items: caries incidence did not differ by tertile of the '6-month nonsweet index' (RR: 1.00; 95% CI: 0.70, 1.40). Additionally, each one-unit increase on the 6-month and the 12-month sweet indexes, but not the 6-month nonsweet index, was statistically significantly associated with greater S-ECC incidence and associated with more decayed, missing, or restored teeth. Results were robust to minor changes in the items constituting each index and persisted if liquid items were excluded. Dietary factors observed before age 12-months were associated with S-ECC at preschool age, highlighting a need for timely, multilevel intervention. 
© 2015 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
ITEM ANALYSIS OF THREE SPANISH NAMING TESTS: A CROSS-CULTURAL INVESTIGATION

PubMed Central

de la Plata, Carlos Marquez; Arango-Lasprilla, Juan Carlos; Alegret, Montse; Moreno, Alexander; Tárraga, Luis; Lara, Mar; Hewlitt, Margaret; Hynan, Linda; Cullum, C. Munro

2009-01-01

Neuropsychological evaluations conducted in the United States and abroad commonly include the use of tests translated from English to Spanish. The use of translated naming tests for evaluating predominately Spanish-speakers has recently been challenged on the grounds that translating test items may compromise a test’s construct validity. The Texas Spanish Naming Test (TNT) has been developed in Spanish specifically for use with Spanish-speakers; however, it is unlikely patients from diverse Spanish-speaking geographical regions will perform uniformly on a naming test. The present study evaluated and compared the internal consistency and patterns of item-difficulty and -discrimination for the TNT and two commonly used translated naming tests in three countries (i.e., United States, Colombia, Spain). Two hundred fifty two subjects (126 demented, 116 nondemented) across three countries were administered the TNT, Modified Boston Naming Test-Spanish, and the naming subtest from the CERAD. The TNT demonstrated superior internal consistency to its counterparts, a superior item difficulty pattern than the CERAD naming test, and a superior item discrimination pattern than the MBNT-S across countries. Overall, all three Spanish naming tests differentiated nondemented and moderately demented individuals, but the results suggest the items of the TNT are most appropriate to use with Spanish-speakers. Preliminary normative data for the three tests examined in each country are provided. PMID:19208960
Assessment of free and cued recall in Alzheimer's disease and vascular and frontotemporal dementia with 24-item Grober and Buschke test.

PubMed

Cerciello, Milena; Isella, Valeria; Proserpi, Alice; Papagno, Costanza

2017-01-01

Alzheimer's disease (AD), vascular dementia (VaD) and frontotemporal dementia (FTD) are the most common forms of dementia. It is well known that memory deficits in AD are different from those in VaD and FTD, especially with respect to cued recall. The aim of this clinical study was to compare the memory performance in 15 AD, 10 VaD and 9 FTD patients and 20 normal controls by means of a 24-item Grober-Buschke test [8]. The patients' groups were comparable in terms of severity of dementia. We considered free and total recall (free plus cued) both in immediate and delayed recall and computed an Index of Sensitivity to Cueing (ISC) [8] for immediate and delayed trials. We assessed whether cued recall predicted the subsequent free recall across our patients' groups. We found that AD patients recalled fewer items from the beginning and were less sensitive to cueing supporting the hypothesis that memory disorders in AD depend on encoding and storage deficit. In immediate recall VaD and FTD showed a similar memory performance and a stronger sensitivity to cueing than AD, suggesting that memory disorders in these patients are due to a difficulty in spontaneously implementing efficient retrieval strategies. However, we found a lower ISC in the delayed recall compared to the immediate trials in VaD than FTD due to a higher forgetting in VaD.
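The abstract above does not spell out how the Index of Sensitivity to Cueing is computed. A commonly cited definition is the share of items missed in free recall that are then recovered with a cue: (total recall - free recall) / (number of items - free recall). The sketch below assumes that definition and the 24-item list; the function name and the example scores are hypothetical:

```python
def index_of_sensitivity_to_cueing(free_recall: int, total_recall: int,
                                   n_items: int = 24) -> float:
    """Proportion of items missed in free recall that are recovered by cueing.

    Assumed definition: (total - free) / (n_items - free), where total recall
    is free plus cued recall.
    """
    if not 0 <= free_recall <= total_recall <= n_items:
        raise ValueError("expected 0 <= free_recall <= total_recall <= n_items")
    missed_in_free_recall = n_items - free_recall
    if missed_in_free_recall == 0:
        # Nothing was left to cue, so the index is undefined.
        raise ValueError("ISC is undefined when free recall is perfect")
    return (total_recall - free_recall) / missed_in_free_recall


# Hypothetical patient: 8 items recalled freely, 20 recalled in total.
print(index_of_sensitivity_to_cueing(free_recall=8, total_recall=20))
```

Under this definition a lower index means cues recover fewer of the missed items, which is the pattern the abstract reports for the AD group.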
HoNOSCA-D As a Measure of the Severity of Diagnosed Mental Disorders in Children and Adolescents-Psychometric Properties of the German Translation.

PubMed

von Wyl, Agnes; Toggweiler, Stephan; Zollinger, Ruedi

2017-01-01

The Health of the Nation Outcome Scales for Children and Adolescents (HoNOSCA), in use worldwide, is a 13-item measure assessing the biopsychosocial severity of mental health problems in children and adolescents. This article introduces the authorized German-language version of HoNOSCA, the HoNOSCA-D, and examines and discusses its psychometric properties based on a clinical sample of 1,533 children and adolescents aged 4;0 to 17;11 years. For the HoNOSCA-D total score (severity of mental health problems), internal consistency (Cronbach's alpha) was 0.63. The discriminative power of the items ranged from 0.07 to 0.44; the average interitem correlation was 0.11. Due to this stochastic independence, calculation of a total severity index is acceptable. Using factor analysis, the principal axis factoring and varimax rotation resulted in a four-factor structure, which with a Kaiser-Meyer-Olkin measure of sampling adequacy of 0.684 explained 30.62% of total variance. The convergent correlations with the German-language parent report version of the Strengths and Difficulties Questionnaire were as expected and showed a medium effect size. Gender and age differences in the HoNOSCA-D total score were small. Regarding the 13 items gender and age differences were negligible to medium. The highest severity was found for schizophrenia and psychotic disorders, followed by affective disorders and social behavior disorders. Overall, validity of HoNOSCA-D was clearly supported.
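The two reliability figures in the abstract above can be checked against each other. For a scale of k items with mean interitem correlation r, the standardized Cronbach's alpha is k*r / (1 + (k - 1)*r). A minimal sketch, assuming that standardized form; the abstract does not say whether its 0.63 is the raw or the standardized coefficient, so exact agreement is not expected:

```python
def standardized_alpha(n_items: int, mean_interitem_r: float) -> float:
    """Standardized Cronbach's alpha from item count and mean interitem correlation."""
    return (n_items * mean_interitem_r) / (1.0 + (n_items - 1) * mean_interitem_r)


# Values reported in the abstract: 13 items, average interitem correlation 0.11.
print(round(standardized_alpha(13, 0.11), 2))
```

The result, roughly 0.62, is close to the reported 0.63, which illustrates how a 13-item scale can reach a moderate alpha even when individual items correlate only weakly.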
Title list of documents made publicly available, April 1--30 1997, Vol. 19, No. 4

DOE Office of Scientific and Technical Information (OSTI.GOV)

Morris, E.B.

This report describes the information received and published by the U.S. Nuclear Regulatory Commission (NRC). This information includes: (1) material associated with civilian nuclear power plants and other uses of radioactive materials and (2) material received and published by NRC pertinent to its role as a regulatory agency. In this report, 7 items of the first type are included, and 25 regulatory type items are listed. The report is indexed by a Personal Author Index, a Corporate Source Index, and a Report Number Index.
Tracking neural correlates of successful learning over repeated sequence observations

PubMed Central

Steinemann, Natalie A.; Moisello, Clara; Ghilardi, M. Felice; Kelly, Simon P.

2016-01-01

The neural correlates of memory formation in humans have long been investigated by exposing subjects to diverse material and comparing responses to items later remembered to those forgotten. Tasks requiring memorization of sensory sequences afford unique possibilities for linking neural memorization processes to behavior, because, rather than comparing across different items of varying content, each individual item can be examined across the successive learning states of being initially unknown, newly learned, and eventually, fully known. Sequence learning paradigms have not yet been exploited in this way, however. Here, we analyze the event-related potentials of subjects attempting to memorize sequences of visual locations over several blocks of repeated observation, with respect to pre- and post-block recall tests. Over centro-parietal regions, we observed a rapid P300 component superimposed on a broader positivity, which exhibited distinct modulations across learning states that were replicated in two separate experiments. Consistent with its well-known encoding of surprise, the P300 deflection monotonically decreased over blocks as locations became better learned and hence more expected. In contrast, the broader positivity was especially elevated at the point when a given item was newly learned, i.e., started being successfully recalled. These results implicate the Broad Positivity in endogenously-driven, intentional memory formation, whereas the P300, in processing the current stimulus to the degree that it was previously uncertain, indexes the cumulative knowledge thereby gained. The decreasing surprise/P300 effect significantly predicted learning success both across blocks and across subjects. 
This presents a new, neural-based means to evaluate learning capabilities independent of verbal reports, which could have considerable value in distinguishing genuine learning disabilities from difficulties to communicate the outcomes of learning, or perceptual impairments, in a range of clinical brain disorders. PMID:27155129
Human Factors Engineering. Part 2. HEDGE (Human Factors Engineering Data Guide for Evaluation)

DTIC Science & Technology

1983-11-30

Use Conditions; Test Item Components; Task Categories; Purposes. INDEX TO THE INDEX; MAN/ITEM TASK SHEET; DETAILED DESIGN CONSIDERATION. The purpose of...The use of these materials, in addition to standard Task and Design Checklists and Questionnaires, will enable you to tailor your HFE subtest to a...specific Con item. These materials have been prepared especially for you: 1. They are intended to support test engineers, not design engineers. 2
Item-focussed Trees for the Identification of Items in Differential Item Functioning.

PubMed

Tutz, Gerhard; Berger, Moritz

2016-09-01

A novel method for the identification of differential item functioning (DIF) by means of recursive partitioning techniques is proposed. We assume an extension of the Rasch model that allows for DIF being induced by an arbitrary number of covariates for each item. Recursive partitioning on the item level results in one tree for each item and leads to simultaneous selection of items and variables that induce DIF. For each item, it is possible to detect groups of subjects with different item difficulties, defined by combinations of characteristics that are not pre-specified. The way a DIF item is determined by covariates is visualized in a small tree and therefore easily accessible. An algorithm is proposed that is based on permutation tests. Various simulation studies, including the comparison with traditional approaches to identify items with DIF, show the applicability and the competitive performance of the method. Two applications illustrate the usefulness and the advantages of the new method.
NON-SPECIFIC SYMPTOMS AND SCREENING OF NON-PSYCHOTIC MORBIDITY IN PRIMARY CARE

PubMed Central

Srinivasan, T.N.; Suresh, T.R.

1990-01-01

SUMMARY: Much of the non-psychotic mental morbidity in primary care goes undetected by primary care health personnel. This is often because the presenting complaints of these patients are non-specific and somatic, and because primary care physicians find it difficult to elicit the specific emotional symptoms needed to screen for psychiatric problems. This paper describes the development of the 7-item Primary Care Psychiatric Questionnaire (PPQ), which could overcome this practical difficulty by requiring only non-specific symptoms to be elicited. This new screening method has been standardised against the 20-item version of the Self Report Questionnaire, which is commonly used in primary care. PMID:21927432

Technical flaws in multiple-choice questions in the access exam to medical specialties ("examen MIR") in Spain (2009-2013).

PubMed

Rodríguez-Díez, María Cristina; Alegre, Manuel; Díez, Nieves; Arbea, Leire; Ferrer, Marta

2016-02-03

The main factor that determines the selection of a medical specialty in Spain after obtaining a medical degree is the MIR ("médico interno residente", internal medical resident) exam. This exam consists of 235 multiple-choice questions with five options, some of which include images provided in a separate booklet. The aim of this study was to analyze the technical quality of the multiple-choice questions included in the MIR exam over the last five years. All the questions included in the exams from 2009 to 2013 were analyzed. We studied the proportion of questions including clinical vignettes, the number of items related to an image and the presence of technical flaws in the questions. For the analysis of technical flaws, we adapted the National Board of Medical Examiners (NBME) guidelines. We looked for 18 different issues included in the manual, grouped into two categories: issues related to testwiseness and issues related to irrelevant difficulties. The final number of questions analyzed was 1,143. The percentage of items based on clinical vignettes increased from 50% in 2009 to 56-58% in the following years (2010-2013). The percentage of items based on an image increased progressively from 10% in 2009 to 15% in 2012 and 2013. The percentage of items with at least one technical flaw varied between 68 and 72%. We observed a decrease in the percentage of items with flaws related to testwiseness, from 30% in 2009 to 20% in 2012 and 2013. While most of these issues decreased dramatically or even disappeared (such as the imbalance in the correct option numbers), the presence of non-plausible options remained frequent. With regard to technical flaws related to irrelevant difficulties, no improvement was observed; this is especially true with respect to negative stem questions and "hinged" questions. The formal quality of the MIR exam items has improved over the last five years with regard to testwiseness. 
A more detailed revision of the items submitted, checking systematically for the presence of technical flaws, could improve the validity and discriminatory power of the exam, without increasing its difficulty.
Refining a self-assessment of informatics competency scale using Mokken scaling analysis.

PubMed

Yoon, Sunmoo; Shaffer, Jonathan A; Bakken, Suzanne

2015-01-01

Healthcare environments are increasingly implementing health information technology (HIT) and those from various professions must be competent to use HIT in meaningful ways. In addition, HIT has been shown to enable interprofessional approaches to health care. The purpose of this article is to describe the refinement of the Self-Assessment of Nursing Informatics Competencies Scale (SANICS) using analytic techniques based upon item response theory (IRT) and discuss its relevance to interprofessional education and practice. In a sample of 604 nursing students, the 93-item version of SANICS was examined using non-parametric IRT. The iterative modeling procedure included 31 steps comprising: (1) assessing scalability, (2) assessing monotonicity, (3) assessing invariant item ordering, and (4) expert input. SANICS was reduced to an 18-item hierarchical scale with excellent reliability. Fundamental skills for team functioning and shared decision making among team members (e.g. "using monitoring systems appropriately," "describing general systems to support clinical care") had the highest level of difficulty, and "demonstrating basic technology skills" had the lowest difficulty level. Most items reflect informatics competencies relevant to all health professionals. Further, the approaches can be applied to construct a new hierarchical scale or refine an existing scale related to informatics attitudes or competencies for various health professions.
An Evaluation of Item Response Theory Classification Accuracy and Consistency Indices

ERIC Educational Resources Information Center

Wyse, Adam E.; Hao, Shiqi

2012-01-01

This article introduces two new classification consistency indices that can be used when item response theory (IRT) models have been applied. The new indices are shown to be related to Rudner's classification accuracy index and Guo's classification accuracy index. The Rudner- and Guo-based classification accuracy and consistency indices are…
Difficulties in using Oswestry Disability Index in Indian patients and validity and reliability of translator-assisted Oswestry Disability Index.

PubMed

Aithala, Janardhana P

2015-06-09

In Indian patients, given language plurality and illiteracy, self-reporting of the English version of the Oswestry Disability Index (ODI) is not practical. Our aim was to determine to what extent self-reporting of the ODI was possible and, where it was not, to assess the validity and reliability of a translator-assisted ODI score. Fifty patients with low backache who could not use the English version were assessed with the ODI by two translators at a gap of 3 h in a test-retest manner. Patients were also asked to report the most important disabling activity in their day-to-day life. A total of 58 questionnaires were filled in during the study period; eight patients (14%) self-reported the English version, while 50 patients needed a translator. The Cronbach's alpha between the two translators for the ODI scores of the 50 patients was 0.866, but the aggregate difference between the two scores for each ODI component was high for question nos. 3, 9, and 10. Cronbach's alpha was best when item no. 3 was deleted (0.875, translator 1; 0.777, translator 2). Thirty-seven people did not answer the question related to sexual activity. Agreement between the two values was assessed using Kendall's tau and was found to be good (0.585; Spearman's coefficient 0.741). Kendall's tau values correlating the total ODI score with the individual components show that all the items move together, but the correlation was poor for question no. 3 (P value 0.16 for translator 2). Translator-assisted ODI is a good outcome assessment tool for backache in places where validated local-language versions are not available, but in Indian patients the inclusion of question nos. 3 and 8, related to weight lifting and sexual function, needs to be reviewed.
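The alpha-if-item-deleted check reported above can be sketched as follows. This is a minimal illustration using the standard Cronbach's alpha formula; the function names and the toy scores are assumptions for demonstration, not the study's data or code.

```python
from statistics import pvariance

def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(rows[0])
    # Variance of each item across respondents.
    item_vars = [pvariance([r[j] for r in rows]) for j in range(k)]
    # Variance of each respondent's total score.
    total_var = pvariance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

def alpha_if_deleted(rows, j):
    """Alpha recomputed with item j (0-based) removed from every respondent."""
    return cronbach_alpha([r[:j] + r[j + 1:] for r in rows])

# Hypothetical scores: 3 respondents x 3 perfectly consistent items.
toy = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
print(cronbach_alpha(toy))
print(alpha_if_deleted(toy, 0))
```

An item whose removal raises alpha, as item no. 3 did in the abstract, is moving less consistently with the rest of the scale.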
Meta-Analysis of Fluid Intelligence Tests of Children from the Chinese Mainland with Learning Difficulties

PubMed Central

Tong, Fang; Fu, Tong

2013-01-01

Objective: To evaluate the differences in fluid intelligence tests between normal children and children with learning difficulties in China. Method: PubMed, MD Consult, and other Chinese journal databases were searched from their establishment to November 2012. After finding comparative studies of Raven measurements of normal children and children with learning difficulties, full intelligence quotient (FIQ) values and the original values of the sub-measurements were extracted. The corresponding effect model was selected based on the results of heterogeneity testing, and parallel sub-group analysis was performed. Results: Twelve documents were included in the meta-analysis, and the studies were all performed in mainland China. Among these, two studies were performed at child health clinics; the other ten sites were schools, and control children were schoolmates or classmates. FIQ was evaluated using a random-effects model. WMD was −13.18 (95% CI: −16.50 to −9.85). Children with learning difficulties showed significantly lower FIQ scores than controls (P<0.00001). Type of learning difficulty and gender differences were evaluated using a fixed-effects model (I² = 0%). The sites and purposes of the studies evaluated here were taken into account, but the reasons for heterogeneity could not be eliminated. The sum IQ of all the subgroups showed considerable heterogeneity (I² = 76.5%). The score for sub-measurement A showed moderate heterogeneity among all documents, and those for AB, B, and E showed considerable heterogeneity, so a random-effects model was used. Individuals with learning difficulties showed heterogeneity as well. There was a moderate delay in the first three items (−0.5 to −0.9), and a much more pronounced delay in the latter three items (−1.4 to −1.6). Conclusion: In the Chinese mainland, the level of fluid intelligence of children with learning difficulties was lower than that of normal children. Delayed development in sub-items C, D, and E was more obvious.
PMID:24236016
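The I² values that drive the choice between fixed- and random-effects models in the abstract above can be computed from Cochran's Q. The sketch below uses the standard definitions (inverse-variance weights, I² = (Q − df)/Q, floored at zero); the function names and the two-study example are illustrative assumptions, not the review's data.

```python
def cochran_q(effects, variances):
    """Return (Q, pooled effect) using inverse-variance (fixed-effect) weights."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    return q, pooled

def i_squared(q, n_studies):
    """I^2 as a percentage: share of variation attributed to heterogeneity."""
    df = n_studies - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0

# Hypothetical example: two studies with equal variance and different effects.
q, pooled = cochran_q([0.0, 2.0], [1.0, 1.0])
print(q, pooled, i_squared(q, 2))
```

An I² of 0% is consistent with pooling under a fixed-effects model, whereas a value such as 76.5% is usually handled with a random-effects model, as the abstract describes.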
An Item-Driven Adaptive Design for Calibrating Pretest Items. Research Report. ETS RR-14-38

ERIC Educational Resources Information Center

Ali, Usama S.; Chang, Hua-Hua

2014-01-01

Adaptive testing is advantageous in that it provides more efficient ability estimates with fewer items than linear testing does. Item-driven adaptive pretesting may also offer similar advantages, and verification of such a hypothesis about item calibration was the main objective of this study. A suitability index (SI) was introduced to adaptively…
The Development of a Post Separation/Post Divorce Problems and Stress Scale.

ERIC Educational Resources Information Center

Raschke, Helen J.

Factors associated with the speed and level of difficulty with which individuals adjust to separation and divorce were investigated. A scale was developed to analyze these factors, and included items dealing with the subdimensions of stress and the perception of the persons involved. Factor analysis of the scale items as well as additional tests…
Determining Cloze Item Difficulty from Item and Passage Characteristics across Different Learner Backgrounds

ERIC Educational Resources Information Center

Trace, Jonathan; Brown, James Dean; Janssen, Gerriet; Kozhevnikova, Liudmila

2017-01-01

Cloze tests have been the subject of numerous studies regarding their function and use in both first language and second language contexts (e.g., Jonz & Oller, 1994; Watanabe & Koyama, 2008). From a validity standpoint, one area of investigation has been the extent to which cloze tests measure reading ability beyond the sentence level.…
Some Considerations on the Partial Credit Model

ERIC Educational Resources Information Center

Verhelst, N. D.; Verstralen, H. H. F. M.

2008-01-01

The Partial Credit Model (PCM) is sometimes interpreted as a model for stepwise solution of polytomously scored items, where the item parameters are interpreted as difficulties of the steps. It is argued that this interpretation is not justified. A model for stepwise solution is discussed. It is shown that the PCM is suited to model sums of binary…
Language Effects in International Testing: The Case of PISA 2006 Science Items

ERIC Educational Resources Information Center

El Masri, Yasmine H.; Baird, Jo-Anne; Graesser, Art

2016-01-01

We investigate the extent to which language versions (English, French and Arabic) of the same science test are comparable in terms of item difficulty and demands. We argue that language is an inextricable part of the scientific literacy construct, be it intended or not by the examiner. This argument has considerable implications on methodologies…
Pick-N Multiple Choice-Exams: A Comparison of Scoring Algorithms

ERIC Educational Resources Information Center

Bauer, Daniel; Holzer, Matthias; Kopp, Veronika; Fischer, Martin R.

2011-01-01

To compare different scoring algorithms for Pick-N multiple correct answer multiple-choice (MC) exams regarding test reliability, student performance, total item discrimination and item difficulty. Data from six 3rd year medical students' end of term exams in internal medicine from 2005 to 2008 at Munich University were analysed (1,255 students,…
Item Mass and Complexity and the Arithmetic Computation of Students with Learning Disabilities.

ERIC Educational Resources Information Center

Cawley, John F.; Shepard, Teri; Smith, Maureen; Parmar, Rene S.

1997-01-01

The performance of 76 students (ages 10 to 15) with learning disabilities on four tasks of arithmetic computation within each of the four basic operations was examined. Tasks varied in difficulty level and number of strokes needed to complete all items. Intercorrelations between task sets and operations were examined as was the use of…
The Golden Rule Agreement is Psychometrically Defensible.

ERIC Educational Resources Information Center

Gonzalez-Tamayo, Eulogio

The agreement between the Educational Testing Service (ETS) and the Golden Rule Insurance Company of Illinois is interpreted as setting the general principles on which items must be selected to be included in a licensure test. These principles put a limit to the difficulty level of any item, and they also limit the size of the difference in…
A Mixture Rasch Model with a Covariate: A Simulation Study via Bayesian Markov Chain Monte Carlo Estimation

ERIC Educational Resources Information Center

Dai, Yunyun

2013-01-01

Mixtures of item response theory (IRT) models have been proposed as a technique to explore response patterns in test data related to cognitive strategies, instructional sensitivity, and differential item functioning (DIF). Estimation proves challenging due to difficulties in identification and questions of effect size needed to recover underlying…
Generic ABILHAND Questionnaire Can Measure Manual Ability across a Variety of Motor Impairments

ERIC Educational Resources Information Center

Simone, Anna; Rota, Viviana; Tesio, Luigi; Perucca, Laura

2011-01-01

ABILHAND is, in its original version, a 46-item, 4-level questionnaire. It measures the difficulty perceived by patients with rheumatoid arthritis as they do various daily manual tasks. ABILHAND was originally built through Rasch analysis. In a later study, it was simplified to a generic 23-item, three-level questionnaire, showing both…
Solving Graphics Problems: Student Performance in Junior Grades

ERIC Educational Resources Information Center

Lowrie, Tom; Diezmann, Carmel M.

2007-01-01

The authors investigated the performance of 172 Grade 4 students (9 to 10 years) over 12 months on a 36-item test that comprised items from 6 distinct graphical languages (e.g., maps) commonly used to convey mathematical information. Results revealed (a) difficulties in Grade 4 students' capacity to decode a variety of graphics, (b) significant…
Learning through Feature Prediction: An Initial Investigation into Teaching Categories to Children with Autism through Predicting Missing Features

ERIC Educational Resources Information Center

Sweller, Naomi

2015-01-01

Individuals with autism have difficulty generalising information from one situation to another, a process that requires the learning of categories and concepts. Category information may be learned through: (1) classifying items into categories, or (2) predicting missing features of category items. Predicting missing features has to this point been…
HIV/AIDS knowledge among men who have sex with men: applying the item response theory.

PubMed

Gomes, Raquel Regina de Freitas Magalhães; Batista, José Rodrigues; Ceccato, Maria das Graças Braga; Kerr, Lígia Regina Franco Sansigolo; Guimarães, Mark Drew Crosland

2014-04-01

To evaluate the level of HIV/AIDS knowledge among men who have sex with men in Brazil using the latent trait model estimated by Item Response Theory. Multicenter, cross-sectional study, carried out in ten Brazilian cities between 2008 and 2009. Adult men who have sex with men were recruited (n = 3,746) through Respondent Driven Sampling. HIV/AIDS knowledge was ascertained through ten statements by face-to-face interview and latent scores were obtained through two-parameter logistic modeling (difficulty and discrimination) using Item Response Theory. Differential item functioning was used to examine each item characteristic curve by age and schooling. Overall, the HIV/AIDS knowledge scores using Item Response Theory did not exceed 6.0 (scale 0-10), with mean and median values of 5.0 (SD = 0.9) and 5.3, respectively, with 40.7% of the sample with knowledge levels below the average. Some beliefs still exist in this population regarding the transmission of the virus by insect bites, by using public restrooms, and by sharing utensils during meals. With regard to the difficulty and discrimination parameters, eight items were located below the mean of the scale and were considered very easy, and four items presented very low discrimination parameter (< 0.34). The absence of difficult items contributed to the inaccuracy of the measurement of knowledge among those with median level and above. Item Response Theory analysis, which focuses on the individual properties of each item, allows measures to be obtained that do not vary or depend on the questionnaire, which provides better ascertainment and accuracy of knowledge scores. Valid and reliable scales are essential for monitoring HIV/AIDS knowledge among the men who have sex with men population over time and in different geographic regions, and this psychometric model brings this advantage.
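The two-parameter logistic model named in the abstract above relates a respondent's latent knowledge to the probability of a correct answer through a difficulty and a discrimination parameter. The following is a minimal sketch of the standard 2PL response function; the function name and the parameter values are illustrative assumptions, not estimates from the study.

```python
import math

def p_correct(theta, a, b):
    """2PL probability of a correct response.

    theta: latent trait level, a: discrimination, b: difficulty.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# At theta == b the probability is 0.5 regardless of discrimination.
print(p_correct(0.0, 1.0, 0.0))

# A weakly discriminating item (hypothetical a = 0.3) separates low and
# high scorers far less than a sharper one (hypothetical a = 1.5).
for a in (0.3, 1.5):
    spread = p_correct(1.0, a, 0.0) - p_correct(-1.0, a, 0.0)
    print(a, round(spread, 3))
```

This is why items with very low discrimination, or with difficulty far below the sample mean, add little precision for respondents at or above the median.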
Vision and Quality of Life Index: validation of the Indian version using Rasch analysis.

PubMed

Gothwal, Vijaya K; Bagga, Deepak K

2013-07-18

A multi-attribute utility instrument (MAUI) consists of a descriptive system in which the items and responses seek information about a concept of the universe of health-related quality of life (QoL), and responses to these items then are weighted and combined to produce the index. To our knowledge, the 6-item Vision and Quality of Life Index (VisQoL) is the only available vision-related MAUI, developed and validated in Australia, specifically for visually impaired (VI) populations. To our knowledge, the psychometric properties of the VisQoL have not yet been investigated in an Indian VI sample; this was the aim of our study. The Indian VisQoL was administered to 349 VI adults face-to-face by a trained interviewer at the Vision Rehabilitation Centres of a tertiary eye care facility, South India. Rasch analysis was used to assess the psychometric properties. Rescoring was necessary for all except one item before ordered thresholds were obtained. All items fit the Rasch model and unidimensionality was confirmed. Person separation was acceptable (2.01), indicating that the instrument can discriminate among three strata of participants' vision-related QoL (VRQoL). The VisQoL items were targeted substantially to the participants' VRQoL (-0.69 logits). One item ("ability to have friendships") demonstrated large differential item functioning by work status; working participants reported the item to be more difficult (-1.13 logits) relative to other items when compared to the nonworking participants. The 6-item Indian VisQoL satisfies unidimensional Rasch model expectations in VI patients. Disordering of response categories was evident; replication is required before a common rescoring option should be considered.
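The step from a person separation of 2.01 to three strata can be reproduced with the conventional Rasch strata formula, H = (4G + 1)/3. This is a sketch assuming that formula was used; the abstract does not state it, and the function names are illustrative.

```python
def strata(separation):
    """Number of statistically distinct strata, H = (4G + 1) / 3."""
    return (4 * separation + 1) / 3

def reliability(separation):
    """Separation reliability implied by G: R = G^2 / (1 + G^2)."""
    return separation ** 2 / (1 + separation ** 2)

g = 2.01  # person separation reported in the abstract
print(round(strata(g), 2))       # 3.01
print(round(reliability(g), 2))  # 0.8
```

Under these assumptions a separation just above 2 corresponds to roughly three distinguishable levels of vision-related QoL and a reliability of about 0.80.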
A Systematic Review of Evidence for the Psychometric Properties of the Strengths and Difficulties Questionnaire

ERIC Educational Resources Information Center

Kersten, Paula; Czuba, Karol; McPherson, Kathryn; Dudley, Margaret; Elder, Hinemoa; Tauroa, Robyn; Vandal, Alain

2016-01-01

This article synthesized evidence for the validity and reliability of the Strengths and Difficulties Questionnaire in children aged 3-5 years. A systematic review using the Preferred Reporting Items for Systematic Reviews and Meta-Analyses statement guidelines was carried out. Study quality was rated using the Consensus-based Standards for the…

Stereotype Threat in Classroom Settings: The Interactive Effect of Domain Identification, Task Difficulty and Stereotype Threat on Female Students' Maths Performance

ERIC Educational Resources Information Center

Keller, Johannes

2007-01-01

Background: Stereotype threat research revealed that negative stereotypes can disrupt the performance of persons targeted by such stereotypes. This paper contributes to stereotype threat research by providing evidence that domain identification and the difficulty level of test items moderate stereotype threat effects on female students' maths…
An analysis of the masking of speech by competing speech using self-report data.

PubMed

Agus, Trevor R; Akeroyd, Michael A; Noble, William; Bhullar, Navjot

2009-01-01

Many of the items in the "Speech, Spatial, and Qualities of Hearing" scale questionnaire [S. Gatehouse and W. Noble, Int. J. Audiol. 43, 85-99 (2004)] are concerned with speech understanding in a variety of backgrounds, both speech and nonspeech. To study if this self-report data reflected informational masking, previously collected data on 414 people were analyzed. The lowest scores (greatest difficulties) were found for the two items in which there were two speech targets, with successively higher scores for competing speech (six items), energetic masking (one item), and no masking (three items). The results suggest significant masking by competing speech in everyday listening situations.
Psychometric properties of the NEPSY-II affect recognition subtest in a preschool sample: a Rasch modeling approach.

PubMed

Yao, Shih-Ying; Bull, Rebecca; Khng, Kiat Hui; Rahim, Anisa

2018-01-01

Understanding a child's ability to decode emotion expressions is important to allow early interventions for potential difficulties in social and emotional functioning. This study applied the Rasch model to investigate the psychometric properties of the NEPSY-II Affect Recognition subtest, a U.S. normed measure for 3-16 year olds which assesses the ability to recognize facial expressions of emotion. Data were collected from 1222 children attending preschools in Singapore. We first performed the Rasch analysis with the raw item data, and examined the technical qualities and difficulty pattern of the studied items. We subsequently investigated the relation of the estimated affect recognition ability from the Rasch analysis to a teacher-reported measure of a child's behaviors, emotions, and relationships. Potential gender differences were also examined. The Rasch model fits our data well. Also, the NEPSY-II Affect Recognition subtest was found to have reasonable technical qualities, expected item difficulty pattern, and desired association with the external measure of children's behaviors, emotions, and relationships for both boys and girls. Overall, findings from this study suggest that the NEPSY-II Affect Recognition subtest is a promising measure of young children's affect recognition ability. Suggestions for future test improvement and research were discussed.
Improving Measurement of Trait Competitiveness: A Rasch Analysis of the Revised Competitiveness Index With Samples From New Zealand and US University Students.

PubMed

Krägeloh, Christian U; Medvedev, Oleg N; Hill, Erin M; Webster, Craig S; Booth, Roger J; Henning, Marcus A

2018-01-01

Measuring competitiveness is necessary to fully understand variables affecting student learning. The 14-item Revised Competitiveness Index has become a widely used measure to assess trait competitiveness. The current study reports on a Rasch analysis to investigate the psychometric properties of the Revised Competitiveness Index and to improve its precision for international comparisons. Students were recruited from medical studies at a university in New Zealand, undergraduate health sciences courses at another New Zealand university, and a psychology undergraduate class at a university in the United States. Rasch model estimate parameters were affected by local dependency and item misfit. Best fit to the Rasch model (χ²(20) = 15.86, p = .73, person separation index = .95) was obtained for the Enjoyment of Competition subscale after combining locally dependent items into a subtest and discarding the highly misfitting Item 9. The only modifications required to obtain a suitable fit (χ²(25) = 25.81, p = .42, person separation index = .77) for the Contentiousness subscale were a subtest to combine two locally dependent items and splitting this subtest by country to deal with differential item functioning. The results support reliability and internal construct validity of the modified Revised Competitiveness Index. Precision of the measure may be enhanced using the ordinal-to-interval conversion algorithms presented here, allowing the use of parametric statistics without breaking fundamental statistical assumptions.
An international measure of awareness and beliefs about cancer: development and testing of the ABC

PubMed Central

Simon, Alice E; Forbes, Lindsay J L; Boniface, David; Warburton, Fiona; Brain, Kate E; Dessaix, Anita; Donnelly, Michael; Haynes, Kerry; Hvidberg, Line; Lagerlund, Magdalena; Petermann, Lisa; Tishelman, Carol; Vedsted, Peter; Vigmostad, Maria Nyre; Wardle, Jane; Ramirez, Amanda J

2012-01-01

Objectives: To develop an internationally validated measure of cancer awareness and beliefs; the awareness and beliefs about cancer (ABC) measure. Design and setting: Items modified from existing measures were assessed by a working group in six countries (Australia, Canada, Denmark, Norway, Sweden and the UK). Validation studies were completed in the UK, and cross-sectional surveys of the general population were carried out in the six participating countries. Participants: Testing in UK English included cognitive interviewing for face validity (N=10), calculation of content validity indexes (six assessors), and assessment of test–retest reliability (N=97). Conceptual and cultural equivalence of modified (Canadian and Australian) and translated (Danish, Norwegian, Swedish and Canadian French) ABC versions were tested quantitatively for equivalence of meaning (≥4 assessors per country) and in bilingual cognitive interviews (three interviews per translation). Response patterns were assessed in surveys of adults aged 50+ years (N≥2000) in each country. Main outcomes: Psychometric properties were evaluated through tests of validity and reliability, conceptual and cultural equivalence and systematic item analysis. Test–retest reliability used weighted-κ and intraclass correlations. Construction and validation of aggregate scores was by factor analysis for (1) beliefs about cancer outcomes, (2) beliefs about barriers to symptomatic presentation, and item summation for (3) awareness of cancer symptoms and (4) awareness of cancer risk factors. Results: The English ABC had acceptable test–retest reliability and content validity. International assessments of equivalence identified a small number of items where wording needed adjustment. Survey response patterns showed that items performed well in terms of difficulty and discrimination across countries except for awareness of cancer outcomes in Australia. Aggregate scores had consistent factor structures across countries.
Conclusions: The ABC is a reliable and valid international measure of cancer awareness and beliefs. The methods used to validate and harmonise the ABC may serve as a methodological guide in international survey research. PMID:23253874
A brief survey of patients' first impression after CPAP titration predicts future CPAP adherence: a pilot study.

PubMed

Balachandran, Jay S; Yu, Xiaohong; Wroblewski, Kristen; Mokhlesi, Babak

2013-03-15

CPAP adherence patterns are often established very early in the course of therapy. Our objective was to quantify patients' perception of CPAP therapy using a 6-item questionnaire administered in the morning following CPAP titration. We hypothesized that questionnaire responses would independently predict CPAP adherence during the first 30 days of therapy. We retrospectively reviewed the CPAP perception questionnaires of 403 CPAP-naïve adults who underwent in-laboratory titration and who had daily CPAP adherence data available for the first 30 days of therapy. Responses to the CPAP perception questionnaire were analyzed for their association with mean CPAP adherence and with changes in daily CPAP adherence over 30 days. Patients were aged 52 ± 14 years, 53% were women, 54% were African American, the mean body mass index (BMI) was 36.3 ± 9.1 kg/m², and most patients had moderate-severe OSA. Four of 6 items from the CPAP perception questionnaire (regarding difficulty tolerating CPAP, discomfort with CPAP pressure, likelihood of wearing CPAP, and perceived health benefit) were significantly correlated with mean 30-day CPAP adherence, and a composite score from these 4 questions was found to be internally consistent. Stepwise linear regression modeling demonstrated that 3 variables were significant and independent predictors of reduced mean CPAP adherence: worse score on the 4-item questionnaire, African American race, and non-sleep specialist ordering polysomnogram and CPAP therapy. Furthermore, a worse score on the 4-item CPAP perception questionnaire was consistently associated with decreased mean daily CPAP adherence over the first 30 days of therapy. In this pilot study, responses to a 4-item CPAP perception questionnaire administered to patients immediately following CPAP titration independently predicted mean CPAP adherence during the first 30 days. Further prospective validation of this questionnaire in different patient populations is warranted.
Differential Item Functioning Analysis Using Rasch Item Information Functions

ERIC Educational Resources Information Center

Wyse, Adam E.; Mapuranga, Raymond

2009-01-01

Differential item functioning (DIF) analysis is a statistical technique used for ensuring the equity and fairness of educational assessments. This study formulates a new DIF analysis method using the information similarity index (ISI). ISI compares item information functions when data fits the Rasch model. Through simulations and an international…
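The item information functions that the abstract above compares have a simple closed form under the Rasch model: I(θ) = P(θ)(1 − P(θ)). The sketch below shows only that standard function; the information similarity index itself is not defined in the truncated abstract and is not reproduced here, and the function names are illustrative.

```python
import math

def rasch_p(theta, b):
    """Rasch probability of a correct response for ability theta, difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def rasch_information(theta, b):
    """Rasch item information, P * (1 - P); peaks at 0.25 when theta == b."""
    p = rasch_p(theta, b)
    return p * (1.0 - p)

# Information is highest where ability matches difficulty and falls off
# symmetrically on either side (hypothetical difficulty b = 0.5).
for theta in (-1.5, 0.5, 2.5):
    print(theta, round(rasch_information(theta, 0.5), 3))
```

If an item's difficulty differs between groups, its information curve is shifted along the ability scale for one group, which is the kind of difference a comparison of information functions can pick up.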
Student Difficulties in Analyzing Thin-Film Interference

ERIC Educational Resources Information Center

Newburgh, Ronald; Goodale, Douglass

2009-01-01

A question we posed in a recent final examination has uncovered a fundamental difficulty for students in understanding destructive interference. The problem stated that glass of index n[subscript 3] was coated with a thin film of a substance with index n[subscript 2]. The question then asked the student to calculate (a) the minimum coating…
Citation analysis of Acta Dermatovenerologica Alpina, Pannonica et Adriatica: 1992-2011.

PubMed

Ostrbenk, Anja; Skamperle, Mateja; Poljak, Mario

2012-09-01

Acta Dermatovenerologica Alpina, Pannonica et Adriatica is a small regional professional journal that started publishing in 1992. Despite the journal's relatively narrow readership, it has significantly improved its quality and global profile during the last 20 years, as shown in this citation analysis update. Since 1992, 654 bibliographical items have been published. Among these, 545 (83.4%) were considered WoS citable items and 109 (16.6%) WoS noncitable items. Since 2008, 90% of all published items have been considered WoS citable items and received an average of 1.9 citations per item. The predicted Acta Dermatovenerol APA impact factor, calculated using data from a Cited Reference search of Thomson Scientific's Web of Science, has shown a steep and continuous increase since 2006, when the journal acquired full indexing status in Index Medicus/Medline, and has been above 0.5 since 2008.
Diagnostic accuracy research in glaucoma is still incompletely reported: An application of Standards for Reporting of Diagnostic Accuracy Studies (STARD) 2015.

PubMed

Michelessi, Manuele; Lucenteforte, Ersilia; Miele, Alba; Oddone, Francesco; Crescioli, Giada; Fameli, Valeria; Korevaar, Daniël A; Virgili, Gianni

2017-01-01

Research has shown a modest adherence of diagnostic test accuracy (DTA) studies in glaucoma to the Standards for Reporting of Diagnostic Accuracy Studies (STARD). We have applied the updated 30-item STARD 2015 checklist to a set of studies included in a Cochrane DTA systematic review of imaging tools for diagnosing manifest glaucoma. Three pairs of reviewers, including one senior reviewer who assessed all studies, independently checked the adherence of each study to STARD 2015. Adherence was analyzed on an individual-item basis. Logistic regression was used to evaluate the effect of publication year and impact factor on adherence. We included 106 DTA studies, published between 2003 and 2014 in journals with a median impact factor of 2.6. Overall adherence was 54.1% for 3,286 individual ratings across 31 items, with a mean of 16.8 (SD: 3.1; range 8-23) items per study. Large variability in adherence to reporting standards was detected across individual STARD 2015 items, ranging from 0 to 100%. Nine items (1: identification as diagnostic accuracy study in title/abstract; 6: eligibility criteria; 10: index test (a) and reference standard (b) definition; 12: cut-off definitions for index test (a) and reference standard (b); 14: estimation of diagnostic accuracy measures; 21a: severity spectrum of diseased; 23: cross-tabulation of the index and reference standard results) were adequately reported in more than 90% of the studies. Conversely, 10 items (3: scientific and clinical background of the index test; 11: rationale for the reference standard; 13b: blinding of index test results; 17: analyses of variability; 18: sample size calculation; 19: study flow diagram; 20: baseline characteristics of participants; 28: registration number and registry; 29: availability of study protocol; 30: sources of funding) were adequately reported in less than 30% of the studies.
Only four items showed a statistically significant improvement over time: missing data (16), baseline characteristics of participants (20), estimates of diagnostic accuracy (24) and sources of funding (30). Adherence to STARD 2015 among DTA studies in glaucoma research is incomplete, and only modestly increasing over time.
Rasch analysis of Stamps's Index of Work Satisfaction in nursing population.

PubMed

Ahmad, Nora; Oranye, Nelson Ositadimma; Danilov, Alyona

2017-01-01

One of the most commonly used tools for measuring job satisfaction in nursing is the Stamps Index of Work Satisfaction. Several studies have reported on the reliability of the Stamps tool based on traditional statistical models. The aim of this study was to apply the Rasch model to examine the adequacy of Stamps's Index of Work Satisfaction for measuring nurses' job satisfaction cross-culturally and to determine the validity and reliability of the instrument using the Rasch criteria. A secondary data analysis was conducted on a sample of 556 registered nurses from two countries. The RUMM 2030 software was used to analyse the psychometric properties of the Index of Work Satisfaction. The person mean location of -0.018 approximated the item mean of 0.00, suggesting a good alignment of the measure and the traits being measured. However, at the item level, some items misfit the Rasch model.
Title list of documents made publicly available, October 1-31, 1997

DOE Office of Scientific and Technical Information (OSTI.GOV)

NONE

1997-12-01

This monthly publication describes the information received and published by the U.S. Nuclear Regulatory Commission (NRC). It includes: (1) docketed material associated with civilian nuclear power plants and other uses of radioactive materials, and (2) non-docketed material received and published. This series of documents is indexed by a Personal Author Index, a Corporate Source Index, and a Report Number Index. Seven docketed items are included which pertain to licensing, radioactive waste, and nuclear power plant design. The 26 non-docketed items include committee reports; NRC correspondence, issuances, and reports; inspections and deficiency findings; and waste management documents.
Evaluation of the Fecal Incontinence Quality of Life Scale (FIQL) using item response theory reveals limitations and suggests revisions.

PubMed

Peterson, Alexander C; Sutherland, Jason M; Liu, Guiping; Crump, R Trafford; Karimuddin, Ahmer A

2018-06-01

The Fecal Incontinence Quality of Life Scale (FIQL) is a commonly used patient-reported outcome measure for fecal incontinence, often used in clinical trials, yet has not been validated in English since its initial development. This study uses modern methods to thoroughly evaluate the psychometric characteristics of the FIQL and its potential for differential functioning by gender. This study analyzed prospectively collected patient-reported outcome data from a sample of patients prior to colorectal surgery. Patients were recruited from 14 general and colorectal surgeons in Vancouver Coastal Health hospitals in Vancouver, Canada. Confirmatory factor analysis was used to assess construct validity. Item response theory was used to evaluate test reliability, describe item-level characteristics, identify local item dependence, and test for differential functioning by gender. 236 patients were included for analysis, with mean age 58 and approximately half female. Factor analysis failed to identify the lifestyle, coping, depression, and embarrassment domains, suggesting lack of construct validity. Items demonstrated low difficulty, indicating that the test has the highest reliability among individuals who have low quality of life. Five items are suggested for removal or replacement. Differential test functioning was minimal. This study has identified specific improvements that can be made to each domain of the Fecal Incontinence Quality of Life Scale and to the instrument overall. Formatting, scoring, and instructions may be simplified, and items with higher difficulty developed. The lifestyle domain can be used as is. The embarrassment domain should be significantly revised before use.
Human Resources Research Office Bibliography of Publications as of 30 June 1968.

ERIC Educational Resources Information Center

George Washington Univ., Alexandria, VA. Human Resources Research Office.

A bibliography has been compiled to provide as complete information as is feasible about research publications and by-products from the Human Resources Research Office (HumRRO). It includes abstracts for many items; key word out of context indexing; author indexes; and AD numbers, indicating items available to qualified users through the Defense…
Psychometric properties of DSM assessments of illicit drug abuse and dependence: results from the National Epidemiologic Survey on Alcohol and Related Conditions (NESARC).

PubMed

Lynskey, M T; Agrawal, A

2007-09-01

DSM-IV criteria for illicit drug abuse and dependence are largely based on criteria developed for alcohol use disorders and there is a lack of research evidence on the psychometric properties of these symptoms when applied to illicit drugs. This study utilizes data on abuse/dependence criteria for cannabis, cocaine, stimulants, sedatives, tranquilizers, opiates, hallucinogens and inhalants from the National Epidemiological Survey on Alcohol and Related Conditions (NESARC, n=43 093). Analyses included factor analysis to explore the dimensionality of illicit drug abuse and dependence criteria, calculation of item difficulty and discrimination within an item response framework and a descriptive analysis of 'diagnostic orphans': individuals meeting criteria for 1-2 dependence symptoms but not abuse. Rates of psychiatric disorders were compared across groups. Results favor a uni-dimensional construct for abuse/dependence on each of the eight drug classes. Factor loadings, item difficulty and discrimination were remarkably consistent across drug categories. For each drug category, between 29% and 51% of all individuals meeting criteria for at least one symptom did not receive a formal diagnosis of either abuse or dependence and were therefore classified as 'orphans'. Mean rates of disorder in these individuals suggested that illicit drug use disorders may be more adequately described along a spectrum of severity. While there were remarkable similarities across categories of illicit drugs, consideration of item difficulty suggested that some alterations to DSM regarding the relevant severity of specific abuse and dependence criteria may be warranted.
An Ethical Issue Scale for Community Pharmacy Setting (EISP): Development and Validation.

PubMed

Crnjanski, Tatjana; Krajnovic, Dusanka; Tadic, Ivana; Stojkov, Svetlana; Savic, Mirko

2016-04-01

Many problems that arise when providing pharmacy services may contain some ethical components. The aims of this study were to develop and validate a scale that could assess the difficulty of ethical issues, as well as the frequency of their occurrence in the everyday practice of community pharmacists. Development and validation of the scale was conducted in three phases: (1) generating items for the initial survey instrument after qualitative analysis; (2) defining the design and format of the instrument; (3) validation of the instrument. The constructed Ethical Issue scale for community pharmacy setting has two parts containing the same 16 items for assessing the difficulty and frequency thereof. The results of the 171 completely filled out scales were analyzed (response rate 74.89%). The Cronbach's α value was 0.83 for the part of the instrument that examines the difficulty of the ethical situations and 0.84 for the part that examines their frequency. Test-retest reliability for both parts of the instrument was satisfactory, with all intraclass correlation coefficient (ICC) values above 0.6 (for the part that examines severity ICC = 0.809, for the part that examines frequency ICC = 0.929). The 16-item scale, as a self-assessment tool, demonstrated a high degree of content, criterion, and construct validity and test-retest reliability. The results support its use as a research tool to assess difficulty and frequency of ethical issues in community pharmacy setting. The validated scale needs to be further employed on a larger sample of pharmacists.
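Cronbach's α, the internal-consistency statistic reported in this abstract, can be computed from a respondents-by-items score table. The following is a minimal sketch of the standard formula; the function name and the toy data are hypothetical and are not the study's data.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(rows[0])  # number of items
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical toy data: 3 respondents x 3 items that agree perfectly.
toy_scores = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
print(cronbach_alpha(toy_scores))
```

With perfectly consistent items, as in the toy data, α reaches its maximum of 1; values such as the 0.83 and 0.84 reported here indicate that the items vary together to a large but not complete degree.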
Harmonizing Measures of Cognitive Performance Across International Surveys of Aging Using Item Response Theory.

PubMed

Chan, Kitty S; Gross, Alden L; Pezzin, Liliana E; Brandt, Jason; Kasper, Judith D

2015-12-01

To harmonize measures of cognitive performance using item response theory (IRT) across two international aging studies. Data for persons ≥65 years from the Health and Retirement Study (HRS, N = 9,471) and the English Longitudinal Study of Aging (ELSA, N = 5,444). Cognitive performance measures varied (HRS fielded 25, ELSA 13); 9 were in common. Measurement precision was examined for IRT scores based on (a) common items, (b) common items adjusted for differential item functioning (DIF), and (c) DIF-adjusted all items. Three common items (day of date, immediate word recall, and delayed word recall) demonstrated DIF by survey. Adding survey-specific items improved precision but mainly for HRS respondents at lower cognitive levels. IRT offers a feasible strategy for harmonizing cognitive performance measures across other surveys and for other multi-item constructs of interest in studies of aging. Practical implications depend on sample distribution and the difficulty mix of in-common and survey-specific items. © The Author(s) 2015.
Directed forgetting and aging: the role of retrieval processes, processing speed, and proactive interference.

PubMed

Hogge, Michaël; Adam, Stéphane; Collette, Fabienne

2008-07-01

The directed forgetting effect obtained with the item method is supposed to depend on both selective rehearsal of to-be-remembered (TBR) items and attentional inhibition of to-be-forgotten (TBF) items. In this study, we investigated the locus of the directed forgetting deficit in older adults by exploring the influence of recollection and familiarity-based retrieval processes on age-related differences in directed forgetting. Moreover, we explored the influence of processing speed, short-term memory capacity, thought suppression tendencies, and sensitivity to proactive interference on performance. The results indicated that older adults' directed forgetting difficulties are due to decreased recollection of TBR items, associated with increased automatic retrieval of TBF items. Moreover, processing speed and proactive interference appeared to be responsible for the decreased recall of TBR items.
Examining an Alternative to Score Equating: A Randomly Equivalent Forms Approach. Research Report. ETS RR-08-14

ERIC Educational Resources Information Center

Liao, Chi-Wen; Livingston, Samuel A.

2008-01-01

Randomly equivalent forms (REF) of tests in listening and reading for nonnative speakers of English were created by stratified random assignment of items to forms, stratifying on item content and predicted difficulty. The study included 50 replications of the procedure for each test. Each replication generated 2 REFs. The equivalence of those 2…
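Stratified random assignment of items to forms can be sketched as follows. The truncated abstract does not give the actual procedure, so this is an assumed illustration: the function name, the item tuple layout, and the round-robin dealing within each stratum are all hypothetical choices.

```python
import random
from collections import defaultdict

def assign_to_forms(items, n_forms=2, seed=0):
    """Deal items into forms at random within each (content, difficulty) stratum.

    items: list of (item_id, content_area, difficulty_band) tuples.
    Returns a list of n_forms lists of item ids.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item_id, content, band in items:
        strata[(content, band)].append(item_id)

    forms = [[] for _ in range(n_forms)]
    for key in sorted(strata):   # fixed stratum order for reproducibility
        members = strata[key]
        rng.shuffle(members)     # random order within the stratum
        for i, item_id in enumerate(members):
            # Round-robin dealing; form 0 receives any leftover item.
            forms[i % n_forms].append(item_id)
    return forms

# Hypothetical pool: 4 reading-easy items and 4 reading-hard items.
pool = [(f"R{i}", "reading", "easy") for i in range(4)]
pool += [(f"H{i}", "reading", "hard") for i in range(4)]
form_a, form_b = assign_to_forms(pool)
print(form_a, form_b)
```

Because every stratum is split evenly, each form ends up with the same mix of content areas and predicted difficulty bands, which is the sense in which the forms are expected to be equivalent.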
An Information Analysis of 2-, 3-, and 4-Word Verbal Discrimination Learning.

ERIC Educational Resources Information Center

Arima, James K.; Gray, Francis D.

Information theory was used to qualify the difficulty of verbal discrimination (VD) learning tasks and to measure VD performance. Words for VD items were selected with high background frequency and equal a priori probabilities of being selected as a first response. Three VD lists containing only 2-, 3-, or 4-word items were created and equated for…

The Impact of Escape Alternative Position Change in Multiple-Choice Test on the Psychometric Properties of a Test and Its Items Parameters

ERIC Educational Resources Information Center

Hamadneh, Iyad Mohammed

2015-01-01

This study aimed at investigating the impact of changing the escape alternative position in a multiple-choice test on the psychometric properties of the test and its item parameters (difficulty, discrimination & guessing), and on the estimation of examinee ability. To achieve the study objectives, a 4-alternative multiple choice type achievement test…
Adapting a Developmental Screening Measure: Exploring the Effects of Language and Culture on a Parent-Completed Social-Emotional Screening Test

ERIC Educational Resources Information Center

Chen, Chieh-Yu; Chen, Ching-I; Squires, Jane; Bian, Xiaoyan; Heo, Kay H.; Filgueiras, Alberto; Kalinina, Svetlana; Samarina, Larissa; Ermolaeva, Evgeniya; Xie, Huichao; Yu, Ting-Ying; Wu, Pei-Fang; Landeira-Fernandez, Jesus

2017-01-01

Ages & Stages Questionnaires: Social-Emotional (ASQ:SE) is a widely used screening instrument for detecting social-emotional difficulties in infants and young children. To use a screening instrument across cultures and countries, it is necessary to identify potential item-level biases and ensure item equivalence. This study investigated the…
Measuring Ability, Speed, or Both? Challenges, Psychometric Solutions, and What Can Be Gained from Experimental Control

ERIC Educational Resources Information Center

Goldhammer, Frank

2015-01-01

The main challenge of ability tests relates to the difficulty of items, whereas speed tests demand that test takers complete very easy items quickly. This article proposes a conceptual framework to represent how performance depends on both between-person differences in speed and ability and the speed-ability compromise within persons. Related…
Developing the Impossible Figures Task to Assess Visual-Spatial Talents among Chinese Students: A Rasch Measurement Model Analysis

ERIC Educational Resources Information Center

Chan, David W.

2010-01-01

Data of item responses to the Impossible Figures Task (IFT) from 492 Chinese primary, secondary, and university students were analyzed using the dichotomous Rasch measurement model. Item difficulty estimates and person ability estimates located on the same logit scale revealed that the pooled sample of Chinese students, who were relatively highly…
Sentence comprehension in specific language impairment: a task designed to distinguish between cognitive capacity and syntactic complexity.

PubMed

Leonard, Laurence B; Deevy, Patricia; Fey, Marc E; Bredin-Oja, Shelley L

2013-04-01

This study examined sentence comprehension in children with specific language impairment (SLI) in a manner designed to separate the contribution of cognitive capacity from the effects of syntactic structure. Nineteen children with SLI, 19 typically developing children matched for age (TD-A), and 19 younger typically developing children (TD-Y) matched according to sentence comprehension test scores responded to sentence comprehension items that varied in either length or their demands on cognitive capacity, based on the nature of the foils competing with the target picture. The TD-A children were accurate across all item types. The SLI and TD-Y groups were less accurate than the TD-A group on items with greater length and, especially, on items with the greatest demands on cognitive capacity. The types of errors were consistent with failure to retain details of the sentence apart from syntactic structure. The difficulty in the more demanding conditions seemed attributable to interference. Specifically, the children with SLI and the TD-Y children appeared to have difficulty retaining details of the target sentence when the information reflected in the foils closely resembled the information in the target sentence.
Both younger and older adults have difficulty updating emotional memories.

PubMed

Nashiro, Kaoru; Sakaki, Michiko; Huffman, Derek; Mather, Mara

2013-03-01

The main purpose of the study was to examine whether emotion impairs associative memory for previously seen items in older adults, as previously observed in younger adults. Thirty-two younger adults and 32 older adults participated. The experiment consisted of 2 parts. In Part 1, participants learned picture-object associations for negative and neutral pictures. In Part 2, they learned picture-location associations for negative and neutral pictures; half of these pictures were seen in Part 1 whereas the other half were new. The dependent measure was how many locations of negative versus neutral items in the new versus old categories participants remembered in Part 2. Both groups had more difficulty learning the locations of old negative pictures than of new negative pictures. However, this pattern was not observed for neutral items. Despite the fact that older adults showed overall decline in associative memory, the impairing effect of emotion on updating associative memory was similar between younger and older adults.
Statistical Indexes for Monitoring Item Behavior under Computer Adaptive Testing Environment.

ERIC Educational Resources Information Center

Zhu, Renbang; Yu, Feng; Liu, Su

A computerized adaptive test (CAT) administration usually requires a large supply of items with accurately estimated psychometric properties, such as item response theory (IRT) parameter estimates, to ensure the precision of examinee ability estimation. However, an estimated IRT model of a given item in any given pool does not always correctly…
PARADISE 24: A Measure to Assess the Impact of Brain Disorders on People’s Lives

PubMed Central

Cieza, Alarcos; Sabariego, Carla; Anczewska, Marta; Ballert, Carolina; Bickenbach, Jerome; Cabello, Maria; Giovannetti, Ambra; Kaskela, Teemu; Mellor, Blanca; Pitkänen, Tuuli; Quintas, Rui; Raggi, Alberto; Świtaj, Piotr; Chatterji, Somnath

2015-01-01

Objective: To construct a metric of the impact of brain disorders on people’s lives, based on the psychosocial difficulties (PSDs) that are experienced in common across brain disorders. Study Design: Psychometric study using data from a cross-sectional study with a convenience sample of 722 persons with 9 different brain disorders interviewed in four European countries: Italy, Poland, Spain and Finland. Questions addressing 64 PSDs were first reduced based on statistical considerations, patient’s perspective and clinical expertise. Rasch analyses for polytomous data were also applied. Setting: In- and outpatient settings. Results: A valid and reliable metric with 24 items was created. The infit of all questions ranged between 0.7 and 1.3. There were no disordered thresholds. The targeting between item thresholds and persons’ abilities was good and the person-separation index was 0.92. Persons’ abilities were linearly transformed into a more intuitive scale ranging from zero (no PSDs) to 100 (extreme PSDs). Conclusion: The metric, called PARADISE 24, is based on the hypothesis of horizontal epidemiology, which affirms that people with brain disorders commonly experience PSDs. This metric is a useful tool to carry out cardinal comparisons over time of the magnitude of the psychosocial impact of brain disorders and between persons and groups in clinical practice and research. PMID:26147343
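A linear transformation of Rasch person estimates onto a 0-100 scale, as described in this abstract, can be sketched as a min-max rescaling. The endpoints below are hypothetical; the abstract does not state which logit values were mapped to 0 and 100.

```python
def rescale_logit(theta, theta_min, theta_max):
    """Map a logit estimate linearly so theta_min -> 0 and theta_max -> 100."""
    return 100.0 * (theta - theta_min) / (theta_max - theta_min)

# Hypothetical logit range of -4 to +4 for illustration only.
for theta in (-4.0, 0.0, 4.0):
    print(theta, rescale_logit(theta, -4.0, 4.0))
```

Because the mapping is linear, differences between persons keep their relative size, which is what allows the rescaled scores to be compared over time and between groups.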
Psychometric properties of the parent's perception uncertainty in illness scale, Spanish version.

PubMed

Suarez-Acuña, C E; Carvajal-Carrascal, G; Serrano-Gómez, M E

2018-03-27

To analyze the psychometric properties of the Parents' Perception of Uncertainty in Illness Scale, parents/children, adapted to Spanish. A descriptive methodological study involving the translation into Spanish of the Parents' Perception of Uncertainty in Illness Scale, parents/children, and analysis of its face validity, content validity, construct validity and internal consistency. The original version of the scale in English was translated into Spanish, and approved by its author. Face validation identified six items with comprehension difficulty, which were reviewed and adapted while keeping the structure of the scale. The global content validity index with expert appraisal was 0.94. In the exploratory analysis of factors, 3 dimensions were identified: ambiguity and lack of information, unpredictability, and lack of clarity, with a KMO=0.846, which accumulated 91.5% of the explained variance. The internal consistency of the scale yielded a Cronbach alpha of 0.86, demonstrating a good level of correlation between items. The Spanish version of "Parent's Perception of Uncertainty in Illness Scale" is a valid and reliable tool that can be used to determine the level of uncertainty of parents facing the illness of their children. Copyright © 2018 Sociedad Española de Enfermería Intensiva y Unidades Coronarias (SEEIUC). Published by Elsevier España, S.L.U. All rights reserved.
Modeling Polymorphemic Word Recognition: Exploring Differences among Children with Early-Emerging and Late-Emerging Word Reading Difficulty

ERIC Educational Resources Information Center

Kearns, Devin M.; Steacy, Laura M.; Compton, Donald L.; Gilbert, Jennifer K.; Goodwin, Amanda P.; Cho, Eunsoo; Lindstrom, Esther R.; Collins, Alyson A.

2016-01-01

Comprehensive models of derived polymorphemic word recognition skill in developing readers, with an emphasis on children with reading difficulty (RD), have not been developed. The purpose of the present study was to model individual differences in polymorphemic word recognition ability at the item level among 5th-grade children (N = 173)…
Examining the Structural Validity of the Strengths and Difficulties Questionnaire (SDQ) in a U.S. Sample of Custodial Grandmothers

ERIC Educational Resources Information Center

Palmieri, Patrick A.; Smith, Gregory C.

2007-01-01

The authors examined the structural validity of the parent informant version of the Strengths and Difficulties Questionnaire (SDQ) with a sample of 733 custodial grandparents. Three models of the SDQ's factor structure were evaluated with confirmatory factor analysis based on the item covariance matrix. Although indices of fit were good across all…
Understanding Test-Takers' Perceptions of Difficulty in EAP Vocabulary Tests: The Role of Experiential Factors

ERIC Educational Resources Information Center

Oruç Ertürk, Nesrin; Mumford, Simon E.

2017-01-01

This study, conducted by two researchers who were also multiple-choice question (MCQ) test item writers at a private English-medium university in an English as a foreign language (EFL) context, was designed to shed light on the factors that influence test-takers' perceptions of difficulty in English for academic purposes (EAP) vocabulary, with the…
An Evaluation of Different Statistical Targets for Assembling Parallel Forms in Item Response Theory

PubMed Central

Ali, Usama S.; van Rijn, Peter W.

2015-01-01

Assembly of parallel forms is an important step in the test development process. Therefore, choosing a suitable theoretical framework to generate well-defined test specifications is critical. The performance of different statistical targets of test specifications using the test characteristic curve (TCC) and the test information function (TIF) was investigated. Test length, the number of test forms, and content specifications are considered as well. The TCC target results in forms that are parallel in difficulty, but not necessarily in terms of precision. Vice versa, test forms created using a TIF target are parallel in terms of precision, but not necessarily in terms of difficulty. As sometimes the focus is either on TIF or TCC, differences in either difficulty or precision can arise. Differences in difficulty can be mitigated by equating, but differences in precision cannot. In a series of simulations using a real item bank, the two-parameter logistic model, and mixed integer linear programming for automated test assembly, these differences were found to be quite substantial. When both TIF and TCC are combined into one target and their relative importance is adjusted, these differences can be made to disappear.
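The distinction between the two targets can be illustrated with the two-parameter logistic (2PL) model. The following is a minimal sketch, assuming the plain logistic form without the 1.7 scaling constant; the function names and the two toy forms are hypothetical and unrelated to the item bank used in the study.

```python
import math

def p_2pl(theta, a, b):
    """2PL probability of a correct response (a: discrimination, b: difficulty)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def tcc(theta, items):
    """Test characteristic curve: expected number-correct score at theta."""
    return sum(p_2pl(theta, a, b) for a, b in items)

def tif(theta, items):
    """Test information function: sum of a^2 * P * (1 - P) over items."""
    total = 0.0
    for a, b in items:
        p = p_2pl(theta, a, b)
        total += a * a * p * (1.0 - p)
    return total

# Hypothetical forms: same difficulties, different discriminations.
form_low_a = [(1.0, 0.0), (1.0, 0.0)]
form_high_a = [(2.0, 0.0), (2.0, 0.0)]

print(tcc(0.0, form_low_a), tcc(0.0, form_high_a))  # equal expected scores
print(tif(0.0, form_low_a), tif(0.0, form_high_a))  # unequal precision
```

At theta = 0 the two toy forms have the same expected score but a fourfold difference in information, which mirrors the abstract's point that matching on the TCC alone does not guarantee matched precision.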
Perceived difficulty in the theory of planned behaviour: perceived behavioural control or affective attitude?

PubMed

Kraft, Pål; Rise, Jostein; Sutton, Stephen; Røysamb, Espen

2005-09-01

A study was conducted to explore (a) the dimensional structure of perceived behavioural control (PBC), (b) the conceptual basis of perceived difficulty items, and (c) how PBC components and instrumental and affective attitudes, respectively, relate to intention and behaviour. The material stemmed from a two-wave study of Norwegian graduate students (N = 227 for the prediction of intention and N = 110 for the prediction of behaviour). Data were analysed using confirmatory factor analysis (CFA) and multiple regression by the application of structural equation modelling (SEM). CFA suggested that PBC could be conceived of as consisting of three separate but interrelated factors (perceived control, perceived confidence and perceived difficulty), or as two separate but interrelated factors representing self-efficacy (measured by perceived difficulty and perceived confidence or by just perceived confidence) and perceived control. However, the perceived difficulty items also overlapped substantially with affective attitude. Perceived confidence was a strong predictor of exercise intention but not of recycling intention. Perceived control, however, was a strong predictor of recycling intention but not exercise intention. Affective attitudes but not instrumental attitudes were identified as substantial predictors of intentions. The findings suggest that at least under some circumstances it may be inadequate to measure PBC by means of perceived difficulty. One possible consequence may be that the role of PBC as a predictor of intention is somewhat overestimated, whereas the role of (affective) attitude may be similarly underestimated.
The effects of individual factors and school environment on mental health and prejudiced attitudes among Norwegian adolescents.

PubMed

Andersson, Helle Wessel; Bjørngaard, Johan Håkon; Kaspersen, Silje Lill; Wang, Catharina E A; Skre, Ingunn; Dahl, Thomas

2010-05-01

The aim was to examine the prevalence of mental health difficulties and prejudices toward mental illness among adolescents, and to analyze possible school and school class effects on these issues. The sample comprised 4,046 pupils (16-19 years) in 257 school classes from 45 Norwegian upper secondary schools. The estimated response rate among the pupils was about 96%. Self-reported mental health difficulties were measured with a four-item scale that covered emotional and behavioral difficulties. Prejudiced attitudes toward mental illness were assessed using a nine-item scale. Multilevel regression analysis was used to estimate the contribution of factors at the individual level, and at the school and class levels. Most of the variance in self-reported mental health difficulties and prejudices was accounted for by individual level factors (92-94%). However, there were statistically significant school and class level effects (P < 0.01), confounded by socioeconomic factors. Mental health difficulties were commonly reported, more often by females than males (P < 0.01). Difficulties with emotions and attention were the two main problem areas, with definite to severe difficulties being reported by 19 and 21% of the females, and by 9 and 16% of the males, respectively. Prejudices were reported more often by males than females (P < 0.01). Both self-reported mental health difficulties and prejudiced attitudes were related to educational program, living situation, and parental education (P < 0.01). The relatively high prevalences of mental health difficulties and prejudiced attitudes toward mental illness among adolescents indicate a need for effective mental health intervention programs. Targeted intervention strategies should be considered when there is evidence of a high number of risk factors in schools and school classes. Furthermore, the gender differences found in self-reported mental health difficulties and prejudices suggest a need for gender-differentiated programs.
An Efficient Index Dissemination in Unstructured Peer-to-Peer Networks

NASA Astrophysics Data System (ADS)

Takahashi, Yusuke; Izumi, Taisuke; Kakugawa, Hirotsugu; Masuzawa, Toshimitsu

Using Bloom filters is one of the most popular and efficient lookup methods in P2P networks. A Bloom filter is a representation of data item indices, which achieves a small memory requirement by allowing one-sided errors (false positives). In the lookup scheme based on the Bloom filter, each peer disseminates in advance a Bloom filter representing the indices of the data items it owns. Using the information of disseminated Bloom filters as a clue, each query can find a short path to its destination. In this paper, we propose an efficient extension of the Bloom filter, called a Deterministic Decay Bloom Filter (DDBF), and an index dissemination method based on it. While index dissemination based on a standard Bloom filter suffers performance degradation by containing information of too many data items when its dissemination radius is large, the DDBF can circumvent such degradation by limiting information according to the distance between the filter holder and the item holders, i.e., a DDBF contains less information for faraway items and more information for nearby items. Interestingly, the construction of DDBFs requires no extra cost above that of standard filters. We also show by simulation that our method can achieve better lookup performance than existing ones.
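The one-sided error property of a standard Bloom filter can be sketched as follows. This illustrates only the ordinary filter, not the DDBF proposed in the paper, whose construction the abstract does not describe; the class name, the parameters, and the use of SHA-256 to derive bit positions are assumptions made for the example.

```python
import hashlib

class BloomFilter:
    """Standard Bloom filter: no false negatives, possible false positives."""

    def __init__(self, n_bits=1024, n_hashes=3):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray(n_bits)  # one byte per bit, for clarity

    def _positions(self, key):
        # Derive n_hashes bit positions by salting the key with an index.
        for i in range(self.n_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[pos] for pos in self._positions(key))

# A peer summarises the items it owns and shares the filter with neighbours.
peer_index = BloomFilter()
for item in ("song.mp3", "paper.pdf"):
    peer_index.add(item)
print(peer_index.might_contain("song.mp3"))
```

A query for an item that was added always gets a positive answer, while a positive answer for some other item may be a false positive; the more items a filter summarises, the more bits are set and the more often this happens, which is the degradation the abstract attributes to large dissemination radii.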
Establishing a Measurement Tool for a Nursing Work Environment in Taiwan.

PubMed

Lin, Li-Chiu; Lee, Huan-Fang; Yen, Miaofen

2017-02-01

The nursing work environment is a critical global health care problem. Many health care providers are concerned about the associations between the nursing work environment and the outcomes of organizations, nurses, and patients. Nursing work environment instruments have been assessed in the West but have not been considered in Asia. However, different cultures will affect the factorial structure of the tool. Using a stratified nationwide random sample, we created a measurement tool for the nursing work environment in Taiwan. The Nursing Work Environment Index-Revised Scale and the Essentials of Magnetism scale were used to examine the factorial structure. Item analysis, exploratory factor analysis, and confirmatory factor analysis were used to examine the hypothesis model and generate a new factorial structure. The Taiwan Nursing Work Environment Index (TNWEI) was established to evaluate the nursing work environment in Taiwan. The four factors were labeled "Organizational Support" (7 items), "Nurse Staffing and Resources" (4 items), "Nurse-Physician Collaboration" (4 items), and "Support for Continuing Education" (4 items). The 19 items explained 58.5% of the variance. Confirmatory factor analysis showed a good fit to the model (χ²/df = 5.99; p < .05; goodness of fit index [GFI] = .90; RMSEA = .07). The TNWEI provides a comprehensive and efficient method for measuring the nurses' work environment in Taiwan.
Developing and validating a chemical bonding instrument for Korean high school students

NASA Astrophysics Data System (ADS)

Jang, Nak Han

The major purpose of this study was to develop a reliable and valid instrument for investigating Korean high school students' understanding of concepts regarding chemical bonding. The Chemical Bonding Diagnostic Test (CBDT) was developed following the procedure used in previous relevant research (Treagust, 1985; Peterson, 1986; Tan, 1994). The final instrument consisted of 15 two-tier items. The reliability coefficient (Cronbach alpha) for the whole test was 0.74. Also, the range of values for the discrimination index was from 0.38 to 0.90 and the overall average difficulty index was 0.38. The test was administered to 716 science-declared students in Korean high schools. Thirty-seven common misconceptions about chemical bonding were identified through analysis of the items from the CBDT. The grade 11 students had slightly more misconceptions than the grade 12 students for ionic bonding, covalent bonding, and hydrogen bonding while the grade 12 students had more misconceptions about octet rule and hydrogen bonding than the grade 11 students. The ANCOVA showed no significant difference in grades, or between grade levels and gender, on the mean score of the CBDT. However, there was a significant difference in gender and a significant interaction between grade levels and chemistry preference. In conclusion, Korean high school students had the most common misconception about the electron configuration on ionic bonding and the water density on hydrogen bonding. Korean students' understanding of chemical bonding was dependent on the interaction between grade levels and chemistry preference. Consequently, grade 12 chemistry-preferred students had the highest mean scores among the student groups considered in this study.
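The difficulty and discrimination indices reported in this abstract are classical item statistics. The following is a minimal sketch of one common way to compute them, assuming dichotomously scored items and the conventional upper and lower 27% groups; the abstract does not say which variant the study used, and the function name and toy data are hypothetical.

```python
def item_indices(responses, item, group_fraction=0.27):
    """Return (difficulty index, discrimination index) for one item.

    responses: list of examinees, each a list of 0/1 item scores.
    Difficulty index = proportion of all examinees answering correctly.
    Discrimination index = proportion correct in the top-scoring group
    minus proportion correct in the bottom-scoring group.
    """
    n = len(responses)
    difficulty = sum(r[item] for r in responses) / n

    ranked = sorted(responses, key=sum, reverse=True)  # highest totals first
    g = max(1, round(n * group_fraction))              # size of each group
    upper = sum(r[item] for r in ranked[:g]) / g
    lower = sum(r[item] for r in ranked[-g:]) / g
    return difficulty, upper - lower

# Hypothetical toy data: 4 examinees x 2 items.
toy = [[1, 1], [1, 1], [1, 0], [0, 0]]
print(item_indices(toy, 0))
print(item_indices(toy, 1))
```

Under this definition a lower difficulty index means a harder item, so an average of 0.38 would correspond to 38% of responses being correct, and a discrimination index near the top of the reported 0.38 to 0.90 range means high scorers answer the item correctly far more often than low scorers.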
Development of the IBD Disk: A Visual Self-administered Tool for Assessing Disability in Inflammatory Bowel Diseases.

PubMed

Ghosh, Subrata; Louis, Edouard; Beaugerie, Laurent; Bossuyt, Peter; Bouguen, Guillaume; Bourreille, Arnaud; Ferrante, Marc; Franchimont, Denis; Frost, Karen; Hebuterne, Xavier; Marshall, John K; OʼShea, Ciara; Rosenfeld, Greg; Williams, Chadwick; Peyrin-Biroulet, Laurent

2017-03-01

The inflammatory bowel disease (IBD) Disability Index is a validated tool that evaluates functional status; however, it is used mainly in the clinical trial setting. We describe the use of an iterative Delphi consensus process to develop the IBD Disk, a shortened, self-administered adaptation of the validated IBD Disability Index, to give immediate visual representation of patient-reported IBD-related disability. In the preparatory phase, the IBD CONNECT group (30 health care professionals) ranked IBD Disability Index items in the perceived order of importance. The Steering Committee then selected 10 items from the IBD Disability Index to take forward for inclusion in the IBD Disk. In the consensus phase, the items were refined and agreed by the IBD Disk Working Group (14 gastroenterologists) using an online iterative Delphi consensus process. Members could also suggest new element(s) or recommend changes to included elements. The final items for the IBD Disk were agreed in February 2016. After 4 rounds of voting, the following 10 items were agreed for inclusion in the IBD Disk: abdominal pain, body image, education and work, emotions, energy, interpersonal interactions, joint pain, regulating defecation, sexual functions, and sleep. All elements, except sexual functions, were included in the validated IBD Disability Index. The IBD Disk has the potential to be a valuable tool for use at a clinical visit. It can facilitate assessment of inflammatory bowel disease-related disability relevant to both patients and physicians, discussion on specific disability-related issues, and tracking changes in disease burden over time.
The promise and challenge of including multimedia items in medical licensure examinations: some insights from an empirical trial.

PubMed

Shen, Linjun; Li, Feiming; Wattleworth, Roberta; Filipetto, Frank

2010-10-01

The Comprehensive Osteopathic Medical Licensing Examination conducted a trial of multimedia items in the 2008-2009 Level 3 testing cycle to determine (1) if multimedia items were able to test additional elements of medical knowledge and skills and (2) how to develop effective multimedia items. Forty-four content-matched multimedia and text multiple-choice items were randomly delivered to Level 3 candidates. Logistic regression and paired-samples t tests were used for pairwise and group-level comparisons, respectively. Nine pairs showed significant differences in difficulty, discrimination, or both. Content analysis found that, if text narrations were less direct, multimedia materials could make items easier. When textbook terminologies were replaced by multimedia presentations, multimedia items could become more difficult. Moreover, a multimedia item was found not to be uniformly difficult for candidates at different ability levels, possibly because multimedia and text items tested different elements of the same concept. Multimedia items may be capable of measuring some constructs different from what text items can measure. Effective multimedia items with reasonable psychometric properties can be intentionally developed.

Instructional Sensitivity Statistics Appropriate for Objectives-Based Test Items. CSE Report No. 91.

ERIC Educational Resources Information Center

Kosecoff, Jacqueline B.; Klein, Stephen P.

Two types of sensitivity indices were developed in this paper, one internal to the total test and the second external. To evaluate the success of these statistics the three criteria suggested for a satisfactory index of item quality were considered. The Internal Sensitivity Index appears to meet these demands. Certainly it is easily computed. In…
Sixteen-Item Anxiety Sensitivity Index: Confirmatory Factor Analytic Evidence, Internal Consistency, and Construct Validity in a Young Adult Sample from the Netherlands

ERIC Educational Resources Information Center

Vujanovic, Anka A.; Arrindell, Willem A.; Bernstein, Amit; Norton, Peter J.; Zvolensky, Michael J.

2007-01-01

The present investigation examined the factor structure, internal consistency, and construct validity of the 16-item Anxiety Sensitivity Index (ASI; Reiss, Peterson, Gursky, & McNally, 1986) in a young adult sample (n = 420) from the Netherlands. Confirmatory factor analysis was used to comparatively evaluate two-factor, three-factor, and…
Analysis of Item Response Patterns: Consistency Indices and Their Application to Criterion-Referenced Tests.

ERIC Educational Resources Information Center

Harnisch, Delwyn L.

The major emphasis of this paper is in the examination of test item response patterns. Tatsuoka and Tatsuoka (1980) have developed two indices of response consistency: the norm-conformity index (NCI) and the individual consistency index (ICI). The NCI provides a measure of the degree of consistency between the response pattern of an individual and…
The Chiropractic Care of Infants with Breastfeeding Difficulties.

PubMed

Alcantara, Joel; Alcantara, Joey D; Alcantara, Junjoe

2015-01-01

Chiropractors have long advocated for the benefits of breastfeeding. Given the realized and potential role of chiropractors in the care of infants with breastfeeding difficulties, we performed this review of the literature on the subject to inform clinical practice. For this article, we searched PubMed [1966-2013], Manual, Alternative and Natural Therapy Index System (MANTIS) [1964-2013] and Index to Chiropractic Literature [1984-2013] for the relevant literature. The search terms used were "breastfeeding", "breast feeding", "breastfeeding difficulties", "breastfeeding difficulty", "TMJ dysfunction", "temporomandibular joint", "birth trauma" and "infants", in the appropriate Boolean combinations. We also examined non-peer-reviewed articles as revealed by Index to Chiropractic Literature and secondary analysis of references. The inclusion criteria were articles addressing breastfeeding difficulties, regardless of peer-review status, written in the English language. A total of 24 articles met our inclusion criteria. These consisted of 8 case reports, 2 case series, and 3 cohort studies. We were also able to identify 6 manuscripts (5 case reports and a case series) that involved breastfeeding difficulties as a secondary complaint. Our findings reveal a theoretical and clinical framework based on the detection of spinal and extraspinal subluxations involving the cervico-cranio-mandibular complex and assessment of the infant while breastfeeding. Chiropractors care for infants with breastfeeding difficulties by addressing spinal and extraspinal subluxations involving the cervico-cranio-mandibular complex. Copyright © 2015 Elsevier Inc. All rights reserved.
Psychometric properties of the Chinese version of the Menopause-Specific Quality-of-Life questionnaire

PubMed Central

Nie, Guangning; Yang, Hongyan; Liu, Jian; Zhao, ChunMei; Wang, Xiaoyun

2017-01-01

Objective: The Menopause-Specific Quality-of-Life (MENQOL) questionnaire was developed as a specific tool to measure the health-related quality-of-life of postmenopausal women. Thus far, the Chinese version of the questionnaire has not been subjected to psychometric assessment with a large sample. This study aims to evaluate the validity and reliability of the Chinese version of the MENQOL specific to postmenopausal women in China. Methods: A total of 1,137 menopausal symptomatic and 491 menopausal asymptomatic women from eight cities in China were recruited using a convenience sampling method. Psychometric properties were evaluated by descriptive statistics, validity, and reliability. Reliability was assessed for each subscale of the MENQOL through internal consistency reliability with Cronbach's α and intersubscale correlations. Item-domain correlations, principal components analysis (PCA), and confirmatory factor analysis were performed to determine construct validity. t tests were used to compare the differences between the menopausal symptomatic and asymptomatic women and to evaluate the discriminant validity. Pearson correlation coefficients were calculated between MENQOL scores and the Kupperman index to assess criterion-related validity. Results: The most common symptoms in Chinese menopausal symptomatic women were “experiencing poor memory” (94.4%), “feeling tired or worn out” (93.8%), “aching in muscle and joints” (89.4%), “low backache” (86.9%), “decrease in physical strength” (86.6%), “aches in back of neck or head” (86.2%), “difficulty sleeping” (83.6%), “accomplishing less than I used to” (83.4%), “feeling a lack of energy” (83.3%), “change in your sexual desire” (81%), and “hot flash” (80.7%) among others. The symptoms of “increased facial hair” were rarely seen (9.9%). The vasomotor domain, as well as psychosocial, physical, and sexual domains showed high reliability (Cronbach's α 0.84, 0.87, 0.89, and 0.86, respectively).
Item-domain correlation analysis showed that all items correlated more strongly with their own domains than with other domains. In the PCA, after deleting the “increased facial hair” item, items in the vasomotor, sexual, and psychosocial subscales loaded on their respective domains by and large, and items in the physical subscale divided into two factors. The PCA revealed a latent structure of the Chinese version of MENQOL nearly identical to the original MENQOL domains. The confirmatory factor analysis demonstrated that the questionnaire fits well with a four-domain model. The MENQOL can discriminate between menopausal symptomatic and asymptomatic women, as it showed good discriminant validity. Criterion-related validity was confirmed by a significant correlation between MENQOL scores and the Kupperman index. Conclusions: This study showed that the Chinese version of the MENQOL has good psychometric properties and would be suitable to measure the health-related quality-of-life of Chinese menopausal women except for item 21 (increased facial hair). PMID:27922934
Development of a brief parent-report risk index for children following parental divorce.

PubMed

Tein, Jenn-Yun; Sandler, Irwin N; Braver, Sanford L; Wolchik, Sharlene A

2013-12-01

This article reports on the development of a brief 15-item parent-report risk index (Child Risk Index for Divorced or Separated Families; CRI-DS) to predict problem outcomes of children who have experienced parental divorce. A series of analyses using 3 data sets were conducted that identified and cross-validated a parsimonious set of items representing parent report of child behavior problems and family level risk and protective factors, each of which contributed to the predictive accuracy of the index. The index predicted child behavior outcomes and substance abuse problems up to 6 years later. The index has acceptable levels of sensitivity and specificity as a screening measure to predict problem outcomes up to 1 year later. The use of the index to identify the need for preventive services is discussed, along with limitations of the study.
Dual processing theory and experts' reasoning: exploring thinking on national multiple-choice questions.

PubMed

Durning, Steven J; Dong, Ting; Artino, Anthony R; van der Vleuten, Cees; Holmboe, Eric; Schuwirth, Lambert

2015-08-01

An ongoing debate exists in the medical education literature regarding the potential benefits of pattern recognition (non-analytic reasoning), actively comparing and contrasting diagnostic options (analytic reasoning) or using a combination approach. Studies have not, however, explicitly explored faculty's thought processes while tackling clinical problems through the lens of dual process theory to inform this debate. Further, these thought processes have not been studied in relation to the difficulty of the task or other potential mediating influences such as personal factors and fatigue, which could also be influenced by personal factors such as sleep deprivation. We therefore sought to determine which reasoning process(es) were used with answering clinically oriented multiple-choice questions (MCQs) and if these processes differed based on the dual process theory characteristics: accuracy, reading time and answering time as well as psychometrically determined item difficulty and sleep deprivation. We performed a think-aloud procedure to explore faculty's thought processes while taking these MCQs, coding think-aloud data based on reasoning process (analytic, nonanalytic, guessing or combination of processes) as well as word count, number of stated concepts, reading time, answering time, and accuracy. We also included questions regarding amount of work in the recent past. We then conducted statistical analyses to examine the associations between these measures such as correlations between frequencies of reasoning processes and item accuracy and difficulty. We also observed the total frequencies of different reasoning processes in the situations of getting answers correctly and incorrectly. Regardless of whether the questions were classified as 'hard' or 'easy', non-analytical reasoning led to the correct answer more often than to an incorrect answer. 
Significant correlations were found between self-reported recent number of hours worked with think-aloud word count and number of concepts used in the reasoning but not item accuracy. When all MCQs were included, 19% of the variance of correctness could be explained by the frequency of expression of these three think-aloud processes (analytic, nonanalytic, or combined). We found evidence to support the notion that the difficulty of an item in a test is not a systematic feature of the item itself but is always a result of the interaction between the item and the candidate. Use of analytic reasoning did not appear to improve accuracy. Our data suggest that individuals do not apply either System 1 or System 2 but instead fall along a continuum with some individuals falling at one end of the spectrum.
Vulnerability Risk Index Profile for Elder Abuse in Community-Dwelling Population

PubMed Central

Dong, XinQi; Simon, Melissa A.

2013-01-01

Objectives: Elder abuse is associated with increased morbidity and mortality. This study aims to develop a vulnerability index for elder abuse in a community-dwelling population. Design: Population-based study. Setting: Geographically defined community in Chicago. Participants: A population-based study was conducted in Chicago of community-dwelling older adults who participated in the Chicago Health and Aging Project (CHAP). Of the 8,157 participants in the CHAP study, 213 participants were reported to a social services agency for suspected elder abuse. Measurements: A vulnerability index for elder abuse was constructed from sociodemographic, health-related, and psychosocial factors. The outcomes of interest were reported and confirmed elder abuse. Logistic regression models were used to determine the accuracy of the index with respect to elder abuse outcomes. Results: For every one-point increase in the 9-item vulnerability index, there was a twofold increase in the risk for reported elder abuse (OR, 2.19 (2.00–2.40)) and confirmed elder abuse (OR, 2.19 (1.94–2.47)). Compared to the reference group, older adults with 3–4 vulnerability index items had increased risk for reported elder abuse (OR, 2.98 (1.98–4.49)) and confirmed elder abuse (OR, 3.90 (2.07–7.36)); for older adults with 5 or more risk index items, there was an 18-fold increase in risk for reported elder abuse (OR, 18.46 (12.15–28.04)) and an increased risk for confirmed elder abuse (OR, 26.79 (14.18–50.61)). Receiver Operating Characteristic (ROC) statistically derived curves for identifying reported elder abuse ranged between 0.77–0.84 and for predicting confirmed elder abuse ranged between 0.79–0.86. Conclusion: The vulnerability risk index demonstrates value for identifying individuals at risk for elder abuse. Additional studies are needed to validate this index in other community-dwelling populations. PMID:25180376
[Item function analysis on the Quality of Life-Alzheimer's Disease (QOL-AD) Chinese version, based on the Item Response Theory (IRT)].

PubMed

Wan, Li-ping; He, Run-lian; Ai, Yong-mei; Zhang, Hui-min; Xing, Min; Yang, Lin; Song, Yan-long; Yu, Hong-mei

2013-07-01

To introduce the Item Function Analysis (IFA) of the Quality of Life-Alzheimer's Disease (QOL-AD) Chinese version and to explore the feasibility of its application to Chinese patients with AD. Two hundred AD patients were interviewed and assessed with the QOL-AD, using a stratified cluster sampling method. Multilog 7.03 was used for the Item Function Analysis. The discrimination parameter (a), difficulty parameter (b), and Item Characteristic Curve (ICC) of each item of the QOL-AD were provided. The discrimination parameters of items 1 and 7 were below 0.6, while all the others were above 0.6. As for the ICCs, except for items 1 and 7, the first and last curves of each item were monotonic, while the two in between were inverted V-shaped with very steep slopes. Results from the IFA showed that the QOL-AD is applicable to Chinese patients with AD.
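The curve shapes described in this record (monotonic first and last curves, peaked curves in between) are what a graded response model produces for an item with four ordered categories. The abstract does not name the model fitted in Multilog, so the sketch below assumes Samejima's graded response model; the function name `grm_category_probs` and the parameter values are hypothetical illustrations, not estimates from the study.

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Category probabilities under a graded response model (assumed model).

    theta: latent trait level of the respondent.
    a: item discrimination parameter.
    thresholds: increasing difficulty parameters b_1 < ... < b_{m-1}
                for an item with m ordered categories.
    """
    # Cumulative probability of responding in category k or higher,
    # padded with 1.0 (lowest boundary) and 0.0 (beyond the highest category).
    cum = [1.0]
    cum += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cum += [0.0]
    # Probability of each category is the difference of adjacent cumulatives.
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]


# Hypothetical 4-category item: a = 1.5, thresholds at -1, 0, 1.
low = grm_category_probs(-3.0, 1.5, [-1.0, 0.0, 1.0])
mid = grm_category_probs(-0.5, 1.5, [-1.0, 0.0, 1.0])
high = grm_category_probs(3.0, 1.5, [-1.0, 0.0, 1.0])
```

In this sketch the lowest category is most likely at low theta, the highest category at high theta, and each middle category peaks between its two thresholds. A small discrimination value such as the sub-0.6 values reported for items 1 and 7 flattens all of these curves.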
HoNOSCA-D As a Measure of the Severity of Diagnosed Mental Disorders in Children and Adolescents—Psychometric Properties of the German Translation

PubMed Central

von Wyl, Agnes; Toggweiler, Stephan; Zollinger, Ruedi

2017-01-01

The Health of the Nation Outcome Scales for Children and Adolescents (HoNOSCA), in use worldwide, is a 13-item measure assessing the biopsychosocial severity of mental health problems in children and adolescents. This article introduces the authorized German-language version of HoNOSCA, the HoNOSCA-D, and examines and discusses its psychometric properties based on a clinical sample of 1,533 children and adolescents aged 4;0 to 17;11 years. For the HoNOSCA-D total score (severity of mental health problems), internal consistency (Cronbach’s alpha) was 0.63. The discriminative power of the items ranged from 0.07 to 0.44; the average interitem correlation was 0.11. Due to this stochastic independence, calculation of a total severity index is acceptable. Using factor analysis, the principal axis factoring and varimax rotation resulted in a four-factor structure, which with a Kaiser–Meyer–Olkin measure of sampling adequacy of 0.684 explained 30.62% of total variance. The convergent correlations with the German-language parent report version of the Strengths and Difficulties Questionnaire were as expected and showed a medium effect size. Gender and age differences in the HoNOSCA-D total score were small. Regarding the 13 items gender and age differences were negligible to medium. The highest severity was found for schizophrenia and psychotic disorders, followed by affective disorders and social behavior disorders. Overall, validity of HoNOSCA-D was clearly supported. PMID:29033858
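The reported alpha and mean interitem correlation in this record can be related through the standardized alpha formula, alpha = k * r / (1 + (k - 1) * r), where k is the number of items and r is the mean interitem correlation. This is only a rough consistency check: the abstract does not say whether the reported 0.63 is raw or standardized alpha, and the function name below is hypothetical.

```python
def standardized_alpha(n_items, mean_interitem_r):
    """Standardized Cronbach's alpha from the mean interitem correlation."""
    return n_items * mean_interitem_r / (1 + (n_items - 1) * mean_interitem_r)


# Values quoted in the abstract: 13 items, mean interitem correlation 0.11.
alpha_check = standardized_alpha(13, 0.11)
```

With these inputs the formula gives about 0.62, close to the reported 0.63, which illustrates how a 13-item scale can reach a moderate alpha even when its items are only weakly correlated.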
The dialysis orders objective structured clinical examination (OSCE): a formative assessment for nephrology fellows.

PubMed

Prince, Lisa K; Campbell, Ruth C; Gao, Sam W; Kendrick, Jessica; Lebrun, Christopher J; Little, Dustin J; Mahoney, David L; Maursetter, Laura A; Nee, Robert; Saddler, Mark; Watson, Maura A; Yuan, Christina M

2018-04-01

Few quantitative nephrology-specific simulations assess fellow competency. We describe the development and initial validation of a formative objective structured clinical examination (OSCE) assessing fellow competence in ordering acute dialysis. The three test scenarios were acute continuous renal replacement therapy, chronic dialysis initiation in moderate uremia and acute dialysis in end-stage renal disease-associated hyperkalemia. The test committee included five academic nephrologists and four clinically practicing nephrologists outside of academia. There were 49 test items (58 points). A passing score was 46/58 points. No item had median relevance less than 'important'. The content validity index was 0.91. Ninety-five percent of positive-point items were easy-medium difficulty. Preliminary validation was by 10 board-certified volunteers, not test committee members, a median of 3.5 years from graduation. The mean score was 49 [95% confidence interval (CI) 46-51], κ = 0.68 (95% CI 0.59-0.77), Cronbach's α = 0.84. We subsequently administered the test to 25 fellows. The mean score was 44 (95% CI 43-45); 36% passed the test. Fellows scored significantly less than validators (P < 0.001). Of evidence-based questions, 72% were answered correctly by validators and 54% by fellows (P = 0.018). Fellows and validators scored least well on the acute hyperkalemia question. In self-assessing proficiency, 71% of fellows surveyed agreed or strongly agreed that the OSCE was useful. The OSCE may be used to formatively assess fellow proficiency in three common areas of acute dialysis practice. Further validation studies are in progress.
A Classical Test Theory Analysis of the Light and Spectroscopy Concept Inventory National Study Data Set

ERIC Educational Resources Information Center

Schlingman, Wayne M.; Prather, Edward E.; Wallace, Colin S.; Brissenden, Gina; Rudolph, Alexander L.

2012-01-01

This paper is the first in a series of investigations into the data from the recent national study using the Light and Spectroscopy Concept Inventory (LSCI). In this paper, we use classical test theory to form a framework of results that will be used to evaluate individual item difficulties, item discriminations, and the overall reliability of the…
Brain waves-based index for workload estimation and mental effort engagement recognition

NASA Astrophysics Data System (ADS)

Zammouri, A.; Chraa-Mesbahi, S.; Ait Moussa, A.; Zerouali, S.; Sahnoun, M.; Tairi, H.; Mahraz, A. M.

2017-10-01

With the advent of communication systems, and considering the complexity that some impose in their use, it is necessary to equip these systems with a certain intelligence that takes into account the cognitive and mental capacities of the human operator. In this work, we address the issue of estimating the mental effort of an operator according to the difficulty level of cognitive tasks. Based on electroencephalogram (EEG) measurements, the proposed approach analyzes the user’s brain activity from different brain regions while performing cognitive tasks with several levels of difficulty. First, we propose a variances comparison-based classifier (VCC) that makes use of the Power Spectral Density (PSD) of the EEG signal. The aim of using such a classifier is to highlight the brain regions that interact according to the cognitive task difficulty. Second, we present and describe a new EEG-based index for the estimation of mental effort. The designed index is based on information recorded from two EEG channels. Results from the VCC demonstrate that the powers of the Theta [4-7 Hz] (θ) and Alpha [8-12 Hz] (α) oscillations decrease as the cognitive task difficulty increases. These decreases are mainly located in parietal and temporal brain regions. Using Kappa coefficients, decisions of the introduced index are compared to those obtained from an existing index. This performance assessment revealed strong agreement, supporting the efficiency of the introduced index.
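The Kappa-based comparison mentioned in this record can be sketched with Cohen's kappa, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement between two sets of decisions and p_e is the agreement expected by chance. The abstract does not specify which kappa variant was used, so this is an assumption; the function name and the example labels are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long sequences of categorical decisions."""
    n = len(labels_a)
    # Observed agreement: share of cases where both indices give the same label.
    observed = sum(x == y for x, y in zip(labels_a, labels_b)) / n
    # Chance agreement from the marginal label frequencies of each index.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    categories = count_a.keys() | count_b.keys()
    expected = sum(count_a[c] * count_b[c] for c in categories) / (n * n)
    # Note: undefined (division by zero) when expected agreement equals 1.
    return (observed - expected) / (1 - expected)


# Hypothetical workload decisions from a new index and an existing index.
new_index = ["low", "low", "high", "high"]
old_index = ["low", "low", "high", "low"]
kappa = cohen_kappa(new_index, old_index)
```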
Validation of the content of the prevention protocol for early sepsis caused by Streptococcus agalactiae in newborns

PubMed Central

da Silva, Fabiana Alves; Vidal, Cláudia Fernanda de Lacerda; de Araújo, Ednaldo Cavalcante

2015-01-01

Objective: to validate the content of the prevention protocol for early sepsis caused by Streptococcus agalactiae in newborns. Method: a transversal, descriptive and methodological study, with a quantitative approach. The sample was composed of 15 judges, 8 obstetricians and 7 pediatricians. The validation occurred through the assessment of the content of the protocol by the judges that received the instrument for data collection - checklist - which contained 7 items that represent the requisites to be met by the protocol. The validation of the content was achieved by applying the Content Validity Index. Result: in the judging process, all the items that represented requirements considered by the protocol obtained concordance within the established level (Content Validity Index > 0.75). Of 7 items, 6 have obtained full concordance (Content Validity Index 1.0) and the feasibility item obtained a Content Validity Index of 0.93. The global assessment of the instruments obtained a Content Validity Index of 0.99. Conclusion: the validation of content that was done was an efficient tool for the adjustment of the protocol, according to the judgment of experienced professionals, which demonstrates the importance of conducting a previous validation of the instruments. It is expected that this study will serve as an incentive for the adoption of universal tracking by other institutions through validated protocols. PMID:26444165
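The Content Validity Index figures in this record can be reproduced with a short sketch. It assumes, as is common but not stated in the abstract, that an item's index is the share of judges rating the item as adequate and that the global index is the mean of the item indices. Under those assumptions, 0.93 corresponds to 14 of the 15 judges agreeing on the feasibility item. The function names are hypothetical.

```python
def item_cvi(n_agree, n_judges):
    """Item-level Content Validity Index: share of judges in agreement."""
    return n_agree / n_judges


def scale_cvi_average(item_cvis):
    """Global index as the mean of the item-level indices (assumed method)."""
    return sum(item_cvis) / len(item_cvis)


# 15 judges; 6 items with full agreement, 1 item with assumed 14/15 agreement.
feasibility = item_cvi(14, 15)
global_cvi = scale_cvi_average([1.0] * 6 + [feasibility])
```

Rounded to two decimals these give 0.93 and 0.99, matching the values quoted in the abstract.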
Disability Measurement for Korean Community-Dwelling Adults With Stroke: Item-Level Psychometric Analysis of the Korean Longitudinal Study of Ageing

PubMed Central

2018-01-01

Objective: To investigate the psychometric properties of the activities of daily living (ADL) instrument used in the analysis of Korean Longitudinal Study of Ageing (KLoSA) dataset. Methods: A retrospective study was carried out involving 2006 KLoSA records of community-dwelling adults diagnosed with stroke. The ADL instrument used for the analysis of KLoSA included 17 items, which were analyzed using Rasch modeling to develop a robust outcome measure. The unidimensionality of the ADL instrument was examined based on confirmatory factor analysis with a one-factor model. Item-level psychometric analysis of the ADL instrument included fit statistics, internal consistency, precision, and the item difficulty hierarchy. Results: The study sample included a total of 201 community-dwelling adults (1.5% of the Korean population with an age over 45 years; mean age=70.0 years, SD=9.7) having a history of stroke. The ADL instrument demonstrated unidimensional construct. Two misfit items, money management (mean square [MnSq]=1.56, standardized Z-statistics [ZSTD]=2.3) and phone use (MnSq=1.78, ZSTD=2.3) were removed from the analysis. The remaining 15 items demonstrated good item fit, high internal consistency (person reliability=0.91), and good precision (person strata=3.48). The instrument precisely estimated person measures within a wide range of theta (−4.75 logits < θ < 3.97 logits) and a reliability of 0.9, with a conceptual hierarchy of item difficulty. Conclusion: The findings indicate that the 15 ADL items met Rasch expectations of unidimensionality and demonstrated good psychometric properties. It is proposed that the validated ADL instrument can be used as a primary outcome measure for assessing longitudinal disability trajectories in the Korean adult population and can be employed for comparative analysis of international disability across national aging studies. PMID:29765888
Psychometrics of the self-report safe driving behavior measure for older adults.

PubMed

Classen, Sherrilene; Wen, Pey-Shan; Velozo, Craig A; Bédard, Michel; Winter, Sandra M; Brumback, Babette; Lanford, Desiree N

2012-01-01

We investigated the psychometric properties of the 68-item Safe Driving Behavior Measure (SDBM) with 80 older drivers, 80 caregivers, and 2 evaluators from two sites. Using Rasch analysis, we examined unidimensionality and local dependence; rating scale; item- and person-level psychometrics; and item hierarchy of older drivers, caregivers, and driving evaluators who had completed the SDBM. The evidence suggested the SDBM is unidimensional, but pairs of items showed local dependency. Across the three rater groups, the data showed good person (≥3.4) and item (≥3.6) separation as well as good person (≥.93) and item reliability (≥.92). Cronbach's α was ≥.96, and few items were misfitting. Some of the items did not follow the hypothesized order of item difficulty. The SDBM classified the older drivers into six ability levels, but to fully calibrate the instrument it must be refined in terms of its items (e.g., item exclusion) and then tested among participants of lesser ability. Copyright © 2012 by the American Occupational Therapy Association, Inc.
Cognitive testing of tobacco use items for administration to patients with cancer and cancer survivors in clinical research.

PubMed

Land, Stephanie R; Warren, Graham W; Crafts, Jennifer L; Hatsukami, Dorothy K; Ostroff, Jamie S; Willis, Gordon B; Chollette, Veronica Y; Mitchell, Sandra A; Folz, Jasmine N M; Gulley, James L; Szabo, Eva; Brandon, Thomas H; Duffy, Sonia A; Toll, Benjamin A

2016-06-01

To the authors' knowledge, there are currently no standardized measures of tobacco use and secondhand smoke exposure in patients diagnosed with cancer, and this gap hinders the conduct of studies examining the impact of tobacco on cancer treatment outcomes. The objective of the current study was to evaluate and refine questionnaire items proposed by an expert task force to assess tobacco use. Trained interviewers conducted cognitive testing with cancer patients aged ≥21 years with a history of tobacco use and a cancer diagnosis of any stage and organ site who were recruited at the National Institutes of Health Clinical Center in Bethesda, Maryland. Iterative rounds of testing and item modification were conducted to identify and resolve cognitive issues (comprehension, memory retrieval, decision/judgment, and response mapping) and instrument navigation issues until no items warranted further significant modification. Thirty participants (6 current cigarette smokers, 1 current cigar smoker, and 23 former cigarette smokers) were enrolled from September 2014 to February 2015. The majority of items functioned well. However, qualitative testing identified wording ambiguities related to cancer diagnosis and treatment trajectory, such as "treatment" and "surgery"; difficulties with lifetime recall; errors in estimating quantities; and difficulties with instrument navigation. Revisions to item wording, format, order, response options, and instructions resulted in a questionnaire that demonstrated navigational ease as well as good question comprehension and response accuracy. The Cancer Patient Tobacco Use Questionnaire (C-TUQ) can be used as a standardized item set to accelerate the investigation of tobacco use in the cancer setting. Cancer 2016;122:1728-34. © 2016 American Cancer Society.
[Simple and useful evaluation of motor difficulty in childhood (9-12 years old children ) by interview score on motor skills and soft neurological signs--aim for the diagnosis of developmental coordination disorder].

PubMed

Kashiwagi, Mitsuru; Suzuki, Shuhei

2009-09-01

Many children with developmental disorders are known to have motor impairment such as clumsiness and poor physical ability; however, the objective evaluation of such difficulties is not easy in routine clinical practice. In this study, we aimed to establish a simple method for evaluating motor difficulty of childhood. This method employs a scored interview and examination for detecting soft neurological signs (SNSs). After a preliminary survey with 22 normal children, we set the items and the cutoffs for the interview and SNSs. The interview consisted of questions pertaining to 12 items related to a child's motor skills in his/her past and current life, such as skipping, jumping a rope, ball sports, origami, and using chopsticks. The SNS evaluation included 5 tests, namely, standing on one leg with eyes closed, diadochokinesia, associated movements during diadochokinesia, finger opposition test, and laterally fixed gaze. We applied this method to 43 children, including 25 cases of developmental disorders. Children showing significantly high scores in both the interview and SNS were assigned to the "with motor difficulty" group, while those with low scores in both the tests were assigned to the "without motor difficulty" group. The remaining children were assigned to the "with suspicious motor difficulty" group. More than 90% of the children in the "with motor difficulty" group had high impairment scores in Movement Assessment Battery for Children (M-ABC), a standardized motor test, whereas 82% of the children in the "without motor difficulty" group revealed no motor impairment. Thus, we conclude that our simple method and criteria would be useful for the evaluation of motor difficulty of childhood. Further, we have discussed the diagnostic process for developmental coordination disorder using our evaluation method.
Performance of the Generalized S-X[squared] Item Fit Index for the Graded Response Model

ERIC Educational Resources Information Center

Kang, Taehoon; Chen, Troy T.

2011-01-01

The utility of Orlando and Thissen's ("2000", "2003") S-X[squared] fit index was extended to the model-fit analysis of the graded response model (GRM). The performance of a modified S-X[squared] in assessing item-fit of the GRM was investigated in light of empirical Type I error rates and power with a simulation study having…
77 FR 33013 - Self-Regulatory Organizations; NASDAQ OMX PHLX LLC; Notice of Filing and Immediate Effectiveness...

Federal Register 2010, 2011, 2012, 2013, 2014

2012-06-04

... Change Relating to the MSCI EAFE Index May 29, 2012. Pursuant to Section 19(b)(1) of the Securities... change the trading hours for options on the MSCI EAFE Index on the last trading day prior to expiration... (``Commission'') the proposed rule change as described in Items I and II below, which Items have been prepared...

Development and Factor Analysis of the Protective Factors Index: A Report Card Section Related to the Work of School Counselors

ERIC Educational Resources Information Center

Bass, Gwen; Lee, Ji Hee; Wells, Craig; Carey, John C.; Lee, Sangmin

2015-01-01

The scale development and exploratory and confirmatory factor analyses of the Protective Factor Index (PFI) are described. The PFI is a 13-item component of elementary students' report cards that replaces typical items associated with student behavior. The PFI is based on the Construct-Based Approach (CBA) to school counseling, which proposes that…
The Anxiety Sensitivity Index--Revised: Confirmatory Factor Analyses, Structural Invariance in Caucasian and African American Samples, and Score Reliability and Validity

ERIC Educational Resources Information Center

Arnau, Randolph C.; Broman-Fulks, Joshua J.; Green, Bradley A.; Berman, Mitchell E.

2009-01-01

The most commonly used measure of anxiety sensitivity is the 36-item Anxiety Sensitivity Index--Revised (ASI-R). Exploratory factor analyses have produced several different factor structures for the ASI-R, but an acceptable fit using confirmatory factor analytic approaches has only been found for a 21-item version of the instrument. We evaluated…
Monitoring task loading with multivariate EEG measures during complex forms of human-computer interaction

NASA Technical Reports Server (NTRS)

Smith, M. E.; Gevins, A.; Brown, H.; Karnik, A.; Du, R.

2001-01-01

Electroencephalographic (EEG) recordings were made while 16 participants performed versions of a personal-computer-based flight simulation task of low, moderate, or high difficulty. As task difficulty increased, frontal midline theta EEG activity increased and alpha band activity decreased. A participant-specific function that combined multiple EEG features to create a single load index was derived from a sample of each participant's data and then applied to new test data from that participant. Index values were computed for every 4 s of task data. Across participants, mean task load index values increased systematically with increasing task difficulty and differed significantly between the different task versions. Actual or potential applications of this research include the use of multivariate EEG-based methods to monitor task loading during naturalistic computer-based work.
32 CFR 701.39 - Vaughn index.

Code of Federal Regulations, 2010 CFR

2010-07-01

... 32 National Defense 5 2010-07-01 2010-07-01 false Vaughn index. 701.39 Section 701.39 National... DOCUMENTS AFFECTING THE PUBLIC FOIA Definitions and Terms § 701.39 Vaughn index. Itemized index, correlating... agency's nondisclosure justification. The index may contain such information as: date of document...
A New Clinical Pain Knowledge Test for Nurses: Development and Psychometric Evaluation.

PubMed

Bernhofer, Esther I; St Marie, Barbara; Bena, James F

2017-08-01

All nurses care for patients with pain, and pain management knowledge and attitude surveys for nurses have been around since 1987. However, no validated knowledge test exists to measure postlicensure clinicians' knowledge of the core competencies of pain management in current complex patient populations. To develop and test the psychometric properties of an instrument designed to measure pain management knowledge of postlicensure nurses. Psychometric instrument validation. Four large Midwestern U.S. hospitals. Registered nurses employed full time and part time August 2015 to April 2016, aged M = 43.25 years; time as RN, M = 16.13 years. Prospective survey design using e-mail to invite nurses to take an electronic multiple choice pain knowledge test. Content validity of initial 36-item test "very good" (95.1% agreement). Completed tests that met analysis criteria, N = 747. Mean initial test score, 69.4% correct (range 27.8-97.2). After revision/removal of 13 unacceptable questions, mean test score was 50.4% correct (range 8.7-82.6). Initial test item percent difficulty range was 15.2%-98.1%; discrimination values range, 0.03-0.50; final test item percent difficulty range, 17.6%-91.1%, discrimination values range, -0.04 to 1.04. Split-half reliability final test was 0.66. A high decision consistency reliability was identified, with test cut-score of 75%. The final 23-item Clinical Pain Knowledge Test has acceptable discrimination, difficulty, decision consistency, reliability, and validity in the general clinical inpatient nurse population. This instrument will be useful in assessing pain management knowledge of clinical nurses to determine gaps in education, evaluate knowledge after pain management education, and measure research outcomes. Copyright © 2017 American Society for Pain Management Nursing. Published by Elsevier Inc. All rights reserved.
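The difficulty and discrimination figures in this abstract are classical test theory item statistics. The following is a minimal sketch of how such statistics can be computed, not the authors' analysis: the abstract does not say which discrimination index was used, so the upper-lower index here is an assumption, and the function names and response data are hypothetical.

```python
def item_difficulty(responses):
    """Percent of examinees answering each item correctly.

    responses: list of rows, one per examinee, each a list of 0/1 item scores.
    """
    n = len(responses)
    n_items = len(responses[0])
    return [100.0 * sum(row[j] for row in responses) / n for j in range(n_items)]


def upper_lower_discrimination(responses, fraction=0.27):
    """Upper-lower discrimination index per item (assumed method).

    Examinees are ranked by total score; the index is the proportion correct
    in the top group minus the proportion correct in the bottom group.
    """
    ranked = sorted(responses, key=sum, reverse=True)
    k = max(1, round(len(ranked) * fraction))
    upper, lower = ranked[:k], ranked[-k:]
    n_items = len(responses[0])
    return [
        sum(r[j] for r in upper) / k - sum(r[j] for r in lower) / k
        for j in range(n_items)
    ]


# Hypothetical data: 6 examinees, 3 items (1 = correct, 0 = incorrect).
data = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]
difficulty = item_difficulty(data)
discrimination = upper_lower_discrimination(data, fraction=0.5)
```

A higher percent difficulty means an easier item under this convention, which matches the way the abstract reports "percent difficulty" alongside percent-correct test scores.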
A Psychometric Analysis of the Italian Version of the eHealth Literacy Scale Using Item Response and Classical Test Theory Methods

PubMed Central

Dima, Alexandra Lelia; Schulz, Peter Johannes

2017-01-01

Background The eHealth Literacy Scale (eHEALS) is a tool to assess consumers’ comfort and skills in using information technologies for health. Although evidence exists of reliability and construct validity of the scale, less agreement exists on structural validity. Objective The aim of this study was to validate the Italian version of the eHealth Literacy Scale (I-eHEALS) in a community sample with a focus on its structural validity, by applying psychometric techniques that account for item difficulty. Methods Two Web-based surveys were conducted among a total of 296 people living in the Italian-speaking region of Switzerland (Ticino). After examining the latent variables underlying the observed variables of the Italian scale via principal component analysis (PCA), fit indices for two alternative models were calculated using confirmatory factor analysis (CFA). The scale structure was examined via parametric and nonparametric item response theory (IRT) analyses accounting for differences between items regarding the proportion of answers indicating high ability. Convergent validity was assessed by correlations with theoretically related constructs. Results CFA showed a suboptimal model fit for both models. IRT analyses confirmed all items measure a single dimension as intended. Reliability and construct validity of the final scale were also confirmed. The contrasting results of factor analysis (FA) and IRT analyses highlight the importance of considering differences in item difficulty when examining health literacy scales. Conclusions The findings support the reliability and validity of the translated scale and its use for assessing Italian-speaking consumers’ eHealth literacy. PMID:28400356
Development and preliminary evaluation of a music-based attention assessment for patients with traumatic brain injury.

PubMed

Jeong, Eunju; Lesiuk, Teresa L

2011-01-01

Impairments in attention are commonly seen in individuals with traumatic brain injury (TBI). While visual attention assessment measurements have been rigorously developed and frequently used in cognitive neurorehabilitation, there is a paucity of auditory attention assessment measurements for patients with TBI. The purpose of this study was to field test a researcher-developed Music-based Attention Assessment (MAA), a melodic contour identification test designed to assess three different types of attention (i.e., sustained attention, selective attention, and divided attention), for patients with TBI. Additionally, this study aimed to evaluate the readability and comprehensibility of the test items and to examine the preliminary psychometric properties of the scale and test items. Fifteen patients diagnosed with TBI completed 3 different series of tasks in which they were required to identify melodic contours. The resulting data showed that (a) test items in each of the 3 subtests were found to have an easy to moderate level of item difficulty and an acceptable to high level of item discrimination, and (b) the musical characteristics (i.e., contour, congruence, and pitch interference) were found to be associated with the level of item difficulty, and (c) the internal consistency of the MAA as computed by Cronbach's alpha was .95. Subsequent studies using a larger sample of typical participants, along with individuals with TBI, are needed to confirm construct validity and internal consistency of the MAA. In addition, the authors recommend examination of criterion validity of the MAA as correlated with current neuropsychological attention assessment measurements.
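Internal consistency in this abstract is reported as Cronbach's alpha. As a rough illustration of that coefficient only (the score table and function name are hypothetical, and this is not the authors' computation), alpha can be sketched from item variances and the variance of total scores:

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a table of item scores.

    scores: list of rows, one per respondent, each a list of k item scores.
    Uses sample variances: alpha = k/(k-1) * (1 - sum(item var) / total var).
    """
    k = len(scores[0])
    item_variances = [variance([row[j] for row in scores]) for j in range(k)]
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical scores: 5 respondents, 4 dichotomous items.
example = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
alpha = cronbach_alpha(example)
```

With only fifteen participants, as in the study above, an alpha estimate carries wide uncertainty, which is consistent with the authors' call for a larger sample.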
Hoarding disorder

MedlinePlus

... of items, gradual buildup of clutter in living spaces and difficulty discarding things are usually the first ... for which there is no immediate need or space. By middle age, symptoms are often severe and ...
Evaluation of five guidelines for option development in multiple-choice item-writing.

PubMed

Martínez, Rafael J; Moreno, Rafael; Martín, Irene; Trigo, M Eva

2009-05-01

This paper evaluates certain guidelines for writing multiple-choice test items. The analysis of the responses of 5013 subjects to 630 items from 21 university classroom achievement tests suggests that an option should not differ in terms of heterogeneous content because such error has a slight but harmful effect on item discrimination. This also occurs with the "None of the above" option when it is the correct one. In contrast, results do not show the supposedly negative effects of a different-length option, the use of specific determiners, or the use of the "All of the above" option, which not only decreases difficulty but also improves discrimination when it is the correct option.
Development of the Facial Skin Care Index: A Health-Related Outcomes Index for Skin Cancer Patients

PubMed Central

Matthews, B. Alex; Rhee, John S.; Neuburg, Marcy; Burzynski, Mary L.; Nattinger, Ann B.

2006-01-01

BACKGROUND Existing health-related quality-of-life (HRQOL) tools do not appear to capture patients' specific skin cancer concerns. OBJECTIVE To describe the conceptual foundation, item generation, reduction process, and reliability testing for the Facial Skin Cancer Index (FSCI), a HRQOL outcomes tool for skin cancer researchers and clinicians. METHODS Participants in Phases I to III consisted of adult patients (N = 134) diagnosed with biopsy-proven nonmelanoma cervicofacial skin cancer. Data were collected via self-report surveys and clinical records. RESULTS Seventy-one distinct items were generated in Phase I and rated for their importance by an independent sample during Phase II; 36 items representing six theoretical HRQOL domains were retained. Test–retest I results indicated that four subscales showed adequate reliability coefficients (α = 0.60 to 0.91). Twenty-six items remained for test–retest II. Results indicated excellent internal consistency for emotional, social, appearance, and modified financial/work subscales (range 0.79 to 0.95); test–retest correlation coefficients were consistent across time (range 0.81 to 0.97; lifestyle omitted). CONCLUSION Pretesting afforded the opportunity to select items that optimally met our a priori conceptual and psychometric criteria for high data quality. Phase IV testing (validity and sensitivity before surgery and 4 months after Mohs micrographic surgery) for the 20-item FSCI is under way. PMID:16875475
Three controversies over item disclosure in medical licensure examinations.

PubMed

Park, Yoon Soo; Yang, Eunbae B

2015-01-01

In response to views on the public's right to know, there is growing attention to item disclosure - release of items, answer keys, and performance data to the public - in medical licensure examinations and their potential impact on the test's ability to measure competence and select qualified candidates. Recent debates on this issue have sparked legislative action internationally, including South Korea, with prior discussions among North American countries dating over three decades. The purpose of this study is to identify and analyze three issues associated with item disclosure in medical licensure examinations - 1) fairness and validity, 2) impact on passing levels, and 3) utility of item disclosure - by synthesizing existing literature in relation to standards in testing. Historically, the controversy over item disclosure has centered on fairness and validity. Proponents of item disclosure stress test takers' right to know, while opponents argue from a validity perspective. Item disclosure may bias item characteristics, such as difficulty and discrimination, and has consequences on setting passing levels. To date, there has been limited research on the utility of item disclosure for large scale testing. These issues require ongoing and careful consideration.
A Spanish version of the Skin Cancer Index: a questionnaire for measuring quality of life in patients with cervicofacial nonmelanoma skin cancer.

PubMed

de Troya-Martín, M; Rivas-Ruiz, F; Blázquez-Sánchez, N; Fernández-Canedo, I; Aguilar-Bernier, M; Repiso-Jiménez, J B; Toribio-Montero, J C; Jones-Caballero, M; Rhee, J

2015-01-01

The Skin Cancer Index (SCI) is the first specific patient-reported outcome measure for patients with cervicofacial nonmelanoma skin cancer. To date, only the original English version has been published. To develop a Spanish version of the SCI that is semantically and linguistically equivalent to the original, and to evaluate its measurement properties in this different cultural environment. A cross-sectional study was conducted of the cultural adaptation and empirical validation of the questionnaire, analysing the psychometric properties of the new index at different stages. Of 440 patients recruited to the study, 431 (95%) completed the Spanish version of the SCI questionnaire, in a mean time of 6·3 min (SD 2·9). Factor analysis of the scale revealed commonality and loading values of < 0·5 for three of the 15 items. The remaining 12 items converged into two components: appearance/social aspects (seven items) and emotional aspects (five items). Both domains presented a high level of internal consistency, with Cronbach's alpha values above 0·8. The convergent-discriminant validity analysis produced correlations higher than 0·3 for the mental component of the Short Form Health Survey-12v2 Health Questionnaire (correlation coefficient 0·39) and the Dermatology Quality of Life Index (correlation coefficient -0·30). In the test-retest, nine of the 12 items produced a weighted kappa value exceeding 0·4, and for the remaining three items, the absolute agreement percentage exceeded 60%. The Spanish version of the SCI quality of life scale has been satisfactorily adapted and validated for use in Spanish-speaking countries and populations. © 2014 British Association of Dermatologists.
[Development of a Questionnaire Measuring Sexual Mental Health of Tibetan University Students].

PubMed

Chen, Jun-cheng; Yan, Yu-ruo; Ai, Li; Guo, Xue-hua; He, Jian-xiu; Yuan, Ping

2016-05-01

To develop a questionnaire measuring sexual mental health of Tibetan university students. A draft questionnaire was developed with reference to the Sexual Civilization Survey for University Students of New Century and other published literature, and in consultation with experts. The questionnaire was tested in 230 students. Exploratory factor analyses with principal component and varimax orthogonal rotation were performed. Common factors with eigenvalues > 1 and ≥ 3 loaded items (factor loading ≥ 0.4) were retained. Items with a factor loading < 0.4, a communality < 0.2, or falling into a common factor with < 3 items were excluded. The revised questionnaire was administered in another sample of 481 university students. Cronbach's α and split-half reliabilities were estimated. Confirmatory factor analyses were performed to test the construct validity of the questionnaire. Four rounds of exploratory factor analyses reduced the draft questionnaire items from 39 to 34 with a 7-factor structure. The questionnaire had a Cronbach's α of 0.920, 0.898, 0.812, 0.844, 0.787, 0.684, 0.703, and 0.608, and a Spearman-Brown coefficient of 0.763, 0.867, 0.742, 0.838, 0.746, 0.822, 0.677, and 0.564 for the overall questionnaire and its 7 domains, respectively, suggesting good internal reliability. The structural equation of confirmatory factor analysis fitted well with the raw data: fit index χ²/df = 3.736; root mean square residual (RMR) = 0.081; root mean square error of approximation (RMSEA) = 0.076; goodness of fit index (GFI) = 0.805; adjusted goodness of fit index (AGFI) = 0.770; normed fit index (NFI) = 0.774; relative fit index (RFI) = 0.749; incremental fit index (IFI) = 0.824; non-normed fit index (NNFI) = 0.803; comparative fit index (CFI) = 0.823; parsimony goodness of fit index (PGFI) = 0.684; parsimony normed fit index (PNFI) = 0.698; parsimony comparative fit index (PCFI) = 0.742, suggesting good construct validity of the questionnaire. The Sexual Mental Health Questionnaire for Tibetan University Students has demonstrated good reliability and validity.
Time manages interference in visual short-term memory.

PubMed

Smith, Amy V; McKeown, Denis; Bunce, David

2017-09-01

Emerging evidence suggests that age-related declines in memory may reflect a failure in pattern separation, a process that is believed to reduce the encoding overlap between similar stimulus representations during memory encoding. Indeed, behavioural pattern separation may be indexed by a visual continuous recognition task in which items are presented in sequence and observers report for each whether it is novel, previously viewed (old), or whether it shares features with a previously viewed item (similar). In comparison to young adults, older adults show a decreased pattern separation when the number of items between "old" and "similar" items is increased. Yet the mechanisms of forgetting underpinning this type of recognition task are yet to be explored in a cognitively homogenous group, with careful control over the parameters of the task, including elapsing time (a critical variable in models of forgetting). By extending the inter-item intervals, number of intervening items and overall decay interval, we observed in a young adult sample (N = 35, M age = 19.56 years) that the critical factor governing performance was inter-item interval. We argue that tasks using behavioural continuous recognition to index pattern separation in immediate memory will benefit from generous inter-item spacing, offering protection from inter-item interference.
Selective loss of verbal imagery.

PubMed

Mehta, Z; Newcombe, F

1996-05-01

This single case study of the ability to generate verbal and non-verbal imagery in a woman who sustained a gunshot wound to the brain reports a significant difficulty in generating images of word shapes but not a significant problem in generating object images. Further dissociation, however, was observed in her ability to generate images of living vs non-living material. She made more errors in imagery and factual information tasks for non-living items than for living items. This pattern contrasts with our previous report of the agnosic patient, M.S., who had severe difficulty in generating images of living material, whereas his ability to image the shape of words was comparable to that of normal control subjects. Furthermore, with regard to the generation of images of living compared with non-living material, M.S. shows more errors with living than nonliving items. In contrast, the present patient, S.M., made significantly more errors with non-living relative to living items. There appear to be two types of double dissociation which reinforce the growing evidence of dissociable impairments in the ability to generate images for different types of verbal and non-verbal material. Such dissociations, presumably related to sensory and cognitive processing demands, address the problem of the neural basis of imagery.
Indonesian teacher engagement index: a Rasch model analysis

NASA Astrophysics Data System (ADS)

Sasmoko; Abbas, B. S.; Indrianti, Y.; Widhoyoko, S. A.

2018-01-01

The research aimed to calibrate the Indonesian Teacher Engagement Index (ITEI) instrument using the Rasch model. The respondents were 672 teachers from elementary, junior high, high, and vocational schools. The number of items planned was 165, with an initial reliability of 0.98. The ITEI uses a Likert scale (1 to 4), which was converted from an ordinal scale to an equal-interval scale. Rasch model analysis was done by selecting good items based on an Outfit Mean Square (MNSQ) between 0.5 and 1.5 and a Point Measure Correlation (Pt Mean Corr) between 0.4 and 0.85. Moderate Outfit Z-Standard (ZSTD) was ignored because the sample was >500. Conclusions: the ITEI is valid with 30 items and a reliability of 0.97, and less engage teachers significantly at α < 0.05.
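The item-selection rule described in this abstract can be sketched as a simple filter over per-item fit statistics. The thresholds below are the ones stated in the abstract; the item names, the statistic values, and the dictionary layout are hypothetical, and the fit statistics themselves would come from Rasch software rather than from this snippet.

```python
def select_items(item_stats, mnsq_range=(0.5, 1.5), corr_range=(0.4, 0.85)):
    """Keep items whose outfit MNSQ and point-measure correlation are in range.

    item_stats: dict mapping an item name to a dict with the keys
    "outfit_mnsq" and "pt_measure_corr" (hypothetical layout).
    """
    kept = []
    for name, stats in item_stats.items():
        fit_ok = mnsq_range[0] <= stats["outfit_mnsq"] <= mnsq_range[1]
        corr_ok = corr_range[0] <= stats["pt_measure_corr"] <= corr_range[1]
        if fit_ok and corr_ok:
            kept.append(name)
    return kept


# Hypothetical fit statistics for four items.
stats = {
    "item_01": {"outfit_mnsq": 1.02, "pt_measure_corr": 0.55},
    "item_02": {"outfit_mnsq": 1.80, "pt_measure_corr": 0.60},  # misfitting
    "item_03": {"outfit_mnsq": 0.95, "pt_measure_corr": 0.30},  # low correlation
    "item_04": {"outfit_mnsq": 0.70, "pt_measure_corr": 0.85},
}
retained = select_items(stats)
```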
A Comparison Study of Item Exposure Control Strategies in MCAT

ERIC Educational Resources Information Center

Mao, Xiuzhen; Ozdemir, Burhanettin; Wang, Yating; Xiu, Tao

2016-01-01

Four item selection indexes with and without exposure control are evaluated and compared in multidimensional computerized adaptive testing (CAT). The four item selection indices are D-optimality, Posterior expectation Kullback-Leibler information (KLP), the minimized error variance of the linear combination score with equal weight (V1), and the…
An Analysis of the Connectedness to Nature Scale Based on Item Response Theory.

PubMed

Pasca, Laura; Aragonés, Juan I; Coello, María T

2017-01-01

The Connectedness to Nature Scale (CNS) is used as a measure of the subjective cognitive connection between individuals and nature. However, to date, it has not been analyzed at the item level to confirm its quality. In the present study, we conduct such an analysis based on Item Response Theory. We employed data from previous studies using the Spanish-language version of the CNS, analyzing a sample of 1008 participants. The results show that seven items presented appropriate indices of discrimination and difficulty, in addition to a good fit. The remaining six have inadequate discrimination indices and do not present a good fit. A second study with 321 participants shows that the seven-item scale has adequate levels of reliability and validity. Therefore, it would be appropriate to use a reduced version of the scale after eliminating the items that display inappropriate behavior, since they may interfere with research results on connectedness to nature.
Short-term memory in autism spectrum disorder.

PubMed

Poirier, Marie; Martin, Jonathan S; Gaigg, Sebastian B; Bowler, Dermot M

2011-02-01

Three experiments examined verbal short-term memory in comparison and autism spectrum disorder (ASD) participants. Experiment 1 involved forward and backward digit recall. Experiment 2 used a standard immediate serial recall task where, contrary to the digit-span task, items (words) were not repeated from list to list. Hence, this task called more heavily on item memory. Experiment 3 tested short-term order memory with an order recognition test: Each word list was repeated with or without the position of 2 adjacent items swapped. The ASD group showed poorer performance in all 3 experiments. Experiments 1 and 2 showed that group differences were due to memory for the order of the items, not to memory for the items themselves. Confirming these findings, the results of Experiment 3 showed that the ASD group had more difficulty detecting a change in the temporal sequence of the items. (c) 2010 APA, all rights reserved.
A new item response theory model to adjust data allowing examinee choice

PubMed Central

Costa, Marcelo Azevedo; Braga Oliveira, Rivert Paulo

2018-01-01

In a typical questionnaire testing situation, examinees are not allowed to choose which items they answer because of a technical issue in obtaining satisfactory statistical estimates of examinee ability and item difficulty. This paper introduces a new item response theory (IRT) model that incorporates information from a novel representation of questionnaire data using network analysis. Three scenarios in which examinees select a subset of items were simulated. In the first scenario, the assumptions required to apply the standard Rasch model are met, thus establishing a reference for parameter accuracy. The second and third scenarios include five increasing levels of violating those assumptions. The results show substantial improvements over the standard model in item parameter recovery. Furthermore, the accuracy was closer to the reference in almost every evaluated scenario. To the best of our knowledge, this is the first proposal to obtain satisfactory IRT statistical estimates in the last two scenarios. PMID:29389996
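For reference, the standard Rasch model that serves as the baseline in this abstract gives the probability of a correct response as a logistic function of the difference between examinee ability and item difficulty. The sketch below shows only that standard model, not the network-based model the paper proposes; the function and variable names are illustrative assumptions.

```python
import math


def rasch_probability(theta, b):
    """Probability of a correct response under the dichotomous Rasch model.

    theta: examinee ability; b: item difficulty (both on the logit scale).
    """
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def log_likelihood(responses, theta, difficulties):
    """Log-likelihood of one examinee's 0/1 responses to the items answered.

    Only the items the examinee actually answered are passed in, which is
    where examinee choice enters: unanswered items simply contribute nothing.
    """
    total = 0.0
    for x, b in zip(responses, difficulties):
        p = rasch_probability(theta, b)
        total += math.log(p) if x == 1 else math.log(1.0 - p)
    return total


# Hypothetical examinee of average ability answering two items.
p_easy = rasch_probability(0.0, -1.0)
p_hard = rasch_probability(0.0, 1.0)
ll = log_likelihood([1, 0], 0.0, [-1.0, 1.0])
```

Dropping unanswered items from the likelihood is unproblematic only when the choice of items is unrelated to ability and difficulty, which is the kind of assumption the second and third simulated scenarios are described as violating.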

Development of the Serenity Scale.

PubMed

Roberts, K T; Aspy, C B

1993-01-01

Serenity is a sustained inner peace. Nurses can use knowledge about serenity to help clients cope with harsh circumstances. The Serenity Scale is a 40-item self-report, summated scale that evaluates clients' serenity status. Critical attributes, identified by serenity experts, served as the theoretical framework. Sixty-five items were given to 542 male and female subjects age 20 to 95 (73% Caucasians and 27% minority) from varying income and educational levels yielding an alpha of .93. Forty items (SS.V2) were extracted for further analysis. The alpha coefficient was .92 with item-to-total correlations ranging from .25 to .67. Item means ranged from 2.6 to 3.7 (grand mean = 3.4). A principal components factor analysis with varimax rotation revealed nine factors explaining 58.2% of the variance. Limitations are that SS.V2 has not been tested with an independent sample and subjects with low educational levels had difficulty with some items.
Students’ understanding of forces: Force diagrams on horizontal and inclined plane

NASA Astrophysics Data System (ADS)

Sirait, J.; Hamdani; Mursyid, S.

2018-03-01

This study aims to analyse students’ difficulties in understanding force diagrams on horizontal surfaces and inclined planes. Physics education students (pre-service physics teachers) of Tanjungpura University, who had completed a Basic Physics course, took a Force concept test which has six questions covering three concepts: an object at rest, an object moving at constant speed, and an object moving at constant acceleration both on a horizontal surface and on an inclined plane. The test is in a multiple-choice format. It examines the ability of students to select appropriate force diagrams depending on the context. The results show that 44% of students have difficulties in solving the test (these students could solve only one or two items out of six). About 50% of students faced difficulties finding the correct diagram of an object when it has constant speed and acceleration in both contexts. In general, students could only correctly identify 48% of the force diagrams on the test. The most difficult task for the students was identifying the force diagram representing forces exerted on an object on an inclined plane.
Linking Existing Instruments to Develop an Activity of Daily Living Item Bank.

PubMed

Li, Chih-Ying; Romero, Sergio; Bonilha, Heather S; Simpson, Kit N; Simpson, Annie N; Hong, Ickpyo; Velozo, Craig A

2018-03-01

This study examined dimensionality and item-level psychometric properties of an item bank measuring activities of daily living (ADL) across inpatient rehabilitation facilities and community living centers. Common person equating method was used in the retrospective veterans data set. This study examined dimensionality, model fit, local independence, and monotonicity using factor analyses and fit statistics, principal component analysis (PCA), and differential item functioning (DIF) using Rasch analysis. Following the elimination of invalid data, 371 veterans who completed both the Functional Independence Measure (FIM) and minimum data set (MDS) within 6 days were retained. The FIM-MDS item bank demonstrated good internal consistency (Cronbach's α = .98) and met three rating scale diagnostic criteria and three of the four model fit statistics (comparative fit index/Tucker-Lewis index = 0.98, root mean square error of approximation = 0.14, and standardized root mean residual = 0.07). PCA of Rasch residuals showed the item bank explained 94.2% variance. The item bank covered the range of θ from -1.50 to 1.26 (item), -3.57 to 4.21 (person) with person strata of 6.3. The findings indicated the ADL physical function item bank constructed from FIM and MDS measured a single latent trait with overall acceptable item-level psychometric properties, suggesting that it is an appropriate source for developing efficient test forms such as short forms and computerized adaptive tests.
Reading Ability and Print Exposure: Item Response Theory Analysis of the Author Recognition Test

PubMed Central

Moore, Mariah; Gordon, Peter C.

2015-01-01

In the Author Recognition Test (ART) participants are presented with a series of names and foils and are asked to indicate which ones they recognize as authors. The test is a strong predictor of reading skill, with this predictive ability generally explained as occurring because author knowledge is likely acquired through reading or other forms of print exposure. This large-scale study (1012 college student participants) used Item Response Theory (IRT) to analyze item (author) characteristics to facilitate identification of the determinants of item difficulty, provide a basis for further test development, and to optimize scoring of the ART. Factor analysis suggests a potential two factor structure of the ART differentiating between literary vs. popular authors. Effective and ineffective author names were identified so as to facilitate future revisions of the ART. Analyses showed that the ART is a highly significant predictor of time spent encoding words as measured using eye-tracking during reading. The relationship between the ART and time spent reading provided a basis for implementing a higher penalty for selecting foils, rather than the standard method of ART scoring (names selected minus foils selected). The findings provide novel support for the view that the ART is a valid indicator of reading volume. Further, they show that frequency data can be used to select items of appropriate difficulty and that frequency data from corpora based on particular time periods and types of text may allow test adaptation for different populations. PMID:25410405
Reading ability and print exposure: item response theory analysis of the author recognition test.

PubMed

Moore, Mariah; Gordon, Peter C

2015-12-01

In the author recognition test (ART), participants are presented with a series of names and foils and are asked to indicate which ones they recognize as authors. The test is a strong predictor of reading skill, and this predictive ability is generally explained as occurring because author knowledge is likely acquired through reading or other forms of print exposure. In this large-scale study (1,012 college student participants), we used item response theory (IRT) to analyze item (author) characteristics in order to facilitate identification of the determinants of item difficulty, provide a basis for further test development, and optimize scoring of the ART. Factor analysis suggested a potential two-factor structure of the ART, differentiating between literary and popular authors. Effective and ineffective author names were identified so as to facilitate future revisions of the ART. Analyses showed that the ART is a highly significant predictor of the time spent encoding words, as measured using eyetracking during reading. The relationship between the ART and time spent reading provided a basis for implementing a higher penalty for selecting foils, rather than the standard method of ART scoring (names selected minus foils selected). The findings provide novel support for the view that the ART is a valid indicator of reading volume. Furthermore, they show that frequency data can be used to select items of appropriate difficulty, and that frequency data from corpora based on particular time periods and types of texts may allow adaptations of the test for different populations.
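The two scoring rules contrasted in this abstract can be sketched as follows. The standard rule (names selected minus foils selected) is stated in the abstract; the size of the heavier foil penalty is not given there, so the value 2.0 below is purely a placeholder, as are the function name and the stand-in author and foil labels.

```python
def art_score(selected, authors, foils, foil_penalty=1.0):
    """Author Recognition Test score: hits minus a penalty per foil selected.

    selected: set of names the participant marked as authors.
    authors: set of real author names on the test.
    foils: set of non-author names on the test.
    foil_penalty: 1.0 reproduces the standard scoring rule.
    """
    hits = len(selected & authors)
    false_alarms = len(selected & foils)
    return hits - foil_penalty * false_alarms


# Placeholder item lists (not actual ART items).
authors = {"Author A", "Author B", "Author C", "Author D"}
foils = {"Foil W", "Foil X", "Foil Y", "Foil Z"}
selected = {"Author A", "Author B", "Author C", "Foil X"}

standard = art_score(selected, authors, foils)
penalized = art_score(selected, authors, foils, foil_penalty=2.0)  # assumed weight
```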
Measuring disability across cultures — the psychometric properties of the WHODAS II in older people from seven low- and middle-income countries. The 10/66 Dementia Research Group population-based survey

PubMed Central

Sousa, Renata M; Dewey, Michael E; Acosta, Daisy; Jotheeswaran, AT; Castro-Costa, Erico; Ferri, Cleusa P; Guerra, Mariella; Huang, Yueqin; Jacob, KS; Pichardo, Juana Guillermina Rodriguez; Ramírez, Nayeli Garcia; Rodriguez, Juan Llibre; Rodriguez, Marina Calvo; Salas, Aquiles; Sosa, Ana Luisa; Williams, Joseph; Prince, Martin J

2010-01-01

We evaluated the psychometric properties of the 12-item interviewer-administered screener version of the World Health Organization Disability Assessment Schedule – version II (WHODAS II) among older people living in seven low- and middle-income countries. Principal component analysis (PCA), confirmatory factor analysis (CFA) and Mokken analyses were carried out to test for unidimensionality, hierarchical structure, and measurement invariance across 10/66 Dementia Research Group sites. PCA generated a one-factor solution in most sites. In CFA, the two-factor solution generated in the Dominican Republic fitted better for all sites other than rural China. The two factors were not easily interpretable, and may have been an artefact of differing item difficulties. Strong internal consistency and high factor loadings for the one-factor solution supported unidimensionality. Furthermore, the WHODAS II was found to be a ‘strong’ Mokken scale. Measurement invariance was supported by the similarity of factor loadings across sites, and by the high between-site correlations in item difficulties. The Mokken results strongly support the conclusion that the WHODAS II 12-item screener is a unidimensional and hierarchical scale conforming to item response theory (IRT) principles, at least at the monotone homogeneity model level. More work is needed to assess the generalizability of our findings to different populations. Copyright © 2010 John Wiley & Sons, Ltd. PMID:20104493
Does Early Algebraic Reasoning Differ as a Function of Students’ Difficulty with Calculations versus Word Problems?

PubMed Central

Powell, Sarah R.; Fuchs, Lynn S.

2014-01-01

According to national mathematics standards, algebra instruction should begin at kindergarten and continue through elementary school. Most often, teachers address algebra in the elementary grades with problems related to solving equations or understanding functions. With 789 2nd-grade students, we administered (a) measures of calculations and word problems in the fall and (b) an assessment of pre-algebraic reasoning, with items that assessed solving equations and functions, in the spring. Based on the calculation and word-problem measures, we placed 148 students into 1 of 4 difficulty status categories: typically performing, calculation difficulty, word-problem difficulty, or difficulty with calculations and word problems. Analyses of variance were conducted on the 148 students; path analytic mediation analyses were conducted on the larger sample of 789 students. Across analyses, results corroborated the finding that word-problem difficulty is more strongly associated with difficulty with pre-algebraic reasoning. As an indicator of later algebra difficulty, word-problem difficulty may be a more useful predictor than calculation difficulty, and students with word-problem difficulty may require a different level of algebraic reasoning intervention than students with calculation difficulty. PMID:25309044
Impact of age-related macular degeneration in patients with glaucoma: understanding the patients' perspective.

PubMed

Skalicky, Simon E; Fenwick, Eva; Martin, Keith R; Crowston, Jonathan; Goldberg, Ivan; McCluskey, Peter

2016-07-01

The aim of the study was to measure the impact of age-related macular degeneration on vision-related activity limitation and preference-based status for glaucoma patients. This was a cross-sectional study. Two hundred glaucoma patients, of whom 73 had age-related macular degeneration, were included in the research. Sociodemographic information, visual field parameters and visual acuity were collected. Age-related macular degeneration was scored using the Age-Related Eye Disease Study system. The Rasch-analysed Glaucoma Activity Limitation-9 and the Visual Function Questionnaire Utility Index measured vision-related activity limitation and preference-based status, respectively. Regression models determined factors predictive of vision-related activity limitation and preference-based status. Differential item functioning compared Glaucoma Activity Limitation-9 item difficulty for those with and without age-related macular degeneration. Mean age was 73.7 (±10.1) years. Lower better eye mean deviation (β: 1.42, 95% confidence interval: 1.24-1.63, P < 0.001) and age-related macular degeneration (β: 1.26, 95% confidence interval: 1.10-1.44, P = 0.001) were independently associated with worse vision-related activity limitation. Worse eye visual acuity (β: 0.978, 95% confidence interval: 0.961-0.996, P = 0.018), high risk age-related macular degeneration (β: 0.981, 95% confidence interval: 0.965-0.998, P = 0.028) and severe glaucoma (β: 0.982, 95% confidence interval: 0.966-0.998, P = 0.032) were independently associated with worse preference-based status. Glaucoma patients with age-related macular degeneration found using stairs, walking on uneven ground and judging distances of foot to step/curb significantly more difficult than those without age-related macular degeneration. Vision-related activity limitation and preference-based status are negatively impacted by severe glaucoma and age-related macular degeneration. Patients with both conditions perceive increased difficulty walking safely compared with patients with glaucoma alone. © 2015 Royal Australian and New Zealand College of Ophthalmologists.
An index with improved diagnostic accuracy for the diagnosis of Crohn's disease derived from the Lennard-Jones criteria.

PubMed

Reinisch, S; Schweiger, K; Pablik, E; Collet-Fenetrier, B; Peyrin-Biroulet, L; Alfaro, I; Panés, J; Moayyedi, P; Reinisch, W

2016-09-01

The Lennard-Jones criteria are considered the gold standard for diagnosing Crohn's disease (CD) and include the items granuloma, macroscopic discontinuity, transmural inflammation, fibrosis, lymphoid aggregates and discontinuous inflammation on histology. The criteria have never been subjected to a formal validation process. The aim was to develop a validated and improved diagnostic index based on the items of the Lennard-Jones criteria. Included were 328 adult patients with long-standing CD (median disease duration 10 years) from three centres, classified as 'established', 'probable' or 'non-CD' by the Lennard-Jones criteria at time of diagnosis. Controls were patients with ulcerative colitis (UC; n = 170). The performance of each of the six diagnostic items of the Lennard-Jones criteria was modelled by logistic regression, and a new index based on stepwise backward selection and cut-offs was developed. The diagnostic value of the new index was analysed by comparing sensitivity, specificity and accuracy vs. the Lennard-Jones criteria. By the Lennard-Jones criteria, 49% (n = 162) of CD patients would have been diagnosed as 'non-CD' at time of diagnosis (sensitivity/specificity/accuracy, 'established' CD: 0.34/0.99/0.67; 'probable' CD: 0.51/0.95/0.73). A new index was derived from granuloma, fibrosis, transmural inflammation and macroscopic discontinuity, but excluded lymphoid aggregates and discontinuous inflammation on histology. Our index provided improved diagnostic accuracy for 'established' and 'probable' CD (sensitivity/specificity/accuracy, 'established' CD: 0.45/1/0.72; 'probable' CD: 0.8/0.85/0.82), including the subgroup with isolated colonic CD ('probable' CD, new index: 0.73/0.85/0.79; Lennard-Jones criteria: 0.43/0.95/0.69). We developed an index based on items of the Lennard-Jones criteria providing improved diagnostic accuracy for the differential diagnosis between CD and UC. © 2016 John Wiley & Sons Ltd.
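As a reading aid for the sensitivity/specificity/accuracy triples above, the following Python sketch checks one arithmetic observation: each reported accuracy agrees, to within rounding, with the mean of the reported sensitivity and specificity. Whether the authors actually defined accuracy this way is an assumption; the abstract does not say, and the function name `balanced_accuracy` is ours.

```python
def balanced_accuracy(sensitivity, specificity):
    """Mean of sensitivity and specificity (an assumed definition of accuracy)."""
    return (sensitivity + specificity) / 2


# (label, sensitivity, specificity, accuracy) as reported in the abstract.
reported = [
    ("Lennard-Jones, established", 0.34, 0.99, 0.67),
    ("Lennard-Jones, probable", 0.51, 0.95, 0.73),
    ("new index, established", 0.45, 1.00, 0.72),
    ("new index, probable", 0.80, 0.85, 0.82),
]

# Absolute gap between the assumed formula and each reported accuracy.
gaps = [abs(balanced_accuracy(se, sp) - acc) for _, se, sp, acc in reported]
```

Every gap is at most about 0.005, which is the size expected from rounding to two decimals.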
Exploratory Item Classification Via Spectral Graph Clustering

PubMed Central

Chen, Yunxiao; Li, Xiaoou; Liu, Jingchen; Xu, Gongjun; Ying, Zhiliang

2017-01-01

Large-scale assessments are supported by a large item pool. An important task in test development is to assign items into scales that measure different characteristics of individuals, and a popular approach is cluster analysis of items. Classical methods in cluster analysis, such as the hierarchical clustering, K-means method, and latent-class analysis, often induce a high computational overhead and have difficulty handling missing data, especially in the presence of high-dimensional responses. In this article, the authors propose a spectral clustering algorithm for exploratory item cluster analysis. The method is computationally efficient, effective for data with missing or incomplete responses, easy to implement, and often outperforms traditional clustering algorithms in the context of high dimensionality. The spectral clustering algorithm is based on graph theory, a branch of mathematics that studies the properties of graphs. The algorithm first constructs a graph of items, characterizing the similarity structure among items. It then extracts item clusters based on the graphical structure, grouping similar items together. The proposed method is evaluated through simulations and an application to the revised Eysenck Personality Questionnaire. PMID:29033476
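The two steps named in the abstract above (build a similarity graph of items, then extract clusters from its structure) can be illustrated with a generic two-way spectral split. This is a minimal sketch of textbook spectral partitioning using the sign of the Fiedler vector of the graph Laplacian, computed here by shifted power iteration; it is not the authors' algorithm, and the function name, the similarity values, and the six-item example are assumptions for illustration.

```python
import math


def two_way_spectral_split(weights, iterations=500):
    """Split items into two clusters by the sign of the Fiedler vector.

    weights: symmetric matrix (list of lists) of non-negative item similarities.
    Assumes a connected graph with at least two items.
    """
    n = len(weights)
    degrees = [sum(row) for row in weights]
    shift = 2.0 * max(degrees)          # upper bound on the largest Laplacian eigenvalue
    v = [float(i) for i in range(n)]    # arbitrary non-constant starting vector
    for _ in range(iterations):
        # Project out the constant vector, the trivial eigenvector of the Laplacian.
        mean_v = sum(v) / n
        v = [x - mean_v for x in v]
        # Multiply by (shift * I - L), where L = D - W is the graph Laplacian.
        w = [
            (shift - degrees[i]) * v[i] + sum(weights[i][j] * v[j] for j in range(n))
            for i in range(n)
        ]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return [0 if x < 0 else 1 for x in v]


# Hypothetical similarities: items 0-2 resemble each other, as do items 3-5.
similarity = [
    [0.0 if i == j else (0.9 if (i < 3) == (j < 3) else 0.1) for j in range(6)]
    for i in range(6)
]
labels = two_way_spectral_split(similarity)
```

Power iteration keeps the sketch free of external libraries; a practical implementation would more likely call an eigen-solver and use several eigenvectors to obtain more than two clusters.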
A 14-item Mediterranean diet assessment tool and obesity indexes among high-risk subjects: the PREDIMED trial.

PubMed

Martínez-González, Miguel Angel; García-Arellano, Ana; Toledo, Estefanía; Salas-Salvadó, Jordi; Buil-Cosiales, Pilar; Corella, Dolores; Covas, Maria Isabel; Schröder, Helmut; Arós, Fernando; Gómez-Gracia, Enrique; Fiol, Miquel; Ruiz-Gutiérrez, Valentina; Lapetra, José; Lamuela-Raventos, Rosa Maria; Serra-Majem, Lluís; Pintó, Xavier; Muñoz, Miguel Angel; Wärnberg, Julia; Ros, Emilio; Estruch, Ramón

2012-01-01

Independently of total caloric intake, a better quality of the diet (for example, conformity to the Mediterranean diet) is associated with lower obesity risk. It is unclear whether a brief dietary assessment tool, instead of full-length comprehensive methods, can also capture this association. In addition to reduced costs, a brief tool has the interesting advantage of allowing immediate feedback to participants in interventional studies. Another relevant question is which individual items of such a brief tool are responsible for this association. We examined these associations using a 14-item tool of adherence to the Mediterranean diet as exposure and body mass index, waist circumference and waist-to-height ratio (WHtR) as outcomes. The study was a cross-sectional assessment of all participants in the "PREvención con DIeta MEDiterránea" (PREDIMED) trial: 7,447 participants (55-80 years, 57% women) free of cardiovascular disease, but with either type 2 diabetes or ≥ 3 cardiovascular risk factors. Trained dietitians used both a validated 14-item questionnaire and a full-length validated 137-item food frequency questionnaire to assess dietary habits. Trained nurses measured weight, height and waist circumference. Strong inverse linear associations between the 14-item tool and all adiposity indexes were found. For a two-point increment in the 14-item score, the multivariable-adjusted differences in WHtR were -0.0066 (95% confidence interval, -0.0088 to -0.0049) for women and -0.0059 (-0.0079 to -0.0038) for men. The multivariable-adjusted odds ratio for a WHtR>0.6 in participants scoring ≥ 10 points versus ≤ 7 points was 0.68 (0.57 to 0.80) for women and 0.66 (0.54 to 0.80) for men. High consumption of nuts and low consumption of sweetened/carbonated beverages presented the strongest inverse associations with abdominal obesity. A brief 14-item tool was able to capture a strong monotonic inverse association between adherence to a good quality dietary pattern (Mediterranean diet) and obesity indexes in a population of adults at high cardiovascular risk.
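The outcome and exposure groupings used in the abstract above reduce to simple arithmetic, sketched here in Python. The WHtR cut-off of 0.6 and the score bands (≥ 10 versus ≤ 7 points) come from the abstract; the function names, the group labels, the 0 to 14 score range (one point per item), and the example measurements are assumptions for illustration.

```python
def waist_to_height_ratio(waist_cm, height_cm):
    """WHtR: waist circumference divided by height, both in the same unit."""
    if height_cm <= 0:
        raise ValueError("height must be positive")
    return waist_cm / height_cm


def adherence_group(score):
    """Band a 14-item adherence score into the groups compared in the abstract."""
    if not 0 <= score <= 14:
        raise ValueError("score assumed to lie between 0 and 14")
    if score >= 10:
        return "high"          # >= 10 points
    if score <= 7:
        return "low"           # <= 7 points
    return "intermediate"      # 8 or 9 points, not part of the reported contrast


# Hypothetical participant: 102 cm waist, 165 cm height, score of 11.
whtr = waist_to_height_ratio(102, 165)
above_cutoff = whtr > 0.6
group = adherence_group(11)
```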
A 14-Item Mediterranean Diet Assessment Tool and Obesity Indexes among High-Risk Subjects: The PREDIMED Trial

PubMed Central

Martínez-González, Miguel Angel; García-Arellano, Ana; Toledo, Estefanía; Salas-Salvadó, Jordi; Buil-Cosiales, Pilar; Corella, Dolores; Covas, Maria Isabel; Schröder, Helmut; Arós, Fernando; Gómez-Gracia, Enrique; Fiol, Miquel; Ruiz-Gutiérrez, Valentina; Lapetra, José; Lamuela-Raventos, Rosa Maria; Serra-Majem, Lluís; Pintó, Xavier; Muñoz, Miguel Angel; Wärnberg, Julia; Ros, Emilio; Estruch, Ramón

2012-01-01

Objective: Independently of total caloric intake, a better quality of the diet (for example, conformity to the Mediterranean diet) is associated with lower obesity risk. It is unclear whether a brief dietary assessment tool, instead of full-length comprehensive methods, can also capture this association. In addition to reduced costs, a brief tool has the interesting advantage of allowing immediate feedback to participants in interventional studies. Another relevant question is which individual items of such a brief tool are responsible for this association. We examined these associations using a 14-item tool of adherence to the Mediterranean diet as exposure and body mass index, waist circumference and waist-to-height ratio (WHtR) as outcomes. Design: Cross-sectional assessment of all participants in the “PREvención con DIeta MEDiterránea” (PREDIMED) trial. Subjects: 7,447 participants (55–80 years, 57% women) free of cardiovascular disease, but with either type 2 diabetes or ≥3 cardiovascular risk factors. Trained dietitians used both a validated 14-item questionnaire and a full-length validated 137-item food frequency questionnaire to assess dietary habits. Trained nurses measured weight, height and waist circumference. Results: Strong inverse linear associations between the 14-item tool and all adiposity indexes were found. For a two-point increment in the 14-item score, the multivariable-adjusted differences in WHtR were −0.0066 (95% confidence interval, −0.0088 to −0.0049) for women and −0.0059 (−0.0079 to −0.0038) for men. The multivariable-adjusted odds ratio for a WHtR>0.6 in participants scoring ≥10 points versus ≤7 points was 0.68 (0.57 to 0.80) for women and 0.66 (0.54 to 0.80) for men. High consumption of nuts and low consumption of sweetened/carbonated beverages presented the strongest inverse associations with abdominal obesity. Conclusions: A brief 14-item tool was able to capture a strong monotonic inverse association between adherence to a good quality dietary pattern (Mediterranean diet) and obesity indexes in a population of adults at high cardiovascular risk. PMID:22905215
Development of Elderly Quality of Life Index – Eqoli: Item Reduction and Distribution into Dimensions

PubMed Central

Paschoal, Sérgio Márcio Pacheco; Filho, Wilson Jacob; Litvoc, Júlio

2008-01-01

OBJECTIVE: To describe item reduction and its distribution into dimensions in the construction process of a quality of life evaluation instrument for the elderly. METHODS: The sampling method was chosen by convenience through quotas, with selection of elderly subjects from four programs to achieve heterogeneity in the “health status”, “functional capacity”, “gender”, and “age” variables. The Clinical Impact Method was used, consisting of the spontaneous and elicited selection by the respondents of relevant items to the construct Quality of Life in Old Age from a previously elaborated item pool. The respondents rated each item’s importance using a 5-point Likert scale. The product of the proportion of elderly selecting the item as relevant (frequency) and the mean importance score they attributed to it (importance) represented the overall impact of that item in their quality of life (impact). The items were ordered according to their impact scores and the top 46 scoring items were grouped in dimensions by three experts. A review of the negative items was performed. RESULTS: One hundred and ninety-three people (122 women and 71 men) were interviewed. Experts distributed the 46 items into eight dimensions. Closely related items were grouped and dimensions not reaching the minimum expected number of items received additional items, resulting in eight dimensions and 43 items. DISCUSSION: The sample was heterogeneous and similar to what was expected. The dimensions and items demonstrated the multidimensionality of the construct. The Clinical Impact Method was appropriate to construct the instrument, which was named Elderly Quality of Life Index - EQoLI. An accuracy process will be examined in the future. PMID:18438571
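The impact calculation described in the methods above (impact = frequency × mean importance, followed by ranking) can be sketched in Python as follows. The function names, the item labels, and the ratings are made up for illustration; the study itself kept the top 46 items, whereas this toy example keeps the top 2.

```python
from statistics import mean


def impact_scores(ratings_by_item, n_respondents):
    """Compute impact = frequency * importance for each item.

    ratings_by_item: item -> list of 1-5 importance ratings given by the
    respondents who selected that item as relevant.
    n_respondents: total number of respondents interviewed.
    """
    scores = {}
    for item, ratings in ratings_by_item.items():
        frequency = len(ratings) / n_respondents        # proportion selecting the item
        importance = mean(ratings) if ratings else 0.0  # mean rating among selectors
        scores[item] = frequency * importance
    return scores


def top_items(scores, k):
    """Return the k items with the highest impact, highest first."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical data from 4 respondents.
ratings = {
    "sleep well": [5, 4, 5, 4],     # selected by all 4, mean 4.5 -> impact 4.5
    "family contact": [4, 4, 3],    # selected by 3 of 4
    "travel": [5],                  # selected by 1 of 4, mean 5 -> impact 1.25
}
scores = impact_scores(ratings, n_respondents=4)
selected = top_items(scores, k=2)
```

An item that few people choose scores low even if those few rate it as very important, which is why "travel" ranks last here.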
The Dutch-Flemish PROMIS Physical Function item bank exhibited strong psychometric properties in patients with chronic pain.

PubMed

Crins, Martine H P; Terwee, Caroline B; Klausch, Thomas; Smits, Niels; de Vet, Henrica C W; Westhovens, Rene; Cella, David; Cook, Karon F; Revicki, Dennis A; van Leeuwen, Jaap; Boers, Maarten; Dekker, Joost; Roorda, Leo D

2017-07-01

The objective of this study was to assess the psychometric properties of the Dutch-Flemish Patient-Reported Outcomes Measurement Information System (PROMIS) Physical Function item bank in Dutch patients with chronic pain. A bank of 121 items was administered to 1,247 Dutch patients with chronic pain. Unidimensionality was assessed by fitting a one-factor confirmatory factor analysis and evaluating resulting fit statistics. Items were calibrated with the graded response model and its fit was evaluated. Cross-cultural validity was assessed by testing items for differential item functioning (DIF) based on language (Dutch vs. English). Construct validity was evaluated by calculating correlations between scores on the Dutch-Flemish PROMIS Physical Function measure and scores on generic and disease-specific measures. Results supported the Dutch-Flemish PROMIS Physical Function item bank's unidimensionality (Comparative Fit Index = 0.976, Tucker Lewis Index = 0.976) and model fit. Item thresholds targeted a wide range of the physical function construct (threshold-parameters range: -4.2 to 5.6). Cross-cultural validity was good, as only four items showed DIF for language and their impact on item scores was minimal. Physical Function scores were strongly associated with scores on all other measures (all correlations ≤ -0.60 as expected). The Dutch-Flemish PROMIS Physical Function item bank exhibited good psychometric properties. Development of a computer adaptive test based on the large bank is warranted. Copyright © 2017 Elsevier Inc. All rights reserved.
Different Characteristics of the Female Sexual Function Index in a Sample of Sexually Active and Inactive Women.

PubMed

Hevesi, Krisztina; Mészáros, Veronika; Kövi, Zsuzsanna; Márki, Gabriella; Szabó, Marianna

2017-09-01

The Female Sexual Function Index (FSFI) is a widely used measurement tool to assess female sexual function along the six dimensions of desire, arousal, lubrication, orgasm, satisfaction, and pain. However, the structure of the questionnaire is not clear, and several studies have found high correlations among the dimensions, indicating that a common underlying "sexual function" factor might be present. This study aimed to investigate whether female sexual function is best understood as a multidimensional construct or, alternatively, whether a common underlying factor explains most of the variance in FSFI scores, and to investigate the possible effect of the common practice of including sexually inactive women in studies using the FSFI. The sample consisted of 508 women: 202 university students, 177 patients with endometriosis, and 129 patients with polycystic ovary syndrome. Participants completed the FSFI, and confirmatory factor analyses were used to test the underlying structure of this instrument in the total sample and in samples including sexually active women only. The FSFI is a multidimensional self-report questionnaire composed of 19 items. Strong positive correlations were found among five of the six original factors on the FSFI. Confirmatory factor analyses showed that in the total sample items loaded mainly on the general sexual function factor and very little variance was explained by the specific factors. However, when only sexually active women were included in the analyses, a clear factor structure emerged, with items loading on their six specific factors, and most of the variance in FSFI scores was explained by the specific factors, rather than the general factor. University students reported higher scores, indicating better functioning compared with the patient samples. The reliable and valid assessment of female sexual function can contribute to better understanding, prevention, and treatment of different sexual difficulties and dysfunctions. This study provides a rigorous statistical test of the structure of the FSFI and an explicit decision rule for categorizing sexually inactive women. Limitations include a lack of control over the circumstances of data collection. This study supports the use of the FSFI as a multidimensional measurement of female sexual function but highlights the need to establish clear decision rules for the inclusion or exclusion of sexually active and inactive respondents. Hevesi K, Mészáros V, Kövi Z, et al. Different Characteristics of the Female Sexual Function Index in a Sample of Sexually Active and Inactive Women. J Sex Med 2017;14:1133-1141. Copyright © 2017 International Society for Sexual Medicine. Published by Elsevier Inc. All rights reserved.
A Comparison of Latent Growth Models for Constructs Measured by Multiple Items

ERIC Educational Resources Information Center

Leite, Walter L.

2007-01-01

Univariate latent growth modeling (LGM) of composites of multiple items (e.g., item means or sums) has been frequently used to analyze the growth of latent constructs. This study evaluated whether LGM of composites yields unbiased parameter estimates, standard errors, chi-square statistics, and adequate fit indexes. Furthermore, LGM was compared…
Selection Difficulty and Interitem Competition Are Independent Factors in Rapid Visual Stream Perception

ERIC Educational Resources Information Center

Kawahara, Jun-ichiro; Enns, James T.

2009-01-01

When observers try to identify successive targets in a visual stream at a rate of 100 ms per item, accuracy for the 2nd target is impaired for intertarget lags of 100-500 ms. Yet, when the same stream is presented more rapidly (e.g., 50 ms per item), this pattern reverses and a 1st-target deficit is obtained. M. C. Potter, A. Staub, and D. H.…
Extending item response theory to online homework

NASA Astrophysics Data System (ADS)

Kortemeyer, Gerd

2014-06-01

Item response theory (IRT) is becoming an increasingly important tool for analyzing "big data" gathered from online educational venues. However, the mechanism was originally developed in traditional exam settings, and several of its assumptions are infringed upon when deployed in the online realm. For a large-enrollment physics course for scientists and engineers, the study compares outcomes from IRT analyses of exam and homework data, and then proceeds to investigate the effects of each confounding factor introduced in the online realm. It is found that IRT yields the correct trends for learner ability and meaningful item parameters, yet overall agreement with exam data is moderate. It is also found that learner ability and item discrimination are robust over a wide range with respect to model assumptions and introduced noise. Item difficulty is also robust, but over a narrower range.
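For readers unfamiliar with the three quantities named above (learner ability, item discrimination, item difficulty), the following Python sketch shows how they enter a two-parameter logistic (2PL) item response function. The abstract does not state which IRT model was fitted, so the choice of 2PL, the function name, and the parameter values are assumptions for illustration.

```python
import math


def p_correct(theta, a, b):
    """2PL probability that a learner answers an item correctly.

    theta: learner ability.
    a: item discrimination (slope of the curve at theta == b).
    b: item difficulty (ability at which the probability is 0.5).
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# A learner whose ability equals the item difficulty has a 50% chance.
at_difficulty = p_correct(theta=0.0, a=1.2, b=0.0)

# For the same learner, a harder item (larger b) is less likely to be solved.
easy_item = p_correct(theta=1.0, a=1.0, b=0.0)
hard_item = p_correct(theta=1.0, a=1.0, b=2.0)
```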
When students can choose easy, medium, or hard homework problems

NASA Astrophysics Data System (ADS)

Teodorescu, Raluca E.; Seaton, Daniel T.; Cardamone, Caroline N.; Rayyan, Saif; Abbott, Jonathan E.; Barrantes, Analia; Pawl, Andrew; Pritchard, David E.

2012-02-01

We investigate student-chosen, multi-level homework in our Integrated Learning Environment for Mechanics [1] built using the LON-CAPA [2] open-source learning system. Multi-level refers to problems categorized as easy, medium, and hard. Problem levels were determined a priori based on the knowledge needed to solve them [3]. We analyze these problems using three measures: time-per-problem, LON-CAPA difficulty, and item difficulty measured by item response theory. Our analysis of student behavior in this environment suggests that time-per-problem is strongly dependent on problem category, unlike either score-based measure. We also found trends in student choice of problems, overall effort, and efficiency across the student population. Allowing students choice in problem solving seems to improve their motivation; 70% of students worked additional problems for which no credit was given.
Sources of Interactional Problems in a Survey of Racial/Ethnic Discrimination

PubMed Central

Johnson, Timothy P.; Shariff-Marco, Salma; Willis, Gordon; Cho, Young Ik; Breen, Nancy; Gee, Gilbert C.; Krieger, Nancy; Grant, David; Alegria, Margarita; Mays, Vickie M.; Williams, David R.; Landrine, Hope; Liu, Benmei; Reeve, Bryce B.; Takeuchi, David; Ponce, Ninez A.

2014-01-01

Cross-cultural variability in respondent processing of survey questions may bias results from multiethnic samples. We analyzed behavior codes, which identify difficulties in the interactions of respondents and interviewers, from a discrimination module contained within a field test of the 2007 California Health Interview Survey. In all, 553 (English) telephone interviews yielded 13,999 interactions involving 22 items. Multilevel logistic regression modeling revealed that respondent age and several item characteristics (response format, customized questions, length, and first item with new response format), but not race/ethnicity, were associated with interactional problems. These findings suggest that item function within a multi-cultural, albeit English language, survey may be largely influenced by question features, as opposed to respondent characteristics such as race/ethnicity. PMID:26166949

Content validity of the NCCN-FACT ovarian symptom index-18 (NFOSI-18).

PubMed

Jensen, Sally E; Kaiser, Karen; Lacson, Leilani; Schink, Julian; Cella, David

2015-02-01

This study examined the content validity of the NCCN-FACT Ovarian Symptom Index-18 (NFOSI-18), an advanced ovarian cancer symptom index composed of symptoms perceived as most important by clinical experts and women with advanced ovarian cancer. Eighteen women with advanced ovarian cancer completed the NFOSI-18 and participated in cognitive interviews to assess: (a) the understandability of the NFOSI-18; (b) the things patients have in mind when responding to the item, "I am bothered by side effects of treatment;" and (c) the interpretation patients place on items relating to fatigue and lack of energy. Interviews were recorded and transcribed for qualitative analysis. All but 2 (89%) participants indicated that each item was clear and understandable and the same proportion (89%) stated they were "very confident" or "confident" about providing accurate answers to all but one item. When responding to the item, "I am bothered by side effects of treatment," fatigue, nausea, and neuropathy constituted the most frequently mentioned concerns. Among the participants who were asked, eight participants responded that "fatigue" and "lack of energy" were the same concept and nine responded they were different. Participants associated "fatigue" with tiredness and associated "lack of energy" with the inability to perform daily tasks and activities. The findings support the content validity of the NFOSI-18. Item revisions, deletions or additions do not appear warranted. Future research can address the reliability and validity of the NFOSI-18 in clinical research. Copyright © 2014 Elsevier Inc. All rights reserved.
A Psychometric Analysis of the Italian Version of the eHealth Literacy Scale Using Item Response and Classical Test Theory Methods.

PubMed

Diviani, Nicola; Dima, Alexandra Lelia; Schulz, Peter Johannes

2017-04-11

The eHealth Literacy Scale (eHEALS) is a tool to assess consumers' comfort and skills in using information technologies for health. Although evidence exists of reliability and construct validity of the scale, less agreement exists on structural validity. The aim of this study was to validate the Italian version of the eHealth Literacy Scale (I-eHEALS) in a community sample with a focus on its structural validity, by applying psychometric techniques that account for item difficulty. Two Web-based surveys were conducted among a total of 296 people living in the Italian-speaking region of Switzerland (Ticino). After examining the latent variables underlying the observed variables of the Italian scale via principal component analysis (PCA), fit indices for two alternative models were calculated using confirmatory factor analysis (CFA). The scale structure was examined via parametric and nonparametric item response theory (IRT) analyses accounting for differences between items regarding the proportion of answers indicating high ability. Convergent validity was assessed by correlations with theoretically related constructs. CFA showed a suboptimal model fit for both models. IRT analyses confirmed all items measure a single dimension as intended. Reliability and construct validity of the final scale were also confirmed. The contrasting results of factor analysis (FA) and IRT analyses highlight the importance of considering differences in item difficulty when examining health literacy scales. The findings support the reliability and validity of the translated scale and its use for assessing Italian-speaking consumers' eHealth literacy. ©Nicola Diviani, Alexandra Lelia Dima, Peter Johannes Schulz. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 11.04.2017.
Rasch analyses of the Activities-specific Balance Confidence Scale with individuals 50 years and older with lower limb amputations

PubMed Central

Sakakibara, Brodie M.; Miller, William C.; Backman, Catherine L.

2012-01-01

Objective: To explore shortened response formats for use with the Activities-specific Balance Confidence scale and then: 1) evaluate the unidimensionality of the scale; 2) evaluate the item difficulty; 3) evaluate the scale for redundancy and content gaps; and 4) evaluate the item standard error of measurement (SEM) and internal consistency reliability among aging individuals (≥50 years) with a lower-limb amputation living in the community. Design: Secondary analysis of cross-sectional survey and chart review data. Setting: Out-patient amputee clinics, Ontario, Canada. Participants: Four hundred forty-eight community-living adults, at least 50 years old (mean = 68 years), who have used a prosthesis for at least 6 months for a major unilateral lower limb amputation. Three hundred twenty-five (72.5%) were men. Intervention: N/A. Main Outcome Measure(s): Activities-specific Balance Confidence Scale. Results: A 5-option response format outperformed 4- and 6-option formats. Factor analyses confirmed a unidimensional scale. The distance between response options is not the same for all items on the scale, as evidenced by the Partial Credit Model (PCM) having a better fit to the data than the Rating Scale Model. Two items, however, did not fit the PCM within statistical reason. Revising the wording of the two items may resolve the misfit, improve the construct validity, and lower the SEM. Overall, the difficulty of the scale’s items is appropriate for use with aging individuals with lower-limb amputation, and the scale is most reliable (Cronbach's α = 0.94) for use with individuals with moderately low balance confidence levels. Conclusions: The ABC-scale with a simplified 5-option response format is a valid and reliable measure of balance confidence for use with individuals aging with a lower limb amputation. PMID:21704978
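The internal consistency statistic reported above, Cronbach's alpha, can be computed from a respondents-by-items score table as in the following Python sketch. It uses the standard formula α = k/(k − 1) · (1 − Σ item variances / variance of total scores); the function name and the small data set are hypothetical and unrelated to the study's data.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a table of scores.

    rows: one list of item scores per respondent, all of the same length
    (at least two items and two respondents, with non-constant totals).
    """
    k = len(rows[0])
    # Sample variance of each item across respondents.
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses: 5 respondents, 3 items on a 1-5 scale.
responses = [
    [1, 2, 2],
    [2, 3, 3],
    [3, 3, 4],
    [4, 5, 5],
    [5, 5, 4],
]
alpha = cronbach_alpha(responses)
```

Items that rise and fall together across respondents, as in this toy table, push alpha toward 1.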
Development of the Consumer Refrigerator Safety Questionnaire: A Measure of Consumer Perceptions and Practices.

PubMed

Cairnduff, Victoria; Dean, Moira; Koidis, Anastasios

2016-09-01

Food preparation and storage behaviors in the home deviating from the "best practice" food safety recommendations may result in foodborne illnesses. Currently, there are limited tools available to fully evaluate the consumer knowledge, perceptions, and behavior in the area of refrigerator safety. The current study aimed to develop a valid and reliable tool in the form of a questionnaire, the Consumer Refrigerator Safety Questionnaire (CRSQ), for assessing systematically all these aspects. Items relating to refrigerator safety knowledge (n = 17), perceptions (n = 46), and reported behavior (n = 30) were developed and pilot tested by an expert reference group and various consumer groups to assess face and content validity (n = 20), item difficulty and consistency (n = 55), and construct validity (n = 23). The findings showed that the CRSQ has acceptable face and content validity with acceptable levels of item difficulty. Item consistency was observed for 12 of 15 in refrigerator safety knowledge. Further, all 5 of the subscales of consumer perceptions of refrigerator safety practices relating to risk of developing foodborne disease showed acceptable internal consistency (Cronbach's α value > 0.8). Construct validity of the CRSQ was shown to be very good (P = 0.022). The CRSQ exhibited acceptable test-retest reliability at 14 days with the majority of knowledge items (93.3%) and reported behavior items (96.4%) having correlation coefficients of greater than 0.70. Overall, the CRSQ was deemed valid and reliable in assessing refrigerator safety knowledge and behavior; therefore, it has the potential for future use in identifying groups of individuals at increased risk of deviating from recommended refrigerator safety practices, as well as the assessment of refrigerator safety knowledge and behavior for use before and after an intervention.
Is Pornography Use Associated with Sexual Difficulties and Dysfunctions among Younger Heterosexual Men?

PubMed

Landripet, Ivan; Štulhofer, Aleksandar

2015-05-01

Recent epidemiological studies reported high prevalence rates of erectile dysfunction (ED) among younger heterosexual men (≤40). It has been suggested that this "epidemic" of ED is related to increased pornography use. However, empirical evidence for such association is currently lacking. This study analyzes associations between pornography use and sexual health disturbances among younger heterosexual men using four large-scale online samples from three European countries. The analyses were carried out using a 2011 cross-sectional online study of Croatian, Norwegian, and Portuguese men (Study 1; N = 2,737) and a 2014 cross-sectional online study of Croatian men (Study 2; N = 1,211). Chi-square test and multivariate logistic regression were used to explore the associations between pornography use and sexual difficulties. In Study 1, erectile difficulties, inability to reach orgasm, and a lack of sexual desire were measured using the Global Study of Sexual Attitudes and Behavior indicators. In Study 2, ED was measured with the abridged International Index of Erectile Function (IIEF-5). Delayed ejaculation and a decrease of sexual desire were assessed with one-item indicators. In Study 1, only the relationship between pornography use and ED among Croatian men was statistically significant (χ²(2) = 18.76, P < 0.01). The association was small and inconsistent. Compared with infrequent use of pornography, moderate but not high frequency of pornography use increased the odds of reporting ED (adjusted odds ratio = 2.13, P < 0.01). In Study 2, no significant associations between either the frequency or the recent dynamics of pornography use and male sexual dysfunctions were observed. We found little evidence of the association between pornography use and male sexual health disturbances. Contrary to raising public concerns, pornography does not seem to be a significant risk factor for younger men's desire, erectile, or orgasmic difficulties.
© 2015 International Society for Sexual Medicine.
Anger and postcombat mental health: validation of a brief anger measure with U.S. soldiers postdeployed from Iraq and Afghanistan.

PubMed

Novaco, Raymond W; Swanson, Rob D; Gonzalez, Oscar I; Gahm, Gregory A; Reger, Mark D

2012-09-01

The involvement of anger in the psychological adjustment of current war veterans, particularly in conjunction with combat-related posttraumatic stress disorder (PTSD), warrants greater research focus than it has received. The present study concerns a brief anger measure, Dimensions of Anger Reactions (DAR), intended for use in large sample studies and as a screening tool. The concurrent validity, discriminant validity, and incremental validity of the instrument were examined in conjunction with behavioral health data for 3,528 treatment-seeking soldiers who had been in combat in Iraq and Afghanistan. Criterion indices included multiple self-rated measures of psychological distress (including PTSD, depression, and anxiety), functional difficulties (relationships, daily activities, work problems, and substance use), and violence risk. Concurrent validity was established by strong correlations with single anger items on 4 other scales, and discriminant validity was found against anxiety and depression measures. Pertinent to the construct of anger, the DAR was significantly associated with psychosocial functional difficulties and with several indices of harm to self and to others. Hierarchical regression performed on a self/others harm index found incremental validity for the DAR, controlling for age, education, military component, officer rank, combat exposure, PTSD, and depression. The ability to efficiently assess anger in at-risk military populations can provide an indicator of many undesirable behavioral health outcomes. PsycINFO Database Record (c) 2012 APA, all rights reserved.
A Brief Survey of Patients' First Impression after CPAP Titration Predicts Future CPAP Adherence: A Pilot Study

PubMed Central

Balachandran, Jay S.; Yu, Xiaohong; Wroblewski, Kristen; Mokhlesi, Babak

2013-01-01

Background: CPAP adherence patterns are often established very early in the course of therapy. Our objective was to quantify patients' perception of CPAP therapy using a 6-item questionnaire administered in the morning following CPAP titration. We hypothesized that questionnaire responses would independently predict CPAP adherence during the first 30 days of therapy. Methods: We retrospectively reviewed the CPAP perception questionnaires of 403 CPAP-naïve adults who underwent in-laboratory titration and who had daily CPAP adherence data available for the first 30 days of therapy. Responses to the CPAP perception questionnaire were analyzed for their association with mean CPAP adherence and with changes in daily CPAP adherence over 30 days. Results: Patients were aged 52 ± 14 years, 53% were women, 54% were African American, the mean body mass index (BMI) was 36.3 ± 9.1 kg/m², and most patients had moderate-severe OSA. Four of 6 items from the CPAP perception questionnaire (regarding difficulty tolerating CPAP, discomfort with CPAP pressure, likelihood of wearing CPAP, and perceived health benefit) were significantly correlated with mean 30-day CPAP adherence, and a composite score from these 4 questions was found to be internally consistent. Stepwise linear regression modeling demonstrated that 3 variables were significant and independent predictors of reduced mean CPAP adherence: worse score on the 4-item questionnaire, African American race, and non-sleep specialist ordering polysomnogram and CPAP therapy. Furthermore, a worse score on the 4-item CPAP perception questionnaire was consistently associated with decreased mean daily CPAP adherence over the first 30 days of therapy. Conclusions: In this pilot study, responses to a 4-item CPAP perception questionnaire administered to patients immediately following CPAP titration independently predicted mean CPAP adherence during the first 30 days.
Further prospective validation of this questionnaire in different patient populations is warranted. Commentary: A commentary on this article appears in this issue on page 207. Citation: Balachandran JS; Yu X; Wroblewski K; Mokhlesi B. A brief survey of patients' first impression after CPAP titration predicts future CPAP adherence: a pilot study. J Clin Sleep Med 2013;9(3):199-205. PMID:23493772
Higher Education Prices and Price Indexes. 1978 Supplement.

ERIC Educational Resources Information Center

Halstead, D. Kent; Hickson, Lenel

The 1978 supplement to the basic study, Higher Education Prices and Price Indexes, presents higher education price index data for fiscal years 1971 through 1978. A price index series measures the effects of price change, and price change only, on a fixed group of items. The indexes reported here measure price changes from 1967, the reference date.…
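The idea of a price index that reflects "price change only, on a fixed group of items" can be sketched as a fixed-basket calculation. This is a minimal illustration, not the method of the report itself: the function name `fixed_basket_index`, the item names, the prices, and the quantities are all hypothetical, and the report's actual weighting scheme is not described in the abstract.

```python
def fixed_basket_index(base_prices, current_prices, quantities):
    """Cost of a fixed basket at current prices relative to the reference
    period, scaled so that the reference period equals 100.

    All three arguments are dicts keyed by item name. The quantities are
    held constant, so only price changes move the index.
    """
    base_cost = sum(base_prices[item] * qty for item, qty in quantities.items())
    current_cost = sum(current_prices[item] * qty for item, qty in quantities.items())
    return 100.0 * current_cost / base_cost

# Hypothetical basket: quantities fixed at the reference date.
quantities = {"faculty_salary_unit": 1, "library_book": 5}
prices_reference = {"faculty_salary_unit": 100.0, "library_book": 10.0}
prices_later = {"faculty_salary_unit": 120.0, "library_book": 12.0}

print(fixed_basket_index(prices_reference, prices_later, quantities))
```

Because the basket never changes, a rise in the index can only come from higher prices, not from buying more or different items.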
The dialysis orders objective structured clinical examination (OSCE): a formative assessment for nephrology fellows

PubMed Central

Prince, Lisa K; Campbell, Ruth C; Gao, Sam W; Kendrick, Jessica; Lebrun, Christopher J; Little, Dustin J; Mahoney, David L; Maursetter, Laura A; Nee, Robert; Saddler, Mark; Watson, Maura A

2018-01-01

Abstract Background Few quantitative nephrology-specific simulations assess fellow competency. We describe the development and initial validation of a formative objective structured clinical examination (OSCE) assessing fellow competence in ordering acute dialysis. Methods The three test scenarios were acute continuous renal replacement therapy, chronic dialysis initiation in moderate uremia and acute dialysis in end-stage renal disease-associated hyperkalemia. The test committee included five academic nephrologists and four clinically practicing nephrologists outside of academia. There were 49 test items (58 points). A passing score was 46/58 points. No item had median relevance less than ‘important’. The content validity index was 0.91. Ninety-five percent of positive-point items were easy–medium difficulty. Preliminary validation was by 10 board-certified volunteers, not test committee members, a median of 3.5 years from graduation. The mean score was 49 [95% confidence interval (CI) 46–51], κ = 0.68 (95% CI 0.59–0.77), Cronbach’s α = 0.84. Results We subsequently administered the test to 25 fellows. The mean score was 44 (95% CI 43–45); 36% passed the test. Fellows scored significantly less than validators (P < 0.001). Of evidence-based questions, 72% were answered correctly by validators and 54% by fellows (P = 0.018). Fellows and validators scored least well on the acute hyperkalemia question. In self-assessing proficiency, 71% of fellows surveyed agreed or strongly agreed that the OSCE was useful. Conclusions The OSCE may be used to formatively assess fellow proficiency in three common areas of acute dialysis practice. Further validation studies are in progress. PMID:29644053
Three controversies over item disclosure in medical licensure examinations

PubMed Central

Park, Yoon Soo; Yang, Eunbae B.

2015-01-01

In response to views on the public's right to know, there is growing attention to item disclosure – release of items, answer keys, and performance data to the public – in medical licensure examinations and their potential impact on the test's ability to measure competence and select qualified candidates. Recent debates on this issue have sparked legislative action internationally, including in South Korea, with prior discussions among North American countries dating back over three decades. The purpose of this study is to identify and analyze three issues associated with item disclosure in medical licensure examinations – 1) fairness and validity, 2) impact on passing levels, and 3) utility of item disclosure – by synthesizing existing literature in relation to standards in testing. Historically, the controversy over item disclosure has centered on fairness and validity. Proponents of item disclosure stress test takers’ right to know, while opponents argue from a validity perspective. Item disclosure may bias item characteristics, such as difficulty and discrimination, and has consequences on setting passing levels. To date, there has been limited research on the utility of item disclosure for large scale testing. These issues require ongoing and careful consideration. PMID:26374693
[Perceptions on item disclosure for the Korean medical licensing examination].

PubMed

Yang, Eunbae B

2015-09-01

This study analyzed the perceptions of medical students and faculty regarding disclosure of test items on the Korean medical licensing examination. I conducted a survey of medical students from medical colleges and professional medical schools nationwide. Responses were analyzed from 718 participants as well as 69 faculty members who participated in creating the medical licensing examination item sets. Data were analyzed using descriptive statistics and the chi-square test. It is important to maintain test quality and to keep the test items unavailable to the public. There are also concerns among students that disclosure of test items would prompt increasing difficulty of test items (48.3%). Further, few students found it desirable to disclose test items regardless of any considerations (28.5%). The professors, who had experience in designing the test items, also expressed their opposition to test item disclosure (60.9%). It is desirable not to disclose the test items of the Korean medical licensing examination to the public on the condition that students are provided with a sufficient amount of information regarding the examination. This is so that the exam can appropriately identify candidates with the required qualifications.
Feeding Practices in Infancy Associated with Caries Incidence in Early Childhood

PubMed Central

Chaffee, Benjamin W.; Feldens, Carlos Alberto; Rodrigues, Priscila Humbert; Vítolo, Márcia Regina

2015-01-01

Early-life feeding behaviors foretell later dietary habits and health outcomes. Few studies have examined infant dietary patterns and caries occurrence prospectively. OBJECTIVE Assess whether patterns in food and drink consumption before age 12 months are associated with caries incidence by preschool age. METHODS We collected early-life feeding data within a birth cohort from low-income families in Porto Alegre, Brazil. Three dietary indexes were defined, based on refined sugar content and/or previously reported caries associations: a count of sweet foods or drinks introduced <6-months (e.g., candy, cookies, soft drinks), a count of other, non-sweet items introduced <6-months (e.g., beans, meat), and a count of sweet items consumed at 12 months. Incidence of severe early childhood caries (S-ECC) at age 38 months (N=458) was compared by score tertile on each index, adjusted for family, maternal, and child characteristics using regression modeling. RESULTS Introduction to a greater number of presumably cariogenic items in infancy was positively associated with future caries. S-ECC incidence was highest in the uppermost tertile of the “6-month sweet index” (adjusted cumulative incidence ratio, RR, versus lowest tertile: 1.46; 95% CI: 0.97, 2.04) and the uppermost tertile of the “12-month sweet index” (RR: 1.55; 95% CI: 1.17, 2.23). The association was specific for sweet items: caries incidence did not differ by tertile of the “6-month non-sweet index” (RR: 1.00; 95% CI: 0.70, 1.40). Additionally, each one-unit increase on the 6-month and the 12-month sweet indexes, but not the 6-month non-sweet index, was statistically significantly associated with greater S-ECC incidence and associated with more decayed, missing or restored teeth. Results were robust to minor changes in the items constituting each index and persisted if liquid items were excluded. 
CONCLUSIONS Dietary factors observed before age 12-months were associated with S-ECC at preschool age, highlighting a need for timely, multi-level intervention. PMID:25753518
Analysis test of understanding of vectors with the three-parameter logistic model of item response theory and item response curves technique

NASA Astrophysics Data System (ADS)

Rakkapao, Suttida; Prasitpong, Singha; Arayathanitkul, Kwan

2016-12-01

This study investigated the multiple-choice test of understanding of vectors (TUV), by applying item response theory (IRT). The difficulty, discrimination, and guessing parameters of the TUV items were fit with the three-parameter logistic model of IRT, using the PARSCALE program. The TUV ability is an ability parameter, here estimated assuming unidimensionality and local independence. Moreover, all distractors of the TUV were analyzed from item response curves (IRC) that represent simplified IRT. Data were gathered on 2392 science and engineering freshmen, from three universities in Thailand. The results revealed IRT analysis to be useful in assessing the test since its item parameters are independent of the ability parameters. The IRT framework reveals item-level information, and indicates appropriate ability ranges for the test. Moreover, the IRC analysis can be used to assess the effectiveness of the test's distractors. Both IRT and IRC approaches reveal test characteristics beyond those revealed by the classical analysis methods of tests. Test developers can apply these methods to diagnose and evaluate the features of items at various ability levels of test takers.
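For readers unfamiliar with the three-parameter logistic model named in the record above, the following sketch evaluates its item characteristic function. It is an illustration only: the function name and parameter values are hypothetical, no fitting is performed, and the optional scaling constant (often written D = 1.7) that some software applies is assumed to be absorbed into the discrimination parameter.

```python
import math

def p_correct_3pl(theta, a, b, c):
    """Probability of a correct response under the 3PL model.

    theta -- examinee ability
    a     -- item discrimination (slope)
    b     -- item difficulty (location)
    c     -- pseudo-guessing parameter (lower asymptote)
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical item: moderate discrimination, average difficulty,
# and a guessing floor of 0.2 (as for a five-option multiple-choice item).
for theta in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(theta, round(p_correct_3pl(theta, a=1.2, b=0.0, c=0.2), 3))
```

At theta equal to b the probability is halfway between the guessing floor and 1, and very low abilities approach c rather than 0, which is what distinguishes the 3PL from the one- and two-parameter models.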
Annotated bibliography of walnut and related species.

Treesearch

David T. Funk

1966-01-01

Includes 627 items constituting virtually all the technical literature dealing with the ecology, silviculture, and timber products of Juglans. Material is arranged in alphabetical order by author. An index lists the items by subject matters.
Team-based learning on a third-year pediatric clerkship improves NBME subject exam blood disorder scores.

PubMed

Saudek, Kris; Treat, Robert

2015-01-01

Purpose At our institution, speculation amongst medical students and faculty exists as to whether team-based learning (TBL) can improve scores on high-stakes examinations over traditional didactic lectures. Faculty with experience using TBL developed and piloted a required TBL blood disorders (BD) module for third-year medical students on their pediatric clerkship. The purpose of this study is to analyze the BD scores from the NBME subject exams before and after the introduction of the module. Methods We analyzed institutional and national item difficulties for BD items from the NBME pediatrics content area item analysis reports from 2011 to 2014 before (pre) and after (post) the pilot (October 2012). Total scores of 590 NBME subject examination students from examinee performance profiles were analyzed pre/post. t-Tests and Cohen's d effect sizes were used to analyze item difficulties for institutional versus national scores and pre/post comparisons of item difficulties and total scores. Results BD scores for our institution were 0.65 (±0.19) compared to 0.62 (±0.15) nationally (P=0.346; Cohen's d=0.15). The average of post-consecutive BD scores for our students was 0.70 (±0.21) compared to examinees nationally [0.64 (±0.15)] with a significant mean difference (P=0.031; Cohen's d=0.43). The difference in our institution's pre [0.65 (±0.19)] and post [0.70 (±0.21)] BD scores trended higher (P=0.391; Cohen's d=0.27). Institutional BD scores were higher than national BD scores for both pre and post, with an effect size that tripled from pre to post scores. Institutional BD scores increased after the use of the TBL module, while overall exam scores remained steadily above national norms. Conclusions Institutional BD scores were higher than national BD scores for both pre and post, with an effect size that tripled from pre to post scores. Institutional BD scores increased after the use of the TBL module, while overall exam scores remained steadily above national norms.
A partner-related risk behavior index to identify people at elevated risk for sexually transmitted infections.

PubMed

Crosby, Richard; Shrier, Lydia A

2013-04-01

The purpose of this study was to develop and test a sexual-partner-related risk behavior index to identify high-risk individuals most likely to have a sexually transmitted infection (STI). Patients from five STI and adolescent medical clinics in three US cities were recruited (N = 928; M age = 29.2 years). Data were collected using audio-computer-assisted self-interviewing. Of seven sexual-partner-related variables, those that were significantly associated with the outcomes were combined into a partner-related risk behavior index. The dependent variables were laboratory-confirmed infection with Chlamydia trachomatis, Neisseria gonorrhoeae, and/or Trichomonas vaginalis. Nearly one-fifth of the sample (169/928; 18.4%) tested positive for an STI. Three of the seven items were significantly associated with having one or more STIs: sex with a newly released prisoner, sex with a person known or suspected of having an STI, and sexual concurrency. In combined form, this three-item index was significantly associated with STI prevalence (p < .001). In the presence of three covariates (gender, race, and age), those classified as being at-risk by the index were 1.8 times more likely than those not classified as such to test positive for an STI (p < .001). Among individuals at risk for STIs, a three-item index predicted testing positive for one or more of three STIs. This index could be used to prioritize and guide intensified clinic-based counseling for high-risk patients of STI and other clinics.
Evaluating the Effects of Differences in Group Abilities on the Tucker and the Levine Observed-Score Methods for Common-Item Nonequivalent Groups Equating. ACT Research Report Series 2010-1

ERIC Educational Resources Information Center

Chen, Hanwei; Cui, Zhongmin; Zhu, Rongchun; Gao, Xiaohong

2010-01-01

The most critical feature of a common-item nonequivalent groups equating design is that the average score difference between the new and old groups can be accurately decomposed into a group ability difference and a form difficulty difference. Two widely used observed-score linear equating methods, the Tucker and the Levine observed-score methods,…
[Difficulties with the prescription and administration of antibiotics in routine hospital emergency department care: a survey study].

PubMed

Monclús Cols, Ester; Nicolás Ocejo, David; Sánchez Sánchez, Miquel; Ortega Romero, Mar

2015-02-01

To detect the problems hospital emergency room staff have when prescribing and administering antibiotics. A 14-item questionnaire was designed to assess staff members' knowledge of the importance of starting antibiotic treatment promptly, assigning appropriate dosing intervals, adjusting for renal function, and switching to oral therapy. Agreement with each item was expressed on a 5-point Likert scale. Items with a rate of appropriate response of less than 75% were targeted for specific attention. Two hundred questionnaires were distributed to the staff and 150 were returned completed (response rate, 75%). The following items were targeted for attention based on rates of appropriate response of less than 75%: clear medical orders (65%), understanding the implication of early empirical antibiotic therapy on prognosis in serious infections (67%), estimation of the prevalence of renal insufficiency (42%), assumption that a serum creatinine level under 1.6 mg/dL is safe (33%), use of glomerular filtration rate to adjust dose according to renal function (47%), and an understanding of switching from intravenous to oral treatment (60%). This study revealed the difficulties medical and nursing staff have in prescribing and administering antibiotics in a hospital emergency department. The results can facilitate improvements in antibiotic therapy by pinpointing areas to target for specific training interventions or the design of electronic prescribing aids.
Body mass index and physical fitness in Brazilian adolescents.

PubMed

Lopes, Vitor P; Malina, Robert M; Gomez-Campos, Rossana; Cossio-Bolaños, Marco; Arruda, Miguel de; Hobold, Edilson

2018-05-05

Evaluate the relationship between body mass index and physical fitness in a cross-sectional sample of Brazilian youth. Participants were 3849 adolescents (2027 girls) aged 10-17 years. Weight and height were measured; body mass index was calculated. Physical fitness was evaluated with a multistage 20m shuttle run (cardiovascular endurance), standing long jump (power), and push-ups (upper body strength). Participants were grouped by sex into four age groups: 10-11, 12-13, 14-15, and 16-17 years. Sex-specific ANOVA was used to evaluate differences in each physical fitness item among weight status categories by age group. Relationships between body mass index and each physical fitness item were evaluated with quadratic regression models by age group within each sex. The physical fitness of thin and normal youth was, with few exceptions, significantly better than the physical fitness of overweight and obese youth in each age group by sex. On the other hand, physical fitness performances did not consistently differ, on average, between thin and normal weight and between overweight and obese youths. Results of the quadratic regressions indicated a curvilinear (parabolic) relationship between body mass index and each physical fitness item in most age groups. Better performances were attained by adolescents in the mid-range of the body mass index distribution, while performances of youth at the low and high ends of the body mass index distribution were lower. Relationships between the body mass index and physical fitness were generally nonlinear (parabolic) in youth 10-17 years. Copyright © 2018 Sociedade Brasileira de Pediatria. Published by Elsevier Editora Ltda. All rights reserved.
The EvoDevoCI: A Concept Inventory for Gauging Students’ Understanding of Evolutionary Developmental Biology

PubMed Central

Perez, Kathryn E.; Hiatt, Anna; Davis, Gregory K.; Trujillo, Caleb; French, Donald P.; Terry, Mark; Price, Rebecca M.

2013-01-01

The American Association for the Advancement of Science 2011 report Vision and Change in Undergraduate Biology Education encourages the teaching of developmental biology as an important part of teaching evolution. Recently, however, we found that biology majors often lack the developmental knowledge needed to understand evolutionary developmental biology, or “evo-devo.” To assist in efforts to improve evo-devo instruction among undergraduate biology majors, we designed a concept inventory (CI) for evolutionary developmental biology, the EvoDevoCI. The CI measures student understanding of six core evo-devo concepts using four scenarios and 11 multiple-choice items, all inspired by authentic scientific examples. Distracters were designed to represent the common conceptual difficulties students have with each evo-devo concept. The tool was validated by experts and administered at four institutions to 1191 students during preliminary (n = 652) and final (n = 539) field trials. We used student responses to evaluate the readability, difficulty, discriminability, validity, and reliability of the EvoDevoCI, which included items ranging in difficulty from 0.22–0.55 and in discriminability from 0.19–0.38. Such measures suggest the EvoDevoCI is an effective tool for assessing student understanding of evo-devo concepts and the prevalence of associated common conceptual difficulties among both novice and advanced undergraduate biology majors. PMID:24297293
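The difficulty and discriminability figures quoted in the record above are classical item statistics. The sketch below shows one common way such values are computed; it is an assumption, not the EvoDevoCI authors' procedure. Difficulty is taken as the proportion of correct responses, and discrimination as the upper-group minus lower-group proportion correct using the conventional 27% tails. The function names and the sample data are hypothetical.

```python
def item_difficulty(item_scores):
    """Classical difficulty index: proportion of examinees answering correctly.

    item_scores -- list of 0/1 scores for one item, one entry per examinee
    """
    return sum(item_scores) / len(item_scores)

def discrimination_index(item_scores, total_scores, fraction=0.27):
    """Upper-lower discrimination index for one item.

    Examinees are ranked by total test score; the index is the proportion
    correct in the top group minus the proportion correct in the bottom
    group. The 27% group size is a common convention, assumed here.
    """
    n = len(total_scores)
    group_size = max(1, round(n * fraction))
    ranked = sorted(range(n), key=lambda i: total_scores[i])
    lower, upper = ranked[:group_size], ranked[-group_size:]
    p_upper = sum(item_scores[i] for i in upper) / group_size
    p_lower = sum(item_scores[i] for i in lower) / group_size
    return p_upper - p_lower

# Hypothetical data: ten examinees, one item, and their total test scores.
item = [1, 1, 1, 0, 1, 0, 0, 0, 1, 0]
totals = [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
print(item_difficulty(item))
print(discrimination_index(item, totals))
```

On this scale a lower difficulty value means a harder item, and a discrimination value near zero means high and low scorers answer the item correctly at similar rates.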
