science test items: Topics by Science.gov

Sample records for science test items

Student science achievement and the integration of Indigenous knowledge on standardized tests

NASA Astrophysics Data System (ADS)

Dupuis, Juliann; Abrams, Eleanor

2017-09-01

In this article, we examine how American Indian students in Montana performed on standardized state science assessments when a small number of test items based upon traditional science knowledge from a cultural curriculum, "Indian Education for All", were included. Montana is the first state in the US to mandate the use of a culturally relevant curriculum in all schools and to incorporate this curriculum into a portion of the standardized assessment items. This study compares White and American Indian student test scores on these particular test items to determine how White and American Indian students perform on culturally relevant test items compared to traditional standard science test items. The connections between student achievement on adapted culturally relevant science test items and on traditional items bring valuable insights to the fields of science education, research on student assessments, and Indigenous studies.
The development of a science process assessment for fourth-grade students

NASA Astrophysics Data System (ADS)

Smith, Kathleen A.; Welliver, Paul W.

In this study, a multiple-choice test entitled the Science Process Assessment was developed to measure the science process skills of students in grade four. Based on the Recommended Science Competency Continuum for Grades K to 6 for Pennsylvania Schools, this instrument measured the skills of (1) observing, (2) classifying, (3) inferring, (4) predicting, (5) measuring, (6) communicating, (7) using space/time relations, (8) defining operationally, (9) formulating hypotheses, (10) experimenting, (11) recognizing variables, (12) interpreting data, and (13) formulating models. To prepare the instrument, classroom teachers and science educators were invited to participate in two science education workshops designed to develop an item bank of test questions applicable to measuring process skill learning. Participants formed writing teams and generated 65 test items representing the 13 process skills. After a comprehensive group critique of each item, 61 items were identified for inclusion into the Science Process Assessment item bank. To establish content validity, the item bank was submitted to a select panel of science educators for the purpose of judging item acceptability. This analysis yielded 55 acceptable test items and produced the Science Process Assessment, Pilot 1. Pilot 1 was administered to 184 fourth-grade students. Students were given a copy of the test booklet; teachers read each test aloud to the students. Upon completion of this first administration, data from the item analysis yielded a reliability coefficient of 0.73. Subsequently, 40 test items were identified for the Science Process Assessment, Pilot 2. Using the test-retest method, the Science Process Assessment, Pilot 2 (Test 1 and Test 2) was administered to 113 fourth-grade students. Reliability coefficients of 0.80 and 0.82, respectively, were ascertained. The correlation between Test 1 and Test 2 was 0.77. 
The results of this study indicate that (1) the Science Process Assessment, Pilot 2, is a valid and reliable instrument applicable to measuring the science process skills of students in grade four, (2) using educational workshops as a means of developing item banks of test questions is viable and productive in the test development process, and (3) involving classroom teachers and science educators in the test development process is educationally efficient and effective.
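The abstract above reports internal-consistency reliability coefficients and a test-retest correlation without naming the formulas used. As a minimal sketch, assuming dichotomously scored (0/1) multiple-choice items and assuming the Kuder-Richardson 20 coefficient (a common choice that the abstract does not confirm), the two quantities could be computed as follows. The function names and the response data are hypothetical.

```python
from statistics import mean, pvariance


def kr20(responses):
    """Kuder-Richardson 20 for rows of 0/1 item scores (one row per examinee)."""
    k = len(responses[0])                     # number of items
    totals = [sum(row) for row in responses]  # total score per examinee
    var_total = pvariance(totals)             # population variance of total scores
    # Sum of p * (1 - p) over items, where p is the proportion answering correctly.
    pq_sum = 0.0
    for j in range(k):
        p = mean(row[j] for row in responses)
        pq_sum += p * (1 - p)
    return (k / (k - 1)) * (1 - pq_sum / var_total)


def pearson_r(x, y):
    """Pearson correlation, usable as a test-retest coefficient for paired scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical data: five examinees, four items.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
reliability = kr20(responses)
```

With real data, `pearson_r` would take each student's total score on Test 1 and on Test 2 as its two arguments.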
Science Library of Test Items. Volume Two.

ERIC Educational Resources Information Center

New South Wales Dept. of Education, Sydney (Australia).

The second volume of test items in the Science Library of Test Items is intended as a resource to assist teachers in implementing and evaluating science courses in the first 4 years of Australian secondary school. The items were selected from questions submitted to the School Certificate Development Unit by teachers in New South Wales. Only the…
Relevance of Item Analysis in Standardizing an Achievement Test in Teaching of Physical Science in B.Ed Syllabus

ERIC Educational Resources Information Center

Marie, S. Maria Josephine Arokia; Edannur, Sreekala

2015-01-01

This paper focused on the analysis of test items constructed in the paper of teaching Physical Science for B.Ed. class. It involved the analysis of difficulty level and discrimination power of each test item. Item analysis allows selecting or omitting items from the test, but more importantly item analysis is a tool to help the item writer improve…
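The two statistics named in this abstract, difficulty level and discrimination power, are commonly computed in classical item analysis as the proportion of correct responses and as the difference in that proportion between high-scoring and low-scoring groups. The sketch below assumes 0/1 item scores and the conventional upper and lower 27% groups; the paper's own procedure is not given in the abstract, and the function names and data are hypothetical.

```python
def item_difficulty(item_scores):
    """Difficulty index: proportion of examinees answering the item correctly."""
    return sum(item_scores) / len(item_scores)


def discrimination_index(item_scores, total_scores, fraction=0.27):
    """Upper-lower discrimination index.

    Compares the proportion correct in the top and bottom `fraction`
    of examinees, ranked by total test score.
    """
    n = len(total_scores)
    group_size = max(1, round(n * fraction))
    # Examinee indices ordered from lowest to highest total score.
    order = sorted(range(n), key=lambda i: total_scores[i])
    lower, upper = order[:group_size], order[-group_size:]
    p_upper = sum(item_scores[i] for i in upper) / group_size
    p_lower = sum(item_scores[i] for i in lower) / group_size
    return p_upper - p_lower


# Hypothetical data: ten examinees, one item, and their total test scores.
totals = [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
item = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]
difficulty = item_difficulty(item)
discrimination = discrimination_index(item, totals)
```

An item with a discrimination index near zero or below zero does not separate stronger from weaker examinees and is a typical candidate for revision or omission.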
Differential Item Functioning (DIF) among Spanish-Speaking English Language Learners (ELLs) in State Science Tests

NASA Astrophysics Data System (ADS)

Ilich, Maria O.

Psychometricians and test developers evaluate standardized tests for potential bias against groups of test-takers by using differential item functioning (DIF). English language learners (ELLs) are a diverse group of students whose native language is not English. While they are still learning the English language, they must take their standardized tests for their school subjects, including science, in English. In this study, linguistic complexity was examined as a possible source of DIF that may result in test scores that confound science knowledge with a lack of English proficiency among ELLs. Two years of fifth-grade state science tests were analyzed for evidence of DIF using two DIF methods, Simultaneous Item Bias Test (SIBTest) and logistic regression. The tests presented a unique challenge in that the test items were grouped together into testlets, that is, groups of items referring to a scientific scenario to measure knowledge of different science content or skills. Very large samples of 10,256 students in 2006 and 13,571 students in 2007 were examined. Half of each sample was composed of Spanish-speaking ELLs; the balance consisted of native English speakers. The two DIF methods were in agreement about the items that favored non-ELLs and the items that favored ELLs. Logistic regression effect sizes were all negligible, while SIBTest flagged items with low to high DIF. A decrease in socioeconomic status and Spanish-speaking ELL diversity may have led to inconsistent SIBTest effect sizes for items used in both testing years. The DIF results for the testlets suggested that ELLs lacked sufficient opportunity to learn science content. The DIF results further suggest that those constructed response test items requiring the student to draw a conclusion about a scientific investigation or to plan a new investigation tended to favor ELLs.
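The abstract does not give the logistic regression specification. A common formulation of logistic regression DIF, shown here only as an assumed illustration, models the probability that examinee i answers an item correctly from a matching score X_i (for example the total test score), a group indicator G_i (here ELL versus non-ELL), and their interaction:

```latex
\operatorname{logit} P(U_i = 1)
  = \beta_0 + \beta_1 X_i + \beta_2 G_i + \beta_3 X_i G_i
```

Under this formulation, a nonzero \beta_2 indicates uniform DIF (one group is favored at every score level) and a nonzero \beta_3 indicates nonuniform DIF (the advantage changes with score level). Effect sizes are often expressed as the change in a pseudo-R^2 between the model with and without the group terms, which is one way a flagged item can still carry a negligible effect size in very large samples.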
Science or Reading: What Is Being Measured by Standardized Tests?

ERIC Educational Resources Information Center

Visone, Jeremy D.

2010-01-01

This study examined reading issues associated with a standardized science test. Grade 11 students in Connecticut were shown released science test items and asked about the reading issues associated with the items. Findings suggested that students varied in their understanding of the nature of the items and in their ability to read for detail. The…
The Effects of Item Format and Cognitive Domain on Students' Science Performance in TIMSS 2011

NASA Astrophysics Data System (ADS)

Liou, Pey-Yan; Bulut, Okan

2017-12-01

The purpose of this study was to examine eighth-grade students' science performance in terms of two test design components: item format and cognitive domain. The Taiwanese data came from the 2011 administration of the Trends in International Mathematics and Science Study (TIMSS), one of the major international large-scale assessments in science. An item difficulty analysis was first applied to show the proportion of correct responses. A regression-based cumulative link mixed modeling (CLMM) approach was then used to estimate the impact of item format, cognitive domain, and their interaction on the students' science scores. The proportion-correct statistics showed that constructed-response items were more difficult than multiple-choice items, and that items in the reasoning cognitive domain were more difficult than items in the applying and knowing domains. In terms of the CLMM results, students tended to obtain higher scores when answering constructed-response items as well as items in the applying cognitive domain. When the two predictors and the interaction term were included together, the directions and magnitudes of the predictors' effects on student science performance changed substantially. Plausible explanations for the complex nature of the effects of the two test-design predictors on student science performance are discussed. The results provide practical, empirically based evidence to make test developers, teachers, and stakeholders aware of the differential function of item format, cognitive domain, and their interaction in students' science performance.
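The abstract does not state the model equation. As a simplified sketch of what a cumulative link mixed model with these predictors might look like, assume an ordinal item score Y_{ij} for student i on item j, thresholds \tau_k, and a random intercept u_i for each student (the random-effects structure and the single-coefficient coding of cognitive domain are assumptions made for illustration):

```latex
\operatorname{logit} P(Y_{ij} \le k)
  = \tau_k - \bigl(\beta_1\,\mathrm{Format}_j
                 + \beta_2\,\mathrm{Domain}_j
                 + \beta_3\,\mathrm{Format}_j \times \mathrm{Domain}_j
                 + u_i\bigr),
\qquad u_i \sim N(0, \sigma_u^2)
```

In this parameterization a positive coefficient shifts probability toward higher score categories. Because cognitive domain has three levels (knowing, applying, reasoning), a fitted model would use two indicator variables for it rather than the single term shown here.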
Questions and Problems in Science.

ERIC Educational Resources Information Center

Dressel, Paul L.; Nelson, Clarence H.

This folio of test items, contributed by a number of colleges and universities from their course, placement, entrance, or other institutional examinations, was compiled to aid teachers in constructing tests. Only those science courses offered in the first two years of college are represented by the scope of the items. The test items may also serve…
Primary Science Assessment Item Setters' Misconceptions Concerning Biological Science Concepts

ERIC Educational Resources Information Center

Boo, Hong Kwen

2007-01-01

Assessment is an integral and vital part of teaching and learning, providing feedback on progress through the assessment period to both learners and teachers. However, if test items are flawed because of misconceptions held by the question setter, then such test items are invalid as assessment tools. Moreover, such flawed items are also likely to…
The Australian Science Item Bank Project

ERIC Educational Resources Information Center

Kings, Clive B.; Cropley, Murray C.

1974-01-01

Describes the development of a multiple-choice test item bank for grade ten science by the Australian Council for Educational Research. Other item banks are also being developed at the grade ten level in mathematics and social science. (RH)
Assessment in Science Education

NASA Astrophysics Data System (ADS)

Rustaman, N. Y.

2017-09-01

An analytical study focusing on scientific reasoning literacy was conducted to strengthen the emphasis on assessment in science, taking the importance of the nature of science and of assessment as references, together with higher-order thinking and scientific skills in assessing science learning. Drawing on a background in developing science process skills test items, inquiry in its many forms, and scientific and STEM literacy, the author argues that inquiry-based learning should first be implemented among science educators and science learners before STEM education can be developed successfully among science teachers, prospective teachers, and students at all levels. After a thorough study of the work of a number of science researchers, a model of scientific reasoning is proposed, and simple rubrics and some example test items are introduced in this article. As this is only a beginning, further studies will be needed, involving prospective science teachers who are interested in assessment, whether authentic assessment or test item development. A balanced use of alternative assessment rubrics and of valid and reliable (standardized) test items will be needed to accelerate STEM education in Indonesia.
Development of The Science Processes Test.

ERIC Educational Resources Information Center

Ludeman, Robert R.

Presented is a description and copy of a test manual developed to include items in the test on the basis of children's performance; each item correlated highly with performance on an external criterion. The external criterion was the Individual Competency Measures of the elementary science program Science - A Process Approach (SAPA). The test…
Science Library of Test Items. Volume Eight. Mastery Testing Program. Series 3 & 4 Supplements to Introduction and Manual.

ERIC Educational Resources Information Center

New South Wales Dept. of Education, Sydney (Australia).

Continuing a series of short tests aimed at measuring student mastery of specific skills in the natural sciences, this supplementary volume includes teachers' notes, a users' guide and inspection copies of test items 27 to 50. Answer keys and test scoring statistics are provided. The items are designed for grades 7 through 10, and a list of the…
Science Literacy: How do High School Students Solve PISA Test Items?

NASA Astrophysics Data System (ADS)

Wati, F.; Sinaga, P.; Priyandoko, D.

2017-09-01

The Programme for International Student Assessment (PISA) assesses students’ science literacy in real-life contexts and in a wide variety of situations. Its results, however, do not give teachers adequate information for probing students’ science literacy, because the range of material taught in schools depends on the curriculum used. This study investigates how junior high school students in Indonesia solve PISA test items. Data were collected by administering PISA test items from the greenhouse unit to 36 ninth-grade students. Students’ answers were analyzed qualitatively, item by item, according to the competence tested in each problem. The way students answer a problem shows their ability in a particular competence, which is influenced by a number of factors: unfamiliarity with the test construction, low reading performance, difficulty connecting the available information to the question, and limited ability to express ideas effectively and readably. As a remedy, selected PISA test items can be used alongside the topics being taught to familiarize students with science literacy.
Prueba de Ciencia Primer Grado (Science Test for the First Grade). [In Spanish]

ERIC Educational Resources Information Center

Puerto Rico State Dept. of Education, Hato Rey.

This document consists of three parts: (1) a manual for administering the science test to first graders (in Spanish), (2) a copy of the test itself (pictorial), and (3) a list of expected competencies in science for the first three grades (in English). The test consists of 25, four-choice items. For each item, the administrator reads a statement…
Measuring more than we know? An examination of the motivational and situational influences in science achievement

NASA Astrophysics Data System (ADS)

Haydel, Angela Michelle

The purpose of this dissertation was to advance theoretical understanding about fit between the personal resources of individuals and the characteristics of science achievement tasks. Testing continues to be pervasive in schools, yet we know little about how students perceive tests and what they think and feel while they are actually working on test items. This study focused on both the personal (cognitive and motivational) and situational factors that may contribute to individual differences in achievement-related outcomes. 387 eighth grade students first completed a survey including measures of science achievement goals, capability beliefs, efficacy related to multiple-choice items and performance assessments, validity beliefs about multiple-choice items and performance assessments, and other perceptions of these item formats. Students then completed science achievement tests including multiple-choice items and two performance assessments. A sample of students was asked to verbalize both thoughts and feelings as they worked through the test items. These think-alouds were transcribed and coded for evidence of cognitive, metacognitive and motivational engagement. Following each test, all students completed measures of effort, mood, energy level and strategy use during testing. Students reported that performance assessments were more challenging, authentic, interesting and valid than multiple-choice tests. They also believed that comparisons between students were easier using multiple-choice items. Overall, students tried harder, felt better, had higher levels of energy and used more strategies while working on performance assessments. Findings suggested that performance assessments might be more congruent with a mastery achievement goal orientation, while multiple-choice tests might be more congruent with a performance achievement goal orientation. 
A variable-centered analytic approach including regression analyses provided information about how students, on average, who differed in terms of their teachers' ratings of their science ability, achievement goals, capability beliefs and experiences with science achievement tasks perceived, engaged in, and performed on multiple-choice items and performance assessments. Person-centered analyses provided information about the perceptions, engagement and performance of subgroups of individuals who had different motivational characteristics. Generally, students' personal goals and capability beliefs related more strongly to test perceptions, but not performance, while teacher ratings of ability and test-specific beliefs related to performance.
Item Specifications, Science Grade 8. Blue Prints for Testing Minimum Performance Test.

ERIC Educational Resources Information Center

Arkansas State Dept. of Education, Little Rock.

These item specifications were developed as a part of the Arkansas "Minimum Performance Testing Program" (MPT). There is one item specification for each instructional objective included in the MPT. The purpose of an item specification is to provide an overview of the general content and format of test items used to measure an…
Item Specifications, Science Grade 6. Blue Prints for Testing Minimum Performance Test.

ERIC Educational Resources Information Center

Arkansas State Dept. of Education, Little Rock.

These item specifications were developed as a part of the Arkansas "Minimum Performance Testing Program" (MPT). There is one item specification for each instructional objective included in the MPT. The purpose of an item specification is to provide an overview of the general content and format of test items used to measure an…
International Semiotics: Item Difficulty and the Complexity of Science Item Illustrations in the PISA-2009 International Test Comparison

ERIC Educational Resources Information Center

Solano-Flores, Guillermo; Wang, Chao; Shade, Chelsey

2016-01-01

We examined multimodality (the representation of information in multiple semiotic modes) in the context of international test comparisons. Using Program of International Student Assessment (PISA)-2009 data, we examined the correlation of the difficulty of science items and the complexity of their illustrations. We observed statistically…
Science Library of Test Items. Volume Nineteen. A Collection of Multiple Choice Test Items Relating Mainly to Geology.

ERIC Educational Resources Information Center

New South Wales Dept. of Education, Sydney (Australia).

As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items are made available to teachers for the construction of unit tests or term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The test items meet syllabus…

Science Library of Test Items. Volume Seventeen. A Collection of Multiple Choice Test Items Relating Mainly to Biology.

ERIC Educational Resources Information Center

New South Wales Dept. of Education, Sydney (Australia).

As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items are made available to teachers for the construction of unit tests or term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The test items meet syllabus…
Science Library of Test Items. Volume Eighteen. A Collection of Multiple Choice Test Items Relating Mainly to Chemistry.

ERIC Educational Resources Information Center

New South Wales Dept. of Education, Sydney (Australia).

As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items are made available to teachers for the construction of unit tests or term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The test items meet syllabus…
The Use of Illustrations in Large-Scale Science Assessment: A Comparative Study

ERIC Educational Resources Information Center

Wang, Chao

2012-01-01

This dissertation addresses the complexity of test illustrations design across cultures. More specifically, it examines how the characteristics of illustrations used in science test items vary across content areas, assessment programs, and cultural origins. It compares a total of 416 Grade 8 illustrated items from the areas of earth science, life…
Predicting Item Difficulty of Science National Curriculum Tests: The Case of Key Stage 2 Assessments

ERIC Educational Resources Information Center

El Masri, Yasmine H.; Ferrara, Steve; Foltz, Peter W.; Baird, Jo-Anne

2017-01-01

Predicting item difficulty is highly important in education for both teachers and item writers. Despite identifying a large number of explanatory variables, predicting item difficulty remains a challenge in educational assessment with empirical attempts rarely exceeding 25% of variance explained. This paper analyses 216 science items of key stage…
Evaluating Instrument Quality in Science Education: Rasch-based analyses of a Nature of Science test

NASA Astrophysics Data System (ADS)

Neumann, Irene; Neumann, Knut; Nehm, Ross

2011-07-01

Given the central importance of the Nature of Science (NOS) and Scientific Inquiry (SI) in national and international science standards and science learning, empirical support for the theoretical delineation of these constructs is of considerable significance. Furthermore, tests of the effects of varying magnitudes of NOS knowledge on domain-specific science understanding and belief require the application of instruments validated in accordance with AERA, APA, and NCME assessment standards. Our study explores three interrelated aspects of a recently developed NOS instrument: (1) validity and reliability; (2) instrument dimensionality; and (3) item scales, properties, and qualities within the context of Classical Test Theory and Item Response Theory (Rasch modeling). A construct analysis revealed that the instrument did not match published operationalizations of NOS concepts. Rasch analysis of the original instrument, as well as of a reduced item set, indicated that a two-dimensional Rasch model fit significantly better than a one-dimensional model in both cases. Thus, our study revealed that NOS and SI are supported as two separate dimensions, corroborating theoretical distinctions in the literature. To identify items with unacceptable fit values, item quality analyses were used. A Wright Map revealed that few items sufficiently distinguished high performers in the sample and excessive numbers of items were present at the low end of the performance scale. Overall, our study outlines an approach for how Rasch modeling may be used to evaluate and improve Likert-type instruments in science education.
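The Wright Map finding above can be illustrated with the simplest member of the Rasch family. The sketch below uses the dichotomous Rasch model as a deliberate simplification: the instrument in the study is Likert-type, so the authors would have needed a polytomous variant, and the function names and numeric values here are hypothetical.

```python
import math


def rasch_probability(theta, b):
    """Probability of a correct (or endorsed) response in the dichotomous Rasch model.

    theta: person ability in logits; b: item difficulty in logits.
    """
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def item_information(theta, b):
    """Fisher information of a Rasch item at ability theta, equal to p * (1 - p)."""
    p = rasch_probability(theta, b)
    return p * (1.0 - p)


# Hypothetical values: a high performer (theta = 2.0) facing an easy item
# (b = -1.0) and a well-targeted item (b = 2.0).
info_easy = item_information(2.0, -1.0)
info_targeted = item_information(2.0, 2.0)
```

Information peaks at 0.25 when item difficulty equals person ability and falls off as the two move apart, which is why a test crowded with easy items says little about high performers.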
Assessing the Life Science Knowledge of Students and Teachers Represented by the K–8 National Science Standards

PubMed Central

Sadler, Philip M.; Coyle, Harold; Smith, Nancy Cook; Miller, Jaimie; Mintzes, Joel; Tanner, Kimberly; Murray, John

2013-01-01

We report on the development of an item test bank and associated instruments based on the National Research Council (NRC) K–8 life sciences content standards. Utilizing hundreds of studies in the science education research literature on student misconceptions, we constructed 476 unique multiple-choice items that measure the degree to which test takers hold either a misconception or an accepted scientific view. Tested nationally with 30,594 students, following their study of life science, and their 353 teachers, these items reveal a range of interesting results, particularly student difficulties in mastering the NRC standards. Teachers also answered test items and demonstrated a high level of subject matter knowledge reflecting the standards of the grade level at which they teach, but exhibiting few misconceptions of their own. In addition, teachers predicted the difficulty of each item for their students and which of the wrong answers would be the most popular. Teachers were found to generally overestimate their own students’ performance and to have a high level of awareness of the particular misconceptions that their students hold on the K–4 standards, but a low level of awareness of misconceptions related to the 5–8 standards. PMID:24006402
Science Library of Test Items. Volume Twenty. A Collection of Multiple Choice Test Items Relating Mainly to Physics, 1.

ERIC Educational Resources Information Center

New South Wales Dept. of Education, Sydney (Australia).

As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items are made available to teachers for the construction of unit tests or term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The test items meet syllabus…
Science Library of Test Items. Volume Twenty-One. A Collection of Multiple Choice Test Items Relating Mainly to Physics, 2.

ERIC Educational Resources Information Center

New South Wales Dept. of Education, Sydney (Australia).

As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items are made available to teachers for the construction of unit tests or term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The test items meet syllabus…
Science Library of Test Items. Volume Twenty-Two. A Collection of Multiple Choice Test Items Relating Mainly to Skills.

ERIC Educational Resources Information Center

New South Wales Dept. of Education, Sydney (Australia).

As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items are made available to teachers for the construction of unit tests or term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The test items meet syllabus…
A Diagnostic-Remediation Teaching System for Enhancing Elementary Students' Science Listening Comprehension

ERIC Educational Resources Information Center

Lin, Sheau-Wen; Liu, Yu

2017-01-01

The purpose of this study was to explore elementary students' listening comprehension changes using a Web-based teaching system that can diagnose and remediate students' science listening comprehension problems during scientific inquiry. The 3-component system consisted of a 9-item science listening comprehension test, a 37-item diagnostic test,…
Helping Poor Readers Demonstrate Their Science Competence: Item Characteristics Supporting Text-Picture Integration

ERIC Educational Resources Information Center

Saß, Steffani; Schütte, Kerstin

2016-01-01

Solving test items might require abilities in test-takers other than the construct the test was designed to assess. Item and student characteristics such as item format or reading comprehension can impact the test result. This experiment is based on cognitive theories of text and picture comprehension. It examines whether integration aids, which…
Pedagogy of Science Teaching Tests: Formative assessments of science teaching orientations

NASA Astrophysics Data System (ADS)

Cobern, William W.; Schuster, David; Adams, Betty; Skjold, Brandy Ann; Zeynep Muğaloğlu, Ebru; Bentz, Amy; Sparks, Kelly

2014-09-01

A critical aspect of teacher education is gaining pedagogical content knowledge of how to teach science for conceptual understanding. Given the time limitations of college methods courses, it is difficult to touch on more than a fraction of the science topics potentially taught across grades K-8, particularly in the context of relevant pedagogies. This research and development work centers on constructing a formative assessment resource to help expose pre-service teachers to a greater number of science topics within teaching episodes using various modes of instruction. To this end, 100 problem-based, science pedagogy assessment items were developed via expert group discussions and pilot testing. Each item contains a classroom vignette followed by response choices carefully crafted to include four basic pedagogies (didactic direct, active direct, guided inquiry, and open inquiry). The brief but numerous items allow a substantial increase in the number of science topics that pre-service students may consider. The intention is that students and teachers will be able to share and discuss particular responses to individual items, or else record their responses to collections of items and thereby create a snapshot profile of their teaching orientations. Subsets of items were piloted with students in pre-service science methods courses, and the quantitative results of student responses were spread sufficiently to suggest that the items can be effective for their intended purpose.
Practical Implications of Test Dimensionality for Item Response Theory Calibration of the Medical College Admission Test. MCAT Monograph.

ERIC Educational Resources Information Center

Childs, Ruth A.; Oppler, Scott H.

The use of item response theory (IRT) in the Medical College Admission Test (MCAT) testing program has been limited. This study provides a basis for future IRT analyses of the MCAT by exploring the dimensionality of each of the MCAT's three multiple-choice test sections (Verbal Reasoning, Physical Sciences, and Biological Sciences) and the…
Teacher understanding of the nature of science and its impact on student learning about the nature of science in STS/Constructivist classrooms

NASA Astrophysics Data System (ADS)

Lieu, Sang-Chong

In the National Science Education Standards, both STS/Constructivist teaching strategies and student understanding of the nature of science are stressed. If certain teaching practices can achieve both goals at once, many problems will be solved. Such relationships were investigated in this study. Teacher subjects were selected based on two extremes of scores on the Test on Understanding Science (TOUS). The Secondary Teacher Analysis Matrix - Science Version was used to categorize teachers by their use of STS/Constructivist or more traditional strategies, based on their teaching behaviors observed on videotapes. After the teacher subjects were selected, a non-equivalent control group design was adapted for the administration of items from the Views on Science-Technology-Society (VOSTS) to the students of these teachers. Pre- and post-test data were collected using 20 VOSTS items. VOSTS options were categorized into a Congruent/Partially Congruent/Naive format by a panel of six science educators. A special scoring procedure was devised for the VOSTS items to allow the use of inferential statistics. When performance on 17 VOSTS items was studied, more understanding of the nature of science by teachers, the presence of an STS/Constructivist learning environment in the classroom, or a combination of both factors was not found to help students learn more about the nature of science. Explanations for such results are offered. A McNemar test was performed to take a closer look at the 17 VOSTS items individually. The results indicated that students who were taught by STS/Constructivist teachers with high TOUS scores moved toward "congruent" views concerning the nature of science on a number of VOSTS items. Also, students who were taught by more traditional teachers with low TOUS scores moved toward "naive" views on other VOSTS items. The findings support the view that teachers who know more about the nature of science and who practice many of the STS/Constructivist teaching strategies assist students in learning more about the nature of science.
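For readers unfamiliar with the McNemar test mentioned in this abstract, the following is a minimal sketch of its exact (binomial) form for one item. It assumes each student's response is coded as congruent or not congruent at pre-test and at post-test; that coding, the function name, and the counts in the usage note are illustrative assumptions, not details reported by the study.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value.

    b: students who changed from 'not congruent' (pre) to 'congruent' (post)
    c: students who changed from 'congruent' (pre) to 'not congruent' (post)
    Students whose category did not change carry no information for the test.
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of change
    k = min(b, c)
    # Under the null hypothesis, each discordant pair is a fair coin flip.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

With hypothetical counts of 9 students moving toward a congruent view and 1 moving away, `mcnemar_exact(9, 1)` gives a p-value of about 0.021.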
Documentation of Assessment Instrumentation--The NORC/CRESST 12th Grade Science Assessment, Item Databases, and Test Booklets. Project 2.6: Analytic Models To Monitor Status & Progress of Learning & Performance & Their Antecedents: The School Science Assessment Project.

ERIC Educational Resources Information Center

Bock, H. Darrell

The hardware and software system used to create the National Opinion Research Center/Center for Research on Evaluation, Standards, and Student Testing (NORC/CRESST) item databases and test booklets for the 12th-grade science assessment are described. A general description of the capabilities of the system is given, with some specific information…
Rasch analysis for psychometric improvement of science attitude rating scales

NASA Astrophysics Data System (ADS)

Oon, Pey-Tee; Fan, Xitao

2017-04-01

Students' attitude towards science (SAS) is often a subject of investigation in science education research. Rating-scale surveys are commonly used in the study of SAS. The present study illustrates how Rasch analysis can be used to provide psychometric information about SAS rating scales. The analyses were conducted on a 20-item SAS scale used in an existing dataset of the Trends in International Mathematics and Science Study (TIMSS) (2011). Data of all the eighth-grade participants from Hong Kong and Singapore (N = 9942) were retrieved for analyses. Additional insights from Rasch analysis that are not commonly available from conventional test and item analyses were discussed, such as invariance measurement of SAS, unidimensionality of the SAS construct, optimum utilization of SAS rating categories, and the item difficulty hierarchy in the SAS scale. Recommendations on how TIMSS items on the measurement of SAS can be better designed were discussed. The study also highlights the importance of using Rasch estimates for statistical parametric tests (e.g. ANOVA, t-test) that are common in science education research for group comparisons.
Factor analysis for instruments of science learning motivation and its implementation for the chemistry and biology teacher candidates

NASA Astrophysics Data System (ADS)

Prasetya, A. T.; Ridlo, S.

2018-03-01

The purpose of this study is to test an instrument measuring science learning motivation and to compare the science learning motivation of chemistry and biology teacher candidates. The Kuesioner Motivasi Sains (KMS) is an Indonesian adaptation of the Science Motivation Questionnaire II (SMQ II), consisting of 25 items with a 5-point Likert scale. The number of respondents for the Exploratory Factor Analysis (EFA) test was 312. The Kaiser-Meyer-Olkin (KMO), determinant, Bartlett's Sphericity, and Measures of Sampling Adequacy (MSA) tests of the KMS, run with SPSS 20.0 and Lisrel 8.51, indicated eligibility. However, testing of communalities showed that 4 items did not qualify, so these items were discarded. In the second test, all eligibility parameters were met, and the Root Mean Square Error of Approximation (RMSEA), the P-value for the Test of Close Fit (RMSEA < 0.05), and the Goodness of Fit Index (GFI) were good. The new KMS, with 21 valid items and a composite reliability of 0.9329, can be used to test the level of science learning motivation, which includes Intrinsic Motivation, Self-Efficacy, Self-Determination, Grade Motivation and Career Motivation, for students who have mastered the Indonesian language. KMS trials with chemistry and biology teacher candidates found no significant difference in learning motivation between the two groups.
Marine Education Knowledge Inventory.

ERIC Educational Resources Information Center

Hounshell, Paul B.; Hampton, Carolyn

This 35-item, multiple-choice Marine Education Knowledge Inventory was developed for use in upper elementary/middle schools to measure a student's knowledge of marine science. Content of test items is drawn from oceanography, ecology, earth science, navigation, and the biological sciences (focusing on marine animals). Steps in the construction of…
The Relationship of Expert-System Scored Constrained Free-Response Items to Multiple-Choice and Open-Ended Items.

ERIC Educational Resources Information Center

Bennett, Randy Elliot; And Others

1990-01-01

The relationship of an expert-system-scored constrained free-response item type to multiple-choice and free-response items was studied using data for 614 students on the College Board's Advanced Placement Computer Science (APCS) Examination. Implications for testing and the APCS test are discussed. (SLD)

Home Science Library of Test Items. Volume One.

ERIC Educational Resources Information Center

Smith, Jan, Ed.

As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items of value from past tests are made available to teachers for the construction of unit tests, term examinations or as a basis for class discussion. Each collection is reviewed for content validity and reliability. The test…
Language Effects in International Testing: The Case of PISA 2006 Science Items

ERIC Educational Resources Information Center

El Masri, Yasmine H.; Baird, Jo-Anne; Graesser, Art

2016-01-01

We investigate the extent to which language versions (English, French and Arabic) of the same science test are comparable in terms of item difficulty and demands. We argue that language is an inextricable part of the scientific literacy construct, be it intended or not by the examiner. This argument has considerable implications on methodologies…
A Study on Detecting of Differential Item Functioning of PISA 2006 Science Literacy Items in Turkish and American Samples

ERIC Educational Resources Information Center

Çikirikçi Demirtasli, Nükhet; Ulutas, Seher

2015-01-01

Problem Statement: Item bias occurs when individuals from different groups (different gender, cultural background, etc.) have different probabilities of responding correctly to a test item despite having the same skill levels. It is important that tests or items do not have bias in order to ensure the accuracy of decisions taken according to test…
Primary Science Assessment Item Setters' Misconceptions Concerning the State Changes of Water

ERIC Educational Resources Information Center

Boo, Hong Kwen

2006-01-01

Assessment is an integral and vital part of teaching and learning, providing feedback on progress through the assessment period to both learners and teachers. However, if test items are flawed because of misconceptions held by the question setter, then such test items are invalid as assessment tools. Moreover, such flawed items are also likely to…
Assessing the Life Science Knowledge of Students and Teachers Represented by the K-8 National Science Standards

ERIC Educational Resources Information Center

Sadler, Philip M.; Coyle, Harold; Cook Smith, Nancy; Miller, Jaimie; Mintzes, Joel; Tanner, Kimberly; Murray, John

2013-01-01

We report on the development of an item test bank and associated instruments based on the National Research Council (NRC) K-8 life sciences content standards. Utilizing hundreds of studies in the science education research literature on student misconceptions, we constructed 476 unique multiple-choice items that measure the degree to which test…
The Role of Content and Context in PISA Interest Scales: A study of the embedded interest items in the PISA 2006 science assessment

NASA Astrophysics Data System (ADS)

Drechsel, Barbara; Carstensen, Claus; Prenzel, Manfred

2011-01-01

This paper focuses on interest in science as one of the attitudinal aspects of scientific literacy. Large-scale data from the Programme for International Student Assessment (PISA) 2006 are analysed in order to describe student interest more precisely. So far the analyses have provided a general indicator of interest, aggregated over all contexts and contents in the science test. With its innovative approach PISA embeds interest items within the cognitive test unit and its contents and contexts. The main difference from conventional interest measures is that in most questionnaires, a relatively small number of interest items cover broad fields of contents and contexts. The science units represent a number of systematically differentiated scientific contexts and contents. The units' stimulus texts allow for concrete descriptions of relevant content aspects, applications, and contexts. In the analyses, multidimensional item response models are applied in order to disentangle student interest. The results indicate that multidimensional models fit the data. A two-dimensional model separating interest into two different knowledge of science dimensions described in the PISA science framework is further analysed with respect to gender, performance differences, and country. The findings give a comprehensive description of students' interest in science. The paper deals with methodological problems and describes requirements of the test construction for further assessments. The results are discussed with regard to their significance for science education.
Exploring problem solving strategies on multiple-choice science items: Comparing native Spanish-speaking English Language Learners and mainstream monolinguals

NASA Astrophysics Data System (ADS)

Kachchaf, Rachel Rae

The purpose of this study was to compare how English language learners (ELLs) and monolingual English speakers solved multiple-choice items administered with and without a new form of testing accommodation, vignette illustration (VI). By incorporating theories from second language acquisition, bilingualism, and sociolinguistics, this study was able to gain more accurate and comprehensive input into the ways students interacted with items. This mixed methods study used verbal protocols to elicit the thinking processes of 36 native Spanish-speaking ELLs and 36 native English-speaking non-ELLs when solving multiple-choice science items. Results from both qualitative and quantitative analyses show that ELLs used a wider variety of actions oriented to making sense of the items than non-ELLs. In contrast, non-ELLs used more problem solving strategies than ELLs. There were no statistically significant differences in student performance based on the interaction of presence of illustration and linguistic status or the main effect of presence of illustration. However, there were significant differences based on the main effect of linguistic status. An interaction between the characteristics of the students, the items, and the illustrations indicates considerable heterogeneity in the ways in which students from both linguistic groups think about and respond to science test items. The results of this study speak to the need for more research involving ELLs in the process of test development to create test items that do not require ELLs to carry out significantly more actions to make sense of the item than monolingual students.
Factor Structure and Reliability of Test Items for Saudi Teacher Licence Assessment

ERIC Educational Resources Information Center

Alsadaawi, Abdullah Saleh

2017-01-01

The Saudi National Assessment Centre administers the Computer Science Teacher Test for teacher certification. The aim of this study is to explore gender differences in candidates' scores, and investigate dimensionality, reliability, and differential item functioning using confirmatory factor analysis and item response theory. The confirmatory…
Applying Multidimensional Item Response Theory Models in Validating Test Dimensionality: An Example of K-12 Large-Scale Science Assessment

ERIC Educational Resources Information Center

Li, Ying; Jiao, Hong; Lissitz, Robert W.

2012-01-01

This study investigated the application of multidimensional item response theory (IRT) models to validate test structure and dimensionality. Multiple content areas or domains within a single subject often exist in large-scale achievement tests. Such areas or domains may cause multidimensionality or local item dependence, which both violate the…
Scientific literacy: Factor structure and gender differences

NASA Astrophysics Data System (ADS)

Manhart, James Joseph

The purpose of this study was to investigate the factor structure of scientific literacy and to document any gender differences with respect to each factor. Participants included 1139 students (574 females, 565 males) in grades 9 through 12 who were taking a science class at one of four Midwestern high schools. Based on National Science Education Standards, a 100-item multiple-choice test was constructed to assess scientific literacy. Confirmatory factor analysis of item parcels suggested a three-factor model was the best way to explain the data resulting from the administration of this test. The factors were labeled constructs of science, abilities necessary to do scientific inquiry, and social aspects of science. Gender differences with respect to these factors were examined using analysis of variance procedures. Because differential enrollment in science classes could cause gender differences in grades 11 and 12, parallel analyses were conducted on the grades 9 and 10 subsample and the grades 11 and 12 subsample. However, the results of the two analyses were similar. The most consistent gender difference observed was that females performed better than males on the social aspects of science factor. Males tended to perform better than females on the constructs of science factor, although no consistent gender difference was noted for items dealing with life science. With respect to the abilities necessary to do scientific inquiry factor, females tended to perform better than males in grades 9 and 10, while no consistent gender difference was observed in grades 11 and 12. Gender differences were also examined using the Mantel-Haenszel procedure to flag individual items that functioned differently for females and males of the same ability. Twelve items were flagged for grades 9 and 10 (8 in favor of females, 4 in favor of males). Fourteen items were flagged for grades 11 and 12 (7 in favor of females, 7 in favor of males). All of the flagged items exhibited only small to moderate differential item functioning (DIF). Only three items were similarly flagged in both subsamples, one item from each factor.
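The Mantel-Haenszel procedure named in this abstract can be sketched as follows. This is an illustrative outline, not the author's analysis: it assumes examinees are matched into strata by total test score, that every stratum contains at least one examinee, and that one group is arbitrarily designated the reference group. The function names and the counts in the usage note are hypothetical.

```python
import math


def mantel_haenszel_odds_ratio(strata):
    """Common odds ratio for one item across matched score strata.

    Each stratum is a tuple (a, b, c, d):
      a = reference group correct,  b = reference group incorrect,
      c = focal group correct,      d = focal group incorrect.
    Raises ZeroDivisionError if b * c is zero in every stratum.
    """
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den


def mh_d_dif(alpha: float) -> float:
    """Odds ratio re-expressed on the ETS delta scale.

    Values near 0 indicate little DIF; negative values indicate
    that the item favors the reference group.
    """
    return -2.35 * math.log(alpha)
```

For example, with two hypothetical strata `[(30, 10, 20, 20), (40, 10, 30, 20)]` the common odds ratio is 15.5 / 5.5, roughly 2.82, which corresponds to a negative delta-scale value (the item favors the reference group among examinees of the same total score).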
Using Explanatory Item Response Models to Evaluate Complex Scientific Tasks Designed for the Next Generation Science Standards

NASA Astrophysics Data System (ADS)

Chiu, Tina

This dissertation includes three studies that analyze a new set of assessment tasks developed by the Learning Progressions in Middle School Science (LPS) Project. These assessment tasks were designed to measure science content knowledge on the structure of matter domain and scientific argumentation, while following the goals from the Next Generation Science Standards (NGSS). The three studies focus on the evidence available for the success of this design and its implementation, generally labelled as "validity" evidence. I use explanatory item response models (EIRMs) as the overarching framework to investigate these assessment tasks. These models can be useful when gathering validity evidence for assessments as they can help explain student learning and group differences. In the first study, I explore the dimensionality of the LPS assessment by comparing the fit of unidimensional, between-item multidimensional, and Rasch testlet models to see which is most appropriate for this data. By applying multidimensional item response models, multiple relationships can be investigated, and in turn, allow for a more substantive look into the assessment tasks. The second study focuses on person predictors through latent regression and differential item functioning (DIF) models. Latent regression models show the influence of certain person characteristics on item responses, while DIF models test whether one group is differentially affected by specific assessment items, after conditioning on latent ability. Finally, the last study applies the linear logistic test model (LLTM) to investigate whether item features can help explain differences in item difficulties.
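As a rough illustration of the linear logistic test model (LLTM) mentioned in the third study, the sketch below expresses an item's Rasch difficulty as a weighted sum of item-feature effects. The feature coding, the effect values, and the function names are hypothetical and are not taken from the dissertation; any normalization constant is omitted.

```python
import math


def lltm_difficulty(q_row, eta):
    """Item difficulty as a linear combination of feature effects.

    q_row: weights saying how much each feature applies to this item
           (often 0/1 indicators).
    eta:   estimated difficulty contribution of each feature.
    """
    return sum(q * e for q, e in zip(q_row, eta))


def rasch_probability(theta: float, difficulty: float) -> float:
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))
```

Under these assumptions, an item with feature row `[1, 0, 1]` and feature effects `[0.5, 1.0, -0.2]` has a difficulty of about 0.3, and a student whose ability equals that difficulty has a 0.5 probability of answering correctly.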
Detecting Gender Bias Through Test Item Analysis

NASA Astrophysics Data System (ADS)

González-Espada, Wilson J.

2009-03-01

Many physical science and physics instructors might not be trained in pedagogically appropriate test construction methods. This could lead to test items that do not measure what they are intended to measure. A subgroup of these items might show bias against some groups of students. This paper describes how the author became aware of potentially biased items against females in his examinations, which led to the exploration of fundamental issues related to item validity, gender bias, and differential item functioning, or DIF. A brief discussion of DIF in the context of university courses, as well as practical suggestions to detect possible gender-biased items, follows.
Item Estimates under Low-Stakes Conditions: How Should Omits Be Treated?

ERIC Educational Resources Information Center

DeMars, Christine

Using data from a pilot test of science and math from students in 30 high schools, item difficulties were estimated with a one-parameter model (partial-credit model for the multi-point items). Some items were multiple-choice items, and others were constructed-response items (open-ended). Four sets of estimates were obtained: estimates for males…
Attainment of Selected Earth Science Concepts by Texas High School Seniors.

ERIC Educational Resources Information Center

Rollins, Mavis M.; And Others

The purpose of this study was to determine whether high school seniors (N=492) had attained each of five selected earth science concepts and if said attainment was influenced by the number of science courses completed. A 72-item, multiple-choice format test (12 items for each concept) was developed and piloted previous to this study to measure…
Exploring differential item functioning (DIF) with the Rasch model: a comparison of gender differences on eighth grade science items in the United States and Spain.

PubMed

Babiar, Tasha Calvert

2011-01-01

Traditionally, women and minorities have not been fully represented in science and engineering. Numerous studies have attributed these differences to gaps in science achievement as measured by various standardized tests. Rather than describe mean group differences in science achievement across multiple cultures, this study focused on an in-depth item-level analysis across two countries: Spain and the United States. This study investigated eighth-grade gender differences on science items across the two countries. A secondary purpose of the study was to explore the nature of gender differences using the many-faceted Rasch Model as a way to estimate gender DIF. A secondary analysis of data from the Third International Mathematics and Science Study (TIMSS) was used to address three questions: 1) Does gender DIF in science achievement exist? 2) Is there a relationship between gender DIF and characteristics of the science items? 3) Do the relationships between item characteristics and gender DIF in science items replicate across countries? Participants included 7,087 eighth-grade students from the United States and 3,855 students from Spain who participated in TIMSS. The Facets program (Linacre and Wright, 1992) was used to estimate gender DIF. The results of the analysis indicate that the content of the item seemed to be related to gender DIF. The analysis also suggests that there is a relationship between gender DIF and item format. No pattern of gender DIF related to cognitive demand was found. The general pattern of gender DIF was similar across the two countries used in the analysis. The strength of item-level analysis as opposed to group mean difference analysis is that gender differences can be detected at the item level, even when no mean differences can be detected at the group level.
Technology Use in Science Instruction (TUSI): Aligning the Integration of Technology in Science Instruction in Ways Supportive of Science Education Reform

NASA Astrophysics Data System (ADS)

Campbell, Todd; Abd-Hamid, Nor Hashidah

2013-08-01

This study describes the development of an instrument to investigate the extent to which technology is integrated in science instruction in ways aligned to science reform outlined in standards documents. The instrument was developed by: (a) creating items consistent with the five dimensions identified in science education literature, (b) establishing content validity with both national and international content experts, (c) refining the item pool based on content expert feedback, (d) pilot testing the instrument, (e) checking statistical reliability and item analysis, and (f) subsequently refining and finalizing the instrument. The TUSI was administered in a field test across eleven classrooms by three observers, with a total of 33 TUSI ratings completed. The finalized instrument was found to have acceptable inter-rater intraclass correlation reliability estimates. After the final stage of development, the TUSI instrument consisted of 26 items separated into the original five categories, which aligned with the exploratory factor analysis clustering of the items. Additionally, concurrent validity of the TUSI was established with the Reformed Teaching Observation Protocol. Finally, a subsequent set of 17 different classrooms was observed during the spring of 2011, and for the 9 classrooms where technology integration was observed, an overall Cronbach alpha reliability coefficient of 0.913 was found. Based on the analyses completed, the TUSI appears to be a useful instrument for measuring how technology is integrated into science classrooms and is seen as one mechanism for measuring the intersection of technological, pedagogical, and content knowledge in science classrooms.
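The Cronbach alpha coefficient reported in this abstract is computed from item-level scores. The sketch below shows the standard formula under the assumption that every observation has a score on every item and that there are at least two items and two observations; the function name and the data in the usage note are made up for illustration and are not TUSI ratings.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a score matrix.

    scores: list of rows, one per observation (e.g. one classroom rating),
            each row holding that observation's score on each of k items.
    """
    k = len(scores[0])
    # Sample variance of each item across observations.
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of the total (summed) score.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```

For the toy matrix `[[1, 2], [2, 1], [3, 3]]` each item has variance 1 and the totals (3, 3, 6) have variance 3, so alpha is 2 × (1 − 2/3), about 0.67.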
A study in the use of the position of discrepant events in the teaching of science

NASA Astrophysics Data System (ADS)

Frassinelli, John James

The purpose of this study was to determine whether alternative placement of discrepant events would impact affective and cognitive outcomes of ninth-grade physical science students grouped into intact classes and classified as either "high" or "low" in prior academic achievement. Although researchers have found discrepant events to be effective in terms of cognition and recall, their chronological placement within science lessons had not been empirically researched. In this study, discrepant events were presented before, during, and after specific science lessons involving thermodynamics and heat. Discrepant events were withheld from the control group. To measure affective outcomes, the "enjoyment" and "motivation" scales taken from Sandman's (1973) Attitudes Towards Science Inventory (ATSI) were used to index subjects' global feelings about studying science, while a 20-item set of Semantic Differential (SD) scales was employed to determine their attitudes regarding the specific subject matter taught. To measure cognitive outcomes, a 20-item, selected response test was constructed by the researcher, with 6 items intended to assess subjects' knowledge of unit materials, and 14 items designed to query their understanding of unit concepts. Each subject (N = 131) was administered identical forms of each test in both pre- and post-test formats, both before and after the four-week study. Analyzed using a 4 x 2 mixed Analysis of Variance (ANOVA) model, data pertinent to the ATSI suggested neither between- nor within-group differences in subjects' global attitudes about studying science, although data pertinent to the SD scales indicated generally improved attitudes about studying thermodynamics and heat (F(1,122) = 2.759, p < .10). On the cognitive pretests and posttests, significant two-way interactions were observed for the overall test and experimental condition (F(3,121) = 4.068, p < .01), as well as for the overall test and higher prior achievement in physical science (F(1,121) = 7.059, p < .01). As contrasted with negligible changes in the control group's scores, robust mean-difference effect sizes were observed for all three treatment groups, "beginning" (d = 1.24), "during" (d = 0.70), and "after" (d = 0.78), but particularly for the "beginning" group. Subsequent analysis revealed that the apparent advantage of the "beginning" group was largely attributable to a particularly strong showing on the six test items concerned with knowledge (d = 2.06).
Testing and Evaluating Student Success with Laboratory Blocks, A Resource Book for Teachers.

ERIC Educational Resources Information Center

Lee, Addison E.

Guidelines are given for the preparation of test items and tests for BSCS (Biological Sciences Curriculum Study) biology, including examples of items testing four major kinds of abilities: ability to repeat or use information and meanings, ability to apply principles, ability to apply intellectual skills crucial to the understanding of biological…
Science Library of Test Items. Volume Four: Practical Testing Guide.

ERIC Educational Resources Information Center

New South Wales Dept. of Education, Sydney (Australia).

As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, the guide gives a wide range of questions and activities for the manipulation of scientific equipment to allow assessment of students' practical laboratory skills. Instructions are given to make norm-referenced or…
Differential Performance by English Language Learners on an Inquiry-Based Science Assessment

NASA Astrophysics Data System (ADS)

Turkan, Sultan; Liu, Ou Lydia

2012-10-01

The performance of English language learners (ELLs) has been a concern given the rapidly changing demographics in US K-12 education. This study aimed to examine whether students' English language status has an impact on their inquiry science performance. Differential item functioning (DIF) analysis was conducted with regard to ELL status on an inquiry-based science assessment, using a multifaceted Rasch DIF model. A total of 1,396 seventh- and eighth-grade students took the science test, including 313 ELL students. The results showed that, overall, non-ELLs significantly outperformed ELLs. Of the four items that showed DIF, three favored non-ELLs while one favored ELLs. The item that favored ELLs provided a graphic representation of a science concept within a family context. There is some evidence that constructed-response items may help ELLs articulate scientific reasoning using their own words. Assessment developers and teachers should pay attention to the possible interaction between linguistic challenges and science content when designing assessment for and providing instruction to ELLs.

Analysis test of understanding of vectors with the three-parameter logistic model of item response theory and item response curves technique

NASA Astrophysics Data System (ADS)

Rakkapao, Suttida; Prasitpong, Singha; Arayathanitkul, Kwan

2016-12-01

This study investigated the multiple-choice test of understanding of vectors (TUV) by applying item response theory (IRT). The difficulty, discrimination, and guessing parameters of the TUV items were fit with the three-parameter logistic model of IRT, using the PARSCALE program. The TUV ability is an ability parameter, here estimated assuming unidimensionality and local independence. Moreover, all distractors of the TUV were analyzed from item response curves (IRC) that represent simplified IRT. Data were gathered on 2392 science and engineering freshmen from three universities in Thailand. The results revealed IRT analysis to be useful in assessing the test since its item parameters are independent of the ability parameters. The IRT framework reveals item-level information, and indicates appropriate ability ranges for the test. Moreover, the IRC analysis can be used to assess the effectiveness of the test's distractors. Both IRT and IRC approaches reveal test characteristics beyond those revealed by the classical analysis methods of tests. Test developers can apply these methods to diagnose and evaluate the features of items at various ability levels of test takers.
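The three-parameter logistic (3PL) model referred to in this abstract can be written as a one-line function. The sketch below uses the common form without the optional scaling constant (some software multiplies the exponent by 1.7); whether the study's software applied that constant is not stated, so its omission here is an assumption, as are the function name and the parameter values in the usage note.

```python
import math


def p_correct_3pl(theta: float, a: float, b: float, c: float) -> float:
    """Probability of a correct response under the 3PL model.

    theta: examinee ability
    a:     item discrimination (slope)
    b:     item difficulty (location)
    c:     guessing parameter (lower asymptote)
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))
```

When ability equals difficulty, the probability is halfway between the guessing floor and 1; for a hypothetical item with `c = 0.2`, `p_correct_3pl(0.5, 1.2, 0.5, 0.2)` is 0.6. For very low ability the probability approaches `c` rather than 0.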
Sources of difficulty in assessment: example of PISA science items

NASA Astrophysics Data System (ADS)

Le Hebel, Florence; Montpied, Pascale; Tiberghien, Andrée; Fontanieu, Valérie

2017-03-01

The understanding of what makes a question difficult is a crucial concern in assessment. To study the difficulty of test questions, we focus on the case of PISA, which assesses to what degree 15-year-old students have acquired knowledge and skills essential for full participation in society. Our research aim is to identify PISA science item characteristics that could influence the item's proficiency level. It is based on an a-priori item analysis and a statistical analysis. Results show that, of the different characteristics of PISA science items determined in our a-priori analysis, only cognitive complexity and format have explanatory power for an item's proficiency level. The proficiency level cannot be explained by the dependence/independence of the information provided in the unit and/or item introduction and the competence. We conclude that in PISA, it appears possible to anticipate a high proficiency level, that is, low student scores, for items displaying high cognitive complexity. In the case of an item with a middle or low cognitive complexity level, the cognitive complexity level is not sufficient to predict item difficulty. Other characteristics play a crucial role in item difficulty. We discuss anticipating the difficulties in assessment in a broader perspective.
The Relationship between Item Context Characteristics and Student Performance: The Case of the 2006 and 2009 PISA Science Items

ERIC Educational Resources Information Center

Ruiz-Primo, Maria Araceli; Li, Min

2015-01-01

Background: A long-standing premise in test design is that contextualizing test items makes them concrete, less demanding, and more conducive to determining whether students can apply or transfer their knowledge. Purpose: We assert that despite decades of study and experience, much remains to be learned about how to construct effective and fair…
Comparing Science Achievement Constructs: Targeted and Achieved

ERIC Educational Resources Information Center

Ferrara, Steve; Duncan, Teresa

2011-01-01

This article illustrates how test specifications based solely on academic content standards, without attention to other cognitive skills and item response demands, can fall short of their targeted constructs. First, the authors inductively describe the science achievement construct represented by a statewide sixth-grade science proficiency test.…
Consensus on measurement properties and feasibility of performance tests for the exercise and sport sciences: a Delphi study.

PubMed

Robertson, Sam; Kremer, Peter; Aisbett, Brad; Tran, Jacqueline; Cerin, Ester

2017-12-01

Performance tests are used for multiple purposes in exercise and sport science. Ensuring that a test displays an appropriate level of measurement properties for use within a population is important to ensure confidence in test findings. The aim of this study was to obtain subject matter expert consensus on the measurement and feasibility properties that should be considered for performance tests used in the exercise and sport sciences and how these should be defined. This information was used to develop a checklist for broader dissemination. A two-round Delphi study was undertaken including 33 exercise scientists, academics and sport scientists. Participants were asked to rate the importance of a range of measurement properties relevant to performance tests in exercise and sport science. Responses were obtained in binary and Likert-scale formats, with consensus defined as achieving 67% agreement on each question. Consensus was reached on definitions and terminology for all items. Ten level 1 items (those that achieved consensus on all four questions) and nine level 2 items (those achieving consensus on ≥2 questions) were included. Both levels were included in the final checklist. The checklist developed from this study can be used to inform decision-making and test selection for practitioners and researchers in the exercise and sport sciences. This can facilitate knowledge sharing and performance comparisons across sub-disciplines, thereby improving existing field practice and research methodological quality.
Science Library of Test Items. Volume Twenty-Three. Geology (Part One). Free Response Testing Program.

ERIC Educational Resources Information Center

Hopley, Ken; And Others

The first of several planned volumes of Free Response Test Items contains geology questions developed by the Assessment and Evaluation Unit of the New South Wales Department of Education. Two additional geology volumes and biology and chemistry volumes are in preparation. The questions in this volume were written and reviewed by practicing…
Using Data Mining to Predict K-12 Students' Performance on Large-Scale Assessment Items Related to Energy

ERIC Educational Resources Information Center

Liu, Xiufeng; Ruiz, Miguel E.

2008-01-01

This article reports a study on using data mining to predict K-12 students' competence levels on test items related to energy. Data sources are the 1995 Third International Mathematics and Science Study (TIMSS), 1999 TIMSS-Repeat, 2003 Trend in International Mathematics and Science Study (TIMSS), and the National Assessment of Educational…
TEST BOOKLET FOR HIGH SCHOOL BIOLOGY, EXPERIMENTAL MATERIALS FOR USE 1966-1968.

ERIC Educational Resources Information Center

Biological Sciences Curriculum Study, Boulder, CO.

SUPPLEMENTARY TEST QUESTIONS FOR USE BY SECONDARY BIOLOGICAL SCIENCES CURRICULUM STUDY GREEN VERSION BIOLOGY TEACHERS IN THE CONSTRUCTION OF EXAMINATIONS ARE CONTAINED IN THIS EXPERIMENTAL MANUAL. THE ITEMS WERE PREPARED BY THE BIOLOGICAL SCIENCES CURRICULUM STUDY TEST CONSTRUCTION COMMITTEE IN RESPONSE TO TEACHER REQUESTS FOR SHORT-RANGE TESTS.…
Test Item Construction and Validation: Developing a Statewide Assessment for Agricultural Science Education

ERIC Educational Resources Information Center

Rivera, Jennifer E.

2011-01-01

The State of New York Agriculture Science Education secondary program is required to have a certification exam for students to assess their agriculture science education experience as a Regent's requirement towards graduation. This paper focuses on the procedure used to develop and validate two content sub-test questions within a…
Attitude measurement: Judging the emotional intensity of likert-type science attitude statements

NASA Astrophysics Data System (ADS)

Shrigley, Robert L.; Koballa, Thomas R., Jr.

Emotional intensity, that readiness of a teacher to respond favorably or unfavorably toward such psychological objects as science or the teaching of science, is the quality that distinguishes the attitude concept from other related psychological concepts. It would seem, then, that valid attitude statements, if they are to reflect the definition of attitude, would evoke emotional intensity, responses in both a favorable and unfavorable direction by a group of teachers on each item on a science attitude scale. Science educators who design or modify science attitude scales should continue using item-total correlations and other quantitative techniques to test for emotional intensity, but qualitative judgments are necessary, too. In addition, the frequency distribution of data generated by each statement should be examined for skewness and high percentages of neutral responses, both of which can impair the emotional intensity of an item.
Is It Working? Distractor Analysis Results from the Test Of Astronomy STandards (TOAST) Assessment Instrument

NASA Astrophysics Data System (ADS)

Slater, Stephanie

2009-05-01

The Test Of Astronomy STandards (TOAST) assessment instrument is a multiple-choice survey tightly aligned to the consensus learning goals stated by the American Astronomical Society - Chair's Conference on ASTRO 101, the American Association for the Advancement of Science's Project 2061 Benchmarks, and the National Research Council's National Science Education Standards. Researchers from the Cognition in Astronomy, Physics and Earth sciences Research (CAPER) Team at the University of Wyoming's Science and Math Teaching Center (UWYO SMTC) have been conducting a question-by-question distractor analysis procedure to determine the sensitivity and effectiveness of each item. In brief, the frequency of each possible answer choice, known as a foil or distractor on a multiple-choice test, is determined and compared to the existing literature on the teaching and learning of astronomy. In addition to having statistical difficulty and discrimination values, a well-functioning assessment item will show students selecting distractors in relative proportions that match how we expect them to respond based on known misconceptions and reasoning difficulties. In all cases, our distractor analysis suggests that all items are functioning as expected. These results add weight to the validity of the Test Of Astronomy STandards (TOAST) assessment instrument, which is designed to help instructors and researchers measure the impact of course-length duration instructional strategies for undergraduate science survey courses with learning goals tightly aligned to the consensus goals of the astronomy education community.
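The first step of the distractor analysis described here, finding how often each answer choice is selected, might be sketched as follows. This is only an illustration: the function name, the four-option `"ABCD"` format, and the sample responses are hypothetical and do not come from the TOAST study.

```python
from collections import Counter


def choice_proportions(responses, options="ABCD"):
    """Return the proportion of examinees selecting each answer option.

    responses: iterable of single-letter choices for one item.
    options: the answer options offered (assumed four-option format).
    """
    responses = list(responses)
    if not responses:
        raise ValueError("no responses supplied")
    counts = Counter(responses)
    return {opt: counts.get(opt, 0) / len(responses) for opt in options}


# Hypothetical responses from five students to one item.
props = choice_proportions(["A", "A", "B", "C", "D"])
```

The resulting proportions for the incorrect options could then be compared with the pattern that known misconceptions would predict, which is the comparison the abstract describes.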
Science Library of Test Items. Volume Three. Mastery Testing Programme. Introduction and Manual.

ERIC Educational Resources Information Center

New South Wales Dept. of Education, Sydney (Australia).

A set of short tests aimed at measuring student mastery of specific skills in the natural sciences are presented with a description of the mastery program's purposes, development, and methods. Mastery learning, criterion-referenced testing, and the scope of skills to be tested are defined. Each of the multiple choice tests for grades 7 through 10…
Science Library of Test Items. Volume Eleven. Mastery Testing Programme. [Mastery Tests Series 3.] Tests M27-M38.

ERIC Educational Resources Information Center

New South Wales Dept. of Education, Sydney (Australia).

As part of a series of tests to measure mastery of specific skills in the natural sciences, copies of tests 27 through 38 include: (27) reading a grid plan; (28) identifying common invertebrates; (29) characteristics of invertebrates; (30) identifying elements; (31) using scientific notation part I; (32) classifying minerals; (33) predicting the…
Combined Common Person and Common Item Equating of Medical Science Examinations.

ERIC Educational Resources Information Center

Kelley, Paul R.

This equating study of the National Board of Medical Examiners Examinations was a combined common persons and common items equating, using the Rasch model. The 1,000-item test was administered to about 3,000 second-year medical students in seven equal-length subtests: anatomy, physiology, biochemistry, pathology, microbiology, pharmacology, and…
The influence of retrieval practice on memory and comprehension of science texts

NASA Astrophysics Data System (ADS)

Hinze, Scott R.

The testing effect, where retrieval practice aids performance on later tests, may be a powerful tool for improving learning and retention. Three experiments test the potentials and limitations of retrieval practice for retention and comprehension of the content of science texts. Experiment 1 demonstrated that cued recall of paragraphs, but not fill-in-the-blank tests, improved performance on new memory items. Experiment 2 manipulated test expectancy and extended cued recall benefits to inference items. Test expectancies established prior to retrieval altered processing to either be ineffective (when expecting a memory test) or effective (when expecting an inference test). In Experiment 3, the processing task engaged in during retrieval practice was manipulated. Explanation during retrieval practice led to more effective transfer than free recall instructions, especially when participants were compliant and effective in their explanations. These experiments demonstrate that some, but not all, processing during retrieval practice can influence both memory and understanding of science texts.
Language Games and Meaning as Used in Student Encounters with Scientific Literacy Test Items

ERIC Educational Resources Information Center

Serder, Margareta; Jakobsson, Anders

2016-01-01

Previous research in science education has suggested that difficulties among students learning science relate to challenges in framing its discourse. This article examines the role that language plays in a scientific literacy test for which everyday life is an augmented aspect. Video-recorded data was collected in four ninth-grade science classes…
Student Science Achievement and the Integration of Indigenous Knowledge on Standardized Tests

ERIC Educational Resources Information Center

Dupuis, Juliann; Abrams, Eleanor

2017-01-01

In this article, we examine how American Indian students in Montana performed on standardized state science assessments when a small number of test items based upon traditional science knowledge from a cultural curriculum, "Indian Education for All", were included. Montana is the first state in the US to mandate the use of a culturally…
Applying modern psychometric techniques to melodic discrimination testing: Item response theory, computerised adaptive testing, and automatic item generation.

PubMed

Harrison, Peter M C; Collins, Tom; Müllensiefen, Daniel

2017-06-15

Modern psychometric theory provides many useful tools for ability testing, such as item response theory, computerised adaptive testing, and automatic item generation. However, these techniques have yet to be integrated into mainstream psychological practice. This is unfortunate, because modern psychometric techniques can bring many benefits, including sophisticated reliability measures, improved construct validity, avoidance of exposure effects, and improved efficiency. In the present research we therefore use these techniques to develop a new test of a well-studied psychological capacity: melodic discrimination, the ability to detect differences between melodies. We calibrate and validate this test in a series of studies. Studies 1 and 2 respectively calibrate and validate an initial test version, while Studies 3 and 4 calibrate and validate an updated test version incorporating additional easy items. The results support the new test's viability, with evidence for strong reliability and construct validity. We discuss how these modern psychometric techniques may also be profitably applied to other areas of music psychology and psychological science in general.
Fostering a student's skill for analyzing test items through an authentic task

NASA Astrophysics Data System (ADS)

Setiawan, Beni; Sabtiawan, Wahyu Budi

2017-08-01

Analyzing test items is a skill that prospective teachers must master in order to determine the quality of the test questions they have written. The main aim of this research was to describe the effectiveness of an authentic task in fostering students' skill in analyzing test items, covering validity, reliability, item discrimination index, level of difficulty, and distractor functioning. The participants were students in the science education study program of the science and mathematics faculty, Universitas Negeri Surabaya, who were enrolled in an assessment course. The research used a one-group posttest design. As the treatment, the students were given an authentic task in which they developed test items and then analyzed them like a professional assessor, using Microsoft Excel and Anates software. The data were analyzed descriptively: the students' skill data were presented and then related to theories or previous empirical studies. The research showed that the task helped the students acquire these skills. Thirty-one students achieved a perfect score on the analysis, five students achieved 97% mastery, two students achieved 92% mastery, and another two students achieved 89% and 79% mastery. The finding implies that students who are given authentic tasks requiring them to perform like a professional are more likely to have achieved professional skills by the end of the course.
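Two of the item statistics listed in this abstract, level of difficulty and the item discrimination index, can be sketched with the classical upper-lower group method. The abstract does not say which formulas the students used in Excel or Anates, so the approach, the function name, the default 27% group fraction, and the sample data below are all assumptions for illustration.

```python
def item_stats(scores, item, frac=0.27):
    """Classical difficulty and upper-lower discrimination for one item.

    scores: list of rows, one per examinee, each a list of 0/1 item scores.
    item: column index of the item to analyze.
    frac: fraction of examinees placed in each of the upper and lower groups.
    Returns (p, d): proportion correct and discrimination index.
    """
    n = len(scores)
    totals = [sum(row) for row in scores]
    # Examinee indices ordered from lowest to highest total score.
    order = sorted(range(n), key=lambda i: totals[i])
    k = max(1, round(frac * n))
    low, high = order[:k], order[-k:]
    p = sum(row[item] for row in scores) / n
    d = (sum(scores[i][item] for i in high)
         - sum(scores[i][item] for i in low)) / k
    return p, d


# Hypothetical 4-examinee, 3-item score matrix.
data = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
p1, d1 = item_stats(data, item=1, frac=0.5)
```

In this toy data the second item is answered correctly by half of the examinees, all of them in the upper group, so it separates the two groups completely.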
Three approaches to investigating the multidimensional nature of a science assessment

NASA Astrophysics Data System (ADS)

Gokiert, Rebecca Jayne

The purpose of this study was to investigate a multi-method approach for collecting validity evidence about the underlying knowledge and skills measured by a large-scale science assessment. The three approaches included analysis of dimensionality, differential item functioning (DIF), and think-aloud interviews. The specific research questions addressed were: (1) Does the 4-factor model previously found by Hamilton et al. (1995) for the grade 8 sample explain the data? (2) Do the performances of male and female students systematically differ? Are these performance differences captured in the dimensions? (3) Can think-aloud reports aid in the generation of hypotheses about the underlying knowledge and skills that are measured by this test? A confirmatory factor analysis of the 4-factor model revealed good model data fit for both the AB and AC tests. Twenty-four of the 83 AB test items and 16 of the 77 AC test items displayed significant DIF; however, items were found, on average, to favour both males and females equally. There were some systematic differences found across the 4 factors; items favouring males tended to be related to earth and space sciences, stereotypical male related activities, and numerical operations. Conversely, females were found to outperform males on items that required careful reading and attention to detail. Concurrent and retrospective verbal reports (Ericsson & Simon, 1993) were collected from 16 grade 8 students (9 male and 7 female) while they solved 12 DIF items. Four general cognitive processing themes were identified from the student protocols that could be used to explain male and female problem solving. The themes included comprehension (verbal and visual), visualization, background knowledge/experience (school or life), and strategy use. There were systematic differences in cognitive processing between the students who answered the items correctly and the students who answered the items incorrectly; however, this did not always correspond with the statistical gender DIF results. Although the multifaceted approach produced interpretable and meaningful validity evidence about the knowledge and skills, these forms of validity evidence only begin to provide a basic understanding of the underlying construct(s) that are being measured.
Comparing Eighth-Grade Diagnostic Test Results for Korean, Czech, and American Students.

ERIC Educational Resources Information Center

Um, Eunkyoung; Dogan, Enis; Im, Seongah; Tatsuoka, Kimumi; Corter, James E.

Diagnostic analyses were conducted on data from the Third International Mathematics and Science Study second population (TIMSS-R; 1999) from the United States, Korea, and the Czech Republic in terms of test item attributes (i.e., content, processing skills, and item format) and inferred students' knowledge. The Rule Space model (K. Tatsuoka, 1998)…
Biological Science: An Ecological Approach. BSCS Green Version. Teacher's Resource Book and Test Item Bank. Sixth Edition.

ERIC Educational Resources Information Center

Biological Sciences Curriculum Study, Colorado Springs.

This book consists of four sections: (1) "Supplemental Materials"; (2) "Supplemental Investigations"; (3) "Test Item Bank"; and (4) "Blackline Masters." The first section provides additional background material related to selected chapters and investigations in the student book. Included are a periodic table of the elements, genetics problems and…
Science Library of Test Items. Volume Ten. Mastery Testing Programme. [Mastery Tests Series 2.] Tests M14-M26.

ERIC Educational Resources Information Center

New South Wales Dept. of Education, Sydney (Australia).

As part of a series of tests to measure mastery of specific skills in the natural sciences, copies of tests 14 through 26 include: (14) calculating an average; (15) identifying parts of the scientific method; (16) reading a geological map; (17) identifying elements, mixtures and compounds; (18) using Ohm's law in calculation; (19) interpreting…
Who's on First? Gender Differences in Performance on the "SAT"® Test on Critical Reading Items with Sports and Science Content. Research Report. ETS RR-16-26

ERIC Educational Resources Information Center

Chubbuck, Kay; Curley, W. Edward; King, Teresa C.

2016-01-01

This study gathered quantitative and qualitative evidence concerning gender differences in performance by using critical reading material on the "SAT"® test with sports and science content. The fundamental research questions guiding the study were: If sports and science are to be included in a skills test, what kinds of material are…
Measuring Graph Comprehension, Critique, and Construction in Science

NASA Astrophysics Data System (ADS)

Lai, Kevin; Cabrera, Julio; Vitale, Jonathan M.; Madhok, Jacquie; Tinker, Robert; Linn, Marcia C.

2016-08-01

Interpreting and creating graphs plays a critical role in scientific practice. The K-12 Next Generation Science Standards call for students to use graphs for scientific modeling, reasoning, and communication. To measure progress on this dimension, we need valid and reliable measures of graph understanding in science. In this research, we designed items to measure graph comprehension, critique, and construction and developed scoring rubrics based on the knowledge integration (KI) framework. We administered the items to over 460 middle school students. We found that the items formed a coherent scale and had good reliability using both item response theory and classical test theory. The KI scoring rubric showed that most students had difficulty linking graph features to science concepts, especially when asked to critique or construct graphs. In addition, students with limited access to computers as well as those who speak a language other than English at home have less integrated understanding than others. These findings point to the need to increase the integration of graphing into science instruction. The results suggest directions for further research leading to comprehensive assessments of graph understanding.
Assessing Scientific Practices Using Machine-Learning Methods: How Closely Do They Match Clinical Interview Performance?

NASA Astrophysics Data System (ADS)

Beggrow, Elizabeth P.; Ha, Minsu; Nehm, Ross H.; Pearl, Dennis; Boone, William J.

2014-02-01

The landscape of science education is being transformed by the new Framework for Science Education (National Research Council, A framework for K-12 science education: practices, crosscutting concepts, and core ideas. The National Academies Press, Washington, DC, 2012), which emphasizes the centrality of scientific practices (such as explanation, argumentation, and communication) in science teaching, learning, and assessment. A major challenge facing the field of science education is developing assessment tools that are capable of validly and efficiently evaluating these practices. Our study examined the efficacy of a free, open-source machine-learning tool for evaluating the quality of students' written explanations of the causes of evolutionary change relative to three other approaches: (1) human-scored written explanations, (2) a multiple-choice test, and (3) clinical oral interviews. A large sample of undergraduates (n = 104) exposed to varying amounts of evolution content completed all three assessments: a clinical oral interview, a written open-response assessment, and a multiple-choice test. Rasch analysis was used to compute linear person measures and linear item measures on a single logit scale. We found that the multiple-choice test displayed poor person and item fit (mean square outfit >1.3), while both oral interview measures and computer-generated written response measures exhibited acceptable fit (average mean square outfit for interview: person 0.97, item 0.97; computer: person 1.03, item 1.06). Multiple-choice test measures were more weakly associated with interview measures (r = 0.35) than the computer-scored explanation measures (r = 0.63). Overall, Rasch analysis indicated that computer-scored written explanation measures (1) have the strongest correspondence to oral interview measures; (2) are capable of capturing students' normative scientific and naive ideas as accurately as human-scored explanations, and (3) more validly detect understanding than the multiple-choice assessment. These findings demonstrate the great potential of machine-learning tools for assessing key scientific practices highlighted in the new Framework for Science Education.
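The mean square outfit statistic reported in this abstract can be illustrated with a minimal sketch for a single item. It assumes dichotomous (0/1) scoring and that person measures and the item difficulty have already been estimated on the logit scale; the study's actual software and scoring are not described here, and the function names and sample values are hypothetical.

```python
import math


def rasch_p(theta, b):
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def item_outfit(responses, thetas, b):
    """Outfit mean square for one item: the mean squared standardized residual.

    responses: observed 0/1 scores, one per person.
    thetas: person measures in logits (assumed already estimated).
    b: item difficulty in logits (assumed already estimated).
    """
    z_squared = []
    for x, theta in zip(responses, thetas):
        p = rasch_p(theta, b)
        z_squared.append((x - p) ** 2 / (p * (1.0 - p)))
    return sum(z_squared) / len(z_squared)


# Hypothetical data: two persons whose ability equals the item difficulty.
outfit = item_outfit([1, 0], [0.0, 0.0], b=0.0)
```

Values near 1 indicate that responses vary about as much as the model expects, which is the sense in which the abstract treats values above 1.3 as poor fit.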
An experimental study of a museum-based, science PD programme's impact on teachers and their students

NASA Astrophysics Data System (ADS)

Aaron Price, C.; Chiu, A.

2018-06-01

We present results of an experimental study of an urban, museum-based science teacher PD programme. A total of 125 teachers and 1676 of their students in grades 4-8 were tested at the beginning and end of the school year in which the PD programme took place. Teachers and students were assessed on subject content knowledge and attitudes towards science, along with teacher classroom behaviour. Subject content questions were mostly taken from standardised state tests and literature, with an 'Explain:' prompt added to some items. Teachers in the treatment group showed a 7% gain in subject content knowledge over the control group. Students of teachers in the treatment group showed a 4% gain in subject content knowledge over the control group on multiple-choice items and an 11% gain on the constructed response items. There was no overall change in science attitudes of teachers or students over the control groups but we did find differences in teachers' reported self-efficacy and teaching anxiety levels, plus PD teachers reported doing more student-centered science teaching activities than the control group. All teachers came into the PD with high initial excitement, perhaps reflecting its context within an informal learning environment.
Science Library of Test Items. Volume Nine. Mastery Testing Programme. [Mastery Tests Series 1.] Tests M1-M13.

ERIC Educational Resources Information Center

New South Wales Dept. of Education, Sydney (Australia).

As part of a series of tests to measure mastery of specific skills in the natural sciences, copies of the first 13 tests are provided. Skills to be tested include: (1) reading a table; (2) using a biological key; (3) identifying chemical symbols; (4) identifying parts of a human body; (5) reading a line graph; (6) identifying electronic and…
Measuring Science Instructional Practice: A Survey Tool for the Age of NGSS

NASA Astrophysics Data System (ADS)

Hayes, Kathryn N.; Lee, Christine S.; DiStefano, Rachelle; O'Connor, Dawn; Seitz, Jeffery C.

2016-03-01

Ambitious efforts are taking place to implement a new vision for science education in the United States, in both Next Generation Science Standards (NGSS)-adopted states and those states creating their own, often related, standards. In-service and pre-service teacher educators are involved in supporting teacher shifts in practice toward the new standards. With these efforts, it will be important to document shifts in science instruction toward the goals of NGSS and broader science education reform. Survey instruments are often used to capture instructional practices; however, existing surveys primarily measure inquiry based on previous definitions and standards and with a few exceptions, disregard key instructional practices considered outside the scope of inquiry. A comprehensive survey and a clearly defined set of items do not exist. Moreover, items specific to the NGSS Science and Engineering practices have not yet been tested. To address this need, we developed and validated a Science Instructional Practices survey instrument that is appropriate for NGSS and other related science standards. Survey construction was based on a literature review establishing key areas of science instruction, followed by a systematic process for identifying and creating items. Instrument validity and reliability were then tested through a procedure that included cognitive interviews, expert review, exploratory and confirmatory factor analysis (using independent samples), and analysis of criterion validity. Based on these analyses, final subscales include: Instigating an Investigation, Data Collection and Analysis, Critique, Explanation and Argumentation, Modeling, Traditional Instruction, Prior Knowledge, Science Communication, and Discourse.
Item response theory analysis of the Utrecht Work Engagement Scale for Students (UWES-S) using a sample of Japanese university and college students majoring medical science, nursing, and natural science.

PubMed

Tsubakita, Takashi; Shimazaki, Kazuyo; Ito, Hiroshi; Kawazoe, Nobuo

2017-10-30

The Utrecht Work Engagement Scale for Students has been used internationally to assess students' academic engagement, but it has not been analyzed via item response theory. The purpose of this study was to conduct an item response theory analysis of the Japanese version of the Utrecht Work Engagement Scale for Students, translated by the authors. Using a two-parameter model and Samejima's graded response model, difficulty and discrimination parameters were estimated after confirming the factor structure of the scale. The 14 items on the scale were analyzed with a sample of 3214 university and college students majoring in medical science, nursing, or natural science in Japan. The preliminary parameter estimation was conducted with the two-parameter model and indicated that three items should be removed because they had outlier parameters. Final parameter estimation was conducted using the remaining 11 items and indicated that all difficulty and discrimination parameters were acceptable. The test information curve suggested that the scale assesses higher engagement better than average engagement. The estimated parameters provide a basis for future comparative studies. The results also suggested that a 7-point Likert scale is too broad; thus, the scale should be modified to use fewer response categories.
Science Library of Test Items. Volume Twelve. Mastery Testing Programme. [Mastery Tests Series 4.] Tests M39-M50.

ERIC Educational Resources Information Center

New South Wales Dept. of Education, Sydney (Australia).

As part of a series of tests to measure mastery of specific skills in the natural sciences, copies of tests 39 through 50 include: (39) using a code; (40) naming the parts of a microscope; (41) calculating density and predicting flotation; (42) estimating metric length; (43) using SI symbols; (44) using s=vt; (45) applying a novel theory; (46)…
Science Library of Test Items. Volume Thirteen. Mastery Testing Program. [Mastery Tests Series 5.] Tests M51-M65.

ERIC Educational Resources Information Center

New South Wales Dept. of Education, Sydney (Australia).

As part of a series of tests to measure mastery of specific skills in the natural sciences, copies of tests 51 through 65 include: (51) interpreting atomic and mass numbers; (52) extrapolating from a geological map; (53) matching geological sections and maps; (54) identifying parts of the human eye; (55) identifying the functions of parts of a…
Estimation of Two-Parameter Logistic Item Response Curves. Research Report 83-1. Mathematical Sciences Technical Report No. 130.

ERIC Educational Resources Information Center

Tsutakawa, Robert K.

This paper presents a method for estimating certain characteristics of test items which are designed to measure ability, or knowledge, in a particular area. Under the assumption that ability parameters are sampled from a normal distribution, the EM algorithm is used to derive maximum likelihood estimates to item parameters of the two-parameter…
Can business and economics students perform elementary arithmetic?

PubMed

Standing, Lionel G; Sproule, Robert A; Leung, Ambrose

2006-04-01

Business and economics majors (N=146) were tested on the D'Amore Test of Elementary Arithmetic, which employs third-grade test items from 1932. Only 40% of the subjects passed the test by answering 10 out of 10 items correctly. Self-predicted scores were a good predictor of actual scores, but performance was not associated with demographic variables, grades in calculus courses, liking for science or computers, or mathematics anxiety. Scores decreased over the subjects' initial years on campus. The hardest test item, with an error rate of 23%, required the subject to evaluate (36 × 7) + (33 × 7). The results are similar to those of Standing in 2006, despite methodological changes intended to maximize performance.
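For reference, the hardest item quoted in this abstract can be worked in one line by factoring out the common multiplier. The abstract does not say how subjects were expected to solve it, so this is only one possible route.

```latex
(36 \times 7) + (33 \times 7) = (36 + 33) \times 7 = 69 \times 7 = 483
```

Evaluating the two products separately gives the same result: 252 + 231 = 483.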
High school science teacher perceptions of the science proficiency testing as mandated by the State of Ohio Board of Education

NASA Astrophysics Data System (ADS)

Jeffery, Samuel Shird

There is a correlation between the socioeconomic status of secondary schools and scores on the State of Ohio's mandated secondary science proficiency tests. In low-scoring schools, many reasons plausibly explain the low test scores as a result of low socioeconomic status. For example, one reason may be that many students are working late hours after school to help with family finances; parents may simply be too busy providing family income to realize the consequences of the testing program. There are many other personal issues students face that may cause them to score poorly on the test. The perceptions of their teachers regarding the science proficiency test program may be one significant factor. These teacher perceptions are the topic of this study. Two sample groups were established for this study. One group was science teachers from secondary schools scoring 85% or higher on the 12th grade proficiency test in the academic year 1998-1999. The other group consisted of science teachers from secondary schools scoring 35% or less in the same academic year. Each group of teachers responded to a survey instrument that listed several items used to determine teachers' perceptions of the secondary science proficiency test. A significant difference in the teachers' perceptions existed between the two groups. Some of the ranked items on the form include teachers' opinions of: (1) Teaching to the tests; (2) School administrators' priority placed on improving average test scores; (3) Teacher incentive for improving average test scores; (4) Teacher teaching style change as a result of the testing mandate; (5) Teacher knowledge of State curriculum model; (6) Student stress as a result of the high-stakes test; (7) Test cultural bias; (8) The tests in general.
Primary Teachers' Changing Attitudes and Cognition during a Two-Year Science In-Service Programme and Their Effect on Pupils. Research Report

ERIC Educational Resources Information Center

Jarvis, Tina; Pell, Anthony

2004-01-01

Changes in 70 teachers' confidence, attitudes and science understanding were tested before and after a major in-service programme. Attitudes were assessed using a 49-item Likert-scale test that probed attitudes to practical science teaching and in-service training. Multi-choice and open-ended questions measured understanding of electricity;…
A Computer-Based Instrument That Identifies Common Science Misconceptions

ERIC Educational Resources Information Center

Larrabee, Timothy G.; Stein, Mary; Barman, Charles

2006-01-01

This article describes the rationale for and development of a computer-based instrument that helps identify commonly held science misconceptions. The instrument, known as the Science Beliefs Test, is a 47-item instrument that targets topics in chemistry, physics, biology, earth science, and astronomy. The use of an online data collection system…
Assessing the efficacy of the Measure of Understanding of Macroevolution as a valid tool for undergraduate non-science majors

NASA Astrophysics Data System (ADS)

Romine, William Lee; Walter, Emily Marie

2014-11-01

Efficacy of the Measure of Understanding of Macroevolution (MUM) as a measurement tool has been a point of contention among scholars needing a valid measure for knowledge of macroevolution. We explored the structure and construct validity of the MUM using Rasch methodologies in the context of a general education biology course designed with an emphasis on macroevolution content. The Rasch model was utilized to quantify item- and test-level characteristics, including dimensionality, reliability, and fit with the Rasch model. Contrary to previous work, we found that the MUM provides a valid, reliable, and unidimensional scale for measuring knowledge of macroevolution in introductory non-science majors, and that its psychometric behavior does not exhibit large changes across time. While we found that all items provide productive measurement information, several depart substantially from ideal behavior, warranting a collective effort to improve these items. Suggestions for improving the measurement characteristics of the MUM at the item and test levels are put forward and discussed.
Evaluating the Validity of Technology-Enhanced Educational Assessment Items and Tasks: An Empirical Approach to Studying Item Features and Scoring Rubrics

ERIC Educational Resources Information Center

Thomas, Ally

2016-01-01

With the advent of the newly developed Common Core State Standards and the Next Generation Science Standards, innovative assessments, including technology-enhanced items and tasks, will be needed to meet the challenges of developing valid and reliable assessments in a world of computer-based testing. In a recent critique of the next generation…
The development and validation of a test of science critical thinking for fifth graders.

PubMed

Mapeala, Ruslan; Siew, Nyet Moi

2015-01-01

The paper described the development and validation of the Test of Science Critical Thinking (TSCT) to measure three critical thinking skill constructs: comparing and contrasting, sequencing, and identifying cause and effect. The initial TSCT consisted of 55 multiple-choice test items, each of which required participants to select a correct response and the critical thinking skill used for that response. Data were obtained from a purposive sample of 30 fifth graders in a pilot study carried out in a primary school in Sabah, Malaysia. Students underwent teaching and learning sessions for 9 weeks using the Thinking Maps-aided Problem-Based Learning Module before they answered the TSCT. Analyses were conducted to check the difficulty index (p) and discrimination index (d), internal consistency reliability, content validity, and face validity. Analysis of the test-retest reliability data was conducted separately for a group of fifth graders with similar ability. Findings of the pilot study showed that, of the 55 items initially administered, only 30 were selected: those with a relatively good difficulty index (p) ranging from 0.40 to 0.60 and a good discrimination index (d) ranging from 0.20 to 1.00. The Kuder-Richardson reliability value was found to be appropriate and relatively high, at 0.70, 0.73, and 0.92 for identifying cause and effect, sequencing, and comparing and contrasting, respectively. The content validity index obtained from three expert judgments equalled or exceeded 0.95. In addition, test-retest reliability showed good, statistically significant correlations ([Formula: see text]). From the above results, the selected 30-item TSCT was found to have sufficient reliability and validity and would therefore represent a useful tool for measuring critical thinking ability among fifth graders in primary science.
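
The item statistics named in this abstract can be illustrated with a short sketch. It is not the authors' procedure: the function names, the 27% upper/lower split used for the discrimination index, and the use of population variance in the Kuder-Richardson (KR-20) formula are all assumptions made for illustration. Only the selection ranges (p from 0.40 to 0.60, d from 0.20 to 1.00) come from the abstract.

```python
from statistics import pvariance


def item_difficulty(responses, item):
    """Difficulty index p: proportion of examinees who answered the item correctly."""
    return sum(row[item] for row in responses) / len(responses)


def item_discrimination(responses, item, fraction=0.27):
    """Upper-lower discrimination index d (the 27% split is an assumed convention)."""
    ranked = sorted(responses, key=sum, reverse=True)  # highest total scores first
    k = max(1, round(len(ranked) * fraction))          # size of each extreme group
    upper, lower = ranked[:k], ranked[-k:]
    return (sum(r[item] for r in upper) - sum(r[item] for r in lower)) / k


def kr20(responses):
    """Kuder-Richardson formula 20 for dichotomous (0/1) items."""
    n_items = len(responses[0])
    total_var = pvariance([sum(row) for row in responses])
    pq = sum(
        p * (1 - p)
        for p in (item_difficulty(responses, i) for i in range(n_items))
    )
    return (n_items / (n_items - 1)) * (1 - pq / total_var)


def keep_item(p, d):
    """Selection rule using the ranges reported in the abstract."""
    return 0.40 <= p <= 0.60 and 0.20 <= d <= 1.00


# Hypothetical 0/1 score matrix: rows are students, columns are items.
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
for i in range(4):
    p, d = item_difficulty(scores, i), item_discrimination(scores, i)
    print(i, round(p, 2), round(d, 2), keep_item(p, d))
print("KR-20:", round(kr20(scores), 2))
```

With real data, the score matrix would hold one row per examinee and one 0/1 column per item; the tiny matrix above is only there to make the sketch runnable.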

Examining the Effects of Science Manipulatives on Achievement, Attitudes, and Journal Writing of Elementary Science Students.

ERIC Educational Resources Information Center

Frederick, Lynda R.; Shaw, Edward L., Jr.

This study examined several aspects of elementary science students' achievement, attitudes, and journal writing in conjunction with an Alabama Hands-on Activity Science Program (HASP) grant utilizing the Full Option Science System (FOSS) kit. The sample of 56 fourth grade students in two classes was administered a 15-item pretest and post-test.…
ORES - Objective Referenced Evaluation in Science.

ERIC Educational Resources Information Center

Shaw, Terry

Science process skills considered important in making decisions and solving problems include: observing, classifying, measuring, using numbers, using space/time relationships, communicating, predicting, inferring, manipulating variables, making operational definitions, forming hypotheses, interpreting data, and experimenting. This 60-item test,…
Welch Science Process Inventory, Form D. Revised.

ERIC Educational Resources Information Center

Welch, Wayne W.

This inventory, developed for use with the Harvard Project Physics curriculum, consists of 135 two-choice (agree-disagree) items. Items cover perceptions of the role of scientists, the nature and functions of theories, underlying assumptions made by scientists, and other aspects of the scientific process. The test is suitable for high school…
The Generalized Multilevel Facets Model for Longitudinal Data

ERIC Educational Resources Information Center

Hung, Lai-Fa; Wang, Wen-Chung

2012-01-01

In the human sciences, ability tests or psychological inventories are often repeatedly conducted to measure growth. Standard item response models do not take into account possible autocorrelation in longitudinal data. In this study, the authors propose an item response model to account for autocorrelation. The proposed three-level model consists…
Development of the Exam of GeoloGy Standards, EGGS, to Measure Students' Conceptual Understanding of Geology Concepts

NASA Astrophysics Data System (ADS)

Guffey, S. K.; Slater, T. F.; Slater, S. J.

2017-12-01

Discipline-based geoscience education researchers have considerable need for criterion-referenced, easy-to-administer and easy-to-score, conceptual diagnostic surveys for undergraduates taking introductory science survey courses in order for faculty to better be able to monitor the learning impacts of various interactive teaching approaches. To support ongoing discipline-based science education research to improve teaching and learning across the geosciences, this study establishes the reliability and validity of a 28-item, multiple-choice, pre- and post- Exam of GeoloGy Standards, hereafter simply called EGGS. The content knowledge EGGS addresses is based on 11 consensus concepts derived from a systematic, thematic analysis of the overlapping ideas presented in national science education reform documents including the Next Generation Science Standards, the AAAS Benchmarks for Science Literacy, the Earth Science Literacy Principles, and the NRC National Science Education Standards. Using community agreed upon best-practices for creating, field-testing, and iteratively revising modern multiple-choice test items using classical item analysis techniques, EGGS emphasizes natural student language over technical scientific vocabulary, leverages illustrations over students' reading ability, specifically targets students' misconceptions identified in the scholarly literature, and covers the range of topics most geology educators expect general education students to know at the end of their formal science learning experiences. The current version of EGGS is judged to be valid and reliable with college-level, introductory science survey students based on both standard quantitative and qualitative measures, including extensive clinical interviews with targeted students and systematic expert review.
Performance on large-scale science tests: Item attributes that may impact achievement scores

NASA Astrophysics Data System (ADS)

Gordon, Janet Victoria

Significant differences in achievement among ethnic groups persist on the eighth-grade science Washington Assessment of Student Learning (WASL). The WASL measures academic performance in science using both scenario and stand-alone question types. Previous research suggests that presenting target items connected to an authentic context, like scenario question types, can increase science achievement scores especially in underrepresented groups and thus help to close the achievement gap. The purpose of this study was to identify significant differences in performance between gender and ethnic subgroups by question type on the 2005 eighth-grade science WASL. MANOVA and ANOVA were used to examine relationships between gender and ethnic subgroups as independent variables with achievement scores on scenario and stand-alone question types as dependent variables. MANOVA revealed no significant effects for gender, suggesting that the 2005 eighth-grade science WASL was gender neutral. However, there were significant effects for ethnicity. ANOVA revealed significant effects for ethnicity and ethnicity by gender interaction in both question types. Effect sizes were negligible for the ethnicity by gender interaction. Large effect sizes between ethnicities on scenario question types became moderate to small effect sizes on stand-alone question types. This indicates the score advantage the higher performing subgroups had over the lower performing subgroups was not as large on stand-alone question types compared to scenario question types. A further comparison examined performance on multiple-choice items only within both question types. Similar achievement patterns between ethnicities emerged; however, achievement patterns between genders changed in boys' favor. Scenario question types appeared to register differences between ethnic groups to a greater degree than stand-alone question types. 
These differences may be attributable to individual differences in cognition, characteristics of test items themselves and/or opportunities to learn. Suggestions for future research are made.
Establishing Reliability and Validity of the Criterion Referenced Exam of GeoloGy Standards EGGS

NASA Astrophysics Data System (ADS)

Guffey, S. K.; Slater, S. J.; Slater, T. F.; Schleigh, S.; Burrows, A. C.

2016-12-01

Discipline-based geoscience education researchers have considerable need for a criterion-referenced, easy-to-administer and -score conceptual diagnostic survey for undergraduates taking introductory science survey courses in order for faculty to better be able to monitor the learning impacts of various interactive teaching approaches. To support ongoing education research across the geosciences, we are continuing to rigorously and systematically work to firmly establish the reliability and validity of the recently released Exam of GeoloGy Standards, EGGS. In educational testing, reliability refers to the consistency or stability of test scores whereas validity refers to the accuracy of the inferences or interpretations one makes from test scores. There are several types of reliability measures being applied to the iterative refinement of the EGGS survey, including test-retest, alternate form, split-half, internal consistency, and interrater reliability measures. EGGS rates strongly on most measures of reliability. For one, Cronbach's alpha provides a quantitative index indicating the extent to which students are answering items consistently throughout the test, and it measures inter-item correlations. Traditional item analysis methods, including item difficulty and item discrimination, further quantify the degree to which a particular item is reliably assessing students. Validity, on the other hand, is perhaps best described by the word accuracy. For example, content validity is the extent to which a measurement reflects the specific intended domain of the content, stemming from judgments of people who are either experts in the testing of that particular content area or are content experts. 
Perhaps more importantly, face validity is a judgment of how well an instrument reflects the science "at face value" and refers to the extent to which a test appears to measure the targeted scientific domain as viewed by laypersons, examinees, test users, the public, and other invested stakeholders.
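
As an illustration of the internal consistency index mentioned above, the following is a minimal sketch of Cronbach's alpha computed from a students-by-items score matrix. The function name and the example data are hypothetical, and the use of sample variance is an assumption; the abstract does not say how the EGGS team computed the coefficient.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a matrix with one row per student, one column per item."""
    k = len(scores[0])                                   # number of items
    item_vars = [variance(col) for col in zip(*scores)]  # variance of each item
    total_var = variance([sum(row) for row in scores])   # variance of total scores
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical 0/1 responses from five students on four items.
example = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(cronbach_alpha(example), 2))
```

Alpha rises as the items covary more strongly relative to their individual variances, which is the sense in which it reflects consistent answering across the test.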
The Development and Validation of a Behaviorally Defined Interest Instrument for Science.

ERIC Educational Resources Information Center

Butzow, John W., Jr.

A semantic differential (SD) instrument, modified by replacing words or noun phrases with phrases describing a behavior, was administered to male freshmen students. Six items discriminated between two groups, 97 science majors and 161 non-science majors, on three axes, labelled as evaluation, potency, and activity. To test whether the instrument…
Progress Monitoring in Grade 5 Science for Low Achievers

ERIC Educational Resources Information Center

Vannest, Kimberly J.; Parker, Richard; Dyer, Nicole

2011-01-01

This article presents procedures and results from a 2-year project developing science key vocabulary (KV) short tests suitable for progress monitoring Grade 5 science in Texas public schools using computer-generated, -administered, and -scored assessments. KV items included KV definitions and important usages in a multiple-choice cloze format. A…
The Level of Test-Wiseness for the Students of Arts and Science Faculty at Sharourah and Its Relationship with Some Variables

ERIC Educational Resources Information Center

Otoum, Abedalqader; Khalaf, Hisham Bani; Bajbeer, Abedalqader; Hamad, Hassan Bani

2015-01-01

This study aimed to identify the level of use of test-wiseness strategies among students of the Arts and Sciences Faculty at Sharourah and its relationship with some variables. A questionnaire was designed which consisted of (29) items measuring three domains of test-wiseness strategies. It was applied on a sample which consisted of (299) students.…
The Student Actions Coding Sheet (SACS): An instrument for illuminating the shifts toward student-centered science classrooms

NASA Astrophysics Data System (ADS)

Erdogan, Ibrahim; Campbell, Todd; Hashidah Abd-Hamid, Nor

2011-07-01

This study describes the development of an instrument to investigate the extent to which student-centered actions are occurring in science classrooms. The instrument was developed through the following five stages: (1) student action identification, (2) use of both national and international content experts to establish content validity, (3) refinement of the item pool based on reviewer comments, (4) pilot testing of the instrument, and (5) statistical reliability and item analysis leading to additional refinement and finalization of the instrument. In the field test, the instrument consisted of 26 items separated into four categories originally derived from student-centered instruction literature and used by the authors to sort student actions in previous research. The SACS was administered across 22 Grade 6-8 classrooms by 22 groups of observers, with a total of 67 SACS ratings completed. The finalized instrument was found to be internally consistent, with acceptable estimates from inter-rater intraclass correlation reliability coefficients at the p < 0.01 level. After the final stage of development, the SACS instrument consisted of 24 items separated into three categories, which aligned with the factor analysis clustering of the items. Additionally, concurrent validity of the SACS was established with the Reformed Teaching Observation Protocol. Based on the analyses completed, the SACS appears to be a useful instrument for inclusion in comprehensive assessment packages for illuminating the extent to which student-centered actions are occurring in science classrooms.
Science Cafés: Engaging Scientists and Community through Health and Science Dialogue

PubMed Central

Ahmed, Syed; Connors, Emily R.; Kissack, Anne; Franco, Zeno

2014-01-01

Engagement of the community through informal dialogue with researchers and physicians around health and science topics is an important avenue to build understanding and affect health and science literacy. Science Cafés are one model for this casual interchange; however the impact of this approach remains under researched. The Community Engagement Key Function of the Clinical and Translational Science Institute of Southeast Wisconsin hosted a series of Science Cafés in which topics were collaboratively decided upon by input from the community. Topics ranged from Personalized Medicine to Alzheimer's and Dementia to BioMedical Innovation. A systematic evaluation of the impact of Science Cafés on attendees' self-confidence related to five health and scientific literacy concepts showed statistically significant increases across all items (Mean differences between mean retrospective pre-scores and post-scores, one tailed, paired samples t-test, n = 141, p < .0001 for all items). The internal consistency of the five health and scientific literacy items was excellent (n = 126, α = 0.87). Thematic analysis of attendees' comments provides more nuance about positive experience and suggestions for possible improvements. The evaluation provides important evidence supporting the effectiveness of brief, casual dialogue as a way to increase the public's self-rated confidence in health and science topics. PMID:24716626
Science cafés: engaging scientists and community through health and science dialogue.

PubMed

Ahmed, Syed; DeFino, Mia C; Connors, Emily R; Kissack, Anne; Franco, Zeno

2014-06-01

Engagement of the community through informal dialogue with researchers and physicians around health and science topics is an important avenue to build understanding and affect health and science literacy. Science Cafés are one model for this casual interchange; however the impact of this approach remains under researched. The Community Engagement Key Function of the Clinical and Translational Science Institute of Southeast Wisconsin hosted a series of Science Cafés in which topics were collaboratively decided upon by input from the community. Topics ranged from Personalized Medicine to Alzheimer's and Dementia to BioMedical Innovation. A systematic evaluation of the impact of Science Cafés on attendees' self-confidence related to five health and scientific literacy concepts showed statistically significant increases across all items (Mean differences between mean retrospective pre-scores and post-scores, one tailed, paired samples t-test, n=141, p<.0001 for all items). The internal consistency of the five health and scientific literacy items was excellent (n=126, α=0.87). Thematic analysis of attendees' comments provides more nuance about positive experience and suggestions for possible improvements. The evaluation provides important evidence supporting the effectiveness of brief, casual dialogue as a way to increase the public's self-rated confidence in health and science topics. © 2014 Wiley Periodicals, Inc.
GED Items. Volume 5, Numbers 1-6.

ERIC Educational Resources Information Center

GED Items, 1988

1988-01-01

The first of six issues of the GED Items Newsletter published in 1988 contains articles on General Educational Development (GED) mathematics instruction, suggestions for teaching writing, and public relations and marketing. Issue 2 has articles on GED science instruction, GED for Marines, holistic scoring, and a review of the new GED tests.…
Local Development of Subject Area Item Banks.

ERIC Educational Resources Information Center

Ward, Annie W.; Barlow, Gene

1984-01-01

It is feasible for school districts to develop and use subject area tests as reliable as those previously available only from commercial publishers. Three projects in local item development in a large school district are described. The first involved only Algebra 1. The second involved life science and career education at the elementary level; and…
Enhancing the Accessibility of High School Science Tests: A Multistate Experiment

ERIC Educational Resources Information Center

Kettler, Ryan J.; Dickenson, Tammiee S.; Bennett, Heather L.; Morgan, Grant B.; Gilmore, Joanna A.; Beddow, Peter A.; Swaffield, Suzanne; Turner, Linda; Herrera, Bill; Turner, Charlene; Palmer, Porter W.

2012-01-01

This study was inspired by the final regulations for the No Child Left Behind Act (NCLB) indicating that each state has the option to develop a new assessment for students whose disabilities have kept them from obtaining proficiency. Sets of high school science achievement items were enhanced for the new test. A 3-by-2, within subjects,…
Applying Item Response Theory methods to design a learning progression-based science assessment

NASA Astrophysics Data System (ADS)

Chen, Jing

Learning progressions are used to describe how students' understanding of a topic progresses over time and to classify the progress of students into steps or levels. This study applies Item Response Theory (IRT) based methods to investigate how to design learning progression-based science assessments. The research questions of this study are: (1) how to use items in different formats to classify students into levels on the learning progression, (2) how to design a test to give good information about students' progress through the learning progression of a particular construct and (3) what characteristics of test items support their use for assessing students' levels. Data used for this study were collected from 1500 elementary and secondary school students during 2009-2010. The written assessment was developed in several formats such as Constructed Response (CR) items, Ordered Multiple Choice (OMC) items and Multiple True or False (MTF) items. The following are the main findings from this study. The OMC, MTF and CR items might measure different components of the construct. A single construct explained most of the variance in students' performances. However, additional dimensions in terms of item format can explain a certain amount of the variance in student performance. So additional dimensions need to be considered when we want to capture the differences in students' performances on different types of items targeting the understanding of the same underlying progression. Items in each item format need to be improved in certain ways to classify students more accurately into the learning progression levels. This study establishes some general steps that can be followed to design other learning progression-based tests as well. For example, first, the boundaries between levels on the IRT scale can be defined by using the means of the item thresholds across a set of good items. 
Second, items in multiple formats can be selected to achieve the information criterion at all the defined boundaries. This ensures the accuracy of the classification. Third, when item threshold parameters vary a bit, the scoring rubrics and the items need to be reviewed to make the threshold parameters similar across items. This is because one important design criterion of the learning progression-based items is that ideally, a student should be at the same level across items, which means that the item threshold parameters (d1, d2 and d3) should be similar across items. To design a learning progression-based science assessment, we need to understand whether the assessment measures a single construct or several constructs and how items are associated with the constructs being measured. Results from dimension analyses indicate that items of different carbon transforming processes measure different aspects of the carbon cycle construct. However, items of different practices assess the same construct. In general, there are high correlations among different processes or practices. It is not clear whether the strong correlations are due to the inherent links among these process/practice dimensions or due to the fact that the student sample does not show much variation in these process/practice dimensions. Future data are needed to examine the dimensionalities in terms of process/practice in detail. Finally, based on item characteristics analysis, recommendations are made to write more discriminative CR items and better OMC and MTF options. Item writers can follow these recommendations to write better learning progression-based items.
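
The first design step described in this abstract, defining level boundaries as the means of item thresholds, can be sketched as follows. This is an illustration under stated assumptions, not the author's implementation: the function names and the threshold values are hypothetical, each item is assumed to have three ordered thresholds (d1, d2, d3) separating four levels, and a student's ability estimate is assumed to be on the same IRT scale as the thresholds.

```python
from bisect import bisect_right
from statistics import mean


def level_boundaries(item_thresholds):
    """Average each threshold (d1, d2, d3, ...) across items to get level boundaries."""
    return [mean(column) for column in zip(*item_thresholds)]


def classify(theta, boundaries):
    """Return the level (1-based) for ability theta; boundaries must be ascending."""
    return bisect_right(boundaries, theta) + 1


# Hypothetical threshold estimates (d1, d2, d3) for two items.
thresholds = [
    [-1.0, 0.0, 1.0],
    [-0.8, 0.2, 1.2],
]
bounds = level_boundaries(thresholds)
print(bounds)
print(classify(0.5, bounds))
```

The closer the thresholds are across items, the better the averaged boundaries represent every item, which is why the abstract recommends reviewing items whose thresholds diverge.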
Middle School Students' Conceptual Learning from the Implementation of a New NSF Supported Curriculum: Interactions in Physical Science[TM]

ERIC Educational Resources Information Center

Eick, Charles J.; Dias, Michael; Smith, Nancy R. Cook

2009-01-01

A new National Science Foundation supported curriculum, Interactions in Physical Science[TM], was evaluated on students' conceptual change in the twelve concept areas of the national physical science content standard (B) for grades 5-8. Eighth grade students (N = 66) were evaluated pre and post on a 31-item multiple-choice test of conceptual…
Changes in Participants’ Scientific Attitudes and Epistemological Beliefs During an Astronomical Citizen Science Project

NASA Astrophysics Data System (ADS)

Price, Aaron

2012-01-01

Citizen science projects offer opportunities for non-scientists to take part in scientific research. While their contribution to scientific data collection has been well documented, there is limited research on changes that may occur to their volunteer participants. In this study, we investigated (1) how volunteers’ attitudes towards science and beliefs in the nature of science changed over six months of participation in an astronomy-themed citizen science project and (2) how the level of project participation accounted for these changes. To measure attitudes towards science and beliefs about the nature of science, identical pre- and post-tests were used. We used pre-test data from 1,375 participants and post-test data collected from 175 participants. Responses were analyzed using the Rasch Rating Scale Model. The pre-test sample was used to create the Rasch scales for the two scientific literacy measures. For the pre/post-test comparisons, data from those who completed both tests were used. Fourteen participants who took the pre/post-tests were interviewed. Results show that overall scientific attitudes did not change, p = .812. However, we did find significant changes related towards two scientific attitude items about science in the news (positive change; p < .001, p < .05) and one related to scientific self-efficacy (negative change, p < .05). These changes were related to the participants’ social activity in the project. Beliefs in the nature of science significantly increased between the pre- and post-tests, p = .014. Relative positioning of individual items on the belief scale did not change much and this change was not related to any of our recorded project activity variables. The interviews suggest that the social aspect of the project is important to participants and the change in self-efficacy is not due to a lowering of esteem but rather a greater appreciation for what they have yet to learn.
Testing primary-school children's understanding of the nature of science.

PubMed

Koerber, Susanne; Osterhaus, Christopher; Sodian, Beate

2015-03-01

Understanding the nature of science (NOS) is a critical aspect of scientific reasoning, yet few studies have investigated its developmental beginnings and initial structure. One contributing reason is the lack of an adequate instrument. Two studies assessed NOS understanding among third graders using a multiple-select (MS) paper-and-pencil test. Study 1 investigated the validity of the MS test by presenting the items to 68 third graders (9-year-olds) and subsequently interviewing them on their underlying NOS conception of the items. All items were significantly related between formats, indicating that the test was valid. Study 2 applied the same instrument to a larger sample of 243 third graders, and their performance was compared to a multiple-choice (MC) version of the test. Although the MC format inflated the guessing probability, there was a significant relation between the two formats. In summary, the MS format was a valid method revealing third graders' NOS understanding, thereby representing an economical test instrument. A latent class analysis identified three groups of children with expertise in qualitatively different aspects of NOS, suggesting that there is not a single common starting point for the development of NOS understanding; instead, multiple developmental pathways may exist. © 2014 The British Psychological Society.

Measuring Constructs in Family Science: How Can Item Response Theory Improve Precision and Validity?

PubMed Central

Gordon, Rachel A.

2014-01-01

This article provides family scientists with an understanding of contemporary measurement perspectives and the ways in which item response theory (IRT) can be used to develop measures with desired evidence of precision and validity for research uses. The article offers a nontechnical introduction to some key features of IRT, including its orientation toward locating items along an underlying dimension and toward estimating precision of measurement for persons with different levels of that same construct. It also offers a didactic example of how the approach can be used to refine conceptualization and operationalization of constructs in the family sciences, using data from the National Longitudinal Survey of Youth 1979 (n = 2,732). Three basic models are considered: (a) the Rasch and (b) two-parameter logistic models for dichotomous items and (c) the Rating Scale Model for multicategory items. Throughout, the author highlights the potential for researchers to elevate measurement to a level on par with theorizing and testing about relationships among constructs. PMID:25663714
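
For readers unfamiliar with the dichotomous models named in this abstract, the sketch below shows their item response functions in conventional notation. The symbols theta (person location), b (item location or difficulty), and a (item discrimination) are standard IRT notation assumed here, not taken from the article, and the function names are hypothetical.

```python
import math


def two_pl(theta, a, b):
    """Two-parameter logistic model: probability of endorsing or answering correctly."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def rasch(theta, b):
    """Rasch model: the 2PL with discrimination fixed at 1 for every item."""
    return two_pl(theta, 1.0, b)


# A person located exactly at the item's difficulty has a 50% chance of success.
print(rasch(0.0, 0.0))
# One logit above the item's location raises that probability.
print(round(rasch(1.0, 0.0), 3))
```

Both models place persons and items on the same underlying dimension, which is the "locating items along an underlying dimension" orientation the abstract refers to.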
The development and validation of the Self-Efficacy Beliefs about Equitable Science Teaching and learning instrument for prospective elementary teachers

NASA Astrophysics Data System (ADS)

Ritter, Jennifer M.

1999-12-01

The purpose of this study was to develop, validate and establish the reliability of an instrument to assess the self-efficacy beliefs of prospective elementary teachers with regard to science teaching and learning for diverse learners. The study used Bandura's theoretical framework, in that the instrument would use the self-efficacy construct to explore the beliefs of prospective elementary science teachers with regard to science teaching and learning for diverse learners: specifically the two dimensions of self-efficacy beliefs defined by Bandura (1977): personal self-efficacy and outcome expectancy. A seven-step plan was designed and followed in the process of developing the instrument, which was titled the Self-Efficacy Beliefs about Equitable Science Teaching or SEBEST. Diverse learners as recognized by Science for All Americans (1989) are "those who in the past have largely been bypassed in science and mathematics education: ethnic and language minorities and girls" (p. xviii). That definition was extended by this researcher to include children from low socioeconomic backgrounds based on the research by Gomez and Tabachnick (1992). The SEBEST was administered to 226 prospective elementary teachers at The Pennsylvania State University. Using the results from factor analyses, Coefficient Alpha, and Chi-Square, a 34-item instrument was found to achieve the greatest balance across the construct validity, reliability and item balance with the content matrix. The 34-item SEBEST was found to load purely on four factors across the content matrix, thus providing evidence of construct validity. The Coefficient Alpha reliability for the 34-item SEBEST was .90, with .82 for the PSE sub-scale and .78 for the OE sub-scale. A Chi-Square test (χ² = 2.71, df = 7, p > .05) was used to confirm that the 34 items were balanced across the Personal Self-Efficacy/Outcome Expectancy and Ethnicity/Language Minority/Gender/Socioeconomic Status dimensions of the content matrix. 
Based on the standardized development procedures used and the associated evidence, the SEBEST appears to be a content and construct valid instrument, with high internal reliability and moderate test-retest reliability qualities, for use with prospective elementary teachers to assess self-efficacy beliefs for teaching and learning science for diverse learners.
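The Coefficient Alpha values reported above are internal-consistency estimates. As a minimal sketch of how such a coefficient is typically computed (the function name and the toy score matrix are illustrative assumptions, not the SEBEST data or the author's procedure), Cronbach's alpha can be written in Python as:

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance of total scores)
    """
    k = len(scores[0])  # number of items
    item_variances = [pvariance([row[i] for row in scores]) for i in range(k)]
    total_variance = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses: three respondents, three perfectly consistent items.
toy_scores = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
alpha = cronbach_alpha(toy_scores)
```

With perfectly consistent items the coefficient reaches its maximum of 1; real instruments such as the one described above fall below that.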
Assessing Classroom Learning: How Students Use Their Knowledge and Experience to Answer Classroom Achievement Test Questions in Science and Social Studies.

ERIC Educational Resources Information Center

Nuthall, Graham; Alton-Lee, Adrienne

1995-01-01

Observational studies of student learning from classroom experience in science and social studies in elementary and middle school classrooms were carried out with 14 students. A model is described that explains how students use multilayered episodic and semantic memory for learning experience and related knowledge to answer achievement test items.…
Gender and Minority Achievement Gaps in Science in Eighth Grade: Item Analyses of Nationally Representative Data. Research Report. ETS RR-17-36

ERIC Educational Resources Information Center

Qian, Xiaoyu; Nandakumar, Ratna; Glutting, Joseph; Ford, Danielle; Fifield, Steve

2017-01-01

In this study, we investigated gender and minority achievement gaps on 8th-grade science items employing a multilevel item response methodology. Both gaps were wider on physics and earth science items than on biology and chemistry items. Larger gender gaps were found on items with specific topics favoring male students than other items, for…
Fifth Graders' Learning About Simple Machines Through Engineering Design-Based Instruction Using LEGO™ Materials

NASA Astrophysics Data System (ADS)

Marulcu, Ismail; Barnett, Mike

2013-10-01

This study is part of a 5-year National Science Foundation-funded project, Transforming Elementary Science Learning Through LEGO™ Engineering Design. In this study, we report on the successes and challenges of implementing an engineering design-based and LEGO™-oriented unit in an urban classroom setting and we focus on the impact of the unit on students' content understanding of simple machines. The LEGO™ engineering-based simple machines module, which was developed for fifth graders by our research team, was implemented in an urban school in a large city in the Northeastern region of the USA. Thirty-three fifth grade students participated in the study, and they showed significant growth in content understanding. We measured students' content knowledge by using identical paper tests and semistructured interviews before and after instruction. Our paired t test analysis results showed that students significantly improved their test and interview scores (t = -3.62, p < 0.001 for multiple-choice items and t = -9.06, p < 0.000 for the open-ended items in the test and t = -12.11, p < 0.000 for the items in interviews). We also identified several alternative conceptions that are held by students on simple machines.
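The abstract above reports paired t statistics for identical tests given before and after instruction. A minimal sketch of that calculation follows, assuming hypothetical scores (the study's raw data are not given here) and taking differences as pre minus post, which yields a negative t when scores improve:

```python
import math
from statistics import mean, stdev


def paired_t(pre, post):
    """Paired-samples t statistic: mean difference over its standard error."""
    differences = [a - b for a, b in zip(pre, post)]
    standard_error = stdev(differences) / math.sqrt(len(differences))
    return mean(differences) / standard_error


# Hypothetical pre- and post-instruction scores for four students.
pre_scores = [1, 2, 3, 4]
post_scores = [2, 4, 5, 5]
t_value = paired_t(pre_scores, post_scores)  # degrees of freedom = n - 1 = 3
```

The p-value would then be read from a t distribution with n - 1 degrees of freedom; that step is omitted here to keep the sketch dependency-free.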
Anatomy of a physics test: Validation of the physics items on the Texas Assessment of Knowledge and Skills

NASA Astrophysics Data System (ADS)

Marshall, Jill A.; Hagedorn, Eric A.; O'Connor, Jerry

2009-06-01

We report the results of an analysis of the Texas Assessment of Knowledge and Skills (TAKS) designed to determine whether the TAKS is a valid indicator of whether students know and can do physics at the level necessary for success in future coursework, STEM careers, and life in a technological society. We categorized science items from the 2003 and 2004 10th and 11th grade TAKS by content area(s) covered, knowledge and skills required to select the correct answer, and overall quality. We also analyzed a 5000 student sample of item-level results from the 2004 11th grade exam, performing full-information factor analysis, calculating classical test indices, and determining each item's response curve using item response theory. Triangulation of our results revealed strengths and weaknesses of the different methods of analysis. The TAKS was found to be only weakly indicative of physics preparation and we make recommendations for increasing the validity of standardized physics testing.
Measuring science or religion? A measurement analysis of the National Science Foundation sponsored science literacy scale 2006-2010.

PubMed

Roos, J Micah

2014-10-01

High scientific literacy is widely considered a public good. Methods of assessing public scientific knowledge or literacy are equally important. In an effort to measure lay scientific literacy in the United States, the National Science Foundation (NSF) science literacy scale has been a part of the last three waves of the General Social Survey. However, there has been debate over the validity of some survey items as indicators of science knowledge. While many researchers treat the NSF science scale as measuring a single dimension, previous work (Bann and Schwerin, 2004; Miller, 1998, 2004) suggests a bidimensional structure. This paper hypothesizes and tests a new measurement model for the NSF science knowledge scale and finds that two items about evolution and the big bang are more measures of a religious belief dimension termed "Young Earth Worldview" than they are measures of scientific knowledge. Results are replicated in seven samples. © The Author(s) 2013.
TESTING AND EVALUATION IN THE BIOLOGICAL SCIENCES.

ERIC Educational Resources Information Center

NELSON, CLARENCE H.

THIS REPORT OF THE CUEBS PANEL ON EDUCATION AND TESTING SERVES AS A RESOURCE FOR THE INSTRUCTOR PREPARING COURSE EXAMINATIONS. THE MAJOR TOPICS DISCUSSED ARE (1) THE PROCEDURES IN PREPARING AN ACHIEVEMENT TEST, (2) THE CATEGORIZATION AND CODING OF TEST ITEMS, AND (3) THE ADVANTAGES AND LIMITATIONS OF VARIOUS TESTING PROCEDURES. OVER 1300 OBJECTIVE…
Validation of Physics Standardized Test Items

NASA Astrophysics Data System (ADS)

Marshall, Jill

2008-10-01

The Texas Physics Assessment Team (TPAT) examined the Texas Assessment of Knowledge and Skills (TAKS) to determine whether it is a valid indicator of physics preparation for future course work and employment, and of the knowledge and skills needed to act as an informed citizen in a technological society. We categorized science items from the 2003 and 2004 10th and 11th grade TAKS by content area(s) covered, knowledge and skills required to select the correct answer, and overall quality. We also analyzed a 5000 student sample of item-level results from the 2004 11th grade exam using standard statistical methods employed by test developers (factor analysis and Item Response Theory). Triangulation of our results revealed strengths and weaknesses of the different methods of analysis. The TAKS was found to be only weakly indicative of physics preparation and we make recommendations for increasing the validity of standardized physics testing.
The Act of Answering Questions Elicited Differentiated Responses in a Concealed Information Test.

PubMed

Otsuka, Takuro; Mizutani, Mitsuyoshi; Yagi, Akihiro; Katayama, Jun'ichi

2018-04-17

The concealed information test (CIT), a psychophysiological detection of deception test, compares physiological responses between crime-related and crime-unrelated items. In previous studies, whether the act of answering questions affected physiological responses was unclear. This study examined effects of both question-related and answer-related processes on physiological responses. Twenty participants received a modified CIT, in which the interval between presentation of questions and answering them was 27 s. Differentiated respiratory movements and cardiovascular responses between items were observed for both questions (items) and answers, while differentiated skin conductance response was observed only for questions. These results suggest that physiological responses to questions reflected orientation to a crime-related item, while physiological responses during answering reflected inhibition of psychological arousal caused by orienting. Regarding the CIT's accuracy, participants' perception of the questions themselves more strongly influenced physiological responses than answering them. © 2018 American Academy of Forensic Sciences.
Validating the Assessment for Measuring Indonesian Secondary School Students Performance in Ecology

NASA Astrophysics Data System (ADS)

Rachmatullah, A.; Roshayanti, F.; Ha, M.

2017-09-01

This study aims to validate the American Association for the Advancement of Science (AAAS) Ecology assessment and to examine the performance of Indonesian secondary school students on the assessment. A total of 611 Indonesian secondary school students (218 middle school students and 393 high school students) participated in the study. Forty-five items of the AAAS assessment on the topic of Interdependence in Ecosystems were divided into two versions, each of which has 21 similar items. A linking-item method was used to combine the two versions, and Rasch analyses were then used to validate the instrument. An independent-samples t-test was also run to compare the performance of Indonesian and American students based on the mean item difficulty. Of the 45 items, three were identified as misfitting. We also found that both Indonesian middle and high school students performed significantly lower than American students, with very large and medium effect sizes. We discuss our findings with regard to validation issues and the connection to Indonesian students' science literacy.
Reliability of a science admission test (HAM-Nat) at Hamburg medical school.

PubMed

Hissbach, Johanna; Klusmann, Dietrich; Hampe, Wolfgang

2011-01-01

The University Hospital in Hamburg (UKE) started to develop a test of knowledge in natural sciences for admission to medical school in 2005 (Hamburger Auswahlverfahren für Medizinische Studiengänge, Naturwissenschaftsteil, HAM-Nat). This study is a step towards establishing the HAM-Nat. We are investigating parallel forms reliability, the effect of a crash course in chemistry on test results, and correlations of HAM-Nat test results with a test of scientific reasoning (similar to a subtest of the "Test for Medical Studies", TMS). 316 first-year students participated in the study in 2007. They completed different versions of the HAM-Nat test which consisted of items that had already been used (HN2006) and new items (HN2007). Four weeks later half of the participants were tested on the HN2007 version of the HAM-Nat again, while the other half completed the test of scientific reasoning. Within this four-week interval students were offered a five-day chemistry course. Parallel forms reliability for four different test versions ranged from r(tt)=.53 to r(tt)=.67. The retest reliabilities of the HN2007 halves were r(tt)=.54 and r(tt)=.61. Correlations of the two HAM-Nat versions with the test of scientific reasoning were r=.34 and r=.21. The crash course in chemistry had no effect on HAM-Nat scores. The results suggest that further versions of the test of natural sciences will not easily conform to the standards of internal consistency, parallel-forms reliability and retest reliability. Much care has to be taken in order to assemble items which could be used interchangeably for the construction of new test versions. The test of scientific reasoning and the HAM-Nat are tapping different constructs. Participation in a chemistry course did not improve students' achievement, probably because the content of the course was not coordinated with the test and many students lacked motivation to do well in the second test.
Reliability of a science admission test (HAM-Nat) at Hamburg medical school

PubMed Central

Hissbach, Johanna; Klusmann, Dietrich; Hampe, Wolfgang

2011-01-01

Objective: The University Hospital in Hamburg (UKE) started to develop a test of knowledge in natural sciences for admission to medical school in 2005 (Hamburger Auswahlverfahren für Medizinische Studiengänge, Naturwissenschaftsteil, HAM-Nat). This study is a step towards establishing the HAM-Nat. We are investigating parallel forms reliability, the effect of a crash course in chemistry on test results, and correlations of HAM-Nat test results with a test of scientific reasoning (similar to a subtest of the "Test for Medical Studies", TMS). Methods: 316 first-year students participated in the study in 2007. They completed different versions of the HAM-Nat test which consisted of items that had already been used (HN2006) and new items (HN2007). Four weeks later half of the participants were tested on the HN2007 version of the HAM-Nat again, while the other half completed the test of scientific reasoning. Within this four-week interval students were offered a five-day chemistry course. Results: Parallel forms reliability for four different test versions ranged from rtt=.53 to rtt=.67. The retest reliabilities of the HN2007 halves were rtt=.54 and rtt=.61. Correlations of the two HAM-Nat versions with the test of scientific reasoning were r=.34 and r=.21. The crash course in chemistry had no effect on HAM-Nat scores. Conclusions: The results suggest that further versions of the test of natural sciences will not easily conform to the standards of internal consistency, parallel-forms reliability and retest reliability. Much care has to be taken in order to assemble items which could be used interchangeably for the construction of new test versions. The test of scientific reasoning and the HAM-Nat are tapping different constructs. Participation in a chemistry course did not improve students’ achievement, probably because the content of the course was not coordinated with the test and many students lacked motivation to do well in the second test. PMID:21866246
Zambian pre-service junior high school science teachers' chemical reasoning and ability

NASA Astrophysics Data System (ADS)

Banda, Asiana

The purpose of this study was two-fold: examine junior high school pre-service science teachers' chemical reasoning; and establish the extent to which the pre-service science teachers' chemical abilities explain their chemical reasoning. A sample comprised 165 junior high school pre-service science teachers at Mufulira College of Education in Zambia. There were 82 males and 83 females. Data were collected using a Chemical Concept Reasoning Test (CCRT). Pre-service science teachers' chemical reasoning was established through qualitative analysis of their responses to test items. The Rasch Model was used to determine the pre-service teachers' chemical abilities and item difficulty. Results show that most pre-service science teachers had incorrect chemical reasoning on chemical concepts assessed in this study. There was no significant difference in chemical understanding between the Full-Time and Distance Education pre-service science teachers, and between second and third year pre-service science teachers. However, there was a significant difference in chemical understanding between male and female pre-service science teachers. Male pre-service science teachers showed better chemical understanding than female pre-service science teachers. The Rasch model revealed that the pre-service science teachers had low chemical abilities, and the CCRT was very difficult for this group of pre-service science teachers. As such, their incorrect chemical reasoning was attributed to their low chemical abilities. These results have implications on science teacher education, chemistry teaching and learning, and chemical education research.
Gender differences in National Assessment of Educational Progress science items: What does "I don't know" really mean?

NASA Astrophysics Data System (ADS)

Linn, Marcia C.; de Benedictis, Tina; Delucchi, Kevin; Harris, Abigail; Stage, Elizabeth

The National Assessment of Educational Progress Science Assessment has consistently revealed small gender differences on science content items but not on science inquiry items. This assessment differs from others in that respondents can choose I don't know rather than guessing. This paper examines explanations for the gender differences including (a) differential prior instruction, (b) differential response to uncertainty and use of the I don't know response, (c) differential response to figurally presented items, and (d) different attitudes towards science. Of these possible explanations, the first two received support. Females are more likely to use the I don't know response, especially for items with physical science content or masculine themes such as football. To ameliorate this situation we need more effective science instruction and more gender-neutral assessment items.
Item response theory and the measurement of motor behavior.

PubMed

Safrit, M J; Cohen, A S; Costa, M G

1989-12-01

Item response theory (IRT) has been the focus of intense research and development activity in educational and psychological measurement during the past decade. Because this theory can provide more precise information about test items than other theories usually used in measuring motor behavior, the application of IRT in physical education and exercise science merits investigation. In IRT, the difficulty level of each item (e.g., trial or task) can be estimated and placed on the same scale as the ability of the examinee. Using this information, the test developer can determine the ability levels at which the test functions best. Equating the scores of individuals on two or more items or tests can be handled efficiently by applying IRT. The precision of the identification of performance standards in a mastery test context can be enhanced, as can adaptive testing procedures. In this tutorial, several potential benefits of applying IRT to the measurement of motor behavior were described. An example is provided using bowling data and applying the graded-response form of the Rasch IRT model. The data were calibrated and the goodness of fit was examined. This analysis is described in a step-by-step approach. Limitations to using an IRT model with a test consisting of repeated measures were noted.
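The abstract above notes that IRT places item difficulty and examinee ability on the same scale. As a minimal sketch of what that means, the dichotomous Rasch model (a simpler relative of the graded-response form used in the tutorial, chosen here only for illustration) gives the probability of success as a logistic function of ability minus difficulty. The function name and the example values are assumptions:

```python
import math


def rasch_probability(ability, difficulty):
    """Probability of a correct response under the dichotomous Rasch model.

    Ability and difficulty are expressed on the same logit scale.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


# Hypothetical examinee and items, all in logits.
p_matched = rasch_probability(ability=0.5, difficulty=0.5)   # item matched to ability
p_easy = rasch_probability(ability=0.5, difficulty=-1.0)     # easier item
p_hard = rasch_probability(ability=0.5, difficulty=2.0)      # harder item
```

When ability equals difficulty the success probability is exactly one half, which is why a test tends to be most informative for examinees whose ability is close to its item difficulties.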
Measuring the Test-Wiseness of Medical Students.

ERIC Educational Resources Information Center

Harvill, Leo M.

The objectives for this study were to: (1) develop a valid, reliable measure of test-wiseness with equivalent forms for use with students in the health sciences; and (2) determine the level of test-wiseness of entering medical students. The test-wiseness areas included in this study were: similar options, umbrella term, item give-away, convergence…
The development of an integrated assessment instrument for measuring analytical thinking and science process skills

NASA Astrophysics Data System (ADS)

Irwanto; Rohaeti, Eli; LFX, Endang Widjajanti; Suyanta

2017-05-01

This research aims to develop an integrated assessment instrument and determine its characteristics. The research uses the 4-D model, which includes define, design, develop, and disseminate. The primary product was validated by expert judgment, tested for readability by students, and assessed for feasibility by chemistry teachers. This research involved 246 students of grade XI of four senior high schools in Yogyakarta, Indonesia. Data collection techniques include interview, questionnaire, and test. Data collection instruments include interview guideline, item validation sheet, users' response questionnaire, instrument readability questionnaire, and essay test. The results show that the integrated assessment instrument has an Aiken validity value of 0.95. Item reliability was 0.99 and person reliability was 0.69. Teachers' response to the integrated assessment instrument is very good. Therefore, the integrated assessment instrument is feasible to be applied to measure the students' analytical thinking and science process skills.
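The Aiken validity value mentioned above summarizes expert ratings of an item. A minimal sketch of Aiken's V follows, assuming a hypothetical panel of four raters on a 1 to 5 scale (the study's actual ratings and scale are not given in the abstract): V is the sum of each rating's distance from the lowest category, divided by the number of raters times the number of categories minus one.

```python
def aiken_v(ratings, lowest=1, categories=5):
    """Aiken's V content-validity coefficient for one item.

    V = sum(r - lowest) / (n * (categories - 1)), ranging from 0 to 1.
    """
    distances = [r - lowest for r in ratings]
    return sum(distances) / (len(ratings) * (categories - 1))


# Hypothetical ratings from four experts on a 1-5 relevance scale.
expert_ratings = [5, 5, 4, 5]
v_value = aiken_v(expert_ratings)
```

Values close to 1 indicate that raters consistently placed the item near the top of the scale.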
Young adolescents' engagement in dietary behaviour - the impact of gender, socio-economic status, self-efficacy and scientific literacy. Methodological aspects of constructing measures in nutrition literacy research using the Rasch model.

PubMed

Guttersrud, Øystein; Petterson, Kjell Sverre

2015-10-01

The present study validates a revised scale measuring individuals' level of the 'engagement in dietary behaviour' aspect of 'critical nutrition literacy' and describes how background factors affect this aspect of Norwegian tenth-grade students' nutrition literacy. Data were gathered electronically during a field trial of a standardised sample test in science. Test items and questionnaire constructs were distributed evenly across four electronic field-test booklets. Data management and analysis were performed using the RUMM2030 item analysis package and the IBM SPSS Statistics 20 statistical software package. Students responded on computers at school. Seven hundred and forty tenth-grade students at twenty-seven randomly sampled public schools were enrolled in the field-test study. The engagement in dietary behaviour scale and the self-efficacy in science scale were distributed to 178 of these students. The dietary behaviour scale and the self-efficacy in science scale came out as valid, reliable and well-targeted instruments usable for the construction of measurements. Girls and students with high self-efficacy reported higher engagement in dietary behaviour than other students. Socio-economic status and scientific literacy (measured as ability in science by applying an achievement test) did not correlate significantly with students' engagement in dietary behaviour.
An assessment of functioning and non-functioning distractors in multiple-choice questions: a descriptive analysis.

PubMed

Tarrant, Marie; Ware, James; Mohammed, Ahmed M

2009-07-07

Four- or five-option multiple choice questions (MCQs) are the standard in health-science disciplines, both on certification-level examinations and on in-house developed tests. Previous research has shown, however, that few MCQs have three or four functioning distractors. The purpose of this study was to investigate non-functioning distractors in teacher-developed tests in one nursing program in an English-language university in Hong Kong. Using item-analysis data, we assessed the proportion of non-functioning distractors on a sample of seven test papers administered to undergraduate nursing students. A total of 514 items were reviewed, including 2056 options (1542 distractors and 514 correct responses). Non-functioning options were defined as ones that were chosen by fewer than 5% of examinees and those with a positive option discrimination statistic. The proportion of items containing 0, 1, 2, and 3 functioning distractors was 12.3%, 34.8%, 39.1%, and 13.8% respectively. Overall, items contained an average of 1.54 (SD = 0.88) functioning distractors. Only 52.2% (n = 805) of all distractors were functioning effectively and 10.2% (n = 158) had a choice frequency of 0. Items with more functioning distractors were more difficult and more discriminating. The low frequency of items with three functioning distractors in the four-option items in this study suggests that teachers have difficulty developing plausible distractors for most MCQs. Test items should consist of as many options as is feasible given the item content and the number of plausible distractors; in most cases this would be three. Item analysis results can be used to identify and remove non-functioning distractors from MCQs that have been used in previous tests.
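The rule stated above (a distractor is non-functioning if fewer than 5% of examinees choose it or if its option discrimination is positive) can be sketched as a small check. The function name, the treatment of a discrimination of exactly zero as functioning, and the example values are assumptions made for illustration. The second part recomputes the average number of functioning distractors from the percentages reported in the abstract:

```python
def is_functioning(choice_proportion, discrimination, min_proportion=0.05):
    """Return True if a distractor counts as functioning under the stated rule.

    Assumption: a discrimination of exactly zero is treated as functioning,
    since the abstract only flags positive values.
    """
    return choice_proportion >= min_proportion and discrimination <= 0


# Reported share of items (in percent) with 0, 1, 2, and 3 functioning distractors.
reported_distribution = {0: 12.3, 1: 34.8, 2: 39.1, 3: 13.8}
mean_functioning = sum(
    count * percent for count, percent in reported_distribution.items()
) / 100
```

The weighted mean of the reported distribution comes to about 1.54 functioning distractors per item, which agrees with the average given in the abstract.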

Science Inquiry, Academic Language, and Civic Engagement

ERIC Educational Resources Information Center

Buxton, Cory A.

2009-01-01

While some students have the opportunity to engage in the kinds of structured inquiry and real-world problem solving called for in the science education reform literature, many other students receive only a daily grind of note taking, end-of-chapter questions and sample test items from state assessments. The result is an engagement gap whereby…
A new course and textbook on Physical Models of Living Systems, for science and engineering undergraduates

NASA Astrophysics Data System (ADS)

Nelson, Philip

2015-03-01

I'll describe an intermediate-level course on "Physical Models of Living Systems." The only prerequisite is first-year university physics and calculus. The course is a response to rapidly growing interest among undergraduates in a broad range of science and engineering majors. Students acquire several research skills that are often not addressed in traditional courses: basic modeling skills; probabilistic modeling skills; data analysis methods; computer programming using a general-purpose platform like MATLAB or Python; and dynamical systems, particularly feedback control. These basic skills, which are relevant to nearly any field of science or engineering, are presented in the context of case studies from living systems, including: virus dynamics; bacterial genetics and evolution of drug resistance; statistical inference; superresolution microscopy; synthetic biology; and naturally evolved cellular circuits. Work supported by NSF Grants EF-0928048 and DMR-0832802.
A multi-level differential item functioning analysis of trends in international mathematics and science study: Potential sources of gender and minority difference among U.S. eighth graders' science achievement

NASA Astrophysics Data System (ADS)

Qian, Xiaoyu

Science is an area where a large achievement gap has been observed between White and minority, and between male and female students. The science minority gap has continued as indicated by the National Assessment of Educational Progress and the Trends in International Mathematics and Science Studies (TIMSS). TIMSS also shows a gender gap favoring males emerging at the eighth grade. Both gaps continue to be wider in the number of doctoral degrees and full professorships awarded (NSF, 2008). The current study investigated both minority and gender achievement gaps in science utilizing a multi-level differential item functioning (DIF) methodology (Kamata, 2001) within a fully Bayesian framework. All dichotomously coded items from the TIMSS 2007 science assessment at eighth grade were analyzed. Both gender DIF and minority DIF were studied. Multi-level models were employed to identify DIF items and sources of DIF at both student and teacher levels. The study found that several student variables were potential sources of achievement gaps. It was also found that gender DIF favoring male students was more noticeable in the content areas of physics and earth science than biology and chemistry. In terms of item type, the majority of these gender DIF items were multiple choice rather than constructed response items. Female students also performed less well on items requiring visual-spatial ability. Minority students performed significantly worse on physics and earth science items as well. A higher percentage of minority DIF items in earth science and biology were constructed response than multiple choice items, indicating that literacy may be the cause of minority DIF. Three-level model results suggested that some teacher variables may be the cause of DIF variations from teacher to teacher.
It is essential for both middle school science teachers and science educators to find instructional methods that work more effectively to improve science achievement of both female and minority students. Physics and earth science are two areas to be improved for both groups. Curriculum and instruction need to enhance female students' learning interests and give them opportunities to improve their visual perception skills. Science instruction should address improving minority students' literacy skills while teaching science.
Investigating Linguistic Sources of Differential Item Functioning Using Expert Think-Aloud Protocols in Science Achievement Tests

NASA Astrophysics Data System (ADS)

Roth, Wolff-Michael; Oliveri, Maria Elena; Dallie Sandilands, Debra; Lyons-Thomas, Juliette; Ercikan, Kadriye

2013-03-01

Even if national and international assessments are designed to be comparable, subsequent psychometric analyses often reveal differential item functioning (DIF). Central to achieving comparability is to examine the presence of DIF, and if DIF is found, to investigate its sources to ensure that differentially functioning items do not lead to bias. In this study, sources of DIF were examined using think-aloud protocols. The think-aloud protocols of expert reviewers were conducted for comparing the English and French versions of 40 items previously identified as DIF (N = 20) and non-DIF (N = 20). Three highly trained and experienced experts in verifying and accepting/rejecting multi-lingual versions of curriculum and testing materials for government purposes participated in this study. Although there is a considerable amount of agreement in the identification of differentially functioning items, experts do not consistently identify and distinguish DIF and non-DIF items. Our analyses of the think-aloud protocols identified particular linguistic, general pedagogical, content-related, and cognitive factors related to sources of DIF. Implications are provided for the process of arriving at the identification of DIF, prior to the actual administration of tests at national and international levels.
A validation study of public health knowledge, skills, social responsibility and applied learning.

PubMed

Vackova, Dana; Chen, Coco K; Lui, Juliana N M; Johnston, Janice M

2018-06-22

To design and validate a questionnaire to measure medical students' Public Health (PH) knowledge, skills, social responsibility and applied learning as indicated in the four domains recommended by the Association of Schools & Programmes of Public Health (ASPPH). A cross-sectional study was conducted to develop an evaluation tool for PH undergraduate education through item generation, reduction, refinement and validation. The 74 preliminary items derived from the existing literature were reduced to 55 items based on expert panel review which included those with expertise in PH, psychometrics and medical education, as well as medical students. Psychometric properties of the preliminary questionnaire were assessed as follows: frequency of endorsement for item variance; principal component analysis (PCA) with varimax rotation for item reduction and factor estimation; Cronbach's Alpha, item-total correlation and test-retest validity for internal consistency and reliability. PCA yielded five factors: PH Learning Experience (6 items); PH Risk Assessment and Communication (5 items); Future Use of Evidence in Practice (6 items); Recognition of PH as a Scientific Discipline (4 items); and PH Skills Development (3 items), explaining 72.05% variance. Internal consistency and reliability tests were satisfactory (Cronbach's Alpha ranged from 0.87 to 0.90; item-total correlation > 0.59). Lower paired test-retest correlations reflected instability in a social science environment. An evaluation tool for community-centred PH education has been developed and validated. The tool measures PH knowledge, skills, social responsibilities and applied learning as recommended by the internationally recognised Association of Schools & Programmes of Public Health (ASPPH).
Colorado Student Assessment Program: 2001 Released Passages, Items, and Prompts. Grade 4 Reading and Writing, Grade 4 Lectura y Escritura, Grade 5 Mathematics and Reading, Grade 6 Reading, Grade 7 Reading and Writing, Grade 8 Mathematics, Reading and Science, Grade 9 Reading, and Grade 10 Mathematics and Reading and Writing.

ERIC Educational Resources Information Center

Colorado State Dept. of Education, Denver.

This document contains released reading comprehension passages, test items, and writing prompts from the Colorado Student Assessment Program for 2001. The sample questions and prompts are included without answers or examples of student responses. Test materials are included for: (1) Grade 4 Reading and Writing; (2) Grade 4 Lectura y Escritura…
Effects of Text Illustration on Children's Learning of a School Science Topic.

ERIC Educational Resources Information Center

Reid, D. J.; Beveridge, M.

1986-01-01

This study of 272 13-year-old science students in England focuses on the effect of varied text and picture content on learning. A criterion-referenced objective items test was used to measure the effect of pictures on students of varying abilities and compare the effectiveness of traditional worksheet presentation and microcomputer presentation.…
What Do the Prospective Science Teachers Know about the Human Eye?

ERIC Educational Resources Information Center

Sahin, Çigdem

2014-01-01

In this study, the views of Prospective Science Teachers (PSTs) about the human eye were examined. The following data collection tools were used: the Word Association Test (WAT), open-ended questions, a drawing technique, a two-tier question item and an interview about concepts. The data of the study whose sample consisted of 34 PSTs were…
Influence of Particle Theory Conceptions on Pre-Service Science Teachers' Understanding of Osmosis and Diffusion

ERIC Educational Resources Information Center

AlHarbi, Nawaf N. S.; Treagust, David F.; Chandrasegaran, A. L.; Won, Mihye

2015-01-01

This study investigated the understanding of diffusion, osmosis and particle theory of matter concepts among 192 pre-service science teachers in Saudi Arabia using a 17-item two-tier multiple-choice diagnostic test. The data analysis showed that the pre-service teachers' understanding of osmosis and diffusion concepts was mildly correlated with…
Determination of Students' Alternative Conceptions about Chemical Equilibrium: A Review of Research and the Case of Turkey

ERIC Educational Resources Information Center

Ozmen, Haluk

2008-01-01

This study aims to determine prospective science student teachers' alternative conceptions of the chemical equilibrium concept. A 13-item pencil and paper, two-tier multiple choice diagnostic instrument, the Test to Identify Students' Alternative Conceptions (TISAC), was developed and administered to 90 second-semester science student teachers…
How an Inquiry-Based Classroom Lesson Intervenes in Science Efficacy, Career-Orientation and Self-Determination

ERIC Educational Resources Information Center

Schmid, S.; Bogner, F. X.

2017-01-01

Three subscales of the "Science Motivation Questionnaire II" (SMQII; motivational components: career motivation, self-efficacy and self-determination), with 4 items each, were applied to a sample of 209 secondary school students to monitor the impact of a 3-hour structured inquiry lesson. Four testing points (before, immediately after, 6…
Core Principles and Test Item Development for Advanced High School and Introductory University Level Food Science

ERIC Educational Resources Information Center

Laing-Kean, Claudine A. M.

2010-01-01

Programs supported by the Carl D. Perkins Act of 2006 are required to operate under the state or national content standards, and are expected to carry out evaluation procedures that address accountability. The Indiana high school course, "Advanced Life Science: Foods" ("ALS: Foods") operates under the auspices of the Perkins…
Complexity of Illustrations in PISA 2009 Science Items and Its Relationship to the Performance of Students from Shanghai-China, the United States, and Mexico

ERIC Educational Resources Information Center

Solano-Flores, Guillermo; Wang, Chao

2015-01-01

Background: While illustrations are widely used in international test comparisons, very scant research has been conducted on their design and on their influence on student performance. It is not clear how the features of illustration act in combination supporting students' access to the content of items or increasing their interpretation demands.…
Differential Item Functioning Analysis Using a Mixture 3-Parameter Logistic Model with a Covariate on the TIMSS 2007 Mathematics Test

ERIC Educational Resources Information Center

Choi, Youn-Jeng; Alexeev, Natalia; Cohen, Allan S.

2015-01-01

The purpose of this study was to explore what may be contributing to differences in performance in mathematics on the Trends in International Mathematics and Science Study 2007. This was done by using a mixture item response theory modeling approach to first detect latent classes in the data and then to examine differences in performance on items…
The Earthquake Information Test: Validating an Instrument for Determining Student Misconceptions.

ERIC Educational Resources Information Center

Ross, Katharyn E. K.; Shuell, Thomas J.

Some pre-instructional misconceptions held by children can persist through scientific instruction and resist changes. Identifying these misconceptions would be beneficial for science instruction. In this preliminary study, scores on a 60-item true-false test of knowledge and misconceptions about earthquakes were compared with previous interview…
Constructing objective tests

NASA Astrophysics Data System (ADS)

Aubrecht, Gordon J.; Aubrecht, Judith D.

1983-07-01

True-false or multiple-choice tests can be useful instruments for evaluating student progress. We examine strategies for planning objective tests which serve to test the material covered in science (physics) courses. We also examine strategies for writing questions for tests within a test blueprint. The statistical basis for judging the quality of test items is discussed. Reliability, difficulty, and discrimination indices are defined and examples presented. Our recommendations are rather easily put into practice.
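The abstract does not give the authors' definitions of the difficulty and discrimination indices, so the sketch below uses one common textbook convention as an assumption: difficulty is the proportion of examinees answering the item correctly, and discrimination is the difference in that proportion between the top-scoring and bottom-scoring groups (27% of examinees each by default). The function name, the 27% fraction and the response matrix are all hypothetical.

```python
def item_statistics(responses, item, fraction=0.27):
    """Return (difficulty, discrimination) for one item.

    responses: list of 0/1 lists, one per examinee (1 = correct).
    Assumed convention: upper-lower discrimination index.
    """
    n = len(responses)
    difficulty = sum(r[item] for r in responses) / n

    # Rank examinees by total score, highest first.
    ranked = sorted(responses, key=sum, reverse=True)
    group = max(1, round(n * fraction))
    upper, lower = ranked[:group], ranked[-group:]

    p_upper = sum(r[item] for r in upper) / group
    p_lower = sum(r[item] for r in lower) / group
    return difficulty, p_upper - p_lower


# Made-up answers from four examinees on a three-item test.
responses = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
first_item = item_statistics(responses, 0)
last_item = item_statistics(responses, 2)
```

Under this convention an item that everyone answers correctly has a discrimination of zero, which is one reason very easy items contribute little to separating examinees.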
Collective Protection (COLPRO) Novel Closures Testing

DTIC Science & Technology

2013-03-28

science and technology programs for future ColPro systems may include interfaces such as novel designs using zippers, hook-and-pile closures, and...necessitate new testing procedures. Additionally, standards of performance must be adjusted as technologies advance. Test procedures and parameters...listed in this TOP may require updating to accommodate new technologies in test items or in test instrumentation. Any variation to the TOP procedures
Comprehensive Achievement Monitoring for Science. Symposium; National Association of Biology Teachers, San Francisco, California, October 27, 1972.

ERIC Educational Resources Information Center

White, Mona E.; And Others

Comprehensive Achievement Monitoring (CAM) is a system designed to provide a curriculum defined in terms of performance objectives, test items to measure student performance on each objective, a set of comparable test forms to evaluate performance, testing throughout the period of the course, computerized analysis and reporting of results after…
Developing a Test for Assessing Elementary Students' Comprehension of Science Texts

ERIC Educational Resources Information Center

Wang, Jing-Ru; Chen, Shin-Feng; Tsay, Reuy-Fen; Chou, Ching-Ting; Lin, Sheau-Wen; Kao, Huey-Lien

2012-01-01

This study reports on the process of developing a test to assess students' reading comprehension of scientific materials and on the statistical results of the verification study. A combination of classical test theory and item response theory approaches was used to analyze the assessment data from a verification study. Data analysis indicates the…

National Workshop on Astrobiology: The Life Science Involvement of AAS I Laben

NASA Astrophysics Data System (ADS)

Adami, Giorgio

2006-12-01

The search for traces of past and present life is a complex and multidisciplinary research activity involving several scientific heritages and a specific industrial ability for planetary exploration. Laben was established in 1958 to design and manufacture electronic instruments for research in nuclear physics. In mid-2004 the company was merged with Alenia Spazio. It is now part of Alcatel Alenia Space, a French-Italian joint venture. Alcatel Alenia Space Italia SpA is a Finmeccanica Company. Currently the plant of Vimodrone provides a wide heritage in life science oriented to space application. The experience in Space Life Science is consolidated in the following research areas: (1) Physiology: Mouse models related to studies on human physiology Human neuroscience research and dosimetry (2) Animal Adaptation and Behaviour: mice behaviour related to stabling stress (3) Developmental Biology: aquatic microorganisms cultivation (4) Cell culture & Biotechnology: Protein crystal growth General purpose Multiwell Next Biotechnology studies and development: Bio reactor, mainly oriented to tissue engineering Microsensor for tissue control (organ replacement) Multiwell for adherent cell culture or for automated biosensor based on cell culture Experiment Container for organic systems Experiment Container for small animals Instrumentation based on fluorescent Biosensors Sensors for Life science experiments for Biopan capsule and Space Vehicle Ray Shielding Materials Random Positioning Machine specialisation (Support ground equipment) The biological features of this heritage are at the disposal of multidisciplinary exobiology research. The involvement of industry from the beginning of exobiology projects allows cost-effective, closed-loop technology development between Research Centres, Principal Investigators and industry.
Communicating Ocean & Climate Science: Promoting Knowledge, Responsible Decision-making and Interest in Geoscience Careers

NASA Astrophysics Data System (ADS)

Bruno, B. C.; Hsia, M.; Wiener, C.

2012-12-01

Climate change is not just an atmospheric phenomenon. It has serious impacts on the ocean, such as sea level rise, ocean acidification, and coral bleaching. Ocean FEST (Families Exploring Science Together) aims to educate participants about how increasing carbon dioxide is affecting our oceans, and to inspire students to pursue ocean, earth and environmental science careers. Throughout the program, participants examine their everyday decisions and the impact of their choices on the planet's climate and oceans. Ocean FEST is a two-hour program that explores the ocean and relevant environmental topics through six hands-on science activities. Activities are designed so students can see how globally important issues (e.g., climate change and ocean acidification) have local effects (e.g., sea level rise, coastal erosion, coral bleaching). The program ends with a career component, drawing parallels between the program activities and the activities done by "real scientists" in their jobs. Over the past three years, we have conducted over 60 Ocean FEST events. Evaluations are conducted at selected events using electronic surveys, which students and parents complete immediately prior to (pre-survey) and following (post-survey) the program. Survey items were developed and cognitively tested in collaboration with professional evaluators from the American Institutes for Research. The nine-item survey includes items on science content knowledge, personal responsibility, and career interest. For each survey item, participants are asked to indicate agreement (coded as 2.0), disagreement (1.0) or don't know (1.5). By comparing the pre- and post-survey results, we can evaluate program efficacy. For example, one survey item is: "I can do something every day to help fight global climate change." Student mean data moved from 1.78 pre-survey to 1.89 post-survey, which is a statistically significant gain at p<.000. Mean parent data for this same item moved from 1.90 pre-survey to 1.96 post-survey, which is again a statistically significant gain at p<.000. In summary, we have found positive statistically significant gains on all survey items for students, and on all but one survey item for parents. These results strongly indicate program efficacy. For more information, please visit our web site: oceanfest.soest.hawaii.edu
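The response coding described in this abstract (agreement 2.0, disagreement 1.0, don't know 1.5) can be sketched as follows. The coding values come from the abstract; the function name and the response lists are made up for illustration and do not reproduce the program's data or its significance test.

```python
# Coding scheme stated in the abstract.
CODES = {"agree": 2.0, "disagree": 1.0, "don't know": 1.5}


def mean_score(answers):
    """Mean coded score for one survey item (hypothetical helper)."""
    return sum(CODES[a] for a in answers) / len(answers)


# Made-up responses to a single item before and after the program.
pre = ["agree", "don't know", "disagree", "agree"]
post = ["agree", "agree", "don't know", "agree"]

pre_mean = mean_score(pre)
post_mean = mean_score(post)
gain = post_mean - pre_mean
```

Because every response is coded between 1.0 and 2.0, item means such as the reported 1.78 and 1.89 can be read as positions on that scale, with 2.0 meaning that every respondent agreed.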
Scientific Caricatures in the Earth Science Classroom: An Alternative Assessment for Meaningful Science Learning

NASA Astrophysics Data System (ADS)

Clary, Renee M.; Wandersee, James H.

2010-01-01

Archive-based, historical research of materials produced during the Golden Age of Geology (1788-1840) uncovered scientific caricatures (SCs) which may serve as a unique form of knowledge representation for students today. SCs played important roles in the past, stimulating critical inquiry among early geologists and fueling debates that addressed key theoretical issues. When historical SCs were utilized in a large-enrollment college Earth History course, student response was positive. Therefore, we offered SCs as an optional assessment tool. Paired t-tests that compared individual students’ performances with the SC option, as well as without the SC option, showed a significant positive difference favoring scientific caricatures (α = 0.05). Content analysis of anonymous student survey responses revealed three consistent findings: (a) students enjoyed expressing science content correctly but creatively through SCs, (b) development of SCs required deeper knowledge integration and understanding of the content than conventional test items, and (c) students appreciated having SC item options on their examinations, whether or not they took advantage of them. We think that incorporation of SCs during assessment may effectively expand the variety of methods for probing understanding, thereby increasing the mode validity of current geoscience tests.
"The Role of the Unit in Physics and Psychometrics" by Stephen Humphry--One Small Step for the Rasch Model, but Possibly One Giant Leap for Measurement in the Social Sciences

ERIC Educational Resources Information Center

Salzberger, Thomas

2011-01-01

Compared to traditional test theory, where person measures are typically referenced to the distribution of a population, item response theory allows for a much more meaningful interpretation of measures as they can be directly compared to item locations. However, Stephen Humphry shows that the crucial role of the unit of measurement has been…
Impact of Jigsaw on the Achievement and Attitudes of Saudi Arabian Male High School Science Students

NASA Astrophysics Data System (ADS)

Alghamdi, Abdulmonem

The aim of the study is to investigate the impact of cooperative learning instruction, specifically the Jigsaw instructional strategy, on science achievement and attitudes towards science among 11th grade students. Based upon previous research literature, it was hypothesized that significant differences existed in gains in general science achievement between the experimental group and the control group. A quasi-experimental design was chosen for this study. The study sample consisted of 50 11th grade students who were equally distributed between the experimental group and the control group, matched on the basis of their annual examination scores in general science. The students' achievement was measured with a 30-item achievement test used as a pretest, a posttest and a deferred (follow-up) test. The experimental group was taught through cooperative learning while the control group was taught through "traditional teaching". Materials such as lesson plans, worksheets and quizzes were designed to implement Jigsaw as a cooperative learning methodology. For the attitude scale towards science, a published 30-item Likert scale called the Test of Science Related Attitudes (TOSRA) was translated into Arabic in order to determine the students' attitudes, ranging from strongly agree to strongly disagree. The data were analyzed through repeated measures analysis and multivariate analysis of variance with a .05 selected level of significance. The results of this study showed that using Jigsaw as a cooperative learning strategy improved the students' achievement in favor of the experimental group. However, there was no significant change in the students' attitudes towards science for either group; the scores of all the attitude subscales were at or near the neutral level.
Assessment practices of third- and fifth-grade science teachers: A comparison to the style/format, process, and content of Ohio's proficiency tests

NASA Astrophysics Data System (ADS)

Janson, David C.

This descriptive study is addressed to policy-makers, textbook publishers, teachers, principals, and curriculum directors. It compares the assessment practices of ten elementary teachers over a period of 11 weeks with Ohio's fourth and sixth grade science Proficiency Tests. Results show that the teachers' assessment practices were not aligned with Ohio's Proficiency Test. The tests used in the participants' classrooms contained a disproportionate number of items characterized as low-level in terms of their cognitive function. Classroom test items generally fell into three categories: true/false, completion, and matching. The remaining items were predominantly low-level multiple-choice items requiring simple recall of information. The teachers in this study showed a heavy reliance on the packaged assessments that accompanied their adopted textbook series with little use of teacher-designed instruments. This differs from the findings of previous researchers who reported that most teacher assessments were done with teacher-made tests. The lack of alignment between classroom tests and Ohio's Proficiency Test is a concern because previous researchers and the teachers in this study believe that aligning classroom tests with high-stakes assessment improves student performance. Other research shows that teachers teach what they test, suggesting that the curriculum would be better aligned with State expectations if classroom tests were more in line with the proficiency tests. This study found that textbooks and their assessment packages are not aligned to most state standards and that teachers need help developing better assessments. The results of this study suggest directions school administrators might take to facilitate inservice training for current teachers and could be helpful to textbook publishers as well as educators serving on adoption committees. Since high-stakes testing of students in the nation's public schools and school accountability seem destined to remain a part of the American educational system, educators at all levels (teachers and administrators at the local level, consultants and administrators at the state level, and policymakers at the state and national levels) may want to consider the implications of these findings.
Do large-scale assessments measure students' ability to integrate scientific knowledge?

NASA Astrophysics Data System (ADS)

Lee, Hee-Sun

2010-03-01

Large-scale assessments are used as means to diagnose the current status of student achievement in science and compare students across schools, states, and countries. For efficiency, multiple-choice items and dichotomously-scored open-ended items are pervasively used in large-scale assessments such as the Trends in International Mathematics and Science Study (TIMSS). This study investigated how well these items measure secondary school students' ability to integrate scientific knowledge. This study collected responses of 8400 students to 116 multiple-choice and 84 open-ended items and applied an Item Response Theory analysis based on the Rasch Partial Credit Model. Results indicate that most multiple-choice items and dichotomously-scored open-ended items can be used to determine whether students have normative ideas about science topics, but cannot measure whether students integrate multiple pieces of relevant science ideas. Only when the scoring rubric is redesigned to capture subtle nuances of student open-ended responses do open-ended items become a valid and reliable tool to assess students' knowledge integration ability.
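For readers unfamiliar with the Rasch Partial Credit Model mentioned in this abstract, the sketch below computes the category probabilities for a single item in the model's standard form: the probability of score x is proportional to exp of the sum of (theta - delta_j) over the first x step thresholds. The function name and all parameter values are hypothetical; this is not the study's analysis code.

```python
import math


def pcm_probabilities(theta, thresholds):
    """Category probabilities under the Rasch Partial Credit Model.

    theta: person ability; thresholds: step difficulties delta_1..delta_m.
    Returns probabilities for scores 0..m (hypothetical helper).
    """
    # Cumulative sums of (theta - delta_j); the empty sum for score 0 is 0.
    cumulative = [0.0]
    for delta in thresholds:
        cumulative.append(cumulative[-1] + (theta - delta))

    weights = [math.exp(c) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]


# A dichotomous item (one threshold) reduces to the ordinary Rasch model.
dichotomous = pcm_probabilities(0.0, [0.0])
# A made-up three-category item scored 0, 1 or 2.
polytomous = pcm_probabilities(0.5, [-1.0, 1.0])
```

With a single threshold the model gives the familiar Rasch probabilities, which is why dichotomously scored items and partial-credit rubric scores can be placed on the same scale in one analysis.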
Competency Test Items for Fundamentals of Agribusiness and Natural Resource Occupations. A Report of Research.

ERIC Educational Resources Information Center

McGhee, Max B.; Cheek, Jimmy G.

An activity was undertaken to develop written criterion-referenced tests for each of the instructional areas comprising the Fundamentals of Agribusiness and Natural Resources Occupations Program. Designed to be taught at the ninth grade level, the program consists of six major instructional areas: agribusiness management, animal science, plant…
Readability and Item Difficulty of the Texas Assessment of Knowledge and Skills Fifth-Grade Science Tests

ERIC Educational Resources Information Center

Thomas, Conn; Carpenter, Clint

2008-01-01

The development of the Texas Assessment of Knowledge and Skills test involves input from educators across the state. The development process attempts to create an assessment that reflects the skills and content understanding of students at the tested grade level. This study attempts to determine other factors that can affect student performance on…
Science knowledge and biblical literalism.

PubMed

Zigerell, L J

2012-04-01

Biblical literalists are often described as scientific illiterates, but little if any empirical research has tested this claim. Analysis of a sixteen-item battery from the 2008 US General Social Survey revealed that literalists possess less science knowledge than those with other views of Scripture, but that much of this deficit can be attributed to demographic factors and unequal educational attainment. The marginal direct effect of biblical belief suggests that literalism is not incompatible with knowledge of science and, therefore, the best avenue for increasing science knowledge among literalists may be to foster interest in science and design science courses to attenuate any perceived conflict between science and religion.
Capturing specific abilities as a window into human individuality: the example of face recognition.

PubMed

Wilmer, Jeremy B; Germine, Laura; Chabris, Christopher F; Chatterjee, Garga; Gerbasi, Margaret; Nakayama, Ken

2012-01-01

Proper characterization of each individual's unique pattern of strengths and weaknesses requires good measures of diverse abilities. Here, we advocate combining our growing understanding of neural and cognitive mechanisms with modern psychometric methods in a renewed effort to capture human individuality through a consideration of specific abilities. We articulate five criteria for the isolation and measurement of specific abilities, then apply these criteria to face recognition. We cleanly dissociate face recognition from more general visual and verbal recognition. This dissociation stretches across ability as well as disability, suggesting that specific developmental face recognition deficits are a special case of a broader specificity that spans the entire spectrum of human face recognition performance. Item-by-item results from 1,471 web-tested participants, included as supplementary information, fuel item analyses, validation, norming, and item response theory (IRT) analyses of our three tests: (a) the widely used Cambridge Face Memory Test (CFMT); (b) an Abstract Art Memory Test (AAMT), and (c) a Verbal Paired-Associates Memory Test (VPMT). The availability of this data set provides a solid foundation for interpreting future scores on these tests. We argue that the allied fields of experimental psychology, cognitive neuroscience, and vision science could fuel the discovery of additional specific abilities to add to face recognition, thereby providing new perspectives on human individuality.
The language of science and the high school student: The recognition of concept definitions: A comparison between Hindi speaking students in India and English speaking students in Australia

NASA Astrophysics Data System (ADS)

Lynch, P. P.; Chipman, H. H.; Pachaury, A. C.

Sixteen concept words (mass, length, area, volume, solid, liquid, gas, element, compound, mixture, electron, proton, neutron, atom, molecule, and ion) associated with the theme "the nature of matter" were described as simple textbook definitions after examination of classroom notes and school texts of the last three decades. Sixteen multiple-choice items, all of the same form, were constructed, one for each of the concept definitions. The English version of the sixteen-item test was given to 1635 high school students in Tasmania (where the language of instruction and the home language is English) and the Hindi version of the test was given to 826 students from the Bhopal/Barwani region of India where the medium of instruction is Hindi. The English and Hindi speaking data are compared from the point of view of development, performance for individual items, and overall performance at grade 10. A number of linguistic hypotheses are examined and reported upon. Although the overall score at grade 10 was identical (10.8/16) for both groups, there are differences in development overall and for individual items which are of interest. Overall, the science specificity of the Hindi words does not appear to confer any clearly defined advantage or disadvantage, though again there are some interesting individual anomalies.
A validation study of an alternate state science assessment: Alignment of the Pennsylvania Alternate System of Assessment (PASA) science assessment

NASA Astrophysics Data System (ADS)

Heh, Peter

The current study examined the validation and alignment of the PASA-Science by determining whether the alternate science assessment anchors linked to the regular education science anchors; whether the PASA-Science assessment items are science; whether the PASA-Science assessment items linked to the alternate science eligible content, and what PASA-Science assessment content was considered important by parents and teachers. Special education and science education university faculty determined all but one alternate science assessment anchor linked to the regular science assessment anchors. Special education and science education teachers determined that the PASA-Science assessment items are indeed science and linked to the alternate science eligible content. Finally, parents and teachers indicated the most important science content assessed in the PASA-Science involved safety and independence.
Relationships Between the Way Students Are Assessed in Science Classrooms and Science Achievement Across Canada

NASA Astrophysics Data System (ADS)

Chu, Man-Wai; Fung, Karen

2018-04-01

Canadian students experience many different assessments throughout their schooling (O'Connor 2011). There are many benefits to using a variety of assessment types, item formats, and science-based performance tasks in the classroom to measure the many dimensions of science education. Although using a variety of assessments is beneficial, it is unclear exactly what types, formats, and tasks are used in Canadian science classrooms. Additionally, since assessments are often administered to help improve student learning, this study identified assessments that may improve student learning as measured using achievement scores on a standardized test. Secondary analyses of the students' and teachers' responses to the questionnaire items asked in the Pan-Canadian Assessment Program were performed. The results of the hierarchical linear modeling analyses indicated that both students and teachers identified teacher-developed classroom tests or quizzes as the most common types of assessments used. Although this ranking was similar across the country, statistically significant differences in terms of the assessments that are used in science classrooms among the provinces were also identified. The investigation of which assessment best predicted student achievement scores indicated that minds-on science performance-based tasks significantly explained 4.21% of the variance in student scores. However, mixed results were observed between the student and teacher responses towards tasks that required students to choose their own investigation and design their own experience or investigation. Additionally, students whose teachers reported conducting more demonstrations of an experiment or investigation had lower scores.
Construct Validation of the Self-Efficacy Teaching and Knowledge Instrument for Science Teachers-Revised (SETAKIST-R): Lessons Learned

NASA Astrophysics Data System (ADS)

Pruski, Linda A.; Blanco, Sharon L.; Riggs, Rosemary A.; Grimes, Kandi K.; Fordtran, Chase W.; Barbola, Gina M.; Cornell, John E.; Lichtenstein, Michael J.

2013-11-01

Described herein are the academic lineage and independent validation of the Self-Efficacy Teaching and Knowledge Instrument for Science Teachers-Revised (SETAKIST-R). Data from 334 K-12 science teachers were analyzed using Partial Credit Rasch models. Principal components analysis on the person-item residuals suggests two latent dimensions: Knowledge and Teaching Self-Efficacies. Item-fit statistics were used to select items for each subscale. Person and item separation (reliability) indices were quite low, and we noted disordered response patterns on the person-item maps that revealed problems with item content and/or scaling for both subscales. These issues include the presence of verbal negatives, ambiguous modifiers, counter-intuitive scaling, and an "undecided/uncertain" option. The SETAKIST-R, in its current form, cannot be recommended as a measure of science teacher self-efficacy.
Science-Technology-Society literacy in college non-majors biology: Comparing problem/case studies based learning and traditional expository methods of instruction

NASA Astrophysics Data System (ADS)

Peters, John S.

This study used a multiple response model (MRM) on selected items from the Views on Science-Technology-Society (VOSTS) survey to examine science-technology-society (STS) literacy among college non-science majors taught using Problem/Case Studies Based Learning (PBL/CSBL) and traditional expository methods of instruction. An initial pilot investigation of 15 VOSTS items produced a valid and reliable scoring model which can be used to quantitatively assess student literacy on a variety of STS topics deemed important for informed civic engagement in science related social and environmental issues. The new scoring model allows for the use of parametric inferential statistics to test hypotheses about factors influencing STS literacy. The follow-up cross-institutional study comparing teaching methods employed Hierarchical Linear Modeling (HLM) to model the efficiency and equitability of instructional methods on STS literacy. A cluster analysis was also used to compare pre and post course patterns of student views on the set of positions expressed within VOSTS items. HLM analysis revealed significantly higher instructional efficiency in the PBL/CSBL study group for 4 of the 35 STS attitude indices (characterization of media vs. school science; tentativeness of scientific models; cultural influences on scientific research), and more equitable effects of traditional instruction on one attitude index (interdependence of science and technology). Cluster analysis revealed generally stable patterns of pre to post course views across study groups, but also revealed possible teaching method effects on the relationship between the views expressed within VOSTS items with respect to (1) interdependency of science and technology; (2) anti-technology; (3) socioscientific decision-making; (4) scientific/technological solutions to environmental problems; (5) usefulness of school vs. media characterizations of science; (6) social constructivist vs. objectivist views of theories; (7) impact of cultural religious/ethical views on science; (8) tentativeness of scientific models, evidence and predictions; (9) civic control of technological developments. This analysis also revealed common relationships between student views which would not have been revealed under the original unique response model (URM) of VOSTS and also common viewpoint patterns that warrant further qualitative exploration.
Hong Kong Student Achievement in OECD-PISA Study: Gender Differences in Science Content, Literacy Skills, and Test Item Formats

ERIC Educational Resources Information Center

Yip, Din Yan; Chiu, Ming Ming; Ho, Esther Sui Chu

2004-01-01

This study examined gender differences in students' scientific literacy as measured by OECD-PISA. In particular, we focused on the 2437 students from 140 Hong Kong schools. Hong Kong boys' and girls' science scores did not differ overall. However, boys scored higher than girls at the higher percentiles (75th and above). Moreover, specific test…
What We Don't Test: What an Analysis of Unreleased ACS Exam Items Reveals about Content Coverage in General Chemistry Assessments

ERIC Educational Resources Information Center

Reed, Jessica J.; Villafañe, Sachel M.; Raker, Jeffrey R.; Holme, Thomas A.; Murphy, Kristen L.

2017-01-01

General chemistry courses are often the foundation for the study of other science disciplines and upper-level chemistry concepts. Students who take introductory chemistry courses are more often from health and science-related fields than chemistry. As such, the content taught and assessed in general chemistry courses is envisioned as building…
Dimensionality and predictive validity of the HAM-Nat, a test of natural sciences for medical school admission

PubMed Central

2011-01-01

Background: Knowledge in natural sciences generally predicts study performance in the first two years of the medical curriculum. In order to reduce delay and dropout in the preclinical years, Hamburg Medical School decided to develop a natural science test (HAM-Nat) for student selection. In the present study, two different approaches to scale construction are presented: a unidimensional scale and a scale composed of three subject specific dimensions. Their psychometric properties and relations to academic success are compared. Methods: 334 first year medical students of the 2006 cohort responded to 52 multiple choice items from biology, physics, and chemistry. For the construction of scales we generated two random subsamples, one for development and one for validation. In the development sample, unidimensional item sets were extracted from the item pool by means of weighted least squares (WLS) factor analysis, and subsequently fitted to the Rasch model. In the validation sample, the scales were subjected to confirmatory factor analysis and, again, Rasch modelling. The outcome measure was academic success after two years. Results: Although the correlational structure within the item set is weak, a unidimensional scale could be fitted to the Rasch model. However, psychometric properties of this scale deteriorated in the validation sample. A model with three highly correlated subject specific factors performed better. All summary scales predicted academic success with an odds ratio of about 2.0. Prediction was independent of high school grades and there was a slight tendency for prediction to be better in females than in males. Conclusions: A model separating biology, physics, and chemistry into different Rasch scales seems to be more suitable for item bank development than a unidimensional model, even when these scales are highly correlated and enter into a global score. 
When such a combination scale is used to select the upper quartile of applicants, the proportion of successful completion of the curriculum after two years is expected to rise substantially. PMID:21999767
Dimensionality and predictive validity of the HAM-Nat, a test of natural sciences for medical school admission.

PubMed

Hissbach, Johanna C; Klusmann, Dietrich; Hampe, Wolfgang

2011-10-14

Knowledge in natural sciences generally predicts study performance in the first two years of the medical curriculum. In order to reduce delay and dropout in the preclinical years, Hamburg Medical School decided to develop a natural science test (HAM-Nat) for student selection. In the present study, two different approaches to scale construction are presented: a unidimensional scale and a scale composed of three subject specific dimensions. Their psychometric properties and relations to academic success are compared. 334 first year medical students of the 2006 cohort responded to 52 multiple choice items from biology, physics, and chemistry. For the construction of scales we generated two random subsamples, one for development and one for validation. In the development sample, unidimensional item sets were extracted from the item pool by means of weighted least squares (WLS) factor analysis, and subsequently fitted to the Rasch model. In the validation sample, the scales were subjected to confirmatory factor analysis and, again, Rasch modelling. The outcome measure was academic success after two years. Although the correlational structure within the item set is weak, a unidimensional scale could be fitted to the Rasch model. However, psychometric properties of this scale deteriorated in the validation sample. A model with three highly correlated subject specific factors performed better. All summary scales predicted academic success with an odds ratio of about 2.0. Prediction was independent of high school grades and there was a slight tendency for prediction to be better in females than in males. A model separating biology, physics, and chemistry into different Rasch scales seems to be more suitable for item bank development than a unidimensional model, even when these scales are highly correlated and enter into a global score. 
When such a combination scale is used to select the upper quartile of applicants, the proportion of successful completion of the curriculum after two years is expected to rise substantially.

Expanding the basic science debate: the role of physics knowledge in interpreting clinical findings.

PubMed

Goldszmidt, Mark; Minda, John Paul; Devantier, Sarah L; Skye, Aimee L; Woods, Nicole N

2012-10-01

Current research suggests a role for biomedical knowledge in learning and retaining concepts related to medical diagnosis. However, learning may be influenced by other, non-biomedical knowledge. We explored this idea using an experimental design and examined the effects of causal knowledge on the learning, retention, and interpretation of medical information. Participants studied a handout about several respiratory disorders and how to interpret respiratory exam findings. The control group received the information in standard "textbook" format and the experimental group was presented with the same information as well as a causal explanation about how sound travels through lungs in both the normal and disease states. Comprehension and memory of the information was evaluated with a multiple-choice exam. Several questions that were not related to the causal knowledge served as control items. Questions related to the interpretation of physical exam findings served as the critical test items. The experimental group outperformed the control group on the critical test items, and our study shows that a causal explanation can improve a student's memory for interpreting clinical details. We suggest an expansion of which basic sciences are considered fundamental to medical education.
Spatial ability mediates the gender difference in middle school students' science performance.

PubMed

Ganley, Colleen M; Vasilyeva, Marina; Dulaney, Alana

2014-01-01

Prior research has demonstrated a male advantage in spatial skills and science achievement. The present research integrated these findings by testing the potential role of spatial skills in gender differences in the science performance of eighth-grade students (13-15 years old). In Study 1 (N = 113), the findings showed that mental rotation ability mediated gender differences in physical science and technology/engineering test scores. In Study 2 (N = 73,245), science performance was examined in a state population of eighth-grade students. As in Study 1, the results revealed larger gender differences on items that showed higher correlations with mental rotation. These findings underscore the importance of considering spatial training interventions aimed at reducing gender differences in the science performance of school-aged children. © 2014 The Authors. Child Development © 2014 Society for Research in Child Development, Inc.
Using Classical Test Theory and Item Response Theory to Evaluate the LSCI

NASA Astrophysics Data System (ADS)

Schlingman, Wayne M.; Prather, E. E.; Collaboration of Astronomy Teaching Scholars CATS

2011-01-01

Analyzing the data from the recent national study using the Light and Spectroscopy Concept Inventory (LSCI), this project uses both Classical Test Theory (CTT) and Item Response Theory (IRT) to investigate the LSCI itself in order to better understand what it is actually measuring. We use Classical Test Theory to form a framework of results that can be used to evaluate the effectiveness of individual questions at measuring differences in student understanding and provide further insight into the prior results presented from this data set. In the second phase of this research, we use Item Response Theory to form a theoretical model that generates parameters accounting for a student's ability, a question's difficulty, and the level of guessing. The combined results from our investigations using both CTT and IRT are used to better understand the learning that is taking place in classrooms across the country. The analysis will also allow us to evaluate the effectiveness of individual questions and determine whether the item difficulties are appropriately matched to the abilities of the students in our data set. These results may require that some questions be revised, motivating the need for further development of the LSCI. This material is based upon work supported by the National Science Foundation under Grant No. 0715517, a CCLI Phase III Grant for the Collaboration of Astronomy Teaching Scholars (CATS). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.
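The abstract above names three quantities (student ability, question difficulty, and a guessing level) but does not say which IRT model was fitted. As a minimal sketch, assuming the common three-parameter logistic (3PL) form, the probability of a correct answer could be written as below. The function name and the discrimination parameter `a` are illustrative assumptions and do not come from the LSCI study.

```python
import math

def p_correct_3pl(theta, b, a=1.0, c=0.0):
    """Probability of a correct response under a 3PL item response model.

    theta: student ability; b: item difficulty;
    a: item discrimination (assumed here, not named in the abstract);
    c: lower asymptote, interpreted as the guessing level.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# A student whose ability equals the item difficulty, on a four-option
# item with a guessing level of 0.25, lands halfway between 0.25 and 1.
print(p_correct_3pl(theta=0.0, b=0.0, a=1.0, c=0.25))
```

With `c = 0` and `a = 1` the same function reduces to the Rasch form, which is one reason the sketch keeps those as defaults.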
Australian Item Bank Program: Science Item Bank. Book 3: Biology.

ERIC Educational Resources Information Center

Australian Council for Educational Research, Hawthorn.

The Australian Science Item Bank consists of three volumes of multiple-choice questions. Book 3 contains questions on the biological sciences. The questions are designed to be suitable for high school students (year 8 to year 12 in Australian schools). The questions are classified by the subject content of the question, the cognitive skills…
Using the Mixture Rasch Model to Explore Knowledge Resources Students Invoke in Mathematic and Science Assessments

ERIC Educational Resources Information Center

Zhang, Danhui; Orrill, Chandra; Campbell, Todd

2015-01-01

The purpose of this study was to investigate whether mixture Rasch models followed by qualitative item-by-item analysis of selected Programme for International Student Assessment (PISA) mathematics and science items offered insight into knowledge students invoke in mathematics and science separately and combined. The researchers administered an…
A Confirmatory Factor Analysis of the Life Orientation Test-Revised with Competitive Athletes

ERIC Educational Resources Information Center

Appaneal, Renee N.

2012-01-01

Current reviews outside of sport indicate that the Life Orientation Test-Revised (LOT-R) items load on two separate factors (optimism and pessimism) and, therefore, should be treated as independent constructs. However, researchers in the sport sciences continue to use the single composite score reflecting a unidimensional definition of optimism.…
76 FR 24923 - National Science Board; Sunshine Act Meetings; Notice

Federal Register 2010, 2011, 2012, 2013, 2014

2011-05-03

...: Some portions open, some portions closed. UPDATES: Please refer to the National Science Board Web site... Information Item: Status Deep Underground Science and Engineering Laboratory Information Item: High...
76 FR 9054 - National Science Board; Sunshine Act Meetings; Impromptu Notice of Change (Addition of Agenda Item)

Federal Register 2010, 2011, 2012, 2013, 2014

2011-02-16

... NATIONAL SCIENCE FOUNDATION National Science Board; Sunshine Act Meetings; Impromptu Notice of Change (Addition of Agenda Item) The National Science Board's (NSB) Audit & Oversight (A&O) Committee..., National Science Foundation, 4201 Wilson Blvd., Arlington, VA 22230. Telephone: (703) 292-7000. Daniel A...
What can we learn from PISA?: Investigating PISA's approach to scientific literacy

NASA Astrophysics Data System (ADS)

Schwab, Cheryl Jean

This dissertation is an investigation of the relationship between the multidimensional conception of scientific literacy and its assessment. The Programme for International Student Assessment (PISA), developed under the auspices of the Organization for Economic Cooperation and Development (OECD), offers a unique opportunity to evaluate the assessment of scientific literacy. PISA developed a continuum of performance for scientific literacy across three competencies (i.e., process, content, and situation). Foundational to the interpretation of PISA science assessment is PISA's definition of scientific literacy, which I argue incorporates three themes drawn from history: (a) scientific way of thinking, (b) everyday relevance of science, and (c) scientific literacy for all students. Three coordinated studies were conducted to investigate the validity of PISA science assessment and offer insight into the development of items to assess scientific literacy. Multidimensional models of the internal structure of the PISA 2003 science items were found not to reflect the complex character of PISA's definition of scientific literacy. Although the multidimensional models across the three competencies significantly decreased the G2 statistic from the unidimensional model, high correlations between the dimensions suggest that the dimensions are similar. A cognitive analysis of student verbal responses to PISA science items revealed that students were using competencies of scientific literacy, but the competencies were not elicited by the PISA science items at the depth required by PISA's definition of scientific literacy. Although student responses contained only knowledge of scientific facts and simple scientific concepts, students were using more complex skills to interpret and communicate their responses. Finally, the investigation of different scoring approaches and item response models illustrated different ways to interpret student responses to assessment items. 
These analyses highlighted the complexities of students' responses to the PISA science items and the use of the ordered partition model to accommodate different but equal item responses. The results of the three investigations are used to discuss ways to improve the development and interpretation of PISA's science items.
An examination of gender bias on the eighth-grade MEAP science test as it relates to the Hunter Gatherer Theory of Spatial Sex Differences

NASA Astrophysics Data System (ADS)

Armstrong-Hall, Judy Gail

The purpose of this study was to apply the Hunter-Gatherer Theory of Spatial Sex Differences to responses to individual questions by eighth grade students on the Science component of the Michigan Educational Assessment Program (MEAP) to determine if sex bias was inherent in the test. The Hunter-Gatherer Theory of Spatial Sex Differences, an original theory, suggests a spatial dimorphism concept in which female spatial skills involve pattern recall of unconnected items and male spatial skills require mental movement. This is the first attempt to apply the Hunter-Gatherer Theory of Spatial Sex Differences to a standardized test. An overall hypothesis suggested that the Hunter-Gatherer Theory of Spatial Sex Differences could predict that males would perform better on problems involving mental movement and females would do better on problems involving the pattern recall of unconnected items. Responses to questions on the 1994-95 MEAP requiring the use of male spatial skills and female spatial skills were analyzed for 5,155 eighth grade students. A panel composed of five educators and a theory developer determined which test items involved the use of male and female spatial skills. A MANOVA, using a random sample of 20% of the 5,155 students to compare male and female correct scores, was statistically significant, with males having higher scores on male spatial skills items and females having higher scores on female spatial skills items. Pearson product moment correlation analyses produced a positive correlation for both male and female performance on both types of spatial skills. The Hunter-Gatherer Theory of Spatial Sex Differences appears to be able to predict that males could perform better on the problems involving mental movement and females could perform better on problems involving the pattern recall of unconnected items. 
Recommendations for further research included: examination of male/female spatial skill differences at early elementary and high school levels to determine impact of gender on difficulties in solving spatial problems; investigation of the relationship between dominant female spatial skills for students diagnosed with ADHD; study effects of teaching male spatial skills to female students starting in early elementary school to determine the effect on standardized testing.
The Geoscience Concept Test: A New Assessment Tool Based on Student Misconceptions

NASA Astrophysics Data System (ADS)

Libarkin, J.; Anderson, S. W.; Boone, W. J.; Beilfuss, M.; Dahl, J.

2002-12-01

We developed and began pilot testing of an earth science assessment tool called the geoscience concept test (GCT). The GCT uses student misconceptions as distractors in a 30 item multiple-choice instrument. Student misconceptions were first assessed through the analysis of nearly 300 questionnaires administered in introductory geology courses at three institutions. Results from the questionnaires guided the development of an interview protocol that was used by four interviewers at four different institutions. Over 100 in-depth student interviews lasting from 0.5 to 1 hour probed topics related to the Earth's interior, geologic time, and the formation of Earth surface features such as mountains and volcanoes to better define misconceptions. Thematic content analysis of the interviews identified a number of widely held misconceptions, which were then incorporated into the GCT as multiple-choice distractors (wrong answers). For content validity, the initial GCT was reviewed by seven experts (3 geoscientists and 4 science educators) and revised before pilot testing. Approximately 100 introductory and non-science major college students from four institutions were assessed with the GCT pilot in the spring of 2002. Rasch model analysis of this data showed that students found the pilot test difficult, and the level of difficulty was consistent between the four institutions. Analysis of individual items showed that students had fewer misconceptions regarding the locations of earthquakes, and many misconceptions regarding the locations of volcanoes on the Earth's surface, suggesting a disconnect in their understanding of the role of plate tectonics in these phenomena. Analysis of the misfit statistic for each item showed that none of the questions misfit, although we dropped one question and modified the wording of another for clarity in the next round of piloting. 
A second round of piloting scheduled for the fall of 2002 includes nearly 3000 students from 34 institutions in 19 states.
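The Rasch analysis mentioned in the abstract above rests on a simple response model: the chance of a correct answer depends only on the gap between student ability and item difficulty. The sketch below shows that model together with one common item fit measure, the outfit mean square. The abstract does not say which misfit statistic the authors used, so the choice of outfit, the function names, and the sample values are all assumptions made for illustration.

```python
import math

def rasch_probability(theta, difficulty):
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))

def outfit_mean_square(responses, thetas, difficulty):
    """Mean of squared standardized residuals for one item.

    responses: 0/1 scores, one per student; thetas: matching ability
    estimates. Values near 1 suggest the item behaves as the model expects.
    """
    total = 0.0
    for x, theta in zip(responses, thetas):
        p = rasch_probability(theta, difficulty)
        total += (x - p) ** 2 / (p * (1.0 - p))
    return total / len(responses)

# Hypothetical data: four students answering one item of difficulty 0.5.
print(outfit_mean_square([1, 0, 1, 0], [1.2, -0.3, 0.8, 0.1], 0.5))
```

In practice the ability and difficulty values would be estimated from the full response matrix by dedicated software; here they are simply supplied.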
Building the BIKE: Development and Testing of the Biotechnology Instrument for Knowledge Elicitation (BIKE)

NASA Astrophysics Data System (ADS)

Witzig, Stephen B.; Rebello, Carina M.; Siegel, Marcelle A.; Freyermuth, Sharyn K.; Izci, Kemal; McClure, Bruce

2014-10-01

Identifying students' conceptual scientific understanding is difficult if the appropriate tools are not available for educators. Concept inventories have become a popular tool to assess student understanding; however, traditionally, they are multiple choice tests. International science education standard documents advocate that assessments should be reform based, contain diverse question types, and should align with instructional approaches. To date, no instrument of this type targeting student conceptions in biotechnology has been developed. We report here the development, testing, and validation of a 35-item Biotechnology Instrument for Knowledge Elicitation (BIKE) that includes a mix of question types. The BIKE was designed to elicit student thinking and a variety of conceptual understandings, as opposed to testing closed-ended responses. The design phase contained nine steps including a literature search for content, student interviews, a pilot test, as well as expert review. Data from 175 students over two semesters, including 16 student interviews and six expert reviewers (professors from six different institutions), were used to validate the instrument. Cronbach's alpha on the pre/posttest was 0.664 and 0.668, respectively, indicating the BIKE has internal consistency. Cohen's kappa for inter-rater reliability among the 6,525 total items was 0.684 indicating substantial agreement among scorers. Item analysis demonstrated that the items were challenging, there was discrimination among the individual items, and there was alignment with research-based design principles for construct validity. This study provides a reliable and valid conceptual understanding instrument in the understudied area of biotechnology.
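The two reliability figures reported in the abstract above, Cronbach's alpha and Cohen's kappa, follow standard formulas. A minimal sketch of both is given below, using only the Python standard library. The function names and the tiny data sets are hypothetical and are not taken from the BIKE study.

```python
from collections import Counter
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a list of rows (one row of item scores per student)."""
    k = len(scores[0])
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who scored the same responses."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of the raters' marginal proportions per category.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical scores: four students on three items, and two raters
# scoring the same six responses.
print(cronbach_alpha([[1, 1, 0], [0, 0, 0], [1, 1, 1], [0, 1, 0]]))
print(cohen_kappa([1, 1, 0, 0, 1, 0], [1, 0, 0, 0, 1, 0]))
```

Both functions assume complete data: every student answered every item, and both raters scored every response.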
A comparison of rural high school students in Germany with rural Tennessee high school students' mathematics and science achievement

NASA Astrophysics Data System (ADS)

Harding, R. Fredrick

This descriptive study compared the science and mathematics aptitudes and achievement test scores for the final school year students in rural White County and Van Buren County, Tennessee with rural county students in Germany. In accordance with the previous research literature (Stevenson, 2002), German students outperformed U.S. students on The International Trends in Math and Science test (TIMSS). As reform in the U.S. education system has been underway, this study intended to compare German county student final school year performance with White County and Van Buren County (Grade 12) performance in science and mathematics. The entire populations of 176 White and Van Buren Counties senior high final school year students were compared with 120 school final year students from two rural German county high schools. The student responses to identical test and questionnaire items were compared using the t-test statistical analysis. In conclusion after t-test analyses, there was no significant difference (p>.05 level) in student attitudes on the 27 problem achievement and the 35 TIMSS questionnaire items between the sampled population of 120 German students compared with the population of 176 White and Van Buren students. Also, there was no statistically significant difference (p>.05 level) between the German, White, and Van Buren County rural science and math achievement in the TIMSS problem section of the final year test. Based on the research, recommendations to improve U.S. student scores to number one in the world include making changes in teaching methodology in mathematics and science; incorporating pamphlet lessons rather than heavy reliance on textbooks; focusing on problem solving; establishing an online clearinghouse for effective lessons; creating national standards in mathematics and science; matching students' course choices to job aspirations; tracking misbehaving students rather than mainstreaming them into the regular classroom; and designing individual educational plans for every student. Further study and future investigations are recommended from this study to compare White County and Van Buren County Students with other rural county schools in Tennessee, as well as other states. In addition, the Tennessee students' state mandated science and mathematics could be correlated to the TIMSS to identify trends and relationships. Future comparisons of White County and Van Buren County with higher scoring rural Asian students could be done in search of more effective methods of teaching science and mathematics.
Investigating Omani Science Teachers' Attitudes towards Teaching Science: The Role of Gender and Teaching Experiences

ERIC Educational Resources Information Center

Ambusaidi, Abdullah; Al-Farei, Khalid

2017-01-01

A 30-item questionnaire was designed to determine Omani science teachers' attitudes toward teaching science and whether or not these attitudes differ according to gender and teaching experiences of teachers. The questionnaire items were divided into 3 domains: classroom preparation, managing hands-on science, and development appropriateness. The…
An Adaptation of the Original Fresno Test to Measure Evidence-Based Practice Competence in Pediatric Bedside Nurses.

PubMed

Laibhen-Parkes, Natasha; Kimble, Laura P; Melnyk, Bernadette Mazurek; Sudia, Tanya; Codone, Susan

2018-06-01

Instruments used to assess evidence-based practice (EBP) competence in nurses have been subjective, unreliable, or invalid. The Fresno test was identified as the only instrument to measure all the steps of EBP with supportive reliability and validity data. However, the items and psychometric properties of the original Fresno test are only relevant to measure EBP with medical residents. Therefore, the purpose of this paper is to describe the development of the adapted Fresno test for pediatric nurses, and provide preliminary validity and reliability data for its use with Bachelor of Science in Nursing-prepared pediatric bedside nurses. General adaptations were made to the original instrument's case studies, item content, wording, and format to meet the needs of a pediatric nursing sample. The scoring rubric was also modified to complement changes made to the instrument. Content and face validity, and intrarater reliability of the adapted Fresno test were assessed during a mixed-methods pilot study conducted from October to December 2013 with 29 Bachelor of Science in Nursing-prepared pediatric nurses. Validity data provided evidence for good content and face validity. Intrarater reliability estimates were high. The adapted Fresno test presented here appears to be a valid and reliable assessment of EBP competence in Bachelor of Science in Nursing-prepared pediatric nurses. However, further testing of this instrument is warranted using a larger sample of pediatric nurses in diverse settings. This instrument can be a starting point for evaluating the impact of EBP competence on patient outcomes. © 2018 Sigma Theta Tau International.
Differential Item Functioning by Gender on a Large-Scale Science Performance Assessment: A Comparison across Grade Levels.

ERIC Educational Resources Information Center

Holweger, Nancy; Taylor, Grace

The fifth-grade and eighth-grade science items on a state performance assessment were compared for differential item functioning (DIF) due to gender. The grade 5 sample consisted of 8,539 females and 8,029 males and the grade 8 sample consisted of 7,477 females and 7,891 males. A total of 30 fifth grade items and 26 eighth grade items were…
Rasch Analysis of Scientific Literacy in an Astronomical Citizen Science Project

NASA Astrophysics Data System (ADS)

Price, A.

2012-06-01

(Abstract only) We investigate change in attitudes towards science and belief in the nature of science by participants in a citizen science project about astronomy. A pre-test was given to 1,385 participants and a post-test was given six months later to 165 participants. Nine participants were interviewed. Responses were analyzed using the Rasch Rating Scale Model to place Likert data on an interval scale allowing for more sensitive parametric analysis. Results show that overall attitudes did not change, p = .225. However, there was significant change towards attitudes relating to science news (positive) and scientific self efficacy (negative), p = .001 and p = .035, respectively. This change was related to social activity in the project. Beliefs in the nature of science exhibited a small but significant increase, p = .04. Relative positioning of scores on the belief items suggests the increase is mostly due to reinforcement of current beliefs.
Rasch Analysis of Scientific Literacy in an Astronomical Citizen Science Project

NASA Astrophysics Data System (ADS)

Price, Aaron

2011-05-01

We investigate change in attitudes towards science and belief in the nature of science by participants in a citizen science project about astronomy. A pre-test was given to 1,385 participants and a post-test was given six months later to 165 participants. Nine participants were interviewed. Responses were analyzed using the Rasch Rating Scale Model to place Likert data on an interval scale allowing for more sensitive parametric analysis. Results show that overall attitudes did not change, p = .225. However, there was significant change towards attitudes relating to science news (positive) and scientific self efficacy (negative), p < .001 and p = .035 respectively. This change was related to social activity in the project. Beliefs in the nature of science exhibited a small, but significant increase, p = .04. Relative positioning of scores on the belief items suggests the increase is mostly due to reinforcement of current beliefs.
Capturing specific abilities as a window into human individuality: The example of face recognition

PubMed Central

Wilmer, Jeremy B.; Germine, Laura; Chabris, Christopher F.; Chatterjee, Garga; Gerbasi, Margaret; Nakayama, Ken

2013-01-01

Proper characterization of each individual's unique pattern of strengths and weaknesses requires good measures of diverse abilities. Here, we advocate combining our growing understanding of neural and cognitive mechanisms with modern psychometric methods in a renewed effort to capture human individuality through a consideration of specific abilities. We articulate five criteria for the isolation and measurement of specific abilities, then apply these criteria to face recognition. We cleanly dissociate face recognition from more general visual and verbal recognition. This dissociation stretches across ability as well as disability, suggesting that specific developmental face recognition deficits are a special case of a broader specificity that spans the entire spectrum of human face recognition performance. Item-by-item results from 1,471 web-tested participants, included as supplementary information, fuel item analyses, validation, norming, and item response theory (IRT) analyses of our three tests: (a) the widely used Cambridge Face Memory Test (CFMT); (b) an Abstract Art Memory Test (AAMT), and (c) a Verbal Paired-Associates Memory Test (VPMT). The availability of this data set provides a solid foundation for interpreting future scores on these tests. We argue that the allied fields of experimental psychology, cognitive neuroscience, and vision science could fuel the discovery of additional specific abilities to add to face recognition, thereby providing new perspectives on human individuality. PMID:23428079
Building out a Measurement Model to Incorporate Complexities of Testing in the Language Domain

ERIC Educational Resources Information Center

Wilson, Mark; Moore, Stephen

2011-01-01

This paper provides a summary of a novel and integrated way to think about the item response models (most often used in measurement applications in social science areas such as psychology, education, and especially testing of various kinds) from the viewpoint of the statistical theory of generalized linear and nonlinear mixed models. In addition,…

Trends in computer applications in science assessment

NASA Astrophysics Data System (ADS)

Kumar, David D.; Helgeson, Stanley L.

1995-03-01

Seven computer applications to science assessment are reviewed. Conventional test administration includes record keeping, grading, and managing test banks. Multiple-choice testing involves forced selection of an answer from a menu, whereas constructed-response testing involves options for students to present their answers within a set standard deviation. Adaptive testing attempts to individualize the test to minimize the number of items and time needed to assess a student's knowledge. Figural response testing assesses science proficiency in pictorial or graphic mode and requires the student to construct a mental image rather than selecting a response from a multiple choice menu. Simulations have been found useful for performance assessment on a large-scale basis in part because they make it possible to independently specify different aspects of a real experiment. An emerging approach to performance assessment is solution pathway analysis, which permits the analysis of the steps a student takes in solving a problem. Virtually all computer-based testing systems improve the quality and efficiency of record keeping and data analysis.
Vocabulary Learning in a Yorkshire Terrier: Slow Mapping of Spoken Words

PubMed Central

Griebel, Ulrike; Oller, D. Kimbrough

2012-01-01

Rapid vocabulary learning in children has been attributed to “fast mapping”, with new words often claimed to be learned through a single presentation. As reported in 2004 in Science a border collie (Rico) not only learned to identify more than 200 words, but fast mapped the new words, remembering meanings after just one presentation. Our research tests the fast mapping interpretation of the Science paper based on Rico's results, while extending the demonstration of large vocabulary recognition to a lap dog. We tested a Yorkshire terrier (Bailey) with the same procedures as Rico, illustrating that Bailey accurately retrieved randomly selected toys from a set of 117 on voice command of the owner. Second we tested her retrieval based on two additional voices, one male, one female, with different accents that had never been involved in her training, again showing she was capable of recognition by voice command. Third, we did both exclusion-based training of new items (toys she had never seen before with names she had never heard before) embedded in a set of known items, with subsequent retention tests designed as in the Rico experiment. After Bailey succeeded on exclusion and retention tests, a crucial evaluation of true mapping tested items previously successfully retrieved in exclusion and retention, but now pitted against each other in a two-choice task. Bailey failed on the true mapping task repeatedly, illustrating that the claim of fast mapping in Rico had not been proven, because no true mapping task had ever been conducted with him. It appears that the task called retention in the Rico study only demonstrated success in retrieval by a process of extended exclusion. PMID:22363421
Developmental growth in students' concept of energy: Analysis of selected items from the TIMSS database

NASA Astrophysics Data System (ADS)

Liu, Xiufeng; McKeough, Anne

2005-05-01

The aim of this study was to develop a model of students' energy concept development. Applying Case's (1985, 1992) structural theory of cognitive development, we hypothesized that students' concept of energy undergoes a series of transitions, corresponding to systematic increases in working memory capacity. The US national sample from the Third International Mathematics and Science Study (TIMSS) database was used to test our hypothesis. Items relevant to the energy concept in the TIMSS test booklets for three populations were identified. Item difficulty from Rasch modeling was used to test the hypothesized developmental sequence, and percentage of students' correct responses was used to test the correspondence between students' age/grade level and level of the energy concepts. The analysis supported our hypothesized sequence of energy concept development and suggested mixed effects of maturation and schooling on energy concept development. Further, the results suggest that curriculum and instruction design take into consideration the developmental progression of students' concept of energy.
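The role of Rasch item difficulty in ordering items along a developmental sequence can be illustrated with a minimal sketch. It assumes the standard dichotomous Rasch model; the item labels and difficulty values are hypothetical and are not taken from the TIMSS analysis.

```python
import math


def rasch_probability(theta: float, b: float) -> float:
    """Probability of a correct response under the dichotomous Rasch model.

    theta: person ability in logits; b: item difficulty in logits.
    """
    return 1.0 / (1.0 + math.exp(-(theta - b)))


# Hypothetical item difficulties in logits (illustrative values only).
item_difficulty = {"item_A": -1.2, "item_B": 0.1, "item_C": 1.5}

# Sorting by difficulty gives the empirical easy-to-hard order, which can
# then be compared with a hypothesized developmental sequence.
ordered_items = sorted(item_difficulty, key=item_difficulty.get)
```

When ability equals difficulty the model gives a success probability of one half, so an item's difficulty can be read as the ability level at which a student is equally likely to succeed or fail.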
Investigation of Science Inquiry Items for Use on an Alternate Assessment Based on Modified Achievement Standards Using Cognitive Lab Methodology

ERIC Educational Resources Information Center

Dickenson, Tammiee S.; Gilmore, Joanna A.; Price, Karen J.; Bennett, Heather L.

2013-01-01

This study evaluated the benefits of item enhancements applied to science-inquiry items for incorporation into an alternate assessment based on modified achievement standards for high school students. Six items were included in the cognitive lab sessions involving both students with and without disabilities. The enhancements (e.g., use of visuals,…
Examining a math-science professional development program for teachers in grades 7-12 in an urban school district in New York State

NASA Astrophysics Data System (ADS)

Kaszczak, Lesia

With the adoption of the Common Core State Standards in New York State and the Next Generation Science Standards, it is more important than ever for school districts to develop professional development programs to provide teachers with the resources that will assist them in incorporating the new standards into their classroom instruction. This study focused on a mathematics and science professional development program known as STEMtastic STEM. The two purposes of the study were: to determine if there is an increase in STEM content knowledge of the participants involved in year two of a three year professional development program and to examine the teachers' perceptions of the impact of the professional development program on classroom instruction. The sample included teachers of grades 7-12 from an urban school district in New York State. The scores of a content knowledge pre-test and post-test were analyzed using a paired sample t-test to determine any significant differences in scores. In order to determine mathematics and science teachers' perceptions of the impact of the professional development program, responses from a 22 item Likert-style survey were analyzed to establish patterns of responses and to determine positive and negative perceptions of participants of the professional development program. A single sample t-test was used to determine if the responses were significantly positive. The results of this study indicated that there was no significant increase in content knowledge as a result of participation in the STEMtastic STEM professional development program. Both mathematics and science teachers exhibited significant positive perceptions of items dealing with hands-on participation during the professional development; support provided by STEMtastic STEM specialists; and the support provided by the administration. 
It was concluded that both mathematics and science teachers responded positively to the training they received during the professional development sessions, but that their classroom practices did not change as a result of the professional development program.
Popularity and Novelty Dynamics in Evolving Networks.

PubMed

Abbas, Khushnood; Shang, Mingsheng; Abbasi, Alireza; Luo, Xin; Xu, Jian Jun; Zhang, Yu-Xia

2018-04-20

Network science plays a big role in the representation of real-world phenomena such as user-item bipartite networks presented in e-commerce or social media platforms. It provides researchers with tools and techniques to solve complex real-world problems. Identifying and predicting future popularity and importance of items in e-commerce or social media platforms is a challenging task. Some items gain popularity repeatedly over time while some become popular and novel only once. This work aims to identify the key factors: popularity and novelty. To do so, we consider two types of novelty predictions: items appearing in the popular ranking list for the first time; and items which were not in the popular list in the past time window, but might have been popular before the recent past time window. In order to identify the popular items, a careful consideration of macro-level analysis is needed. In this work we propose a model that exploits item-level information over a span of time to rank the importance of the item. We considered an ageing or decay effect along with the recent link-gain of the items. We test our proposed model on four real-world datasets using four information-retrieval-based metrics.
Latent class analysis of diagnostic science assessment data using Bayesian networks

NASA Astrophysics Data System (ADS)

Steedle, Jeffrey Thomas

2008-10-01

Diagnostic science assessments seek to draw inferences about student understanding by eliciting evidence about the mental models that underlie students' reasoning about physical systems. Measurement techniques for analyzing data from such assessments embody one of two contrasting assessment programs: learning progressions and facet-based assessments. Learning progressions assume that students have coherent theories that they apply systematically across different problem contexts. In contrast, the facet approach makes no such assumption, so students should not be expected to reason systematically across different problem contexts. A systematic comparison of these two approaches is of great practical value to assessment programs such as the National Assessment of Educational Progress as they seek to incorporate small clusters of related items in their tests for the purpose of measuring depth of understanding. This dissertation describes an investigation comparing learning progression and facet models. Data comprised student responses to small clusters of multiple-choice diagnostic science items focusing on narrow aspects of understanding of Newtonian mechanics. Latent class analysis was employed using Bayesian networks in order to model the relationship between students' science understanding and item responses. Separate models reflecting the assumptions of the learning progression and facet approaches were fit to the data. The technical qualities of inferences about student understanding resulting from the two models were compared in order to determine if either modeling approach was more appropriate. Specifically, models were compared on model-data fit, diagnostic reliability, diagnostic certainty, and predictive accuracy. In addition, the effects of test length were evaluated for both models in order to inform the number of items required to obtain adequately reliable latent class diagnoses. 
Lastly, changes in student understanding over time were studied with a longitudinal model in order to provide educators and curriculum developers with a sense of how students advance in understanding over the course of instruction. Results indicated that expected student response patterns rarely reflected the assumptions of the learning progression approach. That is, students tended not to systematically apply a coherent set of ideas across different problem contexts. Even those students expected to express scientifically-accurate understanding had substantial probabilities of reporting certain problematic ideas. The learning progression models failed to make as many substantively-meaningful distinctions among students as the facet models. In statistical comparisons, model-data fit was better for the facet model, but the models were quite comparable on all other statistical criteria. Studying the effects of test length revealed that approximately 8 items are needed to obtain adequate diagnostic certainty, but more items are needed to obtain adequate diagnostic reliability. The longitudinal analysis demonstrated that students either advance in their understanding (i.e., switch to the more advanced latent class) over a short period of instruction or stay at the same level. There was no significant relationship between the probability of changing latent classes and time between testing occasions. In all, this study is valuable because it provides evidence informing decisions about modeling and reporting on student understanding, it assesses the quality of measurement available from short clusters of diagnostic multiple-choice items, and it provides educators with knowledge of the paths that students may take as they advance from novice to expert understanding over the course of instruction.
Fighting bias with statistics: Detecting gender differences in responses to items on a preschool science assessment

NASA Astrophysics Data System (ADS)

Greenberg, Ariela Caren

Differential item functioning (DIF) and differential distractor functioning (DDF) are methods used to screen for item bias (Camilli & Shepard, 1994; Penfield, 2008). Using an applied empirical example, this mixed-methods study examined the congruency and relationship of DIF and DDF methods in screening multiple-choice items. Data for Study I were drawn from item responses of 271 female and 236 male low-income children on a preschool science assessment. Item analyses employed a common statistical approach of the Mantel-Haenszel log-odds ratio (MH-LOR) to detect DIF in dichotomously scored items (Holland & Thayer, 1988), and extended the approach to identify DDF (Penfield, 2008). Findings demonstrated that using MH-LOR to detect DIF and DDF supported the theoretical relationship that the magnitude and form of DIF are dependent on the DDF effects, and demonstrated the advantages of studying DIF and DDF in multiple-choice items. A total of 4 items with DIF and DDF and 5 items with only DDF were detected. Study II incorporated an item content review, an important but often overlooked and under-published step of DIF and DDF studies (Camilli & Shepard). Interviews with 25 female and 22 male low-income preschool children and an expert review helped to interpret the DIF and DDF results and their comparison, and determined that a content review process of studied items can reveal reasons for potential item bias that are often congruent with the statistical results. Patterns emerged and are discussed in detail. The quantitative and qualitative analyses were conducted in an applied framework of examining the validity of the preschool science assessment scores for evaluating science programs serving low-income children; however, the techniques can be generalized for use with measures across various disciplines of research.
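The Mantel-Haenszel log-odds ratio used for DIF screening can be sketched as follows. This is a minimal illustration of the general statistic for one dichotomously scored item, assuming examinees have already been grouped into strata by matched total score; the function name and the counts are hypothetical and do not come from the study's data.

```python
import math


def mantel_haenszel_log_odds(strata):
    """Mantel-Haenszel common log-odds ratio for one dichotomous item.

    strata: iterable of (ref_correct, ref_incorrect,
                         focal_correct, focal_incorrect) counts,
            one tuple per matched total-score level.
    A positive value means the item favors the reference group.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # skip score levels with no examinees
        numerator += a * d / n
        denominator += b * c / n
    if numerator == 0 or denominator == 0:
        raise ValueError("log-odds ratio is undefined for these counts")
    return math.log(numerator / denominator)


# Hypothetical counts for three score levels (illustrative only).
example_strata = [
    (10, 20, 8, 22),
    (25, 15, 20, 20),
    (30, 5, 28, 7),
]
mh_lor = mantel_haenszel_log_odds(example_strata)
```

A value near zero suggests that matched examinees in the two groups have similar odds of answering correctly. An operational analysis would also need a standard error and a flagging rule, which this sketch omits.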
The Impact of Cooperative Quizzes in a Large Introductory Astronomy Course for Non-Science Majors

ERIC Educational Resources Information Center

Zeilik, Michael; Morris, Vicky J.

2004-01-01

In Astronomy 101 at the University of New Mexico, we carried out a repeated-items experiment on quizzes and tests to investigate the impact of cooperative testing. This trial was the only change in a reformed course format that had been refined over previous semesters. Our research questions were: (1) Did cooperative quizzes result in gains for…
Profile of science process skills of Preservice Biology Teacher in General Biology Course

NASA Astrophysics Data System (ADS)

Susanti, R.; Anwar, Y.; Ermayanti

2018-04-01

This study aims to profile the science process skills of preservice biology teachers. The research took place at Sriwijaya University and involved 41 participants. To collect the data, the study used a multiple-choice test comprising 40 items to measure mastery of science process skills. The data were then analyzed descriptively. The results showed that communication outperformed the other skills at 81%, while the lowest were identifying variables and predicting (59%). In addition, mastery of basic science process skills was 72%, whereas mastery of integrated skills was slightly lower at 67%. In general, science process skills vary among preservice biology teachers.
Transformational Play as a Curricular Scaffold: Using Videogames to Support Science Education

NASA Astrophysics Data System (ADS)

Barab, Sasha A.; Scott, Brianna; Siyahhan, Sinem; Goldstone, Robert; Ingram-Goble, Adam; Zuiker, Steven J.; Warren, Scott

2009-08-01

Drawing on game-design principles and an underlying situated theoretical perspective, we developed and researched a 3D game-based curriculum designed to teach water quality concepts. We compared undergraduate student dyads assigned randomly to four different instructional design conditions where the content had increasing levels of contextualization: (a) expository textbook condition, (b) simplistic framing condition, (c) immersive world condition, and (d) a single-user immersive world condition. Results indicated that the immersive-world dyad and immersive-world single user conditions performed significantly better than the electronic textbook group on standardized items. The immersive-world dyad condition also performed significantly better than either the expository textbook or the descriptive framing condition on a performance-based transfer task, and performed significantly better than the expository textbook condition on standardized test items. Implications for science education, and consistent with the goals of this special issue, are that immersive game-based learning environments provide a powerful new form of curriculum for teaching and learning science.
78 FR 25723 - National Assessment Governing Board; Meeting

Federal Register 2010, 2011, 2012, 2013, 2014

2013-05-02

..., assistive listening devices, materials in alternative format) should notify Munira Mwalimu at 202- 357-6938.... to review secure NAEP test materials for Science Interactive Computer Tasks (ICTs) at grades 4, 8... provided with secure items and materials which are not yet available for release to the general public...
Young Adults’ Belief in Genetic Determinism, and Knowledge and Attitudes towards Modern Genetics and Genomics: The PUGGS Questionnaire

PubMed Central

Carver, Rebecca Bruu; Castéra, Jérémy; Gericke, Niklas; Evangelista, Neima Alice Menezes

2017-01-01

In this paper we present the development and validation of a comprehensive questionnaire to assess college students’ knowledge about modern genetics and genomics, their belief in genetic determinism, and their attitudes towards applications of modern genetics and genomic-based technologies. Written in everyday language with minimal jargon, the Public Understanding and Attitudes towards Genetics and Genomics (PUGGS) questionnaire is intended for use in research on science education and public understanding of science, as a means to investigate relationships between knowledge, determinism and attitudes about modern genetics, which are to date little understood. We developed a set of core ideas and initial items from reviewing the scientific literature on genetics and previous studies on public and student knowledge and attitudes about genetics. Seventeen international experts from different fields (e.g., genetics, education, philosophy of science) reviewed the initial items and their feedback was used to revise the questionnaire. We validated the questionnaire in two pilot tests with samples of university freshmen students. The final questionnaire contains 45 items, including both multiple choice and Likert scale response formats. Cronbach’s alpha showed good reliability for each section of the questionnaire. In conclusion, the PUGGS questionnaire is a reliable tool for investigating public understanding and attitudes towards modern genetics and genomic-based technologies. PMID:28114357
Young Adults' Belief in Genetic Determinism, and Knowledge and Attitudes towards Modern Genetics and Genomics: The PUGGS Questionnaire.

PubMed

Carver, Rebecca Bruu; Castéra, Jérémy; Gericke, Niklas; Evangelista, Neima Alice Menezes; El-Hani, Charbel N

2017-01-01

In this paper we present the development and validation of a comprehensive questionnaire to assess college students' knowledge about modern genetics and genomics, their belief in genetic determinism, and their attitudes towards applications of modern genetics and genomic-based technologies. Written in everyday language with minimal jargon, the Public Understanding and Attitudes towards Genetics and Genomics (PUGGS) questionnaire is intended for use in research on science education and public understanding of science, as a means to investigate relationships between knowledge, determinism and attitudes about modern genetics, which are to date little understood. We developed a set of core ideas and initial items from reviewing the scientific literature on genetics and previous studies on public and student knowledge and attitudes about genetics. Seventeen international experts from different fields (e.g., genetics, education, philosophy of science) reviewed the initial items and their feedback was used to revise the questionnaire. We validated the questionnaire in two pilot tests with samples of university freshmen students. The final questionnaire contains 45 items, including both multiple choice and Likert scale response formats. Cronbach's alpha showed good reliability for each section of the questionnaire. In conclusion, the PUGGS questionnaire is a reliable tool for investigating public understanding and attitudes towards modern genetics and genomic-based technologies.
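The reliability check mentioned in the abstract can be illustrated with a minimal sketch of Cronbach's alpha for one questionnaire section. It assumes complete responses with no missing data and uses sample variances; the function name and the tiny response matrix are hypothetical and are not PUGGS data.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents' item scores.

    scores: list of rows, one per respondent, each a list of k item scores.
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item across respondents.
    item_variances = [variance([row[i] for row in scores]) for i in range(k)]
    # Variance of each respondent's total score.
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical Likert responses: 4 respondents x 3 items (illustrative only).
example_scores = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [1, 2, 2],
]
alpha = cronbach_alpha(example_scores)
```

Alpha approaches 1 when items rise and fall together across respondents. The sketch would raise a division error if every respondent had the same total score, a case a real analysis would need to handle.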
Validating Measurement of Knowledge Integration in Science Using Multiple-Choice and Explanation Items

ERIC Educational Resources Information Center

Lee, Hee-Sun; Liu, Ou Lydia; Linn, Marcia C.

2011-01-01

This study explores measurement of a construct called knowledge integration in science using multiple-choice and explanation items. We use construct and instructional validity evidence to examine the role multiple-choice and explanation items play in measuring students' knowledge integration ability. For construct validity, we analyze item…
Newsletter on Science, Technology & Human Values, Number 22.

ERIC Educational Resources Information Center

Shelanski, Vivien B., Ed.

This publication contains many news items such as meeting and conference descriptions and dates, proposed and current legislative action, publication descriptions, and topical articles. The items all pertain to the impact on humanity of science and technology. In this issue, news items include NSF, NEH, and AAAS meeting and seminar notices,…
Forensic science: the truth is out there

NASA Astrophysics Data System (ADS)

Herold, Lynne D.

2002-06-01

Criminalistics, one of the many sub-divisions of forensic science, is an applied science in which items of evidence are analyzed to provide investigative information and scientific evidence to be used in courts of law. Laboratories associated with governmental public agencies are typically involved in criminal cases as opposed to civil cases, and those types of cases that fall within the jurisdiction of the particular agency. Common analytical divisions within criminalistics laboratories include blood alcohol testing, toxicology, narcotics, questioned documents, biology, firearms, latent fingerprints, physical and trace evidence sections. Specialized field investigative services may be provided in the areas of clandestine drug laboratories and major crimes (firearms, biology, trace, arson/explosives). Forensic science best practice requires the use of non-destructive testing whenever reasonably possible. Several technically difficult situations (bodies and evidence encased in cement and metal) are presented as a challenge to the audience.
76 FR 58032 - Notice of Intent To Repatriate a Cultural Item: Denver Museum of Nature and Science, Denver, CO

Federal Register 2010, 2011, 2012, 2013, 2014

2011-09-19

... Denver Museum of Nature & Science, Denver, CO, that meets the definition of an object of cultural... Cultural Item: Denver Museum of Nature and Science, Denver, CO AGENCY: National Park Service, Interior. ACTION: Notice. SUMMARY: The Denver Museum of Nature & Science, in consultation with the appropriate...
Effects of designed learning strategies to enhance biology students' understanding of the nature of science

NASA Astrophysics Data System (ADS)

Reeves, Carolyn T.

This research attempted to test the effectiveness of strategies designed for teaching the nature of science to Biology I students and to examine the effects of frequency of use of the strategies. Some strategies were designed to identify misconceptions about the nature of science; others were designed to correct misconceptions or provide correct concepts about the nature of science. This research commenced during the 3rd week of the 2001-2002 school year after obtaining IRB approval and permissions from school officials. The study ended after the 15th week. All participating students were given a pretest and a posttest of the Nature of Scientific Knowledge Scale Enhanced (NSKSE) test. Part I, 48 items, consisted of the NSKS test by Rubba & Anderson (1978). Part II, 10 items, consisted of a test constructed by the researcher. Part I contained questions about 6 tenets of the nature of science. Part II contained questions about how science works. The strategies were tested in two Biology I experimental classes, n = 41, and compared with two Biology I control classes, n = 34, by means of an analysis of covariance with the pretest scores used as the covariate. The overall mean posttest scores of the experimental and the control group were not found to be significantly different on either Part I, F(1,72) = 1.059, p = .307, or Part II, F(1,72) = 3.136, p = .081, of the test instrument. The number of times a strategy was used in each experimental classroom was determined. It was found that strategies were used almost twice as often in one classroom as in the other. A second set of ANCOVA analyses compared mean scores between Experimental Class A, Experimental Class B, and the control group. There was no significant difference between the groups on Part I, F(2,71) = .921, p = .403, but the difference between groups on Part II, F(2,71) = 5.769, p = .005, was significant.
A post hoc Scheffe analysis showed that the class using strategies most often differed significantly from the control group, p = .009, but the other class did not, p = .929. This study suggests that frequent use of the designed strategies was effective in helping Biology I students understand some aspects of the nature of science. It also suggests that minimal use of the strategies was not effective.
The Interaction with Disabled Persons scale: revisiting its internal consistency and factor structure, and examining item-level properties.

PubMed

Iacono, Teresa; Tracy, Jane; Keating, Jenny; Brown, Ted

2009-01-01

The Interaction with Disabled Persons scale (IDP) has been used in research into baseline attitudes and to evaluate whether a shift in attitudes towards people with developmental disabilities has occurred following some form of intervention. This research has been conducted on the assumption that the IDP measures attitudes as a multidimensional construct and has good internal consistency. Such assumptions about the IDP appear flawed, particularly in light of failures to replicate its underlying factor structure. The aim of this study was to evaluate the construct validity and dimensionality of the IDP. This study used a prospective survey approach. Participants were recruited from first and second year undergraduate university students enrolled in health sciences, occupational therapy, physiotherapy, community and emergency health, nursing, and combined degrees of nursing and midwifery, and health sciences and social work at a large Australian university (n=373). Students completed the IDP, a 20-item self-report scale of attitudes towards people with disabilities. The IDP data were analysed using a combination of factor analysis (Classical Test Theory approach) and Rasch analysis (Item Response Theory approach). The results indicated that the original IDP 6-factor solution was not supported. Instead, one factor consisting of five IDP items (9, 11, 12, 17, and 18) labelled Discomfort met the four criteria for empirical validation of test quality: interval level scaling (scalability), unidimensionality, lack of DIF across the two participant groups and data collection occasions, and hierarchical ordering. Researchers should consider using the Discomfort subscale of the IDP in future attitude research since it exhibits sound measurement properties.

Preschool children's interests in science

NASA Astrophysics Data System (ADS)

Coulson, R. I.

1991-12-01

Studies of children's attitudes towards science indicate that a tendency for girls and boys to have different patterns of interest in science is established by upper primary school level. It is not known when these interest patterns develop. This paper presents the results of part of a project designed to investigate preschool children's interests in science. Individual 4- to 5-year-old children were asked to say what they would prefer to do from each of a series of paired drawings showing either a science and a non-science activity, or activities from two different areas of science. Girls and boys were very similar in their overall patterns of choice for science and non-science items. Within science, the average number of physical science items chosen by boys was significantly greater than the average number chosen by girls (p=.026). Girls tended to choose more biology items than did boys, but this difference was not quite significant at the .05 level (p=.054). The temporal stability of these choices was explored.
A historical examination of the nature of science and its consensus as presented in the Benchmarks for Science Literacy and National Science Education Standards

NASA Astrophysics Data System (ADS)

Felske, Daniel D.

Developing a scientifically literate citizenry has fueled science education reforms for the past 40 years. A review of the literature reveals that definitions of scientific literacy during this period were greatly influenced by the goals, directions, and political agendas of the day. This approach has resulted in programs emphasizing certain aspects of scientific literacy while neglecting others. Additionally, consensus on what scientific literacy means or how to develop it has not been achieved. One aspect of scientific literacy that is agreed upon is the essential role that the nature of science (NOS) plays in its development. For this reason, an extensive review of the literature was conducted to develop a comprehensive background of this topic. The component structure of the NOS revealed in the literature was then synthesized into a NOS framework. The NOS framework served to guide the construction of a 21-item questionnaire taken from statements embedded in the consensual documents Benchmarks for Science Literacy (AAAS, 1993) and National Science Education Standards (NRC, 1996). A panel of five experts who have written extensively on the nature of science was then assembled and the degree of NOS consensus measured using a modified Delphi technique. The results of the survey indicated a high level of consensus (95%) at the ≥80% level. The panelists concurred positively on 19 of 21 NOS items, concurred negatively on one of 21 NOS items (item 10), and could not reach consensus on one of 21 NOS items (item 16). These findings, as well as the NOS framework, are important first steps toward developing programs that foster the development of scientific literacy.
Ukrainian Program for Material Science in Microgravity

NASA Astrophysics Data System (ADS)

Fedorov, Oleg

O.P. Fedorov, Space Research Institute of NASU-NSAU, Kyiv. The aim of the report is to present the previous and current approach of the Ukrainian research community to the prospects of materials science in microgravity. This approach is based on analysis of the Ukrainian program of research in microgravity, preparation of Russian-Ukrainian experiments on the Russian segment of the ISS, and development of a new Ukrainian strategy of space activity for the years 2010-2030. Two sets of issues are discussed: (i) the evolution of our views on the priorities in microgravity research and (ii) current experiments under preparation and important ground-based results. (i) The concept of "space industrialization" and relevant efforts in Soviet and post-Soviet Ukrainian research institutions are reviewed. The main topics are melt supercooling, crystal growing, testing of materials, electric welding, and study of the near-Earth environment. The anticipated and current results are compared. (ii) The main experiments in the framework of the Ukrainian-Russian Research Program for the Russian Segment of the ISS are reviewed. Flight installations under development and ground-based results of the experiments on directional solidification, heat pipes, tribological testing, and biocorrosion study are presented. Ground-based experiments and theoretical study of directional solidification of transparent alloys are reviewed, as well as preparation of the MORPHOS installation for study of succinonitrile-acetone in microgravity.
The Need to Introduce System Thinking in Teaching Climate Change

ERIC Educational Resources Information Center

Roychoudhury, Anita; Shepardson, Daniel P.; Hirsch, Andrew; Niyogi, Devdutta; Mehta, Jignesh; Top, Sara

2017-01-01

Research related to teaching climate change, system thinking, current reform in science education, and the research on reform-oriented assessment indicate that we need to explore student understanding in greater detail instead of only testing for an incremental gain in disciplinary knowledge. Using open-ended items we assessed details in student…
Faculty and Graduate Student PBL Experiences

ERIC Educational Resources Information Center

McDonald, Betty

2008-01-01

This paper examines similarities and differences in faculty and student perceptions to PBL training. Faculty at a newly formed university participated in a four day PBL [Problem-Based Learning] workshop. A cohort of MSc [Master of Science] Petroleum Engineering students were PBL trained. Results from the pre/post test using a 15 item dichotomous…
How an inquiry-based classroom lesson intervenes in science efficacy, career-orientation and self-determination

NASA Astrophysics Data System (ADS)

Schmid, S.; Bogner, F. X.

2017-11-01

Three subscales of the 'Science Motivation Questionnaire II' (SMQII; motivational components: career motivation, self-efficacy and self-determination), with 4 items each, were applied to a sample of 209 secondary school students to monitor the impact of a 3-hour structured inquiry lesson. Four testing points (before, immediately after, 6 and 12 weeks after) were applied. The modified SMQII was factor-analyzed at each testing cycle and the structure confirmed. Only self-determination was shown to be influenced by an inquiry course, while self-efficacy and career motivation were not. Only self-efficacy and career motivation were intercorrelated and also correlated with science subject grades and subsequent achievement. Implications for using the modified SMQII subscales for research and teaching in secondary school are discussed.
Voyager spacecraft electrostatic discharge testing

NASA Technical Reports Server (NTRS)

Whittlesey, A.; Inouye, G.

1980-01-01

The program of environmental testing undergone by the Voyager spacecraft in order to simulate the transient voltage effects of electrostatic discharges expected in the energetic plasma environment of Jupiter is reported. The testing consists of studies of the electrostatic discharge characteristics of spacecraft dielectrics in a vacuum-chamber-electron beam facility, brief piece part sensitivity tests on such items as a MOSFET multiplexer and the grounding of the thermal blanket, and assembly tests of the magnetometer boom and the science boom. In addition, testing of a complete spacecraft was performed using two arc sources to simulate long and short duration discharge sources for successive spacecraft shielding and grounding improvements. Due to the testing program, both Voyager 1 and Voyager 2 experienced tolerable electrostatic discharge-caused transient anomalies in science and engineering subsystems; however, a closer duplication of the spacecraft environment is necessary to predict and design actual spacecraft responses more accurately.
Learning Pathways in Environmental Science Education: The Case of Hazardous Household Items

ERIC Educational Resources Information Center

Malandrakis, George N.

2006-01-01

The present study draws on environmental science education to explore aspects of children's conceptual change regarding hazardous household items. Twelve children from a fifth-grade class attended a 300-h teaching module of environmentally oriented science activities aimed at assessing their awareness about the environmental and health hazards…
Diagnosing Conceptions about the Epistemology of Science: Contributions of a Quantitative Assessment Methodology

ERIC Educational Resources Information Center

Vázquez-Alonso, Ángel; Manassero-Mas, María-Antonia; García-Carmona, Antonio; Montesano de Talavera, Marisa

2016-01-01

This study applies a new quantitative methodological approach to diagnose epistemology conceptions in a large sample. The analyses use seven multiple-rating items on the epistemology of science drawn from the item pool Views on Science-Technology-Society (VOSTS). The bases of the new methodological diagnostic approach are the empirical…
Validation of science virtual test to assess 8th grade students' critical thinking on living things and environmental sustainability theme

NASA Astrophysics Data System (ADS)

Rusyati, Lilit; Firman, Harry

2017-05-01

This research was motivated by the importance of multiple-choice questions that reflect the elements and sub-elements of critical thinking, and by the implementation of computer-based tests. The study used a descriptive method to profile the validation of a science virtual test for measuring students' critical thinking in junior high school. The participants were 8th grade junior high school students (14 years old), with science teachers and experts serving as validators. The instruments used to capture the necessary data were an expert judgment sheet, a legibility test sheet, and a science virtual test package in multiple-choice form with four possible answers. There are four steps to validate the science virtual test to measure students' critical thinking on the theme of "Living Things and Environmental Sustainability" in 7th grade Junior High School. These steps are analysis of core competence and basic competence based on curriculum 2013, expert judgment, legibility test, and trial test (limited and large trial test). Based on the trial test, test items were classified as accepted, accepted but needing revision, or rejected. The reliability of the test is α = 0.747, which is categorized as 'high', meaning that the test instrument is reliable and highly consistent. The validity of Rxy = 0.63 means that the validity of the instrument was categorized as 'high' according to the interpretation value of Rxy (correlation).
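A reliability coefficient reported as α, as in the record above, is commonly Cronbach's alpha, although the abstract does not name the estimator. The Python sketch below shows how that coefficient is computed from a matrix of item scores, using α = k/(k-1) × (1 − Σ item variances / variance of total scores). The function name and the two small score matrices are hypothetical illustrations, not data from the study.

```python
from statistics import variance
from typing import List, Sequence


def cronbach_alpha(scores: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a score matrix (rows = examinees, columns = items).

    Uses sample variances throughout; assumes at least two items,
    at least two examinees, and non-zero variance of the total scores.
    """
    k = len(scores[0])  # number of items
    item_vars: List[float] = [
        variance([row[j] for row in scores]) for j in range(k)
    ]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical dichotomous scores (1 = correct, 0 = incorrect).
# Every item ranks examinees identically, so alpha is 1.
consistent = [[1, 1, 1], [0, 0, 0], [1, 1, 1], [0, 0, 0]]
# The two items are unrelated across examinees, so alpha is 0.
unrelated = [[1, 0], [0, 1], [1, 1], [0, 0]]

print(cronbach_alpha(consistent), cronbach_alpha(unrelated))
```

Values between these two extremes, such as the reported 0.747, indicate items that covary substantially but not perfectly.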
Fermilab Friends for Science Education Store

Science.gov Websites

Fermilab logo items, mugs, t-shirts, sweatshirts and posters for sale. The Fermilab Friends for Science Education makes this website available to you to obtain
The integrated learning management using the STEM education for improve learning achievement and creativity in the topic of force and motion at the 9th grade level

NASA Astrophysics Data System (ADS)

Kakarndee, Nampetch; Kudthalang, Nukool; Jansawang, Natchanok

2018-01-01

This study aimed to investigate and analyze the efficiency of the process performance and performance results (E1/E2) against a set criterion, in order to plan improvements to students' learning processes for scientific knowledge, such as carrying out investigations, gathering evidence, and proposing explanations. Instruction addressed students' need for clear and unambiguous science content, in an attempt to establish a national approach with the STEM education instructional method. The sample consisted of 40 secondary students in a 9th grade science class at Borabu School, selected by purposive sampling. Learning activities were managed using lesson plans based on the STEM education instructional innovation. Students' learning achievement was assessed with a 30-item test in a pretest-posttest design. Students' creative thinking abilities were assessed with the 3-item Creative Thinking Ability Test. The lesson plans based on the STEM education method reached an efficiency (E1/E2) of 78.95/76.58, above the threshold of 75/75. Students' pretest and posttest learning achievement with the STEM education instructional method differed significantly (p < .001), as did their scores on the 3-item Creative Thinking Ability Test, which assesses skill in originality, fluency, and flexibility of thinking (p < .001).
Science Teachers' Thinking About the Nature of Science: A New Methodological Approach to Its Assessment

NASA Astrophysics Data System (ADS)

Vázquez-Alonso, Ángel; García-Carmona, Antonio; Manassero-Mas, María Antonia; Bennàssar-Roig, Antoni

2013-04-01

This paper describes Spanish science teachers' thinking about issues concerning the nature of science (NOS) and the relationships connecting science, technology, and society (STS). The sample consisted of 774 in-service and pre-service teachers. The participants responded to a selection of items from the Questionnaire of Opinions on Science, Technology & Society in a multiple response model. These data were processed to generate the invariant indices that are used as the bases for subsequent quantitative and qualitative analyses. The overall results reflect moderately informed conceptions, and a detailed analysis by items, categories, and positions reveals a range of positive and negative conceptions about the topics of NOS dealt with in the questionnaire items. The implications of the findings for teaching and teacher training on the themes of NOS are discussed.
Assessing Student Outcomes of Undergraduate Research with URSSA, the Undergraduate Student Self-Assessment Instrument

NASA Astrophysics Data System (ADS)

Laursen, S. L.; Weston, T. J.; Thiry, H.

2012-12-01

URSSA is the Undergraduate Research Student Self-Assessment, an online survey instrument for programs and departments to use in assessing the student outcomes of undergraduate research (UR). URSSA focuses on what students learn from their UR experience, rather than whether they liked it. The online questionnaire includes both multiple-choice and open-ended items that focus on students' gains from undergraduate research. These gains include skills, knowledge, deeper understanding of the intellectual and practical work of science, growth in confidence, changes in identity, and career preparation. Other items probe students' participation in important research-related activities that lead to these gains (e.g. giving presentations, having responsibility for a project). These activities, and the gains themselves, are based in research and thus constitute a core set of items. Using these items as a group helps to align a particular program assessment with research-demonstrated outcomes. Optional items may be used to probe particular features that augment the research experience (e.g. field trips, career seminars, housing arrangements). The URSSA items are based on extensive, interview-based research and evaluation work on undergraduate research by our group and others. This grounding in research means that URSSA measures what we know to be important about the UR experience. The items were tested with students, revised and re-tested. Data from a large pilot sample of over 500 students enabled statistical testing of the items' validity and reliability. Optional items about UR program elements were developed in consultation with UR program developers and leaders. The resulting instrument is flexible. Users begin with a set of core items, then customize their survey with optional items to probe students' experiences of specific program elements. 
The online instrument is free and easy to use, with numeric results available as raw data, summary statistics, cross-tabs, and graphs, and as raw, downloadable data. Finally, URSSA has high content validity based on its research grounding and rigorous development. We will present examples of how URSSA has been used in evaluations of UR programs. A multi-year evaluation of a university-based UR program shows that URSSA items are sensitive to differences in students' prior level of experience with research. For example, experienced student researchers reported greater gains than did their peers new to UR in understanding the process of research and in coming to see themselves as scientists. These differences are consistent with interview data that suggest a developmental progression of gains as students pursue research and gain confidence in their ability to contribute meaningfully. A second example comes from a multi-site evaluation of sites funded by the National Science Foundation's Research Experience for Undergraduates (REU) program in Biology. This study acquired data from nearly 800 students at some 60 Bio REU sites in 2010 and 2011. Results reveal differences in gains among demographic groups, and the general strength of these well-planned programs relative to a comparison sample of UR programs that are not part of REU. Our presentation will demonstrate the evaluative use of URSSA and its potential applications to undergraduate research in the geosciences.
Investigating the Impact of Item Parameter Drift for Item Response Theory Models with Mixture Distributions.

PubMed

Park, Yoon Soo; Lee, Young-Sun; Xing, Kuan

2016-01-01

This study investigates the impact of item parameter drift (IPD) on parameter and ability estimation when the underlying measurement model fits a mixture distribution, thereby violating the item invariance property of unidimensional item response theory (IRT) models. An empirical study was conducted to demonstrate the occurrence of both IPD and an underlying mixture distribution using real-world data. Twenty-one trended anchor items from the 1999, 2003, and 2007 administrations of Trends in International Mathematics and Science Study (TIMSS) were analyzed using unidimensional and mixture IRT models. TIMSS treats trended anchor items as invariant over testing administrations and uses pre-calibrated item parameters based on unidimensional IRT. However, empirical results showed evidence of two latent subgroups with IPD. Results also showed changes in the distribution of examinee ability between latent classes over the three administrations. A simulation study was conducted to examine the impact of IPD on the estimation of ability and item parameters, when data have underlying mixture distributions. Simulations used data generated from a mixture IRT model and estimated using unidimensional IRT. Results showed that data reflecting IPD using mixture IRT model led to IPD in the unidimensional IRT model. Changes in the distribution of examinee ability also affected item parameters. Moreover, drift with respect to item discrimination and distribution of examinee ability affected estimates of examinee ability. These findings demonstrate the need to caution and evaluate IPD using a mixture IRT framework to understand its effects on item parameters and examinee ability.
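The effect of item parameter drift on a single item can be illustrated with a small calculation. The Python sketch below assumes a two-parameter logistic (2PL) item response function, P(θ) = 1 / (1 + exp(−a(θ − b))); the abstract mentions item discrimination but does not state which unidimensional IRT model was used, so the 2PL form, the function names, and the parameter values are all assumptions made for illustration.

```python
import math


def p_correct_2pl(theta: float, a: float, b: float) -> float:
    """Probability of a correct response under an assumed 2PL model.

    theta: examinee ability, a: item discrimination, b: item difficulty.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def drift_effect(theta: float, a: float, b: float, delta_b: float) -> float:
    """Change in P(correct) when the item's difficulty drifts by delta_b."""
    return p_correct_2pl(theta, a, b + delta_b) - p_correct_2pl(theta, a, b)


# Hypothetical anchor item: a = 1.0, b = 0.0, difficulty drifting upward by 0.5.
for theta in (-1.0, 0.0, 1.0):
    before = p_correct_2pl(theta, 1.0, 0.0)
    change = drift_effect(theta, 1.0, 0.0, 0.5)
    print(f"theta={theta:+.1f}  P before={before:.3f}  change={change:+.3f}")
```

If scoring keeps using the pre-calibrated difficulty while the item has in fact become harder, the lower observed success rate is attributed to examinee ability, which is one way drift can bias ability estimates.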
Investigating the Impact of Item Parameter Drift for Item Response Theory Models with Mixture Distributions

PubMed Central

Park, Yoon Soo; Lee, Young-Sun; Xing, Kuan

2016-01-01

This study investigates the impact of item parameter drift (IPD) on parameter and ability estimation when the underlying measurement model fits a mixture distribution, thereby violating the item invariance property of unidimensional item response theory (IRT) models. An empirical study was conducted to demonstrate the occurrence of both IPD and an underlying mixture distribution using real-world data. Twenty-one trended anchor items from the 1999, 2003, and 2007 administrations of Trends in International Mathematics and Science Study (TIMSS) were analyzed using unidimensional and mixture IRT models. TIMSS treats trended anchor items as invariant over testing administrations and uses pre-calibrated item parameters based on unidimensional IRT. However, empirical results showed evidence of two latent subgroups with IPD. Results also showed changes in the distribution of examinee ability between latent classes over the three administrations. A simulation study was conducted to examine the impact of IPD on the estimation of ability and item parameters, when data have underlying mixture distributions. Simulations used data generated from a mixture IRT model and estimated using unidimensional IRT. Results showed that data reflecting IPD using mixture IRT model led to IPD in the unidimensional IRT model. Changes in the distribution of examinee ability also affected item parameters. Moreover, drift with respect to item discrimination and distribution of examinee ability affected estimates of examinee ability. These findings demonstrate the need to caution and evaluate IPD using a mixture IRT framework to understand its effects on item parameters and examinee ability. PMID:26941699
A Method of Utilizing Small Astronomical Telescopes in Earth Science Instruction

NASA Astrophysics Data System (ADS)

Kim, Kyung-Im; Lee, Young Bom

1985-12-01

Four observational astronomical items have been pilot-tested with a 150 mm refracting telescope in order to lay out the detailed procedures for the suggested (inquiry) activities listed in the high school earth science curriculum and to devise adequate instructions for students, with emphasis on how to properly treat the collected materials. The tested items were the motion of sunspots, the size of lunar craters, the revolution of the Galilean satellites, and the galactic distribution of stars. The following series of activities is suggested with respect to collecting observational data and giving proper instruction to students in class: 1) photographs and other materials are made by the teacher and/or an extracurricular group of students; 2) replicas (xeroxed copies, photographs, or slides) are made from the collected materials so that they are available to all the students in class; 3) quantitative analyses are carried out as students' activities.
A Study of STEM Assessments in Engineering, Science, and Mathematics for Elementary and Middle School Students

ERIC Educational Resources Information Center

Harwell, Michael; Moreno, Mario; Phillips, Alison; Guzey, S. Selcen; Moore, Tamara J.; Roehrig, Gillian H.

2015-01-01

The purpose of this study was to develop, scale, and validate assessments in engineering, science, and mathematics with grade appropriate items that were sensitive to the curriculum developed by teachers. The use of item response theory to assess item functioning was a focus of the study. The work is part of a larger project focused on increasing…
Development of Teachers' Attitude Scale towards Science Fair

ERIC Educational Resources Information Center

Tortop, Hasan Said

2013-01-01

This study was conducted to develop a new scale for measuring teachers' attitude towards science fair. Teacher Attitude Scale towards Science Fair (TASSF) is an inventory made up of 19 items and five dimensions. The study included such stages as literature review, the preparation of the item pool and the reliability and validity analysis. First of…
A study of Korean students' creativity in science using structural equation modeling

NASA Astrophysics Data System (ADS)

Jo, Son Mi

Through the review of creativity research I have found that studies lack certain crucial parts: (a) a theoretical framework for the study of creativity in science, (b) studies considering the unique components related to scientific creativity, and (c) studies of the interactions among key components through simultaneous analyses. The primary purpose of this study is to explore the dynamic interactions among four components (scientific proficiency, intrinsic motivation, creative competence, context supporting creativity) related to scientific creativity under the framework of scientific creativity. A total of 295 Korean middle school students participated. Well-known and commonly used measurements were selected and developed. Two scientific achievement scores and one score measured by performance-based assessment were used to measure student scientific knowledge/inquiry skills. Six items selected from the study of Lederman, Abd-El-Khalick, Bell, and Schwartz (2002) were used to assess how well students understand the nature of science. Five items were selected from the subscale of the scientific attitude inventory version II (Moore & Foy, 1997) to assess student attitude toward science. The Test of Creative Thinking-Drawing Production (Urban & Jellen, 1996) was used to measure creative competence. Eight items chosen from the 15 items of the Work Preference Inventory (1994) were applied to measure students' intrinsic motivation. To assess the level of context supporting creativity, eight items were adapted from measurement of the work environment (Amabile, Conti, Coon, Lazenby, and Herron, 1996). To assess scientific creativity, one open-ended science problem was used and three raters rated the level of scientific creativity through the Consensual Assessment Technique (Amabile, 1996). The results show that scientific proficiency and creative competence correlates with scientific creativity. 
Intrinsic motivation and context components do not predict scientific creativity. The strengths of the relationships between scientific proficiency and scientific creativity (parameter estimate = 0.43) and between creative competence and scientific creativity (parameter estimate = 0.17) are similar [χ²(1) = 0.670, p > .05]. In a specific analysis of the structural model, I found that creative competence and scientific proficiency act as partial mediators among three components (general creativity, scientific proficiency, and scientific creativity). The moderating effects of intrinsic motivation and the context component were investigated, but no moderation effects were found.

Scientific literacy of adult participants in an online citizen science project

NASA Astrophysics Data System (ADS)

Price, Charles Aaron

Citizen Science projects offer opportunities for non-scientists to take part in scientific research. Scientific results from these projects have been well documented. However, there is limited research about how these projects affect their volunteer participants. In this study, I investigate how participation in an online, collaborative astronomical citizen science project can be associated with the scientific literacy of its participants. Scientific literacy is measured through three elements: attitude towards science, belief in the nature of science and competencies associated with learning science. The first two elements are measured through a pre-test given to 1,385 participants when they join the project and a post-test given six months later to 125 participants. Attitude towards science was measured using nine Likert-items custom designed for this project and beliefs in the nature of science were measured using a modified version of the Nature of Science Knowledge scale. Responses were analyzed using the Rasch Rating Scale Model. Competencies are measured through analysis of discourse occurring in online asynchronous discussion forums using the Community of Inquiry framework, which describes three types of presence in the online forums: cognitive, social and teaching. Results show that overall attitudes did not change, p = .225. However, there was significant change towards attitudes about science in the news (positive) and scientific self efficacy (negative), p < .001 and p = .035 respectively. Beliefs in the nature of science exhibited a small, but significant increase, p = .04. Relative positioning of scores on the belief items did not change much, suggesting the increase is mostly due to reinforcement of current beliefs. The cognitive and teaching presence in the online forums did not change, p = .807 and p = .505 respectively. However, the social presence did change, p = .011. 
Overall, these results suggest that multi-faceted, collaborative citizen science projects can have an impact on some aspects of scientific literacy. Using the Rasch Model allowed us to uncover effects that may have otherwise been hidden. Future projects may want to include social interactivity between participants and also make participants specifically aware of how they are contributing to the entire scientific process.
77 FR 19699 - Notice of Intent to Repatriate Cultural Items: Rochester Museum & Science Center, Rochester, NY

Federal Register 2010, 2011, 2012, 2013, 2014

2012-04-02

... Indian tribe, has determined that the cultural items meet the definition of both sacred objects and... Rochester Museum & Science Center that meet the definition of both sacred objects and [[Page 19700
Testing the Zimbardo Time Perspective Inventory in the Chinese context.

PubMed

Wang, Ya; Chen, Xing-Jie; Cui, Ji-Fang; Liu, Lu-Lu

2015-09-01

In this study, the authors evaluated the Chinese version of the Zimbardo Time Perspective Inventory (ZTPI). The ZTPI was tested among a sample of 303 university students. A subsample of 51 participants was then asked to complete the ZTPI again along with another set of questionnaires. The five-factor model of a 20-item short version of the ZTPI showed good model fit, internal consistency, and test-retest reliability. The 20-item Chinese version of the ZTPI also provided good validity, showing correlations with other variables in expected directions. Past-Positive was positively correlated with reappraisal and negatively correlated with suppression emotion regulation strategies, and Present-Hedonistic was positively correlated with reappraisal emotion regulation strategies. These findings indicate that the ZTPI is a reliable and valid instrument for measuring time perspective in the Chinese setting. © 2015 The Institute of Psychology, Chinese Academy of Sciences and Wiley Publishing Asia Pty Ltd.
Maximum Likelihood Item Easiness Models for Test Theory without an Answer Key

ERIC Educational Resources Information Center

France, Stephen L.; Batchelder, William H.

2015-01-01

Cultural consensus theory (CCT) is a data aggregation technique with many applications in the social and behavioral sciences. We describe the intuition and theory behind a set of CCT models for continuous type data using maximum likelihood inference methodology. We describe how bias parameters can be incorporated into these models. We introduce…
Thai Grade 11 Students' Alternative Conceptions for Acid-Base Chemistry

ERIC Educational Resources Information Center

Artdej, Romklao; Ratanaroutai, Thasaneeya; Coll, Richard Kevin; Thongpanchang, Tienthong

2010-01-01

This study involved the development of a two-tier diagnostic instrument to assess Thai high school students' understanding of acid-base chemistry. The acid-base diagnostic test (ABDT) comprising 18 items was administered to 55 Grade 11 students in a science and mathematics programme during the second semester of the 2008 academic year. Analysis of…
Trends in TIMSS Responses over Time: Evidence of Global Forces in Education?

ERIC Educational Resources Information Center

Rutkowski, Leslie; Rutkowski, David

2009-01-01

In this article, the influence of global processes on international mathematics curricula as evidenced by item responses to 3 Trends in International Mathematics and Science Study (TIMSS) administrations (1995, 1999, and 2003) is considered. Based on Dale's (2000) argument, we set out to test 2 plausible impacts of global processes on education.…
Understanding the Reading Attributes and Their Cognitive Relationships on a High-Stakes Biology Assessment

NASA Astrophysics Data System (ADS)

Rawlusyk, Kevin James

Test items used to assess learners' knowledge on high-stakes science examinations contain contextualized questions that unintentionally assess reading skill along with conceptual knowledge. Therefore, students who are not proficient readers are unable to comprehend the text within the test item to demonstrate effectively their level of science knowledge. The purpose of this quantitative study was to understand what reading attributes were required to successfully answer the Biology 30 Diploma Exam. Furthermore, the research sought to understand the cognitive relationships among the reading attributes through quantitative analysis structured by the Attribute Hierarchy Model (AHM). The research consisted of two phases: (1) Cognitive development, where the cognitive attributes of the Biology 30 Exam were specified and hierarchy structures were developed; and (2) Psychometric analysis, which statistically tested the attribute hierarchy using the Hierarchy Consistency Index (HCI) and calculated attribute probabilities. Phase one of the research used the January 2011 Biology 30 Diploma Exam, while phase two accessed archival data for the 9985 examinees who took the assessment on January 24th, 2011. Phase one identified ten specific reading attributes, of which five were identified as unique subsets of vocabulary, two were identified as reading visual representations, and three corresponded to general reading skills. Four hierarchical cognitive models were proposed and then analyzed using the HCI as a mechanism to explain the relationship among the attributes. Model A had the highest HCI value (0.337), indicating an overall poor data fit, yet for the top achieving examinees the model had an excellent model fit with an HCI value of 0.888, and for examinees that scored over 60% there was a moderate model fit (HCI = 0.592). 
Linear regressions of the attribute probability estimates suggest that there is a cognitive relationship among six of the ten reading attributes (R² = 0.958 and 0.922). The results indicate that the Biology 30 Diploma Exam requires examinees to understand specific reading attributes to answer test items successfully. Knowing the specific reading attributes associated with the Biology 30 Diploma Exam allows teachers and test developers to better assess learners and to be aware that cognitive processes other than the examinees' science knowledge influence test results.
Emotional intelligence in medical laboratory science

NASA Astrophysics Data System (ADS)

Price, Travis

The purpose of this study was to explore the role of emotional intelligence (EI) in medical laboratory science, as perceived by laboratory administrators. To collect and evaluate these perceptions, a survey was developed and distributed to over 1,400 medical laboratory administrators throughout the U.S. during January and February of 2013. In addition to demographic-based questions, the survey contained a list of 16 items, three skills traditionally considered important for successful work in the medical laboratory as well as 13 EI-related items. Laboratory administrators were asked to rate each item for its importance for job performance, their satisfaction with the item's demonstration among currently working medical laboratory scientists (MLS) and the amount of responsibility college-based medical laboratory science programs should assume for the development of each skill or attribute. Participants were also asked about EI training in their laboratories and were given the opportunity to express any thoughts or opinions about EI as it related to medical laboratory science. This study revealed that each EI item, as well as each of the three other items, was considered to be very or extremely important for successful job performance. Administrators conveyed that they were satisfied overall, but indicated room for improvement in all areas, especially those related to EI. Those surveyed emphasized that medical laboratory science programs should continue to carry the bulk of the responsibility for the development of technical skills and theoretical knowledge and expressed support for increased attention to EI concepts at the individual, laboratory, and program levels.
Practice-Based Measures of Elementary Science Teachers' Content Knowledge for Teaching: Initial Item Development and Validity Evidence. Research Report. ETS RR-17-43

ERIC Educational Resources Information Center

Mikeska, Jamie N.; Phelps, Geoffrey; Croft, Andrew J.

2017-01-01

This report describes efforts by a group of science teachers, teacher educators, researchers, and content specialists to conceptualize, develop, and pilot practice-based assessment items designed to measure elementary science teachers' content knowledge for teaching (CKT). The report documents the framework used to specify the content-specific…
The development and validation of the Instructional Practices Log in Science: a measure of K-5 science instruction

NASA Astrophysics Data System (ADS)

Adams, Elizabeth L.; Carrier, Sarah J.; Minogue, James; Porter, Stephen R.; McEachin, Andrew; Walkowiak, Temple A.; Zulli, Rebecca A.

2017-02-01

The Instructional Practices Log in Science (IPL-S) is a daily teacher log developed for K-5 teachers to self-report their science instruction. The items on the IPL-S are grouped into scales measuring five dimensions of science instruction: Low-level Sense-making, High-level Sense-making, Communication, Integrated Practices, and Basic Practices. As part of the current validation study, 206 elementary teachers completed 4137 daily log entries. The purpose of this paper is to provide evidence of validity for the IPL-S's scales, including (a) support for the theoretical framework; (b) cognitive interviews with logging teachers; (c) item descriptive statistics; (d) comparisons of 28 pairs of teacher and rater logs; and (e) an examination of the internal structure of the IPL-S. We present evidence to describe the extent to which the items and the scales are completed accurately by teachers and differentiate various types of science instructional strategies employed by teachers. Finally, we point to several practical implications of our work and potential uses for the IPL-S. Overall, results provide neutral to positive support for the validity of the groupings of items or scales.
The Profiles in Science Digital Library: Behind the Scenes.

PubMed

Gallagher, Marie E; Moffatt, Christie

2012-01-01

This demonstration shows the Profiles in Science ® digital library. Profiles in Science contains digitized selections from the personal manuscript collections of prominent biomedical researchers, medical practitioners, and those fostering science and health. The Profiles in Science Web site is the delivery mechanism for content derived from the digital library system. The system is designed according to our basic principles for digital library development [1]. The digital library includes the rules and software used for digitizing items, creating and editing database records and performing quality control as well as serving the digital content to the public. Among the types of data managed by the digital library are detailed item-level, collection-level and cross-collection metadata, digitized photographs, papers, audio clips, movies, born-digital electronic files, optical character recognized (OCR) text, and annotations (see Figure 1). The digital library also tracks the status of each item, including digitization quality, sensitivity of content, and copyright. Only items satisfying all required criteria are released to the public through the World Wide Web. External factors have influenced all aspects of the digital library's infrastructure.
Applications of Derandomization Theory in Coding

NASA Astrophysics Data System (ADS)

Cheraghchi, Mahdi

2011-07-01

Randomized techniques play a fundamental role in theoretical computer science and discrete mathematics, in particular for the design of efficient algorithms and construction of combinatorial objects. The basic goal in derandomization theory is to eliminate or reduce the need for randomness in such randomized constructions. In this thesis, we explore some applications of the fundamental notions in derandomization theory to problems outside the core of theoretical computer science, and in particular, certain problems related to coding theory. First, we consider the wiretap channel problem, which involves a communication system in which an intruder can eavesdrop on a limited portion of the transmissions, and construct efficient and information-theoretically optimal communication protocols for this model. Then we consider the combinatorial group testing problem. In this classical problem, one aims to determine a set of defective items within a large population by asking a number of queries, where each query reveals whether a defective item is present within a specified group of items. We use randomness condensers to explicitly construct optimal, or nearly optimal, group testing schemes for a setting where the query outcomes can be highly unreliable, as well as the threshold model where a query returns positive if the number of defectives passes a certain threshold. Finally, we design ensembles of error-correcting codes that achieve the information-theoretic capacity of a large class of communication channels, and then use the obtained ensembles for construction of explicit capacity achieving codes. [This is a shortened version of the actual abstract in the thesis.]
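To make the query model in the abstract above concrete, here is a minimal Python sketch of noiseless group testing by adaptive binary splitting. This is not the thesis's condenser-based construction (which handles unreliable outcomes and the threshold model); it only illustrates what "each query reveals whether a defective item is present within a specified group" means. The names `make_oracle` and `find_defectives` and the sample population are hypothetical.

```python
from typing import Callable, List, Sequence, Set, Tuple


def make_oracle(defectives: Set[int]) -> Callable[[Sequence[int]], bool]:
    """Return a noiseless query: True if the group contains any defective item."""
    def query(group: Sequence[int]) -> bool:
        return any(item in defectives for item in group)
    return query


def find_defectives(
    items: Sequence[int], query: Callable[[Sequence[int]], bool]
) -> Tuple[List[int], int]:
    """Identify all defectives by recursively halving positive groups.

    Returns the sorted defectives and the number of queries used.
    """
    found: List[int] = []
    queries = 0
    stack: List[List[int]] = [list(items)]
    while stack:
        group = stack.pop()
        if not group:
            continue
        queries += 1
        if not query(group):
            continue  # no defective in this group, discard it whole
        if len(group) == 1:
            found.append(group[0])  # a positive singleton is a defective
            continue
        mid = len(group) // 2
        stack.append(group[mid:])
        stack.append(group[:mid])
    return sorted(found), queries


if __name__ == "__main__":
    oracle = make_oracle({3, 41})  # hypothetical defective set
    defectives, used = find_defectives(range(64), oracle)
    print(defectives, "found with", used, "queries")
```

When defectives are rare, the number of queries grows roughly with the number of defectives times the logarithm of the population size, which is why pooled queries can beat testing every item individually.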
Clinical learning environments (actual and expected): perceptions of Iran University of Medical Sciences nursing students

PubMed Central

Bigdeli, Shoaleh; Pakpour, Vahid; Aalaa, Maryam; Shekarabi, Robabeh; Sanjari, Mahnaz; Haghani, Hamid; Mehrdad, Neda

2015-01-01

Background: The educational clinical environment has an important role in nursing students' learning. Any difference between actual and expected clinical environment will decrease nursing students' interest in clinical environments and has a negative correlation with their clinical performance. Methods: This descriptive cross-sectional study is an attempt to compare nursing students' perception of the actual and expected status of clinical environments in medical-surgical wards. Participants of the study were 127 bachelor nursing students of Iran University of Medical Sciences in the internship period. Data gathering instruments were a demographic questionnaire (including sex, age, and grade point average), and the Clinical Learning Environment Inventory (CLEI) originally developed by Professor Chan (2001), of which the modified Farsi version (Actual and Preferred forms), consisting of 42 items in 6 scales with 7 items per scale, was used. Descriptive and inferential statistics (t-test, paired t-test, ANOVA) were used for data analysis through SPSS version 16. Results: The results indicated that there were significant differences between the preferred and actual form in all six scales. In other words, compared with the actual form, the mean scores of all items in the preferred form were higher. The maximum mean difference was in innovation and the highest mean difference was in involvement scale. Conclusion: It is concluded that nursing students do not have a positive perception of their actual clinical teaching environment and this perception is significantly different from their perception of their expected environment. PMID:26034726
Effect of individual thinking styles on item selection during study time allocation.

PubMed

Jia, Xiaoyu; Li, Weijian; Cao, Liren; Li, Ping; Shi, Meiling; Wang, Jingjing; Cao, Wei; Li, Xinyu

2018-04-01

The influence of individual differences on learners' study time allocation has been emphasised in recent studies; however, little is known about the role of individual thinking styles (analytical versus intuitive). In the present study, we explored the influence of individual thinking styles on learners' application of agenda-based and habitual processes when selecting the first item during a study-time allocation task. A 3-item cognitive reflection test (CRT) was used to determine individuals' degree of cognitive reliance on intuitive versus analytical cognitive processing. Significant correlations between CRT scores and the choices of first item selection were observed in both Experiment 1a (study time was 5 seconds per triplet) and Experiment 1b (study time was 20 seconds per triplet). Furthermore, analytical decision makers constructed a value-based agenda (prioritised high-reward items), whereas intuitive decision makers relied more upon habitual responding (selected items from the leftmost of the array). The findings of Experiment 1a were replicated in Experiment 2 even after ruling out possible effects of individual intelligence and working memory capacity. Overall, individual thinking style plays an important role in learners' study time allocation, and the CRT reliably predicts learners' item selection strategy. © 2016 International Union of Psychological Science.
Opportunity integrated assessment facilitating critical thinking and science process skills measurement on acid base matter

NASA Astrophysics Data System (ADS)

Sari, Anggi Ristiyana Puspita; Suyanta, LFX, Endang Widjajanti; Rohaeti, Eli

2017-05-01

Given the importance of developing critical thinking and science process skills, an assessment instrument should attend to the characteristics of chemistry. Therefore, constructing an accurate instrument for measuring those skills is important. However, integrated assessment instruments are few in number. The purpose of this study is to validate an integrated assessment instrument for measuring students' critical thinking and science process skills on acid base matter. The development model for the test instrument was adapted from the McIntire model. The sample consisted of 392 second grade high school students in the academic year of 2015/2016 in Yogyakarta. Exploratory Factor Analysis (EFA) was conducted to explore construct validity, whereas content validity was substantiated by Aiken's formula. The results show a KMO value of 0.714, which indicates sufficient items for each factor, and a significant Bartlett test (a significance value of less than 0.05). Furthermore, the content validity coefficient, based on ratings from 8 experts, was 0.85. The findings support the integrated assessment instrument to measure critical thinking and science process skills on acid base matter.
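As a rough illustration of the content validity coefficient mentioned in the abstract above, the sketch below computes Aiken's V for a single item as V = Σ(r − lo) / (n(c − 1)), where r is each expert's rating, lo is the lowest scale category, c is the number of categories, and n is the number of experts. The 1 to 5 rating scale and the eight ratings are assumptions made for illustration only; they are not the study's data.

```python
from typing import Sequence


def aikens_v(ratings: Sequence[int], lowest: int, highest: int) -> float:
    """Aiken's V for one item rated by several experts.

    V = sum(r - lowest) / (n * (c - 1)), with c = number of scale categories.
    V ranges from 0 (all experts chose the lowest category) to 1 (all highest).
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r < lowest or r > highest for r in ratings):
        raise ValueError("rating outside the scale")
    n = len(ratings)
    categories = highest - lowest + 1
    return sum(r - lowest for r in ratings) / (n * (categories - 1))


if __name__ == "__main__":
    # Hypothetical ratings from 8 experts on an assumed 1-5 relevance scale.
    expert_ratings = [5, 4, 5, 4, 4, 5, 4, 4]
    print(round(aikens_v(expert_ratings, lowest=1, highest=5), 3))
```

An instrument-level coefficient is often reported by averaging the per-item values, although the abstract does not say how its 0.85 figure was aggregated.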
Item Feature Effects in Evolution Assessment

ERIC Educational Resources Information Center

Nehm, Ross H.; Ha, Minsu

2011-01-01

Despite concerted efforts by science educators to understand patterns of evolutionary reasoning in science students and teachers, the vast majority of evolution education studies have failed to carefully consider or control for item feature effects in knowledge measurement. Our study explores whether robust contextualization patterns emerge within…
The influence of classroom experiences on community college students' self-efficacy, attitude, and future intentions

NASA Astrophysics Data System (ADS)

Dawkins, Linda Mulderig

Science and technology are an integral part of everyday life. Therefore it is necessary that the general population have some understanding and appreciation for science. Participating in activities that are science-related is one way a person could enhance their understanding and appreciation for science. According to the Theory of Planned Behavior (TPB), the attitude and self-efficacy beliefs a person holds regarding an object or activity will influence behavioral intentions (Ajzen, 1991). Therefore, if science educators can have a positive influence on their students' attitude and sense of efficacy toward science, perhaps the result will be a populace who willingly participates in science-related activities, ultimately gaining a better understanding and appreciation for science. The present study examined the relationships between the classroom environment students experienced during a ten week period of introductory chemistry and their attitudes toward chemistry (and general science), chemistry self-efficacy, and intentions to participate in chemistry-related activities in the future. The participants of this study (N = 189) were Midwestern community college students enrolled in an introductory chemistry course. The efficacy scale of the Chemistry Attitude and Experiences Questionnaire (CAEQ) developed by Dalgety, Coll, and Jones (2003) was used to measure student chemistry self-efficacy. The attitude scale used in this study consisted of the attitude toward chemistry items of CAEQ and five additional items pertaining to general science attitude. The classroom environment scale was defined by two measures: (1) instructional pedagogies and (2) teacher immediacy behaviors. 
The items within the instructional pedagogies and teacher immediacy measures were based on previous research that focused on identifying teaching techniques and teacher attributes that were conducive to promoting an engaging, supportive classroom environment that would promote better attitude toward science and stronger science self-efficacy beliefs. Exploratory factor analysis of the attitude items revealed that students did not differentiate between general science attitude and chemistry attitude. Therefore, all twenty-six attitude items were combined into one attitude measure. Additionally, factor analysis revealed that the items designed to measure the separate dimensions of instructional pedagogies and teacher immediacy behavior both loaded highly on the same factor, resulting in the combining of these two sets of items into one measure of classroom environment. Structural equation modeling (SEM) analyses of the relationships between student perceptions of the classroom environment and their attitude, efficacy and intentions to participate in chemistry-related activities revealed that a positive classroom environment was associated with positive changes in both attitude toward chemistry/science and chemistry self-efficacy, as hypothesized. These analyses also supported the hypothesis that a positive change in chemistry self-efficacy beliefs mediated student intentions to participate in chemistry-related activities. However, the findings did not support the hypothesis that positive changes in attitude toward chemistry/science would mediate participation in chemistry-related activities.
[Development of an Instrument to Assess the Quality of Childbirth Care from the Mother's Perspective].

PubMed

Jeong, Geum Hee; Kim, Hyun Kyoung; Kim, Young Hee; Kim, Sun Hee; Lee, Sun Hee; Kim, Kyung Won

2018-02-01

This study aimed to develop an instrument to assess the quality of childbirth care from the perspective of a mother after delivery. The instrument was developed from a literature review, interviews, and item validation. Thirty-eight items were compiled for the instrument. The data for validity and reliability testing were collected using a questionnaire survey conducted on 270 women who had undergone normal vaginal delivery in Korea and analyzed with descriptive statistics, exploratory factor analysis, and reliability coefficients. The exploratory factor analysis reduced the number of items in the instrument to 28 items that were factored into four subscales: family-centered care, personal care, emotional empowerment, and information provision. With respect to convergence validation, there was positive correlation between this instrument and birth satisfaction scale (r=.34, p<.001). The internal consistency reliability was acceptable (Cronbach's alpha =.96). This instrument could be used as a measure of the quality of nursing care for women who have a normal vaginal delivery. © 2018 Korean Society of Nursing Science.
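For context on the internal consistency figure reported above, the following sketch computes Cronbach's alpha as α = k/(k − 1) · (1 − Σ item variances / variance of total scores), using only the Python standard library. The small response matrix is invented for illustration and has nothing to do with the study's 270 respondents or 28 items.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(responses: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items score matrix.

    Uses sample variances (n - 1 denominator) for both items and totals.
    """
    n_items = len(responses[0])
    if n_items < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    if any(len(row) != n_items for row in responses):
        raise ValueError("every respondent must answer every item")
    # Variance of each item (column) across respondents.
    item_variances = [variance(column) for column in zip(*responses)]
    # Variance of each respondent's summed score.
    total_variance = variance([sum(row) for row in responses])
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


if __name__ == "__main__":
    # Hypothetical data: 5 respondents answering 3 items on a 1-5 scale.
    scores = [
        [2, 3, 3],
        [4, 4, 5],
        [1, 2, 2],
        [5, 5, 4],
        [3, 3, 4],
    ]
    print(round(cronbach_alpha(scores), 3))
```

Alpha rises both with stronger inter-item correlations and with the number of items, so a high value on a long scale does not by itself show that the scale is unidimensional.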
Textbooks vs. techbooks: Effectiveness of digital textbooks on elementary student motivation for learning

NASA Astrophysics Data System (ADS)

Oman, Auna

This action research project investigated fourth grade students' motivation to learn science using a digital science techbook. Participants in the study included 29 fourth grade students in two different classrooms. One classroom of 16 students used a digital science techbook to learn science while the other classroom of 13 students used a traditional paper science textbook to learn science. Students in both classrooms answered five sets of questions regarding their experience using a digital science techbook and a paper science textbook to understand science, find science information, solve science problems, learn science, and assess whether learning science was fun. Results were compiled and coded based on positive and negative responses to conditions. A chi-square test was used to analyze the ordinal data. Overall differences between techbooks vs. textbook were significant, χ²(1, N = 29) = 23.84, p = .000, justifying further examination of individual survey items. Three items had statistically significant differences: finding science information, solving science problems, and learning science. A gender difference was also found in one item. Females preferred to use paper science textbooks to understand science, while males preferred digital techbooks to learn science. The fourth graders in this study indicated that digital techbooks were a powerful learning tool for increasing interest, excitement and learning science. Even though students reported paper science textbooks as easy to use, they found using digital science techbooks a far more appealing way to learn science.
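The kind of test reported above can be sketched as a Pearson chi-square on a 2×2 table of positive and negative responses by condition. The counts below are hypothetical (the abstract does not give the underlying table), no continuity correction is applied, and the p-value shortcut via `math.erfc` is valid only for one degree of freedom.

```python
import math
from typing import Sequence, Tuple


def chi_square_2x2(table: Sequence[Sequence[int]]) -> Tuple[float, float]:
    """Pearson chi-square statistic and p-value for a 2x2 contingency table."""
    if len(table) != 2 or any(len(row) != 2 for row in table):
        raise ValueError("expected a 2x2 table")
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    # With df = 1 the statistic is a squared standard normal,
    # so the upper-tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


if __name__ == "__main__":
    # Hypothetical counts: rows = techbook / textbook classroom,
    # columns = positive / negative responses.
    observed = [[14, 2], [3, 10]]
    stat, p = chi_square_2x2(observed)
    print(f"chi2(1, N = {sum(map(sum, observed))}) = {stat:.2f}, p = {p:.4f}")
```

With cell counts this small, an exact test such as Fisher's is often preferred; the sketch is meant only to show how the reported statistic is formed.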
A teaching intervention for reading laboratory experiments in college-level introductory chemistry

NASA Astrophysics Data System (ADS)

Kirk, Maria Kristine

The purpose of this study was to determine the effects that a pre-laboratory guide, conceptualized as a "scientific story grammar," has on college chemistry students' learning when they read an introductory chemistry laboratory manual and perform the experiments in the chemistry laboratory. The participants (N = 56) were students enrolled in four existing general chemistry laboratory sections taught by two instructors at a women's liberal arts college. The pre-laboratory guide consisted of eight questions about the experiment, including the purpose, chemical species, variables, chemical method, procedure, and hypothesis. The effects of the intervention were compared with those of the traditional pre-laboratory assignment for the eight chemistry experiments. Measures included quizzes, tests, chemistry achievement test, science process skills test, laboratory reports, laboratory average, and semester grade. The covariates were mathematical aptitude and prior knowledge of chemistry and science processes, on which the groups differed significantly. The study captured students' perceptions of their experience in general chemistry through a survey and interviews with eight students. The only significant differences in the treatment group's performance were in some subscores on lecture items and laboratory items on the quizzes. An apparent induction period was noted, in that significant measures occurred in mid-semester. Voluntary study with the pre-laboratory guide by control students precluded significant differences on measures given later in the semester. The groups' responses to the survey were similar. Significant instructor effects on three survey items were corroborated by the interviews. The researcher's students were more positive about their pre-laboratory tasks, enjoyed the laboratory sessions more, and were more confident about doing chemistry experiments than the laboratory instructor's groups due to differences in scaffolding by the instructors.

The development and validation of a three-tier diagnostic test measuring pre-service elementary education and secondary science teachers' understanding of the water cycle

NASA Astrophysics Data System (ADS)

Schaffer, Dannah Lynn

The main goal of this research study was to develop and validate a three-tier diagnostic test to determine pre-service teachers' (PSTs) conceptual knowledge of the water cycle. For a three-tier diagnostic test, the first tier assesses content knowledge; in the second tier, a reason is selected for the content answer; and the third tier allows test-takers to select how confident they are in their answers for the first two tiers. The second goal of this study was to diagnose any alternative conceptions PSTs might have about the water cycle. The Water Cycle Diagnostic Test (WCDT) was developed using the theoretical framework by Treagust (1986, 1988, and 1995), which was also used in similar studies that developed diagnostic tests (e.g., Caleon & Subramaniam, 2010a; Odom & Barrow, 2007; Pesman & Eryilmaz, 2010). The final instrument consisted of 15 items along with a demographic survey that examined PSTs' weather-related experiences that may or may not have affected the PSTs' understanding of the water cycle. The WCDT was administered to 77 PSTs enrolled in science methods courses during the fall of 2012. Among the 77 participants, 37 of the PSTs were enrolled in elementary education (EPST) and 40 in secondary science (SPST). Using exploratory factor analysis, five categories were factored out for the WCDT: Phase Change of Water; Condensation and Storage; Clouds; Global Climate Change; and Movement through the Water Cycle. Analysis of the PSTs' responses demonstrated acceptable reliability (alpha = 0.62) for the instrument, and acceptable difficulty indices and discrimination indices for 12 of the items. Analysis indicated that the majority of the PSTs had a limited understanding of the water cycle. Of the PSTs sampled, SPSTs were significantly more confident in their answers on the WCDT than the EPSTs. 
Completion of an undergraduate atmospheric science and/or meteorology course, as well as a higher interest in listening and/or viewing weather-related programs, resulted in PSTs having greater understanding and confidence in their answers on the WCDT. The analysis of the PSTs' responses revealed 49 potential alternative conceptions and areas where PSTs' lack of knowledge was revealed from the WCDT.
Equating TIMSS Mathematics Subtests with Nonlinear Equating Methods Using NEAT Design: Circle-Arc Equating Approaches

ERIC Educational Resources Information Center

Ozdemir, Burhanettin

2017-01-01

The purpose of this study is to equate Trends in International Mathematics and Science Study (TIMSS) mathematics subtest scores obtained from TIMSS 2011 to scores obtained from TIMSS 2007 form with different nonlinear observed score equating methods under Non-Equivalent Anchor Test (NEAT) design where common items are used to link two or more test…
A Cognitive Diagnostic Modeling of Attribute Mastery in Massachusetts, Minnesota, and the U.S. National Sample Using the TIMSS 2007

ERIC Educational Resources Information Center

Lee, Young-Sun; Park, Yoon Soo; Taylan, Didem

2011-01-01

Studies of international mathematics achievement such as the Trends in International Mathematics and Science Study (TIMSS) have employed classical test theory and item response theory to rank individuals within a latent ability continuum. Although these approaches have provided insights into comparisons between countries, they have yet to examine how specific…
Rotation Criteria and Hypothesis Testing for Exploratory Factor Analysis: Implications for Factor Pattern Loadings and Interfactor Correlations

ERIC Educational Resources Information Center

Schmitt, Thomas A.; Sass, Daniel A.

2011-01-01

Exploratory factor analysis (EFA) has long been used in the social sciences to depict the relationships between variables/items and latent traits. Researchers face many choices when using EFA, including the choice of rotation criterion, which can be difficult given that few research articles have discussed and/or demonstrated their differences.…
High-School Students' Epistemic Knowledge of Science and Its Relation to Learner Factors in Science Learning

NASA Astrophysics Data System (ADS)

Yang, Fang-Ying; Liu, Shiang-Yao; Hsu, Chung-Yuan; Chiou, Guo-Li; Wu, Hsin-Kai; Wu, Ying-Tien; Chen, Sufen; Liang, Jyh-Chong; Tsai, Meng-Jung; Lee, Silvia W.-Y.; Lee, Min-Hsien; Lin, Che-Li; Chu, Regina Juchun; Tsai, Chin-Chung

2017-04-01

The purpose of this study was to develop and validate an online contextualized test for assessing students' understanding of epistemic knowledge of science. In addition, how students' understanding of epistemic knowledge of science interacts with learner factors, including time spent on science learning, interest, self-efficacy, and gender, was also explored. The participants were 489 senior high school students (244 males and 245 females) from eight different schools in Taiwan. Based on the result of an extensive literature review, we first identified six factors of epistemic knowledge of science, such as status of scientific knowledge, the nature of scientific enterprise, measurement in science, and so on. An online test was then created for assessing students' understanding of the epistemic knowledge of science. Also, a learner-factor survey was developed by adopting previous PISA survey items to measure the abovementioned learner factors. The results of this study show that: (1) by factor analysis, the six factors of epistemic knowledge of science could be grouped into two dimensions which reflect the nature of scientific knowledge and knowing in science, respectively; (2) there was a gender difference in the participants' understanding of the epistemic knowledge of science; and (3) students' interest in science learning and the time spent on science learning were positively correlated to their understanding of the epistemic knowledge of science.
The effect of participation in an extended inquiry project on general chemistry student laboratory interactions, confidence, and process skills

NASA Astrophysics Data System (ADS)

Krystyniak, Rebecca A.

2001-12-01

This study explored the effect of participation by second-semester general chemistry students in an extended open-inquiry laboratory investigation on their use of science process skills and confidence in performing specific aspects of laboratory investigations. In addition, verbal interactions of a student lab team among team members and with their instructor over three open-inquiry laboratory sessions and two non-inquiry sessions were investigated. Instruments included the Test of Integrated Process Skills (TIPS), a 36-item multiple-choice instrument, and the Chemistry Laboratory Survey (CLS), a researcher co-designed 20-item 8-point instrument. Instruments were administered at the beginning and close of the semester to 157 second-semester general chemistry students at two universities; students at only one university participated in open-inquiry activity. A MANCOVA was performed to investigate relationships among control and experimental students, TIPS, and CLS post-test scores. Covariates were TIPS and CLS pre-test scores and prior high school and college science experience. No significant relationships were found. Wilcoxon analyses indicated both groups showed an increase in confidence; experimental-group students with below-average TIPS pre-test scores showed a significant increase in science process skills. Transcribed audio tapes of all laboratory-based verbal interactions were analyzed. Coding categories, developed using the constant comparison method, led to an inter-rater reliability of .96. During open-inquiry activities, the lab team interacted less often, sought less guidance from their instructor, and talked less about chemistry concepts than during non-inquiry activities. Evidence confirmed that students used science process skills and engaged in higher-order thinking during both types of activities. 
A four-student focus group shared their experiences with open-inquiry activities, indicating that they enjoyed the experience, viewed it as worthwhile, and believed it helped them gain understanding of the nature of chemistry research. Research results indicate that participation in open-inquiry laboratory increases student confidence and, for some students, the ability to use science process skills. Evidence documents differences in student laboratory interactions and behavior that are attributable to the type of laboratory experience. Further research into aspects of open-inquiry laboratory experiences is recommended.
WFPC2 Science Capability Report

NASA Astrophysics Data System (ADS)

Brown, David I.

2001-01-01

In the following pages, a brief outline of the salient science features of Wide Field/Planetary Camera 2 (WFPC2) that impact the proposal writing process and conceptual planning of observations is presented. At the time of writing, WFPC2, while having been better defined than in the past, is far from being at the stage where science and engineering details are well enough known that concrete observational/operational sequences can be planned with assurance. Conceptual issues are another matter. The thrust of the Science Capability Report at this time is to outline the known performance parameters and capabilities of WFPC2, filling in with specifications when necessary to hold a place for these items as they become known. Also, primary scientific and operational differences between WFPC 1 and 2 are discussed section-by-section, along with issues that remain to be determined and idiosyncrasies when known. Clearly the determination of the latter awaits some form of testing, most likely thermal/vacuum testing. All data in this report should be viewed with a jaundiced eye at this time.
Program on Public Conceptions of Science, Newsletter 10.

ERIC Educational Resources Information Center

Blanpied, William A., Ed.; Shelanski, Vivien, Ed.

This newsletter is divided into six sections: an introduction; general news items and communications from readers; news items and communications more specifically in the ethical and human values areas; an annotated, selective checklist of imaginative literature concerning the relationship between science, technology and human values; and a general…
Analysing task design and students' responses to context-based problems through different analytical frameworks

NASA Astrophysics Data System (ADS)

Broman, Karolina; Bernholt, Sascha; Parchmann, Ilka

2015-05-01

Background: Context-based learning approaches are used to enhance students' interest in, and knowledge about, science. According to different empirical studies, students' interest is improved by applying these more non-conventional approaches, while effects on learning outcomes are less coherent. Hence, further insights are needed into the structure of context-based problems in comparison to traditional problems, and into students' problem-solving strategies. Therefore, a suitable framework is necessary, both for the analysis of tasks and strategies. Purpose: The aim of this paper is to explore traditional and context-based tasks as well as students' responses to exemplary tasks to identify a suitable framework for future design and analyses of context-based problems. The paper discusses different established frameworks and applies the Higher-Order Cognitive Skills/Lower-Order Cognitive Skills (HOCS/LOCS) taxonomy and the Model of Hierarchical Complexity in Chemistry (MHC-C) to analyse traditional tasks and students' responses. Sample: Upper secondary students (n=236) at the Natural Science Programme, i.e. possible future scientists, are investigated to explore learning outcomes when they solve chemistry tasks, both more conventional as well as context-based chemistry problems. Design and methods: A typical chemistry examination test has been analysed, first the test items in themselves (n=36), and thereafter 236 students' responses to one representative context-based problem. Content analysis using HOCS/LOCS and MHC-C frameworks has been applied to analyse both quantitative and qualitative data, allowing us to describe different problem-solving strategies. Results: The empirical results show that both frameworks are suitable to identify students' strategies, mainly focusing on recall of memorized facts when solving chemistry test items. Almost all test items were also assessing lower order thinking. 
The combination of frameworks with the chemistry syllabus has been found successful to analyse both the test items as well as students' responses in a systematic way. The framework can therefore be applied in the design of new tasks, the analysis and assessment of students' responses, and as a tool for teachers to scaffold students in their problem-solving process. Conclusions: This paper gives implications for practice and for future research to both develop new context-based problems in a structured way, as well as providing analytical tools for investigating students' higher order thinking in their responses to these tasks.
Primary and Secondary School Science.

ERIC Educational Resources Information Center

Educational Documentation and Information, 1984

1984-01-01

This 344-item annotated bibliography presents an overview of science teaching in the following categories: science education; primary school science; integrated science teaching; teaching of biology, chemistry, physics, earth/space science; laboratory work; computer technology; out-of-school science; science and society; science education at…
Improving Factor Score Estimation Through the Use of Observed Background Characteristics

PubMed Central

Curran, Patrick J.; Cole, Veronica; Bauer, Daniel J.; Hussong, Andrea M.; Gottfredson, Nisha

2016-01-01

A challenge facing nearly all studies in the psychological sciences is how to best combine multiple items into a valid and reliable score to be used in subsequent modelling. The most ubiquitous method is to compute a mean of items, but more contemporary approaches use various forms of latent score estimation. Regardless of approach, outside of large-scale testing applications, scoring models rarely include background characteristics to improve score quality. The current paper used a Monte Carlo simulation design to study score quality for different psychometric models that did and did not include covariates across levels of sample size, number of items, and degree of measurement invariance. The inclusion of covariates improved score quality for nearly all design factors, and in no case did the covariates degrade score quality relative to not considering the influences at all. Results suggest that the inclusion of observed covariates can improve factor score estimation. PMID:28757790
Genethics: project accountability via evaluation of teacher and student growth.

PubMed

Hendrix, J R; Mertens, T R

1992-10-01

Accountability through demonstrated learning is increasingly being demanded by agencies funding science education projects. For example, the National Science Foundation requires evidence of the educational impact of programs designed to increase the scientific understanding and competencies of teachers and their students. The purpose of this paper is to share our human genetics educational experiences and accountability model with colleagues interested in serving the genetics educational needs of in-service secondary school science teachers and their students. Our accountability model is facilitated through (1) identifying the educational needs of the population of teachers to be served, (2) articulating goals and measurable objectives to meet these needs, and (3) then designing and implementing pretest/posttest questions to measure whether the objectives have been achieved. Comparison of entry and exit levels of performance on a 50-item test showed that teacher-participants learned a statistically significant amount of genetics content in our NSF-funded workshops. Teachers, in turn, administered a 25-item pretest/posttest to their secondary school students, and collective data from 121 classrooms across the United States revealed statistically significant increases in student knowledge of genetics content. Methods describing our attempts to evaluate teachers' use of pedagogical techniques and bioethical decision-making skills are briefly addressed.
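The pretest/posttest comparison described above can be illustrated with a paired t statistic. This is only a sketch: the abstract does not say which significance test the authors used, so the paired t-test is an assumption here, and the scores below are made-up placeholders rather than data from the project.

```python
import math
from statistics import mean, stdev

def paired_t_statistic(pre, post):
    """t statistic for the mean gain (post - pre) of matched participants."""
    gains = [b - a for a, b in zip(pre, post)]
    n = len(gains)
    standard_error = stdev(gains) / math.sqrt(n)  # sample SD of the gains
    return mean(gains) / standard_error

# Hypothetical scores on a 50-item test for five teacher-participants.
pre_scores = [28, 31, 25, 35, 30]
post_scores = [36, 38, 33, 40, 39]
t = paired_t_statistic(pre_scores, post_scores)
print(f"t = {t:.2f} with {len(pre_scores) - 1} degrees of freedom")
```

The statistic would then be compared with a t distribution on n - 1 degrees of freedom to judge significance.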
Developing, testing, and implementing a survey of scientist mentoring teachers as part of an RET: The GABI RET mentor survey.

NASA Astrophysics Data System (ADS)

Davey, B.

2017-12-01

The impacts of mentoring in education have been well established. Mentors have a large impact on their mentees and have been shown to affect mentee attitudes towards learning, interest in subjects, future success, and more. While mentoring has a well-documented impact on the mentees, mentoring also has an impact on the mentors themselves. However, little has been studied empirically about these impacts. When we looked for a validated instrument that measured the impact of mentoring on the scientists working with the teachers, we found many anecdotal reports but no instruments that met our specific needs. To this end, we developed, tested, and implemented our own instrument for measuring the impacts of mentoring on our scientist mentors. Our instrument contained both quantitative and qualitative items designed to reveal the effects of mentoring in two areas: 1) cognitive domain (mentoring, teaching, understanding K-12) and 2) affective domain (professional, personal, participation). We first shared our survey with experts in survey development and mentoring, gathered their feedback, and incorporated their suggestions into our instrument. We then had a subsection of our mentors complete the survey and then complete it again three to four days later (test-retest). Our survey has a high correlation for the test-retest quantitative items (0.93) and a high correlation (0.90) between the three reviewers of the qualitative items. From our findings, we feel we have a validated instrument (face, content, and construct validity) that answers our research questions reliably. Our contribution to the study of mentoring of science teachers reveals a broad range of impacts on the mentors themselves including an improved understanding of the challenges of classroom teaching, a recognition of the importance of scientists working with science teachers, an enhanced ability to communicate their research and findings, and an increased interest and excitement for their own work.
Science Shorts: Sort It out

ERIC Educational Resources Information Center

Adams, Barbara

2007-01-01

Many children enjoy collecting items such as seashells, state quarters, and trading cards. Asking students to think about the ways in which similar items differ, how objects can be grouped by a common characteristic, and how groups can be subsets of a larger category leads to an understanding of fundamental mathematics and science concepts: sets,…
An analysis of high school students' perceptions and academic performance in laboratory experiences

NASA Astrophysics Data System (ADS)

Mirchin, Robert Douglas

This research study is an investigation of student-laboratory (i.e., lab) learning based on students' perceptions of experiences using questionnaire data and evidence of their science-laboratory performance based on paper-and-pencil assessments using Maryland-mandated criteria, Montgomery County Public Schools (MCPS) criteria, and published laboratory questions. A 20-item questionnaire consisting of 18 Likert-scale items and 2 open-ended items that addressed what students liked most and least about lab was administered to students before labs were observed. A pre-test and post-test assessing laboratory achievement were administered before and after the laboratory experiences. The three labs observed were: soda distillation, stoichiometry, and separation of a mixture. Five significant results or correlations were found. For soda distillation, there were two positive correlations. Student preference for analyzing data was positively correlated with achievement on the data analysis dimension of the lab rubric. A student preference for using numbers and graphs to analyze data was positively correlated with achievement on the analysis dimension of the lab rubric. For the separation of a mixture lab, the following correlations were significant. Student preference for doing chemistry labs where numbers and graphs were used to analyze data had a positive correlation with writing a correctly worded hypothesis. Student responses that lab experiences help them learn science positively correlated with achievement on the data dimension of the lab rubric. The only negative correlation found related to the first result where students' preference for computers was inversely correlated to their performance on analyzing data on their lab report. Other findings included the following: students like actual experimental work most and the write-up and analysis of a lab the least. 
It is recommended that lab science instruction be inquiry-based, hands-on, and that students be tested for lab content acquisition. The final conclusion of the study is that students expressed a preference for working in groups and working with materials and equipment as opposed to individual, non-group work and analyzing data.
[Brazilian psychosocial and operational research vis-à-vis the UNGASS targets].

PubMed

Bastos, Francisco Inácio; Hacker, Mariana A

2006-04-01

Items from the UNGASS Draft Declaration of Commitment on HIV/AIDS (2001) are analyzed. The Brazilian experience of new methods for testing and counseling among vulnerable populations, preventive methods controlled by women, prevention, psychosocial support for people living with HIV/AIDS, and mother-child transmission, is discussed. These items were put into operation in the form of keywords, in systematic searches within the standard biomedicine databases, also including the subdivisions of the Web of Science relating to natural and social sciences. The Brazilian experience relating to testing and counseling strategies has been consolidated through the utilization of algorithms aimed at estimating incidence rates and identifying recently infected individuals, testing and counseling for pregnant women, and application of quick tests. The introduction of alternative methods and new technologies for collecting data from vulnerable populations has been allowing speedy monitoring of the epidemic. Psychosocial support assessments for people living with HIV/AIDS have gained impetus in Brazil, probably as a result of increased survival and quality of life among these individuals. Substantial advances in controlling mother-child transmission have been observed. This is one of the most important victories within the field of HIV/AIDS in Brazil, but deficiencies in prenatal care still constitute a challenge. With regard to prevention methods for women, Brazil has only shown a halting response. Widespread implementation of new technologies for data gathering and management depends on investments in infrastructure and professional skills acquisition.
Selecting Items for Criterion-Referenced Tests.

ERIC Educational Resources Information Center

Mellenbergh, Gideon J.; van der Linden, Wim J.

1982-01-01

Three item selection methods for criterion-referenced tests are examined: the classical theory of item difficulty and item-test correlation; the latent trait theory of item characteristic curves; and a decision-theoretic approach for optimal item selection. Item contribution to the standardized expected utility of mastery testing is discussed. (CM)
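The first of the three methods, classical item difficulty and item-test correlation, can be sketched as follows. This is a minimal illustration only; the response matrix and function names are hypothetical, and the latent trait and decision-theoretic methods discussed in the record are not shown.

```python
from statistics import mean, pstdev

def pearson(x, y):
    """Pearson correlation computed from population moments."""
    mx, my = mean(x), mean(y)
    covariance = mean((a - mx) * (b - my) for a, b in zip(x, y))
    return covariance / (pstdev(x) * pstdev(y))

def item_difficulty(responses, item):
    """Proportion of examinees who answered the item correctly."""
    return mean(row[item] for row in responses)

def item_test_correlation(responses, item):
    """Correlation between scores on one item and total test scores."""
    item_scores = [row[item] for row in responses]
    total_scores = [sum(row) for row in responses]
    return pearson(item_scores, total_scores)

# Hypothetical 0/1 responses: rows are examinees, columns are items.
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 0, 1],
]
for j in range(len(responses[0])):
    p = item_difficulty(responses, j)
    r = item_test_correlation(responses, j)
    print(f"item {j}: difficulty = {p:.2f}, item-test r = {r:.2f}")
```

Under the classical approach, items would be kept or dropped by comparing these two indices against thresholds chosen by the test developer; an item that every examinee answers identically has no variance and cannot be correlated with the total.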
Developing Science Virtual Test to Measure Students’ Critical Thinking on Living Things and Environmental Sustainability Theme

NASA Astrophysics Data System (ADS)

Akbar, M. N.; Firman, H.; Rusyati, L.

2017-02-01

Critical thinking is the skill and ability to use risk taking, creativity, and knowledge to make a decision, drawing on analysis, synthesis, evaluation, information acquisition and search, and the development of thinking, as an individual aware of his or her own thinking. The aim of this study is to develop a science virtual test to measure students’ critical thinking on the living things and environmental sustainability theme. The research method used was descriptive research. The development of the science virtual test items consists of five steps: (1) content analysis; (2) constructing the instrument (multiple choice) based on the elements of critical thinking by Inch; (3) validity judgment of the instrument by an expert; (4) legibility test of the instrument; (5) conducting the large field test. The large field test yielded the validity and reliability of the test, the difficulty index, the discriminating power, and the quality of the distractors. The subjects of the research were 125 8th grade students at International Junior High School in Bandung. The coefficient alpha (α) was 0.747, so the reliability of the test was categorized as ‘high’, and the value of the RXY correlation was 0.63, which means that the validity of the test was categorized as ‘high’. This means that the science virtual test can be used to measure students’ critical thinking with good consistency. Other researchers are expected to take this description as basic information to be considered in developing science virtual tests for improving students’ critical thinking on various kinds of topics.
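The coefficient alpha reported above is conventionally computed from item variances and the variance of total scores. The sketch below uses the standard Cronbach's alpha formula; the score matrix is a made-up placeholder, and the record does not say what software or variance convention the authors used.

```python
from statistics import pvariance

def cronbach_alpha(scores):
    """Cronbach's alpha for a matrix with one row per respondent, one column per item."""
    k = len(scores[0])
    item_variances = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_variance = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)

# Hypothetical 0/1 scores for five students on four multiple-choice items.
scores = [
    [1, 1, 1, 0],
    [1, 0, 1, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 0],
]
print(f"alpha = {cronbach_alpha(scores):.3f}")
```

Alpha approaches 1 when items rise and fall together across respondents and drops toward 0 when they are unrelated.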
Do Biology Students Really Hate Math? Empirical Insights into Undergraduate Life Science Majors’ Emotions about Mathematics

PubMed Central

Wachsmuth, Lucas P.; Runyon, Christopher R.; Drake, John M.; Dolan, Erin L.

2017-01-01

Undergraduate life science majors are reputed to have negative emotions toward mathematics, yet little empirical evidence supports this. We sought to compare emotions of majors in the life sciences versus other natural sciences and math. We adapted the Attitudes toward the Subject of Chemistry Inventory to create an Attitudes toward the Subject of Mathematics Inventory (ASMI). We collected data from 359 science and math majors at two research universities and conducted a series of statistical tests that indicated that four ASMI items comprised a reasonable measure of students’ emotional satisfaction with math. We then compared life science and non–life science majors and found that major had a small to moderate relationship with students’ responses. Gender also had a small relationship with students’ responses, while students’ race, ethnicity, and year in school had no observable relationship. Using latent profile analysis, we identified three groups: students who were emotionally satisfied with math, emotionally dissatisfied with math, and neutral. These results and the emotional satisfaction with math scale should be useful for identifying differences in other undergraduate populations, determining the malleability of undergraduates’ emotional satisfaction with math, and testing effects of interventions aimed at improving life science majors’ attitudes toward math. PMID:28798211
The Australian Medical Schools Assessment Collaboration: benchmarking the preclinical performance of medical students.

PubMed

O'Mara, Deborah A; Canny, Ben J; Rothnie, Imogene P; Wilson, Ian G; Barnard, John; Davies, Llewelyn

2015-02-02

To report the level of participation of medical schools in the Australian Medical Schools Assessment Collaboration (AMSAC); and to measure differences in student performance related to medical school characteristics and implementation methods. Retrospective analysis of data using the Rasch statistical model to correct for missing data and variability in item difficulty. Linear model analysis of variance was used to assess differences in student performance. 6401 preclinical students from 13 medical schools that participated in AMSAC from 2011 to 2013. Rasch estimates of preclinical basic and clinical science knowledge. Representation of Australian medical schools and students in AMSAC more than doubled between 2009 and 2013. In 2013 it included 12 of 19 medical schools and 68% of medical students. Graduate-entry students scored higher than students entering straight from school. Students at large schools scored higher than students at small schools. Although the significance level was high (P < 0.001), the main effect sizes were small (4.5% and 2.3%, respectively). The time allowed per multiple choice question was not significantly associated with student performance. The effect on performance of multiple assessments compared with the test items as part of a single end-of-year examination was negligible. The variables investigated explain only 12% of the total variation in student performance. An increasing number of medical schools are participating in AMSAC to monitor student performance in preclinical sciences against an external benchmark. Medical school characteristics account for only a small part of overall variation in student performance. Student performance was not affected by the different methods of administering test items.
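The Rasch model mentioned above gives the probability of a correct answer as a logistic function of the difference between student ability and item difficulty, which is why it can accommodate missing responses and items of varying difficulty. The sketch below assumes item difficulties are already known and estimates one student's ability by Newton-Raphson maximum likelihood; the function names and values are hypothetical, and the abstract does not describe the estimation procedure AMSAC actually used.

```python
import math

def rasch_probability(theta, b):
    """P(correct) for ability theta on an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def estimate_ability(responses, difficulties, iterations=25):
    """Maximum-likelihood ability estimate; None marks an item not administered.

    The estimate is undefined when all answered items are correct or all
    are incorrect, so this sketch expects a mixed response pattern.
    """
    answered = [(x, b) for x, b in zip(responses, difficulties) if x is not None]
    theta = 0.0
    for _ in range(iterations):
        probs = [rasch_probability(theta, b) for _, b in answered]
        gradient = sum(x - p for (x, _), p in zip(answered, probs))
        information = sum(p * (1 - p) for p in probs)
        theta += gradient / information  # Newton-Raphson step
    return theta

# Hypothetical item difficulties (logits) and one student's responses.
difficulties = [-1.0, 0.0, 1.0]
print(estimate_ability([1, 1, 0], difficulties))
print(estimate_ability([1, None, 0], difficulties))  # middle item skipped
```

Because only answered items enter the likelihood, students who sat different subsets of items can still be placed on a common scale.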

Medical students' perceptions of their learning environment during a mandatory research project.

PubMed

Möller, Riitta; Ponzer, Sari; Shoshan, Maria

2017-10-20

To explore medical students' perceptions of their learning environment during a mandatory 20-week scientific research project. This cross-sectional study was conducted between 2011 and 2013. A total of 651 medical students were asked to fill in the Clinical Learning Environment, Supervision, and Nurse Teacher (CLES+T) questionnaire, and 439 (mean age 26 years, range 21-40, 60% females) returned the questionnaire, which corresponds to a response rate of 67%. The Mann-Whitney U test or the Kruskal-Wallis test was used to compare the research environments. The item "My workplace can be regarded as a good learning environment" correlated strongly with the item "There were sufficient meaningful learning situations" (r=0.71, p<0.001). Overall satisfaction with supervision correlated strongly with the items interaction (r=0.78, p<0.001), feedback (r=0.76, p<0.001), and a sense of trust (r=0.71, p<0.001). Supervisors' failures to bridge the gap between theory and practice or to explain intended learning outcomes were important negative factors. Students with basic science or epidemiological projects rated their learning environments higher than did students with clinical projects (χ²(3, N=437) = 20.29, p<0.001). A good research environment for medical students comprises multiple meaningful learning activities, individual supervision with continuous feedback, and a trustful atmosphere including interactions with the whole staff. Students should be advised that clinical projects might require a higher degree of student independence than basic science projects, which are usually performed in research groups where members work in close collaboration.
Fighting for physics and Earth science in Florida's high schools

NASA Astrophysics Data System (ADS)

Cottle, Paul

2009-11-01

During its Spring 2009 session, the Florida Legislature considered a bill that would have suspended its comprehensive standardized test in high school science and substituted an end-of-course test in biology to satisfy the requirements of the No Child Left Behind (NCLB) Act. By doing so, the bill would have further deemphasized high school physics and Earth science in a state where physics courses are sometimes not available in high schools (even in International Baccalaureate programs) and where the state's own statistics say that only 16% of high school graduates have taken a physics course. A group of about one hundred science faculty from thirteen colleges and universities in Florida responded with a letter to Governor Crist and visits to legislators asking that the biology-only provisions be defeated (and they were). The group has now produced a white paper on high school science requirements that has been distributed to government and business leaders and been publicized via op-ed pieces and news items in several media outlets statewide. This poster will describe the situation in Florida and the faculty group's efforts. It will also compare Florida's high school requirements in science with those in the other SESAPS states.
Higher-Order Item Response Models for Hierarchical Latent Traits

ERIC Educational Resources Information Center

Huang, Hung-Yu; Wang, Wen-Chung; Chen, Po-Hsi; Su, Chi-Ming

2013-01-01

Many latent traits in the human sciences have a hierarchical structure. This study aimed to develop a new class of higher order item response theory models for hierarchical latent traits that are flexible in accommodating both dichotomous and polytomous items, to estimate both item and person parameters jointly, to allow users to specify…
Spacelab mission development tests

NASA Technical Reports Server (NTRS)

Dalton, B. P.

1978-01-01

The paper describes Spacelab Mission Development Test III (SMD III) whose principal scientific objective was to demonstrate the feasibility of conducting biological research in the Life Sciences Spacelab. The test also provided an opportunity to try out several items of Common Operational Research Equipment (CORE) hardware being developed for operational use in Shuttle/Spacelab, such as rodent and primate handling, transportation units, and a 'zero-g' surgical bench. Operational concepts planned for Spacelab were subjected to evaluation, including animal handling procedures, animal logistics, crew selection and training, and a 'remote' ground station concept. It is noted that all the objectives originally proposed for SMD III were accomplished.
A preliminary study investigating the factors influencing STEM major selection by African American females

NASA Astrophysics Data System (ADS)

Ray, Tiffany Monique

The purpose of this study was to investigate the significant factors influencing STEM major selection by African American females. A quantitative research design with a qualitative component was employed. Ex post facto survey research was conducted utilizing an online questionnaire to collect data from participants. African American undergraduate females that had declared a major in STEM comprised the target population for the study. As a basis for comparison, a second data collection ensued. All non-African American undergraduate females majoring in STEM also received the survey instrument to determine if there was a significant difference between factors that influence STEM major selection between the two groups. The Social Cognitive Career Choice Model comprised the conceptual framework for this study. Frequencies and percentages illustrated the demographic characteristics of the sample, as well as the average influence levels of each of the items without regard for level of significance. The researcher conducted an independent samples t-test to compare the mean scores for undergraduate African American females majoring in STEM and non-African American females majoring in STEM on each influential factor on the survey instrument. The researcher coded responses to open-ended questions to generate themes and descriptions. The data showed that African American female respondents were very influenced by the following items: specific interest in the subject, type of work, availability of career opportunities after graduation, parent/guardian, precollege coursework in science, and introductory college courses. In addition, the majority of respondents were very influenced by each of the confidence factors. African American females were overwhelmingly not influenced by aptitude tests. 
African American females were more influenced than their non-African American female counterparts for the following factors: reputation of the university, college or department, high level of compensation in fields, religious leaders, precollege coursework in mathematics, confidence in mathematics ability, confidence in ability to be successful in mathematics in college, confidence in science ability, and confidence in ability to be successful in science in college. Non-African American females were more influenced than African American females by the precollege coursework in technology and the precollege STEM experience factors. Four themes emerged regarding the items that most influenced success in STEM for African American females: high level of compensation in the field, parents/legal guardians and family members, specific interest in the subject, and confidence in science and math ability. One theme emerged regarding the items that least influenced success in STEM majors for African American females: personal interactions with individuals excluding family members.
High School Class for Gifted Pupils in Physics and Sciences and Pupils' Skills Measured by Standard and Pisa Test

NASA Astrophysics Data System (ADS)

Djordjevic, G. S.; Pavlovic-Babic, D.

2010-01-01

The "High school class for students with special abilities in physics" was founded in Nis, Serbia (www.pmf.ni.ac.yu/f_odeljenje) in 2003. The basic aim of this project has been to introduce a broadened curriculum of physics, mathematics, computer science, as well as chemistry and biology. Now, six years after the establishment of this specialized class, and 3 years after the previous report, we present analyses of the pupils' skills in solving a rather problem-oriented test, such as the PISA test, and compare their results with the results of pupils who study under standard curricula. More precisely, results are compared to the progress results of the pupils in a standard Grammar School and the corresponding classes of the Mathematical Gymnasiums in Nis. Analysis of achievement data should clarify the benefits of introducing a track for gifted students into the school system. Additionally, item analysis helps in understanding and improving the efficacy of learning strategies. We make some conclusions and remarks that may be useful for future work that aims to increase pupils' intrinsic and instrumental motivation for physics and sciences, as well as to increase the efficacy of teaching physics and science.
Applying Item Response Theory Methods to Design a Learning Progression-Based Science Assessment

ERIC Educational Resources Information Center

Chen, Jing

2012-01-01

Learning progressions are used to describe how students' understanding of a topic progresses over time and to classify the progress of students into steps or levels. This study applies Item Response Theory (IRT) based methods to investigate how to design learning progression-based science assessments. The research questions of this study are: (1)…
77 FR 19698 - Notice of Intent to Repatriate Cultural Items: Rochester Museum & Science Center, Rochester, NY

Federal Register 2010, 2011, 2012, 2013, 2014

2012-04-02

... Indian tribe, has determined that the cultural items meet the definition of both sacred objects and... Rochester Museum & Science Center that meet the definition of both sacred objects and objects of cultural.... Traditional religious leaders of the Seneca Nation of New York have identified these medicine faces as being...
The book availability study as an objective measure of performance in a health sciences library.

PubMed Central

Kolner, S J; Welch, E C

1985-01-01

In its search for an objective overall diagnostic evaluation, the University of Illinois Library of the Health Sciences' Program Evaluation Committee selected a book availability measure; it is easy to administer and repeat, results are reproducible, and comparable data exist for other academic and health sciences libraries. The study followed the standard methodology in the literature with minor modifications. Patrons searching for particular books were asked to record item(s) needed and the outcome of the search. Library staff members then determined the reasons for failures in obtaining desired items. The results of the study are five performance scores. The first four represent the percentage probability of a library's operating with ideal effectiveness; the last provides an overall performance score. The scores of the Library of the Health Sciences demonstrated no unusual availability problems. The study was easy to implement and provided meaningful, quantitative, and objective data. PMID:3995202
Spatial abilities, Earth science conceptual understanding, and psychological gender of university non-science majors

NASA Astrophysics Data System (ADS)

Black, Alice A. (Jill)

Research has shown the presence of many Earth science misconceptions and conceptual difficulties that may impede concept understanding, and has also identified a number of categories of spatial ability. Although spatial ability has been linked to high performance in science, some researchers believe it has been overlooked in traditional education. Evidence exists that spatial ability can be improved. This correlational study investigated the relationship among Earth science conceptual understanding, three types of spatial ability, and psychological gender, a self-classification that reflects socially-accepted personality and gender traits. A test of Earth science concept understanding, the Earth Science Concepts (ESC) test, was developed and field tested from 2001 to 2003 in 15 sections of university classes. Criterion validity was .60, significant at the .01 level. Spearman/Brown reliability was .74 and Kuder/Richardson reliability was .63. The Purdue Visualization of Rotations (PVOR) (mental rotation), the Group Embedded Figures Test (GEFT) (spatial perception), the Differential Aptitude Test: Space Relations (DAT) (spatial visualization), and the Bem Inventory (BI) (psychological gender) were administered to 97 non-major university students enrolled in undergraduate science classes. Spearman correlations revealed moderately significant correlations at the .01 level between ESC scores and each of the three spatial ability test scores. Stepwise regression analysis indicated that PVOR scores were the best predictor of ESC scores, and showed that spatial ability scores accounted for 27% of the total variation in ESC scores. Spatial test scores were moderately or weakly correlated with each other. No significant correlations were found among BI scores and other test scores. Scantron difficulty analysis of ESC items produced difficulty ratings ranging from 33.04 to 96.43, indicating the percentage of students who answered incorrectly. 
Mean score on the ESC was 34%, indicating that the non-majors tested exhibited many Earth science misconceptions and conceptual difficulties. A number of significant results were found when independent t-tests and correlations were conducted among test scores and demographic variables. The number of previous university Earth science courses was significantly related to ESC scores. Preservice elementary/middle majors differed significantly in several ways from other non-majors, and several earlier results were not supported. Results of this study indicate that an important opportunity may exist to improve Earth science conceptual understanding by focusing on spatial ability, a cognitive ability that has heretofore not been directly addressed in schools.
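The two reliability coefficients reported for the ESC test follow standard formulas: Spearman-Brown steps a half-test correlation up to full test length, and Kuder-Richardson 20 (KR-20) uses item pass rates and total-score variance for dichotomous items. The sketch below shows both; the response matrix is a made-up placeholder, and the abstract does not say how the test halves were formed.

```python
from statistics import mean, pvariance

def spearman_brown(half_test_r):
    """Full-length reliability predicted from the correlation between two test halves."""
    return 2 * half_test_r / (1 + half_test_r)

def kr20(responses):
    """Kuder-Richardson 20 for 0/1 item scores (rows: students, columns: items)."""
    k = len(responses[0])
    pass_rates = [mean(row[j] for row in responses) for j in range(k)]
    sum_pq = sum(p * (1 - p) for p in pass_rates)
    total_variance = pvariance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum_pq / total_variance)

# Hypothetical 0/1 responses for five students on four items.
responses = [
    [1, 1, 1, 0],
    [1, 0, 1, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 0],
]
print(f"KR-20 = {kr20(responses):.3f}")
print(f"Spearman-Brown from r_half = 0.50: {spearman_brown(0.50):.3f}")
```

The two coefficients can differ for the same data, as they do in the record, because split-half reliability depends on how the items are divided while KR-20 averages over all possible splits.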
A Comparison of Alternate-Choice and True-False Item Forms Used in Classroom Examinations.

ERIC Educational Resources Information Center

Maihoff, N. A.; Mehrens, Wm. A.

A comparison is presented of alternate-choice and true-false item forms used in an undergraduate natural science course. The alternate-choice item is a modified two-choice multiple-choice item in which the two responses are included within the question stem. This study (1) compared the difficulty level, discrimination level, reliability, and…
An Item Gains and Losses Analysis of False Memories Suggests Critical Items Receive More Item-Specific Processing than List Items

ERIC Educational Resources Information Center

Burns, Daniel J.; Martens, Nicholas J.; Bertoni, Alicia A.; Sweeney, Emily J.; Lividini, Michelle D.

2006-01-01

In a repeated testing paradigm, list items receiving item-specific processing are more likely to be recovered across successive tests (item gains), whereas items receiving relational processing are likely to be forgotten progressively less on successive tests. Moreover, analysis of cumulative-recall curves has shown that item-specific processing…
Unidimensional IRT Item Parameter Estimates across Equivalent Test Forms with Confounding Specifications within Dimensions

ERIC Educational Resources Information Center

Matlock, Ki Lynn; Turner, Ronna

2016-01-01

When constructing multiple test forms, the number of items and the total test difficulty are often equivalent. Not all test developers match the number of items and/or average item difficulty within subcontent areas. In this simulation study, six test forms were constructed having an equal number of items and average item difficulty overall.…
Developmental performance of 5-year-old Bulgarian children: An example of translational neuroscience in practice.

PubMed

Yordanova, Ralitsa; Ivanov, Ivan

2018-04-25

Developmental testing is essential for early recognition of the various developmental impairments. The tools used should be composed of items that are age specific, adapted, and standardized for the population they are applied to. The achievements of neurosciences, medicine, psychology, pedagogy, etc. are applied in the elaboration of a comprehensive examination tool that should screen all major areas of development. The key age of 5 years permits identification of almost all major developmental disabilities, leaving time for therapeutic intervention before school entrance. The aim of the research is to evaluate the developmental performance of 5-year-old Bulgarian children using the approach of translational neuroscience. A comprehensive test program was developed composed of 89 items grouped in the following domains: fine and gross motor development, coordination and balance, central motor neuron disturbances, language development and articulation, perception, attention and behavior, visual acuity, and strabismus. The overall sample comprises 434 children of mean age 63.5 months (SD 3.7). The male to female ratio is 1:1.02. From this group, 390 children are between 60 and 71 months of age. The children were examined in 51 kindergartens in 21 villages and 18 cities randomly chosen in southern Bulgaria. Eight children were excluded from the final analysis because they completed less than 50% of the test items (7 children did not cooperate and 1 child had autism spectrum disorder). An abnormal response occurred in less than 5% of the children for 43 items, in 6% to 35% of the children for 37 items, and in more than 35% of the children for only 9 items. The test is an example of a translational approach in neuroscience. On the one hand, it is based on the results of several sciences studying growth and development from different perspectives. On the other hand, the results from the present research may be implemented in other fields of child development: education, psychology, speech and language therapy, and intervention programs. © 2018 John Wiley & Sons, Ltd.
Preservice Teachers' Memories of Their Secondary Science Education Experiences

NASA Astrophysics Data System (ADS)

Hudson, Peter; Usak, Muhammet; Fančovičová, Jana; Erdoğan, Mehmet; Prokop, Pavol

2010-12-01

Understanding preservice teachers' memories of their education may aid towards articulating high-impact teaching practices. This study describes 246 preservice teachers' perceptions of their secondary science education experiences through a questionnaire and 28-item survey. ANOVA results were statistically significant for 15 of the 28 survey items concerning participants' memories of science. Descriptive statistics through SPSS further showed that a teacher's enthusiastic nature (87%) and positive attitude towards science (87%) were regarded as highly memorable. In addition, explaining abstract concepts well (79%), and guiding the students' conceptual development with practical science activities (73%) may be considered as memorable secondary science teaching strategies. Implementing science lessons with one or more of these memorable science teaching practices may "make a difference" towards influencing high school students' positive long-term memories about science and their science education. Further research in other key learning areas may provide a clearer picture of high-impact teaching and a way to enhance pedagogical practices.
Evolution of a Test Item

ERIC Educational Resources Information Center

Spaan, Mary

2007-01-01

This article follows the development of test items (see "Language Assessment Quarterly", Volume 3 Issue 1, pp. 71-79 for the article "Test and Item Specifications Development"), beginning with a review of test and item specifications, then proceeding to writing and editing of items, pretesting and analysis, and finally selection of an item for a…
Readability Level of Standardized Test Items and Student Performance: The Forgotten Validity Variable

ERIC Educational Resources Information Center

Hewitt, Margaret A.; Homan, Susan P.

2004-01-01

Test validity issues considered by test developers and school districts rarely include individual item readability levels. In this study, items from a major standardized test were examined for individual item readability level and item difficulty. The Homan-Hewitt Readability Formula was applied to items across three grade levels. Results of…
Validation of the Resilience Scale for Adolescents in Norwegian adolescents 13-18 years.

PubMed

Moksnes, Unni K; Haugan, Gørill

2018-03-01

Resilience is seen as a vital resource for coping and mental health in adolescents. However, there is no universally accepted theory or definition of resilience, leading to considerable challenges regarding how to operationalise and measure this construct. The study aimed at providing further knowledge of the psychometric properties (dimensionality, construct validity and internal consistency) of the 28-item version of the Resilience Scale for Adolescents (READ) in N = 1183 Norwegian adolescents, 13-18 years old. Dimensionality of READ was tested using confirmatory factor analysis (CFA). Convergent validity and reliability were tested using Pearson's correlation analysis, Cronbach's alpha and composite reliability. The CFA supported a modified, 20-item, five-factor structure with high reliability, supporting the dimensionality and internal consistency of the instrument. Convergent validity was confirmed where all factors correlated in expected directions with measures of sense of coherence, self-esteem, stress and depression. The psychometric properties of the READ need to be further evaluated in adolescents; however, the results indicate that a modified 20-item version of READ is adequate for assessing resilience in the present sample of Norwegian adolescents. © 2017 Nordic College of Caring Science.
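The internal-consistency check named in this abstract can be illustrated with a minimal sketch of Cronbach's alpha, assuming the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The function name `cronbach_alpha` and the Likert data are hypothetical and are not taken from the READ study.

```python
from statistics import variance

def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondent rows (one numeric score per item)."""
    k = len(responses[0])                          # number of items
    items = list(zip(*responses))                  # transpose: one tuple per item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

# Hypothetical 5-point Likert answers: four respondents, three items.
likert = [
    [4, 5, 4],
    [2, 3, 2],
    [3, 3, 4],
    [5, 4, 5],
]
print(round(cronbach_alpha(likert), 3))
```

Sample variances are used throughout; using population variances consistently would give the same ratio.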
Impact of an engineering design-based curriculum compared to an inquiry-based curriculum on fifth graders' content learning of simple machines

NASA Astrophysics Data System (ADS)

Marulcu, Ismail; Barnett, Michael

2016-01-01

Background: Elementary Science Education is struggling with multiple challenges. National and State test results confirm the need for deeper understanding in elementary science education. Moreover, national policy statements and researchers call for increased exposure to engineering and technology in elementary science education. The basic motivation of this study is to suggest a solution to both improving elementary science education and increasing exposure to engineering and technology in it. Purpose/Hypothesis: This mixed-method study examined the impact of an engineering design-based curriculum compared to an inquiry-based curriculum on fifth graders' content learning of simple machines. We hypothesize that the LEGO-engineering design unit is as successful as the inquiry-based unit in terms of students' science content learning of simple machines. Design/Method: We used a mixed-methods approach to investigate our research questions; we compared the control and the experimental groups' scores from the tests and interviews by using Analysis of Covariance (ANCOVA) and compared each group's pre- and post-scores by using paired t-tests. Results: Our findings from the paired t-tests show that both the experimental and comparison groups significantly improved their scores from the pre-test to post-test on the multiple-choice, open-ended, and interview items. Moreover, ANCOVA results show that students in the experimental group, who learned simple machines with the design-based unit, performed significantly better on the interview questions. Conclusions: Our analyses revealed that the design-based Design a people mover: Simple machines unit was, if not better, as successful as the inquiry-based FOSS Levers and pulleys unit in terms of students' science content learning.
75 FR 4589 - National Science Board; Sunshine Act Meetings; Notice

Federal Register 2010, 2011, 2012, 2013, 2014

2010-01-28

... 1235 Approval of December 2009 Minutes. Committee Chairman's Remarks. NSB Information Item: Access to LIGO Data by the Broader Community. NSB Information Item: DUSEL: Preliminary Design Effort. NSB... Research. NSB Action Item: Award for NEON: Observatory Design and Prototyping. Committee on Strategy and...

Students' Preference for Science Careers: International comparisons based on PISA 2006

NASA Astrophysics Data System (ADS)

Kjærnsli, Marit; Lie, Svein

2011-01-01

This article deals with 15-year-old students' tendencies to consider a future science-related career. Two aspects have been the focus of our investigation. The first is based on the construct called 'future science orientation', an affective construct consisting of four Likert scale items that measure students' consideration of being involved in future education and careers in science-related areas. Due to the well-known evidence for Likert scales providing culturally biased estimates, the aim has been to go beyond the comparison of simple country averages. In a series of regression and correlation analyses, we have investigated how well the variance of this construct in each of the participating countries can be accounted for by other Programme for International Student Assessment (PISA) student data. The second aspect is based on a question about students' future jobs. By separating science-related jobs into what we have called 'soft' and 'hard' science-related types of jobs, we have calculated and compared country percentages within each category. In particular, gender differences are discussed, and interesting international patterns have been identified. The results in this article have been reported not only for individual countries, but also for groups of countries. These cluster analyses of countries are based on item-by-item patterns of (residual values of) national average values for the combination of cognitive and affective items. The emerging cluster structure of countries has turned out to contribute to the literature of similarities and differences between countries and the factors behind the country clustering both in science education and more generally.
The Effect of the Position of an Item within a Test on the Item Difficulty Value.

ERIC Educational Resources Information Center

Rubin, Lois S.; Mott, David E. W.

An investigation of the effect on the difficulty value of an item due to position placement within a test was made. Using a 60-item operational test comprised of 5 subtests, 60 items were placed as experimental items on a number of spiralled test forms in three different positions (first, middle, last) within the subtest composed of like items.…
The effects of a visualization-centered curriculum on conceptual understanding and representational competence in high school biology

NASA Astrophysics Data System (ADS)

Wilder, Anna

The purpose of this study was to investigate the effects of a visualization-centered curriculum, Hemoglobin: A Case of Double Identity, on conceptual understanding and representational competence in high school biology. Sixty-nine students enrolled in three sections of freshman biology taught by the same teacher participated in this study. Online Chemscape Chime computer-based molecular visualizations were incorporated into the 10-week curriculum to introduce students to fundamental structure and function relationships. Measures used in this study included a Hemoglobin Structure and Function Test, Mental Imagery Questionnaire, Exam Difficulty Survey, the Student Assessment of Learning Gains, the Group Assessment of Logical Thinking, the Attitude Toward Science in School Assessment, audiotapes of student interviews, students' artifacts, weekly unit activity surveys, informal researcher observations and a teacher's weekly questionnaire. The Hemoglobin Structure and Function Test, consisting of Parts A and B, was administered as a pre and posttest. Part A used exclusively verbal test items to measure conceptual understanding, while Part B used visual-verbal test items to measure conceptual understanding and representational competence. Results of the Hemoglobin Structure and Function pre and posttest revealed statistically significant gains in conceptual understanding and representational competence, suggesting the visualization-centered curriculum implemented in this study was effective in supporting positive learning outcomes. The large positive correlation between posttest results on Part A, comprised of all-verbal test items, and Part B, using visual-verbal test items, suggests this curriculum supported students' mutual development of conceptual understanding and representational competence. 
Evidence based on student interviews, Student Assessment of Learning Gains ratings and weekly activity surveys indicated positive attitudes toward the use of Chemscape Chime software and the computer-based molecular visualization activities as learning tools. Evidence from these same sources also indicated that students felt computer-based molecular visualization activities in conjunction with other classroom activities supported their learning. Implications for instructional design are discussed.
Assessment of RFID Read Accuracy for ISS Water Kit

NASA Technical Reports Server (NTRS)

Chu, Andrew

2011-01-01

The Space Life Sciences Directorate/Medical Informatics and Health Care Systems Branch (SD4) is assessing the benefits of Radio Frequency Identification (RFID) technology for tracking items flown onboard the International Space Station (ISS). As an initial study, the Avionic Systems Division Electromagnetic Systems Branch (EV4) is collaborating with SD4 to affix RFID tags to a water kit supplied by SD4 and studying the read success rate of the tagged items. The tagged water kit inside a Cargo Transfer Bag (CTB) was inventoried using three different RFID technologies, including the Johnson Space Center Building 14 Wireless Habitat Test Bed RFID portal, an RFID hand-held reader being targeted for use on board the ISS, and an RFID enclosure designed and prototyped by EV4.
Mixed-Format Test Score Equating: Effect of Item-Type Multidimensionality, Length and Composition of Common-Item Set, and Group Ability Difference

ERIC Educational Resources Information Center

Wang, Wei

2013-01-01

Mixed-format tests containing both multiple-choice (MC) items and constructed-response (CR) items are now widely used in many testing programs. Mixed-format tests often are considered to be superior to tests containing only MC items although the use of multiple item formats leads to measurement challenges in the context of equating conducted under…
Measuring Student Improvement in Lower- and Upper-Level University Climate Science Courses

NASA Astrophysics Data System (ADS)

Harris, S. E.; Taylor, S. V.; Schoonmaker, J. E.; Lane, E.; Francois, R. H.; Austin, P.

2011-12-01

What do university students know about climate? What do they learn in a climate course? On the second-to-last day of a course about global climate change, only 48% of our upper-level science students correctly answered a multiple-choice question about the greenhouse effect. The good news: improvement. Only 16% had answered correctly on the first day of class. The bad news: the learning opportunities we've provided appear to have missed more than half the class on a fundamental climate concept. To evaluate the effectiveness of instruction on student learning about climate, we have developed a prototype assessment tool, designed to be deployed as a low-stakes pre-post test. The items included were validated through student interviews to ensure that students interpret the wording and answer choices in the way we intend. This type of validated assessment, administered both at the beginning and end of term, with matched individuals, provides insight regarding the baseline knowledge with which our students enter a course, and the impact of that course on their learning. We administered test items to students in (1) an upper-level climate course for science majors and (2) a lower-level climate course open to all students. Some items were given to both groups, others to only one of the groups. Both courses use evidence-based pedagogy with active student engagement (clickers, small group activities, regular pre-class preparation). Our results with upper-level students show strong gains in student thinking (>70% of students who missed a question on the pre-test answered correctly on the post-test) about stock-and-flow (box model) problems, annual cycles in the Keeling curve, ice-albedo feedbacks, and isotopic fractionation. On different questions, lower-level students showed strong gains regarding albedo and blackbody emission spectra. 
Both groups show similar baseline knowledge and lower-than-expected gains on greenhouse effect fundamentals, and zero gain regarding the relative importance of different greenhouse gases. A larger percentage of upper-level students (compared to lower-level students) arrive with correct knowledge comparing different greenhouse gases, and explanations of annual cycles in the Keeling curve, but both groups show similar gains with instruction. Instructors can use feedback from these pre-post assessment results to iteratively modify and test the learning opportunities they provide. We aim to continue development and further validation of this tool such that it can be used in many university-level climate courses.
The Impact of Cooperative Quizzes in a Large Introductory Astronomy Course for Non Science Majors

NASA Astrophysics Data System (ADS)

Zeilik, Michael; Morris, Vicky J.

In Astronomy 101 at the University of New Mexico, we carried out a repeated-items experiment on quizzes and tests to investigate the impact of cooperative testing. This trial was the only change in a reformed course format that had been refined over previous semesters. Our research questions were: Did cooperative quizzes result in gains for the class overall? Did these gains "stick" within the semester? In the spring and fall semesters of 2000, students took quizzes individually and in cooperative learning teams, and tests individually. Normalized gain, ⟨g⟩, on the quizzes averaged about 0.4, and effect size about 0.8 (approximately a 10% increase in class mean score). Repeating selected quiz items on a subsequent test demonstrated that the gain was sustained over a month in both semesters. In addition, we compared demographics of UNM students with those of the National Astronomy Diagnostic Test project. We found that UNM students are similar to the national sample, except in ethnicity (more Hispanic American, fewer White). Based on these results, we judge that our cooperative quiz strategy will likely succeed in other "Astro 101" classes.
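The two summary statistics quoted in this abstract can be sketched as follows. This is an illustration only: it assumes the commonly used definition of normalized gain, (post - pre) / (100 - pre) on percentage scores, and Cohen's d with a pooled standard deviation as the effect size. The abstract does not state which formulas the authors used, and the function names are hypothetical.

```python
from statistics import mean, stdev

def normalized_gain(pre_pct, post_pct):
    """Fraction of the available improvement that was achieved (percent scores)."""
    if pre_pct >= 100:
        raise ValueError("a pre score of 100% leaves no room for gain")
    return (post_pct - pre_pct) / (100.0 - pre_pct)

def cohens_d(before, after):
    """Difference in means divided by the pooled sample standard deviation."""
    n1, n2 = len(before), len(after)
    pooled_var = ((n1 - 1) * stdev(before) ** 2 +
                  (n2 - 1) * stdev(after) ** 2) / (n1 + n2 - 2)
    return (mean(after) - mean(before)) / pooled_var ** 0.5

# Hypothetical class means: 50% on individual quizzes, 70% on cooperative ones.
print(normalized_gain(50, 70))
```

A class moving from 50% to 70% closes two fifths of the gap to a perfect score, which is the sense in which a normalized gain of about 0.4 is reported.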
Test item linguistic complexity and assessments for deaf students.

PubMed

Cawthon, Stephanie

2011-01-01

Linguistic complexity of test items is one test format element that has been studied in the context of struggling readers and their participation in paper-and-pencil tests. The present article presents findings from an exploratory study on the potential relationship between linguistic complexity and test performance for deaf readers. A total of 64 students completed 52 multiple-choice items, 32 in mathematics and 20 in reading. These items were coded for linguistic complexity components of vocabulary, syntax, and discourse. Mathematics items had higher linguistic complexity ratings than reading items, but there were no significant relationships between item linguistic complexity scores and student performance on the test items. The discussion addresses issues related to the subject area, student proficiency levels in the test content, factors to look for in determining a "linguistic complexity effect," and areas for further research in test item development and deaf students.
The influence of item order on intentional response distortion in the assessment of high potentials: assessing pilot applicants.

PubMed

Khorramdel, Lale; Kubinger, Klaus D; Uitz, Alexander

2014-04-01

An experiment was conducted to investigate the effects of item order and questionnaire content on faking good or intentional response distortion. It was hypothesized that intentional response distortion would either increase towards the end of a long questionnaire, as learning effects might make it easier to adjust responses to a faking good schema, or decrease because applicants' will to distort responses is reduced if the questionnaire lasts long enough. Furthermore, it was hypothesized that certain types of questionnaire content are especially vulnerable to response distortion. Eighty-four pre-selected pilot applicants filled out a questionnaire consisting of 516 items including items from the NEO five factor inventory (NEO FFI), NEO personality inventory revised (NEO PI-R) and business-focused inventory of personality (BIP). The positions of the items were varied within the applicant sample to test if responses are affected by item order, and applicants' response behaviour was additionally compared to that of volunteers. Applicants reported significantly higher mean scores than volunteers, and results provide some evidence of decreased faking tendencies towards the end of the questionnaire. Furthermore, it could be demonstrated that lower variances or standard deviations in combination with appropriate (often higher) mean scores can serve as an indicator for faking tendencies in group comparisons, even if effects are not significant. © 2013 International Union of Psychological Science.
The Selection of Test Items for Decision Making with a Computer Adaptive Test.

ERIC Educational Resources Information Center

Spray, Judith A.; Reckase, Mark D.

The issue of test-item selection in support of decision making in adaptive testing is considered. The number of items needed to make a decision is compared for two approaches: selecting items from an item pool that are most informative at the decision point or selecting items that are most informative at the examinee's ability level. The first…
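The two selection rules contrasted in this abstract can be sketched under an assumed two-parameter logistic (2PL) model, where an item with discrimination a and difficulty b has Fisher information a^2 * P * (1 - P) at ability theta. The model choice, the item pool, and all names below are illustrative assumptions; the abstract does not specify them.

```python
import math

def p_correct(theta, a, b):
    """2PL probability of a correct response at ability theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

def most_informative(pool, theta):
    """Name of the item in pool (name -> (a, b)) with maximum information at theta."""
    return max(pool, key=lambda name: item_information(theta, *pool[name]))

# Hypothetical item pool with equal discriminations.
pool = {"easy": (1.0, -1.0), "medium": (1.0, 0.0), "hard": (1.0, 1.5)}

cut_score = 0.0          # decision point on the ability scale (assumed)
ability_estimate = 1.4   # current estimate for one examinee (assumed)

print(most_informative(pool, cut_score))         # rule 1: target the decision point
print(most_informative(pool, ability_estimate))  # rule 2: target the examinee
```

The only difference between the two rules is the value of theta passed to the same selection function.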
Development and psychometric evaluation of an information literacy self-efficacy survey and an information literacy knowledge test.

PubMed

Tepe, Rodger; Tepe, Chabha

2015-03-01

To develop and psychometrically evaluate an information literacy (IL) self-efficacy survey and an IL knowledge test. In this test-retest reliability study, a 25-item IL self-efficacy survey and a 50-item IL knowledge test were developed and administered to a convenience sample of 53 chiropractic students. Item analyses were performed on all questions. The IL self-efficacy survey demonstrated good reliability (test-retest correlation = 0.81) and good/very good internal consistency (mean κ = .56 and Cronbach's α = .92). A total of 25 questions with the best item analysis characteristics were chosen from the 50-item IL knowledge test, resulting in a 25-item IL knowledge test that demonstrated good reliability (test-retest correlation = 0.87), very good internal consistency (mean κ = .69, KR20 = 0.85), and good item discrimination (mean point-biserial = 0.48). This study resulted in the development of three instruments: a 25-item IL self-efficacy survey, a 50-item IL knowledge test, and a 25-item IL knowledge test. The information literacy self-efficacy survey and the 25-item version of the information literacy knowledge test have shown preliminary evidence of adequate reliability and validity to justify continuing study with these instruments.
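For right/wrong knowledge items, the KR20 statistic reported here is conventionally computed as k/(k-1) * (1 - sum of p*q / variance of total scores), where p is the proportion answering an item correctly and q = 1 - p. The sketch below assumes that standard formula with population variances; the function name and the tiny response matrix are hypothetical and do not reproduce the study's data.

```python
from statistics import mean, pvariance

def kr20(responses):
    """Kuder-Richardson 20 for rows of 0/1 item scores (one row per examinee)."""
    k = len(responses[0])                           # number of items
    total_var = pvariance([sum(row) for row in responses])
    sum_pq = 0.0
    for i in range(k):
        p = mean(row[i] for row in responses)       # proportion correct on item i
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / total_var)

# Hypothetical answers: four examinees, three items.
answers = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(kr20(answers))
```

The function assumes at least two items and total scores that are not all identical, since the total-score variance appears in the denominator.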
A New Item Selection Procedure for Mixed Item Type in Computerized Classification Testing.

ERIC Educational Resources Information Center

Lau, C. Allen; Wang, Tianyou

This paper proposes a new Information-Time index as the basis for item selection in computerized classification testing (CCT) and investigates how this new item selection algorithm can help improve test efficiency for item pools with mixed item types. It also investigates how practical constraints such as item exposure rate control, test…
A Process for Reviewing and Evaluating Generated Test Items

ERIC Educational Resources Information Center

Gierl, Mark J.; Lai, Hollis

2016-01-01

Testing organizations need large numbers of high-quality items due to the proliferation of alternative test administration methods and modern test designs. But the current demand for items far exceeds the supply. Test items, as they are currently written, evoke a process that is both time-consuming and expensive because each item is written,…
What's in a Topic? Exploring the Interaction between Test-Taker Age and Item Content in High-Stakes Testing

ERIC Educational Resources Information Center

Banerjee, Jayanti; Papageorgiou, Spiros

2016-01-01

The research reported in this article investigates differential item functioning (DIF) in a listening comprehension test. The study explores the relationship between test-taker age and the items' language domains across multiple test forms. The data comprise test-taker responses (N = 2,861) to a total of 133 unique items, 46 items of which were…
Recognizing and treating anal cancer: training medical students and physicians in Puerto Rico.

PubMed

Ortiz, Ana P; Guiot, Humberto M; Díaz-Miranda, Olga L; Román, Leticia; Palefsky, Joel; Colón-López, Vivian

2013-12-01

This training activity aimed at increasing the knowledge of anal cancer screening, diagnostic and treatment options in medical students and physicians, to determine the interest of these individuals in receiving training in the diagnosis and treatment of anal cancer, and to explore any previous training and/or experience with both anal cancer and clinical trials that these individuals might have. An educational activity (1.5 contact hours) was attended by a group of medical students, residents and several faculty members, all from the Medical Sciences Campus of the University of Puerto Rico (n = 50). A demographic survey and a 6-item pre- and post-test on anal cancer were given to assess knowledge change. Thirty-four participants (68%) answered the survey. Mean age was 29.6 +/- 6.6 years; 78.8% had not received training in anal cancer screening, 93.9% reported being interested in receiving anal cancer training, and 75.8% expressed an interest in leading or conducting a clinical trial. A significant increase in the test scores was observed after the educational activity (pre-test: 3.4 +/- 1.2; post-test: 4.7 +/- 0.71). Three of the items showed an increase in knowledge by the time the post-test was taken. The first of these items assessed the participants' knowledge regarding the existence of any guidelines for the screening/treatment of patients with human papillomavirus (HPV)-related anal disease. The second of these items attempted to determine whether the participants recognized that anal intraepithelial neoplasia (AIN) 2 is considered to be a high-grade neoplasia. The last of the 3 items was aimed at ascertaining whether or not the participants were aware that warty growths in the anus are not necessarily a manifestation of high-grade AIN. This educational activity increased the participants' knowledge of anal cancer and revealed, as well, that most of the participants were interested in future training and in collaborating in a clinical trial. 
Training physicians from Puerto Rico on anal cancer clinical trials is essential to encourage recruitment of Hispanic patients in these studies now that the guidelines in anal cancer screening and treatment are on their way to be defined.
Medical students’ perceptions of their learning environment during a mandatory research project

PubMed Central

Ponzer, Sari; Shoshan, Maria

2017-01-01

Objectives: To explore medical students' perceptions of their learning environment during a mandatory 20-week scientific research project. Methods: This cross-sectional study was conducted between 2011 and 2013. A total of 651 medical students were asked to fill in the Clinical Learning Environment, Supervision, and Nurse Teacher (CLES+T) questionnaire, and 439 (mean age 26 years, range 21-40, 60% females) returned the questionnaire, which corresponds to a response rate of 67%. The Mann-Whitney U test or the Kruskal-Wallis test was used to compare the research environments. Results: The item "My workplace can be regarded as a good learning environment" correlated strongly with the item "There were sufficient meaningful learning situations" (r=0.71, p<0.001). Overall satisfaction with supervision correlated strongly with the items interaction (r=0.78, p<0.001), feedback (r=0.76, p<0.001), and a sense of trust (r=0.71, p<0.001). Supervisors' failures to bridge the gap between theory and practice or to explain intended learning outcomes were important negative factors. Students with basic science or epidemiological projects rated their learning environments higher than did students with clinical projects (χ2(3, N=437)=20.29, p<0.001). Conclusions: A good research environment for medical students comprises multiple meaningful learning activities, individual supervision with continuous feedback, and a trustful atmosphere including interactions with the whole staff. Students should be advised that clinical projects might require a higher degree of student independence than basic science projects, which are usually performed in research groups where members work in close collaboration. PMID:29056611
Overview of the Microgravity Science Glovebox (MSG)

NASA Technical Reports Server (NTRS)

Wright, Mary Etta

1999-01-01

MSG is a third generation glovebox for Microgravity Science investigations: SpaceLab Glovebox (GBX); Middeck/MIR Gloveboxes (M/MGBX); and GBX and M/MGBX developed by Bradford Engineering (NL). Previous flights have demonstrated utility of glovebox facilities: Contained environment enables broader range of science experiments; Affords better control of video and photographic imaging (a prime data source); Provides better environmental control than cabin atmosphere; and Useful for contingency operations. MSG developed in response to demands for increased work volume, increased capabilities and additional resources. MSG is multi-user facility to support a wide range of small science and technology investigations: Fluid physics; Combustion science; Material science; Biotechnology (cell culturing and protein crystal growth); Space processing; Fundamental physics; and Technology demonstrations. Topics included in this viewgraph are: MSG capabilities; MSG hardware items; MSG, GSE, and OSE items; MSG development approach; and Science utilization.
Measuring metacognitive ability based on science literacy in dynamic electricity topic

NASA Astrophysics Data System (ADS)

Warni; Sunyono; Rosidin

2018-01-01

This study aims to produce a theoretically and empirically feasible instrument for assessing metacognitive ability based on science literacy in the topic of dynamic electricity. The feasibility of the assessment instrument covers theoretical validity in its material, construction, and language aspects, as well as empirical validity, reliability, difficulty, discrimination, and distractor indices. The development of the instrument follows the Dick and Carey development model, which includes a preliminary study stage, initial product development, validation and revision, and piloting. The instrument was tested on 32 class IX students at SMP Negeri 20 Bandar Lampung using a one-group pretest-posttest design. The results show that the instrument is theoretically feasible, with a theoretical validity percentage of 95.44%. Empirical validity was high for 43.75% of the questions, medium for 43.75%, and low for 12.50%. The reliability of the instrument was 0.83 (high category). By difficulty level, 31.25% of the items were difficult and 68.75% were medium. Discriminating power was very good for 12.50% of the items, good for 62.50%, and medium for 25.00%. The distractor function of the multiple-choice items was good for 80.00% and medium for 20.00%.
Testing therapeutic potency of anticancer drugs in animal studies: a commentary.

PubMed

Den Otter, Willem; Steerenberg, Peter A; Van der Laan, Jan Willem

2002-04-01

Regulatory authorities for medicines in European countries deal with many applications for admission to the market of anticancer drugs. Each application must be supported by preclinical and clinical data, among which testing of the therapeutic activity of drugs in animals is important. Recently, the Committee for Proprietary Medicinal Products (CPMP) has released a note for guidance on the preclinical evaluation of anticancer medicinal products. This note provides only general statements regarding tests of anticancer drugs in rodents. This stimulates considerations on how to organize and how to evaluate these tests. In this article we describe our considerations regarding these items based on our experience with applications in The Netherlands since 1993. (c) 2002 Elsevier Science (USA).
Item validity vs. item discrimination index: a redundancy?

NASA Astrophysics Data System (ADS)

Panjaitan, R. L.; Irawati, R.; Sujana, A.; Hanifah, N.; Djuanda, D.

2018-03-01

In much of the literature on evaluation and test analysis, it is common to find item validity and the item discrimination index (D) calculated with a different formula for each. Other sources, however, state that the item discrimination index can be obtained by calculating the correlation between the test taker’s score on a particular item and the test taker’s score on the overall test, which is the same concept as item validity. Some research reports, especially undergraduate theses, tend to include both item validity and the item discrimination index in the instrument analysis. These concepts appear to overlap, since both reflect the quality of the test in measuring the examinees’ ability. In this paper, example results of data processing for item validity and the item discrimination index are compared. The paper discusses whether one of the two can represent both, or whether it is better to present both calculations in a simple test analysis, especially in undergraduate theses that include test analyses.
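The two calculations contrasted in this abstract can be sketched side by side. The code below is a minimal illustration, not the authors' procedure: item validity is taken to be the Pearson correlation between the 0/1 item score and the total score, and D is taken to be the difference in proportion correct between upper and lower scoring groups. The 27% group fraction is a commonly cited convention assumed here, and all names and data are hypothetical.

```python
from statistics import mean, pstdev


def item_total_correlation(item, totals):
    """Pearson correlation between 0/1 item scores and total test scores."""
    mean_item, mean_total = mean(item), mean(totals)
    cov = mean((x - mean_item) * (t - mean_total) for x, t in zip(item, totals))
    return cov / (pstdev(item) * pstdev(totals))


def discrimination_index(item, totals, fraction=0.27):
    """D = proportion correct in the upper group minus that in the lower group."""
    n_group = max(1, round(len(totals) * fraction))
    order = sorted(range(len(totals)), key=lambda i: totals[i])  # ascending by total
    lower, upper = order[:n_group], order[-n_group:]
    p_upper = mean(item[i] for i in upper)
    p_lower = mean(item[i] for i in lower)
    return p_upper - p_lower


# Hypothetical toy data: six examinees, one item.
item_scores = [1, 1, 1, 0, 0, 0]
total_scores = [10, 9, 8, 3, 2, 1]
r_item = item_total_correlation(item_scores, total_scores)
d_item = discrimination_index(item_scores, total_scores)
```

In this toy case both statistics point the same way (a strongly discriminating item), which is the kind of overlap the abstract asks about; they need not agree numerically, because D uses only the extreme groups while the correlation uses every examinee.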

ISS Asset Tracking Using SAW RFID Technology

NASA Technical Reports Server (NTRS)

Schellhase, Amy; Powers, Annie

2004-01-01

A team at the NASA Johnson Space Center (JSC) is undergoing final preparations to test Surface Acoustic Wave (SAW) Radio Frequency Identification (RFID) technology to track assets aboard the International Space Station (ISS). Currently, almost 10,000 U.S. items onboard the ISS are tracked within a database maintained by both the JSC ground teams and crew onboard the ISS. This barcode-based inventory management system has successfully tracked the location of 97% of the items onboard, but its accuracy depends on the crew reporting hardware movements, which takes valuable time away from science and other activities. With the addition of future modules, the volume of inventory to be tracked is expected to increase significantly. The first test of RFID technology on ISS, which will be conducted by the Expedition 16 crew later this year, will evaluate the ability of RFID technology to track consumable items. These consumables, which include office supplies and clothing, are regularly supplied to ISS and can be tagged on the ground. Automation will eliminate line-of-sight auditing requirements, directly saving crew time. This first step in automating an inventory tracking system will pave the way for future uses of RFID for inventory tracking in space. Not only are there immediate benefits for ISS applications, but it is also a crucial step to ensure efficient logistics support for future vehicles and exploration missions where resupplies are not readily available. Following a successful initial test, the team plans to execute additional tests for new technology, expanded operations concepts, and increased automation.
Bibliography: Storage Stability of Semiperishable Subsistence Items

DTIC Science & Technology

1993-04-01

Quartermaster Food and Container Institute, 1962. Charalambous, G., ed., The Shelf Life of Food and Beverages , Proceedings of the 4th International Flavor...1972. 10 Corey, H., Texture in Foodstuffs, CRC Critical Reviews in Food Technology 1(2), 161-198, 1970. Delves-Broughton, J., Nisin and Its Uses as a...Food Science 39(3), 555-558, 1974. Sensory Evaluation Guide for Testing Food and Beverage Products, Sensory Evaluation Division of the Institute of Food
78 FR 50108 - Notice of Intent To Repatriate Cultural Item: Rochester Museum & Science Center, Rochester, NY

Federal Register 2010, 2011, 2012, 2013, 2014

2013-08-16

... that the cultural item listed in this notice meets the definition of a sacred object and an object of... definition of a sacred object and an object of cultural patrimony under 25 U.S.C. 3001. This notice is... Item(s) The one sacred object and object of cultural patrimony is a Chilkat blanket (27.92.1/AE 580...
A Comparison of Three Types of Test Development Procedures Using Classical and Latent Trait Methods.

ERIC Educational Resources Information Center

Benson, Jeri; Wilson, Michael

Three methods of item selection were used to select sets of 38 items from a 50-item verbal analogies test and the resulting item sets were compared for internal consistency, standard errors of measurement, item difficulty, biserial item-test correlations, and relative efficiency. Three groups of 1,500 cases each were used for item selection. First…
Examining Differential Item Functions of Different Item Ordered Test Forms According to Item Difficulty Levels

ERIC Educational Resources Information Center

Çokluk, Ömay; Gül, Emrah; Dogan-Gül, Çilem

2016-01-01

The study aims to examine whether differential item function is displayed in three different test forms that have item orders of random and sequential versions (easy-to-hard and hard-to-easy), based on Classical Test Theory (CTT) and Item Response Theory (IRT) methods and bearing item difficulty levels in mind. In the correlational research, the…
The effects of collaborative concept mapping on the achievement, science self-efficacy and attitude toward science of female eighth-grade students

NASA Astrophysics Data System (ADS)

Ledger, Antoinette Frances

This study sought to examine whether collaborative concept mapping would affect the achievement, science self-efficacy and attitude toward science of female eighth grade science students. The research questions are: (1) Will the use of collaborative concept mapping affect the achievement of female students in science? (2) Will the use of collaborative concept mapping affect the science self-efficacy of female students? (3) Will the use of collaborative concept mapping affect the attitudes of females toward science? The study was quasi-experimental and utilized a pretest-posttest design for both experimental and control groups. Eighth grade female and male students from three schools in a large northeastern school district participated in this study. The achievement test consisted of 10 multiple choice and two open-response questions and used questions from state-wide and national assessments as well as teacher-constructed items. A 29 item Likert type instrument (McMillan, 1992) was administered to measure science self-efficacy and attitude toward science. The study was of 12 weeks duration. During the study, experimental group students were asked to perform collaborative concept map construction in single sex dyads using specific terms designated by the classroom teacher and the researcher. During classroom visitations, student perceptions of collaborative concept mapping were collected and were used to provide insight into the results of the quantitative data analysis. Data from the pre and posttest instruments were analyzed for both experimental and control groups using t-tests. Additionally, the three teachers were interviewed and their perceptions of the study were also used to gain insight into the results of the study. The analysis of data showed that experimental group females showed significantly higher gains in achievement than control group females. 
An additional analysis of data showed that experimental group males showed significantly greater gains in achievement than experimental group females. The analysis of science self-efficacy data showed that neither experimental nor control group females increased their scores from pre to posttest; both showed small decreases in scores. However, the posttest scores of the experimental group females were significantly higher than the posttest scores of the control group females. The analysis of the attitude toward science survey data showed that the scores of the experimental group females did not change from pre to posttest. However, scores of the control group females declined from pre to posttest. (Abstract shortened by UMI.)
The Effects of Test Length and Sample Size on Item Parameters in Item Response Theory

ERIC Educational Resources Information Center

Sahin, Alper; Anil, Duygu

2017-01-01

This study investigates the effects of sample size and test length on item-parameter estimation in test development utilizing three unidimensional dichotomous models of item response theory (IRT). For this purpose, a real language test comprised of 50 items was administered to 6,288 students. Data from this test was used to obtain data sets of…
[Perceptions on item disclosure for the Korean medical licensing examination].

PubMed

Yang, Eunbae B

2015-09-01

This study analyzed the perceptions of medical students and faculty regarding disclosure of test items on the Korean medical licensing examination. I conducted a survey of medical students from medical colleges and professional medical schools nationwide. Responses were analyzed from 718 participants as well as 69 faculty members who participated in creating the medical licensing examination item sets. Data were analyzed using descriptive statistics and the chi-square test. It is important to maintain test quality and to keep the test items unavailable to the public. There are also concerns among students that disclosure of test items would prompt increasing difficulty of test items (48.3%). Further, few students found it desirable to disclose test items regardless of any considerations (28.5%). The professors, who had experience in designing the test items, also expressed their opposition to test item disclosure (60.9%). It is desirable not to disclose the test items of the Korean medical licensing examination to the public on the condition that students are provided with a sufficient amount of information regarding the examination. This is so that the exam can appropriately identify candidates with the required qualifications.
Evaluation of Colorado Learning Attitudes about Science Survey

NASA Astrophysics Data System (ADS)

Douglas, K. A.; Yale, M. S.; Bennett, D. E.; Haugan, M. P.; Bryan, L. A.

2014-12-01

The Colorado Learning Attitudes about Science Survey (CLASS) is a widely used instrument designed to measure student attitudes toward physics and learning physics. Previous research revealed a fairly complex factor structure. In this study, exploratory and confirmatory factor analyses were conducted on data from an undergraduate introductory physics course (n = 3844) to determine whether a more parsimonious factor structure exists. Exploratory factor analysis results indicate that many of the items from the original CLASS have poor psychometric properties and could not be used in a revised factor structure. The cross validation showed acceptable fit statistics for a three-factor model found in the exploratory factor analysis. This research suggests that a more optimal measurement of students' attitudes about physics and learning physics is obtained with a 15-item instrument, which describes the factors of personal application, personal effort, and problem solving. The proposed revised version of the CLASS offers researchers the opportunity to test a shortened version of the instrument that may be able to provide information about students' attitudes in the areas of personal application of physics, personal effort in a physics course, and approaches to problem solving.
Teaching Journalistic Texts in Science Classes: the Importance of Media Literacy

NASA Astrophysics Data System (ADS)

Ginosar, Avshalom; Tal, Tali

2017-11-01

This study employs a single framework for investigating both environmental journalistic texts published on news websites and science teachers' choices of such texts for their teaching. We analyzed 188 environmental items published over 2 months on seven news websites to determine the popularity of topics. Then, 64 junior high school science teachers responded to a closed questionnaire to identify their preferred topics for classroom use and their patterns of using environmental news items. In a second, open-ended questionnaire, answered by 50 teachers, we investigated the teachers' media literacy in terms of identifying text types and writers of environmental news items. Good alignment was found between the topics published on the websites and the teachers' choices, with a somewhat different distribution of topics, which could be explained by curriculum requirements. Teachers' identification of text types and writer types was inaccurate, which implies that their media literacy is inadequate. We argue that media literacy is required for effective use of journalistic texts in science teaching.
A Review of Classical Methods of Item Analysis.

ERIC Educational Resources Information Center

French, Christine L.

Item analysis is a very important consideration in the test development process. It is a statistical procedure to analyze test items that combines methods used to evaluate the important characteristics of test items, such as difficulty, discrimination, and distractibility of the items in a test. This paper reviews some of the classical methods for…
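Of the three characteristics listed in this abstract, distractor behavior is the least obvious to compute. The sketch below is one simple way to tabulate it, under assumptions not taken from the paper: examinees are split into upper and lower groups by total score, and each answer option is counted in both groups. The function name, the group fraction, and the data are hypothetical.

```python
from collections import Counter


def distractor_table(choices, totals, key, fraction=0.27):
    """Count how often each option was chosen by the upper and lower groups.

    `choices` holds each examinee's selected option for one item,
    `totals` holds the matching total test scores, `key` is the correct option.
    """
    n_group = max(1, round(len(totals) * fraction))
    order = sorted(range(len(totals)), key=lambda i: totals[i])  # ascending by total
    lower = Counter(choices[i] for i in order[:n_group])
    upper = Counter(choices[i] for i in order[-n_group:])
    options = sorted(set(choices) | {key})
    return {
        opt: {"upper": upper[opt], "lower": lower[opt], "is_key": opt == key}
        for opt in options
    }


# Hypothetical toy data: six examinees answering one four-option item.
picked = ["A", "A", "B", "C", "B", "D"]
scores = [10, 9, 8, 3, 2, 1]
table = distractor_table(picked, scores, key="A", fraction=0.5)
```

A distractor that functions as intended would typically attract more lower-group than upper-group examinees, while the keyed option would show the reverse pattern.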
Modeling Item-Position Effects within an IRT Framework

ERIC Educational Resources Information Center

Debeer, Dries; Janssen, Rianne

2013-01-01

Changing the order of items between alternate test forms to prevent copying and to enhance test security is a common practice in achievement testing. However, these changes in item order may affect item and test characteristics. Several procedures have been proposed for studying these item-order effects. The present study explores the use of…
ACER Chemistry Test Item Collection. ACER Chemtic Year 12.

ERIC Educational Resources Information Center

Australian Council for Educational Research, Hawthorn.

The chemistry test item bank contains 225 multiple-choice questions suitable for diagnostic and achievement testing; a three-page teacher's guide; answer key with item facilities; an answer sheet; and a 45-item sample achievement test. Although written for the new grade 12 chemistry course in Victoria, Australia, the items are widely applicable.…
Nevada's Climate Change High School Science Fair Network

NASA Astrophysics Data System (ADS)

Buck, P.

2012-12-01

The purpose of this 3-year project funded by NSF (GEO 1035049) is to increase the climate change science content knowledge and teaching effectiveness of in-service high school science teachers and to increase the number and quality of high school geoscience projects competing in Nevada's three regional Intel ISEF (International Science & Engineering Fair) affiliated science fairs. In year 1 of the project, participants consisted of six female and three male high school teachers from across Nevada. Eight of the participants were white and one was Asian. Five participants taught in Clark County, two taught in Owyhee, one taught in Elko and one taught in Spring Creek. Over 20% of the projects were noted (by the teachers) as being submitted by underrepresented students; however, this information is not reliable as most students did not provide this data themselves. Pre- and post-content tests were given. Teachers improved from an average of eight items missed on the pre-test to an average of only four items missed on the post-test. Participants were also asked to evaluate their own teaching efficacy. In general, participants had a strong science efficacy. The item on which there was the most discrepancy among participants was #10, the one stating that "The low achievement of some students cannot generally be blamed on their teachers." Most teachers completed an end of year program evaluation. All but one of the participants felt that the pace of the workshop was comfortable. All participants who used faculty mentors in helping their students rated their faculty mentors very highly. All participants rated the program content very highly in terms of clarity, organization, relevance, helpfulness and usefulness. All participants gave the program a very high rating overall and stated they would likely use the information to mentor future students and in instruction in future classes. The science fairs are the culmination of the program.
Teachers were required to have at least one student submit a project related to climate change science in their regional fair. There were 28 projects submitted in 2011; of these there were 10 first place winners, 5 second place winners, and 1 third place winner. Over half of the projects entered in the regional science fairs received an award. The reported student science fair projects relating to climate change include, among others: comparing CO2 emissions in old and new cars, comparing travel by mass transit with travel by private car, studying how CO2 affects global warming, studying seedlings in a climate controlled environment, studying the effect of climate change on hurricanes, determining ammonia emission from bovine manure, and studying the effect of Dendroctonus brevicomis on the depopulation of Pinus edulis and Pinus ponderosa due to climate change.
Investigating Assessment Bias for Constructed Response Explanation Tasks: Implications for Evaluating Performance Expectations for Scientific Practice

NASA Astrophysics Data System (ADS)

Federer, Meghan Rector

Assessment is a key element in the process of science education teaching and research. Understanding sources of performance bias in science assessment is a major challenge for science education reforms. Prior research has documented several limitations of instrument types on the measurement of students' scientific knowledge (Liu et al., 2011; Messick, 1995; Popham, 2010). Furthermore, a large body of work has been devoted to reducing assessment biases that distort inferences about students' science understanding, particularly in multiple-choice [MC] instruments. Despite the above documented biases, much has yet to be determined for constructed response [CR] assessments in biology and their use for evaluating students' conceptual understanding of scientific practices (such as explanation). Understanding differences in science achievement provides important insights into whether science curricula and/or assessments are valid representations of student abilities. Using the integrative framework put forth by the National Research Council (2012), this dissertation aimed to explore whether assessment biases occur for assessment practices intended to measure students' conceptual understanding and proficiency in scientific practices. Using a large corpus of undergraduate biology students' explanations, three studies were conducted to examine whether known biases of MC instruments were also apparent in a CR instrument designed to assess students' explanatory practice and understanding of evolutionary change (ACORNS: Assessment of COntextual Reasoning about Natural Selection). The first study investigated the challenge of interpreting and scoring lexically ambiguous language in CR answers. The incorporation of 'multivalent' terms into scientific discourse practices often results in statements or explanations that are difficult to interpret and can produce faulty inferences about student knowledge. 
The results of this study indicate that many undergraduate biology majors frequently incorporate multivalent concepts into explanations of change, resulting in explanatory practices that were scientifically non-normative. However, use of follow-up question approaches was found to resolve this source of bias and thereby increase the validity of inferences about student understanding. The second study focused on issues of item and instrument structure, specifically item feature effects and item position effects, which have been shown to influence measures of student performance across assessment tasks. Results indicated that, along the instrument item sequence, items with similar surface features produced greater sequencing effects than sequences of items with dissimilar surface features. This bias could be addressed by use of a counterbalanced design (i.e., Latin Square) at the population level of analysis. Explanation scores were also highly correlated with student verbosity, despite verbosity being an intrinsically trivial aspect of explanation quality. Attempting to standardize student response length was one proposed solution to the verbosity bias. The third study explored gender differences in students' performance on constructed-response explanation tasks using impact (i.e., mean raw scores) and differential item function (i.e., item difficulties) patterns. While prior research in science education has suggested that females tend to perform better on constructed-response items, the results of this study revealed no overall differences in gender achievement. However, evaluation of specific item features patterns suggested that female respondents have a slight advantage on unfamiliar explanation tasks. That is, male students tended to incorporate fewer scientifically normative concepts (i.e., key concepts) than females for unfamiliar taxa. 
Conversely, females tended to incorporate more scientifically non-normative ideas (i.e., naive ideas) than males for familiar taxa. Together these results indicate that gender achievement differences for this CR instrument may be a result of differences in how males and females interpret and respond to combinations of item features. Overall, the results presented in the subsequent chapters suggest that as science education shifts toward the evaluation of fused scientific knowledge and practice (e.g., explanation), it is essential that educators and researchers investigate potential sources of bias inherent to specific assessment practices. This dissertation revealed significant sources of CR assessment bias, and provided solutions to address these problems.
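The counterbalanced (Latin Square) design mentioned in this abstract as a remedy for item-position effects can be sketched briefly. The code below builds a simple cyclic Latin square, in which every item appears exactly once in every position across the set of test forms, and rotates participants through the forms. This is an illustrative assumption about how such a design could be set up, not the dissertation's actual procedure; the function names and item labels are hypothetical.

```python
def cyclic_latin_square(items):
    """Return n orderings of n items so each item occupies each position once."""
    n = len(items)
    return [[items[(row + col) % n] for col in range(n)] for row in range(n)]


def assign_forms(participant_ids, items):
    """Rotate participants through the rows of the Latin square."""
    square = cyclic_latin_square(items)
    return {pid: square[i % len(square)] for i, pid in enumerate(participant_ids)}


# Hypothetical item labels and participant IDs.
forms = cyclic_latin_square(["item_A", "item_B", "item_C"])
assignment = assign_forms(["s1", "s2", "s3", "s4"], ["item_A", "item_B", "item_C"])
```

A cyclic square balances item position only at the group level; it does not balance which item immediately precedes which, so a design concerned with carryover between similar items would need a different construction.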
Science News of the Year.

ERIC Educational Resources Information Center

Science News, 1981

1981-01-01

Reviews important science news stories of 1981 as reported in "Science News." Gives a one-sentence summary and volume and page references for each story. Groups items by topic including space and astronomy, archaeology and anthropology, technology, behavior, science and society, energy, environment, and specific science disciplines. (DC)
Findings from a novel approach to publication guideline revision: user road testing of a draft version of SQUIRE 2.0.

PubMed

Davies, Louise; Donnelly, Kyla Z; Goodman, Daisy J; Ogrinc, Greg

2016-04-01

The Standards for Quality Improvement Reporting Excellence (SQUIRE) Guideline was published in 2008 (SQUIRE 1.0) and was the first publication guideline specifically designed to advance the science of healthcare improvement. Advances in the discipline of improvement prompted us to revise it. We adopted a novel approach to the revision by asking end-users to 'road test' a draft version of SQUIRE 2.0. The aim was to determine whether they understood and implemented the guidelines as intended by the developers. Forty-four participants were assigned a manuscript section (ie, introduction, methods, results, discussion) and asked to use the draft Guidelines to guide their writing process. They indicated the text that corresponded to each SQUIRE item used and submitted it along with a confidential survey. The survey examined usability of the Guidelines using Likert-scaled questions and participants' interpretation of key concepts in SQUIRE using open-ended questions. On the submitted text, we evaluated concordance between participants' item usage/interpretation and the developers' intended application. For the survey, the Likert-scaled responses were summarised using descriptive statistics and the open-ended questions were analysed by content analysis. Consistent with the SQUIRE Guidelines' recommendation that not every item be included, less than one-third (n=14) of participants applied every item in their section in full. Of the 85 instances when an item was partially used or was omitted, only 7 (8.2%) of these instances were due to participants not understanding the item. 
Usage of Guideline items was highest for items most similar to standard scientific reporting (ie, 'Specific aim of the improvement' (introduction), 'Description of the improvement' (methods) and 'Implications for further studies' (discussion)) and lowest (<20% of the time) for those unique to healthcare improvement (ie, 'Assessment methods for context factors that contributed to success or failure' and 'Costs and strategic trade-offs'). Items unique to healthcare improvement, specifically 'Evolution of the improvement', 'Context elements that influenced the improvement', 'The logic on which the improvement was based', 'Process and outcome measures', demonstrated poor concordance between participants' interpretation and developers' intended application. User testing of a draft version of SQUIRE 2.0 revealed which items have poor concordance between developer intent and author usage, which will inform final editing of the Guideline and development of supporting supplementary materials. It also identified the items that require special attention when teaching about scholarly writing in healthcare improvement. Published by the BMJ Publishing Group Limited.
Assembling a Computerized Adaptive Testing Item Pool as a Set of Linear Tests

ERIC Educational Resources Information Center

van der Linden, Wim J.; Ariel, Adelaide; Veldkamp, Bernard P.

2006-01-01

Test-item writing efforts typically result in item pools with an undesirable correlational structure between the content attributes of the items and their statistical information. If such pools are used in computerized adaptive testing (CAT), the algorithm may be forced to select items with less than optimal information, that violate the content…
Evaluation of Northwest University, Kano Post-UTME Test Items Using Item Response Theory

ERIC Educational Resources Information Center

Bichi, Ado Abdu; Hafiz, Hadiza; Bello, Samira Abdullahi

2016-01-01

High-stakes testing is used for the purposes of providing results that have important consequences. Validity is the cornerstone upon which all measurement systems are built. This study applied the Item Response Theory principles to analyse Northwest University Kano Post-UTME Economics test items. The developed fifty (50) economics test items was…
Criterion-Referenced Test Items for Welding.

ERIC Educational Resources Information Center

Davis, Diane, Ed.

This test item bank on welding contains test questions based upon competencies found in the Missouri Welding Competency Profile. Some test items are keyed for multiple competencies. These criterion-referenced test items are designed to work with the Vocational Instructional Management System. Questions have been statistically sampled and validated…

A Brief History of the Soil Science Society of America

NASA Astrophysics Data System (ADS)

Brevik, Eric C.

2013-04-01

The Soil Science Society of America (SSSA) was officially born on November 18, 1936 at the Mayflower Hotel in Washington, D.C. with Richard Bradfield as the first President. SSSA was created from the merger of the American Soil Survey Association and the Soils Section of American Society of Agronomy (ASA). Six sections were established: 1) physics, 2) chemistry, 3) microbiology, 4) fertility, 5) morphology, and 6) technology, and total membership was less than 200. The first issue of SSSA Journal, then called SSSA Proceedings, published 87 items totaling 526 pages. The first recorded bank balance for SSSA was at the end of the 1937-38 fiscal year, and showed the Society to be worth $1,300.03. The Soils Section of ASA became the official American section of the International Society of Soil Science in 1934, and the new SSSA inherited that distinction which it retains to this day. SSSA has grown significantly since those early days. The original six sections have grown to 11 divisions, and some of those divisions have changed their names to reflect changes occurring within soil science. For example, the original section 5, morphology, is now Division S05 - Pedology after spending many years under other names such as Division V - Soil Classification and Division S-5 - Soil Genesis, Morphology, and Classification. SSSA was incorporated in the State of Wisconsin, USA on 22 January, 1952. Several awards have been developed to recognize achievement in the field of soil science, including the SSSA Presidential Award, Don and Betty Kirkham Soil Physics Award, Emil Truog Soil Science Award, International Soil Science Award, Irrometer Professional Certification Service Award, L.R. Ahuja Ag Systems Modeling Award, Marion L. and Chrystie M.
Jackson Soil Science Award, Soil Science Applied Research Award, Soil Science Distinguished Service Award, Soil Science Education Award, Soil Science Industry and Professional Leadership Award, Soil Science Research Award, and SSSA Early Career Professional Award. SSSA has also hosted the World Congress of Soil Science in 1960 and 2006. In 2010 SSSA membership was at 6,367, the third highest membership total in SSSA history. SSSAJ published 259 items totaling 2,201 pages. But unlike 1937, SSSAJ is no longer SSSA's only journal. In 2009 Journal of Environmental Quality published 272 items on 2,480 pages, Soil Survey Horizons (renamed Soil Horizons in 2012) published 26 items on 133 pages, Vadose Zone Journal published 116 items on 1,088 pages, and Journal of Natural Resources and Life Sciences Education published 48 items on 250 pages, giving Society journals a total of 721 items published on 6,132 pages. At the end of 2010 SSSA was worth $3,130,163. All of these numbers show significant achievement in the years since the Society's founding, but not all of those years have been rosy. For example, SSSA's membership dropped from an all-time high of 6,402 in 1985 to 5,319 in 2002 and the Society's net worth declined from $2,132,750 in 1999 to $984,866 in 2002. This period from the mid-1980s through the early 2000s has probably been the most challenging so far in SSSA's history. Many changes are also in store going into the future. Over the past few years SSSA has become increasingly independent from ASA. While the two societies (along with the Crop Science Society of America (CSSA)) still maintain close ties, the members of SSSA have expressed a desire to emphasize that soils are more than agronomic. One indication of this increasing independence can be seen in the annual meetings. SSSA met jointly with the Geological Society of America in 2008 and will meet with the Entomological Society of America in 2015.
There are also plans for SSSA to meet independently of ASA and CSSA for the first time in 2018. Another indication is the recent rearrangement of the governing structures of ASA, CSSA, and SSSA.
Decomposing the interaction between retention interval and study/test practice: The role of retrievability

PubMed Central

Jang, Yoonhee; Wixted, John T.; Pecher, Diane; Zeelenberg, René; Huber, David E.

2012-01-01

Even without feedback, test practice enhances delayed performance compared to study practice, but the size of the effect is variable across studies. We investigated the benefit of testing, separating initially retrievable items from initially non-retrievable items. In two experiments, an initial test determined item retrievability. Retrievable or non-retrievable items were subsequently presented for repeated study or test practice. Collapsing across items, in Experiment 1, we obtained the typical crossover interaction between retention interval and practice type. For retrievable items, however, the crossover interaction was quantitatively different, with a small study benefit for an immediate test and a larger testing benefit after a delay. For non-retrievable items, there was a large study benefit for an immediate test, but one week later there was no difference between the study and test practice conditions. In Experiment 2, initially non-retrievable items were given additional study followed by either an immediate test or even more additional study, and one week later performance did not differ between the two conditions. These results indicate that the effect size of study/test practice is due to the relative contribution of retrievable and non-retrievable items. PMID:22304454
Decomposing the interaction between retention interval and study/test practice: the role of retrievability.

PubMed

Jang, Yoonhee; Wixted, John T; Pecher, Diane; Zeelenberg, René; Huber, David E

2012-01-01

Even without feedback, test practice enhances delayed performance compared to study practice, but the size of the effect is variable across studies. We investigated the benefit of testing, separating initially retrievable items from initially nonretrievable items. In two experiments, an initial test determined item retrievability. Retrievable or nonretrievable items were subsequently presented for repeated study or test practice. Collapsing across items, in Experiment 1, we obtained the typical cross-over interaction between retention interval and practice type. For retrievable items, however, the cross-over interaction was quantitatively different, with a small study benefit for an immediate test and a larger testing benefit after a delay. For nonretrievable items, there was a large study benefit for an immediate test, but one week later there was no difference between the study and test practice conditions. In Experiment 2, initially nonretrievable items were given additional study followed by either an immediate test or even more additional study, and one week later performance did not differ between the two conditions. These results indicate that the effect size of study/test practice is due to the relative contribution of retrievable and nonretrievable items.
Optimal Test Design with Rule-Based Item Generation

ERIC Educational Resources Information Center

Geerlings, Hanneke; van der Linden, Wim J.; Glas, Cees A. W.

2013-01-01

Optimal test-design methods are applied to rule-based item generation. Three different cases of automated test design are presented: (a) test assembly from a pool of pregenerated, calibrated items; (b) test generation on the fly from a pool of calibrated item families; and (c) test generation on the fly directly from calibrated features defining…
Gender Differences in Scientific Literacy of HKPISA 2006: A Multidimensional Differential Item Functioning and Multilevel Mediation Study

NASA Astrophysics Data System (ADS)

Wong, Kwan Yin

The aim of this study is to investigate the effect of gender differences of 15-year-old students on scientific literacy and their impacts on students’ motivation to pursue science education and careers (Future-oriented Science Motivation) in Hong Kong. The data for this study were collected from the Program for International Student Assessment in Hong Kong (HKPISA), which was carried out in 2006. A total of 4,645 students were randomly selected from 146 secondary schools, including government, aided and private schools, by a two-stage stratified sampling method for the assessment. HKPISA 2006, like most other large-scale international assessments, presents its assessment frameworks in multidimensional subscales. To fulfill the requirements of this multidimensional assessment framework, this study deployed new approaches to model and investigate gender differences in cognitive and affective latent traits of scientific literacy by using multidimensional differential item functioning (MDIF) and multilevel mediation (MLM). Compared with a mean score difference t-test, MDIF improves the precision of each subscale’s measure at item level, and the gender differences in science performance can be accurately estimated. In the light of Eccles et al.’s (1983) Expectancy-value Model of Achievement-related Choices (Eccles’ Model), MLM examines the pattern of gender effects on Future-oriented Science Motivation mediated through cognitive and affective factors. For the MLM investigation, Single-Group Confirmatory Factor Analysis (Single-Group CFA) was used to confirm the applicability and validity of six affective factors which were originally prepared by the OECD. These six factors are Science Self-concept, Personal Value of Science, Interest in Science Learning, Enjoyment of Science Learning, Instrumental Motivation to Learn Science and Future-oriented Science Motivation. Then, Multiple Group CFA was used to verify measurement invariance of these factors across gender groups. 
The results of Single-Group CFA confirmed that five of the six affective factors (all except Interest in Science Learning) had strong psychometric properties in the context of Hong Kong. Multiple-group CFA results also confirmed measurement invariance of these factors across gender groups. The findings of this study suggest that 15-year-old school boys consistently outperformed girls in most of the cognitive dimensions except identifying scientific issues. Similarly, boys have higher affective learning outcomes than girls. The effect sizes of gender differences in affective learning outcomes are relatively larger than those of the cognitive ones. The MLM study reveals that gender effects on Future-oriented Science Motivation are mediated through affective factors including Science Self-concept, Enjoyment of Science Learning, Interest in Science Learning, Instrumental Motivation to Learn Science and Personal Value of Science. Girls are significantly affected by the negative impacts of these mediating factors and thus have lower Future-oriented Science Motivation. The MLM results were consistent with the predictions of Eccles’ Model. Overall, the CFA and MLM results provide strong support for cross-cultural validity of Eccles’ Model. In light of our findings, recommendations to reduce the gender differences in science achievement and Future-oriented Science Motivation are made for science education participants, teachers, parents, curriculum leaders, examination bodies and policy makers.
Using the Cumulative Common Log-Odds Ratio to Identify Differential Item Functioning of Rating Scale Items in the Exercise and Sport Sciences

ERIC Educational Resources Information Center

Penfield, Randall D.; Giacobbi, Peter R., Jr.; Myers, Nicholas D.

2007-01-01

One aspect of construct validity is the extent to which the measurement properties of a rating scale are invariant across the groups being compared. An increasingly used method for assessing between-group differences in the measurement properties of items of a scale is the framework of differential item functioning (DIF). In this paper we…
Cocoa High School's Academic Courses as Viewed by Their Consumers: A Field Study.

ERIC Educational Resources Information Center

Louwerse, F. H.

A 16-item self-report instrument (included in an appendix) was developed to determine the views held by students (N=1,004) concerning aspects of courses in 5 academic areas: English, foreign languages, mathematics, science, and social studies. Individual items reflected views concerning: understanding course requirements (2 items), teacher/student…
A Comparison of Item-Level and Scale-Level Multiple Imputation for Questionnaire Batteries

ERIC Educational Resources Information Center

Gottschall, Amanda C.; West, Stephen G.; Enders, Craig K.

2012-01-01

Behavioral science researchers routinely use scale scores that sum or average a set of questionnaire items to address their substantive questions. A researcher applying multiple imputation to incomplete questionnaire data can either impute the incomplete items prior to computing scale scores or impute the scale scores directly from other scale…
Introducing a New Concept Inventory on Climate Change to Support Undergraduate Instruction, Teacher Education, Education Research, and Project Evaluation (Invited)

NASA Astrophysics Data System (ADS)

Morrow, C. A.; Monsaas, J.; Katzenberger, J.; Afolabi, C. Y.

2013-12-01

The Concept Inventory on Climate Change (CICC) is a new research-based, multiple-choice 'test' that provides a powerful new assessment tool for undergraduate instructors, teacher educators, education researchers, and project evaluators. This presentation will describe the features and the development process of the CICC. This includes insights about how the development team (co-authors) integrated and augmented their multi-disciplinary expertise. The CICC has been developed in the context of a popular introductory undergraduate weather and climate course at a southeastern research university (N~400-500 per semester). The CICC is not a test for a grade, but is intended to be a useful measure of how well a given teaching and learning experience has succeeded in improving understanding about climate change and related climate concepts. The science content addressed by the CICC is rooted in the national consensus document, 'Climate Literacy: The Essential Principles of Climate Science'. The CICC has been designed to support undergraduate instruction, and may be valuable in comparable contexts that teach about climate change. CICC results can help to inform decisions about the effectiveness of teaching strategies by 1) flagging conceptual issues (PRE-instruction); and 2) detecting conceptual change (POST-instruction). Specific CICC items and their answer choices are informed by the research literature on common misunderstandings about climate and climate change. Each CICC item is rated on a 3-tier scale of the cognitive sophistication the item is calling for, and there is a balance among all three tiers across the full instrument. The CICC development process has involved data-driven changes to successive versions. 
Data sources have included item statistics from the administration of progressively evolved versions of the CICC in the weather and climate course, group interviews with students, and expert review by climate scientists, educators, and project evaluators based primarily in the US and Canada. The development team provided an exceptionally well integrated, multi-disciplinary expertise in climate science, climate education, education research, and psychometrics. The valuable integration of the team's expertise was driven by: 1) the prior interdisciplinary inclinations of key team members, which made it natural to openly inquire and learn across boundaries of expertise; and 2) the willingness of key team members to become respectful teachers of essential knowledge to other team members. These qualities, in combination with reviewer contributions, have brought the leading edges of natural and social science research together to produce the CICC. This work has been partially supported by a NASA award to the Georgia State University Research Foundation (NNX09AL69G).
Criterion-Referenced Test Items for Small Engines.

ERIC Educational Resources Information Center

Herd, Amon

This notebook contains criterion-referenced test items for testing students' knowledge of small engines. The test items are based upon competencies found in the Missouri Small Engine Competency Profile. The test item bank is organized in 18 sections that cover the following duties: shop procedures; tools and equipment; fasteners; servicing fuel…
An Investigation of the Impact of Guessing on Coefficient α and Reliability

PubMed Central

2014-01-01

Guessing is known to influence the test reliability of multiple-choice tests. Although there are many studies that have examined the impact of guessing, they used rather restrictive assumptions (e.g., parallel test assumptions, homogeneous inter-item correlations, homogeneous item difficulty, and homogeneous guessing levels across items) to evaluate the relation between guessing and test reliability. Based on the item response theory (IRT) framework, this study investigated the extent of the impact of guessing on reliability under more realistic conditions where item difficulty, item discrimination, and guessing levels actually vary across items with three different test lengths (TL). By accommodating multiple item characteristics simultaneously, this study also focused on examining interaction effects between guessing and other variables entered in the simulation to be more realistic. The simulation of the more realistic conditions and calculations of reliability and classical test theory (CTT) item statistics were facilitated by expressing CTT item statistics, coefficient α, and reliability in terms of IRT model parameters. In addition to the general negative impact of guessing on reliability, results showed interaction effects between TL and guessing and between guessing and test difficulty.
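How a guessing level enters an IRT item response function can be illustrated with a short sketch. The abstract does not name the model used; the three-parameter logistic (3PL) form below is an assumption, chosen because it is the common IRT model with separate discrimination, difficulty, and guessing parameters. The function name and the parameter values are illustrative only.

```python
import math

def p_correct_3pl(theta: float, a: float, b: float, c: float) -> float:
    """3PL item response function (assumed model, not confirmed by the abstract).

    theta: examinee ability
    a: item discrimination
    b: item difficulty
    c: pseudo-guessing parameter (lower asymptote)
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical item with discrimination 1.2 and difficulty 0.0,
# evaluated for an examinee whose ability equals the item difficulty.
no_guessing = p_correct_3pl(theta=0.0, a=1.2, b=0.0, c=0.0)
with_guessing = p_correct_3pl(theta=0.0, a=1.2, b=0.0, c=0.25)
print(no_guessing, with_guessing)
```

With a nonzero c, even very low-ability examinees answer correctly with probability close to c, which adds response variance unrelated to ability; that is one plausible reading of the negative effect on reliability described in the abstract.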
The validation of science virtual test to assess 7th grade students’ critical thinking on matter and heat topic (SVT-MH)

NASA Astrophysics Data System (ADS)

Sya’bandari, Y.; Firman, H.; Rusyati, L.

2018-05-01

The method used in this research was descriptive research for profiling the validation of SVT-MH to measure students’ critical thinking on the matter and heat topic in junior high school. The subjects were 7th-grade junior high school students (13 years old), with science teachers and experts serving as validators. The instruments used to obtain the data were an expert-judgment rubric (content, media, education) and a readability-test rubric. There are four steps to validate SVT-MH in 7th grade Junior High School. These steps are analysis of core competence and basic competence based on Curriculum 2013, expert judgment (content, media, education), readability test and trial test (limited and larger trial test). The instrument validation resulted in 30 items that represent 8 elements and 21 sub-elements to measure students’ critical thinking based on Inch in the matter and heat topic. Cronbach’s alpha (α) is 0.642, which means that the instrument is sufficient to measure students’ critical thinking on the matter and heat topic.
Transforming Biology Assessment with Machine Learning: Automated Scoring of Written Evolutionary Explanations

NASA Astrophysics Data System (ADS)

Nehm, Ross H.; Ha, Minsu; Mayfield, Elijah

2012-02-01

This study explored the use of machine learning to automatically evaluate the accuracy of students' written explanations of evolutionary change. Performance of the Summarization Integrated Development Environment (SIDE) program was compared to human expert scoring using a corpus of 2,260 evolutionary explanations written by 565 undergraduate students in response to two different evolution instruments (the EGALT-F and EGALT-P) that contained prompts that differed in various surface features (such as species and traits). We tested human-SIDE scoring correspondence under a series of different training and testing conditions, using Kappa inter-rater agreement values of greater than 0.80 as a performance benchmark. In addition, we examined the effects of response length on scoring success; that is, whether SIDE scoring models functioned with comparable success on short and long responses. We found that SIDE performance was most effective when scoring models were built and tested at the individual item level and that performance degraded when suites of items or entire instruments were used to build and test scoring models. Overall, SIDE was found to be a powerful and cost-effective tool for assessing student knowledge and performance in a complex science domain.
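The kappa benchmark mentioned above compares observed human-machine agreement with the agreement expected by chance. A minimal sketch follows, assuming the two-rater Cohen's kappa (the abstract does not say which kappa variant was used); the function name and the score vectors are hypothetical and are not data from the study.

```python
from collections import Counter

def cohens_kappa(rater_a: list, rater_b: list) -> float:
    """Cohen's kappa for two raters scoring the same responses."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must score the same non-empty set of responses")
    n = len(rater_a)
    # Observed proportion of exact agreement.
    observed = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal category frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    return (observed - expected) / (1.0 - expected)

# Hypothetical scores (1 = key concept present, 0 = absent) for ten responses.
human = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
machine = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
print(cohens_kappa(human, machine))
```

In this toy case the raters agree on nine of ten responses, but half of that agreement is expected by chance, so kappa is noticeably lower than the raw 90% agreement rate.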
Comparing different instructed-refreshing schedules: evidence for cumulative, forward-order refreshing of verbal lists.

PubMed

Vergauwe, Evie

2018-04-23

Refreshing is one of the mechanisms proposed to maintain information in human working memory. The mechanism is assumed to operate serially, boosting the items of a memory list one after the other. In the current study, we test the most straightforward implementation of serial refreshing, by which refreshing spontaneously reproduces the order of presentation, starting with the first memory item and cycling through the list in a forward fashion, to support short-term memory of a list. Therefore, we examined verbal serial recall performance under different instructed-refreshing schedules that varied in their similarity to cumulative, forward-order refreshing. This was done by manipulating whether instructed refreshing started with the first memory item, and whether instructed refreshing proceeded in forward order through the list. We expected recall performance to be poorer as participants were required to think of the list items in a way that was more dissimilar to what they would have done spontaneously. However, across four experiments, we observed that recall performance was not drastically affected by the nature of instructed refreshing and thus, we did not find any evidence that cumulative, forward-order refreshing supports serial verbal WM performance. © 2018 New York Academy of Sciences.
Tagging insulin in microgravity

NASA Technical Reports Server (NTRS)

Dobeck, Michael; Nelson, Ronald S.

1992-01-01

Knowing the exact subcellular sites of action of insulin in the body has the potential to give basic science investigators a basis from which a cause and cure for this disease can be approached. The goal of this project is to create a test reagent that can be used to visualize these subcellular sites. The unique microgravity environment of the Shuttle will allow the creation of a reagent that has the possibility of elucidating the subcellular sites of action of insulin. Several techniques have been used in an attempt to isolate the sites of action of items such as insulin. One of these is autoradiography in which the test item is obtained from animals fed radioactive materials. What is clearly needed is to visualize individual insulin molecules at their sites of action. The insulin tagging process to be used on G-399 involves the conjugation of insulin molecules with ferritin molecules to create a reagent that will be used back on Earth in an attempt to elucidate the sites of action of insulin.
Evaluating the Psychometric Characteristics of Generated Multiple-Choice Test Items

ERIC Educational Resources Information Center

Gierl, Mark J.; Lai, Hollis; Pugh, Debra; Touchie, Claire; Boulais, André-Philippe; De Champlain, André

2016-01-01

Item development is a time- and resource-intensive process. Automatic item generation integrates cognitive modeling with computer technology to systematically generate test items. To date, however, items generated using cognitive modeling procedures have received limited use in operational testing situations. As a result, the psychometric…
Development and psychometric evaluation of an information literacy self-efficacy survey and an information literacy knowledge test

PubMed Central

Tepe, Rodger; Tepe, Chabha

2015-01-01

Objective: To develop and psychometrically evaluate an information literacy (IL) self-efficacy survey and an IL knowledge test. Methods: In this test–retest reliability study, a 25-item IL self-efficacy survey and a 50-item IL knowledge test were developed and administered to a convenience sample of 53 chiropractic students. Item analyses were performed on all questions. Results: The IL self-efficacy survey demonstrated good reliability (test–retest correlation = 0.81) and good/very good internal consistency (mean κ = .56 and Cronbach's α = .92). A total of 25 questions with the best item analysis characteristics were chosen from the 50-item IL knowledge test, resulting in a 25-item IL knowledge test that demonstrated good reliability (test–retest correlation = 0.87), very good internal consistency (mean κ = .69, KR20 = 0.85), and good item discrimination (mean point-biserial = 0.48). Conclusions: This study resulted in the development of three instruments: a 25-item IL self-efficacy survey, a 50-item IL knowledge test, and a 25-item IL knowledge test. The information literacy self-efficacy survey and the 25-item version of the information literacy knowledge test have shown preliminary evidence of adequate reliability and validity to justify continuing study with these instruments. PMID:25517736
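The KR20 statistic reported for the knowledge test is an internal-consistency coefficient for items scored right or wrong. A minimal sketch of the standard Kuder-Richardson 20 formula follows; the function name and the response matrix are hypothetical and are not data from the study, and the population variance of total scores is an assumed convention.

```python
from statistics import pvariance

def kr20(responses: list[list[int]]) -> float:
    """Kuder-Richardson 20 for dichotomous (0/1) items.

    responses[i][j] is examinee i's score on item j.
    """
    n_people = len(responses)
    n_items = len(responses[0])
    # Proportion answering each item correctly (p); q = 1 - p.
    p = [sum(row[j] for row in responses) / n_people for j in range(n_items)]
    sum_pq = sum(pj * (1.0 - pj) for pj in p)
    # Variance of examinees' total scores (population form, an assumption).
    total_var = pvariance([sum(row) for row in responses])
    return (n_items / (n_items - 1)) * (1.0 - sum_pq / total_var)

# Hypothetical responses: four examinees, three items.
toy = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(kr20(toy))
```

The coefficient rises when item variances are small relative to the variance of total scores, that is, when items tend to be answered correctly by the same examinees.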
Integrating Test-Form Formatting into Automated Test Assembly

ERIC Educational Resources Information Center

Diao, Qi; van der Linden, Wim J.

2013-01-01

Automated test assembly uses the methodology of mixed integer programming to select an optimal set of items from an item bank. Automated test-form generation uses the same methodology to optimally order the items and format the test form. From an optimization point of view, production of fully formatted test forms directly from the item pool using…
Instructional Topics in Educational Measurement (ITEMS) Module: Using Automated Processes to Generate Test Items

ERIC Educational Resources Information Center

Gierl, Mark J.; Lai, Hollis

2013-01-01

Changes to the design and development of our educational assessments are resulting in the unprecedented demand for a large and continuous supply of content-specific test items. One way to address this growing demand is with automatic item generation (AIG). AIG is the process of using item models to generate test items with the aid of computer…
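The idea of an item model can be sketched as a stem with variable slots that a program fills systematically. The sketch below is only an illustration of that general idea, not the authors' procedure; the stem, the slot values, and the function name are all hypothetical.

```python
from itertools import product

# Hypothetical item model: a stem with two variable slots.
STEM = "A car travels {distance} km in {hours} hours. What is its average speed in km/h?"
ELEMENTS = {"distance": [120, 180, 240], "hours": [2, 3]}

def generate_items(stem, elements, key_fn):
    """Yield (question, answer key) pairs for every combination of slot values."""
    names = list(elements)
    for values in product(*(elements[name] for name in names)):
        slots = dict(zip(names, values))
        yield stem.format(**slots), key_fn(**slots)

# The answer key is computed from the same slot values that fill the stem.
items = list(generate_items(STEM, ELEMENTS, lambda distance, hours: distance / hours))
for question, key in items:
    print(question, "->", key)
```

Three distance values and two time values yield six items from one model; an operational system would also need to generate distractors and screen out implausible combinations, which this sketch omits.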
How arousal affects younger and older adults' memory binding.

PubMed

Nashiro, Kaoru; Mather, Mara

2011-01-01

A number of recent studies have shown that associative memory for within-item features is enhanced for emotionally arousing items, whereas arousal-enhanced binding is not seen for associations between distinct items (for a review, see Mather, 2007, Perspectives on Psychological Science, 2, 33-52). The costs and benefits of arousal in memory binding have been examined for younger adults but not for older adults. The present experiment examined whether arousal would enhance younger and older adults' within-item and between-item memory binding. The results revealed that arousal improved younger adults' within-item memory binding but not that of older adults. Arousal worsened both groups' between-item memory binding.

Development of a scale to measure diabetes self-management behaviors among older Koreans with type 2 diabetes, based on the seven domains identified by the American Association of Diabetes Educators.

PubMed

Seo, Kyoungsan; Song, Misoon; Choi, Suyoung; Kim, Se-An; Chang, Sun Ju

2017-04-01

The purpose of this study was to develop the Diabetes Self-Management Behavior for Older Koreans (DSMB-O). This scale is based on the seven relevant domains that have been identified by the American Association of Diabetes Educators (AADE) and is adjusted for sociocultural and age-related characteristics. Four phases were used to develop the DSMB-O as a criterion-referenced measure. In phases 1 and 2, the DSMB-O adopted the AADE's seven domains and established a self-report questionnaire using a small number of items that are applicable to older Koreans. In phase 3, the DSMB-O was formulated with 16 preliminary items, including seven subitems. By assessing the content validity, 14 items (including five subitems) were selected. The final phase involved evaluating the DSMB-O's psychometric properties, including test-retest reliability, content validity, and criterion-related validity, using data from 150 older Koreans with type 2 diabetes. The coefficients of agreement and Cohen's Kappa for the test-retest reliability test ranged from 0.32 to 1.0 and -0.07 to 1.0, respectively. For the content validity, the values of both the item- and scale-level content validity indices were 1.0. The scores from the DSMB-O were positively correlated with the scores from the Korean version of the Summary of Diabetes Self-Care Activities Questionnaire. The DSMB-O is short and easy for older Koreans to use, as well as having acceptable levels of reliability and validity. Hence, the DSMB-O can be a useful tool to evaluate diabetes self-management behaviors in older Koreans with type 2 diabetes. © 2016 Japan Academy of Nursing Science.
A Procedure To Detect Test Bias Present Simultaneously in Several Items.

ERIC Educational Resources Information Center

Shealy, Robin; Stout, William

A statistical procedure is presented that is designed to test for unidirectional test bias existing simultaneously in several items of an ability test, based on the assumption that test bias is incipient within the two groups' ability differences. The proposed procedure--Simultaneous Item Bias (SIB)--is based on a multidimensional item response…
An Item Response Theory Model for Test Bias.

ERIC Educational Resources Information Center

Shealy, Robin; Stout, William

This paper presents a conceptualization of test bias for standardized ability tests which is based on multidimensional, non-parametric, item response theory. An explanation of how individually-biased items can combine through a test score to produce test bias is provided. It is contended that bias, although expressed at the item level, should be…
The Swedish P-CAT: modification and exploration of psychometric properties of two different versions.

PubMed

Selan, Denis; Jakobsson, Ulf; Condelius, Anna

2017-09-01

The aim of this study was to further investigate the psychometric properties (with focus on construct validity and scale function) of the Swedish version of the Person-centred Care Assessment Tool (P-CAT) in a sample consisting of staff working in elderly care units (N = 142). The aim was also to further develop and psychometrically test a modified, noncontext-specific version of the instrument (mP-CAT) in a sample consisting of staff working in primary health care or within home care for older people (N = 182). Principal component analysis with varimax rotation initially suggested a three-factor solution for the P-CAT, explaining 55.96% of variance. Item 13 solely represented one factor wherefore this solution was rejected. A final 2-factor solution, without item 13, had a cumulative explained variance of 50.03%. All communalities were satisfactory (>0.3), and alpha values for both first factor (items 1-6, 11) and second factor (items 7-10, 12) were found to be acceptable. Principal component analysis with varimax rotation suggested a final 2-factor solution for the mP-CAT explaining 46.15% of the total variance with communalities ranging from 0.263 to 0.712. Cronbach's α for both factors was found to be acceptable (>0.7). This study suggests a 2-factor structure for the P-CAT and an exclusion of item 13. The results indicated that the modified noncontext-specific version, mP-CAT, seems to be a valid measure. Further psychometric testing of the mP-CAT is however needed in order to establish the instrument's validity and reliability in various contexts. © 2016 Nordic College of Caring Science.
Measuring adolescent science motivation

NASA Astrophysics Data System (ADS)

Schumm, Maximiliane F.; Bogner, Franz X.

2016-02-01

To monitor science motivation, 232 tenth graders of the college preparatory level ('Gymnasium') completed the Science Motivation Questionnaire II (SMQ-II). Additionally, personality data were collected using a 10-item version of the Big Five Inventory. A subsequent exploratory factor analysis, based on the eigenvalue-greater-than-one criterion, extracted a loading pattern that in principle followed the SMQ-II frame. Two items were dropped due to inappropriate loadings. The remaining SMQ-II seems to provide a consistent scale matching the findings in the literature. Nevertheless, possible shortcomings of the scale are also discussed. The data showed a higher perceived self-determination in girls, which seems to be compensated by their lower self-efficacy beliefs, leading to equality of females and males in overall science motivation scores. Additionally, the Big Five personality traits and science motivation components show little relationship.
Validation study of the Colorado Learning Attitudes about Science Survey at a Hispanic-serving institution

NASA Astrophysics Data System (ADS)

Sawtelle, Vashti; Brewe, Eric; Kramer, Laird

2009-12-01

The Colorado Learning Attitudes about Science Survey (CLASS) has been widely acknowledged as a useful measure of student cognitive attitudes about science and learning. The initial University of Colorado validation study included only 20% non-Caucasian student populations. In this Brief Report we extend their validation to include a predominately under-represented minority population. We validated the CLASS instrument at Florida International University, a Hispanic-serving institution, by interviewing students in introductory physics classes using a semistructured protocol, examining students’ responses on the CLASS item statements, and comparing them to the items’ intended meaning. We find that in our predominately Hispanic population, 94% of the students’ interview responses indicate that the students interpret the CLASS items correctly, and thus the CLASS is a valid instrument. We also identify one potentially problematic item in the instrument which one third of the students interviewed consistently misinterpreted.
Using Reliability and Item Analysis to Evaluate a Teacher-Developed Test in Educational Measurement and Evaluation

ERIC Educational Resources Information Center

Quaigrain, Kennedy; Arhin, Ato Kwamina

2017-01-01

Item analysis is essential in improving items which will be used again in later tests; it can also be used to eliminate misleading items in a test. The study focused on item and test quality and explored the relationship between difficulty index (p-value) and discrimination index (DI) with distractor efficiency (DE). The study was conducted among…
Audio Adapted Assessment Data: Does the Addition of Audio to Written Items Modify the Item Calibration?

ERIC Educational Resources Information Center

Snyder, James

2010-01-01

This dissertation research examined the changes in item RIT calibration that occurred when adding audio to a set of currently calibrated RIT items and then placing these new items as field test items in the modified assessments on the NWEA MAP test platform. The researcher used test results from over 600 students in the Poway School District in…
The establishment of an achievement test for determination of primary teachers’ knowledge level of earthquake

DOE Office of Scientific and Technical Information (OSTI.GOV)

Aydin, Süleyman; Haşiloğlu, M. Akif; Kunduraci, Ayşe

This study aimed to develop an academic achievement test to establish students’ knowledge about earthquakes and ways of protecting against them. The method followed the steps that Webb (1994) set out for developing an academic achievement test for a unit. During development, a 25-question multiple-choice test was prepared to measure pre-service teachers’ knowledge of earthquakes and ways of protecting against them. The test was reviewed by six academics (one from the field of geography and five science educators) and two expert science teachers. The prepared test was then administered to 93 pre-service teachers studying in an elementary education department in the 2014-2015 academic year. After the validity and reliability analyses, the final test comprised 20 items. The Pearson product-moment split-half reliability coefficient was 0.94; adjusted with the Spearman-Brown formula, the reliability coefficient was 0.97.
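The step from the split-half coefficient of 0.94 to the adjusted value of 0.97 reported above can be checked with the Spearman-Brown formula. A minimal sketch, assuming the standard form in which the full test is twice the length of each half; the function name is illustrative.

```python
def spearman_brown(r_half: float, factor: float = 2.0) -> float:
    """Spearman-Brown prophecy formula.

    Predicts the reliability of a test lengthened by `factor`,
    given the reliability r_half of the shorter form.
    """
    return factor * r_half / (1.0 + (factor - 1.0) * r_half)

# Split-half coefficient quoted in the abstract.
full_test = spearman_brown(0.94)
print(round(full_test, 2))  # prints 0.97
```

The arithmetic is 2 × 0.94 / (1 + 0.94) = 1.88 / 1.94 ≈ 0.969, which rounds to the 0.97 given in the record.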
Scientists Reflect on Why They Chose to Study Science

NASA Astrophysics Data System (ADS)

Venville, Grady; Rennie, Léonie; Hanbury, Colin; Longnecker, Nancy

2013-12-01

A concern commonly raised in literature and in media relates to the declining proportions of students who enter and remain in the ‘science pipeline’, and whether many countries, including Australia and New Zealand, have enough budding scientists to fill research and industry positions in the coming years. In addition, there is concern that insufficient numbers of students continue in science to ensure an informed, scientifically literate citizenry. The aim of the research presented in this paper was to survey current Australian and New Zealand scientists to explore their reasons for choosing to study science. An online survey was conducted via a link to SurveyGizmo. The data presented are from 726 respondents who answered 22 forced-choice items and an open-ended question about the reasons they chose to study science. The quantitative data were analysed using t tests and analyses of variance followed by Duncan's multiple range tests, and the qualitative data were analysed thematically. The quantitative data showed that the main reasons scientists reported choosing to study science were because they were interested in science and because they were good at science. Secondary school science classes and one particular science teacher also were found to be important factors. Of much less importance were the prestige of science and financial considerations. The qualitative data expanded on these findings and showed that passion for science and/or curiosity about the world were important factors and also highlighted the importance of recreational pursuits, such as camping when a child. In the words of one respondent, ‘People don't go into science for the money and glory. It's passion for knowledge and science that always attracted me to the field’.
Standard Errors for National Trends in International Large-Scale Assessments in the Case of Cross-National Differential Item Functioning

ERIC Educational Resources Information Center

Sachse, Karoline A.; Haag, Nicole

2017-01-01

Standard errors computed according to the operational practices of international large-scale assessment studies such as the Programme for International Student Assessment (PISA) or the Trends in International Mathematics and Science Study (TIMSS) may be biased when cross-national differential item functioning (DIF) and item parameter drift are…
Matching the Grade 8 TIMSS Item Pool to the Ontario Curriculum.

ERIC Educational Resources Information Center

Lawson, Alexandra; Bordignon, Catherine; Nagy, Philip

2002-01-01

Studied the match between the Ontario (Canada) eighth grade curriculum for 1997 and the item pool of the Third International Mathematics and Science Study (TIMSS) and analyzed the matching process itself. Findings show that the 1997 curriculum is a better match to the TIMSS item pool, achieving the better match by enlarging the curriculum and…
Computerized Adaptive Test (CAT) Applications and Item Response Theory Models for Polytomous Items

ERIC Educational Resources Information Center

Aybek, Eren Can; Demirtasli, R. Nukhet

2017-01-01

This article aims to provide a theoretical framework for computerized adaptive tests (CAT) and item response theory models for polytomous items. Besides that, it aims to introduce the simulation and live CAT software to the related researchers. Computerized adaptive test algorithm, assumptions of item response theory models, nominal response…
An Effect Size Measure for Raju's Differential Functioning for Items and Tests

ERIC Educational Resources Information Center

Wright, Keith D.; Oshima, T. C.

2015-01-01

This study established an effect size measure for differential functioning for items and tests' noncompensatory differential item functioning (NCDIF). The Mantel-Haenszel parameter served as the benchmark for developing NCDIF's effect size measure for reporting moderate and large differential item functioning in test items. The effect size of…
Detecting a Gender-Related DIF Using Logistic Regression and Transformed Item Difficulty

ERIC Educational Resources Information Center

Abedlaziz, Nabeel; Ismail, Wail; Hussin, Zaharah

2011-01-01

Test items are designed to provide information about the examinees. Difficult items are designed to be more demanding and easy items are less so. However, sometimes, test items carry with them demands other than those intended by the test developer (Scheuneman & Gerritz, 1990). When personal attributes such as gender systematically affect…
Influence of Fallible Item Parameters on Test Information During Adaptive Testing.

ERIC Educational Resources Information Center

Wetzel, C. Douglas; McBride, James R.

Computer simulation was used to assess the effects of item parameter estimation errors on different item selection strategies used in adaptive and conventional testing. To determine whether these effects reduced the advantages of certain optimal item selection strategies, simulations were repeated in the presence and absence of item parameter…
A Guide to Item Banking in Education. (Third Edition).

ERIC Educational Resources Information Center

Naccarato, Richard W.

The current status of banks of test items existing across the United States was determined through a survey conducted between September and December 1987. Item "bank" in this context does not imply that the test items are available in computerized form, but simply that "deposited" test items can be withdrawn for use. Emphasis…
Development and validation of an energy-balance knowledge test for fourth- and fifth-grade students.

PubMed

Chen, Senlin; Zhu, Xihe; Kang, Minsoo

2017-05-01

A valid test measuring children's energy-balance (EB) knowledge is lacking in research. This study developed and validated the energy-balance knowledge test (EBKT) for fourth and fifth grade students. The original EBKT contained 25 items but was reduced to 23 items based on pilot results and intensive expert panel discussion. De-identified data were collected from 468 fourth and fifth grade students enrolled in four schools to examine the psychometric properties of the EBKT items. The Rasch model analysis was conducted using the Winsteps 3.65.0 software. Differential item functioning (DIF) analysis flagged 1 item (item #4) functioning differently between boys and girls, which was deleted. The final 22-item EBKT showed desirable model-data fit indices. The item difficulties varied widely, ranging from -3.58 logits (item #10, the easiest) to 1.70 logits (item #3, the hardest). The average person ability on the test was 0.28 logits (SD = .78). Additional analyses supported known-group difference validity of the EBKT scores in capturing gender- and grade-based ability differences. The test was overall valid but could be further improved by expanding test items to discern various ability levels. For lack of a better test, researchers and practitioners may use the EBKT to assess fourth- and fifth-grade students' EB knowledge.
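To give a feel for what the logit values in this abstract mean, the sketch below assumes the standard dichotomous Rasch model, in which the probability of a correct answer depends only on the difference between person ability and item difficulty. The function name `rasch_probability` is illustrative and not taken from the study or from Winsteps.

```python
import math


def rasch_probability(theta: float, b: float) -> float:
    """Probability of a correct response under the dichotomous Rasch model.

    theta: person ability in logits; b: item difficulty in logits.
    """
    return 1.0 / (1.0 + math.exp(-(theta - b)))


mean_ability = 0.28   # average person ability reported in the abstract
easiest_item = -3.58  # item #10
hardest_item = 1.70   # item #3

p_easy = rasch_probability(mean_ability, easiest_item)
p_hard = rasch_probability(mean_ability, hardest_item)
print(f"easiest item: {p_easy:.2f}, hardest item: {p_hard:.2f}")
```

Under this assumption, a student of average ability would be expected to answer the easiest item correctly about 98% of the time and the hardest item about 19% of the time.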
Modeling Local Item Dependence in Cloze and Reading Comprehension Test Items Using Testlet Response Theory

ERIC Educational Resources Information Center

Baghaei, Purya; Ravand, Hamdollah

2016-01-01

In this study the magnitudes of local dependence generated by cloze test items and reading comprehension items were compared and their impact on parameter estimates and test precision was investigated. An advanced English as a foreign language reading comprehension test containing three reading passages and a cloze test was analyzed with a…
Experiments in materials science from household items

NASA Technical Reports Server (NTRS)

Spiegel, F. Xavier

1993-01-01

Everyday household items are used to demonstrate some unique properties of materials. A coat hanger, rubber band, balloon, and corn starch have typical properties which we often take for granted but can be truly amazing.

Machine Shop. Criterion-Referenced Test (CRT) Item Bank.

ERIC Educational Resources Information Center

Davis, Diane, Ed.

This drafting criterion-referenced test item bank is keyed to the machine shop competency profile developed by industry and education professionals in Missouri. The 16 references used for drafting the test items are listed. Test items are arranged under these categories: orientation to machine shop; performing mathematical calculations; performing…
Rescuing Computerized Testing by Breaking Zipf's Law.

ERIC Educational Resources Information Center

Wainer, Howard

2000-01-01

Suggests that because of the nonlinear relationship between item usage and item security, the problems of test security posed by continuous administration of standardized tests cannot be resolved merely by increasing the size of the item pool. Offers alternative strategies to overcome these problems, distributing test items so as to avoid the…
Science News of the Year.

ERIC Educational Resources Information Center

Science News, 1987

1987-01-01

Provides a review of science news stories reported in "Science News" during 1987. References each item to the volume and page number in which the subject was addressed. Contains references on astronomy, behavior, biology, biomedicine, chemistry, earth sciences, environment, mathematics and computers, paleontology and anthropology, physics, science…
An Evaluation of "Intentional" Weighting of Extended-Response or Constructed-Response Items in Tests with Mixed Item Types.

ERIC Educational Resources Information Center

Ito, Kyoko; Sykes, Robert C.

This study investigated the practice of weighting a type of test item, such as constructed response, more than other types of items, such as selected response, to compute student scores for a mixed-item type of test. The study used data from statewide writing field tests in grades 3, 5, and 8 and considered two contexts, that in which a single…
The Science Camp Model based on maker movement and tinkering activity for developing concept of electricity in middle school students to meet standard evaluation of ordinary national educational test (O-NET)

NASA Astrophysics Data System (ADS)

Chamrat, Suthida

2018-01-01

The standard evaluation of Thai education relies excessively on the Ordinary National Educational Test, widely known as O-NET. However, a focus on O-NET results can lead to unsatisfactory teaching practices, especially in science subjects. Among the negative consequences is that schools frequently engage in "cramming" practices in order to elevate their O-NET scores. Higher education, which is committed to generating and applying knowledge by socially engaged scholars, needs to take account of this situation. This research article portrays the collaboration between the Faculty of Education at Chiang Mai University and an educational service area to develop a science camp model. The activities designed for the Science Camp Model were based on the Tinkering and Maker Movement. Specifically, the Science Camp Model was designed to enhance middle school students' conceptualization of electricity in order to meet the standard evaluation of the Ordinary National Educational Test. The hands-on activities consisted of 5 modules which were simple electrical circuits, paper circuits, electrical measurement roleplay motor art robots and Force from Motor. The data were collected with an 11-item Electricity Socratic-based Test adapted from cumulative published O-NET tests focused on the concept of electricity. The qualitative data were also collected virtually via Flinga.com. The results indicated that, after participating in the 5 modules of the science camp based on the Maker Movement and tinkering activity, students' average percentage test scores rose from 33.64 to 65.45. Gain score analysis using a dependent t-test compared pretest and posttest mean scores. The difference was statistically significant (p less than 0.001). The posttest had a considerably higher mean score compared with the pretest. 
Qualitative data also indicated that students could explain the main concepts of electrical circuits, and the transformation of electrical energy to mechanical energy. The schools were satisfied, and expressed greater confidence in the Science Camp Model as an alternative way to improve results on the Standard Evaluation of the Ordinary National Educational Test.
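The dependent (paired) t-test mentioned in this abstract can be sketched as follows. The abstract reports only mean percentages, so the pretest and posttest scores below are hypothetical values invented for illustration, and `paired_t_statistic` is an illustrative helper, not the authors' analysis code.

```python
import math
import statistics


def paired_t_statistic(pre, post):
    """Return (t, degrees of freedom) for a dependent-samples t-test.

    Assumes the paired differences are not all identical
    (otherwise the standard error is zero).
    """
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [after - before for before, after in zip(pre, post)]
    mean_diff = statistics.mean(diffs)
    std_error = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean_diff / std_error, len(diffs) - 1


# Hypothetical scores out of 11 items for ten students (not study data).
pre_scores = [3, 4, 2, 5, 4, 3, 4, 5, 3, 4]
post_scores = [7, 8, 6, 8, 7, 6, 8, 9, 6, 7]

t_value, df = paired_t_statistic(pre_scores, post_scores)
print(f"t({df}) = {t_value:.2f}")
```

The t value would then be compared against the t distribution with the given degrees of freedom to obtain a p value; a statistics package is normally used for that step.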
Do the Guideline Violations Influence Test Difficulty of High-Stake Test?: An Investigation on University Entrance Examination in Turkey

ERIC Educational Resources Information Center

Atalmis, Erkan Hasan

2016-01-01

Multiple-choice (MC) items are commonly used in high-stake tests. Thus, each item of such tests should be meticulously constructed to increase the accuracy of decisions based on test results. Haladyna and his colleagues (2002) addressed the valid item-writing guidelines to construct high quality MC items in order to increase test reliability and…
Preservice and Inservice Science Teachers' Responses and Reasoning about the Nature of Science

ERIC Educational Resources Information Center

Buaraphan, Khajornsak

2009-01-01

An adequate understanding of the nature of science (NOS) is essential for science teachers. The Myths of Science Questionnaire (MOSQ) consisting of 14 items, which comprised both optional and written types of response, was utilized to explore 113 Thai preservice and 101 inservice science teachers' understanding and reasoning about the NOS,…
Science Teachers' Thinking about the Nature of Science: A New Methodological Approach to Its Assessment

ERIC Educational Resources Information Center

Vazquez-Alonso, Angel; Garcia-Carmona, Antonio; Manassero-Mas, Maria Antonia; Bennassar-Roig, Antoni

2013-01-01

This paper describes Spanish science teachers' thinking about issues concerning the nature of science (NOS) and the relationships connecting science, technology, and society (STS). The sample consisted of 774 in-service and pre-service teachers. The participants responded to a selection of items from the Questionnaire of Opinions on Science,…
Item difficulty and item validity for the Children's Group Embedded Figures Test.

PubMed

Rusch, R R; Trigg, C L; Brogan, R; Petriquin, S

1994-02-01

The validity and reliability of the Children's Group Embedded Figures Test was reported for students in Grade 2 by Cromack and Stone in 1980; however, a search of the literature indicates no evidence for internal consistency or item analysis. Hence the purpose of this study was to examine the item difficulty and item validity of the test with children in Grades 1 and 2. Confusion in the literature over development and use of this test was seemingly resolved through analysis of these descriptions and through an interview with the test developer. One early-appearing item was unreasonably difficult. Two or three other items were quite difficult and made little contribution to the total score. Caution is recommended, however, in any reordering or elimination of items based on these findings, given the limited number of subjects (n = 84).
Weapon Performance Testing and Analysis: The MODI-PAC Round, the Number 4 Lead-Shot Round, and the Flying Baton

DTIC Science & Technology

1976-01-01

items. The items tested were the MODI-PAC, a proprietary item of Remington Arms Company, a standard 12-gauge round of No. 4 lead shot, and an...to refrain from testing this item. Therefore, the final selection of items for testing were (1) the MODI-PAC, (2) a standard 12-gauge shotgun round of...The first item evaluated was the MODI-PAC. The MODI-PAC, which stands for “modified impact,” is a 12-gauge shotgun shell loaded with approximately 320
Australian Chemistry Test Item Bank: Years 11 & 12. Volume 1.

ERIC Educational Resources Information Center

Commons, C., Ed.; Martin, P., Ed.

Volume 1 of the Australian Chemistry Test Item Bank, consisting of two volumes, contains nearly 2000 multiple-choice items related to the chemistry taught in Year 11 and Year 12 courses in Australia. Items which were written during 1979 and 1980 were initially published in the "ACER Chemistry Test Item Collection" and in the "ACER…
Australian Chemistry Test Item Bank: Years 11 and 12. Volume 2.

ERIC Educational Resources Information Center

Commons, C., Ed.; Martin, P., Ed.

The second volume of the Australian Chemistry Test Item Bank, consisting of two volumes, contains nearly 2000 multiple-choice items related to the chemistry taught in Year 11 and Year 12 courses in Australia. Items which were written during 1979 and 1980 were initially published in the "ACER Chemistry Test Item Collection" and in the…
Interactions Between Item Content And Group Membership on Achievement Test Items.

ERIC Educational Resources Information Center

Linn, Robert L.; Harnisch, Delwyn L.

The purpose of this investigation was to examine the interaction of item content and group membership on achievement test items. Estimates of the parameters of the three parameter logistic model were obtained on the 46 item math test for the sample of eighth grade students (N = 2055) participating in the Illinois Inventory of Educational Progress,…
Effects of Item Exposure for Conventional Examinations in a Continuous Testing Environment.

ERIC Educational Resources Information Center

Hertz, Norman R.; Chinn, Roberta N.

This study explored the effect of item exposure on two conventional examinations administered as computer-based tests. A principal hypothesis was that item exposure would have little or no effect on average difficulty of the items over the course of an administrative cycle. This hypothesis was tested by exploring conventional item statistics and…
Preferred Reporting Items for a Systematic Review and Meta-analysis of Diagnostic Test Accuracy Studies: The PRISMA-DTA Statement.

PubMed

McInnes, Matthew D F; Moher, David; Thombs, Brett D; McGrath, Trevor A; Bossuyt, Patrick M; Clifford, Tammy; Cohen, Jérémie F; Deeks, Jonathan J; Gatsonis, Constantine; Hooft, Lotty; Hunt, Harriet A; Hyde, Christopher J; Korevaar, Daniël A; Leeflang, Mariska M G; Macaskill, Petra; Reitsma, Johannes B; Rodin, Rachel; Rutjes, Anne W S; Salameh, Jean-Paul; Stevens, Adrienne; Takwoingi, Yemisi; Tonelli, Marcello; Weeks, Laura; Whiting, Penny; Willis, Brian H

2018-01-23

Systematic reviews of diagnostic test accuracy synthesize data from primary diagnostic studies that have evaluated the accuracy of 1 or more index tests against a reference standard, provide estimates of test performance, allow comparisons of the accuracy of different tests, and facilitate the identification of sources of variability in test accuracy. To develop the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) diagnostic test accuracy guideline as a stand-alone extension of the PRISMA statement. Modifications to the PRISMA statement reflect the specific requirements for reporting of systematic reviews and meta-analyses of diagnostic test accuracy studies and the abstracts for these reviews. Established standards from the Enhancing the Quality and Transparency of Health Research (EQUATOR) Network were followed for the development of the guideline. The original PRISMA statement was used as a framework on which to modify and add items. A group of 24 multidisciplinary experts used a systematic review of articles on existing reporting guidelines and methods, a 3-round Delphi process, a consensus meeting, pilot testing, and iterative refinement to develop the PRISMA diagnostic test accuracy guideline. The final version of the PRISMA diagnostic test accuracy guideline checklist was approved by the group. The systematic review (produced 64 items) and the Delphi process (provided feedback on 7 proposed items; 1 item was later split into 2 items) identified 71 potentially relevant items for consideration. The Delphi process reduced these to 60 items that were discussed at the consensus meeting. Following the meeting, pilot testing and iterative feedback were used to generate the 27-item PRISMA diagnostic test accuracy checklist. To reflect specific or optimal contemporary systematic review methods for diagnostic test accuracy, 8 of the 27 original PRISMA items were left unchanged, 17 were modified, 2 were added, and 2 were omitted. 
The 27-item PRISMA diagnostic test accuracy checklist provides specific guidance for reporting of systematic reviews. The PRISMA diagnostic test accuracy guideline can facilitate the transparent reporting of reviews, and may assist in the evaluation of validity and applicability, enhance replicability of reviews, and make the results from systematic reviews of diagnostic test accuracy studies more useful.
An Efficiency Balanced Information Criterion for Item Selection in Computerized Adaptive Testing

ERIC Educational Resources Information Center

Han, Kyung T.

2012-01-01

Successful administration of computerized adaptive testing (CAT) programs in educational settings requires that test security and item exposure control issues be taken seriously. Developing an item selection algorithm that strikes the right balance between test precision and level of item pool utilization is the key to successful implementation…
Using Automatic Item Generation to Meet the Increasing Item Demands of High-Stakes Educational and Occupational Assessment

ERIC Educational Resources Information Center

Arendasy, Martin E.; Sommer, Markus

2012-01-01

The use of new test administration technologies such as computerized adaptive testing in high-stakes educational and occupational assessments demands large item pools. Classic item construction processes and previous approaches to automatic item generation faced the problems of a considerable loss of items after the item calibration phase. In this…
Item Purification Does Not Always Improve DIF Detection: A Counterexample with Angoff's Delta Plot

ERIC Educational Resources Information Center

Magis, David; Facon, Bruno

2013-01-01

Item purification is an iterative process that is often advocated as improving the identification of items affected by differential item functioning (DIF). With test-score-based DIF detection methods, item purification iteratively removes the items currently flagged as DIF from the test scores to get purified sets of items, unaffected by DIF. The…
The Need for Computer Science

ERIC Educational Resources Information Center

Margolis, Jane; Goode, Joanna; Bernier, David

2011-01-01

Broadening computer science learning to include more students is a crucial item on the United States' education agenda, these authors say. Although policymakers advocate more computer science expertise, computer science offerings in high schools are few--and actually shrinking. In addition, poorly resourced schools with a high percentage of…
Puzzle test: A tool for non-analytical clinical reasoning assessment.

PubMed

Monajemi, Alireza; Yaghmaei, Minoo

2016-01-01

Most contemporary clinical reasoning tests typically assess non-automatic thinking. Therefore, a test is needed to measure automatic reasoning or pattern recognition, which has been largely neglected in clinical reasoning tests. The Puzzle Test (PT) is dedicated to assessing automatic clinical reasoning in routine situations. This test was first introduced in 2009 by Monajemi et al. in the Olympiad for Medical Sciences Students. PT is an item format that has gained acceptance in medical education, but no detailed guidelines exist for this test's format, construction and scoring. In this article, a format is described and the steps to prepare and administer valid and reliable PTs are presented. PT examines a specific clinical reasoning task: pattern recognition. PT does not replace other clinical reasoning assessment tools. However, it complements them in strategies for assessing comprehensive clinical reasoning.

Developing an Engineering Design Process Assessment using Mixed Methods.

PubMed

Wind, Stefanie A; Alemdar, Meltem; Lingle, Jeremy A; Gale, Jessica D; Moore, Roxanne A

Recent reforms in science education worldwide include an emphasis on engineering design as a key component of student proficiency in the Science, Technology, Engineering, and Mathematics disciplines. However, relatively little attention has been directed to the development of psychometrically sound assessments for engineering. This study demonstrates the use of mixed methods to guide the development and revision of K-12 Engineering Design Process (EDP) assessment items. Using results from a middle-school EDP assessment, this study illustrates the combination of quantitative and qualitative techniques to inform item development and revisions. Overall conclusions suggest that the combination of quantitative and qualitative evidence provides an in-depth picture of item quality that can be used to inform the revision and development of EDP assessment items. Researchers and practitioners can use the methods illustrated here to gather validity evidence to support the interpretation and use of new and existing assessments.
Findings from a novel approach to publication guideline revision: user road testing of a draft version of SQUIRE 2.0

PubMed Central

Davies, Louise; Donnelly, Kyla Z; Goodman, Daisy J; Ogrinc, Greg

2016-01-01

Background: The Standards for Quality Improvement Reporting Excellence (SQUIRE) Guideline was published in 2008 (SQUIRE 1.0) and was the first publication guideline specifically designed to advance the science of healthcare improvement. Advances in the discipline of improvement prompted us to revise it. We adopted a novel approach to the revision by asking end-users to ‘road test’ a draft version of SQUIRE 2.0. The aim was to determine whether they understood and implemented the guidelines as intended by the developers. Methods: Forty-four participants were assigned a manuscript section (ie, introduction, methods, results, discussion) and asked to use the draft Guidelines to guide their writing process. They indicated the text that corresponded to each SQUIRE item used and submitted it along with a confidential survey. The survey examined usability of the Guidelines using Likert-scaled questions and participants’ interpretation of key concepts in SQUIRE using open-ended questions. On the submitted text, we evaluated concordance between participants’ item usage/interpretation and the developers’ intended application. For the survey, the Likert-scaled responses were summarised using descriptive statistics and the open-ended questions were analysed by content analysis. Results: Consistent with the SQUIRE Guidelines’ recommendation that not every item be included, less than one-third (n=14) of participants applied every item in their section in full. Of the 85 instances when an item was partially used or was omitted, only 7 (8.2%) of these instances were due to participants not understanding the item. 
Usage of Guideline items was highest for items most similar to standard scientific reporting (ie, ‘Specific aim of the improvement’ (introduction), ‘Description of the improvement’ (methods) and ‘Implications for further studies’ (discussion)) and lowest (<20% of the time) for those unique to healthcare improvement (ie, ‘Assessment methods for context factors that contributed to success or failure’ and ‘Costs and strategic trade-offs’). Items unique to healthcare improvement, specifically ‘Evolution of the improvement’, ‘Context elements that influenced the improvement’, ‘The logic on which the improvement was based’, ‘Process and outcome measures’, demonstrated poor concordance between participants’ interpretation and developers’ intended application. Conclusions: User testing of a draft version of SQUIRE 2.0 revealed which items have poor concordance between developer intent and author usage, which will inform final editing of the Guideline and development of supporting supplementary materials. It also identified the items that require special attention when teaching about scholarly writing in healthcare improvement. PMID:26263916
Assessing Student Learning about the Earth through the InTeGrate Project

NASA Astrophysics Data System (ADS)

Gilbert, L. A.; Iverson, E. A. R.; Steer, D. N.; Birnbaum, S. J.; Manduca, C. A.

2016-12-01

InTeGrate, a five-year community-based project comprised of faculty in the sciences and other disciplines, educational specialists, and evaluation experts at diverse institutions, instills learning about Earth in the context of societal issues through teaching materials developed into 2-3 week modules or courses. Materials were tested by over 135 materials authors and faculty interested in using these materials in undergraduate courses at a range of institution types across the US in geoscience, engineering, humanities, and social science courses. To assess impact on student learning, the InTeGrate project has collected student work from over 4,600 students enrolled in courses using these materials. To evaluate the influence of the materials on learning gains related to geoscience literacy, a set of 8 multiple-choice items was developed, tested, and then administered in the first and last week of class in approximately 180 courses. The items were developed by 14 community members with assessment expertise and address content and concepts in the Earth, Climate, Atmosphere, and Ocean Science literacy documents. In a sample of 2,023 paired first and last week responses, students exhibit a 10% normalized gain (equivalent to 1 point of a 12 point total) regardless of their initial score. Students in the lowest quartile at the beginning of the course demonstrate the highest gains (4th quartile gain of 1.8) versus the higher quartile where a ceiling effect is present. In addition, a free-response essay was administered in the last week of the course which tests students' understanding of how Earth system interactions influence people's ability to make decisions about global societal challenges. Analysis of these essays demonstrates a strong relationship between the InTeGrate content and the subject matter of the student essay. 
These preliminary findings suggest that the use of InTeGrate materials increases students' understanding of geoscience literacies and the materials give students a topical hook for connecting learning about Earth to societal challenges.
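The abstract does not define its normalized gain. The sketch below assumes the commonly used definition (the fraction of the available improvement that was actually achieved); the starting score of 2 is a hypothetical value chosen only to show how a 1-point rise on a 12-point scale can correspond to a 10% gain.

```python
def normalized_gain(pre: float, post: float, max_score: float) -> float:
    """Gain achieved as a fraction of the maximum possible gain."""
    if pre >= max_score:
        raise ValueError("no room for improvement: pre score is at the maximum")
    return (post - pre) / (max_score - pre)


# Hypothetical student: 2 of 12 points in the first week, 3 of 12 in the last.
gain = normalized_gain(pre=2, post=3, max_score=12)
print(f"{gain:.0%}")  # 10%
```

Because the denominator shrinks as the pretest score rises, the same raw gain counts for more among students who start near the top of the scale.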
[Difference analysis among majors in medical parasitology exam papers by test item bank proposition].

PubMed

Jia, Lin-Zhi; Ya-Jun, Ma; Cao, Yi; Qian, Fen; Li, Xiang-Yu

2012-04-30

The quality indices of the "Medical Parasitology" exam papers and the measured data for students in three majors at the university in 2010 were compared and analyzed. The exam papers were formed from the test item bank. The alpha reliability coefficients of the three exam papers were above 0.70. The knowledge structure and capacity structure of the exam papers were basically balanced. However, the alpha reliability coefficient for the second major was the lowest, mainly due to the quality of the test items in the exam paper and the failure to revise the indices of the test item bank in time. This observation demonstrated that revising the test items and their indices in the item bank according to the measured data can improve the quality of test item bank proposition and reduce the difference among exam papers.
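The alpha reliability coefficient referred to in this abstract is presumably Cronbach's alpha. A minimal sketch of the usual formula follows; the score matrix is hypothetical (rows are students, columns are items) and `cronbach_alpha` is an illustrative name, not code from the study.

```python
import statistics


def cronbach_alpha(scores):
    """Cronbach's alpha for a students-by-items matrix of item scores."""
    n_items = len(scores[0])
    item_variances = [
        statistics.variance([row[i] for row in scores]) for i in range(n_items)
    ]
    total_variance = statistics.variance([sum(row) for row in scores])
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical right/wrong (1/0) scores for five students on four items.
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
alpha = cronbach_alpha(scores)
print(round(alpha, 2))  # 0.8
```

Here the item variances sum to 1.0 and the total-score variance is 2.5, so alpha = (4/3) × (1 − 1.0/2.5) = 0.8, above the 0.70 threshold mentioned in the abstract.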
The Role of Item Models in Automatic Item Generation

ERIC Educational Resources Information Center

Gierl, Mark J.; Lai, Hollis

2012-01-01

Automatic item generation represents a relatively new but rapidly evolving research area where cognitive and psychometric theories are used to produce tests that include items generated using computer technology. Automatic item generation requires two steps. First, test development specialists create item models, which are comparable to templates…
Tests of methods for evaluating bibliographic databases: an analysis of the National Library of Medicine's handling of literatures in the medical behavioral sciences.

PubMed

Griffith, B C; White, H D; Drott, M C; Saye, J D

1986-07-01

This article reports on five separate studies designed for the National Library of Medicine (NLM) to develop and test methodologies for evaluating the products of large databases. The methodologies were tested on literatures of the medical behavioral sciences (MBS). One of these studies examined how well NLM covered MBS monographic literature using CATLINE and OCLC. Another examined MBS journal and serial literature coverage in MEDLINE and other MBS-related databases available through DIALOG. These two studies used 1010 items derived from the reference lists of sixty-one journals, and tested for gaps and overlaps in coverage in the various databases. A third study examined the quality of the indexing NLM provides to MBS literatures and developed a measure of indexing as a system component. The final two studies explored how well MEDLINE retrieved documents on topics submitted by MBS professionals and how online searchers viewed MEDLINE (and other systems and databases) in handling MBS topics. The five studies yielded both broad research outcomes and specific recommendations to NLM.
The Dominance Concept Inventory: A Tool for Assessing Undergraduate Student Alternative Conceptions about Dominance in Mendelian and Population Genetics.

PubMed

Abraham, Joel K; Perez, Kathryn E; Price, Rebecca M

2014-01-01

Despite the impact of genetics on daily life, biology undergraduates understand some key genetics concepts poorly. One concept requiring attention is dominance, which many students understand as a fixed property of an allele or trait and regularly conflate with frequency in a population or selective advantage. We present the Dominance Concept Inventory (DCI), an instrument to gather data on selected alternative conceptions about dominance. During development of the 16-item test, we used expert surveys (n = 12), student interviews (n = 42), and field tests (n = 1763) from introductory and advanced biology undergraduates at public and private, majority- and minority-serving, 2- and 4-yr institutions in the United States. In the final field test across all subject populations (n = 709), item difficulty ranged from 0.08 to 0.84 (0.51 ± 0.049 SEM), while item discrimination ranged from 0.11 to 0.82 (0.50 ± 0.048 SEM). Internal reliability (Cronbach's alpha) was 0.77, while test-retest reliability values were 0.74 (product moment correlation) and 0.77 (intraclass correlation). The prevalence of alternative conceptions in the field tests shows that introductory and advanced students retain confusion about dominance after instruction. All measures support the DCI as a useful instrument for measuring undergraduate biology student understanding and alternative conceptions about dominance. © 2014 J. K. Abraham et al. CBE—Life Sciences Education © 2014 The American Society for Cell Biology. This article is distributed by The American Society for Cell Biology under license from the author(s). It is available to the public under an Attribution–Noncommercial–Share Alike 3.0 Unported Creative Commons License (http://creativecommons.org/licenses/by-nc-sa/3.0).
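The statistics named in the abstract above (item difficulty and Cronbach's alpha) follow standard classical test theory formulas. The sketch below is a minimal illustration under stated assumptions: items are scored 0/1, population variances are used, and the response matrix is invented for the example. It is not the DCI data or the authors' analysis code, and the function names are hypothetical.

```python
from statistics import pvariance


def item_difficulty(responses):
    """Proportion of examinees answering each item correctly."""
    n = len(responses)
    k = len(responses[0])
    return [sum(row[j] for row in responses) / n for j in range(k)]


def cronbach_alpha(responses):
    """Cronbach's alpha: k/(k-1) * (1 - sum(item variances) / total-score variance)."""
    k = len(responses[0])
    item_vars = [pvariance([row[j] for row in responses]) for j in range(k)]
    total_var = pvariance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical 0/1 scores: 5 examinees (rows) by 4 items (columns).
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

print(item_difficulty(scores))
print(round(cronbach_alpha(scores), 2))
```

Note that "difficulty" in this sense is the proportion correct, so a higher value means an easier item.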
HPV, Cervical Cancer and Pap Test Related Knowledge Among a Sample of Female Dental Students in India.

PubMed

Doshi, Dolar; Reddy, B Srikanth; Karunakar, P; Deshpande, Kopparesh

2015-01-01

The present study was designed to ascertain knowledge about HPV, cervical cancer (CC) and the Pap test among female dental students of Panineeya Institute of Dental Sciences and Hospital, Hyderabad, India. A self-administered questionnaire covering demographic details, knowledge relating to human papilloma virus (HPV) (8 items), cervical cancer (4 items) and the Pap smear (6 items) was employed. Responses were coded as "True, False and Don't Know". Mean and standard deviation (SD) for correct answers and levels of knowledge were determined. Based on the year of study, significant differences in knowledge of HPV were noted for questions on symptoms (p=0.01); transmission from asymptomatic partners (p=0.002); treatment with antibiotics (p=0.002); start of sexual activity (p=0.004); and recommended age for HPV vaccination (p=0.01). For knowledge regarding CC, significance was observed for the age group being affected (p=0.008) and symptoms of the disease in early stages (p=0.001). Indications for Pap smear tests, such as symptoms of vaginal discharge (p=0.002), marital status (p=0.01) and women with children (p=0.02), differed significantly by year of study. Based on religion, transmission of HPV via pregnancy, HPV related diseases except CC and preventive measures except condom use and oral contraceptives showed significant differences. However, significant variation with religion was observed only for two preventive measures of CC (Pap test; p=0.004) and HPV vaccination (p=0.003). Likewise, only the frequency of Pap test showed a significant difference for religion (p=0.001). This study emphasizes the lack of awareness with regard to HPV, CC and screening with Pap smear even among health professionals. Hence, regular health campaigns are essential to reduce the disease burden.
75 FR 61779 - National Science Board: Sunshine Act Meetings; Notice

Federal Register 2010, 2011, 2012, 2013, 2014

2010-10-06

...:30 p.m. to 3 p.m. SUBJECT MATTER: Review of NSB Action Item (NSB/CPP-10-63) (Deep Underground Science... National Science Board Web site http://www.nsf.gov/nsb for additional information and schedule updates...
Person Response Functions and the Definition of Units in the Social Sciences

ERIC Educational Resources Information Center

Engelhard, George, Jr.; Perkins, Aminah F.

2011-01-01

Humphry (this issue) has written a thought-provoking piece on the interpretation of item discrimination parameters as scale units in item response theory. One of the key features of his work is the description of an item response theory (IRT) model that he calls the logistic measurement function that combines aspects of two traditions in IRT that…
Bayesian Analysis of Item Response Curves. Research Report 84-1. Mathematical Sciences Technical Report No. 132.

ERIC Educational Resources Information Center

Tsutakawa, Robert K.; Lin, Hsin Ying

Item response curves for a set of binary responses are studied from a Bayesian viewpoint of estimating the item parameters. For the two-parameter logistic model with normally distributed ability, restricted bivariate beta priors are used to illustrate the computation of the posterior mode via the EM algorithm. The procedure is illustrated by data…
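For readers unfamiliar with the two-parameter logistic model mentioned above, the following sketch evaluates its item response curve in the standard textbook form. It does not reproduce the report's Bayesian estimation procedure; the function name `icc_2pl`, the symbols `theta`, `a`, and `b`, and the parameter values are illustrative assumptions, and the optional scaling constant some authors include (D = 1.7) is omitted.

```python
import math


def icc_2pl(theta: float, a: float, b: float) -> float:
    """Probability of a correct response under the 2PL model.

    theta: examinee ability; a: item discrimination; b: item difficulty.
    P(theta) = 1 / (1 + exp(-a * (theta - b)))
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical item with discrimination 1.2 and difficulty 0.5.
for theta in (-2.0, 0.5, 2.0):
    print(theta, round(icc_2pl(theta, a=1.2, b=0.5), 3))
```

At `theta == b` the probability is exactly 0.5, and `a` controls how steeply the curve rises around that point.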
Developing a Machine-Supported Coding System for Constructed-Response Items in PISA. Research Report. ETS RR-17-47

ERIC Educational Resources Information Center

Yamamoto, Kentaro; He, Qiwei; Shin, Hyo Jeong; von Davier, Mattias

2017-01-01

Approximately a third of the Programme for International Student Assessment (PISA) items in the core domains (math, reading, and science) are constructed-response items and require human coding (scoring). This process is time-consuming, expensive, and prone to error as often (a) humans code inconsistently, and (b) coding reliability in…
Item Review and the Rearrangement Procedure: Its Process and Its Results

ERIC Educational Resources Information Center

Papanastasiou, Elena C.

2005-01-01

Permitting item review is to the benefit of the examinees who typically increase their test scores with item review. However, testing companies do not prefer item review since it does not follow the logic on which adaptive tests are based, and since it is prone to cheating strategies. Consequently, item review is not permitted in many adaptive…
A Model-Based Method for Content Validation of Automatically Generated Test Items

ERIC Educational Resources Information Center

Zhang, Xinxin; Gierl, Mark

2016-01-01

The purpose of this study is to describe a methodology to recover the item model used to generate multiple-choice test items with a novel graph theory approach. Beginning with the generated test items and working backward to recover the original item model provides a model-based method for validating the content used to automatically generate test…
Integration of Basic and Clinical Science Courses in US PharmD Programs

PubMed Central

Talukder, Rahmat M.; Taheri, Reza; Blanchard, Nicholas

2016-01-01

Objective. To determine the current status of and faculty perceptions regarding integration of basic and clinical science courses in US pharmacy programs. Methods. A 25-item survey instrument was developed and distributed to 132 doctor of pharmacy (PharmD) programs. Survey data were analyzed using Mann-Whitney U test or Kruskal-Wallis test. Thematic analysis of text-based comments was performed using the constant comparison method. Results. One hundred twelve programs responded for a response rate of 85%. Seventy-eight (70%) offered integrated basic and clinical science courses. The types of integration included: full integration with merging disciplinary contents (n=25), coordinated delivery of disciplinary contents (n=50), and standalone courses with integrated laboratory (n=3). Faculty perceptions of course integration were positive. Themes that emerged from text-based comments included positive learning experiences as well as the challenges, opportunities, and skepticism associated with course integration. Conclusion. The results suggest wide variations in the design and implementation of integrated courses among US pharmacy programs. Faculty training and buy-in play a significant role in successful implementation of curricular integration. PMID:28179715
Integration of Basic and Clinical Science Courses in US PharmD Programs.

PubMed

Islam, Mohammed A; Talukder, Rahmat M; Taheri, Reza; Blanchard, Nicholas

2016-12-25

Objective. To determine the current status of and faculty perceptions regarding integration of basic and clinical science courses in US pharmacy programs. Methods. A 25-item survey instrument was developed and distributed to 132 doctor of pharmacy (PharmD) programs. Survey data were analyzed using Mann-Whitney U test or Kruskal-Wallis test. Thematic analysis of text-based comments was performed using the constant comparison method. Results. One hundred twelve programs responded for a response rate of 85%. Seventy-eight (70%) offered integrated basic and clinical science courses. The types of integration included: full integration with merging disciplinary contents (n=25), coordinated delivery of disciplinary contents (n=50), and standalone courses with integrated laboratory (n=3). Faculty perceptions of course integration were positive. Themes that emerged from text-based comments included positive learning experiences as well as the challenges, opportunities, and skepticism associated with course integration. Conclusion. The results suggest wide variations in the design and implementation of integrated courses among US pharmacy programs. Faculty training and buy-in play a significant role in successful implementation of curricular integration.
Optimal Bayesian Adaptive Design for Test-Item Calibration.

PubMed

van der Linden, Wim J; Ren, Hao

2015-06-01

An optimal adaptive design for test-item calibration based on Bayesian optimality criteria is presented. The design adapts the choice of field-test items to the examinees taking an operational adaptive test using both the information in the posterior distributions of their ability parameters and the current posterior distributions of the field-test parameters. Different criteria of optimality based on the two types of posterior distributions are possible. The design can be implemented using an MCMC scheme with alternating stages of sampling from the posterior distributions of the test takers' ability parameters and the parameters of the field-test items while reusing samples from earlier posterior distributions of the other parameters. Results from a simulation study demonstrated the feasibility of the proposed MCMC implementation for operational item calibration. A comparison of performances for different optimality criteria showed faster calibration of substantial numbers of items for the criterion of D-optimality relative to A-optimality, a special case of c-optimality, and random assignment of items to the test takers.
State Assessment Program Item Banks: Model Language for Request for Proposals (RFP) and Contracts

ERIC Educational Resources Information Center

Swanson, Leonard C.

2010-01-01

This document provides recommendations for request for proposal (RFP) and contract language that state education agencies can use to specify their requirements for access to test item banks. An item bank is a repository for test items and data about those items. Item banks are used by state agency staff to view items and associated data; to…
The Impact of Receiving the Same Items on Consecutive Computer Adaptive Test Administrations.

ERIC Educational Resources Information Center

O'Neill, Thomas; Lunz, Mary E.; Thiede, Keith

2000-01-01

Studied item exposure in a computerized adaptive test when the item selection algorithm presents examinees with questions they were asked in a previous test administration. Results with 178 repeat examinees on a medical technologists' test indicate that the combined use of an adaptive algorithm to select items and latent trait theory to estimate…
Uncertainties in the Item Parameter Estimates and Robust Automated Test Assembly

ERIC Educational Resources Information Center

Veldkamp, Bernard P.; Matteucci, Mariagiulia; de Jong, Martijn G.

2013-01-01

Item response theory parameters have to be estimated, and because of the estimation process, they do have uncertainty in them. In most large-scale testing programs, the parameters are stored in item banks, and automated test assembly algorithms are applied to assemble operational test forms. These algorithms treat item parameters as fixed values,…

Identifying Differential Item Functioning in Multi-Stage Computer Adaptive Testing

ERIC Educational Resources Information Center

Gierl, Mark J.; Lai, Hollis; Li, Johnson

2013-01-01

The purpose of this study is to evaluate the performance of CATSIB (Computer Adaptive Testing-Simultaneous Item Bias Test) for detecting differential item functioning (DIF) when items in the matching and studied subtest are administered adaptively in the context of a realistic multi-stage adaptive test (MST). MST was simulated using a 4-item…
The Nature of Science as Viewed by Science Teachers in Najran District, Saudi Arabia

ERIC Educational Resources Information Center

Saif, Abdulsalam Dale Amer

2016-01-01

This study aims to investigate the views of Saudi Science Teachers in Najran district about the nature of science (NOS). A questionnaire of fourteen items was developed and administered to a sample of 83 science teachers. The questionnaire covers five aspects of the nature of science which are: scientific theories and models; role of scientists;…
Technology Use in Science Instruction (TUSI): Aligning the Integration of Technology in Science Instruction in Ways Supportive of Science Education Reform

ERIC Educational Resources Information Center

Campbell, Todd; Abd-Hamid, Nor Hashidah

2013-01-01

This study describes the development of an instrument to investigate the extent to which technology is integrated in science instruction in ways aligned to science reform outlined in standards documents. The instrument was developed by: (a) creating items consistent with the five dimensions identified in science education literature, (b)…
Fermilab Friends for Science Education Store

Science.gov Websites

Refunds Fermilab Refund Policy: Refunds are allowed for 30 days after you purchase your product. Please send an email to ffse-store@fnal.gov with your name, item(s), and the date of purchase, and return
Validity issues in the evaluation of a measure of science and mathematics teacher knowledge

NASA Astrophysics Data System (ADS)

Talbot, Robert M., III

2011-12-01

This study investigates the reliability and validity of an instrument designed to measure science and mathematics teachers' strategic knowledge. Strategic knowledge is conceptualized as a construct that is related to pedagogical knowledge and is comprised of two dimensions: Flexible Application (FA) and Student Centered Instruction (SCI). The FA dimension describes how a science teacher invokes, applies and modifies her instructional repertoire in a given teaching context. The SCI dimension describes how a science teacher conceives of a given situation as an opportunity for active engagement with the students. The Flexible Application of Student-Centered Instruction (FASCI) survey instrument was designed to measure science teachers' strategic knowledge by eliciting open-ended responses to scenario-based items. This study addresses the following overarching question: What are some potential issues pertaining to the validity of measures of science and mathematics teacher knowledge? Using a validity argument framework, different sources of evidence are identified, collected, and evaluated to examine support for a set of propositions related to the intended score interpretation and instrument use: FASCI scores can be used to compare and distinguish the strategic knowledge of novice science and mathematics teachers in the evaluation of teacher education programs. Three separate but related studies are presented and discussed. These studies focus on the reliability of FASCI scores, the effect of adding specific science content to the scenario-based items, and the observation of strategic knowledge in teaching practice. Serious issues were found with the reliability of scores from the FASCI instrument. It was also found that adding science content to the scenario-based items has an effect on FASCI scores, but not for the reason hypothesized. 
Finally, it was found that more evidence is needed to make stronger claims about the relationship between FASCI scores and novice teachers' practice. In concluding this work, a set of four recommendations are presented for others who are engaged in similar measure development efforts. These recommendations focus on the areas of construct definition, item design and development, rater recruitment and training, and the validation process.
A Stepwise Test Characteristic Curve Method to Detect Item Parameter Drift

ERIC Educational Resources Information Center

Guo, Rui; Zheng, Yi; Chang, Hua-Hua

2015-01-01

An important assumption of item response theory is item parameter invariance. Sometimes, however, item parameters are not invariant across different test administrations due to factors other than sampling error; this phenomenon is termed item parameter drift. Several methods have been developed to detect drifted items. However, most of the…
The promise and challenge of including multimedia items in medical licensure examinations: some insights from an empirical trial.

PubMed

Shen, Linjun; Li, Feiming; Wattleworth, Roberta; Filipetto, Frank

2010-10-01

The Comprehensive Osteopathic Medical Licensing Examination conducted a trial of multimedia items in the 2008-2009 Level 3 testing cycle to determine (1) if multimedia items were able to test additional elements of medical knowledge and skills and (2) how to develop effective multimedia items. Forty-four content-matched multimedia and text multiple-choice items were randomly delivered to Level 3 candidates. Logistic regression and paired-samples t tests were used for pairwise and group-level comparisons, respectively. Nine pairs showed significant differences in difficulty and/or discrimination. Content analysis found that, if text narrations were less direct, multimedia materials could make items easier. When textbook terminologies were replaced by multimedia presentations, multimedia items could become more difficult. Moreover, a multimedia item was found not to be uniformly difficult for candidates at different ability levels, possibly because multimedia and text items tested different elements of the same concept. Multimedia items may be capable of measuring some constructs different from what text items can measure. Effective multimedia items with reasonable psychometric properties can be intentionally developed.
Varying levels of difficulty index of skills-test items randomly selected by examinees on the Korean emergency medical technician licensing examination.

PubMed

Koh, Bongyeun; Hong, Sunggi; Kim, Soon-Sim; Hyun, Jin-Sook; Baek, Milye; Moon, Jundong; Kwon, Hayran; Kim, Gyoungyong; Min, Seonggi; Kang, Gu-Hyun

2016-01-01

The goal of this study was to characterize the difficulty index of the items in the skills test components of the class I and II Korean emergency medical technician licensing examination (KEMTLE), which requires examinees to select items randomly. The results of 1,309 class I KEMTLE examinations and 1,801 class II KEMTLE examinations in 2013 were subjected to analysis. Items from the basic and advanced skills test sections of the KEMTLE were compared to determine whether some were significantly more difficult than others. In the class I KEMTLE, all 4 of the items on the basic skills test showed significant variation in difficulty index (P<0.01), as well as 4 of the 5 items on the advanced skills test (P<0.05). In the class II KEMTLE, 4 of the 5 items on the basic skills test showed significantly different difficulty index (P<0.01), as well as all 3 of the advanced skills test items (P<0.01). In the skills test components of the class I and II KEMTLE, the procedure in which examinees randomly select questions should be revised to require examinees to respond to a set of fixed items in order to improve the reliability of the national licensing examination.
Do Biology Students Really Hate Math? Empirical Insights into Undergraduate Life Science Majors' Emotions about Mathematics.

PubMed

Wachsmuth, Lucas P; Runyon, Christopher R; Drake, John M; Dolan, Erin L

2017-01-01

Undergraduate life science majors are reputed to have negative emotions toward mathematics, yet little empirical evidence supports this. We sought to compare emotions of majors in the life sciences versus other natural sciences and math. We adapted the Attitudes toward the Subject of Chemistry Inventory to create an Attitudes toward the Subject of Mathematics Inventory (ASMI). We collected data from 359 science and math majors at two research universities and conducted a series of statistical tests that indicated that four ASMI items comprised a reasonable measure of students' emotional satisfaction with math. We then compared life science and non-life science majors and found that major had a small to moderate relationship with students' responses. Gender also had a small relationship with students' responses, while students' race, ethnicity, and year in school had no observable relationship. Using latent profile analysis, we identified three groups: students who were emotionally satisfied with math, emotionally dissatisfied with math, and neutral. These results and the emotional satisfaction with math scale should be useful for identifying differences in other undergraduate populations, determining the malleability of undergraduates' emotional satisfaction with math, and testing effects of interventions aimed at improving life science majors' attitudes toward math. © 2017 L.P. Wachsmuth et al. CBE—Life Sciences Education © 2017 The American Society for Cell Biology. This article is distributed by The American Society for Cell Biology under license from the author(s). It is available to the public under an Attribution–Noncommercial–Share Alike 3.0 Unported Creative Commons License (http://creativecommons.org/licenses/by-nc-sa/3.0).
The Nature of Science Instrument-Elementary (NOSI-E): Using Rasch principles to develop a theoretically grounded scale to measure elementary student understanding of the nature of science

NASA Astrophysics Data System (ADS)

Peoples, Shelagh

The purpose of this study was to determine which of three competing models will provide reliable, interpretable, and responsive measures of elementary students' understanding of the nature of science (NOS). The Nature of Science Instrument-Elementary (NOSI-E), a 28-item Rasch-based instrument, was used to assess students' NOS understanding. The NOS construct was conceptualized using five construct dimensions (Empirical, Inventive, Theory-laden, Certainty and Socially & Culturally Embedded). The competing models represent three internal models for the NOS construct. One postulate is that the NOS construct is unidimensional where one latent construct explains the relationship between the 28 items of the NOSI-E. Alternatively, the NOS construct is composed of five independent unidimensional constructs (the consecutive approach). Lastly, the NOS construct is multidimensional and composed of five inter-related but separate dimensions. A validity argument was developed that hypothesized that the internal structure of the NOS construct is best represented by the multidimensional Rasch model. Four sets of analyses were performed in which the three representations were compared. These analyses addressed five validity aspects (content, substantive, generalizability, structural and external) of construct validity. The vast body of evidence supported the claim that the NOS construct is composed of five separate but inter-related dimensions and is best represented by the multidimensional Rasch model. The results of the multidimensional analyses indicated that the items of the five subscales were of excellent technical quality, exhibited no differential item functioning (based on gender), had an item hierarchy that conformed to theoretical expectations, and together formed subscales of reasonable reliability (> 0.7 on each subscale) that were responsive to change in the construct. 
Theory-laden scores from the multidimensional model predicted students' science achievement with scores from all five NOS dimensions significantly predicting students' perceptions of the constructivist nature of their classroom learning environment. The NOSI-E instrument is a theoretically grounded scale that can measure elementary students' NOS understanding and appears suitable for use in science education research.
Item Analysis in Introductory Economics Testing.

ERIC Educational Resources Information Center

Tinari, Frank D.

1979-01-01

Computerized analysis of multiple choice test items is explained. Examples of item analysis applications in the introductory economics course are discussed with respect to three objectives: to evaluate learning; to improve test items; and to help improve classroom instruction. Problems, costs and benefits of the procedures are identified. (JMD)
Development and evaluation of a thermochemistry concept inventory for college-level general chemistry

NASA Astrophysics Data System (ADS)

Wren, David A.

The research presented in this dissertation culminated in a 10-item Thermochemistry Concept Inventory (TCI). The development of the TCI can be divided into two main phases: qualitative studies and quantitative studies. Both phases focused on the primary stakeholders of the TCI, college-level general chemistry instructors and students. Each phase was designed to collect evidence for the validity of the interpretations and uses of TCI testing data. A central use of TCI testing data is to identify student conceptual misunderstandings, which are represented as incorrect options of multiple-choice TCI items. Therefore, quantitative and qualitative studies focused heavily on collecting evidence at the item-level, where important interpretations may be made by TCI users. Qualitative studies included student interviews (N = 28) and online expert surveys (N = 30). Think-aloud student interviews (N = 12) were used to identify conceptual misunderstandings used by students. Novice response process validity interviews (N = 16) helped provide information on how students interpreted and answered TCI items and were the basis of item revisions. Practicing general chemistry instructors (N = 18), or experts, defined boundaries of thermochemistry content included on the TCI. Once TCI items were in the later stages of development, an online version of the TCI was used in an expert response process validity survey (N = 12) to provide expert feedback on item content, format and consensus of the correct answer for each item. Quantitative studies included three phases: beta testing of TCI items (N = 280), pilot testing of a 12-item TCI (N = 485), and a large data collection using a 10-item TCI (N = 1331). In addition to traditional classical test theory analysis, Rasch model analysis was also used for evaluation of testing data at the test and item level. 
The TCI was administered in both formative assessment (beta and pilot testing) and summative assessment (large data collection), with items performing well in both. One item, item K, did not have acceptable psychometric properties when the TCI was used as a quiz (summative assessment), but was retained in the final version of the TCI based on the acceptable psychometric properties displayed in pilot testing (formative assessment).
Life sciences payload definition and integration study, task C and D. Volume 4: Preliminary equipment item specification catalog

NASA Technical Reports Server (NTRS)

1973-01-01

A specification catalog to define the equipment to be used for conducting life sciences experiments in a space laboratory is presented. The specification sheets list the purpose of the equipment item, and any specific technical requirements which can be identified. The status of similar hardware for ground use is stated with comments regarding modifications required to achieve spaceflight qualified hardware. Pertinent sketches, commercial catalog sheets, or drawings of the applicable equipment are included.
Examining the Impact of Drifted Polytomous Anchor Items on Test Characteristic Curve (TCC) Linking and IRT True Score Equating. Research Report. ETS RR-12-09

ERIC Educational Resources Information Center

Li, Yanmei

2012-01-01

In a common-item (anchor) equating design, the common items should be evaluated for item parameter drift. Drifted items are often removed. For a test that contains mostly dichotomous items and only a small number of polytomous items, removing some drifted polytomous anchor items may result in anchor sets that no longer resemble mini-versions of…
Which Statistic Should Be Used to Detect Item Preknowledge When the Set of Compromised Items Is Known?

PubMed

Sinharay, Sandip

2017-09-01

Benefiting from item preknowledge is a major type of fraudulent behavior during educational assessments. Belov suggested the posterior shift statistic for detection of item preknowledge and showed its performance to be better on average than that of seven other statistics for detection of item preknowledge for a known set of compromised items. Sinharay suggested a statistic based on the likelihood ratio test for detection of item preknowledge; the advantage of the statistic is that its null distribution is known. Results from simulated and real data and adaptive and nonadaptive tests are used to demonstrate that the Type I error rate and power of the statistic based on the likelihood ratio test are very similar to those of the posterior shift statistic. Thus, the statistic based on the likelihood ratio test appears promising in detecting item preknowledge when the set of compromised items is known.
Development and Validation of the ACSI: Measuring Students' Science Attitudes, Pro-Environmental Behaviour, Climate Change Attitudes and Knowledge

ERIC Educational Resources Information Center

Dijkstra, E. M.; Goedhart, M. J.

2012-01-01

This article describes the development and validation of the Attitudes towards Climate Change and Science Instrument. This 63-item questionnaire measures students' pro-environmental behaviour, their climate change knowledge and their attitudes towards school science, societal implications of science, scientists, a career in science and the urgency…
The School Science Attitude Survey: A New Instrument for Measuring Attitudes towards School Science

ERIC Educational Resources Information Center

Kennedy, JohnPaul; Quinn, Frances; Taylor, Neil

2016-01-01

There have been many attempts over the last five decades to measure students' attitudes towards school science. Many of these studies investigated attitudes towards limited aspects of science and utilized large numbers of items to draw snapshot summaries of the educational landscape. An understanding of attitudes towards science, and how these…
The Views of Turkish Science Teachers about Gender Equity within Science Education

ERIC Educational Resources Information Center

Idin, Sahin; Dönmez, Ismail

2017-01-01

The aim of this study was to investigate Turkish Science teachers' views about gender equity in the scope of science education. This study was conducted with the quantitative methodology. Within this scope, a 35-item 5-point Likert scale survey was developed to determine Science teachers' views concerning gender equity issues. 160 Turkish Science…
Spanish Secondary-School Science Teachers' Beliefs about Science-Technology-Society (STS) Issues

ERIC Educational Resources Information Center

Vazquez-Alonso, Angel; Garcia-Carmona, Antonio; Manassero-Mas, Maria Antonia; Bennassar-Roig, Antoni

2013-01-01

This study analyzes the beliefs about science-technology-society, and other Nature of Science (NOS) themes, of a large sample (613) of Spanish pre- and in-service secondary education teachers through their responses to 30 items of the Questionnaire of Opinions on Science, Technology and Society. The data were processed by means of a multiple…
Spanish Students' Conceptions about NOS and STS Issues: A Diagnostic Study

ERIC Educational Resources Information Center

Vázquez-Alonso, Ángel; García-Carmona, Antonio; Manassero-Mas, María Antonia; Bennàssar-Roig, Antoni

2014-01-01

Spanish students' beliefs on themes of Science-Technology-Society (STS) and nature of science (NOS) are assessed. The sample consisted of 1050 science and non-science students who had concluded their pre-university education (18-19 years old). Each participant anonymously answered 30 items drawn from the Questionnaire of Opinions on Science,…

A Bayesian Method for the Detection of Item Preknowledge in CAT. Law School Admission Council Computerized Testing Report. LSAC Research Report Series.

ERIC Educational Resources Information Center

McLeod, Lori D.; Lewis, Charles; Thissen, David.

With the increased use of computerized adaptive testing, which allows for continuous testing, new concerns about test security have evolved, one being the assurance that items in an item pool are safeguarded from theft. In this paper, the risk of score inflation and procedures to detect test takers using item preknowledge are explored. When test…
Effect of bibliographical classification on the impact factor of science- and engineering-based journals.

PubMed

Foo, Jong Yong Abdiel

2009-01-01

The simplest and widely used assessment of academic research and researchers is the journal impact factor (JIF). However, the JIF may exhibit patterns that are skewed towards journals that publish a high number of non-research items and short turnover research. Moreover, there are concerns as the JIF is often used as a comparison for journals from different disciplines. In this study, the JIF computation of eight top ranked journals from four different subject categories was analyzed. The analysis reveals that most of the published items (>65%) in the science disciplines were non-research items while fewer such items (<22%) were observed in engineering-based journals. The single regression analysis confirmed that there is correlation (R² ≥ .99) in the number of published items or citations received over the two-year period used in the JIF calculation amongst the eight selected journals. A weighted factor computation is introduced to compensate for the smaller journals and journals that publish longer turnover research. It is hoped that the approach can provide a comprehensive assessment of the quality of a journal regardless of the disciplinary field.
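For orientation, the standard two-year JIF that the abstract refers to can be sketched as follows. The function name and the sample counts are hypothetical and are not data from the study; the weighted factor proposed by the author is not specified in the abstract and is not reproduced here.

```python
def journal_impact_factor(citations, citable_items):
    """Two-year JIF: citations received this year to items published in the
    two preceding years, divided by the citable items published in those years.

    citations     -- [citations to year-1 items, citations to year-2 items]
    citable_items -- [citable items in year-1, citable items in year-2]
    """
    total_items = sum(citable_items)
    if total_items == 0:
        raise ValueError("no citable items in the two-year window")
    return sum(citations) / total_items

# Hypothetical journal: 300 citations to 120 citable items.
print(journal_impact_factor(citations=[120, 180], citable_items=[50, 70]))
```

Because non-research items can attract citations without counting as citable items, a journal publishing many of them can raise the numerator without raising the denominator, which is the skew the abstract describes.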
Effect of Multiple Testing Adjustment in Differential Item Functioning Detection

ERIC Educational Resources Information Center

Kim, Jihye; Oshima, T. C.

2013-01-01

In a typical differential item functioning (DIF) analysis, a significance test is conducted for each item. As a test consists of multiple items, such multiple testing may increase the possibility of making a Type I error at least once. The goal of this study was to investigate how to control a Type I error rate and power using adjustment…
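The abstract is truncated before it names the adjustments that were studied. Purely as an illustration of per-item multiple testing control, the sketch below applies two widely used adjustments, Bonferroni and Benjamini-Hochberg, to a set of hypothetical DIF p-values; the choice of these two procedures is an assumption, not a statement about the study's method.

```python
def bonferroni(p_values, alpha=0.05):
    """Flag item i when p_i <= alpha / m (controls the familywise error rate)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

def benjamini_hochberg(p_values, alpha=0.05):
    """Flag the k smallest p-values, where k is the largest rank with
    p_(k) <= (k / m) * alpha (controls the false discovery rate)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank
    flagged = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            flagged[idx] = True
    return flagged

# Hypothetical p-values from item-level DIF tests on a five-item test.
p = [0.001, 0.012, 0.025, 0.2, 0.5]
print(bonferroni(p))          # only the first item is flagged
print(benjamini_hochberg(p))  # the first three items are flagged
```

The contrast shows the trade-off at issue: the stricter adjustment lowers the chance of any false flag but also flags fewer items that may truly show DIF.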
Item Response Theory Models for Performance Decline during Testing

ERIC Educational Resources Information Center

Jin, Kuan-Yu; Wang, Wen-Chung

2014-01-01

Sometimes, test-takers may not be able to attempt all items to the best of their ability (with full effort) due to personal factors (e.g., low motivation) or testing conditions (e.g., time limit), resulting in poor performances on certain items, especially those located toward the end of a test. Standard item response theory (IRT) models fail to…
Differential item functioning analysis of the Vanderbilt Expertise Test for cars.

PubMed

Lee, Woo-Yeol; Cho, Sun-Joo; McGugin, Rankin W; Van Gulick, Ana Beth; Gauthier, Isabel

2015-01-01

The Vanderbilt Expertise Test for cars (VETcar) is a test of visual learning for contemporary car models. We used item response theory to assess the VETcar and in particular used differential item functioning (DIF) analysis to ask if the test functions the same way in laboratory versus online settings and for different groups based on age and gender. An exploratory factor analysis found evidence of multidimensionality in the VETcar, although a single dimension was deemed sufficient to capture the recognition ability measured by the test. We selected a unidimensional three-parameter logistic item response model to examine item characteristics and subject abilities. The VETcar had satisfactory internal consistency. A substantial number of items showed DIF at a medium effect size for test setting and for age group, whereas gender DIF was negligible. Because online subjects were on average older than those tested in the lab, we focused on the age groups to conduct a multigroup item response theory analysis. This revealed that most items on the test favored the younger group. DIF could be more the rule than the exception when measuring performance with familiar object categories, therefore posing a challenge for the measurement of either domain-general visual abilities or category-specific knowledge.
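For readers unfamiliar with the three-parameter logistic model mentioned above, a minimal sketch of its item response function follows. The symbols a (discrimination), b (difficulty), and c (lower asymptote, or guessing) are the conventional notation rather than names taken from the article, and the optional scaling constant of 1.7 used in some parameterizations is omitted.

```python
import math

def irf_3pl(theta, a, b, c):
    """Probability of a correct response under the 3PL model:
    P(theta) = c + (1 - c) / (1 + exp(-a * (theta - b)))."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical item: when ability equals difficulty, the probability lies
# halfway between the guessing level c and 1.
print(irf_3pl(theta=0.0, a=1.0, b=0.0, c=0.2))
```

An item shows DIF when examinees of equal ability but from different groups (for example, age groups) have different values of this probability, which in practice appears as group-specific item parameters.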
Samejima Items in Multiple-Choice Tests: Identification and Implications

ERIC Educational Resources Information Center

Rahman, Nazia

2013-01-01

Samejima hypothesized that non-monotonically increasing item response functions (IRFs) of ability might occur for multiple-choice items (referred to here as "Samejima items") if low ability test takers with some, though incomplete, knowledge or skill are drawn to a particularly attractive distractor, while very low ability test takers…
Computerized Numerical Control Test Item Bank.

ERIC Educational Resources Information Center

Reneau, Fred; And Others

This guide contains 285 test items for use in teaching a course in computerized numerical control. All test items were reviewed, revised, and validated by incumbent workers and subject matter instructors. Items are provided for assessing student achievement in such aspects of programming and planning, setting up, and operating machines with…
Validation of the early childhood attitude toward women in science scale (ECWiSS): A pilot administration

NASA Astrophysics Data System (ADS)

Mulkey, Lynn M.

The intention of this research was to measure attitudes of young children toward women scientists. A 27-item instrument, the Early Childhood Women in Science Scale (ECWiSS) was validated in a test case of the proposition that differential socialization predicts entry into the scientific talent pool. Estimates of internal consistency indicated that the scale is highly reliable. Known groups and correlates procedures, employed to determine the validity of the instrument, revealed that the scale is able to discriminate significant differences between groups and distinguishes three dimensions of attitude (role-specific self-concept, home-related sex-role conflict, and work-related sex-role conflict). Results of the analyses also confirmed the anticipated pattern of correlations with measures of another construct. The findings suggest the utility of the ECWiSS for measurement of early childhood attitudes in models of the ascriptive and/or meritocratic processes affecting recruitment to science and more generally in program and curriculum evaluation where attitude toward women in science is the construct of interest.
Science curiosity in learning environments: developing an attitudinal scale for research in schools, homes, museums, and the community

NASA Astrophysics Data System (ADS)

Weible, Jennifer L.; Toomey Zimmerman, Heather

2016-05-01

Although curiosity is considered an integral aspect of science learning, researchers have debated how to define, measure, and support its development in individuals. Prior measures of curiosity include questionnaire type scales (primarily for adults) and behavioral measures. To address the need to measure scientific curiosity, the Science Curiosity in Learning Environments (SCILE) scale was created and validated as a 12-item scale to measure scientific curiosity in youth. The scale was developed through (a) adapting the language of the Curiosity and Exploration Inventory-II [Kashdan, T. B., Gallagher, M. W., Silvia, P. J., Winterstein, B. P., Breen, W. E., Terhar, D., & Steger, M. F. (2009). The curiosity and exploration inventory-II: Development, factor structure, and psychometrics. Journal of Research in Personality, 43(6), 987-998] for youth and (b) crafting new items based on scientific practices drawn from U.S. science standards documents. We administered a preliminary set of 30 items to 663 youth ages 8-18 in the U.S.A. Exploratory and confirmatory factor analysis resulted in a three-factor model: stretching, embracing, and science practices. The findings indicate that the SCILE scale is a valid measure of youth's scientific curiosity for boys and girls as well as elementary, middle school, and high school learners.
Using a Linear Regression Method to Detect Outliers in IRT Common Item Equating

ERIC Educational Resources Information Center

He, Yong; Cui, Zhongmin; Fang, Yu; Chen, Hanwei

2013-01-01

Common test items play an important role in equating alternate test forms under the common item nonequivalent groups design. When the item response theory (IRT) method is applied in equating, inconsistent item parameter estimates among common items can lead to large bias in equated scores. It is prudent to evaluate inconsistency in parameter…
Robust Scale Transformation Methods in IRT True Score Equating under Common-Item Nonequivalent Groups Design

ERIC Educational Resources Information Center

He, Yong

2013-01-01

Common test items play an important role in equating multiple test forms under the common-item nonequivalent groups design. Inconsistent item parameter estimates among common items can lead to large bias in equated scores for IRT true score equating. Current methods extensively focus on detection and elimination of outlying common items, which…
Using Differential Item Functioning Procedures to Explore Sources of Item Difficulty and Group Performance Characteristics.

ERIC Educational Resources Information Center

Scheuneman, Janice Dowd; Gerritz, Kalle

1990-01-01

Differential item functioning (DIF) methodology for revealing sources of item difficulty and performance characteristics of different groups was explored. A total of 150 Scholastic Aptitude Test items and 132 Graduate Record Examination general test items were analyzed. DIF was evaluated for males and females and Blacks and Whites. (SLD)
Item Structural Properties as Predictors of Item Difficulty and Item Association.

ERIC Educational Resources Information Center

Solano-Flores, Guillermo

1993-01-01

Studied the ability of logical test design (LTD) to predict student performance in reading Roman numerals for 211 sixth graders in Mexico City tested on Roman numeral items varying on LTD-related and non-LTD-related variables. The LTD-related variable item iterativity was found to be the best predictor of item difficulty. (SLD)
Investigating Item Exposure Control Methods in Computerized Adaptive Testing

ERIC Educational Resources Information Center

Ozturk, Nagihan Boztunc; Dogan, Nuri

2015-01-01

This study aims to investigate the effects of item exposure control methods on measurement precision and on test security under various item selection methods and item pool characteristics. In this study, the Randomesque (with item group sizes of 5 and 10), Sympson-Hetter, and Fade-Away methods were used as item exposure control methods. Moreover,…
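As a minimal sketch of the Randomesque idea named above: instead of always administering the single most informative item, the next item is drawn at random from the few most informative unused items. The sketch assumes a two-parameter logistic item pool and maximum-information ranking; the function names and the sample pool are hypothetical and do not reproduce the study's simulation design.

```python
import math
import random

def info_2pl(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def randomesque_select(theta, pool, administered, group_size=5, rng=random):
    """Pick at random among the `group_size` most informative unused items.

    pool         -- list of (a, b) item parameters
    administered -- set of indices already given to this examinee
    """
    candidates = [i for i in range(len(pool)) if i not in administered]
    if not candidates:
        raise ValueError("item pool exhausted")
    candidates.sort(key=lambda i: info_2pl(theta, *pool[i]), reverse=True)
    return rng.choice(candidates[:group_size])

# Hypothetical six-item pool; item 1 has already been administered.
pool = [(1.0, -1.0), (1.2, 0.0), (0.8, 0.1), (1.5, 2.0), (1.0, 0.2), (0.9, -0.3)]
print(randomesque_select(0.0, pool, administered={1}, group_size=2))
```

A larger group size spreads exposure over more items at some cost in measurement precision, which is the trade-off the study examines with group sizes of 5 and 10.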
Detecting Differential Item Discrimination (DID) and the Consequences of Ignoring DID in Multilevel Item Response Models

ERIC Educational Resources Information Center

Lee, Woo-yeol; Cho, Sun-Joo

2017-01-01

Cross-level invariance in a multilevel item response model can be investigated by testing whether the within-level item discriminations are equal to the between-level item discriminations. Testing the cross-level invariance assumption is important to understand constructs in multilevel data. However, in most multilevel item response model…
Item Pool Design for an Operational Variable-Length Computerized Adaptive Test

ERIC Educational Resources Information Center

He, Wei; Reckase, Mark D.

2014-01-01

For computerized adaptive tests (CATs) to work well, they must have an item pool with sufficient numbers of good quality items. Many researchers have pointed out that, in developing item pools for CATs, not only is the item pool size important but also the distribution of item parameters and practical considerations such as content distribution…
Life sciences payload definition and integration study. Volume 3: Preliminary equipment item specification catalog for the carry-on laboratories. [for Spacelab

NASA Technical Reports Server (NTRS)

1974-01-01

All general purpose equipment items contained in the final carry-on laboratory (COL) design concepts are described in terms of specific requirements identified for COL use, hardware status, and technical parameters such as weight, volume, power, range, and precision. Estimated costs for each item are given, along with projected development times.
Analyzing Item Generation with Natural Language Processing Tools for the "TOEIC"® Listening Test. Research Report. ETS RR-17-52

ERIC Educational Resources Information Center

Yoon, Su-Youn; Lee, Chong Min; Houghton, Patrick; Lopez, Melissa; Sakano, Jennifer; Loukina, Anastasia; Krovetz, Bob; Lu, Chi; Madani, Nitin

2017-01-01

In this study, we developed assistive tools and resources to support TOEIC® Listening test item generation. There has recently been an increased need for a large pool of items for these tests. This need has, in turn, inspired efforts to increase the efficiency of item generation while maintaining the quality of the created items. We aimed to…
An Analysis of Factors Affecting the Difficulty of Dialogue Items in TOEFL Listening Comprehension. TOEFL Research Reports, 51.

ERIC Educational Resources Information Center

Nissan, Susan; And Others

One of the item types in the Listening Comprehension section of the Test of English as a Foreign Language (TOEFL) test is the dialogue. Because the dialogue item pool needs to have an appropriate balance of items at a range of difficulty levels, test developers have examined items at various difficulty levels in an attempt to identify their…
Item development process and analysis of 50 case-based items for implementation on the Korean Nursing Licensing Examination.

PubMed

Park, In Sook; Suh, Yeon Ok; Park, Hae Sook; Kang, So Young; Kim, Kwang Sung; Kim, Gyung Hee; Choi, Yeon-Hee; Kim, Hyun-Ju

2017-01-01

The purpose of this study was to improve the quality of items on the Korean Nursing Licensing Examination by developing and evaluating case-based items that reflect integrated nursing knowledge. We conducted a cross-sectional observational study to develop new case-based items. The methods for developing test items included expert workshops, brainstorming, and verification of content validity. After a mock examination of undergraduate nursing students using the newly developed case-based items, we evaluated the appropriateness of the items through classical test theory and item response theory. A total of 50 case-based items were developed for the mock examination, and content validity was evaluated. The question items integrated 34 discrete elements of integrated nursing knowledge. The mock examination was taken by 741 baccalaureate students in their fourth year of study at 13 universities. Their average score on the mock examination was 57.4, and the examination showed a reliability of 0.40. According to classical test theory, the average level of item difficulty of the items was 57.4% (80%-100% for 12 items; 60%-80% for 13 items; and less than 60% for 25 items). The mean discrimination index was 0.19, and was above 0.30 for 11 items and 0.20 to 0.29 for 15 items. According to item response theory, the item discrimination parameter (in the logistic model) was none for 10 items (0.00), very low for 20 items (0.01 to 0.34), low for 12 items (0.35 to 0.64), moderate for 6 items (0.65 to 1.34), high for 1 item (1.35 to 1.69), and very high for 1 item (above 1.70). The item difficulty was very easy for 24 items (below -2.0), easy for 8 items (-2.0 to -0.5), medium for 6 items (-0.5 to 0.5), hard for 3 items (0.5 to 2.0), and very hard for 9 items (2.0 or above). The goodness-of-fit test in terms of the 2-parameter item response model between the range of 2.0 to 0.5 revealed that 12 items had an ideal correct answer rate. 
We surmised that the low reliability of the mock examination was influenced by the timing of the test for the examinees and the inappropriate difficulty of the items. Our study suggested a methodology for the development of future case-based items for the Korean Nursing Licensing Examination.
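The classical test theory quantities reported above can be illustrated with a short sketch. Item difficulty is computed as the proportion of correct responses; for discrimination, the sketch uses the upper-lower group index with 27% groups, which is one common definition and an assumption here, since the abstract does not state which index was used. The response matrix is hypothetical.

```python
def item_difficulty(responses, item):
    """Proportion of examinees answering the item correctly (0/1 scoring)."""
    return sum(row[item] for row in responses) / len(responses)

def discrimination_index(responses, item, fraction=0.27):
    """Upper-lower discrimination: proportion correct in the top-scoring
    group minus proportion correct in the bottom-scoring group."""
    ranked = sorted(responses, key=sum, reverse=True)
    n = max(1, round(len(ranked) * fraction))
    upper, lower = ranked[:n], ranked[-n:]
    return (sum(r[item] for r in upper) - sum(r[item] for r in lower)) / n

# Hypothetical responses: 4 examinees (rows) by 3 items (columns).
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(item_difficulty(responses, 0))       # 0.75
print(discrimination_index(responses, 0))  # 1.0
```

Note that in this convention a higher difficulty value means an easier item, so the reported average of 57.4% is the mean proportion correct.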

The beneficial effect of testing: an event-related potential study

PubMed Central

Bai, Cheng-Hua; Bridger, Emma K.; Zimmer, Hubert D.; Mecklinger, Axel

2015-01-01

The enhanced memory performance for items that are tested as compared to being restudied (the testing effect) is a frequently reported memory phenomenon. According to the episodic context account of the testing effect, this beneficial effect of testing is related to a process which reinstates the previously learnt episodic information. Few studies have explored the neural correlates of this effect at the time point when testing takes place, however. In this study, we utilized the ERP correlates of successful memory encoding to address this issue, hypothesizing that if the benefit of testing is due to retrieval-related processes at test then subsequent memory effects (SMEs) should resemble the ERP correlates of retrieval-based processing in their temporal and spatial characteristics. Participants were asked to learn Swahili-German word pairs before items were presented in either a testing or a restudy condition. Memory performance was assessed immediately and 1-day later with a cued recall task. Successfully recalling items at test increased the likelihood that items were remembered over time compared to items which were only restudied. An ERP subsequent memory contrast (later remembered vs. later forgotten tested items), which reflects the engagement of processes that ensure items are recallable the next day were topographically comparable with the ERP correlate of immediate recollection (immediately remembered vs. immediately forgotten tested items). This result shows that the processes which allow items to be more memorable over time share qualitatively similar neural correlates with the processes that relate to successful retrieval at test. This finding supports the notion that testing is more beneficial than restudying on memory performance over time because of its engagement of retrieval processes, such as the re-encoding of actively retrieved memory representations. PMID:26441577
A Review of the Effects on IRT Item Parameter Estimates with a Focus on Misbehaving Common Items in Test Equating

PubMed Central

Michaelides, Michalis P.

2010-01-01

Many studies have investigated the topic of change or drift in item parameter estimates in the context of item response theory (IRT). Content effects, such as instructional variation and curricular emphasis, as well as context effects, such as the wording, position, or exposure of an item have been found to impact item parameter estimates. The issue becomes more critical when items with estimates exhibiting differential behavior across test administrations are used as common for deriving equating transformations. This paper reviews the types of effects on IRT item parameter estimates and focuses on the impact of misbehaving or aberrant common items on equating transformations. Implications relating to test validity and the judgmental nature of the decision to keep or discard aberrant common items are discussed, with recommendations for future research into more informed and formal ways of dealing with misbehaving common items. PMID:21833230
On the Relationship Between Classical Test Theory and Item Response Theory: From One to the Other and Back.

PubMed

Raykov, Tenko; Marcoulides, George A

2016-04-01

The frequently neglected and often misunderstood relationship between classical test theory and item response theory is discussed for the unidimensional case with binary measures and no guessing. It is pointed out that popular item response models can be directly obtained from classical test theory-based models by accounting for the discrete nature of the observed items. Two distinct observational equivalence approaches are outlined that render the item response models from corresponding classical test theory-based models, and can each be used to obtain the former from the latter models. Similarly, classical test theory models can be furnished using the reverse application of either of those approaches from corresponding item response models.
A Historical Investigation into Item Formats of ACS Exams and Their Relationships to Science Practices

ERIC Educational Resources Information Center

Brandriet, Alexandra; Reed, Jessica J.; Holme, Thomas

2015-01-01

The release of the "NRC Framework for K-12 Science Education" and the "Next Generation Science Standards" has important implications for classroom teaching and assessment. Of particular interest is the implementation of science practices in the chemistry classroom, and the definitions established by the NRC makes these…
From Access to Success: Identity Contingencies & African-American Pathways to Science

ERIC Educational Resources Information Center

Brown, Bryan A.; Henderson, J. Bryan; Gray, Salina; Donovan, Brian; Sullivan, Shayna

2013-01-01

We conducted a mixed-methodological study of matriculation issues for African-American students in science. The project compares the experiences of students currently majoring in science (N = 304) with the experiences of those who have succeeded in earning science degrees (N = 307). Using a 57-item Likert scale questionnaire, participants were…
Collecting Artifacts

ERIC Educational Resources Information Center

Coffey, Natalie

2004-01-01

Fresh out of college, the author had only a handful of items worthy of displaying, which included some fossils she had collected in her paleontology class. She had binders filled with great science information, but kids want to see "real" science, not paper science. Then it came to her: she could fill the shelves with science artifacts with the…
Locally Dependent Linear Logistic Test Model with Person Covariates

ERIC Educational Resources Information Center

Ip, Edward H.; Smits, Dirk J. M.; De Boeck, Paul

2009-01-01

The article proposes a family of item-response models that allow the separate and independent specification of three orthogonal components: item attribute, person covariate, and local item dependence. Special interest lies in extending the linear logistic test model, which is commonly used to measure item attributes, to tests with embedded item…
Applying Bayesian Item Selection Approaches to Adaptive Tests Using Polytomous Items

ERIC Educational Resources Information Center

Penfield, Randall D.

2006-01-01

This study applied the maximum expected information (MEI) and the maximum posterior-weighted information (MPI) approaches of computer adaptive testing item selection to the case of a test using polytomous items following the partial credit model. The MEI and MPI approaches are described. A simulation study compared the efficiency of ability…
Do Reading Experts Agree with MCAT Verbal Reasoning Item Classifications?

ERIC Educational Resources Information Center

Jackson, Evelyn W.; And Others

1994-01-01

Examined whether expert raters (n=5) could agree about classification of Medical College Admission Test (MCAT) items and whether they agreed with MCAT student manual in labeling skill being measured by each test item. Results revealed difficulties in replicating authors' labeling of skills for reading items on practice test provided with 1991 MCAT…
ACER Chemistry Test Item Collection (ACER CHEMTIC Year 12 Supplement).

ERIC Educational Resources Information Center

Australian Council for Educational Research, Hawthorn.

This publication contains 317 multiple-choice chemistry test items related to topics covered in the Victorian (Australia) Year 12 chemistry course. It allows teachers access to a range of items suitable for diagnostic and achievement purposes, supplementing the ACER Chemistry Test Item Collection--Year 12 (CHEMTIC). The topics covered are: organic…
Differential Item Functioning: Its Consequences. Research Report. ETS RR-10-01

ERIC Educational Resources Information Center

Lee, Yi-Hsuan; Zhang, Jinming

2010-01-01

This report examines the consequences of differential item functioning (DIF) using simulated data. Its impact on total score, item response theory (IRT) ability estimate, and test reliability was evaluated in various testing scenarios created by manipulating the following four factors: test length, percentage of DIF items per form, sample sizes of…
Electronics. Criterion-Referenced Test (CRT) Item Bank.

ERIC Educational Resources Information Center

Davis, Diane, Ed.

This document contains 519 criterion-referenced multiple choice and true or false test items for a course in electronics. The test item bank is designed to work with both the Vocational Instructional Management System (VIMS) and the Vocational Administrative Management System (VAMS) in Missouri. The items are grouped into 15 units covering the…
Auto Mechanics. Criterion-Referenced Test (CRT) Item Bank.

ERIC Educational Resources Information Center

Tannehill, Dana, Ed.

This document contains 546 criterion-referenced multiple choice and true or false test items for a course in auto mechanics. The test item bank is designed to work with both the Vocational Instructional Management System (VIMS) and Vocational Administrative Management System (VAMS) in Missouri. The items are grouped into 35 units covering the…
Developing a Strategy for Using Technology-Enhanced Items in Large-Scale Standardized Tests

ERIC Educational Resources Information Center

Bryant, William

2017-01-01

As large-scale standardized tests move from paper-based to computer-based delivery, opportunities arise for test developers to make use of items beyond traditional selected and constructed response types. Technology-enhanced items (TEIs) have the potential to provide advantages over conventional items, including broadening construct measurement,…
Varying levels of difficulty index of skills-test items randomly selected by examinees on the Korean emergency medical technician licensing examination

PubMed Central

2016-01-01

Purpose: The goal of this study was to characterize the difficulty index of the items in the skills test components of the class I and II Korean emergency medical technician licensing examination (KEMTLE), which requires examinees to select items randomly. Methods: The results of 1,309 class I KEMTLE examinations and 1,801 class II KEMTLE examinations in 2013 were subjected to analysis. Items from the basic and advanced skills test sections of the KEMTLE were compared to determine whether some were significantly more difficult than others. Results: In the class I KEMTLE, all 4 of the items on the basic skills test showed significant variation in difficulty index (P<0.01), as well as 4 of the 5 items on the advanced skills test (P<0.05). In the class II KEMTLE, 4 of the 5 items on the basic skills test showed significantly different difficulty index (P<0.01), as well as all 3 of the advanced skills test items (P<0.01). Conclusion: In the skills test components of the class I and II KEMTLE, the procedure in which examinees randomly select questions should be revised to require examinees to respond to a set of fixed items in order to improve the reliability of the national licensing examination. PMID:26883810
Assessing cultural validity in standardized tests in stem education

NASA Astrophysics Data System (ADS)

Gassant, Lunes

This quantitative ex post facto study examined how race and gender, as elements of culture, influence the development of common misconceptions among STEM students. Primary data came from a standardized test: the Digital Logic Concept Inventory (DLCI) developed by Drs. Geoffrey L. Herman, Michael C. Louis, and Craig Zilles from the University of Illinois at Urbana-Champaign. The sample consisted of a cohort of 82 STEM students recruited from three universities in Northern Louisiana. Microsoft Excel and the Statistical Package for the Social Sciences (SPSS) were used for data computation. Two key concepts, several sub concepts, and 19 misconceptions were tested through 11 items in the DLCI. Statistical analyses based on both the Classical Test Theory (Spearman, 1904) and the Item Response Theory (Lord, 1952) yielded similar results: some misconceptions in the DLCI can reliably be predicted by the Race or the Gender of the test taker. The research is significant because it has shown that some misconceptions in a STEM discipline attracted students with similar ethnic backgrounds differently; thus, leading to the existence of some cultural bias in the standardized test. Therefore the study encourages further research in cultural validity in standardized tests. With culturally valid tests, it will be possible to increase the effectiveness of targeted teaching and learning strategies for STEM students from diverse ethnic backgrounds. To some extent, this dissertation has contributed to understanding, better, the gap between high enrollment rates and low graduation rates among African American students and also among other minority students in STEM disciplines.
Using psychological constructs from the MUSIC Model of Motivation to predict students' science identification and career goals: results from the U.S. and Iceland

NASA Astrophysics Data System (ADS)

Jones, Brett D.; Sahbaz, Sumeyra; Schram, Asta B.; Chittum, Jessica R.

2017-05-01

We investigated students' perceptions related to psychological constructs in their science classes and the influence of these perceptions on their science identification and science career goals. Participants included 575 middle school students from two countries (334 students in the U.S. and 241 students in Iceland). Students completed a self-report questionnaire that included items from several measures. We conducted correlational analyses, confirmatory factor analyses, and structural equation modelling to test our hypotheses. Students' class perceptions (i.e. empowerment, usefulness, success, interest, and caring) were significantly correlated with their science identification, which was correlated positively with their science career goals. Combining students' science class perceptions, science identification, and career goals into one model, we found that the model fit the U.S. and Icelandic data reasonably well. However, not all of the hypothesised paths were statistically significant. For example, only students' perceptions of usefulness (for the U.S. and Icelandic students) and success (for the U.S. students only) significantly predicted students' career goals in the full model. Theoretically, our findings are consistent with results from samples of university engineering students, yet different in some ways. Our results provide evidence for the theoretical relationships between students' perceptions of science classes and their career goals.
Reliability of the Client-Centeredness of Goal Setting (C-COGS) Scale in Acquired Brain Injury Rehabilitation.

PubMed

Doig, Emmah; Prescott, Sarah; Fleming, Jennifer; Cornwell, Petrea; Kuipers, Pim

2016-01-01

To examine the internal reliability and test-retest reliability of the Client-Centeredness of Goal Setting (C-COGS) scale. The C-COGS scale was administered to 42 participants with acquired brain injury after completion of multidisciplinary goal planning. Internal reliability of scale items was examined using item-partial total correlations and Cronbach's α coefficient. The scale was readministered within a 1-mo period to a subsample of 12 participants to examine test-retest reliability by calculating exact and close percentage agreement for each item. After examination of item-partial total correlations, test items were revised. The revised items demonstrated stronger internal consistency than the original items. Preliminary evaluation of test-retest reliability was fair, with an average exact percent agreement across all test items of 67%. Findings support the preliminary reliability of the C-COGS scale as a tool to evaluate and promote client-centered goal planning in brain injury rehabilitation. Copyright © 2016 by the American Occupational Therapy Association, Inc.
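The two reliability statistics named in the abstract above can be illustrated with a short sketch. It uses the standard formula for Cronbach's α and a simple exact percent agreement between two administrations. The function names and the ratings are hypothetical, and this is not the authors' analysis code.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(scores[0])  # number of items
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def exact_agreement(first, second):
    """Percentage of respondents giving the identical rating on both occasions."""
    matches = sum(a == b for a, b in zip(first, second))
    return 100.0 * matches / len(first)


# Hypothetical data: three respondents rating two items.
ratings = [[1, 2], [2, 3], [3, 4]]
print(cronbach_alpha(ratings))

# Hypothetical test and retest ratings for one item from four respondents.
print(exact_agreement([1, 2, 3, 4], [1, 2, 0, 4]))
```

Population variance is used throughout. Because α depends only on a ratio of variances, using sample variance for both terms would give the same value.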
Item-Writing Guidelines for Physics

ERIC Educational Resources Information Center

Regan, Tom

2015-01-01

A teacher learning how to write test questions (test items) will almost certainly encounter item-writing guidelines--lists of item-writing do's and don'ts. Item-writing guidelines usually are presented as applicable across all assessment settings. Table I shows some guidelines that I believe to be generally applicable and two will be briefly…

Unidimensional Interpretations for Multidimensional Test Items

ERIC Educational Resources Information Center

Kahraman, Nilufer

2013-01-01

This article considers potential problems that can arise in estimating a unidimensional item response theory (IRT) model when some test items are multidimensional (i.e., show a complex factorial structure). More specifically, this study examines (1) the consequences of model misfit on IRT item parameter estimates due to unintended minor item-level…
Measuring psychological trauma after spinal cord injury: Development and psychometric characteristics of the SCI-QOL Psychological Trauma item bank and short form

PubMed Central

Kisala, Pamela A.; Victorson, David; Pace, Natalie; Heinemann, Allen W.; Choi, Seung W.; Tulsky, David S.

2015-01-01

Objective: To describe the development and psychometric properties of the SCI-QOL Psychological Trauma item bank and short form. Design: Using a mixed-methods design, we developed and tested a Psychological Trauma item bank with patient and provider focus groups, cognitive interviews, and item response theory based analytic approaches, including tests of model fit, differential item functioning (DIF) and precision. Setting: We tested a 31-item pool at several medical institutions across the United States, including the University of Michigan, Kessler Foundation, Rehabilitation Institute of Chicago, the University of Washington, Craig Hospital and the James J. Peters/Bronx Veterans Administration hospital. Participants: A total of 716 individuals with SCI completed the trauma items. Results: The 31 items fit a unidimensional model (CFI=0.952; RMSEA=0.061) and demonstrated good precision (theta range between 0.6 and 2.5). Nine items demonstrated negligible DIF with little impact on score estimates. The final calibrated item bank contains 19 items. Conclusion: The SCI-QOL Psychological Trauma item bank is a psychometrically robust measurement tool from which a short form and a computer adaptive test (CAT) version are available. PMID:26010967
Repeated retrieval practice and item difficulty: does criterion learning eliminate item difficulty effects?

PubMed

Vaughn, Kalif E; Rawson, Katherine A; Pyc, Mary A

2013-12-01

A wealth of previous research has established that retrieval practice promotes memory, particularly when retrieval is successful. Although successful retrieval promotes memory, it remains unclear whether successful retrieval promotes memory equally well for items of varying difficulty. Will easy items still outperform difficult items on a final test if all items have been correctly recalled equal numbers of times during practice? In two experiments, normatively difficult and easy Lithuanian-English word pairs were learned via test-restudy practice until each item had been correctly recalled a preassigned number of times (from 1 to 11 correct recalls). Despite equating the numbers of successful recalls during practice, performance on a delayed final cued-recall test was lower for difficult than for easy items. Experiment 2 was designed to diagnose whether the disadvantage for difficult items was due to deficits in cue memory, target memory, and/or associative memory. The results revealed a disadvantage for the difficult versus the easy items only on the associative recognition test, with no differences on cue recognition, and even an advantage on target recognition. Although successful retrieval enhanced memory for both difficult and easy items, equating retrieval success during practice did not eliminate normative item difficulty differences.
Footprints of Fascination: Digital Traces of Public Engagement with Particle Physics on CERN's Social Media Platforms.

PubMed

Kahle, Kate; Sharon, Aviv J; Baram-Tsabari, Ayelet

2016-01-01

Although the scientific community increasingly recognizes that its communication with the public may shape civic engagement with science, few studies have characterized how this communication occurs online. Social media plays a growing role in this engagement, yet it is not known if or how different platforms support different types of engagement. This study sets out to explore how users engage with science communication items on different platforms of social media, and what are the characteristics of the items that tend to attract large numbers of user interactions. Here, user interactions with almost identical items on five of CERN's social media platforms were quantitatively compared over an eight-week period, including likes, comments, shares, click-throughs, and time spent on CERN's site. The most popular items were qualitatively analyzed for content features. Findings indicate that as audience size of a social media platform grows, the total rate of engagement with content tends to grow as well. However, per user, engagement tends to decline with audience size. Across all platforms, similar topics tend to consistently receive high engagement. In particular, awe-inspiring imagery tends to frequently attract high engagement across platforms, independent of newsworthiness. To our knowledge, this study provides the first cross-platform characterization of public engagement with science on social media. Findings, although focused on particle physics, have a multidisciplinary nature; they may serve to benchmark social media analytics for assessing science communication activities in various domains. Evidence-based suggestions for practitioners are also offered.
Footprints of Fascination: Digital Traces of Public Engagement with Particle Physics on CERN's Social Media Platforms

PubMed Central

Baram-Tsabari, Ayelet

2016-01-01

Although the scientific community increasingly recognizes that its communication with the public may shape civic engagement with science, few studies have characterized how this communication occurs online. Social media plays a growing role in this engagement, yet it is not known if or how different platforms support different types of engagement. This study sets out to explore how users engage with science communication items on different platforms of social media, and what are the characteristics of the items that tend to attract large numbers of user interactions. Here, user interactions with almost identical items on five of CERN's social media platforms were quantitatively compared over an eight-week period, including likes, comments, shares, click-throughs, and time spent on CERN's site. The most popular items were qualitatively analyzed for content features. Findings indicate that as audience size of a social media platform grows, the total rate of engagement with content tends to grow as well. However, per user, engagement tends to decline with audience size. Across all platforms, similar topics tend to consistently receive high engagement. In particular, awe-inspiring imagery tends to frequently attract high engagement across platforms, independent of newsworthiness. To our knowledge, this study provides the first cross-platform characterization of public engagement with science on social media. Findings, although focused on particle physics, have a multidisciplinary nature; they may serve to benchmark social media analytics for assessing science communication activities in various domains. Evidence-based suggestions for practitioners are also offered. PMID:27232498
Test Bias: An Objective Definition for Test Items.

ERIC Educational Resources Information Center

Durovic, Jerry J.

A test bias definition, applicable at the item level of a test, is presented. The definition conceptually equates test bias with measuring different things in different groups, and operationally equates test bias with a difference in item fit to the Rasch Model, greater than one, between groups. It is suggested that the proposed definition avoids…
Fixed or mixed: a comparison of three, four and mixed-option multiple-choice tests in a Fetal Surveillance Education Program

PubMed Central

2013-01-01

Background: Despite the widespread use of multiple-choice assessments in medical education assessment, current practice and published advice concerning the number of response options remains equivocal. This article describes an empirical study contrasting the quality of three 60-item multiple-choice test forms within the Royal Australian and New Zealand College of Obstetricians and Gynaecologists (RANZCOG) Fetal Surveillance Education Program (FSEP). The three forms are described below. Methods: The first form featured four response options per item. The second form featured three response options, having removed the least functioning option from each item in the four-option counterpart. The third test form was constructed by retaining the best performing version of each item from the first two test forms. It contained both three- and four-option items. Results: Psychometric and educational factors were taken into account in formulating an approach to test construction for the FSEP. The four-option test performed better than the three-option test overall, but some items were improved by the removal of options. The mixed-option test demonstrated better measurement properties than the fixed-option tests, and has become the preferred test format in the FSEP program. The criteria used were reliability, errors of measurement and fit to the item response model. Conclusions: The position taken is that decisions about the number of response options be made at the item level, with plausible options being added to complete each item on both psychometric and educational grounds rather than complying with a uniform policy. The point is to construct the better performing item in providing the best psychometric and educational information. PMID:23453056
Fixed or mixed: a comparison of three, four and mixed-option multiple-choice tests in a Fetal Surveillance Education Program.

PubMed

Zoanetti, Nathan; Beaves, Mark; Griffin, Patrick; Wallace, Euan M

2013-03-04

Despite the widespread use of multiple-choice assessments in medical education assessment, current practice and published advice concerning the number of response options remains equivocal. This article describes an empirical study contrasting the quality of three 60-item multiple-choice test forms within the Royal Australian and New Zealand College of Obstetricians and Gynaecologists (RANZCOG) Fetal Surveillance Education Program (FSEP). The three forms are described below. The first form featured four response options per item. The second form featured three response options, having removed the least functioning option from each item in the four-option counterpart. The third test form was constructed by retaining the best performing version of each item from the first two test forms. It contained both three- and four-option items. Psychometric and educational factors were taken into account in formulating an approach to test construction for the FSEP. The four-option test performed better than the three-option test overall, but some items were improved by the removal of options. The mixed-option test demonstrated better measurement properties than the fixed-option tests, and has become the preferred test format in the FSEP program. The criteria used were reliability, errors of measurement and fit to the item response model. The position taken is that decisions about the number of response options be made at the item level, with plausible options being added to complete each item on both psychometric and educational grounds rather than complying with a uniform policy. The point is to construct the better performing item in providing the best psychometric and educational information.
Secondary science teachers' attitudes toward and beliefs about science reading and science textbooks

NASA Astrophysics Data System (ADS)

Yore, Larry D.

Science textbooks are dominant influences behind most secondary science instruction but little is known about teachers' approach to science reading. The purpose of this naturalistic study was to develop and validate a Science and Reading Questionnaire to assess secondary science teachers' attitudes toward science reading and their beliefs or informed opinions about science reading. A survey of 428 British Columbia secondary science teachers was conducted and 215 science teachers responded. Results on a 12-item Likert attitude scale indicated that teachers place high value on reading as an important strategy to promote learning in science and that they generally accept responsibility for teaching content reading skills to science students. Results on a 13-item Likert belief scale indicated that science teachers generally reject the text-driven model of reading, but they usually do not have well-formulated alternative models to guide their teaching practices. Teachers have intuitive beliefs about science reading that partially agree with many research findings, but their beliefs are fragmented and particularly sketchy in regard to the cognitive and metacognitive skills required by readers to learn from science texts. The findings for attitude, belief, and total scales were substantiated by further questions in the Science and Reading Questionnaire regarding classroom practice and by individual interviews and classroom observations of a 15-teacher subsample of the questionnaire respondents.
A motivational account of the undergraduate experience in science: brief measures of students' self-system appraisals, engagement in coursework, and identity as a scientist

NASA Astrophysics Data System (ADS)

Skinner, Ellen; Saxton, Emily; Currie, Cailin; Shusterman, Gwen

2017-11-01

As part of long-standing efforts to promote undergraduates' success in science, researchers have investigated the instructional strategies and motivational factors that promote student learning and persistence in science coursework and majors. This study aimed to create a set of brief measures that educators and researchers can use as tools to examine the undergraduate motivational experience in science classes. To identify key motivational processes, we drew on self-determination theory (SDT), which holds that students have fundamental needs - to feel competent, related, and autonomous - that fuel their intrinsic motivation. When educational experiences meet these needs, students engage more energetically and learn more, cumulatively contributing to a positive identity as a scientist. Based on information provided by 1013 students from 8 classes in biology, chemistry, and physics, we constructed conceptually focused and psychometrically sound survey measures of three sets of motivational factors: (1) students' appraisals of their own competence, autonomy, and relatedness; (2) the quality of students' behavioural and emotional engagement in academic work; and (3) students' emerging identities as scientists, including their science identity, purpose in science, and science career plans. Using an iterative confirmatory process, we tested short item sets for unidimensionality and internal consistency, and then cross-validated them. Tests of measurement invariance showed that scales were generally comparable across disciplines. Most importantly, scales and final course grades showed correlations consistent with predictions from SDT. These measures may provide a window on the student motivational experience for educators, researchers, and interventionists who aim to improve the quality of undergraduate science teaching and learning.
The Development and Validation of the Instructional Practices Log in Science: A Measure of K-5 Science Instruction

ERIC Educational Resources Information Center

Adams, Elizabeth L.; Carrier, Sarah J.; Minogue, James; Porter, Stephen R.; McEachin, Andrew; Walkowiak, Temple A.; Zulli, Rebecca A.

2017-01-01

The Instructional Practices Log in Science (IPL-S) is a daily teacher log developed for K-5 teachers to self-report their science instruction. The items on the IPL-S are grouped into scales measuring five dimensions of science instruction: "Low-level Sense-making," "High-level Sense-making," "Communication,"…
[Differential item functioning: a bibliometric analysis of journals published in Spanish].

PubMed

Guilera, Georgina; Gómez, Juana; Hidalgo, M Dolores

2006-11-01

This study aims to provide an overview of scientific productivity with respect to articles published in Spanish on the issue of DIF. The documents included in the study were identified using the Psicodoc database, as well as the Science Citation Index and Social Science Citation Index from the Web of Science. The analyses carried out are focused mainly on presenting the frequencies and percentages of publications with respect to various bibliometric indicators. The results reveal that interest in the issue of DIF has increased, and that the universities are the most productive institutions. The majority of articles have been published in the journal Psicothema.
Comparison of student confidence and perceptions of biochemistry concepts using a team-based learning versus traditional lecture-based format.

PubMed

Gryka, Rebecca; Kiersma, Mary E; Frame, Tracy R; Cailor, Stephanie M; Chen, Aleda M H

To evaluate differences in student confidence and perceptions of biochemistry concepts using a team-based learning (TBL) format versus a traditional lecture-based format at two universities. Two pedagogies (TBL vs lecture-based) were utilized to deliver biochemistry concepts at two universities in a first-professional year, semester-long biochemistry course. A 21-item instrument was created and administered pre-post semester to assess changes in confidence in learning biochemistry concepts using Bandura's Social Cognitive Theory (eight items, 5-point, Likert-type) and changes in student perceptions of biochemistry utilizing the theory of planned behavior (TPB) domains (13 items, 7-point, Likert-type). Wilcoxon signed-rank tests were used to evaluate pre-post changes, and Mann-Whitney U tests were used to evaluate differences between universities. All students (N=111) had more confidence in biochemistry concepts post-semester, but TBL students (N=53) were significantly more confident. TBL students also had greater agreement that they are expected to actively engage in science courses post-semester, according to the perceptions of biochemistry subscale. No other differences between lecture and TBL were observed post-semester. Students in a TBL course had greater gains in confidence. Since students often engage in tasks where they feel confident, TBL can be a useful pedagogy to promote student learning. Copyright © 2017 Elsevier Inc. All rights reserved.
Measuring Adolescent Science Motivation

ERIC Educational Resources Information Center

Schumm, Maximiliane F.; Bogner, Franz X.

2016-01-01

To monitor science motivation, 232 tenth graders of the college preparatory level ("Gymnasium") completed the Science Motivation Questionnaire II (SMQ-II). Additionally, personality data were collected using a 10-item version of the Big Five Inventory. A subsequent exploratory factor analysis based on the eigenvalue-greater-than-one…
78 FR 55299 - Agency Information Collection Activities: Comment Request

Federal Register 2010, 2011, 2012, 2013, 2014

2013-09-10

... to address the effects of question design on survey estimates of public science knowledge and the...: Title: Experimentation with Factual Knowledge of Science Survey Items. OMB Approval Number: 3145-NEW.... 1862) authorizes the National Science Foundation to ``initiate and support basic scientific research...
Estimating Total-test Scores from Partial Scores in a Matrix Sampling Design.

ERIC Educational Resources Information Center

Sachar, Jane; Suppes, Patrick

It is sometimes desirable to obtain an estimated total-test score for an individual who was administered only a subset of the items in a total test. The present study compared six methods, two of which utilize the content structure of items, to estimate total-test scores using 450 students in grades 3-5 and 60 items of the 110-item Stanford Mental…
Differential item functioning analysis of the Vanderbilt Expertise Test for cars

PubMed Central

Lee, Woo-Yeol; Cho, Sun-Joo; McGugin, Rankin W.; Van Gulick, Ana Beth; Gauthier, Isabel

2015-01-01

The Vanderbilt Expertise Test for cars (VETcar) is a test of visual learning for contemporary car models. We used item response theory to assess the VETcar and in particular used differential item functioning (DIF) analysis to ask if the test functions the same way in laboratory versus online settings and for different groups based on age and gender. An exploratory factor analysis found evidence of multidimensionality in the VETcar, although a single dimension was deemed sufficient to capture the recognition ability measured by the test. We selected a unidimensional three-parameter logistic item response model to examine item characteristics and subject abilities. The VETcar had satisfactory internal consistency. A substantial number of items showed DIF at a medium effect size for test setting and for age group, whereas gender DIF was negligible. Because online subjects were on average older than those tested in the lab, we focused on the age groups to conduct a multigroup item response theory analysis. This revealed that most items on the test favored the younger group. DIF could be more the rule than the exception when measuring performance with familiar object categories, therefore posing a challenge for the measurement of either domain-general visual abilities or category-specific knowledge. PMID:26418499
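The unidimensional three-parameter logistic (3PL) model mentioned in the abstract above has a standard item response function, sketched below. The parameter values are hypothetical and chosen only to show what uniform DIF looks like: the same item is harder for one group than for another at equal ability. This is not the authors' estimation code.

```python
import math


def p_correct_3pl(theta, a, b, c):
    """3PL probability of a correct response.

    theta: subject ability, a: discrimination, b: difficulty,
    c: lower asymptote (guessing parameter).
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical parameters: one item, two groups with equal ability
# (theta = 0) but a higher difficulty estimate for the older group.
younger = p_correct_3pl(0.0, a=1.2, b=0.0, c=0.2)
older = p_correct_3pl(0.0, a=1.2, b=0.5, c=0.2)

print(f"younger group: {younger:.3f}")
print(f"older group:   {older:.3f}")
```

When an item's parameters differ between groups in this way, subjects of equal ability have unequal chances of answering correctly, which is what a DIF analysis is designed to detect.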
Modeling Item-Level and Step-Level Invariance Effects in Polytomous Items Using the Partial Credit Model

ERIC Educational Resources Information Center

Gattamorta, Karina A.; Penfield, Randall D.; Myers, Nicholas D.

2012-01-01

Measurement invariance is a common consideration in the evaluation of the validity and fairness of test scores when the tested population contains distinct groups of examinees, such as examinees receiving different forms of a translated test. Measurement invariance in polytomous items has traditionally been evaluated at the item-level,…
Measuring the Instructional Sensitivity of ESL Reading Comprehension Items.

ERIC Educational Resources Information Center

Brutten, Sheila R.; And Others

A study attempted to estimate the instructional sensitivity of items in three reading comprehension tests in English as a second language (ESL). Instructional sensitivity is a test-item construct defined as the tendency for a test item to vary in difficulty as a function of instruction. Similar tasks were given to readers at different proficiency…
Reducing the Impact of Inappropriate Items on Reviewable Computerized Adaptive Testing

ERIC Educational Resources Information Center

Yen, Yung-Chin; Ho, Rong-Guey; Liao, Wen-Wei; Chen, Li-Ju

2012-01-01

In a test, the score would be closer to the examinee's actual ability if careless mistakes were corrected. In computerized adaptive testing (CAT), however, changing the answer to one item might make the following items no longer appropriate for estimating the examinee's ability. These inappropriate items in a reviewable CAT might in turn introduce bias in ability…
